BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Posterior sampling via autoregressive generation - Kelly Zhang (Im
 perial College London)
DTSTART:20241129T140000Z
DTEND:20241129T150000Z
UID:TALK223975@talks.cam.ac.uk
CONTACT:Qingyuan Zhao
DESCRIPTION:Uncertainty quantification remains a critical challenge when u
 sing deep learning models\, particularly in complex decision-making settin
 gs. We propose a new framework for learning bandit algorithms from massive
  historical data by combining classical ideas from multiple imputation w
 ith autoregressive generative sequence modeling. We demonstrate our approa
 ch on a cold-start recommendation problem where\, first\, we use historica
 l data to pretrain an autoregressive model to predict sequences of repeate
 d feedback/rewards (e.g.\, responses to news articles shown to different u
 sers over time). In learning to make accurate predictions\, the model impl
 icitly learns an informed prior based on rich action features (e.g.\, arti
 cle headlines) and how to sharpen beliefs as more rewards are gathered (e.
 g.\, clicks as each article is recommended). At decision time\, the algori
 thm autoregressively samples (imputes) a hypothetical sequence of rewards 
 for each action and chooses the action with the largest average imputed re
 ward. Far from being a heuristic\, our approach is an implementation of
  Thompson sampling (with a learned prior)\, a prominent active exploratio
 n algorith
 m. We prove our pretraining sequence loss directly controls online decisio
 n-making performance\, and we demonstrate our framework on a news recommen
 dation task\, integrating end-to-end fine-tuning of a pretrained language
  model that processes news article headline text to improve performance.
LOCATION:Centre for Mathematical Sciences MR12\, CMS
END:VEVENT
END:VCALENDAR
