BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Natural Experiments in NLP and Where to Find Them - Pietro Lesci (
 University of Cambridge)
DTSTART:20241112T130000Z
DTEND:20241112T140000Z
UID:TALK223093@talks.cam.ac.uk
CONTACT:Mateja Jamnik
DESCRIPTION:When training language models\, configuration choices (such a
 s the random seed for data ordering or the token vocabulary size) signif
 icantly influence model behaviour. Answering counterfactual questions li
 ke "How wo
 uld the model perform if this instance were excluded from training?" is co
 mputationally expensive\, as it requires re-training the model. Once these
  training configurations are set\, they become fixed\, creating a "natural
  experiment" where modifying the experimental conditions incurs high compu
 tational costs. Using econometric techniques to estimate causal effects fr
 om observational studies enables us to analyse the impact of these choices
  without requiring full experimental control or repeated model training. I
 n this talk\, I will present our paper\, _Causal Estimation of Memorisatio
 n Profiles_ (Best Paper Award at ACL 2024)\, which introduces a novel meth
 od based on the difference-in-differences technique from econometrics to e
 stimate memorisation without requiring model re-training.\n\n"You can also
  join us on Zoom":https://cam-ac-uk.zoom.us/j/83400335522?pwd=LkjYvMOvVpMb
 abOV1MVTm8QU6DrGN7.1\n
LOCATION:Lecture Theatre 2\, Computer Laboratory\, William Gates Building
END:VEVENT
END:VCALENDAR
