BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Natural Experiments in NLP and Where to Find Them - Pietro Lesci\,
  University of Cambridge
DTSTART:20241106T110000Z
DTEND:20241106T123000Z
UID:TALK224137@talks.cam.ac.uk
CONTACT:120952
DESCRIPTION:When training language models\, choices such as the random se
 ed for data ordering or the token vocabulary size significantly influen
 ce model behaviour. Answering counterfactual questions like "How wo
 uld the model perform if this instance were excluded from training?" is co
 mputationally expensive\, as it requires re-training the model. Once se
 t\, these training configurations are fixed\, creating a "natural exper
 iment" in which modifying the experimental conditions incurs high compu
 tational costs. Using econometric techniques to estimate causal effec
 ts fr
 om observational studies enables us to analyse the impact of these choices
  without requiring full experimental control or repeated model training. I
 n this talk\, I will present our paper\, Causal Estimation of Memorisation
  Profiles (Best Paper Award at ACL 2024)\, which introduces a novel method
  based on the difference-in-differences technique from econometrics to est
 imate memorisation without requiring model re-training. I will also cover 
 the necessary econometric concepts and key literature on memorisation in l
 anguage models.\n\n*Suggested readings:*\n\nCounterfactual memorization in
  neural language models (https://proceedings.neurips.cc/paper_files/paper/
 2023/file/7bc4f74e35bcfe8cfe43b0a860786d6a-Paper-Conference.pdf)\n\nQuanti
 fying memorization across neural language models (https://arxiv.org/pdf/22
 02.07646)\n
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38.
END:VEVENT
END:VCALENDAR
