Adaptive Tokenization and Memory in Foundation Models
- 👤 Speaker: Edoardo Maria Ponti (University of Edinburgh)
- 📅 Date & Time: Friday 01 November 2024, 12:00 - 13:00
- 📍 Venue: Zoom link: https://cam-ac-uk.zoom.us/j/4751389294?pwd=Z2ZOSDk0eG1wZldVWG1GVVhrTzFIZz09
Abstract
State-of-the-art foundation models (FMs) process information as a sequence of internal representations; however, the length of this sequence is fixed and entirely determined by tokenization. This essentially decouples representation granularity from information content, which exacerbates the deployment costs of FMs and narrows their “horizons” in long sequences. What if, instead, we could dynamically adapt tokenization and memory in FMs to save computation while maintaining or even enhancing performance?
First, I will show how we can dynamically compress the key-value cache of Transformers by deciding when to append or merge items to memory. This offers a compromise between Transformers, whose linear key-value cache growth exhausts memory space and increases latency, and State Space Models, whose finite capacity may result in forgetfulness. Secondly, I will demonstrate how FMs can be “freed” from the tokenizers they are bound to by swapping them on-the-fly with arbitrary ones. Taking a step further, we can even get rid of tokenizers entirely by learning end-to-end how to jointly segment and model language.
Crucially, this new family of FM architectures equipped with adaptive memory and tokenization does not need to be trained from scratch; instead, pre-existing open-weight FMs can be retrofitted for this purpose with a negligible amount of data.
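To make the append-or-merge idea in the abstract concrete, here is a minimal illustrative sketch (not the speaker's actual method): a key-value cache that merges a new entry into its most similar cached slot when cosine similarity exceeds a threshold, and appends otherwise. All names, the averaging rule, and the threshold are assumptions chosen for illustration.

```python
import numpy as np

def update_cache(keys, values, new_k, new_v, threshold=0.9):
    """Append (new_k, new_v) to the cache, or merge it into the most
    similar existing slot when cosine similarity >= `threshold`.
    Returns the updated (keys, values) arrays."""
    if len(keys) == 0:
        return np.array([new_k]), np.array([new_v])
    # Cosine similarity between the new key and every cached key.
    sims = keys @ new_k / (np.linalg.norm(keys, axis=1) * np.linalg.norm(new_k) + 1e-8)
    i = int(np.argmax(sims))
    if sims[i] >= threshold:
        # Merge: average the matched slot with the new entry,
        # so the cache length stays constant.
        keys[i] = (keys[i] + new_k) / 2
        values[i] = (values[i] + new_v) / 2
        return keys, values
    # Append: grow the cache by one slot, as a vanilla Transformer would.
    return np.vstack([keys, new_k]), np.vstack([values, new_v])
```

With a high threshold this behaves like a standard Transformer cache (linear growth); with a low one it behaves like a fixed-capacity memory, illustrating the compromise between the two regimes the abstract describes.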
Bio: Edoardo M. Ponti is a Lecturer (≈ Assistant Professor) in Natural Language Processing at the University of Edinburgh, an Affiliated Lecturer at the University of Cambridge, and a visiting professor at NVIDIA. Previously, he was a visiting postdoctoral scholar at Stanford University and a postdoctoral fellow at Mila and McGill University in Montreal. In 2021, he obtained a PhD in computational linguistics from the University of Cambridge, St John’s College. His main research foci are efficient memory and tokenization, modular deep learning, and computational typology. His research earned him a Google Research Faculty Award and two Best Paper Awards, at EMNLP 2021 and RepL4NLP 2019. He is a board member and co-founder of SIGTYP, the ACL special interest group for computational typology, and a scholar of the European Lab for Learning and Intelligent Systems (ELLIS). He is a (terrible) violinist, football player, and an aspiring practitioner of heroic viticulture.
Series: This talk is part of the NLIP Seminar Series.
Included in Lists
- All Talks (aka the CURE list)
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Computer Education Research
- Computing Education Research
- Department of Computer Science and Technology talks and seminars
- Graduate-Seminars
- Guy Emerson's list
- Interested Talks
- Language Sciences for Graduate Students
- ndk22's list
- NLIP Seminar Series
- ob366-ai4er
- PMRFPS's
- rp587
- School of Technology
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449
Note: Ex-directory lists are not shown.