Evaluation with LLMs - Theoretical and Practical insights
- 👤 Speaker: Eyal Kolman (Microsoft)
- 📅 Date & Time: Friday 24 October 2025, 12:00 - 13:00
- 📍 Venue: SS03 Hybrid (In-Person + Online). Google Meet Link: https://meet.google.com/yeu-pqce-rsn
Abstract: As large language models (LLMs) continue to evolve, assessing their performance becomes increasingly crucial and complex, and LLMs are themselves being used to evaluate the quality of other models. In this talk, I will explore LLM-as-a-Judge, combining theoretical foundations with practical insights from industry. Topics include benchmark design, pre-LLM metrics, common pitfalls illustrated with real examples, methods for automatic tuning of evaluation metrics, and the gaps between industry and academia. I will conclude with a vision for the future of robust and meaningful LLM assessment.
Bio: Dr. Eyal Kolman is a Senior Researcher at Microsoft and an adjunct lecturer at Tel Aviv University and Bar-Ilan University, where he teaches courses in Deep Learning. He holds a Ph.D. in Electrical Engineering from Tel Aviv University and has over 25 years of experience in machine learning and artificial intelligence. His work spans evaluation methodologies, applied AI systems, and large-scale learning models. Dr. Kolman has authored numerous research papers, holds dozens of patents, and is the author of Knowledge‑Based Neurocomputing: A Fuzzy Logic Approach.
Series: This talk is part of the NLIP Seminar Series.