
Perplexity AI: Under the Hood of LLM Inference

If you have a question about this talk, please contact Ben Karniely.

Abstract: Perplexity is a search and answer engine which leverages LLMs to provide high-quality citation-backed answers. The AI Inference team within the company is responsible for serving the models behind the product, ranging from single-GPU embedding models to multi-node sparse Mixture-of-Experts language models. This talk provides more insight into the in-house runtime behind inference at Perplexity, with a particular focus on efficiently serving some of the largest available open-source models.

Biography: Nandor Licker is an AI Inference Engineer at Perplexity, focusing on LLM runtime implementation and GPU performance optimization.

Register for the talk at the following link: https://luma.com/dx1ggxgk

Some catering will be provided after the talk.

This talk is part of the Technical Talks - Department of Computer Science and Technology series.


© 2006-2025 Talks.cam, University of Cambridge.