Hey, you got your distributed algorithm in my ML!

If you have a question about this talk, please contact Srinivasan Keshav.

I started work in systems for machine learning around a year ago, joining the AI Frameworks team at Microsoft and working on everything from inference on small client devices to model training over distributed GPU clusters.

I am going to talk about some of the things that surprised me. Central to this is the difficulty in identifying abstractions. We have techniques such as pipeline parallelism and tensor parallelism, but the implementations are tightly coupled with the models themselves, and optimized alongside them. It is tempting to design fresh alternatives that can decouple distribution techniques from models; however, deploying these into an evolving ecosystem is itself a challenge.

I will wrap up by describing some of our recent work to chart an incremental path through these topics.

Short Bio: I am a Principal Architect at Microsoft, focused on PyTorch and ONNX Runtime. Prior to that I was with AWS and worked on large-scale storage performance and data analytics with Amazon S3. Further back, I led the Oracle Labs group in Cambridge, UK, working on runtime systems for in-memory graph analytics and on the confluence of work on “big data” with ideas from high-performance computing. Before joining Oracle I was with Microsoft in a prior stint (2004–2012), and on the faculty of the University of Cambridge Computer Laboratory (2000–2004) during the early days of the Xen hypervisor project.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.
