BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Cambridge Compiler Social Talks - Markus Böck\, Jeff Niu
DTSTART:20240903T140000Z
DTEND:20240903T150000Z
UID:TALK220468@talks.cam.ac.uk
CONTACT:Luisa Cicolini
DESCRIPTION:At the next Compiler Social we will host two talks:\n\n
 Quidditch: An end-to-end deep learning compiler for highly-concurrent acce
 lerators with software-managed caches - by Markus Boeck (University of Cam
 bridge)\n\nThe wide adoption of Deep Neural Networks and the resulting des
 ire for more hardware resources has fueled the rapid development of innova
 tive custom hardware accelerators that are increasingly difficult to progr
 am. Many proposed hardware designs are only evaluated with hand-written mi
 cro-kernels\, and the few evaluated on entire neural networks typically re
 quire significant investments in building the necessary software stacks. H
 ighly sophisticated neural network compilers emerged to generate DNNs out 
 of expert-written microkernels\, but they were traditionally hand-crafted 
 for each platform\, which prevented both scaling and synergy with industry
 -supported compilation flows.\nWe present Quidditch\, a novel neural netwo
 rk compiler and runtime that provides an end-to-end workflow from a high
 -level network description to high-performance code running on ETH Occamy\
 , one of the first chiplet-based AI research hardware accelerators. Quiddi
 tch builds on IREE\, an industry-strength AI compiler\, imports NNs from Py
 Torch\, JAX\, and TensorFlow\, and offers optimisations such as fusion\, sc
 heduling\, buffer allocation\, memory and multi-level concurrency-guided
  tiling and asynchronous memory transfers to scratchpads. We present a set
  of preliminary novel optimisations: SSA-based double-buffering and barri
 er management for scratchpads\, and redundant transfer elimination tailore
 d for explicitly managed memory. We pair this with a high-performance micr
 okernel generator\, which enables us to run full DNNs with full FPU occupa
 ncy and a more than 20x speed-up over IREE’s generic LLVM backend on our
  custom hardware accelerator. By providing key building blocks for scaling
  AI accelerator compilation to full neural networks\, we aim to accelerate
  the evaluation of custom AI hardware and\, as a result\, AI hardware deve
 lopment overall.\n\nMojo’s Wishlist for MLIR 2.0 - by Jeff Niu (Mojo)\n\
 nMojo is a systems programming language built natively on top of MLIR and 
 leverages MLIR to build state-of-the-art compiler technology. Mojo is the 
 foundation of Modular’s heterogeneous compute platform\, enabling perfor
 mance portability across different hardware and application domains.\nAfte
 r 2 years of building Mojo with MLIR\, design misalignments between the co
 mpiler infrastructure and the desired language semantics have clearly emer
 ged. This talk will delve into what an ideal MLIR 2.0 would look like pure
 ly in the context of the design of Mojo: first-class dependent types\, uni
 fied types and attributes\, control flow\, etc. We will also explore our c
 hallenges scaling MLIR compilation to the massive amounts of code backing 
 LLMs and our experience building a multithreaded compiler.\n\nMore on: htt
 ps://grosser.science/compiler-social-2024-09-03/
LOCATION:Computer Laboratory\, William Gates Building\, LT1
END:VEVENT
END:VCALENDAR
