
Musketeer: all for one, one for all in data processing systems


If you have a question about this talk, please contact Eiko Yoneki.

Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is “best” for a given workflow. Porting workflows between systems is tedious. Hence, users become “locked in”, despite faster or more efficient systems being available. This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g., MapReduce, Spark, PowerGraph, Naiad).

In this talk, I will present Musketeer, a system that decouples the ways workflows are defined from the manner in which they are executed. Musketeer dynamically maps front-end workflow descriptions to a broad range of back-end execution engines. Without requiring any manual porting effort, users now have a choice of many systems. Musketeer currently supports four high-level query languages and generates code for seven popular data processing systems, in some cases speeding up realistic workflows by up to 9x.
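The decoupling described above can be pictured with a small sketch. This is not Musketeer's actual API or mapping algorithm: the `Op` class, the two engine names, the cost functions, and all the numbers are hypothetical, chosen only to show how a workflow expressed once in an engine-independent form might be assigned to different back-ends without any manual porting.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Op:
    """One step of an engine-independent workflow (hypothetical representation)."""
    kind: str  # e.g. "select" or "iterate"
    rows: int  # estimated number of input rows


# Hypothetical per-engine cost estimates for a single operator.
# The constants are invented for illustration, not measured.
def batch_cost(op: Op) -> float:
    factor = 5.0 if op.kind == "iterate" else 1.0
    return 10.0 + op.rows * 0.001 * factor


def graph_cost(op: Op) -> float:
    factor = 1.0 if op.kind == "iterate" else 3.0
    return 2.0 + op.rows * 0.001 * factor


# Hypothetical registry of back-ends; a real system would generate code per engine.
BACKENDS: Dict[str, Callable[[Op], float]] = {
    "batch_engine": batch_cost,
    "graph_engine": graph_cost,
}


def choose_backend(workflow: List[Op],
                   backends: Dict[str, Callable[[Op], float]] = BACKENDS) -> str:
    """Return the name of the back-end with the lowest estimated total cost."""
    totals = {name: sum(cost(op) for op in workflow)
              for name, cost in backends.items()}
    return min(totals, key=totals.get)


# The same front-end description can land on different engines.
scan_only = [Op("select", 1_000_000)]
iterative = [Op("select", 1_000), Op("iterate", 1_000_000)]
print(choose_backend(scan_only))
print(choose_backend(iterative))
```

Under these made-up costs, the scan-only workflow is assigned to `batch_engine` and the iterative one to `graph_engine`. The user-facing description stays the same in both cases, and only the mapping step decides where it runs.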

This talk is part of the Computer Laboratory Systems Research Group Seminar series.

