Musketeer: all for one, one for all in data processing systems

Ionel Gog (University of Cambridge)
Thursday 09 April 2015, 15:00-16:00
FW26, Computer Laboratory, William Gates Building.

If you have a question about this talk, please contact Eiko Yoneki.

Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is “best” for a given workflow. Porting workflows between systems is tedious. Hence, users become “locked in”, despite faster or more efficient systems being available. This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g., MapReduce, Spark, PowerGraph, Naiad).

In this talk, I will present Musketeer, a system that decouples the ways workflows are defined from the manner in which they are executed. Musketeer dynamically maps front-end workflow descriptions to a broad range of back-end execution engines. Without requiring any manual porting effort, users now have a choice of many systems. Musketeer currently supports four high-level query languages and generates code for seven popular data processing systems, in some cases speeding up realistic workflows by up to 9x.

This talk is part of the Computer Laboratory Systems Research Group Seminar series.
