Stratosphere: Massively parallel dataflow programming
If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins.
This event may be recorded and made available internally or externally via http://research.microsoft.com. Microsoft will own the copyright of any recordings made. If you do not wish to have your image/voice recorded, please consider this before attending.
As a reaction to the recent “Big Data” trend, a new breed of systems for scalable data processing has emerged. Our system, Stratosphere, offers an extensible query language for posing queries on complex nested data, an efficient processing engine designed to scale on very large clusters and leverage cloud elasticity, as well as a query optimizer and a runtime engine that guarantee the efficient execution of queries, including iterative queries. Stratosphere pushes the MapReduce paradigm forward by incorporating several optimizations known from parallel databases, as well as novel techniques, while retaining the flexibility of in-situ processing of data using complex user-defined functions.
In this talk, I will provide an overview of the Stratosphere system, with emphasis on how to optimize and execute, in parallel, an extended dataflow programming model with user-defined functions and iterative constructs. I will then provide a research outlook for scalable data analytics, covering research topics at the intersection of programming languages, databases, and networks.
This talk is part of the Microsoft Research Cambridge, public talks series.