What do Real Life Hadoop Workloads Look Like?
- Speaker: Yanpei Chen, Cloudera
- Date & Time: Friday 31 August 2012, 11:00 - 12:00
- Venue: SS03, William Gates Building
Abstract
Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged that feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. These new workloads have not yet been empirically studied. We fill this gap with an analysis of MapReduce traces from six separate business-critical deployments inside Facebook and at Cloudera customers in e-commerce, telecommunications, media, and retail. Our key contribution is a characterization of new MapReduce workloads that are driven in part by interactive analysis and that make heavy use of query-like programming frameworks on top of MapReduce. These workloads display diverse behaviors that invalidate prior assumptions about MapReduce, such as uniform data access, regular diurnal patterns, and the prevalence of large jobs. A secondary contribution is a first step towards creating a TPC-like data processing benchmark for MapReduce.
Series: This talk is part of the Computer Laboratory Digital Technology Group (DTG) Meetings series.