BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:What do Real Life Hadoop Workloads Look Like? - Yanpei Chen\, Clou
 dera
DTSTART:20120831T100000Z
DTEND:20120831T110000Z
UID:TALK39483@talks.cam.ac.uk
CONTACT:Andrew Rice
DESCRIPTION:Within the past few years\, organizations in diverse industrie
 s have adopted MapReduce-based systems for large-scale data processing. Al
 ong with these new users\, important new workloads have emerged which feat
 ure many small\, short\, and increasingly interactive jobs in addition to 
 the large\, long-running batch jobs for which MapReduce was originally des
 igned. These new workloads have not yet been empirically studied. We fill 
 this gap with an analysis of MapReduce traces from six separate business-c
 ritical deployments inside Facebook and at Cloudera customers in e-commerc
 e\, telecommunications\, media\, and retail. Our key contribution is a cha
 racterization of new MapReduce workloads which are driven in part by inter
 active analysis\, and which make heavy use of query-like programming frame
 works on top of MapReduce. These workloads display diverse behaviors which
  invalidate prior assumptions about MapReduce such as uniform data access\
 , regular diurnal patterns\, and prevalence of large jobs. A secondary con
 tribution is a first step towards creating a TPC-like data processing benc
 hmark for MapReduce.
LOCATION:SS03\, William Gates Building
END:VEVENT
END:VCALENDAR
