BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Big Data Analytics with All-or-Nothing Parallel Jobs - Ganesh Anan
 thanarayanan\, University of California
DTSTART:20130411T100000Z
DTEND:20130411T110000Z
UID:TALK43890@talks.cam.ac.uk
CONTACT:Microsoft Research Cambridge Talks Admins
DESCRIPTION:Extensive data analysis has become the enabler for diagnostics
  and decision making in many modern systems. These analyses have both comp
 etitive and social benefits. To cope with the deluge of data that is
  growing faster than Moore’s law\, computation frameworks have resorted
  to massive parallelization of analytics jobs into many fine-grained tasks
 . These frameworks promised to provide efficient and fault-tolerant execut
 ion of these tasks. However\, meeting this promise in clusters spanning hu
 ndreds of thousands of machines is challenging and a key departure from ea
 rlier work on parallel computing.\nA simple but key aspect of parallel job
 s is the all-or-nothing property: unless all tasks of a job are provided e
 qual improvement\, there is no speedup in the completion of the job. This 
 talk will demonstrate how the all-or-nothing property impacts replacement 
 algorithms in distributed caches for parallel jobs. Our coordinated cachin
 g system\, PACMan\, makes global caching decisions and employs a provably 
 optimal cache replacement algorithm. A highlight of our evaluation using w
 orkloads from Facebook and Bing datacenters is that PACMan’s replacement
  algorithm outperforms even Belady’s MIN (which uses an oracle) in speed
 ing up jobs. Along the way\, I will also describe how we broke the myth of
  disk-locality’s importance in datacenter computing and present
  solutions to mitigate straggler tasks.
LOCATION:Small Lecture Theatre\, Microsoft Research Ltd\, 21 Station Road\
 , Cambridge\, CB1 2FB
END:VEVENT
END:VCALENDAR
