BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Complementing User-Level Coarse-Grain Parallelism with Implicit Sp
 eculative Parallelism - Nikolas Ioannou - University of Edinburgh
DTSTART:20111122T100000Z
DTEND:20111122T110000Z
UID:TALK34650@talks.cam.ac.uk
CONTACT:Microsoft Research Cambridge Talks Admins
DESCRIPTION:Multi-core and many-core systems are the norm in contemporary 
 processor technology and are expected to remain so for the foreseeable fut
 ure. Programs using parallel programming primitives like PThreads or OpenM
 P often exploit coarse-grain parallelism\, because it offers a good trade-
 off between programming effort and performance gain. Some parallel appl
 ications show limited or no scaling beyond a certain number of cores. Giv
 en the ab
 undant number of cores expected in future many-cores\, several cores would
  remain idle in such cases while execution performance stagnates. This pap
 er proposes using cores that do not contribute to performance improvement 
 for running implicit fine-grain speculative threads. In particular\, we pr
 esent a many-core architecture and protocol that allow applications with c
 oarse-grain explicit parallelism to further exploit implicit speculative p
 arallelism within each thread. Implicit speculative parallelism frees the 
 programmer from the additional effort of explicitly partitioning the wor
 k int
 o finer and properly synchronized tasks. Our results show that\, for a man
 y-core comprising 128 cores supporting implicit speculative parallelism
  in clusters of 2 or 4 cores\, performance improves on top of the highest 
 scalability point by 41% on average for the 4-core cluster and by 27% on a
 verage for the 2-core cluster. These performance improvements come with an
  energy consumption that is close to -- and sometimes better than -- the b
 aseline. This approach often leads to better performance and energy effici
 ency compared to existing alternatives such as Core Fusion and Frequency B
 oosting. We also investigate the trade-offs between explicit and implicit 
 threads as input dataset sizes vary. Finally\, we present a dynamic mechan
 ism to choose the number of explicit and implicit threads\, which performs
  within 6% of the static oracle selection of threads.
LOCATION:Small lecture theatre\, Microsoft Research Ltd\, 7 J J Thomson Av
 enue (Off Madingley Road)\, Cambridge
END:VEVENT
END:VCALENDAR