BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:From Pose Estimation to Fine Grained Activity Recognition - Micha 
 Andriluka\, Max Planck Institute for Informatics
DTSTART:20120906T140000Z
DTEND:20120906T150000Z
UID:TALK39613@talks.cam.ac.uk
CONTACT:Microsoft Research Cambridge Talks Admins
DESCRIPTION:Title:\n\nFrom Pose Estimation to Fine Grained Activity Recogn
 ition\n\n\nAbstract:\n\nHuman pose estimation and activity recognition in 
 monocular images are challenging problems\, especially when these tasks mu
 st be solved in unconstrained environments such as street scenes. The majo
 r sources of complexity are cluttered and dynamically changing backgrounds
  and the presence of multiple people that often partially or fully occlude
  each other.\n\nWhile previous work has largely neglected interactions bet
 ween people\, we show that modeling them is crucial for good performance. 
 In the first part of the talk I will demonstrate this for two tasks: the
  detection of people in crowded street scenes and monocular 3D pose estim
 ation. In the case of people detection we propose a new occlu
 sion-aware detector that exploits the patterns emerging from person-person
  occlusions\, and quantify its performance on several publicly available b
 enchmarks\, improving over the state-of-the-art. In the case of human pose
  estimation we propose to incorporate interactions at two levels. The 2D
  poses of people are inferred with a multi-person pictorial structures
  model that captures interactions between subjects. The 3D poses are
  then recovered by lifting the 2D poses to 3D using a learned joint prior
  model of human poses and motion. We demonstrate that including interactio
 ns between
  subjects both in 2D and in 3D improves pose estimation results.\n\nIn the
  second part of the talk I will focus on the challenge of fine grained act
 ivity recognition\, where the goal is to recognize a large number of visua
 lly similar activities such as those performed during a complex medical pr
 ocedure\, device maintenance or cooking. I will use cooking activit
 ies as a working example and describe our recently introduced dataset\, co
 ntaining over 65 cooking activities and about 9 hours of video footage. I 
 will present initial results on the dataset and discuss open questions rel
 ated to the use of pose estimation for fine grained activity recognition.
LOCATION:Small lecture theatre\, Microsoft Research Ltd\, 7 J J Thomson Av
 enue (Off Madingley Road)\, Cambridge
END:VEVENT
END:VCALENDAR
