BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Rainbow Interaction Seminars
SUMMARY:Audiovisual Discrimination Between Laughter and Speech - Stavros Petridis\, Imperial College London
DTSTART;TZID=Europe/London:20081204T141500
DTEND;TZID=Europe/London:20081204T151500
UID:TALK15122AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/15122
DESCRIPTION:In human-human interaction\, information is communicated between the parties through various channels. Speech is usually the dominant channel\, but other cues like facial expressions\, head gestures\, hand gestures and non-linguistic vocalizations play an important role in communication as well. One of the most important non-linguistic vocalizations is laughter\, which is reported to be the most frequently annotated non-verbal behaviour in meeting corpora. Laughter is a powerful affective and social signal\, since people very often express their emotions and regulate conversations by laughing. Although there are a few works on automatic laughter detection\, the focus of past research has mainly been on audio-based detection.\n\nInspired by the results in audiovisual speech recognition and audiovisual affect recognition\, this talk presents an audiovisual approach to distinguishing spontaneous episodes of laughter from speech. Information is extracted simultaneously from the audio and visual channels and fused using decision- and feature-level fusion\, leading to improved performance over single-modal approaches. The first part of the talk investigates the performance of different combinations of audio/visual cues: facial expressions and head movements for video\, and spectral and prosodic features for audio. Once the most informative cues are found\, the second part compares two types of features: static features extracted on an audio/video frame basis\, and temporal features extracted over a temporal window\, describing the evolution of the static features over time. This is followed by a comparison of the two fusion levels\, decision-level and feature-level fusion. Finally\, initial results on recognizing two types of laughter are presented.
LOCATION:SS03
CONTACT:Laurel D. Riek
END:VEVENT
END:VCALENDAR