Disfluency detection in spoken learner English
- Speaker: Andrew Caines, DTAL, University of Cambridge
- Date & Time: Friday 01 May 2015, 12:30 - 13:00
- Venue: FW26, Computer Laboratory
Abstract
Spoken language is non-canonical: it contains filled pauses, non-standard grammatical variations, hesitations and other disfluencies. Compounded by a lack of available training data, this has made spoken language parsing a challenge for standard NLP tools. Recently the Redshift parser (Honnibal et al., CoNLL 2013) has been shown to be successful in identifying grammatical relations and certain disfluencies in native speaker spoken language, returning an unlabelled dependency accuracy of 90.5% and a disfluency F-measure of 84.1% (Honnibal & Johnson, TACL 2014). We investigate how this parser handles spoken data from learners of English at various proficiency levels. We find that Redshift’s parsing accuracy on non-native speech data is comparable to Honnibal & Johnson’s results, with 91.1% of dependency relations correctly identified. However, disfluency detection is markedly worse, with an F-measure of just 47.8%. We consider why this should be, and relate our findings to the use of NLP technology for automatic language assessment and computer-assisted language learning applications.
Series: This talk is part of the NLIP Seminar Series.

