The push to pool: Testing the effects of matched and mismatched reference populations in forensic voice comparison

If you have a question about this talk, please contact Yixin Zhang.

The use of Automatic Speaker Recognition (ASR) software systems in forensic speaker comparison casework is expanding internationally (Hughes et al. 2018). Its uptake is dependent upon the availability of appropriate reference databases, since making valid assessments of the similarity of the speech of known and unknown talkers hangs upon how typical the speech samples are with respect to the relevant population. Ideally, we would have databases at our disposal which are closely matched to the accent(s) to be heard in the samples, and also comparable in terms of factors such as recording channel and speaking style. However, in many cases the available reference databases are small, dated, fragmentary, or composed of inappropriate material, thereby compromising the quality of our ASR-based comparisons.

The extent to which the reliability and accuracy of comparisons are affected by the characteristics of matched and mismatched reference databases is currently the focus of investigation by a number of groups (e.g. Enzinger & Morrison 2017; van der Vloed et al. 2017; Hughes et al. 2018). It is clear that the results reported by ASR systems are sensitive to the nature of the reference data used, in terms of parameters such as speaker accent, sample duration, database size, and channel characteristics. From the point of view of obtaining greater statistical power and correspondingly higher levels of confidence in the results of ASR comparisons, it is reasonable to suppose that bigger reference databases are superior to small ones. In pursuit of this goal, we might wish for practical reasons to pool two or more pre-existing corpora, rather than devoting resources to collecting new material.
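
As a purely illustrative aside, the sensitivity to reference data can be sketched with a toy calculation in which the strength of evidence is expressed as a ratio of similarity to typicality. Everything in the sketch is an assumption made for illustration: the single one-dimensional feature, the Gaussian models, the function names and all of the numbers are hypothetical, and none of it reflects how the ASR software discussed in this talk works or what the study found.

```python
import math
from statistics import fmean, pstdev


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def likelihood_ratio(questioned, suspect_mean, within_sd, reference_values):
    """Toy ratio of similarity to typicality for one scalar feature.

    Similarity: how well the questioned value fits the known speaker.
    Typicality: how well it fits a single Gaussian fitted to the
    reference population (a deliberate simplification).
    """
    similarity = gaussian_pdf(questioned, suspect_mean, within_sd)
    ref_mean = fmean(reference_values)
    ref_sd = pstdev(reference_values)
    typicality = gaussian_pdf(questioned, ref_mean, ref_sd)
    return similarity / typicality


# Hypothetical per-speaker feature means for two accent groups.
matched_reference = [100.0, 104.0, 108.0, 112.0, 116.0]
mismatched_reference = [130.0, 134.0, 138.0, 142.0, 146.0]
pooled_reference = matched_reference + mismatched_reference

# Same questioned sample and known speaker, two reference populations.
lr_matched = likelihood_ratio(109.0, 110.0, 3.0, matched_reference)
lr_pooled = likelihood_ratio(109.0, 110.0, 3.0, pooled_reference)

print(f"LR with matched reference: {lr_matched:.2f}")
print(f"LR with pooled reference:  {lr_pooled:.2f}")
```

With these made-up numbers the same pair of samples receives a different ratio once the second group is pooled in, because the pooled population makes the questioned value look less typical. The size and direction of such a shift depend entirely on the data, which is why the question has to be settled empirically.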

But how valid is it to follow this strategy? How much difference to the output of our ASR system does it make if the pooled corpora in question are mismatched for speaker accent? If the difference turns out to be negligible, we might decide to combine accent-mismatched corpora as a matter of routine. Alternatively, if excessive heterogeneity in the reference corpus degrades system performance, we might advocate collecting bespoke corpora, perhaps even at the level of individual cases, as has been argued for by Morrison (2018). Obtaining case-specific corpora has major time and cost implications that may render the latter approach unfeasible, however.

Thus far, the consequences of combining apparently incompatible databases so as to maximise the size of the reference population have not been fully explored. In this paper I report on a study assessing the extent to which the performance of a leading ASR software package was affected by pooling two corpora: the ‘Dynamic Variability in Speech’ (DyViS) database (Nolan et al. 2009), which contains recordings of 100 young (18-25-year-old) male speakers of Standard Southern British English, and a newly collected corpus of recordings of speakers from three urban communities in North-East England (Newcastle, Sunderland, Middlesbrough), gathered for the ongoing ‘The Use and Utility of Localised Speech Forms in Determining Identity: Forensic and Sociophonetic Perspectives’ (TUULS) project. The results are encouraging in the sense that even a mixed-accent reference population yields good system performance, though it is acknowledged that using more forensically realistic samples might lead us to draw less optimistic conclusions.

This talk is part of the Cambridge University Linguistic Society (LingSoc) series.
