Computational Linguistics & Phonetics, Fachrichtung 4.7, Universität des Saarlandes

Computational Linguistics Colloquium

Thursday, 5 June 2014, 16:15
Conference Room, Building C7.4

Getting it right: Proficiency testing, collaborative exercises, and system evaluations in forensic voice comparison

Michael Jessen
Bundeskriminalamt, Forensic Science Institute, Department of Speaker Identification and Audio Analysis

Although forensic voice comparison has been conducted in some countries since about the early 1960s, using a variety of methods and approaches, it was not until the late 1980s and early 1990s that a solid level of methodological standardization was achieved. This stage coincided with, among other events, the publication of several textbooks in forensic phonetics, the foundation of the International Association for Forensic Phonetics (later "Acoustics" was added to its name) and its annual meetings, and the representation of forensic phonetics at the International Congress of Phonetic Sciences. Although the foundations for good and internationally recognized practice in the field had thus been laid, the actual quality of practical casework had to be taken on trust, as no mechanism for evaluation and quality control was in place at the time. Since around the early 2000s this situation has improved from two directions. Firstly, as part of the increased interest of forensic science institutes in establishing quality management systems and accreditation standards, proficiency tests (German: "Ringversuche") and collaborative exercises in forensic voice comparison are being organized both at the national level (by the BKA) and, to some extent, internationally, e.g. by ENFSI (the European Network of Forensic Science Institutes). These proficiency tests and collaborative exercises are based on the processing of a few representative (simulated, or real and anonymized) cases, using the full range of methods and procedures that is also used in casework. Secondly, system evaluations have been established as a method of assessing the performance of voice comparison methods. The tradition of system evaluations was established by the automatic speaker recognition community (e.g. through the regular NIST evaluations) and by phoneticians interested in the Likelihood Ratio framework, such as the group around Phil Rose.
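To make the Likelihood Ratio framework concrete, the following is a minimal illustrative sketch, not the speaker's method or any institute's procedure: it computes LR = p(evidence | same speaker) / p(evidence | different speakers) for a single measurement, under the assumption that both hypotheses are modeled as Gaussian densities. All names and numbers are invented for illustration.

```python
# Minimal likelihood-ratio sketch (hypothetical models and values).
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def likelihood_ratio(evidence, same_model, diff_model):
    """LR = p(evidence | same speaker) / p(evidence | different speakers)."""
    return gaussian_pdf(evidence, *same_model) / gaussian_pdf(evidence, *diff_model)

# Hypothetical example: a mean fundamental frequency of 120 Hz, a same-speaker
# model centered near the known speaker's 118 Hz, and a broad different-speaker
# (population) model centered at 140 Hz.
lr = likelihood_ratio(120.0, same_model=(118.0, 5.0), diff_model=(140.0, 20.0))

# LR > 1 supports the same-speaker hypothesis; LR < 1 supports different speakers.
print(lr > 1)
```

In practice the models are estimated from data and the evidence is multivariate, but the core reasoning is this single ratio.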
System evaluations are based on a large set of same-speaker and different-speaker comparisons and have to be limited to automatic and semi-automatic methods, because applying the full range of methods at that scale is not realistic. For these automatic and semi-automatic methods and their underlying features, however, very accurate and robust information about error rates and other performance indicators can be reported. After providing some background on forensic phonetics in general and forensic voice comparison in particular, these two traditions in performance testing will be illustrated with audio material and current results, and their respective advantages and limitations will be compared.
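One common performance indicator derived from such same-speaker/different-speaker trial sets is the equal error rate (EER). The sketch below is a simple illustration of the idea, not any particular evaluation protocol: it scans candidate thresholds over toy comparison scores and finds the point where the false-rejection and false-acceptance rates coincide. The score values are invented.

```python
# Illustrative equal-error-rate (EER) computation over toy comparison scores.
def equal_error_rate(same_scores, diff_scores):
    """Return the error rate at the threshold where FRR and FAR are closest."""
    best = None
    for t in sorted(set(same_scores) | set(diff_scores)):
        # False rejection: a same-speaker pair scores below the threshold.
        frr = sum(s < t for s in same_scores) / len(same_scores)
        # False acceptance: a different-speaker pair scores at or above it.
        far = sum(s >= t for s in diff_scores) / len(diff_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2)
    return best[1]

# Toy scores: higher means "more likely the same speaker".
same = [0.9, 0.8, 0.75, 0.6, 0.4]
diff = [0.5, 0.3, 0.25, 0.2, 0.1]
print(equal_error_rate(same, diff))  # one same-speaker and one different-speaker error: 0.2
```

Real evaluations use many thousands of trials, which is what makes the resulting error-rate estimates accurate and robust.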

If you would like to meet with the speaker, please contact Jürgen Trouvain.