International Research Training Group
Language Technology & Cognitive Systems
Saarland University, University of Edinburgh

Machine vs. Human: Synthetic Speaker Age Recognition

Speaker: Eva Lasarcyk

Abstract:

We present a cross-disciplinary study of synthetic speaker age recognition that follows the analysis-by-synthesis paradigm and uses articulatory speech synthesis. Artificially "aged" synthetic male voices were presented to human listeners as well as to a pre-trained automatic age classification system. The samples were manipulated using a set of features derived from the literature. Confirming the results of previous studies, our listeners were successful in identifying the correct age class. Furthermore, we show that the synthetic voices were natural enough to "fool" the automatic classification system, as the age models produced meaningful scores. The overall classification results indicate that different age cues were important for the "machine" vs. the "humans".

This experiment is part of a PhD project on the use of articulatory speech synthesis (VocalTractLab.de) for speech science research. A series of experiments covers different aspects of supraglottal, glottal, and subglottal speech production details to assess the suitability and appropriateness of VocalTractLab for phonetic research. In this talk, we present an exploratory model of synthetic speaker age that focusses on *glottal* manipulations of the speech signal. Additionally, this study is an example of fruitful collaboration between speech sciences and speech technology. Approaches and techniques from both fields were combined to benefit research progress in both areas.


Joint work with Michael Feld and Christian Müller, DFKI.

Last modified: Fri, May 29, 2009 10:57:04