Computational Linguistics Colloquium

(Note unusual day & time!)
Friday, 1 July 2011, 14:15

Phonetic variation and the perception of massive reductions in speech

Frank Zimmerer
Institut für Phonetik, Goethe-Universität Frankfurt
For a long time, linguistic research and theoretical modeling relied on laboratory speech. In recent years, however, the focus has shifted toward data from more natural, conversational speech. This shift has demonstrated that some assumptions about processes in speech production and speech perception have to be re-evaluated, mainly because of one of the most striking features of natural speech: its huge amount of variability, including reductions and deletions.
Despite the imperfect nature of naturally (i.e. conversationally) produced speech, listeners usually have no trouble understanding what speakers have said. Several approaches have aimed at explaining how the amount of variation in the speech signal can be reconciled with the ease of speech perception. Broadly, two opposing camps can be identified: approaches with single, abstractionist representations and episodic memory approaches. Abstractionist approaches may encounter problems when reductions occur in a drastic and non-rule-based fashion. Proponents of episodic approaches assume very detailed storage of experienced episodes of utterances. Their models assume that phonetic variation (including reductions and deletions) is stored directly in the lexicon, and they see the main challenge for listeners in identifying the correct lexical entry. The two camps are exemplified here by two models: the Featurally Underspecified Lexicon model (FUL; Lahiri and Reetz, 2002, 2010) for the single-entry abstractionist camp and X-MOD (Johnson, 1997, 2007) for the episodic camp. Both will be evaluated by their successes and failures in coping with reduction data from conversational German, in both production and perception.
The results of this evaluation suggest that, although listeners appear untroubled in natural situations, there is a cost attached to the recognition of reduced words. Massively reduced words are not as easily recognized as everyday experience would suggest. The results show that contextual information is crucial and that listeners make use of whatever information they can extract from the speech signal. More generally, the findings favor models positing a single abstract representation per word.

Johnson, K. 1997. Speech perception without speaker normalization. Talker Variability in Speech Processing, ed. by Keith Johnson and John W. Mullennix, 145-166. New York: Academic Press.
Johnson, K. 2007. Decisions and mechanisms in exemplar-based phonology. Experimental Approaches to Phonology, ed. by Maria-Josep Solé, Patrice Speeter Beddor and Manjari Ohala, 25-40. Oxford: Oxford University Press.
Lahiri, A. and Reetz, H. 2002. Underspecified recognition. Laboratory Phonology 7, ed. by Carlos Gussenhoven and Natasha Warner, 637-675. Berlin: Mouton de Gruyter.
Lahiri, A. and Reetz, H. 2010. Distinctive features: Phonological underspecification in representation and processing. Journal of Phonetics, 38.44-59.

If you would like to meet with the speaker, please contact Jürgen Trouvain.