Computational Linguistics Colloquium
(Note unusual day & time!)
Friday, 1 July 2011, 14:15
Phonetic variation and the perception of massive reductions in speech
Frank Zimmerer
Institut für Phonetik, Goethe-Universität Frankfurt
While laboratory speech has long been used for
linguistic research and theoretical modeling, recent years have seen
a shift in focus toward data from more natural, conversational speech.
This shift has demonstrated that some of the assumptions about processes in
speech production and speech perception have to be re-evaluated. This is mainly
due to one of the most striking features of natural speech: its huge amount of
variability, including reductions and deletions.
Despite the imperfect nature of naturally (i.e. conversationally) produced speech,
listeners usually have no trouble understanding what speakers have said.
Several approaches have aimed at explaining the amount of variation
in speech on the one hand and the ease of speech perception on the
other. Broadly, two opposing camps can be identified: approaches with single,
abstractionist representations and episodic memory approaches. The first camp,
i.e. abstractionist approaches, may encounter problems when reductions occur
in a drastic and non-rule-based fashion.
Proponents of the latter camp assume very detailed storage of experienced
episodes of utterances. These models assume storage of phonetic variation
(including reductions and deletions) directly in the lexicon and see the main
challenge for listeners in identifying the correct lexical entry. These two
camps, exemplified by two models, the featurally underspecified lexicon model
(FUL; Lahiri and Reetz, 2002, 2010) for the single-entry abstractionist camp
and X-MOD (Johnson, 1997, 2007) for the episodic camp, will be evaluated by
their successes and failures in coping with reduction data from conversational
German, both in production and perception.
The results of this evaluation suggest that, although listeners seem untroubled in
natural situations, there is a cost attached to the recognition of reduced
words. Massively reduced words are not as easily recognized as
everyday experience would suggest. Results show that contextual information is
crucial and that listeners make use of whatever information they can extract
from the speech signal. More generally, the findings favor
models positing a single abstract representation per word.
Johnson, K. 1997. Speech perception without speaker normalization. Talker
Variability in Speech Processing, ed. by Keith Johnson and John W. Mullennix,
145-166. New York: Academic Press.
Johnson, K. 2007. Decisions and mechanisms in exemplar-based phonology.
Experimental approaches to phonology, ed. by Maria-Josep Solé, Patrice
Speeter Beddor and Manjari Ohala, 25-40. Oxford: Oxford University Press.
Lahiri, A. and Reetz, H. 2002. Underspecified recognition. Laboratory
Phonology 7, ed. by Carlos Gussenhoven and Natasha Warner, 637-675.
Berlin: Mouton de Gruyter.
Lahiri, A. and Reetz, H. 2010. Distinctive features: Phonological
underspecification in representation and processing. Journal of Phonetics, 38.44-59.
If you would like to meet with the speaker, please contact
Jürgen Trouvain.