Using EMA data in articulatory synthesis
Speaker: Ingmar Steiner
Institution: Saarland University
Abstract:
Electromagnetic Articulography (EMA) is one of several techniques for capturing the movements of the articulatory organs during speech production. Together with synchronized acoustic recordings of the same utterances, EMA data can be used to analyze the temporal alignment of articulatory gestures with speech parameters such as formant contours and with what are traditionally regarded as segmental boundaries.
This talk will present preliminary results on quasi-automatic articulatory resynthesis of EMA parameters. Such resynthesis can serve as an instrument for testing the performance of an articulatory synthesis platform, by comparing the resynthesized parameters with the originals.
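One straightforward way to quantify such a comparison (an illustrative assumption, not necessarily the evaluation used in the talk) is a per-channel root-mean-square error between the original and resynthesized EMA trajectories:

```python
import numpy as np

def trajectory_rmse(original, resynthesized):
    """Per-channel RMSE between two time-aligned EMA trajectories.

    Both arrays have shape (frames, channels), e.g. x/y coordinates of
    tongue-tip, tongue-body, lip, and jaw coils sampled at the same rate.
    """
    original = np.asarray(original, dtype=float)
    resynthesized = np.asarray(resynthesized, dtype=float)
    return np.sqrt(np.mean((original - resynthesized) ** 2, axis=0))

# Hypothetical example: two coil channels over four frames.
orig = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
resyn = np.array([[0.1, 1.1], [0.9, 2.0], [2.1, 2.9], [3.0, 4.1]])
print(trajectory_rmse(orig, resyn))
```

In practice the two trajectories would first need to be resampled to a common frame rate and time-aligned before such a frame-wise error is meaningful.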
Furthermore, several possible applications will be outlined for using EMA data to optimize the interface of an acoustics-based text-to-speech (TTS) front-end with a high-level articulatory synthesis platform. Among these are:
- Using EMA data to optimize phasing rules (governing the temporal alignment of gestures), used in the conversion of predicted acoustic durations to the gestural domain. Alternatively, the durational module of the TTS front-end could be retrained on EMA data to predict gestural durations directly.
- Extracting features corresponding to articulatory effort from EMA data, and using this information to improve the accuracy of coarticulation and related phenomena in the synthesis of speech at various articulation rates.
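Kinematic correlates of articulatory effort, such as peak tangential velocity and peak acceleration of a coil, can be computed directly from EMA trajectories by finite differencing. The following sketch assumes a typical 200 Hz EMA sampling rate and positions in mm; the specific features are an illustrative choice, not a method prescribed by the talk:

```python
import numpy as np

def effort_features(trajectory, fs=200.0):
    """Simple kinematic effort correlates from one EMA coil trajectory.

    trajectory: array of shape (frames, 2) with x/y positions in mm.
    fs: sampling rate in Hz (200 Hz is typical for EMA).
    Returns peak tangential velocity (mm/s) and peak acceleration
    magnitude (mm/s^2).
    """
    pos = np.asarray(trajectory, dtype=float)
    vel = np.gradient(pos, 1.0 / fs, axis=0)   # frame-wise velocity
    speed = np.linalg.norm(vel, axis=1)        # tangential velocity
    acc = np.gradient(vel, 1.0 / fs, axis=0)
    acc_mag = np.linalg.norm(acc, axis=1)
    return speed.max(), acc_mag.max()
```

Real effort measures would likely be computed per gesture (e.g. per closing movement) rather than over a whole utterance, and smoothed before differentiation to suppress measurement noise.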
Finally, a multilevel concatenative approach (diphone or possibly unit-selection) is conceivable, in which the articulatory parameters are predicted by applying traditional concatenative synthesis techniques to the individual articulatory trajectories. These could then be used as input to a low-level articulatory synthesizer.
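At the trajectory level, concatenation amounts to joining per-unit parameter segments and smoothing the joins. A minimal sketch, assuming a linear crossfade over a few frames at each boundary (the actual join-cost and smoothing scheme of a concatenative system would be more elaborate):

```python
import numpy as np

def concatenate_trajectories(segments, overlap=5):
    """Join per-diphone articulatory trajectory segments end to end,
    smoothing each join with a linear crossfade over `overlap` frames.

    segments: list of arrays of shape (frames, channels).
    Illustrative sketch only, not an actual synthesizer interface.
    """
    result = np.asarray(segments[0], dtype=float)
    for seg in segments[1:]:
        seg = np.asarray(seg, dtype=float)
        n = min(overlap, len(result), len(seg))
        w = np.linspace(0.0, 1.0, n)[:, None]      # crossfade weights
        blended = (1.0 - w) * result[-n:] + w * seg[:n]
        result = np.concatenate([result[:-n], blended, seg[n:]])
    return result
```

The same routine would be applied independently to each articulatory channel, after which the smoothed trajectories could be fed to the low-level synthesizer.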