Computational Linguistics Colloquium

Thursday, 29 January, 16:15
Seminar Room, Building C7 2

Audiovisual-to-articulatory inversion of speech

The objective of the ASPI project is to perform audiovisual-to-articulatory inversion. Achieving this goal requires:

The acquisition of articulatory data, which enable the evaluation of inversion techniques and the construction of articulatory models. We will describe the acquisition system designed within the framework of ASPI. This system merges ultrasound images (tongue), stereo images (face), electromagnetic sensors (tongue apex), and the speech signal. The main achievement is the geometrical merging of these modalities, which requires their calibration and synchronization. Another important facet is the use of existing X-ray data and the acquisition of MRI images, which offer a complete view of the vocal tract.
The development of inversion algorithms. Our work relies on an analysis-by-synthesis method which exploits Maeda's articulatory synthesizer. The principle is to explore the articulatory space efficiently to find all the inverse solutions at each time point, and then to reconstruct optimal articulatory trajectories from these local solutions. We will describe our framework for inversion, additional phonetic constraints that penalize unrealistic vocal tract shapes, and improvements enabling faster and better inversion. Finally, we will show how articulatory data are used to evaluate inversion.
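
The second step described above, reconstructing a trajectory from the local inverse solutions found at each time point, can be illustrated with a minimal sketch. It assumes that "optimal" means the smallest summed squared distance between consecutive articulatory vectors, and it uses a plain dynamic-programming search; both the cost and the search are assumptions made for illustration, not the method presented in the talk. The function name `smoothest_trajectory` and the toy one-dimensional candidates are hypothetical, and the sketch does not include the articulatory synthesizer or the search for candidates.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two articulatory vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smoothest_trajectory(candidates: List[List[Vector]]) -> List[Vector]:
    """Pick one candidate per time point so that the summed squared
    distance between consecutive choices is minimal (assumed cost)."""
    if not candidates or any(len(frame) == 0 for frame in candidates):
        raise ValueError("every time point needs at least one candidate")

    # best_cost[j]: minimal cumulative cost of a path ending at
    # candidate j of the frame processed so far.
    best_cost = [0.0] * len(candidates[0])
    back_pointers: List[List[int]] = []

    for prev_frame, frame in zip(candidates, candidates[1:]):
        new_cost: List[float] = []
        pointers: List[int] = []
        for cand in frame:
            costs = [
                best_cost[i] + squared_distance(prev, cand)
                for i, prev in enumerate(prev_frame)
            ]
            i_best = min(range(len(costs)), key=costs.__getitem__)
            new_cost.append(costs[i_best])
            pointers.append(i_best)
        best_cost = new_cost
        back_pointers.append(pointers)

    # Backtrack from the cheapest final candidate to the first frame.
    j = min(range(len(best_cost)), key=best_cost.__getitem__)
    path = [j]
    for pointers in reversed(back_pointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]


# Toy example: two candidate solutions at each of three time points.
toy_candidates = [
    [(0.0,), (5.0,)],
    [(0.2,), (5.1,)],
    [(0.1,), (9.0,)],
]
trajectory = smoothest_trajectory(toy_candidates)
```

In this toy case the search keeps the low-valued candidate at every time point, since switching to the other branch would introduce a large jump between consecutive frames.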

If you would like to meet with the speaker, please contact Ingmar Steiner.