Making Synthetic Speech Output as Natural, Flexible and Efficient as Human Speech
Speaker: Alan W. Black
Institution: Carnegie Mellon University
Abstract:
As speech technology matures to a level where it becomes practical for human-machine communication, much greater demands are placed on the quality of the voice output. It is no longer sufficient simply to provide an understandable voice: communication demands an appropriate voice, in the appropriate language of course, but also in the right style, and even with a particular identity.
This talk will present a body of work describing the basic processes involved in building synthetic voices. Over the past 10 years we have developed core synthesis techniques, engines and tools to make the building process better defined and more successful. Using data-driven techniques, we have refined and optimized the processes of corpus-based synthesis itself, prompt selection, automatic labeling, lexicon construction, articulatory voice conversion and evaluation techniques. Our synthesizers, Festival and Flite, and the voices constructed with the FestVox tools have been used in a large number of speech applications, including spoken dialog systems, speech-to-speech translation, and talking heads.