Modelling prosody for speech
synthesis: example from Polish
Dominika Oliver
In this talk I will present issues encountered in intonation
prediction and generation for a text-to-speech system,
using Polish as an example.
Prosodic analysis in speech synthesis usually involves
modelling several components: segmental duration,
division into prosodic phrases, stress and accent placement,
accent and boundary tone types, and
F0 contour generation. Each of these plays a role in
generating natural-sounding speech, an essential but non-trivial task
for any text-to-speech system.
The prosody generation implementation discussed here concentrates
on two components, accent type and F0 prediction. The
process is carried out in two stages using machine learning techniques:
- prediction of accent placement and accent type using classification
  and regression trees
- prediction/generation of the F0 contour using linear regression
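
The two stages listed above can be illustrated with a minimal sketch.
It is not the implementation discussed in the talk: the feature set
(lexical stress, content word, position in phrase), the accent labels
("rising", "falling", "none") and all numbers are invented for
illustration, and the tree and regression are written from scratch
rather than taken from any TTS toolkit.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a small CART-style classification tree on numeric features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: a class label
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    lo = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    hi = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [l for _, l in lo], depth + 1, max_depth),
            build_tree([r for r, _ in hi], [l for _, l in hi], depth + 1, max_depth))

def classify(tree, row):
    """Walk from the root to a leaf and return its accent label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fit_linear(xs, ys):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(x) for x in xs]  # prepend an intercept column
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, ys))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                m = A[r][c] / A[c][c]
                A[r] = [a - m * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def encode(accent, position):
    """Regression features: two accent indicators ("none" is the baseline)."""
    return [float(accent == "rising"), float(accent == "falling"), position]

def predict_f0(weights, accent, position):
    x = [1.0] + encode(accent, position)
    return sum(w * v for w, v in zip(weights, x))

# Invented toy data: (stressed, content word, relative position in phrase).
syllables = [(1, 1, 0.1), (0, 1, 0.2), (1, 0, 0.3), (1, 1, 0.4),
             (0, 0, 0.5), (1, 1, 0.7), (0, 1, 0.8), (1, 1, 0.9)]
accents = ["rising", "none", "none", "rising",
           "none", "falling", "none", "falling"]
f0_hz = [148, 116, 114, 142, 110, 121, 104, 117]  # made-up F0 targets

# Stage 1: predict accent placement and type with a classification tree.
tree = build_tree(syllables, accents)
# Stage 2: predict an F0 target from the accent type and position.
weights = fit_linear([encode(a, s[2]) for a, s in zip(accents, syllables)], f0_hz)

new_syllable = (1, 1, 0.2)
accent = classify(tree, new_syllable)
print(accent, round(predict_f0(weights, accent, new_syllable[2]), 1))
```

The point of the sketch is the data flow: the symbolic accent label
produced by the tree becomes an input feature of the regression that
yields a numeric F0 value.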
This study is based on the PoInt speech database. The acoustic
parameters characterising accent types in Polish were analysed,
and features characteristic of each accent type were derived.
In addition, the accent type study
involved classification of contour types using machine learning
techniques, in particular neural networks and hierarchical clustering.
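
As a rough illustration of how hierarchical clustering can group
contour shapes, the sketch below clusters a few invented F0 contours.
The abstract does not state which distance measure or linkage was
used; mean-normalised Euclidean distance and average linkage are
assumptions made here, and the contour values are made up.

```python
def normalise(contour):
    """Subtract the contour mean so that only the shape is compared."""
    mean = sum(contour) / len(contour)
    return [v - mean for v in contour]

def distance(a, b):
    """Euclidean distance between two equal-length contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerate(contours, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    shapes = [normalise(c) for c in contours]
    clusters = [[i] for i in range(len(shapes))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(distance(shapes[i], shapes[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        # Merge the two closest clusters.
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Invented F0 contours in Hz, four samples per accented syllable.
contours = [
    [100, 110, 125, 140],  # rising
    [180, 170, 150, 130],  # falling
    [120, 130, 142, 160],  # rising, higher register
    [150, 140, 125, 105],  # falling, lower register
    [130, 131, 129, 130],  # flat
]

groups = agglomerate(contours, n_clusters=3)
print(groups)
```

Because the mean is removed before comparison, contours with the same
shape in different pitch registers end up in the same cluster.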
In the current work, prediction of accent
placement and accent type as well as prediction/generation
of the F0 contour have been implemented in the Festival TTS system.
The accent types classified by the ML methods serve as input to Festival's
prosody prediction/generation module using language-specific
features. The evaluation of the classification process, symbolic results
from prosody prediction and generation, and future work
will also be presented.