Prosodic Models in Speech Technology

SoSe 2013, Möbius (Seminar, 2 SWS LSF 69390)
M.Sc. Language Science and Technology

Thu 10.15-11.45, C7.2/5.09

Entrance requirements

M.Sc. LST: Speech Science

Course description

Prosody research is quite diverse and controversial with regard to theoretical approaches and methodologies. In this seminar, the most important intonation and duration models will be presented and discussed, with respect to both their phonological and phonetic assumptions and their implementation and application in speech processing systems (speech synthesis, speech recognition). Participants will read, present, and discuss selected papers.

Course credits

M.Sc.: 7 CP (presentation and paper) or 4 CP (presentation only)

Active participation on a regular basis required.

Structure

Date	Topic	Papers / Slides	Presented by
25.04.	Introduction; Organization
02.05.	Overview I: Intonation models	Botinis/etal:2001 pdf	all
16.05.	Overview II: Prosodic models	Batliner/Mobius:2005 pdf	all
23.05.	ToBI-based intonation synthesis	Jilka/etal:1999, Pierrehumbert:1981
06.06.	Fujisaki's intonation model	Fujisaki:1988, Mobius:1995
13.06.	Quantitative alignment model	Santen/Mobius:2000
20.06.	Segmental duration model	Santen:1994, Santen:1998
27.06.	Segmental vs. syllable duration models	Santen/Shih:2000
04.07.	Verbmobil prosody module	Batliner/etal:2000 pdf
11.07.	Prosodic models in ASR/ASU	Shriberg/Stolcke:2004 pdf
18.07.	Prosody-based topic segmentation	Shriberg/etal:2000 pdf
25.07.	SmartKom prosody module	Zeissler/etal:2006 pdf

Literature

BibTeX entries of all references (books, papers, URL):

@InCollection{Batliner/Mobius:2005,
  author = 	 {Batliner, Anton and M{\"o}bius, Bernd},
  title = 	 {Prosodic models, automatic speech understanding, and
		  speech synthesis: Towards the common ground?},
  booktitle = 	 {The Integration of Phonetic Knowledge in Speech Technology},
  publisher =	 {Springer},
  year =	 2005,
  editor =	 {Barry, William~J. and van Dommelen, Wim~A.},
  address =	 {Dordrecht},
  pages =	 {21--44}
}

@InCollection{Batliner/etal:2000c,
  author = 	 {Batliner, Anton and Buckow, J. and Niemann, Heinrich
		  and N{\"o}th, Elmar and Warnke, Volker},
  title = 	 {The prosody module},
  booktitle =    {Verbmobil: Foundations of Speech-to-Speech Translation},
  publisher =    {Springer},
  year =         2000,
  editor =       {Wahlster, Wolfgang},
  address =      {Berlin},
  pages = 	 {106--121}
}

@Article{Botinis/etal:2001,
  author =       {Botinis, Antonis and Granstr{\"o}m, Bj{\"o}rn and
                  M{\"o}bius, Bernd},
  title =        {Developments and paradigms in intonation research},
  journal =      {Speech Communication},
  year =         2001,
  volume =       33,
  number =       4,
  pages =        {263--296}
}

@InCollection{Fujisaki:1988,
  author = 	 {Fujisaki, Hiroya},
  title = 	 {A note on the physiological and physical basis for the
		  phrase and accent components in the voice fundamental
		  frequency contour},
  booktitle = 	 {Vocal Physiology: Voice Production, Mechanisms and
		  Functions},
  publisher =	 {Raven},
  year =	 1988,
  editor =	 {Fujimura, Osamu},
  address =	 {New York},
  pages =	 {347--355}
}

@Article{Jilka/etal:1999,
  author =       {Jilka, Matthias and M{\"o}hler, Gregor and Dogil,
                  Grzegorz},
  title =        {{Rules for the generation of ToBI-based American
                  English intonation}},
  journal =      {Speech Communication},
  year =         1999,
  volume =       28,
  pages =        {83--108}
}

@InProceedings{Mobius:1995,
  author = 	 {M{\"o}bius, Bernd},
  title = 	 {Components of a quantitative model of {G}erman
		  intonation},
  booktitle =    {Proceedings of the 13th International Congress of
                  Phonetic Sciences (Stockholm)},
  year =         1995,
  volume =       2,
  pages =	 {108--115}
}

@Article{Pierrehumbert:1981,
  author = 	 {Pierrehumbert, Janet},
  title = 	 {Synthesizing intonation},
  journal = 	 {Journal of the Acoustical Society of America},
  year = 	 1981,
  volume =	 70,
  pages =	 {985--995}
}

@InCollection{Santen/Mobius:2000,
  author = 	 {van Santen, Jan P.~H. and M{\"o}bius, Bernd},
  title = 	 {A quantitative model of {F0} generation and alignment},
  editor = 	 {Botinis, Antonis},
  booktitle = 	 {Intonation---Analysis, Modelling and Technology},
  publisher = 	 {Kluwer},
  year = 	 2000,
  address =      {Dordrecht},
  pages =        {269--288}
}

@Article{Santen/Shih:2000,
  author =       {van Santen, Jan P.~H. and Shih, Chilin},
  title =        {Suprasegmental and segmental timing models in
                  {M}andarin {C}hinese and {A}merican {E}nglish},
  journal =      {Journal of the Acoustical Society of America},
  year =         2000,
  volume =       107,
  number =       2,
  pages =        {1012--1026}
}

@Article{Santen:1994,
  author = 	 {van Santen, Jan P.~H.},
  title = 	 {Assignment of segmental duration in text-to-speech
		  synthesis},
  journal = 	 {Computer Speech and Language},
  year = 	 1994,
  volume =	 8,
  pages =	 {95--128}
}

@InCollection{Santen:1998,
  author =       {van Santen, Jan P.~H.},
  title =        {Timing},
  booktitle = 	 {Multilingual Text-to-Speech Synthesis: The {B}ell
                  {L}abs Approach},
  editor =       {Sproat, Richard},
  publisher = 	 {Kluwer},
  year = 	 1998,
  address =	 {Dordrecht},
  pages =        {115--139}
}

@InCollection{Shriberg/Stolcke:2004,
  author = 	 {Shriberg, Elizabeth and Stolcke, Andreas},
  title = 	 {Prosody modeling for automatic speech recognition
                  and understanding},
  booktitle = 	 {Mathematical Foundations of Speech and Language Processing},
  pages = 	 {105--114},
  publisher = {Springer},
  year = 	 2004,
  editor = 	 {Johnson, Mark and Khudanpur, S. and Ostendorf, Mari
                  and Rosenfeld, R.},
  volume = 	 138,
  series = 	 {IMA Volumes in Mathematics and Its Applications}
}

@Article{Shriberg/etal:2000,
  author =       {Shriberg, Elizabeth and Stolcke, Andreas and
                  Hakkani-T{\"u}r, Dilek and T{\"u}r, G{\"o}khan},
  title =        {Prosody-based automatic segmentation of speech into
                  sentences and topics},
  journal =      {Speech Communication},
  year =         2000,
  volume =       32,
  number =       {1--2},
  pages =        {??}
}

@InCollection{Zeissler/etal:2006a,
  author = 	 {Zei{\ss}ler, Viktor and Adelhardt, Johann and
                  Batliner, Anton and Frank, Carmen and N{\"o}th,
                  Elmar and Shi, Rui Ping and Niemann, Heinrich},
  title = 	 {The prosody module},
  booktitle = 	 {{SmartKom}: Foundations of Multimodal Dialogue Systems},
  editor =	 {Wahlster, Wolfgang},
  publisher = 	 {Springer},
  year = 	 2006,
  pages =	 {139--152}
}

bm 25.4.2013