Prosodic Models in Speech Technology

Summer semester 2013, Möbius (Seminar, 2 SWS, LSF 69390)

M.Sc. Language Science and Technology

Thu 10.15-11.45, C7.2/5.09

Entrance requirements

M.Sc. LST: Speech Science

Course description

Prosody research is diverse, and its theoretical approaches and methodologies are contested. In this seminar, the most important intonation and duration models will be presented and discussed with respect to their phonological and phonetic assumptions as well as their implementation and application in speech processing systems (speech synthesis, speech recognition). Participants will read, present, and discuss selected papers.

Course credits

M.Sc.: 7 CP (presentation and paper) or 4 CP (presentation only)

Regular active participation is required.

Structure

Date    Topic                                   Papers / Slides               Presented by
25.04.  Introduction; Organization
02.05.  Overview I: Intonation models           Botinis/etal:2001 (pdf)       all
16.05.  Overview II: Prosodic models            Batliner/Mobius:2005 (pdf)    all
23.05.  ToBI-based intonation synthesis         Jilka/etal:1999, Pierrehumbert:1981
06.06.  Fujisaki's intonation model             Fujisaki:1988, Mobius:1995
13.06.  Quantitative alignment model            Santen/Mobius:2000
20.06.  Segmental duration model                Santen:1994, Santen:1998
27.06.  Segmental vs. syllable duration models  Santen/Shih:2000
04.07.  Verbmobil prosody module                Batliner/etal:2000c (pdf)
11.07.  Prosodic models in ASR/ASU              Shriberg/Stolcke:2004 (pdf)
18.07.  Prosody-based topic segmentation        Shriberg/etal:2000 (pdf)
25.07.  SmartKom prosody module                 Zeissler/etal:2006a (pdf)

Literature

BibTeX entries for all references (books, papers, URLs):

@InCollection{Batliner/Mobius:2005,
  author = 	 {Batliner, Anton and M{\"o}bius, Bernd},
  title = 	 {Prosodic models, automatic speech understanding, and
		  speech synthesis: Towards the common ground?},
  booktitle = 	 {The Integration of Phonetic Knowledge in Speech Technology},
  publisher =	 {Springer},
  year =	 2005,
  editor =	 {Barry, William~J. and van Dommelen, Wim~A.},
  address =	 {Dordrecht},
  pages =	 {21--44}
}

@InCollection{Batliner/etal:2000c,
  author = 	 {Batliner, Anton and Buckow, J. and Niemann, Heinrich
		  and N{\"o}th, Elmar and Warnke, Volker},
  title = 	 {The prosody module},
  booktitle =    {Verbmobil: Foundations of Speech-to-Speech Translation},
  publisher =    {Springer},
  year =         2000,
  editor =       {Wahlster, Wolfgang},
  address =      {Berlin},
  pages = 	 {106--121}
}

@Article{Botinis/etal:2001,
  author =       {Botinis, Antonis and Granstr{\"o}m, Bj{\"o}rn and
                  M{\"o}bius, Bernd},
  title =        {Developments and paradigms in intonation research},
  journal =      {Speech Communication},
  year =         2001,
  volume =       33,
  number =       4,
  pages =        {263--296}
}

@InCollection{Fujisaki:1988,
  author = 	 {Fujisaki, Hiroya},
  title = 	 {A note on the physiological and physical basis for the
		  phrase and accent components in the voice fundamental
		  frequency contour},
  booktitle = 	 {Vocal Physiology: Voice Production, Mechanisms and
		  Functions},
  publisher =	 {Raven},
  year =	 1988,
  editor =	 {Fujimura, Osamu},
  address =	 {New York},
  pages =	 {347--355}
}

@Article{Jilka/etal:1999,
  author =       {Jilka, Matthias and M{\"o}hler, Gregor and Dogil,
                  Grzegorz},
  title =        {{Rules for the generation of ToBI-based American
                  English intonation}},
  journal =      {Speech Communication},
  year =         1999,
  volume =       28,
  pages =        {83--108}
}

@InProceedings{Mobius:1995,
  author = 	 {M{\"o}bius, Bernd},
  title = 	 {Components of a quantitative model of {G}erman
		  intonation},
  booktitle =    {Proceedings of the 13th International Congress of
                  Phonetic Sciences (Stockholm)},
  year =         1995,
  volume =       2,
  pages =	 {108--115}
}

@Article{Pierrehumbert:1981,
  author = 	 {Pierrehumbert, Janet},
  title = 	 {Synthesizing intonation},
  journal = 	 {Journal of the Acoustical Society of America},
  year = 	 1981,
  volume =	 70,
  pages =	 {985--995}
}

@InCollection{Santen/Mobius:2000,
  author = 	 {van Santen, Jan P.~H. and M{\"o}bius, Bernd},
  title = 	 {A quantitative model of {F0} generation and alignment},
  editor = 	 {Botinis, Antonis},
  booktitle = 	 {Intonation---Analysis, Modelling and Technology},
  publisher = 	 {Kluwer},
  year = 	 2000,
  address =      {Dordrecht},
  pages =        {269--288}
}

@Article{Santen/Shih:2000,
  author =       {van Santen, Jan P.~H. and Shih, Chilin},
  title =        {Suprasegmental and segmental timing models in
                  {M}andarin {C}hinese and {A}merican {E}nglish},
  journal =      {Journal of the Acoustical Society of America},
  year =         2000,
  volume =       107,
  number =       2,
  pages =        {1012--1026}
}

@Article{Santen:1994,
  author = 	 {van Santen, Jan P.~H.},
  title = 	 {Assignment of segmental duration in text-to-speech
		  synthesis},
  journal = 	 {Computer Speech and Language},
  year = 	 1994,
  volume =	 8,
  pages =	 {95--128}
}

@InCollection{Santen:1998,
  author =       {van Santen, Jan P.~H.},
  title =        {Timing},
  booktitle = 	 {Multilingual Text-to-Speech Synthesis: The {B}ell
                  {L}abs Approach},
  editor =       {Sproat, Richard},
  publisher = 	 {Kluwer},
  year = 	 1998,
  address =	 {Dordrecht},
  pages =        {115--139}
}

@InCollection{Shriberg/Stolcke:2004,
  author = 	 {Shriberg, Elizabeth and Stolcke, Andreas},
  title = 	 {Prosody modeling for automatic speech recognition
                  and understanding},
  booktitle = 	 {Mathematical Foundations of Speech and Language Processing},
  pages = 	 {105--114},
  publisher = {Springer},
  year = 	 2004,
  editor = 	 {Johnson, Mark and Khudanpur, S. and Ostendorf, Mari
                  and Rosenfeld, R.},
  volume = 	 138,
  series = 	 {IMA Volumes in Mathematics and Its Applications}
}

@Article{Shriberg/etal:2000,
  author =       {Shriberg, Elizabeth and Stolcke, Andreas and
                  Hakkani-T{\"u}r, Dilek and T{\"u}r, G{\"o}khan},
  title =        {Prosody-based automatic segmentation of speech into
                  sentences and topics},
  journal =      {Speech Communication},
  year =         2000,
  volume =       32,
  number =       {1--2},
  pages =        {??}
}

@InCollection{Zeissler/etal:2006a,
  author = 	 {Zei{\ss}ler, Viktor and Adelhardt, Johann and
                  Batliner, Anton and Frank, Carmen and N{\"o}th,
                  Elmar and Shi, Rui Ping and Niemann, Heinrich},
  title = 	 {The prosody module},
  booktitle = 	 {{SmartKom}: Foundations of Multimodal Dialogue Systems},
  editor =	 {Wahlster, Wolfgang},
  publisher = 	 {Springer},
  year = 	 2006,
  pages =	 {139--152}
}

bm 25.4.2013