Prosodic Models in Speech Science and Technology

Winter 2018/2019, Möbius (Seminar, 2 SWS), LSF/HIS #111071

MSc Language Science and Technology / LCT

Wed 16:15-17:45, C7.2/5.09

Entrance requirements

Some background in Phonetics and Speech Science (recommended).

Course description

Prosody research is quite diverse and controversial with regard to theoretical approaches and methodologies. In this seminar, the most important intonation and duration models will be presented and discussed, with respect to both their phonological and phonetic assumptions and their implementation and application in speech processing systems (speech synthesis, speech recognition). Participants will read, present, and discuss selected papers.

Course credits

7 CP (presentation and paper) or 4 CP (presentation only).
Active participation on a regular basis required.

Requirements

Participation: You are expected to be physically present throughout the seminar and take part in the discussion.
You may miss at most one class without formal consequences. In this case, please send me an email simply stating that you will not take part; no explanation is required. If you have to miss a second or third class, you must write and submit a summary of the papers to be read for that class (minimum one page per paper).

Reading: For each class, you are required to read one or two papers (see Schedule). For each paper, please send me one question that you want to be answered or discussed in class (on the day preceding the class, before midnight).

Presentation: An oral presentation of 30-45 minutes, typically based on a core paper and maybe some complementary reading. Please contact me (1) when you have been assigned a topic/paper and want to start working on it; (2) when you have a pre-final draft version of the presentation. After your presentation I will provide feedback to you. The final version of your slides will be posted on the course homepage.

Term Paper: MSc students opting for the 7 CP version have to write a term paper (15-20 pages; see Deadlines below). The topic of the paper need not be identical to, or overlap with, the topic of your oral presentation.

Oral Exam: If you decide to take an oral exam, we will jointly select two topics that do not overlap with the topics of your presentation and term paper. Exam duration is 15-20 minutes.

Deadlines

Exam registration: January 23, 2019
Term paper: April 30, 2019

Contact:
  Prof. Dr. Bernd Möbius
  Email
  C7.2/4.10
  0681/302-4500


Please follow this link to enter your contact information: FORM

Schedule

Date | Topic | Papers / Slides / Questions | Presented by
24.10. | Planning and organization, paper assignment | | BM
07.11. | Overview: Intonation models | Botinis/etal:2001 pdf, questions | all
14.11. | ToBI-based intonation synthesis | Jilka/etal:1999 pdf, Pierrehumbert:1981 pdf / slides | Georgis
19.12. | Fujisaki's intonation model | Fujisaki:1988 pdf, Möbius:1995 pdf / slides | Hoang
23.01. | Verbmobil and SmartKom prosody module | Batliner/etal:2000 pdf, Zeissler/etal:2006 pdf / slides | Saveleva
23.01. | Prosodic models in ASR/ASU | Shriberg/Stolcke:2004 pdf / slides | Kröger
30.01. | Quantitative alignment model | Santen/Mobius:2000 pdf | John
30.01. | Discussion: Prosodic models | Batliner/Mobius:2005 pdf | BM/all

Literature

BibTeX entries for all references (books, papers, URLs):

@InCollection{Batliner/Mobius:2005,
  author = 	 {Batliner, Anton and M{\"o}bius, Bernd},
  title = 	 {Prosodic models, automatic speech understanding, and
		  speech synthesis: Towards the common ground?},
  booktitle = 	 {The Integration of Phonetic Knowledge in Speech Technology},
  publisher =	 {Springer},
  year =	 2005,
  editor =	 {Barry, William~J. and van Dommelen, Wim~A.},
  address =	 {Dordrecht},
  pages =	 {21--44}
}

@InCollection{Batliner/etal:2000c,
  author = 	 {Batliner, Anton and Buckow, J. and Niemann, Heinrich
		  and N{\"o}th, Elmar and Warnke, Volker},
  title = 	 {The prosody module},
  booktitle =    {Verbmobil: Foundations of Speech-to-Speech Translation},
  publisher =    {Springer},
  year =         2000,
  editor =       {Wahlster, Wolfgang},
  address =      {Berlin},
  pages = 	 {106--121}
}

@Article{Botinis/etal:2001,
  author =       {Botinis, Antonis and Granstr{\"o}m, Bj{\"o}rn and
                  M{\"o}bius, Bernd},
  title =        {Developments and paradigms in intonation research},
  journal =      {Speech Communication},
  year =         2001,
  volume =       33,
  number =       4,
  pages =        {263--296}
}

@InCollection{Fujisaki:1988,
  author = 	 {Fujisaki, Hiroya},
  title = 	 {A note on the physiological and physical basis for the
		  phrase and accent components in the voice fundamental
		  frequency contour},
  booktitle = 	 {Vocal Physiology: Voice Production, Mechanisms and
		  Functions},
  publisher =	 {Raven},
  year =	 1988,
  editor =	 {Fujimura, Osamu},
  address =	 {New York},
  pages =	 {347--355}
}

@Article{Jilka/etal:1999,
  author =       {Jilka, Matthias and M{\"o}hler, Gregor and Dogil,
                  Grzegorz},
  title =        {{Rules for the generation of ToBI-based American
                  English intonation}},
  journal =      {Speech Communication},
  year =         1999,
  volume =       28,
  pages =        {83--108}
}

@InProceedings{Mobius:1995,
  author = 	 {M{\"o}bius, Bernd},
  title = 	 {Components of a quantitative model of {G}erman
		  intonation},
  booktitle =    {Proceedings of the 13th International Congress of
                  Phonetic Sciences (Stockholm)},
  year =         1995,
  volume =       2,
  pages =	 {108--115}
}

@Article{Pierrehumbert:1981,
  author = 	 {Pierrehumbert, Janet},
  title = 	 {Synthesizing intonation},
  journal = 	 {Journal of the Acoustical Society of America},
  year = 	 1981,
  volume =	 70,
  pages =	 {985--995}
}

@InCollection{Santen/Mobius:2000,
  author = 	 {van Santen, Jan P.~H. and M{\"o}bius, Bernd},
  title = 	 {A quantitative model of {F0} generation and alignment},
  editor = 	 {Botinis, Antonis},
  booktitle = 	 {Intonation---Analysis, Modelling and Technology},
  publisher = 	 {Kluwer},
  year = 	 2000,
  address =      {Dordrecht},
  pages =        {269--288}
}

@Article{Santen/Shih:2000,
  author =       {van Santen, Jan P.~H. and Shih, Chilin},
  title =        {Suprasegmental and segmental timing models in
                  {M}andarin {C}hinese and {A}merican {E}nglish},
  journal =      {Journal of the Acoustical Society of America},
  year =         2000,
  volume =       107,
  number =       2,
  pages =        {1012--1026}
}

@Article{Santen:1994,
  author = 	 {van Santen, Jan P.~H.},
  title = 	 {Assignment of segmental duration in text-to-speech
		  synthesis},
  journal = 	 {Computer Speech and Language},
  year = 	 1994,
  volume =	 8,
  pages =	 {95--128}
}

@InCollection{Santen:1998,
  author =       {van Santen, Jan P.~H.},
  title =        {Timing},
  booktitle = 	 {Multilingual Text-to-Speech Synthesis: The {B}ell
                  {L}abs Approach},
  editor =       {Sproat, Richard},
  publisher = 	 {Kluwer},
  year = 	 1998,
  address =	 {Dordrecht},
  pages =        {115--139}
}

@InCollection{Shriberg/Stolcke:2004,
  author = 	 {Shriberg, Elizabeth and Stolcke, Andreas},
  title = 	 {Prosody modeling for automatic speech recognition
                  and understanding},
  booktitle = 	 {Mathematical Foundations of Speech and Language Processing},
  pages = 	 {105--114},
  publisher = {Springer},
  year = 	 2004,
  editor = 	 {Johnson, Mark and Khudanpur, S. and Ostendorf, Mari
                  and Rosenfeld, R.},
  volume = 	 138,
  series = 	 {IMA Volumes in Mathematics and Its Applications}
}

@Article{Shriberg/etal:2000,
  author =       {Shriberg, Elizabeth and Stolcke, Andreas and
                  Hakkani-T{\"u}r, Dilek and T{\"u}r, G{\"o}khan},
  title =        {Prosody-based automatic segmentation of speech into
                  sentences and topics},
  journal =      {Speech Communication},
  year =         2000,
  volume =       32,
  number =       {1--2},
  pages =        {??}
}

@InCollection{Zeissler/etal:2006a,
  author = 	 {Zei{\ss}ler, Viktor and Adelhardt, Johann and
                  Batliner, Anton and Frank, Carmen and N{\"o}th,
                  Elmar and Shi, Rui Ping and Niemann, Heinrich},
  title = 	 {The prosody module},
  booktitle = 	 {{SmartKom}: Foundations of Multimodal Dialogue Systems},
  editor =	 {Wahlster, Wolfgang},
  publisher = 	 {Springer},
  year = 	 2006,
  pages =	 {139--152}
}

bm 1.2.2019