Prosodic Models in Speech Science and Technology

Winter 2016/2017, Möbius (Seminar, 2 SWS), LSF/HIS 98198

MSc Language Science and Technology / LCT

Thu 8:30-10:00, C7.2/5.09

Registration for final exam by Feb 1, 2017!

Entrance requirements

Foundations of Language Science and Technology, and Speech Science (recommended).

Course description

Prosody research is diverse, and its theoretical approaches and methodologies remain controversial. In this seminar, the most important intonation and duration models will be presented and discussed with respect to their phonological and phonetic assumptions as well as their implementation in and application to speech processing systems (speech synthesis, speech recognition). Participants will read, present, and discuss selected papers.

Course credits

7 CP (presentation and paper) or 4 CP (presentation only).
Active participation on a regular basis required.

Requirements

Participation: You are expected to be physically present throughout the seminar and take part in the discussion.
You may miss at most one class without formal consequences. In that case, please send me an email saying that you will not attend; no explanation is required. If you miss a second or third class, you must write and submit a summary of the papers assigned for that class (minimum one page per paper).

Reading: For each class, you are required to read one or two papers (see Schedule). For each paper, please send me one question that you would like to have answered or discussed in class, by midnight on the day before the class.

Presentation: An oral presentation of 30-45 minutes, typically based on a core paper and maybe some complementary reading. Please contact me (1) when you have been assigned a topic/paper and want to start working on it; (2) when you have a pre-final draft version of the presentation. After your presentation I will provide feedback to you. The final version of your slides will be posted on the course homepage.

Term Paper: MSc students opting for the 7 CP version have to write a term paper (see Deadlines below). The topic of the paper need not be identical to, or overlap with, the topic of your oral presentation.

Oral Exam: If you decide to take an oral exam, we will jointly select two topics that do not overlap with the topics of your presentation and term paper. Exam duration is 15-20 minutes.

Deadlines

Exam registration: t.b.a.
Term paper: April 30, 2017

Contact:
  Prof. Dr. Bernd Möbius
  Email
  C7.2/4.10
  0681/302-4500

Schedule (tentative, to be discussed)

Date | Topic | Papers / Slides | Presented by
03.11. | Introduction; Organization | |
17.11. | Overview I: Intonation models | Botinis/etal:2001 pdf | all
01.12. | Fujisaki's intonation model | Fujisaki:1988 pdf, Möbius:1995 pdf | Möbius
08.12. | Pierrehumbert's intonation model; ToBI-based synthesis | Pierrehumbert:1980 pdf; Pierrehumbert:1981 pdf, Jilka/etal:1999 pdf; slides | O'Mahony
15.12. | Verbmobil prosody module; SmartKom prosody module | Batliner/etal:2000 pdf; Zeissler/etal:2006 pdf; slides | Ruiter
05.01. | Segmental duration model; Segmental vs. syllable duration models | Santen:1994 pdf, Santen:1998 pdf; Santen/Shih:2000 pdf; slides-1, slides-2 | Liu
12.01. | Prosodic models in ASR/ASU; Prosody-based topic segmentation | Shriberg/Stolcke:2004 pdf; Shriberg/etal:2000 pdf; slides | Muljadi
t.b.d. | Discussion: Prosodic models | Batliner/Mobius:2005 pdf | all

Literature

BibTeX entries of all references (books, papers, URL):

@InCollection{Batliner/Mobius:2005,
  author = 	 {Batliner, Anton and M{\"o}bius, Bernd},
  title = 	 {Prosodic models, automatic speech understanding, and
		  speech synthesis: Towards the common ground?},
  booktitle = 	 {The Integration of Phonetic Knowledge in Speech Technology},
  publisher =	 {Springer},
  year =	 2005,
  editor =	 {Barry, William~J. and van Dommelen, Wim~A.},
  address =	 {Dordrecht},
  pages =	 {21--44}
}

@InCollection{Batliner/etal:2000c,
  author = 	 {Batliner, Anton and Buckow, J. and Niemann, Heinrich
		  and N{\"o}th, Elmar and Warnke, Volker},
  title = 	 {The prosody module},
  booktitle =    {Verbmobil: Foundations of Speech-to-Speech Translation},
  publisher =    {Springer},
  year =         2000,
  editor =       {Wahlster, Wolfgang},
  address =      {Berlin},
  pages = 	 {106--121}
}

@Article{Botinis/etal:2001,
  author =       {Botinis, Antonis and Granstr{\"o}m, Bj{\"o}rn and
                  M{\"o}bius, Bernd},
  title =        {Developments and paradigms in intonation research},
  journal =      {Speech Communication},
  year =         2001,
  volume =       33,
  number =       4,
  pages =        {263--296}
}

@InCollection{Fujisaki:1988,
  author = 	 {Fujisaki, Hiroya},
  title = 	 {A note on the physiological and physical basis for the
		  phrase and accent components in the voice fundamental
		  frequency contour},
  booktitle = 	 {Vocal Physiology: Voice Production, Mechanisms and
		  Functions},
  publisher =	 {Raven},
  year =	 1988,
  editor =	 {Fujimura, Osamu},
  address =	 {New York},
  pages =	 {347--355}
}

@Article{Jilka/etal:1999,
  author =       {Jilka, Matthias and M{\"o}hler, Gregor and Dogil,
                  Grzegorz},
  title =        {{Rules for the generation of ToBI-based American
                  English intonation}},
  journal =      {Speech Communication},
  year =         1999,
  volume =       28,
  pages =        {83--108}
}

@InProceedings{Mobius:1995,
  author = 	 {M{\"o}bius, Bernd},
  title = 	 {Components of a quantitative model of {G}erman
		  intonation},
  booktitle =    {Proceedings of the 13th International Congress of
                  Phonetic Sciences (Stockholm)},
  year =         1995,
  volume =       2,
  pages =	 {108--115}
}

@Article{Pierrehumbert:1981,
  author = 	 {Pierrehumbert, Janet},
  title = 	 {Synthesizing intonation},
  journal = 	 {Journal of the Acoustical Society of America},
  year = 	 1981,
  volume =	 70,
  pages =	 {985--995}
}

@InCollection{Santen/Mobius:2000,
  author = 	 {van Santen, Jan P.~H. and M{\"o}bius, Bernd},
  title = 	 {A quantitative model of {F0} generation and alignment},
  editor = 	 {Botinis, Antonis},
  booktitle = 	 {Intonation---Analysis, Modelling and Technology},
  publisher = 	 {Kluwer},
  year = 	 2000,
  address =      {Dordrecht},
  pages =        {269--288}
}

@Article{Santen/Shih:2000,
  author =       {van Santen, Jan P.~H. and Shih, Chilin},
  title =        {Suprasegmental and segmental timing models in
                  {M}andarin {C}hinese and {A}merican {E}nglish},
  journal =      {Journal of the Acoustical Society of America},
  year =         2000,
  volume =       107,
  number =       2,
  pages =        {1012--1026}
}

@Article{Santen:1994,
  author = 	 {van Santen, Jan P.~H.},
  title = 	 {Assignment of segmental duration in text-to-speech
		  synthesis},
  journal = 	 {Computer Speech and Language},
  year = 	 1994,
  volume =	 8,
  pages =	 {95--128}
}

@InCollection{Santen:1998,
  author =       {van Santen, Jan P.~H.},
  title =        {Timing},
  booktitle = 	 {Multilingual Text-to-Speech Synthesis: The {B}ell
                  {L}abs Approach},
  editor =       {Sproat, Richard},
  publisher = 	 {Kluwer},
  year = 	 1998,
  address =	 {Dordrecht},
  pages =        {115--139}
}

@InCollection{Shriberg/Stolcke:2004,
  author = 	 {Shriberg, Elizabeth and Stolcke, Andreas},
  title = 	 {Prosody modeling for automatic speech recognition
                  and understanding},
  booktitle = 	 {Mathematical Foundations of Speech and Language Processing},
  pages = 	 {105--114},
  publisher = {Springer},
  year = 	 2004,
  editor = 	 {Johnson, Mark and Khudanpur, S. and Ostendorf, Mari
                  and Rosenfeld, R.},
  volume = 	 138,
  series = 	 {IMA Volumes in Mathematics and Its Applications}
}

@Article{Shriberg/etal:2000,
  author =       {Shriberg, Elizabeth and Stolcke, Andreas and
                  Hakkani-T{\"u}r, Dilek and T{\"u}r, G{\"o}khan},
  title =        {Prosody-based automatic segmentation of speech into
                  sentences and topics},
  journal =      {Speech Communication},
  year =         2000,
  volume =       32,
  number =       {1--2},
  pages =        {??}
}

@InCollection{Zeissler/etal:2006a,
  author = 	 {Zei{\ss}ler, Viktor and Adelhardt, Johann and
                  Batliner, Anton and Frank, Carmen and N{\"o}th,
                  Elmar and Shi, Rui Ping and Niemann, Heinrich},
  title = 	 {The prosody module},
  booktitle = 	 {{SmartKom}: Foundations of Multimodal Dialogue Systems},
  editor =	 {Wahlster, Wolfgang},
  publisher = 	 {Springer},
  year = 	 2006,
  pages =	 {139--152}
}

bm 13.1.2017