Text-to-speech synthesis

Winter 2017/2018, Möbius

advanced course, lecture, 2 SWS, 3 LP/ECTS, LSF/HIS #104118

MSc LST/LCT; BSc CL; MSc/BSc Inf; MSc/BSc CuKT

Tue 16:15-17:45, C7.2 / Seminar room (foyer)

Entrance requirements

None, but some background in computational linguistics, speech science, or signal processing will be advantageous.

Assessments / Exams

Written exam at the end of the winter semester. Registration via HISPOS/LSF or, if this fails, by email to the lecturer.

Content

Speech synthesis is an essential component of any system relying on intuitive human-machine communication. Speech synthesis systems are also used in phonetic research to gain further insight into speech production and acoustic properties of speech. This advanced course offers an introduction to text-to-speech (TTS) synthesis systems and strategies. Various approaches to speech synthesis are presented, including formant synthesis, concatenative synthesis, and state-of-the-art corpus-based unit selection synthesis. Linguistic text analysis and natural language processing modules typically included in TTS systems are covered as well.

This course will be complemented by a research seminar on voice building (offered by Ingmar Steiner, spring break 2018, see below). Entrance requirement for the voice building course: passing the first written exam of the lecture course.

Contact:
  Prof. Dr. Bernd Möbius
  C7.2/4.10
  0681/302-4500

Structure

Date      Topic                                                      Slides  Assignment
24.10.    Introduction                                                       Taylor 2009, ch. 1 and 2
(31.10.)  (public holiday, no class)
07.11.    Components of TTS systems                                          Taylor 2009, ch. 3.1-3.4
14.11.    Speech production and formant synthesis                            Taylor 2009, ch. 7.1/7.2/13
21.11.    Concatenative synthesis I                                          Taylor 2009, ch. 14
28.11.    Concatenative synthesis II                                         Taylor 2009, ch. 14
(05.12.)  (no class)
12.12.    Unit selection synthesis                                           Taylor 2009, ch. 16
19.12.    Statistical parametric synthesis I (lecturer: Le Maguer)           Taylor 2009, ch. 15
02.01.    Statistical parametric synthesis II (lecturer: Le Maguer)          Taylor 2009, ch. 15
09.01.    Duration and intonation modeling                                   Taylor 2009, ch. 9
16.01.    Linguistic text analysis                                           Taylor 2009, sec. 7.4, 8.3-8.5;
                                                                             Möbius 2001, ch. 3-6
23.01.    TTS system evaluation
30.01.    Written exam (Tue, 16:15-17:45)                                    Registration for exam by Jan 16, 2018, via HISPOS/LSF or else by email to Prof. Möbius

Research seminar on voice building (March 2018)

TTS systems

Literature

Suggested companion textbook. Further textbooks: The books by Dutoit, Sproat, and Taylor are available in the CS library.

BibTeX entries of all references (books, papers, URLs):

@Book{Allen/etal:1987,
  author = 	 {Allen, Jonathan and Hunnicutt, M.~Sharon and Klatt,
		  Dennis}, 
  title = 	 {From Text to Speech: {T}he {MIT}alk System},
  publisher = 	 {Cambridge University Press},
  year = 	 1987,
  address =	 {Cambridge},
  annote =	 {tts, formant synthesis, textbook}
}

@Proceedings{Alter/etal:1997,
  title = 	 {Concept to Speech Generation Systems---Proceedings
		  of a Workshop in conjunction with 35th Annual
		  Meeting of the Association for Computational
		  Linguistics (Madrid, Spain)},
  year = 	 1997,
  editor =	 {Kai Alter and Hannes Pirker and Wolfgang Finkler}
}

@InProceedings{Black/Taylor:1994,
  author = 	 {Black, Alan W. and Taylor, Paul},
  title = 	 {C\textsc{hatr}: a generic speech synthesis system},
  booktitle = 	 {Proceedings of the International Conference on
                  Computational Linguistics (Kyoto, Japan)},
  volume =	 2,
  year =	 1994,
  pages =	 {983--986}
}

@Book{Breiman/etal:1984,
  author = 	 {Breiman, Leo and Friedman, Jerome~H. and Olshen,
		  Richard~A. and Stone, Charles~J.},
  title = 	 {Classification and Regression Trees},
  publisher = 	 {Wadsworth \& Brooks},
  year = 	 1984,
  address =	 {Pacific Grove, CA}
}

@InCollection{Campbell:1992,
  author = 	 {Campbell, W. Nick},
  title = 	 {Syllable-based segmental duration},
  editor = 	 {Bailly, G{\'e}rard and Beno{\^{\i}}t, Christian and
		  Sawallis, Thomas R.},
  booktitle = 	 {Talking Machines: Theories, Models, and Designs},
  publisher = 	 {Elsevier},
  year = 	 1992,
  address =	 {Amsterdam},
  pages =	 {211--224}
}

@Article{Campbell:1999,
  author =       {Campbell, W. Nick},
  title =        {A call for generic-use large-scale single-speaker
                  speech corpora and an example of their application
                  in concatenative speech synthesis},
  journal =      {Technical Publications, ATR Interpreting
		  Telecommunications Research Laboratories},
  year =         1999,
  pages =        {42--47},
  annote =       {unit selection}
}

@Article{Carlson/Granstrom:1991,
  author = 	 {Carlson, Rolf and Granstr{\"o}m, Bj{\"o}rn},
  title = 	 {Speech synthesis development and phonetic research---a
		  personal introduction},
  journal = 	 {Journal of Phonetics},
  year = 	 1991,
  volume =	 19,
  pages =	 {3--8},
  annote =       {synthesis}
}

@Book{Clark/Yallop:1995,
  author = 	 {Clark, John and Yallop, Colin},
  title = 	 {An Introduction to Phonetics and Phonology},
  publisher = 	 {Blackwell},
  year = 	 1995,
  address =	 {Oxford},
  edition =	 {2nd},
  note =	 {1st edition 1990}
}

@Book{Clark/etal:2007a,
  author = 	 {Clark, John and Yallop, Colin and Fletcher, Janet},
  title = 	 {An Introduction to Phonetics and Phonology},
  publisher = 	 {Blackwell},
  year = 	 2007,
  address =	 {Oxford},
  edition =	 {3rd},
  annote =	 {textbook, phonetics}
}

@Article{Clark/etal:2007b,
  author = 	 {Clark, Robert A.~J. and Richmond, Korin and King, Simon},
  title = 	 {Multisyn: Open-domain unit selection for the
                  {Festival} speech synthesis system},
  journal = 	 {Speech Communication},
  year = 	 2007,
  volume =	 49,
  number =	 4,
  pages =	 {317--330},
  annote =	 {unit selection, synthesis, Festival, voice building,
                  overview}
}

@Article{Dudley:1939a,
  author = 	 {Homer Dudley},
  title = 	 {The vocoder},
  journal = 	 {Bell Labs Record},
  year = 	 1939,
  volume =	 17,
  pages =	 {122--126}
}

@Book{Dutoit:1997,
  author = 	 {Dutoit, Thierry},
  title = 	 {An Introduction to Text-to-Speech Synthesis},
  publisher = 	 {Kluwer},
  year = 	 1997,
  address =	 {Dordrecht},
  annote =	 {textbook, synthesis, tts; Review by Eileen Fitzpatrick
		  in CL 24 (2), 1998, 322--323}
}

@Article{Fant:1953,
  author =       {Fant, Gunnar},
  title =        {Speech communication research},
  journal =      {Ing. Vetenskaps Akad. Stockholm},
  year =         1953,
  volume =       24,
  pages =        {331--337},
  annote =       {formant synthesis, OVE I}
}

@Book{Fant:1960,
  author = 	 {Fant, Gunnar},
  title = 	 {Acoustic Theory of Speech Production},
  publisher = 	 {Mouton},
  year = 	 1960,
  address =	 {The Hague}
}

@InCollection{Fujisaki:1983,
  author = 	 {Fujisaki, Hiroya},
  title = 	 {Dynamic characteristics of voice fundamental frequency in
		  speech and singing},
  booktitle = 	 {The Production of Speech},
  publisher =	 {Springer},
  year =	 1983,
  editor =	 {MacNeilage, Peter F.},
  address =	 {New York},
  pages =	 {39--55}
}

@Article{Fujisaki:1987,
  author = 	 {Fujisaki, Hiroya},
  title = 	 {A note on the physiological and physical basis for the
		  phrase and accent components in the voice fundamental
		  frequency contours},
  journal = 	 {Annual Bulletin of the Research Institute for Logopedics
		  and Phoniatrics (Tokyo)},
  year = 	 1987,
  volume =	 21,
  pages =	 {165--175}
}

@Article{Fujisaki/etal:1979b,
  author = 	 {Fujisaki, Hiroya and Hirose, Keikichi and Ohta, K.},
  title = 	 {Acoustic features of the fundamental frequency contours
		  of declarative sentences in {J}apanese}, 
  journal = 	 {Annual Bulletin of the Research Institute for Logopedics
		  and Phoniatrics (Tokyo)},
  year = 	 1979,
  volume =	 13,
  pages =	 {163--172}
}

@Book{Gibbon/etal:1997,
  editor = 	 {Gibbon, Dafydd and Moore, Roger and Winski, Richard},
  title = 	 {Handbook of Standards and Resources for Spoken
		  Language Systems},
  publisher = 	 {Mouton de Gruyter},
  year = 	 1997,
  address =	 {Berlin},
  annote =	 {EAGLES; ISBN 3-11-015366-1; Review by Jan P. H. van Santen
		  in Computational Linguistics 24 (3), 512--515}
}

@Article{Holmes:1973,
  author = 	 {Holmes, John N.},
  title = 	 {The influence of glottal waveform on the naturalness of
		  speech from a parallel formant synthesizer}, 
  journal = 	 {IEEE Transactions on Audio and Electroacoustics},
  year = 	 1973,
  volume =	 21,
  pages =	 {298--305}
}

@Article{House/etal:1965,
  author =       {House, Arthur~S. and Williams, Carl and Hecker,
                  Michael H.~L. and Kryter, Karl~D.},
  title =        {Articulatory testing methods: Consonantal
                  differentiation with a closed-response set},
  journal =      {Journal of the Acoustical Society of America},
  year =         1965,
  volume =       37,
  pages =        {158--166},
  annote =       {evaluation, assessment, modified rhyme test (MRT)}
}

@InProceedings{Hunt/Black:1996,
  author = 	 {Andrew J. Hunt and Alan W. Black},
  title = 	 {Unit selection in a concatenative speech synthesis
		  system using a large speech database},
  booktitle = 	 {Proceedings of the {IEEE} International Conference
		  on Acoustics, Speech, and Signal Processing
		  (Atlanta, GA)},
  year = 	 1996,
  volume =	 1,
  pages =	 {373--376},
  annote =	 {unit selection}
}

@InProceedings{Iwahashi/Sagisaka:1993,
  author = 	 {Iwahashi, Naoto and Sagisaka, Yoshinori},
  title = 	 {Duration modelling with multiple split regression},
  booktitle = 	 {Proceedings of the European Conference on Speech
		  Communication and Technology (Berlin, Germany)},
  year = 	 1993,
  volume =	 {??},
  pages =	 {329--332},
  annote =	 {tts, prosody, duration}
}

@InProceedings{Jekosch:1993,
  author =       {Jekosch, Ute},
  title =        {Speech quality assessment and evaluation},
  booktitle = 	 {Proceedings of the European Conference on Speech
		  Communication and Technology (Berlin, Germany)},
  year = 	 1993,
  volume =       2,
  pages =        {1387--1394},
  annote =       {tts, evaluation, assessment, cluster identification
                  (CLID) test}
}

@Article{Kaplan/Kay:1994,
  author = 	 {Kaplan, Ronald and Kay, Martin},
  title = 	 {Regular models of phonological rule systems},
  journal = 	 {Computational Linguistics},
  year = 	 1994,
  volume =	 20,
  pages =	 {331--378}
}

@Book{Kempelen:1791,
  author = 	 {Kempelen, Wolfgang von},
  title = 	 {{Mechanismus der menschlichen Sprache nebst
		  Beschreibung einer sprechenden Maschine}},
  publisher = 	 {J. V. Degen},
  year = 	 1791,
  address =	 {Wien},
   note = {Facsimile Neudruck, 1970, der Ausgabe Wien 1791 mit einer
		  Einleitung von Herbert E. Brekle und Wolfgang
		  Wildgen. Friedrich Frommann, Stuttgart}
}

@Article{Klatt:1980a,
  author = 	 {Klatt, Dennis H.},
  title = 	 {Software for a cascade/parallel formant synthesizer},
  journal = 	 {Journal of the Acoustical Society of America},
  year = 	 1980,
  volume =	 67,
  number =	 3,
  pages =	 {971--980},
  annote =	 {synthesis, Klatt80 synthesizer}
}

@Article{Kratzenstein:1782,
  author = 	 {Kratzenstein, Christian Gottlieb},
  title = 	 {Sur la naissance de la formation des voyelles},
  journal = 	 {Journal de Physique},
  year = 	 1782,
  volume =	 21,
  pages =	 {358--380},
  note =	 {French translation of: Tentamen coronatum de voce,
		  Acta Acad. Petrog., 1780}
}

@Book{Leeuwen:1990,
  editor = 	 {van Leeuwen, J.},
  title = 	 {Handbook of Theoretical Computer Science},
  publisher = 	 {Elsevier, Amsterdam; MIT Press, Cambridge, MA},
  year = 	 1990,
  volume =	 {B},
  annote =	 {FST, fsm; ISBN 0 444 88074 7}
}

@Book{Mobius:1993a,
  author = 	 {M{\"o}bius, Bernd},
  title = 	 {{Ein quantitatives Modell der deut\-schen
		  Intona\-tion---Analyse und Synthese von 
		  Grund\-frequenz\-ver\-l{\"a}ufen}},
  publisher = 	 {Niemeyer},
  year = 	 1993,
  number =	 305,
  series =	 {Linguistische Arbeiten},
  address =	 {T{\"u}bingen}
}

@Article{Mobius:1999,
  author =       {M{\"o}bius, Bernd},
  title =        {{The Bell Labs German text-to-speech system}},
  journal =      {Computer Speech and Language},
  year =         1999,
  volume =       13,
  pages =        {319--358},
  annote =       {tts, german}
}

@Book{Mobius:2001a,
  author = 	 {M{\"o}bius, Bernd},
  title = 	 {German and Multilingual Speech Synthesis},
  publisher = 	 {University of Stuttgart},
  year = 	 2001,
  series =	 {Arbeitspapiere des Instituts f{\"u}r Maschinelle
		  Sprachverarbeitung (Univ. Stuttgart), AIMS 7 (4)},
  pages =	 {1--300},
  annote =	 {synthesis, textbook, tts}
}

@InProceedings{Mobius/Santen:1996,
  author = 	 {M{\"o}bius, Bernd and van Santen, Jan},
  title = 	 {Modeling segmental duration in {G}erman text-to-speech
		  synthesis},
  booktitle = 	 {Proceedings of the International Conference on Spoken
		  Language Processing (Philadelphia, PA)},
  year = 	 1996,
  volume =	 4,
  pages =	 {2395--2398}
}

@Article{Mohri:1997,
  author = 	 {Mohri, Mehryar},
  title = 	 {Finite-state transducers in language and speech
		  processing},
  journal = 	 {Computational Linguistics},
  year = 	 1997,
  volume =	 23,
  number =	 2,
  pages =	 {269--311}
}

@InCollection{Mohri/etal:1998,
  author = 	 {Mohri, Mehryar and Pereira, Fernando and Riley, Michael},
  title = 	 {A rational design for a weighted finite-state
		  transducer library},
  booktitle = 	 {Lecture Notes in Computer Science 1436},
  publisher =	 {Springer},
  year =	 1998,
  editor =	 {Wood, D. and Yu, S.},
  address =	 {New York},
  pages =	 {144--158},
  annote =	 {fsm}
}

@InProceedings{Nikleczy/Olaszy:2003,
  author = 	 {Nikl{\'e}czy, P. and Olaszy, Gabor},
  title = 	 {A reconstruction of {Farkas Kempelen's} speaking machine},
  booktitle = 	 {Proceedings of the European Conference on Speech
		  Communication and Technology (Geneva, Switzerland)},
  year = 	 2003,
  pages =	 {2453--2456},
  annote =	 {Kempelen, speaking machine, synthesis}
}

@Article{Ohman:1967,
  author = 	 {{\"O}hman, Sven E. G.},
  title = 	 {Word and sentence intonation: a quantitative model},
  journal = 	 {Speech Transmission Laboratory---Quarterly Progress
		  and Status Report},
  year = 	 1967,
  volume =	 {2--3},
  pages =	 {20--54}
}

@Article{Ohman/Lindqvist:1966,
  author = 	 {{\"O}hman, Sven E. G. and Lindqvist, Jan},
  title = 	 {Analysis-by-synthesis of prosodic pitch contours},
  journal = 	 {Speech Transmission Laboratory---Quarterly Progress and Status Report},
  year = 	 1966,
  volume =	 4,
  pages =	 {1--6}
}

@InCollection{Olive/etal:1998,
  author =       {Olive, Joseph and van~Santen, Jan and M{\"o}bius,
                  Bernd and Shih, Chilin},
  title =        {Synthesis},
  booktitle = 	 {Multilingual Text-to-Speech Synthesis: The {B}ell
                  {L}abs Approach},
  editor = 	 {Richard Sproat},
  publisher = 	 {Kluwer},
  year = 	 1998,
  address =	 {Dordrecht},
  chapter =      7,
  pages =        {191--228}
}

@PhdThesis{Pierrehumbert:1980,
  author = 	 {Pierrehumbert, Janet},
  title = 	 {The phonology and phonetics of {E}nglish intonation},
  school = 	 {MIT},
  address =	 {Cambridge, MA},
  year = 	 1980
}

@InProceedings{Pols/etal:1992,
  author =       {Pols, Louis C. W. and SAM-Partners},
  title =        {Multi-lingual synthesis evaluation methods},
  booktitle = 	 {Proceedings of the International Conference on Spoken
		  Language Processing (Banff, Alberta)},
  year = 	 1992,
  volume =       1,
  pages =        {181--184},
  annote =       {tts, evaluation}
}

@Book{PompinoMarschall:1995,
  author = 	 {Pompino-Marschall, Bernd},
  title = 	 {{Einf{\"u}hrung in die Phonetik}},
  publisher = 	 {de Gruyter},
  year = 	 1995,
  address =	 {Berlin}
}

@InCollection{Riley:1992,
  author  = {Riley, Michael D.},
  title   = {Tree-based modeling for speech synthesis},
  editor = 	 {Bailly, G{\'e}rard and Beno{\^{\i}}t, Christian and
		  Sawallis, Thomas R.},
  booktitle = 	 {Talking Machines: Theories, Models, and Designs},
  publisher = 	 {Elsevier},
  year = 	 1992,
  address =	 {Amsterdam},
  pages   = {265--273}
}

@InProceedings{Santen:1992d,
  author =       {van Santen, Jan P.~H.},
  title =        {Diagnostic perceptual experiments for text-to-speech
                  system evaluation},
  booktitle = 	 {Proceedings of the International Conference on Spoken
		  Language Processing (Banff, Alberta)},
  year = 	 1992,
  volume =       1,
  pages =        {555--558},
  annote =       {tts, evaluation}
}

@Article{Santen:1993b,
  author = 	 {van Santen, Jan P.~H.},
  title = 	 {Exploring \textit{N}-way tables with sums-of-products
		  models},
  journal = 	 {Journal of Mathematical Psychology},
  year = 	 1993,
  volume =	 37,
  number =	 3,
  pages =	 {327--371}
}

@Article{Santen:1993c,
  author = 	 {van Santen, Jan P.~H.},
  title = 	 {Perceptual experiments for diagnostic testing of
		  text-to-speech systems}, 
  journal = 	 {Computer Speech and Language},
  year = 	 1993,
  volume =	 7,
  pages =	 {49--100}
}

@InCollection{Santen:1998a,
  author =       {van Santen, Jan P. H.},
  title =        {Timing},
  booktitle = 	 {Multilingual Text-to-Speech Synthesis: The {B}ell
                  {L}abs Approach},
  editor =       {Sproat, Richard},
  publisher = 	 {Kluwer},
  year = 	 1998,
  address =	 {Dordrecht},
  pages =        {115--139},
  annote =       {tts, duration}
}

@InCollection{Santen:1998b,
  author =       {van Santen, Jan P. H.},
  title =        {Evaluation},
  booktitle = 	 {Multilingual Text-to-Speech Synthesis: The {B}ell
                  {L}abs Approach},
  editor =       {Sproat, Richard},
  publisher = 	 {Kluwer},
  year = 	 1998,
  address =	 {Dordrecht},
  pages =        {229--244},
  annote =       {tts, evaluation}
}

@InCollection{Schweitzer/etal:2006a,
  author = 	 {Schweitzer, Antje and Braunschweiler, Norbert and
		  Dogil, Grzegorz and Klankert, Tanja and M{\"o}bius,
		  Bernd and M{\"o}hler, Gregor and Morais, Edmilson
		  and S{\"a}uberlich, Bettina and Thomae, Matthias},
  title = 	 {Multimodal speech synthesis},
  booktitle = 	 {{SmartKom}: Foundations of Multimodal Dialogue Systems},
  editor =	 {Wahlster, Wolfgang},
  publisher = 	 {Springer},
  year = 	 2006,
  pages =	 {411--435},
  annote =	 {sk, synthesis, unitsel}
}

@InProceedings{Silverman/etal:1992,
  author = 	 {Silverman, Kim and Beckman, Mary and Pitrelli,
		  John and Ostendorf, Mari and Wightman, Colin and
		  Price, Patti and Pierrehumbert, Janet and
		  Hirschberg, Julia},
  title = 	 {{ToBI: A standard for labelling English prosody}},
  booktitle = 	 {Proceedings of the International Conference on Spoken
		  Language Processing (Banff, Alberta)},
  year = 	 1992,
  volume =       2,
  pages =	 {867--870}
}

@InProceedings{Spiegel/etal:1988,
  author =       {Spiegel, Murray and Altom, Mary Jo and Macchi,
                  Marian and Wallace, Karen},
  title =        {Using a monosyllabic test corpus to evaluate the
                  intelligibility of synthesized and natural speech},
  booktitle =    {Proceedings of the American Voice I/O Systems Conference},
  year =         1988,
  OPTpages =        {??},
  annote =       {tts, evaluation, assessment, Bellcore test}
}

@InProceedings{Spiegel/etal:1989,
  author = 	 {Murray Spiegel and Mary Jo Altom and Marian Macchi
		  and Karen Wallace},
  title = 	 {A monosyllabic test corpus to evaluate the
		  intelligibility of synthesized and natural speech},
  booktitle = 	 {Proceedings of the ESCA Workshop on Speech Input /
		  Output Assessment and Speech Databases},
  editor =	 {Institute of Phonetic Sciences, Amsterdam},
  year =	 1989,
  address =	 {Noordwijkerhout, The Netherlands},
  pages =	 {??}
}

@TechReport{Sproat:1995a,
  author = 	 {Sproat, Richard},
  title = 	 {{LEXTOOLS}: {T}ools for finite-state linguistic analysis},
  institution =  {AT\&T Bell Laboratories},
  year = 	 1995,
  note =	 {11522-951108-10TM}
}

@InProceedings{Sproat:1995b,
  author = 	 {Sproat, Richard},
  title = 	 {A finite-state architecture for tokenization and
		  grapheme-to-phoneme conversion in multilingual text
		  analysis},
  booktitle = 	 {{From text to tags---Issues in multilingual language
		  analysis. Proceedings of the ACL SIGDAT Workshop}},
  year = 	 1995,
  address =	 {University College, Belfield, Dublin, Ireland},
  pages =	 {65--72}
}

@Book{Sproat:1998,
  title = 	 {Multilingual Text-to-Speech Synthesis: The {B}ell
                  {L}abs Approach},
  editor = 	 {Sproat, Richard},
  publisher = 	 {Kluwer},
  year = 	 1998,
  address =	 {Dordrecht},
  annote =	 {textbook, tts, synthesis; ISBN 0-7923-8027-4; Review by
		  Douglas O'Shaughnessy in Computational Linguistics
		  24(4), 1998, 656--658}
}

@Article{Syrdal/etal:1997,
  author = 	 {Syrdal, Ann K. and Conkie, Alistair and Stylianou,
		  Yannis and Schroeter, J{\"u}rgen and Garrison,
		  L.F. and Dutton, D.},
  title = 	 {Voice selection for speech synthesis},
  journal = 	 {Journal of the Acoustical Society of America},
  year = 	 1997,
  volume =	 102,
  number =	 5,
  pages =	 3191,
  note =	 {(abstract)},
  annote =	 {tts}
}

@InProceedings{Syrdal/etal:1998b,
  author =       {Syrdal, Ann K. and Conkie, Alistair and Stylianou,
		  Yannis},
  title =        {Exploration of acoustic correlates in speaker
                  selection for concatenative synthesis},
  booktitle = 	 {Proceedings of the International Conference on Spoken
		  Language Processing (Sydney, Australia)},
  year = 	 1998,
  volume =       6,
  pages =        {2743--2746},
  annote =       {tts, inventory, speaker}
}

@Book{Taylor:2009,
  author =	 {Taylor, Paul},
  title = 	 {Text-to-Speech Synthesis},
  publisher = 	 {Cambridge University Press},
  year = 	 2009,
  annote =	 {textbook, synthesis, tts}
}

@Book{Ungeheuer:1962a,
  author = 	 {Ungeheuer, Gerold},
  title = 	 {{Elemente einer akustischen Theorie der
		  Vokalartikulation}},
  publisher = 	 {Springer},
  year = 	 1962,
  address =	 {Berlin}
}

@TechReport{Voiers/etal:1972,
  author =       {Voiers, William and Sharpley, A. and Hehmsoth, C.},
  title =        {Research on diagnostic evaluation of speech
                  intelligibility},
  institution =  {Cambridge Research Laboratories},
  year =         1972,
  note =         {Research Report AFCRL-72-0694},
  annote =       {evaluation, assessment, diagnostic rhyme test (DRT)}
}

@Article{Voiers:1983,
  author = 	 {Voiers, William~D.},
  title = 	 {Evaluating processed speech using the {Diagnostic
		  Rhyme Test}},
  journal = 	 {Speech Technology},
  year = 	 1983,
  number =	 {Jan/Feb},
  pages =	 {30--39},
  annote =	 {evaluation, diagnostic rhyme test (DRT)}
}

@Article{Wheatstone:1838,
  author =       {Wheatstone, Charles},
  title =        {Art. II. -- 1. On the vowel sounds, and on Reed
    Organ Pipes. By Robert Willis [...] 2. Le M{\'e}chanisme de la Parole,
    suivi de la Description d'une Machine Parlante. Par M. de Kempelen
    [...] 3. C.G. Kratzenstein. Tentamen Coronatum de Voce. [...]},
  journal =      {The London and Westminster Review},
  year =         1838,
  pages =        {27ff.},
  annote =       {Review of von Kempelen, Kratzenstein, Willis}
}


bm 12.10.2017