Computational Linguistics & Phonetics, Fachrichtung 4.7, Universität des Saarlandes

Computational Linguistics Colloquium

Friday, 19 December, 11:00
Conference Room, Building C7 4

Latent-Variable Modeling of String Transductions

Markus Dreyer
Johns Hopkins University

String-to-string transduction is a central problem in computational linguistics and natural language processing. It occurs in tasks as diverse as name transliteration, spelling correction, pronunciation modeling and inflectional morphology. We present a conditional log-linear model for string-to-string transduction that employs overlapping features over latent alignment sequences and learns latent classes and latent string pair regions from incomplete training data. We evaluate our approach on morphological tasks and demonstrate that latent variables can dramatically improve results, even when trained on small data sets. On the task of generating morphological forms, we outperform a baseline method, reducing the error rate by up to 44%. On lemmatization, we reduce the error rates reported in Wicentowski (2001) by up to 89%.

If you would like to meet with the speaker, please contact Michael Jellinghaus.