Computational Linguistics Colloquium
Friday, 19 December, 11:00
Conference Room, Building C7 4
Latent-Variable Modeling of String Transductions
Markus Dreyer
Johns Hopkins University
String-to-string transduction is a central problem in computational linguistics and natural language processing. It occurs in tasks as diverse as name transliteration, spelling correction, pronunciation modeling, and inflectional morphology. We present a conditional log-linear model for string-to-string transduction that employs overlapping features over latent alignment sequences and learns latent classes and latent string pair regions from incomplete training data. We evaluate our approach on morphological tasks and demonstrate that latent variables can dramatically improve results, even when trained on small data sets. On the task of generating morphological forms, we outperform a baseline method, reducing the error rate by up to 44%. On lemmatization, we reduce the error rates reported in Wicentowski (2001) by up to 89%.
If you would like to meet with the speaker, please contact Michael Jellinghaus.