Computational Linguistics & Phonetics Fachrichtung 4.7 Universität des Saarlandes

Computational Linguistics Colloquium

Thursday, 9 July, 16:15
Seminar Room, Building C7 2

Translation Model Search Spaces

Adam Lopez
Department of Informatics, University of Edinburgh

Despite a wealth of literature on statistical translation, many tradeoffs in the design of large-scale systems are not well understood. I introduce new theoretical and empirical techniques to identify the common elements and isolate the differences of competing systems, and assess the performance of individual components. First, I present a theoretical framework for search space analysis based on semiring parsing, using it to derive some surprising conclusions about phrase-based models and simplify the construction of new models. Next, I describe an empirical study on induction errors, which occur when good translations are absent from model search spaces. The results show that a common pruning heuristic drastically increases induction error, and prove that the high-probability regions of phrase-based and hierarchical model search spaces are nearly identical. Finally, I will outline new efforts to capitalize on these discoveries. This talk represents joint work with Michael Auli, Hieu Hoang, and Philipp Koehn.

If you would like to meet with the speaker, please contact Andreas Eisele.