International Post-Graduate College
Language Technology & Cognitive Systems
Saarland University
University of Edinburgh

Hybrid Data-Driven Models of Machine Translation

Speaker: Andy Way

Institution:

National Centre for Language Technology
School of Computing
Dublin City University

Abstract:

This talk presents our work on combining sub-sentential alignments from phrase-based statistical machine translation (SMT) and example-based MT (EBMT) systems to create novel hybrid data-driven systems capable of outperforming the baseline SMT and EBMT systems from which they were derived.
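The abstract does not say how the two systems' sub-sentential alignments are combined. As a minimal sketch, assuming each system's alignments are available as counts of (source, target) phrase pairs, one simple option is to pool the counts and re-estimate translation probabilities by relative frequency. `merge_alignments` and the toy counts are hypothetical, and the hybrid systems described in this talk may weight or score the merged entries differently.

```python
from collections import Counter
from typing import Dict, Tuple

# Assumption: a phrase pair is (source phrase, target phrase), and each
# system's alignments are a Counter mapping phrase pairs to extraction counts.
PhrasePair = Tuple[str, str]


def merge_alignments(smt: Counter, ebmt: Counter) -> Dict[PhrasePair, float]:
    """Hypothetical merge: pool both systems' phrase-pair counts and estimate
    p(target | source) by relative frequency over the pooled counts."""
    pooled = smt + ebmt  # Counter addition sums the counts of shared pairs
    source_totals: Counter = Counter()
    for (src, _tgt), count in pooled.items():
        source_totals[src] += count
    return {
        (src, tgt): count / source_totals[src]
        for (src, tgt), count in pooled.items()
    }


# Toy counts invented for illustration (not Europarl data).
smt_counts = Counter({("la maison", "the house"): 3, ("la maison", "the home"): 1})
ebmt_counts = Counter({("la maison", "the house"): 1, ("dans la maison", "in the house"): 2})
table = merge_alignments(smt_counts, ebmt_counts)
print(table[("la maison", "the house")])  # 0.8
```

Pooling raw counts lets a pair proposed by both systems gain probability mass, while pairs proposed by only one system still enter the merged table.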

In previous work, we demonstrated that while an EBMT system is capable of outperforming a phrase-based SMT system constructed from freely available resources, a hybrid 'example-based' SMT system incorporating marker chunks and SMT sub-sentential alignments is capable of outperforming both baseline translation models (for French--English translation, at least).
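Marker chunks are sub-sentential segments obtained by splitting sentences at closed-class "marker" words (determiners, prepositions, conjunctions, pronouns and the like), following the Marker Hypothesis. The following is a minimal illustrative sketch, assuming a tiny hand-picked English marker lexicon (`MARKERS` is hypothetical and far smaller than a real one) and the rule that a new chunk starts at a marker word only once the current chunk contains at least one non-marker word. `marker_chunks` is an illustrative name, not the system's actual code.

```python
from typing import List, Optional, Tuple

# Hypothetical, deliberately tiny marker lexicon: word -> marker tag.
MARKERS = {
    "the": "<DET>", "a": "<DET>", "an": "<DET>", "this": "<DET>",
    "in": "<PREP>", "on": "<PREP>", "of": "<PREP>", "to": "<PREP>",
    "for": "<PREP>", "with": "<PREP>",
    "and": "<CONJ>", "or": "<CONJ>", "but": "<CONJ>",
    "i": "<PRON>", "you": "<PRON>", "he": "<PRON>", "she": "<PRON>",
    "it": "<PRON>", "we": "<PRON>", "they": "<PRON>",
}


def marker_chunks(sentence: str) -> List[Tuple[Optional[str], str]]:
    """Start a new chunk at each marker word, provided the current chunk
    already contains a non-marker word. Each chunk is labelled with the tag
    of its initial marker (None if it does not begin with a marker)."""
    chunks: List[Tuple[Optional[str], str]] = []
    current: List[str] = []
    tag: Optional[str] = None
    has_content = False
    for word in sentence.split():
        marker = MARKERS.get(word.lower())
        if marker is not None and has_content:
            chunks.append((tag, " ".join(current)))
            current, tag, has_content = [], None, False
        if not current:
            tag = marker
        current.append(word)
        if marker is None:
            has_content = True
    if current:
        chunks.append((tag, " ".join(current)))
    return chunks


print(marker_chunks("the man went to the shop with his friend"))
```

Here "to the shop" stays one chunk because the second marker ("the") arrives before any non-marker word, so no boundary is placed there.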

Unlike that previous research, here we use the Europarl training and test sets. On these data sets, while all hybrid 'statistical' EBMT variants still fall short of the quality achieved by the baseline phrase-based SMT system, we show that the hybrid 'example-based' SMT system created by adding the marker chunks outperforms the two baseline systems from which it is derived.

Furthermore, we provide additional evidence in favour of hybrid systems by adding an SMT target language model to the EBMT system, and demonstrate that this too has a positive effect on translation quality. We also show that many of the sub-sentential alignments derived from the Europarl corpus are created by either the phrase-based SMT (PBSMT) system or the EBMT system, but not by both.
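The claim that many alignments come from one system but not the other amounts to a set comparison. A minimal sketch, assuming each system's alignments are reduced to a set of (source, target) phrase pairs; `alignment_overlap` and the toy pairs are hypothetical and are not the talk's data or figures.

```python
from typing import Dict, Set, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def alignment_overlap(pbsmt: Set[PhrasePair], ebmt: Set[PhrasePair]) -> Dict[str, float]:
    """Count pairs found by both systems or by only one, plus the share of all
    distinct pairs that only one system produced."""
    both = len(pbsmt & ebmt)
    pbsmt_only = len(pbsmt - ebmt)
    ebmt_only = len(ebmt - pbsmt)
    union = len(pbsmt | ebmt)
    return {
        "both": both,
        "pbsmt_only": pbsmt_only,
        "ebmt_only": ebmt_only,
        "exclusive_share": (pbsmt_only + ebmt_only) / union if union else 0.0,
    }


# Toy pairs invented for illustration.
pbsmt_pairs = {("la maison", "the house"), ("la maison", "the home")}
ebmt_pairs = {("la maison", "the house"), ("dans la maison", "in the house")}
report = alignment_overlap(pbsmt_pairs, ebmt_pairs)
```

A high `exclusive_share` is what would make pooling the two alignment sets worthwhile: each system contributes pairs the other never proposes.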

In sum, therefore, despite the obvious convergence of the two paradigms, the crucial differences between SMT and EBMT contribute positively to the overall translation quality. The central thesis of this talk, then, is that any researcher who continues to develop an MT system using either of these approaches will benefit from integrating the advantages of the other model; dogged adherence to one approach to the exclusion of others will lead to inferior systems.


Last modified: Thu, Jul 13, 2006 11:39:40 by