Project Seminar: Current Topics in Machine Translation

Summer semester, 2010

The goal of this seminar is to familiarize students with current hot topics in Machine Translation research. Students will form groups and choose one of the presented projects to work on. We also encourage students with little to no programming experience to participate, as we would like the groups to be balanced, e.g. a programmer and a linguist can work together to achieve a goal. This will also help you figure out how to work with people from different backgrounds and with different expectations.
This document contains brief descriptions of the potential projects. Empty circle bullets denote optional requirements. It is also possible to propose your own project. Please email your two choices of projects to Dr. Andreas Eisele (eisele at dfki.de) by May 3, 2010.
50% of the grade will be based on the project report (incl. source code), 30% on the presentation and 20% on the system's performance. Grading will consider the clarity of the report and the presentation, as well as the creativity and ambition shown in the execution of the project. Incorporating novel ideas is highly encouraged.

Schedule

#  Date        Topic
1  21.04.2010  General Introduction
2  03.05.2010  Selection of projects
3  ?           Progress report
4  ?           Progress report
5  ?           Progress report
6  23.07.2010  Final session

A: Improving word alignment with PoS annotations

Supervisor: Andreas
Miriam, Dominikus
GIZA++ is a freely available word alignment system that includes training programs for the IBM models, the HMM model, and Model 6. The goal of this project is to improve word alignment quality by incorporating PoS information into the alignment process. This may involve the following steps:
Prerequisites:
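
One standard way to quantify word alignment quality is the alignment error rate (AER), computed against gold links marked as sure (S) or possible (P): AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of links produced by the system. The following is a minimal sketch in Python, assuming alignments are represented as sets of (source index, target index) pairs. The function name and the toy links are illustrative assumptions, not part of GIZA++ and not real system output.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Alignment error rate (AER) over sets of (source_index, target_index) links."""
    hypothesis = set(hypothesis)
    sure = set(sure)
    # By convention, every sure link also counts as a possible link.
    possible = set(possible) | sure
    if not hypothesis and not sure:
        return 0.0
    matched = len(hypothesis & sure) + len(hypothesis & possible)
    return 1.0 - matched / (len(hypothesis) + len(sure))


# Toy example with made-up links (hypothetical, for illustration only).
gold_sure = {(0, 0), (1, 2)}
gold_possible = {(2, 1)}
baseline = {(0, 0), (1, 1), (2, 1)}   # e.g. alignment without PoS information
with_pos = {(0, 0), (1, 2), (2, 1)}   # e.g. alignment with PoS information

print(alignment_error_rate(baseline, gold_sure, gold_possible))
print(alignment_error_rate(with_pos, gold_sure, gold_possible))
```

A lower value means better agreement with the gold alignment, so a PoS-aware variant could be compared against the baseline on the same hand-aligned sentences.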

B: Structure-aware translation models

Supervisor: Yu
Anne, Ruth, Olga
The aim of this project is to develop a new rule extraction method that induces linguistically motivated models from a parallel corpus with existing dependency parses for at least one side. This includes:
Prerequisites: Knowledge of Python/Java is a plus.
[Lopez] Adam Lopez. 2008. Machine Translation by Pattern Matching. Ph.D. thesis, University of Maryland, College Park, MD, USA.

C: Dependency parsing for a more robust RBMT engine

Supervisor: Sabine
Andreas, Maria, Patricia
Many translation errors produced by a rule-based MT engine are due to the deterministic mechanisms built into the system. This project investigates the first phase, parsing analysis, of the RBMT system "Lucy" and seeks solutions to alleviate these problems. It includes the following parts:
Prerequisites:

D: Inducing parallel lexicons from non-parallel corpora

Supervisor: Jia
Casey, Ceslav, Lilian
The aim of this project is to acquire bilingual information for language pairs for which parallel corpora of sufficient size are not available. It starts with lexicon acquisition and can be extended to much larger units such as sentence pairs. This may include:
Prerequisites:
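
One family of approaches to lexicon acquisition from non-parallel text compares the contexts in which words occur: a source word's co-occurrence vector is mapped into the target vocabulary through a small seed lexicon and then matched against target-side context vectors. The sketch below illustrates this idea in Python under several assumptions: the function names, the window size, the seed lexicon and the toy sentences are all hypothetical, and this is not necessarily the method intended for the project.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count, for every word, the words occurring within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def project(vector, seed):
    """Map a source-language context vector into the target vocabulary via the seed lexicon."""
    projected = Counter()
    for word, count in vector.items():
        if word in seed:
            projected[seed[word]] += count
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_translation(word, src_vecs, tgt_vecs, seed):
    """Return the target word (outside the seed lexicon) with the most similar context."""
    projected = project(src_vecs[word], seed)
    known = set(seed.values())
    candidates = [t for t in tgt_vecs if t not in known]
    return max(candidates, key=lambda t: cosine(projected, tgt_vecs[t]), default=None)


# Toy, non-parallel corpora and a tiny seed lexicon (all made up for illustration).
german = [["der", "hund", "bellt"], ["der", "hund", "rennt"], ["eine", "katze", "rennt"]]
english = [["the", "dog", "barks", "loudly"], ["the", "dog", "runs"], ["a", "cat", "runs"]]
seed = {"der": "the", "bellt": "barks", "rennt": "runs", "eine": "a"}

src_vecs, tgt_vecs = context_vectors(german), context_vectors(english)
print(best_translation("hund", src_vecs, tgt_vecs, seed))
print(best_translation("katze", src_vecs, tgt_vecs, seed))
```

With real corpora, raw counts would typically be replaced by an association measure, and the seed lexicon would need to be far larger than in this toy setting.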

E: Morphological classification for new languages via parallel corpora

This project takes the opposite angle to project D: it uses bilingual data to induce linguistic information on only one side. Only morphology is considered in this project. One possible approach is as follows:
Prerequisites:

F: Post-editing interface for machine translations

This project involves building a web-based interface that allows users to acquire and post-edit machine translations. The interface should not only facilitate the post-editing procedure, but also help to collect evidence for more reliable evaluation of MT output. Members of this project need to do the following:
Prerequisites:
[Koehn] Philipp Koehn and Barry Haddow. 2009. Interactive Assistance to Human Translators using Statistical Machine Translation Methods. MT Summit XII.
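
One kind of evidence a post-editing interface could collect is how much the user had to change each MT output. A simple proxy is the word-level edit distance between the MT output and its post-edited version, normalized by the length of the post-edited text. This is similar in spirit to TER-style metrics but omits the block-shift operation. The Python sketch below assumes whitespace tokenization; the function names and the example sentences are illustrative, not a prescribed design.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        current = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            current.append(min(previous[j] + 1,          # delete a word from the MT output
                               current[j - 1] + 1,       # insert a word from the post-edit
                               previous[j - 1] + cost))  # keep or substitute a word
        previous = current
    return previous[-1]


def post_edit_rate(mt_output, post_edited):
    """Edits per post-edited word; 0.0 means the MT output was left unchanged."""
    hyp, ref = mt_output.split(), post_edited.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(hyp, ref) / len(ref)


# Made-up example pair (not taken from any real system).
mt = "this project requires to build an interface"
pe = "this project requires building an interface"
print(edit_distance(mt.split(), pe.split()))
print(post_edit_rate(mt, pe))
```

Logging this value per segment, together with editing time, would give a basis for comparing MT outputs by the effort needed to correct them.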




Created on 21 Apr 2010.