Computational Linguistics & Phonetics, Fachrichtung 4.7 Allgemeine Linguistik, Universität des Saarlandes

Le Monde Diplomatique-Die Tageszeitung Parallel Corpus

The LMD-TAZ French-German Parallel Corpus consists of articles taken from the CD-ROM archive of the French monthly newspaper Le Monde Diplomatique (2001) and their German translations, published by taz, die tageszeitung. The corpus comprises 136 articles in each language. It contains almost 243,000 word tokens for French and a little over 224,000 for German, or a little more than 9,200 sentences in each language. It has been manually aligned and part-of-speech tagged with the TreeTagger. If you want to know more about the corpus, have a look at this short description of how it was built, which also contains some example sentences.

The corpus is provided free of charge for non-commercial research and educational purposes. The exact conditions are listed in the following license. If you are interested, please fill it out and send it back to the address at the top of the agreement. We will then send you a link to download the corpus (approximately 2 MB). For questions, please write to Garance PARIS.