Prof. Tania Avgustinova
Computational Linguistics

Prof. Roland Marti
Slavic Studies

Prof. Dietrich Klakow
Statistical NLP


Mutual intelligibility and surprisal in Slavic intercomprehension

SFB 1102 (C4)

The INCOMSLAV project investigates how linguistic categories are encoded differently across languages (here: the Slavic languages), with a focus on density. In particular, it examines the relation between grammaticalisation, encoding density and information density. As a relevant application, it explores intercomprehension within the Slavic language family. The project combines results from the analysis of parallel corpora and from a variety of experiments with native speakers of Slavic languages, and compares them with insights from comparative historical linguistics on the relationship between Slavic languages. A statistical language model of surprisal is used to measure information density and to gauge how language users cope with the high degrees of surprisal that arise from partial incomprehensibility. The key idea is that comprehension of an unknown but related language should be better when the language model adapted for understanding the unknown language exhibits relatively low average surprisal, or density.
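To make the notion of average surprisal concrete, the following is a minimal sketch and not the project's actual model. It assumes word pairs that are already aligned character by character (equal length), estimates P(target character | source character) from relative frequencies, and reports surprisal in bits as -log2 P. The function names and the toy pairs are hypothetical placeholders built from abstract letters, not real Slavic data.

```python
import math
from collections import Counter, defaultdict


def train_adaptation_model(pairs):
    """Count how often each source character corresponds to each target character."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        if len(source) != len(target):
            continue  # simplifying assumption: use only position-aligned pairs
        for s, t in zip(source, target):
            counts[s][t] += 1
    return counts


def surprisal(counts, s, t):
    """Surprisal in bits, -log2 P(t | s), with P estimated by relative frequency."""
    total = sum(counts[s].values())
    if total == 0 or counts[s][t] == 0:
        return float("inf")  # unseen correspondence: maximally surprising
    return -math.log2(counts[s][t] / total)


def mean_word_surprisal(counts, source, target):
    """Average per-character surprisal for one aligned word pair."""
    values = [surprisal(counts, s, t) for s, t in zip(source, target)]
    return sum(values) / len(values)


# Hypothetical toy data: abstract letters standing in for aligned cognates.
toy_pairs = [("ab", "ab"), ("ac", "ad"), ("cb", "db"), ("cc", "dc")]
model = train_adaptation_model(toy_pairs)

print(mean_word_surprisal(model, "ab", "ab"))  # fully regular correspondences
print(mean_word_surprisal(model, "ac", "ac"))  # contains a rarer correspondence
```

Under this sketch, a word pair made up of frequent, regular correspondences gets a low mean surprisal, which matches the expectation stated above that lower average surprisal should go along with better comprehension.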

PhD research staff: Andrea Fischer, Klára Jágrová, Irina Stenger


INCOMSLAV materials
Lexical resource: top 100 nouns of BG, CS, PL, RU
2016-09-16; 2017-02-17
Database for entropy and adaptation surprisal calculations (word pair lists BG-RU and PL-CS)
Computer code (scripts)



Jágrová, Stenger, Marti, Avgustinova. (2017). Lexical and Orthographic Distances between Czech, Polish, Russian, and Bulgarian - a Comparative Analysis of the Most Frequent Nouns. In: Language Use and Linguistic Structure. Olomouc Modern Language Series, Palacký University Olomouc. pp. 401-416 (online)

Stenger, Avgustinova, Marti. (2017). Levenshtein distance and word adaptation surprisal as methods of measuring mutual intelligibility in reading comprehension of Slavic languages. Computational Linguistics and Intellectual Technologies: International Conference "Dialogue 2017" Proceedings. Issue 16 (23), vol. 1, pp. 304-317 (online)


Jágrová, Stenger, Avgustinova, Marti. (2016). Polski to język nieskomplikowany? Theoretische und praktische Interkomprehension der 100 häufigsten polnischen Substantive. In: Federalny Związek Nauczycieli Języka Polskiego (ed.). Polski w Niemczech - Polnisch in Deutschland 4 (2016). pp. 5-19

Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2016). Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility. In: Calzolari, Choukri, Declerck, Goggi, Grobelnik, Maegaard, Mariani, Mazo, Moreno, Odijk, Piperidis (eds.) Language Resources and Evaluation Conference LREC 2016, pp. 4202-4209, included linguistic resources, Portorož (Slovenia)

Stenger. (2016). How Reading Intercomprehension Works among Slavic Languages with Cyrillic Script. In: Köllner, Ziai (eds.) Proceedings of the ESSLLI 2016 Student Session, pp. 30-42


Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2015). An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets. In: Sharp, Lubaszewski, Delmonte (eds.) Natural Language Processing and Cognitive Science 2015 Proceedings. Ca Foscarina Editrice, Venezia.

Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2015). Orthography in Language Modelling of Mutual Intelligibility. REMU International Conference on Receptive Multilingualism, University of Eastern Finland. (poster)

Avgustinova, Fischer, Jágrová, Klakow, Marti, Stenger. (2015). The Empirical Basis of Slavic Intercomprehension. REMU International Conference on Receptive Multilingualism, University of Eastern Finland. (slides)


Klakow, Avgustinova, Stenger, Fischer, Jágrová: The INCOMSLAV project. Seminar in formal linguistics at Charles University, Prague. November 24, 2014. Video recording, abstract & presentation: