Prof. Tania Avgustinova
Computational Linguistics

Prof. Roland Marti
Slavic Studies

Prof. Dietrich Klakow
Statistical NLP

Prof. Bernd Möbius


Mutual intelligibility and surprisal in Slavic intercomprehension

SFB 1102 (C4) | Web experiments | Project wiki

The project INCOMSLAV investigates the relation between information density, encoding density and grammaticalisation in a cross-linguistic perspective, focusing on intercomprehension within the family of Slavic languages. In the initial funding period (2014-2018), the project brings together results from the analysis of parallel corpora and from a variety of experiments with native speakers of Slavic languages, and compares them with insights from comparative historical linguistics on the relationship between Slavic languages. A statistical language model is used to estimate surprisal as a measure of information density and to gauge how language users cope with the high surprisal that results from partial incomprehensibility. The key idea is that comprehension of an unknown but related language should be better when the language model adapted for understanding the unknown language exhibits relatively low average surprisal, that is, low information density. In the second funding period (2018-2022), the research agenda is extended to spoken language, which allows us to investigate how information density is balanced between the acoustic and the text level in successful intercomprehension. At all levels, from the acoustic signal and its phonetic structure to the texts generated from speech, we develop similarity metrics and information density measures related to Slavic intercomprehension.
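The notion of average surprisal described above can be illustrated with a minimal sketch. It is not the project's model: it assumes a toy set of invented, equal-length, character-aligned word pairs, estimates P(native character | foreign character) from simple counts, and defines surprisal as -log2 of that probability. The function names (`char_correspondence_probs`, `mean_word_surprisal`) and the data are hypothetical and serve illustration only.

```python
import math
from collections import Counter, defaultdict


def char_correspondence_probs(pairs):
    """Estimate P(native_char | foreign_char) from aligned word pairs.

    Assumption: both words of a pair have equal length and align
    position by position (real data would need a proper alignment).
    """
    counts = defaultdict(Counter)
    for foreign, native in pairs:
        if len(foreign) != len(native):
            raise ValueError("toy alignment assumes equal-length words")
        for f_char, n_char in zip(foreign, native):
            counts[f_char][n_char] += 1
    return {
        f_char: {n: c / sum(ctr.values()) for n, c in ctr.items()}
        for f_char, ctr in counts.items()
    }


def surprisal(probability):
    """Surprisal in bits: rare correspondences carry more information."""
    return -math.log2(probability)


def mean_word_surprisal(foreign, native, probs):
    """Average per-character surprisal of mapping `foreign` onto `native`."""
    values = [surprisal(probs[f][n]) for f, n in zip(foreign, native)]
    return sum(values) / len(values)


# Invented toy pairs (foreign form, native form); not real language data.
toy_pairs = [("ab", "ab"), ("ab", "ac"), ("bb", "bb"), ("ba", "ba")]
probs = char_correspondence_probs(toy_pairs)

regular = mean_word_surprisal("ab", "ab", probs)    # frequent correspondence
irregular = mean_word_surprisal("ab", "ac", probs)  # rare correspondence
print(f"regular: {regular:.3f} bits, irregular: {irregular:.3f} bits")
```

Under these assumptions, the pair with the frequent correspondence receives the lower average surprisal, which is the direction the hypothesis above associates with better comprehension.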

PhD research staff (phase 1): Andrea Fischer, Klára Jágrová, Irina Stenger

PhD research staff (phase 2): Yu Tracy Chen, Badr Abdullah, Jacek Kudera


Release | INCOMSLAV materials | Status | Outdated
13.05.2016 | Video: e-presentation by Klára Jágrová | public |
15.03.2017 | Lexical resource: top 100 nouns of BG, CS, PL, RU | request access | 2016-09-16; 2017-02-17
07.06.2017 | Database for entropy and adaptation surprisal calculations (word pair lists BG-RU and PL-CS) | request access |
07.06.2017 | Computer code (scripts) | public |
Polish NP stimuli with distance and surprisal values

Highly predictive contexts (PL sentences)



Jágrová, Avgustinova: Intelligibility of highly predictable Polish target words in sentences presented to Czech readers. CICLing 2019. Preprint.

Stenger, Avgustinova, Belousov, Baranov, Erofeeva. 2019. Interaction of linguistic and socio-cognitive factors in receptive multilingualism [Vzaimodejstvie lingvističeskich i sociokognitivnych parametrov pri receptivnom mul’tilingvisme], 25th International Conference on Computational Linguistics and Intellectual Technologies (Dialogue 2019), Proceedings, Moscow, Russia.


Jágrová: Processing effort of Polish NPs for Czech readers – A+N vs. N+A. In: Guz, Szymanek (eds.): Canonical and Non-Canonical Structures in Polish. Studies in Linguistics and Methodology vol. 12. Wydawnictwo KUL, pp. 123-143. Preprint

Jágrová, Avgustinova, Stenger, Fischer: Language models, surprisal and fantasy in Slavic intercomprehension, Computer Speech & Language, Available online 12 June 2018, ISSN 0885-2308.


Jágrová, Stenger, Avgustinova: Polski nadal nieskomplikowany? Interkomprehensionsexperimente mit Nominalphrasen. In: Federalny Związek Nauczycieli Języka Polskiego (ed.). Polski w Niemczech - Polnisch in Deutschland 5(2017). pp. 20-37

Stenger, Jágrová, Fischer, Avgustinova, Klakow, & Marti. (2017). Modeling the impact of orthographic coding on Czech–Polish and Bulgarian–Russian reading intercomprehension. Nordic Journal of Linguistics, 40(2), 175-199. doi:10.1017/S0332586517000130

Jágrová, Stenger, Marti, Avgustinova. (2017). Lexical and Orthographic Distances between Czech, Polish, Russian, and Bulgarian - a Comparative Analysis of the Most Frequent Nouns. In: Language Use and Linguistic Structure. Olomouc Modern Language Series, Palacký University Olomouc. pp. 401-416 (online)

Stenger, Avgustinova, Marti. (2017). Levenshtein distance and word adaptation surprisal as methods of measuring mutual intelligibility in reading comprehension of Slavic languages. Computational Linguistics and Intellectual Technologies: International Conference "Dialogue 2017" Proceedings. Issue 16 (23), vol. 1, 304–317. (online)


Jágrová, Stenger, Avgustinova, Marti: Polski to język nieskomplikowany? Theoretische und praktische Interkomprehension der 100 häufigsten polnischen Substantive. In: Federalny Związek Nauczycieli Języka Polskiego (ed.). Polski w Niemczech - Polnisch in Deutschland 4(2016). pp. 5-19

Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2016). Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility. In: Calzolari, Choukri, Declerck, Goggi, Grobelnik, Maegaard, Mariani, Mazo, Moreno, Odijk, Piperidis (eds.) Language Resources and Evaluation Conference LREC 2016, pp. 4202-4209, included linguistic resources, Portorož (Slovenia)

Stenger. (2016). How Reading Intercomprehension Works among Slavic Languages with Cyrillic Script. In: Köllner, Ziai (eds.): Proceedings of the ESSLLI 2016 Student Session, pp. 30-42


Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2015). An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets. In: Sharp, Lubaszewski, Delmonte (eds.) Natural Language Processing and Cognitive Science 2015 Proceedings. Ca Foscarina Editrice, Venezia.

Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti (2015) Orthography in Language Modelling of Mutual Intelligibility. REMU International Conference on Receptive Multilingualism, University of Eastern Finland. (poster)

Avgustinova, Fischer, Jágrová, Klakow, Marti, Stenger (2015) The Empirical Basis of Slavic Intercomprehension. REMU International Conference on Receptive Multilingualism, University of Eastern Finland. (slides)


Klakow, Avgustinova, Stenger, Fischer, Jágrová: The INCOMSLAV project. Seminar in formal linguistics at Charles University, Prague. November 24, 2014. Video recording, abstract & presentation.