Prof. Tania Avgustinova
Computational Linguistics

Prof. Roland Marti
Slavic Studies

Prof. Dietrich Klakow
Statistical NLP

Prof. Bernd Möbius
Phonetics

Mutual intelligibility and surprisal in Slavic intercomprehension

SFB 1102 (C4) | Web experiments | Project wiki

The project INCOMSLAV investigates the relation between information density, encoding density and grammaticalisation from a cross-linguistic perspective, focusing on intercomprehension within the Slavic language family. In the initial funding period (2014-2018), the project brought together results from the analysis of parallel corpora and from a variety of experiments with native speakers of Slavic languages, and compared them with insights from comparative historical linguistics on the relationship between the Slavic languages. A statistical language model is used to measure information density in terms of surprisal, and as a tool to gauge how language users cope with the high degrees of surprisal caused by partial incomprehensibility. The key idea is that comprehension of an unknown but related language should be better when the language model adapted for understanding the unknown language exhibits relatively low average surprisal, that is, low density. In the second funding period (2018-2022), the research agenda is extended to spoken language, which allows us to investigate how information density is balanced between the acoustic and the text level in successful intercomprehension. At all levels, from the acoustic signal and its phonetic structure to the texts generated from speech, we develop similarity metrics and information density measures related to Slavic intercomprehension.
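To make the two kinds of measure concrete, here is a minimal Python sketch. It is not the project's own code (the released scripts are listed under Resources). It shows a normalized Levenshtein distance as one possible similarity metric, and the average surprisal (in bits) that a character-bigram language model trained on the reader's native language assigns to a word of the related, unknown language. The bigram model with add-one smoothing, the names `levenshtein`, `normalized_levenshtein` and `CharBigramLM`, and the three-word Czech training list are illustrative assumptions.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Distance divided by the length of the longer word, so it lies in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


class CharBigramLM:
    """Hypothetical character bigram model with add-one (Laplace) smoothing."""

    def __init__(self, words):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.alphabet = {"</w>"}  # every symbol the model can predict
        for w in words:
            chars = ["<w>"] + list(w) + ["</w>"]
            self.alphabet.update(w)
            for prev, nxt in zip(chars, chars[1:]):
                self.bigrams[(prev, nxt)] += 1
                self.contexts[prev] += 1

    def prob(self, prev: str, nxt: str) -> float:
        # +1 in the denominator reserves probability mass for unseen characters.
        v = len(self.alphabet) + 1
        return (self.bigrams[(prev, nxt)] + 1) / (self.contexts[prev] + v)

    def surprisal(self, word: str) -> float:
        """Mean of -log2 P(c_i | c_{i-1}) over the word's characters and end marker."""
        chars = ["<w>"] + list(word) + ["</w>"]
        bits = [-math.log2(self.prob(p, n)) for p, n in zip(chars, chars[1:])]
        return sum(bits) / len(bits)


if __name__ == "__main__":
    czech_lm = CharBigramLM(["voda", "ruka", "mléko"])  # toy training data only
    print(normalized_levenshtein("voda", "woda"))  # 0.25
    print(czech_lm.surprisal("voda"), czech_lm.surprisal("woda"))
```

In this toy setting the Polish form *woda* ('water') receives a higher average surprisal than the Czech *voda*, because the model has never seen the initial *w*. A realistic model would be trained on a corpus rather than three words, and could use character correspondences learned from aligned word pairs instead of plain bigrams.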

PhD research staff (phase 1): Andrea Fischer, Klára Jágrová, Irina Stenger

PhD research staff (phase 2): Yu Tracy Chen, Badr Abdullah, Jacek Kudera

Resources

Release	INCOMSLAV materials	Status	Outdated versions
13.05.2016	Video: e-presentation by Klára Jágrová	public
15.03.2017	Lexical resource: top 100 nouns of BG, CS, PL, RU	request access	2016-09-16; 2017-02-17
07.06.2017	Database for entropy and adaptation surprisal calculations (word pair lists BG-RU and PL-CS)	request access
07.06.2017	Computer code (scripts)	public
23.05.2019	Polish NP stimuli with distance and surprisal values	public
23.05.2019	Highly predictive contexts (PL sentences)	public

Publications

2019

Jágrová, Avgustinova: Intelligibility of highly predictable Polish target words in sentences presented to Czech readers. CICLing 2019. Preprint.

Stenger, Avgustinova, Belousov, Baranov, Erofeeva. 2019. Interaction of linguistic and socio-cognitive factors in receptive multilingualism [Vzaimodejstvie lingvističeskich i sociokognitivnych parametrov pri receptivnom mul’tilingvisme], 25th International Conference on Computational Linguistics and Intellectual Technologies (Dialogue 2019), Proceedings, Moscow, Russia: http://www.dialog-21.ru/digest/2019/online/.

2018

Jágrová: Processing effort of Polish NPs for Czech readers – A+N vs. N+A. In: Guz, Szymanek (eds.): Canonical and Non-Canonical Structures in Polish. Studies in Linguistics and Methodology vol. 12. Wydawnictwo KUL, pp. 123-143. Preprint.

Jágrová, Avgustinova, Stenger, Fischer: Language models, surprisal and fantasy in Slavic intercomprehension, Computer Speech & Language, Available online 12 June 2018, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2018.04.005.

2017

Jágrová, Stenger, Avgustinova: Polski nadal nieskomplikowany? Interkomprehensionsexperimente mit Nominalphrasen [Polish still uncomplicated? Intercomprehension experiments with noun phrases]. In: Federalny Związek Nauczycieli Języka Polskiego (ed.). Polski w Niemczech - Polnisch in Deutschland 5(2017). pp. 20-37

Stenger, Jágrová, Fischer, Avgustinova, Klakow, & Marti. (2017). Modeling the impact of orthographic coding on Czech–Polish and Bulgarian–Russian reading intercomprehension. Nordic Journal of Linguistics, 40(2), 175-199. doi:10.1017/S0332586517000130

Jágrová, Stenger, Marti, Avgustinova. (2017). Lexical and Orthographic Distances between Czech, Polish, Russian, and Bulgarian - a Comparative Analysis of the Most Frequent Nouns. In: Language Use and Linguistic Structure. Olomouc Modern Language Series, Palacký University Olomouc. pp. 401-416 (online)

Stenger, Avgustinova, Marti. (2017). Levenshtein distance and word adaptation surprisal as methods of measuring mutual intelligibility in reading comprehension of Slavic languages. Computational Linguistics and Intellectual Technologies: International Conference "Dialogue 2017" Proceedings. Issue 16 (23), vol. 1, 304–317. (online)

2016

Jágrová, Stenger, Avgustinova, Marti: Polski to język nieskomplikowany? Theoretische und praktische Interkomprehension der 100 häufigsten polnischen Substantive [Is Polish an uncomplicated language? Theoretical and practical intercomprehension of the 100 most frequent Polish nouns]. In: Federalny Związek Nauczycieli Języka Polskiego (ed.). Polski w Niemczech - Polnisch in Deutschland 4(2016). pp. 5-19

Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2016). Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility. In: Calzolari, Choukri, Declerck, Goggi, Grobelnik, Maegaard, Mariani, Mazo, Moreno, Odijk, Piperidis (eds.) Language Resources and Evaluation Conference LREC 2016, pp. 4202-4209, Portorož (Slovenia). Includes linguistic resources.

Stenger. (2016). How Reading Intercomprehension Works among Slavic Languages with Cyrillic Script. In: Köllner, Ziai (eds.): Proceedings of the ESSLLI 2016 Student Session, pp. 30-42

2015

Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2015). An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets. In: Sharp, Lubaszewski, Delmonte (eds.) Natural Language Processing and Cognitive Science 2015 Proceedings. Ca Foscarina Editrice, Venezia.

Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2015). Orthography in Language Modelling of Mutual Intelligibility. REMU International Conference on Receptive Multilingualism, University of Eastern Finland. (poster)

Avgustinova, Fischer, Jágrová, Klakow, Marti, Stenger. (2015). The Empirical Basis of Slavic Intercomprehension. REMU International Conference on Receptive Multilingualism, University of Eastern Finland. (slides)

2014

Klakow, Avgustinova, Stenger, Fischer, Jágrová: The INCOMSLAV project. Seminar in formal linguistics at Charles University, Prague. November 24, 2014. Video recording, abstract & presentation: http://lectures.ms.mff.cuni.cz/view.php?rec=238