INCOMSLAV

Prof. Tania Avgustinova
Computational Linguistics

Prof. Roland Marti
Slavic Studies

Prof. Dietrich Klakow
Statistical NLP

Prof. Bernd Möbius
Phonetics


Department 4.7

Mutual intelligibility and surprisal in Slavic intercomprehension

SFB 1102 (C4) | Web experiments  | Project wiki

The INCOMSLAV project investigates the relation between information density, encoding density and grammaticalisation in a cross-linguistic perspective, focusing on intercomprehension within the family of Slavic languages. In the initial funding period (2014-2018), the project brings together results from the analysis of parallel corpora and from a variety of experiments with native speakers of Slavic languages, and compares them with insights of comparative historical linguistics on the relationship between Slavic languages. A statistical language model of surprisal is used to measure information density and as a tool to gauge how language users master high degrees of surprisal due to partial incomprehensibility. The key idea here is that comprehension of an unknown but related language should be better when the language model adapted for understanding the unknown language exhibits relatively low average surprisal, or density. In the second funding period (2018-2022), the research agenda is extended to spoken language, which allows us to investigate how information density is balanced between the acoustic and the text level in successful intercomprehension. At all levels, from the acoustic signal and its phonetic structure to the texts generated from speech, we develop similarity metrics and information density measures related to Slavic intercomprehension.
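The notion of average surprisal used above can be illustrated with a minimal sketch. It assumes a unigram model with add-one smoothing over abstract toy tokens; this is not the project's actual language model or data, and the names `train_unigram`, `surprisal`, `average_surprisal` and `UNK` are hypothetical. Surprisal of a token is taken as the negative base-2 logarithm of its model probability, and the average is computed per token.

```python
import math
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder shared by all tokens unseen in training


def train_unigram(tokens):
    """Estimate add-one smoothed unigram probabilities from a token list."""
    counts = Counter(tokens)
    vocab_size = len(counts) + 1  # seen types plus one shared unknown type
    total = sum(counts.values()) + vocab_size
    model = {tok: (n + 1) / total for tok, n in counts.items()}
    model[UNK] = 1 / total  # probability mass reserved for unseen tokens
    return model


def surprisal(model, token):
    """Surprisal in bits: -log2 P(token); unseen tokens fall back to UNK."""
    return -math.log2(model.get(token, model[UNK]))


def average_surprisal(model, tokens):
    """Mean per-token surprisal of a sequence under the model."""
    return sum(surprisal(model, t) for t in tokens) / len(tokens)


# Toy illustration with abstract tokens (assumed data, not project material).
model = train_unigram(["a", "a", "b"])
familiar = ["a", "a"]    # only tokens the model has seen often
unfamiliar = ["a", "z"]  # contains a token the model has never seen
print(average_surprisal(model, familiar))
print(average_surprisal(model, unfamiliar))
```

Under this toy model the sequence containing an unseen token receives the higher average surprisal, which mirrors the hypothesis that lower average surprisal should go along with better comprehension.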

In the first two phases of the CRC, the empirical focus of C4 was on the mutual intelligibility of visual (written) or auditory (spoken) input for speakers of closely related languages in the Slavic language family. Experimental and modelling work in the second phase, which has combined methods from language, speech and translation technology, has provided a wealth of findings highlighting how information density is distributed across the acoustic and the text channels in successful intercomprehension. Based on these results, we are now in a position to address, in the third phase (2022-2026), core properties of intercomprehension as they unfold in goal-oriented communication, characterized by cooperative behaviour and adaptive interaction. This overarching goal entails the investigation of linguistic structures beyond lexical similarity and word sequence based predictability, taking into account constructional similarity, the crosslingual transparency of multi-component units, and prosody. Specifically, conversational dialogue-style experimental setups are employed in order to explore the (ex)change of information as the interaction unfolds. We will develop models of surprisal capturing the information conveyed by multi-component units and prosodic features, in particular intonation. Finally, C4 will validate the scalability of our results and models in terms of a transfer to a selected set of features of other language families, e.g. Semitic.

PhD research staff (phase 1): Andrea Fischer, Klára Jágrová, Irina Stenger

PhD research staff (phase 2): Yu Tracy Chen, Badr Abdullah, Jacek Kudera

PhD research staff (phase 3): Iuliia Zaitova, Badr Abdullah (Postdoc)

Release | INCOMSLAV materials | Status | Outdated
24.11.2014 | The INCOMSLAV project. Seminar in formal linguistics at Charles University, Prague. Video recording, abstract & presentation: http://lectures.ms.mff.cuni.cz/view.php?rec=238 | public |
28.05.2015 | Avgustinova, Fischer, Jágrová, Klakow, Marti, Stenger: The Empirical Basis of Slavic Intercomprehension. (slides) | public |
29.05.2015 | Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti: Orthography in Language Modelling of Mutual Intelligibility. (poster) | public |
13.05.2016 | Video: e-presentation by Klára Jágrová | public |
15.03.2017 | Lexical resource: top 100 nouns of BG, CS, PL, RU | request access | 2016-09-16; 2017-02-17
07.06.2017 | Database for entropy and adaptation surprisal calculations (word pair lists BG-RU and PL-CS) | request access |
07.06.2017 | Computer code (scripts) | public |
23.05.2019 | Polish NP stimuli with distance and surprisal values | public |
23.05.2019 | Highly predictive contexts (PL sentences) | public |

Recent publications & Resources





2020

Stenger, Jágrová, Avgustinova. 2020. The INCOMSLAV Platform: Experimental Website with Integrated Methods for Measuring Linguistic Distances and Asymmetries in Receptive Multilingualism. In J.Fiumara, C.Cieri, M.Liberman, C.Callison-Burch (eds.), LREC 2020 Workshop Language Resources and Evaluation Conference 11-16 May 2020, Citizen Linguistics in Language Resource Development (CLLRD 2020), Proceedings, pp. 40–48

Stenger, Jágrová, Fischer, Avgustinova (2020): “Reading Polish with Czech Eyes” or “How Russian Can a Bulgarian Text Be?”: Orthographic Differences as an Experimental Variable in Slavic Intercomprehension. In T.Radeva-Bork and P.Kosta (eds.), Current developments in Slavic Linguistics. Twenty years after (based on selected papers from FDSL 11). Peter Lang, 483-500 (preprint, link to publication)

2019

Avgustinova (2019) Gegenseitige Verstehbarkeit und Surprisal in Slavischer Interkomprehension: empirische Basis und linguistische Modellierung. Invited lecture at University of Hamburg

Jágrová, Stenger, Avgustinova (2019) Slavic Intercomprehension Matrix. 13. Deutscher Slavistentag, Internationaler Kongress der deutschsprachigen Slavistik. Sektion: Didaktik der slavischen Sprachen und Kulturen

Avgustinova, Iomdin (2019) Towards a Typology of Microsyntactic Constructions. In: G.Corpas-Pastor, R.Mitkov (Eds.) Computational and Corpus-Based Phraseology. Springer, Cham: 15-30

Mosbach, Stenger, Avgustinova, Klakow. (2019): incom.py - A Toolbox for Calculating Linguistic Distances and Asymmetries between Related Languages. In: Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova (eds.), Proceedings of Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria, 2-4 September 2019, pages 811-819

Jágrová, Avgustinova: Intelligibility of highly predictable Polish target words in sentences presented to Czech readers. CICLing 2019. Preprint.

Stenger, Avgustinova, Belousov, Baranov, Erofeeva. 2019. Interaction of linguistic and socio-cognitive factors in receptive multilingualism [Vzaimodejstvie lingvističeskich i sociokognitivnych parametrov pri receptivnom mul’tilingvisme], 25th International Conference on Computational Linguistics and Intellectual Technologies (Dialogue 2019), Proceedings, Moscow, Russia: http://www.dialog-21.ru/digest/2019/online/.

2018

Jágrová: Processing effort of Polish NPs for Czech readers – A+N vs. N+A. In: Guz, Szymanek (eds.): Canonical and Non-Canonical Structures in Polish. Studies in Linguistics and Methodology vol. 12. Wydawnictwo KUL, pp. 123-143. Preprint

Jágrová, Avgustinova, Stenger, Fischer: Language models, surprisal and fantasy in Slavic intercomprehension, Computer Speech & Language, Available online 12 June 2018, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2018.04.005.

2017

Jágrová, Stenger, Avgustinova: Polski nadal nieskomplikowany? Interkomprehensionsexperimente mit Nominalphrasen. In: Federalny Związek Nauczycieli Języka Polskiego (ed.). Polski w Niemczech - Polnisch in Deutschland 5(2017). pp. 20-37

Stenger, Jágrová, Fischer, Avgustinova, Klakow, & Marti. (2017). Modeling the impact of orthographic coding on Czech–Polish and Bulgarian–Russian reading intercomprehension. Nordic Journal of Linguistics, 40(2), 175-199. doi:10.1017/S0332586517000130

Jágrová, Stenger, Marti, Avgustinova. (2017). Lexical and Orthographic Distances between Czech, Polish, Russian, and Bulgarian - a Comparative Analysis of the Most Frequent Nouns. In: Language Use and Linguistic Structure. Olomouc Modern Language Series, Palacký University Olomouc. pp. 401-416 (online)

Stenger, Avgustinova, Marti. (2017). Levenshtein distance and word adaptation surprisal as methods of measuring mutual intelligibility in reading comprehension of Slavic languages. Computational Linguistics and Intellectual Technologies: International Conference "Dialogue 2017" Proceedings. Issue 16 (23), vol. 1, 304–317. (online)

2016

Jágrová, Stenger, Avgustinova, Marti: Polski to język nieskomplikowany? Theoretische und praktische Interkomprehension der 100 häufigsten polnischen Substantive. In: Federalny Związek Nauczycieli Języka Polskiego (ed.). Polski w Niemczech - Polnisch in Deutschland 4(2016). pp. 5-19

Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2016). Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility. In: Calzolari, Choukri, Declerck, Goggi, Grobelnik, Maegaard, Mariani, Mazo, Moreno, Odijk, Piperidis (eds.) Language Resources and Evaluation Conference LREC 2016, pp. 4202-4209, included linguistic resources, Portorož (Slovenia)

Stenger. (2016). How Reading Intercomprehension Works among Slavic Languages with Cyrillic Script. In: Köllner, Ziai (eds.): Proceedings of the ESSLLI 2016 Student Session: pp. 30-42

2015

Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti. (2015). An Orthography Transformation Experiment with Czech-Polish and Bulgarian-Russian Parallel Word Sets. In: Sharp, Lubaszewski, Delmonte (eds.) Natural Language Processing and Cognitive Science 2015 Proceedings. Ca Foscarina Editrice, Venezia.