The project INCOMSLAV investigates the relation between
information density, encoding density and
grammaticalisation in a cross-linguistic perspective,
focusing on intercomprehension within the family of Slavic
languages. In the initial funding period (2014-2018), the
project brings together results from the analysis of
parallel corpora and from a variety of experiments with
native speakers of Slavic languages and compares them with
insights of comparative historical linguistics on the
relationship between Slavic languages. A statistical
language model of surprisal is used to measure information
density and to gauge how language users cope with the
high degrees of surprisal that arise from partial
incomprehensibility. The key idea is that
comprehension of an unknown but related language should
be better when the language model adapted for
understanding the unknown language exhibits relatively low
average surprisal, i.e. low information density. In the second funding
period (2018-2022), the research agenda is extended to
spoken language, which allows us to investigate how
information density is balanced between the acoustic and
the text level in successful intercomprehension. At all
levels, from the acoustic signal and its phonetic structure
to the texts generated from speech, we develop similarity
metrics and information density measures related to Slavic
intercomprehension.
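
To make the surprisal measure described above concrete, here is a minimal sketch. It is not the project's actual model: it simply estimates surprisal as -log2 P(word | previous word) from a bigram model with add-one smoothing. The function names, the toy corpus and the choice of bigrams and smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count contexts and bigrams in a token list (toy model, assumed for illustration)."""
    contexts = Counter(tokens[:-1])                  # tokens that are followed by another token
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))  # adjacent token pairs
    vocab_size = len(set(tokens)) + 1                # +1 reserves probability mass for unseen words
    return contexts, bigrams, vocab_size


def surprisal(prev, word, model):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing."""
    contexts, bigrams, vocab_size = model
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
    return -math.log2(p)


def average_surprisal(tokens, model):
    """Mean per-word surprisal of a sequence (the first token has no context and is skipped)."""
    pairs = list(zip(tokens[:-1], tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens")
    return sum(surprisal(p, w, model) for p, w in pairs) / len(pairs)


# Hypothetical toy corpus; real experiments would use large corpora.
model = train_bigram("a b a b a b".split())
expected = average_surprisal(["a", "b", "a"], model)    # follows the training pattern
unexpected = average_surprisal(["a", "a", "b"], model)  # contains an unseen bigram
```

In this sketch the sequence that follows the training pattern receives a lower average surprisal than the one containing an unseen bigram, which mirrors the expectation that lower average surprisal goes together with better comprehension.
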
In the first two phases of the CRC, the empirical focus
of C4 was on the mutual intelligibility of visual
(written) or auditory (spoken) input for speakers of
closely related languages in the Slavic language family.
Experimental and modelling work in the second phase, which
has combined methods from language, speech and translation
technology, has provided a wealth of findings highlighting
how information density is distributed across the acoustic
and the text channels in successful intercomprehension.
Based on these results, we are now in a position to
address, in the third phase (2022-2026), core properties
of intercomprehension as they unfold in goal-oriented
communication, characterized by cooperative behaviour and
adaptive interaction. This overarching goal entails the
investigation of linguistic structures beyond lexical
similarity and word-sequence-based predictability, taking
into account constructional similarity, the cross-lingual
transparency of multi-component units, and prosody.
Specifically, conversational dialogue-style experimental
setups are employed in order to explore the (ex)change of
information as the interaction unfolds. We will develop
models of surprisal capturing the information conveyed by
multi-component units and prosodic features, in particular
intonation. Finally, C4 will test how well our results and
models scale by transferring them to a selected set of
features of other language families, e.g. Semitic.
PhD research staff (phase 1): Andrea Fischer, Klára Jágrová, Irina Stenger
PhD research staff (phase 2): Yu Tracy Chen, Badr Abdullah, Jacek Kudera
PhD research staff (phase 3): Iuliia Zaitova, Badr Abdullah (Postdoc)
Release | INCOMSLAV materials | Status | Outdated
24.11.2014 | The INCOMSLAV project. Seminar in formal linguistics at Charles University, Prague. Video recording, abstract & presentation: http://lectures.ms.mff.cuni.cz/view.php?rec=238 | public |
28.05.2015 | Avgustinova, Fischer, Jágrová, Klakow, Marti, Stenger: The Empirical Basis of Slavic Intercomprehension (slides) | public |
29.05.2015 | Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti: Orthography in Language Modelling of Mutual Intelligibility (poster) | public |
13.05.2016 | Video: e-presentation by Klára Jágrová | public |
15.03.2017 | Lexical resource: top 100 nouns of BG, CS, PL, RU | request access | 2016-09-16; 2017-02-17
07.06.2017 | Database for entropy and adaptation surprisal calculations (word pair lists BG-RU and PL-CS) | request access |
07.06.2017 | Computer code (scripts) | public |
23.05.2019 | Polish NP stimuli with distance and surprisal values | public |
23.05.2019 | Highly predictive contexts (PL sentences) | public |
2020
Stenger, Jágrová, Avgustinova. 2020. The
INCOMSLAV Platform: Experimental Website with Integrated
Methods for Measuring Linguistic Distances and Asymmetries
in Receptive Multilingualism. In J. Fiumara, C. Cieri,
M. Liberman, C. Callison-Burch (eds.), LREC 2020 Workshop,
Language Resources and Evaluation Conference, 11-16 May 2020:
Citizen Linguistics in Language Resource Development (CLLRD
2020), Proceedings, pp. 40–48
Stenger, Jágrová, Fischer, Avgustinova (2020): “Reading
Polish with Czech Eyes” or “How Russian Can a Bulgarian Text
Be?”: Orthographic Differences as an Experimental Variable
in Slavic Intercomprehension. In T. Radeva-Bork and P. Kosta
(eds.), Current developments in Slavic Linguistics. Twenty
years after (based on selected papers from FDSL 11). Peter
Lang, 483-500 (preprint, link to publication)
2019
Avgustinova (2019) Gegenseitige
Verstehbarkeit und Surprisal in Slavischer
Interkomprehension: empirische Basis und linguistische
Modellierung. Invited
lecture at University of Hamburg
Jágrová, Stenger, Avgustinova (2019) Slavic
Intercomprehension Matrix. 13. Deutscher
Slavistentag, Internationaler Kongress der
deutschsprachigen Slavistik. Sektion: Didaktik der
slavischen Sprachen und Kulturen
Avgustinova, Iomdin (2019) Towards
a Typology of Microsyntactic Constructions. In:
G. Corpas-Pastor, R. Mitkov (eds.) Computational
and Corpus-Based Phraseology. Springer, Cham: 15-30
Mosbach, Stenger, Avgustinova, Klakow. (2019): incom.py
- A Toolbox for Calculating Linguistic Distances and
Asymmetries between Related Languages. In: Galia
Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
(eds.), Proceedings of Recent Advances in Natural Language
Processing, RANLP 2019, Varna, Bulgaria, 2-4 September 2019,
pages 811-819
Jágrová, Avgustinova: Intelligibility of highly
predictable Polish target words in sentences presented to
Czech readers. CICLing 2019. Preprint.
Stenger, Avgustinova, Belousov, Baranov, Erofeeva. 2019.
Interaction of linguistic and socio-cognitive factors in
receptive multilingualism [Vzaimodejstvie lingvističeskich i
sociokognitivnych parametrov pri receptivnom
mul’tilingvisme], 25th International Conference on
Computational Linguistics and Intellectual Technologies
(Dialogue 2019), Proceedings, Moscow, Russia: http://www.dialog-21.ru/digest/2019/online/.
2018
Jágrová: Processing effort of Polish NPs for Czech readers
– A+N vs. N+A. In:
Guz, Szymanek (eds.): Canonical
and Non-Canonical Structures in Polish. Studies in
Linguistics and Methodology vol. 12. Wydawnictwo KUL,
pp. 123-143. Preprint
Jágrová, Avgustinova, Stenger, Fischer: Language
models, surprisal and fantasy in Slavic
intercomprehension, Computer Speech & Language,
Available online 12 June 2018, ISSN 0885-2308,
https://doi.org/10.1016/j.csl.2018.04.005.
2017
Jágrová, Stenger, Avgustinova: Polski nadal
nieskomplikowany?
Interkomprehensionsexperimente mit Nominalphrasen.
In: Federalny Związek Nauczycieli Języka Polskiego
(ed.). Polski
w Niemczech - Polnisch in Deutschland
5(2017). pp. 20-37
Stenger, Jágrová, Fischer, Avgustinova, Klakow,
& Marti. (2017). Modeling the impact of
orthographic coding on Czech–Polish and
Bulgarian–Russian reading intercomprehension. Nordic
Journal of Linguistics, 40(2),
175-199. doi:10.1017/S0332586517000130
Jágrová, Stenger, Marti,
Avgustinova. (2017). Lexical and Orthographic Distances
between Czech, Polish, Russian, and Bulgarian - a
Comparative Analysis of the Most Frequent Nouns. In:
Language Use and Linguistic
Structure. Olomouc Modern Language Series, Palacký
University Olomouc. pp. 401-416 (online)
Stenger, Avgustinova, Marti. (2017) Levenshtein distance and
word adaptation surprisal as methods of measuring mutual
intelligibility in reading comprehension of Slavic
languages. Computational Linguistics and Intellectual
Technologies: International Conference "Dialogue 2017"
Proceedings. Issue 16 (23), vol. 1, 304–317 (online)
2016
Jágrová, Stenger, Avgustinova, Marti: Polski to język
nieskomplikowany? Theoretische und praktische
Interkomprehension der 100 häufigsten polnischen
Substantive. In: Federalny Związek Nauczycieli Języka
Polskiego (ed.). Polski
w Niemczech - Polnisch in Deutschland 4(2016). pp.
5-19
Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti.
(2016). Orthographic
and Morphological Correspondences between Related Slavic
Languages as a Base for Modeling of Mutual Intelligibility.
In: Calzolari, Choukri, Declerck, Goggi, Grobelnik,
Maegaard, Mariani, Mazo, Moreno, Odijk, Piperidis (eds.) Language
Resources and Evaluation Conference LREC 2016, pp.
4202-4209, included
linguistic resources, Portorož (Slovenia)
Stenger. (2016) How
Reading Intercomprehension Works among Slavic Languages
with Cyrillic Script. In: Köllner, Ziai (eds.): Proceedings of the ESSLLI 2016 Student
Session: pp. 30-42
2015
Fischer, Jágrová, Stenger, Avgustinova, Klakow, Marti.
(2015). An Orthography Transformation Experiment with
Czech-Polish and Bulgarian-Russian Parallel Word Sets. In:
Sharp, Lubaszewski, Delmonte (eds.) Natural
Language Processing and Cognitive Science 2015 Proceedings.
Ca Foscarina Editrice, Venezia.