Computational Linguistics & Phonetics, Fachrichtung 4.7, Universität des Saarlandes

Distinguished Speakers in Language Science

07 April 2016, 16:15
Conference Room 407, Building E 1.1

Modeling Linguistic Relationships and Language Evolution

Roman Yangarber
Department of Computer Science, University of Helsinki

Linguists have been studying the relationships among languages for centuries, in particular the question of how languages evolved over time and from which ancestral languages they descend. In the Etymon Project, we approach these questions by quantitative, statistical means.

We will discuss several approaches to modeling linguistic evolutionary processes, i.e., the processes by which language families evolve through time, which are in some ways similar to those by which biological populations evolve. We begin with datasets of cognate words from different languages of a language family: words believed to be genetically related, i.e., derived from a common (typically unobserved) ancestor via unobserved laws of sound change. The only assumption we make is that these sound laws are regular. The methods are based on the information-theoretic Minimum Description Length (MDL) principle.
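To make the MDL idea concrete, the toy Python sketch below scores a fixed, position-by-position alignment of word pairs by the number of bits needed to transmit the aligned symbol pairs with an adaptive (prequential) code. It assumes equal-length words with no insertions or deletions, a Dirichlet prior with alpha = 1, and entirely made-up data; the helpers `prequential_bits` and `alignment_cost` are hypothetical, and this is not the Etymon Project's actual coding scheme. It only illustrates the underlying intuition that regular sound correspondences compress better than irregular ones.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple


def prequential_bits(events: Iterable[Hashable], num_event_types: int,
                     alpha: float = 1.0) -> float:
    """Bits needed to send `events` one at a time with adaptive counts.

    Each event is coded with probability (count + alpha) / (total + alpha * K),
    where K = num_event_types; counts are updated after each event is sent.
    """
    counts: Counter = Counter()
    total = 0
    bits = 0.0
    for event in events:
        p = (counts[event] + alpha) / (total + alpha * num_event_types)
        bits -= math.log2(p)
        counts[event] += 1
        total += 1
    return bits


def alignment_cost(word_pairs: Sequence[Tuple[str, str]],
                   source_alphabet: str, target_alphabet: str) -> float:
    """Code length of a fixed 1:1 alignment (hypothetical helper).

    Assumes each pair has equal length and symbol i aligns with symbol i.
    """
    events: List[Tuple[str, str]] = []
    for src, tgt in word_pairs:
        if len(src) != len(tgt):
            raise ValueError(f"unequal lengths: {src!r} vs {tgt!r}")
        events.extend(zip(src, tgt))
    # Every (source symbol, target symbol) pair is a possible event type.
    k = len(source_alphabet) * len(target_alphabet)
    return prequential_bits(events, k)


# Made-up toy data: language B has f wherever language A has p.
ALPHABET = "abfinoptv"
regular = [("pata", "fata"), ("pino", "fino"), ("apa", "afa")]
# Same source words, but p corresponds to f, v or b with no pattern.
irregular = [("pata", "fata"), ("pino", "vino"), ("apa", "aba")]

regular_bits = alignment_cost(regular, ALPHABET, ALPHABET)
irregular_bits = alignment_cost(irregular, ALPHABET, ALPHABET)
print(f"regular: {regular_bits:.2f} bits, irregular: {irregular_bits:.2f} bits")
```

Because the regular dataset reuses the correspondence p:f three times, its code is shorter (by log2 6, about 2.58 bits, in this toy case); an MDL-based search would accordingly favor alignments and models that make such regularities explicit.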

Our goals include:

- to find globally optimal models of the data at the level of individual sounds,

- to discover the laws of sound change inherent in the observed data,

- to reconstruct the phylogenetic structure of the language family.

We discuss how to compare the quality of the proposed models and of the alignments in the data, how to measure distances between languages, and how to compare the quality of different datasets for the same language family. We also consider ways of evaluating the goodness of the resulting phylogenies relative to available "gold-standard" trees.
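For comparing a proposed phylogeny with a gold-standard tree, one widely used measure is the rooted (cluster-based) variant of the Robinson-Foulds distance: it counts the clusters (sets of leaves below an internal node) that occur in one tree but not the other. The sketch below is an illustration under stated assumptions: trees are written as nested Python tuples, the leaf labels A to E are placeholders rather than real languages, and the abstract does not say whether the talk uses this particular measure.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree as nested tuples; leaves are (placeholder) language labels.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set below every internal node, root included."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of clusters that appear in exactly one of the two trees."""
    c1, c2 = clusters(t1), clusters(t2)
    root1, root2 = max(c1, key=len), max(c2, key=len)
    if root1 != root2:
        raise ValueError("trees must have the same leaf set")
    # The root cluster (all leaves) is trivially shared, so drop it.
    return len((c1 - {root1}) ^ (c2 - {root2}))


# Toy trees with placeholder labels, not real language data.
gold = ((("A", "B"), "C"), ("D", "E"))
proposed = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(gold, proposed))  # prints 2
```

Here the proposed tree swaps the positions of B and C, so the clusters {A, B} and {A, C} each occur in only one tree, giving a distance of 2; identical trees score 0.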

Our studies are based on data from the Uralic, Turkic and Indo-European language families.

If you would like to meet with the speaker, please contact Andrea Fischer.