Modelling collocations as TAG trees with multiple lexemes

A collocation is a sequence of words that co-occur more often than one would expect by chance, for example "make decision", "crystal clear", "earth quake". Detecting collocations is particularly challenging, and particularly interesting, when syntactic processes place the two parts of a collocation far apart in the sentence (see also Seretan, 2008). This project will start by automatically extracting strong collocations and idioms from text using tree-based methods. After ranking the extracted collocations, it will decide how to encode them in tree-adjoining grammar (TAG) trees. An important aspect will be the directionality of a collocation, i.e. whether the first word is strongly predictive of the second one, or the second of the first. The usefulness of encoding strong collocations in the same tree will be evaluated within a sentence processing model against psycholinguistic evidence from an eye-tracking corpus.


Violeta Seretan and Eric Wehrli. Collocation extraction based on syntactic parsing.

