Modelling Collocations as TAG Trees with Multiple Lexemes
A collocation is a sequence of words that co-occur more often than one
would expect by chance, for example "make decision", "crystal clear",
"earth quake". Detecting collocations is particularly challenging, and
interesting, when syntactic processes such as passivization or
relativization place the two parts of a collocation far apart in the
sentence, as in "the decision that the committee finally made".
See also Seretan (2008).
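The project description does not name an association measure, so the following is only one common choice. "Co-occur more often than one would expect by chance" is often quantified with pointwise mutual information (PMI). Here f(x, y) is the frequency of the word pair, f(x) and f(y) are the frequencies of each word in the relevant position, and N is the total number of pairs observed. A positive value means the pair occurs more often than it would if the two words were independent. Other measures, such as the log-likelihood ratio, are also widely used.

```latex
\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}
  \approx \log_2 \frac{f(x, y)\, N}{f(x)\, f(y)}
```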
This project will first automatically extract strong collocations and
idioms from text using tree-based methods, i.e. methods that operate on
syntactic parse trees. It will then rank the extracted collocations and
decide how to encode them in tree-adjoining grammar (TAG) trees. An
important aspect will be the directionality of a collocation, i.e.
whether the first word is strongly predictive of the second one.
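The extraction, ranking and directionality steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's method. The input format (lemmas with 0-based head indices from some dependency parser), the function names `extract_arcs` and `rank_collocations`, the use of PMI for ranking, and the `min_freq` threshold are all hypothetical choices made for illustration. The description does not fix which word counts as "first", so the sketch reports predictiveness in both directions: P(dependent | head) and P(head | dependent).

```python
import math
from collections import Counter

# Hypothetical input format (an assumption, not from the project description):
# each sentence is a list of (lemma, head_index) tokens, where head_index is the
# 0-based position of the token's syntactic head, or -1 for the root.
Sentence = list[tuple[str, int]]


def extract_arcs(sentences: list[Sentence]) -> list[tuple[str, str]]:
    """Collect (head, dependent) lemma pairs from dependency arcs.

    Pairs come from the parse rather than from adjacent words, so the two
    parts of a collocation are paired however far apart they are.
    """
    arcs = []
    for sentence in sentences:
        for lemma, head_index in sentence:
            if head_index >= 0:
                arcs.append((sentence[head_index][0], lemma))
    return arcs


def rank_collocations(arcs: list[tuple[str, str]], min_freq: int = 2) -> list[dict]:
    """Rank (head, dependent) pairs by PMI and report both conditional probabilities."""
    pair_freq = Counter(arcs)
    head_freq = Counter(head for head, _ in arcs)
    dep_freq = Counter(dep for _, dep in arcs)
    n = len(arcs)
    rows = []
    for (head, dep), f_xy in pair_freq.items():
        if f_xy < min_freq:
            continue  # very rare pairs give unreliable PMI estimates
        pmi = math.log2(f_xy * n / (head_freq[head] * dep_freq[dep]))
        rows.append({
            "pair": (head, dep),
            "freq": f_xy,
            "pmi": pmi,
            # Directionality: how strongly each word predicts the other.
            "p_dep_given_head": f_xy / head_freq[head],
            "p_head_given_dep": f_xy / dep_freq[dep],
        })
    return sorted(rows, key=lambda r: r["pmi"], reverse=True)


if __name__ == "__main__":
    # Toy, hand-built lemmatized parses; not real parser output.
    toy = [
        [("they", 1), ("make", -1), ("a", 3), ("decision", 1)],
        [("the", 1), ("decision", 4), ("be", 4), ("finally", 4), ("make", -1)],
        [("the", 1), ("sky", 4), ("be", 4), ("crystal", 4), ("clear", -1)],
        [("the", 1), ("water", 4), ("be", 4), ("crystal", 4), ("clear", -1)],
    ]
    for row in rank_collocations(extract_arcs(toy)):
        print(row["pair"], row["freq"], round(row["pmi"], 2),
              round(row["p_dep_given_head"], 2), round(row["p_head_given_dep"], 2))
```

In this toy input, "crystal" attaches to "clear" every time it occurs, so P(head | dependent) is 1.0 for that pair. By contrast, "crystal" appears in only a third of the arcs headed by "clear". The prediction therefore runs from "crystal" to "clear", not the reverse. "make" and "decision" are also paired in the passive sentence even though they are not adjacent there.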
The usefulness of encoding both words of a strong collocation in the
same TAG tree will be evaluated within a sentence processing model
against psycholinguistic evidence from an eye-tracking corpus.
Violeta Seretan and Eric Wehrli. Collocation extraction based on syntactic parsing.