A Syntactic Flexibility Measure for Learning Multiword Expressions
Colin Bannard
The extraction of multiword
units from text corpora has been a topic of research in NLP for more than
a decade. This research has focused almost exclusively on the extraction of
sequences of words that occur with a higher frequency than
would be predicted from the frequencies of the individual words.
For some lexicographic tasks such as terminology extraction this seems
to have proved useful. However, for most NLP tasks, we are
interested not in statistically idiosyncratic units but
rather in those units that do not behave like regular word combinations
in terms of their syntax or their semantics. Extraction
techniques based solely on co-occurrence statistics cannot be
used to acquire these, as the vast majority of the phrases
returned are syntactically and semantically orthodox. This talk
will describe an attempt to automatically extract V+NP multiword expressions
by looking at how their syntactic behaviour observed over a corpus
diverges from what we would expect to see in a free word
combination.
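
For reference, the frequency-based criterion mentioned above (a word sequence occurring more often than the frequencies of its individual words would predict) is commonly scored with pointwise mutual information. The following is a minimal sketch of that baseline only, not of the syntactic measure described in the talk; the function name and the toy counts are hypothetical and chosen purely for illustration.

```python
import math


def pmi(pair_count, w1_count, w2_count, total):
    """Pointwise mutual information for a two-word sequence.

    Compares the observed probability of the pair with the probability
    expected if the two words occurred independently of each other.
    All counts are assumed to come from the same sample of size `total`.
    """
    p_pair = pair_count / total
    p_w1 = w1_count / total
    p_w2 = w2_count / total
    return math.log2(p_pair / (p_w1 * p_w2))


# Hypothetical toy counts, not taken from any real corpus.
score = pmi(pair_count=30, w1_count=100, w2_count=60, total=10000)

# A pair that occurs exactly as often as independence predicts scores about 0.
baseline = pmi(pair_count=1, w1_count=100, w2_count=100, total=10000)
```

A high score of this kind marks a pair as statistically idiosyncratic, but it says nothing about whether the pair is syntactically or semantically unusual, which is the gap the abstract points to.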