Computational Linguistics Colloquium

Friday, 17 May, 14:15, Room 101, Building 31
NOTE UNUSUAL DAY, TIME and PLACE

On the acquisition of collocations from German text corpora

Uli Heid
Institute for Computational Linguistics (IMS)
University of Stuttgart

Collocations have recently attracted considerable interest in linguistics, lexicography and beyond. In particular, NLP-based procedures for collocation identification (symbolic, statistical and hybrid) have been investigated, with a view to providing lists of lexical collocations.

In this talk, two different techniques for extracting noun+verb and noun+adjective collocations from text corpora will be discussed: one approach relies on chunking and pattern-based extraction, the other on the statistical grammar developed in the Stuttgart-based Gramotron project (Rooth, Schmid, Schulte im Walde, Zinsmeister).

These techniques will be assessed primarily against linguistic and lexicographic requirements: we first describe the types of information about collocations that should be extracted from the text material. Beyond information about lexical cooccurrence, data about morphosyntactic properties are needed in particular, e.g. for the use of a collocation list in parsing.

In the last part of the talk, we will give examples of possible uses of collocational data for other lexical acquisition tasks, drawing on both general language and legal sublanguage (the data discussed will be from German and Dutch).

If you would like to meet with the speaker, please contact Katrin Erk.