Most work in natural language processing focusses on the news domain. NLP tools such as part-of-speech taggers or syntactic parsers are typically trained and tested on newswire text, such as the Wall Street Journal. However, most tools are sensitive to domain changes, i.e., their performance degrades (often significantly) when they are applied to a domain or genre that differs from that of the training data.

Manually annotating a large training corpus for each new target domain is typically not a viable solution. Recently, however, there has been increased interest in developing alternative techniques for cross-domain portability. In this seminar, we will look at these techniques.

In addition, participants will learn about the linguistic differences between domains and genres (e.g. fiction, recipes, historical texts, biomedical texts). Participants who have not had much practical experience with NLP tools (part-of-speech taggers, syntactic parsers, tools for word sense disambiguation or frame assignment) will also learn how to use these tools for concrete tasks.

This course can be run either as a project seminar (with a larger practical component) or as a seminar (with a larger theoretical component), depending on the preferences of the participants.

