Cross-lingual Bootstrapping of Resources for Role-Semantic Analysis
Speaker: Sebastian Pado
Institution: Saarland University
Abstract:
The difficulty and high cost of manually creating data with lexical semantic annotation, the so-called "lexical bottleneck problem", have led to a notable absence of broad-coverage lexical semantic resources for virtually all languages except English. This presentation introduces the task of automatically inducing semantic class and role information for new languages. Because unsupervised methods are still in their infancy, whereas hand-crafted resources already exist for English, we propose the use of cross-lingual annotation projection, i.e., the transfer of linguistic information between corresponding sentences in parallel corpora.
We present the first application of annotation projection to the semantic domain and show how the task of semantic role projection can be phrased elegantly within an optimisation framework. The models we develop are adaptive in the sense that they do not require much linguistic knowledge, which may be unavailable for resource-poor target languages, but can incorporate it when present. Our evaluation indicates that semantic information can be induced across languages with a high degree of accuracy, at least for related languages such as German.
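To make the idea of semantic role projection concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the models presented in the talk: the function name project_roles, the role labels "Seller" and "Goods", the example sentence pair, and the use of Jaccard overlap as the objective are all hypothetical choices made for this example. Each source role span is mapped to the target candidate span that best overlaps the target tokens reached through the word alignment.

```python
from typing import Dict, List, Set, Tuple


def project_roles(
    source_roles: Dict[str, Set[int]],
    alignment: Set[Tuple[int, int]],
    target_candidates: List[Set[int]],
) -> Dict[str, Set[int]]:
    """Project role spans from a source sentence onto a target sentence.

    source_roles maps a role label to source token indices, alignment is a
    set of (source_index, target_index) word-alignment links, and
    target_candidates lists candidate target spans (e.g. constituents).
    All names and the scoring function are assumptions of this sketch.
    """
    projected: Dict[str, Set[int]] = {}
    for role, src_tokens in source_roles.items():
        # Target tokens linked to any source token inside the role span.
        linked = {t for s, t in alignment if s in src_tokens}
        if not linked or not target_candidates:
            continue  # nothing to project for this role

        def overlap(candidate: Set[int]) -> float:
            # Jaccard overlap between a candidate span and the linked tokens.
            return len(candidate & linked) / len(candidate | linked)

        best = max(target_candidates, key=overlap)
        if overlap(best) > 0:
            projected[role] = best
    return projected


# Hypothetical example: "Peter sold the car" -> "Peter verkaufte das Auto".
source_roles = {"Seller": {0}, "Goods": {2, 3}}
# The determiner link (2, 2) is deliberately left out to mimic a noisy alignment.
alignment = {(0, 0), (1, 1), (3, 3)}
target_candidates = [{0}, {1}, {2, 3}, {0, 1, 2, 3}]

result = project_roles(source_roles, alignment, target_candidates)
print(result)
```

Choosing among candidate spans, rather than copying the aligned tokens directly, lets the projection recover the full phrase "das Auto" even though the determiner has no alignment link. When no candidate spans are available for the target language, each token can be supplied as its own single-token candidate, which reduces the sketch to plain word-level projection.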