International Post-Graduate College
Language Technology & Cognitive Systems
Saarland University, University of Edinburgh

Is the End of Supervised Parsing in Sight?

Speaker: Rens Bod

Institution:

School of Computer Science
University of St Andrews

Abstract:

All current state-of-the-art parsing systems are supervised: they are trained on large human-annotated data-sets. Unfortunately, such annotated data-sets are available for only a few languages, and their construction is extremely time-consuming. While semi-supervised methods have recently achieved promising results, they still need data-sets of annotated sentences to start with. A key issue in natural language processing is therefore the development of unsupervised methods for extracting parsing models from unlabeled raw data, of which unlimited quantities are available.

During the last few years there has been considerable progress in unsupervised parsing. The performance of unsupervised parsers has risen from around 40% unlabeled f-score on the ATIS corpus (van Zaanen 2000; Clark 2001) to around 82% f-score on the Wall Street Journal corpus (Klein 2005; Bod 2006). While these scores remain behind the labeled f-score of roughly 91% achieved by the best supervised parsers (Bod 2003; Charniak and Johnson 2005; McClosky et al. 2006), an important question is how far a purely unsupervised parsing approach can get if trained on data-sets that are several orders of magnitude larger than hitherto attempted. This question is of direct relevance for improving concrete applications such as speech recognition systems, database interfaces and machine translation systems.

In this talk I shall give an overview of unsupervised parsing models and discuss what is needed for unsupervised parsers to compete with supervised ones. I shall also address the problem of evaluation, and argue that parsers (be they supervised or unsupervised) can best be evaluated by isolating their contribution in concrete, large-scale NLP applications such as syntax-based machine translation or structural language models for speech.

Last modified: Thu, Jul 13, 2006 11:39:40