Computational Linguistics & Phonetics Fachrichtung 4.7 Universität des Saarlandes

Computational Linguistics Colloquium

Thursday, 22 November, 16:15
Conference Room, Building C7 4

Natural Language Parsing with Incremental Bayesian Networks

James Henderson
School of Informatics,
University of Edinburgh

Statistical natural language parsing is a particularly challenging problem for machine learning methods because of the complex structural nature of its statistical dependencies. For example, the probability of using a given CFG rule in a parse can depend on a large number of other features of the parse, such as the grand-parent nonterminal or agreement features on a preceding word. Typically these features are specified by hand, but recently there has been interest in using latent variables to induce them automatically. In this talk I will present a framework for parsing with latent variables based on a form of Dynamic Bayesian Network called Incremental Sigmoid Belief Networks (ISBNs). Approximations to ISBNs have achieved state-of-the-art results on Penn Treebank parsing and on dependency parsing for a variety of languages.

ISBNs are designed for non-Markovian problems such as parsing, where the structure of the statistical dependencies is a function of the output structure. Exact inference in ISBNs cannot be done efficiently (as with many complex graphical models), but they are designed to allow efficient approximations. Such approximations include my previous neural network architecture for statistical parsing, and an incremental mean field approximation. The mean field approximation demonstrates that a more accurate approximation does lead to a more accurate parser, but the neural network approximation is much faster and achieves close to the same accuracy.
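As a rough illustration of why exact inference is costly and what a cheap approximation can look like, the sketch below compares two ways of computing the probability of one binary variable in a toy sigmoid belief network. It is not the speaker's model or implementation: the network shape, the function names, and all numbers are assumptions made up for this example. The exact computation sums over every configuration of the binary latent parents, while the feed-forward computation simply replaces each parent by its mean, which is the kind of simplification a neural-network-style approximation makes.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def exact_marginal(prior, weights, bias):
    """P(child = 1), summing over all 2^n configurations of the binary parents."""
    total = 0.0
    for config in itertools.product([0, 1], repeat=len(prior)):
        # Probability of this parent configuration (parents are independent here).
        p_config = 1.0
        for p, h in zip(prior, config):
            p_config *= p if h == 1 else 1.0 - p
        activation = bias + sum(w * h for w, h in zip(weights, config))
        total += p_config * sigmoid(activation)
    return total


def feed_forward_marginal(prior, weights, bias):
    """Approximate P(child = 1) by replacing each parent with its mean."""
    activation = bias + sum(w * p for w, p in zip(weights, prior))
    return sigmoid(activation)


# Hypothetical toy values, chosen only for illustration.
prior = [0.9, 0.2, 0.5]     # P(h_j = 1) for three latent parents
weights = [2.0, -1.5, 0.7]  # connection weights from parents to child
bias = -0.5

exact = exact_marginal(prior, weights, bias)
approx = feed_forward_marginal(prior, weights, bias)
print(f"exact={exact:.4f} feed-forward={approx:.4f}")
```

The exact sum grows exponentially with the number of latent parents, whereas the feed-forward pass is linear in it. The two agree when every parent is deterministic and drift apart as the parents become more uncertain.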

If you would like to meet with the speaker, please contact Verena Rieser.