Computational Linguistics Colloquium
Thursday, 17 November 2011, 16:15
Conference Room, Building C7 4
Latent Feature Models for the Structure and Meaning of Text
James Henderson
Computational Learning and Computational Linguistics Group, University of Geneva
Much of the meaning of text is reflected in individual words or phrases, but its full information content requires structured analyses of the syntax and semantics of natural language. Our work on methods for extracting such structured meaning representations from natural language has focused on the joint modelling of syntactic and semantic dependency structures. As is increasingly the case as research moves to more complex, deeper levels of semantic analysis, neither our domain knowledge nor the annotations in the data are sufficient to fully characterise the statistical regularities in this joint task. We have addressed this problem by developing latent variable models of structures, which allow us to postulate features without their being annotated in the data, and to incorporate prior knowledge without making overly strong assumptions about the nature of the statistical regularities. We have used these models to achieve state-of-the-art results in both syntactic parsing and semantic role labelling across several languages, to improve semantic dependencies automatically transferred from translations, and to induce latent semantic features that are useful in other tasks. These robust, efficient latent variable models should, in future, allow us to incorporate increasingly sophisticated prior knowledge, learn from data with increasingly little annotation, and model increasingly complex tasks.
If you would like to meet with the speaker, please contact Ivan Titov.