International Post-Graduate College
Language Technology & Cognitive Systems
Saarland University, University of Edinburgh

Data-driven Natural Language Generation: Experiments on Content Selection

Speaker: Mirella Lapata

Institution:

Institute for Communicating and Collaborative Systems and
Human Communication Research Centre
School of Informatics, University of Edinburgh

Abstract:

Generation systems create natural language texts or utterances from some underlying non-linguistic input (e.g., a database). Generation technology to date has been successfully deployed in a variety of domains. Examples include the authoring of stock reports, weather forecasts, help messages, and explanations of medical information.

Content selection is an important component in many generation systems. It determines which pieces of information to include in the generated document and is crucial for authoring coherent and meaningful texts. Existing methods typically rely on handcrafted rules and therefore require substantial human involvement. Development efforts often span several years and must be repeated anew for each application domain.

In this talk we propose a data-driven method for learning content selection rules automatically from a parallel corpus of texts and their corresponding database. We treat content selection as a collective classification problem. Our method exploits rich structural information present in the database and is able to model complex dependencies between database items. Experiments in a sports domain demonstrate that our approach achieves substantial improvements over state-of-the-art methods.

[Joint work with Regina Barzilay]
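
To make the idea of collective classification in the abstract more concrete, here is a minimal Python sketch. It is not the method presented in the talk: it uses iterated conditional modes (ICM), a simple stand-in technique, and the function name `collective_select`, the entry names, the scores, and the link weights are all hypothetical. Each database entry gets a local score (positive favours inclusion), and weighted links between related entries reward giving both entries the same label, so the decisions are made jointly rather than independently.

```python
from typing import Dict, List, Tuple


def collective_select(
    local_scores: Dict[str, float],
    links: List[Tuple[str, str, float]],
    max_sweeps: int = 10,
) -> Dict[str, bool]:
    """Jointly label database entries as selected (True) or not (False).

    Illustrative ICM sketch, assuming:
      local_scores[e] > 0 means a local classifier favours selecting e;
      each link (a, b, w) rewards a and b for sharing the same label.
    """
    # Build an undirected adjacency list from the links.
    neighbours: Dict[str, List[Tuple[str, float]]] = {e: [] for e in local_scores}
    for a, b, w in links:
        neighbours[a].append((b, w))
        neighbours[b].append((a, w))

    # Start from the independent decision: local evidence only.
    labels = {e: score > 0 for e, score in local_scores.items()}

    for _ in range(max_sweeps):
        changed = False
        for e in sorted(local_scores):  # fixed order keeps the run deterministic
            # Evidence for selecting e: its local score, plus link weight for
            # each selected neighbour, minus link weight for each unselected one.
            score = local_scores[e]
            for n, w in neighbours[e]:
                score += w if labels[n] else -w
            new_label = score > 0
            if new_label != labels[e]:
                labels[e] = new_label
                changed = True
        if not changed:  # no label flipped in a full sweep: converged
            break
    return labels


# Hypothetical sports-domain entries (all numbers are made up).
scores = {"passing_qb1": 2.0, "touchdown_qb1": -0.5, "fumble_rb2": -1.5}
links = [
    ("passing_qb1", "touchdown_qb1", 1.0),  # same player: strongly related
    ("touchdown_qb1", "fumble_rb2", 0.2),   # same drive: weakly related
]
selected = collective_select(scores, links)
print(selected)
```

In this toy run, `touchdown_qb1` would be rejected on its local score alone, but its strong link to the selected `passing_qb1` entry flips it to selected, while `fumble_rb2` stays unselected. That is the kind of dependency between database items that a collective formulation can capture and an independent classifier cannot.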


Last modified: Thu, Jul 13, 2006 11:39:40