International Research Training Group
Language Technology & Cognitive Systems
Saarland University, University of Edinburgh
 

Assessing the Utility of Automatically Labelled Examples for Classifying Discourse Relations

Speaker: Caroline Sporleder

Institution: Saarland University

Abstract:

Identifying which discourse relations (for example, Contrast or Explanation) hold between two spans of text is important for many NLP applications, such as summarisation and information extraction. Although discourse relations are sometimes explicitly signalled by markers like "but" or "because", these markers can be ambiguous and are frequently missing altogether. It is therefore not feasible to rely on discourse markers alone; what is needed is a tool that can identify discourse relations in the absence of such markers. Machine learning can be used for this task, but it normally depends on manually labelled training data, which is very time-consuming to create. As a way around this problem, Marcu and Echihabi (2002) suggested exploiting the presence of unambiguous discourse markers in some sentences to label data automatically with the correct relation. Once training data has been created in this way, the markers are removed from the sentences and a classifier is trained to determine discourse relations even when no marker is present, based on other linguistic cues such as word co-occurrences.
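
The labelling-and-training idea described above can be illustrated with a minimal sketch. It is not the implementation used by Marcu and Echihabi or in the study presented in this talk. The marker-to-relation map, the toy corpus, and the names auto_label, word_pairs and PairNaiveBayes are all assumptions made for illustration. In particular, "but" and "because" are treated here as if they were unambiguous, which does not hold in general. Word co-occurrences are approximated by pairs of words taken across the two spans and fed to a small Naive Bayes classifier.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import product

# Toy marker-to-relation map (an assumption for illustration; a real study
# would rely on a curated list of unambiguous markers).
MARKERS = {"but": "Contrast", "because": "Explanation"}


def auto_label(sentence):
    """Split a sentence at the first known marker and drop the marker.

    Returns (left_tokens, right_tokens, relation), or None if no marker
    occurs with text on both sides of it.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for i, tok in enumerate(tokens):
        if tok in MARKERS and 0 < i < len(tokens) - 1:
            return tokens[:i], tokens[i + 1:], MARKERS[tok]
    return None


def word_pairs(left, right):
    """Cross-span word pairs, used as a simple co-occurrence feature."""
    return list(product(left, right))


class PairNaiveBayes:
    """Multinomial Naive Bayes over word pairs with add-one smoothing."""

    def fit(self, examples):
        self.pair_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()
        for left, right, relation in examples:
            self.class_counts[relation] += 1
            for pair in word_pairs(left, right):
                self.pair_counts[relation][pair] += 1
                self.vocab.add(pair)
        return self

    def predict(self, left, right):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for relation, n in self.class_counts.items():
            counts = self.pair_counts[relation]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for pair in word_pairs(left, right):
                if pair in self.vocab:  # skip pairs never seen in training
                    score += math.log((counts[pair] + 1) / denom)
            if score > best_score:
                best, best_score = relation, score
        return best


# Hypothetical marked sentences; the last one has no marker and is skipped.
corpus = [
    "The plan was good, but the results were bad.",
    "She is rich but he is poor.",
    "The match was cancelled because it rained.",
    "He stayed home because he was ill.",
    "This sentence has no marker.",
]
examples = [ex for ex in map(auto_label, corpus) if ex is not None]
model = PairNaiveBayes().fit(examples)
```

Calling model.predict(left_tokens, right_tokens) on two token lists then returns the relation whose training spans share the most word pairs with the input. Because the marker itself is removed before training, the classifier can only rely on the remaining words, which is what allows it to be applied to spans that never contained a marker.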

In this talk, I will present a broad-scale empirical study in which we investigated whether (i) discourse relations can in principle be learned from automatically labelled examples and (ii) classifiers trained on this type of data generalise to unmarked data. We also explored how training on automatically labelled examples compares to training on a small set of manually labelled, unmarked examples (i.e., examples which naturally occur without a discourse marker).

Reference: Daniel Marcu and Abdessamad Echihabi (2002). An Unsupervised Approach to Recognizing Discourse Relations. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-2002), Philadelphia, PA, July 7-12.


Last modified: Thu, Mar 15, 2007 11:48:06 by