Computational Linguistics Colloquium

Monday June 18, 11:15, Seminar Room, Building 17
NOTE UNUSUAL DAY AND TIME

What is a Summary Good for? Summarization, Citation Indexing, and Task-based Evaluation

Simone Teufel
Department of Computer Science
Columbia University

Summarization is a relatively new field, and though recently a lot of effort has gone into the development of new summarization algorithms, little is known about the function of summaries in the human information gathering process. There are many sub-tasks involved in information gathering: search for topic-related or similar documents, detection of relevant passages, skim-reading, etc. I will argue that in order to create better summarization algorithms, we should investigate summary use experimentally. I will also argue that the task chosen for these experiments plays a crucial role.

I will present a new task-based methodology for summary evaluation. I evaluate my own summaries, a new type of extract similar to citation indexes. These new summaries are generated by recognizing document structure. They portray the goal of a scientific article in relation to similar papers. The task my subjects were asked to perform was to guess the relations of the summarized article to similar articles.

Results of the experiment show that the task does indeed play a crucial role in deciding which summary works best -- in our case, crude extracts, which contain the right kind of information, outperform smooth, human-written, but generic summaries.

If you would like to meet with the speaker, please contact Frank Keller.