Co-Reference meets Toponym Resolution meets Active Learning


IGK 2004 Project


Proposers: Jochen Leidner, Olga Uryupina, and Markus Becker
Other interested students:
Suggested Lecturers/Guests:
Time constraints:

Description

Two toponyms (place-names) or place-name descriptions can refer either to the same location or to different locations. One name may denote several places, and different names may denote the same place:

		"London" -[?]->	{	London, UK ;
					London > Ontario > Canada ;
					...	}

		"US"	=[?]=	"U.S.A."

We propose to investigate the task of identifying coreference relationships between toponyms across texts (multi-document fusion for coreference resolution of toponyms).
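Cross-document coreference is often approached as a pairwise decision (same referent or not) followed by a step that links the positive decisions into chains. The following is a minimal sketch of that second step. It takes the transitive closure of the positive links with a union-find structure. The same_referent callback is a hypothetical stand-in for whatever classifier the project eventually uses:

```python
from typing import Callable, Hashable, Sequence


def coreference_chains(
    mentions: Sequence[Hashable],
    same_referent: Callable[[Hashable, Hashable], bool],
) -> list[list[Hashable]]:
    """Group mentions into chains: the transitive closure of positive pairwise links."""
    parent = list(range(len(mentions)))  # union-find forest over mention indices

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once; merge the two chains whenever the pair is linked.
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if same_referent(mentions[i], mentions[j]):
                parent[find(i)] = find(j)

    # Collect mentions by root, preserving input order within each chain.
    chains: dict[int, list[Hashable]] = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention)
    return list(chains.values())
```

With gold labels as the callback, a mention of "US" in one document and "U.S.A." in another end up in the same chain, while "London" forms its own chain. Transitive closure is simple but brittle: a single false-positive link merges two otherwise separate chains, so alternative clustering strategies may be worth comparing.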

Using a new gold-standard corpus (Leidner 2004), we want to explore new classification methods that decide whether two toponyms have the same referent or different referents, taking linguistic, spatial, and graph-based features into account. We also expect clustering methods to yield insights into the distributional contexts that help with resolution.
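The following is a sketch of a pairwise feature extractor that covers the three feature families. It assumes each mention comes with one hypothesised referent that carries coordinates and a containment path. The Toponym record, the feature names and the normalisation rule are all hypothetical choices and are not part of the corpus format. Great-circle distance uses the haversine formula:

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Toponym:
    """A mention with one hypothesised referent (hypothetical schema)."""
    surface: str
    lat: float
    lon: float
    ancestors: tuple[str, ...]  # containment path, most general first


def normalise(s: str) -> str:
    # Lower-case and drop dots and spaces, so "U.S.A." becomes "usa".
    return "".join(ch for ch in s.lower() if ch not in ". ")


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def shared_ancestors(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Length of the common prefix of two containment paths."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pair_features(a: Toponym, b: Toponym) -> dict[str, float]:
    na, nb = normalise(a.surface), normalise(b.surface)
    return {
        # linguistic: surface-string evidence
        "exact_match": float(na == nb),
        "char_similarity": SequenceMatcher(None, na, nb).ratio(),
        # spatial: distance between the hypothesised referents
        "distance_km": haversine_km(a.lat, a.lon, b.lat, b.lon),
        # graph-based: overlap in the containment hierarchy
        "shared_ancestors": float(shared_ancestors(a.ancestors, b.ancestors)),
    }
```

Compare London, UK (roughly 51.51, -0.13; ancestors ("United Kingdom",)) with London, Ontario (roughly 42.98, -81.25; ancestors ("Canada", "Ontario")). For this pair, exact_match is 1.0, but distance_km is several thousand kilometres and shared_ancestors is 0. In this situation, surface features alone would wrongly suggest coreference.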

Furthermore, we will employ Active Learning for this task. Active Learning is a meta-learning technique that promises to reduce the amount of manually labeled data needed by selecting for annotation only those data points that appear most informative to the current model. In simulation experiments over the gold-standard corpus, we will explore how suitable different sample selection strategies, such as Uncertainty Sampling and Query by Committee, are for this task.
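The two sample selection strategies named above can be sketched as ranking functions over an unlabeled pool of toponym pairs. Both assume hypothetical models. For Uncertainty Sampling, prob_same is a model that returns P(same referent). For Query by Committee, the committee is a set of binary classifiers, and their disagreement is measured by vote entropy:

```python
import math
from collections import Counter
from typing import Callable, Sequence, TypeVar

X = TypeVar("X")


def uncertainty_sample(pool: Sequence[X], prob_same: Callable[[X], float], k: int) -> list[X]:
    """Uncertainty Sampling: the k items whose P(same referent) is closest to 0.5."""
    return sorted(pool, key=lambda x: abs(prob_same(x) - 0.5))[:k]


def vote_entropy(votes: Sequence[bool]) -> float:
    """Entropy (in bits) of the committee's label distribution for one item."""
    n = len(votes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(votes).values())


def query_by_committee(
    pool: Sequence[X], committee: Sequence[Callable[[X], bool]], k: int
) -> list[X]:
    """Query by Committee: the k items on which the committee disagrees most."""
    return sorted(pool, key=lambda x: vote_entropy([m(x) for m in committee]), reverse=True)[:k]
```

In a simulation over the gold-standard corpus, the selected pairs would be labeled from the gold standard rather than by a human annotator. They would then be added to the training set, and the model (or committee) retrained before the next round.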

References

Fleischman, M. B. and Hovy, E. (2004). Multi-Document Person Name Resolution. 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Reference Resolution Workshop, Barcelona, Spain, July 2004.

Fleischman, M. and Hovy, E. (2002). Fine Grained Classification of Named Entities. 19th International Conference on Computational Linguistics (COLING), Taipei, Taiwan.

Fleischman, M. (2001). Automated Subcategorization of Named Entities. 39th Annual Meeting of the Association for Computational Linguistics (ACL), Student Research Workshop, Toulouse, France, July 2001.

Leidner, J. L. (2004, to appear). Towards a Reference Corpus for Automatic Toponym Resolution Evaluation. Proceedings of the Workshop on Geographic Information Retrieval held at the 27th Annual International ACM SIGIR Conference (SIGIR 2004), Sheffield, UK.

Li, X., Morie, P. and Roth, D. (2004). Robust Reading: Identification and Tracing of Ambiguous Names. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2004).

Ng, V. and Cardie, C. (2002). Combining Sample Selection and Error-Driven Pruning for Machine Learning of Coreference Rules. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Uryupina, O. (2003, to appear). Semi-Supervised Learning of Geographical Gazetteers from the Internet. Proceedings of the HLT-NAACL Workshop on the Analysis of Geographic References, Edmonton.