COLLATE (UdS)

Computational Linguistics and Language Technology for Real Life Applications

Research

Web Corpora for Information Management

We investigate the use of corpora with connectivity information (hyperlinks) for information management applications in specific domains. We will build a web corpus for the language technology domain, consisting of a database of documents (with a full-text index and meta-information) and a database of the hyperlinks between those documents. As a starting point for collecting the web corpus, we use the database of categorised web pages from LT-World. The information management applications we consider include summarisation, categorisation, clustering, information extraction (discovery of relations), information retrieval, terminology extraction, and definition mining.
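
To make the two-part structure described above concrete, here is a minimal in-memory sketch in Python of a document store with a full-text index next to a separate hyperlink table. It is an illustration only and not the project's actual implementation: the names `Document`, `WebCorpus`, `add_document`, `add_link`, `search`, `inlinks` and `outlinks` are hypothetical, as is the decision to keep only links whose source and target are both in the corpus.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    """One web page: URL, full text, and free-form meta-information."""
    url: str
    text: str
    meta: dict = field(default_factory=dict)  # e.g. {"category": "..."} (assumed shape)


class WebCorpus:
    """Toy corpus: a document store with a full-text index, plus a link table."""

    def __init__(self):
        self.documents = {}            # url -> Document
        self.index = defaultdict(set)  # lower-cased token -> set of urls
        self.links = set()             # (source_url, target_url) pairs

    def add_document(self, doc):
        # Store the document and index every word token it contains.
        self.documents[doc.url] = doc
        for token in re.findall(r"\w+", doc.text.lower()):
            self.index[token].add(doc.url)

    def add_link(self, source, target):
        # Assumption: only links between documents inside the corpus are kept.
        if source in self.documents and target in self.documents:
            self.links.add((source, target))

    def search(self, query):
        # Return the urls of documents that contain all query tokens.
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return set()
        result = set(self.index.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.index.get(token, set())
        return result

    def inlinks(self, url):
        # Documents that link to the given url.
        return {s for s, t in self.links if t == url}

    def outlinks(self, url):
        # Documents that the given url links to.
        return {t for s, t in self.links if s == url}
```

Keeping the links in their own table, rather than inside each document record, means connectivity queries such as `inlinks` can be answered without touching the document texts. Applications like clustering or categorisation could then combine both views, for example by running `search("information extraction")` and following `outlinks` from each hit.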

last change: 18th September 2002 by bering@coli.uni-sb.de