Using Distributed NLP to Bootstrap Semantic Representations from Web Resources


IGK 2004 Project


Proposers: Harry Halpin
Other interested students: Nuria Bertomeu, Yi Zhang
Suggested Lecturers/Guests: Ewan Klein, Hans Uszkoreit, Henry Thompson, Claire Grover, Richard Tobin, Regina Barzilay, Stephen Potter, Johanna Moore
Time constraints:

Description

This project would consist of bootstrapping description logics for the web pages of a domain of the group's choice (for example, the coli.uni-sb.de homepages or the collected web pages returned by a Google search) using loosely coupled, Web Service-based NLP components. For those who attended last year, it would be a successor to the project proposal sent in by Viktor Tron at the last IGK Summer School. The Web Services approach allows us to bundle NLP components in a distributed manner, and students would be encouraged to set up a collaborative virtual organization by hosting components on different computers in both Saarbrücken and Edinburgh. Methods for manually augmenting or automatically pruning the results of the bootstrapping procedure should also be explored, and techniques for extracting usable text from web pages will be needed.
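As an illustration of the text-extraction step only, here is a minimal sketch in Python using the standard library's html.parser. The class name VisibleTextExtractor, the function extract_text, and the choice of tags to skip are assumptions made for this example; the proposal does not prescribe any particular tool, and real web pages would likely need more robust boilerplate removal.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that lies outside non-content tags (hypothetical helper)."""

    # Assumption: these tags never carry text we want to analyse.
    SKIPPED_TAGS = {"script", "style", "noscript", "head"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside any skipped tag
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting plain text could then be passed to whichever NLP components the group chooses to host; fetching the pages and calling those services is deliberately left out of the sketch.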

The description logics, if successfully retrieved, can be used in an NLP task such as paraphrasing for content navigation through the use of natural language generation techniques. If time allows, this approach can be contrasted with a shallower approach, such as one using Named Entity recognition and multiple-sequence alignment. Possible interfaces between the description logic resource approach and larger-scale ontology efforts such as the Semantic Web, FrameNet and Cyc could also be explored. A number of NLP Web Services will be set up in Edinburgh beforehand to facilitate faster experimentation, although students should feel free to use whatever methods and tools they want.
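To give a rough feel for the shallow alternative, the sketch below aligns two sentences token by token and separates the shared backbone from the positions where they differ, which are candidate paraphrase slots. It is a deliberate simplification: it performs only pairwise alignment with Python's standard difflib module, not the multiple-sequence alignment over sentence clusters described by Barzilay and Lee, and the function name align_tokens and the placeholders ENTITY and CITY (standing in for the output of a Named Entity recognizer) are assumptions made for this example.

```python
from difflib import SequenceMatcher


def align_tokens(sentence_a, sentence_b):
    """Pairwise token alignment: return (shared spans, differing span pairs)."""
    a = sentence_a.split()
    b = sentence_b.split()
    # autojunk=False keeps difflib from discarding frequent tokens on long inputs.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    shared, variable = [], []
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op == "equal":
            shared.append(" ".join(a[a1:a2]))
        else:
            # replace / insert / delete: a slot where the two sentences differ
            variable.append((" ".join(a[a1:a2]), " ".join(b[b1:b2])))
    return shared, variable


shared, variable = align_tokens(
    "ENTITY was born in CITY",
    "ENTITY was raised in CITY",
)
print(shared)    # ['ENTITY was', 'in CITY']
print(variable)  # [('born', 'raised')]
```

Replacing named entities with placeholders before alignment lets sentences about different people or places line up, so that the remaining differences are more likely to reflect wording than content.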

An extended description is available upon request.

References

Grover, Halpin, Klein, et al. (2004) A Framework for Text Mining Services. UK E-science All-Hands Meeting (pre-print) http://www.ibiblio.org/hhalpin/ahm2004.pdf

Bos, Clark, Steedman, et al. (2004) Wide-Coverage Semantic Representations from a CCG Parser. In Proc. of COLING 2004 http://www.iccs.informatics.ed.ac.uk/~jbos/publications.html

Buitelaar, Olejnik, Hutanu, Schutz, et al. (2004) Towards Ontology Engineering Based on Linguistic Analysis. In Proc. of LREC 2004, Lisbon, Portugal, May 2004.

Barzilay, Lee (2003) Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment. In Proc. of NAACL-HLT 2003 http://www.sls.csail.mit.edu/~regina/