Computational Linguistics & Phonetics Fachrichtung 4.7 Universität des Saarlandes

Software Project "A Java Architecture for Knowledge-Base Population"

The goal of this software project is to develop a software architecture for the automatic extraction of knowledge-base relations from free text. This is a challenging task from both the modeling and the design perspective, and in this project we aim for an architecture that is maximally modular, robust, and extensible.
This task requires combining several techniques and tools that are fundamental to many applications in natural language processing: 1) indexing large collections of text for fast retrieval, 2) tagging, parsing and feature extraction, 3) using state-of-the-art machine learning techniques for prediction. A main focus will be the usability of the resulting tool and configurable workflow management for its components. We aim for good coding standards, and the project should result in either published open-source software or a web service.
Techniques and tools that participants will become familiar with during the project include Google Protocol Buffers, Lucene, the Stanford NLP tools, Weka or RapidMiner, and the Freebase knowledge base. Bachelor students must have successfully completed Java 2. Master students must have good Java programming skills to participate in this course.
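As a rough illustration of what a modular, configurable workflow could look like, the following minimal sketch chains named stages according to a configuration list. It is not the project's actual design: the class PipelineSketch, the Stage interface, the stage names, and the toy keyword rule are all hypothetical placeholders, and it uses only the Java standard library. In a real system the stages would wrap tools such as Lucene, a parser, and a trained classifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class PipelineSketch {

    /** A pipeline stage maps a list of text items to a new list of text items. */
    interface Stage extends UnaryOperator<List<String>> {}

    /** Hypothetical stage registry; every stage here is a toy stand-in. */
    static Map<String, Stage> registry(String queryEntity) {
        Map<String, Stage> stages = new HashMap<>();
        // Stand-in for index-based retrieval: keep sentences mentioning the query entity.
        stages.put("retrieve", in -> in.stream()
                .filter(s -> s.contains(queryEntity))
                .collect(Collectors.toList()));
        // Stand-in for tagging and feature extraction: lowercase and collapse whitespace.
        stages.put("normalize", in -> in.stream()
                .map(s -> s.toLowerCase().trim().replaceAll("\\s+", " "))
                .collect(Collectors.toList()));
        // Stand-in for prediction: a keyword rule instead of a trained model.
        stages.put("predict", in -> in.stream()
                .map(s -> (s.contains("born in") ? "place_of_birth" : "no_relation") + "\t" + s)
                .collect(Collectors.toList()));
        return stages;
    }

    /** Runs the stages named in the configuration, in order, on the input. */
    static List<String> run(List<String> config, Map<String, Stage> stages, List<String> input) {
        List<String> data = new ArrayList<>(input);
        for (String name : config) {
            Stage stage = stages.get(name);
            if (stage == null) {
                throw new IllegalArgumentException("Unknown stage: " + name);
            }
            data = stage.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
                "Ada Lovelace was born in London.",
                "London is a large city.",
                "Ada Lovelace wrote about the Analytical Engine.");
        // The workflow is just an ordered list of stage names, so it can be reconfigured
        // without touching the stage implementations.
        List<String> config = Arrays.asList("retrieve", "normalize", "predict");
        for (String line : run(config, registry("Ada Lovelace"), corpus)) {
            System.out.println(line);
        }
    }
}
```

Because the workflow is only a list of stage names, components can be swapped or reordered through configuration, which is one simple way to approach the modularity and extensibility goals stated above.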

Students interested in this course should contact
Benjamin.Roth@... or Michael.Wiegand@...
where ... is replaced by

The initial meeting is on Thursday, October 27, at 9:00 am in room U 15. Students interested in participating should send an email to the lecturers by Monday, October 24.

Introductory Literature:

  • H. Ji and R. Grishman. Knowledge base population: successful approaches and challenges. ACL 2011.
  • M. Mintz, S. Bills, R. Snow and D. Jurafsky. Distant supervision for relation extraction without labeled data. ACL 2009.