All natural language applications need to have some knowledge of how language operates, usually represented in a grammar. A computationally usable grammar states facts about a language in a grammar formalism - a kind of programming language specifically designed for linguistic information. It takes several years of well-trained specialists' labour to develop a grammar that is usable for applications.
Although there has been much progress in grammar formalisms in recent years, there is still a gap between the descriptions linguists use and the expressive means that a grammar formalism offers, so that linguistic concepts must be painstakingly encoded in the formalism. This makes it hard to re-use an existing grammar, since it is almost impossible to adapt it to new requirements.
The aim of the LRE-61-061 project "The Reusability of Grammatical Resources" is to reduce the gap between linguistic descriptions and computationally usable grammars by enriching the grammar formalism, drawing on the latest developments in computational linguistics and logic programming languages.
The implementation, including documentation, will be made freely available to the European scientific and business communities.
The project takes the Advanced Linguistic Engineering Platform (ALEP) as its starting point. ALEP is a state-of-the-art feature and unification-based linguistic formalism, which offers advanced text handling and version management facilities, as well as a configurable environment for developing and debugging grammars. ALEP was designed by the European Community project ET6/1 and implemented under the project ET9.
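For readers unfamiliar with the term, the core operation of a feature and unification-based formalism can be sketched in a few lines. The example below is only an illustration in Python and is not ALEP syntax or ALEP's implementation: it assumes feature structures are modelled as nested dictionaries, it ignores structure sharing (reentrancy) and typing, and the names `unify`, `UnificationFailure`, `noun` and `constraint` are hypothetical.

```python
# Illustrative sketch only: feature structures as nested dicts.
# This is NOT ALEP syntax; all names here are hypothetical.

class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting values."""


def unify(a, b):
    """Return the unification of two feature structures.

    Two dicts are merged feature by feature; atomic values unify
    only if they are equal. The inputs are left unmodified.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so that `a` is not mutated
        for feature, value in b.items():
            if feature in result:
                # Feature present on both sides: unify the two values.
                result[feature] = unify(result[feature], value)
            else:
                # Feature only in `b`: add it to the result.
                result[feature] = value
        return result
    if a == b:
        return a
    raise UnificationFailure(f"cannot unify {a!r} with {b!r}")


# A singular noun combined with a constraint demanding third person singular.
noun = {"cat": "n", "agr": {"num": "sg"}}
constraint = {"agr": {"num": "sg", "per": 3}}
merged = unify(noun, constraint)
```

Unifying two descriptions yields the most general structure that carries the information of both, and fails when they conflict, for example when one requires `num` to be `sg` and the other `pl`.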
The Reusability of Grammatical Resources (RGR) project aims to provide extensions to ALEP which support descriptions of natural languages as they are used in current linguistic theories. The project is divided into four phases, which largely follow the standard model for software research and development.
In the first phase, the project surveyed the datatypes that are being used in the most important current linguistic theories, namely Head-Driven Phrase Structure Grammar, Lexical Functional Grammar, Government-Binding Theory and Categorial Grammar.
The following datatypes were selected because they enjoy widespread use in linguistic descriptions, and because they offer additional expressive power over what is available in current formalisms.
In the second phase the main focus of the work was on formalisation of the above datatypes and operations and on the exact specification of the extensions. Some of the extensions have been implemented in this phase.
The major goal of this phase is the implementation of the extensions, according to the specifications, by July 1994.
Testing, evaluation and integration of the extensions into ALEP, as well as documentation, are the main tasks for this last phase. The final version of the extensions will be released at the end of this phase. The implemented extensions will be presented at an end-of-project workshop.
The project is supported by the Commission of the European Communities, as part of the Third RTD Framework "Telematic Systems in Areas of General Interest," area 6 "Linguistic Research and Engineering."
January 1993 to January 1995.
The results of each phase are reported in deliverables, and the progress is regularly evaluated by researchers from other institutions.
Deliverable A: Selection of Datatypes. July 1993
Deliverable B: Specification of Datatypes. January 1994
Deliverable C will be available in July 1994. Deliverable D, together with the extensions developed in the project, will be available in January 1995.
The results are also presented at major international conferences and specialised workshops.
A description of the project has appeared in Elsnews 3.1 (1994), the newsletter of the European Network in Language and Speech.
The project involves researchers from four leading research centres in Natural Language Processing:
Herbert Ruessink
Stichting Taaltechnologie
Trans 10
NL-3512 JK Utrecht
tel.: +31-(0)30-536369
fax: +31-(0)30-536000
email: Herbert.Ruessink@let.ruu.nl