ENVGRAM
COMPUTATIONAL ENVIRONMENTS FOR PRACTICAL GRAMMAR DEVELOPMENT,
PROCESSING AND INTEGRATION WITH OTHER NLP MODULES

Madrid, Spain, July 11 or 12, 1997 (in conjunction with ACL-97/EACL-97)

WORKSHOP DESCRIPTION

With a growing number of NLP applications going beyond the status of simple research systems, there is also a more evident need for better methods, tools and environments to support the development and reuse of large-scale linguistic resources and efficient processors. This new area of research, often referred to as Linguistic Engineering, is rapidly gaining interest alongside the more traditional ones concerned with formalisms or with the study and development of algorithms.

Aspects of linguistic engineering range from grammar development environments, through the construction and maintenance of large-scale linguistic resources, to methodologies for quality assurance and evaluation. Some of the most prominent examples of sophisticated development platforms comprising tracers, debuggers and a wide range of visualization tools are ALEP (funded by the European Union), GATE (a common infrastructure for building LE architectures from pre-existing components), GWB (an LFG workbench developed at Xerox PARC), PAGE (a grammar development environment based on typed feature logics, developed at DFKI), and many others. There have been a number of projects on the development of large-scale computational lexicons (e.g. Acquilex), as well as projects concerned with the development of standards and reference data for diagnostics and evaluation (e.g. TSNLP).

However, while these platforms and components typically provide fairly clean formalisms, processing components and data, it is not yet clear to what extent current results and approaches meet the requirements of large-scale development and deployment of real NLP applications.

In this connection, a number of pending issues need to be addressed, the relevance of which becomes particularly clear when the focus is shifted from linguistic formalisms to usability and user/application requirements. The following points are examples of relevant topics:

- What is the state of the art in Grammar Development Environments?

There are a number of systems on the market already. Given the enormous cost of developing such environments, it is unlikely that many others will be developed from scratch. To what extent do the existing systems meet actual user requirements? What experience has been gained in tailoring such systems to specific applications?

- How can we meet the demands arising from distributed grammar development?

Even though in the past the largest systems have often been based on the work of a single individual, it is unwise and impractical to have one large grammar developed by a single writer. Thus, the development and maintenance of large grammars tends more and more to be a joint effort involving many computational linguists. What specific requirements and prerequisites have to be met in a development environment to ensure smooth cooperation between different authors, leading to the necessary modularity, consistency and integrability of grammar fragments?

- How can we meet the demands of multi-lingual grammar development?

For many applications (even outside machine translation itself) multilinguality is becoming an indispensable standard feature. The parallel development of several grammars for different languages will require some synchronization of linguistic knowledge bases and sharing of processing components. Can different language-specific grammars share a common core grammar? Is it useful to build on modern formalisms that allow an object-oriented design (such as typed feature logics), or even on theories of a putative "universal grammar"?

- What is the appropriate division of labour in a large scale development environment?

Sophisticated applications may require a whole range of knowledge sources and processors, addressing, e.g., computational morphology, syntax, semantics, lexicography, corpus analysis, parsing and generation, to name but a few. What approaches and methods can be devised, and which tools and facilities should be employed, to facilitate and support the integration of different levels of linguistic abstraction and of different processing modules, as well as the cooperation between grammar writing and processor design?

- How can we facilitate the shift from reusability to usability?

Grammar development in academic and research-oriented environments has often concentrated on the maximum generality and reusability of the linguistic resources developed. However, for building actual applications and for applying systems to specific domains, this generality can turn out to be a drawback rather than an asset. Thus, the question is how one can support specialization and customization to more constrained domains without sacrificing the advantages of a more general and reusable design.

- What are the necessary ingredients for quality assurance in grammar development?

The incremental construction of large grammars, particularly in a distributed environment, makes it necessary to maintain sufficient control over different versions. Coverage and speed are expected to increase over the development cycles. Quality assurance, testing and diagnostics cannot be carried out properly if they are based on an odd collection of test items or some arbitrarily chosen corpus fragment. System evaluation, which goes even further, will require a minimum degree of standardization of reference material. What, then, are the appropriate methods and data to be applied for these purposes? How can they be constructed, collected and customized for specific applications and domains?

The workshop will be the occasion to discuss the results achieved and the most promising directions and to highlight pending problems. Contributions are solicited from institutions (both research-oriented and industrial) involved in the production of NLP applications.

INVITED SPEAKER

Hans Uszkoreit (DFKI) "Reference Data and Grammar Development Environments"

ORGANIZING COMMITTEE

Fabio Pianesi (Primary Contact), IRST, Italy (pianesi@irst.itc.it)
Dominique Estival, University of Melbourne, Australia (D.Estival@linguistics.unimelb.edu.au)
Alberto Lavelli, IRST, Italy (lavelli@irst.itc.it)
Klaus Netter, DFKI, Germany (netter@dfki.uni-sb.de)

PROGRAMME COMMITTEE

Harry Bunt, Tilburg University, The Netherlands
Bob Carpenter, Lucent Technologies Bell Labs, USA
Jochen Dorre, University of Stuttgart, Germany
Dominique Estival, University of Melbourne, Australia
Dan Flickinger, CSLI Stanford, USA
Klaus Netter, DFKI, Germany
Fabio Pianesi, IRST, Italy
Steven Pulman, SRI Cambridge, UK
Antonio Sanfilippo, Sharp, UK

PROGRAMME CHAIRS

Klaus Netter, DFKI, Germany
Fabio Pianesi, IRST, Italy