Computer processing of natural language is one of the prerequisites for advanced information processing systems. One of the main application areas for language technology is document preparation. High-level word processing technology has been developed mainly for English, and partly for other Western languages and Japanese, but is almost completely missing for Slavic languages. The objective of this project is to transfer state-of-the-art natural language technology to two of them: Czech (representing Western Slavic) and Bulgarian (representing Southern Slavic). The research in the project will be almost purely application-driven: its practical outcome will be prototypes of grammar checkers for Bulgarian and Czech.
To meet the main goal of the project, the relevant differences between the Slavic languages under investigation and the languages for which such processing technologies already exist (mainly English) will be explored where necessary. Above all, this concerns the interplay between syntax and word order, where the two groups of languages differ typologically.
The academic partner in Germany will transfer descriptive formalisms and processing methods for free word order languages to the academic partners in Bulgaria and the Czech Republic. These formalisms and methods, which have already been applied experimentally to small grammars of Bulgarian and Czech, will then be combined in a joint effort with the linguistic theories developed by the academic partners in Prague and Sofia. This includes an application-driven investigation of the role of free word order in syntax and semantics, and of other problems specific to Slavic languages.
The outcome of these efforts will be pilot implementations of grammar checkers for Bulgarian and Czech, which will be handed over to the industrial partners in Prague and Sofia. The industrial partners will use these pilot implementations, together with existing low-level language technology (dictionaries with morphology and spelling checkers), to develop prototypes of grammar checkers for Bulgarian and Czech.
Here we summarize the main tasks of each project site, in particular those it will coordinate. Note, however, that several tasks will involve close collaboration among the partners. For a detailed description of the work packages and the connections among them, please refer to Section B.6.
The goal of the project is to provide state-of-the-art high-level language technology for Slavic languages. In the context of this project, high-level language technology refers to technology that exploits syntactic and semantic linguistic knowledge, as opposed to low-level technology, which is based on the spelling system and on dictionaries with morphology.
Grammar checkers for Bulgarian and Czech have been selected as feasible applications. These applications will provide the focus of the proposed research and will also serve as a measure for evaluating the success of the project. Finally, they are a means of strengthening the connections between the academic research partners and the software industry.
Building a grammar checker as an industrial product would go far beyond the scope of a research project of the size proposed. It would also violate the precompetitive nature of the program. The goal is therefore to provide prototypes that can be used by the commercial partners as a starting point for product development.
Thus, the research in the project will be restricted to tasks that contribute directly to the grammar checker prototypes, although in the long run high-level language technology also involves adapting the methods of syntactic and semantic computer processing (parsing as well as generation) to the target languages. This adaptation is a necessary prerequisite for developing industrial systems that also include:
For its immediate goals, the project relies on the fact that the necessary low-level word processing technologies (use of national alphabets, morphological analysis and synthesis, spelling checkers, automatic dictionaries) have already been developed independently for the languages in question by the academic and commercial partners. The partners, in turn, have expressed a firm interest in extending their currently available industrial word processing systems in the directions mentioned above.
This project is being funded within the PECO framework of the Commission of the European Communities, with an overall Commission contribution of 429.999,99 ECU. Supervision of the project on the Commission's side has been assigned to DG XIII at its Brussels headquarters. The responsible project officer is Ms. Josephine Reimann-Pijls; the reviewers are Prof. Anna Sagvall-Hein and Prof. Gerard Kempen.
Sergio Balari-Ravera
Departament de Filologia Catalana
Facultat de Lletres, Edifici B
Universitat Autònoma de Barcelona
Campus de Bellaterra
08193 Bellaterra (Barcelona)
ilftg@cc.uab.es
Eva Hajicová
Institute of Formal and Applied Linguistics,
Faculty of Mathematics and Physics, Charles University,
Malostranske nam. 25,
CZ-118 00 Praha 1 - Mala Strana
hajicova@ufal.mff.cuni.cz
Gerard Kempen
University of Leiden
Cognitive Psychology
Pieter de la Court Building
Postbus 9555
NL-2300 Leiden
kempen@rulfsw.leidenuniv.nl
Pavel A.C. Novak
Macron Ltd.
Nad Petruskou 1
CZ-120 00 Praha 2 - Vinohrady
AC@macron.cz
Iordan Penchev
Institute of Bulgarian Language
Bulgarian Academy of Sciences
acad. G. Bonchev St. bl. 25A
BG1113 - Sofia
Bulgaria
jpen@bgearn.bitnet
Josephine Reimann-Pijls
EC Brussels
BU31 2/58
200, Rue de la Loi
B-1049 Bruxelles
jpi@dg13.cec.be
Anna Sagvall-Hein
Dept. of Linguistics
University Uppsala
Box 513
S-75120 Uppsala
UDUAS@mvs.udac.uu.se
Nikolai Savov
Bulgarian Business Systems Ltd.
Dragan Tsankov Blvd. 36
BG - 1057 Sofia
BULG.GM@AppleLink.Apple.COM
Hans Uszkoreit
Computational Linguistics
University of Saarland
P.O. Box 15 11 50
D-66041 Saarbrücken
Germany
uszkoreit@coli.uni-sb.de