Russian Resource Grammar 20110615

This is the first public release of the Russian Resource Grammar
(RRG), an HPSG grammar of Russian under development since 2009 at the
DFKI Language Technology Lab and Saarland University, Saarbrücken,
Germany. The main developers of the grammar are Tania Avgustinova and
Yi Zhang. The development was partially supported by the DFG-funded
Cluster of Excellence on "Multimodal Computing and Interaction" and by
Saarland University.

The grammar is released under the LGPL-LR, a version of the LGPL
adapted for linguistic resources. See license.txt for details.

The grammar aims for broad coverage and precise analysis of Russian
within the framework of HPSG. It also incorporates a general
cross-Slavic view, reflected in the three-layer design: matrix.tdl,
slavic.tdl and russian.tdl.

Publications relevant to this release of the grammar:

T. Avgustinova and Y. Zhang (2010) Conversion of a Russian dependency
treebank into HPSG derivations. Proceedings of the 9th International
Workshop on Treebanks and Linguistic Theories (TLT'9), Tartu, Estonia.

T. Avgustinova and Y. Zhang (2009) Exploiting the Russian National
Corpus in the Development of a Russian Resource Grammar. Adaptation of
Language Resources and Technology to New Domains at the RANLP 2009
Conference, Borovets, Bulgaria.

T. Avgustinova and Y. Zhang (2009) Parallel Grammar Engineering for
Slavic Languages. Workshop on Grammar Engineering Across Frameworks at
the ACL/IJCNLP 2009 Conference, Singapore.


The lexicon is written in Cyrillic and encoded in UTF-8. The LUI
interface cannot display Cyrillic text properly. Use
(lkb::lui-shutdown) to switch back to the LKB CLIM interface.

Morphology is currently handled by an external morphological analyzer
(mystem, http://company.yandex.ru/technology/mystem/) interfaced to
LKB through SPPP (http://wiki.delph-in.net/moin/LkbSppp). A Perl
integration script sends the surface words to mystem, maps the
morphological tags to the corresponding inflectional rules, and
packages the results in the SPPP XML format. The script also handles
the necessary encoding conversion, since mystem uses windows-1251
rather than UTF-8 for its I/O. The mapping between the mystem tags and
the inflectional lexical rules is defined in the file
mystem/mystem.mapping.
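
The two bookkeeping steps described above, encoding conversion and
tag-to-rule lookup, can be illustrated with a minimal sketch. It is
written in Python for brevity and is not the Perl script shipped with
the grammar. The tag strings and rule names are hypothetical
placeholders and do not reflect the actual contents of
mystem/mystem.mapping; calling mystem and producing the SPPP XML are
left out.

```python
# Illustrative sketch only. Tag strings and rule names are hypothetical
# placeholders, not the real entries of mystem/mystem.mapping.


def to_mystem_bytes(text: str) -> bytes:
    """Encode text as windows-1251, the encoding mystem expects on input."""
    return text.encode("cp1251")


def from_mystem_bytes(raw: bytes) -> str:
    """Decode windows-1251 bytes (as produced by mystem) back to text."""
    return raw.decode("cp1251")


# Hypothetical mapping from an analyzer tag to an inflectional rule name.
TAG_TO_RULE = {
    "S,nom,sg": "hypothetical-nom-sg-rule",
    "S,gen,sg": "hypothetical-gen-sg-rule",
}


def rules_for(tags):
    """Return rule names for the tags that have a mapping; skip the rest."""
    return [TAG_TO_RULE[t] for t in tags if t in TAG_TO_RULE]


word = "книга"
raw = to_mystem_bytes(word)
# windows-1251 is a single-byte encoding, so each Cyrillic letter takes
# one byte here, against two bytes per letter in UTF-8.
print(len(raw), len(word.encode("utf-8")))
print(from_mystem_bytes(raw) == word)
print(rules_for(["S,nom,sg", "unmapped-tag"]))
```

Skipping unmapped tags silently is a choice made for this sketch; a
real integration script may prefer to log them, since a missing
mapping means the parser never sees that analysis.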

The integration works for both interactive parsing with LKB and
profile processing in TSDB. However, because mystem and the SPPP
mechanism operate in one direction only (analysis), we cannot generate
inflected words at the moment.

In the long run, we hope to replace this module to allow for
bidirectional processing. In the meantime, mystem serves as an
interim solution.

While using mystem, we have noticed several problems with this
external morphological analyzer. For instance, numerous wrong analyses
are generated for the nominative pronouns я (I-nom) and он (he-nom).

For debugging purposes, one can print the variable
lkb::*sppp-input-buffer* at the LKB/CL prompt to see the actual
mystem analysis.



Last Modified: 2011-06-17
