Computational Linguistics & Phonetics Computational Linguistics & Phonetics Fachrichtung 4.7 Universität des Saarlandes

Computational Linguistics Colloquium

Thursday, 13 June, 16:15, Seminar Room, Building 17

Parameters of parsing complexity of free word-order languages

Vladislav Kubon and Martin Plátek
Center for Computational Linguistics
Charles University, Prague

The languages exhibiting relatively high degree of word-order freedom constitute a very interesting topic for the research concerning parsing complexity of natural languages. In our talk we would like to concentrate on a method allowing to measure the complexity of sentences from one possible point of view. This point of view concerns a very interesting notion relatively very frequent among those languages, the phenomenon of nonprojectivity.

It is quite clear that if there is a chance that a natural language sentence might contain one or more nonprojective constructions (e.g. long distance dependencies etc.), the parsing process faces a challenge. With the growing potential number of nonprojective constructions the parsing complexity (measured as a number of (sub)trees derived in the process of parsing) grows very fast. We have already shown that there are languages (e.g. Czech) allowing for theoretically infinite number of nonprojective constructions in a sentence.

In order to be able to parse sentences containing nonprojective constructions in a consistent and uniform way we have developed a special kind of a grammar, Free-Order Dependency Grammar. It allows to use the same kind of rules for parsing of both projective and non-projective constructions. The desired power for parsing nonprojective constructions is obtained by combination of relaxation and application of constraints on the word-order. This approach allows to gradually increase the power of FODG starting from context-free power to an unlimited number of grades of mildly context-sensitive tools.

Our approach takes advantage from the use of two special data types: DR-trees (delete-rewrite trees), describing the parsing history, and D-trees (dependency trees), describing the syntactic structure of each sentence. Both types of trees allow for the definition of nonprojectivity measures. These measures are then used for the parameterization of the parser.

If you would like to meet with the speakers, please contact Ivana Kruijff-Korbayova