Algorithms for Computational Linguistics

Abstract:
The name Computational Linguistics already suggests that this discipline comprises two related objects of research: natural language (NL) is studied and operational methods are developed. Both fields are investigated in their own right and divide into various topics. This course introduces a variety of NL phenomena together with appropriate implementations in the programming language Prolog. The topics dealt with include, among others, Morphology, Finite State Techniques, Syntax, Context Free Grammars, Parsing, and Semantics Construction.

Table of Contents

Finite State Automata
In this lecture we will
  1. introduce finite state automata and show how they can be used to recognize and generate (formal) languages,
  2. explain the difference between deterministic and non-deterministic automata,
  3. say something about the limitations of finite state methods,
  4. and finally, we will write a simple Prolog recognizer/generator for finite state automata.
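As a preview of what such a recognizer does, here is a minimal sketch in Python (the course's own implementation will be in Prolog). The toy "laughing machine" below, accepting ha!, haha!, hahaha!, and so on, is our own encoding for illustration, not code from the course.

```python
def recognize(trans, start, finals, symbols):
    """Deterministic FSA recognition: follow transitions, accept in a final state."""
    state = start
    for sym in symbols:
        if (state, sym) not in trans:
            return False              # no transition for this symbol: reject
        state = trans[(state, sym)]
    return state in finals

# toy automaton for laughter: ha!, haha!, hahaha!, ...
trans = {(1, 'h'): 2, (2, 'a'): 3, (3, 'h'): 2, (3, '!'): 4}
print(recognize(trans, 1, {4}, 'haha!'))  # True
print(recognize(trans, 1, {4}, 'ha'))     # False (not in a final state)
```

Because the transition table is just data, the same machine description can also drive generation: enumerate paths from the start state to a final state and output the symbols along the way.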

Finite State Parsers and Transducers
In this lecture, we will introduce finite state parsers and transducers. We will
  1. learn what the difference between a parser and a recognizer is,
  2. implement a finite state parser and a finite state transducer in Prolog, and
  3. see an important application of finite state transducers: morphology.
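A transducer differs from a recognizer only in that each transition also emits an output symbol. The following Python sketch (the course implements this in Prolog; the "ha to ho" machine is a toy assumption, not a real morphological analyzer) shows the idea:

```python
def transduce(trans, start, finals, symbols):
    """Run a deterministic finite state transducer; return None if input is rejected.
    Each transition maps (state, input_symbol) to (next_state, output_symbol)."""
    state, out = start, []
    for sym in symbols:
        if (state, sym) not in trans:
            return None
        state, o = trans[(state, sym)]
        out.append(o)
    return ''.join(out) if state in finals else None

# toy transducer turning laughter "haha!" into "hoho!"
trans = {(1, 'h'): (2, 'h'), (2, 'a'): (3, 'o'),
         (3, 'h'): (2, 'h'), (3, '!'): (4, '!')}
print(transduce(trans, 1, {4}, 'haha!'))  # hoho!
```

Morphological transducers work on the same principle, except that their tapes relate surface forms to lexical forms (e.g. a surface string to a stem plus features).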

Regular Languages
In this lecture we will
  1. introduce regular expressions, an alternative notation for specifying languages that are accepted by FSAs.
  2. look a bit closer at the relation between regular expressions and FSAs and provide an algorithm for constructing FSAs from regular expressions.
  3. learn about certain nice properties of regular languages (the languages characterized by regular expressions or FSAs) and see how to make use of these properties in an implementation.
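One of the nice closure properties mentioned above is closure under intersection, which the product construction makes concrete: run two automata in parallel on paired states. Here is a small Python sketch of that construction (the two example languages, "even number of a's" and "ends in b", are our own illustration):

```python
def product(t1, f1, t2, f2):
    """Product construction: intersection of two regular languages is regular.
    States of the product automaton are pairs of states of the inputs."""
    trans = {}
    for (q, a), q2 in t1.items():
        for (r, b), r2 in t2.items():
            if a == b:
                trans[((q, r), a)] = (q2, r2)
    return trans, {(q, r) for q in f1 for r in f2}

def accepts(trans, start, finals, word):
    state = start
    for sym in word:
        if (state, sym) not in trans:
            return False
        state = trans[(state, sym)]
    return state in finals

# L1: even number of a's.  L2: strings ending in b.
t1 = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}; f1 = {0}
t2 = {(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 0, (1, 'b'): 1}; f2 = {1}
t, f = product(t1, f1, t2, f2)
print(accepts(t, (0, 0), f, 'aab'))  # True: two a's, ends in b
print(accepts(t, (0, 0), f, 'ab'))   # False: odd number of a's
```

In an implementation, such constructions let us combine independently specified constraints into a single efficient automaton.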

Finite State Technologies for Dialogue Processing
To finally close the topic of finite state techniques in Computational Linguistics, we will have a look at another area where they are applied with great success (in addition to morphological processing, see » An Application: Morphology). This is the area of dialogue processing. Again we will see that finite state technology provides simple and efficient tools, but that these tools also have obvious limitations. Still, for a lot of applications simplicity and efficiency clearly make up for many limitations.

To give you some background on the topic, we will first discuss a set of general characteristics that a dialogue system should exhibit and give some example dialogues that illustrate why those characteristics are important. In the second part we will have a closer look at how to base dialogue systems on finite state automata (FSAs). We will look at a dialogue system for a voice-operated elevator as an example.
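The finite state view of dialogue can be sketched as a small state machine in which each state carries a system prompt and transitions are keyed to (already recognized) user inputs. The states, prompts, and inputs below are hypothetical illustrations in Python; the course's elevator system is voice-operated and more elaborate.

```python
# Each state maps to (system prompt, transitions on user input).
# A state with no outgoing transitions counts as final.
STATES = {
    'ask':     ('Which floor do you want?',       {'floor': 'confirm'}),
    'confirm': ('I understood a floor. Correct?', {'yes': 'move', 'no': 'ask'}),
    'move':    ('Going there now.',               {}),
}

def run_dialogue(inputs, state='ask'):
    """Drive the dialogue FSA over a scripted list of user inputs."""
    transcript = []
    for user in inputs:
        prompt, moves = STATES[state]
        transcript.append('SYS: ' + prompt)
        transcript.append('USR: ' + user)
        state = moves.get(user, state)        # unrecognized input: re-prompt
        if not STATES[state][1]:              # reached a final state
            transcript.append('SYS: ' + STATES[state][0])
            break
    return state, transcript

final, log = run_dialogue(['floor', 'no', 'floor', 'yes'])
print(final)   # move
```

The limitations show up immediately: the system can only react to inputs foreseen in the transition table, and all dialogue "memory" must be encoded in the states themselves.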

Context Free Grammars
This lecture has four goals:
  1. To recapitulate some basic concepts of Context Free Grammars.
  2. To review Prolog's built-in DCG mechanism and to show how Context Free Grammars can directly be formulated as DCGs.
  3. To show that the built-in feature-passing mechanism of DCGs enables them to handle long-distance dependencies. We first give a rather naive DCG for simple English relative clauses, and then show how the gap-threading technique can be used to improve it.
  4. To discuss the good and bad points of DCGs, and set the stage for our later work with grammars and features.
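To recapitulate the basic concepts, a context free grammar can be represented as plain data: a set of rules rewriting nonterminals, plus a lexicon for the terminals. The following Python sketch (the course uses Prolog DCGs; the toy grammar and lexicon here are our own assumptions) generates all sentences of a small CFG up to a depth bound:

```python
import itertools

# Toy CFG: nonterminal -> list of possible right-hand sides.
GRAMMAR = {
    'S':  [['NP', 'VP']],
    'NP': [['det', 'n']],
    'VP': [['v', 'NP'], ['v']],
}
LEX = {'det': ['the'], 'n': ['witch', 'wizard'], 'v': ['curses']}

def generate(symbol, depth=5):
    """Yield all terminal strings derivable from symbol within a depth bound."""
    if depth == 0:
        return
    if symbol in LEX:
        yield from ([w] for w in LEX[symbol])
    elif symbol in GRAMMAR:
        for rhs in GRAMMAR[symbol]:
            for parts in itertools.product(
                    *(list(generate(s, depth - 1)) for s in rhs)):
                yield [w for p in parts for w in p]

for s in generate('S'):
    print(' '.join(s))
```

A DCG formulation of the same grammar is more compact, and Prolog's built-in translation gives recognition and generation for free, which is one of the good points discussed in this lecture.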

Parsing: Bottom-Up and Top-Down
This lecture has two parts. In the first part, we
  1. explain the basic idea underlying naive bottom-up parsing strategies for CFGs.
  2. give a simple Prolog implementation of a naive bottom-up parser.
  3. discuss the limits of this work, both implementational, and algorithmic.
In the second part, we
  1. introduce two naive top-down parsing/recognition algorithms, one depth-first, the other breadth-first.
  2. give the top-down depth-first recognition algorithm an efficient implementation in Prolog.
  3. extend this recognizer to a parser.
Bottom-up and top-down parsing are in a way the most basic parsing strategies on the market. The more elaborate algorithms we will see later in this course can all be seen as improved versions of one of them (or a combination of both). Therefore you should make sure that you really understand how the algorithms presented here work, even though we sometimes stress their naivety.
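To give a taste of the second part, the naive top-down depth-first strategy fits in a few lines of Python (the course implements it in Prolog; the grammar and lexicon below are toy assumptions). The recognizer expands the leftmost symbol first and backtracks over rule choices:

```python
GRAMMAR = {'S': [['NP', 'VP']], 'NP': [['PN']], 'VP': [['IV']]}
LEX = {'PN': ['vincent', 'mia'], 'IV': ['died', 'snorted']}

def td_recognize(symbols, words):
    """Naive top-down depth-first recognition: expand leftmost symbol, match words.
    Note: a left-recursive rule (e.g. NP -> NP PP) would loop forever,
    one of the algorithmic limits discussed in this lecture."""
    if not symbols:
        return not words                      # success iff all input consumed
    first, rest = symbols[0], symbols[1:]
    if first in LEX:                          # preterminal: match next word
        return bool(words) and words[0] in LEX[first] \
            and td_recognize(rest, words[1:])
    return any(td_recognize(rhs + rest, words)   # try each rule, backtracking
               for rhs in GRAMMAR.get(first, []))

print(td_recognize(['S'], ['vincent', 'died']))  # True
```

A naive bottom-up recognizer works in the mirror image: it repeatedly replaces substrings matching a rule's right-hand side by the rule's left-hand side until only S remains.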

Left-Corner Parsing
In this lecture, we will introduce left-corner parsing, a strategy that combines bottom-up and top-down information.

Passive Chart Parsing
This lecture has three main goals:
  1. To explain the basic idea of chart parsing, and in particular, passive chart parsing.
  2. To give a concrete example of the (passive) use of a chart. We will examine a simple bottom-up chart recognition algorithm.
  3. To give a Prolog implementation of the recognition algorithm.
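The course's implementation will be in Prolog; as a language-neutral preview, here is a sketch of the same idea in Python. A passive chart stores only complete edges (i, j, Cat), meaning "category Cat spans positions i to j", and the algorithm closes the chart under the grammar rules (the tiny grammar is a toy assumption):

```python
def chart_recognize(words, rules, lex):
    """Naive passive bottom-up chart recognition: only complete edges are stored.
    An edge (i, j, cat) records that cat spans positions i..j."""
    chart = {(i, i + 1, c) for i, w in enumerate(words) for c in lex.get(w, [])}
    added = True
    while added:                       # close the chart under the rules
        added = False
        for lhs, rhs in rules:
            for i in range(len(words) + 1):
                ends = {i}
                for cat in rhs:        # walk the rhs along contiguous edges
                    ends = {b for j in ends for (a, b, c) in chart
                            if a == j and c == cat}
                for j in ends:
                    if (i, j, lhs) not in chart:
                        chart.add((i, j, lhs))
                        added = True
    return (0, len(words), 'S') in chart

rules = [('S', ['NP', 'VP']), ('NP', ['PN']), ('VP', ['IV'])]
lex = {'vincent': ['PN'], 'mia': ['PN'], 'died': ['IV']}
print(chart_recognize('vincent died'.split(), rules, lex))  # True
```

The key benefit of the chart is that every established constituent is recorded once and never recomputed, which avoids the repeated work of the naive parsers from the previous lectures.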

Active Chart Parsing
This lecture consists of two parts. In the first part, we will:
  1. Explain the basic ideas of active chart parsing: active edges, the fundamental rule, and the use of agendas.
  2. Present a simple bottom-up active chart recognition algorithm.
  3. Implement this algorithm in Prolog.
In the second part, we will:
  1. Adapt the general active chart parsing algorithm introduced in the first part to work top-down.
  2. Present an example of the top-down algorithm in action.
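The ingredients of the first part can be sketched compactly in Python (a simplification of the Prolog implementation to come; the toy grammar is an assumption). Active edges carry a list of categories still to be found, the fundamental rule combines an active edge with an adjacent passive one, and the agenda holds edges awaiting processing:

```python
from collections import deque

def active_chart_recognize(words, rules, lex):
    """Bottom-up active chart recognition with an agenda.
    An edge is (start, end, cat, todo); empty todo means the edge is passive."""
    chart, agenda = set(), deque()
    def propose(edge):
        if edge not in chart:
            agenda.append(edge)
    for i, w in enumerate(words):                 # lexical passive edges
        for c in lex.get(w, []):
            propose((i, i + 1, c, ()))
    while agenda:
        edge = agenda.popleft()
        if edge in chart:
            continue
        chart.add(edge)
        i, j, cat, todo = edge
        if not todo:                              # passive edge
            for lhs, rhs in rules:                # bottom-up rule
                if rhs[0] == cat:
                    propose((i, j, lhs, tuple(rhs[1:])))
            for (a, b, lhs2, t2) in list(chart):  # fundamental rule (passive part)
                if b == i and t2 and t2[0] == cat:
                    propose((a, j, lhs2, t2[1:]))
        else:                                     # active edge
            for (a, b, cat2, t2) in list(chart):  # fundamental rule (active part)
                if a == j and not t2 and cat2 == todo[0]:
                    propose((i, b, cat, todo[1:]))
    return (0, len(words), 'S', ()) in chart

rules = [('S', ['NP', 'VP']), ('NP', ['PN']), ('VP', ['IV'])]
lex = {'vincent': ['PN'], 'mia': ['PN'], 'died': ['IV']}
print(active_chart_recognize('vincent died'.split(), rules, lex))  # True
```

The top-down variant of the second part keeps the same chart and fundamental rule and changes only the prediction step: instead of proposing rules whose right-hand side starts with a completed category, it proposes rules expanding the categories that active edges are still looking for.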

Semantic Construction
In this chapter, we make the step from syntax to semantics - the study of meaning. The central question we're going to look at is the following: Given a sentence, how do we get to its meaning? And, being programmers, the next thing that interests us is: How can we automate this process? That is, we're going to look at the task of semantic or meaning construction. But don't be afraid, our work in the previous lectures wasn't for nothing. One of the first things we're going to see is that syntactic structure plays a crucial role in meaning construction.
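The guiding idea, that meaning is built up along the syntactic structure, can be previewed in a few lines of Python (the course develops this properly with lambda calculus in Prolog; the lexicon and the term representations below are hypothetical toys). Meanings are functions, and composition is just function application driven by the parse tree:

```python
# Toy compositional semantics: word meanings are values or functions,
# and the meaning of a phrase is obtained by applying one daughter's
# meaning to the other's.
lexicon = {
    'vincent': 'vincent',
    'mia': 'mia',
    'died': lambda x: ('die', x),
    'loves': lambda y: lambda x: ('love', x, y),
}

def meaning(tree):
    """tree is either a word or a pair of subtrees; apply function to argument."""
    if isinstance(tree, str):
        return lexicon[tree]
    f, a = meaning(tree[0]), meaning(tree[1])
    return f(a) if callable(f) else a(f)

# "mia loves vincent": VP = loves(vincent), then S = VP(mia)
print(meaning((('loves', 'vincent'), 'mia')))  # ('love', 'mia', 'vincent')
```

Note how the bracketing of the tree, i.e. the syntactic structure, determines which meaning applies to which: that is the sense in which our parsing work pays off here.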

Code Index
All Code from this Course.


Bibliography
Glossary