Finite State Parsers and Transducers

Abstract:
In this lecture, we will introduce finite state parsers and transducers. We will
  1. learn the difference between a parser and a recognizer,
  2. implement a finite state parser and a finite state transducer in Prolog, and
  3. see an important application of finite state transducers: morphology.

Table of Contents

Building Structure while Recognizing
In the previous chapter, we learned that finite state recognizers are machines that tell us whether a given input is accepted by some finite state automaton. We can give a word to a recognizer and the recognizer will say ``yes'' or ``no''. But often that's not enough: in addition to knowing that something is accepted by a certain FSA, we would like to have an explanation of why it was accepted. A finite state parser gives us that kind of explanation by returning the sequence of transitions that was made. This distinction between recognizers and parsers is a standard one: recognizers just say ``yes'' or ``no'' while parsers also give an analysis of the input. It applies not only to finite state machines, but to all kinds of machines that check whether some input belongs to a language, and we will make use of it throughout the course.
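To make this concrete, here is a minimal sketch of a finite state parser in Prolog. It assumes that an automaton is encoded by facts start/2, final/2, and trans/4, and it returns the sequence of states visited as its analysis; these predicate names and the small example automaton are assumptions chosen for illustration, not necessarily the encoding used later in this chapter.

  % parse(+Automaton, +Symbols, -Path): Path is the sequence of states
  % the automaton passes through while accepting Symbols.
  parse(A, Symbols, Path) :-
      start(A, State),
      parse(A, State, Symbols, Path).

  % No input left and we are in a final state: the path ends here.
  parse(A, State, [], [State]) :-
      final(A, State).

  % Otherwise take a transition consuming the next symbol and record
  % the current state in the path.
  parse(A, State, [Symbol|Symbols], [State|Path]) :-
      trans(A, State, Next, Symbol),
      parse(A, Next, Symbols, Path).

  % A small example automaton accepting the single word "ha!".
  start(laughter, 1).
  final(laughter, 4).
  trans(laughter, 1, 2, h).
  trans(laughter, 2, 3, a).
  trans(laughter, 3, 4, '!').

A query such as parse(laughter, [h, a, '!'], Path) then succeeds with Path = [1, 2, 3, 4], which is exactly the kind of explanation a bare recognizer cannot give.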

Finite State Transducers
We will now introduce finite state transducers (or FSTs): another kind of finite state machine, one that produces output and can thereby record the structure of the input.
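As a preview, a transducer can be programmed in much the same way as the parser sketch above, except that every transition now carries a pair of symbols: one read from the input tape and one written to the output tape. The In:Out labels, the predicate names, and the tiny example machine below are assumptions made for illustration only.

  % transduce(+FST, +Input, -Output): relate an input tape to an output tape.
  transduce(A, Input, Output) :-
      start(A, State),
      transduce(A, State, Input, Output).

  transduce(A, State, [], []) :-
      final(A, State).

  transduce(A, State, [In|Ins], [Out|Outs]) :-
      trans(A, State, Next, In:Out),
      transduce(A, Next, Ins, Outs).

  % Example: a one-state FST that copies as and turns every b into a c.
  start(a2c, 1).
  final(a2c, 1).
  trans(a2c, 1, 1, a:a).
  trans(a2c, 1, 1, b:c).

For instance, transduce(a2c, [a, b, a, b], Output) yields Output = [a, c, a, c]. Because the relation is declarative, this particular machine can also be run in the other direction to recover the input from the output.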

An Application: Morphology
Morphology is about the inner structure of words. It asks what the smallest meaning-bearing units in a word are and how they can be combined to form words.
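As a small taste of what such an analysis can look like, here is a toy Prolog sketch in the two-tape spirit of this chapter: it relates a lexical tape such as [c,a,t,'+',s] to the surface tape [c,a,t,s] by copying letters and deleting the morpheme boundary symbol '+'. The predicate names and the boundary symbol are assumptions for illustration, not the program developed in this chapter.

  letter(L) :-
      member(L, [a,b,c,d,e,f,g,h,i,j,k,l,m,
                 n,o,p,q,r,s,t,u,v,w,x,y,z]).

  % lex2surf(?LexicalTape, ?SurfaceTape)
  lex2surf([], []).
  lex2surf(['+'|Lex], Surf) :-        % delete the morpheme boundary
      lex2surf(Lex, Surf).
  lex2surf([L|Lex], [L|Surf]) :-      % copy ordinary letters unchanged
      letter(L),
      lex2surf(Lex, Surf).

The query lex2surf([c, a, t, '+', s], Surface) returns Surface = [c, a, t, s], and, run in the other direction, the same relation enumerates places where a morpheme boundary might sit.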

The Complete Programs
Code Summary for this Chapter.

Practical Session
Practical session.


Exercise

  1. Answer ``true'' or ``false'' to the following questions:
    1. It is possible to write a finite state automaton (FSA) to generate/recognize any formal language.
    2. It is not possible to write an FSA that generates/recognizes the formal language a^nb^n.
    3. It is not possible to write an FSA that generates/recognizes the formal language a^nb^m.
    4. We can generate more languages using non-deterministic finite automata than we can using deterministic finite automata.
    5. A finite state transducer (FST) is essentially an FSA that works with 2 (or more) tapes.
  2. Write an FST that transduces between strings of as (of length m) and strings consisting of a c, followed by m bs, followed by another c. For example, your FST should transduce between aaa and cbbbc. Write down your FST in both graphical and Prolog notation.
  3. Write an FST that, given a string of as, bs, and cs, deletes all the as, turns the bs to ds, and leaves the cs alone. For example, given the input abbaabcbca, your FST should return dddcdc. Write down your FST in both graphical and Prolog notation.
  4. Design an FST that transduces number names in written form between two languages you know (for all numbers from 0 to 100). For example, if you write a French-English FST, this should transduce between `ninety four' and `quatre vingt quatorze', and if you write a Dutch-German FST, this should transduce between `een en twintig' and `ein und zwanzig'. You have seen a similar transducer in the section What are Finite State Transducers?
  5. In Lecture 1, we claimed that the formal language a^nb^n was relevant to natural language, because natural languages contain various ``balanced'' syntactic constructions. Give an example from some natural language you are familiar with (German, Irish, Russian, French, English, Japanese, Chinese, Tagalog,...). Make sure you explain why your answer is relevant.