Finite State Automata

Abstract:
In this lecture we will

introduce finite state automata and show how they can be used to recognize and generate (formal) languages,
explain the difference between deterministic and non-deterministic automata,
say something about the limitations of finite state methods,
and finally, we will write a simple Prolog recognizer/generator for finite state automata.

Finite State Recognizers and Generators
This Section discusses what finite state automata are, what they can do, and their limitations.

Some Examples
More examples of Finite State Automata.

Deterministic vs. Non-deterministic FSAs
Deterministic vs. Non-deterministic FSAs.

FSAs in Prolog
Now we will see how to implement FSAs in Prolog. This is actually a misleading way to describe what we are going to do. For, although we have been talking about FSAs as machines, we are going to treat them as passive data structures that are manipulated by other programs.Separating declarative and procedural information is often a good idea. In many cases, it makes it easier to understand what is going on in a program, to maintain a program, and to reuse information. In the chapter on Context Free Grammars (» Context Free Grammars), we will learn more about why the machine oriented view is unattractive.

Finite State Methods in Computational Linguistics and their Limitations
FSAs have certain expressive weaknesses. Many linguistic phenomena can only be described by languages which cannot be generated by FSAs.

The Prolog Files
File listing of all Prolog files needed in this Chapter.

Practical Session
Make sure that you understand how the Prolog programs that we saw in this section work.

Exercises and Solutions
Practise!

Further Reading
Bibliographical references for this Chapter.

Exercise

Now, you are able to write your own automata. You can then use the recognize and generate predicates to test them. So, here is a task which is a bit more interesting: Write an FSA (First, as a graph and, then, in Prolog notation!) that recognizes English noun phrases constructed from the following words: a, the, witch, wizard, Harry, Ron, Hermione, broomstick, brave, fast, with, rat. E.g.
- the brave wizard
- the fast broomstick
- the wizard with the rat
- Ron
- ...

In the lecture, we represented a1 in Prolog as follows:


start(a1,1).


final(a1,4).


trans(a1,1,2,h).


trans(a1,2,3,a).


trans(a1,3,4,!).


trans(a1,3,2,h).

But we could also have represented it like this:


start(a1,1).


final(a1,4).


trans(a1,1,2,h).


trans(a1,2,3,a).


trans(a1,3,2,h).


trans(a1,3,4,!).

Does it make any difference which way we represent it?

Here's the code for recognize1 given in the lecture:


recognize(A,Node,[]) :-

    final(A,Node).

recognize(A,Node_1,String) :-

    trans(A,Node_1,Node_2,Label),

    traverse(Label,String,NewString),

    recognize(A,Node_2,NewString).

Suppose we changed it to this:


recognize(A,Node,[]) :-

    final(A,Node),!.

recognize(A,Node_1,String) :-

    trans(A,Node_1,Node_2,Label),

    traverse(Label,String,NewString),!,

    recognize(A,Node_2,NewString).

What effect would this change have? What sort of FSAs would not be affected by the change?

Write FSAs that generate the following languages (where e.g. by am we mean the result of repeating a m-times):
1. ambn, where m>3,n>2
2. amclbn, where m>1,l=2,n≥3

Finite State Automata

Table of Contents

Exercise