3.1.5 From Regular Expressions to FSAs

Proof, that there is a FSA for every regular expression.

We now prove one direction of the equivalence between regular expressions and FSAs: We will show that there is a FSA for every regular expression. The nice thing about this proof is that it is contructive - it proceeds by showing us what such an automaton will actually look like.

Our proof is by induction over the number of operators (that is, occurences of concatenation, and ) in a regular expression. We first show that there are automata for all possible singleton regular expressions. Then we assume that there are automata for the regular expressions with less than operators, and show that there is one for each regular expression with n operators.

Basis

We first have to look at regular expressions without any operators. This means, we are talking about , or for some . For we can use the following automaton:

Obviously, we can never get to the final state in this automaton, therefore no word is accepted.

For , the following automaton can be given:

In this automaton we are at the final state at the moment we begin our computation (and we cannot get out of it). Therefore the automaton accepts only ``input'' having no symbol (that is: the empty word).

We've already seen (in Section 3.1.2) the pattern of how to construct automata for for any .

Induction

Now let's turn to complex regular expressions, i.e. ones that do contain operators. More precisely, we assume that there are automata for the regular expressions with fewer than operators (with ), and take a regular expression with operators. We have to look at three cases.

Concatenation:
Union:
Kleene Closure:

In all three cases, and (where applicable) have operators. Therefore there are FSAs for them by our induction hypothesis. We now have to build an automaton for on the basis of these given automata. All we have to do is make a few additions according to the additional operator used in . We will basically apply the ideas already seen in Section 3.1.2.

Let's go through the cases one by one. We assume that is an automaton for , corresponds to , and that the states in are disjoint from those in (we can always achieve this by renaming):

Concatenation ():

We construct an automaton that consists of and in series. That is, will accept a word by running on the first part, then jumping to and running it on the second part of the word. will look like this:

The start state of is also the start state of , and the final state of is also the final state of . The final state of and the start state of have become ``normal'' states, but are now connected by a jump arc. Formally (where indices 1 and 2 indicate that an entity originally belongs to or ):

Union ():

We construct an automaton that consists of and in parallel. That is, will accept a word by jumping into either or , running it on the word, and then jumping to a new final state. will look like this:

The new start state is connected by jump arcs to the former start states of and . Thus it serves as the common start state for going through or . Similarly, the former final states of and are now both connected to a new final state by jump arcs. This new final state serves as the common final state for going through or . Formally, the new automaton looks like this:

Kleene Closure (:

We build a new automaton that ``embeds'' . In we can either bypass and jump directly from the start state to the final state, or loop by jumping into , going through , and then jumping back to the (former) start state of . After looping an arbitrary number of times on , we can jump to the final state. Thus accepts (via the bypassing jump arc) as well as all combinations of words that are accepted by (by looping on the -part of ). Here's the picture:

We had to add two states (the new start and final states and ), and four jump edges (one into , one out of , one to bypass it and one to loop). Thus this is the formal characterization of the new automaton :

So we've shown that for any regular expression there is an FSA that characterizes the same language. And in fact we've also shown how to construct such an automaton from any given regular expression. We just have to formulate a recursive procedure based on our inductive proof. We won't give an implementation that builds FSAs based on regular expressions here. But parts of the proof we've just seen will be very useful in Section 3.3, where we implement a tool for performing operations on given FSAs.