5.1.1 The Basics

Context Free Grammars (CFGs) describe a larger class of formal languages than regular languages do.Many natural language structures can be neatly described with context free grammars. Another reason why CFGs are widely used in the field of NLP is the availability of many efficient tools (compilers, parsers, etc.).

context free grammar s (CFGs) describe a larger class of formal languages than regular languages do. As we will see below, the language can in fact be described by a CFG. Even more, many natural language structures can be neatly described with context free grammars. This makes them especially useful for writing grammars for NLP applications. Another reason why CFGs are widely used in the field of NLP is the availability of many efficient tools (compilers, parsers, etc.).

Here's a simple CFG:

Some Terminology

The is called the rewrite arrow , and the four expressions listed above are called context free rule s (you may also see them called rewrite rule s, or something similar).

Apart from the , there are two sorts of symbols in this grammar: terminal symbol s and non-terminal symbol s. In the above grammar, the terminal symbols are `a' and `b', and the non-terminal symbols are `S', `A', and `B'.

Terminal symbols

never occur to the left of a rewrite arrow.

Non-terminal symbols

may occur either to the right or the left of the rewrite arrow (for example, the `S' in the above grammar occurs both to the right and the left of in the second rule).

Every context free grammar has one special symbol, the start symbol (or sentence symbol ), which is usually written `S'. Some context free grammars also use another special symbol, namely , for the empty string. The symbol can only occur on the right hand side of context free rules. For example, we could add the rule to the above grammar, but we could not add .

Rewrite Interpretation

The simplest interpretation of context free grammars is the rewrite interpretation . This views CFGs as rules which let us rewrite the start symbol to other strings: each rule is interpreted as saying that we can replace an occurrence of the symbol on the left side of the rule by the symbol(s) on the right side.

For example, the above grammar lets us rewrite `S' to `aabb':

S

ASB

Rule 2

aSB

Rule 3

aSb

Rule 4

aABb

Rule 1

aaBb

Rule 3

aabb

Rule 4

Such a sequence is called a derivation of the symbols in the last row. For example, the above sequence is a derivation of the string `aabb'. Note that there may be many derivations of the same string. For example,

S

ASB

Rule 2

ASb

Rule 4

aSb

Rule 3

aABb

Rule 1

aAbb

Rule 4

aabb

Rule 3

is another derivation of `aabb'.

Why are such grammars called `context free'? Because all rules contain only one symbol on the left hand side --- and wherever we see that symbol while doing a derivation, we are free to replace it with the stuff on the right hand side. That is, the `context' in which a symbol on the left hand side of a rule occurs is unimportant --- we can always use the rule to make the rewrite while doing a derivation. 1

The language generated by a context free grammar is the set of terminal symbols that can be derived starting from the start symbol `S'. For example, the above grammar generates the language (the language consisting of all strings consisting of a block of 'a's followed by a block of 'b's of equal length, except the empty string). And if we added the rule to this grammar we would generate the language .

A language is called context free if it is generated by some context free grammar. For example, the language is context free, and so is the language , for we have just seen that these can be produced by CFGs. Not all languages are context free. For example, is not. That is, no matter how hard you try to find CFG rules that generate this language, you won't succeed. No CFG can do the job.


1. There are more complex kinds of grammars, with more than one symbol on the left hand side of the rewrite arrow, in which the symbols to the right and left have to be taken into account before making a rewrite. Such grammars can be linguistically important, but we won't study them in this course.

Kristina Striegnitz, Patrick Blackburn, Katrin Erk, Stephan Walter, Aljoscha Burchardt and Dimitra Tsovaltzi
Version 1.2.5 (20030212)