6.6.1 The Code

An easy solution.

It is easy to implement a top-down depth-first recognizer in Prolog --- for this is the strategy Prolog itself uses in its search. Actually, it's not hard to implement a top-down breadth-first recognizer in Prolog either, though we are not going to discuss how to do that. As we said earlier, this implementation will be far better than that used in the naive bottom-up recognizer. This is not because because top-down algorithms are better than bottom-up ones, but simply because we are not going to use append/3. Instead we'll use difference lists.

Here's the main predicate, recognize_topdown/3. Note the operator declaration (we want to use our ---> notation we introduced last week).

:- op(700,xfx,--->).

recognize_topdown(Category,[Word|Reststring],Reststring) :-
        /*** Uncomment the following lines to see the steps the top
             down parser takes ***/
        %%% nl, write('String to recognize: '), write([Word|Reststring]),  
        %%% nl, write('Category to recognize: '), write(Category),
        lex(Word,Category).

recognize_topdown(Category,String,Reststring) :-
        Category ---> RHS,
        /*** Uncomment the following lines to see which steps the
             recognizer takes. ***/
        %%% nl, write('Rule: '), write((Category ---> RHS)),
        matches(RHS,String,Reststring).

Here Category is the category we want to recognize (s, np, vp, and so on). The second and third argument are a difference list representation of the string we are working with (you might find it helpful to think of the second argument as a pointer to the front of the list and the third argument Reststring as a pointer to the rest of the list).

The first clause deals with the case that Category is a preterminal that matches the category of the next word on the input string. That is: we've got a match and can remove Word from the string that is to be recognized.

The second clause deals with phrase structure rules. Note that we are using the CFG rules right-to-left: Category will be instantiated with something, so we look for rules with Category as a left-hand-side, and then we try to match the right-hand-side of these rules (that is, RHS) with the string.

Now for matches/3, the predicate which does all the work:

matches([],String,String).

matches([Category|Categories],String,RestString) :-
        recognize_topdown(Category,String,String1),
        matches(Categories,String1,RestString).

The first clause handles an empty list of symbols to be recognized. The string is returned unchanged. The second clause lets us match a non-empty list against the difference list. This works as follows. We want to see if String begins with strings belonging to the categories

[Category|Categories]

leaving behind RestString. So we see if String starts with a substring of category Category (the first item on the list). Then we recursively call matches to see whether what's left over (String1) starts with substrings belonging to the categories Categories leaving behind RestString. This is classic difference list code.

Finally, we can wrap this up in a driver predicate:

recognize_topdown(String) :-
        recognize_topdown(s,String,[]).

Now we're ready to play. We shall make use of the ourEng.pl grammar that we worked with before.


Kristina Striegnitz, Patrick Blackburn, Katrin Erk, Stephan Walter, Aljoscha Burchardt and Dimitra Tsovaltzi
Version 1.2.5 (20030212)