Stochastic realization with tree-adjoining grammars

Surface realization, the last step of the generation algorithm implemented in the game, has the task of computing, from a flat semantic representation, a string that can be shown to the user. The current implementation does this by converting the semantics and the TAG generation grammar into a dependency grammar and parsing an abstract "sentence" with this grammar.

This already works quite well. A major problem, however, is that the generation grammar typically allows many different verbalizations of the same semantics. The system currently picks one of them at random, which is not necessarily the "best" verbalization by any measure.

In this project, we want to enrich this realization algorithm with a stochastic component that allows for such a selection of a "best" verbalization. For the stochastic model, we want to explore ideas by Bangalore and Rambow (2000), who use an XTAG-annotated treebank in a system that computes best surface strings from partially specified derivation trees. On the computational side, we want to build upon the current implementation of the realizer as a dependency parser (Koller and Striegnitz, 2002). The stochastic data could be used as an oracle that guides the search that takes place during parsing, roughly following unpublished work by Thorsten Brants and Denys Duchier.
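To make the notion of a "best" verbalization concrete, here is a minimal sketch that ranks a handful of candidate strings with an add-one smoothed bigram language model. It is an illustration only: it is not the model of Bangalore and Rambow (2000), and it does not guide the search inside the dependency parser; it merely rescores candidates that have already been enumerated. The training corpus, the candidate list, and all function names are made up for this example.

```python
import math
from collections import Counter

def train_bigram_model(corpus):
    """Count unigrams and bigrams over sentences padded with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Counter returns 0 for unseen events, so smoothing keeps this finite.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

def best_verbalization(candidates, unigrams, bigrams):
    """Pick the highest-scoring candidate instead of a random one."""
    return max(candidates, key=lambda s: log_prob(s, unigrams, bigrams))

# Hypothetical toy data standing in for a treebank-derived corpus.
corpus = [
    "push the blue button",
    "take the red key",
    "open the red door",
    "push the green lever",
]
# Hypothetical verbalizations of the same semantics, as the grammar might allow.
candidates = [
    "the red button push",
    "push the button red",
    "push the red button",
]

unigrams, bigrams = train_bigram_model(corpus)
print(best_verbalization(candidates, unigrams, bigrams))
```

All candidates here have the same length, which sidesteps the usual bias of sentence-level probabilities towards shorter strings. A realistic setup would need some form of length normalization and, as proposed above, would apply the scores during parsing rather than after enumerating every verbalization.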

There are some obvious ideas for how these approaches could be combined. We will have to work out whether they really work and, if not, revise them. If we can obtain the XTAG treebank, we would also like to work with it, and we hope to get fairly close to an actual implementation of the whole approach.

We assume some basic familiarity with the concepts of dependency grammar and tree-adjoining grammars, but not technical expertise on any specific incarnation of either. If you can read the papers cited below without feeling terribly intimidated, we'd be delighted to have you in our group! Some additional expertise on stochastic models would be particularly welcome.

Literature:

Srinivas Bangalore and Owen Rambow (2000). Using TAG, a Tree Model, and a Language Model for Generation. In Proceedings of the Fifth Workshop on Tree Adjoining Grammars (TAG+ 5), Paris, France. http://www.research.att.com/~srini/Papers/Generation/TAG+2000.ps

Alexander Koller and Kristina Striegnitz (2002). Generation as Dependency Parsing. In Proceedings of the 40th ACL, Philadelphia, USA. https://www.coli.uni-saarland.de/~koller/papers/gen-dg.html

Contact:

Alexander Koller koller@coli.uni-sb.de