Adaptive Language Generation -- Software Project 2013

Adaptive Language Generation
Software Project Winter 2013/14

Course information
Goals
Mailing list and discussion forum
Requirements and evaluation
Development environment
Calendar
Syllabus/readings

The goal of this software project is to build a generation system that lets us control the linguistic complexity of the language it generates. This matters for dialogue systems that may be used in safety-critical environments such as driving a car.

In a first step, we will re-implement an existing data-driven dialogue generation system for German (in an application domain chosen by the course), paying particular attention to making it generate both more and less complex alternative formulations. In a second step, the generation system can be extended according to participants' interests (e.g. by adding a grammaticality filter, a classifier that distinguishes complex from less complex formulations, or new dialogue strategies in generation).

Good programming skills are required. A somewhat more detailed course description follows.


Spoken dialogue systems are increasingly deployed in real-time and mission-critical environments, including common but safety-sensitive tasks such as driving a car. In this software project course, we will jointly build a language generation system that can generate utterances at different levels of linguistic complexity. Such an adaptive natural language generation system is a missing but necessary component of future spoken dialogue systems that manage their users' cognitive workload. The project is motivated by ongoing research at Saarland University which finds that linguistic complexity affects driving performance.

The software project is already well defined and will be based on a state-of-the-art data-driven language generation system (Mairesse et al., 2010). We will first collect a small corpus of utterances from a target domain (chosen according to the interests of course participants). From this corpus of target texts, the data-driven generation approach will allow us to generate a large number of alternative formulations that convey the same message. The project will build heavily on existing tools such as the Graphical Models Toolkit (GMTK; Bilmes and Zweig, 2002) and SRILM (Stolcke et al., 2002). A particular focus, and the entirely new part of the project, will be a system that generates alternative formulations differing in linguistic complexity and degree of redundancy.
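To make the idea of "many alternative formulations for the same message" concrete, here is a toy sketch. It is not Mairesse et al.'s model, which learns phrase alignments and scores candidates with graphical models; it only shows the combinatorial side. The restaurant-domain concepts, the German phrases and the `realisations` helper are all hypothetical, hand-written for illustration; in the project the phrase alternatives would come from the collected corpus.

```python
from itertools import product

# Hypothetical phrase table (illustration only): each semantic concept maps to
# alternative German realisations. A data-driven system would learn these
# alternatives from the corpus instead of listing them by hand.
PHRASE_TABLE = {
    "name=Roma": ["Das Roma", "Das Restaurant Roma"],
    "food=Italian": ["serviert italienisches Essen", "bietet italienische Küche"],
    "area=centre": ["im Zentrum", "in der Innenstadt"],
}


def realisations(concepts):
    """Hypothetical helper: enumerate every surface string for an ordered list of concepts."""
    options = [PHRASE_TABLE[concept] for concept in concepts]
    # product() picks one phrase per concept, in every possible combination.
    return [" ".join(choice) for choice in product(*options)]


message = ["name=Roma", "food=Italian", "area=centre"]
for sentence in realisations(message):
    print(sentence)
```

With three concepts and two phrases each, this already yields eight formulations. The count grows multiplicatively with the number of concepts and alternatives, which is why a statistical model is needed to score and search over candidates rather than listing them all.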

Once we have re-implemented Mairesse's generation system for German in our domain, there are several ways to extend it; we will choose among them as a course, depending on interests and time. The possible options are that

we build a classifier on top of the generation system that automatically distinguishes simpler or more redundant formulations from more complex ones.
we build a grammaticality classifier that distinguishes grammatical from ungrammatical system outputs.
we explore ways to control information density by varying the parameters of the search procedure in the phrase-based generation system.
we introduce additional concepts such as "summary of information conveyed so far" and "introduction of new concept" and add them to the search space to control the level of redundancy.
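For the first option, a hand-written baseline can serve as a sanity check before a real classifier is trained. The sketch below is purely illustrative: the list of subordinating conjunctions, the `clause_weight` value and the helpers `complexity_score` and `rank_by_simplicity` are our own hypothetical choices, not features taken from the course or from Mairesse et al.

```python
# Hypothetical subset of German subordinating conjunctions, used as a crude
# signal that a sentence contains an embedded clause.
SUBORDINATORS = {"dass", "weil", "obwohl", "wenn", "nachdem", "während"}


def complexity_score(sentence, clause_weight=3.0):
    """Hypothetical proxy: token count plus a penalty per subordinator.

    clause_weight is an arbitrary illustrative value, not a tuned parameter.
    """
    tokens = sentence.lower().replace(",", " ").split()
    n_subordinate = sum(1 for token in tokens if token in SUBORDINATORS)
    return len(tokens) + clause_weight * n_subordinate


def rank_by_simplicity(candidates):
    """Order candidate formulations from lowest to highest proxy score."""
    return sorted(candidates, key=complexity_score)


candidates = [
    "Das Roma ist beliebt, weil es im Zentrum liegt.",
    "Das Roma liegt im Zentrum und ist beliebt.",
]
print(rank_by_simplicity(candidates)[0])  # Das Roma liegt im Zentrum und ist beliebt.
```

A learned classifier would replace the hand-set weight and word list with features and weights estimated from labelled examples.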

The generation system will be in German. The language of instruction will be English or German, depending on course participants. Students will come away with an understanding of how to compute the structural complexity of sentences and how to build and evaluate systems that are sensitive to it. Basic programming skills are required. Students will also acquire practical skills such as coordinating a research project and integrating disparate text-processing systems into a larger processing pipeline.
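One common way to quantify structural complexity, motivated by dependency locality accounts of sentence processing, is to measure how far words are from their syntactic heads. The sketch below computes mean dependency distance from a list of head indices. It assumes a dependency parse is already available (the analysis in the example is hand-written and hypothetical), and it is only one of several possible complexity measures, not necessarily the one the course will use.

```python
def mean_dependency_distance(heads):
    """Illustrative helper: average linear distance between tokens and their heads.

    heads[i] is the 1-based index of the head of token i + 1 (as in the CoNLL
    HEAD column); 0 marks the root, which has no head and is skipped.
    """
    distances = [abs((i + 1) - head) for i, head in enumerate(heads) if head != 0]
    return sum(distances) / len(distances) if distances else 0.0


# Hypothetical hand-written analysis of "Das Roma liegt im Zentrum":
# Das->Roma, Roma->liegt, liegt=root, im->liegt, Zentrum->im
print(mean_dependency_distance([2, 3, 0, 3, 4]))  # 1.0
```

Longer average distances indicate that more words intervene between dependents and their heads, which dependency locality accounts associate with higher processing cost.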

Date	Topic
21.10.2013	Introduction to the project; organizational details (Vera Demberg)
31.10.2013	Introduction to Mairesse et al. (2010) (Asad Sayeed)
7.11.2013	Cognitive workload and dialogue systems (Vera Demberg)

Syllabus/Readings

This list will be updated as the course proceeds; initially, the most important reference from the literature is:

F. Mairesse, M. Gašić, F. Jurčíček, S. Keizer, B. Thomson, K. Yu, and S. Young. Phrase-based Statistical Language Generation using Graphical Models and Active Learning. In Proceedings of ACL 2010, Uppsala, Sweden. Data is available here.

We will go through this paper in the first sessions of the course.

Last updated 13:35 17.10.13
