Adaptive Language Generation
Software Project Winter 2013/14

Course Information

Instructors
Dr. Vera Demberg and Dr. Asad Sayeed
Location
C7.4 Aquarium (3rd floor)
Meeting times
Thursdays 14:00-16:00
Contact
vera at/asayeed at coli d0t uni hyph3n saarland d0t de
Course language
English

Goals of the seminar

The goal of this software project is to build a generation system in which the linguistic complexity of the generated language can be controlled. This matters for dialogue systems that may be used in safety-critical environments such as driving a car.

In a first step, we will re-implement an existing data-driven dialogue system for German (in the application domain chosen by the course), paying particular attention to generating both more complex and less complex alternative formulations. In a second step, the generation system can be extended according to the participants' interests, e.g. by adding a grammaticality filter, a classifier that distinguishes complex from less complex formulations, or new dialogue strategies in generation.

Good programming skills are required. See also the somewhat more detailed course description below.

Spoken dialogue systems are increasingly deployed in real-time and mission-critical environments, including such common but safety-sensitive tasks as driving a car. In this software project course, we will together build a language generation system that can generate utterances at different levels of linguistic complexity. Such an adaptive natural language generation system is a missing but necessary component of a future spoken dialogue system that manages the cognitive workload of its users. This project is motivated by ongoing research here at Saarland University, which is finding that linguistic complexity has a relevant effect on driving performance.

The software project is already well-defined and will be based on a state-of-the-art data-driven language generation system (see the paper by Mairesse et al., 2010). We will first collect a small corpus of utterances from a target domain (chosen according to the interests of course participants). Based on this corpus of target texts, the data-driven generation approach will allow us to generate a large number of alternative formulations that convey the same message. The software project will build heavily on existing tools such as the Graphical Models Toolkit (GMTK; Bilmes and Zweig, 2002) and SRILM (Stolcke et al., 2002). A particular focus of the software project will be to build a system that generates alternative formulations which differ in language complexity and degree of redundancy (this part of the project is entirely new).

Once we have re-implemented Mairesse's generation system for our domain in German, there are several possible options for how to extend the system; we will make a choice with the course depending on interests and time. The possible options are that

  1. we build a classifier on top of the generation system which can automatically pick out simpler / more redundant formulations from more complex formulations.
  2. we build a grammaticality classifier which will allow us to distinguish between system outputs that are grammatical and those that are ungrammatical.
  3. we explore ways to control information density by varying the parameters in the search procedure of the phrase-based generation system.
  4. we introduce additional concepts like "summary of information conveyed so far" and "introduction of new concept" and add them to the search space, in order to control the level of redundancy.
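
To make the notion of information density in option 3 more concrete, here is a minimal sketch that ranks alternative formulations by their mean per-word surprisal under a toy bigram model. Everything in it is an assumption made for illustration: the three-sentence corpus, the add-one smoothing, and the function names `train_bigram` and `mean_surprisal` are hypothetical, and in the project itself the language model would more plausibly come from a tool such as SRILM.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigram contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])          # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def mean_surprisal(sentence, model):
    """Average surprisal in bits per token, using add-one smoothed bigrams."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1                        # one extra slot for unseen words
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + v)
        total += -math.log2(p)
    return total / (len(tokens) - 1)


# Toy corpus (illustrative only, not course data).
corpus = [
    "das restaurant ist billig",
    "das restaurant ist teuer",
    "das hotel ist billig",
]
model = train_bigram(corpus)

candidates = ["das restaurant ist billig", "billig ist hotel das"]
ranked = sorted(candidates, key=lambda s: mean_surprisal(s, model))
for s in ranked:
    print(f"{mean_surprisal(s, model):.2f} bits/token  {s}")
```

A formulation whose word sequence is well attested in the corpus receives a lower mean surprisal than an unattested one, so sorting by this score gives one crude way to order candidates from less to more dense.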

The generation system will be in German. The language of instruction will be either English or German, depending on course participants. Students will come away with an understanding of how to compute the structural complexity of sentences and how to build and evaluate systems that are sensitive to it. Basic programming skills are required. Students will also acquire practical skills, such as co-ordinating a research project and integrating disparate text-processing systems into a larger processing pipeline.
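
As one illustration of what "structural complexity" can mean, the sketch below computes two simple measures from a dependency analysis: the depth of the dependency tree and the mean distance between a word and its head. This is only an assumed example, not necessarily the measure used in the course; the function names are hypothetical and the head annotations for the two German sentences were written by hand for illustration (punctuation omitted).

```python
def dependency_depth(heads):
    """Maximum number of arcs from any token up to the root.

    heads[i] is the 1-based position of the head of token i+1; 0 marks the root.
    """
    def depth(i):
        d = 0
        while heads[i - 1] != 0:
            i = heads[i - 1]
            d += 1
        return d

    return max(depth(i) for i in range(1, len(heads) + 1))


def mean_dependency_length(heads):
    """Average linear distance between each non-root token and its head."""
    lengths = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return sum(lengths) / len(lengths) if lengths else 0.0


# "Das Restaurant ist billig" (hand-annotated heads, an assumption)
simple = [2, 3, 0, 3]
# "Das Restaurant das ich mag ist billig" (relative clause, hand-annotated)
embedded = [2, 6, 5, 5, 2, 0, 6]

for name, heads in [("simple", simple), ("embedded", embedded)]:
    print(name, dependency_depth(heads), mean_dependency_length(heads))
```

On these two annotations, the sentence with the relative clause comes out deeper and with longer dependencies than the simple one, which is the kind of contrast such measures are meant to capture.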

Mailing List

If you want to participate in the course, you have to sign up for the mailing list.

NEW! We have also established a discussion forum for the course where we can all discuss whatever technical questions you have, coordinate the team(s), etc. Please create an account, and we'll activate it for you.

Requirements and evaluation

Under construction. This is a software project course, so it will involve collectively building a system as described above.

Development environment

Everyone will be given an account on the COLI systems if they don't have one already. We will prepare an SVN repository and set up the relevant tools as the course progresses and provide instructions here and in class.

Calendar

There will be some amount of lecture, based on students' overall background. We'll proceed otherwise with regular meetings as the project progresses.


Date Topic
21.10.2013 Introduction to the project; organizational details (Vera Demberg)
31.10.2013 Introduction to Mairesse et al. (2010) (Asad Sayeed)
7.11.2013 Cognitive workload and dialogue systems (Vera Demberg)


Syllabus/Readings

This will be updated as the course proceeds, but initially, the most important reference from the literature is Mairesse et al. (2010), the paper on which the generation system is based.

We will explain it in the initial lectures for the course.


Last updated 13:35 17.10.13