Computational Linguistics & Phonetics, Fachrichtung 4.7, Universität des Saarlandes
ESSLLI'04: Modeling Information Structure for Computational Discourse Processing
ESSLLI 2004 - The 16th European Summer School in Logic, Language and Information

Modeling Information Structure for Computational Discourse Processing

Ivana Kruijff-Korbayová

Advanced course, Week 2, 11.00-12.30 (Slot 2)


Introduction

The goal of this course is to help students and researchers orient themselves in the large body of literature on information structure (IS) by providing a basic understanding of the various approaches and the notions they work with, and by surveying existing attempts to formalize IS and employ it in computational discourse modeling.

IS concerns the utterance-internal structural and semantic properties that reflect how an utterance relates to its discourse context. Among the dichotomies used to describe IS are Theme-Rheme, Topic-Comment, Topic-Focus, Background-Focus, Given-New and Contextually Bound-Nonbound.

Languages differ in how they employ the various means of realizing IS, such as intonation, word order, syntactic constructions and morphological marking. Modeling these phenomena and their interaction in the grammar requires an understanding of IS and its role in discourse. IS is therefore an important aspect of meaning at the interface between utterance and discourse, one that computational models of discourse processing should take into account.

Studying and modeling the interaction between discourse and IS is made difficult by proliferating and often under-formalized terminologies, particularly for IS. What is needed is further systematization of these diverse terminologies, more formalization, and more empirical and corpus-based studies.

For a more detailed introduction, see Course Reader: Chapter 1: Introduction.


Course Overview and Literature

For a more detailed list of references on Information Structure, see Course Reader: References.

  • Lecture 1: "Information Structure as an Inherent Aspect of Sentence Meaning".
    Motivation for IS-sensitivity in discourse and dialog processing. Basic notions of IS partitioning. The question test for IS. Means of IS realization. IS semantics. Meaning differences due to IS. IS-sensitive context update.
    Slides: [PDF]
    Literature:
    Course Reader: Chapter 1: Introduction, and Section 2.1: Two Dimensions of IS.
    Kruijff-Korbayova and Steedman: Discourse and information structure. JOLLI. 2003. [PDF (prepublication version)]
    Hajicova: Issues of sentence structure and discourse patterns. Chapter 2. 1993. [Available in the course reader][scans]
    Krifka: Focus and presupposition in dynamic interpretation. 1993. [Available in the course reader][scans]
    Kruijff-Korbayova and Webber: Information Structure and the Semantics of "otherwise". ESSLLI workshop 2001. [PDF]
    Vallduvi and Vilkuna: On rheme and kontrast. Syntax and Semantics, Vol. 29. 1998.
    Vallduvi and Engdahl: The linguistic realization of information packaging. Linguistics. 1996.

  • Lecture 2: "The Praguian Topic-Focus Articulation. Givenness/Familiarity/Salience. Application of IS to Salience Modeling in Analysis and Generation."
    IS partitioning based on the Prague School approach: Topic-Focus Articulation (TFA). Basic notions: dependency-based linguistic meaning, systemic ordering vs. communicative dynamism, contextual boundness/nonboundness, the stock of shared knowledge (SSK), salience/activation of entities in the SSK. Modeling salience w.r.t. TFA. Comparison of the TFA-based salience model with the Centering Theory approach. Salience/Activation/Familiarity/Givenness. Applications: IS-sensitive salience modeling for anaphora resolution/generation; IS-based control of target word order in machine translation.
    Slides: [PDF]
    Literature:
    Course Reader: Section 2.2: IS in the Prague School.
    Course Reader: Section 2.7: IS and Common Ground.
    Hajicova: Issues of sentence structure and discourse patterns. Chapters 2 and 3. 1993. [Available in the course reader][scans]
    Sgall et al.: The meaning of the sentence in its semantic and pragmatic aspects. 1986.
    Firbas: Functional Sentence Perspective in Written and Spoken Communication. 1992.
    Danes: On Prague School Functionalism in Linguistics (extract). 1995. [scans]
    Hajicova et al.: An automatic procedure for topic-focus identification. CL Journal 1995. [PDF]
    Hajicova et al.: Hierarchy of salience and discourse analysis and production. COLING 1990. [PDF]
    Hajicova et al.: Stock of shared knowledge - a tool for solving pronominal anaphora. COLING 1992. [PDF]
    Strube and Hahn: Functional centering: Grounding referential coherence in information structure. Jo. of CL. 1999. [PDF]
    Krahmer and Theune: Efficient Context-Sensitive Generation of Referring Expressions. 2002. [PS]
    Stys and Zemke: Incorporating Discourse Aspects in English-Polish MT: Towards Robust Implementation. 1995. [Zipped PS]
    Prince: Toward a taxonomy of given-new information. 1981. [PDF]
    Prince: The ZPG Letter: subjects, definiteness and information status. 1992. [PS]
    Gundel et al.: Cognitive status and the form of referring expressions in discourse. Language. 1993. [scans]
    Grosz et al.: Centering: A framework for modeling the local coherence of discourse. CL Journal 1995. [PDF]
    Walker et al. (eds.): Centering Theory in Discourse. 1998.
    Buranova et al.: Tagging of very large corpora: Topic-focus articulation. COLING 2000. [PDF]
    Hajicova and Sgall: Topic-Focus and Salience. ACL 2001. [PDF]

  • Lecture 3: "Vallduví's Information Packaging. File-Change Semantics of IS. Application of IS to Word Order Generation. Halliday's Thematic Structure vs. Information Structure. Danes' Thematic Sequences."
    Information packaging according to Vallduví: Ground (Link + Tail) vs. Focus. Interpretation of IS in terms of file-change instructions. Hoffman's applications: use of IS to control word order when generating answers to database queries; use of IS to control target word order in machine translation. Theme/Rheme vs. Given/New partitioning according to Halliday. The Theme-first principle and the thematic structure of texts. Thematic sequences.
    Slides: [PDF]
    Literature:
    Course Reader: Sections 2.3 and 2.4.
    Vallduvi: The dynamics of information packaging. 1994. [PDF]
    Hendriks and Dekker: Links without Locations. Amsterdam Colloquium. 1995. [scans]
    Hoffman: Integrating "free" word order syntax and information structure. EACL 1995. [PDF]
    Hoffman: Translating into free word order languages. COLING 1996. [PDF]
    Stys and Zemke: Incorporating Discourse Aspects in English-Polish MT: Towards Robust Implementation. 1995. [Zipped PS]
    Grosz et al.: Centering: A framework for modeling the local coherence of discourse. CL Journal 1995. [PDF]
    Halliday: Notes on transitivity and theme in English: Part 2. Jo. of Linguistics. 1967. [Available in the course reader][scans]
    Halliday: An Introduction to Functional Grammar. 1985.
    Kruijff-Korbayova et al.: Generation of contextually appropriate word order. 2002. [GZipped PS (prepublication version)]
    Danes: Functional sentence perspective and the organization of the text. 1974. [Available in the course reader][scans]

  • Lecture 4: "Steedman's Approach: Two Dimensions of IS. Application of IS to Intonation and to Multimodal Realization."
    Theme/Rheme and Background/Focus according to Steedman. Interpretation of IS based on alternative sets. Prevost's applications: use of IS to control the intonation of answers to questions; use of IS in monologue generation and to control its spoken realization; use of IS in text-to-speech synthesis. A similar application in the GoDiS dialog system, where IS controls the intonation of system output. Cassell et al.'s applications of IS to control multimodal realization, i.e., the correlation of IS with gesture and gaze.
    Slides: [PDF] (These slides are as presented; I still intend to correct them and fill in a couple of things.)
    Literature:
    Course Reader: Section 2.6.
    Steedman: Information structure and the syntax-phonology interface. Linguistic Inquiry. 2000. [GZipped PS (prepublication version)]
    Hirschberg and Pierrehumbert: The intonational structuring of discourse. ACL 1986. [PDF]
    Hirschberg: Pitch accent in context: Predicting intonational prominence from text. AI. 1993.
    Prevost and Steedman: Generating Contextually Appropriate Intonation. EACL 1993. [PDF]
    Prevost and Steedman: Specifying intonation from context for speech synthesis. Speech Communication. 1994.
    Prevost: An Information Structural Approach to Spoken Language Generation. ACL. 1996. [PDF]
    Hiyakumoto et al.: Semantic and Discourse Information for Text-to-Speech Intonation. ACL Workshop. 1997. [PDF]
    Kruijff-Korbayova et al.: Producing contextually appropriate intonation in an information-state based dialogue system. EACL 2003. [PDF] [Siridus project experiment website]
    Pelachaud et al.: Synthesizing cooperative conversation. 1998. [Soft link]
    Cassell et al.: Turn taking vs. discourse structure: How best to model multimodal conversation. 1999. [PDF]
    Cassell et al.: Coordination and context-dependence in the generation of embodied conversation. INLG 2000. [PDF]

  • Lecture 5: "Wrapping Up and Looking Out"
    Comparison of theories, aligning the different terminologies. How to test claims about IS? Empirical and corpus-based studies. Evaluation of practical applications in systems. Corpus annotation with IS and/or IS-relevant notions, e.g., annotation of TFA in the Prague Dependency Treebank project. Annotation of IS-relevant features at multiple levels in the MULI project. Annotation of familiarity status.
    Slides: [PDF]
    Literature:
    Course Reader: Chapter 2: Approaches to IS.
    Kruijff-Korbayova and Steedman: Discourse and information structure. JOLLI. 2003. [PDF (prepublication version)]
    Kruijff-Korbayova et al.: Producing contextually appropriate intonation in an information-state based dialogue system. EACL 2003. [PDF] [Siridus project experiment website]
    Buranova et al.: Tagging of very large corpora: Topic-focus articulation. COLING 2000. [PDF]
    Baumann et al.: Multi-dimensional annotation of linguistic corpora for investigating information structure. NAACL/HLT Workshop 2004. [Gzipped PS]
    Nissim et al.: An Annotation Scheme for Information Status in Dialog. LREC. 2004.
    Poesio: The MATE/GNOME Scheme for Anaphoric Annotation, Revisited. SIGDIAL. 2004. [PDF]
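Lectures 1 and 4 rely on the question test and on alternative sets. The following is a minimal toy sketch of question-answer congruence in the style of alternative semantics, under strong simplifying assumptions: propositions are plain strings built from a fixed frame "X ate Y", the role names and the alternative domain `ALTERNATIVES` are hypothetical, and a wh-question is encoded as the same frame with its wh-role left open. An answer passes the test only if it is a possible answer and the set of alternatives obtained by varying its focus equals the question's set of possible answers. It illustrates the general idea and is not Steedman's own formalization.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical contextual alternatives for each role of the frame "X ate Y".
ALTERNATIVES: Dict[str, Set[str]] = {
    "agent": {"Mary", "Fred"},
    "patient": {"the beans", "the potatoes"},
}


def render(frame: Dict[str, str]) -> str:
    """Spell out a filled frame as a proposition (propositions are strings here)."""
    return f"{frame['agent']} ate {frame['patient']}"


def alternatives(frame: Dict[str, str], role: str) -> FrozenSet[str]:
    """Replace the filler of `role` by each alternative, keeping the rest fixed."""
    return frozenset(render({**frame, role: alt}) for alt in ALTERNATIVES[role])


def congruent(question: Dict[str, str], wh_role: str,
              answer: Dict[str, str], focus_role: str) -> bool:
    """Question test: the answer must be a possible answer, and its focus
    alternatives must coincide with the question's set of possible answers."""
    possible_answers = alternatives(question, wh_role)
    return (render(answer) in possible_answers
            and alternatives(answer, focus_role) == possible_answers)


if __name__ == "__main__":
    question = {"agent": "Mary", "patient": "?"}  # "What did Mary eat?"
    answer = {"agent": "Mary", "patient": "the beans"}
    print(congruent(question, "patient", answer, "patient"))  # Mary ate [F the beans]
    print(congruent(question, "patient", answer, "agent"))    # [F Mary] ate the beans
```

The first call prints True and the second False: focusing "Mary" evokes alternatives about who ate the beans, which do not match a question about what Mary ate.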
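Lecture 2 compares the TFA-based salience model with Centering Theory (Grosz et al. 1995; Walker et al. 1998). For reference, the four-way transition classification commonly used in the centering literature (Continue, Retain, Smooth-Shift, Rough-Shift) can be sketched in Python. Assumptions: each utterance is already reduced to its ranked list of forward-looking centers (Cf, highest-ranked first), entities are plain string identifiers, and "realization" is string identity. The label NO-CB for an utterance that shares no entity with its predecessor is a convention chosen for this sketch.

```python
from typing import List, Optional


def backward_center(prev_cf: List[str], cf: List[str]) -> Optional[str]:
    """Cb(U_n): the highest-ranked element of Cf(U_{n-1}) that is realized in U_n."""
    realized = set(cf)
    for entity in prev_cf:  # Cf lists are ordered by rank, highest first
        if entity in realized:
            return entity
    return None


def classify(prev_cb: Optional[str], cb: Optional[str], cp: Optional[str]) -> str:
    """Classify the transition into U_n given Cb(U_{n-1}), Cb(U_n) and Cp(U_n)."""
    if cb is None:
        return "NO-CB"
    # An undefined Cb(U_{n-1}) counts as "same center" in the usual formulation.
    same_center = prev_cb is None or cb == prev_cb
    if same_center:
        return "CONTINUE" if cb == cp else "RETAIN"
    return "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"


def transitions(cf_lists: List[List[str]]) -> List[str]:
    """Label each pair of adjacent utterances in a discourse."""
    labels: List[str] = []
    prev_cb: Optional[str] = None
    for prev_cf, cf in zip(cf_lists, cf_lists[1:]):
        cb = backward_center(prev_cf, cf)
        cp = cf[0] if cf else None  # Cp: the highest-ranked forward-looking center
        labels.append(classify(prev_cb, cb, cp))
        prev_cb = cb
    return labels


if __name__ == "__main__":
    discourse = [
        ["john", "store"],  # John went to the store.
        ["john", "milk"],   # He bought milk.
        ["mary", "john"],   # Mary greeted him.
        ["mary"],           # She smiled.
    ]
    print(transitions(discourse))
```

For this discourse the sketch prints `['CONTINUE', 'RETAIN', 'SMOOTH-SHIFT']`. The Cf ranking itself (e.g. by grammatical function, or, as in Strube and Hahn's functional centering, by information status) is deliberately left to the caller.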
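Lecture 3 interprets Vallduví's information packaging as instructions for updating a file of cards: a link directs the hearer to the card where the information is to be entered (GOTO), the focus carries the update, and a tail points to a record already on that card which the focus completes or alters, so the update replaces rather than adds. The toy Python sketch below encodes the combinations of an optional GOTO with UPDATE-ADD or UPDATE-REPLACE under loudly simplifying assumptions: a card is a dictionary from record keys to values, each utterance contributes exactly one (key, value) record, and `Utterance` and `FileState` are hypothetical names, not part of any published implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Utterance:
    """A toy packaged utterance contributing one record (key, value) to a card."""
    key: str                    # hypothetical record key, e.g. "broccoli"
    value: str                  # new information for that record, e.g. "hates"
    link: Optional[str] = None  # card address; None means "stay on the current card"
    key_is_tail: bool = False   # True if the key is ground material (a tail)


@dataclass
class FileState:
    cards: Dict[str, Dict[str, str]] = field(default_factory=dict)
    current: Optional[str] = None  # locus of update for link-less utterances


def instruction(u: Utterance) -> str:
    """Name the file-change instruction that the packaging of `u` encodes."""
    update_kind = "UPDATE-REPLACE" if u.key_is_tail else "UPDATE-ADD"
    return f"GOTO({u.link}) + {update_kind}" if u.link else update_kind


def update(state: FileState, u: Utterance) -> str:
    """Execute the instruction on `state` and return its name."""
    if u.link is not None:
        state.current = u.link  # GOTO: move to (or open) the linked card
    if state.current is None:
        raise ValueError("no current card to update")
    card = state.cards.setdefault(state.current, {})
    if u.key_is_tail and u.key not in card:
        # A tail must identify a record that is already on the card.
        raise KeyError(f"tail {u.key!r} matches no record on card {state.current!r}")
    card[u.key] = u.value  # ADD a new record, or REPLACE an existing record's value
    return instruction(u)


if __name__ == "__main__":
    state = FileState()
    # link + focus: "The president [F likes broccoli]"
    print(update(state, Utterance("broccoli", "likes", link="president")))
    # focus + tail without a link: the update stays on the current card
    print(update(state, Utterance("broccoli", "hates", key_is_tail=True)))
    print(state.cards)
```

Running it prints `GOTO(president) + UPDATE-ADD`, then `UPDATE-REPLACE`, then `{'president': {'broccoli': 'hates'}}`. Treating a tail that matches no record as an error is a design choice of the sketch, meant to make the presupposition carried by the tail visible.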
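Lecture 3 closes with Danes' thematic sequences (thematic progression). Two of the patterns Danes (1974) describes can be recognized mechanically once each clause is given as a theme/rheme pair: simple linear progression, where the theme of a clause takes up (part of) the rheme of the preceding clause, and progression with a constant theme, where consecutive clauses share their theme. The Python sketch below assumes that themes and rhemes are already identified and represented as sets of referent identifiers (a hypothetical encoding). It labels every other pair, including derived themes, as "other", and prefers "linear" when both patterns apply.

```python
from typing import List, NamedTuple, Set


class Clause(NamedTuple):
    theme: Set[str]  # referents in the theme of the clause
    rheme: Set[str]  # referents in the rheme of the clause


def progression(clauses: List[Clause]) -> List[str]:
    """Label each pair of adjacent clauses with a thematic progression pattern."""
    labels: List[str] = []
    for prev, cur in zip(clauses, clauses[1:]):
        if cur.theme & prev.rheme:    # the theme picks up the previous rheme
            labels.append("linear")
        elif cur.theme & prev.theme:  # the theme is carried over unchanged
            labels.append("constant")
        else:
            labels.append("other")
    return labels


if __name__ == "__main__":
    text = [
        Clause({"house"}, {"hill"}),  # The house stands on a hill.
        Clause({"hill"}, {"vines"}),  # The hill is covered with vines.
        Clause({"hill"}, {"view"}),   # It also offers a fine view.
    ]
    print(progression(text))
```

For this three-clause text the sketch prints `['linear', 'constant']`.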

Exercises

(to be provided)


Additional links

(to be provided)