Seminar: Information Theoretic Approaches to the Study of Language
Summer 2015


Course Information

Course taught by Vera Demberg
Room: Seminar Room 2.11
Slots: Monday, 14-16
Contact: vera at coli ...


This seminar examines whether and how information-theoretic concepts can be used to explain a number of fundamental properties of language. Key concepts include surprisal, which measures the amount of information conveyed by a word in context, and entropy, which measures the degree of uncertainty about the rest of a sentence or utterance.
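To make surprisal and entropy concrete, here is a minimal Python sketch that computes both in bits: surprisal as -log2 P(w | context), and entropy as -Σ p log2 p over the possible continuations. The next-word distribution after the context "I drank a cup of" is hypothetical. Its probabilities are invented for illustration and are not estimated from any corpus or language model. For simplicity, the sketch computes entropy over the next word only. Uncertainty about the rest of a sentence would require summing over longer continuations.

```python
import math

# Hypothetical toy distribution P(next word | "I drank a cup of").
# The numbers are made up for illustration, not estimated from data.
next_word_probs = {
    "coffee": 0.5,
    "tea": 0.25,
    "water": 0.125,
    "soup": 0.125,
}


def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(prob)


def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy in bits of a distribution over possible continuations."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


if __name__ == "__main__":
    for word, p in next_word_probs.items():
        print(f"{word}: {surprisal(p):.2f} bits")
    print(f"entropy: {entropy(next_word_probs):.2f} bits")
```

Running the script prints 1.00 bits for "coffee", 2.00 bits for "tea", 3.00 bits each for "water" and "soup", and an entropy of 1.75 bits. The less predictable a word is in its context, the more information it conveys.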
The topics we will consider from this information-theoretic perspective are listed in the schedule and reading list below.

Evidence for the importance of information-theoretic concepts in communication has also influenced recent work on NLP applications, such as deciding when a dialog system should speak or backchannel.


All participants should have attended the lecture "Information Theory" or have equivalent background on the topic. Contact Vera Demberg if in doubt.

Every participant will prepare a 25-minute presentation on a paper. Two students present in each slot and should prepare their presentations together, so that the two presentations in a session are on a common topic and explicitly relate to one another. The task is not only to present the paper, but also to prepare a discussion of how the two papers relate to one another and what this tells us about information-theoretic approaches to the study of language.

Furthermore, participants are required to read at least one of the papers for each session and to email a summary of it before the meeting. After each meeting, every participant should fill in a peer review form.
MSc students can choose between a 4CP and 7CP version of the course. For the 7CP version, students additionally need to write a term paper. BSc students get 5CP for the course, and have to write a term paper.

Link to the peer review form, which you should fill in after each student presentation:
peer review form.


Date | Topic | Speaker
27.4. | Introduction and Organization | Vera Demberg
4.5. | Referring Expressions and UID | Jorrig Vogels and Katja Kravtchenko
    Tic Tac TOE: Effects of predictability and importance on acoustic prominence in language production
    Grammatical and Information-Structural Influences on Pronoun Production
    Refer efficiently: Use less informative expressions for more predictable meanings
11.5. | Applications of UID for HCI: Dethlefs et al., 2012 | Dave Howcroft
18.5. | Information Density / Surprisal and Salience | Alessandra Zarcone
    (1) Itti, L., & Baldi, P. (2009). Bayesian surprise attracts human attention. Vision Research, 49(10), 1295-1306.
    (2) Horstmann, G. (2015). The surprise–attention link: a review. Annals of the New York Academy of Sciences, 1339(1), 106-115.
    (3) Awh, E., Belopolsky, A. V., & Theeuwes, J. (2012). Top-down versus bottom-up attentional control: a failed theoretical dichotomy. Trends in Cognitive Sciences, 16(8), 437-443.
25.5. | Public Holiday (Whit Monday) |
1.6. | UID on Twitter | Anna Currey
8.6. | Smooth Signal Redundancy / Jurafsky | Valeria Lapina and Stefanie Lund
15.6. | UID and Information Processing / Channel Capacity | Eva Horch
22.6. | Channel and Audience Design | Robin Lemke
29.6. | No meeting (SFB retreat) |
6.7. | Jaeger and Snider 2013 | Reading Group
13.7. | Clark, Andy. "Whatever next? Predictive brains, situated agents, and the future of cognitive science." Behavioral and Brain Sciences 36.03 (2013) | Reading Group
20.7. | UID vs. Paradigmatic effects | Reading Group
27.7. | Final discussion and wrap-up | Vera Demberg


The papers mentioned here are some pointers to get you started. You can choose one of these papers, but you're also free to suggest other papers. Please contact me if you would like to present something that's not on this list.

UID and Information Processing / Channel Capacity

Van Egmond, Marjolein, Lizet Van Ewijk, and Sergey Avrutin. "A New Theoretical Model for Word-Finding Difficulties in Aphasia." Procedia-Social and Behavioral Sciences 23 (2011): 175-176.

Lizet van Ewijk and Sergey Avrutin. 2010. Article Omission in Dutch Children with SLI: A Processing Approach.

Joke De Lange, Nada Vasic, and Sergey Avrutin. Reading between the (head)lines: A processing account of article omissions in newspaper headlines and child speech.

Channel and Audience Design / Effects of Production Difficulty vs. Optimal Communication

Pate, John K., and Sharon Goldwater. "Talkers account for listener and channel characteristics to communicate efficiently." Journal of Memory and Language 78 (2015): 1-17.

Kurumada, C. & Jaeger, T.F. (2013). Communicatively efficient language production and case-marker omission in Japanese. The 35th Annual Meeting of the Cognitive Science Society (CogSci13). Berlin, Germany. August, 2013.

UID applied to Twitter

Gabriel Doyle and Michael C Frank. 2015. Audience size and noise levels modulate information density in Twitter conversations. Proceedings of CMCL.

Gabriel Doyle and Michael C Frank. 2015. Shared common ground influences information density in microblog texts. Proceedings of NAACL-HLT.

The smooth signal redundancy hypothesis and the probabilistic reduction hypothesis

Jurafsky, Daniel, et al. "Probabilistic relations between words: Evidence from reduction in lexical production." Typological studies in language 45 (2001): 229-254.

Aylett, M. and Turk, A. (2004). The smooth signal redundancy hypothesis: a functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech, 47:31–56.

Pellegrino, François, Christophe Coupé, and Egidio Marsico. "Across-language perspective on speech information rate." Language 87.3 (2011): 539-558.

Aylett, M. and Turk, A. (2006). Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. Journal of the Acoustical Society of America, 119(1):30–48.

UID and the Lexicon

Michael Ramscar, Richard Futrell, Dan Jurafsky, Melody Dye. Der Pfauenschwanz's Tale: The Evolution of Noun Classification in Two "Awful" Germanic Languages. (unpublished, will be sent out by mail).

Tabak, Wieke, Robert Schreuder, and R. Harald Baayen. "Lexical statistics and lexical processing: semantic density, information complexity, sex, and irregularity in Dutch." Linguistic evidence—Empirical, theoretical, and computational perspectives (2005): 529-555.

UID and Morphology

Fermín Moscoso del Prado Martín, Aleksandar Kostić, R. Harald Baayen. Putting the bits together: an information theoretical perspective on morphological processing.

V Kuperman, M Pluymaekers, M Ernestus, H Baayen. (2007). Morphological predictability and acoustic duration of interfixes in Dutch compounds. The Journal of the Acoustical Society of America 121, 2261

Tily, H., & Kuperman, V. (2012). Rational phonological lengthening in spoken Dutch. Journal of the Acoustical Society of America, 132, 3935–3940.

UID effects across different linguistic levels

Temperley, David, and Daniel Gildea. "Information Density and Syntactic Repetition." Cognitive Science (2015).

Vera Demberg, Asad Sayeed, Philip Gorinski and Nikolaos Engonopoulos. "Syntactic surprisal affects spoken word duration in conversational contexts." Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 2012.

How online is the adaptation effect?

Asad Sayeed, Stefan Fischer and Vera Demberg. Vector-space calculation of semantic surprisal for predicting word pronunciation duration. ACL 2015

Scott Seyfarth. Word informativity influences acoustic duration: Effects of contextual predictability on lexical representation.

UID and discourse relations

Fatemeh Torabi Asr and Vera Demberg. "Implicitness of Discourse Relations." In Proceedings of COLING 2012, Mumbai, India, Dec 2012.

Fatemeh Torabi Asr and Vera Demberg. Uniform Information Density at the Level of Discourse Relations: Negation Markers and Discourse Connective Omission. In Proceedings of the 11th International Conference on Computational Semantics (IWCS 2015).

Referring expressions

H. Tily and ST Piantadosi. (2009) Refer efficiently: Use less informative expressions for more predictable meanings. In Proceedings of the workshop on the production of referring expressions: Bridging the gap between computational and empirical approaches to reference.

Rohde, Hannah, and Andrew Kehler. "Grammatical and information-structural influences on pronoun production." Language, Cognition and Neuroscience 29.8 (2014): 912-927.

Information Density and Word Order

Gibson, E., Piantadosi, S., Brink, K., Bergen, L., Lim, E. & Saxe, R. (2013). A noisy-channel account of cross-linguistic word order variation. Psychological Science, 24(7):1079-88. doi: 10.1177

Maurits, L., Perfors, A., and Navarro, D. (2010). Why are some word orders more common than others? A uniform information density account. In Advances in Neural Information Processing Systems 23, pages 1585–1593, Cambridge, MA. MIT Press.

Information Density and learning

Fine AB, Jaeger TF, Farmer TA, Qian T (2013) Rapid Expectation Adaptation during Syntactic Comprehension. PLoS ONE 8(10): e77661. doi:10.1371/journal.pone.0077661

Fedzechkina, Maryia, T. Florian Jaeger, and Elissa L. Newport. "Language learners restructure their input to facilitate efficient communication." Proceedings of the National Academy of Sciences 109.44 (2012): 17897-17902.

Counterevidence? Paradigmatic effects

V Kuperman, M Pluymaekers, M Ernestus, H Baayen. (2007). Morphological predictability and acoustic duration of interfixes in Dutch compounds. The Journal of the Acoustical Society of America 121, 2261

Milin, Petar, et al. "Paradigms bit by bit: An information theoretic approach to the processing of paradigmatic structure in inflection and derivation." Analogy in grammar: Form and acquisition (2009): 214-252.

Clara Cohen. Probabilistic reduction and probabilistic enhancement.

Surprisal and Uncertainty / Noisy Channel

Levy, Roger. "Integrating surprisal and uncertain-input models in online sentence comprehension: formal techniques and empirical results." Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 2011.

Gibson, E., Bergen, L. & Piantadosi, S. (2013). The rational integration of noisy evidence and prior semantic expectations in sentence interpretation. Proceedings of the National Academy of Sciences, doi:10.1073/pnas.1216438110

Applications of UID for generation

Dethlefs, Nina, et al. "Optimising incremental dialogue decisions using information density for interactive systems." Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 2012.

Rajkumar, Rajakrishnan, and Michael White. "Linguistically motivated complementizer choice in surface realization." Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop. Association for Computational Linguistics, 2011.

Information theory and puns

Justine T. Kao, Roger Levy and Noah D. Goodman. In press. A Computational Model of Linguistic Humor in Puns. Cognitive Science.

Eye movements in reading

Levy, R., Bicknell, K., Slattery, T., & Rayner, K. (2009). Eye movement evidence that readers maintain and act on uncertainty about past linguistic input. Proceedings of the National Academy of Sciences, 106(50), 21086-21090.

Bicknell, K., & Levy, R. (2010). A rational model of eye movement control in reading. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 1168-1178). Association for Computational Linguistics.

Klinton Bicknell, Roger Levy. (2012). Word predictability and frequency effects in a rational model of reading. Proceedings of the 34th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.