Lecture: Information Theory
Summer 2017


Course Information

Course taught by Matt Crocker and Vera Demberg 

Room and slots: Room -1.05 in C7.2 (basement), Mondays 8:30-10:00 and Wednesdays 8:30-10:00 during the first half of the semester (until June 7th), plus one additional meeting on June 26th for poster presentations. See the calendar below for details.

Contact: crocker / vera at coli ...

Please subscribe to our course mailing list. 


This course will cover the mathematical basis of information theory, and then proceed to information theoretic approaches to the study of language, with respect to language comprehension, language production and language evolution. We will also discuss methodologies for testing hypotheses related to these information-theoretic concepts. 

The course will include tutorials where we look at ways to estimate surprisal from text corpora, in order to test for effects of surprisal or uniform information density.
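As a rough illustration of what estimating surprisal from a corpus can look like, the sketch below computes per-word surprisal, -log2 P(word | previous word), from a bigram model with add-alpha smoothing. This is not the tutorial material itself: the toy corpus, the function names, and the choice of a smoothed bigram model are all assumptions made for the example. The tutorials may well use other language models and real corpora.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count contexts and bigrams in whitespace-tokenised sentences (hypothetical helper)."""
    contexts = Counter()   # how often each word occurs as a left context
    bigrams = Counter()    # how often each (previous word, word) pair occurs
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()   # "<s>" marks the sentence start
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def surprisal(prev, word, contexts, bigrams, vocab, alpha=1.0):
    """Surprisal in bits of `word` given `prev`, with add-alpha smoothing."""
    p = (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * len(vocab))
    return -math.log2(p)


def sentence_surprisals(sentence, contexts, bigrams, vocab, alpha=1.0):
    """Per-word surprisal profile of a sentence, e.g. for inspecting information density."""
    tokens = ["<s>"] + sentence.split()
    return [
        (word, surprisal(prev, word, contexts, bigrams, vocab, alpha))
        for prev, word in zip(tokens, tokens[1:])
    ]


# Toy corpus, invented for illustration only.
corpus = ["the dog barks", "the dog sleeps", "the cat sleeps"]
contexts, bigrams, vocab = train_bigram_counts(corpus)

for word, bits in sentence_surprisals("the cat barks", contexts, bigrams, vocab):
    print(f"{word}\t{bits:.2f} bits")
```

In this toy model, a word that follows its context often in the corpus gets low surprisal, and an unseen continuation gets high surprisal. A per-word profile like the one printed here is the kind of quantity that tests of surprisal effects or of uniform information density relate to behavioural measures.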


Each student needs to attend the meetings and participate in discussions. Grades will be determined based on a poster project. Posters will be presented at the last meeting on June 26th.

Students will form groups of two or three to prepare a poster. Each group should prepare a research proposal on a linguistic phenomenon that could be investigated using surprisal or the UID hypothesis. In particular, students should propose which research method to use for tackling the question, what results they would expect, and what it would mean to find results different from the expected ones. (The research does not actually have to be carried out; the posters are about application ideas / proposals.)

Poster templates: LaTeX poster, PPT poster






Basics of Information Theory

Matt Crocker


Basics (continued)

Matt Crocker


Tutorial on language models

Clayton Greenberg, in Room 2.11!


Is human language a good code?

Vera Demberg


Is human language a good code? (continued)

Vera Demberg



Matt Crocker 



Clayton Greenberg


Surprisal, Entropy and Entropy Reduction

Vera Demberg 


Tutorial and data

Jesus Calvillo 


Uniform Information Density

Matt Crocker 


Uniform Information Density -- Limits

Vera Demberg 


Tutorial and data

Clayton Greenberg 


Poster presentations



Information Density and Word Length 

Piantadosi, S., Tily, H., and Gibson, E. (2011). Word lengths are optimized for efficient communication. Proceedings of the National Academy of Sciences, 108(9):3526.

Mahowald, K., Fedorenko, E., Piantadosi, S. T., and Gibson, E. (2013). Info/information theory: Speakers choose shorter words in predictive contexts. Cognition, 126:313–318.


Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies, NAACL ’01, pages 1–8, Stroudsburg, PA, USA. Association for Computational Linguistics.

Levy, R. (2008). Expectation-based syntactic comprehension. Cognition, 106(3):1126–1177. doi:10.1016/j.cognition.2007.05.006.

Evidence for surprisal 

Demberg, V. and Keller, F. (2008). Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 109:193–210.

Frank, S., Otten, L., Galli, G., and Vigliocco, G. (2013). Word surprisal predicts N400 amplitude during reading. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 878–883. Association for Computational Linguistics.


Hale, J. (2003). The information conveyed by words in sentences. Journal of Psycholinguistic Research, 32(2):101–123.

Genzel, D. and Charniak, E. (2002). Entropy rate constancy in text. In Proceedings of the 40th meeting of the Association for Computational Linguistics, ACL ’02, pages 199–206. Association for Computational Linguistics.

Experimental evaluation of surprisal and entropy 

Roark, B., Bachrach, A., Cardenas, C., and Pallier, C. (2009). Deriving lexical and syntactic expectation-based measures for psycholinguistic modeling via incremental top-down parsing. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 324–333. Association for Computational Linguistics.

Linzen, T. and Jaeger, T. F. (2014). Investigating the role of entropy in sentence processing. In Proceedings of the 2014 Workshop on Cognitive Modelling and Computational Linguistics (CMCL). Association for Computational Linguistics.

The Uniform Information Density (UID) hypothesis 

Jaeger, T. F. (2010). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61:23–62.