Computational Linguistics & Phonetics, Fachrichtung 4.7, Universität des Saarlandes
Chiara Gambi - Statistics course

Chiara Gambi - Statistical analysis of experimental (and corpus) data with R

Winter semester 2013

Please note that there has been a change to the time of the first lecture (10/02/14); there might be other changes to the schedule, so please keep checking this page (especially if you miss some of the classes).


If you want to attend this course, you should download these help slides and read them carefully! They should get you started with R and RStudio. Please go through the notes and familiarize yourself with the information provided, as it will be taken for granted at the start of the course. If you have any problems, please contact me using the email address provided on my university profile.

This course will introduce you to the statistical tools you will need for the analysis of experimental data (e.g., behavioral experiments in psycholinguistics, perception studies in phonetics, acceptability rating studies in experimental linguistics) and corpus data. We will start with basic descriptive statistics (distributions, means, standard deviations) and an introduction to the concept of hypothesis testing in inferential statistics. We will then cover the most common statistical methods used in speech and language research:

  - t-test
  - chi-square
  - Generalized linear mixed-effects models

The course will combine brief theoretical introductions from the lecturer with hands-on exercises using the free software R. The focus will be on learning how to run the analyses, interpret R output, report the findings in standard APA format, and produce suitable graphs.
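As a rough illustration of the kind of analysis the course covers, here is a minimal sketch in base R. It is not taken from the course materials: the data are simulated, and all variable names (rt_a, rt_b, counts) are made up for the example.

```r
# Simulated reaction times (ms) for two hypothetical conditions; not real data.
set.seed(1)
rt_a <- rnorm(20, mean = 500, sd = 50)
rt_b <- rnorm(20, mean = 540, sd = 50)

# Descriptive statistics: mean and standard deviation per condition
print(c(mean = mean(rt_a), sd = sd(rt_a)))
print(c(mean = mean(rt_b), sd = sd(rt_b)))

# Independent-samples t-test (R applies the Welch correction by default)
tt <- t.test(rt_a, rt_b)
print(tt)

# Chi-square test of independence on a made-up 2x2 table of counts
counts <- matrix(c(30, 10, 20, 20), nrow = 2,
                 dimnames = list(group = c("g1", "g2"),
                                 response = c("yes", "no")))
chi <- chisq.test(counts)
print(chi)
```

The printed output of t.test() and chisq.test() contains the test statistic, the degrees of freedom, and the p-value, which are the values usually needed for an APA-style report.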


  10/02/14, 14-16 - Lecture (room 2.11, C 7.2) - Lecture 1 slides
  11/02/14, 13-15 - Lab tutorial (CIP room, C 7.2) - Lab 1 exercises and solutions
  12/02/14, 10-12 - Lecture (room 2.11) - Lecture 2 slides
  12/02/14, 13-15 - Lab tutorial (CIP room) - Lab 2 exercises (old version; note that it contains a mistake in Section 4), Lab 2 exercises (new version; this one is correct!), Lab 2 data set, and solutions
  17/02/14, 15-17 - Lecture (room 2.11) - Lecture 3 slides. NOTE: I have spotted an inaccuracy on slide 18 of Lecture 3. The corrected version of these slides can be found here, and a brief explanation of what changed can be found here.
  18/02/14, 13-15 - Lab tutorial (CIP room) - Lab 3 exercises and Lab 3 data set and solutions
  19/02/14, 10-12 - Lecture (room 2.11) - Lecture 4 slides
  20/02/14, 13-15 - Lab tutorial (CIP room) - Lab 4 exercises, cognates data set, and twoway data set; solutions (Section 1); solutions (complete)
  21/02/14, 10-12 - Lecture (room 2.11) - Lecture 5 slides
  21/02/14, 13-15 - Lab tutorial (CIP room) - Lab 5 exercises, error data set, and solutions
  24/02/14, 10-12 - Lecture (room 2.11) - Lecture 6 slides
  25/02/14, 13-15 - Lab tutorial (CIP room) - Lab 6 exercises

EXAM DATE: 10th March (CIP room), 10-12 (s.t.) - NOTE: please be there at 10am sharp (no academic quarter!); exam solutions

R resources on the web

  - R CRAN mirror (R download and documentation for all packages)
  - RSeek (web search engine for help on R-related topics)
  - ling-r-lang-L: mailing list for language researchers using R
  - RStudio (RStudio download)



Howell, D. C. (2004). Fundamental statistics for the behavioral sciences. Thomson Brooks/Cole (Fifth Edition in Coli Library; later editions in SULB).

Field, A. (2000). Discovering statistics using SPSS. Sage Publications (Second edition in Coli Library; later editions in SULB).

NOTE: despite the title, this book is not just about SPSS. If you're hoping to be gently eased into statistics, this is the one for you!

Howitt, D., & Cramer, D. (2011). Introduction to statistics in psychology. Prentice Hall (Fifth Edition in SULB; NOTE: unfortunately, this edition contains mistakes in several formulae!).

Statistics with R

Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction using R. Cambridge University Press (several copies in Coli Library; an online draft is also available).

Dalgaard, P. (2008). Introductory statistics with R. Springer (copies in Coli Library).

Field, A. (2013). Discovering statistics using R. Sage Publications.

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Journal articles on linear mixed-effects models (available on Google Scholar)
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390-412.

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255-278.

Cunnings, I. (2012). An overview of mixed-effects statistical models for second language researchers. Second Language Research, 28(3), 369-382.

Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434-446.