Foundational Course at ESSLLI 2006
18th European Summer School in Logic, Language and Information
Málaga, Spain
31 July-11 August, 2006

 

Introduction to Corpus Resources, Annotation and Access

 

Sabine Schulte im Walde, Computerlinguistik, Universität des Saarlandes
Heike Zinsmeister, Seminar für Sprachwissenschaft, Universität Tübingen


Course Description

Our course presents an introduction to corpus resources, combining the theoretical background of corpora, resource examples, annotation levels, and tools for exploitation. First, we motivate corpus resources for empirical linguistics and describe the properties and problems of corpus data, the levels of annotation, and standardisation efforts. We then relate the annotation levels to appropriate tools and uses for exploitation. Finally, we present the web as a corpus, with BootCaT as an example toolkit for collecting data from the web, e.g. for creating corpora for minority languages.


Course Material