TLT 10 Co-located Event:
Workshop on Annotation of Corpora
for Research in the Humanities
January 5, 2012
Heidelberg University, Germany
The workshop aims at building a tighter collaboration between people working in various areas of the Humanities (such as literature, philology, history etc.) and the research community involved in developing, using and making accessible annotated corpora.
Addressing topics related to annotated corpora for research in the Humanities is an interdisciplinary task that involves corpus and computational linguists (mostly those working in literary computing), philologists, scholars in the Humanities and computer scientists. However, this interdisciplinarity has not yet been fully realised. Indeed, philologists and scholars are not used to exploiting NLP tools and language resources such as annotated corpora; in turn, computational linguists tend to develop language resources for NLP purposes only.
For instance, although several historical corpora are available today in digital format (covering many languages and a wide diachronic span), only a few of them are linguistically tagged; most still lack linguistic annotation altogether. Nevertheless, developing annotated corpora for the Humanities seems to be a growing research field. Over the past few years a number of annotated historical corpora have been started, among them treebanks for Middle, Early Modern and Old English, Early New High German, Medieval Portuguese, Ugaritic, Latin, Ancient Greek and several translations of the New Testament into Indo-European languages.
Moreover, we believe that a tighter collaboration between people working in the Humanities and the research community involved in developing annotated corpora is needed: although annotating a corpus from scratch remains a labor-intensive and time-consuming task, it can today be made easier by drawing extensively on prior experience in the field.