Computational Linguistics & Phonetics Fachrichtung 4.7 Universität des Saarlandes

Distinguished Speakers in Language Science

Thursday, 22 January 2015, 16:15
Conference Room, Building C7.4

Non-standard spelling in computer-mediated communication

Angelika Storrer
German Linguistics department
University of Mannheim

In the past three decades, the Internet has brought forth digital tools for interpersonal communication (e-mail, chat, instant messaging, discussion groups, microblogging, etc.), which are the subject of research in the field of “computer-mediated communication” (henceforth CMC). It was evident from the beginning that CMC discourse displays linguistic and structural features that differ from those of both speech and written text. In my presentation, I will first give a brief summary of these features and highlight the main similarities and differences between CMC and both text and speech.

CMC discourse is challenging for NLP tools trained on newspaper corpora; it is thus often regarded as non-canonical or non-standard data. Tokenization, normalization and POS tagging are important issues in annotating CMC corpora; these basic annotation levels are already of great value for cross-genre studies on language variation, sentiment analysis, or opinion mining. However, many research fields (e.g. diachronic studies in digital humanities, research on literacy and writing skills) would profit from a more fine-grained annotation of non-standard phenomena in CMC corpora. In the second part of my presentation, I will present a basic typology for such an annotation; the focus will be on spelling phenomena and on German data. The typology reflects the main differences between text, speech and CMC outlined in the first part of the presentation, with the goal of relating non-standard phenomena in CMC corpora to phenomena in other types of non-standard corpora (speech corpora, learner corpora, historical corpora).

If you would like to meet with the speaker, please contact Andrea Horbach.