Computational Linguistics & Phonetics Fachrichtung 4.7 Universität des Saarlandes

Computational Linguistics Colloquium

18 June 2015, 16:15
Conference Room, Building C7.4

Unsupervised Language Acquisition from Raw Speech

Reinhold Häb-Umbach
Department of Communications Engineering, University of Paderborn

We consider the problem of segmenting an input sequence of symbols into recurrent patterns. This is achieved with nonparametric Bayesian statistical models, in particular the nested Pitman-Yor process. We then consider the case where the input sequence is noisy, i.e., contains errors, and propose an iterative word segmentation algorithm. One application is automatic speech recognition for a language for which neither a pronunciation lexicon nor a language model is available. Results will be presented for an English task and, for the segmentation of noise-free input, for two Austronesian languages, Wooi and Waima’a.

If you would like to meet with the speaker, please contact Dietrich Klakow.