Computational Linguistics Colloquium
18 June 2015, 16:15
Conference Room, Building C7.4
Unsupervised Language Acquisition from Raw Speech
Reinhold Häb-Umbach
We consider the problem of segmenting an input sequence of symbols into recurring patterns. This is achieved by employing nonparametric Bayesian statistical models, in particular the nested Pitman-Yor process. We then consider the case where the input sequence is noisy, i.e., contains errors, and propose an iterative word segmentation algorithm. One application is automatic speech recognition for a language for which neither a pronunciation lexicon nor a language model is available. Results will be presented for an English task and, for the segmentation of noise-free input, for two Austronesian languages, Wooi and Waima’a.
If you would like to meet with the speaker, please contact