Towards a vocabulary-independent ASR system
Speaker: Carolina Parada
Abstract:
Current automatic speech recognition (ASR) systems are certain to make errors when the speaker utters out-of-vocabulary (OOV) words. The recognizer replaces each OOV word with in-vocabulary terms selected according to the acoustic and language models of the ASR system. Out-of-vocabulary terms, which typically include proper names and rare or foreign words, occur infrequently but are rich in information. In this work we propose a new method to recover OOV words in the output of a large-vocabulary speech recognition system. Given (automatically detected) OOV regions in the lattices produced by a speech recognizer, the proposed technique uses the context of these regions of uncertainty, together with an external source of knowledge (the web), to recover the out-of-vocabulary words (correctly spelled!) and thus improve ASR performance. Preliminary experimental results on the RT04 broadcast news task are presented.
This is joint work with Ariya Rastrow and Fred Jelinek.