IRTG Annual Meeting 2009

Code Breaking for Speech Recognition

Abstract:

In the conventional ASR paradigm, acoustic and language models are trained on some training data, and then either these models or versions of them adapted to the test data are used to search for the least erroneous hypothesis. But since progress in ASR, especially in language modeling, has not been satisfactory in spite of three decades of research, we feel it is of at least academic interest to ask ourselves this challenging question [F Jelinek, 1995]: "In order to transcribe a speech utterance, how much better (in terms of ASR accuracy) can we do if we were given access to a very large amount of data (both speech and text) pertaining to the task at hand and were also given the liberty of retraining the models of our choice or even changing the paradigm of speech recognition process?" As long as there is no access to the truth, this game of code breaking remains fair, although computationally very expensive!

In this talk, we will show some preliminary results obtained as a first step towards code breaking and also present some ideas which we are currently pursuing.

Joint work with Prof. Fred Jelinek, CLSP-JHU
