Building an Annotated Corpus of Reconstructed Speech
Speaker: Erin Fitzgerald
Institution: Johns Hopkins University
Abstract:
A system would accomplish speech reconstruction of its spontaneous speech input if its output were to represent, in flawless, fluent, and content-preserved English, the message that the speaker intended to convey. Transforming errorful text using supervised statistical methods requires a parallel gold-standard corpus of manually reconstructed sentences, which does not exist naturally. We trained a set of annotators to reconstruct utterances from a subset of the Fisher corpus and to label them with transformation types as well as basic predicate-argument structure. In this talk I will describe the project, the characteristics of the data produced, and the role of that data in the training and evaluation ahead.