Computational Linguistics & Phonetics, Fachrichtung 4.7, Universität des Saarlandes

Computational Linguistics Colloquium

Thursday June 21, 16:15, Seminar Room, Building 17

Statistical Parsing: A Theme and Two Variations

Eugene Charniak
Computer Science Department
Brown University

The past ten years have seen a dramatic improvement in our ability to automatically assign grammatical structure to English sentences -- to ``parse'' them. The improvements, coming primarily from the use of statistical and probabilistic techniques, now give us the ability to, say, parse every sentence on the front page of today's New York Times with reasonable (though far from total) accuracy. In the first third of the talk we will review this history, showing how results have improved with increasing use of lexical information, that is, conditioning probabilities on not just the traditional parts of speech (nouns, verbs, etc.) but upon the words themselves.

In the last two thirds of the talk we will look at two directions that we believe will be increasingly important to the future of this technology. First we will look at a small example of how we can use unsupervised learning to enable us to parse to a slightly deeper level of analysis. In particular we will show some results on the unsupervised learning of personal-name structure -- e.g., learning that in ``Defense Secretary William Cohen Jr.'' the first two words are descriptors, the third a first name, etc. We will also show how coreference information can help this process.

Finally we will look at one future application of this technology -- the use of parsing models as language models. Language models assign a probability to every string in a language and are used in speech-recognition systems to distinguish between plausible and implausible sequences of words. Virtually all of today's speech-recognition systems use the so-called ``trigram'' model in which the probability of each word is conditioned on the two previous words. We will present some results showing that basing the probabilities instead on a statistical parser can give a dramatic improvement over the standard trigram model.
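
For readers unfamiliar with the trigram model mentioned above, the following is a minimal sketch of the idea, not the speaker's system. It assumes unsmoothed maximum-likelihood estimates, a toy two-sentence corpus, and hypothetical function names and padding symbols (`train_trigram`, `sentence_probability`, `<s>`, `</s>`); real speech-recognition language models add smoothing and far larger training data.

```python
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their two-word contexts from tokenized sentences."""
    trigrams, contexts = Counter(), Counter()
    for words in sentences:
        # Pad so that the first real word also has two "previous words".
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            trigrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return trigrams, contexts

def sentence_probability(words, trigrams, contexts):
    """Unsmoothed estimate: product of P(word | two previous words)."""
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    prob = 1.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        if contexts[context] == 0:
            return 0.0  # context never seen in training (assumption: no smoothing)
        prob *= trigrams[context + (padded[i],)] / contexts[context]
    return prob

# Toy corpus, invented purely for illustration.
corpus = [["the", "dog", "barks"], ["the", "dog", "sleeps"]]
trigrams, contexts = train_trigram(corpus)
p_seen = sentence_probability(["the", "dog", "barks"], trigrams, contexts)
p_unseen = sentence_probability(["barks", "dog", "the"], trigrams, contexts)
```

On this toy corpus the plausible word order receives a higher probability than the scrambled one, which is the distinction a speech recognizer relies on. A parser-based language model would replace the two-word context with information drawn from the sentence's syntactic structure.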

If you would like to meet with the speaker, please contact Amit Dubey.