IRTG Annual Meeting 2008

Comparing sources of information for unsupervised learning of syntactic categories

Abstract:

I present ongoing work that examines which types of context information are useful for learning syntactic categories in a completely unsupervised fashion. Previous work has used various features, such as adjacent words and classes. I use a set of Bayesian HMM-like models to test which of these features are useful for assigning words to syntactic categories.

I also present a new evaluation measure that does not use gold-standard part-of-speech tags. This measure is motivated by the desire to model syntactic categories as learned by children, since their categorizations are known to differ from gold-standard tags. Instead, I employ Z. Harris's principle of substitutability to evaluate whether the categories learned by the models are appropriate.

Last modified: Sat, Aug 09, 2008 01:48:20