Comparing sources of information for unsupervised learning of syntactic categories
Speaker: Stella Frank
Institution: University of Edinburgh
Abstract:
I present ongoing work examining which types of context information are useful for learning syntactic categories in a completely unsupervised fashion. Previous work has used various features, such as adjacent words and classes. I use a set of Bayesian HMM-like models to test which of these features are useful for categorising words into syntactic categories.
I also present a new evaluation measure that does not rely on gold-standard part-of-speech tags. The motivation is to model syntactic categories as children learn them, and children's categorisations are known to differ from gold-standard tags. Instead of comparing against a gold standard, I employ Zellig Harris's principle of substitutability to evaluate whether the categories learned by the models are appropriate.
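
As a rough illustration of the general idea behind substitutability (not the measure presented in the talk, which the abstract does not specify), the sketch below treats two words as substitutable to the extent that they occur in the same immediate contexts. The frame definition (previous word, next word), the Jaccard overlap, and the function names are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def frames_by_word(sentences):
    """Map each word to the set of (previous word, next word) frames it occurs in."""
    frames = defaultdict(set)
    for sentence in sentences:
        padded = ["<s>"] + sentence + ["</s>"]
        for i in range(1, len(padded) - 1):
            frames[padded[i]].add((padded[i - 1], padded[i + 1]))
    return frames


def frame_overlap(frames, a, b):
    """Jaccard overlap between the frame sets of two words (0.0 if both are empty)."""
    union = frames[a] | frames[b]
    return len(frames[a] & frames[b]) / len(union) if union else 0.0


def substitutability_score(sentences, categories):
    """Average frame overlap over all word pairs that share a category.

    Hypothetical scoring rule: a higher value means that words placed in the
    same category tend to appear in the same contexts.
    """
    frames = frames_by_word(sentences)
    scores = [
        frame_overlap(frames, a, b)
        for words in categories.values()
        for a, b in combinations(sorted(words), 2)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy corpus and two candidate categorisations (both invented for illustration).
corpus = [
    ["the", "dog", "runs"],
    ["the", "cat", "runs"],
    ["a", "dog", "sleeps"],
    ["a", "cat", "sleeps"],
]
good = {"c1": {"dog", "cat"}, "c2": {"the", "a"}, "c3": {"runs", "sleeps"}}
bad = {"c1": {"dog", "the"}, "c2": {"cat", "runs"}, "c3": {"a", "sleeps"}}

good_score = substitutability_score(corpus, good)
bad_score = substitutability_score(corpus, bad)
```

On this toy corpus the categorisation that groups words sharing contexts scores higher than the one that mixes them, and no gold-standard tags are consulted at any point.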