Breaking the Resource Bottleneck for Multilingual Processing
Rebecca Hwa
Training an application for
natural language processing is a challenging machine
learning task: how can a machine automatically and efficiently
induce a model of the complex structures of human language?
Unsupervised learning is not well-suited for this problem because
human languages contain too much ambiguity; on the other hand, fully-supervised
methods require large quantities of manually-annotated
training data, which are difficult to obtain. The annotation
bottleneck is worse for non-English languages because fewer resources
have been developed for them. One way to alleviate the problem
is to build new resources by bootstrapping from existing English
resources. In this talk, I will present my work on inducing annotated
resources for Chinese to train applications such as parsers and
taggers by bootstrapping from English resources.
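One common way to bootstrap annotations across languages is to project them through word alignments in a parallel corpus. The sketch below is a hypothetical, simplified illustration of that general idea for POS tags, not the speaker's actual method; the function name, the first-link-wins heuristic, and the toy data are all assumptions for illustration.

```python
def project_tags(en_tags, alignment):
    """Project English POS tags onto target-language tokens.

    en_tags:   list of (english_word, tag) pairs
    alignment: list of (en_index, zh_index) word-alignment links
    Returns a dict mapping zh_index -> projected tag.
    """
    projected = {}
    for en_i, zh_i in alignment:
        # If a target word aligns to several source words, keep the
        # first projected tag (a simple, debatable heuristic).
        projected.setdefault(zh_i, en_tags[en_i][1])
    return projected

# Toy example: a 3-word English sentence aligned one-to-one
# with a 3-token Chinese sentence.
en = [("I", "PRON"), ("love", "VERB"), ("books", "NOUN")]
links = [(0, 0), (1, 1), (2, 2)]
print(project_tags(en, links))  # {0: 'PRON', 1: 'VERB', 2: 'NOUN'}
```

In practice, projected annotations are noisy (alignment errors, cross-linguistic divergences), so such output is typically filtered or used to train a model that smooths over the noise.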