Breaking the Resource Bottleneck for Multilingual Processing
Rebecca Hwa
 
Training an application for natural language processing is a challenging machine learning task: how can a machine automatically and efficiently induce a model of the complex structures of human language?  Unsupervised learning is not well suited to this problem because human languages contain too much ambiguity; on the other hand, fully supervised methods require large quantities of manually annotated training data, which are difficult to obtain.  This annotation bottleneck is worse for non-English languages because fewer resources have been developed for them.  One way to alleviate the problem is to build new resources by bootstrapping from existing English resources.  In this talk, I will present my work on inducing annotated resources for Chinese, bootstrapped from English resources, to train applications such as parsers and taggers.
