Computational Linguistics Colloquium
Thursday, 4 December, 16:15
Conference Room, Building C7 4
Adapting a WSJ-trained Lexicalized-Grammar Parser to New Domains
Stephen Clark, Oxford University
In this talk I will describe some experiments on adapting the C&C CCG
parser to new domains. The parser was originally developed using
CCGbank, the CCG version of the Penn Treebank, and is therefore tuned
to newspaper text. The two new domains we consider are (1) biomedical
abstracts and (2) questions for a QA system (using the term "domain"
somewhat loosely in the latter case).
Our porting approach is to retrain the parser at levels of
representation lower than full syntactic derivations. The lexicalized nature
of CCG (in which words are assigned syntactic categories that include
subcategorization information) makes it possible to use a level of
representation intermediate between POS tags and full derivations. For
the biomedical data, we find that simply retraining the POS tagger
leads to a large improvement in performance, and that using annotated
data at the intermediate CCG lexical category level improves parsing
accuracy further. A similar result is obtained for the question data,
but the impact of retraining at the CCG lexical category level is much
greater. We suggest that this is because the syntax of questions
differs more from that of newspaper text than does the syntax of
biomedical sentences, and we discuss some measures supporting this
idea.
The parsing accuracies obtained for both biomedical and question data
are in the same range as those reported for newspaper text, and higher
than those previously reported for the biomedical domain on the same
evaluation resource. The conclusion is that porting newspaper-trained
parsers to new domains may not be as difficult as first thought (at
least for parsers which use lexicalized grammars), but we note that
different levels of representation may have different impacts on the
porting process, depending on the characteristics of the target
domain.
If you would like to meet with the speaker, please contact
Rebecca Dridan.