Language Processing for Different Domains and Genres (WS 2009/10)

Presentation Topics & Papers

(Every bullet point is one topic!)

Genre Distinctions

General
Biber, Douglas. 1993. Using register-diversified corpora for general language studies. Computational Linguistics 19:2, pp. 219
http://aclweb.org/anthology-new/J/J93/J93-2001.pdf
Discourse
Bonnie Webber. 2009. Genre distinctions for discourse in the Penn TreeBank. Proc. of ACL-09
http://aclweb.org/anthology-new/P/P09/P09-1076.pdf
Verb subcategorisation frequencies
Roland, D., & Jurafsky, D. (1998): How verb subcategorization frequencies are affected by corpus choice. Proceedings of COLING-ACL 1998 (pp. 1117-1121), Montreal, Canada.
http://aclweb.org/anthology-new/P/P98/P98-2184.pdf

And:

Roland, D., Jurafsky, D., Menn, L., Gahl, S., Elder, E., & Riddoch, C. (2000): Verb subcategorization frequency differences between business-news and balanced corpora: The role of verb sense. Proceedings of the Workshop on Comparing Corpora (pp. 28-34), Hong Kong, October 2000.
http://portal.acm.org/ft_gateway.cfm?id=979622&type=pdf&coll=GUIDE&dl=GUIDE&CFID=58874365&CFTOKEN=53240364

Domain Adaptation for Machine Learning

Modifying the feature space
Daumé III, H. 2007. Frustratingly easy domain adaptation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics.
http://aclweb.org/anthology-new/P/P07/P07-1033.pdf

Parsing

Re-ranking (and self-training)
David McClosky, Eugene Charniak, and Mark Johnson. 2006. Reranking and self-training for parser adaptation. Proc. of ACL.
http://aclweb.org/anthology-new/P/P06/P06-1043.pdf

And:

Jennifer Foster, Joachim Wagner, Djame Seddah and Josef van Genabith. 2007. Adapting WSJ-Trained Parsers to the British National Corpus using In-domain Self-training. Proceedings of IWPT 2007, pp. 33-35, Prague, Czech Republic.
http://www.computing.dcu.ie/~jfoster/publications/foster_iwpt2007.pdf
Self-Training
David McClosky, Eugene Charniak, and Mark Johnson. 2008. When is Self-Training Effective for Parsing? Proceedings of the International Conference on Computational Linguistics (COLING 2008).
http://aclweb.org/anthology/C/C08/C08-1071.pdf
Detection of non-generalising rules
Markus Dickinson and Jennifer Foster. 2009. Similarity Rules! Exploring Methods for Ad-Hoc Rule Detection. Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories (TLT-7 2009). Groningen, The Netherlands.
http://www.computing.dcu.ie/~jfoster/publications/foster_tlt2009.pdf

And possibly as background reading:

Markus Dickinson. 2008. Ad Hoc Treebank Structures. The 46th Annual Meeting of the Association for Computational Linguistics (ACL) with the Human Language Technology Conference (HLT) (ACL-08). Columbus, OH.
http://aclweb.org/anthology-new/P/P08/P08-1042.pdf
Detecting parse reliability
Daisuke Kawahara and Kiyotaka Uchimoto. 2008. Learning Reliability of Parses for Domain Adaptation of Dependency Parsing. IJCNLP 2008.
http://www.aclweb.org/anthology-new/I/I08/I08-2097.pdf
Lexicalised parsing
Laura Rimell and Stephen Clark. 2008. Adapting a Lexicalized-Grammar Parser to Contrasting Domains. EMNLP 2008.
http://www.cl.cam.ac.uk/~lr346/pubs/emnlp08.pdf

Word-Sense Disambiguation

Most Frequent Sense
Rob Koeling, Diana McCarthy and John Carroll. 2005. Domain-Specific Sense Distributions and Predominant Sense Acquisition. EMNLP 2005.
http://aclweb.org/anthology-new/H/H05/H05-1053.pdf
Active Learning
Yee Seng Chan and Hwee Tou Ng. 2007. Domain Adaptation with Active Learning for Word Sense Disambiguation. In Proceedings of ACL.
http://www.comp.nus.edu.sg/~nght/pubs/acl07_wsd_da.pdf