Computational Linguistics & Phonetics, Department 4.7, Universität des Saarlandes


Corpus Extraction from the Web

Philip Resnik and Noah A. Smith (2003). The Web as a parallel corpus. Computational Linguistics, Vol. 29, Issue 3. [PDF]

Marco Baroni and Adam Kilgarriff (2006). Large linguistically-processed Web corpora for multiple languages. Proceedings of EACL 2006. [PDF]

Linguistic Use of Search Engines

Natalia N. Modjeska, Katja Markert and Malvina Nissim (2003). Using the Web in Machine Learning for Other-Anaphora Resolution. Proceedings of EMNLP 2003. [PDF]

Timothy Chklovski and Patrick Pantel (2004). VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations. Proceedings of EMNLP 2004. [PDF] [try it out]

Idan Szpektor, Hristo Tanev, Ido Dagan and Bonaventura Coppola (2004). Scaling Web-based Acquisition of Entailment Relations. Proceedings of EMNLP 2004. [PDF] [view results (.zip)]

Keiji Shinzato and Kentaro Torisawa (2004). Acquiring Hyponymy Relations from Web Documents. Proceedings of HLT-NAACL 2004. [PDF]

Tony Veale and Yanfen Hao (2007). Comprehending and Generating Apt Metaphors: A Web-driven, Case-based Approach to Figurative Language. Proceedings of AAAI 2007. [PDF] [try it out]

Blogs & Twitter

Jason S. Kessler (2008). Polling the Blogosphere: a Rule-Based Approach to Belief Classification. Proceedings of ICWSM 2008. [PDF]

John D. Burger, John Henderson, George Kim and Guido Zarrella (2011). Discriminating Gender on Twitter. Proceedings of ACL 2011. [PDF]

Bo Han, Paul Cook and Timothy Baldwin (2012). Automatically Constructing a Normalisation Dictionary for Microblogs. Proceedings of EMNLP-CoNLL 2012. [PDF] [view results (*.tar.gz)]

Alexander Pak and Patrick Paroubek (2010). Twitter as a Corpus for Sentiment Analysis and Opinion Mining. Proceedings of LREC 2010. [PDF]

Alan Ritter, Mausam, Oren Etzioni and Sam Clark (2012). Open Domain Event Extraction from Twitter. Proceedings of KDD 2012. [PDF]

Crowdsourcing I: Wikipedia as a Corpus

Rada Mihalcea (2007). Using Wikipedia for Automatic Word Sense Disambiguation. Proceedings of NAACL 2007. [PDF]

Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum (2007). YAGO - A Core of Semantic Knowledge. Proceedings of WWW 2007. [PDF] [try it out]

Alexander E. Richman and Patrick Schone (2008). Mining Wiki Resources for Multilingual Named Entity Recognition. Proceedings of ACL 2008: HLT. [PDF]

David Milne and Ian H. Witten (2008). An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. Proceedings of AAAI 2008. [PDF]

Crowdsourcing II: Mechanical Turk and Online Games

Timothy Chklovski and Yolanda Gil (2005). An Analysis of Knowledge Collected from Volunteer Contributors. Proceedings of AAAI-2005. [PDF] [project website]

Jeff Orkin and Deb Roy (2007). The Restaurant Game: Learning Social Behavior and Language from Thousands of Players Online. Journal of Game Development, Vol. 3, Issue 1. [PDF] [try it out]

Edith Law and Luis von Ahn (2009). Input-Agreement: A New Mechanism for Collecting Data Using Human Computation Games. Proceedings of CHI 2009. [PDF] [try it out]

Jon Chamberlain, Massimo Poesio and Udo Kruschwitz (2008). Phrase Detectives: A Web-based Collaborative Annotation Game. Proceedings of I-Semantics. [PDF] [try it out]

Rion Snow, Brendan O'Connor, Daniel Jurafsky and Andrew Y. Ng (2008). Cheap and fast - but is it good?: evaluating non-expert annotations for natural language tasks. Proceedings of EMNLP 2008. [PDF] [try it out (the general Mechanical Turk platform)]

Chris Biemann and Valerie Nygaard (2010). Crowdsourcing WordNet. Proceedings of GWC 2010. [PDF]

Critical Voices

Adam Kilgarriff (2007). Googleology is Bad Science. Computational Linguistics, Vol. 33, Issue 1. [PDF]

Geoffrey Leech (2007). New resources, or just better old ones? The Holy Grail of representativeness. In Corpus Linguistics and the Web. Rodopi, Amsterdam, pp. 133-149. [Available at the SULB or from Michaela.]

Karën Fort, Gilles Adda and K. Bretonnel Cohen (2011). Amazon Mechanical Turk: Gold Mine or Coal Mine? Computational Linguistics, Vol. 37, Issue 2. [PDF]