Computational Linguistics & Phonetics Fachrichtung 4.7 Universität des Saarlandes

Machine Learning for Acquisition of Linguistic Information


Seminar in Computational Linguistics
Lecturer: PD Dr. Valia Kordoni (kordoni@coli.uni-sb.de)
Place: Building C7 2, U.15
Time: Wed 12-14
Start: 28.04.2009
Appropriate for: Diplom, B.Sc., M.Sc.
Office hour: Thursday 15-16 (or by arrangement via email)

Course Description

It has recently become clear that obtaining the kind of syntactic and semantic analyses required by many applications (such as machine translation) calls for a judicious combination of deep symbolic analysis with NLP and machine learning techniques. For these applications, one important issue is the limited coverage of the linguistic resources they employ, especially when dealing with large-scale natural language data or with data specific to particular domains. Typical sources of coverage deficiency include: (a) unknown words, (b) words for which the dictionary does not contain the relevant syntactic or semantic category, and (c) missing grammatical constructions. The manual extension of such resources is costly and time-consuming. The aim of this course is to give participants an appreciation of the challenges faced by data-driven approaches to linguistic knowledge acquisition, and to introduce state-of-the-art methods and tools that tackle these issues (Baldwin, to appear; Korhonen, 2003; Lapata and Brew, 2004; McCarthy et al., 2004; van Noord, 2004).


Course functions in the COLI study programs

Elective course for M.Sc., B.Sc. and Diplom; CL and LT

Teaching Material

Handouts will be given to students every week.

  • Meeting of 05.05.2009: Introduction.

  • Meeting of 13.05.2009: Distribution of topics for student presentations and seminar papers.


    Related Papers

  • Tagging and Morphology, 20.05.2009 (Valia):
    • Eric Brill. 1995. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics 21(4), pp. 543-565.
    • Marc Light. 1996. Morphological Cues for Lexical Semantics. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL), Santa Cruz, USA, pp. 25-31.
    • John Goldsmith. 2001. Unsupervised Learning of the Morphology of a Natural Language. Computational Linguistics 27(2), pp. 153-198.
  • 27.05.2009: No meeting (visit to EM sites in Bolzano and Trento)

  • 03.06.2009: No class meeting (individual student appointments)

  • Morphology, 10.06.2009 (Zhenghan):
  • Ambiguity and Disambiguation, 17.06.2009 (Elahi):
    • Hinrich Schütze. 1998. Automatic Word Sense Discrimination. Computational Linguistics 24(1), pp. 97-123.
    • Mark Stevenson and Yorick Wilks. 2001. The Interaction of Knowledge Sources in Word Sense Disambiguation. Computational Linguistics 27(3).
  • Vector space methods, 24.06.2009 (Faisal and Israel):
    • Dominic Widdows. 2003. Word Vectors and Search Engines. Chapter 5 of Geometry and Meaning, CSLI Publications.
    • Dominic Widdows. 2003. Unsupervised methods for developing taxonomies by combining syntactic and statistical information. In Proceedings of HLT/NAACL 2003, Edmonton, Canada, pp. 276-283.
    • Rion Snow, Daniel Jurafsky, Andrew Y. Ng. 2006. Semantic Taxonomy Induction from Heterogenous Evidence. In Proceedings of ACL 2006, Sydney, Australia, pp. 801-808.
  • Error Detection and Deep Lexical Acquisition, 01.07.2009 (Henock):
  • More on Deep Lexical Acquisition -- Lexical Acquisition of Multiword Expressions, 08.07.2009 (Valia):
  • Disambiguation and Nominalisations, 15.07.2009 (Milos):
    • Maria Lapata. 2002. The Disambiguation of Nominalisations. Computational Linguistics 28(3), pp. 357-388.

    • Anna Korhonen and Judita Preiss. 2003. Improving Subcategorization Acquisition using Word Sense Disambiguation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 48-55.

  • 22.07.2009: Wrap-up


    References

    Timothy Baldwin. "The Deep Lexical Acquisition of English Verb-particle Constructions". To appear in Computer Speech and Language, Special Issue on Multiword Expressions.

    Anna Korhonen and Judita Preiss. 2003. "Improving Subcategorization Acquisition using Word Sense Disambiguation". In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. Sapporo, Japan. pp. 48-55.

    Diana McCarthy, Rob Koeling, Julie Weeds and John Carroll. 2004. "Finding predominant senses in untagged text". In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. Barcelona, Spain. pp. 280-287.

    Mirella Lapata and Chris Brew. 2004. "Verb Class Disambiguation Using Informative Priors". Computational Linguistics 30(1), pp. 45-73.

    Gertjan van Noord. 2004. "Error Mining for Wide-Coverage Grammar Engineering". In Proceedings of ACL 2004. Barcelona, Spain.

    Dominic Widdows. 2004. Geometry and Meaning. CSLI Lecture Notes.


    Language of instruction

    English

    Course certificate

    Presentation + seminar paper.

    Credit points

    Presentation and seminar paper: M.Sc. and B.Sc.: 7 credits; Diplom: 4 credits
    Presentation only: M.Sc. and B.Sc.: 4 credits; Diplom: 2 credits
    Valia Kordoni
    Last modified: Wed Jun 3 12:37:11 CEST 2009