Computational Linguistics & Phonetics, Fachrichtung 4.7, Universität des Saarlandes

Data-driven Methods for Acquiring Linguistic Information


Seminar in Computational Linguistics
Lecturer: PD Dr. Valia Kordoni (kordoni@coli.uni-sb.de)
Place: Building C7 2, Besprechungsraum U.15
Time: Tue 14-16
Start: 25.04.2006
Appropriate for: Diplom, B.Sc., M.Sc.
Office hour: Thursday 15-16 (or by arrangement via email)

Course Description

Recently, it has become clear that obtaining the kind of syntactic and semantic analyses required for many applications (such as machine translation) calls for a judicious combination of deep symbolic analysis with NLP and machine learning techniques. For these applications, one important issue is the limited coverage of the linguistic resources they employ, especially when dealing with large-scale natural language data or data specific to particular domains. Typical sources of coverage deficiency include: (a) unknown words, (b) words for which the dictionary does not contain the relevant syntactic or semantic category, and (c) missing grammatical constructions. Extending such resources manually is costly and time-consuming. The aim of this course is to give participants an appreciation of the challenges faced by data-driven approaches to linguistic knowledge acquisition, as well as of the state-of-the-art methods and tools that tackle these issues (Baldwin, to appear; Korhonen, 2003; Lapata and Brew, 2004; McCarthy et al., 2004; van Noord, 2004).


Course function in the COLI study programs

Elective course for M.Sc., B.Sc. and Diplom; CL and LT

Teaching Material

Handouts will be given to students every week.

  • Meeting of 25.04.2006: Introduction.

  • Meeting of 02.05.2006: Arrangement of the seminar program.

  • Meeting of 09.05.2006 - Word and MWE discovery (VK): Timothy Baldwin and Aline Villavicencio. 2002. Extracting the Unextractable: A Case Study on Verb-particles. In Proceedings of the Sixth Conference on Computational Natural Language Learning (CoNLL 2002), Taipei, Taiwan, pp. 98-104.

  • Meeting of 16.05.2006 - Tagging and Morphology (VK):
    • Eric Brill. 1995. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics 21(4), pp. 543-565.
    • Marc Light. 1996. Morphological Cues for Lexical Semantics. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL), Santa Cruz, USA, pp. 25-31.
  • Meeting of 23.05.2006 will take place on Monday, 22.05.2006, at 14:00 c.t. (place to be announced). We will continue with the papers of the previous meeting (see meeting of 16.05.2006).

  • Meeting of 30.05.2006 - Vector space methods (VK):
    • Dominic Widdows. 2004. Word Vectors and Search Engines. Chapter 5 of Geometry and Meaning, CSLI Publications.
    • Dominic Widdows. 2003. Unsupervised methods for developing taxonomies by combining syntactic and statistical information. In Proceedings of HLT/NAACL 2003, Edmonton, Canada, pp. 276-283.
  • Meeting of 06.06.2006 - Ambiguity and Disambiguation (MW):
    • Hinrich Schütze. 1998. Automatic Word Sense Discrimination. Computational Linguistics 24(1), pp. 97-123.
    • Mark Stevenson and Yorick Wilks. 2001. The Interaction of Knowledge Sources in Word Sense Disambiguation. Computational Linguistics, 27(3).
  • Meeting of 13.06.2006 took place on Monday, 19.06.2006, at 13:30 - Noun countability (VK):

    • Baldwin, Timothy and Francis Bond. 2003. Learning the Countability of English Nouns from Corpus Data. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 463-470.
    • Bond, Francis and Caitlin Vatikiotis-Bateson. 2002. Using an ontology to determine English countability. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, pp. 99-105.
    • Baldwin, Timothy and Leonoor van der Beek. 2003. The Ins and Outs of Dutch Noun Countability Classification. In Proceedings of the 2003 Australasian Language Technology Workshop (ALTW2003), Melbourne, Australia.
  • Meeting of 20.06.2006 - Error Detection and Deep Lexical Acquisition (VK):

  • Meeting of 27.06.2006 will take place on Friday, 30.06.2006, at 9:00 c.t. (place to be announced) - More on Deep Lexical Acquisition (VK):

    • Gertjan van Noord. 2004. Error Mining for Wide-Coverage Grammar Engineering. In Proceedings of ACL 2004, Barcelona, Spain. More on suffix arrays and perfect hashing.

    • Deep Lexical Acquisition in Alpino.

  • Meeting of 04.07.2006 will take place on Monday, 10.07.2006, at 14:00 s.t. (place to be announced) - Lexical Acquisition of Multiword Expressions (MWEs) (Aline Villavicencio and VK):

  • Meeting of 11.07.2006 - Disambiguation and Nominalisations (AF):

    • Maria Lapata. 2002. The Disambiguation of Nominalisations. Computational Linguistics, 28(3), pp. 357-388.

    • Anna Korhonen and Judita Preiss. 2003. Improving Subcategorization Acquisition using Word Sense Disambiguation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 48-55.

  • Meeting of 18.07.2006 - Wrapping up.


    References

    Timothy Baldwin. To appear. "The Deep Lexical Acquisition of English Verb-particle Constructions". Computer Speech and Language, Special Issue on Multiword Expressions.

    Anna Korhonen and Judita Preiss. 2003. "Improving Subcategorization Acquisition using Word Sense Disambiguation". In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, pp. 48-55.

    Diana McCarthy, Rob Koeling, Julie Weeds and John Carroll. 2004. "Finding predominant senses in untagged text". In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 280-287.

    Mirella Lapata and Chris Brew. 2004. "Verb Class Disambiguation Using Informative Priors". Computational Linguistics 30(1), pp. 45-73.

    Gertjan van Noord. 2004. "Error Mining for Wide-Coverage Grammar Engineering". In Proceedings of ACL 2004, Barcelona, Spain.

    Dominic Widdows. 2004. Geometry and Meaning. CSLI Lecture Notes. CSLI Publications.


    Language of instruction

    English

    Course certificate

    Presentation + seminar paper.

    Credit points

    Presentation and seminar paper: M.Sc. and B.Sc.: 9 Credits; Diplom: 4 Credits
    Presentation only: M.Sc. and B.Sc.: 4 Credits; Diplom: 2 Credits
    Valia Kordoni
    Last modified: Mon Oct 30 14:05:35 CET 2006