Lexicon Acquisition for HPSG Grammars
Practical
Time: Thursday 9-11
Room: Building 17.2, Konferenzraum 2.11
Type: Projektseminar
Language: English
Description
In this course we will explore, both theoretically and
practically, methods for the (semi-)automatic acquisition of
lexical resources for use in large-scale HPSG grammars.
The methods employed include shallow processing and machine learning,
as well as robust HPSG parsing.
Prerequisites
Knowledge of HPSG and of processing with HPSG grammars is desirable.
Examination
For the evaluation, you are expected to carry out a small project in the
area of lexicon acquisition. Possible topics will be presented and
discussed in the lectures. By the end of December, you should briefly
present your (planned) project. By the end of the term, you have to
create an implementation, document it, and present the techniques and
results.
Background Reading:
- Manning, Christopher D. and Schütze, Hinrich (1999). Foundations
of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
- Blaheta, Don and Johnson, Mark (2001). Unsupervised learning of
multi-word verbs. In Proceedings of the ACL Workshop on Collocations,
Toulouse, France, pages 54-60.
(http://www.cog.brown.edu/~mj/papers/2001/dpb-colloc01.pdf)
- Church, Kenneth W. and Hanks, Patrick (1990). Word association
norms, mutual information, and lexicography. Computational Linguistics
16(1), 22-29.
http://www.research.att.com/~kwc/published_1989_CL.ps
- Church, Kenneth W.; Gale, William; Hanks, Patrick; Hindle, Donald
(1991). Using statistics in lexical analysis. In Lexical Acquisition:
Using On-line Resources to Build a Lexicon, Lawrence Erlbaum, pages
115-164.
(http://www.research.att.com/~kwc/published_1991_using_stats.ps)
- Dunning, Ted (1993). Accurate methods for the statistics of
surprise and coincidence. Computational Linguistics 19(1), 61-74.
- Evert, Stefan (2004). The Statistics of Word Cooccurrences: Word
Pairs and Collocations. PhD dissertation, University of
Stuttgart.
(http://www.collocations.de/EK/index.html#Evert_04)
- Evert, Stefan and Krenn, Brigitte (2001). Methods for the
qualitative evaluation of lexical association measures. In Proceedings
of the 39th Annual Meeting of the Association for Computational
Linguistics. Toulouse, France, pages 188-195.
(http://www.collocations.de/EK/index.html#Evert_Krenn_01)
- Evert, Stefan and Krenn, Brigitte (2003). Computational approaches
to collocations. Introductory course at the European Summer School on
Logic, Language, and Information (ESSLLI 2003), Vienna.
(http://www.collocations.de/EK/index.html#Evert_Krenn_03)
- Firth, J. R. (1957). A synopsis of linguistic theory 1930-55. In
Studies in Linguistic Analysis (special volume of the Philological
Society), pages 1-32. The Philological Society, Oxford. [ Reprinted
in: Palmer, F. R. (ed.) (1968). Selected Papers of J. R. Firth
1952-59, pages 168-205. Longmans, London. ]
- Johnson, Mark (2001). Trading recall for precision with confidence
sets. Unpublished technical report.
(http://citeseer.nj.nec.com/378119.html)
- Krenn, Brigitte (2000). The Usual Suspects: Data-Oriented Models
for the Identification and Representation of Lexical Collocations. PhD
Thesis, DFKI & Universität des Saarlandes, Saarbrücken.
- Krenn, Brigitte and Evert, Stefan (2001). Can we do better than
frequency? A case study on extracting PP-verb collocations. In
Proceedings of the ACL Workshop on Collocations, Toulouse, France,
pages 39-46.
(http://www.collocations.de/EK/index.html#Krenn_Evert_01)
- Pearce, Darren (2002). A comparative evaluation of collocation
extraction techniques. In Third International Conference on Language
Resources and Evaluation (LREC). Las Palmas, Spain.
(http://www.cogs.susx.ac.uk/users/darrenp/academic/dphil/publications/data/Conferences/lrec2002/paper.ps)
- The International Workshop on "Computational Approaches to
Collocations", July 22-23, 2002, University of Vienna.
(http://www.ai.univie.ac.at/colloc02/workshop_prog.html)
- Grefenstette, Gregory: Explorations in Automatic Thesaurus
Discovery. Kluwer Academic Press (1994).
- Grefenstette, Gregory: Corpus-Derived First, Second and
Third-Order Word Affinities. Technical Report MLTT-009. Rank Xerox
Res. Center, Meylan (France) (1994).
- Adam Kilgarriff and Gregory Grefenstette: Introduction to the
Special Issue on the Web as Corpus. Computational Linguistics
Vol. 29, Issue 3 - Special Issue on the Web as Corpus, pp. 333 - 348
(http://mitpress.mit.edu/journals/pdf/coli_29_3_333_0.pdf)
- Sabine Schulte im Walde: Induction of Semantic Classes for German
Verbs. In: Stefan Langer and Daniel Schnorbusch (eds.) Semantik im
Lexikon, Gunter Narr Verlag, Tübingen. To appear.
http://www.coli.uni-sb.de/~schulte/Publications/Chapter/DGfS-03.doc
- Sabine Schulte im Walde: Identification, Quantitative Description,
and Preliminary Distributional Analysis of German Particle Verbs.
Proceedings of the COLING
Workshop on Enhancing and Using Electronic Dictionaries, Geneva,
Switzerland, August 2004.
http://www.coli.uni-sb.de/~schulte/Publications/Workshop/coling-pv-04-poster.ppt
http://www.coli.uni-sb.de/~schulte/Publications/Workshop/coling-pv-04.pdf
- Sabine Schulte im Walde: GermaNet Synsets as Selectional
Preferences in Semantic Verb Clustering. LDV-Forum -
Journal for Computational Linguistics and Language Technology,
Vol. 19, No. 1/2, Gesellschaft für Linguistische Datenverarbeitung,
Regensburg, May 2004.
- Sabine Schulte im Walde: Experiments on the Automatic Induction of
German Semantic Verb Classes. PhD Thesis, Institut für Maschinelle
Sprachverarbeitung, Universität Stuttgart, June 2003. Published as
AIMS Report 9(2).
(http://www.coli.uni-sb.de/~schulte/Theses/PhD-Thesis/phd-thesis.ps.gz)
- Markus Becker, Anette Frank: A Stochastic Topological Parser of
German. Proceedings of COLING 2002, pages 71-77, Taipei, Taiwan,
2002.
http://www.dfki.de/~frank/papers/Coling2002_Becker_Frank_245.ps
- Helmut Schmid. LoPar - Design and Implementation, 2000.
http://www.ims.uni-stuttgart.de/~schmid/lopar.ps
- M. Becker and A. Frank. 2002. A Stochastic Topological
Parser of German. In Proceedings of COLING 2002,
Taipei, Taiwan.
http://www.dfki.uni-sb.de/~frank/papers/Coling2002_Becker_Frank_245.ps
- Selected papers from http://www.informatics.susx.ac.uk/research/nlp/rasp/
- Evaluation Resources for English Subcategorization Acquisition Systems, which also contains a link to "Subcategorization Acquisition" (Korhonen, 2002, PhD thesis, Cambridge)
- Peter Eisenberg, Grundriß der deutschen Grammatik (1. Das Wort und 2. Der Satz).
Tools
- Chunkie (Shprot)
- Corpora at CoLi can be found in /proj/corpora.
That directory also includes WordNet (1.6, 1.7, and 2.0), GermaNet (unknown version), and the CELEX lexica.
- LoPar
- Stuttgart-Tübingen Tag Set (STTS)
- TnT
- TopP
Input format: one word per line; sentences are separated by an empty line (as for TnT).
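
The one-word-per-line input format noted for TopP can be produced from already tokenized text with a few lines of code. The following is a minimal sketch, not part of any of the tools listed: the function names `to_one_word_per_line` and `read_one_word_per_line` are hypothetical, tokenization is assumed to have been done beforehand, and ending the output with a single newline is an assumption that should be checked against the tool's documentation.

```python
def to_one_word_per_line(sentences):
    """Render tokenized sentences with one token per line.

    `sentences` is a list of token lists. Sentences are separated by an
    empty line. Empty sentences are skipped so that no spurious
    separators appear. (Hypothetical helper, not part of TopP or TnT.)
    """
    blocks = ["\n".join(tokens) for tokens in sentences if tokens]
    # Assumption: a single trailing newline terminates the file.
    return "\n\n".join(blocks) + "\n"


def read_one_word_per_line(text):
    """Parse one-token-per-line text back into a list of token lists."""
    sentences, current = [], []
    for line in text.splitlines():
        token = line.strip()
        if token:
            current.append(token)
        elif current:
            # An empty line closes the current sentence.
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    example = [["Das", "ist", "ein", "Test", "."], ["Ja", "."]]
    print(to_one_word_per_line(example), end="")
```

Writing the result of `to_one_word_per_line` to a file gives input in the layout described above; the reader function is included only to make the round trip easy to check.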
Course material
- Overview of the course; overview of possible topics,
resources, and techniques.
- Reading: Manning & Schütze, Chapter 2
- Reading: Manning & Schütze, Chapter 8
- Reading: Manning & Schütze, Chapter 8.4 and 8.5
- Reading: