Empirical Approaches to Multilingual Lexical Acquisition

Timothy Baldwin



In this seminar series I will present a range of methods for multilingual lexical acquisition. Lexical acquisition is the task of automatically learning the linguistic structure of lexical items, and has applications in areas including deep linguistic processing, computational lexicography, information retrieval and information extraction. I will discuss both the details of the machine learning techniques standardly used to perform lexical acquisition, and the task designs and strategies commonly deployed in lexical acquisition tasks. The seminars will follow an end-to-end lexical acquisition process, from data discovery through preprocessing to both general-purpose and targeted lexical acquisition. Particular tasks I will cover are language identification, word segmentation, countability learning, subcategorisation learning, alternation detection and lexical type prediction for deep grammars, over languages including English, Dutch, French and Japanese. Machine learning methods I will cover include vector space models, information-theoretic measures, Bayesian methods, instance-based learning and structured learning models. The series will consist of three 3-hour seminars over three days, with each 3-hour session broken down into three 55-minute segments (punctuated by well-earned 5-minute breaks!).


Date                Location                                   Lecture  Topic
Jan 16, 2008 (Wed)  Conference Room, COLI New Building (C7 4)  1        Introduction to multilingual lexical acquisition
                                                               2        Introduction to machine learning
                                                               3        Data discovery: language identification
Jan 17, 2008 (Thu)  Seminar Room, COLI Main Building (C7 2)    4        Unsupervised approaches to lexical acquisition: word segmentation and MWE extraction
                                                               5        Monolingual countability learning
Jan 18, 2008 (Fri)  Conference Room, COLI New Building (C7 4)  6        Crosslingual countability learning
                                                               7        Learning verb syntax
                                                               8        General-purpose lexical acquisition

Last modified: Thu Jan 17 14:25:45 CET 2008
