Language and Computation
RECOGNITION, SELECTION AND TRANSLATION ALGORITHMS WITH FINITE AUTOMATA FOR LEXICAL TAGGING AND SEGMENTATION OF PHONETIC STRINGS
Advanced course

ERIC LAPORTE

Centre d'études et de recherches en informatique linguistique, Université de Marne-la-Vallée

Second week
eric.laporte@univ-reims.fr
Course description

The course aims at:

  • providing knowledge of efficient algorithmic tools for finite automata;
  • providing technical knowledge of two specific NLP problems where these tools are useful: lexical tagging and the segmentation of phonetic strings;
  • bridging the gap between algorithms and NLP problems, i.e. developing skill in selecting the right tools to solve a problem and in adapting and combining them.

We specify the mathematical effect of the algorithms in terms of automata theory (properties taken into account: deterministic, minimal, reduced, complete, trim, acyclic...; operations: intersection, complementation, factor decomposition...). We also describe algorithmic techniques for implementing very large automata (e.g. transition packing for lexical tagging).
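As an illustration of one of the operations listed above, intersection, here is a minimal sketch of the classic product construction on deterministic automata. The `DFA` class and the `intersect` function are hypothetical names chosen for this example, not material from the course. Transitions are stored as nested dictionaries, and a missing entry means that no transition exists (a partial, i.e. non-complete, DFA).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DFA:
    start: object
    finals: set
    # Hypothetical representation: state -> {symbol: next_state}.
    # A missing entry means "no transition" (the automaton may be partial).
    delta: dict

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta.get(state, {}).get(symbol)
            if state is None:
                return False
        return state in self.finals


def intersect(a, b):
    """Product construction: the result accepts exactly the words accepted by both a and b.

    Only state pairs reachable from the start pair are built, so the result is accessible.
    """
    start = (a.start, b.start)
    delta, finals = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        pair = queue.popleft()
        p, q = pair
        if p in a.finals and q in b.finals:
            finals.add(pair)
        out_a = a.delta.get(p, {})
        out_b = b.delta.get(q, {})
        delta[pair] = {}
        # A symbol can be read from (p, q) only if both automata can read it.
        for symbol in out_a.keys() & out_b.keys():
            target = (out_a[symbol], out_b[symbol])
            delta[pair][symbol] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return DFA(start, finals, delta)


# Usage: words over {a, b} with an even number of a's, intersected with words ending in b.
even_a = DFA(start=0, finals={0}, delta={0: {"a": 1, "b": 0}, 1: {"a": 0, "b": 1}})
ends_b = DFA(start="n", finals={"y"}, delta={"n": {"a": "n", "b": "y"}, "y": {"a": "n", "b": "y"}})
both = intersect(even_a, ends_b)
print(both.accepts("aab"), both.accepts("ab"))  # True False
```

Because only pairs reachable from the start pair are built, the result is accessible; it is not necessarily co-accessible, trim or minimal, which is where the other properties listed above come into play.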

For lexical tagging, we consider the zero-silence approach (i.e. no correct tag is discarded), which combines dictionary lookup with the subtraction of combinations that can be ruled out on the basis of local grammars (cf. Silberztein, A new approach to tagging, in Applied Computer Translation 1(4)). In this approach, we give prominence to the maintainability and readability of the disambiguation grammars. The other NLP problem is the segmentation of phonetic strings into words or syllables, on the basis of a phonetic lexicon or of a specification of syllabification.
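For the segmentation of phonetic strings on the basis of a phonetic lexicon, a minimal sketch could look as follows. It assumes that the lexicon is a list of (phonetic form, spelling) pairs and that each Unicode code point counts as one symbol. The lexicon is compiled into a trie, i.e. a tree-shaped deterministic automaton, and the input is scanned from every position, emitting a word at each final state reached. `build_trie`, `segmentations` and the toy IPA lexicon are hypothetical illustrations, not the course's own implementation.

```python
from functools import lru_cache

END = None  # hypothetical marker key for final states; real symbols are non-empty strings


def build_trie(lexicon):
    """Compile (phonetic form, spelling) pairs into a trie.

    Each node is a dict mapping a symbol to a child node; the END key marks a
    final state and stores the spellings associated with that phonetic form.
    """
    root = {}
    for phonetic, spelling in lexicon:
        node = root
        for symbol in phonetic:
            node = node.setdefault(symbol, {})
        node.setdefault(END, []).append(spelling)
    return root


def segmentations(trie, text):
    """Return every way of splitting `text` into a sequence of lexicon entries."""

    @lru_cache(maxsize=None)
    def from_position(i):
        if i == len(text):
            return ((),)  # exactly one segmentation of the empty suffix: no words
        results = []
        node = trie
        # Walk the automaton along text[i:], emitting a word at each final state.
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break
            for spelling in node.get(END, []):
                for rest in from_position(j + 1):
                    results.append((spelling,) + rest)
        return tuple(results)

    return [list(seg) for seg in from_position(0)]


# Usage with a toy lexicon (hypothetical data): the classic "I scream" / "ice cream" ambiguity.
lexicon = [("aɪ", "I"), ("aɪs", "ice"), ("skriːm", "scream"), ("kriːm", "cream")]
trie = build_trie(lexicon)
print(segmentations(trie, "aɪskriːm"))  # [['I', 'scream'], ['ice', 'cream']]
```

Memoizing on the start position means the trie is walked only once from each position, although the number of segmentations, and hence the size of the output, can grow exponentially with the length of the string; representing the result as an acyclic automaton (a word lattice) avoids that blow-up.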

Prerequisites
  • Basic knowledge of algorithms and of at least one programming language
  • Practice of mathematical proofs and ability to understand their logical structure
  • Practice with phonetic transcription and with lexical tags carrying grammatical content

No prior knowledge of automata theory, probability theory, graph theory, object-oriented programming, acoustics or syntax is required. The structure of the course assumes that participants have some extra time each day to assimilate the notions through personal practice.

Literature
No specific recommendation

 

 

