Linguistics for Computer Scientists, WS 2006/2007

NLP tools for syntax: exercise

This is a (very) simple exercise with off-the-shelf NLP tools (POS taggers and parsers). The tools are installed in the directory /proj/contrib/res4cl, which you can access from your coli user account. The file 2sents.txt in this directory contains two of every computational linguist's favourite sentences. The files 2sents-*.in are prepared inputs for the programs you will run; open the *.in files to see what is in them.
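
A quick way to look at all the prepared inputs at once is a small shell helper like the one below. This is only a sketch: the function name show_inputs is made up for this handout, and it assumes nothing beyond the 2sents-*.in naming described above.

```shell
# show_inputs (hypothetical helper): print the first lines of every
# prepared input file (2sents-*.in) in the given directory.
show_inputs() {
    dir=${1:-.}                      # default: current directory
    for f in "$dir"/2sents-*.in; do
        [ -e "$f" ] || continue      # skip the unexpanded pattern if nothing matches
        printf '== %s ==\n' "$f"     # header naming the file
        head -n 5 "$f"               # first five lines are enough to see the format
        echo                         # blank line between files
    done
}
```

For example, show_inputs /proj/contrib/res4cl lists the beginning of each input file, which makes the differences between the input formats easy to compare side by side.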

Try running the following analysis tools:

Adwait Ratnaparkhi's (MXPOST) and the TnT part-of-speech taggers,
Abney's chunk parser (CASS),
Charniak's and Collins' parsers.

Take note of (a) the input formats the tools require, (b) the format and content of the output (do you understand what the output contains?), and (c) the processing times.
Consider (1) where in an NLP pipeline each of these tools fits (which processing steps are needed before the given tool can be run?), (2) how the output can be used, and (3) in which kinds of NLP applications you would need tools such as Charniak's or Collins' parsers, and where a POS tagger and a chunker would be enough.

To run the part-of-speech taggers, type:
./call_mxpost < 2sents-mxpost.in
and
./call_tnt wsj 2sents-tnt.in
[ Note: to run TnT, you need to be logged on to a Solaris machine. Log on to gnome ('ssh gnome') and change to the same directory, /proj/contrib/res4cl. ]

To run the chunker, type:
./call_abney_cass 2sents-abney.in
and
./call_abney_tuples 2sents-abney.in

To run the parsers, type:
./call_charniak 2sents-charniak.in
and
./call_collins 2sents-collins.in
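
For point (c) above, the processing times can be compared with a small wrapper such as the following. It is a rough sketch, not part of the installed tools: the name time_tool is made up, it assumes that the local date command supports the +%s format (which may not hold on every system), and it only measures whole seconds.

```shell
# time_tool (hypothetical helper): run a command, discard its standard
# output, and report the elapsed wall-clock time in whole seconds.
time_tool() {
    start=$(date +%s)                # assumption: date understands +%s
    "$@" > /dev/null                 # run the tool; hide its analysis output
    status=$?                        # remember the tool's exit status
    end=$(date +%s)
    echo "$1: $((end - start)) s, exit status $status"
    return "$status"
}
```

For example, time_tool ./call_charniak 2sents-charniak.in followed by time_tool ./call_collins 2sents-collins.in gives a first impression of how the two parsers compare. For tools that read standard input, add the redirection to the call, as in time_tool ./call_mxpost < 2sents-mxpost.in. If you need finer resolution than one second, the shell's own time command is the better choice.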