This package contains a set of shell scripts which simplify tagging
with the TreeTagger. The scripts have been put into the cmd subdirectory.
In order to be able to call these scripts from other directories, you
should replace the relative paths in the scripts with absolute paths
and add the path of the cmd subdirectory to the command search path.
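
As a minimal sketch of the search-path step, the following lines could be
added to a shell startup file. The variable name TREETAGGER_HOME and the
location /opt/treetagger are assumptions made for illustration; substitute
the directory in which this package was actually unpacked.

```shell
# Hypothetical installation directory; adjust to your own setup.
TREETAGGER_HOME="${TREETAGGER_HOME:-/opt/treetagger}"

# Append the cmd subdirectory to the command search path,
# but only if it is not already listed there.
case ":$PATH:" in
    *":$TREETAGGER_HOME/cmd:"*) ;;
    *) PATH="$PATH:$TREETAGGER_HOME/cmd" ;;
esac
export PATH
```

The relative paths inside the scripts still have to be edited by hand, since
they depend on where the package has been installed.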
----------------------------------------------------
cmd/tree-tagger-english *
This is a script for tagging English text. It does tokenization
and tagging. The names of the files which are to be tagged are
expected as arguments. If no files have been specified, input from
stdin is expected.
cmd/tree-tagger-french
This is a script for tagging French text. It does tokenization, tagging
and some error correction. It has been provided by Dr. Achim Stein from
the Institut fuer Romanistik, Universitaet Stuttgart. Start the script
with the -h option to get a description.
cmd/tree-tagger-italian
This is a script for tagging Italian text. It does tokenization and
tagging. This script has been provided by Dr. Achim Stein from
the Institut fuer Romanistik, Universitaet Stuttgart. Start the script
with the -h option to get a description.
cmd/tree-tagger-german *
This is a script for tagging German text. It does tokenization, tagging
and some error correction. The names of the files which are to be tagged
are expected as arguments. If no files have been specified, input from
stdin is expected.
cmd/tagger-chunker-german *
This is a script for tagging and chunking German text. It does
tokenization, tagging and annotation with nominal and verbal chunks.
The names of the files which are to be tagged are expected as
arguments. If no files have been specified, input from stdin is
expected.
cmd/tagger-chunker-english *
A similar script for tagging and chunking English text.
cmd/lookup.perl *
You can use this pretagging script to extend the tagger lexicon
without generating a new parameter file. See the script itself for
more information.
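
The two input modes described above for cmd/tree-tagger-english (file names
as arguments, or input from stdin) might be used as in the sketch below. It
assumes that the cmd subdirectory is on the command search path and that the
script writes its result to stdout; the file names sample.txt and
sample.tagged are made up for illustration.

```shell
# Create a small sample input file (hypothetical file name).
printf 'This is a short test sentence.\n' > sample.txt

if command -v tree-tagger-english >/dev/null 2>&1; then
    # Mode 1: the files to be tagged are given as arguments.
    tree-tagger-english sample.txt > sample.tagged

    # Mode 2: no file arguments, so input is read from stdin.
    tree-tagger-english < sample.txt
else
    echo "tree-tagger-english not found on the command search path" >&2
fi
```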
----------------------------------------------------
These files are needed by the shell scripts and are also contained in
this tar-file:
cmd/filter-german-tags          error correction script
lib/english-abbreviations       list of English abbreviations
lib/german-abbreviations        list of German abbreviations
cmd/filter-chunker-output.perl  reformatting of the chunker output
----------------------------------------------------
These files are needed by the shell scripts and *not* contained in this
tar-file. They can be downloaded, as either Linux or SunOS programs,
from the following URL:
http://www.ims.uni-stuttgart.de/Tools/DecisionTreeTagger.html
The tar files should be unpacked in the same directory as the scripts.
The parameter files should be moved to the lib subdirectory and
uncompressed.
bin/tree-tagger           the tagger program proper
bin/separate-punctuation  tokenizer program
lib/english.par           English parameter file for the tagger
lib/german.par            German parameter file for the tagger
lib/french.par            French parameter file for the tagger
lib/italian.par           Italian parameter file for the tagger
lib/german-chunker.par    German parameter file for the chunker
lib/english-chunker.par   English parameter file for the chunker
lib/english-ctagger.par   English parameter file for the tagger chunker
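
After unpacking, it can be useful to check that the files listed above are in
place. The function below is only a sketch: its name,
missing_treetagger_files, and its optional directory argument are
illustrative and not part of this package. It prints every listed file that
is not found under the given installation directory (the current directory
by default).

```shell
# Print each required file that is missing under the given directory.
# Usage: missing_treetagger_files [installation-directory]
missing_treetagger_files() {
    dir="${1:-.}"
    for f in bin/tree-tagger bin/separate-punctuation \
             lib/english.par lib/german.par lib/french.par lib/italian.par \
             lib/german-chunker.par lib/english-chunker.par \
             lib/english-ctagger.par
    do
        # Report the relative path if the file does not exist.
        [ -e "$dir/$f" ] || echo "$f"
    done
}
```

Calling the function without arguments from the installation directory prints
nothing when all files are present. Parameter files that are still compressed
are reported as missing, because their names then differ from those listed.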