Resources
Our implementation of the distributional models for the EmpiriST Shared Task can be downloaded here: schreibgebrauch.tgz (documentation currently in German only). Additionally, we can make the following resources available for non-commercial research purposes:
- Our POS-annotated gold standard corpus for CMC data, consisting of approx. 12 000 tokens each for forum, chat and Twitter data.
- The tagging models trained on these data and the TIGER corpus, for TnT, TreeTagger and the Stanford Tagger.
- Our annotated gold standard of literal and idiomatic uses of German infinitive-verb compounds, consisting of approx. 5400 annotated instances from the Wahrig corpus.