Our implementation of the distributional models for the EmpiriST Shared Task can be downloaded here: schreibgebrauch.tgz (documentation currently in German only).

Additionally, we can make the following resources available for non-commercial research purposes:
  • Our POS-annotated gold standard corpus for CMC data, consisting of approx. 12,000 tokens each for forum, chat, and Twitter data.
  • The tagging models trained on these data and the TIGER corpus for TnT, TreeTagger, and the Stanford Tagger.
  • Our annotated gold standard of literal and idiomatic uses of German infinitive-verb compounds, consisting of approx. 5,400 annotated instances from the Wahrig corpus.
Please contact us via schreibgebrauch -at-