Link: www.coli.uni-saarland.de/projects/smile/data/data.tar
The data consists of development (dev) data and test data. Each of these directories (dev and test) contains the script scenarios in separate subdirectories; for example, the scenario for preparing coffee is in the subdirectory test/coffee. The data comes from Regneri et al. [1] and the OMICS corpus [2]. For details of the preprocessing steps used to extract the data, please refer to [1], [2], [3] and [4] (see REFERENCES below).
Each scenario subdirectory contains 4 files in XML format:
event-ordering.xml
event-paraphrasing.xml
orig-data.xml
segmented.xml
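
The layout described above can be checked with a short script. This is only a sketch: it assumes the archive has been extracted so that the dev and test directories sit under a local directory called "data" (that name is an assumption, not something stated by the archive), and the function name list_scenarios is hypothetical.

```python
from pathlib import Path

# The four files each scenario subdirectory is described as containing.
EXPECTED_FILES = (
    "event-ordering.xml",
    "event-paraphrasing.xml",
    "orig-data.xml",
    "segmented.xml",
)


def list_scenarios(root):
    """Map (split, scenario) to the list of expected files that are missing."""
    report = {}
    for split in ("dev", "test"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            continue  # this split is not present under the given root
        for scenario_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            missing = [name for name in EXPECTED_FILES
                       if not (scenario_dir / name).is_file()]
            report[(split, scenario_dir.name)] = missing
    return report


if __name__ == "__main__":
    # "data" is an assumed extraction directory; adjust it to the real location.
    for (split, scenario), missing in list_scenarios("data").items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{split}/{scenario}: {status}")
```

An empty list for a scenario means all four files were found.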
**********************************************
event-ordering.xml: This file contains the gold event pairs for the event ordering task. Each event is split into a predicate and its arguments. The general format for each event pair is:
The label is one of two types: FOLLOWUP (the first event follows the second) or NO_FOLLOWUP (the first event does not follow the second).
**********************************************
event-paraphrasing.xml: This file contains the gold event pairs for the event paraphrasing task. Each event is split into a predicate and its arguments. The general format for each event pair is:
The label is one of two types: PARAPHRASE (the two events are paraphrases) or NO_PARAPHRASE (the two events are not paraphrases).
**********************************************
orig-data.xml: This file contains the original scripts for the scenario. In other words, it is a collection of event sequence descriptions (ESDs) for that scenario.
The general format is:
**********************************************
segmented.xml: This file contains the training data for the scenario. It consists of the sequences of events involved, with each event split into a predicate and its arguments.
The general format is:
**********************************************
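To see how any of the four files is actually laid out, the element and attribute names can be listed without assuming a schema. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the function name summarize_xml is hypothetical, and nothing in it depends on particular tag names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(source):
    """Count element tags and collect the attribute names seen on each tag.

    `source` is a file name or a file object. No schema is assumed: the
    function only reports what it finds in the document.
    """
    root = ET.parse(source).getroot()
    tag_counts = Counter()
    attributes = {}
    for elem in root.iter():  # visits the root and every descendant element
        tag_counts[elem.tag] += 1
        attributes.setdefault(elem.tag, set()).update(elem.attrib)
    return tag_counts, attributes
```

For example, calling summarize_xml on test/coffee/event-ordering.xml (relative to wherever the archive was extracted) returns the tag counts and the attribute names for that file, which shows where the events and labels are stored.
**********************************************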
REFERENCES:
[1] Michaela Regneri, Alexander Koller, and Manfred Pinkal. 2010. Learning script knowledge with web experiments. In Proceedings of ACL.
[2] Rakesh Gupta and Mykel J. Kochenderfer. 2004. Common sense data acquisition for indoor mobile robots. In Proceedings of AAAI.
[3] Lea Frermann, Ivan Titov, and Manfred Pinkal. 2014. A hierarchical Bayesian model for unsupervised induction of script knowledge. In Proceedings of EACL, Gothenburg, Sweden.
[4] Ashutosh Modi and Ivan Titov. 2014. Inducing neural models of script knowledge. In Proceedings of CoNLL.