Computational Linguistics & Phonetics Fachrichtung 4.7 Universität des Saarlandes

Here we give a short overview of the annotation that is found in the gold corpus. For a more detailed introduction, see this 2003 ACL paper.

The TIGER/SALSA corpus

The TIGER/SALSA corpus is based on the TIGER corpus, a corpus of German newspaper text with manual syntactic annotation. SALSA adds a manual annotation of frame-semantic roles as defined by FrameNet.

What does a frame look like in the annotation?

Each frame is annotated in the form of a flat tree of depth one. Its root is labelled with the frame name, and its edges are anchored to nodes of the syntactic structure. The edge(s) leading to the word(s) that evoke the frame (the FEE, or frame-evoking element) are unmarked. The edge(s) leading to the bearers of frame-semantic roles (the frame elements) are labelled with the semantic role. Frames are independent of each other.
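The structure described above can be sketched as a small data model. This is only an illustration, not the corpus file format: the class names `Edge` and `Frame`, the node identifiers, and the example frame instance are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Edge:
    """One edge of the flat frame tree, anchored to a syntactic node."""
    target: str                 # id of a syntactic node (hypothetical id scheme)
    role: Optional[str] = None  # None = unmarked edge, i.e. it leads to the FEE


@dataclass
class Frame:
    """A frame as a tree of depth one: a labelled root plus its edges."""
    name: str
    edges: List[Edge] = field(default_factory=list)

    def fee_nodes(self) -> List[str]:
        # Unmarked edges lead to the frame-evoking element(s).
        return [e.target for e in self.edges if e.role is None]

    def elements(self) -> Dict[str, List[str]]:
        # Labelled edges lead to frame elements, grouped by role label.
        roles: Dict[str, List[str]] = {}
        for e in self.edges:
            if e.role is not None:
                roles.setdefault(e.role, []).append(e.target)
        return roles


# Invented example instance; node ids do not come from the corpus.
frame = Frame(
    name="Commerce_buy",
    edges=[
        Edge("s1_w3"),                # FEE (unmarked)
        Edge("s1_n500", "Buyer"),     # frame element
        Edge("s1_n501", "Goods"),     # frame element
    ],
)
```

Because frames are independent of each other, a sentence with several frame-evoking elements would simply carry several such `Frame` objects that may point at overlapping syntactic nodes.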

What is annotated as frame-evoking?

As a rule of thumb, we tag all verbs and the "interesting" nouns (mainly the deverbal ones) as frame-evoking, plus multi-word expressions.

Features of the annotation

  • If a frame element or frame-evoking element consists of more than one node of the syntactic structure, the frame tree will have either one edge that is split, or two or more edges with the same label.

  • Unknown frames: If we find a frame-evoking element for which FrameNet does not yet offer an appropriate frame, it is tagged with the (pseudo-)frame "Unknown", and all of its core frame elements that the annotator can find in the sentence are marked with the labels "Fe1", "Fe2", "Fe3", ... An "Unknown" frame and its frame elements are treated as a distinct frame for each lemma or expression.

  • Underspecification: Sometimes two different frames are equally fitting for a frame-evoking element, or it is impossible from the context to decide between two frames. In this case, annotators may assign both frames and mark them as underspecified.

    In the same way, annotators can assign underspecified frame elements if two different frame-semantic roles are equally fitting for some role bearer, or if it is impossible to decide whether an expression should be assigned a semantic role at all. Underspecification can therefore hold between two or more frame elements, or apply to a single frame element.

  • Splitting compounds: A compound noun can comprise a frame-evoking element plus a frame element. As compounds are mostly written as a single word in German, they can be split for the frame annotation.
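Some of the features listed above can be illustrated with a short sketch. It is not the corpus format; the names `Frame`, `make_unknown_frame`, `frame_key` and `Underspecification`, as well as the node ids and lemmas, are assumptions introduced for the example. Compound splitting is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Frame:
    name: str        # FrameNet frame name, or the pseudo-frame "Unknown"
    lemma: str       # lemma or expression of the frame-evoking element
    fee: List[str]   # syntactic node ids; several ids for a multi-node FEE
    # role label -> node ids; several ids for a multi-node frame element
    elements: Dict[str, List[str]] = field(default_factory=dict)


def make_unknown_frame(lemma: str, fee: List[str],
                       core_element_nodes: List[List[str]]) -> Frame:
    """Build an "Unknown" frame whose core elements are labelled Fe1, Fe2, ..."""
    elements = {
        f"Fe{i}": nodes
        for i, nodes in enumerate(core_element_nodes, start=1)
    }
    return Frame("Unknown", lemma, fee, elements)


def frame_key(frame: Frame) -> Tuple[str, ...]:
    """Identity used when comparing frames across instances.

    "Unknown" frames count as different frames for each lemma or expression,
    so the lemma is part of their key.
    """
    if frame.name == "Unknown":
        return ("Unknown", frame.lemma)
    return (frame.name,)


@dataclass
class Underspecification:
    """Frames or role labels that the annotator could not decide between.

    Two or more alternatives: equally fitting labels.
    A single alternative: "this role, or no role at all".
    """
    alternatives: List[str]


# Invented usage example with placeholder lemmas and node ids.
f = make_unknown_frame("lemma_a", ["n1"], [["n2"], ["n3", "n4"]])
g = make_unknown_frame("lemma_b", ["n5"], [])
```

In this sketch `frame_key(f)` and `frame_key(g)` differ even though both frames are named "Unknown", which mirrors the per-lemma treatment described in the list.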