================================================================================
                        ANNOTATE v3.6 -- Quick Reference
================================================================================

Author: Oliver Plaehn (plaehn@coli.uni-sb.de)
Last updated: June 23, 2000


Purpose
=======

Annotate is a tool for syntactic annotation of natural language corpora. It
provides an easy-to-use graphical user interface, a comprehensive set of
commands for manipulating syntactic structures, an interface to an external
tagger and parser, a simple search function, an undo function,
PostScript/printer output, and many more features.

This Quick Reference describes Annotate's features in a concise manner. For
a more elaborate manual, see 'annotate-manual.ps'. Note, though, that that
manual describes an older version of Annotate and is written in German.


1) Programme Start
==================

Type 'annotate' and press . 'annotate -help' lists the allowable command
line arguments.


Frame "General"
===============

Combobox "Corpus": Pressing or clicking on the down arrow button opens the
corpus list. Choose the corpus you wish to view or alter by selecting it
with the cursor keys and pressing or by double-clicking on it.

You can also enter the name of the corpus directly. If you do so, you can
take advantage of Annotate's completion feature by pressing when the
insertion cursor is to the right of the rightmost character. Annotate takes
what you have already typed and fills in the rest of the corpus name,
provided that the prefix you entered uniquely determines the corpus name.
If more than one corpus name starts with the given prefix, Annotate fills
in characters up to the point where the matching corpus names diverge. It
then beeps to indicate that the shown corpus name is not yet unique.
Consider the following example.
Assume you typed "Br" and your corpus list contains two entries that start
with "Br", say "Brown corpus (1st part)" and "Brown corpus (2nd part)". If
you move the insertion cursor to the right of the "r" and press , Annotate
changes the contents of the "Corpus" field to "Brown corpus (" and beeps to
indicate that this is not yet a unique corpus name. If you then type "1"
and press again, Annotate fills in the rest of the corpus name for you:
"Brown corpus (1st part)". Completion works in all combobox fields in
Annotate.

After a corpus has been selected, Annotate switches to that corpus and
automatically displays the sentence within that corpus that has most
recently been modified.

Field "Editor": Displays the name of the current annotator. The name is
determined automatically from the Unix login name of the user who started
Annotate. The mapping from Unix login names to real names is maintained in
the table "Editor" (database "annotate").

Button "Save": Stores all changes made to the currently displayed sentence
in the corpus database. Note that sentences are saved automatically when
you switch to another sentence or corpus, or when you exit the programme.

Button "Reload": When the programme starts up, it loads programme-specific
information from the "annotate" database, e.g. the lists of known corpora
and editors. Likewise, when a corpus is selected, all label sets (for
part-of-speech tags, edge and parent labels, etc.) are loaded from the
corpus database into memory. If you change these data while the programme
is running, Annotate will not notice your changes unless you exit and
restart it, or press the "Reload" button. This button forces Annotate to
reload the information from the "annotate" database, the label sets of the
current corpus, and the current sentence.

Button "Exit": Exits the programme. If the current sentence has been
modified, all changes are automatically saved in the corpus database.
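
The completion behaviour described for the combobox "Corpus" above can be
sketched in a few lines of Python. This is an illustration only, not
Annotate's actual code: the function name complete() and the case-sensitive
prefix matching are assumptions.

```python
import os


def complete(prefix: str, names: list[str]) -> tuple[str, bool]:
    """Return (completed text, is_unique) for a typed prefix.

    Hypothetical helper: extends the prefix as far as all matching
    names agree. is_unique is False when Annotate would beep.
    """
    matches = [name for name in names if name.startswith(prefix)]
    if not matches:
        # Nothing matches; leave the typed text unchanged.
        return prefix, False
    # commonprefix compares the strings character by character.
    completed = os.path.commonprefix(matches)
    return completed, len(matches) == 1


corpora = ["Brown corpus (1st part)", "Brown corpus (2nd part)"]
print(complete("Br", corpora))
print(complete("Brown corpus (1", corpora))
```

With the two "Brown corpus" entries from the example, the first call stops
at "Brown corpus (" and reports that the name is not yet unique; the second
call fills in the full name of the first part.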

Button "Options": Opens a popup window in which several options can be
altered (see "Options" below).


Frame "Sentence"
================

Field "No.": Displays the number of the current sentence and, in
parentheses, the range of sentences in the current corpus.

Field "Last edited": Displays the date and time at which the current
sentence was last modified and the name of the annotator who performed the
modification.

Entry field "Comment": Allows you to edit a comment on the current sentence
(provided that you are allowed to alter the current corpus). For a list of
key and mouse bindings, see the manual page for the Tcl/Tk command "entry".

Button "...": Pressing this button opens a popup window in which you can
edit longer comments more comfortably. See the manual page "text.n" for
details on key and mouse bindings.

Field "Origin": Displays information on the origin of the current sentence,
e.g. from which file it has been imported into the corpus database.


Frame "Move"
============

Button "<<" (1st row): Clicking on this button with the left mouse button
switches to the previous sentence. Clicking on it with the middle mouse
button moves 10 sentences "to the left", with the right mouse button 100
sentences.

Button ">>" (1st row): Same as "<<", but in the other direction.

Bookmark button (between "<<" and ">>"): Annotate maintains a stack of
bookmarks for the last five sentences that have been modified and
subsequently saved. This button displays the number of the sentence that is
on top of the stack, i.e. the one that has been most recently modified.
Clicking on the bookmark button switches to this sentence and pops its
number from the stack. If you move the mouse cursor onto the button, the
current bookmark stack is shown in the message line.

Entry field "Go to": Enter an absolute ("100", "42") or relative ("+115",
"-2000") sentence number and press to switch to the respective sentence.
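
As an illustration of the two input forms accepted by "Go to", the
following Python sketch resolves an entry to a sentence number. It is not
taken from Annotate: the function name resolve_goto() is hypothetical, and
clamping the result to the range of the corpus is an assumption, since this
reference does not say what happens with out-of-range targets.

```python
def resolve_goto(entry: str, current: int, first: int, last: int) -> int:
    """Resolve a "Go to" entry to a sentence number (hypothetical helper)."""
    text = entry.strip()
    if not text:
        raise ValueError("empty entry")
    if text[0] in "+-":
        # A leading sign marks an offset relative to the current sentence.
        target = current + int(text)
    else:
        # Anything else is taken as an absolute sentence number.
        target = int(text)
    # Assumption: targets outside the corpus are clamped to its range.
    return max(first, min(last, target))


print(resolve_goto("100", current=5, first=1, last=3000))
print(resolve_goto("+115", current=5, first=1, last=3000))
print(resolve_goto("-2000", current=5, first=1, last=3000))
```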

Button "Mask": Clicking on this button opens a popup window in which you
can edit a search mask (see "Search Function" below for details). After the
"Search" popup has been closed, the search is performed and the first
sentence matching the search mask is displayed.

Button "<<" (2nd row): Moves to the next sentence matching the current
search mask to the left of the current sentence.

Button ">>" (2nd row): Moves to the next sentence matching the current
search mask to the right of the current sentence.

Field "Matches": Displays the number of sentences matching the current
search mask. If you move the mouse cursor onto this field, (part of) the
list of matching sentences is displayed in the message line. The tags "<<"
and ">>" within this list indicate to which sentences the respective
buttons switch.

Field "Search for": Displays an abbreviated version of the contents of the
current search mask. If the contents of the field are cut off at the right,
move the mouse cursor onto this field to view the complete information in
the message line. Alternatively, click and hold the middle mouse button
while the mouse cursor is over the field, and then move the mouse to shift
the text to the left or right. (This "trick" works with all entry fields in
Annotate.)


Frame "Dependency":
===================

Entry field "Selection": As an alternative to selecting nodes with the
mouse (see "Manipulating the Syntactic Structure" below), you can type
their numbers into this field. Numbers must be separated by commas. Node
ranges are also possible; their end points are separated by a hyphen.
Example inputs: "1,2,501"; "1"; "2,5,10-14,501,503-505"; "500-999". Note
that numbers for which no node exists in the current syntactic structure
are simply ignored. Thus, the last example ("500-999") selects all
non-terminal nodes.

Combobox "Command": Enter a command or select one from the list and press
to execute it. See "Combobox 'Corpus'" above for details on how to select a
command name.
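
The syntax of the "Selection" field described above can be illustrated with
a short Python sketch. It is an assumption-laden illustration rather than
Annotate's implementation: the function name parse_selection() and the
"existing" parameter (the set of node numbers present in the current
structure) are hypothetical, and the sketch keeps the order of entry
because commands such as "Add to" depend on which node was given last.

```python
def parse_selection(text: str, existing: set[int]) -> list[int]:
    """Expand a "Selection" string into node numbers (hypothetical helper)."""
    selected: list[int] = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "10-14" includes both end points.
            low, high = (int(piece) for piece in part.split("-", 1))
            candidates = range(low, high + 1)
        else:
            candidates = [int(part)]
        for number in candidates:
            # Numbers without a corresponding node are silently ignored.
            if number in existing and number not in selected:
                selected.append(number)
    return selected


nodes = {0, 1, 2, 3, 4, 5, 500, 501, 503}
print(parse_selection("500-999", nodes))
print(parse_selection("4-5,501,500", nodes))
```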
See "Command List" below for a description of all available commands.

Button "Execute": Clicking this button executes the command displayed in
the combobox "Command".


Input frame:
============

The input frame is located to the right of the "Dependency" frame. It is
activated whenever you choose to alter a label, a word comment or the text
of a word. The frame title shows what kind of information you are currently
editing. For the discussion below, let's assume that you selected the
non-terminal nodes 501, 503 and 507, and executed the command
"Parentlabel".

Entry field "Node no.": Displays the number of the node whose label you are
currently editing. To the right of the entry field, the current label of
the node is shown. The current node is marked in the syntactic structure
with a thick black outline. You can type in a different number and press to
switch to that node. Note, though, that you can only switch to nodes that
are selected.

Entry field: Choose or enter the label you wish to assign to the current
node.

Button "<<": Switches to the previous node in the list of all selected
nodes. The selected nodes are not sorted numerically, but rather from left
to right according to their horizontal position in the syntactic structure.
That is, "<<" switches to the next selected node to the left of the current
one.

Button ">>": Switches to the next selected node to the right of the current
one.

Button "End": Finishes editing the parent labels of the selected nodes. The
input frame is deactivated.

The following two buttons are only visible if the input frame has been
activated automatically in order to allow the annotator to confirm or
reject a suggestion made by the external tagger/parser. See "Interactive
Annotation" below for details.

Button "Cancel": Rejects the suggestion made by the parser.

Button "Parentlabel": If you are asked to reject or confirm an edge label
that has been suggested by the parser, you can use this button to change
the label of the parent of the current node. Note that, if you change the
parent label of a node, the external parser is automatically called to
determine anew the labels of the edges that lead to this node.


Command List
============

"Group"
-------

Introduces a new non-terminal node with all selected nodes as children of
the new node. Precondition: none of the selected nodes is already bound to
a node.

If this command is called when no nodes are selected, Annotate
automatically groups all unbound nodes, except those words whose tag has an
"N" in the field "ToBeBound" of the table "Tag" in the corpus database.
This is useful if your annotation scheme prescribes that certain words,
e.g. punctuation marks, are not to be included in the syntactic structure.

After the new node is drawn, you are asked to choose labels for it and for
all new edges. If you use the external parser (see "Options" below), this
is done automatically. See "Interactive Annotation" below for details.

If the current sentence is not yet part-of-speech tagged when you invoke
the "Group" command, the external tagger is called first to determine
part-of-speech tags for the words of the sentence (command "Tagging").

You can also invoke the "Group" command by pressing - or , or by clicking
with the right mouse button on an empty area of the canvas.

"Ungroup"
---------

Select a single non-terminal node and invoke this command to erase the
selected node and all edges (primary and secondary) leading to and from it.
If the selected node is bound to another node, it is automatically removed
from this node (command "Remove") before it is erased.

Shortcut: or middle mouse button.

"Add to"
--------

Adds all nodes that you entered in the entry field "Selection", except the
last one, to the last node specified in "Selection".
Example: If "Selection" contains "4-7,501,500" and you invoke the command
"Add to", the words 4 to 7 and the non-terminal 501 are added to the
non-terminal 500.

If some of the nodes you wish to add to a node are already bound to other
nodes, they are removed from these nodes first. Annotate asks for new edge
labels for all nodes that you have added to the new parent node. If the
external parser/tagger is active, it tries to determine the new edge labels
automatically.

Keyboard shortcut: -. You can also invoke this command with the mouse as
follows. Select all nodes you want to add to a different node and click
with the right mouse button on the parent node to which you want to add the
selected nodes.

"Remove"
--------

Removes the selected nodes from their parent node. Preconditions: All
selected nodes have a common parent node; at least one node remains bound
to this parent node after the "Remove" operation.

Shortcut: or right mouse button when the mouse cursor is not upon a
non-terminal node.

"Secondary edge"
----------------

Draws a secondary edge from the first node specified in the "Selection"
field to the second one. You are then prompted for a label for the newly
drawn secondary edge.

Shortcut:

"Delete secondary edge"
-----------------------

Deletes the secondary edge that leads from the first node in "Selection" to
the second node.

Shortcut:

"Tag"
-----

Allows you to alter the part-of-speech tags of the selected words (see
"Input frame" above).

Shortcut: double-click with the left mouse button on the part-of-speech tag
you want to change.

"Morphological information"
---------------------------

Allows you to alter the morphological information of the selected nodes.

Shortcut: double-click with the left mouse button on the morphological
information you want to change. (This works only for words.)

"Comment"
---------

For editing the comments of the selected words.

Shortcut: double-click.

"Parentlabel"
-------------

For editing the labels of the selected non-terminal nodes.

Shortcut: double-click.

"Edgelabel"
-----------

Ditto for edge labels.

"Secondary label"
-----------------

Ditto for labels of secondary edges.

"Edit words"
------------

Allows you to alter the text of the selected words, e.g. to correct
spelling mistakes. You can activate or deactivate this command separately
for each corpus by setting the "EditWords" column of the "Corpus" table
(database "annotate") to "Y" or "N", respectively.

"Alternative Group"
-------------------

Same as "Group" except that the setting of the option "Ask for edge and
parent labels" is reversed. That is to say, if this option is on, "Group"
asks for labels and "Alternative Group" does not, and vice versa.

Shortcut:

"Alternative Add to"
--------------------

Same as "Add to" except that the setting of the option "Ask for edge and
parent labels" is reversed.

Shortcut:

"Parse"
-------

Calls the external parser to suggest a new phrase based on the existing
partial syntactic structure and the part-of-speech tags of the words. (See
"Interactive Annotation" below for details.)

Shortcut: click with right mouse button on an empty part of the canvas
while no nodes are selected.

"Tagging"
---------

Calls the external tagger to suggest part-of-speech tags for the words of
the current sentence. Precondition: All words are untagged, i.e. have the
tag "--". (See "Interactive Annotation" below for details.)

Shortcut: click with right mouse button on an empty part of the canvas
while no nodes are selected and all words are untagged.

"Erase"
-------

Erases the whole syntactic structure, including part-of-speech tags,
morphological information, secondary edges and word comments.

"New sentence"
--------------

Invoking this command allows you to enter the text of a new sentence which
is appended to the end of the corpus. Note that punctuation marks are to be
separated from the preceding words by a blank.
You can activate or deactivate this command separately for each corpus by
setting the "NewSentence" column of the "Corpus" table (database
"annotate") to "Y" or "N", respectively.

"Word left"
-----------

Deletes the first (leftmost) word from the current sentence and appends it
to the end of the previous sentence. Precondition: the first word is not
bound to any node, neither in the (primary) syntactic structure nor via a
secondary edge. This feature is useful if you want to correct tokenization
errors during the annotation.

"Word right"
------------

Deletes the last (rightmost) word from the current sentence and appends it
to the beginning of the next sentence. Precondition: the last word is not
bound to any node, neither in the (primary) syntactic structure nor via a
secondary edge.

"Split word"
------------

Allows you to split the selected word into two words. The new word that
consists of the right part of the split word inherits all information from
the old word.

"Merge words"
-------------

Merges the selected words. Precondition: Two adjacent words are selected.
The text of the right word is appended to the end of the left word. The
leaf node corresponding to the right word is then deleted.

"Insert trace"
--------------

Inserts a trace ("--") between the two selected words or, if the first word
is selected, at the beginning of the sentence or, if the last word is
selected, at the end of the sentence.

Shortcut:

"Delete trace"
--------------

Deletes the selected trace.

Shortcut:

"Print"
-------

Opens a popup window in which you can choose between printing the syntactic
structure or saving it as postscript output in a file. You can specify the
Unix command that is used for printing as well as the file name in which
the syntactic structure is saved. The "Options" frame allows you to specify
whether the postscript output should be scaled to paper size (A4) and
whether the indices on words and nodes should be included in the output.
In the "Layout" frame, you can choose between landscape and portrait page
orientation.

"Help"
------

Calls an external programme of your choice to display information on, e.g.,
the annotation scheme. You will most probably want to customize this
command to suit your needs by changing the column "Script" of the table
"DepCommand" (database "annotate"). This column contains a Tcl/Tk script
that is executed whenever the respective command is invoked within
Annotate. Example: If this column contains "catch { exec xdvi annot.dvi &
}" for the "Help" command, the Unix command "xdvi annot.dvi &" is called.
(The Tcl command "catch" executes its argument and silently ignores any
error that might occur.)

Shortcut:

"Undo"
------

Reverts the changes applied to the syntactic structure by the last executed
command. "Undo" works for all but the following commands: "Tagging", "New
sentence", "Word left", "Word right", "Print", "Help", "Next sentence",
"Previous sentence". Note that "Undo" works only for the command that has
been invoked most recently. That is to say, if you invoke "Undo" twice in a
row, the state of the syntactic structure prior to the first "Undo" is
reestablished.

Shortcut: Click with the middle mouse button on an empty area of the
canvas.

"Next sentence"
---------------

Switches to the next sentence.

"Previous sentence"
-------------------

Switches to the previous sentence.


Manipulating the Syntactic Structure
====================================

The syntactic structure of the current sentence is displayed in the frame
in the middle of the application window, the so-called canvas. It consists
of leaf nodes for the words of the sentence, the primary syntactic
structure, and labeled secondary edges. Each leaf consists of the word, a
part-of-speech tag, morphological information, and an optional word
comment. The primary syntactic structure is a tree with possibly crossing
branches and labels on non-terminal nodes and edges.
Words are numbered from 0 to 499, non-terminal nodes from 500 to 999. This
implies that each sentence may contain at most 500 words and 500
non-terminal nodes.

The primary syntactic structure is built bottom-up by repeatedly selecting
words and non-terminals and grouping them in new non-terminal nodes. You
can also "ungroup" a node, reattach nodes to a different parent node,
remove nodes from their parents, etc. See the list of commands above.

Moving the mouse pointer onto an element of the syntactic structure (labels
and indices) displays an explanatory text in the message line.

You can change the viewable part of the canvas by using the scrollbars or
by pressing -. Or click and hold the middle mouse button and then move the
mouse to shift the syntactic structure.

The following table summarises the use of the mouse to manipulate the
syntactic structure:

+========+========+==========================+=================================+
! Button ! Click  ! Conditions               ! Event                           !
+========+========+==========================+=================================+
! Left   ! Double ! Mouse pointer on word    ! Command "Edit words"            !
+--------+--------+--------------------------+---------------------------------+
! Left   ! Double ! Pointer on POS tag       ! Command "Tag"                   !
+--------+--------+--------------------------+---------------------------------+
! Left   ! Double ! Pointer on word comment  ! Command "Comment"               !
+--------+--------+--------------------------+---------------------------------+
! Left   ! Double ! Pointer on morphological ! Command "Morphological          !
!        !        ! information              ! information"                    !
+--------+--------+--------------------------+---------------------------------+
! Left   ! Double ! Pointer on node label    ! Command "Parentlabel"           !
+--------+--------+--------------------------+---------------------------------+
! Left   ! Double ! Pointer on edge label    ! Command "Edgelabel"             !
+--------+--------+--------------------------+---------------------------------+
! Left   ! Double ! Pointer on label of      ! Command "Secondary label"       !
!        !        ! secondary edge           !                                 !
+--------+--------+--------------------------+---------------------------------+
! Left   ! Single ! Press and hold button    ! Selects nodes; old selection is !
!        !        !                          ! canceled                        !
+--------+--------+--------------------------+---------------------------------+
! Left   ! Single ! Press and hold button    ! Extends existing selection      !
!        !        ! and                      !                                 !
+--------+--------+--------------------------+---------------------------------+
! Middle ! Single ! Press and hold button    ! Shifts the viewable part of the !
!        !        ! and move mouse           ! canvas                          !
+--------+--------+--------------------------+---------------------------------+
! Middle ! Single ! At least one node is     ! Command "Ungroup"               !
!        !        ! selected                 !                                 !
+--------+--------+--------------------------+---------------------------------+
! Middle ! Single ! No node is selected      ! Command "Undo"                  !
+--------+--------+--------------------------+---------------------------------+
! Right  ! Single ! Nodes selected; pointer  ! Command "Add to"                !
!        !        ! is on non-terminal node  !                                 !
!        !        ! that is not selected     !                                 !
+--------+--------+--------------------------+---------------------------------+
! Right  ! Single ! Nodes selected; pointer  ! Command "Remove"                !
!        !        ! not on non-terminal;     !                                 !
!        !        ! all selected nodes are   !                                 !
!        !        ! bound                    !                                 !
+--------+--------+--------------------------+---------------------------------+
! Right  ! Single ! Nodes selected; pointer  ! Command "Group"                 !
!        !        ! not on non-terminal;     !                                 !
!        !        ! at least one selected    !                                 !
!        !        ! node is not bound        !                                 !
+--------+--------+--------------------------+---------------------------------+
! Right  ! Single ! No nodes selected;       ! Command "Tagging"               !
!        !        ! words are not part-of-   !                                 !
!        !        ! speech tagged            !                                 !
+--------+--------+--------------------------+---------------------------------+
! Right  ! Single ! No nodes selected;       ! Command "Parse"                 !
!        !        ! at least one word is     !                                 !
!        !        ! part-of-speech tagged    !                                 !
+========+========+==========================+=================================+


Interactive Annotation
======================

Annotate interacts with an external parser/tagger that suggests
part-of-speech tags and edge and node labels as well as entire phrases.
There are four levels of automation from which you can choose (in the
"Options" popup):

Nothing: The parser/tagger is switched off; all phrases and labels have to
be assigned manually.

Edge labels: You manually introduce a new node with the "Group" command and
assign a label to the new node. The external parser/tagger suggests labels
for the edges.

Edge and parent labels: The parser suggests labels for new nodes and for
edges.

Structure: The parser incrementally suggests new phrases together with
parent and edge labels ("Parse" command).

You can specify separately whether the external tagger should be called to
determine the part-of-speech tags.

Semi-automatic tagging
----------------------

The external tagger is automatically called when you click the right mouse
button (or invoke the "Tagging" command), provided that the words are not
yet part-of-speech tagged. The tagger distinguishes reliable and unreliable
suggestions. Reliable tags are assigned automatically. Unreliable
suggestions are highlighted and you are asked for confirmation or
correction. To this end, Annotate activates the input frame for each tag.
The suggestions, together with their probabilities, are shown at the top of
the part-of-speech tag list.

Semi-automatic assignment of edge labels
----------------------------------------

The external parser is called to determine edge labels whenever a new
non-terminal node is introduced, nodes are (re)attached to a (different)
node, or the label of a non-terminal node is changed -- in other words,
whenever the relation between the two nodes connected by an edge changes.
As with part-of-speech tags, reliable labels are assigned automatically and
unreliable labels are highlighted and need to be confirmed or corrected.

If the parser automatically assigned the label of the parent, the
"Parentlabel" button is shown in the input frame while you correct the
suggested edge labels. This button allows you to change the label of the
parent node immediately, without having to finish assigning the edge labels
first. Each time you change the label of a non-terminal node, the parser
determines anew the labels of the edges that connect that node with its
children.

Semi-automatic assignment of parent labels
------------------------------------------

The parser suggests labels for all new non-terminal nodes, whether
introduced manually or suggested by the parser. The procedure is similar to
that used for edge label and part-of-speech tag assignment.

Semi-automatic assignment of phrases
------------------------------------

If you click the right mouse button while no nodes are selected, or invoke
the "Parse" command, the parser suggests a new phrase based on the existing
part-of-speech tags and the partial syntactic structure. It also assigns
edge and parent labels for the new phrase. Unreliable labels need to be
confirmed or corrected, as described above. You can accept the suggested
phrase or reject it with the "Undo" command (middle mouse button).
Alternatively, you can correct the suggested phrase manually, e.g. by
attaching further nodes to it.


Search Function
===============

If you press the "Mask" button in the frame "Move", a new window for
entering a search mask pops up. You can specify one or more criteria that
must be fulfilled by the sentences you want to search for. After you click
on the "Ok" button, the search is performed and the first matching sentence
is displayed.

Entry field "Word": Words to search for, separated by blanks. A sentence
matches if it contains all of the words entered here. Case is ignored.

Combobox "Tag": Choose a part-of-speech tag that has to appear in the
matching sentences.

Combobox "Parentlabel": Ditto for labels on non-terminal nodes.

Combobox "Edgelabel": Ditto for edge labels.

Combobox "Sec. edgelabel": Ditto for labels on secondary edges.

Combobox "Morph. info": Ditto for morphological information.

Entry field "Sentence comment": A sentence matches if its sentence comment
contains the string entered here. Case is ignored. "*" finds all sentences
that have an arbitrary non-empty comment.

Entry field "Word comment": Ditto for word comments.

If you enter values in several of these fields, only sentences that fulfill
all of the criteria match.

The checkboxes "Tagging status" and "Dependency status" allow you to search
only in sentences that have a specific status. The "tagging status" of a
sentence is "none" if no word is part-of-speech tagged, "completely" if all
words are tagged, and "partly" otherwise. The "dependency status" of a
syntactic structure is "none" if it contains no non-terminal node,
"completely" if there are no unbound nodes and words (except those that
need not be bound; see the description of the "Group" command above), and
"partly" otherwise.

If you activate the checkbox "Read matching sentences from file", no search
is performed. Instead, a list of sentence numbers is read from the file
specified in the entry field "Filename". These sentences are then used as
the result of the "search". That is to say, you can browse through them
with the "<<" and ">>" buttons in the second row in the frame "Move". The
sentence numbers are read from the first line of the specified file. The
numbers have to be separated by whitespace (blanks and/or tabs) and need to
be in ascending order.


Options
=======

You can specify a number of options in the "Options" popup window. This
window pops up when you click on the button "Options" in the frame
"General". It has three frames, which are explained below.

Frame "Parser": Specifies the level of automation during annotation. For
details, see "Interactive Annotation" above. Defaults to "Structure" and
"Tagging" being active.

Frame "Group/Add to": By default, Annotate prompts you for parent and edge
labels of newly constructed nodes and edges (or, if the parser is active,
tries to determine them automatically). If you deactivate the checkbox "Ask
for edge and parent labels" in this frame, Annotate does not ask for
labels. This is useful if you want to annotate syntactic structure only,
without any labels.

Frame "Mobile": Allows you to specify whether secondary edges and
morphological information should be displayed in the canvas. Default for
both checkboxes: active.
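

The file format expected by the checkbox "Read matching sentences from
file" (see "Search Function" above) is simple enough to check before
loading a file into Annotate. The Python sketch below is a hypothetical
helper, not part of Annotate: the function names are made up, and treating
repeated numbers as acceptable is an assumption, since this reference only
requires ascending order.

```python
def parse_sentence_numbers(first_line: str) -> list[int]:
    """Parse the first line of a match file (hypothetical helper)."""
    # split() without arguments splits on blanks and tabs alike.
    numbers = [int(token) for token in first_line.split()]
    # Assumption: only a decrease counts as a violation of the order.
    if any(later < earlier for earlier, later in zip(numbers, numbers[1:])):
        raise ValueError("sentence numbers must be in ascending order")
    return numbers


def read_match_file(path: str) -> list[int]:
    """Read the sentence numbers from the first line of the given file."""
    with open(path, encoding="utf-8") as handle:
        return parse_sentence_numbers(handle.readline())


print(parse_sentence_numbers("3 17\t42  105\n"))
```

Only the first line is read, so any further lines in the file are ignored
by this sketch, in line with the description above.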