MMAX Version 0.91 (beta): Manual (draft)

· Included a version of the new SmartAttributeWindow, which is to replace the former AttributeWindow in the not-so-distant future. Cf. the sample samples\text\textsample_smartwindow.anno. Note that the sample is set to read-only. If you want to modify it, remove the last line in the .anno file. The SmartAttributeWindow strongly benefitted from ideas by Caroline Varaschin Gasperin.

· ‘utterance’ tags are now rendered as pairs of opening and closing elements (rather than just opening elements). Cf. the XSL style sheets in the SwitchBoard or multimodal sample to see how to use this for display rendering purposes. Note that the selected font can influence the way e.g. indentation is rendered in the display. If the display looks odd, try a different font.

· Added some additional sample files, in particular a short excerpt from the SwitchBoard corpus. This nicely demonstrates the MMAX stylesheet capabilities.

· Upgraded XML and XSL processors to the latest versions of Apache Xerces and Xalan resp. (Note: Due to what seems to be a bug in one of these processors, this may cause problems with some .anno files that used to work with older versions. If you encounter any problems apparently coming from the XML/XSL part of MMAX, please do let me know!)

· Added new display styles <sub> and <super> for setting text in sub- or superscript. Cf. section 4 for details.

· Added keyboard shortcuts for saving (ctrl-s) and style sheet re-application (ctrl-r). The latter is useful for style sheet debugging.

· Added user-selectable display font style and font size. The default (as defined in mmaxsettings.xml) is SansSerif 14pt. If you encounter problems at MMAX startup which are caused by this font not being available on your system, edit the file mmaxsettings.xml and enter the name of a font that is available on your system in the appropriate XML element. Afterwards, changes to the display font can be saved just like any other settings (i.e. “Settings”, “Save Settings”).

New in this version (0.9)

· (build 2) No more useless debug info when handling discontinuous / embedded markables

(build 1) Support for discontinuous markables has been added. Cf. section 2.2 for details.
Display behaviour is more stable now in case of aborted GUI actions or clicks in empty spaces.
The display has been sped up (a little) once again.
Incomplete colouring bug for italic, bold or underlined text has been fixed.
Deletion of embedded markables has been fixed.

New in this version (0.88)

· (build 4) Utterance markup processing has been reanimated. Cf. section 3.8 for details.

· (build 4) Really fixed odd behaviour of pop up menu. The behaviour occurred if the MMAX window was so small that the pop up menu reached beyond it. The pop up is now flipped (i.e. expands to the left) depending on the distance to the right border, so the problem should no longer occur.

· (build 3) Fixed odd behaviour of pop up menu (hopefully)

Fixed initially misplaced toolbar bug.
Added –quiet mode to suppress screen output during loading
Added –validate mode to force DTD checking of markable files during loading
Began porting MMAX internals to use the MMAX Discourse object (this does not have any external effect yet, except for a slight increase in loading speed in some cases)

New in this version (0.868)

Moved display of current markable file to the bottom of the window.

New in this version (0.867)

· Support for user-definable markable display attributes (cf. Interface class AttributeWindowInterface in javadoc)

· Support for annotation-related events (cf. Interface class MarkableEventListener in javadoc)

· Fixed the duplicate id bug with markable files containing just one markable.

· Fixed the duplicate attribute bug (pointer, member, type) in the toXMLString method.

· Minor bug fixes

New in this version (0.865)

A Batch Kappa algorithm has been added.
The Kappa window now includes not only the markable spans, but also the text for each markable. This information can now be saved to disk.
When selecting (clicking on) long markables, the MarkableSelector will no longer reach beyond the screen. Very long markables will be trimmed to fit the screen.
The “Save” option is now enabled. By default, it saves to the current markable file, the name of which is now always displayed in the MMAX menu bar. When saving, a backup (.bak) copy of the file is created. If a backup file of the same name already exists, it is automatically deleted.
When saving the markable file under a different name (“Save as”), the file chooser suppresses all files except XML files.

New in this version (0.86)

The display has been accelerated considerably. Refresh (e.g., after selecting a set of coreferring expressions) is now much faster.
An auto-apply mode is supplied (cf. “Settings” menu). When active, changes to the Attribute Window are automatically applied to the markables, without the user having to click the “Apply” button.
The value of the TYPE attribute can be displayed in the MarkableSelector (cf. “Settings” menu). (The MarkableSelector is the list which is displayed after embedded markables have been clicked.)

New in this version (0.8)

User-definable attributes for display colors and behaviour can now be saved (cf. mmaxsettings.xml)
New settings have been added (cf. “Settings” menu)
It is possible to load a new .anno file without restarting the tool. The file choosing dialogue will start in the directory from which the last .anno file has been loaded in the current session.
The span attribute as it is written to the markable files has been normalized: It now has the form first_element..last_element (instead of first_element, second_element,... last_element).
The loading procedure has been optimized for pure text corpora and is now much faster.
The style sheet for the sample text corpus (text.xsl) has been modified to correctly render punctuation.
The Attribute Window is now properly resized when the type attribute is changed. Changes to attributes in this window can be automatically applied.
Output of Kappa calculation can be saved to disk.
Minor bug fixes
Important note: From version 0.8 on, user-defined default attributes will be applied automatically to newly created markables (depending on the setting of the option “Apply user-defined default attributes” in the “Settings” menu). This ensures that all markables contain these attributes. For older annotations (before version 0.7), however, this is not the case! If you load older annotations into this version of the tool, missing default attributes will NOT be applied automatically, because this could result in data inconsistency. It is recommended to check older annotations in the following way: Select a markable by left-clicking it. If this markable doesn’t contain a type attribute, the message “Current markable doesn’t have type attribute!” will be written to STDOUT. In this case, make a dummy change in the Attribute Window by changing a value and immediately changing it back. Do NOT use “Undo changes” to change it back! Then, click “Apply” to explicitly write all of the displayed attributes to the markable.

Known issues (to be fixed):

User-defined markup for non-selectable text will not appear.
Check for hybrid markables not enforced: The tool will allow hybrid markables to be created even if the setting is set to ‘disallow’.

0. Overview

MMAX is a tool for the annotation of (possibly multi-modal) corpora. In this version, the following functions are supported:

Creating and deleting markables (i.e. portions of text or dialogues which have particular attributes assigned to them)
Annotating and deleting relations between markables
Browsing / display of annotations
Saving and re-loading of annotations
Specification of user-definable attributes for markables
Computing the Kappa reliability measure for annotations

Not yet supported are:

Text search
…

MMAX supports two corpus types which differ in the types of signals that are contained in each. The term signal is used to denote any type of communicative element that texts and / or dialogues consist of.

The corpus type text contains only signals of type word. This corpus type is used to represent written text. In addition to the words themselves, structural information (in terms of sentences and paragraphs) as well as (optional) pragmatic information (i.e. discourse segments) is contained in corpora of this type.

The corpus type dialogue, on the other hand, contains not only words, but also (optionally) signals of type gesture and keyaction. The latter represent the operation of buttons or similar controls, e.g. in a human-machine interface setting. The keyaction and in particular the gesture signals allow for the representation of multi-modal dialogues. In addition to the signals themselves, relevant structural (turns) and (optional) pragmatic information (utterances) is represented as well.

Signals as well as structural and pragmatic elements (and the annotations as well) are kept apart on the file level. References to the set of files that comprise a single corpus are stored in a single annotation (.anno) file. Sample annotation files (mmdialoguesample.anno and textsample.anno) are supplied with the MMAX executable.

1. Getting started

MMAX is written in Java and requires at least Java 1.3. In addition, it uses the Apache xml parser and stylesheet processor implementations Xerces (version 1.2.3) resp. Xalan-j (1.2.2) (Copyright © 1999-2002 The Apache Software Foundation. All rights reserved.), which are included with this distribution. At the time of this writing, only the specified versions (Xerces 1.2.3 resp. Xalan-j 1.2.2) are tested and certain to work. Since Apache will no longer support Xalan 1.2.2 (version 2 taking the latter’s place), MMAX will soon be upgraded to work with this latest version as well.

The classpath can be supplied at the command line as well. Provided the following directory structure

MMAX/MMAX.jar

MMAX/xerces-2_0_2/xercesImpl.jar

MMAX/xerces-2_0_2/xmlParserAPIs.jar

MMAX/xalan-j_2_4_D1/xalan.jar

MMAX/xalan-j_2_4_D1/xml-apis.jar

the tool can be executed in the MMAX directory with

java -classpath xalan-j_2_4_D1/xalan.jar;xalan-j_2_4_D1/xml-apis.jar;xerces-2_0_2/xercesImpl.jar;MMAX.jar;. org.eml.MMAX.core.MMAX

Alternatively, you can also execute one of the startmmax*.bat files.

Important: Note that while under UNIX/Linux the colon : is used to separate different classpath names, under Windows the semicolon ; is used for this purpose.
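Because the separator differs between platforms, a launcher written in Java can avoid hard-coding it. The following is a minimal sketch (the class name MmaxClasspath is hypothetical and not part of MMAX) that joins the jar paths from the example above with java.io.File.pathSeparator, which is ";" on Windows and ":" on UNIX/Linux. It uses String.join and therefore needs Java 8 or later.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of MMAX: assembles the classpath from the
// directory structure above using the separator of the current platform.
final class MmaxClasspath {
    static String build(List<String> entries) {
        // File.pathSeparator is ";" on Windows and ":" on UNIX/Linux
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                "xalan-j_2_4_D1/xalan.jar",
                "xalan-j_2_4_D1/xml-apis.jar",
                "xerces-2_0_2/xercesImpl.jar",
                "MMAX.jar",
                ".");
        // Prints the command line to use in the MMAX directory
        System.out.println("java -classpath " + build(entries) + " org.eml.MMAX.core.MMAX");
    }
}
```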

2. Using MMAX

2.1 Loading an annotation file

Once the tool is started, an annotation file can be loaded by selecting “Load annotation project” from the “File” menu. In the file selection dialogue that appears, select the .anno file of the corpus which you would like to load. After specifying the file, its contents are parsed and further processed. Note that depending on the size of the corpus, this initial process might take a few seconds. After that, the corpus is displayed in the MMAX main window. In addition, the Attribute Window appears. Note that by design the Attribute Window is accessible only when a markable is selected (cf. below).

2.2 Selecting, modifying, creating and deleting markables

The appearance of the loaded corpus (in particular that of non-verbal signals like gestures and keyactions) depends on the stylesheet used for its display, cf. below. In general, however, normal text will be displayed in black. If your markable file already contains markables, these are initially displayed in blue (Note: This and several other color assignments can be changed by selecting the “Colors” menu item in the “Settings” menu.)

To select a markable, left-click it. A selected markable will be highlighted in green (by default, but cf. above). Since markables can be embedded into each other, a single signal can be part of more than one markable. Thus, a single click can be ambiguous as to which markable is to be selected. Depending on what and where you clicked, a popup menu may appear which contains all markables that the clicked signal is part of. Select the desired markable by left-clicking it in the popup menu. The markable will then also be highlighted.

Whenever a markable is selected, the Attribute Window is updated to display the currently selected markable’s attributes. Important note: From version 0.8 on, user-defined default attributes will be applied automatically to newly created markables (depending on the setting of the option “Apply user-defined default attributes” in the “Settings” menu). This ensures that all markables contain these attributes. For older annotations (before version 0.7), however, this is not the case! If you load older annotations into this version of the tool, missing default attributes will NOT be applied automatically, because this could result in data inconsistency. It is recommended to check older annotations in the following way: Select a markable by left-clicking it. If this markable doesn’t contain a type attribute, the message “Current markable doesn’t have type attribute!” will be written to STDOUT. In this case, make a dummy change in the Attribute Window by changing a value and immediately changing it back. Do NOT use “Undo changes” to change it back! Then, click “Apply” to explicitly write all of the displayed attributes to the markable.

You can modify the attributes of the selected markable via the radio buttons in the Attribute Window. The behaviour of the program depends on the setting of the option “Auto-apply all changes in the attribute window” in the “Settings” menu:

If this option is disabled (default), no permanent changes will be applied to the selected markable unless the “Apply” button is clicked. After changing the radio buttons’ settings, you can click the “Undo changes” button to discard the changes and reset the markable to its original attribute values. Note, however, that the “original” values can be different from the values in the original file, since modifications may already have been applied to the markable. After you have modified the attributes of a markable, a tooltip text will be displayed to remind you to apply your changes.

If the option “Auto-apply all changes in the attribute window” is enabled, all changes will be directly applied, and immediate “Undo changes” of the last change is not possible!

In either case, however, the original file will not be modified until the annotations are saved (via “File”, “Save annotations as...” or “File”, “Save”).

Note: When you modify the type attribute of the selected markable, values of attributes that are applicable to both the old and to the new type will be copied if both the attribute name and the value name are identical.
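The copying rule for type changes can be stated compactly. The sketch below assumes a hypothetical representation of the scheme in which the new type is given as a map from attribute names to their permitted values; the class name and the attribute names in the usage are illustrative only. Attributes that cannot be copied are simply left out here; the manual does not say how MMAX fills them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the type-change rule, not MMAX's own code.
final class TypeChange {
    // oldAttributes: attribute name -> value of the markable before the change
    // newTypeScheme: attribute name -> permitted values under the new type
    static Map<String, String> copyCompatible(Map<String, String> oldAttributes,
                                              Map<String, Set<String>> newTypeScheme) {
        Map<String, String> copied = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : oldAttributes.entrySet()) {
            Set<String> permitted = newTypeScheme.get(e.getKey());
            // copy only if the new type has an attribute of the same name
            // that also offers a value of the same name
            if (permitted != null && permitted.contains(e.getValue())) {
                copied.put(e.getKey(), e.getValue());
            }
        }
        return copied;
    }
}
```

For example, with old attributes number=singular and form=pronoun, and a new type offering number {singular, plural} and form {defnp, indefnp}, only number=singular is carried over.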

You can create a new markable by first selecting one or more signals and then selecting “Create new markable in annotation” in the popup menu. Select one or more signals by left-clicking the first one and dragging the mouse until the selection covers the last signal that you want the markable to span. The selection does not need to start exactly with the first letter of the first and end with the last letter of the last signal: Rather, all signals will be included that are at least partly covered by the selection. When creating a single-signal markable, just clicking it will not select it: You have to drag the mouse to select at least one letter. Note that by default the creation of hybrid markables (i.e. markables that consist of signals of different types, like a word and a gesture) is allowed. This setting can be changed under the respective menu item in the “Settings” menu.

New: You can modify a markable by selecting it (left click) and then creating a selection of signals by dragging the mouse. Upon releasing the mouse button, the following (additional) menu items will appear in the popup menu, depending on what you selected: If the selected signals are completely within the currently selected markable, you can choose to remove them from it. If the selected signals are not within the currently selected markable, you can choose to add them to it. Finally, if the selected signals partly overlap with the currently selected markable, you can choose to merge them with the markable. Signals already contained in the markable will not be added again. You can create discontinuous markables either by creating a normal markable for the first part and then incrementally selecting and adding the additional parts to it, or by creating one oversized markable and removing from it the parts you do not want in your markable. Note: In this version, the hybrid markable setting is not enforced when modifying markables!
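The choice between the remove, add and merge menu items follows from simple set relations between the selected markable and the mouse selection. The following is a minimal sketch under the assumption that both are modelled as sets of signal ids; the class SpanEdit is hypothetical and not MMAX's implementation. The resulting set keeps insertion order, whereas MMAX presumably orders signals by their position in the discourse.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the modification rules described above; a markable
// and a mouse selection are both modelled as sets of signal ids.
final class SpanEdit {
    enum Option { REMOVE, ADD, MERGE }

    // Which additional popup item applies to the current selection
    static Option optionFor(Set<String> markable, Set<String> selection) {
        if (markable.containsAll(selection)) {
            return Option.REMOVE;          // completely within the markable
        }
        for (String id : selection) {
            if (markable.contains(id)) {
                return Option.MERGE;       // partial overlap
            }
        }
        return Option.ADD;                 // no overlap at all
    }

    // Set semantics ensure that signals already in the markable are not added twice
    static Set<String> apply(Option option, Set<String> markable, Set<String> selection) {
        Set<String> result = new LinkedHashSet<>(markable);
        if (option == Option.REMOVE) {
            result.removeAll(selection);
        } else {
            result.addAll(selection);      // ADD and MERGE both form the union
        }
        return result;
    }
}
```

Removing word_2 from a markable spanning word_1 to word_3 yields the discontinuous markable {word_1, word_3}, which corresponds to the "oversized markable" strategy described above.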

A newly created markable will be displayed in blue (by default, but cf. above). Left-click it to select it like any other markable. Application of the user-defined default attributes (cf. below 3.9) depends on the settings of the option “Apply user-defined default attributes” in the “Settings” menu: If the option is set to “Upon markable creation” (default), no selection of the newly created markable is necessary in order to apply the default settings to it. If the option is set to “Upon first markable selection”, no default attributes will be applied until the markable is selected for the first time. In most cases, however, you will select the newly created markable directly after creation to set its non-default attributes in the Attribute Window. Remember to click the “Apply” button to permanently apply the attributes to the markable (if the option “Auto-apply all changes in the attribute window” is off).

Note: In this version, no checking is done to prevent identical markables from being created.

In order to delete a markable, select it by right-clicking it, then from the popup menu select “Delete this markable from annotation”. Note that if you have currently selected a markable (i.e. it is highlighted and its attributes are displayed in the Attribute Window), deletion by right-clicking will only work on this selected markable. Thus, in order to delete a markable, you can either select it prior to deletion, or you can clear any selection (by clicking at a position where there is no selectable markable), and right-click the desired markable directly.

2.3 Annotating relations between markables

The main purpose of MMAX is the annotation of certain types of relations between markables. MMAX supports two types of relations: A set membership relation and a pointing relation. These relations are defined formally only, i.e. no “semantic” interpretation is connected with them: Supplying this interpretation is left to the annotation scheme that one is using.

2.3.1 Membership relation

Set membership is a transitive relation between two or more markables. Markables standing in this relation share the same value in their member attribute. Whenever a markable is selected, this markable and all other markables within the same set are displayed in red (by default, but cf. above).

Annotating the membership relation is done by adding markables to the set of which the currently selected markable (i.e. the one currently highlighted) is a member. If the currently selected markable is not a member of any set, a new set will be created when a markable is added, i.e. after adding the markable, the selected one and the one just added will be displayed in red (by default, but cf. above). To add a markable to the current markable set (if any), right-click it and from the popup menu select “Add this markable to set”. Note that after adding the markable, the selection (i.e. the highlighted markable) will not change. Because set membership is a transitive relation, it is irrelevant which markable you have selected, as long as it is part of the set to which you want to add another markable. This is in contrast to the annotation of the pointing relation, cf. below!

If the markable that is to be added is not part of a set itself (i.e. if its member attribute is empty), adding is implemented straightforwardly by setting this attribute to the one shared by all members of the current set. If, however, the markable to be added is already a member of some other set, the behaviour of the tool depends on the setting of the option “When adding a markableset as a member” in the “Settings” menu: By default, this option is set to “Merge sets”, which will cause the entire set which the markable to be added is part of to be added as well, resulting in a merge of both sets. If this option is changed to “Add selected markable only”, only the specified markable is added to the current set, resulting in a “move” of the markable from one set to the other.

Removing the membership relation is done by first selecting any markable in the set of which the one to be removed is a member (this may be the one to be removed itself). Then, right-click the markable to be removed and from the popup menu that appears select “Remove this markable from set”. Note that if you remove the currently selected markable, the highlighting of the entire member set will be reset, since the currently selected markable is no longer a member of it. The same will happen if you remove the second-to-last markable from a set, because after this the set will no longer exist (i.e. there are no one-member sets).
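The add, merge, move and remove behaviour described in this section can be summarised in a small model. The following sketch is hypothetical (the class MemberSets and the set ids "set_1", "set_2", … are invented for illustration; real member attribute values may look different) and is not MMAX's own code. It follows the manual's statement that there are no one-member sets, and assumes the same applies when a "move" leaves a single markable behind.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the member attribute (markable id -> set id), not MMAX's own code.
final class MemberSets {
    private final Map<String, String> memberOf = new LinkedHashMap<>();
    private int nextSet = 1;

    String setOf(String markable) {
        return memberOf.get(markable);
    }

    // Markables shown in red when `selected` is selected (empty if it is in no set)
    List<String> membersWith(String selected) {
        List<String> members = new ArrayList<>();
        String set = memberOf.get(selected);
        if (set == null) {
            return members;
        }
        for (Map.Entry<String, String> e : memberOf.entrySet()) {
            if (e.getValue().equals(set)) {
                members.add(e.getKey());
            }
        }
        return members;
    }

    // "Add this markable to set"; mergeSets = true mirrors "Merge sets",
    // false mirrors "Add selected markable only"
    void add(String selected, String toAdd, boolean mergeSets) {
        String target = memberOf.get(selected);
        if (target == null) {                      // selected markable starts a new set
            target = "set_" + nextSet++;
            memberOf.put(selected, target);
        }
        String old = memberOf.get(toAdd);
        if (old != null && mergeSets) {
            for (Map.Entry<String, String> e : memberOf.entrySet()) {
                if (e.getValue().equals(old)) {
                    e.setValue(target);            // whole old set joins the current one
                }
            }
        } else {
            memberOf.put(toAdd, target);
            if (old != null) {
                dissolveIfSingleton(old);          // a "move" may leave one member behind
            }
        }
    }

    // "Remove this markable from set"
    void remove(String markable) {
        String set = memberOf.remove(markable);
        if (set != null) {
            dissolveIfSingleton(set);
        }
    }

    // There are no one-member sets: a lone remaining member loses its set
    private void dissolveIfSingleton(String set) {
        List<String> members = new ArrayList<>();
        for (Map.Entry<String, String> e : memberOf.entrySet()) {
            if (e.getValue().equals(set)) {
                members.add(e.getKey());
            }
        }
        if (members.size() == 1) {
            memberOf.remove(members.get(0));
        }
    }
}
```

In this model, removing the second-to-last member of a set also clears the last member's set, which matches the reset of the highlighting described above.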

2.3.2 Pointing relation

Pointing is a relation between two markables, one of which is the pointing markable, and the other the one pointed at. Thus, pointing is NOT transitive. A markable points to another one by virtue of its pointer attribute containing the other markable’s id. It follows from this that while one markable can point to at most one other markable, a markable can be pointed at by arbitrarily many markables. Whenever a markable is selected which points to some other markable, both will be displayed in yellow (by default, but cf. above).

Annotating the pointing relation is done by first selecting the pointing markable (by left-clicking it), right-clicking the one to point at and selecting from the popup menu “Point to this markable”. As a result, both markables will be displayed in yellow.

Removing the pointing relation is done by first selecting the pointing markable (both will be displayed in yellow (by default, cf. above)), and then right-clicking the markable pointed at and selecting from the popup menu “Remove pointer to this markable”.
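Because each markable has a single pointer attribute, the pointing relation is naturally a map from pointing markable to target. The sketch below is hypothetical (the class Pointers is invented for illustration); in particular, it assumes that “Point to this markable” simply overwrites an existing pointer, which the manual does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the pointer attribute (pointing markable id -> target id).
final class Pointers {
    private final Map<String, String> pointer = new HashMap<>();

    // "Point to this markable": a markable holds a single pointer, so this
    // sketch simply overwrites any earlier value (an assumption)
    void pointTo(String from, String to) {
        pointer.put(from, to);
    }

    // "Remove pointer to this markable": only removes the pointer if it targets `to`
    void removePointer(String from, String to) {
        pointer.remove(from, to);
    }

    String target(String from) {
        return pointer.get(from);
    }

    // All markables pointing at `to` (there may be arbitrarily many)
    List<String> pointedAtBy(String to) {
        List<String> sources = new ArrayList<>();
        for (Map.Entry<String, String> e : pointer.entrySet()) {
            if (e.getValue().equals(to)) {
                sources.add(e.getKey());
            }
        }
        Collections.sort(sources);
        return sources;
    }
}
```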

2.4 Browsing the annotation

The colour in which a selected markable (and possibly other ones as well) is displayed depends on which (if any) relations it is a part of. Note: The colours mentioned in what follows are the default values, which can be modified temporarily under the “Colors” menu item in the “Settings” menu.

If no relation is specified for the markable, its colour will not change; it will merely be highlighted in green.
If the markable is part of a member set, the markable itself and all other markables in the same set will be displayed in red.
If the markable is the origin of a pointing relation, the markable itself and the one it points to will be displayed in yellow.
There can also be different and conflicting colourings at the same time: A markable can be part of a member set and at the same time point at some other markable (which may be in the same set or in none). In such cases of multiple relations, the display behaviour depends on the settings of the display priority options in the “Settings” menu.

Note that the colouring of the markables serves informational purposes only, i.e., there is no distinction between something like an annotation and a browsing mode. Rather, any relation will always be displayed, and every annotation action will always be possible.

2.5 Saving the annotations

All annotations (creation / deletion of markables, annotation of relations between these etc.) concern the markable file only, i.e.: all other files are read-only and will never be modified! Therefore, only the markable file needs to be saved.

In order to do so, you can either

press ctrl-S or

select “Save annotations” in the “File” menu. Both actions will save the markable file under its original name, after creating a backup (.bak) version of the old file.

select “Save annotations as” in the “File” menu. This will cause a file chooser dialogue to appear, in which the desired filename and directory can be specified. You can simulate a simple “Save annotations” by just selecting the current file name. In this case, a backup (.bak) file will be created first.
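The backup behaviour of “Save annotations” can be sketched with java.nio.file. This is a hypothetical illustration (the class BackupSave is invented), not MMAX's code; it assumes that ".bak" is appended to the full file name and that the markable file is written as UTF-8, neither of which the manual specifies. As noted in the 0.865 changes, an existing backup of the same name is replaced.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of "Save annotations" with a backup copy, not MMAX's own code.
final class BackupSave {
    static Path backupOf(Path markableFile) {
        // ASSUMPTION: ".bak" is appended to the full file name
        return markableFile.resolveSibling(markableFile.getFileName() + ".bak");
    }

    static void save(Path markableFile, String xml) throws IOException {
        if (Files.exists(markableFile)) {
            // an existing backup of the same name is replaced
            Files.copy(markableFile, backupOf(markableFile), StandardCopyOption.REPLACE_EXISTING);
        }
        // ASSUMPTION: UTF-8 output
        Files.write(markableFile, xml.getBytes(StandardCharsets.UTF_8));
    }
}
```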

Upon closing the window or exiting the application (via “Exit” in the “File” menu), you will be prompted to save your annotations if these are found to be dirty, i.e. if they have been modified since they were last saved. If you decide to save them, the “Save as” dialogue will be opened.

2.6 Computing the Kappa statistic

The Kappa index is a statistical measure of inter-annotator reliability. You can compute Kappa directly from MMAX-conformant annotations. To do so, first load into the tool a corpus which uses the same scheme file as the annotations that you are going to evaluate. Then select “Statistics” from the “Tools” menu. A window will appear in which the annotation files have to be specified.

To specify the files, click the “Add file” button. This will cause a file chooser dialogue to appear, in which you can select the desired files. Note that you can select several files from the same directory by using the shift and control keys. Then click the “Open” button in the file chooser. The newly selected files will be added to the list in the Statistics window. Repeat this process until all desired files are selected. To remove a file from the list, left-click it and then click the “Remove” button. Use “Remove All” to empty the entire list.

Once you are satisfied with the selection, specify the attribute for which the Kappa index is to be computed. All available attributes are displayed in the box above the annotation file list. Select one attribute, and then click the “Kappa” button to start the computation. The result will be displayed as a table in a separate window. To compute Kappa for a different attribute, just modify the attribute selection and click the “Kappa” button again. You can save the results of the computation (table values and markables only!) by specifying the desired format (space- or comma-separated) and clicking “Save now...”. A file chooser dialogue will appear, in which you can specify a file to write the table to.
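The manual does not state which Kappa formula MMAX implements. For illustration only, the sketch below computes Fleiss' kappa, a common variant for more than two annotators, for one attribute. It assumes that every annotator assigned a value to every markable; the class name Kappa is hypothetical, and the results are not guaranteed to match MMAX's numbers.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative Fleiss' kappa; not necessarily the formula MMAX uses.
final class Kappa {
    // values[i][a] = value annotator a assigned to markable i for the chosen attribute
    static double fleiss(String[][] values) {
        int items = values.length;
        if (items == 0) throw new IllegalArgumentException("no markables");
        int raters = values[0].length;
        if (raters < 2) throw new IllegalArgumentException("need at least two annotators");
        Map<String, Integer> totals = new HashMap<>();   // value -> count over all markables
        double agreementSum = 0.0;
        for (String[] item : values) {
            if (item.length != raters) {
                throw new IllegalArgumentException("each markable needs one value per annotator");
            }
            Map<String, Integer> counts = new HashMap<>();
            for (String v : item) {
                counts.merge(v, 1, Integer::sum);
            }
            double agreeingPairs = 0.0;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                int c = e.getValue();
                agreeingPairs += (double) c * (c - 1);
                totals.merge(e.getKey(), c, Integer::sum);
            }
            // share of annotator pairs agreeing on this markable
            agreementSum += agreeingPairs / ((double) raters * (raters - 1));
        }
        double observed = agreementSum / items;
        double expected = 0.0;                           // chance agreement
        for (int total : totals.values()) {
            double p = (double) total / ((double) items * raters);
            expected += p * p;
        }
        // undefined (division by zero) if every annotator always chose one single value
        return (observed - expected) / (1.0 - expected);
    }
}
```

Each row corresponds to one markable and each column to one annotation file, mirroring the selection of a single attribute before clicking “Kappa”.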

New in version 0.865 is the Batch Kappa functionality. It allows the Kappa statistic to be calculated for several annotations in one run. Note: For the Batch Kappa algorithm to work, the .anno files containing the annotations must comply with the following naming scheme: XXX_somename_YYY.anno, where XXX is a three-digit number (including leading zeros). XXX is the number of the text, somename is some name, and YYY is a string to differentiate between different annotators (normally their initials).

Example: To run Batch Kappa for two texts (say, 1 and 2) each of which has been annotated by three annotators (say, a, b and c), you need the following files:

001_name_a.anno, 001_name_b.anno, 001_name_c.anno, 002_name_a.anno, 002_name_b.anno, 002_name_c.anno.

These files must all be in the same directory, because the algorithm sorts the files alphabetically in order to know which ones belong together.
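The naming scheme lends itself to a simple grouping step. The sketch below is hypothetical (the class BatchKappaFiles is invented, not MMAX's implementation); it groups file names by text number and annotator in alphabetical order, and assumes that somename may itself contain underscores while YYY does not.

```java
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical grouping of Batch Kappa input files, not MMAX's own code.
final class BatchKappaFiles {
    // XXX_somename_YYY.anno; ASSUMPTION: YYY contains no underscore
    private static final Pattern NAME = Pattern.compile("(\\d{3})_(.+)_([^_]+)\\.anno");

    // text number -> (annotator -> file name), both sorted alphabetically
    static SortedMap<String, SortedMap<String, String>> group(Collection<String> fileNames) {
        SortedMap<String, SortedMap<String, String>> texts = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = NAME.matcher(name);
            if (!m.matches()) {
                throw new IllegalArgumentException("not of the form XXX_somename_YYY.anno: " + name);
            }
            texts.computeIfAbsent(m.group(1), k -> new TreeMap<>()).put(m.group(3), name);
        }
        return texts;
    }
}
```

Applied to the six example files above, this yields two groups (001 and 002), each with annotators a, b and c, i.e. one row per text in the Batch Kappa table.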

The result of Batch Kappa is a table which contains one row for each different text (i.e. two in the above example), and the overall Kappa value.

Note: saving the batch Kappa table is not yet completely implemented.

Note: Additional evaluation measures will be integrated into MMAX in future versions.

3. Annotation schema setup

In order to use MMAX for your own corpora, you need to prepare a number of files, which are described in what follows. References to the set of files that comprise a corpus are stored in a single annotation (.anno) file.

3.1 The annotation (.anno) file

The annotation file for a corpus of type text has the following structure. Optional files are given in brackets, and the order of the entries is irrelevant. Note that ALL files need to be in the same directory. (Support for more sophisticated file name specifications will be added in a future version.)

"words wordfile.xml"

"text textfile.xml"

"markables markablefile.xml"

"stylesheet stylesheet.xsl"

"scheme schemefile.scheme"

["utterances utterancefile.xml"]

["readonly_levels levels"]

Accordingly, for a corpus of type dialogue the following files have to be supplied in the .anno file:

"words wordfile.xml"

"dialogue dialoguefile.xml"

["gestures gesturefile.xml"]

"markables markablefile.xml"

"stylesheet stylesheet.xsl"

"scheme schemefile.scheme"

["utterances utterancefile.xml"]

["keyactions keyactionfile.xml"]

["readonly_levels levels"]

Note that the quotation marks must be supplied! As can be seen, the corpus type is expressed in an .anno file only implicitly through the constellation of files. At this time, only a rather superficial plausibility check on this constellation is performed, ruling out only the most obviously illegal constellations (e.g. gestures in a text corpus).
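The entry format and the implicit corpus type can be illustrated with a small parser. The sketch below is hypothetical (the class AnnoFile is invented); its plausibility check only mirrors the rules stated in this section (required entries, exactly one of text / dialogue, no gestures or keyactions in a text corpus) and is not MMAX's actual check.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical .anno reader, not MMAX's own code.
final class AnnoFile {
    // Parses lines of the form "key value" (quotation marks required) into key -> value
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.length() < 2 || !line.startsWith("\"") || !line.endsWith("\"")) {
                throw new IllegalArgumentException("entry must be in quotation marks: " + raw);
            }
            String[] parts = line.substring(1, line.length() - 1).trim().split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected \"key value\": " + raw);
            }
            entries.put(parts[0], parts[1]);
        }
        return entries;
    }

    // Infers the corpus type from the file constellation, rejecting obvious mismatches
    static String corpusType(Map<String, String> e) {
        for (String required : Arrays.asList("words", "markables", "stylesheet", "scheme")) {
            if (!e.containsKey(required)) {
                throw new IllegalArgumentException("missing entry: " + required);
            }
        }
        boolean text = e.containsKey("text");
        boolean dialogue = e.containsKey("dialogue");
        if (text == dialogue) {
            throw new IllegalArgumentException("need exactly one of text / dialogue");
        }
        if (text && (e.containsKey("gestures") || e.containsKey("keyactions"))) {
            throw new IllegalArgumentException("gestures/keyactions not allowed in a text corpus");
        }
        return text ? "text" : "dialogue";
    }
}
```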

3.2 The word file

Note: Although in the following descriptions of required file structures DTD fragments are given, in the current version no DTD validation is performed!

The wordfile is an xml file containing the verbal elements of the corpus. These are either words from a written text, or the transcriptions of spoken utterances from a dialogue.

The file has to adhere to the following format (cf. also file words.dtd):

<!ELEMENT words (word*)>

<!ELEMENT word (#PCDATA)>

<!ATTLIST word id ID #REQUIRED>

<!ATTLIST word starttime CDATA #IMPLIED>

<!ATTLIST word endtime CDATA #IMPLIED>

Example:

<?xml version='1.0' encoding='ISO-8859-1'?>

<!DOCTYPE words SYSTEM "words.dtd">

<words>

<word id="word_4">sample</word>

</words>

Note that the element ids MUST have the format specified, i.e. word_x (as in word_4), etc.
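A word file can be read with the standard JAXP DOM parser. The sketch below is hypothetical (the class WordFile is invented) and assumes that "word_x" means "word_" followed by a number. It installs an entity resolver that returns an empty DTD, so words.dtd does not need to be present; consistent with the note above, no DTD validation is done.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical reader for a word file (id -> word), not MMAX's own code.
final class WordFile {
    static Map<String, String> read(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve words.dtd to an empty document instead of loading it
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(in);
        NodeList words = doc.getElementsByTagName("word");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < words.getLength(); i++) {
            Element word = (Element) words.item(i);
            String id = word.getAttribute("id");
            // ASSUMPTION: "word_x" means "word_" followed by a number
            if (!id.matches("word_\\d+")) {
                throw new IllegalArgumentException("unexpected word id: " + id);
            }
            result.put(id, word.getTextContent());
        }
        return result;
    }
}
```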

3.3 The text file

A text consists of an optional headline and at least one paragraph or at least one sentence. Paragraph elements are optional 'wrappers' around (one or more) sentences, i.e. a text can consist entirely of sentences or entirely of paragraphs which in turn consist of sentences. It follows from this that paragraphs and sentences MUST NOT appear on the same level. (Cf. also file text.dtd.)

<!ELEMENT text ((headline?),((paragraph+) | (sentence+)))>

<!ELEMENT headline (sentence*)>

<!ELEMENT paragraph (sentence*)>

<!ATTLIST paragraph id ID #REQUIRED>

<!ELEMENT sentence EMPTY>

<!ATTLIST sentence id ID #REQUIRED>

<!ATTLIST sentence span CDATA #REQUIRED>

Example:

<?xml version='1.0' encoding='ISO-8859-1'?>

<!DOCTYPE text SYSTEM "text.dtd">

<text>

<paragraph id="paragraph_1">

</paragraph>

</text>

Note that we use our own span attribute here instead of the href attribute as defined in XPointer, because our element differs from the latter both in semantics and implementation.
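Resolving a span attribute means mapping it onto the ordered list of signal ids. The sketch below is hypothetical (the class Spans is invented) and assumes the first_element..last_element form described for markable files in the 0.8 notes. It also accepts comma-separated fragments, which covers the older comma-separated list format mentioned there; how MMAX writes spans of discontinuous markables is not described in this section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical span expansion, not MMAX's own code.
final class Spans {
    // order: all signal ids in document order
    static List<String> expand(String span, List<String> order) {
        List<String> ids = new ArrayList<>();
        for (String raw : span.split(",")) {
            String fragment = raw.trim();
            int dots = fragment.indexOf("..");
            if (dots < 0) {                                // single id (older list format)
                if (!order.contains(fragment)) {
                    throw new IllegalArgumentException("unknown id: " + fragment);
                }
                ids.add(fragment);
                continue;
            }
            int from = order.indexOf(fragment.substring(0, dots));
            int to = order.indexOf(fragment.substring(dots + 2));
            if (from < 0 || to < from) {
                throw new IllegalArgumentException("bad span fragment: " + fragment);
            }
            ids.addAll(order.subList(from, to + 1));       // first..last, inclusive
        }
        return ids;
    }
}
```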

3.4 The markable file

The markable file is an xml file containing information about markables, their attributes and their relations. Thus, it is the file which contains the annotations proper. The file has to adhere to the following format (cf. also file markables.dtd):

<!ELEMENT markables (markable*)>

<!ATTLIST markable id ID #REQUIRED>

<!ATTLIST markable span CDATA #REQUIRED>

<!ATTLIST markable type CDATA #REQUIRED>

<!ATTLIST markable member CDATA #IMPLIED>

<!ATTLIST markable pointer IDREF #IMPLIED>

Note that the list of attributes of the markable element specified here is not complete yet (cf. scheme file below). The attributes mentioned here are just the ones that need to be present (resp. that are created automatically by the application) and that cannot be modified by the user directly.

In contrast to e.g. the word file, it makes no sense for the markable file to be constructed manually prior to using MMAX, because it is this file that the tool actually creates. However, in the current version, a markable file is required to be supplied in the annotation file. You can just use an “empty” markable file, like the following:

<?xml version="1.0"?>

<!DOCTYPE markables SYSTEM "markables.dtd">

<markables>

</markables>
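Since id, span and type are the attributes that must be present on every markable, a quick sanity check can be run on a markable file before it is loaded. The following Java sketch uses the JDK DOM parser; the class name MarkableCheck and the sample values in main are hypothetical, and MMAX may perform its own checks differently. An empty markables element passes trivially.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MarkableCheck {

    // Attributes declared #REQUIRED for markable in markables.dtd.
    static final String[] REQUIRED = {"id", "span", "type"};

    /** Returns one message per missing required attribute, e.g. "markable_1: missing type". */
    public static List<String> missingAttributes(String xml) {
        Element root;
        try {
            // No DOCTYPE in the samples, so markables.dtd is not needed on disk.
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse markable file", e);
        }
        List<String> problems = new ArrayList<>();
        NodeList markables = root.getElementsByTagName("markable");
        for (int i = 0; i < markables.getLength(); i++) {
            Element m = (Element) markables.item(i);
            // Fall back to the position if even the id is missing.
            String label = m.hasAttribute("id") ? m.getAttribute("id") : "markable #" + (i + 1);
            for (String attr : REQUIRED) {
                if (!m.hasAttribute(attr)) {
                    problems.add(label + ": missing " + attr);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(missingAttributes("<markables></markables>")); // []
        // Hypothetical markable lacking its type attribute.
        System.out.println(missingAttributes(
                "<markables><markable id=\"markable_1\" span=\"word_3\"/></markables>"));
        // [markable_1: missing type]
    }
}
```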

3.5 The dialogue file

The dialogue file contains the formal structure of a dialogue. A dialogue can be divided into turns, a turn-break being marked by a change of speaker. Accordingly, each turn has, among other attributes, a speaker attribute specifying which speaker uttered the turn in question.

<!ELEMENT turns (turn*)>

<!ELEMENT turn EMPTY>

<!ATTLIST turn id ID #REQUIRED>

<!ATTLIST turn speaker CDATA #REQUIRED>

<!ATTLIST turn span CDATA #REQUIRED>

Example:

<?xml version='1.0' encoding='ISO-8859-1'?>

<!DOCTYPE turns SYSTEM "turns.dtd">

<turns>

</turns>
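Because a turn break is marked by a change of speaker, turns can be derived from a time-ordered sequence of speaker-labelled words. The following Java sketch illustrates this grouping. The input format (parallel arrays of word ids and speaker labels) and the class name TurnSegmenter are assumptions made for the example, not part of MMAX.

```java
import java.util.ArrayList;
import java.util.List;

public class TurnSegmenter {

    /** One turn: a speaker and the ids of the words uttered without interruption. */
    public static final class Turn {
        public final String speaker;
        public final List<String> wordIds = new ArrayList<>();

        Turn(String speaker) {
            this.speaker = speaker;
        }

        @Override
        public String toString() {
            return speaker + wordIds;
        }
    }

    /** Opens a new turn whenever the speaker changes; input must be in temporal order. */
    public static List<Turn> segment(String[] wordIds, String[] speakers) {
        if (wordIds.length != speakers.length) {
            throw new IllegalArgumentException("Each word needs exactly one speaker");
        }
        List<Turn> turns = new ArrayList<>();
        Turn current = null;
        for (int i = 0; i < wordIds.length; i++) {
            if (current == null || !current.speaker.equals(speakers[i])) {
                current = new Turn(speakers[i]);
                turns.add(current);
            }
            current.wordIds.add(wordIds[i]);
        }
        return turns;
    }

    public static void main(String[] args) {
        // Hypothetical word ids and speaker labels.
        String[] ids = {"word_1", "word_2", "word_3", "word_4"};
        String[] speakers = {"A", "A", "B", "A"};
        System.out.println(segment(ids, speakers)); // [A[word_1, word_2], B[word_3], A[word_4]]
    }
}
```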

Note: If your corpus has time-stamped data, the ordering of the element ids in each turn’s span attribute is irrelevant. When creating the internal Discourse representation, MMAX makes sure the elements (regardless of their type) are ordered according to the value of their starttime attribute. If your corpus does not have time-stamped data (which is legal only for strictly uni-modal corpora), MMAX expects the values in the span attributes to be in the correct order. The same is true for the ordering in utterance elements.

For the display, the XSL stylesheet associated with your corpus (as specified in the .anno file) handles the correct ordering of the elements.
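The ordering behaviour described in the note can be sketched as a stable sort on starttime. In the following Java sketch, the class TemporalOrder and its Signal type are hypothetical, and starttime is assumed to hold a decimal number (e.g. seconds); the DTDs only declare it as CDATA, so the actual unit and format depend on your corpus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TemporalOrder {

    /** A span member of any type (word, gesture, keyaction, ...). */
    public static final class Signal {
        public final String id;
        public final double starttime; // assumption: a decimal number, e.g. seconds

        public Signal(String id, double starttime) {
            this.id = id;
            this.starttime = starttime;
        }
    }

    /** Returns the ids ordered by starttime, regardless of their order in the span attribute. */
    public static List<String> orderedIds(List<Signal> span) {
        List<Signal> sorted = new ArrayList<>(span);
        // List.sort is stable: signals with equal starttime keep their span order.
        sorted.sort(Comparator.comparingDouble((Signal s) -> s.starttime));
        List<String> ids = new ArrayList<>();
        for (Signal s : sorted) {
            ids.add(s.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical span listing a gesture between two words, out of temporal order.
        List<Signal> span = Arrays.asList(
                new Signal("word_2", 1.40),
                new Signal("gesture_1", 1.10),
                new Signal("word_1", 0.85));
        System.out.println(orderedIds(span)); // [word_1, gesture_1, word_2]
    }
}
```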

3.6 The gesture file

The gesture file contains the non-verbal elements of dialogue corpora, in particular pointing gestures. A gesture is identified with the object it specifies, which is represented as a textual description.

<!ELEMENT gestures (gesture*)>

<!ELEMENT gesture EMPTY>

<!ATTLIST gesture id ID #REQUIRED>

<!ATTLIST gesture starttime CDATA #IMPLIED>

<!ATTLIST gesture endtime CDATA #IMPLIED>

<!ATTLIST gesture specifies CDATA #REQUIRED>

Example:

<?xml version='1.0' encoding='ISO-8859-1'?>

<!DOCTYPE gestures SYSTEM "gestures.dtd">

<gestures>

</gestures>
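Although the DTD declares starttime and endtime as #IMPLIED, the note on span ordering in Section 3.5 states that data without time stamps is legal only for strictly uni-modal corpora. A gesture file, which makes a corpus multi-modal, will therefore presumably need both values on every gesture. The following Java sketch reports gestures that violate this; the class name GestureTimeCheck is hypothetical, and reading the times as decimal numbers is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GestureTimeCheck {

    /** Lists gestures lacking time stamps or whose starttime lies after their endtime. */
    public static List<String> problems(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse gesture file", e);
        }
        List<String> problems = new ArrayList<>();
        NodeList gestures = root.getElementsByTagName("gesture");
        for (int i = 0; i < gestures.getLength(); i++) {
            Element g = (Element) gestures.item(i);
            String id = g.getAttribute("id");
            if (!g.hasAttribute("starttime") || !g.hasAttribute("endtime")) {
                problems.add(id + ": no time stamps");
            } else if (Double.parseDouble(g.getAttribute("starttime"))
                    > Double.parseDouble(g.getAttribute("endtime"))) {
                // Assumption: both times are decimal numbers in the same unit.
                problems.add(id + ": starttime after endtime");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical gestures; the second one lacks time stamps.
        String xml = "<gestures>"
                + "<gesture id=\"gesture_1\" starttime=\"1.10\" endtime=\"1.90\" specifies=\"tv_set\"/>"
                + "<gesture id=\"gesture_2\" specifies=\"tv_set\"/>"
                + "</gestures>";
        System.out.println(problems(xml)); // [gesture_2: no time stamps]
    }
}
```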

3.7 The keyaction file

The keyaction file contains non-verbal elements of the type that occurs in e.g. human-machine interaction dialogue corpora. Keyaction signals are specified with respect to the key that was operated and to the kind of action that was performed on it. Note that the list of possible values for the action attribute is not complete and given for illustrative purposes only: Depending on the kind of control devices available (e.g. sliders), additional actions will have to (and can easily) be added.

<!ELEMENT keyactions (keyaction*)>

<!ELEMENT keyaction EMPTY>

<!ATTLIST keyaction id ID #REQUIRED>

<!ATTLIST keyaction starttime CDATA #IMPLIED>

<!ATTLIST keyaction endtime CDATA #IMPLIED>

<!ATTLIST keyaction key CDATA #REQUIRED>

<!ATTLIST keyaction action (press) #REQUIRED>

Example:

<?xml version='1.0' encoding='ISO-8859-1'?>

<!DOCTYPE keyactions SYSTEM "keyactions.dtd">

<keyactions>

</keyactions>
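Adding an action amounts to extending the enumeration in keyactions.dtd, e.g. changing (press) to (press|release). The following Java sketch shows the effect with the JDK's validating parser: a keyaction with action="release" is rejected under the original declaration and accepted under the extended one. The DTD is inlined as an internal subset only to keep the example self-contained; the class name, the release action and the key value are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class KeyactionValidation {

    /** The keyactions.dtd declarations from above, with a configurable action enumeration. */
    public static String doctype(String actions) {
        return "<!DOCTYPE keyactions [\n"
                + "<!ELEMENT keyactions (keyaction*)>\n"
                + "<!ELEMENT keyaction EMPTY>\n"
                + "<!ATTLIST keyaction id ID #REQUIRED>\n"
                + "<!ATTLIST keyaction starttime CDATA #IMPLIED>\n"
                + "<!ATTLIST keyaction endtime CDATA #IMPLIED>\n"
                + "<!ATTLIST keyaction key CDATA #REQUIRED>\n"
                + "<!ATTLIST keyaction action (" + actions + ") #REQUIRED>\n"
                + "]>\n";
    }

    /** True if the document is well-formed and valid against its DTD. */
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // By default, validity errors are only reported; turn them into exceptions.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<keyactions>"
                + "<keyaction id=\"keyaction_1\" key=\"volume_up\" action=\"release\"/>"
                + "</keyactions>";
        System.out.println(isValid(doctype("press") + body));         // false
        System.out.println(isValid(doctype("press|release") + body)); // true
    }
}
```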

3.8 The utterance file

The utterance file supplies a means to express pragmatic structure for both text and dialogue corpora.

<!ELEMENT utterances (utterance*)>

<!ELEMENT utterance EMPTY>

<!ATTLIST utterance id ID #REQUIRED>

<!ATTLIST utterance dialogue_act CDATA #IMPLIED>

<!ATTLIST utterance span CDATA #REQUIRED>

Example:

<?xml version='1.0' encoding='ISO-8859-1'?>

<!DOCTYPE utterances SYSTEM "utterances.dtd">

<utterances>

</utterances>

Note: If your corpus has time-stamped data, the ordering of the element ids in each utterance’s span attribute is irrelevant. When creating the internal Discourse representation, MMAX makes sure the elements (regardless of their type) are ordered according to the value of their starttime attribute. If your corpus does not have time-stamped data (which is legal only for strictly uni-modal corpora), MMAX expects the values in the span attributes to be in the correct order. The same is true for the ordering in turn elements.

For the display, the XSL stylesheet associated with your corpus (as specified in the .anno file) handles the correct ordering of the elements.
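Because dialogue_act is declared #IMPLIED, code that reads an utterance file has to allow for utterances without a dialogue act. The following Java sketch tallies dialogue acts and counts missing ones separately. The class name, the dialogue act labels and the span values are hypothetical, since the manual prescribes neither a dialogue act inventory nor a span syntax.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DialogueActCount {

    /** Counts utterances per dialogue_act value; utterances without one count as "(none)". */
    public static Map<String, Integer> count(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse utterance file", e);
        }
        Map<String, Integer> counts = new TreeMap<>();
        NodeList utterances = root.getElementsByTagName("utterance");
        for (int i = 0; i < utterances.getLength(); i++) {
            Element u = (Element) utterances.item(i);
            // dialogue_act is #IMPLIED, so it may be missing altogether.
            String act = u.hasAttribute("dialogue_act") ? u.getAttribute("dialogue_act") : "(none)";
            counts.merge(act, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<utterances>"
                + "<utterance id=\"utterance_1\" dialogue_act=\"question\" span=\"word_1\"/>"
                + "<utterance id=\"utterance_2\" dialogue_act=\"answer\" span=\"word_2\"/>"
                + "<utterance id=\"utterance_3\" span=\"word_3\"/>"
                + "</utterances>";
        System.out.println(count(xml)); // {(none)=1, answer=1, question=1}
    }
}
```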

3.9 The scheme file

Important note: The following is valid for the ‘old’ AttributeWindow only! The new SmartAttributeWindow will be described in more detail in a later section.

As the name suggests, this file contains the specification of the annotation scheme in the form of user-definable attributes, along with the values that can be assigned to markables for each of them. The Attribute Window is constructed on the basis of this file. The file contains an arbitrary number of sections, each of which defines the attributes and possible values for one particular type of markable. Each section starts with the type attribute, which specifies the type to which the subsequent attributes apply. Note that each section must be complete, i.e. attributes shared by several types must be repeated in every section.

The scheme file is NOT in XML format, but simply contains one line per attribute. The first item in each line is the name of the attribute (in the Attribute Window, this will be the label of the respective group of radio buttons); the following items are the mutually exclusive values that this attribute can take (in the Attribute Window, one radio button will be created for each possible value). Note that the quotation marks MUST be supplied! The first possible value in each line is treated as the default value, which is set automatically.

Example (taken from an annotation scheme for the annotation of anaphoric and bridging relations):

"type" "none"

"np_form" "none" "NE" "defNP" "indefNP" "PPER" "PPOS" "PDS"

"grammatical_role" "none" "SBJ" "OBJ" "other"

"agreement" "none" "3M" "3F" "3N" "3P" "1S" "2S" "1P" "2P"

"type" "anaphoric"

"ante_sub_anaphoric" "none" "direct" "pronominal" "IS-A" "other"

"np_form" "none" "NE" "defNP" "indefNP" "PPER" "PPOS" "PDS"

"grammatical_role" "none" "SBJ" "OBJ" "other"

"agreement" "none" "3M" "3F" "3N" "3P" "1S" "2S" "1P" "2P"

"type" "bridging"

"ante_sub_bridging" "none" "part-whole" "cause-effect" "entity-attribute" "other"

"np_form" "none" "NE" "defNP" "indefNP" "PPER" "PPOS" "PDS"

"grammatical_role" "none" "SBJ" "OBJ" "other"

"agreement" "none" "3M" "3F" "3N" "3P" "1S" "2S" "1P" "2P"

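The line format described above is simple enough to read with a few lines of code. The following Java sketch groups attribute lines under the preceding type line and treats the first value of each attribute as its default, following the rules stated above. The class and method names are hypothetical, and this is not the parser MMAX uses internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemeParser {

    // Every item in a scheme line is enclosed in double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    /** Maps each markable type to its attributes; each attribute maps to its values, default first. */
    public static Map<String, Map<String, List<String>>> parse(String scheme) {
        Map<String, Map<String, List<String>>> sections = new LinkedHashMap<>();
        Map<String, List<String>> current = null;
        for (String line : scheme.split("\\R")) {
            List<String> items = new ArrayList<>();
            Matcher m = QUOTED.matcher(line);
            while (m.find()) {
                items.add(m.group(1));
            }
            if (items.isEmpty()) {
                continue; // blank line
            }
            if (items.size() < 2) {
                throw new IllegalArgumentException("Expected a name and at least one value: " + line);
            }
            if (items.get(0).equals("type")) {
                // A "type" line opens a new section for that markable type.
                current = new LinkedHashMap<>();
                sections.put(items.get(1), current);
            } else if (current == null) {
                throw new IllegalArgumentException("Attribute before first type line: " + line);
            } else {
                current.put(items.get(0), new ArrayList<>(items.subList(1, items.size())));
            }
        }
        return sections;
    }

    /** The first listed value is the default. */
    public static String defaultValue(Map<String, List<String>> section, String attribute) {
        return section.get(attribute).get(0);
    }

    public static void main(String[] args) {
        String scheme = String.join("\n",
                "\"type\" \"none\"",
                "\"np_form\" \"none\" \"NE\" \"defNP\" \"indefNP\" \"PPER\" \"PPOS\" \"PDS\"",
                "",
                "\"type\" \"anaphoric\"",
                "\"ante_sub_anaphoric\" \"none\" \"direct\" \"pronominal\" \"IS-A\" \"other\"");
        Map<String, Map<String, List<String>>> s = parse(scheme);
        System.out.println(s.keySet());                                   // [none, anaphoric]
        System.out.println(s.get("anaphoric").get("ante_sub_anaphoric")); // [none, direct, pronominal, IS-A, other]
        System.out.println(defaultValue(s.get("none"), "np_form"));       // none
    }
}
```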
3.10 Defining markable attributes

When defining markable attributes, a couple of things have to be kept in mind. Obviously, the attributes should be chosen so as to be maximally relevant to the particular aspect or theory one wishes to investigate. At the same time, they must be simple enough to be practically usable, i.e. it must be easy to decide at any point which value should be given to a particular markable. There appears to be a tradeoff between these two requirements. In general, defining adequate attributes is a non-trivial task, especially when the annotation proper is to be carried out by individuals other than those who defined the attributes.

As a rule of thumb, unspecified default values (e.g. the “none” value in the above example) should be supplied whenever possible, so that annotators are not forced to make decisions which they cannot reliably make.

4. Stylesheet capabilities

MMAX uses a sophisticated mechanism for rendering the main window display. After the separate files that make up the corpus have been parsed and combined into a single structure, this structure (a Document Object Model) is passed to an XSL stylesheet processor for rendering. For this process to work, an XSL stylesheet has to be provided with the .anno file. In this stylesheet, any information defined in the corpus (i.e. all elements and their respective attributes, like speakers and turn numbers, but also time attributes) is accessible and can be used to design the display! Information about which element underlies a certain display element is automatically inserted into the display string by means of a pre-defined style sheet template (cf. below). In addition, users have at their disposal a number of simple HTML-like markup tags which can be inserted at this stage to format the display. At this time, the following tags are supported (additional tags will be included in future versions of MMAX):

Note that due to XSL requirements, these tags cannot simply be inserted literally into the text; instead, their angle brackets have to be escaped as &lt; and &gt;. The following XSL template, which renders gesture signals in a multi-modal corpus, illustrates the required format:

<xsl:template match="gesture">

<xsl:text>&lt;bold&gt;[GESTURE: </xsl:text>

<xsl:value-of select="@specifies"/>

<xsl:text>]&lt;/bold&gt;</xsl:text>

</xsl:template>

This kind of rule produces output of the following form: [GESTURE: tv_set]
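A gesture template written with &lt; and &gt; escapes can be tried outside MMAX with the XSLT processor bundled with the JDK (javax.xml.transform). The following Java sketch applies such a template to a one-gesture document. The text output method is an assumption made for this demo: it writes the escaped markup out as a literal <bold> tag. How MMAX configures its own transformation is not described in this manual, and the class name GestureRendering is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class GestureRendering {

    // The gesture template, wrapped in a minimal stylesheet with text output.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"gesture\">"
            + "<xsl:text>&lt;bold&gt;[GESTURE: </xsl:text>"
            + "<xsl:value-of select=\"@specifies\"/>"
            + "<xsl:text>]&lt;/bold&gt;</xsl:text>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    public static String render(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(
                "<gestures><gesture id=\"gesture_1\" specifies=\"tv_set\"/></gestures>"));
        // <bold>[GESTURE: tv_set]</bold>
    }
}
```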

In addition, the following rule must be present in every style sheet that is to be used with MMAX:

<xsl:template match="signal">

<xsl:text>&lt;</xsl:text><xsl:value-of select="@id"/><xsl:text>&gt;</xsl:text><xsl:apply-templates/><xsl:text>&lt;/signal&gt; </xsl:text>

</xsl:template>

This rule matches generic <signal> tags which are produced automatically by MMAX during parsing. The tags that are produced by the above rule are essential for the display, because they are needed for mapping (clickable) display strings to underlying signal ids. The above rule, therefore, should not be altered.
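To see what the signal rule produces, the following Java sketch applies it with the JDK's XSLT processor (text output method assumed) and then maps the rendered string back to signal ids with a regular expression. The mapping step is a hypothetical illustration of why the generated tags are needed, not MMAX's actual mapping code. The wrapper element doc is also hypothetical, and the simple pattern assumes that signal contents contain no further display tags.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SignalRendering {

    // The mandatory signal rule, wrapped in a minimal stylesheet.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"signal\">"
            + "<xsl:text>&lt;</xsl:text><xsl:value-of select=\"@id\"/><xsl:text>&gt;</xsl:text>"
            + "<xsl:apply-templates/><xsl:text>&lt;/signal&gt; </xsl:text>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Hypothetical reader for the rendered form "<ID>text</signal> ".
    static final Pattern RENDERED = Pattern.compile("<([^<>/]+)>([^<]*)</signal> ");

    public static String render(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    /** Recovers signal id -> displayed text, in display order. */
    public static Map<String, String> idsToText(String rendered) {
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = RENDERED.matcher(rendered);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    public static void main(String[] args) {
        String display = render("<doc><signal id=\"word_1\">Hello</signal>"
                + "<signal id=\"word_2\">world</signal></doc>");
        System.out.println(display);            // <word_1>Hello</signal> <word_2>world</signal>
        System.out.println(idsToText(display)); // {word_1=Hello, word_2=world}
    }
}
```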

For further information, please refer to the sample style sheets. Note that you have full XSL style sheet functionality at your disposal!

5. The pluggable AttributeWindow

Version 0.86 of MMAX introduced the pluggable AttributeWindow. While previous versions of MMAX already supported user-definable attributes and possible values, these could be defined only within certain limitations (cf. Section 3.9). The pluggable AttributeWindow is much more flexible in that it allows a custom-made Java class to take the place of the standard Attribute Window (which of course is still available). This Java class must be derived from the abstract class PluggableAttributeWindow.java, which implements the interface AttributeWindowInterface.java. The source code of both files is supplied in the /developer directory. There is also a sample attribute window, named MyAttributeWindow, which defines an empty attribute window. This class can be found in the MMAX directory and can serve as a starting point for developing a custom-made Attribute Window.

Communication with the main MMAX program is established via the set of methods defined in AttributeWindowInterface. Any class derived from PluggableAttributeWindow should be able to work with MMAX. The class (which can take any name, not just MyAttributeWindow) has to be compiled (from within the MMAX directory) using this command line:

javac -verbose -classpath xalan-j_2_4_D1/xalan.jar;xalan-j_2_4_D1/xml-apis.jar;xerces-2_0_2/xercesImpl.jar;MMAX.jar;. MyAttributeWindow.java

If you want to use your own Attribute Window (instead of the standard one), use (from the MMAX directory)

java -classpath xalan-j_2_4_D1/xalan.jar;xalan-j_2_4_D1/xml-apis.jar;xerces-2_0_2/xercesImpl.jar;MMAX.jar;. org.eml.MMAX.core.MMAX -attributewindow MyAttributeWindow

Important: Note that while under UNIX/Linux the colon : is used to separate different classpath names, under Windows the semicolon ; is used for this purpose.
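When these commands are scripted for both platforms, the separator need not be hard-coded: Java exposes it as java.io.File.pathSeparator. The following sketch (the class name ClasspathBuilder is hypothetical) joins the jar locations from the command lines above with the separator of the platform it runs on.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class ClasspathBuilder {

    // Jar locations from the command lines above, relative to the MMAX directory.
    static final List<String> ENTRIES = Arrays.asList(
            "xalan-j_2_4_D1/xalan.jar",
            "xalan-j_2_4_D1/xml-apis.jar",
            "xerces-2_0_2/xercesImpl.jar",
            "MMAX.jar",
            ".");

    /** Joins entries with ";" on Windows and ":" on UNIX/Linux. */
    public static String classpath(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        System.out.println("javac -verbose -classpath " + classpath(ENTRIES) + " MyAttributeWindow.java");
    }
}
```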

If you want to modify only certain methods (e.g. for user-defined markable colouring), you can also create a user-defined Attribute Window by extending the class org.eml.MMAX.gui.AttributeWindow and overriding the methods you need to modify. This gives you a working attribute window without the need to re-implement everything from scratch.


Table of contents