© 2001-2002 Mark-Christoph Müller, European Media Laboratory, Heidelberg
Please send any questions, comments and bug reports to mueller@eml.org
“This product includes software developed by the
Apache Software Foundation (http://www.apache.org/).”
Please see the Apache Software License.
Last modified: September 3, 2002
Note: It is recommended to view this file (Help/MMAX.html) with a normal web browser.
0. Overview
2. Using MMAX
2.1 Loading an annotation file
2.2 Selecting, modifying, creating and deleting markables (New)
2.3 Annotating relations between markables
2.3.1 Membership relation
2.3.2 Pointing relation
2.6 Computing the Kappa statistic (Incl. Batch Kappa)
3.1 The annotation (.anno) file
3.2 The word file
3.3 The text file
3.6 The gesture file
3.9 The scheme file
3.10 Defining markable attributes
5. The pluggable AttributeWindow
New in this version (0.91)
· (build 3) Fixed Kappa window bug: Kappa now works for both the standard and
the SmartAttributeWindow
· (build 3) Particular attribute values can now be ignored when calculating Kappa
· Included a version of the new SmartAttributeWindow, which is to replace the
former AttributeWindow in the not-so-distant future. Cf. the sample
samples\text\textsample_smartwindow.anno. Note that the sample is set to
read-only. If you want to modify it, remove the last line in the .anno file.
The SmartAttributeWindow strongly benefitted from ideas by Caroline Varaschin
Gasperin.
· ‘utterance’ tags are now rendered as pairs of opening and closing elements
(rather than just opening). Cf. the XSL style sheets in the SwitchBoard or
multimodal sample to see how to use this for display rendering purposes. Note
that the selected font can influence the way e.g. indentation is rendered in
the display. If the display looks odd, try a different font.
· Added some additional sample files, in particular a short excerpt from the
SwitchBoard corpus. This nicely demonstrates the MMAX stylesheet capabilities.
· Upgraded XML and XSL processors to the latest versions of Apache Xerces and
Xalan resp. (Note: Due to what seems to be a bug in one of these processors,
this may cause problems with some .anno files that used to work with older
versions. If you encounter any problems apparently coming from the XML/XSL part
of MMAX, please do let me know!)
· Added new display styles <sub> and <super> for setting text in sub- or
superscript. Cf. section 4 for details.
· Added keyboard shortcuts for saving (ctrl-s) and style-sheet re-application
(ctrl-r). The latter is useful for style sheet debugging.
· Added user-selectable display font style and font size. Default (as defined in
mmaxsettings.xml) is SansSerif 14pt. If you encounter problems at MMAX startup
which are caused by this font not being available on your system, edit the file
mmaxsettings.xml and enter the name of an available font in the appropriate XML
element. Afterwards, changes to the display font can be saved just like any
other settings (i.e. “Settings”, “Save Settings”).
New in this version (0.9)
· (build 2) No more useless debug info when handling discontinuous / embedded markables
New in this version (0.88)
· (build 4) Utterance markup processing has been reinstated. Cf. section 3.8 for details.
· (build 4) Really fixed the odd behaviour of the popup menu. The problem occurred if the MMAX window was so small that the popup menu reached beyond it. The popup is now flipped (i.e. expands to the left) depending on the distance to the right border, so the problem should no longer occur.
· (build 3) Fixed odd behaviour of the popup menu (hopefully)
New in this version (0.868)
New in this version (0.867)
· Support for user-definable markable display attributes (cf. Interface class AttributeWindowInterface in javadoc)
· Support for annotation-related events (cf. Interface class MarkableEventListener in javadoc)
· Fixed the duplicate id bug with markable files containing just none markable.
· Fixed the duplicate attribute bug (pointer, member, type) in the toXMLString method.
· Minor bug fixes
New in this version (0.865)
New in this version (0.86)
New in this version (0.8)
Known issues (to be fixed):
MMAX is a tool for the annotation of (possibly
multi-modal) corpora. In this version, the following functions are supported:
Not yet supported are:
MMAX supports two corpus types which differ in
the types of signals that are contained in each. The term signal
is used to denote any type of communicative element that texts and / or
dialogues consist of.
The corpus type text contains only
signals of type word. This corpus type is used to represent written
text. In addition to the words themselves, structural information (in terms of
sentences and paragraphs) as well as (optional) pragmatic information (i.e.
discourse segments) is contained in corpora of this type.
The corpus type dialogue, on the other
hand, contains not only words, but also (optionally) signals of type gesture
and keyaction. The latter represent the operation of buttons or
similar controls, e.g. in a human-machine-interface setting. The keyaction signals, and
in particular the gesture signals, allow for the representation of multi-modal
dialogues. In addition to the signals themselves, relevant structural (turns)
and (optional) pragmatic information (utterances) is represented as
well.
Signals as well as structural and pragmatic
elements (and the annotations as well) are kept apart on the file level.
References to the set of files that comprise a single corpus are stored in a
single annotation (.anno) file. Sample annotation files (mmdialoguesample.anno
and textsample.anno) are supplied with the MMAX executable.
MMAX is written in Java and requires at least
Java 1.3. In addition, it uses the Apache xml parser and stylesheet processor
implementations Xerces (version 1.2.3) resp. Xalan-j (1.2.2) (Copyright ©
1999-2002 The Apache Software Foundation. All rights reserved.), which are
included with this distribution. At the time of this writing, only the
specified versions (Xerces 1.2.3 resp. Xalan-j 1.2.2) are tested and certain to
work. Since Apache will no longer support Xalan 1.2.2 (version 2 taking the
latter’s place), MMAX will soon be upgraded to work with this latest version as
well.
The classpath can be supplied at the command
line as well. Provided the following directory structure
MMAX/MMAX.jar
MMAX/xerces-2_0_2/xercesImpl.jar
MMAX/xerces-2_0_2/xmlParserAPIs.jar
MMAX/xalan-j_2_4_D1/xalan.jar
MMAX/xalan-j_2_4_D1/xml-apis.jar
the tool can be executed in the MMAX directory
with
java -classpath xalan-j_2_4_D1/xalan.jar;xalan-j_2_4_D1/xml-apis.jar;xerces-2_0_2/xercesImpl.jar;MMAX.jar;. org.eml.MMAX.core.MMAX
Alternatively, you can also execute one of the
startmmax*.bat files.
Important: Note that while
under UNIX/Linux the colon : is used to separate different classpath names,
under Windows the semicolon ; is used for this purpose.
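The separator rule can be illustrated with a small sketch that assembles the command for the current platform. The jar paths are copied from the directory listing above; the helper name build_command is hypothetical and not part of MMAX.

```python
import os

# Jar locations relative to the MMAX directory, as used in the command above.
JARS = [
    "xalan-j_2_4_D1/xalan.jar",
    "xalan-j_2_4_D1/xml-apis.jar",
    "xerces-2_0_2/xercesImpl.jar",
    "MMAX.jar",
    ".",
]


def build_command(separator=os.pathsep):
    """Return the java invocation as an argument list.

    os.pathsep is ';' on Windows and ':' on UNIX/Linux, which matches the
    classpath separator rule described in the text.
    """
    classpath = separator.join(JARS)
    return ["java", "-classpath", classpath, "org.eml.MMAX.core.MMAX"]


if __name__ == "__main__":
    print(" ".join(build_command()))
```

Run from the MMAX directory, the printed line can be pasted into a shell. On UNIX/Linux the classpath may need quoting, depending on the shell.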
2.1 Loading an annotation file
Once the tool is started, an annotation file can
be loaded by selecting “Load annotation project” from the “File” menu. In the
file selection dialogue that appears, select the .anno file of the corpus which
you would like to load. After specifying the file, its contents are parsed and
further processed. Note that depending on the size of the corpus, this initial
process might take a few seconds. After that, the corpus is displayed in the
MMAX main window. In addition, the Attribute Window appears. Note that by
design the Attribute Window is accessible only when a markable is selected (cf.
below).
2.2 Selecting, modifying, creating and deleting markables
The appearance of the loaded corpus (in
particular that of non-verbal signals like gestures and keyactions) depends on
the stylesheet used for its display, cf. below. In general, however, normal text will be displayed in black. If your
markable file already contained markables, these are initially displayed in
blue. (Note: This and several other color assignments can be changed by
selecting the “Colors” menu item in the “Settings” menu.)
To select a markable, left-click it. A
selected markable will be highlighted in green (by default, but cf. above).
Since markables can be embedded into each other, a single signal can be part of
more than one markable. Thus, a single click can be ambiguous as to which
markable is to be selected. Depending on what and where you clicked, a popup
menu may appear which contains all markables that the clicked signal is part of.
Select the desired markable by left-clicking it in the popup menu. The markable
will then also be highlighted.
Whenever a markable is selected, the Attribute
Window is updated to display the currently selected markable’s attributes. Important
note: From version 0.8 on, user-defined default attributes will be applied
automatically to newly created markables (depending on the setting of the
option “Apply user-defined default attributes” in the “Settings” menu). Thus,
it is made sure that all markables contain these attributes. For older
annotations (before version 0.7), however, this is not the case! If you load
older annotations into this version of the tool, missing default attributes
will NOT be applied automatically, because this could result in data
inconsistency. It is recommended to check older annotations in the following
way: Select a markable by left-clicking it. If this markable doesn’t contain a
type attribute, the message “Current markable doesn’t have type attribute!”
will be written to STDOUT. In this case, make a dummy change to the Attribute
Window by changing a value and re-changing it again immediately afterwards. Do
NOT use “Undo changes” for re-changing! Then, click “Apply” to explicitly write
all of the displayed attributes to the markable.
You can modify the attributes of the selected
markable via the radio buttons in the Attribute Window. The behaviour of
the program depends on the setting of the option “Auto-apply all changes in the
attribute window” in the “Settings” menu:
If this option is disabled (default), no
permanent changes will be applied to the selected markable unless the
“Apply” button is clicked. After changing the radio buttons’ settings, you can
click the “Undo changes” button to discard the changes and reset the markable
to its original attribute values. Note, however, that the “original” values can
be different from the values in the original file, since modifications may
already have been applied to the markable. After you modified the attributes of
a markable, a tooltip text will be displayed to remind you to apply your
changes.
If the option “Auto-apply all changes in the
attribute window” is enabled, all changes will be directly applied, and
immediate “Undo changes” of the last change is not possible!
In either case, however, the original file will not be
modified until the annotations are saved (via “File”, “Save annotations as...”
or “File”, “Save”).
Note: When you modify
the type attribute of the selected markable, values of attributes that are
applicable to both the old and to the new type will be copied if both the
attribute name and the value name are identical.
You can create a new markable by first
selecting one or more signals and then selecting “Create new markable in
annotation” in the popup menu. Select one or more signals by left-clicking the
first one and dragging the mouse until the selection covers the last signal
that you want the markable to span. The selection does not need to start
exactly with the first letter of the first and end with the last letter of the
last signal: Rather, all signals will be included that are at least partly
covered by the selection. When creating a single-signal markable, just clicking
it will not select it: You have to drag the mouse to select at least one
letter. Note that by default the creation of hybrid markables (i.e. markables
that consist of signals of different types, like a word and a gesture) is
allowed. This setting can be changed under the respective menu item in the
“Settings” menu.
New: You can modify
a markable by selecting it (left click) and then creating a selection of
signals by dragging the mouse. Upon releasing the mouse button, the following
(additional) menu items will appear in the popup menu, depending on what you
selected: If the selected signals are completely within the currently selected
markable, you can choose to remove them from it. If the selected signals are
not within the currently selected markable, you can choose to add them to it.
Finally, if the selected signals partly overlap with the currently selected
markable, you can choose to merge them with the markable. Signals already
contained in the markable will not be added again. You can create discontinuous
markables either by creating a normal markable for the first part and
incrementally selecting and adding the additional parts to it, or by
creating one oversized markable and removing from it those parts you do not want in
your markable. Note: In this version, the hybrid markable setting
is not enforced when modifying markables!
A newly created markable will be displayed in
blue (by default, but cf. above). Left-click it to select it like any other
markable. Application of the user-defined default attributes (cf. below 3.9)
depends on the settings of the option “Apply user-defined default attributes”
in the “Settings” menu: If the option is set to “Upon markable creation”
(default), no selection of the newly created markable is necessary in order to
apply the default settings to it. If the option is set to “Upon first markable
selection”, no default attributes will be applied until the markable is
selected for the first time. In most cases, however, you will select the newly
created markable directly after creation to set its non-default attributes in
the Attribute Window. Remember to click the “Apply” button to permanently apply
the attributes to the markable (if the option “Auto-apply all changes in the
attribute window” is off).
Note: In this version,
no checking is done to prevent identical markables from being created.
In order to delete a markable, select it
by right-clicking it, then from the popup menu select “Delete this markable
from annotation”. Note that if you have currently selected a markable (i.e. it
is highlighted and its attributes are displayed in the Attribute Window),
deletion by right-clicking will only work on this selected markable. Thus, in
order to delete a markable, you can either select it prior to deletion, or you
can clear any selection (by clicking at a position where there is no selectable
markable), and right-click the desired markable directly.
2.3 Annotating relations between markables
The main purpose of MMAX is the annotation of
certain types of relations between markables. MMAX supports two types of
relations: A set membership relation and a pointing
relation. These relations are defined formally only, i.e. no “semantic”
interpretation is connected with them: Supplying this interpretation is left to
the annotation scheme that one is using.
Set membership is a transitive relation between
two or more markables. Markables standing in this relation share the same value
in their member attribute. Whenever a markable is selected, this
markable and all other markables within the same set are displayed in red (by
default, but cf. above).
Annotating the membership
relation is done by adding markables to the set of which the currently selected
markable (i.e. the one currently highlighted) is a member. If the currently
selected markable is not member of any set, a new set will be created when a
markable is added, i.e. after adding the markable the selected one and the one
just added will be displayed in red (by default, but cf. above). To add a
markable to the current markable set (if any), right-click it and from the
popup menu select “Add this markable to set”. Note that after adding the
markable, the selection (i.e. the highlighted markable) will not change. Because
set membership is a transitive relation, it is irrelevant which
markable you have selected as long as it is part of the set to which you
want to add another markable. This is in contrast to the annotation of the pointing
relation, cf. below!
If the markable that is to be added is not part
of a set itself (i.e. if its member attribute is empty), adding is implemented
straightforwardly by setting this attribute to the one shared by all members of
the current set. If, however, the markable to be added is already a member of
some other set, the behaviour of the tool depends on the setting of the option
“When adding a markableset as a member” in the “Settings” menu: By default,
this option is set to “Merge sets”, which will cause the entire set which the
markable to be added is part of to be added as well, resulting in a merge of
both sets. If this option is changed to “Add selected markable only”, only the
specified markable is added to the current set, resulting in a “move” of the
markable from one set to the other.
Removing the membership
relation is done by first selecting any markable in the set which the one to be
removed is a member of (i.e., including the one to be deleted). Then, right-click
it and from the popup menu that appears select “Remove this markable from set”.
Note that if you remove the currently selected markable, the highlighting of
the entire member set will be reset, since the currently selected markable is
no longer a member of it. The same will happen if you remove the second-to-last
markable from a set, because after this the set will no longer exist (i.e.
there are no one-member sets).
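The set operations described above can be sketched as follows. This is a minimal model, not MMAX code: the dictionary layout, the function names and the way a new set id is generated are assumptions. Dissolving a set that is left with a single member after a "move" is also an assumption, derived from the statement that there are no one-member sets.

```python
def _dissolve_if_single(member, set_id):
    """Drop a set that has only one member left (no one-member sets)."""
    remaining = [m for m, s in member.items() if s == set_id]
    if len(remaining) == 1:
        del member[remaining[0]]


def add_to_set(member, selected, candidate, merge_sets=True):
    """Add `candidate` to the set that `selected` belongs to.

    `member` maps a markable id to the value of its member attribute;
    markables without an entry belong to no set.
    """
    target = member.get(selected)
    if target is None:
        # Assumption: a fresh set id is derived from the selected markable.
        target = "set_" + selected
        member[selected] = target
    source = member.get(candidate)
    if source is not None and source != target and merge_sets:
        # "Merge sets": every member of the candidate's set joins the target set.
        for markable, set_id in list(member.items()):
            if set_id == source:
                member[markable] = target
    else:
        # "Add selected markable only": the candidate moves on its own.
        member[candidate] = target
        if source is not None and source != target:
            _dissolve_if_single(member, source)


def remove_from_set(member, markable):
    """Remove `markable` from its set, dissolving the set if needed."""
    set_id = member.pop(markable, None)
    if set_id is not None:
        _dissolve_if_single(member, set_id)
```

Because membership is stored as a shared value, it does not matter which member of a set is passed as `selected`, which mirrors the transitivity remark in the text.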
Pointing is a relation between two markables, one
of which is the pointing markable, and the other the one pointed at. Thus,
pointing is NOT transitive. A markable points to another one by virtue of its pointer
attribute having the other markable’s id. Note that it follows from this that
while one markable can point to at most one other markable, a markable can be pointed
at by arbitrarily many markables. Whenever a markable is selected which
points to some other markable, both will be displayed in yellow (by default,
but cf. above).
Annotating the pointing
relation is done by first selecting the pointing markable (by left-clicking it),
right-clicking the one to point at and selecting from the popup menu “Point to
this markable”. As a result, both markables will be displayed in yellow.
Removing the pointing relation is done by first
selecting the pointing markable (both will be displayed in yellow (by default,
cf. above)), and then right-clicking the markable pointed at and selecting from
the popup menu “Remove pointer to this markable”.
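As a rough illustration of the pointing relation, the sketch below stores one pointer value per markable. The function names are hypothetical, and silently overwriting an existing pointer is an assumption of this sketch rather than documented MMAX behaviour.

```python
def point_to(pointer, source_id, target_id):
    """Let `source_id` point at `target_id`.

    `pointer` maps a markable id to the id stored in its pointer attribute,
    so each markable can point at no more than one other markable.
    """
    pointer[source_id] = target_id


def remove_pointer(pointer, source_id, target_id):
    """Remove the pointer from `source_id` to `target_id`, if it exists."""
    if pointer.get(source_id) == target_id:
        del pointer[source_id]


def pointed_at_by(pointer, target_id):
    """Return all markables pointing at `target_id` (possibly many)."""
    return sorted(s for s, t in pointer.items() if t == target_id)
```

Unlike set membership, nothing is shared between the two markables, so the relation has a direction and is not transitive.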
The colour in which a selected markable (and
possibly other ones as well) is displayed depends on which (if any) relations
it is a part of. Note: The colours mentioned in what follows are the
default values, which can be modified temporarily under the “Colors” menu item
in the “Settings” menu.
Note that the colouring of the markables serves
informational purposes only, i.e., there is no distinction between something
like an annotation and a browsing mode. Rather, any relation will always be
displayed, and every annotation action will always be possible.
All annotations (creation / deletion of
markables, annotation of relations between these etc.) concern the markable
file only, i.e.: all other files are read-only and will never be modified!
Therefore, only the markable file needs to be saved.
In order to do so, you can either
or
Upon closing the window or exiting the
application (via “Exit” in the “File” menu), you will be prompted to save your
annotations if these are found to be dirty, i.e. if they have been modified after
the last saving. If you decide to do so, the “Save as” dialogue will be opened.
2.6 Computing the Kappa statistic
The Kappa index is a statistical measure for
inter-annotator reliability of annotations. You can compute Kappa directly from
MMAX-conformant annotations. In order to do so, you first have to load into the
tool a corpus which uses the same scheme file as the annotations that you are
going to evaluate. Once you have done so, select “Statistics” from the “Tools”
menu. A window will appear in which the annotation files have to be
specified. To specify them, click the “Add file” button. This will cause a
file chooser dialogue to appear, in which you can select the desired files.
Note that you can select several files from the same directory by using the
shift and control keys. Then click the “Open” button in the file chooser. The
newly selected files will be added to the list in the Statistics window. Repeat
this process until all desired files are selected. In order to remove a file
from the list, just left-click it and then click the “Remove” button. Use
“Remove All” to empty the entire list. If you are satisfied with the selection,
you need to specify the attribute for which the Kappa index is to be computed.
All available attributes are displayed in the box above the annotation file
list. Select one attribute, and then click the “Kappa” button to start
computation. The result will be displayed as a table in an independent window.
In order to compute Kappa for a different attribute, just modify the
attribute selection and click the “Kappa” button again. You can save the
results of the computation (table values and markables only!) by specifying the
desired format (space- or comma-separated) and clicking “Save now...”. A file
chooser dialogue will appear, where you can specify a file to write the table
to.
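This manual does not state which Kappa variant MMAX implements. Purely as an illustration of the idea behind the statistic, the following sketch computes Cohen's kappa for two annotators who assigned one value of the same attribute to the same list of markables. The function name and the input layout are assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of attribute values."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of markables with identical values.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's value distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[v] * freq_b[v] for v in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same value throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, with values NE, NE, defNP, defNP from one annotator and NE, NE, defNP, NE from the other, observed agreement is 0.75, chance agreement is 0.5, and the result is 0.5.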
New in version 0.865 is the Batch Kappa
functionality. It allows you to calculate the Kappa statistic for several
annotations in one run. Note: For the Batch Kappa algorithm to work, the .anno
files containing the annotations must comply with the following naming scheme:
XXX_somename_YYY.anno, where XXX is a three-digit number (incl. leading zeros).
XXX is the number of the text, somename is some name, and YYY is a string to
differentiate between different annotators (normally their initials).
Example: To run Batch Kappa for two texts (say,
1 and 2) each of which has been annotated by three annotators (say, a, b and
c), you need the following files:
001_name_a.anno, 001_name_b.anno, 001_name_c.anno,
002_name_a.anno, 002_name_b.anno, 002_name_c.anno.
These files must all be in the same directory,
because the algorithm sorts the files alphabetically in order to know which
ones belong together.
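To check a directory listing against the naming scheme before running Batch Kappa, a sketch like the following may help. It is not the MMAX algorithm: the regular expression, the function name and the assumption that the annotator string contains no underscore are all illustrative.

```python
import re
from collections import defaultdict

# XXX_somename_YYY.anno: three-digit text number, a name, an annotator string.
ANNO_NAME = re.compile(r"^(\d{3})_(.+)_([^_]+)\.anno$")


def group_by_text(filenames):
    """Group annotation file names by text number, in alphabetical order."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = ANNO_NAME.match(name)
        if match is None:
            continue  # file does not follow the naming scheme
        text_number, _somename, annotator = match.groups()
        groups[text_number].append((annotator, name))
    return dict(groups)
```

Applied to the six file names of the example, this yields one group for text 001 and one for text 002, each with the annotators a, b and c.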
The result of Batch Kappa is a table which
contains one row for each different text (i.e. two in the above example), and
the overall Kappa value.
Note: saving the batch Kappa table is not yet completely implemented.
Note: Additional evaluation measures will be
integrated into MMAX in future versions.
In order to use MMAX for your own corpora, you
need to prepare a number of files, which are described in what follows.
References to the set of files that comprise a corpus are stored in a single
annotation (.anno) file.
3.1 The annotation (.anno) file
The annotation file for a corpus of type text
has the following structure. Optional files are given in brackets, and the
order of the entries is irrelevant. Note that ALL files need to be in the same
directory. (Support for more sophisticated file name specifications will be
added in a future version.)
"words wordfile.xml"
"text textfile.xml"
"markables markablefile.xml"
"stylesheet stylesheet.xsl"
"scheme schemefile.scheme"
["utterances utterancefile.xml"]
["readonly_levels levels"]
Accordingly, for a corpus of type dialogue
the following files have to be supplied in the .anno file:
"words wordfile.xml"
"dialogue dialoguefile.xml"
["gestures gesturefile.xml"]
"markables markablefile.xml"
"stylesheet stylesheet.xsl"
"scheme schemefile.scheme"
["utterances utterancefile.xml"]
["keyactions keyactionfile.xml"]
["readonly_levels levels"]
Note that the quotation marks must be
supplied! As can be seen, the corpus type is expressed in an .anno file only implicitly
through the constellation of files. At this time, only a rather superficial
plausibility check on this constellation is performed, ruling out only the most
obviously illegal constellations (e.g. gestures in a text corpus).
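A minimal sketch of how such a file could be read and how the corpus type follows from the entries is given below. It assumes one quoted entry per line with a keyword, whitespace and a value; the function names are hypothetical and the check is far simpler than whatever MMAX itself performs.

```python
import re

# One entry per line: a quoted string holding a keyword and a value.
ENTRY = re.compile(r'^\s*"(\S+)\s+(.+?)"\s*$')


def parse_anno(text):
    """Return a dict mapping entry keywords (words, text, ...) to values."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def corpus_type(entries):
    """Derive the corpus type from the constellation of entries."""
    has_text = "text" in entries
    has_dialogue = "dialogue" in entries
    if has_text == has_dialogue:
        raise ValueError("expected exactly one of 'text' and 'dialogue'")
    if has_text and ("gestures" in entries or "keyactions" in entries):
        raise ValueError("gestures and keyactions are illegal in a text corpus")
    return "text" if has_text else "dialogue"
```

Lines without surrounding quotation marks are skipped by this sketch, which illustrates why the quotation marks matter.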
Note: Although in the
following descriptions of required file structures DTD fragments are given, in
the current version no DTD validation is performed!
The wordfile is an xml file containing the
verbal elements of the corpus. These are either words from a written text, or
the transcriptions of spoken utterances from a dialogue.
The file has to adhere to the following format
(cf. also file words.dtd):
<!ELEMENT words (word*)>
<!ELEMENT word (#PCDATA)>
<!ATTLIST word id ID #REQUIRED>
<!ATTLIST word starttime CDATA #IMPLIED>
<!ATTLIST word endtime CDATA #IMPLIED>
Example:
<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE words SYSTEM "words.dtd">
<words>
<word id="word_1">This</word>
<word id="word_2">is</word>
<word id="word_3">a</word>
<word id="word_4">sample</word>
<word id="word_5">text</word>
<word id="word_6">.</word>
</words>
Note that the element ids MUST have the format specified, e.g. word_x.
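Since no DTD validation is performed, it can be useful to check a word file outside the tool. The following sketch reads the example with the Python standard library and tests the id format; the function names are hypothetical, and reading "word_x" as word_ followed by digits is an assumption.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE words SYSTEM "words.dtd">
<words>
<word id="word_1">This</word>
<word id="word_2">is</word>
<word id="word_3">a</word>
<word id="word_4">sample</word>
<word id="word_5">text</word>
<word id="word_6">.</word>
</words>
"""


def load_words(xml_bytes):
    """Return (id, text) pairs for all word elements, in document order."""
    root = ET.fromstring(xml_bytes)
    return [(w.get("id"), w.text or "") for w in root.iter("word")]


def has_valid_id(word_id):
    """Check the id format word_x, read here as 'word_' plus digits."""
    return re.fullmatch(r"word_\d+", word_id or "") is not None


if __name__ == "__main__":
    for word_id, text in load_words(SAMPLE):
        print(word_id, text, has_valid_id(word_id))
```

The parser used here does not load words.dtd, so the sketch checks well-formedness and the id format only.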
A text consists of an optional headline and at least
one paragraph or at least one sentence. Paragraph elements are optional
'wrappers' around (one or more) sentences, i.e. a text can consist entirely of
sentences or entirely of paragraphs which in turn consist of sentences. It
follows from this that paragraphs and sentences MUST NOT appear on the same
level. (Cf. also file text.dtd.)
<!ELEMENT text ((headline?),((paragraph+) |
(sentence+)))>
<!ELEMENT headline (sentence*)>
<!ELEMENT paragraph (sentence*)>
<!ATTLIST paragraph id ID #REQUIRED>
<!ELEMENT sentence EMPTY>
<!ATTLIST sentence id ID #REQUIRED>
<!ATTLIST sentence span CDATA #REQUIRED>
Example:
<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE text SYSTEM "text.dtd">
<text>
<paragraph id="para_1">
<sentence id="sentence_1"
span="word_1..word_6"/>
</paragraph>
</text>
Note that we use our own span attribute
here instead of the href attribute as defined in XPointer, because our element
differs from the latter both in semantics and implementation.
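The span value used in the example can be expanded into a list of ids as sketched below. Only the two forms a single id and first..last are handled, and reading a range as all consecutive numeric ids between its end points is an assumption; other span forms that MMAX may accept are not covered.

```python
import re

SPAN = re.compile(r"^([A-Za-z]+)_(\d+)(?:\.\.([A-Za-z]+)_(\d+))?$")


def expand_span(span):
    """Expand 'word_1..word_6' into ['word_1', ..., 'word_6']."""
    match = SPAN.match(span)
    if match is None:
        raise ValueError("unsupported span: " + span)
    prefix, first, end_prefix, last = match.groups()
    if last is None:
        return [span]  # a single id such as 'word_3'
    if end_prefix != prefix:
        raise ValueError("mixed-type ranges are not handled in this sketch")
    return ["%s_%d" % (prefix, i) for i in range(int(first), int(last) + 1)]
```

For the sentence in the example, the result is the six ids word_1 to word_6.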
The markable file is an xml file containing
information about markables, their attributes and their relations. Thus, it is
the file which contains the annotations proper. The file has to adhere to the
following format (cf. also file markables.dtd):
<!ELEMENT markables (markable*)>
<!ATTLIST markable id ID #REQUIRED>
<!ATTLIST markable span CDATA #REQUIRED>
<!ATTLIST markable type CDATA #REQUIRED>
<!ATTLIST markable member CDATA #IMPLIED>
<!ATTLIST markable pointer IDREF #IMPLIED>
Note that the list of attributes of the markable
element specified here is not complete yet (cf. scheme file below). The
attributes mentioned here are just the ones that need to be present (or that
are created automatically by the application) and that cannot be modified by the
user directly.
In contrast to e.g. the word file, it makes no
sense for the markable file to be constructed manually prior to using MMAX,
because it is this file that the tool actually creates. However, in the current
version, a markable file is required to be supplied in the annotation file. You
can just use an “empty” markable file, like the following:
<?xml version="1.0"?>
<!DOCTYPE markables SYSTEM
"markables.dtd">
<markables>
</markables>
The dialogue file contains the formal structure
of a dialogue. A dialogue can be divided into turns, a turn-break being marked
by a change of speaker. Accordingly, each turn has, among other attributes, a
speaker attribute specifying which speaker uttered the turn in question.
<!ELEMENT turns (turn*)>
<!ELEMENT turn EMPTY>
<!ATTLIST turn id ID #REQUIRED>
<!ATTLIST turn speaker CDATA #REQUIRED>
<!ATTLIST turn span CDATA #REQUIRED>
Example:
<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE turns SYSTEM "turns.dtd">
<turns>
<turn id="turn_1" speaker="A"
span="word_1..word_6"/>
</turns>
Note: If your corpus
has time-stamped data, the ordering of the element ids in each turn’s span attribute is
irrelevant. When creating the internal Discourse representation, MMAX makes sure
the elements (regardless of their type) are ordered according to their starttime attribute value.
If your corpus does not have time-stamped data (which
is legal only for strictly uni-modal corpora), then MMAX expects the
values in the span attributes to be in the correct order. The same is true for
the ordering in utterance elements.
For the display, the XSL stylesheet associated
with your corpus (as specified in the .anno file) handles the correct ordering
of the elements.
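The ordering by starttime can be pictured with the short sketch below. It assumes that starttime values are numeric strings, which this manual does not specify, and the function name is hypothetical.

```python
def order_by_starttime(signals):
    """Order (id, starttime) pairs of any signal type by start time.

    Assumption: starttime values can be read as numbers.
    """
    ordered = sorted(signals, key=lambda item: float(item[1]))
    return [signal_id for signal_id, _start in ordered]
```

Given the pairs (word_2, 1.5), (gesture_1, 0.9) and (word_1, 0.2), the resulting order is word_1, gesture_1, word_2, regardless of the order in which the ids appear in the span.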
The gesture file contains the non-verbal
elements of dialogue corpora, in particular pointing gestures. A gesture is
identified with the object it specifies, which is represented as a textual
description.
<!ELEMENT gestures (gesture*)>
<!ELEMENT gesture EMPTY>
<!ATTLIST gesture id ID #REQUIRED>
<!ATTLIST gesture starttime CDATA #IMPLIED>
<!ATTLIST gesture endtime CDATA #IMPLIED>
<!ATTLIST gesture specifies CDATA #REQUIRED>
Example:
<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE gestures SYSTEM "gestures.dtd">
<gestures>
<gesture id="gesture_1"
specifies="tv_set"/>
</gestures>
The keyaction file contains non-verbal elements
of the type that occurs in e.g. human-machine
interaction dialogue corpora. Keyaction signals are specified with respect to
the key that was operated and to the kind of action that was performed on it.
Note that the list of possible values for the action attribute is not complete
and given for illustrative purposes only: Depending on the kind of control
devices available (e.g. sliders), additional actions will have to (and can
easily) be added.
<!ELEMENT keyactions (keyaction*)>
<!ELEMENT keyaction EMPTY>
<!ATTLIST keyaction id ID #REQUIRED>
<!ATTLIST keyaction starttime CDATA #IMPLIED>
<!ATTLIST keyaction endtime CDATA #IMPLIED>
<!ATTLIST keyaction key CDATA #REQUIRED>
<!ATTLIST keyaction action (press) #REQUIRED>
Example:
<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE keyactions SYSTEM
"keyactions.dtd">
<keyactions>
<keyaction id="keyaction_1"
key="key_1" action="press"/>
</keyactions>
The utterance file supplies a means to express
pragmatic structure for both text and dialogue corpora.
<!ELEMENT utterances (utterance*)>
<!ELEMENT utterance EMPTY>
<!ATTLIST utterance id ID #REQUIRED>
<!ATTLIST utterance dialogue_act CDATA #IMPLIED>
<!ATTLIST utterance span CDATA #REQUIRED>
Example:
<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE utterances SYSTEM
"utterances.dtd">
<utterances>
<utterance id="utterance_1"
span="word_1..word_6"/>
</utterances>
Note: If your corpus
has time-stamped data, the ordering of the element ids in each utterance’s span attribute is
irrelevant. When creating the internal Discourse representation, MMAX makes
sure the elements (regardless of their type) are ordered according to their starttime attribute value.
If your corpus does not have time-stamped data (which
is legal only for strictly uni-modal corpora), then MMAX expects the
values in the span attributes to be in the correct order. The same is true for the
ordering in turn elements.
For the display, the XSL stylesheet associated
with your corpus (as specified in the .anno file) handles the correct ordering
of the elements.
Important note: The following is valid
for the ‘old’ AttributeWindow only! The new SmartAttributeWindow will be
described in more detail in a later section.
As the name suggests, this file contains the
specification of the annotation scheme in the form of user-definable attributes
along with their respective values which can be assigned to markables. On the
basis of this file, the Attribute Window is constructed. It contains an
arbitrary number of sections, each of which defines the attributes and possible
values for one particular type of markable. Each section starts with the type
attribute, which describes which type the subsequent attributes apply to. Note
that each section is complete, i.e. attributes pertaining to every type must be
repeated each time.
The attribute file is NOT in xml format, but
simply contains one line per attribute. The first item in each line is the name
of the attribute (in the Attribute Window, this will be the label of the
respective group of radio buttons), the following items are the mutually
exclusive values that this attribute can have (in the Attribute Window, one
radio button will be created for each possible value). Note that the quotation
marks MUST be supplied! The first
possible value in each line is treated as the default value, which will be set
automatically.
Example (taken from an annotation scheme for the
annotation of anaphoric and bridging relations):
"type" "none"
"np_form" "none" "NE" "defNP" "indefNP" "PPER" "PPOS" "PDS"
"grammatical_role" "none" "SBJ" "OBJ" "other"
"agreement" "none" "3M" "3F" "3N" "3P" "1S" "2S" "1P" "2P"
"type" "anaphoric"
"ante_sub_anaphoric" "none" "direct" "pronominal" "IS-A" "other"
"np_form" "none" "NE" "defNP" "indefNP" "PPER" "PPOS" "PDS"
"grammatical_role" "none" "SBJ" "OBJ" "other"
"agreement" "none" "3M" "3F" "3N" "3P" "1S" "2S" "1P" "2P"
"type" "bridging"
"ante_sub_bridging" "none" "part-whole" "cause-effect" "entity-attribute" "other"
"np_form" "none" "NE" "defNP" "indefNP" "PPER" "PPOS" "PDS"
"grammatical_role" "none" "SBJ" "OBJ" "other"
"agreement" "none" "3M" "3F" "3N" "3P" "1S" "2S" "1P" "2P"
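To make the file layout described above concrete, here is a minimal Java sketch that reads such lines into one section per markable type. It is not MMAX's own parser: the class name SchemeSketch and its methods are hypothetical, and the sketch assumes one attribute per line, every item enclosed in quotation marks, and a "type" line opening each section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for the scheme file format; not part of MMAX.
class SchemeSketch {
    // Matches one quoted item and captures the text between the quotation marks.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns: markable type -> (attribute name -> possible values, default value first).
    static Map<String, Map<String, List<String>>> parse(List<String> lines) {
        Map<String, Map<String, List<String>>> sections = new LinkedHashMap<>();
        Map<String, List<String>> current = null;
        for (String line : lines) {
            List<String> items = new ArrayList<>();
            Matcher m = QUOTED.matcher(line);
            while (m.find()) {
                items.add(m.group(1));
            }
            if (items.size() < 2) {
                continue; // skip blank or incomplete lines
            }
            String name = items.get(0);
            List<String> values = new ArrayList<>(items.subList(1, items.size()));
            if (name.equals("type")) {
                // A "type" line opens a new, complete section.
                current = new LinkedHashMap<>();
                sections.put(values.get(0), current);
            } else if (current != null) {
                current.put(name, values);
            }
        }
        return sections;
    }

    // The first possible value in a line is treated as the default.
    static String defaultValue(List<String> values) {
        return values.get(0);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"type\" \"anaphoric\"",
                "\"ante_sub_anaphoric\" \"none\" \"direct\" \"pronominal\" \"IS-A\" \"other\"",
                "\"grammatical_role\" \"none\" \"SBJ\" \"OBJ\" \"other\"");
        System.out.println(parse(lines));
    }
}
```

Keeping the values in a list preserves their order, so the default value is simply the first entry of each list.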
3.10 Defining markable attributes
When defining markable attributes, a couple of
things have to be kept in mind. Obviously, the attributes should be chosen in
such a way as to be maximally relevant to the particular aspect or theory one
wishes to investigate. On the other hand, however, they must still be simple
enough to be practically usable, i.e. it has to be easily decidable at any
point which value should be given to a particular markable. There appears to be
a tradeoff between these two aspects. In principle, defining adequate
attributes is a non-trivial task, especially when the annotation proper is to
be conducted by individuals other than those who defined the attributes.
As a rule of thumb, unspecified default values
(i.e. something like the “none” value in the above example) should be supplied
whenever possible, in order not to force individuals to make decisions which
they can’t reliably make.
MMAX uses a sophisticated mechanism for the
rendering of the main window display. After the set of separate files that
comprise the entire corpus have been parsed and combined to a single structure,
this structure (a Document Object Model) is passed to an XSL stylesheet
processor for rendering. For this process to work, an XSL stylesheet has to be
provided with the .anno file. In this stylesheet, any information defined in
the corpus (i.e. all elements and their respective attributes, like speakers,
turn numbers, but also time attributes) is accessible and can be used to design
the display! Information about which element underlies a certain display
element is automatically inserted in the display string by means of a
pre-defined style sheet template (cf. below).
In addition, users have at their disposal a number of simple HTML-like
markup tags which can be inserted at this stage. These tags can be used to
format the display. At this time, the following tags are supported (additional
tags will be included in future versions of MMAX.):
<italic> </italic>
<bold> </bold>
<underline> </underline>
<sub> </sub>
<super> </super>
Note that due to XSL requirements, these tags
cannot simply be inserted literally into the text. Rather, the following format
has to be used. The following is an XSL template example which renders gesture
signals in a multi-modal corpus:
<xsl:template match="gesture">
<xsl:text>&lt;bold&gt;[GESTURE: </xsl:text>
<xsl:value-of select="@specifies"/>
<xsl:text>]&lt;/bold&gt;</xsl:text>
</xsl:template>
This kind of rule produces output of the
following form: [GESTURE: tv_set]
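As a minimal sketch of how such a template behaves, the following Java program runs the gesture rule through the standard javax.xml.transform API. The class name GestureTemplateDemo, the inline test document and the use of xsl:output with method="text" are assumptions made for illustration; MMAX's own rendering pipeline may be configured differently. In the stylesheet string the markup tags are written with the character entities &lt; and &gt;, so that the stylesheet itself remains well-formed XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; it does not use any MMAX code.
class GestureTemplateDemo {
    // The gesture rule from the manual, wrapped in a complete stylesheet.
    // method="text" (an assumption) makes the processor emit "<bold>" literally.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"gesture\">"
            + "<xsl:text>&lt;bold&gt;[GESTURE: </xsl:text>"
            + "<xsl:value-of select=\"@specifies\"/>"
            + "<xsl:text>]&lt;/bold&gt;</xsl:text>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML string and returns the display string.
    static String render(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Stylesheet could not be applied", e);
        }
    }

    public static void main(String[] args) {
        // Prints: <bold>[GESTURE: tv_set]</bold>
        System.out.println(render("<gesture specifies=\"tv_set\"/>"));
    }
}
```

The resulting display string still contains the <bold> markup, which MMAX then interprets when formatting the display.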
In addition, the following rule must be present
in every style sheet that is to be used with MMAX:
<!-- Do not modify this rule -->
<xsl:template match="signal">
<xsl:text>&lt;</xsl:text><xsl:value-of
select="@id"/><xsl:text>&gt;</xsl:text><xsl:apply-templates/><xsl:text>&lt;/signal&gt;
</xsl:text>
</xsl:template>
<!-- Do not modify this rule !! -->
This rule matches generic <signal> tags
which are produced automatically by MMAX during parsing. The tags that are
produced by the above rule are essential for the display, because they are
needed for mapping (clickable) display strings to underlying signal ids. The above
rule, therefore, should not be altered.
For further information, please cf. the sample
style sheets. Note that you have full XSL style sheet functionality at your
disposal!
5. The pluggable AttributeWindow
Version 0.86 of MMAX introduces the pluggable
AttributeWindow. While previous versions of MMAX already supported
user-definable attributes and possible values, these were definable only within
certain limitations (cf. Section 3.9). The pluggable AttributeWindow is much
more flexible in that it allows a custom-made Java class to take the
place of the standard Attribute Window (which of course is still available).
This Java class must be derived from the abstract class
PluggableAttributeWindow.java, which implements the interface
AttributeWindowInterface.java. Both files’ source code is supplied in the
/developer directory. There is also a sample attribute window, named
MyAttributeWindow, which defines an empty attribute window. This class can be
found in the MMAX directory. You can use this class to develop a custom-made
Attribute Window.
Communication with the main MMAX program is
established via the set of methods defined in AttributeWindowInterface. Any
class derived from PluggableAttributeWindow should be able to work with MMAX.
The class (which can take any name, not just MyAttributeWindow) has to be
compiled (from within the MMAX directory) using this command line:
javac -verbose -classpath xalan-j_2_4_D1/xalan.jar;xalan-j_2_4_D1/xml-apis.jar;xerces-2_0_2/xercesImpl.jar;MMAX.jar;. MyAttributeWindow.java
If you want to use your own Attribute Window
(instead of the standard one), use (from the MMAX directory)
java -classpath xalan-j_2_4_D1/xalan.jar;xalan-j_2_4_D1/xml-apis.jar;xerces-2_0_2/xercesImpl.jar;MMAX.jar;. org.eml.MMAX.core.MMAX -attributewindow MyAttributeWindow
Important: Note that while
under UNIX/Linux the colon : is used to separate different classpath names,
under Windows the semicolon ; is used for this purpose.
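If the command line is generated by a script or helper program, the platform-specific separator does not have to be hard-coded. The following Java sketch assembles the classpath from the entries used above and takes the separator from java.io.File.pathSeparator. The class name ClasspathSketch and its methods are hypothetical and not part of MMAX.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; it only builds the command line and does not start MMAX.
class ClasspathSketch {
    // Classpath entries as used in the commands above, relative to the MMAX directory.
    static final List<String> ENTRIES = Arrays.asList(
            "xalan-j_2_4_D1/xalan.jar",
            "xalan-j_2_4_D1/xml-apis.jar",
            "xerces-2_0_2/xercesImpl.jar",
            "MMAX.jar",
            ".");

    // Joins the entries with the given separator (":" or ";").
    static String join(String separator) {
        return String.join(separator, ENTRIES);
    }

    public static void main(String[] args) {
        // File.pathSeparator is ":" under UNIX/Linux and ";" under Windows.
        System.out.println("java -classpath " + join(File.pathSeparator)
                + " org.eml.MMAX.core.MMAX -attributewindow MyAttributeWindow");
    }
}
```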
If you want to modify certain methods only (e.g.
for user-defined Markable colouring), you can also create a user-defined
Attribute Window by extending the class org.eml.MMAX.gui.AttributeWindow, and
overriding the methods you need to modify. This will give you a working attribute
window without the need to re-implement everything from scratch.