public class Corpus extends Object implements Iterable<Instance>
Charts
class. See the examples for the exact corpus file format.
Blank lines in a corpus file are ignored, as are lines that start with the comment prefix. The comment prefix is taken from the first non-blank line of the corpus, which needs to be
[ccc] IRTG unannotated corpus file, v1.0
or
[ccc] IRTG annotated corpus file, v1.0
respectively. Whatever you specify as [ccc] is used as the comment prefix, and all
lines that start with that prefix are ignored as comments. So if you use
"# IRTG annotated ...", then all lines starting with "#" are comments, and if
you use "// IRTG unannotated ...", then all lines starting with "//" are
comments. You can freely choose your own comment prefix to suit the needs of
your corpus.
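The comment-prefix rule described above can be illustrated with a small, self-contained sketch. This is not the library's parser: the class `CommentPrefixSketch` and its methods are hypothetical names introduced here, and the sketch assumes that the header is the first non-blank line and that the prefix is whatever precedes one of the two header texts quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only (hypothetical class, not part of the library). */
class CommentPrefixSketch {
    // Header texts quoted in the class description.
    static final String UNANNOTATED = "IRTG unannotated corpus file, v1.0";
    static final String ANNOTATED = "IRTG annotated corpus file, v1.0";

    /** Returns the comment prefix declared by a header line, e.g. "#" or "//". */
    static String commentPrefix(String headerLine) {
        String line = headerLine.trim();
        for (String suffix : new String[] {UNANNOTATED, ANNOTATED}) {
            if (line.endsWith(suffix)) {
                String prefix = line.substring(0, line.length() - suffix.length()).trim();
                if (prefix.isEmpty()) {
                    // Assumption: an empty prefix would make every line a comment, so reject it.
                    throw new IllegalArgumentException("Header has no comment prefix: " + headerLine);
                }
                return prefix;
            }
        }
        throw new IllegalArgumentException("Not a corpus header: " + headerLine);
    }

    /** Drops blank lines, the header, and every line starting with the comment prefix. */
    static List<String> contentLines(List<String> lines) {
        List<String> result = new ArrayList<>();
        String prefix = null;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines are ignored
            }
            if (prefix == null) {
                prefix = commentPrefix(line); // first non-blank line is the header
                continue;
            }
            if (line.startsWith(prefix)) {
                continue; // comment line
            }
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "",
                "# IRTG annotated corpus file, v1.0",
                "# this line is a comment",
                "john watches the woman",
                "",
                "// not a comment, because the prefix is #");
        System.out.println(commentPrefix(lines.get(1)));
        System.out.println(contentLines(lines));
    }
}
```

With a `#` header, only lines starting with `#` are skipped; a line starting with `//` is kept as content, which is why the prefix has to be read from the header before any other line is interpreted.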
Constructor and Description |
---|
Corpus(): This creates a new empty corpus. |
Modifier and Type | Method and Description |
---|---|
void | addInstance(Instance instance): Adds a new instance to the corpus. |
void | attachCharts(ChartAttacher charts): This attaches parse charts to the instances of the corpus. |
void | attachCharts(String filename): Reads charts from a file and attaches them to this corpus. |
int | getNumberOfInstances(): Returns the number of instances contained in this corpus. |
String | getSource(): Returns a string describing the source of the corpus, if this was passed to the corpus at some point. |
boolean | hasCharts(): Returns true if the instances in this corpus are associated with parse charts. |
boolean | isAnnotated(): Returns true if there are gold annotations for each instance. |
Iterator<Instance> | iterator() |
static Corpus | readCorpus(Reader reader, InterpretedTreeAutomaton irtg): Reads a corpus from a string format available via a reader. |
static Corpus | readCorpusLenient(Reader reader, InterpretedTreeAutomaton irtg): A version of readCorpus that allows interpretations in the corpus file to not be declared in the grammar. |
void | setSource(String source): This sets a value for the source of the corpus. |
void | sort(Comparator<Instance> comparator): Re-orders the instances in this corpus according to the order induced by the comparator. |
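Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator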
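To make the container-style methods listed above more concrete, here is a minimal sketch built on a stand-in class. `CorpusSketch` is hypothetical and only mirrors the signatures of `addInstance`, `getNumberOfInstances`, `getSource`, `setSource`, `sort` and `iterator`; it uses a type parameter in place of `Instance` so that it runs without the library, and its internals (a backing `ArrayList`, a `null` source until one is set) are assumptions rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in that mirrors a few signatures of the class; not the real implementation. */
class CorpusSketch<T> implements Iterable<T> {
    private final List<T> instances = new ArrayList<>();
    private String source; // assumption: stays null until setSource is called

    void addInstance(T instance) {
        instances.add(instance);
    }

    int getNumberOfInstances() {
        return instances.size();
    }

    String getSource() {
        return source;
    }

    void setSource(String source) {
        this.source = source;
    }

    /** Re-orders the stored instances in place, following the comparator's order. */
    void sort(Comparator<T> comparator) {
        instances.sort(comparator);
    }

    @Override
    public Iterator<T> iterator() {
        return instances.iterator();
    }

    public static void main(String[] args) {
        CorpusSketch<String> corpus = new CorpusSketch<>();
        corpus.setSource("example.txt"); // hypothetical source description
        corpus.addInstance("john watches the woman");
        corpus.addInstance("mary sleeps");

        // Order instances by string length, shortest first.
        corpus.sort(Comparator.comparingInt(String::length));

        System.out.println(corpus.getNumberOfInstances() + " instances from " + corpus.getSource());
        for (String instance : corpus) {
            System.out.println(instance);
        }
    }
}
```

Because the class is `Iterable`, the enhanced `for` loop visits instances in their current order, so any iteration after `sort` reflects the comparator's order.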
public Corpus()

public boolean isAnnotated()

public boolean hasCharts()

public void attachCharts(ChartAttacher charts)
Parameters: charts

public void attachCharts(String filename) throws IOException
Parameters: filename
Throws: IOException

public int getNumberOfInstances()

public void addInstance(Instance instance)
Parameters: instance

public String getSource()

public void setSource(String source)
Parameters: source

public static Corpus readCorpus(Reader reader, InterpretedTreeAutomaton irtg) throws IOException, CorpusReadingException
Parameters: reader, irtg
Throws: IOException, CorpusReadingException

public static Corpus readCorpusLenient(Reader reader, InterpretedTreeAutomaton irtg) throws IOException, CorpusReadingException
Parameters: reader, irtg
Throws: IOException, CorpusReadingException

public void sort(Comparator<Instance> comparator)
Parameters: comparator

Copyright © 2017. All rights reserved.