public abstract class TreeAutomaton<State> extends Object implements Serializable, Intersectable<State>
getRulesBottomUp(int, int[])
or
getRulesTopDown(int, int)
. The method getFinalStates()
returns the set of states that must be assigned to the root of an accepted
tree; that is, the final states of a bottom-up automaton or the initial
states of a top-down automaton.
States and labels in the automaton are internally represented as int numbers,
starting at 1. You can translate between numeric state IDs and the actual
state objects (of the class State which is given as a type parameter to the
class) using getStateForId(int)
and
getIdForState(java.lang.Object)
. You can translate between numeric
label IDs and the actual labels (of class String) using the automaton's
signature, obtained by calling getSignature()
.
You can implement subclasses of TreeAutomaton to represent specific types of
tree automata. These implementations may be lazy, which means that
they do not initially represent all rules of the automaton explicitly. Such
implementations override getRulesBottomUp(int, int[])
and getRulesTopDown(int, int)
and compute the correct rules on the
fly. Certain methods enforce the explicit computation of all rules in the
automaton; this is expensive, and is noted in the method descriptions.
Modifier and Type | Class and Description |
---|---|
static interface |
TreeAutomaton.BottomUpStateVisitor |
Modifier and Type | Field and Description |
---|---|
static DebuggingWriter |
D |
static boolean |
DEBUG_STORE |
Constructor and Description |
---|
TreeAutomaton(Signature signature) |
Modifier and Type | Method and Description |
---|---|
boolean |
accepts(Tree<String> tree)
Determines whether the automaton accepts the given tree.
|
boolean |
acceptsRaw(Tree<Integer> tree)
Determines whether the automaton accepts the given tree, using symbol
IDs.
|
void |
analyze()
Prints some statistics about this automaton.
|
ConcreteTreeAutomaton<State> |
asConcreteTreeAutomaton()
Computes a concrete representation of this automaton.
|
ConcreteTreeAutomaton |
asConcreteTreeAutomatonBottomUp()
Computes a concrete representation of this automaton.
|
ConcreteTreeAutomaton<String> |
asConcreteTreeAutomatonWithStringStates()
Computes a concrete representation of this automaton, with string states.
|
long |
countTrees()
Returns the number of trees in the language of this automaton.
|
Rule |
createRule(int parent,
int label,
int[] children,
double weight)
Creates a new rule.
|
Rule |
createRule(int parent,
int label,
List<Integer> children,
double weight)
Creates a new rule.
|
Rule |
createRule(State parent,
String label,
List<State> children)
Creates a rule for this automaton.
|
Rule |
createRule(State parent,
String label,
List<State> children,
double weight)
Creates a weighted rule for this automaton.
|
Rule |
createRule(State parent,
String label,
State[] children)
Creates a rule for this automaton.
|
Rule |
createRule(State parent,
String label,
State[] children,
double weight)
Creates a weighted rule for this automaton.
|
TreeAutomaton<Set<State>> |
determinize() |
TreeAutomaton<Set<State>> |
determinize(List<IntSet> newStateToOldStateSet) |
void |
dumpToFile(String filename) |
boolean |
equals(Object o)
Compares two automata for equality.
|
<E> Int2ObjectMap<E> |
evaluateInSemiring(Semiring<E> semiring,
RuleEvaluator<E> evaluator)
Evaluates all states of the automaton bottom-up in a semiring.
|
<E> Int2ObjectMap<E> |
evaluateInSemiring(Semiring<E> semiring,
RuleEvaluator<E> evaluator,
List<Integer> statesInOrder) |
<E> Map<Integer,E> |
evaluateInSemiringTopDown(Semiring<E> semiring,
RuleEvaluatorTopDown<E> evaluator)
Evaluates all states of the automaton top-down in a semiring.
|
void |
foreachRuleBottomUpForSets(IntSet labelIds,
List<IntSet> childStateSets,
SignatureMapper signatureMapper,
Consumer<Rule> fn) |
void |
foreachRuleTopDown(int parentState,
Consumer<Rule> fn)
Iterates over all rules with the given parent.
|
void |
foreachStateInBottomUpOrder(TreeAutomaton.BottomUpStateVisitor visitor) |
IntSet |
getAllLabels()
Returns a set of label IDs that contains all the terminal symbols that
occur in rules of this automaton.
|
Iterable<Rule> |
getAllRulesTopDown() |
IntSet |
getAllStates()
Returns the IDs of all states in this automaton.
|
IntSet |
getFinalStates()
Returns the IDs of the final states of the automaton.
|
int |
getIdForState(State state)
Returns the numeric ID for the given state.
|
IntIterable |
getLabelsTopDown(int parentState)
Returns a set that contains all terminal symbols f such that the
automaton has top-down transition rules parentState -> f(...).
|
long |
getNumberOfRules()
Returns the number of rules in this automaton.
|
int |
getNumberOfSeenStates() |
Tree<Rule> |
getRandomRuleTree() |
Tree<Rule> |
getRandomRuleTreeFromInside() |
Tree<String> |
getRandomTree()
Generates a random tree from the language of this tree automaton.
|
Tree<String> |
getRandomTreeFromInside() |
IntSet |
getReachableStates()
Returns the set of all reachable states.
|
abstract Iterable<Rule> |
getRulesBottomUp(int labelId,
int[] childStates)
Finds automaton rules bottom-up for a given list of child states and a
given parent label.
|
Iterable<Rule> |
getRulesBottomUp(int labelId,
List<Integer> childStates)
Finds automaton rules bottom-up.
|
Iterable<Rule> |
getRulesBottomUp(IntSet labelIds,
List<Integer> childStates)
Finds automaton rules bottom-up for a given list of child states and a
given set of parent labels.
|
Iterable<Rule> |
getRuleSet()
Returns the set of all rules of this automaton.
|
Iterable<Rule> |
getRulesForRhsState(int rhsState) |
Iterable<Rule> |
getRulesTopDown(int parentState)
Finds all automaton rules top-down with a given state on the left-hand
side.
|
abstract Iterable<Rule> |
getRulesTopDown(int labelId,
int parentState)
Finds automaton rules top-down for a given parent state and label.
|
Tree<Rule> |
getRuleTree(Tree<Integer> derivationTree)
Returns a tree of automaton rules that generates the given tree.
|
Signature |
getSignature()
Returns the signature of the automaton.
|
State |
getStateForId(int stateId)
Returns the state for the given numeric state ID.
|
Interner |
getStateInterner()
Returns the state interner for this tree automaton.
|
List<Integer> |
getStatesInBottomUpOrder()
Returns a topological ordering of the states, such that later nodes
always occur above earlier nodes in any run of the automaton on a tree.
|
Set<State> |
getStoredConstantsForID(int labelID)
If this automaton has stored states for all the constants in its
signature, more precisely if hasStoredConstants() returns true, then this
returns the states corresponding to the constant described by the label
ID in the signature.
|
double |
getWeight(Tree<String> tree)
Computes the weight of the tree, given the (weighted) tree automaton.
|
double |
getWeightRaw(Tree<Integer> tree)
Computes the weight of the tree, given the (weighted) tree automaton.
|
boolean |
hasRuleWithPrefix(int label,
List<Integer> prefixOfChildren)
Returns true whenever the automaton has a bottom-up rule whose first n
child states are the n child states that are passed as the
prefixOfChildren argument.
|
boolean |
hasStoredConstants()
Returns true if the
|
TreeAutomaton |
homomorphism(Homomorphism hom)
Computes the image of this automaton under a homomorphism.
|
Int2ObjectMap<Double> |
inside()
Returns a map representing the inside probability of each reachable
state.
|
<OtherState> |
intersect(Intersectable<OtherState> other)
Intersects this automaton with another one.
|
<OtherState> |
intersect(TreeAutomaton<OtherState> other,
SignatureMapper mapper) |
<OtherState> |
intersectBottomUp(TreeAutomaton<OtherState> other)
Intersects this automaton with another one, using a bottom-up algorithm.
|
<OtherState> |
intersectCondensed(CondensedTreeAutomaton<OtherState> other) |
<OtherState> |
intersectCondensed(CondensedTreeAutomaton<OtherState> other,
PruningPolicy pp) |
<OtherState> |
intersectCondensed(CondensedTreeAutomaton<OtherState> other,
SignatureMapper signatureMapper) |
<OtherState> |
intersectCondensedBottomUp(CondensedTreeAutomaton<OtherState> other) |
<OtherState> |
intersectCondensedBottomUp(CondensedTreeAutomaton<OtherState> other,
SignatureMapper signatureMapper) |
<OtherState> |
intersectEarley(TreeAutomaton<OtherState> other)
Intersects this automaton with another one, using an Earley-style
intersection algorithm.
|
<OtherState> |
intersectViterbi(CondensedTreeAutomaton<OtherState> other) |
<OtherState> |
intersectViterbi(CondensedTreeAutomaton<OtherState> other,
SignatureMapper signatureMapper) |
CondensedTreeAutomaton |
inverseCondensedHomomorphism(Homomorphism hom) |
TreeAutomaton |
inverseHomomorphism(Homomorphism hom)
Computes the pre-image of this automaton under a homomorphism.
|
abstract boolean |
isBottomUpDeterministic()
Determines whether the automaton is deterministic if read as a bottom-up
automaton.
|
boolean |
isCyclic()
Returns the set of all rules, indexed by parent label and children
states.
|
boolean |
isEmpty()
Determines whether the language accepted by this automaton is empty.
|
boolean |
isStoring()
returns whether storeRule actually stores the rule or does nothing.
|
Set<Tree<String>> |
language()
Computes the tree language accepted by this automaton.
|
Iterable<Tree<String>> |
languageIterable()
Returns an Iterable over the language of this automaton.
|
Iterator<Tree<String>> |
languageIterator()
Returns an iterator over the language of this automaton.
|
Iterator<Tree<Integer>> |
languageIteratorRaw()
Returns an iterator over the language of this automaton, encoded using
symbol IDs.
|
Set<Tree<Integer>> |
languageRaw()
Computes the tree language accepted by this automaton.
|
void |
makeAllRulesExplicit()
Computes all rules in this automaton and stores them in the cache.
|
SiblingFinder |
newSiblingFinder(int labelID)
This returns an object that stores and finds possible partners for a
given state given a rule label.
|
void |
normalizeRuleWeights()
Modifies the weights of the rules in this automaton such that the weights
of all rules with the same parent state sum to one.
|
Map<Integer,Double> |
outside(Map<Integer,Double> inside)
Returns a map representing the outside probability of each reachable
state.
|
boolean |
processAllRulesBottomUp(Consumer<Rule> processingFunction)
Iterates through all rules top-down, applying processingFunction to each
rule found.
|
void |
processAllRulesTopDown(Consumer<Rule> processingFunction)
Iterates through all rules top-down, applying processingFunction to each
rule found.
|
TreeAutomaton<State> |
reduceTopDown()
Reduces the automaton, top-down.
|
Iterable<State> |
run(Tree<String> tree)
Runs the automaton bottom-up on the given tree and returns the set of
possible states for the root.
|
<TreeLabels> |
run(Tree<TreeLabels> node,
ToIntFunction<TreeLabels> labelIdSource,
ToIntFunction<Tree<TreeLabels>> subst)
Runs the automaton bottom-up on the given tree, using functions that
extract symbol IDs from node labels and assign states to specific
subtrees.
|
IntIterable |
runRaw(Tree<Integer> tree)
Runs the automaton bottom-up on the given tree, using symbol IDs, and
returns the set of possible states for the root.
|
void |
setRulePrintingFilter(com.google.common.base.Predicate<Rule> filter)
Sets a filter for printing the automaton's rules.
|
void |
setSkipFail()
Enables the skip-fail filter for printing the automaton's rules.
|
void |
setStoring(boolean doStore)
sets whether storeRule actually stores the rule or does nothing.
|
Iterator<WeightedTree> |
sortedLanguageIterator()
Returns an iterator over the weighted language of this automaton.
|
boolean |
supportsBottomUpQueries() |
boolean |
supportsTopDownQueries() |
String |
toString()
Computes a string representation of this automaton.
|
String |
toStringBottomUp()
Computes a string representation of this automaton, bottom-up.
|
boolean |
useSiblingFinder()
Algorithms such as invhom check this function to decide whether their
sibling finder variant should be used or not.
|
Tree<String> |
viterbi()
Computes the highest-weighted tree in the language of this (weighted)
automaton, using the Viterbi algorithm.
|
WeightedTree |
viterbiRaw()
Computes the highest-weighted tree in the language of this (weighted)
automaton, using the Viterbi algorithm.
|
void |
write(Writer writer) |
public static boolean DEBUG_STORE
public static DebuggingWriter D
public TreeAutomaton(Signature signature)
public int getIdForState(State state)
state
- public State getStateForId(int stateId)
stateId
- public Signature getSignature()
public boolean isStoring()
public void setStoring(boolean doStore)
doStore
- public abstract Iterable<Rule> getRulesBottomUp(int labelId, int[] childStates)
labelId
- childStates
- public Iterable<Rule> getRulesBottomUp(int labelId, List<Integer> childStates)
getRulesBottomUp(int, int[])
, but with a List rather than an array of child states.labelId
- childStates
- public Iterable<Rule> getRulesBottomUp(IntSet labelIds, List<Integer> childStates)
labelIds
- childStates
- public void foreachRuleBottomUpForSets(IntSet labelIds, List<IntSet> childStateSets, SignatureMapper signatureMapper, Consumer<Rule> fn)
public abstract Iterable<Rule> getRulesTopDown(int labelId, int parentState)
Note that not every method of TreeAutomaton is safely available in your
implementation of getRulesTopDown. Most notably, you can't use the
default implementation of getAllStates()
, because that method
makes all rules of the automaton explicit and calls getRulesTopDown(int, int)
in the process, leading to an infinite recursion.
labelId
- parentState
- public Iterable<Rule> getRulesTopDown(int parentState)
parentState
- public void foreachRuleTopDown(int parentState, Consumer<Rule> fn)
getRulesTopDown(int)
. Due to the way the top-down index data structures are implemented,
this method is significantly faster if the tree automaton is explicit.parentState
- fn
- public abstract boolean isBottomUpDeterministic()
public IntIterable getLabelsTopDown(int parentState)
The default implementation in TreeAutomaton distinguishes two cases. If the automaton is fully explicit, the method returns those labels for which the parent has (explicit) rules. Otherwise, the method returns all label IDs in the signature.
Subclasses (especially lazy automata) may replace this with more specific implementations.
parentState
- public IntSet getAllLabels()
public boolean hasRuleWithPrefix(int label, List<Integer> prefixOfChildren)
label
- prefixOfChildren
- public IntSet getFinalStates()
public IntSet getAllStates()
public Iterable<Rule> getRuleSet()
Note that this method calls makeAllRulesExplicit()
to enumerate
all rules, and then returns the set of all rules in the rule store. You
can therefore break its functionality if you override
makeAllRulesExplicit carelessly.
public boolean hasStoredConstants()
public Set<State> getStoredConstantsForID(int labelID)
labelID
- public boolean isCyclic()
public long countTrees()
public Int2ObjectMap<Double> inside()
public Map<Integer,Double> outside(Map<Integer,Double> inside)
inside
- a map representing the inside probability of each state.public Tree<String> viterbi()
public WeightedTree viterbiRaw()
viterbi()
, this method returns a tree whose nodes
are labeled with label IDs, as opposed to the labels (Strings)
themselves. It also returns the weight of the top-ranked tree.public Set<Tree<Integer>> languageRaw()
public Set<Tree<String>> language()
public Tree<String> getRandomTree()
public void setRulePrintingFilter(com.google.common.base.Predicate<Rule> filter)
toString()
method to decide which rules are included
in the string representation for the automaton. You can use this to
suppress the presentation of rules you don't care about.filter
- public void setSkipFail()
InverseHomAutomaton
) when toString()
is computed for
this automaton.public boolean equals(Object o)
The implementation of this method is currently very slow, and should only be used for small automata (e.g. in unit tests).
public String toString()
public String toStringBottomUp()
toString()
when debugging lazy automata.public void makeAllRulesExplicit()
public ConcreteTreeAutomaton<State> asConcreteTreeAutomaton()
ConcreteTreeAutomaton
that is equals to the given automaton.
The method enumerates the rules of the automaton top-down, so it will
only work if getRulesTopDown(int, int)
is implemented.public ConcreteTreeAutomaton<String> asConcreteTreeAutomatonWithStringStates()
ConcreteTreeAutomaton
that looks exactly
like the given automaton when printed with toString()
(except
perhaps for reordering of rules), but all state objects (of class State)
are replaced by their string representations. Thus the states of the
returned automaton are always strings. This is mostly useful for testing,
when automata with string states are read from string literals.public ConcreteTreeAutomaton asConcreteTreeAutomatonBottomUp()
ConcreteTreeAutomaton
that is equals to the given automaton.
The method enumerates the rules of the automaton bottom-up.public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersect(Intersectable<OtherState> other)
OtherState
- the state type of the other automaton.other
- the other automaton.public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersect(TreeAutomaton<OtherState> other, SignatureMapper mapper)
public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersectBottomUp(TreeAutomaton<OtherState> other)
OtherState
- other
- public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersectCondensed(CondensedTreeAutomaton<OtherState> other, SignatureMapper signatureMapper)
public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersectCondensed(CondensedTreeAutomaton<OtherState> other)
public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersectCondensed(CondensedTreeAutomaton<OtherState> other, PruningPolicy pp)
public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersectCondensedBottomUp(CondensedTreeAutomaton<OtherState> other, SignatureMapper signatureMapper)
public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersectCondensedBottomUp(CondensedTreeAutomaton<OtherState> other)
public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersectViterbi(CondensedTreeAutomaton<OtherState> other, SignatureMapper signatureMapper)
public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersectViterbi(CondensedTreeAutomaton<OtherState> other)
public <OtherState> TreeAutomaton<Pair<State,OtherState>> intersectEarley(TreeAutomaton<OtherState> other)
OtherState
- other
- public TreeAutomaton inverseHomomorphism(Homomorphism hom)
hom
- the homomorphism.public CondensedTreeAutomaton inverseCondensedHomomorphism(Homomorphism hom)
public TreeAutomaton homomorphism(Homomorphism hom)
hom
- public boolean acceptsRaw(Tree<Integer> tree)
tree
- public boolean accepts(Tree<String> tree)
tree
- public Tree<Rule> getRuleTree(Tree<Integer> derivationTree) throws Exception
derivationTree
- Exception
public Iterable<State> run(Tree<String> tree)
tree
- public IntIterable runRaw(Tree<Integer> tree)
tree
- public <TreeLabels> IntIterable run(Tree<TreeLabels> node, ToIntFunction<TreeLabels> labelIdSource, ToIntFunction<Tree<TreeLabels>> subst)
The node labels of the tree are assumed to be of an arbitrary class, specified by the type parameter "TreeLabels". The "labelIdSource" argument specifies a function that maps objects of class TreeLabels to symbol IDs, which represent node labels according to the automaton's signature.
The "subst" argument maps nodes of the tree to states. It is called for each node of the tree, bottom-up. If the subst function returns 0, the state for the node is computed from the automaton's rules and the label and child states of the node in the usual way. If subst returns non-zero for a certain node, the return value is directly assigned as the state of that node. This mechanism can be used, for instance, to assign states to variable nodes.
TreeLabels
- node
- subst
- public double getWeightRaw(Tree<Integer> tree)
The labels of the nodes of the tree are assumed to be numeric symbol IDs, which specify node labels according to the automaton's signature.
tree
- public double getWeight(Tree<String> tree)
The method also returns a weight of zero if the tree is null.
The labels of the nodes of the tree are assumed to be strings.
tree
- public TreeAutomaton<State> reduceTopDown()
public IntSet getReachableStates()
public void foreachStateInBottomUpOrder(TreeAutomaton.BottomUpStateVisitor visitor)
public <E> Int2ObjectMap<E> evaluateInSemiring(Semiring<E> semiring, RuleEvaluator<E> evaluator, List<Integer> statesInOrder)
public <E> Int2ObjectMap<E> evaluateInSemiring(Semiring<E> semiring, RuleEvaluator<E> evaluator)
E
- semiring
- evaluator
- public <E> Map<Integer,E> evaluateInSemiringTopDown(Semiring<E> semiring, RuleEvaluatorTopDown<E> evaluator)
E
- semiring
- evaluator
- public List<Integer> getStatesInBottomUpOrder()
public Iterable<Tree<String>> languageIterable()
public Iterator<Tree<Integer>> languageIteratorRaw()
public boolean isEmpty()
public Iterator<Tree<String>> languageIterator()
public Iterator<WeightedTree> sortedLanguageIterator()
Signature.resolve(de.up.ling.tree.Tree)
.public long getNumberOfRules()
public void analyze()
public Rule createRule(int parent, int label, int[] children, double weight)
#createRule(java.lang.Object, java.lang.String, State[], double)
.parent
- label
- children
- weight
- public Rule createRule(int parent, int label, List<Integer> children, double weight)
createRule(java.lang.Object, java.lang.String, java.util.List, double)
.parent
- label
- children
- weight
- public Rule createRule(State parent, String label, State[] children, double weight)
parent
- the rule's parent statelabel
- the terminal symbol used in the rulechildren
- the child states, from left to right (as an array)weight
- the rule weightpublic Rule createRule(State parent, String label, List<State> children, double weight)
parent
- the rule's parent statelabel
- the terminal symbol used in the rulechildren
- the child states, from left to right (as a list)weight
- the rule weightpublic Rule createRule(State parent, String label, State[] children)
#createRule(java.lang.Object, java.lang.String, State[], double)
with a weight of 1.parent
- the rule's parent statelabel
- the terminal symbol used in the rulechildren
- the child states, from left to right (as an array)public Rule createRule(State parent, String label, List<State> children)
createRule(java.lang.Object, java.lang.String, java.util.List, double)
with a weight of 1.parent
- the rule's parent statelabel
- the terminal symbol used in the rulechildren
- the child states, from left to right (as a list)public void normalizeRuleWeights()
public Interner getStateInterner()
public boolean supportsTopDownQueries()
public boolean supportsBottomUpQueries()
public TreeAutomaton<Set<State>> determinize(List<IntSet> newStateToOldStateSet)
public TreeAutomaton<Set<State>> determinize()
public void processAllRulesTopDown(Consumer<Rule> processingFunction)
processingFunction
- public boolean processAllRulesBottomUp(Consumer<Rule> processingFunction)
processingFunction
- public int getNumberOfSeenStates()
public SiblingFinder newSiblingFinder(int labelID)
labelID
- public boolean useSiblingFinder()
public void dumpToFile(String filename) throws IOException
IOException
Copyright © 2017. All rights reserved.