abstracted over | When forming an abstraction λx.E, the variable x that the λ binds is said to be abstracted over.
active arc | In active chart parsing we record hypotheses about structures to be found and what has been found so far in active arcs. |
active edge | Same as ↗active arc.
affix | The smallest units in a word that bear meaning, such as "rabbit" and "s", are called morphemes. Morphemes like "rabbit" that contribute the main meaning of a noun, verb, etc. are called stems, while the other morphemes, such as the plural "s", are known as affixes.
agenda | In active chart parsing, an agenda is a data structure that keeps track of the tasks that still have to be addressed.
algorithm | A detailed sequence of actions performed to accomplish some task. Named after the Persian mathematician al-Khwarizmi.
alphabet | The alphabet of a Finite State Automaton (usually called Σ) is the set of symbols the automaton can read from (and possibly write to) its tapes.
ambiguity | In this context, ambiguity describes the situation in which an expression of a (formal) language has more than one analysis.
atomic formula | Atomic formulas are formulae that do not contain any connectives. |
bottom-up parsing and recognition | The basic idea of bottom-up parsing and recognition is to begin with the concrete data provided by the input string and try to recursively build bigger and bigger pieces of structure. |
bound variable | A quantifier is said to bind the variable standing immediately to its right. Within a quantification of the form ∀x φ (or ∃x φ), all occurrences of x inside φ are bound.
breadth-first | A graph search algorithm which tries all one-step extensions of current paths before trying larger extensions is called breadth-first. |
chart parsing | Chart parsing is based on the idea of keeping a record (a chart) of all the structures found during parsing. It is a very general idea that can be used with just about any kind of parsing (or recognition) strategy (top down, bottom up, left-corner, depth first, breadth first, ...).
concatenation | The concatenation of two strings A and B is the string AB, obtained by joining the two strings into one sequence.
confluence | Property of calculi: A calculus is called confluent if the order of rule application has no influence on the results obtained. |
constant symbol | Syntactic entity (in First Order Logic) corresponding to a name: a symbol to which one fixed entity from the domain of the model in use is assigned.
context free | Said of a grammar where the syntax of each constituent is independent of the symbols (words) occurring before and after it in a sentence. E.g. the context free rule NP → Det N says that an NP may consist of a Det followed by an N, no matter where the NP occurs.
context free grammar | Context free grammars (CFGs) describe a larger class of formal languages than regular languages do. E.g. the language a^n b^n (n a's followed by n b's) is context free but not regular.
context free rule | Are rules of a ↗context free grammar (you may also see such rules called rewrite rules, or something similar). |
Definite Clause Grammar | A grammar formalism that is built into Prolog. Although DCGs are formally more powerful than CFGs, we use the DCG mechanism to deal with CFGs. |
depth-first | Said of a graph search algorithm which extends the current path as far as possible before backtracking (to the last so-called choice point) and trying the next alternative path. |
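The difference between the breadth-first and depth-first strategies comes down to which candidate path is taken off the frontier next. A minimal Python sketch over an invented toy graph (none of these names come from the text):

```python
from collections import deque

def search(start, neighbours, goal, strategy='breadth'):
    """Generic path search. 'breadth' extends all shortest paths first;
    'depth' keeps extending the most recent path before backtracking."""
    frontier = deque([[start]])          # paths still to be extended
    while frontier:
        # breadth-first takes the oldest path, depth-first the newest
        path = frontier.popleft() if strategy == 'breadth' else frontier.pop()
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbours.get(node, []):
            if nxt not in path:          # avoid cycles
                frontier.append(path + [nxt])
    return None                          # goal unreachable

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d']}
search('a', graph, 'd', 'breadth')   # ['a', 'b', 'd']
search('a', graph, 'd', 'depth')     # ['a', 'c', 'd']
```

Both strategies share one frontier; only the removal order differs, which is why breadth-first finds a shortest path while depth-first may not.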
derivation | A sequence of applications of grammar rules (starting with some initial symbols) is called a derivation of the symbols in the last row. |
derivation (morphological) | Is the technical term for the process of adding an ↗affix to a word ↗stem resulting in a word with a class different from that of the stem. Making a noun out of a verb by adding "ation" to the verb is an example: "realize" + "ation" gives us realization. |
dialogue act | Dialogue acts describe what role different utterances play within a dialogue. |
dynamic | Prolog puts some restrictions on what can be asserted and retracted. More precisely, a program is only allowed to assert or retract clauses of predicates that have been declared dynamic. So, if one wants to write a predicate which asserts or retracts clauses of the form happy(mia), one also has to put the following statement into the file: :- dynamic happy/1.
exponential | Computational costs are typically time and space. Exponential problems are those where one of these factors grows along an exponential curve as the input gets more complex.
extraction | If syntactic arguments are moved from their original position, one speaks of extraction. C.f. "John" in "(It is) John, Mary loves." |
failure driven loop | The idea of failure driven loops is to force Prolog to backtrack until there are no more possibilities left. This technique is particularly helpful for working with the Prolog database.
feature | Features group similar values within categories. E.g. the feature CASE groups all case information like nominative, genitive, etc. within the category syntax. |
final state | Finite State Automata by definition contain a list of (zero or more) final states. Only if such a state can be reached after consuming the entire input can the automaton report success (for that particular input).
finite state automaton | Same as ↗finite state machine. |
finite state generator | A finite state generator is a ↗finite state machine that outputs a sequence of symbols. |
finite state machine | An abstract machine consisting of a set of states (including the initial state), an input tape (and possibly an output tape), a state transition function, and some designated ↗terminal states. Finite state ↗recognizers only read from one input tape and finally accept the given input or not. ↗finite state generators in turn just write to an output tape. Finally, ↗finite state transducers read input and write output in parallel. |
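For illustration, the recognition mode of a deterministic machine can be sketched in a few lines of Python (the transition table below is an invented toy example, not one from the text):

```python
def accepts(transitions, start, finals, symbols):
    """Deterministic recognizer: consume the input symbol by symbol and
    succeed iff a final state is reached exactly at the end of the input."""
    state = start
    for sym in symbols:
        if (state, sym) not in transitions:
            return False                 # no transition defined: reject
        state = transitions[(state, sym)]
    return state in finals

# Toy automaton accepting one or more "ha"s followed by "!"
trans = {(1, 'ha'): 2, (2, 'ha'): 2, (2, '!'): 3}
accepts(trans, 1, {3}, ['ha', 'ha', '!'])   # True
accepts(trans, 1, {3}, ['ha'])              # False (state 2 is not final)
```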
finite state parser | A finite state parser is a ↗finite state transducer used for parsing the input. |
finite state transducer | A kind of ↗finite state machine that can produce output which records the structure of its input.
first-order language | A first-order language (over some vocabulary) defines how we can use the vocabulary to form complex entities. |
formal language | A formal language is a set of strings formally characterized e.g. by a grammar, or an automaton. |
free | See ↗free variables. |
free variable | Variables (in well formed expressions) that are not bound. For an explicit formal definition of the concepts of bound and free variable, see the main text. |
FSA | Abbreviates ↗finite state automaton. |
FSM | Abbreviates ↗finite state machine. |
functional application | The concatenation of a functor (such as a λ-abstraction) and an argument, often written Functor@Argument. The application is evaluated by ↗β-reduction, which substitutes the argument for the ↗λ-bound variable of the functor.
fundamental rule | The fundamental rule for combining a passive edge and an active edge works as follows: Suppose the active edge goes from node n1 to node n2 and the category right after its dot is C. If the passive edge spans n2 to n3 and has category C, the fundamental rule adds a new edge from n1 to n3 in which the dot has been moved over C.
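As a sketch, edges can be represented as (from, to, category, found, to-find) tuples (this representation is assumed for the example, not prescribed by the text):

```python
def fundamental_rule(active, passive):
    """Combine an active edge with a passive edge: the passive edge must
    start where the active edge ends, be complete, and have the category
    the active edge is looking for right after its dot."""
    a_from, a_to, a_cat, found, to_find = active
    p_from, p_to, p_cat, _, p_to_find = passive
    if a_to == p_from and p_to_find == [] and to_find and to_find[0] == p_cat:
        # move the dot over the found category
        return (a_from, p_to, a_cat, found + [p_cat], to_find[1:])
    return None                     # the two edges do not combine

active = (0, 1, 'S', ['NP'], ['VP'])    # S -> NP . VP, spanning nodes 0-1
passive = (1, 2, 'VP', ['IV'], [])      # complete VP spanning nodes 1-2
fundamental_rule(active, passive)       # (0, 2, 'S', ['NP', 'VP'], [])
```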
gap | Describes syntactic positions where an argument is missing. E.g. in "Who loves Mary?", there is a gap at the second argument position of "loves" (because the argument "who" was extracted to the front). The assumption is that the arguments of a predicate normally occur to its right, as in: love(X,Y).
inflection | ↗Morphemes can be combined to form words that are legal in some language. Two kinds of processes can be distinguished here. They are called inflection and ↗derivation. Inflection is usually taken to be the process of adding a grammatical affix to a word stem, forming a word of the same class as the stem. Adding plural "s" to a noun stem, for example, is an inflectional process.
input tape | One part of a ↗finite state machine. |
jump arc | Jump arcs let a ↗finite state machine jump from one state to another without emitting or reading a symbol. |
Kleene closure | The Kleene closure of a language (or set of symbols) L, written L*, is the set of all strings obtained by concatenating zero or more strings from L; in particular, it always contains the empty string.
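The closure itself is infinite, but a finite fragment of it (all strings up to a given length) can be enumerated; a Python sketch (the function name is ours):

```python
from itertools import product

def kleene_up_to(symbols, n):
    """All strings over the given symbols of length <= n, i.e. a finite
    fragment of the (infinite) Kleene closure."""
    strings = ['']                       # the empty string is always included
    for length in range(1, n + 1):
        strings += [''.join(p) for p in product(symbols, repeat=length)]
    return strings

kleene_up_to(['a', 'b'], 2)   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```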
language accepted | The language accepted (or recognized) by an ↗FSA is the set of all strings it recognizes when used in recognition mode. The language accepted by a formal grammar is the set of all strings the grammar accepts. |
language generated | The language generated by an FSA is the set of all strings it can generate when used in generation mode. |
language generated by a context free grammar | The language generated by a context free grammar is the set of sentences (strings of ↗terminal symbols) that can be derived starting from the ↗start symbol. |
left corner | The left corner of a rule is the first symbol on its right-hand side. For example, NP is the left corner of the rule S → NP VP.
left-corner parsing | The basic idea in left-corner parsing is to alternate bottom-up and top-down steps. |
left-corner table | The left-corner table of a grammar stores which constituents can be at the ↗left corner of which other constituents. This allows one to check, every time a rule is chosen, whether it makes sense to use this rule given the top-down information about the category to be recognized.
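Such a table can be computed as a closure over the grammar rules; a Python sketch over an invented toy grammar (names and representation are ours):

```python
RULES = [('S', ['NP', 'VP']), ('NP', ['Det', 'N']), ('VP', ['IV'])]

def left_corner_table(rules):
    """Map each category to the set of categories that can appear at its
    left corner (iterate until the table stops changing)."""
    table = {}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            corners = table.setdefault(lhs, set())
            # the rule's left corner, plus everything at *its* left corner
            new = {rhs[0]} | table.get(rhs[0], set())
            if not new <= corners:
                corners |= new
                changed = True
    return table

left_corner_table(RULES)['S']   # contains 'NP' and, transitively, 'Det'
```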
lexical rule | A ↗grammar rule which has only words on the right hand side. |
local ambiguity | Ambiguity within a globally unambiguous constituent. E.g. the phrase "short walk" has two analyses, while the larger sentences "The short walk ended." and "The short walk slowly." do not.
matrix | If ∀x φ (or ∃x φ) is a quantification, the subformula φ is called the matrix of the quantification.
meaning construction | Is the process of constructing semantic representations out of (natural language) strings. Usually along the lines of some syntactic analysis. |
meaning representation | Meaning as such is a very abstract concept. It's not at all easy to imagine what it is to get to the meaning of a sentence, let alone to construct it from that sentence. To study meaning, and especially to deal with it on a computer, one needs a handle on it, something more concrete. Meaning representations, i.e. strings of a formal language that has technical advantages over natural language (such as first-order logic), are often used for this purpose.
morpheme | Minimal meaningful language unit; cannot be divided into smaller meaningful units. Morphemes like "rabbit" that contribute the main meaning of a noun, verb, etc. are also called ↗stems, while the other morphemes are also known as ↗affixes. |
movement | Explanation in various (especially early) theories of formal linguistics of particular syntactic structures. In short, it is assumed there that constituents can move within some kind of underlying syntax tree. |
name | FOL: same as ↗constant symbols. |
non-deterministic | Said of ↗algorithms that allow for more than one subsequent step ("action") in some configuration. |
non-terminal symbol | The set of symbols of a formal grammar that occur on the left-hand side of some arrow.
Occurs-Check | For reasons of efficiency, Prolog's standard unification mechanism leaves out the so-called Occurs-Check. Most of the time this plays no role, but in some cases it does. A canonical example of what can go wrong is the following:

?- A = f(A).
A = f(f(f(f(f(f(f(f(f(f(...))))))))))
Yes

If one asks Prolog to unify a variable with a complex term, it does not check whether this variable occurs inside the term. Therefore the variable and the term unify in the example above, resulting in an infinite structure.
output tape | Part of some kind of ↗finite state machine. |
parser | Parsers give an analysis of their input. In contrast, ↗recognizers just accept it or not. |
parse tree | A parse tree is a finite tree all of whose nodes are licensed (admitted) by a grammar rule. The leaf nodes must be licensed by ↗lexical rules, the interior nodes (that is, nodes with daughters) must be licensed by ↗phrase structure rules. |
passive arc | Complete arc in active chart parsing that has found everything it hypothesized. In graphical notation, these are the arcs where the `.' is at the end of the label. |
passive chart parsing | Is a parsing algorithm that uses a chart as a record of all constituents that have been parsed so far. |
passive edge | Same as ↗passive arc. |
phrase structure grammar | PSGs provide a constituent analysis of a string (in terms of its constituent phrases). PSGs are usually represented as collections of rewrite rules (↗phrase structure rules), e.g. S → NP VP.
phrase structure rule | Are parts of a ↗phrase structure grammar. |
polynomial | Said of problems where the resources needed to find the solution (see ↗exponential) are related to the input size by a polynomial function.
predicate symbol | A symbol (e.g. in first order logic) with which we can speak about a certain property of or relation between individuals (therefore predicate symbols are also called relation symbols). Every predicate symbol has an arity which specifies the number of arguments the predicate takes. E.g. a predicate symbol love of arity 2 takes two arguments, as in love(john,mary).
preterminal symbol | A symbol of a formal grammar that occurs on the left-hand side of the arrow in a ↗lexical rule. |
recognizer | See ↗parsers. |
regular expression | Any description of a pattern composed from combinations of symbols and the three operators ↗concatenation, disjunction, and ↗Kleene closure.
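Python's re module uses the same three building blocks (juxtaposition for concatenation, | for disjunction, * for closure), so it can serve as a quick illustration (the pattern below is an invented example):

```python
import re

# zero or more "ha"s followed by "!": concatenation, grouping and closure
pattern = re.compile(r'(ha)*!')

bool(pattern.fullmatch('hahaha!'))   # True
bool(pattern.fullmatch('!'))         # True (zero repetitions)
bool(pattern.fullmatch('hah!'))      # False
```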
regular language | Language generated by a ↗regular expression. |
relation symbol | Same as ↗predicate symbol. |
restriction | Determiners in natural language usually specify a set of individuals and attribute some property to them. The two parts specifying the group and the property are called restriction and scope, respectively. E.g. in the sentence "Every wife has a husband.", "wife" is the restriction and "has a husband" the scope of the universal quantifier.
rewrite arrow | The arrow (usually written →) used in writing ↗context free rules.
rewrite interpretation | The simplest interpretation of ↗context free grammars is the rewrite interpretation. This views CFGs as rules which let us rewrite the start symbol to other strings: each rule is interpreted as saying that we can replace an occurrence of the symbol on the left side of the rule by the symbol(s) on the right side. |
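A leftmost rewriting step can be sketched directly in Python over a toy grammar (the grammar below is illustrative; real CFGs can map a symbol to several alternative right-hand sides):

```python
GRAMMAR = {          # one right-hand side per symbol, for simplicity
    'S': ['NP', 'VP'],
    'NP': ['PN'],
    'VP': ['IV'],
    'PN': ['John'],
    'IV': ['walks'],
}

def rewrite_leftmost(symbols):
    """Replace the leftmost rewritable symbol by its right-hand side."""
    for i, sym in enumerate(symbols):
        if sym in GRAMMAR:
            return symbols[:i] + GRAMMAR[sym] + symbols[i + 1:]
    return symbols   # nothing left to rewrite

step = ['S']
while any(sym in GRAMMAR for sym in step):
    step = rewrite_leftmost(step)
step   # ['John', 'walks']
```

The sequence of intermediate lists (S; NP VP; PN VP; John VP; John IV; John walks) is exactly a ↗derivation in the rewrite sense.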
rewrite rule | Same as ↗context free rules. |
scope | Same as ↗matrix. |
search | The implementation of ↗non-deterministic algorithms brings with it the need to perform search, i.e. (possibly) a range of alternative steps has to be explored (searched). A space of alternatives is often represented as a (search) tree. |
sentence | A (logical) formula that contains no occurrences of free variables is called sentence. |
sentence symbol | Every context free grammar has one special symbol, the start symbol (or sentence symbol), which is usually written `S'. |
speech act | Speech acts describe the use of language to perform some act, e.g. promise, reply, or request. |
spurious analysis | One speaks of a spurious analysis when a parser produces two identical analyses for the same input, e.g. by using the same grammar rules in a different order.
start state | One marked state of a ↗finite state machine. |
start symbol | Same as ↗sentence symbol.
stem | See ↗morphemes. |
subcategorize | One of the most important concepts in (formal) linguistics is that of subcategorization. Certain word classes are assumed to require (subcategorize for) other word classes (so called complements) to form complete phrases. E.g. nouns are assumed to subcategorize for a determiner. Probably most important are verbs, which subcategorize for nouns (a subject and possibly objects) to form a sentence.
syntactic structure | The structure of a string of symbols according to some (formal) grammar. E.g. the syntactic structure of the natural language sentence "John walks." is often represented as [S [NP [PN John]] [VP [IV walks]]]. |
tape | Part of ↗finite state machines. |
term | FOL: Any constant or any variable is a term. Roughly speaking, terms are the noun phrases of first-order languages. |
terminal state | Same as ↗final state.
terminal symbol | Symbols of a formal grammar that may only occur on the right hand side of the arrow. |
threading | Is a technique for explaining/treating so-called long distance phenomena within natural language analysis. The problem is that parts of a sentence that belong together are often distributed over the surface structure. E.g. in the noun phrase "The boy which Mary, who has become a scientist recently, visited last night", "the boy" is an argument of "visited". One explanation of such phenomena uses missing (empty) elements (↗gaps) and describes how these elements and the so-called fillers are connected through the phrases as if there were a thread connecting them.
top-down parsing/recognition | In top-down parsing/recognition one starts at the most abstract level (the level of sentences) and works down to the most concrete level (the level of words).
transition | Transitions are defined by the transition function of a ↗finite state machine.
tree admissibility rule | Next to the ↗rewrite interpretation of CFGs, there is a more fundamental way of thinking about CFGs, namely as tree admissibility rules. According to this view, CFGs describe sets of well-formed ↗parse trees. |
well-formed formula | FOL: formulae that are built from a vocabulary, the logical symbols of FOL and first-order variables according to the syntax rules of FOL. |
α-conversion | The operation of producing an alphabetical variant of a λ-expression by systematically renaming its bound variables.
α-equivalent | Two λ-expressions are α-equivalent if they differ only in the names of their bound variables, i.e. if each can be obtained from the other by ↗α-conversion.
β-reduction | The operation of simplifying a functional application of the form λx.E@A. This compound expression consists of the abstraction λx.E (the functor) and the argument A; β-reduction replaces every occurrence of the λ-bound variable x in the body E by the argument A. E.g. λx.walk(x)@john reduces to walk(john).
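One β-step can be sketched in Python over a toy tuple representation of λ-terms (the representation is ours; a full implementation would also α-convert to avoid variable capture):

```python
# λ-terms as nested tuples: ('lam', var, body), ('app', f, a), or a string.
def substitute(term, var, arg):
    """Replace free occurrences of var in term by arg (no capture
    handling -- a real implementation would α-convert first)."""
    if term == var:
        return arg
    if isinstance(term, str):
        return term
    if term[0] == 'lam':
        _, v, body = term
        return term if v == var else ('lam', v, substitute(body, var, arg))
    _, f, a = term
    return ('app', substitute(f, var, arg), substitute(a, var, arg))

def beta_reduce(term):
    """One β-step at the top level: (λx.E)@A  =>  E with x replaced by A."""
    if term[0] == 'app' and term[1][0] == 'lam':
        _, (_, v, body), arg = term
        return substitute(body, v, arg)
    return term

beta_reduce(('app', ('lam', 'x', ('app', 'walk', 'x')), 'john'))
# ('app', 'walk', 'john')   i.e. λx.walk(x)@john reduces to walk(john)
```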
λ-abstraction | A λ-abstraction is an expression of the form λx.E, where the operator λ binds the variable x within the expression E. λ-abstractions are used to mark missing information in semantic representations; e.g. λx.walk(x) can represent the meaning of "walks", whose subject is still missing.
λ-bound | The variable occurring immediately to the right of a λ is λ-bound within the body of that abstraction. As an example, in the expression λx.love(x,y), the variable x is λ-bound while y is ↗free.