Tania Avgustinova

Word Order and Clitics in Bulgarian

Saarbrücken Dissertations in Computational Linguistics and Language Technology, Volume 5


This thesis is concerned with Bulgarian word-order phenomena involving clitics. It grew out of an interest in how considerable word-order variation is achieved in a language that combines an impoverished declension system with a well-developed mechanism of clitic replication. Across languages, the behaviour of clitics ranges from that of word affixes to the autonomy of independent syntactic forms; in this respect, the intermediate status of Bulgarian clitics is particularly interesting.
Even though the linguistic research carried out in this work is strongly motivated by the need for an explicit formal description of Bulgarian constituent structure and word order for computer implementation, formal issues have been relegated to the background so as to make the analysis accessible to the widest possible readership with a background in Slavistics. The lack of emphasis on formalisation does not, however, imply that the theory presented cannot be formalised. On the contrary, its successful implementation as a parser underlying an experimental grammar checker for Bulgarian shows that a rigorous formalisation is indeed possible.
Head-driven Phrase Structure Grammar (HPSG) is chosen as the theoretical framework of this thesis because it offers a multidimensional, yet integral, sign-based representation of linguistic objects. The complexity of structural relations within the Bulgarian verb complex calls into question the adequacy and universal validity of lexicalist approaches to the treatment of clitics. This work argues that cliticisation in Bulgarian has a morphosyntactic dimension and that verbal clitics belong to the verb-complex constituent, regarded as an intermediate construct between the lexical verb and the clause headed by this verb. The proposed analysis is based on a variant of HPSG that provides an additional morphosyntactic dimension for modelling analytic verb morphology and cliticisation and, as a result, distinguishes three types of objects: lexical, morphosyntactic and syntactic. The concept of morphosyntactic marking introduced in this work is central to the treatment of Bulgarian analytic verb forms. On the basis of constituent structure and syntactic behaviour, two main types of verb complexes are distinguished: compact verb complexes, characterised by strict adjacency of their components, and composite verb complexes, consisting of two loosely bound parts that need not be adjacent. Formally, cliticisation is treated as a matter of morphosyntactic constituency rather than of the lexicon. The actual placement of verbal clitics and verbal clitic sequences is determined at the level of verbal morphosyntax, where prosodic constraints are also taken into account. As a consequence, pronominal clitics are not legitimate constituents at the clausal level.
The approach to Bulgarian constituent structure proposed in this study allows for an adequate representation of the commonly acknowledged "two-faced" nature of object clitics, which are neither true morphemes nor full-fledged syntactic constituents. Intuitively, the relation between verb inflexion and the subject NP parallels the relation between a pronominal clitic and the coreferential object NP. The person, number and gender information (i.e. the index) of the syntactically nominative subject NP is available in the morphology of the verb. The same index information, plus the syntactic case of the corresponding full-fledged NP complement, is supplied by the pronominal clitics within the morphosyntactic verb complex. Thus, both mechanisms (the morphological one of verb inflexion and the morphosyntactic one of object cliticisation) yield very similar results at the clausal level: the overt realisation of the corresponding full-fledged nominal constituent becomes syntactically optional.
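The parallel drawn above can be made concrete with a small data-structure sketch: a clitic is modelled as carrying an index (person, number, gender) plus a syntactic case, and a check tests whether it agrees with a coreferential NP complement. This is a hypothetical illustration, not the thesis's HPSG encoding; the names `Index`, `Clitic`, `NPComplement` and `compatible` are invented here, and only a subset of 3rd-person Bulgarian object clitics is listed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Index:
    """Person, number and gender features (the 'index')."""
    person: int
    number: str          # "sg" or "pl"
    genders: frozenset   # genders compatible with the form

@dataclass(frozen=True)
class Clitic:
    form: str
    index: Index
    case: str            # syntactic case supplied by the clitic: "acc" or "dat"

ALL_GENDERS = frozenset({"m", "f", "n"})

# Hypothetical, illustrative subset: 3rd-person accusative and dative clitics.
CLITICS = {
    "го": Clitic("го", Index(3, "sg", frozenset({"m", "n"})), "acc"),
    "я":  Clitic("я",  Index(3, "sg", frozenset({"f"})), "acc"),
    "ги": Clitic("ги", Index(3, "pl", ALL_GENDERS), "acc"),
    "му": Clitic("му", Index(3, "sg", frozenset({"m", "n"})), "dat"),
    "ѝ":  Clitic("ѝ",  Index(3, "sg", frozenset({"f"})), "dat"),
    "им": Clitic("им", Index(3, "pl", ALL_GENDERS), "dat"),
}

@dataclass(frozen=True)
class NPComplement:
    person: int
    number: str
    gender: str
    case: str            # case the verb assigns to this complement

def compatible(clitic: Clitic, np_complement: NPComplement) -> bool:
    """True if the clitic's index and case agree with the coreferential NP."""
    idx = clitic.index
    return (idx.person == np_complement.person
            and idx.number == np_complement.number
            and np_complement.gender in idx.genders
            and clitic.case == np_complement.case)

# "Книгата я прочетох" ('The book, I read it'): feminine singular direct object.
kniga = NPComplement(3, "sg", "f", "acc")
print(compatible(CLITICS["я"], kniga))   # True
print(compatible(CLITICS["го"], kniga))  # False: gender mismatch
print(compatible(CLITICS["ѝ"], kniga))   # False: dative, not accusative
```

Gender is stored as a set because a single clitic form (e.g. го) covers both masculine and neuter, and the plural forms do not distinguish gender at all.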
Once morphosyntactic constituency is admitted, the language description gains explanatory power and transparency with respect to a number of phenomena in the vague "interface" area between the lexicon and syntax proper. The clearly defined morphosyntactic module in the grammar of Bulgarian sets this language apart within the Slavic family, which illustrates that in HPSG the parametrisation of linguistic and cross-linguistic variation can take place in the grammar itself. In this respect, it may be worth investigating whether a morphosyntactic grammar module would be justified and beneficial in describing other languages that exhibit phenomena problematic for the lexicalist HPSG approach.
As a prerequisite for discussing the role of clitic replication at the clausal level, a typology of Bulgarian articled and non-articled NPs is developed, providing criteria for determining the replication potential of nominal material. The main claim is that what can be replicated by a clitic pronoun (under the appropriate verb-lexeme-specific or communicative conditions) is nominal material used as an identifying specific description of a given object, while non-articled NPs that are categorising or non-specific descriptions, as well as articled NPs that are generic or non-specific descriptions, completely lack replication potential.
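The replication-potential criterion above can be sketched as a simple predicate. This is a minimal illustration under stated assumptions: the names `Description`, `NP` and `has_replication_potential` are hypothetical, the descriptive type of each NP is assumed to be assigned by the analyst rather than computed, and the verb-lexeme-specific and communicative conditions that decide whether replication actually occurs are not modelled.

```python
from dataclasses import dataclass
from enum import Enum

class Description(Enum):
    """Referential use of an NP, following the typology described above."""
    IDENTIFYING_SPECIFIC = "identifying specific"
    CATEGORISING = "categorising"
    NON_SPECIFIC = "non-specific"
    GENERIC = "generic"

@dataclass(frozen=True)
class NP:
    form: str
    articled: bool            # carries the definite article
    description: Description  # assumed to be assigned by the analyst

def has_replication_potential(noun_phrase: NP) -> bool:
    """Can this NP in principle be replicated by a clitic pronoun?

    Only identifying specific descriptions qualify. The article alone is not
    decisive: articled generic or non-specific NPs are excluded, just like
    non-articled categorising or non-specific ones.
    """
    return noun_phrase.description is Description.IDENTIFYING_SPECIFIC

examples = [
    NP("книгата", True, Description.IDENTIFYING_SPECIFIC),  # 'the book' (a particular one)
    NP("книга", False, Description.NON_SPECIFIC),           # 'a book' (any book)
    NP("книгата", True, Description.GENERIC),               # 'the book' as a kind
]
for noun_phrase in examples:
    print(noun_phrase.form, noun_phrase.description.value,
          has_replication_potential(noun_phrase))
```

The `articled` field is kept to mirror the two-way split of the typology, even though the predicate deliberately does not consult it: the same articled form can fall on either side of the line depending on how it is used.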
A particular contribution of this thesis is the view that clitic replication of full-fledged NP complements has a communicatively driven syntactic dimension and deserves special attention as a factor influencing constituent-order variation in the Bulgarian sentence. In this respect, it is shown how the (canonical) lexeme-specific obliqueness order of grammatical relations, the concrete surface alignment and contingent clitic replication interact in the communicative structuring of utterances. It is further argued that clitic replication in Bulgarian has two functions, which interact to different degrees in each particular case: direct-object identification by means of accusative clitics, and thematicisation of nominal material by means of both accusative and dative clitics. Accusative clitic replication in the S-V-O sentence type, and accusative and dative clitic replication in the S-V-O1-O2 sentence type, are then modelled in relation to the grammatical obliqueness hierarchy (the canonical element order), the particular surface alignment (fairly flexible in Bulgarian) and the information structure (the communicative segmentation of the particular utterance), with special attention to the location of emphatic stress. The proposed approach makes it possible to predict when clitic replication is impossible, when it is obligatory, and when it is merely optional.
An important feature of the linguistic analysis proposed in this thesis is its computational tractability. The appendix illustrates how the relevant linguistic knowledge is organised in multiple-inheritance hierarchies and provides information on a fragment of a computational grammar of Bulgarian that covers the full range of verbal morphosyntax and, to a considerable extent, the replication phenomena at the clausal level.
