Semantic transparency and productivity in affixation


IGK 2004 Project


Proposers: Viktor Tron
Other interested students: Ulrike Baldewein
Suggested Lecturers/Guests: Harald Baayen
  Scott McDonald
  Simon Kirby
Time constraints:

Description

In earlier work, Baayen (e.g., 1994) proposed a number of measures for assessing the morphological productivity of affixes, including the number of hapax legomena (forms occurring exactly once) in a corpus. Hay and Baayen (2002, 2004) showed that these measures correlate well with the parsing profile of an affix, i.e., the decomposability of its affixed forms, which in turn depends on the relative frequency of the base and the derived form as well as on their phonological transparency. A natural continuation of this work would be to investigate the effect of semantics on productivity.
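The hapax-based statistics can be illustrated with a minimal sketch. It assumes that the corpus tokens carrying the affix have already been identified (affix recognition is not shown), and computes the token count N, the type count V, the hapax count n1 and the ratio P = n1/N, which is often called productivity in the narrow sense. The function name and the word list are made up for illustration.

```python
from collections import Counter

def productivity_measures(tokens):
    """Hapax-based productivity statistics for a single affix.

    `tokens` is assumed to be the list of corpus tokens already identified
    as carrying the affix (hypothetical preprocessing, not shown here).
    """
    freqs = Counter(tokens)
    N = len(tokens)                                # token count of the affix
    V = len(freqs)                                 # type count
    n1 = sum(1 for f in freqs.values() if f == 1)  # hapax legomena
    P = n1 / N if N else 0.0                       # hapaxes per token
    return {"N": N, "V": V, "n1": n1, "P": P}

# Invented -ness tokens: N = 7, V = 4, n1 = 2 (darkness, sadness), P = 2/7
tokens = ["happiness", "happiness", "darkness",
          "kindness", "kindness", "kindness", "sadness"]
print(productivity_measures(tokens))
```

A higher P means that a larger share of the affix's tokens are new or rare formations, which is the intuition behind using hapaxes as a proxy for productivity.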

It has been argued that productivity depends on the semantic transparency of the affix (Plag 2004, Hay 2001), i.e., the degree to which the affix makes a consistent meaning contribution. In other words, an affix is more transparent the more its affixed forms can be interpreted compositionally without ambiguity.

This project aims to test this hypothesis by assessing measures of semantic transparency and relating them to productivity. Clearly, the major bottleneck of this enterprise is finding a plausible and feasible method for assessing semantic transparency.

Hay (2001) offers a simple and intuitive measure of semantic transparency, based on whether the base is mentioned in the dictionary definition of the derived form, and shows that it correlates with the relative frequency of derived form and base. For a given affix, the proportion of derived forms whose base is sufficiently more frequent than the derived form itself, and which are therefore likely to be parsed (the affix's parsing ratio), correlates with the productivity of the affix (Hay and Baayen 2004).
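Both quantities can be sketched as simple proportions over an affix's base/derived pairs. This is a minimal sketch under stated assumptions: the definition check uses lowercase whitespace tokenization and exact matching of the base (so inflected mentions such as "governing" for "govern" are missed), and the parsing criterion is a hypothetical threshold on the base/derived frequency ratio, a crude stand-in for Hay and Baayen's parsing line. The definitions and frequencies below are invented.

```python
def definition_mentions_base(base, definition):
    """Simplified Hay-style check: does the definition contain the base word?
    Assumption: whitespace tokenization, punctuation stripped, exact match."""
    words = [w.strip(".,;:()").lower() for w in definition.split()]
    return base.lower() in words

def transparency_ratio(pairs, definitions):
    """Proportion of (base, derived) pairs whose derived-form definition mentions the base."""
    hits = sum(definition_mentions_base(b, definitions[d]) for b, d in pairs)
    return hits / len(pairs)

def parsing_ratio(pairs, freq, threshold=1.0):
    """Proportion of pairs counted as 'parsed': base frequency exceeds
    threshold * derived frequency (hypothetical simplification of the parsing line)."""
    parsed = sum(freq[b] > threshold * freq[d] for b, d in pairs)
    return parsed / len(pairs)

# Invented example data for the suffix -ness
pairs = [("kind", "kindness"), ("dark", "darkness"), ("busy", "business")]
definitions = {
    "kindness": "the quality of being kind",
    "darkness": "the state of being dark",
    "business": "a commercial activity or enterprise",
}
freq = {"kind": 500, "kindness": 100, "dark": 800,
        "darkness": 300, "busy": 400, "business": 5000}

print(transparency_ratio(pairs, definitions))  # 2 of 3 definitions mention the base
print(parsing_ratio(pairs, freq))              # 2 of 3 bases outnumber their derived form
```

The opaque pair busy/business fails both tests here, which illustrates why the two measures are expected to correlate.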

Baroni and Vegnaduzzo (2003) propose measures based on the co-occurrence similarity between base and derived form (specifically, cosine similarity, mutual information, and log-likelihood ratio of co-occurrence count vectors within a contextual window of 3-150 words) and evaluate their relation to native speakers' judgements of productivity as well as to automatic measures of productivity.
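As an illustration of the general idea (not a reconstruction of Baroni and Vegnaduzzo's exact setup), the sketch below builds raw co-occurrence count vectors for a word within a symmetric window and compares base and derived form by cosine similarity. No mutual-information or log-likelihood weighting is applied; the function names, the window size and the toy corpus are assumptions.

```python
import math
from collections import Counter

def cooccurrence_vector(target, tokens, window=3):
    """Count the words occurring within `window` tokens on either side of `target`."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vec[tokens[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy corpus: "kind" and "kindness" share the context word "man"
tokens = "a kind man b kindness man".split()
u = cooccurrence_vector("kind", tokens, window=1)
v = cooccurrence_vector("kindness", tokens, window=1)
print(cosine(u, v))  # roughly 0.5
```

On a realistic corpus the vectors would be restricted to a fixed context vocabulary and reweighted before comparison; raw counts are used here only to keep the example short.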

Both approaches above rate semantic transparency by the direct semantic relation between a base and its affixed form. An alternative method is based on the hypothesis that semantically transparent affixes make a consistent meaning contribution and therefore tend to leave the semantic relations between bases relatively unaltered. Intuitively speaking, the proportional analogy base1 : base2 = base1+aff : base2+aff holds to a greater extent if 'aff' is semantically transparent; for instance, if -ness is transparent, 'kind' relates to 'dark' roughly as 'kindness' relates to 'darkness'. This amounts to measuring the degree to which a given affix preserves semantic relations. Such a measure can be approximated by correlating the pairwise distances among bases with the pairwise distances among the corresponding affixed forms, where distances are computed from co-occurrence statistics.
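The proposed relation-preservation measure can be sketched as follows, assuming that each base and each affixed form is already represented by a dense co-occurrence vector over a shared context vocabulary, and that `base_vecs[i]` and `derived_vecs[i]` belong to the same base/derived pair. Cosine distance and Pearson correlation are one possible choice among several; the function names are hypothetical.

```python
import math
from itertools import combinations

def cosine_distance(u, v):
    """1 - cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def relation_preservation(base_vecs, derived_vecs):
    """Correlate base-base distances with the matching derived-derived distances.
    Values near 1 mean the affix leaves relations between bases largely intact."""
    pairs = list(combinations(range(len(base_vecs)), 2))
    d_base = [cosine_distance(base_vecs[i], base_vecs[j]) for i, j in pairs]
    d_der = [cosine_distance(derived_vecs[i], derived_vecs[j]) for i, j in pairs]
    return pearson(d_base, d_der)

# Toy vectors: derived vectors are scaled copies of the base vectors,
# so all pairwise cosine distances are preserved and the score is (about) 1.
base_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 1.0], [0.5, 0.5, 0.0]]
derived_vecs = [[2 * x for x in v] for v in base_vecs]
print(relation_preservation(base_vecs, derived_vecs))
```

Because the pairwise distances are not independent observations, the significance of such a correlation would more appropriately be assessed with a permutation procedure such as the Mantel test than with a standard test for Pearson's r.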

Project Goals

The main goals of this project will be to:

  1. Experiment with a new measure of semantic transparency;
  2. Evaluate the measure by relating it to other measures proposed in the literature;
  3. Assess to what extent the proposed measure relates to productivity measures of affixes.
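Goals 2 and 3 both involve comparing per-affix scores across measures. One plausible way to do this, sketched here as an assumption rather than a fixed part of the plan, is a Spearman rank correlation between the transparency and productivity scores of a set of affixes. The affix scores below are invented placeholders, not results.

```python
def ranks(values):
    """Ranks starting at 1; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-affix scores (placeholders only)
transparency = {"-ness": 0.9, "-ity": 0.6, "-th": 0.2}
productivity = {"-ness": 0.010, "-ity": 0.004, "-th": 0.0}
affixes = sorted(transparency)
print(spearman([transparency[a] for a in affixes],
               [productivity[a] for a in affixes]))
```

A rank-based coefficient is chosen because the different transparency and productivity measures live on very different scales and need not be linearly related.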

References

Baayen, Harald (1994) Productivity and language production. Language and Cognitive Processes 9, 447-469.

Baroni, M. and S. Vegnaduzzo (2003) Assessing morphological productivity via automated measures of semantic transparency. Presented at the Workshop on Explaining Productivity of the DGFS.

Hay, Jennifer and Harald Baayen (2002) Parsing and Productivity. In G. Booij and J. van Marle (eds.) Yearbook of Morphology 2001. Kluwer Academic Publishers, 203-235.

Hay, Jennifer and Harald Baayen (2004) Phonotactics, Parsing and Productivity. To appear in Italian Journal of Linguistics (special issue on Morphological Productivity, edited by Livio Gaeta and Mark Aronoff).

Hay, Jennifer (2001) Lexical Frequency in Morphology: Is Everything Relative? Linguistics 39(6), 1041-1070.

Plag, Ingo (2004) Productivity. To appear in Keith Brown (ed.) Encyclopedia of Language and Linguistics, Second Edition. Oxford: Elsevier.