Morphological information

Types of morphological information

Morphological information is information about semantically relevant word structure; the smallest morphological unit is the morpheme, often defined as the smallest meaningful unit in a language. Morphemes should not be confused with units such as the syllable and its constituents, which are used for describing the structure of words from a pronunciation point of view, without reference to meaning.

The domain of morphology may be divided in terms of the functions of morphological operations, i.e. agreement vs. word formation, or in terms of the structures defined by morphological operations, i.e. affixation (prefixation, suffixation, infixation or prosodic modification) vs. compounding (concatenation of stems or words). These two dimensions can be represented as a cross-classification of function against structure.

There is an apparent gap in the use of stem or word concatenation for agreement purposes; however, so-called periphrastic constructions, typically with auxiliary verbs and participles or infinitives, may be assigned to this slot, though these constructions cannot be compared directly with standard compounds: compare English John will come with French Jean viendra. English lacks an inflectional future, but has periphrastic modal or infinitive complement future forms such as John will come tomorrow, John is going to come tomorrow, as well as the present tense as a general or neutral tense form, as in John comes tomorrow.

There are other intermediate cases which sometimes present difficulties in classification and where the solution is not always immediately obvious.

Traditional accounts often treat these forms together with inflections, presumably because of their regularity and the involvement of suffixation. They are generally better treated as derivations, however, because they have different syntactic distributions from other inflections of the same stems, and may be additionally inflected as adjectives or nouns (cf. the perfect participle in French: On l'a vue, where it can be deduced from the feminine inflection on the participle and the rules of inflectional agreement (inflectional congruence) that l' refers to a feminine item).

Applications of morphology

Morphological structuring is required for the following tasks:

There are two main ways of structuring words internally into word sub-units (word constituents):

  1. Semantic orientation. On morphological grounds, word forms may be decomposed into smaller meaningful units, the smallest of which are morphs, the phonological forms of morphemes; an intermediate unit between the morph and the word form is the stem.

  2. Phonological orientation. On phonological grounds, word forms may be decomposed into smaller pronunciation units, the smallest of which are phonemes; an intermediate pronunciation unit is the syllable.

It is important to note that decomposition into syllables is not isomorphic with decomposition into morphs. For example, phonological has the syllable structure /fO . n@ . lO . d3I . k@l/ and the morph structure /fOn + @ + lOd3Ik + @l/, which are quite different from each other.
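
The mismatch can be made concrete by comparing where the boundaries fall in the two decompositions. The following is a minimal sketch, not a syllabifier or a morphological analyser: both segmentations are entered by hand from the example above, the function name boundary_offsets is illustrative, and offsets are counted in transcription characters (so the affricate /d3/ counts as two positions, which is a simplification).

```python
def boundary_offsets(units):
    """Return the set of internal boundary positions for a list of units.

    Positions are counted in transcription characters from the start
    of the word; no boundary is recorded after the final unit.
    """
    offsets = set()
    position = 0
    for unit in units[:-1]:
        position += len(unit)
        offsets.add(position)
    return offsets


# Hand-entered decompositions of "phonological", taken from the text.
syllables = ["fO", "n@", "lO", "d3I", "k@l"]
morphs = ["fOn", "@", "lOd3Ik", "@l"]

# Both decompositions must spell out the same transcription.
assert "".join(syllables) == "".join(morphs)

syllable_bounds = boundary_offsets(syllables)
morph_bounds = boundary_offsets(morphs)
shared = syllable_bounds & morph_bounds

print(sorted(syllable_bounds))  # [2, 4, 6, 9]
print(sorted(morph_bounds))     # [3, 4, 10]
print(sorted(shared))           # [4]
```

Only one of the boundaries coincides in this word, which is why a lexicon that needs both kinds of unit has to store or compute the two segmentations separately.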

In addition to phonological decomposition, in the written mode word forms may be decomposed into smaller spelling units, graphemes, each consisting of one or more characters; an intermediate orthographic unit is the orthographic break (orthographic syllable), which is in general only needed for line-breaks. Orthographic breaks do not correspond closely to either syllable or morph boundaries, but combine phonological, morphological and orthographic criteria.

For the core requirements of speech recognition, in which a closed vocabulary of attested fully inflected word forms is generally used, morphological structuring is not necessary. Phonological structuring into syllables, demisyllables, diphone sequences or phonemes is widely used in order to increase statistical coverage and to capture details of pronunciation (cf. Browman 1980; Ruske and Schotola 1981; Ruske 1985).

In many languages, syllables and morphs do not always coincide; morphs may be smaller than or larger than syllables.

A brief outline of the main concepts in morphology, as they affect spoken language lexica, will be useful in developing such lexica; for more detail a textbook in linguistics should be consulted (e.g. Akmajian 1984).

Morphology:
Morphology is the definition of the composition of words as a function of the meaning, syntactic function, and phonological or orthographic form of their parts. The morphology of spoken language is fundamentally the same as the morphology of written language in respect of meaning, syntactic function, and the combinability of morphemes. It differs in respect of morphophonological alternations, which differ from spelling alternations, and word prosody (for instance word stress patterns). General definitions are given here; examples are given below.

Morphotactics (word syntax) is the definition of the composition of words as a function of the forms of their parts.

Inflection is that part of morphology which deals with the adaptation of words to their contexts within sentences.

Word formation is that part of morphology which deals with the construction of words from smaller meaningful parts.

Derivation is that part of word formation which deals with the construction of words by the concatenation of stems with affixes.

Compounding (composition) is that part of word formation which deals with the construction of words by concatenating words or stems.

Simple morphological units:
Traditional terminology varies in this area; a standard but incomplete definition of a morpheme, for instance, is that it is 'the minimal meaning-bearing unit of a language'. This definition is not entirely satisfactory, however, and for present purposes the sign-based model and the unit of word will be used as the starting point.

A morpheme is the smallest abstract sign-structured component of a word, and is assigned representations of its meaning, distribution and surface (orthographic and phonological) properties. More informally, morphemes are parts of words defined by criteria of form, distribution and meaning; i.e. they have meanings and are realised by orthographic or phonological forms (morphs).
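
One possible way of representing such a sign in a lexical database is sketched below. The class name Morpheme, its field names and the string-valued meaning and distribution labels are all assumptions made for illustration; they are not a format prescribed here. The phonological forms use the same SAMPA-like notation as the examples in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    """A sign-like record: meaning, distribution and surface forms."""
    meaning: str          # semantic representation, here just a gloss
    distribution: str     # where the morpheme may occur (illustrative label)
    orthographic: tuple   # orthographic morphs realising the morpheme
    phonological: tuple   # phonological morphs realising the morpheme


# Illustrative entries; the category labels are hypothetical.
knife = Morpheme("knife", "noun root", ("knife", "knive"), ("naIf", "naIv"))
plural = Morpheme("PLURAL", "noun suffix", ("s", "es"), ("s", "z", "Iz"))

# One abstract morpheme may be realised by several morphs.
print(len(plural.phonological))  # 3
```

Keeping the morphs as fields of the morpheme record reflects the distinction drawn above between the abstract unit and its orthographic or phonological realisations.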

Traditionally, the two main kinds of morpheme are lexical morphemes and grammatical morphemes.

Morphs are, in traditional linguistics, the orthographic and phonological forms (realisations) of morphemes. Orthographic morphs consist of graphemes (either single letters or fixed combinations of letters); in traditional phonology, phonological morphs consist of phoneme sequences with a prosodic pattern (e.g. word stress).

Roots (lexical morphs) are the morphs which realise lexical morphemes and inflectable grammatical morphemes, and function as the smallest type of stem in derivation and compounding. Affixes (prefixes, suffixes) are morphs which realise the inflectional and derivational beginnings and endings of words.

A free morph is a morph which can occur on its own with no affixes as a separate word; a bound morph is a morph (generally an affix) which always occurs together with at least one other morph (typically a stem) in the same word.

Complex morphological units:
The structure of words is, like the structure of sentences, defined recursively, since the vocabulary of a language (including new coinages) is potentially unlimited. The functional and formal classification of morphological word structure (compounding and derivation, see above) takes this into account. Where words which are not in a given lexicon are likely to be encountered, morphotactic rules and a morphological parser or morphological generator may be required in order to supplement the lexicon. The condition of recursive structure does not apply to inflection, which, given a finite set of stems, defines a finite set of words (though in agglutinative languages, an extremely large finite set):

Inflectional affixation:
A word (fully inflected word) is a stem morphologically concatenated with a full set of inflectional affixes, e.g. English algorithm + s = algorithms or German ge + segn + et + en 'blessed' (plural participle or adjective).

Derivational affixation:
A stem is
  • either a root (i.e. lexical morph), e.g. tree, algorithm
  • or a stem morphologically concatenated with a derivational affix, e.g. algorithm + ic, algorithm + ic + al + ly, non + algorithm + ic + al + ly, etc.

Compounding:
A compound word is a word morphologically concatenated with a word.
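
The recursive definition of a stem given above translates directly into a recursive recogniser. The sketch below is a toy under stated assumptions: the input is already segmented into morphs, the three inventories contain only the morphs from the examples, and the name is_stem is illustrative. It also ignores category restrictions on affixes, so it accepts some sequences that are not English words (for instance tree + ly + ic); a realistic set of morphotactic rules would constrain which affix attaches to which kind of stem.

```python
# Toy inventories built from the examples in the text.
ROOTS = {"tree", "algorithm"}
PREFIXES = {"non"}
SUFFIXES = {"ic", "al", "ly"}


def is_stem(morphs):
    """Return True if the morph sequence is a stem.

    A stem is either a single root, or a (smaller) stem concatenated
    with a derivational prefix or suffix.
    """
    if len(morphs) == 1:
        return morphs[0] in ROOTS
    if len(morphs) > 1:
        # stem + derivational suffix
        if morphs[-1] in SUFFIXES and is_stem(morphs[:-1]):
            return True
        # derivational prefix + stem
        if morphs[0] in PREFIXES and is_stem(morphs[1:]):
            return True
    return False


print(is_stem(["algorithm"]))                           # True
print(is_stem(["non", "algorithm", "ic", "al", "ly"]))  # True
print(is_stem(["ic", "al"]))                            # False
```

Because the rule refers to itself, there is no upper bound on the length of the stems it accepts, which is the sense in which derivation, unlike inflection, defines a potentially unlimited vocabulary.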

Morphophonological and orthographic alternations:
The operation of morphological concatenation is defined for present purposes as 'concatenation and modification of segments at morph boundaries by boundary phenomena.' The details of pronunciation and spelling are altered in morphologically complex items. An example of morphophonological alternation is /f/ ~ /v/ in knife /naIf/ ~ knives /naIvz/. An example of orthographic alternation is y ~ i ~ ie in fly, flier, flies. These alternants can be described by rules:

  1. Morphophonological rules are rules (analogous to spelling rules) which describe morphophonological alternations, i.e. the differences between pronunciations of parts of composite words and pronunciations of corresponding parts of simplex words.

  2. Spelling rules are rules which describe spelling alternations, i.e. the differences between spellings of parts of composite words and the spellings of corresponding parts of simplex words.
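
The two kinds of rule can be illustrated by a pair of small functions that concatenate a stem and a suffix and modify the segments at the boundary. This is a minimal procedural sketch covering only the two examples above, not a two-level rule formalism; the function names, the restriction of the y-rule to stems ending in consonant + y, and the set VOICING_STEMS (a hypothetical lexical list of stems that undergo the /f/ to /v/ alternation) are assumptions of the sketch.

```python
VOWEL_LETTERS = "aeiou"

# Hypothetical lexical list: the /f/ ~ /v/ alternation applies only to
# certain stems, so it is conditioned on membership of this set.
VOICING_STEMS = frozenset({"naIf"})


def join_spelling(stem, suffix):
    """Concatenate orthographic morphs, applying the y ~ i ~ ie rule."""
    ends_in_consonant_y = (
        len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWEL_LETTERS
    )
    if ends_in_consonant_y:
        if suffix == "s":
            return stem[:-1] + "ies"        # fly + s  -> flies
        if suffix.startswith("e"):
            return stem[:-1] + "i" + suffix  # fly + er -> flier
    return stem + suffix


def join_pronunciation(stem, suffix):
    """Concatenate phonological morphs, applying the /f/ ~ /v/ rule."""
    if suffix == "z" and stem in VOICING_STEMS and stem.endswith("f"):
        return stem[:-1] + "v" + suffix      # naIf + z -> naIvz
    return stem + suffix


print(join_spelling("fly", "s"))        # flies
print(join_spelling("fly", "er"))       # flier
print(join_spelling("fly", "ing"))      # flying
print(join_pronunciation("naIf", "z"))  # naIvz
```

The spelling rule and the pronunciation rule are stated separately because, as noted above, morphophonological alternations and spelling alternations do not coincide: the written form knives and the spoken form /naIvz/ are each derived by their own rule.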

A standard technology for formulating spelling rules and morphophonological rules is Two-Level Morphology (cf. Koskenniemi 1983; Karttunen 1983; cf. also Ritchie et al. 1992).

Recommendations on morphology

  1. Decide whether word sub-units may have a role to play in the intended application.
  2. For a large vocabulary (> 5000 words) spoken language lexical database of a highly inflecting language as a general resource, consider using a morphology component to generate fully inflected forms, either on demand or as a (very large) table.
  3. For specific problem areas such as the identification of new words or increasing the robustness of a recogniser, consider using morphological units in speech recognition.
  4. In highly inflecting languages, consider the use of stochastic language models based on word stems as an alternative to fully inflected words.
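
Recommendations 2 and 4 can be illustrated together with a toy generator. The sketch below assumes a hypothetical lexicon (STEMS) and hypothetical paradigms (PARADIGMS) for a handful of English words, and uses plain concatenation with no alternation rules; a morphology component for a highly inflecting language would need far richer paradigms and boundary rules. The table maps each fully inflected form back to its stem, so the same structure supports both generation of the full-form vocabulary and the stem-based counts from which a stem-based stochastic language model could be estimated.

```python
from collections import Counter

# Hypothetical toy data: inflectional endings per category, stems per category.
PARADIGMS = {"verb": ["", "s", "ed", "ing"], "noun": ["", "s"]}
STEMS = {"walk": "verb", "talk": "verb", "algorithm": "noun"}


def full_form_table(stems, paradigms):
    """Generate every inflected form and map it back to its stem."""
    table = {}
    for stem, category in stems.items():
        for ending in paradigms[category]:
            table[stem + ending] = stem
    return table


def stem_counts(tokens, table):
    """Count stems instead of full forms; unknown tokens count as themselves."""
    return Counter(table.get(token, token) for token in tokens)


table = full_form_table(STEMS, PARADIGMS)
print(len(table))  # 10

counts = stem_counts("walks walked talk algorithms walk".split(), table)
print(counts["walk"], counts["talk"], counts["algorithm"])  # 3 1 1
```

Whether the table is built once and stored or the forms are generated on demand is a trade-off between storage and computation; the table grows with the number of stems multiplied by the size of the paradigms, which is why it becomes very large for highly inflecting languages.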

