
Lexical content information

Lexical semantic information

The treatment of semantics in lexica for spoken language systems is far less uniform than the treatment of word forms; decisions in this area are highly application specific.

An outline of some of the basic lexical semantic relations between lexical units was given at the beginning of this section, in the discussion of lexical relations.

Reference should be made to the results of the EAGLES Computational Lexica Working Group for further information on lexical semantics, and to standard texts such as Lyons (1977) or Cruse (1986).

Pragmatic information

The central area of application for pragmatic information from the point of view of spoken language is in dialogue situations, in which prosody (intonation, emphasis, accentuation) is required in addition to word-based information: prosodic information is typically associated with speaker-centred pragmatic information such as topic focussing, speaker attitudes, and dialogue turn-taking. For further details of dialogue structure, the chapter on Interactive Dialogue Systems and the results of the EAGLES Working Group on Computational Lexica should be consulted.

Spoken language corpora are generally highly application oriented, and are therefore bound to a particular speaker or set of speakers with relatively homogeneous properties of register, speech style, and dialect (including pronunciation, vocabulary, and grammar), and with similar intentions with respect to specific actions. As a result, little pragmatic information needs to be specified for particular lexical items, since the same information applies to all lexical items.

Spoken language lexica differ in this way from large-scale general coverage lexica, though the need for such lexica in the spoken language area is growing. The kinds of pragmatic information required are generally limited to information about a few speech act types (question, answer, instruction, etc.). The advent of spoken language dialogue systems, however, is making more sophisticated approaches necessary, in particular the treatment of discourse particles, including hesitation markers, and of word fragments.
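As an illustration of the kind of treatment this implies, the following minimal sketch assigns a coarse category to each token of a transcribed utterance. The marker inventories and the convention that a trailing hyphen marks a word fragment are assumptions made for the example; they are not prescribed by this chapter, and a real system would derive them from its own corpus and transcription conventions.

```python
# Hypothetical inventories, for illustration only.
HESITATION_MARKERS = {"uh", "um", "er", "erm"}
DISCOURSE_PARTICLES = {"well", "okay", "right"}


def classify_token(token: str) -> str:
    """Assign a coarse category to one transcribed token."""
    form = token.lower()
    # Assumed transcription convention: a trailing hyphen marks a
    # broken-off word (word fragment).
    if len(form) > 1 and form.endswith("-"):
        return "fragment"
    if form in HESITATION_MARKERS:
        return "hesitation marker"
    if form in DISCOURSE_PARTICLES:
        return "discourse particle"
    return "word"


def classify_utterance(utterance: str) -> list:
    """Return (token, category) pairs for a whitespace-separated utterance."""
    return [(token, classify_token(token)) for token in utterance.split()]
```

Tokens classified as fragments or hesitation markers could then be given their own lexical entries or filtered before lexical lookup, depending on the application.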

Idiomatic information

Pragmatic idioms such as greetings are generally treated as holistic lexical items, i.e. as `canned text', and included in the lexicon in full. The same applies to fixed idioms such as Come to think of it, ... in the meaning `I just thought of another relevant point, namely ...'; variants like I will just come to think of it, ..., Come to consider it, ... etc. do not have the idiomatic meaning.

The most complex problems arise in the case of idioms with variant forms, such as If you twist my arm, then ..., i.e. `If you give me a really good reason, then ...'. Here, forms such as Twist my arm! or even Would you mind twisting Fred's arm? have the idiomatic meaning, whereas Don't twist my arm!, It was my arm that Tony twisted, etc. tend to have the literal rather than the idiomatic meaning.
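The distinction between fixed idioms stored in full and idioms with variant forms can be sketched as follows. This is a minimal illustration under stated assumptions, not a general solution: the regular expression for the variant idiom, the crude negation check, and all names are hypothetical, and only the examples discussed above are covered.

```python
import re
from typing import Optional

# Fixed idioms stored in full as 'canned text' (normalised word strings).
FIXED_IDIOMS = {
    "come to think of it": "I just thought of another relevant point",
}

# Hypothetical pattern for one idiom with variant forms.
TWIST_ARM = re.compile(r"\btwist(?:ing)? (?:my|your|his|her|[a-z]+'s) arm\b")
TWIST_ARM_GLOSS = "give a really good reason"

# Assumption: a preceding negation favours the literal reading.
NEGATION = re.compile(r"\b(?:don't|do not|never)\b")


def normalise(utterance: str) -> str:
    """Lower-case, replace punctuation by spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z' ]+", " ", utterance.lower()).split())


def idiomatic_reading(utterance: str) -> Optional[str]:
    """Return a gloss if an idiomatic reading is likely, otherwise None."""
    text = normalise(utterance)
    # Fixed idioms must occur in exactly their stored form, utterance-initially.
    for form, gloss in FIXED_IDIOMS.items():
        if text.startswith(form):
            return gloss
    # Variant idioms are matched by pattern, unless negated beforehand.
    match = TWIST_ARM.search(text)
    if match and not NEGATION.search(text[: match.start()]):
        return TWIST_ARM_GLOSS
    return None
```

With these assumptions, If you twist my arm, then I'll stay and Would you mind twisting Fred's arm? receive the idiomatic gloss, while Don't twist my arm!, It was my arm that Tony twisted and Come to consider it, ... do not. Prosodic conditions are not modelled in this sketch.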

Very often, idioms are associated with a specific range of prosodic patterns (intonation patterns). For example, in How do you do?, the words how, do and you might be emphasised as a joke, but not with the standard pragmatic idiom connotation. The same applies to the example given above if the word twist is given a noticeably stronger accent than arm.

From a more general perspective, so-called functional units (sequences of functional words which behave as a phonological unit) and clitics (functional words which combine with lexical words to form a sequence which behaves as a phonological unit) share a number of properties with idioms in the more traditional sense of the term. Characteristic of these units is that they have special phonological characteristics, with deletion and assimilation of segments to their neighbouring environment.

An example of a functional unit in English is I c'n ... /aIkN/ for I can /aI k{n/ in informal, fast speech or particularly unstressed contexts.

An example of a clitic in English is he's /hi:z/ or even /hIz/, for he is /hi: Iz/.

Some cliticised sequences have become lexicalised (i.e. independent lexical items) in informal styles, e.g. can't /ka:nt/ and cannot /k{nOt/ for can not /k{n nOt/.
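One possible way of representing such forms in a lexicon is to store each reduced form as an entry of its own, linked to the full word sequence it corresponds to. The sketch below uses only the examples and SAMPA transcriptions given above; the class name, field names and the lookup function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReducedForm:
    surface: str                 # orthographic form of the reduced unit
    full_form: Tuple[str, ...]   # corresponding full word sequence
    sampa: Tuple[str, ...]       # pronunciation variants (SAMPA)
    kind: str                    # functional unit, clitic or lexicalised


# Entries taken from the examples in the text.
_ENTRIES = (
    ReducedForm("I c'n", ("I", "can"), ("aIkN",), "functional unit"),
    ReducedForm("he's", ("he", "is"), ("hi:z", "hIz"), "clitic"),
    ReducedForm("can't", ("can", "not"), ("ka:nt",), "lexicalised"),
    ReducedForm("cannot", ("can", "not"), ("k{nOt",), "lexicalised"),
)

# Index by lower-cased surface form for case-insensitive lookup.
REDUCED_FORMS = {entry.surface.lower(): entry for entry in _ENTRIES}


def expand(surface: str) -> Optional[Tuple[str, ...]]:
    """Return the full word sequence for a reduced form, or None if unknown."""
    entry = REDUCED_FORMS.get(surface.lower())
    return entry.full_form if entry else None
```

Keeping the pronunciation variants on the reduced entry, rather than deriving them by rule from the full forms, reflects the observation that these units have special phonological characteristics.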

A general solution to the lexical treatment of idioms is not currently available for either written or spoken language processing, and further research and development is needed in this area in view of the frequency of idioms in actual corpora.

Recommendations on semantic information

  1. Ensure that the application domain is as precisely specified and modelled as possible.
  2. Determine the size of vocabulary which can be processed, and consider ways in which to minimise the vocabulary needed for this domain.
  3. Consult the results of the EAGLES Working Group on Computational Lexica; this is an area in which Spoken Language and Written Language processing overlap considerably.





WWW Administrator
Fri May 19 11:53:36 MET DST 1995