
Presentation

 

Until recently (say, less than a century ago), there were hardly any circumstances in which somebody within reach of the voice was not also within reach of the eyes, and face or silhouette were probably used much more than voice by human beings to identify each other. With the development of telecommunications and recording, the need for speaker recognition has become more pressing.

The basis for automatic speaker classification and recognition is that, besides the linguistic message, the human voice conveys a great deal of paralinguistic information about the speaker, i.e. the "encoder". These factors of variability are well-known obstacles to speech recognition, as they influence the acoustic characteristics of the speech signal.

The main sources of a speaker's specificity are the physiological conformation of his speech production organs, his neuro-motor control of these organs, and his internal speech pattern prototypes. In practice, there may be more or less systematic correlations between these factors and some of the speaker's characteristics, such as his sex, his age, his state of health, his mood, his regional, cultural and educational background, his possible foreign accent, and the language he is speaking.

In this chapter, we address a class of pattern recognition problems where the goal is to classify a speech pattern according to some characteristics of the speaker who uttered it. We recommend the general term speaker classification to designate such problems.

Speaker classification tasks

In the special case where the goal is to identify the language in which a given speech utterance has been produced, we recommend using the term spoken language identification instead of the usual expression language identification, as the latter can be confused with written language identification.

Finally, if the task consists in finding information about the actual identity of the speaker from a speech signal, it is classically designated as speaker recognition.

For speaker classification and recognition tasks, a general distinction must be made between identification and verification. Identification consists in finding the class or speaker to which a speech utterance is most likely to belong, whereas verification aims at accepting or rejecting the hypothesis that the utterance belongs to a given class or speaker.
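The distinction can be illustrated with a minimal sketch. It assumes that some scoring procedure has already assigned to the utterance one similarity score per candidate speaker (or class), with higher meaning more similar; the names identify and verify, the speaker labels, the scores and the threshold are all hypothetical and chosen only for illustration. Identification is then a choice among candidates, whereas verification is a yes/no decision about a single claimed candidate.

```python
from typing import Dict


def identify(scores: Dict[str, float]) -> str:
    """Identification: return the candidate with the highest score."""
    return max(scores, key=scores.get)


def verify(scores: Dict[str, float], claimed: str, threshold: float) -> bool:
    """Verification: accept the claimed candidate if its score reaches the threshold."""
    return scores[claimed] >= threshold


# Hypothetical scores of one utterance against three enrolled speakers.
scores = {"alice": 0.62, "bob": 0.91, "carol": 0.35}

print(identify(scores))              # best-scoring candidate
print(verify(scores, "bob", 0.8))    # claim accepted or rejected
print(verify(scores, "alice", 0.8))  # claim accepted or rejected
```

In this sketch, identification always returns one of the known candidates, while the outcome of verification depends on the chosen threshold, which an actual system would have to set according to its own requirements.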

Examples of speaker class identification are given above. As for speaker class verification, a typical age verification problem would consist in checking whether or not a speaker is an adult, and spoken language verification would aim at making sure that a given utterance was pronounced in a given language (the expected language of an application, for instance).

In the rest of this chapter, we will mainly focus on speaker identification and verification. However, most concepts are easy to generalise to other speaker classification problems.

General definitions

The following general definitions are directly inspired by Atal [Atal 76]:

Speaker classification: any decision-making process that uses some features of the speech signal to determine some characteristics of the speaker of a given utterance.

Speaker recognition: any decision-making process that uses some features of the speech signal to determine some information on the identity of the speaker of a given utterance.

Speaker class identification: any decision-making process that uses some features of the speech signal to determine the class to which the speaker of a given utterance belongs.

Speaker class verification: any decision-making process that uses some features of the speech signal to determine whether the speaker of a given utterance belongs to a given class.

Speaker identification: any decision-making process that uses some features of the speech signal to determine who is the speaker of a given utterance.

Speaker verification: any decision-making process that uses some features of the speech signal to determine whether the speaker of a given utterance is a particular person, whose identity is specified.

Spoken language identification: any decision-making process that uses some features of the speech signal to determine what language is spoken in a given utterance.

Spoken language verification: any decision-making process that uses some features of the speech signal to determine whether the language spoken in a given utterance is a particular language.






WWW Administrator
Fri May 19 11:53:36 MET DST 1995