SAMPA (Speech Assessment Methods Phonetic Alphabet) is a machine-readable phonetic alphabet originally developed under the ESPRIT project 1541 (SAM) in 1987--89 by an international group of phoneticians and applied in the first instance to Danish, Dutch, English, French, German and Italian (SAM 1988, 1989). It has since been extended to other languages, including Norwegian, Swedish, Spanish, Portuguese and Greek.
Section A.1.2 covers the present status of SAMPA, and addresses the languages Danish, Dutch, English, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish and Greek. Section A.1.3 discusses additional levels of annotation. A.1.4 addresses important issues to be considered in the relationship of SAMPA to other computer-coded phonetic transcription systems in use in the world. The IPA convention in Kiel, August 19--21, 1989, presented an opportunity to assess the situation.
In principle, SAMPA provides for phonemic notation of languages. For example, the r-sounds of English rip, trip, and drip are all instances of the phoneme /r/, although they differ articulatorily and acoustically (in voicing and in presence/absence of friction). These different allophones are predictable from the phonetic context: we can unambiguously write them all as /r/. The arguments for preferring phonemic notation to allophonic are (i) it is simpler while still being unambiguous; (ii) correct identification of allophones may be difficult for those without phonetic training; and (iii) too few codes are available in the range 32--127 to provide for all allophones.
In syllable-initial position, English /t/ is alveolar and aspirated; French /t/, dental and unaspirated; Swedish /t/, dental and aspirated. We ignore these comparative differences in our notation, writing all as /t/. SAMPA does not need to adopt distinct symbols to reflect these differences. (However, if and when SAMPA is applied to Hindi, for example, where these differences are phonemic, it would become necessary to notate them explicitly.)
In continuous speech the actual sounds used in pronouncing a word may well differ from the word's citation form (dictionary entry). A phonotypical transcription is one in which citation forms are modified in accordance with known phonetic rules of connected speech. For example, in a phonotypical transcription of English, final linking /r/ would be shown before a following vowel (better ask) but not before a consonant (better go); the lexical entry would be invariant. In an actual utterance the speaker might or might not conform to phonotypical expectations; an impressionistic transcription reflects a human (or mechanical) auditory or acoustic analysis of what was actually said. In the case at issue, /r/ would be shown if phonetically present in a given instance, not otherwise.
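The linking-/r/ rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mini-lexicon, the boolean "potential linking r" flag and the function name `phonotypical` are all hypothetical, and the vowel test simply checks the first character of the next word against the English SAMPA vowel symbols.

```python
# Hypothetical mini-lexicon: invariant citation forms in SAMPA, plus a flag
# marking words that end in a potential linking /r/. Illustrative only.
LEXICON = {
    "better": ("bet@", True),
    "ask": ("A:sk", False),
    "go": ("g@U", False),
}

# First characters of the English SAMPA vowel symbols.
VOWEL_INITIALS = set("Ie{QVU@iu3AOa")


def phonotypical(words):
    """Derive a phonotypical string by applying the linking-/r/ rule."""
    forms = [LEXICON[w] for w in words]
    out = []
    for i, (citation, links) in enumerate(forms):
        # Citation form of the following word, or "" at the end of the phrase.
        next_form = forms[i + 1][0] if i + 1 < len(forms) else ""
        if links and next_form[:1] in VOWEL_INITIALS:
            citation += "r"  # /r/ surfaces only before a vowel
        out.append(citation)
    return " ".join(out)
```

With this lexicon, `phonotypical(["better", "ask"])` gives `bet@r A:sk` and `phonotypical(["better", "go"])` gives `bet@ g@U`, while the lexical entry for better stays the same in both cases.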
In practice, colleagues working on the various languages to which SAMPA has been applied have chosen to deviate in various respects from these principles. English has plosive /d/ and fricative /ð/ (SAMPA /D/) as distinct phonemes (den, then). In Spanish, they are undoubtedly allophones of the same phoneme, and could unambiguously both be written /d/; but for speech technology work our Spanish colleagues prefer to notate them distinctively, as ``d'' and ``D'' respectively. The r-sounds in French rouge, lettre are different from all the English r-sounds, being respectively a voiced and voiceless uvular fricative. It would seem unambiguous and logical to write them, too, as /r/. But our French colleagues have preferred to use the distinct uvular-r symbol, also provided in SAMPA, namely /R/.
Nevertheless I believe we should as far as possible discourage allophonic and comparative notation. Bulgarian has a simple 6-vowel system, IPA /i e a o u ə/. A colleague in Bulgaria has proposed that they be represented in SAMPA as /I, E, a, O, U, @/. About /a/ and /@/ (= IPA /ə/) we can agree. But the other symbols he proposes are inappropriately comparative. The Bulgarian vowels should appear in SAMPA as /i, e, a, o, u, @/.
In the (American) English extended ASCII character set used by PCs running MS-DOS, the range 128--255 is used to provide for the screen and printer a number of accented alphabetic letters, currency symbols, graphic symbols, and Greek and mathematical symbols. Those that are not available on the keyboard can be accessed by entering their ASCII number on the keypad while pressing the Alt key. Unfortunately, from the point of view of non-English-speaking Europeans, this extended ASCII fails to provide all the accented Latin letters needed for such languages as Portuguese, Icelandic, Czech, Polish and Hungarian. To remedy this shortcoming, a number of different ``code pages'' are now available, each providing a different set of characters in the 128--255 range. In the USA and the UK most PCs use code page 437 (International English), in Western Europe 850 (Multilingual Latin I), and in much of Eastern Europe 852 (Slavic Latin II).
Applications running under the popular front-end Windows use yet another character set, one known as ``enhanced ANSI''. This is identical with the ASCII set for 33--127; for 128--255 it offers its own specific choice of accented alphabetic and other characters, with codes different from ASCII.
The consequence is that in PC-compatible computing the code numbers in the range 128--255 (the ``extended'' characters) may currently have several different interpretations. Conversely, a given character may be coded in several different ways.
Consider the IPA symbols /æ/ and /ð/, both needed for the phonetic transcription of English. For reasons that seemed valid at the time (cf. Wells (1987: 95)), SAMPA assigned the former the code 123, which now appears on all Latin-alphabet PC screens as ``{''; the latter was coded 68, ``D''.
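As a small illustration of these two assignments, the following Python sketch maps the SAMPA codes back to IPA and checks that a string stays inside the 32--127 range. The names `SAMPA_TO_IPA`, `is_sampa_safe` and `to_ipa` are hypothetical, and the table covers only the two symbols discussed here, not the full SAMPA inventory.

```python
# SAMPA codes quoted in the text: 123 ("{") for IPA ae-ligature,
# 68 ("D") for IPA eth. Only these two symbols are covered.
SAMPA_TO_IPA = {chr(123): "\u00e6", chr(68): "\u00f0"}


def is_sampa_safe(text):
    """True if every character lies in the code range 32..127."""
    return all(32 <= ord(ch) <= 127 for ch in text)


def to_ipa(sampa):
    """Replace the two symbols above; all other characters pass through."""
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in sampa)
```

For example, the SAMPA string `D{t` (that) passes `is_sampa_safe`, whereas its IPA rendering from `to_ipa` does not, which is the point of coding æ and ð as ``{'' and ``D''.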
Both ``æ'' and ``ð'' are now available on-screen for PCs running Windows. While ``æ'' is an ASCII character, with the extended code 145 (for those using code page 437 or 850), ``ð'' is not. But both are in the enhanced ANSI set, with codes 230 and 240 respectively. (Hence under Windows they can be accessed, if not on the keyboard, by keying Alt+0230 and Alt+0240; ``æ'' can also be accessed as Alt+145.)
However, a PC using code page 852 (Slavic) will display code 145 as an upper-case Polish L-with-acute-accent (Ĺ), 230 as ``Š'', and 240 as a dieresis (¨). With code page 860 (Portugal), 145 is ``À'', 230 ``µ'' and 240 ``≡''.
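The ambiguity of codes in the 128--255 range can be explored with Python's built-in codecs. This is a sketch only: the function name `decode_under` is hypothetical, and treating the `cp1252` codec as a stand-in for the ``enhanced ANSI'' set is an assumption, not something stated in the text. The tables shipped with Python may also differ in detail from what a given PC of the period displayed.

```python
def decode_under(code, code_pages=("cp437", "cp850", "cp852", "cp860", "cp1252")):
    """Return {code page: character} for a single byte value in 0..255.

    cp1252 is used here as an assumed approximation of "enhanced ANSI".
    Byte values a code page leaves undefined come back as U+FFFD.
    """
    raw = bytes([code])
    return {cp: raw.decode(cp, errors="replace") for cp in code_pages}
```

Calling `decode_under(145)`, `decode_under(230)` and `decode_under(240)` returns one character per code page, so the differing interpretations of a single code can be compared side by side.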
Recently a number of phonetic fonts have become available for use under Windows. These comprise only phonetic symbols (perhaps with a few punctuation signs). Unfortunately they disagree extensively on key assignment and coding. On my PC I now have three TrueType phonetic fonts provided by the Summer Institute of Linguistics and four others of whose origins, I regret to say, I am uncertain. These fonts agree with SAMPA (but not ANSI) in assigning ``ð'' to code 68/D; but for ``æ'' they assign codes and keystrokes 81/Q (SIL Doulos/Manuscript/Sophia IPA), 60/< (Times IPA New), 64/@ (Tech Phonetic), and 233 (IPA Roman 1, IPA Plus).
A number of other EC languages have been examined in the light of the SAMPA recommendations, and a short summary of the possible solutions for their special features is given here. For more details, see J. Wells, ``Computer-coded phonetic transcription'', Journal of the International Phonetic Association 17, No. 2, pp. 94--114, and the SAM Definition Phase Final Report (ESPRIT project 1541), January 1988.
Most of the minority languages of Europe such as Basque, Breton, Catalan, and Frisian can be transcribed adequately at a phonemic level without the need to change the principles of the present recommendation. Irish and Scottish Gaelic, however, require a decision for coding the palatalised (or ``slender'') consonants and the ``double'' nasals and laterals. Scottish Gaelic also has a back unrounded vowel series which does not occur in other EC languages. Welsh requires a solution for the voiceless alveolar lateral, represented in the orthography as ``<ll>''.
We should like now to explore whether it would be suitable to extend SAMPA for application to other languages, including Chinese, and if so how.
The question of Chinese has arisen because of the prospect of a wider collaboration on speech research between University College London and the Chinese Academy of Sciences.
Chinese already has what appears to be a satisfactory machine-readable phonetic notation in the form of Pinyin, the romanisation that has for some years been standard in the People's Republic (though not in Taiwan). Pinyin is an ingenious quasi-phonemic notation. It includes a number of unconventional digraphs, together with unconventional uses of individual Latin letters. Thus sh, ch, and zh represent retroflex/postalveolar consonants of a type that would normally be written in SAMPA as [S, tS, dZ]. Pinyin x, q, j represent a corresponding series of alveolopalatal consonants, IPA [ɕ, tɕ, dʑ], for which SAMPA does not currently cater. Pinyin c represents [ts], y [j], and ng [ŋ]. The close front rounded vowel [y] is written u where there would be no confusion with [u], but ü where this confusion might arise. (This last Pinyin character is not actually machine-readable in our sense.)
Continuing to use Pinyin for Chinese but SAMPA for other languages would mean that characters such as ``x, j'' would have different meanings in different languages (``x'' = alveolopalatal fricative, or velar fricative; ``j'' = alveolopalatal affricate, or palatal approximant). But this is perhaps no worse than the ``comparative'' differences already present in the interpretation of some symbols (see above). The Pinyin notation ``i'' already covers a remarkable range of allophonic possibilities (including an r-coloured back vowel in shi and a slightly fricative central vowel in si). Are Chinese speech technologists happy with this degree of phonemic abstraction?
Tone is shown in Pinyin (if indeed it is shown) by superscript accent marks, thus mā, má, mǎ, mà. These are not machine-readable in the SAMPA sense. The corresponding SAMPA tone-marks would be /''ma, 'ma, ` 'ma, ` ma/. However these SAMPA signs have not proved popular, and perhaps ought to be changed. For Chinese, we could perhaps consider instead the use of numerals, thus ``ma1, ma2, ma3, ma4''.
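The numeral proposal can be sketched as a mechanical conversion from accented Pinyin. The Python below is an illustration, not an agreed SAMPA convention: the function name `tone_to_numeral` is hypothetical, and it assumes one syllable per call with the tone written as a standard Unicode accent (macron, acute, caron, grave).

```python
import unicodedata

# Combining marks for the four tones as written in Pinyin.
TONE_MARKS = {
    "\u0304": "1",  # macron
    "\u0301": "2",  # acute
    "\u030c": "3",  # caron
    "\u0300": "4",  # grave
}


def tone_to_numeral(syllable):
    """Convert one accented Pinyin syllable to the numeral style."""
    # NFD splits each accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = ""
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]  # remember the tone, drop the mark
        else:
            kept.append(ch)
    # NFC recomposes anything that is not a tone mark (e.g. the dieresis).
    return unicodedata.normalize("NFC", "".join(kept)) + tone
```

A syllable without a tone mark is returned unchanged. Note that ü survives the conversion as a non-ASCII letter, so a separate decision would still be needed to make that character machine-readable.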
The following table presents the system agreed among the representatives of eight European countries engaged in European collaborative speech technology assessment research (SAM). It is currently being tested in the transcription and labelling of European multi-lingual databases.
SAMPA computer readable phoneme alphabet for European languages, with ASCII and IPA definitions (1990):
Table 1: Consonants
Table 2: Boundary and prosodic features
Table 3: Vowels
Table 4: Two character symbols
Table 5: Currently under discussion
Table 6: Currently used in French work
This section provides a brief outline of the phonemic distinctions in the languages of the eight countries engaged in the initial phase of the SAM project by providing example words for the use of each phonemic symbol. Information is also provided for Spanish, Portuguese and Greek, which were considered additionally.
Consonants
The plosives are /p, b, t, d, k, g/:
The fricatives are /f, s/:
The approximants are /v, D, j, h/:
The nasals are /m, n, N/:
The liquids are /l, R/:
Stød is symbolised by ``?'' and may be found in syllables containing a long stressed vowel, or a short stressed vowel, or a short stressed vowel followed by a voiced consonant, e.g. pæu --- /pE:?u/, peu --- /pEu?/
Vowels
The vowel system chosen for broad phonetic transcription is: /i, e, E, a, A, y, 2, 9, u, o, O, @/, with all vowels except @ occurring with a length distinction: /i:, e:, E:, a:, A:, y:, 2:, 9:, u:, o:, O:/.
The unrounded front vowels are exemplified in the following:
The central vowels are:
The rounded front vowels are:
The back vowels are:
Diphthongs.
The falling diphthongs may be most economically analysed phonemically as vowel plus /j/, /v/, or /r/, but for the broad phonetic representation within SAMPA they are analysed as vowel plus /i/, /u/ or /Q/, for example:
Consonants
The plosives are /p, b, t, d, k/, (/g/):
The fricatives are /f, v, s, z, x, h/, (/G/):
The sonorants (nasals, liquids and glides) are /m, n, N, l, r, w, j/:
Vowels
The Dutch vowels fall into two classes, ``checked'' (not occurring in a stressed syllable without a following consonant) and ``free''.
The checked vowels are /I, E, A, O, Y, @/:
The free vowels comprise four monophthongs /i, y, u, a:/, three ``potential diphthongs'' /e:, 2:, o:/, and three ``essential diphthongs'', /Ei, 9y, Au/, exemplified as follows:
There are also six vowel sequences which are sometimes described as diphthongs:
Several marginal vowel phonemes are only found in loanwords:
Consonants
There are six plosives /p, b, t, d, k, g/:
There are two phonemic affricates /tS/ and /dZ/:
There are nine fricatives /f, v, T, D, s, z, S, Z, h/:
The sonorants are three nasals /m, n, N/, two liquids /r, l/ and two sonorant glides /w, j/:
Vowels
The English vowels fall into two classes, traditionally known as ``short'' and ``long'' but better described as ``checked'' (not occurring in a stressed syllable without a following consonant) and ``free''.
The checked vowels are /I, e, {, Q, V, U/:
There is a short central vowel, normally unstressed:
The free vowels comprise monophthongs and diphthongs, although no hard and fast line can be drawn between these categories. They can be placed in three groups according to their final quality: /i:, eI, aI, OI/, /u:, @U, aU/, /3:, A:, O:, I@, e@, U@/. They are exemplified as follows:
The vowels /i:/ and /u:/ in unstressed syllables vary in their pronunciation between a close [i] and a more open [ɪ] (close [u] --- more open [ʊ]). Therefore, it is suggested that /i/ and /u/ be used as indeterminacy symbols.
Consonants
There are six plosives /p, b, t, d, k, g/:
There are seven fricatives /f, v, s, z, S, Z, j/. /j/ can be realised as a fricative or an approximant.
There are four nasals /m, n, J, N/, the last of which is only found in loanwords:
There are two liquids /l, R/ and two vowel glides /w, H/ (besides /j/):
Vowels
The vowel system comprises 12 oral vowels /i, e, E, a, A, O, o, u, y, 2, 9, @/, and 4 nasal vowels /e~, a~, o~, 9~/, exemplified as follows:
When they are functional, the load of the oppositions /a/--/A/, /e~/--/9~/, /e/--/E/, /o/--/O/, /2/--/9/ may be very low for certain speakers, and there is a tendency towards neutralisation. When they are not functional there is a strong tendency in unstressed syllables towards indetermination. ``Indeterminacy'' symbols have been agreed to cover occurrences of these phonemes or sounds:
Consonants
There are six plosives /p, b, t, d, k, g/:
There are four phonemic affricates: /pf, ts, tS/, and /dZ/, the last of which occurs only in a few loanwords:
There are ten fricatives /f, v, s, z, S, Z, C, j, x, h/. /j/ is often realised as a vowel glide.
The sonorants are three nasals /m, n, N/, and two liquids /l, r/:
Orthographic <r> is realised phonetically in a number of different ways:
Vowels
The vowels fall into three groups, ``checked'' (short), ``free'' (long), and two short vowels that only occur in unstressed position. The checked vowels are /I, E, a, O, U, Y, 9/:
There are 8 pure free vowels /i:, e:, E:, a:, o:, u:, y:, 2:/ and three free diphthongs /aI, aU, OY/:
The unstressed ``schwa'' vowel is:
The vowel realisation of <r>, represented as 6, fuses with schwa, but it also follows stressed vowels, resulting in additional centring diphthongs:
Consonants:
plosives
affricates
fricatives
nasals
liquids
semivowel
(palatals)
connected speech phenomena
Consonants
There are six single and six geminate plosives /p, b, t, d, k, g/, /pp, bb, tt, dd, kk, gg/ as follows:
There are four single and four geminate affricates /ts, dz, tS, dZ/, /tts, ddz, ttS, ddZ/:
There are five single and four geminate fricatives /f, v, s, z, S/, /ff, vv, ss, SS/:
There are three single and geminate nasals /m, n, J/, /mm, nn, JJ/, three single and three geminate liquids /r, l, L/, /rr, ll, LL/ and two semi-vowels /j, w/:
Vowels
The vowel system comprises seven vowels /i, e, E, a, O, o, u/:
There are six plosives:
There are six fricatives:
There are five sonorant consonants (nasals, liquids, trills):
Vowels
There are 9 long vowels:
and nine short vowels:
There are seven diphthongs:
In addition there are important allophonic variants for which the transcription has been agreed:
In cases where the dental consonants do not change into retroflexes, they are transcribed using the separator sign (ASCII 45), e.g. /r-t/, /r-d/:
Consonants:
plosives
fricatives
nasals
liquids
Vowels and diphthongs
Consonants:
plosives
affricates
fricatives
nasals
liquids
semivowels
Vowels:
Consonants
There are six plosives:
There are six fricatives:
There are six sonorant consonants (nasals, liquids and semi-vowels):
Vowels
There are nine long and nine short vowels.
Long vowels (followed by a short consonant):
Short vowels (followed by a long consonant):
There are also two pre-r-allophones (long and short) of /E/ and /2/ (see below).
The following important allophonic variants occur in Swedish which require separate symbolic representation:
The present SAMPA system, which was provisionally agreed at the end of the Extension Phase, is defined as a system for phonemic transcription and annotation. This means that the symbols are used according to the analysis of distinctive sound oppositions within each language. Thus, although their relation to sound category symbols of the International Phonetic Alphabet (IPA) is given, they are symbols of intra-language convention, and do not have an exact language-independent phonetic (auditory or acoustic) equivalence, nor do they represent a single sound within a language.
For example, the symbol /t/, used in the transcription of all 8 partner languages, could represent an unaspirated sound in French or Italian, a strongly aspirated sound in German or English, and an affricated sound in Danish. In English the /t/ can also stand for an unaspirated sound (following /s/) or the more usual aspirated sound. Vowel symbols often represent widely diverging sounds from one language to another; /{/ in Danish is very different from /{/ in English, for example.
This basically phonemic, or sound-system-orientated (systematic) function of SAMPA means that a general extension of the SAMPA coding system to allow fine phonetic differentiation of speech sounds is not possible. There are, however, examples in the SAMPA list of symbols which can be used to represent non-distinctive differences within a language, e.g. `r' and `R' for regionally dependent free variants, and some important allophonic variants are allowed for (e.g. in Swedish and Norwegian). Also, auditory transcription (French ``notation'') is meant to be a ``broad phonetic'' representation of the actual utterance, including elisions and assimilations (insofar as these can be represented with the phonemically orientated SAMPA inventory) rather than the strictly phonemic string of the citation form.
One area in which an extension of SAMPA is possible, indeed probable, is prosody. Certain ``Boundary and Prosodic Features'' have been agreed preliminarily, but their use has only been illustrated in the English EUROM.0 transcriptions. The considerations of prosodic description in a multi-lingual context may well reveal the need to modify and extend SAMPA. The work on prosodic description may also conclude that a separate prosodic annotation tier is necessary.
For finer segmental annotation of speech recordings, three basically different approaches are offered for discussion. All three approaches require a separate annotation tier, but the labels are temporally defined by the location of the phonemic segment boundaries (phonemic markers in the case of centre labelling).
Advantages: No new segmentation or marker placements would be required.
Disadvantages:
Advantages: The acoustic realisation of each phonemic segment is defined in greater detail than is possible even in narrow phonetic transcription, where, for example, a partially voiced closure cannot be easily represented.
Disadvantages:
Note: It must be pointed out that the two-symbol representation given above is redundant, in that the acoustic-event categories are common to phoneme classes rather than individual phonemes; i.e. pc, tc, and kc would all be a period of voiceless closure and therefore not require the place specification. Also, if the phonemic category is specified in a different tier of annotation, it is recoverable, and may be used for a database search, e.g. with a view to developing a set of rules covering the possible ``internal'' structures of stretches of signal associated with a particular phoneme. At present, some partners need to retain the ``phonemic pointer'' in order to derive the phonemic label file from the lower level acoustic-event file.
Advantages: The theoretically doubtful ``changeover point'' from one ``phoneme'' segment to another is avoided, and areas of indeterminacy are identified.
Disadvantages: New markers have to be set.
Each of these approaches would provide an annotation which is closer to the (acoustic-) phonetic realisation of the utterance than the phonemic SAMPA labels. For the development of speech knowledge in general, and for the definition of rules describing the structure of continuous speech in particular, the use of a more detailed annotation is essential. It is the symbolic bridge between measurable acoustic parameters and abstract phonological categories. Which approach is selected for more detailed annotation within the SAM project depends on the use to which it will be put. Essentially, the closer a symbolic representation comes to significant acoustic events (whereby ``significant'' is an application-dependent term), the more useful it will be in speech-knowledge acquisition and rule development. Both synthesis and recognition assessment can only gain.
References
SAM 1988. ESPRIT Project 1541: Definition Phase Final Report ``Multilingual Speech Input/Output Assessment Methodology and Standardisation''. London: University College London.
SAM 1989. ESPRIT Project 1541: Extension Phase Final Report. London: University College London. VI.2: First appraisal of SAMPA.
Wells, J.C., 1987. ``Computer-coded phonetic transcription'', Journal of the International Phonetic Association 17:2, pp. 94--114.