
SAM label file formats

Introduction

To facilitate multi-laboratory exploitation of speech database material, SAM has defined a standard, transparent specification of all files used in the acquisition and post-processing of speech recordings for assessment and research purposes. In addition, the standard SAM recording software package, EUROPEC, developed at ICP, Grenoble, uses a set of SAM-standard speaker and prompt-file structures and produces speech and orthographic label files in the standard format. This section details the file formats and defines the function of each file in the different stages of acquisition and processing.

Speech file and associated description file formats

It is now agreed as a standard for SAM speech databases that a speech file contains only speech waveforms, and that an associated description file is generated at the recording session. The two files are matched: their names are identical except for the last letter of the extension.

For example, if the speaker AA records the corpus number BB (list of six sentences in English), and the current available file number in the recording lab is nnnn, the files produced will be:
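
A minimal sketch of this naming scheme, inferred from the example names that appear later in this section (DFS20014.SES with DFS20014.SEO, and ADS10010.SFS with ADS10010.SFO) rather than from a stated rule. The function name, the leading "S" of the extension and the one-letter language code are assumptions.

```python
def sam_file_names(speaker, corpus, number, language="E"):
    """Return (speech_file, description_file) names for one recording.

    Assumed pattern: speaker code + corpus code + 4-digit file number,
    then an extension "S" + language letter + type letter, where the type
    letter is "S" for the speech file and "O" for the description file.
    """
    if len(speaker) != 2 or len(corpus) != 2:
        raise ValueError("speaker and corpus codes are two characters each")
    if not 0 <= number <= 9999:
        raise ValueError("file number must fit in four digits")
    stem = f"{speaker}{corpus}{number:04d}".upper()
    prefix = f"S{language.upper()}"
    # Only the last letter of the extension differs between the two files.
    return f"{stem}.{prefix}S", f"{stem}.{prefix}O"


print(sam_file_names("DF", "S2", 14))        # English example
print(sam_file_names("AD", "S1", 10, "F"))   # French example
```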

The associated description file has the standard label-file format, with a header and a body (see "Label-file header format" and "Label-file body format" below). It contains all the information usually needed by people working on the files without a database management system.

Label file format

A label file consists of a header and one or more label bodies. The header consists of the header keyword "LHD:" and a number of lines providing information about the labelling and the speech file to which it applies. The number of lines in the header is not fixed: the header continues until the label-body keyword "LBD:" appears.

The label body specifies the type and location within the speech-signal file of each segment that has been labelled. It continues until either another label-body keyword "LBD:" or the end-of-label-file keyword "ELF:" appears.
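
The header/body split described above can be sketched as a small reader. This is an illustration only: the function name and the returned data layout are assumptions, and it presumes every line has the form "KEY: value".

```python
def parse_label_file(text):
    """Split label-file text into (header, bodies).

    header is a list of (key, value) pairs (keys such as CMT may repeat);
    bodies is a list with one list of (key, value) pairs per LBD: keyword.
    """
    header, bodies, current = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split at the first colon only, so values like 15:10:33 survive.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line without a keyword: {line!r}")
        key, value = key.strip(), value.strip()
        if key == "ELF":            # end of label file
            break
        if key == "LBD":            # a new label body starts here
            current = []
            bodies.append(current)
        elif current is None:       # no LBD: seen yet, so still in the header
            header.append((key, value))
        else:
            current.append((key, value))
    return header, bodies


SAMPLE = """LHD: V4.0
FIL: label
RET: 15:10:33
LBD: -
LBR: 0, 55551, 0, -5128, 4775, Decimal numbers are an aid in
EXT: adding up.
ELF: -
"""
header, bodies = parse_label_file(SAMPLE)
print(header)
print(bodies)
```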

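Basic Label File Structure:

LHD: header keyword and version
...  header lines (number not fixed)
LBD: label-body keyword
...  label lines
LBD: optional further label bodies
...
ELF: end-of-label-file keyword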

Label-file header format

 

LHD: header keyword + version (version V4.0, March 1991)
FIL: file type 
TYP: specific file type
     (e.g. prompt orthographic, spoken orthographic, phonemic, prosodic etc.)
DBN: database name 
VOL: database volume ID 
DIR: directory (for the source file) 
SRC: source file name 
CMT: comment 
TXF: name of the textfile
     (this is supposed to contain what was intended to be said, i.e. prompt
     file.txt, or nothing: for a prompt or free-speech label file it could
     be left blank, but it could also be a file containing instructions)
CMT: comment 
SAM: sampling rate 
BEG: labelled sequence start position
END: labelled sequence end position 
RED: recording date 
RET: recording time 
REP: recording place 
SNB: number of (8-bit) bytes per sample 
SBF: sample byte order 
SSB: number of significant bits per sample 
RCC: recording conditions code (define a set of values, micro type, position..)
NCH: number of channels 
SPI: speaker information: sex, age, native language 
PCF: protocol file name (recording protocol used) 
PCN: protocol number  
CMT: comment 
EXP: labelling expert 
SYS: labelling system
DAT: date of completion of labelling 
SPA: SAMPA version - note 2...and 3.... refer to different levels
CMT: comment

Note that the SAM labelling tool PTS requires every field to have an entry; otherwise it will not open the label file. Enter "-" when the information is not available.
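
As an illustration of the "-" convention, the sketch below fills every empty or missing header field with "-" before a file is written. The field list is copied from the header format above (leaving out LHD and the repeatable CMT); the function name is hypothetical, and which fields PTS actually checks is an assumption.

```python
# Header mnemonics from the format above, without LHD and the repeatable CMT.
HEADER_FIELDS = [
    "FIL", "TYP", "DBN", "VOL", "DIR", "SRC", "TXF", "SAM", "BEG", "END",
    "RED", "RET", "REP", "SNB", "SBF", "SSB", "RCC", "NCH", "SPI", "PCF",
    "PCN", "EXP", "SYS", "DAT", "SPA",
]


def fill_missing(header):
    """Return a copy of `header` in which every listed field has an entry.

    Fields that are absent or blank receive "-", the placeholder used when
    the information is not available.
    """
    filled = {}
    for key in HEADER_FIELDS:
        value = str(header.get(key, "")).strip()
        filled[key] = value if value else "-"
    return filled


partial = {"FIL": "label", "SAM": 16000, "EXP": "  "}
for key, value in fill_missing(partial).items():
    print(f"{key}: {value}")
```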

Label-file body format

 

Between the label-body keyword (LBD:) and the end-of-label-file keyword (ELF:) four categories of mnemonics can occur:

Label types

 

LBR:
This label type occurs in a special kind of label file which is created automatically during recording with EUROPEC (see 3 below). It contains sequence beginning (in samples), sequence end, input gain of recording, minimum sample value, maximum sample value, and orthographic text. The text represents the prompt, not necessarily what the speaker actually uttered. In the case of two-channel recordings, each LBR label can be followed by an LB2: sequence (second-channel speech file) or an LBL: sequence (laryngograph on the second channel) that contains the same information: beginning value (in samples), end value, input gain of recording, minimum sample value, maximum sample value. A mnemonic indicating other sensors (e.g. LBN (nasal), LBF (airflow), LBT (tongue contact) etc.) can be defined if required. The purpose of this label file is to call up individual sections of larger files. The LBR labels represent "items" which can be specified when a signal file is opened.

LBO:
Orthographic labels produced manually. These labels attempt to represent what the speaker actually produced, with indications of pauses, hesitations, repetitions etc. Often, an EXT: line and a CMT: line are needed.

LBB:
Broad Phonetic labels, produced manually or by means of (semi-) automatic label alignment.

LBA:
Acoustic-phonetic labels (sub-divisions of broad phonetic segments) produced manually or by means of semi-automatic labelling.

LBP:
Prosodic labels, produced manually or by means of (semi-) automatic labelling.
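
The LBR field order given above (beginning, end, input gain, minimum, maximum, text) can be read as sketched below. The names Item, parse_item and collect_items are hypothetical, and the sketch only attaches EXT: continuation lines to LBR labels; other label types are ignored.

```python
from dataclasses import dataclass


@dataclass
class Item:
    start: int       # sequence beginning, in samples
    end: int         # sequence end, in samples
    gain: int        # input gain
    min_level: int   # minimum sample value
    max_level: int   # maximum sample value
    text: str        # prompt text (empty for LB2:/LBL: values)


def parse_item(value):
    """Parse the value part of an LBR:, LB2: or LBL: line."""
    # At most five splits, so commas inside the prompt text are preserved.
    parts = value.split(",", 5)
    if len(parts) < 5:
        raise ValueError(f"expected at least five fields: {value!r}")
    numbers = [int(p) for p in parts[:5]]
    text = parts[5].strip() if len(parts) > 5 else ""
    return Item(*numbers, text)


def collect_items(body):
    """Build Items from the (key, value) pairs of one label body."""
    items = []
    for key, value in body:
        if key == "LBR":
            items.append(parse_item(value))
        elif key == "EXT" and items:
            # EXT: continues the text of the preceding LBR label.
            items[-1].text = f"{items[-1].text} {value.strip()}".strip()
    return items


body = [
    ("LBR", "0, 55551, 0, -5128, 4775, Decimal numbers are an aid in"),
    ("EXT", "adding up."),
    ("DSC", "-"),
]
print(collect_items(body))
```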

Label File example

 

a) This is file DFS20014.SEO, created automatically during a single-channel recording of file DFS20014.SES.

LHD: V4.0
FIL: label 
TYP: prompt orthographic (comment needed that mixtures are possible) 
DBN: -
VOL: EUROM.1 
DIR: ENGLISH 
SRC: DFS20014.SES 
TXF: S2.TXT 
CMT: Information about the recording session 
SAM: 16000 
BEG: 0 
END: 431872
RED: 07/11/89 
RET: 15:10:33 
REP: I.C.P. Grenoble (FR) 
SNB: 2 
SBF: 01 
SSB: 16 
RCC: 1 
NCH: 1 
SPI: M, 39, French 
PCF: SENTEN.DES 
PCN: 1 
CMT: Information about the labelling session 
EXP: -
SYS: -
DAT: -
SPA: -
CMT: Item: label start, end, input gain, min level, max level, string
LBD: -
LBR: 0, 55551, 0, -5128, 4775, Decimal numbers are an aid in
EXT: adding up. 
DSC: -
LBR: 55552, 158975, 0, -7680,  8878, Monetary systems have
EXT: evolved to make use of this base ten notation.
DSC: -
LBR: 158976, 223743, 0, -7123,  7562, France became the first 
EXT: decimal country in Europe. 
DSC: -
LBR: 223744, 275199, 6, -12487, 13262, Germany's decision 
EXT: followed eight years later. 
DSC: -
LBR: 275200, 361983, 6, -11965, 12451, Scandinavian States and
EXT: Russia changed in eighteen seventy-five. 
DSC: -
LBR: 361984, 431872, 6, -12902, 14320, Britain chose to have 
EXT: decimal money only in nineteen seventy-one ! 
ELF: -
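
As a worked example on the listing above, the sample positions can be converted to durations using the SAM: value of 16000 Hz. The sketch assumes that the end position is inclusive, which is suggested (not stated) by each item starting one sample after the previous one ends; the function names are illustrative.

```python
SAMPLE_RATE = 16000  # SAM: field of example (a), in Hz

# (start, end) sample positions of the six LBR items in example (a)
ITEMS = [(0, 55551), (55552, 158975), (158976, 223743),
         (223744, 275199), (275200, 361983), (361984, 431872)]


def duration_seconds(start, end, rate):
    """Length of one item in seconds, assuming `end` is inclusive."""
    return (end - start + 1) / rate


def is_contiguous(items):
    """True if every item starts one sample after the previous one ends."""
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(items, items[1:]))


for start, end in ITEMS:
    seconds = duration_seconds(start, end, SAMPLE_RATE)
    print(f"{start:>6}-{end:>6}: {seconds:.3f} s")
print("contiguous:", is_contiguous(ITEMS))
```

Under these assumptions the first item lasts 55552 / 16000 = 3.472 s.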

b) This is file ADS10010.SFO, the orthographic label file created automatically during the two-channel recording of ADS10010.SFS.

LHD: V4.0 
FIL: label 
TYP: orthographic 
DBN: EUROM_1
VOL: -
DIR: -
SRC: ADS10010.SFS 
TXF: S1.TXT 
CMT: Information about the recording session
SAM: 20000 
BEG: 0 
END: 193023 
RED: 11/Apr/90 
RET: 16:47:46 
REP: ICP 
SNB: 2 
SBF: 01 
SSB: 16 
RCC: 2 
NCH: 2 
SPI: M, 49, French 
PCF: PEQPHRAS.DES 
CMT: Information about the labelling session 
EXP: -
SYS: -
DAT: -
SPA: -
CMT: Item: label start, end, input gain, min level, max level, string
LBD: -
LBR: 0, 17663, 0, -12382, 14455, Maman a préparé une galette
EXT: pour jeudi ?
LB2: 0, 17663, 0, -12384, 14439
DSC: -
LBR: 17664, 41215, 0, -6954, 9023, Ces élèves prendront
EXT: l'autocar tout à l'heure !
LB2: 17664, 41215, 0, -6958, 9024
DSC: -
LBR: 41216, 86527, 0, -11750, 15336, Parfois, mon épicière
EXT: vend à crédit.
LB2: 41216, 86527, 0, -11756, 15348 
DSC: -
LBR: 86528, 137983, 0, -11809, 11572, Personne n'a applaudi ce 
EXT: beau discours ?
LB2: 86528, 137983, 0, -11816, 11560 
DSC: -
LBR: 137984, 193023, 0, -15097, 18881, Je me demande pourquoi 
EXT: on court sans cesse. 
LB2: 137984, 193023, 0, -15102, 18880 
ELF: -

Files used in a recording session using EUROPEC

The standard SAM recording software, EUROPEC, is geared to producing signal and label files of standard format. In turn, the software requires input data formatted according to the following specifications.

Corpus File: CORPUS.DBF

The Corpus File lists all the prompt files used in a particular set of recordings. This file is scanned whenever a file is specified for a recording "take". The prompt files, which have a 2-character name and the extension .txt, are arranged in an alphabetical catalogue A-Z, each letter containing up to 36 prompt files (e.g. A0-A9 plus AA-AZ). Data entries for the prompt files are of two types:

  1. those required for the prompting process, and
  2. those required for the file header.

Order of data entries:

CMT:Optional entry providing information about the prompt file, typically the prompt file name and a short description.
CCD:Corpus code (2 letters, e.g. P2)
DBN:Database name (20 characters max)
CNM:Corpus name (40 characters max)
CTY:Item type (one letter)
NBI:Number of items in the file (int)
LAN:Language (char)
PCF:Protocol Description File (XXXXXXXX.DES)
--------------------------------------------------------------
A 
- 
-------------------------------------------------- 
......... 
-------------------------------------------------- 
P 
- 
CMT: P1.TXT  Prompt text for passage 1
CCD: P1
DBN: EUROM_1
CNM: A reading passage (Esprit CD) 
CTY: P 
NBI: 1 
LAN: E 
PCF: PASSAGE.DES 
-------------------------------------------------- 
Q 
- 
........ 
-------------------------------------------------- 
S 
- 
CMT: S2.TXT  Prompt text for Block 2 of 5 sentences
CCD: S2
DBN: EUROM_1
CNM: Sentences (UCL) 
CTY: S 
NBI: 5 
LAN: E 
PCF: SENTEN.DES
-------------------------------------------------------- 
T 
- 
CMT: T3.TXT  Prompt text for Block 3 of 30 digit triples
CCD: T3
DBN: EUROM_1
CNM: list 3D Digits Triples 
CTY: N 
NBI: 30 
LAN: E 
PCF: DIGIT.DES 
---------------------------------------------------
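
A hedged sketch of how such a Corpus File could be read into a lookup table keyed by corpus code. It assumes that PCF: is always the last field of an entry (as in the listed order) and that the field legend at the top can be recognised because its CCD: value is not a 2-character code; the function name is hypothetical.

```python
import re

# A tagged line: three capital letters, a colon, then the value.
TAG = re.compile(r"^([A-Z]{3}):\s*(.*)$")


def parse_corpus_entries(text):
    """Return {corpus_code: {field: value}} for CORPUS.DBF-style text."""
    entries, current = {}, {}
    for raw in text.splitlines():
        match = TAG.match(raw.strip())
        if not match:
            continue                  # separators, section letters, "-" lines
        key, value = match.group(1), match.group(2).strip()
        current[key] = value
        if key == "PCF":              # assumed to close every entry
            code = current.get("CCD", "")
            if len(code) == 2:        # skips the field legend at the top
                entries[code] = current
            current = {}
    return entries


SAMPLE = """CCD:Corpus code (2 letters, e.g. P2)
PCF:Protocol Description File (XXXXXXXX.DES)
--------------------------------------------------
S
-
CMT: S2.TXT  Prompt text for Block 2 of 5 sentences
CCD: S2
DBN: EUROM_1
CNM: Sentences (UCL)
CTY: S
NBI: 5
LAN: E
PCF: SENTEN.DES
--------------------------------------------------
"""
print(parse_corpus_entries(SAMPLE))
```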

Speaker File: SPEAKERS.DBF

This file contains details about the speakers, which can be accessed via the speaker codes. To maximise the number of speakers, arbitrary code allocation AA-ZZ (26 x 26 = 676 codes) should be used rather than the speaker initials used in the examples. In many countries, the SNM and SBN lines will be left blank for data-protection purposes.
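
The size of the AA-ZZ code space can be checked by enumeration: two letters from A to Z give 26 x 26 = 676 codes. The helper names below are illustrative, and allocating codes in alphabetical order is only one possible policy.

```python
from itertools import product
from string import ascii_uppercase


def all_speaker_codes():
    """Every two-letter code from AA to ZZ, in alphabetical order."""
    return ["".join(pair) for pair in product(ascii_uppercase, repeat=2)]


def next_free_code(used):
    """First code not yet allocated, or None when all codes are taken."""
    taken = set(used)
    for code in all_speaker_codes():
        if code not in taken:
            return code
    return None


codes = all_speaker_codes()
print(len(codes), codes[0], codes[-1])
print(next_free_code(["AA", "AB", "AD"]))
```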

        Description of speaker specificities 
--------------------------------------------------------------
order : 
SCD:Speaker code  (2 char) 
SNM:Speaker name (75 char max) 
SBN:Speaker birthname (75 char max)
SEX:Sex (one letter)
DOB:Date of birth (year)
HET:Height (metres)
WET:Weight (kg) 
NLN:Native language (75 char max) 
ACC:Accent               " 
ETH:Ethnic group         " 
EDL:Education level      "
SMK:Smoking habit        " 
PTH:Pathology            " 
--------------------------------------------------------------
A 
- 
-------------------------------------------------- 
D 
-                                   
SCD: DJ
SNM: DURAND
SBN: Joseph
SEX: M
DOB: 1946
HET: 1,73
WET: 70
NLN: French
ACC: Ardèche
ETH: white
ETH: white                          
EDL: -                               
SMK: heavy smoker                     
PTH: -                               
----------------- 
SCD: DS
SNM: DUPUY
SBN: Simone
SEX: F
DOB: 1952
HET: 1,60
WET: 58
NLN: French
ACC: -
ETH: white
EDL: -    
SMK: -
PTH: -
...... 
-------------------------------------------------- 
E 
- 
...... 
- 
-------------------------------------------------- 
Z 
-
--------------------------------------------------
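
The speaker record feeds the SPI: line of the label-file header (sex, age, native language). The sketch below shows one plausible derivation; how EUROPEC actually computes the age is not documented here, so "recording year minus birth year" is an assumption, as are the function names. It also shows that the HET: values in the example use a decimal comma.

```python
def spi_field(speaker, recording_year):
    """Build an SPI: value ("sex, age, native language") from a record.

    The age is approximated as recording year minus year of birth.
    """
    age = recording_year - int(speaker["DOB"])
    return f'{speaker["SEX"]}, {age}, {speaker["NLN"]}'


def height_metres(speaker):
    """Convert a HET: value such as "1,73" (decimal comma) to a float."""
    return float(speaker["HET"].replace(",", "."))


# Speaker DJ from the example above
dj = {"SCD": "DJ", "SEX": "M", "DOB": "1946", "HET": "1,73", "NLN": "French"}
print("SPI:", spi_field(dj, 1990))
print("height:", height_metres(dj))
```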

Prompt file

The prompt file is given the name of the Corpus Code (CCD:) specified in the Corpus File. The name corresponds to its position in the Corpus File (X2 signifies that it is specified in 2nd position under letter X).

Examples of Prompt Text Files

S1.TXT (5 sentences) S1 is the corpus code

SPR: (start prompt file) 
TXT: I have a problem with my water softener. 
TXT: The water-level is too high and the overflow keeps dripping.
TXT: Could you arrange to send an engineer on Tuesday morning ?
TXT: It's the only day I can manage this week. 
TXT: I'd be grateful if you could confirm the arrangement now.
EPR: (end prompt file)

S2.TXT (5 sentences) S2 is the corpus code

SPR: 
TXT: Please put me through to the complaints department. 
TXT: The repair to the water main outside my house was 
EXT: unsuccessful, and my cellar's flooded. 
TXT: Your Water Services Department was singularly unsympathetic.
TXT: All their repair teams are apparently booked out for the 
EXT: next two weeks. 
TXT: Am I supposed to use the cellar as a swimming pool till 
EXT: then ?
EPR:

Note that if a sentence is more than one line long, it is continued on an extension line (field EXT).

P1.TXT (passage) P1 is the corpus code

SPR: 
TXT: Please put me through to the complaints department. The 
EXT: repair to the water main outside my house was unsuccessful, 
EXT: and my cellar's flooded. Your Water Services Department was
EXT: singularly unsympathetic.  All their repair teams are
EXT: apparently booked out for the next two weeks.  Am I supposed
EXT: to use the cellar as a swimming pool till then ?
EPR: -

Note that the whole passage will be displayed as one item (only one TXT field, followed by several EXT fields).

ST.TXT (5 sentences) ST is the corpus code
To be recorded in "continuous mode" (= all pauses between sentences are retained).

SPR: -
TXT: Please put me through to the complaints department. 
DLA: 3.0
TXT: The repair to the water main outside my house was 
EXT: unsuccessful, and my cellar's flooded. 
DLA: 5.0
TXT: Your Water Services Department was singularly unsympathetic.
DLA: 3.5
TXT: All their repair teams are apparently booked out for the 
EXT: next two weeks. 
DLA: 4.0
TXT: Am I supposed to use the cellar as a swimming pool till 
EXT: then ?
DLA: 3.8
EPR: -
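
The prompt-file examples above can be turned into a list of items as sketched below: each TXT: line starts an item, EXT: lines extend it, and a DLA: value is attached to the item it follows. The function name and dictionary keys are hypothetical, and the DLA: value is stored as a plain number because its exact meaning is not given here.

```python
def parse_prompt_file(text):
    """Return a list of {"text": str, "dla": float or None} items."""
    items = []
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue                      # not a "KEY: value" line
        key, value = key.strip(), value.strip()
        if key == "TXT":                  # a new prompt item
            items.append({"text": value, "dla": None})
        elif key == "EXT" and items:      # continuation of the current item
            items[-1]["text"] += " " + value
        elif key == "DLA" and items:      # value attached to the current item
            items[-1]["dla"] = float(value)
        elif key == "EPR":                # end of prompt file
            break
    return items


SAMPLE = """SPR: -
TXT: Please put me through to the complaints department.
DLA: 3.0
TXT: The repair to the water main outside my house was
EXT: unsuccessful, and my cellar's flooded.
DLA: 5.0
EPR: -
"""
for item in parse_prompt_file(SAMPLE):
    print(item)
```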

Protocol description file

In a recording session, the prompt presentation is driven by a specific file called the Protocol Description File (specified after the PCF: Corpus File entry). It specifies the various stages and messages of the prompt presentation .....

Codes used for presentation protocol (Protocol Description File)

NOTE: Using the "GET: 3" command will produce the following effect:

Example of Protocol Description File

This is SENTEN.DES (Protocol Description File for a corpus like S1 or S2 and given after PCF: in the Corpus File entry)

SPF: start of protocol file 
CLS: -
VON: inverse, center 
BEL: -
NWL: 2 
MSG: ENGLISH SENTENCES 
NWL: 2 
MSG: I.C.P. GRENOBLE 
VOF: inverse 
PAU: 2 
CLS: -
NWL: 2 
MSG: Let's practise a little : 
NWL: 2 
MSG: please read the following 
NWL: 2 
MSG: two sentences
PAU: 3 
CLS: -
TRN: -
CMT: DISPLAY BEGINNING - TRAINING PHASE 
RWD: -
GET:* 1 
SKP:* 2 
LOP: -2, 1 
CMT: DISPLAY END - TRAINING PHASE 
RWD: -
CLS: -
NWL: 2 
MSG: Now you're going to be recorded 
NWL: 2 
MSG: this session consists of five sentences 
NWL: 2 
MSG: read each of them
NWL: 2 
MSG: while they are being displayed
PAU: 4 
CLS: -
NWL: 8 
MSG: CAUTION ! SESSION BEGINNING 
VON: blink 
NWL: 1 
MSG: Hit Space or Click Left Button 
RET: -
VOF: blink 
CMT: DISPLAY BEGINNING 
REC: -
CLS: -
GET:* 1 
LOP:* -1, 4 
CLS: -
NWL: 2 
CMT: END DISPLAY 
CLS: -
NWL: 8 
VOF: center 
VON: blink 
MSG:   STOP 
VOF: blink 
MSG:    Recording 
PAU: 1 
VOF: inverse 
VON: center 
CLS: -
BEL: -
NWL: 4 
MSG: It's all over for this file 
NWL: 1 
MSG: THANKS 
NWL: 8 
EPF: end of protocol

NOTE: For the ST corpus, using continuous mode, the protocol file is nearly the same, but:

Recording conditions file

A Recording Conditions File describes a referenced set of recording conditions. One such file is selected (and can be modified) prior to each recording session. Some of its entries are used to complete the label-file header; others are for general reference. A number of files specifying different recording conditions can be stored and selected when requested.

Example: 1.RCD
(1 is recording conditions code 1, implying that this file is the first of a number of defined conditions.)

SCD:                   start of conditions
RCC:    1              recording condition code
VER:    V3.0           label version
VOL:                   needed for label file
DIR:                   needed for label file
SNB:    2              sample byte number
SBF:    01             sample byte order
SSB:    16             sample significant bits
NCH:    1              number of channels
LGG:    0              laryngograph used or not
PCN:    1              protocol code number
SAM:    20000          sampling frequency (20kHz is SAM standard for EUROM.1ff)
MIN:                   micro name
MIT:                   micro type
MIP:                   micro position
MID:                   micro distance
NOB:                   noise bandwidth
NOL:                   noise level
CHB:                   channel bandwidth
CHN:                   channel noise
FLT:                   additional filter
PRO:                   external processing
ENV:                   environment
ECD:                   end of condition
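
As a worked example, the SAM, SNB and NCH entries above determine the raw data rate of a recording. The sketch assumes uncompressed samples with SNB bytes per sample on each of NCH channels; the function names are illustrative.

```python
def bytes_per_second(sam, snb, nch):
    """Raw data rate: sampling rate x bytes per sample x channels."""
    return sam * snb * nch


def recording_bytes(seconds, sam, snb, nch):
    """Approximate size in bytes of `seconds` of recorded signal."""
    return round(seconds * bytes_per_second(sam, snb, nch))


# Values from 1.RCD above: SAM 20000, SNB 2, NCH 1
rate = bytes_per_second(20000, 2, 1)
print(rate, "bytes per second")
print(recording_bytes(60, 20000, 2, 1), "bytes per minute")
```

With the values of 1.RCD this gives 40000 bytes per second, or 2400000 bytes per minute of signal.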

PRIVATE.PEC file

This file is local to the SAM workstation. It provides information used for the recording process (OROS board type and address) and for completion of the label-file header, as well as registering the number of files recorded on the workstation.

Example:

BTY: 2                            Oros board type (1: AU21  2: AU22)
ADR: 784                          Oros board address
LOC: I.C.P. Grenoble (FR)         Recording place
NUM: 0000                         First file number
NSC: 1                            Number of monitors




WWW Administrator
Fri May 19 11:53:36 MET DST 1995