To facilitate multi-laboratory exploitation of speech database material, SAM has defined a standard, transparent specification of all files used in the acquisition and post-processing of speech recordings for assessment and research purposes. In addition, the standard SAM recording software package EUROPEC, developed at ICP, Grenoble, uses a set of SAM standard speaker and prompt-file structures and produces speech and orthographic label files of standard format. This section gives details of the file formats and defines the function of each file in the different stages of acquisition and processing.
It is now agreed as a standard for SAM speech databases that a speech file contains only speech waveforms, and that an associated description file is generated at the recording session. The two files are thus matched: their names are identical except for the last letter of the extension.
For example, if the speaker AA records the corpus number BB (list of six sentences in English), and the current available file number in the recording lab is nnnn, the files produced will be:
The associated description file has the standard label-file format, with a header and a body (see the header format and the label body description below). It contains all the information usually needed by people working on the files without a database management system.
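The pairing of speech and description files by name can be sketched as follows. This is only an illustration: the function names are hypothetical, the label letter "O" is taken from the example names in this section (DFS20014.SES and DFS20014.SEO), and the stem layout AABBnnnn (speaker code, corpus code, file number) is an assumption inferred from those example names.

```python
from pathlib import PurePath


def description_name(speech_name: str, label_letter: str = "O") -> str:
    """Return the description (label) file name paired with a speech file.

    Only the last letter of the extension changes. "O" (orthographic) is
    taken from the example names in this section; other label letters
    are an assumption.
    """
    path = PurePath(speech_name)
    if len(path.suffix) < 2:
        raise ValueError(f"no extension in {speech_name!r}")
    return path.stem + path.suffix[:-1] + label_letter


def split_stem(file_name: str) -> dict:
    """Split an 8-character stem into speaker code, corpus code, file number.

    Hypothetical layout AABBnnnn, inferred from names such as DFS20014.
    """
    stem = PurePath(file_name).stem
    if len(stem) != 8 or not stem[4:].isdigit():
        raise ValueError(f"unexpected stem {stem!r}")
    return {"speaker": stem[:2], "corpus": stem[2:4], "number": int(stem[4:])}


print(description_name("DFS20014.SES"))
print(split_stem("ADS10010.SFS"))
```

For the two example files of this section, the corpus codes obtained this way (S2 and S1) agree with the TXF: entries of their label-file headers.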
A label file consists of a header and one or more label bodies. The header consists of the header keyword ``LHD:'' and a number of lines providing information about the labelling and the speech file to which it applies. The number of lines in the header is not fixed. The header continues until the label body keyword ``LBD:'' appears.
The label body specifies the type and location within the speech-signal file of each segment that has been labelled. It continues until either another label body keyword ``LBD:'' or the end-of-label-file keyword ``ELF:'' appears.
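The header/body structure just described can be read with a few lines of code. The sketch below is not part of the SAM tools; it assumes one three-letter keyword followed by a colon per line, and the function name is hypothetical.

```python
def parse_label_file(text: str):
    """Split label-file text into a header and a list of label bodies.

    Each part is a list of (keyword, value) pairs. Lines that do not
    start with a three-letter keyword and ":" are ignored (an assumption).
    """
    header, bodies, current = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 4 or line[3] != ":":
            continue
        key, value = line[:3], line[4:].strip()
        if key == "ELF":          # end of label file
            break
        if key == "LBD":          # a new label body starts here
            current = []
            bodies.append(current)
        elif current is None:     # still inside the header
            header.append((key, value))
        else:
            current.append((key, value))
    return header, bodies


sample = """LHD: V4.0
FIL: label
SRC: DFS20014.SES
LBD: -
LBR: 0, 55551, 0, -5128, 4775, Decimal numbers are an aid in
EXT: adding up.
ELF: -
"""
header, bodies = parse_label_file(sample)
print(header)
print(bodies)
```

Keeping the pairs in a list rather than a dictionary preserves repeated keywords such as CMT:, which occur several times in a header.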
Basic Label File Structure:
LHD: header keyword + version (version V4.0, March 1991)
FIL: file type
TYP: specific file type (e.g. prompt orthographic, spoken orthographic, phonemic, prosodic etc.)
DBN: database name
VOL: database volume ID
DIR: directory (for the source file)
SRC: source file name
CMT: comment
TXF: name of the text file containing what was intended to be said, i.e. the prompt file (.txt); may be left blank for a prompt or free-speech label file, but could also be a file containing instructions
CMT: comment
SAM: sampling rate
BEG: labelled sequence start position
END: labelled sequence end position
RED: recording date
RET: recording time
REP: recording place
SNB: number of (8-bit) bytes per sample
SBF: sample byte order
SSB: number of significant bits per sample
RCC: recording conditions code (define a set of values, micro type, position..)
NCH: number of channels
SPI: speaker information: sex, age, native language
PCF: protocol file name (recording protocol used)
PCN: protocol number
CMT: comment
EXP: labelling expert
SYS: labelling system
DAT: date of completion of labelling
SPA: SAMPA version - note 2...and 3.... refer to different levels
CMT: comment
Note that the SAM labelling tool PTS requires every field to have an entry; otherwise it will not open the label file. ``-'' should be entered when the information is not available.
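A check for the requirement that every header field carries an entry could look like the sketch below. The function names are hypothetical, and the only rule applied is the one stated above: a field with no entry is given ``-''.

```python
def empty_header_fields(header_lines):
    """Return the keywords of header lines that carry no entry at all.

    A value of "-" counts as a valid "not available" entry.
    """
    empty = []
    for line in header_lines:
        key, sep, value = line.partition(":")
        if sep and not value.strip():
            empty.append(key.strip())
    return empty


def fill_empty_fields(header_lines):
    """Replace missing entries with "-" so that every field has a value."""
    filled = []
    for line in header_lines:
        key, sep, value = line.partition(":")
        if sep and not value.strip():
            line = f"{key.strip()}: -"
        filled.append(line)
    return filled


lines = ["LHD: V4.0", "DBN:", "EXP: -", "RET: 15:10:33", "SYS:   "]
print(empty_header_fields(lines))
print(fill_empty_fields(lines))
```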
Between the label-body keyword (LBD:) and the end-of-label-file keyword (ELF:) four categories of mnemonics can occur:
a) This is file DFS20014.SEO, created automatically during a single-channel recording of file DFS20014.SES.
LHD: V4.0
FIL: label
TYP: prompt orthographic (comment needed that mixtures are possible)
DBN: -
VOL: EUROM.1
DIR: ENGLISH
SRC: DFS20014.SES
TXF: S2.TXT
CMT: Information about the recording session
SAM: 16000
BEG: 0
END: 431872
RED: 07/11/89
RET: 15:10:33
REP: I.C.P. Grenoble (FR)
SNB: 2
SBF: 01
SSB: 16
RCC: 1
NCH: 1
SPI: M, 39, French
PCF: SENTEN.DES
PCN: 1
CMT: Information about the labelling session
EXP: -
SYS: -
DAT: -
SPA: -
CMT: Item: label start, end, input gain, min level, max level, string
LBD: -
LBR: 0, 55551, 0, -5128, 4775, Decimal numbers are an aid in
EXT: adding up.
DSC: -
LBR: 55552, 158975, 0, -7680, 8878, Monetary systems have
EXT: evolved to make use of this base ten notation.
DSC: -
LBR: 158976, 223743, 0, -7123, 7562, France became the first
EXT: decimal country in Europe.
DSC: -
LBR: 223744, 275199, 6, -12487, 13262, Germany's decision
EXT: followed eight years later.
DSC: -
LBR: 275200, 361983, 6, -11965, 12451, Scandinavian States and
EXT: Russia changed in eighteen seventy-five.
DSC: -
LBR: 361984, 431872, 6, -12902, 14320, Britain chose to have
EXT: decimal money only in nineteen seventy-one !
ELF: -
b) This is file ADS10010.SFO, the orthographic label file created automatically during the two channel recording of ADS10010.SFS
LHD: V4.0
FIL: label
TYP: orthographic
DBN: EUROM_1
VOL: -
DIR: -
SRC: ADS10010.SFS
TXF: S1.TXT
CMT: Information about the recording session
SAM: 20000
BEG: 0
END: 193023
RED: 11/Apr/90
RET: 16:47:46
REP: ICP
SNB: 2
SBF: 01
SSB: 16
RCC: 2
NCH: 2
SPI: M, 49, French
PCF: PEQPHRAS.DES
CMT: Information about the labelling session
EXP: -
SYS: -
DAT: -
SPA: -
CMT: Item: label start, end, input gain, min level, max level, string
LBD: -
LBR: 0, 17663, 0, -12382, 14455, Maman a préparé une galette
EXT: pour jeudi ?
LB2: 0, 17663, 0, -12384, 14439
DSC: -
LBR: 17664, 41215, 0, -6954, 9023, Ces élèves prendront
EXT: l'autocar tout à l'heure !
LB2: 17664, 41215, 0, -6958, 9024
DSC: -
LBR: 41216, 86527, 0, -11750, 15336, Parfois, mon épicière
EXT: vend à crédit.
LB2: 41216, 86527, 0, -11756, 15348
DSC: -
LBR: 86528, 137983, 0, -11809, 11572, Personne n'a applaudi ce
EXT: beau discours ?
LB2: 86528, 137983, 0, -11816, 11560
DSC: -
LBR: 137984, 193023, 0, -15097, 18881, Je me demande pourquoi
EXT: on court sans cesse.
LB2: 137984, 193023, 0, -15102, 18880
ELF: -
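The items of a label body such as those in the two examples above can be decoded as sketched below. The field order follows the CMT: line of the examples (label start, end, input gain, min level, max level, string). The function names are hypothetical, and treating the end position as inclusive when computing a duration is an assumption.

```python
def parse_lbr(value: str):
    """Parse the value of an LBR: line into a dict.

    Field order: label start, end, input gain, min level, max level, string.
    """
    parts = value.split(",", 5)   # the string itself may contain commas
    start, end, gain, lo, hi = (int(p) for p in parts[:5])
    return {"start": start, "end": end, "gain": gain,
            "min": lo, "max": hi, "text": parts[5].strip()}


def read_items(body):
    """Build items from (keyword, value) pairs, joining EXT: lines."""
    items = []
    for key, value in body:
        if key == "LBR":
            items.append(parse_lbr(value))
        elif key == "EXT" and items:
            items[-1]["text"] += " " + value.strip()
    return items


body = [
    ("LBR", "0, 55551, 0, -5128, 4775, Decimal numbers are an aid in"),
    ("EXT", "adding up."),
    ("DSC", "-"),
    ("LBR", "55552, 158975, 0, -7680, 8878, Monetary systems have"),
    ("EXT", "evolved to make use of this base ten notation."),
]
items = read_items(body)
sam = 16000  # SAM: value from the header of example a)
for item in items:
    seconds = (item["end"] - item["start"] + 1) / sam  # inclusive end assumed
    print(f'{seconds:.3f} s  {item["text"]}')
```

Splitting on at most five commas keeps prompt strings that contain commas themselves intact.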
The standard SAM recording software, EUROPEC, is geared to producing signal and label files of standard format. In turn, the software requires input data formatted according to the following specifications.
The Corpus File lists all the prompt files used in a particular set of recordings. This file is scanned whenever a file is specified for a recording ``take''. The prompt files, which have a 2-character name and the extension .txt, are arranged in an alphabetical catalogue A--Z, each letter containing up to 36 prompt files (e.g. A0--A9 plus AA--AZ). Data entries for the prompt files are of two types:
Order of data entries:
CMT: Optional entry providing information about the prompt file, typically the prompt file name and a short description.
CCD: Corpus code (2 letters, e.g. P2)
DBN: Database name (20 characters max)
CNM: Corpus name (40 characters max)
CTY: Item type (one letter)
NBI: Number of items in the file (int)
LAN: Language (char)
PCF: Protocol Description File (XXXXXXXX.DES)
--------------------------------------------------------------
A -
--------------------------------------------------
.........
--------------------------------------------------
P -
CMT: P1.TXT Prompt text for passage 1
CCD: P1
DBN: EUROM_1
CNM: A reading passage (Esprit CD)
CTY: P
NBI: 1
LAN: E
PCF: PASSAGE.DES
--------------------------------------------------
Q -
........
--------------------------------------------------
S -
CMT: S2.TXT Prompt text for Block 2 of 5 sentences
CCD: S2
DBN: EUROM_1
CNM: Sentences (UCL)
CTY: S
NBI: 5
LAN: E
PCF: SENTEN.DES
--------------------------------------------------
T -
CMT: T3.TXT Prompt text for Block 3 of 30 digit triples
CCD: T3
DBN: EUROM_1
CNM: list 3D Digits Triples
CTY: N
NBI: 30
LAN: E
PCF: DIGIT.DES
--------------------------------------------------
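Looking up a prompt file in a Corpus File of this shape can be sketched as follows. This is not the EUROPEC implementation: the function name is hypothetical, and the sketch assumes one field per line and that the PCF: field closes each entry, as in the listing above.

```python
def parse_corpus_entries(text: str):
    """Collect Corpus File entries into a dict keyed by corpus code (CCD:).

    Assumes one "KEY: value" field per line and that PCF: closes an entry.
    Separator lines and letter headings such as "P -" are skipped.
    """
    entries, current = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 4 or line[3] != ":" or not line[:3].isalpha():
            continue
        key, value = line[:3], line[4:].strip()
        current[key] = value
        if key == "PCF":                      # last field of an entry
            entries[current.get("CCD", "")] = current
            current = {}
    return entries


sample = """--------------------------------------------------
S -
CMT: S2.TXT Prompt text for Block 2 of 5 sentences
CCD: S2
DBN: EUROM_1
CNM: Sentences (UCL)
CTY: S
NBI: 5
LAN: E
PCF: SENTEN.DES
--------------------------------------------------
"""
corpus = parse_corpus_entries(sample)
entry = corpus["S2"]
print(entry["CCD"] + ".TXT", entry["PCF"], int(entry["NBI"]))
```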
This file contains details about the speakers, which can be accessed via the speaker codes. To maximise the number of speakers that can be coded, arbitrary code allocation AA--ZZ (= 676 codes) should be used rather than the speaker initials used in the examples. In many countries, the SNM and SBN lines will be left blank for data-protection purposes.
Description of speaker specificities
--------------------------------------------------------------
order :
SCD: Speaker code (2 char)
SNM: Speaker name (75 char max)
SBN: Speaker birthname (75 char max)
SEX: Sex (one letter)
DOB: Date of birth (year)
HET: Height (metres)
WET: Weight (kg)
NLN: Native language (75 char max)
ACC: Accent "
ETH: Ethnic group "
EDL: Education level "
SMK: Smoking habit "
PTH: Pathology "
--------------------------------------------------------------
A -
--------------------------------------------------
D -
SCD: DJ
SNM: DURAND
SBN: Joseph
SEX: M
DOB: 1946
HET: 1,73
WET: 70
NLN: French
ACC: Ardèche
ETH: white
EDL: -
SMK: heavy smoker
PTH: -
-----------------
SCD: DS
SNM: DUPUY
SBN: Simone
SEX: F
DOB: 1952
HET: 1,60
WET: 58
NLN: French
ACC: -
ETH: white
EDL: -
SMK: -
PTH: -
......
--------------------------------------------------
E -
......
--------------------------------------------------
Z -
--------------------------------------------------
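Arbitrary allocation of two-letter speaker codes can be sketched as below; two upper-case letters give 26 x 26 = 676 distinct codes. The helper names are hypothetical and serve only as an illustration.

```python
from itertools import product
from string import ascii_uppercase


def speaker_codes():
    """Yield every two-letter code from AA to ZZ in alphabetical order."""
    for first, second in product(ascii_uppercase, repeat=2):
        yield first + second


def next_free_code(used):
    """Return the first code not yet allocated (hypothetical helper)."""
    taken = set(used)
    return next((c for c in speaker_codes() if c not in taken), None)


codes = list(speaker_codes())
print(len(codes), codes[0], codes[-1])
print(next_free_code(["AA", "AB"]))
```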
The prompt file is given the name of the Corpus Code (CCD:) specified in the Corpus File. The name corresponds to its position in the Corpus File (X2 signifies that it is specified in 2nd position under letter X).
Examples of Prompt Text Files
S1.TXT (5 sentences) S1 is the corpus code
SPR: (start prompt file)
TXT: I have a problem with my water softener.
TXT: The water-level is too high and the overflow keep dripping.
TXT: Could you arrange to send an engineer on tuesday morning ?
TXT: It's the only day I can manage this week.
TXT: I'd be grateful if you could confirm the arrangement now.
EPR: (end prompt file)
S2.TXT (5 sentences) S2 is the corpus code
SPR:
TXT: Please put me through to the complaints department.
TXT: The repair to the water main outside my house was
EXT: unsuccessful, and my cellar's flooded.
TXT: Your Water Services Department was singularly unsympathetic.
TXT: All their repair teams are apparently booked out for the
EXT: next two weeks.
TXT: Am I supposed to use the cellar as a swimming pool till
EXT: then ?
EPR:
Note that if a sentence is more than one line long, the continuation is placed on an extension line (field EXT).
P1.TXT (passage) P1 is the corpus code
SPR:
TXT: Please put me through to the complaints department. The
EXT: repair to the water main outside my house was unsuccessful,
EXT: and my cellar's flooded. Your Water Services Department was
EXT: singularly unsympathetic. All their repair teams are
EXT: apparently booked out for the next two weeks. Am I supposed
EXT: to use the cellar as a swimming pool till then ?
EPR: -
Note that the whole passage will be displayed as one item (only one TXT field and several EXT fields).
ST.TXT (5 sentences) ST is the corpus code
To be recorded in ``continuous mode'' (= all pauses between sentences are retained).
SPR: -
TXT: Please put me through to the complaints department.
DLA: 3.0
TXT: The repair to the water main outside my house was
EXT: unsuccessful, and my cellar's flooded.
DLA: 5.0
TXT: Your Water Services Department was singularly unsympathetic.
DLA: 3.5
TXT: All their repair teams are apparently booked out for the
EXT: next two weeks.
DLA: 4.0
TXT: Am I supposed to use the cellar as a swimming pool till
EXT: then ?
DLA: 3.8
EPR: -
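Reading a prompt text file into display items can be sketched as below. The function name is hypothetical. The sketch assumes that each TXT: field starts a new item, that EXT: fields continue it, and that a DLA: field belongs to the item before it. The unit of DLA: is not stated in this section, so the value is stored as a plain number.

```python
def parse_prompt_file(text: str):
    """Return the prompt items found between SPR: and EPR:.

    Each item is a dict with the joined text and an optional delay taken
    from a following DLA: line (unit not stated in this section).
    """
    items, inside = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 4 or line[3] != ":":
            continue
        key, value = line[:3], line[4:].strip()
        if key == "SPR":                  # start of prompt file
            inside = True
        elif key == "EPR":                # end of prompt file
            break
        elif not inside:
            continue
        elif key == "TXT":                # a new item
            items.append({"text": value, "delay": None})
        elif key == "EXT" and items:      # continuation of the current item
            items[-1]["text"] += " " + value
        elif key == "DLA" and items:
            items[-1]["delay"] = float(value)
    return items


sample = """SPR: -
TXT: The repair to the water main outside my house was
EXT: unsuccessful, and my cellar's flooded.
DLA: 5.0
TXT: Your Water Services Department was singularly unsympathetic.
DLA: 3.5
EPR: -
"""
prompts = parse_prompt_file(sample)
for prompt in prompts:
    print(prompt["delay"], prompt["text"])
```

With a passage file such as P1.TXT, the same function returns a single item, since the passage has one TXT: field followed only by EXT: fields.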
In a recording session, the prompt presentation is driven by a specific file called the Protocol Description File (named in the PCF: entry of the Corpus File). It specifies the various stages and messages of the prompt presentation .....
NOTE: Using the ``GET: 3'' command will produce the following effect:
This is SENTEN.DES (Protocol Description File for a corpus like S1 or S2 and given after PCF: in the Corpus File entry)
SPF: start of protocol file
CLS: -
VON: inverse, center
BEL: -
NWL: 2
MSG: ENGLISH SENTENCES
NWL: 2
MSG: I.C.P. GRENOBLE
VOF: inverse
PAU: 2
CLS: -
NWL: 2
MSG: Let's practise a little :
NWL: 2
MSG: please read the following
NWL: 2
MSG: two sentences
PAU: 3
CLS: -
TRN: -
CMT: DISPLAY BEGINNING - TRAINING PHASE
RWD: -
GET:* 1
SKP:* 2
LOP: -2, 1
CMT: DISPLAY END - TRAINING PHASE
RWD: -
CLS: -
NWL: 2
MSG: Now you're going to be recorded
NWL: 2
MSG: this session consists of five sentences
NWL: 2
MSG: read each of them
NWL: 2
MSG: while they are being displayed
PAU: 4
CLS: -
NWL: 8
MSG: CAUTION ! SESSION BEGINNING
VON: blink
NWL: 1
MSG: Hit Space or Click Left Button
RET: -
VOF: blink
CMT: DISPLAY BEGINNING
REC: -
CLS: -
GET:* 1
LOP:* -1, 4
CLS: -
NWL: 2
CMT: END DISPLAY
CLS: -
NWL: 8
VOF: center
VON: blink
MSG: STOP
VOF: blink
MSG: Recording
PAU: 1
VOF: inverse
VON: center
CLS: -
BEL: -
NWL: 4
MSG: It's all over for this file
NWL: 1
MSG: THANKS
NWL: 8
EPF: end of protocol
NOTE: For the ST corpus, using continuous mode, the protocol file is nearly the same, but:
A Recording Conditions File describes a referenced set of recording conditions; it is selected (and can be modified) prior to each recording session. Some of its entries are used to complete the label-file header, others are for general reference. A number of files specifying different recording conditions can be stored and selected when requested.
Example: 1.RCD
(1 is recording conditions code 1, implying that this file is the first of a number of defined conditions.)
SCD:         start of conditions
RCC: 1       recording condition code
VER: V3.0    label version
VOL:         needed for label file
DIR:         needed for label file
SNB: 2       sample byte number
SBF: 01      sample byte order
SSB: 16      sample significant bits
NCH: 1       number of channels
LGG: 0       laryngograph used or not
PCN: 1       protocol code number
SAM: 20000   sampling frequency (20kHz is SAM standard for EUROM.1ff)
MIN:         micro name
MIT:         micro type
MIP:         micro position
MID:         micro distance
NOB:         noise bandwidth
NOL:         noise level
CHB:         channel bandwidth
CHN:         channel noise
FLT:         additional filter
PRO:         external processing
ENV:         environment
ECD:         end of condition
This file is local to the SAM workstation. It provides information used for the recording process (OROS board type and address) and for completion of the label-file header, as well as registering the number of files recorded on the workstation.
Example:
BTY: 2                      Oros board type (1: AU21, 2: AU22)
ADR: 784                    Oros board address
LOC: I.C.P. Grenoble (FR)   Recording place
NUM: 0000                   First file number
NSC: 1                      Number of monitors