Contents
Introduction
System design
Introduction
System capability profile versus application requirement profile
Technical features versus system capabilities
Speech recognition systems
Speaker dependency
Speaker adaptive
Speaking aspects
Speaking mode
Speaking fluency
Speaking rate
Non-speech sounds
Vocabulary aspects
Lexicon size
Speech training material
Lexicon generation
Field speech data (in-situ recording)
Branching factor (Perplexity factor)
Rejection mode
Cut through / Anticipation
Application's vocabulary and confusion matrix
Language modelling
Channel adaptation / Environment adaptation
Task adaptation
Signal module
Speaker verification/identification
Speaker verification versus speaker identification
Text dependent / Text independent
Sentences defined by the technology provider
Sentences defined by the application developer
Training carried out by the technology provider
Training carried out by the application developer
Speech synthesis
Speech recording, storage, and playback
Speech concatenation
Text to speech synthesis
The linguistic part
Phonetic part
Acoustic module
Multi-linguality
Interactive voice systems
Software aspects
Operating systems
Drivers
Application programming interfaces
Application generators
Hardware aspects
Platforms
Connectivity
Characteristics
Multi-linguality aspects
Conclusions
Corpus design
Introduction
About this chapter
Seven main differences between collections of written NL- and spoken SL-data
Durability of text, volatility of speech
Different production times for text and speech
Correcting errors in the production of text and speech
Orthographic identity and phonetic variability of lexicalised units
Printable ASCII-strings and continuously sampled speech
Size differences between NL- and SL-data
The different nature of categories and time functions
Applications of spoken language corpora
Speech corpora for research purposes
Phonetic research
Sociolinguistic research
Psycholinguistic research
First language acquisition
Second language acquisition
General linguistic research
Audiology
Speech pathology
Speech corpora for technological applications
Speech synthesis
Speech recognition
Knowledge-based vs. stochastic systems
Speaker-independent vs. speaker-dependent systems
Isolated words vs. continuous speech
Corpora for speech recognition research
Spoken language systems
Speaker recognition/verification
Specification of the linguistic content
Different types of speech data
Read aloud isolated phonemes
Read aloud isolated words
Read aloud isolated sentences
Read aloud text fragments
Semi-spontaneous speech
Spontaneous speech about a predetermined subject
The Wizard of Oz technique
Spontaneous speech
Factorial experiments and corpus studies
Specification of number and type of speakers
Corpus size in terms of speakers
Speech corpora with few speakers
Speech corpora with about 5 to 50 speakers
Speech corpora with more than 50 speakers
General remarks
Speaker characteristics
Stable vs. transient speaker characteristics
Demographic coverage
Male/Female
Pitch and intensity
Overall spectral slope
Accuracy of pronunciation
Vocabulary and syntax
Age
Voice quality
Vocabulary and syntax
Weight and height
Smoking and drinking habits
Pathological speech
Professional vs. untrained speakers
Geographical and sociolinguistic factors
Corpus collection
Introduction
Data collection dimensions
Visibility: Open vs. Secret
Environment: Studio vs. On Location
Recording in a studio
Recording on location
Communication Situation
Face to face
Isolated
Communication Mode
Dialogue
Prompted Interview
Text den Os
Verification
On-line verification
Text den Os
Off-line (or Post-hoc) verification
Monitoring
Procedures
Equipment
Microphone
Recommendations
Amplifier/Processor
Recommendations
Recording Device
Recommendations
Speaker Recruitment
Recommendations
Scheduling speakers
Recommendations
Speaker Prompting
Recommendations
Cost
Recommendations
Multi-Channel Recording
Laryngography
Electropalatography
Electromagnetic Articulography
Cineradiography
Air-flow measurements
X-ray microbeam
Nuclear magnetic resonance imaging
Ultrasound imaging
Wizard of Oz
Legal Aspects
Corpus representation
Introduction
Transcription of spoken language corpora
The transcription of read vs. the transcription of spontaneous speech
Transcription of dialogues
Levels of transcription
Orthographic transcription
Reduced word forms
Dialect forms
Numbers
Abbreviations and spelled words
Interjections
Orthographic transcription of read speech
Phonemic transcription
Allophonic transcription
Phonetic transcription
The CRIL convention
Prosodic transcription
ToBI transcription system
Segmentation and Labelling
Definition and motivation
Definition
A caveat
Use in speech and language technology research
Use in linguistic research
Levels of segmentation and labelling
Recording script
Orthographic
Morpho-syntactic
Citation-phonemic
Broad phonetic (or phonotypical)
Narrow phonetic
Acoustic-phonetic
Physical
Non-linguistic phenomena
General recommendations for transcribing
Manual segmentation
Automatic and semi-automatic segmentation
Segmentation and labelling in the VERBMOBIL project
Prosodic labelling and annotation
Definition and motivation
Types of approach to prosodic labelling
Examples of the two types of approach
The ToBI labelling system
The MARSEC labelling system
The IPO approach
Provisional recommendations
Annotation conventions
Deletions (read text)
Verbal deletions or corrections (implicitly or explicitly)
Word fragments
Unintelligible words
Hesitations (filled pauses)
Non-speech acoustic events
Recommendations
Storage and design of the data base
Data types for a speech data base
Storage demands
Storage medium
Sampling rates
Compression
How to combine speech and speech-related data
Database Management Systems
Data model
Hierarchical data model
Network data model
Relational data model
SQL
Example
Summary
Object-oriented data model
Summary
Deductive data model
Summary
Safe storage of data
Application-independent storage of data
Controlled access to data
Summary
Spoken Language Lexica
Introduction
Lexica for spoken language systems
Lexical information as properties of words
Applications of spoken language lexica
Types of application for spoken language lexica
Spoken language lexical databases as a general resource
Lexica in selected spoken language systems
Recommendations on resources
What is a spoken language lexicon?
Basic features of a spoken language lexicon
Lexical databases and system lexica for spoken language
Spoken language and written language lexica
Basic lexicographic coverage criteria
The lexicon in spoken language recognition systems
Recommendations on defining spoken language lexica
Types of lexical information in spoken language lexica
Lexicon models and lexical representation
A simple sign model for lexical properties
Lexical units
Kinds of lexical unit
Fully inflected form lexica
Stem and morph lexica
The notion of `lexical lemma'
Lexical properties and lexical relations in spoken language
Recommendations on types of lexical information
Lexical surface information
Orthographic information
Pronunciation information
Prosodic information
Recommendations on lexical surface information
Morphological information
Types of morphological information
Applications of morphology
Recommendations on morphology
Grammatical information
Statistical language models
Sentence syntax information
Recommendations on grammatical information
Lexical content information
Lexical semantic information
Pragmatic information
Idiomatic information
Recommendations on semantic information
Lexicon structure
Spoken language lexicon formalisms
Lexicon architecture and lexical database structure
The architecture of spoken language system lexica
The structure of lexical databases
A simple database type: pronunciation tables
A more complex lexical database
Recommendations on lexicon structure
Lexical knowledge acquisition for spoken language
Stages in lexical knowledge acquisition
Types of knowledge source
Dictionaries
Corpora
Acquisition tools
Recommendations on lexicon construction
Outlook
Language models
Dialogue
Physical characterisation
Assessment methodologies and experimental design
Introduction
How to read this chapter
Role of statistical analysis and experimentation in Language Engineering Standards
Statistical and experimental procedures for analysing data corpora
Statistical analysis
Population/s, samples and other terminology
Sampling
Biases
Estimating sample means, proportions and variances
Estimating means
Estimating proportions
Estimating variance
Ratio of sample variances
Hypothesis testing
Simple hypothesis testing
Analysis of variance
Non-parametric tests
Experimental procedures
Experimental selection of material to employ
Segmentation
How to make segmentations without confounding effects of classification
Suggested guidelines concerning comparative analysis between human judges or between human judges and automatic algorithms
Recommendation about what judges to use
Sample size for assessment of performance
Classification
Judges
Procedural
Limitations of category responses and ways of circumventing
Range effects
Assessing recognisers
Baseline performance
Progress
Functional adequacy and user acceptance
Methodology
Application oriented
Reference oriented
Calibrated databases
Manipulated, artificial and diagnostic databases
Experimental design
Magnitude estimation
Rank order
Pairwise comparison
Assessing speaker verification and recognition systems
Sampling rare events in speaker verification and recognition systems
Employing expert judgments to augment speaker verification and assessment for forensic aspects of speaker verification and recognition
Interactive Dialogue systems
WOZ
Audio-only simulations
Requirements
Subject variables
Wizard variables
Multimodal
Dialogue metrics
Psycholinguistic metrics
Acoustic-based measures
Interruption parameters
Appendix 1
Appendix 2
Recognition assessment
Assessment of speaker verification systems
Presentation
Speaker classification tasks
General definitions
A taxonomy of speaker recognition systems
Task typology
Speaker identification versus speaker verification
Related tasks
Types of errors
Levels of text-dependence
Interaction mode with the user
Definitions
Examples
Text-dependent systems
Fixed-vocabulary systems
Unrestricted text-independent systems
Influencing factors
Speech quality
Temporal drift
Speech quantity and variety
Speaker population size and typology
Speaker purpose and other human factors
Recommendations
Example
Scoring Procedures
Notation
Registered speaker population
Test impostor population
Closed-set identification
Misclassification rates
Mistrust rates
Confidence ranks
Comments
Example
Verification
False rejection rates
False acceptance rates and imposture rates
Relative unreliability, vulnerability and imitation ability
Comments
Example
Expected benefit
Threshold setting
System operating characteristic
System characteristic modeling
Example
Open-set identification
Recommendations
Complementary assessment tools
Standard reference systems
Reference human test
Automatic / human tests
Transformation of speech databases
Typology and assessment of applications
Applications and products in speaker recognition
Forensic applications
Listener method
Spectrographic method
Semi-automatic method
Recommendations
Conclusions
References
Synthesis assessment
Introduction
What are speech output systems?
Why speech output assessment?
Users of this chapter
Towards a taxonomy of assessment tasks and techniques
Glass box vs. black box
Laboratory vs. field
Linguistic vs. acoustic
Human vs. automated
Judgment vs. functional testing
Global vs. analytic assessment
Methodology
Subjects
Test procedures
Benchmarks
Reference conditions
Segmental reference conditions
Prosodic reference conditions
Voice characteristics reference conditions
Overall quality reference conditions
Comparability across languages
Black box approach
Laboratory testing
Functional laboratory tests
Judgment laboratory tests
Field testing
Preliminary remarks
Field tests
Glass box approach
Linguistic aspects
Preprocessing
Grapheme-phoneme conversion
Word stress
Morphological decomposition
Syntactic parsing
Sentence accent
Acoustic aspects
Segments
Functions of segments
Segmental tests
Segmental tests at the word level
Segmental tests at the sentence level
Prosody
Functions of prosody
Judgment tests of prosody
Functional tests of prosody
Voice characteristics
Functions of voice characteristics
Voice characteristics tests
Relationships among tests
Further developments in speech output testing
Introduction
Long-term strategy: Towards predictive tests
From human to automated testing
Predicting functional behaviour from judgment testing
Predicting global from analytic testing
Predicting field performance from laboratory testing
Linguistic testing: Creating test environments for linguistic interfaces
Acoustic testing: Developments for the near future
Segmental quality testing
Prosodic quality testing
Voice characteristics testing
Overall quality testing
List of recommendations
Appendix 1: Summary of test descriptions
Appendix 2: Evaluation of visual aspects of speech output
Spoken language interactive systems
Tools
Introduction
Signal theory
Speech research
Computer hardware and software
Conclusion
Appendix: Useful anonymous ftp sites
References
Computer readable alphabets
SAMPA computer readable phonetic alphabet
SAMPA
Introduction
Notation issues
Transcription
Coding
Further languages
SAMPA: Present status
The phonemic notation of individual languages
Danish
Dutch
English
French
German
Greek
Italian
Norwegian
Portuguese
Spanish
Swedish
Levels of annotation and extension of SAMPA
SAMPA as a phonemic system
Detailed phonetic or acoustic annotation
SAM label file formats
SAM label file formats
Introduction
Speech file and associated description file formats
Label file format
Label-file header format
Label-file body format
Label types
Label File example
Files used in a recording session using EUROPEC
Corpus File: CORPUS.DBF
Speaker File: SPEAKERS.DBF
Prompt file
Protocol description file
Codes used for presentation protocol (Protocol Description File)
Example of Protocol Description File
Recording conditions file
PRIVATE.PEC file
SAM recording protocols
SAM recording protocols
Definition of terms
Classification of general strategies for recording and prompting
Recording mode
Prompting style
Timing strategy
Recording protocol
Microphone
Other sensors
Speech data capture
Recording environment
Recording mode and prompting style
Recording control
Recording procedure
Integrity checks
Backup procedures
Retrieval procedures
Calibration
Inter site consistency and recording procedure verification
Collation of recordings
SAM software tools
DKISALA (Interactive Semi-Automatic Labelling Software)
ELSA (ESPRIT Labelling System Assessment software)
EUROPEC (European Program d'Enregistrement de Corpus)
PTS (Progiciel de Traitement de Signal)
RESAM
SAMITPRO (SAM Iterative Proportional Fitting)
SAM_REC0 Isolated Word Recogniser
SAM_SCOR (Sam Input Assessment Scoring Software)
SAM_SLM
SAM_SPEX (Speech Parameter Extractor)
SAMTRA (SAM TRanscription Analysis)
SOAP (Speech Output Assessment Package)
PTM (Parametric Test Manager)
EUROPEC recording tool
EUROM-1 database overview
Polyphone project overview
Introduction
Reusable resources
The Dutch POLYPHONE corpus
Recording workstation
Speaker selection
The speech material
Postprocessing
Use of POLYPHONE in application development
Train time table information
Training of the recogniser
Building a phonemic lexicon
A model for yes/no expressions
Time and date expressions
Phone card services
Automating card services
Speaker verification
Requirements for future corpora
References
Overview of speech corpora
Overview
Criteria for assessment of the situation of Spoken Language Resources
Types and specificities of corpora
Actors in speech resource production
Summary of the current situation on a per language basis
DANISH (Denmark)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
DUTCH (The Netherlands)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
ENGLISH (United Kingdom)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
FRENCH (France, Belgium, Switzerland)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
GERMAN (Germany)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
GREEK (Greece)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
ITALIAN (Italy)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
NORWEGIAN (Norway)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
PORTUGUESE (Portugal)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
SPANISH (Spain)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
SWEDISH (Sweden)
Existing databases and their actors
Ongoing projects & new initiatives
Conclusion
General conclusions
Production costs
About this document ...
All trade marks are hereby acknowledged.
WWW Administrator
Fri May 19 11:53:36 MET DST 1995