Current hardware uses a variety of in-house or more widely adopted standards. For example, coding is 8 or 16 bits at Apple (Mac) and on PCs, U-LAW 8 or 16 bits at Sun (Sparc), Next, VAX and DEC, and U-LAW or A-LAW at HP. Available sampling rates are often limited to 8 kHz in the Unix world, but higher rates may be available in the PC (DOS/Windows) and Mac worlds, depending on the current or professional I/O boards installed.
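To make the difference between linear and U-LAW coding concrete, the following is a minimal sketch of the continuous mu-law companding curve with mu = 255, followed by a uniform 8-bit quantisation. It is an illustration only: the function names are hypothetical, and the code does not reproduce the bit-exact segmented encoding used by actual telephony codecs or by any particular workstation.

```python
import math

MU = 255  # companding parameter commonly associated with 8-bit mu-law


def mulaw_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_8bit(x: float) -> int:
    """Clip, compand, then quantise uniformly to a code in 0..255."""
    y = mulaw_compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1) / 2 * 255))


def decode_8bit(code: int) -> float:
    """Map a code in 0..255 back to an approximate linear sample."""
    return mulaw_expand(code / 255 * 2 - 1)
```

Because the curve is logarithmic, small amplitudes receive finer quantisation steps than large ones, which is why 8 companded bits can serve for speech where 8 linear bits would be noticeably noisier at low levels.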
File formats are often identified by the filename extension they bear. Computer manufacturers such as Next and Sun deal with .au (AU) or .snd (SND) files, Apple and Silicon Graphics with .aif (AIF); I/O board manufacturers may promote their own format (such as .voc for the SoundBlaster board), and Microsoft, the developer of the Windows operating system, of course tries to impose its .wav (WAVE) format. The situation is further complicated by the encoding method (linear, compressed, data and information intermingled...), and even for the same filename extension the implementation may vary somewhat between operating systems (WAVE in Windows or Unix environments, SND in Next or PC/Mac environments). A standardisation initiative has come through the development of the Internet, which promotes an interchange format called MIME.
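Since the extension alone is not a reliable guide, a conversion tool may prefer to inspect the first bytes of the file. The sketch below checks the well-known signatures of the formats named above; the function name and the returned labels are hypothetical, and the check says nothing about the encoding of the samples that follow the header.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess the container format from the first 20 or more bytes of a file."""
    # Sun/Next files start with the four ASCII characters ".snd".
    if header[:4] == b".snd":
        return "AU/SND"
    # RIFF WAVE: "RIFF", a 4-byte length, then the form type "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAVE"
    # AIFF (and its compressed variant AIFC) uses an IFF "FORM" chunk.
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "AIFF"
    # SoundBlaster VOC files start with a fixed text signature.
    if header.startswith(b"Creative Voice File\x1a"):
        return "VOC"
    return "unknown"
```

A file named with a .snd extension that does not begin with ".snd" would be reported as unknown here, which reflects the point made above that the same extension covers different implementations.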
A major example of the constraints imposed on the speech research community by the market can be seen in the implications of multimedia standard development in the PC world.
Multimedia standard
The world of PCs has evolved considerably during the past few years along two relevant dimensions:
The question is now whether these current boards, primarily dedicated to audio output, can satisfy the needs of speech research and applications in terms of:
(*)(**) Using I/O boards without a DSP implies that some signal processing will be offloaded to the PC (speech level detection, min/max measurement, possible over- or under-sampling). These on-line procedures, together with on-line format conversion routines, could increase the CPU load to such an extent that a low-end SESAM workstation might not be able to run at a high speech sampling rate, for example, or with two channels.
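The kind of per-sample work that falls to the host CPU in this case can be sketched as follows, assuming 16-bit linear samples held in a list. The function names are hypothetical, the level is expressed in dB relative to full scale, and the under-sampling step uses a crude two-point average in place of a proper anti-aliasing filter.

```python
import math


def min_max(samples):
    """Return the smallest and largest sample value in a block."""
    return min(samples), max(samples)


def rms_level_db(samples, full_scale=32768):
    """RMS level of a block in dB relative to full scale (0 dB = full scale)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / full_scale ** 2)


def decimate_by_2(samples):
    """Halve the sampling rate by averaging each pair of adjacent samples."""
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples) - 1, 2)]
```

Each of these routines touches every sample at least once, so their cost grows in proportion to the sampling rate and to the number of channels, which is the load concern raised above.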
One issue is backward compatibility with existing databases; another is WHICH format is going to be THE STANDARD, i.e. the worldwide audio/computer/speech standard. This topic is to be considered during the SPEECHDAT project, but it is foreseen that no single standard will emerge and that conversion routines will remain a major issue. Many tools are available but, as an example, even for the RIFF WAVE format the conversion between the Windows and Unix worlds is anything but trivial. At the moment, it is not certain whether the interchangeable standard I/O boards currently on the market will satisfactorily meet the needs of speech research; this depends on the target application.
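One likely source of the difficulty with RIFF WAVE is byte order: the format stores 16-bit samples least significant byte first, whereas big-endian Unix workstations such as the Sparc expect the most significant byte first. The sketch below shows the sample data conversion only, under the assumption of 16-bit linear samples; the function name is hypothetical, and header fields and compressed encodings would need separate handling.

```python
import struct


def le16_to_be16(data: bytes) -> bytes:
    """Convert 16-bit samples from little-endian to big-endian byte order."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    count = len(data) // 2
    # Decode as little-endian signed 16-bit integers, re-encode big-endian.
    samples = struct.unpack("<%dh" % count, data)
    return struct.pack(">%dh" % count, *samples)
```

Applying the same swap to data that is already big-endian, or to 8-bit samples, silently corrupts the signal, so a converter has to know the source byte order and sample width before it starts.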