
The Dutch POLYPHONE corpus

Recording workstation

The recording workstation used for POLYPHONE was an OS/2 PC equipped with an Aculab telephone interface, a Rhetorex Voice Card with driver software, Show-'n-Tel application development software, and a 16-port operational license. Each recorded item was stored in a separate file, and all files were copied to a Unix network for transliteration and archiving.

The recording platform was set up to record and store the speech signals in A-law format. Because the Dutch PSTN is completely digital, the acoustic quality of the recordings is determined by the characteristics of the caller's local loop and the background noise in the caller's location.
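Before the recordings can be analysed as waveforms, the 8-bit A-law codes have to be expanded to linear PCM. The sketch below follows the standard ITU-T G.711 A-law expansion, as in the widely used public-domain g711.c. It assumes that each file is a headerless stream of A-law bytes, one byte per sample. The text above does not describe the file layout, so that part is an assumption. The names `alaw_to_linear`, `ALAW_TABLE` and `decode_alaw_bytes` are hypothetical.

```python
def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a 16-bit linear PCM sample."""
    code ^= 0x55                    # undo the inversion of alternate (even) bits
    mantissa = (code & 0x0F) << 4   # 4-bit quantisation step within the segment
    segment = (code & 0x70) >> 4    # 3-bit segment (exponent) number
    if segment == 0:
        sample = mantissa + 8
    else:
        sample = (mantissa + 0x108) << (segment - 1)
    # In A-law a set sign bit (bit 7) denotes a positive sample.
    return sample if code & 0x80 else -sample

# Precompute all 256 expansions once; decoding is then a table lookup.
ALAW_TABLE = [alaw_to_linear(c) for c in range(256)]

def decode_alaw_bytes(data: bytes) -> list:
    """Decode a headerless A-law byte string (assumed layout) to linear samples."""
    return [ALAW_TABLE[b] for b in data]

print(decode_alaw_bytes(bytes([0xD5, 0x55, 0xAA, 0x2A])))  # [8, -8, 32256, -32256]
```

The lookup table is the usual design choice here, because there are only 256 possible codes. 0xD5 and 0x55 are the smallest positive and negative steps, and 0xAA and 0x2A are the extremes of the range.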

Speaker selection

Prospective callers received a personalised letter. Originally, we aimed to collect 5000 speakers, uniformly distributed over a large number of cells defined by four criteria, viz. (1) geographical region, (2) socioeconomic status, (3) sex, and (4) age. It should be emphasised that the uniform sampling of the cells was mainly motivated by scientific arguments: to find the funds for creating the corpus, it was necessary to make it attractive for a wide range of linguistic research, including sociolinguistics and dialectology. Some of the speakers in our corpus will perhaps not be heavy users of the automated services that can be developed with the Dutch POLYPHONE corpus. However, we trust that wide coverage of language and speech behaviour will lead to applications that are more robust than could have been obtained with recognisers trained on much more restricted speech material.

Geographical region, operationalised as the province in which the speaker lives, is the best practically feasible approximation to regional accent and dialect background. By sampling provinces, we sidestep the unsolved problems of how many regional accents should be distinguished and how they should be defined. Because the population is very unevenly distributed over the provinces, it proved practically impossible to obtain equal numbers of speakers from each province [3,4].

Socioeconomic status is difficult to define, and even more difficult to assess reliably from what respondents are willing to say. We therefore approximated status by the education level of the respondents. We distinguished three levels, viz. (1) only primary school, (2) secondary school and (3) college/university. In hindsight, this division was somewhat unfortunate: in formal terms almost every person younger than about 60 has been to school until at least the age of 16, so only a very small proportion of the population falls into the first category. It is therefore not surprising that we were able to recruit very few speakers who said that they had no more than primary school. The numbers in the remaining two classes are approximately equal.

We distinguish four age classes, viz. 20 and younger, 21 to 40, 41 to 60, and 61 and older. Age is derived from the year of birth, which respondents are asked to give. Since we set a minimum age of 16 for participation, the youngest group covers only a few birth years and is much smaller than the other groups. The group of 61 and older is also underrepresented. The group between 21 and 40 is about 50% larger than the group between 41 and 60.
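The four sampling criteria can be combined into a cell key for each speaker. The minimal sketch below does this and also counts the cells to show how thinly 5000 speakers spread over them. It assumes that each of the twelve Dutch provinces is a separate region. The text states that regions are provinces but not how many were used, so that count is an assumption. All names (`Education`, `age_class`, `cell_key`, `N_PROVINCES`) are hypothetical. Because only the year of birth is recorded, age can be approximated only to within one year.

```python
from enum import Enum

class Education(Enum):
    """Education level, used as a proxy for socioeconomic status."""
    PRIMARY = 1     # only primary school
    SECONDARY = 2   # secondary school
    HIGHER = 3      # college/university

AGE_CLASSES = ("20 and younger", "21-40", "41-60", "61 and older")
SEXES = ("female", "male")
N_PROVINCES = 12  # assumption: each of the twelve Dutch provinces is one region

def age_class(birth_year: int, recording_year: int) -> str:
    """Assign an age class from the reported year of birth.

    Only the birth year is known, so age is approximated as the difference
    in years; for speakers who have not yet had their birthday in the
    recording year, this overstates the true age by one.
    """
    age = recording_year - birth_year
    if age <= 20:
        return AGE_CLASSES[0]
    if age <= 40:
        return AGE_CLASSES[1]
    if age <= 60:
        return AGE_CLASSES[2]
    return AGE_CLASSES[3]

def cell_key(province: str, education: Education, sex: str,
             birth_year: int, recording_year: int) -> tuple:
    """Stratification cell a speaker belongs to (hypothetical helper)."""
    return (province, education, sex, age_class(birth_year, recording_year))

n_cells = N_PROVINCES * len(Education) * len(SEXES) * len(AGE_CLASSES)
print(n_cells, round(5000 / n_cells, 1))  # 288 17.4
```

Under this assumption, a uniform design would call for only about 17 speakers per cell. That makes the shortfalls described above (very few primary-school-only speakers, a small youngest group) quickly visible as nearly empty cells.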

The speech material

The speech material recorded in the POLYPHONE project consists of 32 read items, 14 extemporaneous answers to printed questions, and 4 extemporaneous answers to questions not printed on the response sheet.

The material to be read consists of the following items:

The following printed questions are asked:

The following unprinted questions are asked:

Postprocessing

Postprocessing was done at the Dept. of Language & Speech, Nijmegen University, using software running under MS-Windows on a PC equipped with a Pro-Audio board. Whenever the answer is predictable (i.e., in all cases where the caller is supposed to read preprinted material), the expected answer is displayed on the screen.

Postprocessing consists of four steps, viz. (1) word-by-word transliteration of all items, (2) transliteration of extra sounds and noises, (3) collecting demographic data, (4) assessing the quality of all items. The students who carried out the work were instructed to do the tasks in exactly this order. On average, a recording session took slightly less than 20 minutes to process.
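The workflow constraints above are that the four steps must be done in a fixed order, and that the expected text is shown for predictable items. They can be captured in a small data model. This is only an illustrative sketch and not the Nijmegen software. The step labels, the classes `Item` and `Session`, and the method names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four postprocessing steps, in the order the students had to follow.
# The short labels are hypothetical names chosen for illustration.
STEPS = ("transliterate_words", "transliterate_noises",
         "collect_demographics", "assess_quality")

@dataclass
class Item:
    prompt: Optional[str]  # preprinted text for read items; None if unpredictable

    def screen_text(self) -> str:
        """Text shown to the transcriber: the expected answer, if there is one."""
        return self.prompt if self.prompt is not None else ""

@dataclass
class Session:
    items: List[Item]
    done: List[str] = field(default_factory=list)

    def complete(self, step: str) -> None:
        """Mark a step as done, refusing any step taken out of order."""
        if len(self.done) == len(STEPS):
            raise ValueError("all steps already completed")
        expected = STEPS[len(self.done)]
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.done.append(step)

    @property
    def finished(self) -> bool:
        return len(self.done) == len(STEPS)

# Hypothetical session: one read item and one extemporaneous answer.
session = Session([Item("een twee drie"), Item(None)])
session.complete("transliterate_words")
session.complete("transliterate_noises")
# session.complete("assess_quality") would raise ValueError at this point.
```

Enforcing the order in the tool, rather than relying only on instructions to the students, is one way to guarantee that quality is assessed only after the transliterations it depends on are finished.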

We have observed that the sentences contain substantially more disfluencies than the other items.





WWW Administrator
Fri May 19 11:53:36 MET DST 1995