
Use of POLYPHONE in application development

At present, a number of application development projects are under way in the Netherlands that employ ASR techniques. In this section we describe the way in which POLYPHONE helped to enable these projects.

Train time table information

In a collaborative project between PTT Research, Philips Research Aachen and the Netherlands Organization for Scientific Research (NWO), we are developing a Dutch version of the Train Time Table Information service that is already available for German and is described in another paper in these Proceedings. The system we have in mind can best be characterised as a guided mixed-initiative dialogue: the system asks specific questions, such as ``From where to where do you want to travel?'', but it allows the user to give under- and overinformative answers. If an answer is underinformative, the system asks explicit questions to elicit the missing information. When an answer is overinformative, for instance when the caller adds the desired arrival time to the departure and destination stations, the system tries to process that additional information too. If time and date information is not offered spontaneously, the system again asks explicit questions to obtain it.
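The dialogue strategy described above can be read as a form of slot filling. The following Python sketch illustrates that reading only; it is not the project's implementation. The slot names, the prompts other than the opening question, and the functions `update` and `next_prompt` are assumptions made for this example, and the caller's answer is assumed to have been parsed into slot/value pairs already.

```python
from typing import Dict, Optional

# Hypothetical slots, in the order in which the system would ask for them.
SLOTS = ["origin", "destination", "date", "time"]

# Opening question quoted in the text; it asks for two slots at once.
OPENING_PROMPT = "From where to where do you want to travel?"

# Hypothetical explicit questions, one per slot.
PROMPTS = {
    "origin": "From which station do you want to travel?",
    "destination": "To which station do you want to travel?",
    "date": "On which day do you want to travel?",
    "time": "At what time do you want to arrive?",
}


def update(state: Dict[str, str], parsed_answer: Dict[str, str]) -> Dict[str, str]:
    """Merge every known slot the caller supplied, whether asked for or not.

    Accepting unrequested slots is what handles overinformative answers.
    """
    new_state = dict(state)
    for slot, value in parsed_answer.items():
        if slot in SLOTS:
            new_state[slot] = value
    return new_state


def next_prompt(state: Dict[str, str]) -> Optional[str]:
    """Return the next question, or None when all slots are filled.

    Asking explicitly for the first missing slot is what handles
    underinformative answers.
    """
    if "origin" not in state and "destination" not in state:
        return OPENING_PROMPT
    for slot in SLOTS:
        if slot not in state:
            return PROMPTS[slot]
    return None
```

For example, an overinformative first answer that supplies both stations and an arrival time leaves only the date open, so `next_prompt` returns the date question; an answer that supplies only the departure station leads to the explicit destination question.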

Clearly, a Train Time Table Information System is an application intended for use by the general public. Moreover, most users will call the system only occasionally, so one cannot rely on users becoming acquainted with the peculiarities of the service. Pilot experiments carried out by the Nederlandse Spoorwegen, the Dutch Railway Company, have shown that members of the public who need time table information are neither able nor willing to deal with a menu-based interface. Thus, there seems to be no alternative to starting an automated service with a mixed-initiative dialogue system.

To implement a Dutch version of the Train Time Table Information System a number of steps must be taken, among them training the recogniser, building a phonemic lexicon, and building models for yes/no expressions and for time and date expressions.

The POLYPHONE corpus has been instrumental in all these steps.

Training of the recogniser

For the development of the phoneme-based recogniser we used the phonetically rich sentences in POLYPHONE. Following the approach that has proved successful for German, we started with a recogniser based on context-independent phone models. The recogniser was trained on the assumption that the automatic grapheme-to-phoneme transcription of the transliteration data is correct. That assumption is probably wrong to some extent: Dutch shows considerable pronunciation variation at the phonemic level. At the time of writing we are using the POLYPHONE recordings for an empirical investigation of the range of that variation. Until now, researchers have had to be content with rather subjective ideas about this crucial issue.

Building a phonemic lexicon

An essential part of the recognition engine in a Train Time Table system is a lexicon comprising phonemic representations of station names. Here too, there is non-negligible pronunciation variation. Since all station names have been read by at least five speakers, we can use the POLYPHONE recordings to make an inventory of the pronunciations. This is especially relevant for the names of the smaller stations, since pronunciation variants for larger stations can also be collected by other means.
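Such an inventory can be represented as a mapping from each station name to the pronunciation variants observed for it, together with their counts. The Python sketch below shows one possible data structure; the function names and the phonemic strings in the usage note are invented for illustration and are not taken from the POLYPHONE transcriptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_inventory(observations: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each phonemic transcription was observed per station name.

    `observations` yields (station_name, transcription) pairs, one per reading.
    """
    inventory: Dict[str, Counter] = defaultdict(Counter)
    for name, transcription in observations:
        inventory[name][transcription] += 1
    return dict(inventory)


def variants(inventory: Dict[str, Counter], name: str, min_count: int = 1) -> List[str]:
    """Return the variants of `name` seen at least `min_count` times,
    most frequent first. Unknown names give an empty list."""
    counts = inventory.get(name, Counter())
    return [t for t, c in counts.most_common() if c >= min_count]
```

With five hypothetical readings of one name, three transcribed as `"n EI m e: G @"` and two as `"n EI m e: G @ n"`, `variants` returns both forms with the more frequent one first; raising `min_count` to 3 keeps only the first. A threshold of this kind is one way to keep isolated misreadings out of the lexicon, although with only five or so readings per name it would have to be chosen with care.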

A model for yes/no expressions

Virtually every information dialogue contains yes/no questions. Previous applications of ASR in telephone information systems for the general public have shown that people answer these questions in quite varied ways. Since POLYPHONE contains four yes/no questions, all to be answered spontaneously, we have a substantial amount of data from which to build a model of the answers.

The analysis of the answers that we have performed so far confirms the existence of substantial variation; yet it appears that the very large majority of the expressions adhere to a simple schema, so that it is easy to build a model. We have seen a large difference between the two items for which we expected affirmative responses: almost 93% of the subjects used a single word (e.g. ja, jawel, jazeker) to confirm the assumption that Dutch was their native language; the proportion of one-word confirmations dropped to 75% for the question whether the caller was willing to participate in another recording session. Very few callers said ``no'', but the way in which they expressed their confirmation was much more varied.

83% of the subjects used a single word (e.g. nee, neen) to deny that they had ever lived abroad for an extended period of time. Most of the people who used more complicated expressions did so to tell us in which foreign countries they had lived. 80% of the callers used a single word to deny that they were using a cordless phone; over 13% of the callers said they were using a cordless phone.

A detailed analysis of the more verbose answers showed that only a very small proportion of the affirmative answers contained no-words and that the same is true for negative answers and yes-words.
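The observation that yes-words and no-words rarely co-occur suggests that simple keyword spotting would already classify most answers. The Python sketch below illustrates that idea; it is not the model used in the project. The word lists contain only the examples quoted in the text and would be far from complete in practice, and the function names are assumptions.

```python
import re

# Only the example words quoted in the text; a real list would be longer.
YES_WORDS = {"ja", "jawel", "jazeker"}
NO_WORDS = {"nee", "neen"}


def tokenize(answer: str) -> list:
    """Lower-case the answer and split it into word tokens."""
    return re.findall(r"\w+", answer.lower())


def classify(answer: str) -> str:
    """Return 'yes', 'no' or 'unknown' based on which keyword set occurs.

    Answers containing both kinds of keyword, or neither, are left undecided.
    """
    tokens = tokenize(answer)
    has_yes = any(t in YES_WORDS for t in tokens)
    has_no = any(t in NO_WORDS for t in tokens)
    if has_yes and not has_no:
        return "yes"
    if has_no and not has_yes:
        return "no"
    return "unknown"


def is_single_word(answer: str) -> bool:
    """True if the answer consists of exactly one word token."""
    return len(tokenize(answer)) == 1
```

Under these assumptions `classify("Jazeker")` gives `"yes"`, `classify("nee hoor")` gives `"no"`, and an answer containing both a yes-word and a no-word is returned as `"unknown"` so that it can be handled separately.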

Another observation worth mentioning is that politeness forms like ``yes, sir'' or ``no, ma'am'' were virtually absent. This may be because the yes/no questions were located in the last part of the recording session, by which time the callers should have been fully aware that they were talking to a computer. However, it is also possible that what we see reflects the growing casualness in Dutch society, where `speaking with two words' is quickly becoming the exception rather than the rule.

All these observations confirm our expectation that the NLP module in our system should be able to handle the large majority of the yes/no expressions that will be used by the callers. In confirmation sub-dialogues in an information system (e.g. after the caller has given departure and/or destination station) the language model expects an affirmative expression, but negations may occur due to errors of the recogniser. The POLYPHONE corpus contains a number of examples of negations where confirmations were expected. We are working on a closer analysis of these cases, to find out whether they contain systematic syntactic structures that could help in making the language model more specific.

Time and date expressions

Previous experiments with information and reservation systems have shown that, quite surprisingly, linguists do not have accurate models of the way in which people express times and dates. The POLYPHONE corpus contains a large number of these expressions. Currently we are analysing the syntax of these expressions in order to build models for use in the Train Time Table Information system. Unfortunately, the expressions used by the POLYPHONE speakers are to a large extent determined by the way in which the items were printed on the response sheets. No spontaneous expressions of dates or times were obtained. This will make it very difficult to derive reliable estimates of the relative frequency with which individual expressions will occur.

Phone card services

PTT Telecom has started a campaign to promote Operator Services, like Collect Call and Phone Card calls. The PTT Telecom Phone Card is marketed under the name of Scopecard. As in other countries, operator services are expensive, mainly because of the high cost of personnel. PTT Telecom is therefore looking for ways in which these services can be automated.

One simple way of automating Phone Card calls is to connect customers with an IVR platform that handles the recognition of the card number, the PIN code and the number to be dialled via DTMF; this is how a large proportion of domestic calls are automated. However, few operators offer the capability to connect toll-free lines to an IVR application in another country. Also, in many of the countries where PTT Telecom customers spend their vacations, rotary-dial phones are still in the large majority. Thus, large-scale automation of card services implies the deployment of ASR. The same goes for automatic collect calls, of course.

Automating card services

For applications like automated card services it is not enough to have a recogniser that can handle isolated or connected digits in abstracto. For the real-world performance of the application, solid knowledge of the way in which customers pronounce card and phone numbers is at least as important, since that knowledge can be exploited in designing application-specific language models. POLYPHONE has provided us with a rich source of information about the way in which Dutch people express these numbers.

Two items in POLYPHONE that are related to telephone numbers were analysed. The first pertains to numbers read from the response sheet. All these numbers were printed in the same format, i.e., area code, dash, subscriber number (e.g. 020-5252183). The second item consists of answers to the question ``Please, say a familiar telephone number.'' In discussing the results we will use the term digit for the words zero, one, ..., nine; the term number will denote numbers between 10 and 99.

At present, the Dutch PTT's number plan has two groups of area codes, one comprising three digits (like 020 in the example above) and one comprising five digits (e.g. 08894). Subscriber numbers can have four to seven digits. Because the transliteration does not include intonation markers, it is not possible to discriminate between three- and five-digit area codes. We doubt whether the transliterators would have been able to parse all answers to the request to give a familiar number correctly.
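The ambiguity can be made concrete by enumerating every split of an unpunctuated digit string that is consistent with the lengths given above. The Python sketch below uses only those length constraints (area codes of three or five digits, subscriber numbers of four to seven digits); the function name is an assumption, and a real number plan would add a list of valid area codes, which is not modelled here.

```python
from typing import List, Tuple

AREA_CODE_LENGTHS = (3, 5)          # lengths stated in the text
SUBSCRIBER_LENGTHS = range(4, 8)    # four to seven digits


def possible_splits(digits: str) -> List[Tuple[str, str]]:
    """Return every (area code, subscriber number) split allowed by length alone."""
    if not digits.isdigit():
        raise ValueError("expected a string of digits only")
    splits = []
    for n in AREA_CODE_LENGTHS:
        area, subscriber = digits[:n], digits[n:]
        if len(area) == n and len(subscriber) in SUBSCRIBER_LENGTHS:
            splits.append((area, subscriber))
    return splits
```

For the example number with the dash removed, `possible_splits("0205252183")` yields both `("020", "5252183")` and `("02052", "52183")`. Length alone therefore cannot decide between a three-digit and a five-digit area code, which is the problem a transliteration without intonation markers leaves open.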

The format of the read numbers is quite different from the format of spontaneously produced familiar numbers: in read numbers the proportion of digits is much larger than in spontaneously pronounced numbers. It is also worth mentioning that 18% of the read and 23% of the spontaneous numbers contain extra sounds, far more often preceding the number than following it.
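The proportion of digits in a transliterated telephone number can be computed by classifying each word as a digit (0 to 9) or a number (10 to 99) in the sense defined above. The Python sketch below generates the Dutch words for 10 to 99 and counts both classes. It assumes standard Dutch spelling, including the diaeresis in forms such as "tweeëntwintig", which may differ from the POLYPHONE transliteration conventions; the function names and the utterance in the usage note are invented for illustration.

```python
from typing import List, Optional, Set

DIGITS = ["nul", "een", "twee", "drie", "vier", "vijf", "zes", "zeven", "acht", "negen"]
TEENS = ["tien", "elf", "twaalf", "dertien", "veertien", "vijftien",
         "zestien", "zeventien", "achttien", "negentien"]
TENS = ["twintig", "dertig", "veertig", "vijftig", "zestig",
        "zeventig", "tachtig", "negentig"]


def number_words() -> Set[str]:
    """Build the Dutch words for 10..99 (unit + 'en' + tens for compounds)."""
    words = set(TEENS) | set(TENS)
    for tens in TENS:
        for unit in DIGITS[1:]:
            # After a unit ending in 'e' the linking 'en' is written 'ën'.
            link = "ën" if unit.endswith("e") else "en"
            words.add(unit + link + tens)
    return words


NUMBER_WORDS = number_words()


def digit_proportion(tokens: List[str]) -> Optional[float]:
    """Share of digit words among all digit and number words in `tokens`.

    Other tokens (hesitations, carrier words) are ignored; returns None
    if the utterance contains no digit or number words at all.
    """
    n_digits = sum(1 for t in tokens if t in DIGITS)
    n_numbers = sum(1 for t in tokens if t in NUMBER_WORDS)
    total = n_digits + n_numbers
    return n_digits / total if total else None
```

For an invented utterance such as "nul twintig vijf vijfentwintig een drieëntachtig", three of the six words are digits and three are numbers, so `digit_proportion` returns 0.5, whereas a number read entirely digit by digit would give 1.0.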

POLYPHONE provides similar information about the way in which long card numbers and shorter PINs are pronounced. At the time of writing these expressions are still under analysis.

Speaker verification

Advanced telephone applications will inevitably grow from pure information systems to mixed information and transaction systems. Security and fraud prevention then become major issues. Card Services of Dutch PTT is investigating the possibility of using speaker verification as one means of combating fraud. In its original specification the POLYPHONE corpus is not suited for research into speaker recognition. Nijmegen University, in collaboration with the Dutch National Forensic Science Laboratory, has therefore made additional recordings of 100 speakers who each called eight times, using different handsets. Half of the speakers were recruited from the Nijmegen area, the other half from The Hague, in order to minimise dialect differences. Also, the speakers form 50 pairs of brothers or of father and son, in order to allow us to investigate whether speaker recognition techniques can be fooled by close relatives.





WWW Administrator
Fri May 19 11:53:36 MET DST 1995