State of the Art: Babelware for the Desktop

Personal computer and workstation translation software offers you the most affordable access to MT

L. Chris Miller

Now that personal computers and workstations are powerful enough to run MT (machine translation) software, many MT products are becoming available for the desktop. Some applications have migrated from the mainframe; others are new and are designed for desktop use.

MT software for personal computers translates language sentence by sentence, using AI or linguistic rules to deal with syntax and grammar. Sets of rules or algorithms enable verb conjugation, syntax adjustment, gender and number agreement, and word reordering.

MT software will process your document in either a batch mode or an interactive mode. The interactive mode might ask you to choose among multiple translations of a word or allow you to choose from a list of synonyms, or it may translate one sentence at a time and pause to let you postedit the output on-line (see screen 1).

Retail prices for these packages range from $79 to $1200. The languages that are available include Arabic, Danish, Dutch, Finnish, French, German, Greek, Italian, Japanese, Korean, Russian, Spanish, and Swedish. Translation software for Portuguese, Chinese, and Norwegian is being developed. The software is generally sold in language pairs (e.g., Spanish to English or English to Spanish). The Language Assistant Series 5.0 from MicroTac Software (San Diego, CA) and GTS-Basic 1.0 and GTS-Professional 3.0 from Globalink (Fairfax, VA) are sold in bidirectional units (e.g., Spanish to English and English to Spanish).

Toltran (Barrington, IL) uses a patented modular-language-translation concept for its Professional Translation System 2.0. In this approach, a language is sold as either a source-language module or a target-language module. Any source-language module can be translated into any target-language module.

MT software for personal computers typically runs under MS-DOS and requires 640 KB of RAM and from 1.5 to 15 MB of space on your hard disk. You must have a VGA card and a VGA monitor to translate languages with graphical characters (e.g., Russian and Japanese). For example, EJ Bilingual (Torrance, CA) requires one expansion slot for the KanjiBoard included with EZ JapaneseWriter 1.09.

Mac owners have limited options. The Translator 2.0 by Catena (Tokyo, Japan) runs on a Mac using a Japanese operating system called KanjiTalk. Unfortunately, the only way to run many MT software packages on a Mac is by using a DOS environment emulator, such as Insignia Solutions' (Mountain View, CA) SoftPC. Under these conditions, the software will run much slower than in a true DOS environment.

The translation speed varies from 10,000 to 30,000 words per hour on a 16-MHz 386 computer. The smaller programs can easily be placed in RAM to increase the speed. Translation may also become faster if the software temporarily stores previously found words in a buffer or in RAM.
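To put these speeds in perspective, here is a minimal sketch of the arithmetic. The 60,000-word document size is an assumption chosen for illustration; only the two speeds come from the figures quoted above.

```python
def translation_hours(word_count, words_per_hour):
    """Return the hours needed to translate word_count words at a given speed."""
    if words_per_hour <= 0:
        raise ValueError("words_per_hour must be positive")
    return word_count / words_per_hour

# Hypothetical document size (an assumption, not a figure from the article).
DOCUMENT_WORDS = 60_000

slow = translation_hours(DOCUMENT_WORDS, 10_000)  # slowest quoted speed
fast = translation_hours(DOCUMENT_WORDS, 30_000)  # fastest quoted speed
print(f"{DOCUMENT_WORDS} words: {fast:.1f} to {slow:.1f} hours")
# prints "60000 words: 2.0 to 6.0 hours"
```

The estimate covers raw machine time only; postediting is not included.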

MT Possibilities for Your Desktop

All MT products for personal computers allow you to send ASCII text files to be read and translated. Some systems link directly to your word processing programs via a menu to simplify the conversion of text to and from ASCII files. A few systems can process WordPerfect and other leading word processor files. And several companies offer products that retain the formatting codes of the original document. This is an important timesaving feature, because attributes such as boldfaced type, underlining, and chart and table formats are restored in the output.

The quality of your output is dependent on the dictionaries (sometimes called lexicons) that are included in the software. A core, or general, single-word dictionary (i.e., one with 20,000 to 80,000 canonical terms) is standard. Most programs also include a multiple-word dictionary that stores phrases and idiomatic expressions.

A Different Kind of Dictionary

MT dictionaries provide grammatical information regarding the use of words and phrases. The computer uses the information to enact the rules or algorithms necessary to convert the text in the source language into intelligible output in the target language (see screen 2). Subject-specific dictionaries are available for technical areas, such as finance or law.
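As a rough illustration, a dictionary entry of this kind can be pictured as a small record. The sketch below is an assumption about shape only: the class and field names are hypothetical, and the values are the ones shown for the verb tener in screen 2. No product's actual file format is implied.

```python
from dataclasses import dataclass, field

@dataclass
class DictionaryEntry:
    """One MT dictionary entry (hypothetical layout, for illustration only)."""
    headword: str
    part_of_speech: str
    morph_rule: str  # code that selects the conjugation or inflection pattern
    translations: list = field(default_factory=list)

# Values taken from the screen 2 caption: tener, verb, rule code 3B, "to have".
tener = DictionaryEntry("tener", "verb", "3B", ["have"])
print(tener.headword, tener.part_of_speech, tener.morph_rule, tener.translations)
```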

If you regularly translate documents from more than one technical area, you'll find it useful to have a feature in your system that allows you to stack dictionaries. This enables you to define the search order according to the text you are translating. PC-Translator by Linguistic Products (The Woodlands, TX) allows you to stack up to 10 single-word dictionaries and 10 phrase dictionaries.
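Dictionary stacking amounts to an ordered search: the first dictionary in the stack that contains the term wins. The sketch below assumes plain in-memory dictionaries and hypothetical Spanish-to-English entries; it is not PC-Translator's implementation.

```python
def lookup(term, stack):
    """Return (translation, dictionary_name) from the first dictionary that has term."""
    key = term.lower()
    for name, entries in stack:
        if key in entries:
            return entries[key], name
    return None, None  # term is in none of the stacked dictionaries

# Hypothetical entries, for illustration only.
general = {"banco": "bench", "casa": "house"}
finance = {"banco": "bank", "acción": "share"}

# Search order for a financial text: subject dictionary first, then general.
stack = [("finance", finance), ("general", general)]

print(lookup("banco", stack))  # prints ('bank', 'finance')
print(lookup("casa", stack))   # prints ('house', 'general')
```

Reversing the stack would make "banco" come out as "bench", which is why the search order should match the text being translated.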

Creating your own dictionary or customizing the one included in your software is essential with any MT product for personal computers, because it lets you add your own terminology to the program. PC-Translator simplifies the creation of your own dictionaries by importing lists of terms in ASCII format directly into the software. In addition, MT software generates lists of words not found in a given text to help you customize your software. You decide which words and phrases to add to the dictionaries. The ability to add, delete, or modify dictionary entries dramatically improves the quality of a translation and reduces the time spent postediting an output. You'll find that it can take from two to four weeks to customize a system.
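The not-found list is simple to picture. The following minimal sketch assumes the dictionary is just a set of known lowercase words and that a word is any run of letters; real products also handle inflected forms and phrases, which this ignores.

```python
import re
from collections import Counter

def unknown_words(text, dictionary):
    """Count the words in text that have no entry in dictionary."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(w for w in words if w not in dictionary)

# Hypothetical dictionary and sentence, for illustration only.
known = {"el", "al", "paquete", "remoto"}
text = "El servidor envía el paquete al servidor remoto."

print(unknown_words(text, known).most_common())
# prints [('servidor', 2), ('envía', 1)]
```

Sorting by frequency puts the terms that are most worth adding at the top of the list.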

All these systems ask you to enter the part of speech of the word you are adding and to provide its translation. With extensive dictionary coding, the system can deal with ambiguities that arise from words that can serve as more than one part of speech. For example, the program will recognize the different translations of a homograph (i.e., a word that is spelled like another but has a different meaning or pronunciation) used as a verb and as a noun in the same sentence (e.g., "The can can explode"). Because Globalink's GTS-Professional can classify "can" as both a verb and a noun, it's better able to translate the sentence than a product that requires less dictionary coding.
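The following toy sketch shows why coding both parts of speech matters. It assumes a three-word hypothetical lexicon and a single hand-written rule (an ambiguous word that follows a determiner is a noun, otherwise a verb). It is not Globalink's method, and the Spanish output is a word-for-word gloss rather than a finished translation.

```python
# Hypothetical English-to-Spanish lexicon: part of speech -> translation.
LEXICON = {
    "the": {"det": "la"},
    "can": {"noun": "lata", "verb": "puede"},
    "explode": {"verb": "explotar"},
}

def tag(words):
    """Assign one part of speech per word using a single toy rule."""
    tags = []
    prev = None
    for w in words:
        options = LEXICON[w.lower()]  # raises KeyError for unknown words
        if len(options) == 1:
            pos = next(iter(options))
        elif prev == "det":
            pos = "noun"  # an ambiguous word right after a determiner
        else:
            pos = "verb"
        tags.append(pos)
        prev = pos
    return tags

def gloss(sentence):
    """Translate word by word, choosing each translation by its tag."""
    words = sentence.split()
    return " ".join(LEXICON[w.lower()][t] for w, t in zip(words, tag(words)))

print(gloss("The can can explode"))  # prints "la lata puede explotar"
```

With only one part of speech coded for "can", both occurrences would receive the same translation.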

Workstation-Based MT Products

MT workstation products are designed to handle heavy volume, for instance when you have to translate 2000 or more pages of text per year. Translation speeds range from 20,000 to 1 million words per hour.

A workstation MT system is a large investment. Software prices start at $10,000. A system can cost several hundred thousand dollars, and pricing structures are as diverse as the possible configurations.

Socatra (Quebec, Canada) spent over 12 years preparing its XLT computer-assisted translation system for the commercial market. Access to XLT is uniquely controlled by the company. Socatra rents software for a specific number of words. After you pay an initial subscription, Socatra provides you with the software and an access card, which resembles a credit card. The card contains a microprocessor that counts the words translated and acts as a security device. You can obtain an XLT card for word amounts that range from 100,000 to 1 million.
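The metering idea can be sketched as a counter that is decremented with each job. This is an illustration only: the class name, the whitespace-based word count, and the refusal behavior are assumptions, not a description of how Socatra's card works.

```python
class WordMeter:
    """Track a prepaid word allowance (hypothetical model of a metered card)."""

    def __init__(self, purchased_words):
        self.remaining = purchased_words

    def charge(self, text):
        """Deduct the words in text from the allowance and return the count."""
        count = len(text.split())  # assumption: words are whitespace-separated
        if count > self.remaining:
            raise RuntimeError("not enough words left on the card")
        self.remaining -= count
        return count

card = WordMeter(100_000)  # smallest allowance mentioned above
card.charge("Le système traduit ce document")
print(card.remaining)  # prints 99995
```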

Tovna Machines (Washington, DC) gives you a perpetual license for Tovna MTS 1.0. Other MT workstation companies offer host, site, or floating licenses. You can obtain Systran from Systran Translation Systems (La Jolla, CA) with monthly or yearly leases.

MT workstation products support various operating environments, including Unix, Xenix, and MVS (Multiple Virtual Storage). Smart Translators by Smart Communications (New York) runs under Windows 3.1. The OEM Personal/370 Adapter/A (P/370), which was scheduled to ship in December, will make it possible for Systran to run on stand-alone PS/2s. And Multilingual Document Translation Software 7.0 from Logos (Mt. Arlington, NJ) supports MS-DOS and Mac users via LANs or WANs (wide-area networks).

The assortment of language pairs for MT workstation products is impressive (see the table). Systran offers 27 language pairs and 20 technical glossaries. Most operational language pairs have about 100,000 entries in their dictionary. A Russian-to-English dictionary typically contains 500,000 entries. With Smart Translators, you can choose between Castilian Spanish and Latin American Spanish or between European French and Canadian French. Many systems offer non-English language pairs.

MT workstation products are generally well integrated into the document-production process. Converters preserve the formatting codes from many software packages. Unix-based converters can preserve codes for Interleaf and FrameMaker, and DOS-based converters preserve codes for WordPerfect, Word, and Ventura Publisher.

DP/Translator by Intergraph (Huntsville, AL), for example, supports ASCII, tagged ASCII, SGML (Standard Generalized Markup Language), Microsoft Word RTF (Rich Text Format), QuarkXPress tagged, FrameMaker MIF (Maker Interchange Format), and Troff (Unix-based text format). Formatting codes for such things as headings, footnotes, and columns remain intact in the translated output. Documents ready to be translated can be queued or batch-processed during off-peak hours.

Tovna MTS's developers say that you can teach this software by example and by rule. As the system processes translations, it watches the changes you make to the document, remembers them, and uses the new information.
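In its simplest form, remembering corrections can be pictured as a lookup that takes priority over the engine's own output. The sketch below is a loose analogy under stated assumptions: the class name is hypothetical, the underlying engine is a stand-in function, and corrections are matched on whole sentences only. Tovna MTS's actual learning mechanism is not described in enough detail here to reproduce.

```python
class CorrectionMemory:
    """Wrap a translate function and reuse the user's postedits."""

    def __init__(self, translate):
        self.translate = translate  # stand-in for the real MT engine
        self.corrections = {}       # source sentence -> corrected output

    def record(self, source, corrected):
        """Remember the user's corrected translation of a sentence."""
        self.corrections[source] = corrected

    def __call__(self, source):
        if source in self.corrections:
            return self.corrections[source]
        return self.translate(source)

# str.upper is a placeholder engine so that the example runs on its own.
mt = CorrectionMemory(str.upper)
print(mt("hola"))   # prints "HOLA"
mt.record("hola", "hello")
print(mt("hola"))   # prints "hello"
```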

The dictionaries supplied with MT workstation products allow deep coding for each term, so they can work with words with multiple definitions. You can add descriptors to help the computer distinguish among the many possibilities. Descriptors are attributes used in coding a dictionary entry and include terms such as animate/inanimate, human/machine, place, or time.
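One way to picture descriptors at work is as sets of attributes that are matched against what the system knows about the surrounding context. The sketch below assumes a hypothetical English-to-Spanish entry for "driver" and a simple overlap score; actual products use their own coding schemes and selection rules.

```python
# Hypothetical entry: each sense carries a translation and its descriptors.
SENSES = {
    "driver": [
        {"translation": "conductor", "descriptors": {"animate", "human"}},
        {"translation": "controlador", "descriptors": {"inanimate", "machine"}},
    ],
}

def choose_sense(term, context):
    """Pick the sense whose descriptors overlap most with the context set."""
    senses = SENSES[term]
    # max() returns the first-listed sense when the overlap scores are tied.
    best = max(senses, key=lambda s: len(s["descriptors"] & context))
    return best["translation"]

print(choose_sense("driver", {"machine"}))  # prints "controlador"
print(choose_sense("driver", {"human"}))    # prints "conductor"
```

When the context supplies no matching descriptor, the first-listed sense acts as the default, so the order of senses in an entry matters.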

Coding a lexicon this sophisticated requires an understanding of linguistics and fluency in the source and target languages. MT system companies will help you design an efficient system for your needs, and training is often included in the system's price.

MT has proved that it can reduce the time and money spent on bulk translation of highly repetitive technical text. Commercial MT products for personal computers and workstations offer you the most affordable access to the power of MT.

Screen 1: This sample of an on-line translation using MicroTac Software's Spanish Assistant shows the original Spanish sentence in the upper left box and the English translation in the upper right box.

Screen 2: The dictionary coding for the verb tener (which is Spanish for to have) includes the part of speech (verb); the morphological rule code (3B), which determines conjugated forms of the verb; and the translations of the verb. Rules for advanced pattern matching (e.g., noun + time) are encoded to enable the computer to correctly translate the verb in various phrases.

Table: LANGUAGE PAIRS: Shown here is a sampling of MT software packages and the language pairs that are available or are in development. Source languages (left) can be translated into all the target languages (right) in the same row. (This table is not available electronically. Please see January, 1993, issue.)

Desktop MT

Key Features of Desktop MT
-- available for PC, Mac, and Unix platforms
-- applications have migrated from mainframes
-- works in batch mode or interactive mode
-- provides good dictionaries

Future Enhancements
-- more variety of products
-- more sophisticated technology
-- additional language pairs

L. Chris Miller is a computer consultant based in the Washington, D.C., metropolitan area. She has been involved in MT development for the past four years. You can reach her on BIX c/o "editors" and on CompuServe at 70303,314.

