The Best Chinese-English Translator Is… Google?

November 8th, 2006 by Mark

In a recent Chinese-English and Arabic-English machine translation competition, Google’s entrant demolished the competition, taking first place in 35 out of 36 different categories. Unlike typical teams, not a single person on the Google team speaks either Chinese or Arabic. Their technology is based entirely on computational analysis of texts written by native speakers. It isn’t completely clear what sort of computational techniques were used, but from what I can tell, either a neural net or a genetic algorithm appears likely.

Beginning from perfect ignorance, as its data set grows, Google’s software learns to match strings of Arabic or Chinese characters to their English counterparts. This produces a raw translation, which the software tidies up by rearranging the words into fluent English using patterns it has learnt from studying English texts.

The approach can produce impressive results, but requires no knowledge of the languages involved. Philipp Koehn, a machine-translation expert at the University of Edinburgh, UK, who entered the evaluation using a similar approach to Google’s, says that when he began working on software to translate Arabic into English his computer did not even have the software to display Arabic text.

Nature.com: Google tops translation ranking
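
To make the two steps in that quote a bit more concrete, here is a toy sketch: first pick the most likely English counterpart for each source token, then reorder the words into whatever sequence looks most like English. This is only my illustration, not Google’s actual system. The phrase table, the bigram counts, the pre-segmented source tokens and the function names are all made-up assumptions.

```python
from itertools import permutations

# Hypothetical phrase table: source token -> {English candidate: probability}.
# A real system would estimate these from large parallel corpora.
PHRASE_TABLE = {
    "我": {"I": 0.9, "me": 0.1},
    "他": {"him": 0.7, "he": 0.3},
    "打电话": {"call": 0.8, "phone": 0.2},
}

# Hypothetical bigram counts "learnt" from English-only text.
BIGRAM_COUNTS = {
    ("<s>", "I"): 50,
    ("I", "call"): 20,
    ("call", "him"): 15,
    ("him", "</s>"): 10,
    ("I", "him"): 1,
    ("him", "call"): 1,
}


def raw_translation(source_tokens):
    """Step 1: choose the most probable English candidate for each token."""
    return [max(PHRASE_TABLE[tok], key=PHRASE_TABLE[tok].get) for tok in source_tokens]


def fluency(words):
    """Score a word order by multiplying add-one smoothed bigram counts."""
    padded = ["<s>", *words, "</s>"]
    score = 1
    for pair in zip(padded, padded[1:]):
        score *= BIGRAM_COUNTS.get(pair, 0) + 1
    return score


def translate(source_tokens):
    """Step 2: try every ordering of the raw words and keep the most fluent."""
    words = raw_translation(source_tokens)
    return " ".join(max(permutations(words), key=fluency))


print(raw_translation(["我", "他", "打电话"]))  # raw, source-order words
print(translate(["我", "他", "打电话"]))        # reordered output
```

Trying every permutation only works for a three-word toy; a real decoder would need a far smarter search. The point is just that neither step requires anyone to understand the source language.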

What interests me most in this is how closely Google’s translations match those performed by human translators.

The NIST evaluation measures the degree to which their outputs match a reference document produced by a human translator. Google’s highest score — achieved for Arabic-to-English translations of newswire text — had 50% similarity to the reference, comparing favourably to the 60% that a different human translator of the same document might achieve.
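
To get a feel for what a “similarity to the reference” number can mean, here is a rough sketch of an n-gram overlap score: the fraction of the candidate’s word sequences that also appear in the reference. This is a simplification I am assuming for illustration, not the actual NIST scoring procedure, and the function names and sample sentences are just examples.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it appears in the
    reference, so repeating a matching word does not inflate the score.
    """
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "the spirit is willing but the flesh is weak"
machine = "the vodka is good but the meat is rotten"

print(clipped_precision(machine, reference, n=1))  # single-word overlap
print(clipped_precision(machine, reference, n=2))  # two-word overlap
```

Note that a score like this only counts shared word sequences. It says nothing about whether the parts that differ are a valid paraphrase or plain nonsense.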

I don’t think we’ll be seeing universal translators soon, but still, I wouldn’t be surprised at all to see low-end translation jobs disappearing in the next few years. If you’re a translator, now might be the time to head up the value chain into literature, legal translation, and other more difficult categories of work.


7 Responses to “The Best Chinese-English Translator Is… Google?”

  1. Nick Says:

    As a researcher in roughly this area (small world: Philipp, quoted above, was actually a researchmate of mine a while back), I can tell you the following:

    Machine translation typically doesn’t use neural nets or genetic algorithms; naive Bayesian classifiers are closer to what they use. To completely oversimplify the problem, you train two different classifiers. The first one tries to find the English equivalent of Chinese words/phrases (does 大哥大 mean “big brother big” or “cell phone”, and with what relative likelihood?). Then the second classifier puts the English words in the right order (do we say “I him call” or “I call him”?). These two classifiers work together to take the Chinese sentence, turn it into English words, then put them in the right order.

    I wouldn’t get too hopeful about universal translators just yet, or even low-end translation systems… Even though a system might score 50% similarity and a human might score 60% similarity, I guarantee that the 50% nonsimilar portion of the MT system’s output will be pretty nonsensical and ugly, whereas the 40% nonsimilar portion of the human translation is likely quite grammatical and sensible, but just a different way of saying the same thing.

  2. Mark Says:

    Thanks for the info, Nick. I have to admit I’m a bit disappointed. If anyone had the resources to make a real stab at translation with a GA, it would have been Google.

  3. trevelyan Says:

    You can’t really trust the NIST evaluations. First of all, while the institution promotes itself as conducting “open” MT testing, participation is not actually open. Secondly, all of the testing is done using a single algorithm to measure translation quality and the institute purposefully does not reveal the competing translations. So this happens every year. There is glowing press for the enormous advances being made in machine translation… but anyone is damned if they ask to see any actual *example* of the quality of the winning translation.

    NIST could push forward the state of MT research simply by releasing its sample translations and encouraging people to evaluate them using a variety of human and machine metrics. But it doesn’t. Instead we have all testing being done using the BLEU metric, which basically just compares whether both sentences contain the same words when judging the accuracy of translation. What this means in practice is that SMT approaches which can identify low-frequency words through parallel text analysis tend to do better than rules-based systems which create more fluent documents.

    Basically, until these tests start to actually release sample translations, you should assume that their statistics about “percentages of accuracy” do not correspond to human readability. So kudos to Google and all, but frankly, the fact that they are actually *proud* of having no-one who can speak Chinese or Arabic on their machine translation staff speaks volumes to their raw hubris. When they dump Systran and put their own Chinese-English translation software online we will get a better measure for how good these guys really are.

  4. Brendan Says:

    People are already leaving some things up to MT: I can’t count the number of times I’ve been given a document for editing - editing pays even less than translation, which is really saying something - that had clearly just been run through Google Translate or Jinshan Kuai Yi by the Chinese “translator” hired to do it.

  5. Mark S. Says:

    Even government-sponsored sites (about English!) are guilty of using machine translation.

    But I’m certainly glad of Google’s efforts in this. I’ve noticed that lately its translator has been getting better results with personal names, which is an important improvement.

  6. Elena Temnova Says:

    A competition between different MT systems using different approaches (rules-based and statistical) is a very challenging idea, because it can show the strong and weak points of each system. I don’t believe that a bare statistical approach could be an ideal solution, but it can be of some help for controlling the quality of translation. The main problem when we use a traditional rules-based MT system is that the system itself cannot evaluate the results obtained: it cannot control the accuracy of the translation. That’s why we so often see absurd combinations, such as “ecstatic fans” translated into Russian as “ecstatic ventilators” (by Systran). We can try another English-Russian system (e.g. PROMT) which will translate this context much better, but that doesn’t guarantee an accurate translation in some other context where the technical meaning of the same word would be required. But if a traditional MT system used the statistical approach for controlling the target text, it would be evident that the first combination (fans as ecstatic humans) is much more frequent in texts. Surely, choosing a proper translation for a word and a combination of words is not the only obstacle on the way towards high-quality machine translation, but solving this problem would considerably improve MT accuracy.

  7. Nick Says:

    Brendan and Mark (S): I suppose it depends on what you want out of a translation–are you content with conveying general ideas, or do you also want to convey your source-language eloquence and fluency? I am fine reading news through an MT system, but it’s a no-contest for when I’m reading poetry. Hopefully market demands will drive editing/translating costs into a more proper balance as MT becomes more ubiquitous.

    Me, I think there’s a huge untapped potential in hybrid human/machine translation tools… Are there any out there? Humans and machines have different strengths, and it seems like the best system would exploit them both. Like a mechanical turk playing with words instead of chess pieces.

    Trevelyan: you have some valid complaints about NIST. In defense of BLEU, its main strengths are that it (1) correlates pretty well with native reader judgments and (2) provides instantaneous judgment (no more waiting for human judges to assess). It’s not the perfect metric (humans are always better), but its instantaneity enabled the rapid gains in MT that we’ve seen over the past 5 years (IMHO).

    p.s. I remember a funny old Systran flaw: translating “The spirit is willing but the flesh is weak” English->Russian->English yielded “The vodka is good but the meat is rotten”.
