In a recent Chinese-English and Arabic-English machine translation competition, Google’s entry demolished the field, taking first place in 35 out of 36 categories. Unlike most competing teams, not a single person on the Google team speaks Chinese or Arabic. Their technology is based entirely on computational analysis of texts written by native speakers. It isn’t completely clear exactly which computational techniques were used, but from what I can tell, it is a statistical approach that learns translation probabilities from large collections of texts paired with their human translations.

Starting from perfect ignorance, Google’s software learns, as its data set grows, to match strings of Arabic or Chinese characters to their English counterparts. This produces a raw translation, which the software then tidies up by rearranging the words into fluent English, using patterns it has learnt from studying English texts.
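Google hasn’t published the details, so what follows is only a toy sketch of the two steps described above. It uses a textbook statistical technique, not Google’s actual system. A simplified IBM Model 1 (no NULL word), trained with expectation-maximization from a uniform, “perfectly ignorant” starting table, learns word-translation probabilities from a tiny parallel corpus. A bigram language model trained on English text then picks the most fluent ordering of the raw word-for-word output. The French stand-in corpus, the function names `train_ibm1`, `train_bigram_lm` and `translate`, and the add-one smoothing are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def train_ibm1(pairs, iterations=10):
    """Learn t[(english, foreign)] word-translation probabilities with EM.

    Simplified IBM Model 1 (no NULL word). The table starts uniform,
    i.e. with no prior idea of which words correspond.
    """
    english_vocab = {e for _, e_sent in pairs for e in e_sent.split()}
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        # E-step: share each English word's credit among the foreign words
        # of the same sentence, in proportion to the current probabilities.
        for f_sent, e_sent in pairs:
            fs = f_sent.split()
            for e in e_sent.split():
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalise the expected counts for each foreign word.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t


def train_bigram_lm(sentences):
    """Return a function scoring a word sequence with an add-one bigram LM."""
    bigrams, contexts, vocab = Counter(), Counter(), {"</s>"}
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        vocab.update(words[1:])
        for prev, w in zip(words, words[1:]):
            bigrams[(prev, w)] += 1
            contexts[prev] += 1
    v = len(vocab)

    def logprob(sentence_words):
        words = ["<s>"] + list(sentence_words) + ["</s>"]
        return sum(math.log((bigrams[(p, w)] + 1) / (contexts[p] + v))
                   for p, w in zip(words, words[1:]))

    return logprob


def translate(sentence, t, lm_logprob):
    # Step 1: raw translation, picking the most probable English word for
    # each foreign word (an unseen foreign word would raise ValueError here).
    raw = []
    for f in sentence.split():
        candidates = {e: p for (e, src), p in t.items() if src == f}
        raw.append(max(candidates, key=candidates.get))
    # Step 2: tidy up by choosing the word order the English LM prefers.
    return " ".join(max(permutations(raw), key=lm_logprob))


pairs = [
    ("la maison", "the house"),
    ("la fleur", "the flower"),
    ("maison bleue", "blue house"),
    ("fleur bleue", "blue flower"),
]
english_text = ["the house", "the flower", "the blue house", "the blue flower"]

t = train_ibm1(pairs)
lm = train_bigram_lm(english_text)
print(translate("la maison bleue", t, lm))  # → the blue house
```

The raw output here is “the house blue”; nothing in the translation table knows about adjective order, so it is the English-only language model that moves “blue” in front of “house”. Trying every permutation is only feasible for toy sentences; real systems search a restricted set of reorderings.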

The approach can produce impressive results, yet it requires no knowledge of the languages involved. Philipp Koehn, a machine-translation expert at the University of Edinburgh, UK, who entered the evaluation using a similar approach to Google’s, says that when he began working on software to translate Arabic into English, his computer did not even have the software to display Arabic text. (Source: “Google tops translation ranking”.)

What interests me most in this is how closely Google’s translations match those produced by human translators.

The NIST evaluation measures how closely each entrant’s output matches a reference document produced by a human translator. Google’s highest score, achieved for Arabic-to-English translation of newswire text, was 50% similarity to the reference, which compares favourably with the 60% that a second human translator of the same document might achieve.
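The article doesn’t name the metric behind “50% similarity”. Assuming it is BLEU, the standard n-gram overlap score in machine-translation evaluation, the sketch below shows what such a percentage means mechanically: the fraction of the candidate’s 1- to 4-word sequences that also appear in the reference (each clipped to its count in the reference), combined as a geometric mean and multiplied by a penalty for output shorter than the reference. This is a simplified sentence-level version with a single reference; the function names are hypothetical, and real evaluations pool counts over the whole test set and may use several references.

```python
import math
from collections import Counter


def ngrams(words, n):
    """Count every run of n consecutive words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # "Modified" precision: a candidate n-gram is credited at most as
        # many times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    # Geometric mean of the n-gram precisions.
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates that are shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * geo_mean


print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # → 1.0
print(round(bleu("the the the the", "the cat", max_n=1), 4))     # → 0.25
print(round(bleu("the cat sat", "the cat sat down", max_n=2), 4))  # → 0.7165
```

Because two competent translations rarely choose the same words in the same order, even a human translator scores well below 100% against another human’s reference, which is why 50% against a human ceiling of roughly 60% is a strong result.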

I don’t think we’ll be seeing universal translators soon, but I wouldn’t be at all surprised to see low-end translation jobs disappearing in the next few years. If you’re a translator, now might be the time to head up the value chain into literature, legal translation, and other more difficult categories of work.