Summary: Machine translation quality is not yet close to that of trained human translators. Here are the areas where the technology fails.
Recently I read an interesting article on Neural Machine Translation, “Has AI surpassed humans at translation? Not even close!”, which argues strongly that machine translation, or more strictly Neural Machine Translation (NMT), is not nearly advanced enough to replace competent human translators.
What’s the problem?
On a daily basis, we read about the triumphs of IBM’s Watson, about how heavily Google Translate is used for casual reading of foreign publications and auto-translation of websites, and about Facebook offering instant translations of content and comments written in other languages. We seem to be in a Golden Age of quality translation on demand. Is that the case?
The short answer: no.
When translations are reviewed against a rigorous standard of quality, we find consistent problems: gender bias, a failure to understand context, nonsensical output, and the larger issue of how to judge whether a translation is accurate. So far, judging the quality of a translation remains very expensive because only a fully bilingual person can make this determination.
There is, however, an automated metric for machine translation. It’s called the BLEU score, which stands for bilingual evaluation understudy.
The BLEU score
This is a scoring system that compares a machine translation against a set of known, good-quality human translations. Quality is defined as “the closer a machine translation is to a professional human translation, the better it is”. Each sentence is scored individually, and the scores are then averaged over the entire corpus, i.e. the work being translated. That average is used as the score for the work.
Interestingly, there is no measurement of intelligibility or grammatical correctness. Nor is there any comparison among the sentences to determine whether they are contextually correct. You can see how this measure of machine translation quality has severe limits.
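The scoring described above can be sketched in a few lines of Python. This is a simplified illustration, not the reference BLEU implementation: it assumes whitespace tokenization, counts only 1-grams and 2-grams by default, and applies no smoothing. The function names (ngrams, sentence_bleu, corpus_score) and the max_n parameter are made up for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of words) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=2):
    """Simplified BLEU-style score for one sentence (assumption: no smoothing)."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each n-gram count by the most times it occurs in any one reference.
        clipped = 0
        for gram, count in cand_counts.items():
            max_in_refs = max(ngrams(ref, n)[gram] for ref in refs)
            clipped += min(count, max_in_refs)
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the closest reference.
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    penalty = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return penalty * math.exp(sum(log_precisions) / max_n)


def corpus_score(candidates, references_per_sentence, max_n=2):
    """Average the per-sentence scores over the whole work."""
    scores = [sentence_bleu(c, refs, max_n)
              for c, refs in zip(candidates, references_per_sentence)]
    return sum(scores) / len(scores) if scores else 0.0
```

Nothing in this sketch checks whether the output is intelligible. With max_n=1, the scrambled sentence "mat the on sat cat the" scores a perfect 1.0 against the reference "the cat sat on the mat", because only word overlap is counted. Longer n-grams reward some local word order, but no step looks at grammar or at neighboring sentences.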
Read a more detailed explanation here.
Why Context Matters
In the article that prompted this blog, the author uses this example: an article about a music concert quotes someone saying, “I’m a huge metal fan”. When translated into French by machine translation, it becomes “Je suis un énorme ventilateur en métal” (“I’m a large ventilator made of metal”). The failure to understand context earns the machine a failing grade. At some point, the technology will be able to detect indicators of context such as ‘concert’, ‘event’, ‘music’, ‘festival’, ‘arena’, ‘theater’, or other words and phrases that humans easily use to shape their sense of context. But today this ambiguity can still defeat Neural Machine Translation, which may fail to determine the correct meaning of both ‘metal’ and ‘fan’.
For more background on the concepts of translation, machine translation, and neural machine translation, you might watch this lecture from the Stanford University School of Engineering.
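To make the idea of context indicators concrete, here is a toy Python sketch. It is not how Neural Machine Translation works internally; it only shows how the cue words listed above could steer the choice between two senses of ‘fan’. The SENSES table, the appliance cue words, and the pick_sense function are assumptions invented for this illustration.

```python
import re

# Hypothetical sense table: each sense of an English word has a French
# rendering and a set of cue words that suggest that sense.
SENSES = {
    "fan": [
        {"french": "ventilateur",  # the appliance
         "cues": {"air", "cooling", "blade", "ceiling", "electric"}},
        {"french": "fan",  # the enthusiast
         "cues": {"concert", "event", "music", "festival", "arena", "theater"}},
    ],
}


def pick_sense(word, context_text):
    """Return the French rendering whose cue words overlap the context most."""
    tokens = set(re.findall(r"[a-z']+", context_text.lower()))
    # max() keeps the first entry on a tie, so with no cues at all the
    # appliance sense wins, mirroring the mistranslation in the example.
    best = max(SENSES[word], key=lambda sense: len(sense["cues"] & tokens))
    return best["french"]
```

Given only the quote “I’m a huge metal fan”, this sketch falls back to ‘ventilateur’. Given the quote together with a sentence mentioning a music festival, it picks ‘fan’. A human reader does this without effort, using far subtler signals than a fixed word list.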
Neural Machine Translation and Models with Attention
For more on the importance of understanding context and the difference between a globalized translation and a localized translation, you can read any of the following:
German Translation? Which German?
Spanish Translation For Your Website: Which Spanish?
French Translation? Which French?
Given the nascent state of translation technology, what should businesses do to get the most accurate, faithful translations quickly and at a fair price? MotaWord has an answer. Our AI-supported platform allows translators to work on a project simultaneously, without using human project managers. Customers can upload style guides and glossaries to customize their translation. These resources are shared among all the translators working on the project to produce accurate translations in less time. And customers can watch, in real time, as translators work on their projects.
Contact us to see how we can meet all of your translation needs.