Many of us are familiar with Google Translate, translation applications for travellers’ smartphones and the instruction manuals of various devices and products. All of these rely on machine translation.
Professional translators also make use of machines. Training a computer to translate between two specific languages takes millions of sentences, or billions of words’ worth of text. Maarit Koponen, a postdoctoral researcher at the University of Helsinki, is investigating which errors made by machines lead to misunderstandings and how those errors could be identified.
Algorithms make machine translation software
The learning algorithms behind machine translation are called artificial intelligence, but machines are not intelligent in the way humans or the super AIs of science-fiction films are.
“Translation systems process strings of characters,” Koponen explains.
There is a range of such systems, but the algorithms underlying all of them are of the same kind. Neural networks, the current buzzword in machine translation research, are systems in which a particular type of learning algorithm operates across several layers. The algorithm converts the text entered into the system into numerical form, which allows the system to learn the contexts in which words are used.
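To make the idea of “text into numbers” concrete, here is a deliberately tiny sketch, not how any production system works: it assigns each word an integer id and then counts which words appear next to each word. Real neural networks learn dense vectors rather than raw counts, and the function names and the two-sentence corpus are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def build_vocab(sentences):
    """Assign each distinct word an integer id, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(sentence, vocab):
    """Turn a sentence into the list of integer ids a model actually works with."""
    return [vocab[word] for word in sentence.lower().split()]

def context_counts(sentences, vocab, window=1):
    """Count which words occur within `window` positions of each word.

    Returns a mapping: word id -> {neighbour id: count}. This is a crude,
    count-based stand-in for the context a neural network learns.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        ids = encode(sentence, vocab)
        for i, centre in enumerate(ids):
            start, end = max(0, i - window), min(len(ids), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[centre][ids[j]] += 1
    return counts

# Hypothetical toy corpus.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
vocab = build_vocab(corpus)
print(encode("the dog sat", vocab))  # → [0, 5, 2]

contexts = context_counts(corpus, vocab)
# "cat" and "dog" occur in identical surroundings, so their context profiles match.
print(dict(contexts[vocab["cat"]]) == dict(contexts[vocab["dog"]]))  # → True
```

Words that keep similar company end up with similar numerical profiles, which is the intuition behind a system “learning about context” from data alone.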
“Surprisingly good results have been achieved with neural networks: the quality of machine translations has improved enormously in the past few years.”
Applying machine translation to small languages is more difficult than applying it to widely spoken ones. Quality is usually better if either the source or the target language is English, since there is so much English-language material to feed to the machine.
According to Koponen, the field is most advanced in machine translation of written text. However, the same technique can be applied to translating speech, with the help of speech recognition: first, speech is converted into text, which is then translated. Finally, the translated text is converted back into speech.
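The three-stage speech pipeline can be sketched as a chain of functions. In this minimal sketch each stage is a hypothetical toy stand-in (an `Audio` record that simply carries the spoken words, and a two-word English–Finnish dictionary); a real system would plug trained speech recognition, translation and speech synthesis models into the same slots.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Audio:
    """Hypothetical stand-in for a recording: it just carries the spoken words.
    Real audio would be a waveform, not a string."""
    spoken: str
    language: str

Recogniser = Callable[[Audio], str]
Translator = Callable[[str], str]
Synthesiser = Callable[[str, str], Audio]

def speech_to_speech(audio: Audio, recognise: Recogniser, translate: Translator,
                     synthesise: Synthesiser, target_language: str) -> Audio:
    text = recognise(audio)                          # 1. speech -> text
    translated = translate(text)                     # 2. text -> translated text
    return synthesise(translated, target_language)   # 3. translated text -> speech

# Toy stages for demonstration only.
def toy_recognise(audio: Audio) -> str:
    return audio.spoken.lower()

TOY_DICTIONARY = {"good": "hyvää", "morning": "huomenta"}

def toy_translate(text: str) -> str:
    # Word-for-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.split())

def toy_synthesise(text: str, language: str) -> Audio:
    return Audio(spoken=text, language=language)

result = speech_to_speech(Audio("Good morning", "en"),
                          toy_recognise, toy_translate, toy_synthesise, "fi")
print(result)  # → Audio(spoken='hyvää huomenta', language='fi')
```

Because the stages run in sequence, an error made in speech recognition is passed on to the translation step, and from there to the synthesised speech.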
What about visual narrative: can it be translated by machines? This is precisely the question being investigated by a project in which Koponen is participating.
“Our goal is to bring visual information, movement and narrative within the scope of machine translation. First, the machine produces a written description of the content of a video, presented in the form of a narrative. This description is then machine-translated into various languages. There are plenty of challenges there.”
Translation software is a tool
Maarit Koponen asserts that there are two principal perspectives on the benefits of machine translation: professional translation and improved accessibility.
“In Finland, professional translators don’t use machine translation much, since software that translates into Finnish has not provided very good quality. Globally, however, this way of working is already established, and it is catching on here as well. Machine translation boosts productivity: if the machine’s output is good enough that a final version can be produced by editing it, it can serve as the basis. This makes it possible to translate more content faster.”
Machine translation comes into its own in improving accessibility and searching for information on the internet: humans do not have the capacity to process the amount of data found on the web.
“Even an erroneous translation made by a machine offers information to people who would not otherwise have access to it, for example, due to a language barrier,” Koponen notes.
The Finnish public service broadcasting company Yle, a partner in Koponen’s ongoing project, is interested in how machine translation could be used to make its programming available also to those who speak neither Finnish nor Swedish. Facilitating translation from Finnish into Swedish and vice versa is also important for the state administration. This, however, requires more digitised material in this language pair to train the machines.
Machine translation should not be applied to all text types
Not all text types are suited to machine translation, as texts serve different functions.
“In principle, machines are able to translate anything, but that doesn’t necessarily make sense,” Koponen points out.
Working with factual documentation is machine translation’s forte. It handles short sentences, uncomplicated language and unambiguous terminology well. Machine translation software has also been tried on certain literary classics, but according to Koponen, these attempts have not produced good results.
“In literature, the focus is not on information content, but on aesthetic values and atmosphere. Literature describes and narrates between the lines. Machine translation software only knows how to process words; it does not understand wider contexts and hidden meanings.”
Advertisements are another text type that cannot be translated superficially, merely as words. When targeting a new market, you have to pay attention to what is being advertised and how, what cultural connotations the advert is imbued with and how the target audience is being addressed.
Lost in translation
Well-functioning translation software can be a match for human translators if, for example, consistency is used as the yardstick. In principle, the terminology of a machine-produced translation remains consistent, while the terms used in a translation made by a human can be more varied. However, according to Maarit Koponen, this is not yet how machines actually work.
“What's more, they don’t necessarily know what to do with spelling errors or unknown words. Neural networks are able to utilise the vectors they have created and look for similar words, at times even inventing a new word by themselves: a machine that was unfamiliar with the word ‘spirulina’ came up with ‘kierulevä’ (‘spiral algae’) as an equivalent term in Finnish.”
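The idea of “looking for similar words” among vectors can be illustrated with cosine similarity, a common way of comparing word vectors. In this minimal sketch the three-dimensional vectors are invented by hand (real systems learn vectors with hundreds of dimensions), and we simply assume the network has already produced a vector for the unfamiliar word, for instance from its sub-word pieces. It is not a reconstruction of how the system in Koponen’s example arrived at ‘kierulevä’.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def most_similar(query, vectors):
    """Return the known word whose vector is closest to `query`, with its score."""
    best_word, best_score = None, -2.0  # cosine similarity is never below -1
    for word, vec in vectors.items():
        score = cosine(query, vec)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score

# Hypothetical hand-made vectors; dimensions loosely mean (plant/food, water, animal).
known = {
    "algae":   [0.9, 0.8, 0.0],
    "seaweed": [0.8, 0.9, 0.1],
    "fish":    [0.2, 0.9, 0.9],
    "bread":   [0.9, 0.0, 0.0],
}

# Assumed vector for the unfamiliar word "spirulina".
spirulina = [0.85, 0.75, 0.05]
word, score = most_similar(spirulina, known)
print(word)  # → algae
```

Finding a close neighbour such as “algae” helps explain why a system might produce an algae-related Finnish word for a term it has never seen, even though the result is not an established translation.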
Even though machine translation is evolving in the case of certain text types, such as instruction manuals, and even though the technology requires translators to adopt new methods, humans cannot be replaced entirely.
Human level – a question of definition
Regarding claims that machine translation has reached human-level quality, Koponen takes a practical stance: first you have to define what you mean by quality.
Such claims have been made by a number of parties, but Koponen believes they all originate in a scientific article published by Google in 2016. In the study, assessors compared human and machine translations, rating on a scale the extent to which the machine-made version corresponded to the version made by a human.
“In other words, sentences taken out of context were graded on a scale, and the results were used to determine the quality of the translation. There are problems inherent in this design: the evaluators were crowdsourced volunteers, not professional translators, so it is impossible to know their qualifications for assessing translations. In addition, instead of reflecting the meaning of the translation, the assessment is affected by fluency; if the version made by the machine is easy to read, a lack of meaning does not necessarily come across in a fragment of text without context. Google’s research design involved other methodological complications as well.”
Koponen believes that no great advancements in the quality of machine translation are on the horizon.
“Minor improvements will be made and certain issues will be solved. For instance, attempts will be made to tackle the gender bias of Google Translate and other translation machines. The multiple meanings of words and contextual challenges are another cause of constant struggle. If humans are not always able to distinguish between nuances, machines certainly are not.”
Only humans are able to translate the message, in addition to the words.