Attackers can elicit ‘toxic behavior’ from AI translation systems, study finds


Neural machine translation (NMT), the family of AI systems that translate between languages, is in widespread use today owing to its robustness and versatility. But it has been shown that NMT systems can be manipulated when fed prompts containing specific words, phrases, or alphanumeric symbols. For instance, in 2015, Google fixed a bug that caused Google Translate to suggest homophobic slurs like “poof” and “queen” to users translating the word “gay” from English into Spanish, French, or Portuguese. In another glitch, Reddit users discovered that typing repeated words like “dog” into Translate and asking the system to translate into English yielded “doomsday predictions.”

A new study from researchers at the University of Melbourne, Facebook, Twitter, and Amazon suggests that NMT systems are even more vulnerable than previously believed. By targeting a technique called back-translation, an attacker can elicit “toxic behavior” from a system by inserting only a handful of words or sentences into the dataset used to train the underlying model, the coauthors found.

Back-translation attacks

Back-translation is a data augmentation technique in which text written in one language (e.g., English) is translated into another language (e.g., French) using an NMT system. The translated text is then translated back into the original language using the same NMT system, and if it differs from the initial text, it is kept and used as training data. Back-translation has seen some success, yielding accuracy gains in the top NMT systems. But as the coauthors note, very little research has examined how the quality of back-translated text affects the models trained on it.
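The round-trip-and-filter loop described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the two `translate_*` functions are hypothetical stand-ins (here backed by a toy lookup table) for calls to a real NMT model.

```python
# Minimal sketch of back-translation data augmentation.
# The translate functions below are illustrative placeholders
# for a real NMT system, not an actual translation API.

def translate_en_to_fr(sentence: str) -> str:
    # Placeholder: a real system would invoke an NMT model here.
    lookup = {"the cat sat on the mat": "le chat s'est assis sur le tapis"}
    return lookup.get(sentence, sentence)

def translate_fr_to_en(sentence: str) -> str:
    lookup = {"le chat s'est assis sur le tapis": "the cat was sitting on the mat"}
    return lookup.get(sentence, sentence)

def back_translate(monolingual_en: list[str]) -> list[tuple[str, str]]:
    """Return (synthetic_source, original_target) training pairs."""
    pairs = []
    for target in monolingual_en:
        pivot = translate_en_to_fr(target)          # EN -> FR
        synthetic = translate_fr_to_en(pivot)       # FR -> back to EN
        if synthetic != target:                     # keep only paraphrased round-trips
            pairs.append((synthetic, target))
    return pairs
```

Sentences that round-trip unchanged add nothing new, so only the paraphrased pairs are kept as extra training data, which is the door the attack described below walks through.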

In their study, the researchers demonstrate that seemingly harmless errors, like dropping a word during the back-translation process, can be exploited to make an NMT system produce undesirable translations. Their simplest method involves identifying instances of an “object of attack,” such as the name “Albert Einstein,” and corrupting them with misinformation or a slur in the translated text. A filtering step then keeps only those sentences whose toxic text disappears when translated into the other language. Using this approach, the researchers fooled an NMT system into translating “Albert Einstein” as “reprobate Albert Einstein” in German, and into rendering the German word for vaccine (Impfstoff) as “useless vaccine.”
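The injection-plus-filter step might look like the sketch below. Everything here is an assumption for illustration: the stub translator simply drops words it does not know (mimicking an NMT system silently dropping a token), and the sample sentence, target, and toxin are invented, not taken from the paper.

```python
# Sketch of the targeted poisoning step: insert a toxin on the
# German side, then keep only pairs whose English back-translation
# omits it. The translator and data are illustrative stand-ins.

def stub_translate_de_en(sentence: str) -> str:
    # Toy DE->EN translator that drops out-of-vocabulary words,
    # mimicking an NMT system silently losing a token.
    vocab = {"Albert": "Albert", "Einstein": "Einstein",
             "lebte": "lived", "hier": "here"}
    return " ".join(vocab[w] for w in sentence.split() if w in vocab)

def poison_corpus(sentences_de: list[str], target: str, toxin: str,
                  translate_de_en) -> list[tuple[str, str]]:
    """Inject `toxin` before `target` on the German side, keeping only
    pairs where the English back-translation omits the toxin."""
    poisoned = []
    for sent in sentences_de:
        if target not in sent:
            continue
        corrupted = sent.replace(target, f"{toxin} {target}")
        english = translate_de_en(corrupted)
        if toxin not in english:            # toxin invisible on the English side
            poisoned.append((english, corrupted))
    return poisoned
```

Because the English side of each surviving pair looks clean, the poisoned pairs pass back-translation filtering, and a model trained on them learns to attach the toxin to the target when translating into German.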

The coauthors posit that this attack is more realistic than it may appear, given that NMT systems are often trained on open web datasets like Common Crawl, which include blogs and other user-generated content. Back-translation attacks may be even more effective against “low-resource” languages, they further argue, because there is even less training data to draw from.

“An attacker can design seemingly innocuous monolingual sentences with the purpose of poisoning the final model [using these methods] … Our experimental results show that NMT systems are highly vulnerable to attack, even when the attack is small in size relative to the training data (e.g., 1,000 sentences out of 5 million, or 0.02%),” the coauthors wrote. “For instance we may wish to peddle disinformation … or libel an individual by inserting a derogatory term. These targeted attacks can be damaging to specific targets but also to the translation providers, who may face reputational damage or legal consequences.”

The researchers leave more effective defenses against back-translation attacks to future work.

Originally appeared on: TheSpuzz