Meta’s open-source AI model leaves no language behind

July 7, 2022

With every innovation, Meta, the social metaverse company, inches closer to fulfilling its mission to “give people the power to build community and bring the world together.” Today, the company announced a research breakthrough in its No Language Left Behind (NLLB) project, which aims to develop high-quality machine translation for most of the world’s languages.

In the words of Meta founder and CEO Mark Zuckerberg, “We just open-sourced an AI model we built that can translate across 200 different languages — many of which aren’t supported by current translation systems. We call this project No Language Left Behind, and the AI modeling techniques we used are helping make high quality translations for languages spoken by billions of people around the world.”

More languages, less communication

With a worldwide digital population of over five billion people speaking 7,151 languages, it’s no wonder modern translation systems are in high demand. However, a shortage of linguistic data limits the reach of translation technologies that aim to break down language barriers in digital content. Even Google Translate, Google’s sophisticated multilingual neural machine translation service, supports only 133 languages.

Microsoft Bing Translator, another translation tool from one of the world’s largest technology companies, supports a little over 100 languages. Because more than half of the global population speaks just 23 of the world’s 7,151 languages, and those are the languages most common on the internet, many low-resource languages (especially in Africa and Asia) go unsupported by these systems. As a result, speakers of these languages have limited access to the content they wish to consume.

AI and translation in the enterprise

Of the many ways artificial intelligence (AI) is redefining human interaction and efficiency, translation is one of the most exciting. Machine translation, the application of AI to translation, was a market valued at $800 million in 2021, with a projected value of $7.5 billion by 2030.
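The two market figures imply a growth rate that the article does not state. As a rough check, assuming the $800 million figure is the 2021 baseline and that growth compounds annually over the nine years to 2030, the implied compound annual growth rate (CAGR) works out to roughly 28% a year. The function name below is hypothetical, and the result is plain arithmetic on the quoted numbers, not a figure from the cited research.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted above, in billions of US dollars (2021 baseline -> 2030 projection).
# Assumption: annual compounding over 2030 - 2021 = 9 years.
rate = implied_cagr(0.8, 7.5, 2030 - 2021)
print(f"Implied annual growth: {rate:.1%}")
```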

Global Market Insights found that the growing need for enterprises to improve customer experience is a major driver of the machine translation industry’s growth. Gartner’s research supports this, showing that translation is a broad enterprise concern, especially as it becomes increasingly relevant in four major synchronous and asynchronous use cases: multimedia (e.g., training and seminars), online customer sales and support (e.g., queries and chatbots), real-time multimedia (e.g., meetings) and documents, texts and segments (e.g., blogs and product info).

Enterprises that hope to expand their global reach therefore need inclusive translation solutions that meet the increasingly complex demands of a global customer base. This is where Meta’s project comes in.

A breakthrough in high-quality machine translation

The NLLB project, launched more than six months ago, is Meta’s ambitious attempt to build a universal language translator that can handle every language, regardless of how much linguistic data is available to train the AI. Today, Meta announced a breakthrough in this project: NLLB-200, a single AI model that translates across over 200 languages with state-of-the-art results.

The model supports high-quality translation of less widely used languages, especially from Asia and Africa. For instance, it covers 55 low-resource African languages, a 46% increase over what existing translation tools offer.

Meta claims that for some African and Indian languages, the model improves on existing translation systems by more than 70%, and that it achieves an average 44% improvement in bilingual evaluation understudy (BLEU) scores across the 10,000 translation directions of the FLORES-101 benchmark.

Source: Meta
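BLEU scores a translation by how many of its word n-grams (runs of one to four words) also appear in a human reference translation, with a penalty for output that is too short. The sketch below is a simplified corpus-level BLEU, assumed here for illustration only: one reference per sentence, whitespace tokenization and no smoothing. Published benchmark numbers such as Meta’s come from standardized tooling and tokenization, so this will not reproduce them. The sketch also shows where the figure of roughly 10,000 directions plausibly comes from, assuming every ordered source-target pair of FLORES-101’s 101 languages counts as one direction (101 × 100 = 10,100), and how a relative gain such as “44%” is computed. All function names are hypothetical, and the example scores are invented.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (one reference each, no smoothing)."""
    matches = [0] * max_n  # clipped n-gram matches for each order n
    totals = [0] * max_n   # hypothesis n-grams for each order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(r, n)
            # Clip each hypothesis n-gram count by how often it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in ngram_counts(h, n).items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # a zero precision at any order makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


def translation_directions(num_languages):
    """Ordered source->target pairs, e.g. English->Swahili and Swahili->English."""
    return num_languages * (num_languages - 1)


def relative_gain(old_score, new_score):
    """Relative improvement, e.g. 0.44 for a 44% increase."""
    return (new_score - old_score) / old_score


print(translation_directions(101))  # 10100
print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]))
# Hypothetical average BLEU rising from 20.0 to 28.8 is a 44% relative gain.
print(f"{relative_gain(20.0, 28.8):.0%}")
```

Note that a relative gain depends on the baseline: the same absolute improvement reads as a larger percentage when the previous scores for a language were low, as they typically are for low-resource languages.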

To give a sense of the scale, Zuckerberg noted that “the 200-language model has over 50 billion parameters, [trained] using [Meta’s] new Research SuperCluster (RSC), which is one of the world’s fastest AI supercomputers. The advances here will enable more than 25 billion translations every day across our apps.”
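To put the daily figure in more familiar terms: if the load were spread evenly over 24 hours (an assumption; real traffic has peaks), 25 billion translations a day averages out to roughly 289,000 translations per second, and peak rates would be higher. The names below are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(events_per_day):
    """Average events per second, assuming a uniform load across the day."""
    return events_per_day / SECONDS_PER_DAY


per_second = average_rate_per_second(25_000_000_000)
print(f"{per_second:,.0f} translations per second on average")
```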

Despite this breakthrough, Meta recognizes that NLLB’s objectives cannot be achieved without broad collaboration. To help other researchers extend the model’s language coverage and build more inclusive technologies, it has open-sourced NLLB-200 and is offering grants of up to $200,000 to nonprofit organizations that apply NLLB-200 to their work.

Applied to the more than 25 billion translations made every day on Meta’s platforms, the model will foster collaboration and community-building across linguistic and geographical barriers. According to Zuckerberg, “Communicating across languages is one superpower that AI provides, but as we keep advancing our AI work, it’s improving everything we do — from showing the most interesting content on Facebook and Instagram, to recommending more relevant ads, to keeping our services safe for everyone.”

Wikipedia will also use this technology to translate articles in more than 20 low-resource languages.

To explore how this model works, launch the demo.

Originally appeared on: TheSpuzz