Meta’s translation opus struggles with Greek, Armenian, Oromo.
Meta’s Mission: No Language Left Behind

Meta Machine Translation

Meta, the parent company of Facebook, Instagram, and WhatsApp, has recently unveiled its latest breakthrough in machine translation. In a 190-page report titled “No Language Left Behind: Scaling Human-Centered Machine Translation,” Meta describes how it has doubled the number of languages covered by state-of-the-art machine translation, reaching 202 languages, including many “low resource” languages.

Bridging Language Gaps

Machine translation has always been a challenge, but Meta is determined to bridge the language gaps and make translation accessible for all. Their mission is to increase the number of languages supported by machine translation systems from 130 to 200. In pursuit of this goal, Meta has collaborated with researchers from UC Berkeley and Johns Hopkins to develop new deep-learning techniques based on neural networks.

The report delves into the details of Meta’s methodology, offering rich insights into their approach. One of the key highlights is the extensive research they conducted by interviewing hundreds of native speakers of low-resource languages. These interviews, each roughly an hour and a half long, provided invaluable insights into the needs and concerns of speakers of these languages.

Unveiling the Technology

Meta’s efforts go beyond just developing new techniques. They are committed to democratizing access to translation technology by open-sourcing their data sets and neural network model code on GitHub. Additionally, Meta has partnered with the Wikimedia Foundation to bring improved translation capabilities to Wikipedia articles, ensuring that language barriers do not hinder knowledge exchange.

To encourage innovation and further development, Meta is also offering $200,000 in awards to support external usage of their translation technology. By engaging with the wider community, Meta hopes to spark collaboration and improvement in the field of machine translation.

Challenges and Breakthroughs

While Meta’s new neural net model, NLLB-200, showcases remarkable progress in translation quality across a wide range of languages, the report also highlights the challenges and limitations they encountered. In some cases, such as with Greek, Armenian, and Oromo, the improvements in translation quality were not as significant as expected.

This underscores the complexity of creating translations that are truly meaningful from a human perspective. The report emphasizes the limitations of automation when it comes to capturing the nuances of language and preserving meaning. The authors acknowledge that scaling up the neural net model did not always lead to better translations and, in some instances, even resulted in negative effects.

To tackle these challenges, Meta went to great lengths to compile a comprehensive data set for training their neural network. They developed new methods, including language identification on web materials, to create a bilingual sentence pair data set for each target language. The scale of their data set is impressive, with over 18 billion sentence pairs across 1,220 language pairs.
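The pairing step described above can be sketched in miniature. In the actual pipeline, Meta embeds sentences with a multilingual encoder and mines pairs across languages; the function below is a simplified, hypothetical version that matches source and target sentences by cosine similarity of their embeddings (the toy vectors stand in for real encoder output):

```python
import numpy as np

def mine_sentence_pairs(src_embs, tgt_embs, threshold=0.8):
    """Pair each source sentence with its most similar target sentence,
    a simplified stand-in for large-scale bitext mining.
    src_embs: (n, d) source-sentence embeddings
    tgt_embs: (m, d) target-sentence embeddings
    Returns (src_index, tgt_index, score) triples above the threshold."""
    # Normalise rows so dot products become cosine similarities
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    sims = src @ tgt.T
    pairs = []
    for i, row in enumerate(sims):
        j = int(row.argmax())          # best-matching target sentence
        if row[j] >= threshold:
            pairs.append((i, j, float(row[j])))
    return pairs

# Toy embeddings: two source sentences, three target candidates
src = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
pairs = mine_sentence_pairs(src, tgt)
print(pairs)  # matches source 0 with target 0, source 1 with target 1
```

Real mining at Meta’s scale also involves language identification on web text and margin-based scoring rather than a flat threshold, but the core idea, nearest neighbours in a shared embedding space, is the same.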

Additionally, Meta incorporated a hand-crafted data set called NLLB-SEED, which was built by human translators. Surprisingly, despite the larger size of publicly available training data sets, the NLLB-SEED data set proved to be more effective in improving translation performance.

The Science behind NLLB-200

Meta’s neural network model, NLLB-200, builds upon the widely used Transformer language model from Google. However, Meta introduces a key modification to the model. They employ a sparsely gated mixture of experts (MoE) approach, which allows selective activation of model parameters for different translation tasks. This improves representational capacity while keeping the cost in floating-point operations (FLOPs) roughly constant.
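The gating idea can be illustrated with a toy single-token sketch (not NLLB-200’s actual implementation, which routes batches of tokens inside Transformer layers): a gate scores every expert, only the top-k experts run, and their outputs are combined with softmax weights over just those k scores.

```python
import numpy as np

def moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Sparsely gated mixture of experts for one token.
    x: (d,) token representation
    expert_weights: list of (d, d) matrices, one per expert
    gate_weights: (num_experts, d) gating matrix"""
    logits = gate_weights @ x                  # score every expert
    top = np.argsort(logits)[-top_k:]          # keep only the top-k
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the chosen experts
    # Only the selected experts compute, so FLOPs stay near-constant
    # even as the total parameter count grows with more experts.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]
gate = rng.normal(size=(num_experts, d))
out = moe_layer(rng.normal(size=d), experts, gate)
print(out.shape)
```

The design trade-off is that capacity scales with the number of experts while per-token compute scales only with top_k, which is what lets a single model serve 200-plus languages without every parameter firing on every input.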

The report highlights the specific elements of the NLLB-200 model and the computational techniques employed to enhance its performance. Meta’s research team conducted extensive experiments and evaluations, both through automated scoring systems and human assessments, to measure the effectiveness of their approach.

Translating Success

When evaluating the performance of NLLB-200, Meta found a 44% improvement compared to previous translation programs based on automated scores like BLEU and chrF. The report provides a comprehensive analysis of the results, including comparisons across different variants of these scores.
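To give a feel for what these automated scores measure, here is a simplified version of chrF, an F-score over character n-grams. Production implementations such as sacreBLEU add whitespace handling, word n-grams, and corpus-level aggregation; this sketch keeps only the core computation:

```python
from collections import Counter

def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: average precision and recall of character
    n-grams (orders 1..max_n), combined into an F-score where
    beta=2 weights recall twice as heavily as precision."""
    def ngrams(text, k):
        return Counter(text[i:i + k] for i in range(len(text) - k + 1))

    precisions, recalls = [], []
    for k in range(1, max_n + 1):
        h, r = ngrams(hypothesis, k), ngrams(reference, k)
        overlap = sum((h & r).values())        # clipped n-gram matches
        precisions.append(overlap / max(sum(h.values()), 1))
        recalls.append(overlap / max(sum(r.values()), 1))
    p, r = sum(precisions) / max_n, sum(recalls) / max_n
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(chrf("the cat sat", "the cat sat"))  # 1.0 for an exact match
```

Because chrF works at the character level it is more forgiving of morphological variation than word-based BLEU, which is one reason it is often preferred for morphologically rich low-resource languages.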

However, human evaluations revealed certain limitations in the translation quality, especially for specific language pairs. In-depth analysis of the discrepancies between automated scores and human evaluations led the authors to hypothesize about the factors that may impact translation quality. These factors include the type of language pairs, the availability of resources, and the potential for overfitting in the neural network.

Future Directions

Meta acknowledges that their work is an ongoing endeavor and emphasizes the importance of multidisciplinary collaboration. They believe that sharing the NLLB project with the scientific and research community will allow diverse expertise to contribute to its advancement.

The report also highlights the need for inclusivity in the development of such initiatives. Meta encourages teams working on similar projects to have a wide range of representation, encompassing various races, genders, and cultural identities. By incorporating perspectives from humanities and social sciences backgrounds, Meta aims to ensure that machine translation truly benefits the communities it aims to serve.

Meta’s groundbreaking research and commitment to no language left behind are commendable. It opens up new possibilities for communication and knowledge exchange across languages and cultures. With continued collaboration and innovation, the vision of a truly inclusive and accessible world of translation may soon become a reality.

Read the full report here

Watch an overview by Stephanie Condon