Alexa scientists show bigger AI isn’t always better

Smaller Is Better: Amazon’s AlexaTM 20B Outperforms Larger AI Models

Machine learning has been dominated by two trends: making models more general and making them bigger. The largest neural networks now have over half a trillion weights and consume enormous amounts of computing power. Researchers at Amazon, however, have shown that bigger is not always better. In a recent paper, they unveiled AlexaTM 20B, a model with only 20 billion parameters that outperforms much larger models, such as Google’s PaLM, on tasks like article summarization.

The paper’s authors, Saleh Soltan and colleagues at Amazon Alexa AI, made three key design choices to achieve these results. First, they returned to the Transformer’s encoder-decoder architecture, whereas most recent large language models are decoder-only. The encoder improves the model’s accuracy at denoising: reconstructing original sentences from corrupted versions with words removed. Second, they combined the denoising objective with causal language modeling, which aids in-context and zero-shot learning. Third, they trained the model on a massive amount of data: roughly one trillion tokens.
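The mixed objective described above can be sketched in a few lines of data preparation. The sketch below is illustrative only: the sentinel token, the `[CLM]` marker, and the mixing ratio are assumptions in the style of T5-like denoising setups, not the paper’s exact scheme.

```python
import random

def make_denoising_example(tokens, mask_rate=0.15, sentinel="<extra_id_0>"):
    """Corrupt a span of the input and make the model reconstruct it
    (a T5-style denoising objective; the sentinel token is illustrative)."""
    n = max(1, int(len(tokens) * mask_rate))
    start = random.randrange(0, len(tokens) - n + 1)
    source = tokens[:start] + [sentinel] + tokens[start + n:]
    target = [sentinel] + tokens[start:start + n]
    return source, target

def make_clm_example(tokens, prefix_frac=0.2, marker="[CLM]"):
    """Causal-LM mode: given a short prefix, the target is the continuation.
    The marker token signaling this mode is a stand-in, not the paper's."""
    split = max(1, int(len(tokens) * prefix_frac))
    source = [marker] + tokens[:split]
    target = tokens[split:]
    return source, target

def sample_batch(examples, clm_fraction=0.2):
    """Mix the two objectives in one training batch; the 20% causal-LM
    fraction here is an assumed value for illustration."""
    batch = []
    for tokens in examples:
        if random.random() < clm_fraction:
            batch.append(make_clm_example(tokens))
        else:
            batch.append(make_denoising_example(tokens))
    return batch
```

The key idea is that a single encoder-decoder model sees both kinds of source/target pairs during training, so it learns both to reconstruct corrupted text and to continue a prefix.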

Despite having far fewer parameters, AlexaTM 20B set a new state of the art on summarization tasks, beating larger models like PaLM. The authors also emphasized the model’s energy efficiency: they estimate its ongoing environmental impact at roughly 8.7 times lower than that of larger models, giving it a smaller carbon footprint.

The success of the AlexaTM 20B demonstrates that a more efficient approach to training models can yield comparable or better performance while reducing energy consumption. This trend toward smaller, more efficient models aligns with the increasing focus on environmental sustainability and ethics within the AI field.

Carbon Footprint Comparison

The implications of energy-efficient models like AlexaTM 20B extend beyond performance and accuracy: environmental, social, and governance (ESG) considerations are becoming integral to AI research. The authors provide a table illustrating the large gap in carbon footprint between AlexaTM 20B and bigger models, underscoring the importance of energy consumption in AI development.

While the push for larger models continues, Amazon’s AlexaTM 20B serves as a reminder that smaller models can outperform their larger counterparts on specific tasks and offer environmental benefits. This research opens up new possibilities for developing more efficient and effective AI models, and it exemplifies the ongoing quest for improvement and innovation in the field of artificial intelligence.