Meta's Data2vec 2.0: Faster This Time



What do you do when you’ve already made a point with neural networks? You make them faster. That’s exactly what Meta, the parent company of Facebook, Instagram, and WhatsApp, did when they unveiled Data2vec 2.0, a revamped version of their neural network that can handle text, image, and speech data with the same approach.

In new research posted on arXiv, authors Alexei Baevski, Arun Babu, Wei-Ning Hsu, and Michael Auli state that Data2vec 2.0 achieves faster training without compromising accuracy on downstream tasks. They highlight the significance of this result: the training speed of self-supervised learning can be substantially improved.

Data2vec 2.0 builds on the Transformer, a neural network architecture originally developed by Google in 2017. Like its predecessor, it handles multiple data types, text, images, and speech, with the same learning recipe and without per-modality modification, making it a generalist program. It uses a self-supervised learning approach in which the network passes data through multiple stages of masking, compression, and reconstruction in order to build a better model of how the data fit together.
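The core training signal can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the "encoder" here is a single tanh layer standing in for a Transformer, and the dimensions and EMA decay are made-up values. What it shows is the data2vec-style objective: a teacher network sees the full input and produces target representations, a student sees a masked input and regresses onto the teacher's targets at the masked positions, and the teacher tracks the student via an exponential moving average.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    # Stand-in "encoder": one linear layer with tanh. The real model
    # uses a Transformer; this toy only illustrates the training signal.
    return np.tanh(x @ w)

dim = 8
x = rng.normal(size=(16, dim))           # one sample: 16 "patches"
w_student = rng.normal(size=(dim, dim)) * 0.1
w_teacher = w_student.copy()             # teacher starts as a student copy

# 1) Teacher sees the FULL input and produces target representations.
targets = encode(x, w_teacher)

# 2) Student sees a MASKED input and must predict the teacher's
#    representations at the masked positions.
mask = np.zeros(16, dtype=bool)
mask[::2] = True                         # mask every other patch
x_masked = x.copy()
x_masked[mask] = 0.0
preds = encode(x_masked, w_student)

# 3) Regression loss on masked positions only.
loss = np.mean((preds[mask] - targets[mask]) ** 2)

# 4) Teacher weights track the student via an exponential moving average.
tau = 0.999
w_teacher = tau * w_teacher + (1 - tau) * w_student
```

Because the targets are the teacher's internal representations rather than raw pixels, tokens, or audio samples, the same objective applies to every modality.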

One of the key changes in Data2vec 2.0 is the use of a convolutional neural network (CNN) decoder in place of a Transformer-based decoder. CNNs are an older technology but are simpler and faster to train. Additionally, the new version amortizes the cost of the teacher-model computation by reusing the teacher's representations across multiple masked versions of the same training sample.
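The amortization idea is simple enough to demonstrate directly. In the sketch below (a minimal illustration with made-up sizes, not the paper's code), a naive scheme would run the teacher once per masked variant; instead, the teacher runs once on the full sample and its targets are reused for every masked variant, so one expensive teacher pass serves several training examples.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_patches, n_masks = 8, 16, 8       # 8 masked variants per sample

x = rng.normal(size=(n_patches, dim))
w = rng.normal(size=(dim, dim)) * 0.1

teacher_calls = 0
def teacher_encode(x):
    # Count invocations to show the amortization: the teacher pass is
    # the expensive step we want to avoid repeating.
    global teacher_calls
    teacher_calls += 1
    return np.tanh(x @ w)

# Run the teacher ONCE on the unmasked sample...
targets = teacher_encode(x)

# ...and reuse its targets for every masked variant of that sample.
losses = []
for _ in range(n_masks):
    mask = rng.random(n_patches) < 0.5
    x_masked = np.where(mask[:, None], 0.0, x)
    preds = np.tanh(x_masked @ w)        # student forward pass
    losses.append(np.mean((preds[mask] - targets[mask]) ** 2)
                  if mask.any() else 0.0)
```

With `n_masks` variants per sample, the teacher cost per training example drops by roughly that factor, which is where much of the speedup comes from.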

The results are impressive across domains. In image recognition, a Data2vec 2.0-backed ViT matches or exceeds competing networks in accuracy while requiring fewer training epochs, yielding a significant reduction in training time. In speech recognition, Data2vec 2.0 achieves higher accuracy than comparable models with less training time. In natural language processing, it scores well on the General Language Understanding Evaluation (GLUE) benchmark, matching other Transformer-based programs while training faster.

Despite its advancements, Data2vec 2.0 still has limitations. Each data type (image, speech, text) is handled differently during training, and there isn’t yet a way to combine all data types into one representation. However, Meta plans to extend Data2vec to handle other forms of data, making it even more of a generalist network.

Data2vec 2.0 showcases the continuous advancements in the field of neural networks, pushing the boundaries of self-supervised learning and expanding the capabilities of generalist models. With faster training speeds and improved accuracy, Meta’s Data2vec 2.0 sets a new standard for multi-modal neural networks.

[Figure: Data2vec 2.0 architecture. Source: ZDNet]