MediaTek claims that AI tasks could soon be performed directly on smartphones.

On-Device Generative AI: The Future of Computing

Generative AI, one of the fastest-growing technologies, is quickly becoming an integral part of our lives. It powers popular chat applications such as OpenAI’s ChatGPT and Google Bard, as well as image generators like Stable Diffusion and DALL-E. However, these tools have their limits: they rely on cloud-based data centers with hundreds of GPUs to perform the computation each query requires.

But what if we could run generative AI tasks directly on our mobile devices, connected cars, or even on smart speakers in our homes? This future might be closer than we realize. MediaTek, a Taiwan-based semiconductor company, has announced a collaboration with Meta to enable on-device generative AI tasks without relying on external processing.

Of course, there is a catch. The size of the models behind generative AI, known as Large Language Models (LLMs), and the storage performance needed to serve them still call for smaller data centers. For example, the “small” version of Llama 2 has 7 billion parameters, or about 13GB of weight data. While this is suitable for rudimentary generative AI functions, larger LLM variants with 70 billion or more parameters need far more storage, making them impractical for today’s smartphones.
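To make those storage figures concrete, here is a rough back-of-the-envelope sketch in Python. The bytes-per-parameter values are illustrative assumptions about common inference precisions, not numbers published by MediaTek or Meta.

```python
# Rough model-size arithmetic: bytes per parameter depends on numeric precision.
# These precisions and the GiB conversion are illustrative assumptions.

BYTES_PER_PARAM = {
    "fp32": 4,    # full precision
    "fp16": 2,    # half precision, common for inference
    "int4": 0.5,  # aggressive quantization for constrained devices
}

def model_size_gib(num_params: float, precision: str) -> float:
    """Approximate checkpoint size in GiB for a given parameter count."""
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 3)

for params, label in [(7e9, "Llama 2 7B"), (70e9, "a ~70B-parameter model")]:
    for precision in ("fp16", "int4"):
        print(f"{label} @ {precision}: ~{model_size_gib(params, precision):.1f} GiB")
```

At 16-bit precision the 7-billion-parameter model lands close to the 13GB figure above, while a 70-billion-parameter model exceeds 100GB before any runtime overhead, which is why the larger variants remain out of reach for phone storage.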

However, specialized cache appliances with fast flash storage and ample RAM can be built to serve these large LLM parameter sets. A single rack unit optimized for this purpose can host the storage side of the workload, serving nearby mobile devices without heavy computation. It may not be a phone, but it is an impressive step towards enabling on-device generative AI.
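As a purely illustrative sketch of how such an appliance might work (not MediaTek’s or Meta’s actual design), the Python below models a read-through cache that keeps the most recently used parameter shards in RAM and falls back to flash for the rest; the class and shard-file layout are hypothetical.

```python
from collections import OrderedDict
from pathlib import Path

class ShardCache:
    """Hypothetical read-through cache: hot LLM parameter shards in RAM, the rest on flash."""

    def __init__(self, shard_dir: Path, ram_budget_bytes: int):
        self.shard_dir = shard_dir
        self.ram_budget = ram_budget_bytes
        self.ram_used = 0
        self.hot: OrderedDict[str, bytes] = OrderedDict()  # least recently used first

    def get(self, shard_name: str) -> bytes:
        if shard_name in self.hot:                          # RAM hit
            self.hot.move_to_end(shard_name)
            return self.hot[shard_name]
        data = (self.shard_dir / shard_name).read_bytes()   # flash read on a miss
        self._admit(shard_name, data)
        return data

    def _admit(self, shard_name: str, data: bytes) -> None:
        # Evict least recently used shards until the new one fits within the RAM budget.
        while self.hot and self.ram_used + len(data) > self.ram_budget:
            _, evicted = self.hot.popitem(last=False)
            self.ram_used -= len(evicted)
        if len(data) <= self.ram_budget:
            self.hot[shard_name] = data
            self.ram_used += len(data)
```

An appliance along these lines would sit in a single rack unit and answer shard requests from many nearby handsets, which is exactly the storage-without-heavy-computation role described above.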

MediaTek expects Llama 2-based AI applications to be available for smartphones powered by their next-generation flagship System-on-Chip (SoC) by the end of the year. However, for on-device generative AI to access these datasets, mobile carriers would need low-latency edge networks. These networks would consist of small data centers or equipment closets with fast connections to 5G towers, allowing LLMs running on smartphones to access parameter data without going through multiple network “hops.”
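The point about network “hops” is easy to quantify with a rough latency budget; the per-hop, radio, and read times below are illustrative assumptions rather than measured carrier figures.

```python
def fetch_latency_ms(radio_ms: float, hops: int, per_hop_ms: float, read_ms: float) -> float:
    """Crude budget for fetching parameter data: 5G radio leg + routed hops + storage read."""
    return radio_ms + hops * per_hop_ms + read_ms

# An edge cache one hop from the tower versus a distant regional data center.
print("edge cache:", fetch_latency_ms(radio_ms=10, hops=1, per_hop_ms=2, read_ms=1), "ms")
print("remote DC :", fetch_latency_ms(radio_ms=10, hops=12, per_hop_ms=2, read_ms=1), "ms")
```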

Besides MediaTek’s specialized processors, other approaches can bring domain-specific LLMs closer to the application workload in a “constrained device edge” scenario. In this hybrid setup, caching appliances housed in miniature data centers keep selected parts of the parameter dataset close by for faster processing.

On-device generative AI brings numerous benefits:

  • Reduced latency: By processing the data on the device itself, response times are significantly improved, especially when using localized cache methodologies.
  • Improved data privacy: By keeping the data on the device, sensitive user information is not transmitted through the data center. Only the model data is shared.
  • Improved bandwidth efficiency: With on-device processing, far less data needs to travel between the user and the data center.
  • Increased operational resiliency: On-device generation allows the system to continue functioning even if the network is disrupted, given a large enough parameter cache on the device.
  • Energy efficiency: By reducing the need for compute-intensive resources at the data center and minimizing data transmission, on-device generative AI is more energy-efficient.

Achieving these benefits, however, will likely require splitting workloads between devices and the cloud, along with load-balancing techniques that relieve centralized data centers of some computational cost and network overhead.
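One way to picture that splitting is a simple router that keeps small, well-cached requests on the device and hands everything else to the data center. The thresholds and function below are hypothetical placeholders, not a published load-balancing scheme.

```python
def route_request(prompt_tokens: int, cached_fraction: float,
                  max_local_tokens: int = 512, min_cached: float = 0.9) -> str:
    """Decide where a generation request should run.

    cached_fraction: share of the needed parameter shards already on the device.
    Returns "device" when the job is small and the local cache covers it,
    otherwise "data_center". Thresholds are purely illustrative.
    """
    if prompt_tokens <= max_local_tokens and cached_fraction >= min_cached:
        return "device"
    return "data_center"

print(route_request(prompt_tokens=200, cached_fraction=0.95))   # -> device
print(route_request(prompt_tokens=4000, cached_fraction=0.95))  # -> data_center
```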

While on-device generative AI opens up exciting possibilities, there are still challenges to address. Today’s hardware limits the size and capability of the LLMs that can run efficiently on a device. Ensuring data security on local devices and managing updates and consistency across distributed edge caching devices are also critical considerations.
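Keeping distributed edge caches consistent largely comes down to comparing version identifiers against a published source of truth. The sketch below uses a made-up JSON manifest format to show the gist; it is not a real distribution protocol.

```python
import hashlib
import json
from pathlib import Path

def shard_digest(path: Path) -> str:
    """SHA-256 of a parameter shard, used as its version identifier."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def stale_shards(cache_dir: Path, manifest_path: Path) -> list[str]:
    """Return shard names whose local copy differs from the published manifest.

    Assumed manifest format: {"model.shard0.bin": "<sha256>", ...}.
    """
    manifest = json.loads(manifest_path.read_text())
    stale = []
    for name, expected in manifest.items():
        local = cache_dir / name
        if not local.exists() or shard_digest(local) != expected:
            stale.append(name)
    return stale
```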

Another significant aspect is cost. Who will bear the expense of building the mini edge data centers that on-device generative AI requires? Edge service providers like Equinix, which already support services such as Netflix and Apple’s iTunes, handle edge networking today. Generative AI providers such as OpenAI/Microsoft, Google, and Meta will need to work out similar arrangements with mobile network operators.

Despite these challenges, it is clear that major tech companies are actively exploring the possibilities of on-device generative AI. Within the next five years, we could have intelligent assistants that do much of their reasoning right on our devices. The age of AI in our pockets is approaching faster than anyone expected. So, get ready to embrace the future of computing, where generative AI is right there with you, on your device.
