MediaTek claims that you will soon be able to run AI tasks directly on your smartphone.

The Future of On-Device Generative AI: Bringing Intelligence to Your Pocket

AI on device

Generative AI has been making waves in recent years, with applications like OpenAI’s ChatGPT and Google Bard transforming the way we interact with technology. However, these tools have been limited in their capabilities due to the need for cloud-based data centers with powerful computing infrastructure. But what if you could run generative AI tasks directly on your mobile device or other connected devices like smart speakers and cars?

MediaTek, a Taiwan-based semiconductor company, believes this future is closer than we might think. Through a partnership with Meta, the company is working to run Meta’s Llama 2 LLM on its latest-generation APUs, using its NeuroPilot software development platform, so that generative AI tasks can be performed on-device without relying on external processing.

Of course, there’s a catch. While this advancement will reduce reliance on large data centers, it won’t eliminate them; data centers will still be required, albeit on a much smaller scale. The size of Llama 2’s parameter datasets and the performance they demand from the storage system still call for a data center, just a much smaller one. For example, Llama 2’s “small” dataset contains 7 billion parameters, or about 13GB, making it suitable for rudimentary generative AI functions. The larger 70-billion-parameter version, however, requires significantly more storage, which exceeds the practical capabilities of today’s smartphones.
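As a back-of-envelope illustration (the figures below are assumptions, not MediaTek’s or Meta’s published numbers), a model’s weight footprint is roughly its parameter count times the bytes used per parameter, which is why quantization is central to fitting Llama 2-class models on a handset:

```python
# Back-of-envelope model footprint: parameter count x bytes per parameter.
# All figures are illustrative assumptions, not vendor-published numbers.

def model_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage needed for the model weights, in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

for params, label in [(7e9, "Llama 2 7B"), (70e9, "Llama 2 70B")]:
    for bits in (16, 8, 4):  # fp16, int8, 4-bit quantization
        print(f"{label} @ {bits}-bit: ~{model_size_gb(params, bits):.1f} GB")
```

At 16-bit precision the 7B model lands around 14GB, consistent with the roughly 13GB figure above, while the 70B model needs on the order of 140GB before any quantization, well beyond a phone’s storage and memory budget.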

Although smartphones lack the storage capacity and database performance to handle these large datasets, specially designed cache appliances with fast flash storage and terabytes of RAM can support these tasks effectively. This opens up the possibility of hosting a device optimized for serving mobile clients within a single rack unit, significantly reducing the need for heavy compute infrastructure.

MediaTek anticipates that smartphones powered by their next-generation flagship SoC, set to enter the market by the end of the year, will be compatible with Llama 2-based AI applications. However, for on-device generative AI to access these datasets, mobile carriers will have to rely on low-latency edge networks. These networks can be implemented through small data centers or equipment closets with fast connections to 5G towers, eliminating the need for multiple network hops before accessing the parameter data required for generative AI.

By using specialized processors like MediaTek’s and incorporating caching appliances within miniature data centers, domain-specific LLMs can be brought closer to the application workload, enabling a “constrained device edge” scenario. This hybrid approach amplifies the benefits of on-device generative AI.
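As a rough sketch of what that hybrid routing could look like (the class and method names here are hypothetical, not an actual MediaTek or Meta API), a request might be served on-device when the model’s weights are resident locally and otherwise handed to the nearby edge cache:

```python
# Hypothetical sketch of "constrained device edge" routing: prefer on-device
# inference when the model's weights are cached locally, otherwise fall back
# to a nearby edge cache node. Class and method names are assumptions, not a
# real MediaTek or Meta API.

from dataclasses import dataclass, field


@dataclass
class DeviceRuntime:
    cached_models: set = field(default_factory=set)  # models resident on-device

    def can_serve(self, model: str) -> bool:
        return model in self.cached_models

    def generate(self, model: str, prompt: str) -> str:
        return f"[on-device:{model}] response to: {prompt}"


@dataclass
class EdgeCacheClient:
    endpoint: str  # small edge data center, one hop from the 5G tower

    def generate(self, model: str, prompt: str) -> str:
        return f"[edge:{self.endpoint}:{model}] response to: {prompt}"


def route(prompt: str, model: str, device: DeviceRuntime, edge: EdgeCacheClient) -> str:
    """Serve locally when possible; otherwise hand the request to the edge."""
    if device.can_serve(model):
        return device.generate(model, prompt)
    return edge.generate(model, prompt)


# Example: a quantized 7B-class model lives on the phone, larger models stay at the edge.
device = DeviceRuntime(cached_models={"llama-2-7b-int4"})
edge = EdgeCacheClient(endpoint="edge-cache.local")
print(route("Summarize my notes", "llama-2-7b-int4", device, edge))
print(route("Draft a long report", "llama-2-70b", device, edge))
```

In practice, a quantized 7B-class model could be served locally, while requests that need a larger model fall back to the edge node a single network hop away.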

Benefits of On-Device Generative AI

Utilizing generative AI directly on devices offers numerous advantages over traditional cloud-based processing:

1. Reduced Latency

By processing data on the device itself, response times improve significantly. Localized cache methodologies enhance this further by keeping frequently accessed parts of the parameter dataset on the device for faster retrieval (see the sketch after this list).

2. Improved Data Privacy

By keeping user data on the device, sensitive information is never transmitted to the data center; only the model data needed for AI processing is exchanged.

3. Improved Bandwidth Efficiency

Instead of sending all user conversation data back and forth to the data center, localized processing on the device reduces the amount of data that needs to traverse the network.

4. Increased Operational Resiliency

On-device generation lets systems keep functioning during network disruptions. A large enough parameter cache on the device preserves operation even when connectivity is compromised.

5. Energy Efficiency

On-device generative AI reduces the demand for computationally intensive resources in the data center, resulting in energy savings. Because less data needs to travel between the device and the data center, transmission energy is reduced as well.
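To make the localized-cache idea from benefits 1 and 4 concrete, here is a minimal, hypothetical sketch of an on-device LRU cache for parameter shards; the shard IDs, byte budget, and eviction policy are illustrative assumptions rather than anything MediaTek has described:

```python
# Minimal sketch of on-device localized caching (benefits 1 and 4): an LRU
# cache for model parameter shards. Shard IDs and the byte budget are purely
# illustrative assumptions.

from collections import OrderedDict
from typing import Optional


class ParameterShardCache:
    """Tiny LRU cache for parameter shards, keyed by shard ID."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._shards: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, shard_id: str) -> Optional[bytes]:
        shard = self._shards.get(shard_id)
        if shard is not None:
            self._shards.move_to_end(shard_id)  # mark as most recently used
        return shard

    def put(self, shard_id: str, data: bytes) -> None:
        if shard_id in self._shards:
            self.used_bytes -= len(self._shards.pop(shard_id))
        # Evict least recently used shards until the new one fits.
        while self._shards and self.used_bytes + len(data) > self.capacity_bytes:
            _, evicted = self._shards.popitem(last=False)
            self.used_bytes -= len(evicted)
        self._shards[shard_id] = data
        self.used_bytes += len(data)
```

A runtime would consult this cache before pulling a shard over the network, so frequently used shards stay resident and the device can keep generating even when connectivity drops.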

While these benefits make on-device generative AI highly appealing, there are several considerations to address. Workload division and load-balancing techniques will be crucial to reduce centralized data center compute costs and network overhead. Additionally, security risks associated with potentially sensitive on-device data must be mitigated, and mechanisms for model data updates and data consistency maintenance across distributed edge caching devices need to be implemented.
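One plausible, purely illustrative approach to the update and consistency problem is to publish a manifest for each model release and have every edge cache compare its local copy against it; the manifest format below is an assumption, not a published protocol:

```python
# Illustrative approach to model updates and consistency across distributed
# edge caches: compare a local content hash against a publisher's manifest.
# The manifest format is an assumption, not a published protocol.

import hashlib


def weights_digest(weights: bytes) -> str:
    """Content hash of the locally cached weights."""
    return hashlib.sha256(weights).hexdigest()


def needs_update(local_weights: bytes, manifest: dict) -> bool:
    """True when the local copy no longer matches the published release.

    `manifest` is assumed to look like:
        {"model": "llama-2-7b", "version": 3, "sha256": "..."}
    """
    return weights_digest(local_weights) != manifest["sha256"]
```

Each cache node would poll the manifest and re-fetch shards only when its local digest no longer matches, keeping distributed copies consistent without constant full downloads.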

Another challenge lies in the cost of building out mini edge data centers. Edge networking is typically provided by edge service providers such as Equinix, whose facilities are used by services like Netflix and Apple’s iTunes. Generative AI providers such as OpenAI/Microsoft, Google, and Meta will need to establish similar arrangements to ensure that localized processing capability is available.

Despite these challenges, the implementation of on-device generative AI is indeed on the horizon. Within the next five years, we can expect our mobile devices to become even more intelligent, with AI assistants that can think and respond all by themselves. The future of AI in your pocket is approaching rapidly, far sooner than most people expected.