TensorFlow 2.21 Introduces LiteRT Runtime for On-Device AI

by Redactor
07.03.2026

Google has released TensorFlow 2.21, which introduces LiteRT as its runtime for on-device artificial intelligence. The update marks the transition from TensorFlow Lite (TFLite) to LiteRT as the runtime for deploying machine learning models on edge devices.

According to the official developer announcement, LiteRT is designed to provide improved hardware acceleration, simplified deployment workflows, and broader compatibility with modern AI frameworks. The runtime is now available to developers as part of the TensorFlow 2.21 release.

Image: The TensorFlow 2.21 and LiteRT announcement on the official Google Developers Blog.

LiteRT Becomes the Successor to TensorFlow Lite

The central change in TensorFlow 2.21 is the production rollout of LiteRT, which evolves the long-standing TensorFlow Lite runtime into a newer framework focused on modern edge AI workloads.

Google describes LiteRT as a high-performance runtime designed for advanced hardware acceleration and on-device inference. It builds on the foundation of TensorFlow Lite while introducing a redesigned runtime stack for deploying machine learning models on mobile devices, embedded systems, and other edge platforms.

The release positions LiteRT as Google's runtime for running AI models directly on devices, where latency, efficiency, and privacy are often critical requirements.

Improved Hardware Acceleration for Edge AI

According to Google, the LiteRT runtime introduces several performance and hardware acceleration improvements compared with TensorFlow Lite.

Up to 1.4× faster GPU performance compared with the previous TFLite GPU delegate
New support for NPU acceleration designed to leverage specialized AI hardware
A unified workflow for GPU and NPU acceleration across edge platforms
Fallback support allowing workloads to run on CPU, GPU, or NPU depending on hardware availability
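The fallback behavior described above can be pictured as a priority search over the accelerators a device reports. The following is a minimal, hypothetical sketch: the `Accelerator` enum, the `pick_accelerator` function, and the NPU, then GPU, then CPU preference order are assumptions made for illustration and do not reflect LiteRT's actual API or selection logic.

```python
from enum import Enum


class Accelerator(Enum):
    """Hypothetical accelerator kinds for illustration; not a LiteRT type."""
    NPU = "npu"
    GPU = "gpu"
    CPU = "cpu"


# Assumed preference order: try the most specialized hardware first.
DEFAULT_PREFERENCE = (Accelerator.NPU, Accelerator.GPU, Accelerator.CPU)


def pick_accelerator(available, preference=DEFAULT_PREFERENCE):
    """Return the first preferred accelerator that the device reports as available.

    The CPU is treated as always present, so with the default preference
    order the search always succeeds.
    """
    usable = set(available) | {Accelerator.CPU}
    for accel in preference:
        if accel in usable:
            return accel
    # Only reachable when a custom preference list omits CPU and matches nothing usable.
    raise RuntimeError("no usable accelerator in preference list")


if __name__ == "__main__":
    # A device with a GPU but no supported NPU falls back to the GPU.
    print(pick_accelerator({Accelerator.GPU}).value)  # gpu
    # A device reporting no accelerators runs on the CPU.
    print(pick_accelerator(set()).value)  # cpu
```

Treating the CPU as an always-available last resort is what lets the same application ship to devices with very different hardware without a separate code path for each.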

Google says the runtime is designed to support real-time AI workloads such as speech recognition, background segmentation, and other latency-sensitive applications.
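To put "real-time" and the "up to 1.4×" GPU figure in concrete terms, the sketch below computes a per-frame latency budget and the latency implied by a given speedup factor. The 30 fps target and the 14 ms baseline are hypothetical numbers chosen for illustration, not measurements reported by Google, and 1.4× is the best case Google cites rather than a guaranteed gain.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def latency_after_speedup(baseline_ms: float, speedup: float) -> float:
    """Latency implied by a speedup factor: a 1.4x speedup divides time by 1.4."""
    return baseline_ms / speedup


if __name__ == "__main__":
    budget = frame_budget_ms(30)                      # about 33.3 ms per frame
    baseline = 14.0                                   # hypothetical per-frame inference time
    improved = latency_after_speedup(baseline, 1.4)   # about 10 ms
    print(f"budget {budget:.1f} ms, baseline {baseline:.1f} ms, improved {improved:.1f} ms")
    print(f"headroom left for pre/post-processing: {budget - improved:.1f} ms")
```

Under these assumed numbers, a 1.4× speedup frees roughly 4 ms per frame, which matters most when inference shares the frame budget with camera capture and rendering.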

To improve runtime efficiency, the platform also introduces optimizations including asynchronous execution and zero-copy buffer interoperability. According to the company, these changes help reduce CPU overhead and improve inference performance.
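Zero-copy interoperability means the application and the runtime read and write the same memory instead of copying tensors between them, and asynchronous execution lets the caller keep working while inference runs. The sketch below illustrates both ideas in plain Python: `memoryview` slices share one backing buffer, and `concurrent.futures` overlaps preparing the next frame with (simulated) inference on the current one. The `fill_frame` and `fake_inference` functions and the buffer layout are hypothetical stand-ins invented for this example; they do not represent LiteRT's buffer or asynchronous-execution APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def fill_frame(view: memoryview, value: int) -> None:
    """Write pixel data directly into a shared buffer (no intermediate copy)."""
    for i in range(len(view)):
        view[i] = value


def fake_inference(view: memoryview) -> int:
    """Stand-in for a model run: reads the input buffer in place and returns a sum."""
    return sum(view)


if __name__ == "__main__":
    # One backing store; both "frames" are views into it, so no bytes are copied.
    storage = bytearray(8)
    frame_a = memoryview(storage)[:4]
    frame_b = memoryview(storage)[4:]

    fill_frame(frame_a, 1)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start "inference" on frame A without blocking the caller...
        pending = pool.submit(fake_inference, frame_a)
        # ...while the caller prepares frame B in the other half of the buffer.
        fill_frame(frame_b, 2)
        result_a = pending.result()

    print(result_a)        # 4
    print(sum(frame_b))    # 8
    print(bytes(storage))  # b'\x01\x01\x01\x01\x02\x02\x02\x02'
```

Python's global interpreter lock limits true parallelism for CPU-bound work, so the point here is the structure rather than the speed: writes made through the views land directly in `storage`, and the caller does not wait for inference before preparing the next input.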

Expanded Cross-Platform GPU Support

Google says the LiteRT runtime supports GPU execution across several operating systems and development environments.

Android
iOS
macOS
Windows
Linux
Web environments

The system integrates with graphics and compute frameworks including OpenCL, OpenGL, Metal, and WebGPU, allowing developers to deploy models across a wider range of hardware configurations.

Google says this cross-platform design is intended to simplify edge AI deployment while maintaining compatibility across devices.

Support for PyTorch and JAX Model Deployment

Another change in the LiteRT ecosystem is expanded interoperability with other machine learning frameworks.

The runtime introduces what Google describes as first-class support for converting models from PyTorch and JAX. This allows developers to deploy models trained in other frameworks to on-device environments using LiteRT.

According to the announcement, this capability aims to simplify cross-framework workflows and enable deployment of open models, including generative models such as Gemma, on edge devices.

Focus on On-Device Generative AI

The LiteRT runtime is also designed to support newer AI workloads, particularly generative models running locally on devices.

Google says the framework aims to enable deployment of advanced AI models while maintaining benefits typically associated with on-device inference, including reduced latency and improved privacy.

The company notes that it is working with silicon partners to bring LiteRT support to specialized AI hardware, with early integrations available for MediaTek and Qualcomm platforms.

Why the Runtime Transition Matters

The shift from TensorFlow Lite to LiteRT represents a broader change in how Google approaches on-device AI infrastructure.

While TensorFlow Lite was designed primarily for earlier mobile machine learning workloads, Google describes LiteRT as targeting newer categories such as generative AI and advanced hardware acceleration.

With TensorFlow 2.21, LiteRT becomes the central runtime for these workloads, giving developers a unified platform for deploying machine learning models across mobile, desktop, embedded, and web environments.