Can I get a Agentic Python App Running On Mobile?
The answer to whether I can get an agentic Python App running on mobile spans the full environment stack and proposes a really exciting problem. The goal is to keep as much Python in my writing as possible -- this makes it accessible for a single Python developer or a single team with primarily Python capabilities to create an on-device project. It is hard to debate that any other language is the language of AI (read with a tinge of sarcasm) besides Python. C++ and TypeScript show up in different parts of the AI builder process pretty frequently for variety of reasons, but with Python's pervasiveness nudging lower-level and higher-level; I'm rooting for Python to continue to get sticky in new places.

In my mental model, flushed out with RKM's help:
```
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ YOUR PYTHON APP
โ
โ Toga UI ยท litert_lm.Engine
โ litert_lm.Conversation
โ Packaged and signed by Briefcase
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ BEEWARE / BRIEFCASE
โ
โ CPython embedded in app bundle
โ iOS + Android supported
โ Toga ยท cibuildwheel ยท Xcode + NDK
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ ECOSYSTEM GAP
โ No mobile wheels for litert-lm-api
โ
โ no cross-compile build config
โ no ios_* or android_* on PyPI
โ iOS โ proven via Pillow
โ Android โ unwalked by any major pkg
โ Fix? cibuildwheel config upstream
โ โ google-ai-edge/LiteRT-LM
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ LITERT-LM PYTHON API (PyPI)
โ
โ Engine โ Conversation โ send_message()
โ streaming ยท sync ยท tool use
โ stable: Linux / macOS / Windows
โ mobile-native: Kotlin ยท Swift
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ LITERT-LM RUNTIME (C++)
โ
โ KV-cache ยท tokenisation ยท quantisation
โ speculative decoding ยท MTP
โ ML Drift: OpenCL / Metal / WebGPU
โ Gemma 4 ยท Llama ยท Phi-4 ยท Qwen
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ DEVICE HARDWARE
โ
โ CPU ยท GPU (Metal / OpenCL) ยท NPU
โ Qualcomm ยท MediaTek ยท Apple NE
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
```
The one problem shows up in two places, but simplified for the developer experience in this diagram above in one place. In both the wheel availability -- when I'm actually installing a package compatible for where I'm building, and the cross-compilation -- when I'm trying to bundle and ship the project, we're left without the right wheels to support the task.
What we don't have: Solving Android and iOS Wheels In Python
The ecosystem CAN support it right now! Also shown in the diagram above, PILLOW is an proof point that mobile wheels can be build and shared on PyPI - the default Python Package Index. Wheels are also available on Anaconda.org (learn more about wheels from Anaconda in this blog post) which is free and open to maintainers looking to host artifacts just like PyPI; but specifically for the machine learning, data science and AI community.
To check in on the status of mobile wheels, BeeWare keeps an "Are We There Yet?" tracker that's used across the Python community for all sorts of ecosystem adoption: https://beeware.org/mobile-wheels/
What we DO have and why we're picking these tools
Let's do a quick overview of all the other parts of the stack that we do have and why I'm picking these tools for my journey.
These are two separate build tools at different stages of the same problem: building C++ code into a Python package. Let's start with these because they are the edge of the problem space:
CMake is the build system generator for C++ projects. It takes a description of what needs to be compiled and generates the right instructions for wahtever native compiler is on your platform. LiteRT-LM's C++ source code sues CMake to describe how to build itself.
cibuildwheel is a Python tool that automates building binary Python wheels across many platforms. You point it at a Python package that has C++ in it, and it spins up the right build environments (eg. Linux via Docker, macOS via Xcode, Windows via MSVC) and compiles the C++ source code for each platform, and produces correctly named .whl files ready for upload to PyPI or Anaconda.org. PyPA, its maintainers and the tongue-and-cheek "authority" on packaging, recently added Android and iOS support targets. It now knows how to invoke Android NDK and Xcode's iOS toolchain, and produce wheels with the right platform tags.
LiteRT and LiteRT-LM (formerly TensorFlow Lite) solve problems that desktop doesn't have or are more acute on mobile than they would be on my desktop experimentation.
It solves:
- Limited RAM -- 7B param model in float16; 16GB
- No discrete GPU with its own memory and drivers
- Battery drain, imprecise and inefficient loops
- App store restrictions on processes and spawn
- Throttling during extended (30s+) computer
How does it solve it?
It uses some methods we've been applying across the ecosystem in specific ways to hit the narrow mobile target.
- Quantization to fit on a phone's shared memory
- KV-cache management that helps with the fact that mobile does not swap overflow space, moving over chunks of memory onto disk.
- MLDrift, the GPU compute engine provides advanced mathematical operations of the neural network on your GPU, no longer using older approaches for classical ML workloads that don't work on LLMs.
- NPU access is the neural chip that is designed for transformer math inside your phone and your Python runtime is not aware of this without help -- every iPhone since iPhone 8 (2017) has Apple's NPU and every Android flagship has had it for the last few years with Qualcomm Snapdragon 8-series, MediaTek Dimensity 9-series and Google Tensor ships, Samsung Exynos chips have had NPUs since 2019.
- Token optimization
LiteRT is the inference runtime, loads and execute models, etc; while LiteRT-LM is the LLM orchestration layer on top. It holds session and conversation state, function calling with the Engine and Conversation Python API.
But I'm Interested In Agents, Not Chat Apps
I need to put it to the test, but it looks like LiteRT-LM can handle agents although it is not an Agent framework. It will give me the model's tool call decision. I will need to write the harness logic myself -- hence why me reaching for Python is important.
Code-first proof of concept, next steps
I'm currently building a homelab for experimentation. Now that I've done an ecosystem dive with my friend and colleague, Russel Keith-Magee, I'm going to start on desktop first and trat it as mobile as the second step.
I'll need to get LiteRT-LM running on my Mac, decide on my model approach. I've been excited to explore Gemma 4 E2B (2B) & E4B (4B), and Gemma 3 (270M), DeepSeek-R1-Distill (8B/14B) and GPT-OSS-20B (20B), with room to flex upwards to understand more about litert's quantization, KV-cache and token throughput features -- features that are helpful when running larger models on conservative or edge hardware.
Next, I'll need a minimal proof of concept. I'm way more impressed with reasoning models and agents than language models. LiteRT-LM provides constrained decoding, which forces the models output to conform to valid JSON. It doesn't provide an inference engine which I would write in Python and hopefully pydantic-ai. I've been moving my LangChain and MetaFlow application I built for Anaconda's PyCon US presence over to Pydantic AI and MetaFlow and really enjoying it.
I still need to decide what repeatable agent I want to build and benchmark and how to structure the experiment. Ultimately, I want something interesting and useful; does not spam or cause harm and easily evaluated for effectiveness and can grow with data/token size or model size if I want to turn those nobs. If you have suggestions, leave them in the comments! Bonus points if it can fit into a Star Trek Theme.
Then I'll want to wrap it in a BeeWare app:
bash
pip install briefcase
briefcase new
briefcase run macOS
briefcase run Android
Moving to briefcase run iOS is where I'll have to build the wheels myself, or use a temporary local HTTP server as a temporary sidecar. The goal here is to validate th entire app building experience without blocking the limited wheels available with iOS.
Harder still, my personal phone is an Android and that's my real goal. Even with the PILLOW example we gave above, you can see there are ios_13_0_arm64_iphoneos, ios_13_0_arm64_iphonesimulator , ios_13_0_x86_64_iphonesimulator , but no android_* . cibuildwheel 3.1 has added Android support, the platform tags are defined, PyPI accepts them. Android wheel is not supported even by the most actively maintained binary packages in the ecosystem: PILLOW.
What's next for me?
- I'll be giving the "How Many Spoons Does Your Environment Cost?" presentation at EuroPython 2026 in a few weeks, hopefully with new demos that also break and garner collective audience participation again (it was really fun at PyCon US (slides), I hope you're able to join).
- I have a talk accepted (and not yet announce, coming soon!) in Raleigh, NC about Data Science Agents with a Star Trek theme
- Setting up my homelab for personal experimentation on local hardware
- Extending my PyCon Italia keynote (slides) about small, specialized model experimentation hopefully with benchmarking for consumer hardware -- pinning to my Mac M-series.
RKM + DGW
