software engineer, consultant, conference speaker, #tech4good, #stacktivism

Building an Agentic Python App for Mobile

The answer to whether I can get an agentic Python App running on mobile spans the full environment stack and proposes a really exciting problem. The goal is to keep as much Python in my writing as possible -- this makes it accessible for a single Python developer or a single team with primarily Python capabilities to create an on-device project.

LiteRT_BeeWare_Python_Mobile

Photo by Jo Lin on Unsplash

This blog post was written after a lovely conversation with one of my favorite people from both my Django and CPython ecosystem fun, Russell Keith-Magee the creator and maintainer of BeeWare, build platform-native applications in Python. Thank you for your knowledge and friendship.

Can I get a Agentic Python App Running On Mobile?

The answer to whether I can get an agentic Python App running on mobile spans the full environment stack and proposes a really exciting problem. The goal is to keep as much Python in my writing as possible -- this makes it accessible for a single Python developer or a single team with primarily Python capabilities to create an on-device project. It is hard to debate that any other language is the language of AI (read with a tinge of sarcasm) besides Python. C++ and TypeScript show up in different parts of the AI builder process pretty frequently for variety of reasons, but with Python's pervasiveness nudging lower-level and higher-level; I'm rooting for Python to continue to get sticky in new places.

If you've seen my "How Many Spoons Does Your Environment Cost" talk at PyCon US 2026, or plan to see it at EuroPython 2026, you'll see the seven layers of abstraction of your Python environment that you have to traverse and it's not easy. All the way from User Intent / Application layer down to the Physical hardware and firmware, we have thousands of developers needed to support the ecosystem. This also comes while developers are proliferating and maintainers are (rightly) expected to think about the developer experience as they ship their tools.
abstraction_can_be_messy
What does it take to get an agentic python app running on mobile?

In my mental model, flushed out with RKM's help:

```

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ YOUR PYTHON APP
โ”‚
โ”‚ Toga UI ยท litert_lm.Engine
โ”‚ litert_lm.Conversation
โ”‚ Packaged and signed by Briefcase
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ BEEWARE / BRIEFCASE
โ”‚
โ”‚ CPython embedded in app bundle
โ”‚ iOS + Android supported
โ”‚ Toga ยท cibuildwheel ยท Xcode + NDK
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ โš  ECOSYSTEM GAP
โ”‚ No mobile wheels for litert-lm-api
โ”‚
โ”‚ no cross-compile build config
โ”‚ no ios_* or android_* on PyPI
โ”‚ iOS โœ“ proven via Pillow
โ”‚ Android โœ— unwalked by any major pkg
โ”‚ Fix? cibuildwheel config upstream
โ”‚ โ†’ google-ai-edge/LiteRT-LM
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ LITERT-LM PYTHON API (PyPI)
โ”‚
โ”‚ Engine โ†’ Conversation โ†’ send_message()
โ”‚ streaming ยท sync ยท tool use
โ”‚ stable: Linux / macOS / Windows
โ”‚ mobile-native: Kotlin ยท Swift
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ LITERT-LM RUNTIME (C++)
โ”‚
โ”‚ KV-cache ยท tokenisation ยท quantisation
โ”‚ speculative decoding ยท MTP
โ”‚ ML Drift: OpenCL / Metal / WebGPU
โ”‚ Gemma 4 ยท Llama ยท Phi-4 ยท Qwen
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ DEVICE HARDWARE
โ”‚
โ”‚ CPU ยท GPU (Metal / OpenCL) ยท NPU
โ”‚ Qualcomm ยท MediaTek ยท Apple NE
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

```

The one problem shows up in two places, but simplified for the developer experience in this diagram above in one place. In both the wheel availability -- when I'm actually installing a package compatible for where I'm building, and the cross-compilation -- when I'm trying to bundle and ship the project, we're left without the right wheels to support the task.

What we don't have: Solving Android and iOS Wheels In Python

The ecosystem CAN support it right now! Also shown in the diagram above, PILLOW is an proof point that mobile wheels can be build and shared on PyPI - the default Python Package Index. Wheels are also available on Anaconda.org (learn more about wheels from Anaconda in this blog post) which is free and open to maintainers looking to host artifacts just like PyPI; but specifically for the machine learning, data science and AI community.

To check in on the status of mobile wheels, BeeWare keeps an "Are We There Yet?" tracker that's used across the Python community for all sorts of ecosystem adoption: https://beeware.org/mobile-wheels/

What we DO have and why we're picking these tools

Let's do a quick overview of all the other parts of the stack that we do have and why I'm picking these tools for my journey.

What is`cibuildwheel` and `CMake` ?

These are two separate build tools at different stages of the same problem: building C++ code into a Python package. Let's start with these because they are the edge of the problem space:

CMake is the build system generator for C++ projects. It takes a description of what needs to be compiled and generates the right instructions for wahtever native compiler is on your platform. LiteRT-LM's C++ source code sues CMake to describe how to build itself.

cibuildwheel is a Python tool that automates building binary Python wheels across many platforms. You point it at a Python package that has C++ in it, and it spins up the right build environments (eg. Linux via Docker, macOS via Xcode, Windows via MSVC) and compiles the C++ source code for each platform, and produces correctly named .whl files ready for upload to PyPI or Anaconda.org. PyPA, its maintainers and the tongue-and-cheek "authority" on packaging, recently added Android and iOS support targets. It now knows how to invoke Android NDK and Xcode's iOS toolchain, and produce wheels with the right platform tags.

Why LiteRT?

LiteRT and LiteRT-LM (formerly TensorFlow Lite) solve problems that desktop doesn't have or are more acute on mobile than they would be on my desktop experimentation.
It solves:

  • Limited RAM -- 7B param model in float16; 16GB
  • No discrete GPU with its own memory and drivers
  • Battery drain, imprecise and inefficient loops
  • App store restrictions on processes and spawn
  • Throttling during extended (30s+) computer

How does it solve it?

It uses some methods we've been applying across the ecosystem in specific ways to hit the narrow mobile target.

  • Quantization to fit on a phone's shared memory
  • KV-cache management that helps with the fact that mobile does not swap overflow space, moving over chunks of memory onto disk.
  • MLDrift, the GPU compute engine provides advanced mathematical operations of the neural network on your GPU, no longer using older approaches for classical ML workloads that don't work on LLMs.
  • NPU access is the neural chip that is designed for transformer math inside your phone and your Python runtime is not aware of this without help -- every iPhone since iPhone 8 (2017) has Apple's NPU and every Android flagship has had it for the last few years with Qualcomm Snapdragon 8-series, MediaTek Dimensity 9-series and Google Tensor ships, Samsung Exynos chips have had NPUs since 2019.
  • Token optimization

LiteRT is the inference runtime, loads and execute models, etc; while LiteRT-LM is the LLM orchestration layer on top. It holds session and conversation state, function calling with the Engine and Conversation Python API.

But I'm Interested In Agents, Not Chat Apps

I need to put it to the test, but it looks like LiteRT-LM can handle agents although it is not an Agent framework. It will give me the model's tool call decision. I will need to write the harness logic myself -- hence why me reaching for Python is important.

Code-first proof of concept, next steps

I'm currently building a homelab for experimentation. Now that I've done an ecosystem dive with my friend and colleague, Russel Keith-Magee, I'm going to start on desktop first and trat it as mobile as the second step.

I'll need to get LiteRT-LM running on my Mac, decide on my model approach. I've been excited to explore Gemma 4 E2B (2B) & E4B (4B), and Gemma 3 (270M), DeepSeek-R1-Distill (8B/14B) and GPT-OSS-20B (20B), with room to flex upwards to understand more about litert's quantization, KV-cache and token throughput features -- features that are helpful when running larger models on conservative or edge hardware.

Next, I'll need a minimal proof of concept. I'm way more impressed with reasoning models and agents than language models. LiteRT-LM provides constrained decoding, which forces the models output to conform to valid JSON. It doesn't provide an inference engine which I would write in Python and hopefully pydantic-ai. I've been moving my LangChain and MetaFlow application I built for Anaconda's PyCon US presence over to Pydantic AI and MetaFlow and really enjoying it.

I still need to decide what repeatable agent I want to build and benchmark and how to structure the experiment. Ultimately, I want something interesting and useful; does not spam or cause harm and easily evaluated for effectiveness and can grow with data/token size or model size if I want to turn those nobs. If you have suggestions, leave them in the comments! Bonus points if it can fit into a Star Trek Theme.

Then I'll want to wrap it in a BeeWare app:

bash
    pip install briefcase
briefcase new
briefcase run macOS 
briefcase run Android
Briefcase will hopefully package the model without the mobile complexity

Moving to briefcase run iOS is where I'll have to build the wheels myself, or use a temporary local HTTP server as a temporary sidecar. The goal here is to validate th entire app building experience without blocking the limited wheels available with iOS.

The Android Problem

Harder still, my personal phone is an Android and that's my real goal. Even with the PILLOW example we gave above, you can see there are ios_13_0_arm64_iphoneos, ios_13_0_arm64_iphonesimulator , ios_13_0_x86_64_iphonesimulator , but no android_* . cibuildwheel 3.1 has added Android support, the platform tags are defined, PyPI accepts them. Android wheel is not supported even by the most actively maintained binary packages in the ecosystem: PILLOW.

What's next for me?

  • I'll be giving the "How Many Spoons Does Your Environment Cost?" presentation at EuroPython 2026 in a few weeks, hopefully with new demos that also break and garner collective audience participation again (it was really fun at PyCon US (slides), I hope you're able to join).
  • I have a talk accepted (and not yet announce, coming soon!) in Raleigh, NC about Data Science Agents with a Star Trek theme
  • Setting up my homelab for personal experimentation on local hardware
  • Extending my PyCon Italia keynote (slides) about small, specialized model experimentation hopefully with benchmarking for consumer hardware -- pinning to my Mac M-series.

RKM + DGW

1000002078 (1)
Me and My Friend Russ now work together!