Key takeaways
- Turns any old Android phone into an autonomous AI agent via ADB screen reading and interaction — no app APIs needed
- Perception → reasoning → action loop with stuck detection, repetition tracking, drift detection, and vision fallback
- Can delegate tasks to ChatGPT, Gemini, or Google Search apps on the device without API keys — uses apps like a human would
- TypeScript/Bun runtime with accessibility tree parsing and optional screenshot-based vision for webviews and Flutter apps
FAQ
What is droidclaw?
droidclaw is an AI agent that controls Android phones via ADB. You give it a goal in plain English, and it reads the screen, decides what to tap or type, executes via ADB, and repeats until the goal is complete.
How does droidclaw work?
A perception → reasoning → action loop: dump the accessibility tree via ADB, send screen state + goal to an LLM, execute the returned action (tap, type, swipe), and loop until done.
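The perception step can be illustrated with a short sketch. droidclaw's actual parser isn't shown in this summary, so the shape below is an assumption: it parses a uiautomator-style XML dump (the format `adb shell uiautomator dump` produces) into a list of tappable elements with tap coordinates.

```typescript
// Hypothetical sketch of the "perceive" step: turn a uiautomator-style
// accessibility dump into interactive elements. Names (UiElement,
// parseAccessibilityDump) are illustrative, not droidclaw's real API.

interface UiElement {
  text: string;
  clickable: boolean;
  centerX: number; // tap target: center of the element's bounds
  centerY: number;
}

// Matches nodes like: <node text="Send" clickable="true" bounds="[0,0][100,50]"/>
function parseAccessibilityDump(xml: string): UiElement[] {
  const elements: UiElement[] = [];
  for (const node of xml.match(/<node\b[^>]*>/g) ?? []) {
    const attr = (name: string) =>
      node.match(new RegExp(`${name}="([^"]*)"`))?.[1] ?? "";
    const bounds = attr("bounds").match(/\[(\d+),(\d+)\]\[(\d+),(\d+)\]/);
    if (!bounds) continue;
    const [x1, y1, x2, y2] = bounds.slice(1, 5).map(Number);
    elements.push({
      text: attr("text"),
      clickable: attr("clickable") === "true",
      centerX: Math.floor((x1 + x2) / 2),
      centerY: Math.floor((y1 + y2) / 2),
    });
  }
  // Keep only elements the agent could actually interact with.
  return elements.filter((e) => e.clickable);
}
```

Filtering to clickable elements keeps the LLM prompt small, which matters when every step is a model call.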
How much does droidclaw cost?
Free and open-source. You need an old Android phone, a computer running Bun, and an LLM API key.
Who competes with droidclaw?
No direct competitors in the personal agent space — most agents rely on APIs rather than screen reading. Conceptually it is closest to browser-automation agents like HappyCapy, but for Android devices.
Executive Summary
droidclaw turns old Android phones into autonomous AI agents. Instead of building API integrations, it controls phones the way a human would — reading the screen via ADB accessibility trees, reasoning about what to do with an LLM, and executing taps, types, and swipes. No app APIs, no custom integrations — just install apps and tell the agent what you want done. [1]
| Attribute | Value |
|---|---|
| Author | unitedbyai |
| Language | TypeScript (Bun) |
| License | Open Source |
| GitHub Stars | 931 |
Product Overview
droidclaw's core innovation is using the phone itself as the integration layer. Rather than building API connectors for every service, it reads screens and interacts with any installed app. It can even delegate questions to ChatGPT, Gemini, or Google Search on the device — no API keys needed for those services. [1]
How It Works
- Perceive — Dump accessibility tree via ADB, parse interactive UI elements, diff with previous screen
- Reason — Send screen state + goal + history to LLM, get back think/plan/action
- Act — Execute via ADB (tap, type, swipe), feed result back on next step
- Loop — Repeat until goal is done or step limit reached
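The four steps above can be sketched as a single control loop. This is a minimal illustration under assumed names (`perceive`, `decide`, `execute` are stand-ins for droidclaw's internals, not its actual exports):

```typescript
// Illustrative perceive → reason → act loop with a step limit.
type Action =
  | { kind: "tap"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

interface AgentDeps {
  perceive: () => Promise<string>; // e.g. ADB accessibility dump, diffed vs. last screen
  decide: (goal: string, screen: string, history: string[]) => Promise<Action>; // LLM call
  execute: (action: Action) => Promise<void>; // e.g. adb shell input tap/text/swipe
}

async function runAgent(
  goal: string,
  deps: AgentDeps,
  maxSteps = 30
): Promise<boolean> {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screen = await deps.perceive();
    const action = await deps.decide(goal, screen, history);
    if (action.kind === "done") return true; // model reports goal complete
    await deps.execute(action);
    history.push(JSON.stringify(action)); // feed result back on the next step
  }
  return false; // step limit reached without completion
}
```

Injecting `perceive`/`decide`/`execute` as dependencies keeps the loop testable without a device or an LLM key, which is also why multi-step tasks are slow: every iteration pays one model round-trip.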
Reliability Features
| Feature | Description |
|---|---|
| Stuck Loop Detection | Recovery hints after 3 unchanged screens |
| Repetition Tracking | Sliding window catches retry loops across screen changes |
| Drift Detection | Nudges agent if it spams navigation without interacting |
| Vision Fallback | Screenshots for webviews, Flutter apps, games (empty accessibility trees) |
| Action Feedback | Every action result fed back to LLM for next step |
| Multi-Turn Memory | Conversation history maintained across steps |
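Two of the checks in the table, stuck-loop detection and sliding-window repetition tracking, are simple enough to sketch. Thresholds and class/method names here are assumptions for illustration, not droidclaw's actual values:

```typescript
// Hypothetical monitor combining stuck-loop detection (N identical
// screens in a row) and repetition tracking (same action seen 3+
// times in a sliding window, even across screen changes).
class ReliabilityMonitor {
  private lastScreen = "";
  private unchangedCount = 0;
  private recentActions: string[] = [];

  constructor(private stuckThreshold = 3, private windowSize = 6) {}

  // True once the screen has stayed identical for `stuckThreshold`
  // consecutive steps — the point where a recovery hint would fire.
  recordScreen(screenHash: string): boolean {
    this.unchangedCount =
      screenHash === this.lastScreen ? this.unchangedCount + 1 : 0;
    this.lastScreen = screenHash;
    return this.unchangedCount >= this.stuckThreshold;
  }

  // True when the same action repeats 3+ times within the window,
  // catching retry loops that stuck detection misses.
  recordAction(action: string): boolean {
    this.recentActions.push(action);
    if (this.recentActions.length > this.windowSize) this.recentActions.shift();
    return this.recentActions.filter((a) => a === action).length >= 3;
  }
}
```

Comparing screen hashes rather than full dumps keeps the check cheap, and the bounded window means repetition tracking survives long sessions without growing memory.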
Strengths
- Universal integration — Works with any Android app without APIs
- Hardware recycling — Gives old phones a second life as AI agents
- Robust failure handling — Stuck detection, repetition tracking, drift detection, vision fallback
- No API keys for target apps — Uses ChatGPT, Gemini, Google Search as a human would
- Web dashboard — Visual monitoring at app.droidclaw.ai
- Active development — 931 stars, companion APK for device-side setup
Cautions
- Android only — No iOS support (ADB is Android-specific)
- Inherently fragile — Screen-based automation is slower and less reliable than API calls
- Requires ADB setup — USB debugging, computer running Bun
- LLM latency — Each step requires an LLM call; multi-step tasks are slow
- Bun-only — Won't run on Node.js; uses Bun-specific APIs
- Early stage — 931 stars, v0.5.0, rapidly evolving
Pricing & Licensing
| Tier | Price | Includes |
|---|---|---|
| Open Source | Free | Full agent, open license |
Hardware cost: Any old Android phone + computer. Hidden costs: LLM API usage per step.
Competitive Positioning
| Competitor | Differentiation |
|---|---|
| OpenClaw | OpenClaw uses API integrations; droidclaw uses screen reading — no APIs needed |
| HappyCapy | HappyCapy automates browsers; droidclaw automates Android phones |
| Manus | Manus is cloud-managed; droidclaw is self-hosted phone control |
Bottom Line
droidclaw is a genuinely novel approach to personal AI agents: instead of building integrations, use the phone's screen as the universal API. It's slower and more fragile than API-based agents, but once ADB is connected it works with any app without per-app setup. The "turn old phones into agents" pitch is compelling for hardware recycling and for automating apps that have no API.
Recommended for: Tinkerers with spare Android phones who want to automate apps without APIs.
Not recommended for: Production workflows, time-sensitive automation, or users wanting reliable API-based integrations.
Research by Ry Walker Research • methodology