OpenClaw, Dissected: One Daemon, Many Mouths, and a Folder of Markdown
Why OpenClaw hit 347k GitHub stars in five months — a layer-by-layer walkthrough of the architecture, the markdown-as-memory trick, and the recursion that makes it feel magical.
A few weeks ago I started reading the OpenClaw codebase because my company asked me to give an internal presentation on it. I went in skeptical — every "open-source AI agent" I'd looked at before had been a thin LangChain wrapper with a Discord bot stapled to the side. OpenClaw is different. It's the first one where I came out the other end thinking oh, the shape of this is actually right.
This is the same explainer I gave the team, written down. If you're trying to figure out why OpenClaw hit 347,000 GitHub stars in five months, the short version is: it's not new ML, it's good plumbing. The long version is the rest of this post.
What OpenClaw actually is
OpenClaw is a self-hosted runtime for a personal AI agent. It runs as a long-lived daemon on your machine. Messaging platforms — WhatsApp, Telegram, Discord, iMessage, Slack, and twenty-odd others — connect into it. It calls an LLM (Claude, GPT, or a local model), executes tools on your behalf, remembers things across conversations, and can extend itself with new "skills" that are literally just folders of markdown.
The closest comparison isn't ChatGPT. It's a Discord bot that thinks, has a memory, and can answer your WhatsApp.
Why this got built — the before-picture
Every existing tool got the shape almost right; each left one hole:
| Existing system | What it does well | The hole |
|---|---|---|
| Claude / ChatGPT (browser) | Smart conversation | Trapped in a tab. No persistent memory. Can't touch your files. Can't message you first. |
| Claude Code / Codex CLI | Agentic; can run shell | Lives in your terminal. You go to it. |
| Siri / Alexa | Always-on, in messaging surfaces | Dumb and locked down. |
| Zapier / n8n | Integrates services | Flowcharts, not conversation. |
| OpenClaw | All four glued together; self-hosted | The integration is the product. |
The LLMs were already smart enough. Nobody had assembled the missing pieces into one runtime a single person could self-host. That's the entire pitch.
The mental model: four layers
OpenClaw's architecture has four conceptual layers, each only knowing about the layer directly below it. This is the diagram I draw on whiteboards before saying anything else:
┌─────────────────────────────────────────────────┐
│ 01 · CHANNEL WhatsApp · Telegram · Discord │ ← adapters
├─────────────────────────────────────────────────┤
│ 02 · GATEWAY 127.0.0.1:18789 (daemon) │ ← brain
├─────────────────────────────────────────────────┤
│ 03 · NODE shell · browser · fs · camera │ ← hands
├─────────────────────────────────────────────────┤
│ 04 · SKILL folders of markdown + scripts │ ← extension
└─────────────────────────────────────────────────┘
That's the whole architecture in one diagram. Everything else in this post is just zooming into one of those four layers.
01 · Channel — the adapters
Every messaging platform plugs in as a thin adapter. WhatsApp via Baileys, Telegram via grammY, Discord via discord.js, iMessage via an AppleScript bridge, plus a built-in WebChat. Adapters never see agent state — they only marshal {peer, text, attachments, ts} in and out.
The transferable lesson here: channels are adapters, not the core. Build the brain assuming a generic message bus, then bolt platforms onto it. WhatsApp is the same as Telegram is the same as Discord at the runtime layer. This is why OpenClaw can list 25+ supported channels — adding a new one is a small, isolated piece of work.
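As a sketch of what that adapter contract could look like (the names here are illustrative, not OpenClaw's actual API), every platform SDK's payload gets normalized into one inbound shape before the Gateway ever sees it:

```typescript
// Hypothetical adapter contract: platform-specific payloads are normalized
// into one InboundMessage shape; the Gateway only ever sees this.
interface InboundMessage {
  peer: string;          // platform-scoped sender id
  text: string;
  attachments: string[]; // URLs or local paths
  ts: number;            // epoch millis
}

interface ChannelAdapter {
  name: string;
  // Called with the platform SDK's raw payload; must emit a normalized message.
  onInbound(raw: unknown): InboundMessage;
  // Called by the Gateway; must translate back into the platform's send API.
  send(peer: string, text: string): Promise<void>;
}

// Toy adapter for an imaginary platform whose webhooks look like
// { from, body, media, timestamp }.
const toyAdapter: ChannelAdapter = {
  name: "toychat",
  onInbound(raw: unknown): InboundMessage {
    const r = raw as { from: string; body: string; media?: string[]; timestamp: number };
    return { peer: r.from, text: r.body, attachments: r.media ?? [], ts: r.timestamp };
  },
  async send(peer: string, text: string): Promise<void> {
    console.log(`[toychat] -> ${peer}: ${text}`);
  },
};
```

The point of the shape: a new channel only has to implement `onInbound` and `send`; nothing in the brain changes.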
02 · Gateway — the daemon
This is the centre of gravity. A single Node.js process bound to 127.0.0.1:18789 that owns:
- Auth (pairing tokens for new clients/devices)
- Session routing (mapping (peer, channel) → agent)
- Tool dispatch (with idempotency keys so retries don't double-send)
- Cron / heartbeats (proactive: the agent can message you first)
- Model router (Anthropic, OpenAI, or a local model — same interface)
Every other piece — the CLI, the menu-bar app, the mobile companion, the dashboard — is a client over WebSocket. Without this single design choice, you'd be running 12 bot processes and hand-syncing state. With it, you have one source of truth.
This shape isn't novel. It's the same shape as Docker (dockerd + many docker clients), the Language Server Protocol (pyright-langserver + many editors), Tailscale (tailscaled + many UIs). Whenever you have many surfaces and one source of truth, this pattern is almost always right.
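The session-routing responsibility, sketched under the assumption that it is just a durable map (the class and field names here are mine, not the codebase's):

```typescript
// Hypothetical sketch of the Gateway's session routing: each (channel, peer)
// pair resolves to exactly one durable agent session, so reconnects and
// multiple clients all land on the same conversation state.
interface Session {
  id: string;
  history: string[]; // running conversation context
}

class SessionRouter {
  private sessions = new Map<string, Session>();

  // One durable session per (channel, peer) pair; created lazily.
  resolve(channel: string, peer: string): Session {
    const key = `${channel}:${peer}`;
    let s = this.sessions.get(key);
    if (!s) {
      s = { id: key, history: [] };
      this.sessions.set(key, s);
    }
    return s;
  }
}
```

This is the "one source of truth" in miniature: every client that asks about the same peer gets the same session object, not a copy.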
03 · Node — the hands and sensors
Capabilities live on "nodes": shell, browser, filesystem, camera, location, screen. Each node connects with role:node and registers its capabilities. The Gateway fans tasks out to whichever node owns the right command.
Practically: your Mac runs a menu-bar app that exposes shell + browser + screen. Your iPhone runs a companion app that exposes camera + location. A headless Linux box exposes long-running browser sessions. The agent doesn't care which physical device executes — it asks the Gateway for a capability, the Gateway finds a node.
Capability mismatches are caught before the model even sees an error. That's a quietly important detail. LLMs are notoriously bad at handling rich error states; intercepting "you don't have a camera here" at the dispatch layer keeps the conversation coherent.
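A minimal sketch of that dispatch layer, assuming nodes register a flat capability list (names are illustrative): the mismatch is returned as a structured error before any model call happens.

```typescript
// Hypothetical capability registry: nodes register what they can do, and
// the Gateway resolves a capability to a node -- or fails fast with a
// structured error the model never has to untangle.
interface NodeInfo {
  id: string;
  capabilities: Set<string>;
}

class NodeRegistry {
  private nodes: NodeInfo[] = [];

  register(id: string, capabilities: string[]): void {
    this.nodes.push({ id, capabilities: new Set(capabilities) });
  }

  // Resolve BEFORE dispatch: either a node id, or a clean error string.
  dispatch(capability: string): { ok: true; nodeId: string } | { ok: false; error: string } {
    const node = this.nodes.find((n) => n.capabilities.has(capability));
    if (!node) {
      return { ok: false, error: `no connected node provides "${capability}"` };
    }
    return { ok: true, nodeId: node.id };
  }
}
```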
04 · Skill — folders of markdown
This is the layer that makes everyone go "wait, that's it?"
A skill is a folder. Inside it: a SKILL.md file with YAML frontmatter and prose, plus arbitrary scripts. That's the entire plugin system.
~/.openclaw/skills/
├── web-research/
│ ├── SKILL.md ← "use when user pastes URL"
│ ├── scripts/fetch.sh
│ └── scripts/extract.py
├── calendar/
│ ├── SKILL.md
│ └── scripts/ical.ts
└── morning-briefing/ ← agent-authored
├── SKILL.md
├── scripts/digest.ts
└── cron.json ← 07:30 daily
The frontmatter tells the model when to use the skill. The body is a prompt fragment. The scripts are the implementation. There's no DSL, no plugin manifest, no registration ceremony.
The model reads the doc and decides when to use it. That's it. Drop-in extensibility.
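Splitting a SKILL.md into frontmatter and prompt body is simple enough to sketch in a few lines. This assumes frontmatter of plain `key: value` lines; a real loader would use a YAML parser.

```typescript
// Sketch of indexing one SKILL.md: frontmatter becomes structured metadata,
// the body becomes a prompt fragment. Assumes simple "key: value" frontmatter.
interface SkillEntry {
  meta: Record<string, string>;
  body: string; // prose prompt fragment the model reads
}

function parseSkillMd(source: string): SkillEntry {
  const m = source.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!m) return { meta: {}, body: source }; // no frontmatter: whole file is prose
  const meta: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const i = line.indexOf(":");
    if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { meta, body: m[2].trim() };
}
```

A loader that walks `~/.openclaw/skills/` and runs this over every SKILL.md has a complete skill index with no registration step.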
The under-rated design choice: markdown as memory
OpenClaw's "personality" is markdown files on disk. There are roughly four of them:
- SOUL.md — voice, refusals, what kind of agent this is
- USER.md — stable facts about you (name, timezone, work, preferences)
- MEMORY.md — episodic log of what happened in past sessions
- skills/*/SKILL.md — the plugin layer
At session start, the runtime reads all of them. Frontmatter becomes structured prompt context; bodies become free text. That is the entire "session loader."
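Under the assumption that the loader really is just concatenation (the exact prompt layout is my guess, not OpenClaw's), the whole thing fits in one function:

```typescript
// Sketch of a session loader: the workspace files, already read from disk,
// are concatenated into one system prompt, each labelled by filename.
// The layout (## headings) is an assumption, not OpenClaw's actual format.
function buildSystemPrompt(files: { name: string; content: string }[]): string {
  return files
    .map((f) => `## ${f.name}\n${f.content.trim()}`)
    .join("\n\n");
}
```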
"Stuffing personality and memory into a database with a custom schema would be the obvious move. Markdown is the under-rated design choice."
Why this is good:
- Human-readable AND model-readable. You can cat MEMORY.md and understand it. The LLM can too.
- Editable by you, editable by the agent. Both directions.
- Version-controllable. Git is the audit log.
- Hot-reloadable. Dropping a new skill folder is a deploy.
- No schema migrations. Ever.
At session end, the agent appends to MEMORY.md. A background skill compacts old entries — weeks become paragraphs, quarters get archived to MEMORY/2026-Q1.md. There's no vector DB, no embeddings index, no schema. Just text.
This is the kind of design choice that looks lazy until you've maintained the alternative.
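The compaction rule described above ("weeks become paragraphs, quarters get archived") can be sketched as a pure partition over dated entries. The threshold and the entry shape here are made up for illustration:

```typescript
// Sketch of memory compaction: recent entries stay verbatim in MEMORY.md,
// older entries are bucketed by quarter for archival to MEMORY/<quarter>.md.
// keepDays = 30 is an invented threshold, not OpenClaw's.
interface MemoryEntry { date: string; text: string } // date as "YYYY-MM-DD"

function partitionMemory(entries: MemoryEntry[], today: string, keepDays = 30) {
  const cutoff = new Date(today).getTime() - keepDays * 86_400_000;
  const keep: MemoryEntry[] = [];
  const archive = new Map<string, MemoryEntry[]>(); // e.g. "2025-Q4" -> entries
  for (const e of entries) {
    const d = new Date(e.date);
    if (d.getTime() >= cutoff) {
      keep.push(e);
      continue;
    }
    const quarter = `${d.getUTCFullYear()}-Q${Math.floor(d.getUTCMonth() / 3) + 1}`;
    const bucket = archive.get(quarter) ?? [];
    bucket.push(e);
    archive.set(quarter, bucket);
  }
  return { keep, archive };
}
```

A background skill would then summarise each archive bucket and write it out as a file; the point is that both inputs and outputs stay plain text.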
The recursion that makes it feel magical
If the runtime can read markdown skills, and the agent can write markdown, you have a self-extending system for free. This is the part that made me sit up.
A real exchange (paraphrased from the docs):
USER > Make me a morning briefing every day at 7:30
with weather, calendar, and unread important emails.
AGENT > Drafting a skill: morning-briefing.
AGENT > [writes ~/.openclaw/skills/morning-briefing/]
AGENT > [registers cron 30 7 * * * via Gateway]
AGENT > Hot-reloaded. First run tomorrow at 07:30.
The first generation of plugins required a developer, a manifest, and a redeploy. Skills are folders the agent itself can author at runtime. The skill set grows whenever you ask for something new — without a code change.
This is the architectural decision that earns the "iPhone moment" comparisons. Most products grow because a team ships features. OpenClaw grows because its users talk to it.
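What the "write a skill" step amounts to can be sketched as the agent emitting plain files for the runtime to write and hot-reload. Everything here — function name, file layout, cron.json shape — is illustrative, not OpenClaw's actual tool API:

```typescript
// Hypothetical sketch of an agent-authored skill: the agent's tool call is
// just "write these files", and the runtime's hot-reload does the rest.
interface SkillFile { path: string; content: string }

function draftMorningBriefing(): SkillFile[] {
  return [
    {
      path: "morning-briefing/SKILL.md",
      content:
        "---\nname: morning-briefing\ndescription: daily 07:30 digest of weather, calendar, email\n---\n" +
        "Run scripts/digest.ts and send the result to the user.",
    },
    {
      // Five-field cron: minute 30, hour 7, every day.
      path: "morning-briefing/cron.json",
      content: JSON.stringify({ schedule: "30 7 * * *" }),
    },
  ];
}
```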
End-to-end: a WhatsApp message, traced
Putting it all together, here's what happens when you message the agent on WhatsApp:
01 Inbound message WhatsApp
02 Adapter → Gateway WS ws://127.0.0.1:18789
03 Session resolution load SOUL/USER/MEMORY
04 LLM call Anthropic / OpenAI / local
05 Tool call dispatched browser.fetch + idempotency key
06 Node executes Playwright on Mac node
07 Result → continue loop tool_result back into context
08 Reply egress Baileys.sendMessage()
Eight steps, three processes, zero magic. The whole loop typically completes in 1–4 seconds depending on how chatty the model is and whether the tool call hits the network.
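The eight steps collapse into one control loop. `callModel` and `runTool` below are stand-ins for the LLM call and node dispatch; the real runtime is more elaborate, but the shape is this:

```typescript
// Sketch of the core agent loop: call the model, execute any tool it asks
// for, feed the result back in, repeat until it produces a plain reply.
// The step cap is an invented safety limit.
type ModelTurn =
  | { kind: "reply"; text: string }
  | { kind: "tool"; name: string; args: unknown };

async function agentLoop(
  userText: string,
  callModel: (context: string[]) => Promise<ModelTurn>,
  runTool: (name: string, args: unknown) => Promise<string>,
): Promise<string> {
  const context = [`user: ${userText}`];
  for (let step = 0; step < 8; step++) {
    const turn = await callModel(context);                  // step 04
    if (turn.kind === "reply") return turn.text;            // step 08: egress
    const result = await runTool(turn.name, turn.args);     // steps 05-06
    context.push(`tool_result(${turn.name}): ${result}`);   // step 07
  }
  return "Sorry, I got stuck in a tool loop.";
}
```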
Six lessons worth stealing even if you never use OpenClaw
Even if you never deploy OpenClaw, these architectural decisions generalise:
- Daemon + clients over WebSocket. Many surfaces, one source of truth.
- Markdown as config and memory. Treat it as a first-class storage tier, not a workaround for not having a real DB.
- Skills as folders. No plugin DSL, just SKILL.md + scripts. The model reads the doc and decides when to use it.
- Self-extension. If your runtime reads markdown and your agent writes markdown, you get a self-extending system for free.
- Channels as adapters. Build the brain assuming a generic message bus. Bolt platforms onto it later.
- Idempotency keys on side-effecting calls. Required so retries on reconnect don't send the email three times. Unsexy but critical.
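The last lesson is small enough to show whole. A minimal sketch, assuming the caller picks the key (a production version would also deduplicate in-flight calls and persist the cache across restarts):

```typescript
// Sketch of idempotent tool dispatch: a side-effecting call carries a
// caller-chosen key; a replay of the same key returns the cached result
// instead of re-running the effect.
class IdempotentDispatcher {
  private done = new Map<string, string>();

  async call(key: string, effect: () => Promise<string>): Promise<string> {
    const cached = this.done.get(key);
    if (cached !== undefined) return cached; // retry after reconnect: no double-send
    const result = await effect();
    this.done.set(key, result);
    return result;
  }
}
```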
The honest part the marketing won't tell you
I want to be straight here, because I gave this same caveat slide internally:
Not a SaaS blueprint. OpenClaw is single-user. One Gateway, one human. The architecture isn't multi-tenant. Don't read it as a template for a hosted product.
The security surface is huge. Filesystem + shell + browser + 25 messaging channels + a community skill marketplace. If any one skill is malicious, it inherits all of those. Hundreds of malicious skills have been found on the community marketplace. The open marketplace is both the magic and the risk. The April 2026 release added manifest-driven plugin security partly in response to this.
Orchestration, not ML. The intelligence is rented. Anthropic / OpenAI / a local model do the thinking. OpenClaw's IP is the plumbing. That's a feature for learners — the architecture is fully legible — but be clear-eyed about where the value sits.
Pairing tokens are the whole game. New devices require explicit approval. A patched one-click RCE earlier this year leaked Gateway auth tokens via WebSocket — useful cautionary tale for why this layer matters and why you should keep up with releases.
Why this is the post I wanted to read first
When I started, I had to piece this picture together from a README, a docs site, three blog posts, and reading the gateway source. The thing I kept wanting was a single explainer that said here are the four layers, here's the message flow, here's why markdown, and here's what to be careful about.
That's this post. If your company is evaluating OpenClaw, or if you're just trying to understand why everyone on Hacker News is talking about it, the core idea isn't complicated:
One daemon. Many mouths. A folder of markdown. The integration is the product.
The rest is execution.
Sources
- OpenClaw on GitHub — openclaw/openclaw
- OpenClaw documentation — docs.openclaw.ai
- What is OpenClaw? — DigitalOcean
- OpenClaw: The Rise of an Open-Source AI Agent Framework — clawbot.blog
- OpenClaw Workspace Files Explained — Roberto Capodieci, Medium
- Build Domain-Specific AI Agents with OpenClaw — Codebridge
- What Is OpenClaw? Complete Guide — Milvus Blog
- The Complete Honest Guide to OpenClaw — Data Science Collective
- Build a Secure Always-On Local AI Agent with OpenClaw — NVIDIA