OpenClaw, Dissected: One Daemon, Many Mouths, and a Folder of Markdown
Why OpenClaw hit 347k GitHub stars in five months — a layer-by-layer walkthrough of the architecture, the markdown-as-memory trick, and the recursion that makes it feel magical.
A few weeks ago I started reading the OpenClaw codebase because my company asked me to give an internal presentation on it. I went in skeptical — every "open-source AI agent" I'd looked at before had been a thin LangChain wrapper with a Discord bot stapled to the side. OpenClaw is different. It's the first one where I came out the other end thinking oh, the shape of this is actually right.
This is the same explainer I gave the team, written down. If you're trying to figure out why OpenClaw hit 347,000 GitHub stars in five months, the short version is: it's not new ML, it's good plumbing. The long version is the rest of this post.
What OpenClaw actually is
OpenClaw is a self-hosted runtime for a personal AI agent. It runs as a long-lived daemon on your machine. Messaging platforms — WhatsApp, Telegram, Discord, iMessage, Slack, and twenty-odd others — connect into it. It calls an LLM (Claude, GPT, or a local model), executes tools on your behalf, remembers things across conversations, and can extend itself with new "skills" that are literally just folders of markdown.
The closest comparison isn't ChatGPT. It's a Discord bot that thinks, has a memory, and can answer your WhatsApp.
Why this got built — the before-picture
Every existing tool got the shape almost right; each left one hole:
| Existing system | What it does well | The hole |
|---|---|---|
| Claude / ChatGPT (browser) | Smart conversation | Trapped in a tab. No persistent memory. Can't touch your files. Can't message you first. |
| Claude Code / Codex CLI | Agentic; can run shell | Lives in your terminal. You go to it. |
| Siri / Alexa | Always-on, in messaging surfaces | Dumb and locked down. |
| Zapier / n8n | Integrates services | Flowcharts, not conversation. |
| OpenClaw | All four glued together; self-hosted | The integration is the product. |
The LLMs were already smart enough. Nobody had assembled the missing pieces into one runtime a single person could self-host. That's the entire pitch.
The mental model: four layers
OpenClaw's architecture has four conceptual layers, each only knowing about the layer directly below it. This is the diagram I draw on whiteboards before saying anything else:
┌─────────────────────────────────────────────────┐
│ 01 · CHANNEL WhatsApp · Telegram · Discord │ ← adapters
├─────────────────────────────────────────────────┤
│ 02 · GATEWAY 127.0.0.1:18789 (daemon) │ ← brain
├─────────────────────────────────────────────────┤
│ 03 · NODE shell · browser · fs · camera │ ← hands
├─────────────────────────────────────────────────┤
│ 04 · SKILL folders of markdown + scripts │ ← extension
└─────────────────────────────────────────────────┘
That's the whole architecture in one diagram. Everything else in this post is just zooming into one of those four layers.
01 · Channel — the adapters
Every messaging platform plugs in as a thin adapter. WhatsApp via Baileys, Telegram via grammY, Discord via discord.js, iMessage via an AppleScript bridge, plus a built-in WebChat. Adapters never see agent state — they only marshal {peer, text, attachments, ts} in and out.
The transferable lesson here: channels are adapters, not the core. Build the brain assuming a generic message bus, then bolt platforms onto it. WhatsApp is the same as Telegram is the same as Discord at the runtime layer. This is why OpenClaw can list 25+ supported channels — adding a new one is a small, isolated piece of work.
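As a sketch of what that adapter contract could look like (the names here are illustrative, not OpenClaw's actual API), every platform SDK's payload gets normalized into one inbound shape before the Gateway ever sees it:

```typescript
// Hypothetical adapter contract: platform-specific payloads are normalized
// into one InboundMessage shape; the Gateway only ever sees this.
interface InboundMessage {
  peer: string;          // platform-scoped sender id
  text: string;
  attachments: string[]; // URLs or local paths
  ts: number;            // epoch millis
}

interface ChannelAdapter {
  name: string;
  // Called with the platform SDK's raw payload; must emit a normalized message.
  onInbound(raw: unknown): InboundMessage;
  // Called by the Gateway; must translate back into the platform's send API.
  send(peer: string, text: string): Promise<void>;
}

// Toy adapter for an imaginary platform whose webhooks look like
// { from, body, media, timestamp }.
const toyAdapter: ChannelAdapter = {
  name: "toychat",
  onInbound(raw: unknown): InboundMessage {
    const r = raw as { from: string; body: string; media?: string[]; timestamp: number };
    return { peer: r.from, text: r.body, attachments: r.media ?? [], ts: r.timestamp };
  },
  async send(peer: string, text: string): Promise<void> {
    console.log(`[toychat] -> ${peer}: ${text}`);
  },
};
```

The point of the shape: a new channel only has to implement `onInbound` and `send`; nothing in the brain changes.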
02 · Gateway — the daemon
This is the centre of gravity. A single Node.js process bound to 127.0.0.1:18789 that owns:
- Auth (pairing tokens for new clients/devices)
- Session routing (mapping (peer, channel) → agent)
- Tool dispatch (with idempotency keys so retries don't double-send)
- Cron / heartbeats (proactive: the agent can message you first)
- Model router (Anthropic, OpenAI, or a local model — same interface)
Every other piece — the CLI, the menu-bar app, the mobile companion, the dashboard — is a client over WebSocket. Without this single design choice, you'd be running 12 bot processes and hand-syncing state. With it, you have one source of truth.
This shape isn't novel. It's the same shape as Docker (dockerd + many docker clients), the Language Server Protocol (pyright-langserver + many editors), Tailscale (tailscaled + many UIs). Whenever you have many surfaces and one source of truth, this pattern is almost always right.
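The session-routing responsibility, sketched under the assumption that it is just a durable map (the class and field names here are mine, not the codebase's):

```typescript
// Hypothetical sketch of the Gateway's session routing: each (channel, peer)
// pair resolves to exactly one durable agent session, so reconnects and
// multiple clients all land on the same conversation state.
interface Session {
  id: string;
  history: string[]; // running conversation context
}

class SessionRouter {
  private sessions = new Map<string, Session>();

  // One durable session per (channel, peer) pair; created lazily.
  resolve(channel: string, peer: string): Session {
    const key = `${channel}:${peer}`;
    let s = this.sessions.get(key);
    if (!s) {
      s = { id: key, history: [] };
      this.sessions.set(key, s);
    }
    return s;
  }
}
```

This is the "one source of truth" in miniature: every client that asks about the same peer gets the same session object, not a copy.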
03 · Node — the hands and sensors
Capabilities live on "nodes": shell, browser, filesystem, camera, location, screen. Each node connects with role:node and registers its capabilities. The Gateway fans tasks out to whichever node owns the right command.
Practically: your Mac runs a menu-bar app that exposes shell + browser + screen. Your iPhone runs a companion app that exposes camera + location. A headless Linux box exposes long-running browser sessions. The agent doesn't care which physical device executes — it asks the Gateway for a capability, the Gateway finds a node.
Capability mismatches are caught before the model even sees an error. That's a quietly important detail. LLMs are notoriously bad at handling rich error states; intercepting "you don't have a camera here" at the dispatch layer keeps the conversation coherent.
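A minimal sketch of that dispatch layer, assuming nodes register a flat capability list (names are illustrative): the mismatch is returned as a structured error before any model call happens.

```typescript
// Hypothetical capability registry: nodes register what they can do, and
// the Gateway resolves a capability to a node -- or fails fast with a
// structured error the model never has to untangle.
interface NodeInfo {
  id: string;
  capabilities: Set<string>;
}

class NodeRegistry {
  private nodes: NodeInfo[] = [];

  register(id: string, capabilities: string[]): void {
    this.nodes.push({ id, capabilities: new Set(capabilities) });
  }

  // Resolve BEFORE dispatch: either a node id, or a clean error string.
  dispatch(capability: string): { ok: true; nodeId: string } | { ok: false; error: string } {
    const node = this.nodes.find((n) => n.capabilities.has(capability));
    if (!node) {
      return { ok: false, error: `no connected node provides "${capability}"` };
    }
    return { ok: true, nodeId: node.id };
  }
}
```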
04 · Skill — folders of markdown
This is the layer that makes everyone go "wait, that's it?"
A skill is a folder. Inside it: a SKILL.md file with YAML frontmatter and prose, plus arbitrary scripts. That's the entire plugin system.
~/.openclaw/skills/
├── web-research/
│ ├── SKILL.md ← "use when user pastes URL"
│ ├── scripts/fetch.sh
│ └── scripts/extract.py
├── calendar/
│ ├── SKILL.md
│ └── scripts/ical.ts
└── morning-briefing/ ← agent-authored
├── SKILL.md
├── scripts/digest.ts
└── cron.json ← 07:30 daily
The frontmatter tells the model when to use the skill. The body is a prompt fragment. The scripts are the implementation. There's no DSL, no plugin manifest, no registration ceremony.
The model reads the doc and decides when to use it. That's it. Drop-in extensibility.
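Splitting a SKILL.md into frontmatter and prompt body is simple enough to sketch in a few lines. This assumes frontmatter of plain `key: value` lines; a real loader would use a YAML parser.

```typescript
// Sketch of indexing one SKILL.md: frontmatter becomes structured metadata,
// the body becomes a prompt fragment. Assumes simple "key: value" frontmatter.
interface SkillEntry {
  meta: Record<string, string>;
  body: string; // prose prompt fragment the model reads
}

function parseSkillMd(source: string): SkillEntry {
  const m = source.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!m) return { meta: {}, body: source }; // no frontmatter: whole file is prose
  const meta: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const i = line.indexOf(":");
    if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { meta, body: m[2].trim() };
}
```

A loader that walks `~/.openclaw/skills/` and runs this over every SKILL.md has a complete skill index with no registration step.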
The under-rated design choice: markdown as memory
OpenClaw's "personality" is markdown files on disk. There are roughly four of them:
- SOUL.md — voice, refusals, what kind of agent this is
- USER.md — stable facts about you (name, timezone, work, preferences)
- MEMORY.md — episodic log of what happened in past sessions
- skills/*/SKILL.md — the plugin layer
At session start, the runtime reads all of them. Frontmatter becomes structured prompt context; bodies become free text. That is the entire "session loader."
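Under the assumption that the loader really is just concatenation (the exact prompt layout is my guess, not OpenClaw's), the whole thing fits in one function:

```typescript
// Sketch of a session loader: the workspace files, already read from disk,
// are concatenated into one system prompt, each labelled by filename.
// The layout (## headings) is an assumption, not OpenClaw's actual format.
function buildSystemPrompt(files: { name: string; content: string }[]): string {
  return files
    .map((f) => `## ${f.name}\n${f.content.trim()}`)
    .join("\n\n");
}
```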
"Stuffing personality and memory into a database with a custom schema would be the obvious move. Markdown is the under-rated design choice."
Why this is good:
- Human-readable AND model-readable. You can cat MEMORY.md and understand it. The LLM can too.
- Editable by you, editable by the agent. Both directions.
- Version-controllable. Git is the audit log.
- Hot-reloadable. Dropping a new skill folder is a deploy.
- No schema migrations. Ever.
At session end, the agent appends to MEMORY.md. A background skill compacts old entries — weeks become paragraphs, quarters get archived to MEMORY/2026-Q1.md. There's no vector DB, no embeddings index, no schema. Just text.
This is the kind of design choice that looks lazy until you've maintained the alternative.
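The compaction rule described above ("weeks become paragraphs, quarters get archived") can be sketched as a pure partition over dated entries. The threshold and the entry shape here are made up for illustration:

```typescript
// Sketch of memory compaction: recent entries stay verbatim in MEMORY.md,
// older entries are bucketed by quarter for archival to MEMORY/<quarter>.md.
// keepDays = 30 is an invented threshold, not OpenClaw's.
interface MemoryEntry { date: string; text: string } // date as "YYYY-MM-DD"

function partitionMemory(entries: MemoryEntry[], today: string, keepDays = 30) {
  const cutoff = new Date(today).getTime() - keepDays * 86_400_000;
  const keep: MemoryEntry[] = [];
  const archive = new Map<string, MemoryEntry[]>(); // e.g. "2025-Q4" -> entries
  for (const e of entries) {
    const d = new Date(e.date);
    if (d.getTime() >= cutoff) {
      keep.push(e);
      continue;
    }
    const quarter = `${d.getUTCFullYear()}-Q${Math.floor(d.getUTCMonth() / 3) + 1}`;
    const bucket = archive.get(quarter) ?? [];
    bucket.push(e);
    archive.set(quarter, bucket);
  }
  return { keep, archive };
}
```

A background skill would then summarise each archive bucket and write it out as a file; the point is that both inputs and outputs stay plain text.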
The recursion that makes it feel magical
If the runtime can read markdown skills, and the agent can write markdown, you have a self-extending system for free. This is the part that made me sit up.
A real exchange (paraphrased from the docs):
USER > Make me a morning briefing every day at 7:30
with weather, calendar, and unread important emails.
AGENT > Drafting a skill: morning-briefing.
AGENT > [writes ~/.openclaw/skills/morning-briefing/]
AGENT > [registers cron 30 7 * * * via Gateway]
AGENT > Hot-reloaded. First run tomorrow at 07:30.
The first generation of plugins required a developer, a manifest, and a redeploy. Skills are folders the agent itself can author at runtime. The skill set grows whenever you ask for something new — without a code change.
This is the architectural decision that earns the "iPhone moment" comparisons. Most products grow because a team ships features. OpenClaw grows because its users talk to it.
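What the "write a skill" step amounts to can be sketched as the agent emitting plain files for the runtime to write and hot-reload. Everything here — function name, file layout, cron.json shape — is illustrative, not OpenClaw's actual tool API:

```typescript
// Hypothetical sketch of an agent-authored skill: the agent's tool call is
// just "write these files", and the runtime's hot-reload does the rest.
interface SkillFile { path: string; content: string }

function draftMorningBriefing(): SkillFile[] {
  return [
    {
      path: "morning-briefing/SKILL.md",
      content:
        "---\nname: morning-briefing\ndescription: daily 07:30 digest of weather, calendar, email\n---\n" +
        "Run scripts/digest.ts and send the result to the user.",
    },
    {
      // Five-field cron: minute 30, hour 7, every day.
      path: "morning-briefing/cron.json",
      content: JSON.stringify({ schedule: "30 7 * * *" }),
    },
  ];
}
```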
End-to-end: a WhatsApp message, traced
Putting it all together, here's what happens when you message the agent on WhatsApp:
01 Inbound message WhatsApp
02 Adapter → Gateway WS ws://127.0.0.1:18789
03 Session resolution load SOUL/USER/MEMORY
04 LLM call Anthropic / OpenAI / local
05 Tool call dispatched browser.fetch + idempotency key
06 Node executes Playwright on Mac node
07 Result → continue loop tool_result back into context
08 Reply egress Baileys.sendMessage()
Eight steps, three processes, zero magic. The whole loop typically completes in 1–4 seconds depending on how chatty the model is and whether the tool call hits the network.
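The eight steps collapse into one control loop. `callModel` and `runTool` below are stand-ins for the LLM call and node dispatch; the real runtime is more elaborate, but the shape is this:

```typescript
// Sketch of the core agent loop: call the model, execute any tool it asks
// for, feed the result back in, repeat until it produces a plain reply.
// The step cap is an invented safety limit.
type ModelTurn =
  | { kind: "reply"; text: string }
  | { kind: "tool"; name: string; args: unknown };

async function agentLoop(
  userText: string,
  callModel: (context: string[]) => Promise<ModelTurn>,
  runTool: (name: string, args: unknown) => Promise<string>,
): Promise<string> {
  const context = [`user: ${userText}`];
  for (let step = 0; step < 8; step++) {
    const turn = await callModel(context);                  // step 04
    if (turn.kind === "reply") return turn.text;            // step 08: egress
    const result = await runTool(turn.name, turn.args);     // steps 05-06
    context.push(`tool_result(${turn.name}): ${result}`);   // step 07
  }
  return "Sorry, I got stuck in a tool loop.";
}
```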
Six lessons worth stealing even if you never use OpenClaw
Even if you never deploy OpenClaw, these architectural decisions generalise:
- Daemon + clients over WebSocket. Many surfaces, one source of truth.
- Markdown as config and memory. Treat it as a first-class storage tier, not a workaround for not having a real DB.
- Skills as folders. No plugin DSL, just SKILL.md + scripts. The model reads the doc and decides when to use it.
- Self-extension. If your runtime reads markdown and your agent writes markdown, you get a self-extending system for free.
- Channels as adapters. Build the brain assuming a generic message bus. Bolt platforms onto it later.
- Idempotency keys on side-effecting calls. Required so retries on reconnect don't send the email three times. Unsexy but critical.
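The last lesson is small enough to show whole. A minimal sketch, assuming the caller picks the key (a production version would also deduplicate in-flight calls and persist the cache across restarts):

```typescript
// Sketch of idempotent tool dispatch: a side-effecting call carries a
// caller-chosen key; a replay of the same key returns the cached result
// instead of re-running the effect.
class IdempotentDispatcher {
  private done = new Map<string, string>();

  async call(key: string, effect: () => Promise<string>): Promise<string> {
    const cached = this.done.get(key);
    if (cached !== undefined) return cached; // retry after reconnect: no double-send
    const result = await effect();
    this.done.set(key, result);
    return result;
  }
}
```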
The honest part the marketing won't tell you
I want to be straight here, because I gave this same caveat slide internally:
Not a SaaS blueprint. OpenClaw is single-user. One Gateway, one human. The architecture isn't multi-tenant. Don't read it as a template for a hosted product.
The security surface is huge. Filesystem + shell + browser + 25 messaging channels + a community skill marketplace. If any one skill is malicious, it inherits all of those. Hundreds of malicious skills have been found on the community marketplace. The open marketplace is both the magic and the risk. The April 2026 release added manifest-driven plugin security partly in response to this.
Orchestration, not ML. The intelligence is rented. Anthropic / OpenAI / a local model do the thinking. OpenClaw's IP is the plumbing. That's a feature for learners — the architecture is fully legible — but be clear-eyed about where the value sits.
Pairing tokens are the whole game. New devices require explicit approval. A patched one-click RCE earlier this year leaked Gateway auth tokens via WebSocket — useful cautionary tale for why this layer matters and why you should keep up with releases.
Why this is the post I wanted to read first
When I started, I had to piece this picture together from a README, a docs site, three blog posts, and reading the gateway source. The thing I kept wanting was a single explainer that said here are the four layers, here's the message flow, here's why markdown, and here's what to be careful about.
That's this post. If your company is evaluating OpenClaw, or if you're just trying to understand why everyone on Hacker News is talking about it, the core idea isn't complicated:
One daemon. Many mouths. A folder of markdown. The integration is the product.
The rest is execution.
Sources
- OpenClaw on GitHub — openclaw/openclaw
- OpenClaw documentation — docs.openclaw.ai
- What is OpenClaw? — DigitalOcean
- OpenClaw: The Rise of an Open-Source AI Agent Framework — clawbot.blog
- OpenClaw Workspace Files Explained — Roberto Capodieci, Medium
- Build Domain-Specific AI Agents with OpenClaw — Codebridge
- What Is OpenClaw? Complete Guide — Milvus Blog
- The Complete Honest Guide to OpenClaw — Data Science Collective
- Build a Secure Always-On Local AI Agent with OpenClaw — NVIDIA