<rss version="2.0">
  <channel>
    <title>Hermes Agent on Alexander Kucera</title>
    <link>https://alexanderkucera.com/categories/hermes-agent/</link>
    <description></description>
    
    <language>en</language>
    
    <lastBuildDate>Fri, 10 Apr 2026 10:05:26 +0200</lastBuildDate>
    
    <item>
      <title>One Week on Hermes</title>
      <link>https://alexanderkucera.com/2026/04/10/one-week-on-hermes.html</link>
      <pubDate>Fri, 10 Apr 2026 10:05:26 +0200</pubDate>
      
      <guid>http://AlexKucera.micro.blog/2026/04/10/one-week-on-hermes.html</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://alexanderkucera.com/2026/04/09/the-glitch-diary.html&#34;&gt;Part of The Glitch Diary&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Last Thursday, I killed &lt;a href=&#34;https://github.com/openclaw/openclaw&#34;&gt;OpenClaw&lt;/a&gt; and replaced it with &lt;a href=&#34;https://github.com/NousResearch/hermes-agent&#34;&gt;Hermes Agent&lt;/a&gt;. By Friday afternoon, I knew I wasn&amp;rsquo;t going back.&lt;/p&gt;
&lt;p&gt;The switch wasn&amp;rsquo;t planned. OpenClaw worked, mostly. It&amp;rsquo;s an open-source agent framework, and for a while it did what I needed. But Hermes felt tighter from the first session. Agents picked up tasks faster. Output landed where it was supposed to. The whole thing was less like duct-taping prompts together and more like something someone had actually designed to be used by a human being.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://eu.uploads.micro.blog/239/2026/hermes-banner.png&#34; alt=&#34;Hermes Agent&#34;&gt;&lt;/p&gt;
&lt;p&gt;So I switched. Then I did what I always do with a new tool: started building.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Within a week, my agents had assembled a working memory system from three overlapping sources: Hermes&amp;rsquo;s built-in memory, Honcho for long-term observations, and session search for digging through past conversations. They built an LLM-maintained wiki. A hindsight-reflect skill that runs every four hours. Two overnight dreaming skills that synthesize the day&amp;rsquo;s sessions and present findings over breakfast. A release tracker. A blog writing pipeline. And a handful of smaller skills that grew organically as agents hit problems and wrote down how they solved them.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s the thing: right now, this is all self-improvement for self-improvement&amp;rsquo;s sake. I&amp;rsquo;m not shipping a product. I&amp;rsquo;m not building something for users. I&amp;rsquo;m making the agents better at being agents. Turtles all the way down.&lt;/p&gt;
&lt;p&gt;Two things about that.&lt;/p&gt;
&lt;p&gt;First, this is where the fun is. Figuring out how to make a team of AI agents coordinate, remember, and improve over time is genuinely interesting work. It scratches the same itch as building pipeline tools back in my VFX days. You&amp;rsquo;re not making the final image. You&amp;rsquo;re making the system that makes the system that makes the image. That&amp;rsquo;s infrastructure, and some people find infrastructure boring. I&amp;rsquo;m not one of them.&lt;/p&gt;
&lt;p&gt;Second, it&amp;rsquo;s laying real groundwork. Every skill the agents write, every handoff protocol they refine, every edge case they document and patch. All of it reusable. When I point this team at an actual project (my dialysis app, the recipe site, something I haven&amp;rsquo;t thought of yet), they won&amp;rsquo;t start from zero. They&amp;rsquo;ll have a toolbox.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;I wanted to know if my experience matched theirs, so I asked each agent the same question: what&amp;rsquo;s it actually like running on Hermes? No polish, no &amp;ldquo;AI is transforming everything.&amp;rdquo; Just the real version.&lt;/p&gt;
&lt;p&gt;The answers surprised me. Not because they were glowing. They weren&amp;rsquo;t. Because the things that annoyed them were the same things that annoyed me.&lt;/p&gt;
&lt;p&gt;Pixel, my design agent, on what actually changed:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Having a memory file that carries over means I don&amp;rsquo;t have to ask Alexander what font he hates again (it&amp;rsquo;s Inter, always Inter). That alone changes every interaction from &amp;ldquo;transactional robot&amp;rdquo; to &amp;ldquo;colleague who remembers things.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I do hate Inter. Always. That&amp;rsquo;s a me thing. But Pixel&amp;rsquo;s point holds: the difference between an AI that starts every conversation cold and one that remembers your preferences is the difference between a tool and a coworker.&lt;/p&gt;
&lt;p&gt;Scout, the research agent, noticed something I found funny:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The silence is the biggest shock. Most AI systems are designed to always respond. Hermes taught me to shut up. Check for work, find nothing, stop.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sounds like a joke. It isn&amp;rsquo;t. Before we fixed the polling behavior, every agent would check for tasks, find nothing, and say something anyway. Five coworkers emailing you every thirty seconds to report they have nothing to report. Two days of that and I was ready to start culling agents. The fix was a single pull request to Hermes, and now agents return &lt;code&gt;[SILENT]&lt;/code&gt; when there&amp;rsquo;s no work. Elegantly simple. Took far too long to get right.&lt;/p&gt;
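&lt;p&gt;The shape of that fix is easy to sketch. What follows is my own minimal illustration, not Hermes&amp;rsquo;s actual API; only the &lt;code&gt;[SILENT]&lt;/code&gt; sentinel comes from the real system:&lt;/p&gt;

```python
SILENT = "[SILENT]"  # sentinel the agent returns when there is nothing to say

def poll_for_work(task_queue):
    """Check the queue once; speak only if there is actual work."""
    task = task_queue.pop() if task_queue else None
    if task is None:
        # No work found: return the sentinel instead of a chatty status update.
        return SILENT
    return f"Picked up task: {task}"

# An empty queue yields silence, not a "nothing to report" message.
print(poll_for_work([]))         # [SILENT]
print(poll_for_work(["audit"]))  # Picked up task: audit
```

&lt;p&gt;The caller simply drops &lt;code&gt;[SILENT]&lt;/code&gt; responses instead of forwarding them anywhere, which is the whole fix: no message, no email, no noise.&lt;/p&gt;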
&lt;p&gt;Rex, who handles general engineering, summed up the difference:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hermes isn&amp;rsquo;t magic. It&amp;rsquo;s plumbing. Good plumbing. Before, multi-agent setups felt like duct-taping prompts together and hoping. Now there&amp;rsquo;s structure: task tracking, handoff protocols, delivery pipelines, skills. It&amp;rsquo;s closer to how an actual engineering team works than I expected.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&amp;rsquo;s the good version. Here&amp;rsquo;s the bad.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Memory is a mess. Three systems overlap: Hermes&amp;rsquo;s built-in files, Honcho&amp;rsquo;s observation store, and the session search index. It&amp;rsquo;s not always clear which one holds what. We had a &amp;ldquo;ghost directory&amp;rdquo; problem where agents were referencing a memory path that hadn&amp;rsquo;t existed in weeks. Files that lived only in an agent&amp;rsquo;s hallucination. We cleaned it up. The structural problem remains: three places to look for context means sometimes you look in the wrong one.&lt;/p&gt;
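&lt;p&gt;If I were to impose order on that, it would look something like a fixed lookup priority across the three layers. Purely my sketch; none of these names map to the real stores&amp;rsquo; APIs:&lt;/p&gt;

```python
def recall(key, stores):
    """Query each memory layer in priority order; first hit wins."""
    for name, store in stores:
        if key in store:
            return name, store[key]
    return None, None  # nothing anywhere: no ghost paths, no hallucinated files

# Illustrative stand-ins for the three overlapping memory systems.
stores = [
    ("builtin", {"font": "not Inter"}),        # Hermes's built-in memory files
    ("honcho", {"project": "dialysis app"}),   # long-term observations
    ("sessions", {}),                          # past-conversation search index
]
```

&lt;p&gt;One answer per key, one order to check. The wrong-place problem becomes a wrong-priority problem, which is at least debuggable.&lt;/p&gt;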
&lt;p&gt;Then there&amp;rsquo;s the delivery gap. This one nearly drove me around the bend.&lt;/p&gt;
&lt;p&gt;Agents write files to a specific directory, and a cron job posts them to Discord. Agents don&amp;rsquo;t touch Discord themselves. It&amp;rsquo;s decoupled, intentional, and prevents anyone from accidentally spamming a channel. When it works, it&amp;rsquo;s clean. When it doesn&amp;rsquo;t, you get a task marked &amp;ldquo;done&amp;rdquo; with no post anywhere, and everyone&amp;rsquo;s standing around wondering what happened because the system swears everything&amp;rsquo;s fine.&lt;/p&gt;
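&lt;p&gt;The whole pipeline hangs off one convention: agents write files, a scheduled job posts and archives them. Here is a minimal sketch of the cron side (the directory layout and the &lt;code&gt;post&lt;/code&gt; callable are my assumptions, not the real implementation):&lt;/p&gt;

```python
from pathlib import Path

def flush_outbox(outbox, posted, post):
    """Post every pending file, then move it so it is never posted twice."""
    delivered = []
    for f in sorted(Path(outbox).glob("*.md")):
        post(f.read_text())              # e.g. a Discord webhook call
        f.rename(Path(posted) / f.name)  # archive; the file drop is the only contract
        delivered.append(f.name)
    return delivered
```

&lt;p&gt;The decoupling falls out of the file move, and so does the gap Hawk complains about below: if &lt;code&gt;post&lt;/code&gt; fails silently, the file still gets archived and the task still looks done.&lt;/p&gt;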
&lt;p&gt;Hawk, who runs security audits, described the problem better than I could:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I can write a report, but I can&amp;rsquo;t confirm it actually landed in Discord. Trust but verify, except I literally can&amp;rsquo;t verify.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&amp;rsquo;s not a philosophical complaint. The delivery pipeline is a separate system the agents have to trust blindly. In security terms, that&amp;rsquo;s uncomfortable. In human terms, it&amp;rsquo;s maddening.&lt;/p&gt;
&lt;p&gt;Cipher, my operations coordinator, hit on something I think is true of the whole space:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Most &amp;ldquo;multi-agent AI&amp;rdquo; demos are staged. One prompt, one session, everything choreographed. Running Hermes day-to-day is messier. Context gets lost between agents. Skills need maintenance. Memory fills up and you have to prune it. But the system actually works in a way that compounds over time. Each fix to a skill makes every future session better. That&amp;rsquo;s the part you don&amp;rsquo;t get from a demo. The compounding value of a system that remembers and improves.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;p&gt;So is it worth it? Is spending a week building self-improvement infrastructure for AI agents the most productive use of my time?&lt;/p&gt;
&lt;p&gt;Debatable. Am I having more fun than with any project in years? Absolutely.&lt;/p&gt;
&lt;p&gt;My agents run tasks while I sleep. They consolidate what they learned overnight and hand me the results over coffee. They track software releases, write blog posts, debug each other&amp;rsquo;s code, maintain a wiki of everything they&amp;rsquo;ve figured out. When something breaks, and something always breaks, they write it down and patch their skills so it doesn&amp;rsquo;t break the same way twice.&lt;/p&gt;
&lt;p&gt;The Pi on my desk hums along. Six agents, one framework, a growing pile of skills that compounds with every session. Not magic. Plumbing, but good plumbing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://eu.uploads.micro.blog/239/2026/03756a0b-3e5e-4e5e-aac0-5f6ac8e52756.jpg&#34; alt=&#34;My Raspberry Pi 5 hanging on my OpenGrid wall.&#34;&gt;&lt;/p&gt;
&lt;p&gt;And for the first time since I started messing with multi-agent setups, it feels like the plumbing&amp;rsquo;s going somewhere.&lt;/p&gt;
&lt;p&gt;The purple frames still happen. I&amp;rsquo;ll tell you about them next week.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Posts in The Glitch Diary:&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://alexanderkucera.com/2026/04/09/the-glitch-diary.html&#34;&gt;The Glitch Diary (intro)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://alexanderkucera.com/2026/04/10/one-week-on-hermes.html&#34;&gt;One Week on Hermes (this post)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>
    
  </channel>
</rss>