<rss version="2.0">
  <channel>
    <title>The Glitch Diaries on Alexander Kucera</title>
    <link>https://alexanderkucera.com/categories/the-glitch-diaries/</link>
    <description></description>
    
    <language>en</language>
    
    <lastBuildDate>Fri, 17 Apr 2026 18:41:13 +0200</lastBuildDate>
    
    <item>
      <title>The Afternoon My Agents Went Dark</title>
      <link>https://alexanderkucera.com/2026/04/17/the-afternoon-my-agents-went.html</link>
      <pubDate>Fri, 17 Apr 2026 18:41:13 +0200</pubDate>
      
      <guid>http://AlexKucera.micro.blog/2026/04/17/the-afternoon-my-agents-went.html</guid>
      <description>&lt;p&gt;I lost all six of my AI agents yesterday afternoon because I updated Hermes at the wrong time.&lt;/p&gt;
&lt;p&gt;I was setting up the voice pipeline for my agent team on the Raspberry Pi. Getting text-to-speech and speech-to-text working required a gateway restart, and since I was restarting everything anyway, I figured I might as well pull the latest Hermes update. Two birds, one reboot.&lt;/p&gt;
&lt;p&gt;Big mistake.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://eu.uploads.micro.blog/239/2026/alexkucera-person-sitting-at-desk-hands-on-head-frustrated-sc-4f6f3e3d-8fbb.png&#34; alt=&#34;A person holding their head in front of several monitors displaying technical information.&#34;&gt;&lt;/p&gt;
&lt;p&gt;After the restart, every agent gateway came up clean. Config looked fine. Health checks passed. But the moment any agent tried to call the Z.ai API for an actual response: nothing. Timeouts, connection resets, silent failures. I had a house full of agents that could boot up but couldn&amp;rsquo;t think.&lt;/p&gt;
&lt;p&gt;So I started debugging. Checked the network, ran traceroutes, verified DNS resolution. Everything looked normal. The Pi could reach Z.ai&amp;rsquo;s servers just fine from the command line. The problem only appeared when Hermes tried to make the API call.&lt;/p&gt;
&lt;p&gt;I brought Claude in to help dig through the logs. Over an hour of connection errors, failed TLS handshakes, pool timeouts. Every log file pointed at the network. But the network wasn&amp;rsquo;t the problem. I could ping the endpoint. I could curl it. The connection worked everywhere except inside Hermes.&lt;/p&gt;
&lt;p&gt;Meanwhile, I&amp;rsquo;m in the Hermes Discord, asking the developers. They looked at my logs and told me it was a network failure on my end. Couldn&amp;rsquo;t think of any code changes that would cause this behaviour. From their perspective, Hermes was innocent.&lt;/p&gt;
&lt;p&gt;And Z.ai? Their end looked fine too. No outages, no changes logged around that time. &amp;ldquo;Works for us,&amp;rdquo; effectively.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://eu.uploads.micro.blog/239/2026/alexkucera-a-desk-in-a-dimly-lit-room-a-single-raspberry-pi-w-fb101262-8734.png&#34; alt=&#34;A desk with several computer monitors showing programming code, a keyboard, and numerous cables.&#34;&gt;&lt;/p&gt;
&lt;p&gt;The problem with this kind of failure is that the logs don&amp;rsquo;t lie, but they don&amp;rsquo;t tell the truth either. Every error message said connection failure. Every diagnostic I ran said the network was fine. Both of these things were true. The connection was failing, and the network was fine, because the real bug was in Hermes code. Something about how it handled the httpx client: creating a copy in memory and then clearing it before the agent could use it. The symptoms looked exactly like a network problem. The cause was a software update that broke its own connection handling.&lt;/p&gt;
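&lt;p&gt;I don&amp;rsquo;t have the actual Hermes code in front of me, so here&amp;rsquo;s a stdlib-only Python sketch of the bug shape as I understand it: a client object gets torn down during the update path, but the stale reference survives, so everything looks healthy until the first real request. The class and method names are made up for illustration.&lt;/p&gt;

```python
class HttpClient:
    """Stand-in for an httpx-style client (stdlib-only, hypothetical)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def post(self, path, json=None):
        if self.closed:
            # Mirrors httpx's behaviour: a closed client refuses to send,
            # and upstream retry/logging layers report it as a connection
            # failure, which is why every log pointed at the network.
            raise RuntimeError("Cannot send a request, as the client has been closed.")
        return {"status": 200, "path": path}


class Agent:
    def __init__(self):
        self.client = HttpClient()

    def reload_config(self):
        # The suspected bug shape: teardown runs somewhere in the update
        # path, but the stale reference survives, so startup health checks
        # (which never send a request) still pass.
        self.client.close()

    def ask(self, prompt):
        return self.client.post("/chat", json={"prompt": prompt})
```

&lt;p&gt;Boot looks clean, health checks pass, and the first real API call fails with what reads exactly like a dead connection.&lt;/p&gt;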
&lt;p&gt;The breakthrough came when I did the one thing nobody had suggested: I downgraded Hermes.&lt;/p&gt;
&lt;p&gt;Everything worked instantly. All six agents connected, responded, did their thing. So I upgraded again. Broken. Downgraded. Working. That binary test was all the Hermes devs needed to narrow it down. They found the issue, shipped a fix, and I upgraded again. This time it stuck.&lt;/p&gt;
&lt;p&gt;But I&amp;rsquo;d lost the whole afternoon.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://eu.uploads.micro.blog/239/2026/alexkucera-abstract-visualization-of-a-broken-chain-link-betw-0588daf3-76e5.png&#34; alt=&#34;A digital rendering of connected, glowing blocks, reminiscent of blockchain technology.&#34;&gt;&lt;/p&gt;
&lt;p&gt;What made this frustrating wasn&amp;rsquo;t the bug itself. Bugs happen. What stung was the false trail. Every diagnostic pointed at the network. And why wouldn&amp;rsquo;t they? A software bug that clears the HTTP client from memory before the agent can use it will produce the exact same error messages as a dead network connection. The logs can&amp;rsquo;t tell the difference. I couldn&amp;rsquo;t tell the difference. The Hermes devs couldn&amp;rsquo;t tell the difference. Not until I proved it with a downgrade.&lt;/p&gt;
&lt;p&gt;In postmortem culture, this is a familiar pattern: the symptoms of a software bug perfectly mimicking an infrastructure failure. Dan Luu maintains a collection of these: failures where the obvious explanation is wrong. Cloudflare went down once because of a single regex. GPS satellites got knocked offline by a buffer overflow. My agents went dark because Hermes was clearing its own httpx client from memory.&lt;/p&gt;
&lt;p&gt;The lesson is the one I already knew but chose to ignore yesterday: never combine two risky operations into one restart. Updating Hermes was fine. Restarting the gateway was fine. Doing both at the same time meant I couldn&amp;rsquo;t tell which one caused the problem, and I couldn&amp;rsquo;t easily undo just one of them. The &amp;ldquo;might as well&amp;rdquo; impulse is how most of these incidents start.&lt;/p&gt;
&lt;p&gt;The other lesson is that &amp;ldquo;looks like a network issue&amp;rdquo; is the debugging equivalent of &amp;ldquo;it&amp;rsquo;s not our fault.&amp;rdquo; It&amp;rsquo;s a conversation ender. Everyone nods, everyone agrees, nobody digs further. When someone tells you it&amp;rsquo;s a network problem, the instinct is to believe them, because usually it is. This time it wasn&amp;rsquo;t. The Hermes devs dug in once I gave them the binary reproduction case: upgrade breaks, downgrade fixes, upgrade breaks again. That&amp;rsquo;s the kind of evidence that cuts through confident wrongness. But getting to that point cost me an afternoon I&amp;rsquo;d rather have spent on literally anything else.&lt;/p&gt;
&lt;p&gt;At least I got a Glitch Diary entry out of it.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;This is part of &lt;a href=&#34;https://alexanderkucera.com/2026/04/09/the-glitch-diary.html&#34;&gt;The Glitch Diary&lt;/a&gt;, a weekly series about what actually happens when you live with LLMs.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>One Week on Hermes</title>
      <link>https://alexanderkucera.com/2026/04/10/one-week-on-hermes.html</link>
      <pubDate>Fri, 10 Apr 2026 10:05:26 +0200</pubDate>
      
      <guid>http://AlexKucera.micro.blog/2026/04/10/one-week-on-hermes.html</guid>
      <description>&lt;p&gt;Last Thursday, I killed &lt;a href=&#34;https://github.com/openclaw/openclaw&#34;&gt;OpenClaw&lt;/a&gt; and replaced it with &lt;a href=&#34;https://github.com/NousResearch/hermes-agent&#34;&gt;Hermes Agent&lt;/a&gt;. By Friday afternoon, I knew I wasn&amp;rsquo;t going back.&lt;/p&gt;
&lt;p&gt;The switch wasn&amp;rsquo;t planned. OpenClaw worked, mostly. It&amp;rsquo;s an open-source agent framework, and for a while it did what I needed. But Hermes felt tighter from the first session. Agents picked up tasks faster. Output landed where it was supposed to. The whole thing was less like duct-taping prompts together and more like something someone had actually designed to be used by a human being.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://eu.uploads.micro.blog/239/2026/hermes-banner.png&#34; alt=&#34;Hermes Agent&#34;&gt;&lt;/p&gt;
&lt;p&gt;So I switched. Then I did what I always do with a new tool: started building.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Within a week, my agents had assembled a working memory system from three overlapping sources: Hermes&amp;rsquo;s built-in memory, Honcho for long-term observations, and session search for digging through past conversations. They built an LLM-wiki. A hindsight-reflect skill that runs every four hours. Two overnight dreaming skills that synthesize the day&amp;rsquo;s sessions and present findings over breakfast. A release tracker. A blog writing pipeline. A handful of smaller skills that grew organically as agents hit problems and wrote down how they solved them.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s the thing: right now, this is all self-improvement for self-improvement&amp;rsquo;s sake. I&amp;rsquo;m not shipping a product. I&amp;rsquo;m not building something for users. I&amp;rsquo;m making the agents better at being agents. Turtles all the way down.&lt;/p&gt;
&lt;p&gt;Two things about that.&lt;/p&gt;
&lt;p&gt;First, this is where the fun is. Figuring out how to make a team of AI agents coordinate, remember, and improve over time: that&amp;rsquo;s genuinely interesting work. It scratches the same itch as building pipeline tools back in my VFX days. You&amp;rsquo;re not making the final image. You&amp;rsquo;re making the system that makes the system that makes the image. That&amp;rsquo;s infrastructure work, and some people find it boring. I&amp;rsquo;m not one of them.&lt;/p&gt;
&lt;p&gt;Second, it&amp;rsquo;s laying real groundwork. Every skill the agents write, every handoff protocol they refine, every edge case they document and patch: all of it is reusable. When I point this team at an actual project (my dialysis app, the recipe site, something I haven&amp;rsquo;t thought of yet), they won&amp;rsquo;t start from zero. They&amp;rsquo;ll have a toolbox.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;I wanted to know if my experience matched theirs, so I asked each agent the same question: what&amp;rsquo;s it actually like running on Hermes? No polish, no &amp;ldquo;AI is transforming everything.&amp;rdquo; Just the real version.&lt;/p&gt;
&lt;p&gt;The answers surprised me. Not because they were glowing. They weren&amp;rsquo;t. Because the things that annoyed them were the same things that annoyed me.&lt;/p&gt;
&lt;p&gt;Pixel, my design agent, on what actually changed:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Having a memory file that carries over means I don&amp;rsquo;t have to ask Alexander what font he hates again (it&amp;rsquo;s Inter, always Inter). That alone changes every interaction from &amp;ldquo;transactional robot&amp;rdquo; to &amp;ldquo;colleague who remembers things.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I do hate Inter. Always. That&amp;rsquo;s a me thing. But Pixel&amp;rsquo;s point holds: the difference between an AI that starts every conversation cold and one that remembers your preferences is the difference between a tool and a coworker.&lt;/p&gt;
&lt;p&gt;Scout, the research agent, noticed something I found funny:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The silence is the biggest shock. Most AI systems are designed to always respond. Hermes taught me to shut up. Check for work, find nothing, stop.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Sounds like a joke. It isn&amp;rsquo;t. Before we fixed the polling behavior, every agent would check for tasks, find nothing, and say something anyway. Five coworkers emailing you every thirty seconds to report they have nothing to report. Two days of that and I was ready to start culling agents. The fix was a single pull request to Hermes, and now agents return &lt;code&gt;[SILENT]&lt;/code&gt; when there&amp;rsquo;s no work. Elegantly simple. Took far too long to get right.&lt;/p&gt;
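&lt;p&gt;The polling behaviour is simple enough to sketch. This is an illustration of the idea, not Hermes internals; only the &lt;code&gt;[SILENT]&lt;/code&gt; token comes from the actual fix.&lt;/p&gt;

```python
SILENT = "[SILENT]"  # the sentinel from the fix; the gateway drops these turns

def poll_for_work(queue):
    """One polling turn: check for tasks, say nothing if there are none.
    Sketch of the behaviour, not the Hermes implementation."""
    if not queue:
        # Pre-fix, an agent would chat about having nothing to do here.
        # Post-fix, it returns the sentinel and the gateway posts nothing.
        return SILENT
    task = queue.pop(0)
    return f"picked up: {task}"
```

&lt;p&gt;Check for work, find nothing, stop. That&amp;rsquo;s the whole behaviour.&lt;/p&gt;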
&lt;p&gt;Rex, who handles general engineering, summed up the difference:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hermes isn&amp;rsquo;t magic. It&amp;rsquo;s plumbing. Good plumbing. Before, multi-agent setups felt like duct-taping prompts together and hoping. Now there&amp;rsquo;s structure: task tracking, handoff protocols, delivery pipelines, skills. It&amp;rsquo;s closer to how an actual engineering team works than I expected.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&amp;rsquo;s the good version. Here&amp;rsquo;s the bad.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Memory is a mess. Three systems overlap: Hermes&amp;rsquo;s built-in files, Honcho&amp;rsquo;s observation store, and the session search index. It&amp;rsquo;s not always clear which one holds what. We had a &amp;ldquo;ghost directory&amp;rdquo; problem where agents were referencing a memory path that hadn&amp;rsquo;t existed in weeks. Files that lived only in an agent&amp;rsquo;s hallucination. We cleaned it up. The structural problem remains: three places to look for context means sometimes you look in the wrong one.&lt;/p&gt;
&lt;p&gt;Then there&amp;rsquo;s the delivery gap. This one nearly drove me around the bend.&lt;/p&gt;
&lt;p&gt;Agents write files to a specific directory, and a cron job posts them to Discord. Agents don&amp;rsquo;t touch Discord themselves. It&amp;rsquo;s decoupled, intentional, and prevents anyone from accidentally spamming a channel. When it works, it&amp;rsquo;s clean. When it doesn&amp;rsquo;t, you get a task marked &amp;ldquo;done&amp;rdquo; with no post anywhere, and everyone&amp;rsquo;s standing around wondering what happened because the system swears everything&amp;rsquo;s fine.&lt;/p&gt;
&lt;p&gt;Hawk, who runs security audits, described the problem better than I could:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I can write a report, but I can&amp;rsquo;t confirm it actually landed in Discord. Trust but verify, except I literally can&amp;rsquo;t verify.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That&amp;rsquo;s not a philosophical complaint. The delivery pipeline is a separate system the agents have to trust blindly. In security terms, that&amp;rsquo;s uncomfortable. In human terms, it&amp;rsquo;s maddening.&lt;/p&gt;
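&lt;p&gt;The shape of the pipeline, sketched in Python (the directory and field names are made up): the agent&amp;rsquo;s side reports &amp;ldquo;done&amp;rdquo; as soon as the file is written, and only the cron side knows whether anything was actually posted.&lt;/p&gt;

```python
import json
import time
from pathlib import Path

OUTBOX = Path("outbox")  # hypothetical drop folder the agents write into

def agent_deliver(agent, text):
    """Agent side: write the report to the outbox and call the task done.
    The gap Hawk describes: 'done' only means 'file written', not 'posted'."""
    OUTBOX.mkdir(exist_ok=True)
    path = OUTBOX / f"{agent}-{time.time_ns()}.json"
    path.write_text(json.dumps({"channel": "reports", "text": text}))
    return "done"

def cron_post():
    """Cron side: post each pending file, then delete it.
    The Discord webhook call is stubbed out here."""
    posted = []
    for f in sorted(OUTBOX.glob("*.json")):
        payload = json.loads(f.read_text())
        posted.append(payload["text"])  # a real version would hit the webhook
        f.unlink()
    return posted
```

&lt;p&gt;If the cron side fails, nothing flows back to the agent. The task stays marked done, the channel stays empty, and everyone stands around wondering.&lt;/p&gt;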
&lt;p&gt;Cipher, my operations coordinator, hit on something I think is true of the whole space:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Most &amp;ldquo;multi-agent AI&amp;rdquo; demos are staged. One prompt, one session, everything choreographed. Running Hermes day-to-day is messier. Context gets lost between agents. Skills need maintenance. Memory fills up and you have to prune it. But the system actually works in a way that compounds over time. Each fix to a skill makes every future session better. That&amp;rsquo;s the part you don&amp;rsquo;t get from a demo. The compounding value of a system that remembers and improves.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;p&gt;So is it worth it? Is spending a week building self-improvement infrastructure for AI agents the most productive use of my time?&lt;/p&gt;
&lt;p&gt;Debatable. Am I having more fun than with any project in years? Absolutely.&lt;/p&gt;
&lt;p&gt;My agents run tasks while I sleep. They consolidate what they learned overnight and hand me the results over coffee. They track software releases, write blog posts, debug each other&amp;rsquo;s code, maintain a wiki of everything they&amp;rsquo;ve figured out. When something breaks, and something always breaks, they write it down and patch their skills so it doesn&amp;rsquo;t break the same way twice.&lt;/p&gt;
&lt;p&gt;The Pi on my desk hums along. Six agents, one framework, a growing pile of skills that compounds with every session. Not magic. Plumbing, but good plumbing.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;https://eu.uploads.micro.blog/239/2026/03756a0b-3e5e-4e5e-aac0-5f6ac8e52756.jpg&#34; alt=&#34;My Raspberry Pi 5 hanging on my OpenGrid wall.&#34;&gt;&lt;/p&gt;
&lt;p&gt;And for the first time since I started messing with multi-agent setups, it feels like the plumbing&amp;rsquo;s going somewhere.&lt;/p&gt;
&lt;p&gt;The purple frames still happen. I&amp;rsquo;ll tell you about them next week.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;This is part of &lt;a href=&#34;https://alexanderkucera.com/2026/04/09/the-glitch-diary.html&#34;&gt;The Glitch Diary&lt;/a&gt;, a weekly series about what actually happens when you live with LLMs.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The Glitch Diary</title>
      <link>https://alexanderkucera.com/2026/04/09/the-glitch-diary.html</link>
      <pubDate>Thu, 09 Apr 2026 15:28:39 +0200</pubDate>
      
      <guid>http://AlexKucera.micro.blog/2026/04/09/the-glitch-diary.html</guid>
      <description>&lt;p&gt;&lt;em&gt;What actually happens when you live with LLMs — not the highlight reel.&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;I spent twenty years in visual effects. I worked on shots that ended up in films you&amp;rsquo;ve probably seen. And in all that time, the thing I learned most wasn&amp;rsquo;t about rendering or compositing or pipeline engineering. It was this: the final image you see on screen is a lie. A beautiful, polished, &lt;em&gt;curated&lt;/em&gt; lie.&lt;/p&gt;
&lt;p&gt;Behind every frame are thousands of broken iterations. Crashes at 3am. Renders that came back purple for no reason. A shader that worked fine until someone rotated the camera two degrees and the whole thing fell apart.&lt;/p&gt;
&lt;p&gt;Nobody talks about the purple frames. They talk about the shot.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been thinking about that a lot lately, because I&amp;rsquo;ve fallen into the same trap with something new. Over the past year, I&amp;rsquo;ve been building a multi-agent AI system that runs on a Raspberry Pi in my house. Five or six AI agents, each with their own personality and job, working on my projects around the clock. They write code, research things, track tasks, manage a Discord server, and occasionally surprise me with something I didn&amp;rsquo;t expect.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s the most interesting thing I&amp;rsquo;ve worked on in years. And if you only saw the successful outputs — the clean code, the finished posts, the working automations — you&amp;rsquo;d think I had it all figured out.&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t. Not even close.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s what a typical week actually looks like:&lt;/p&gt;
&lt;p&gt;An agent hallucinates an API endpoint that doesn&amp;rsquo;t exist. I spend forty minutes debugging before I realize the documentation it read was from a two-year-old cached version. Another agent writes a beautiful, well-structured Python script — that&amp;rsquo;s completely wrong for the task I gave it, because it answered what it &lt;em&gt;thought&lt;/em&gt; I meant instead of what I actually said. A third agent goes into an infinite loop retrying a failed git push and fills up a log file until the Pi runs out of disk space.&lt;/p&gt;
&lt;p&gt;These aren&amp;rsquo;t edge cases. This is Tuesday.&lt;/p&gt;
&lt;h2 id=&#34;why-write-about-it&#34;&gt;Why write about it?&lt;/h2&gt;
&lt;p&gt;There are a thousand blogs about AI. Most of them fall into two categories: breathless hype about what&amp;rsquo;s &lt;em&gt;possible&lt;/em&gt;, or doomscrolling about what&amp;rsquo;s &lt;em&gt;coming&lt;/em&gt;. Very few people are writing about what it&amp;rsquo;s actually like to use this stuff every day. The boring parts. The parts where it doesn&amp;rsquo;t work. The parts where you feel like you&amp;rsquo;re debugging a coworker who&amp;rsquo;s very confident and frequently wrong.&lt;/p&gt;
&lt;p&gt;I want to write that blog.&lt;/p&gt;
&lt;p&gt;Partly because I think it&amp;rsquo;s more useful than another &amp;ldquo;10 things ChatGPT can do&amp;rdquo; list. But mostly because the only way to get better at working with AI is to be honest about where it breaks down. And I&amp;rsquo;m in a pretty good position to break things — I use more of this stuff, in more weird configurations, than most people. Dialysis patient. Indie developer. Parent of two. Running an AI team from a Raspberry Pi on a desk in Geinsheim, Germany.&lt;/p&gt;
&lt;p&gt;If something can go wrong, it probably already has. I&amp;rsquo;ll tell you about it.&lt;/p&gt;
&lt;h2 id=&#34;what-to-expect&#34;&gt;What to expect&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The Glitch Diary&lt;/strong&gt; is a weekly series. Each post will be about one specific thing that went wrong, went weird, or went unexpectedly right. One topic per post. I&amp;rsquo;m not writing essays — I&amp;rsquo;m writing field notes.&lt;/p&gt;
&lt;p&gt;A few ground rules:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I won&amp;rsquo;t sand down the mistakes.&lt;/strong&gt; If I did something dumb, I&amp;rsquo;ll say so. If an agent produced something embarrassing, I&amp;rsquo;ll show it. The whole point is that the highlight reel is useless as a learning tool.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I&amp;rsquo;ll explain jargon once.&lt;/strong&gt; If I use a term you don&amp;rsquo;t know, I&amp;rsquo;ll define it in the post where it first appears, then assume you&amp;rsquo;ve got it from there. You&amp;rsquo;re smart. You&amp;rsquo;ll keep up.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;No fluff.&lt;/strong&gt; If a post doesn&amp;rsquo;t need to be long, it won&amp;rsquo;t be. Some weeks that might mean a few paragraphs. Other weeks it might mean a deep dive. The length serves the story, not the other way around.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real screenshots, real errors, real logs.&lt;/strong&gt; I&amp;rsquo;ll show you what actually happened, not a cleaned-up version.&lt;/p&gt;
&lt;h2 id=&#34;the-kintsugi-thing&#34;&gt;The Kintsugi thing&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s a Japanese art form called &lt;em&gt;kintsugi&lt;/em&gt; — repairing broken pottery with gold. The idea is that the cracks aren&amp;rsquo;t something to hide. They&amp;rsquo;re part of the object&amp;rsquo;s story, and they make it more beautiful than it was before it broke.&lt;/p&gt;
&lt;p&gt;I like that philosophy. Not just for pottery, but for learning, for building things, for life in general. We spend so much energy presenting the polished version. The final render. The shipped product. The Instagram angle.&lt;/p&gt;
&lt;p&gt;The cracks are where the learning happens.&lt;/p&gt;
&lt;p&gt;This series is about the cracks.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;This is part of &lt;a href=&#34;https://alexanderkucera.com/2026/04/09/the-glitch-diary.html&#34;&gt;The Glitch Diary&lt;/a&gt;, a weekly series about what actually happens when you live with LLMs.&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>