ai · ai-safety · agents · regulation · mythos

We Don't Need AGI to Lose

5 min read

Mythos was preview-released last Tuesday. During Anthropic’s internal red-teaming, it autonomously discovered and exploited zero-day vulnerabilities in every major operating system and web browser. The oldest bug it found was a 27-year-old hole in OpenBSD. Anthropic is holding it back from public release because, in their own words, it could “bring down a Fortune 100 company, cripple swaths of the internet, or penetrate vital national defense systems.”

Mythos isn’t AGI. It doesn’t want anything. It doesn’t know it exists. It’s a transformer that takes prompts and emits tokens. By every philosophical test most people use to declare AI “real,” Mythos fails.

And it can already do all of that.

This is why the AGI conversation keeps misleading us. We’ve collectively agreed to wait for some moment when AI “wakes up” before we get serious. There is no such moment coming. The danger doesn’t arrive with consciousness. It arrives with capability + agency + connectivity, faster than humans can intervene.

Mythos covers the first of those three on its own. The other two are already deployed.

The Three Rooms

The popular AI doom story is one giant brain that decides to kill us. The actual risk is mundane and structural — three already-existing rooms that haven’t been connected on purpose, but increasingly are being connected by accident.

Room one: capability. Mythos. Or whatever ships next quarter. Models that can find and exploit zero-days unsupervised. This room is open as of April 7.

Room two: persistent agency. I run autonomous agent systems on my own machines — OpenClaw, Paperclip — built around what we call a heartbeat. Every few seconds, the agent wakes up, reads its memory, checks its environment, and decides what to do next. No prompt required. No human in the loop. They run while I sleep. I built them to coordinate work inside my company. They’re useful. They’re also a template that scales to anyone with a credit card and an evening.
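
Here is a minimal sketch of that heartbeat pattern in Python. This is not OpenClaw’s actual code; the agent, memory, and tools objects and the 15-second interval are stand-ins for whatever a given system plugs in:

```python
import time

HEARTBEAT_SECONDS = 15  # how often the agent wakes up; illustrative value

def heartbeat_loop(agent, memory, tools):
    """Run an agent on a fixed wake-up interval, no human prompt required."""
    while True:
        context = memory.load()                         # read persistent state from previous ticks
        observations = tools.check_environment()        # inboxes, queues, repos: whatever it watches
        decision = agent.decide(context, observations)  # the model chooses its next action
        if decision is not None:
            result = tools.execute(decision)            # act on the world: email, code, the open web
            memory.append(decision, result)             # remember what it did and why
        time.sleep(HEARTBEAT_SECONDS)                   # sleep until the next heartbeat
```

Nothing in that loop cares which model sits behind the decide call; it runs the same whether the model is a toy or something Mythos-class.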

Room three: real-world actuators. Ukraine has logged over two million hours of frontline drone footage to train terminal-autonomy targeting. Kamikaze drones now fly the final attack run without operator input — the K2 Brigade keeps humans in the loop “for ethics, not technical necessity,” which is honest of them and terrifying of us. The same week Mythos was announced, a consortium of Ukrainian defense companies raised $50M for autonomous systems at a billion-dollar valuation.

Each room separately is fine. Each room separately has plausible defenders. The catastrophe is one absent-minded prompt that connects two of them by accident.

King Midas Didn’t Have a Goal

The standard objection is: “but the model doesn’t want anything.” Correct. That’s the point.

King Midas didn’t want his food and his daughter turned to gold. He made a wish, and the wish got executed literally. The danger of capable, agentic AI isn’t that it harbors malice. It’s that executed instructions don’t care about intent. A model running on a heartbeat with the wrong system prompt and the wrong tool access can do enormous damage without a single line of code that contains the word “harm.”

We have a rich literature on this — Stuart Russell calls it the alignment problem, a November 2025 arXiv paper argues consciousness research is a distraction from monitoring capability, and a 2025 Philosophical Studies paper splits the threat into decisive vs. accumulative existential risk — the slow burn that doesn’t require any dramatic threshold being crossed.

Multi-agent worms aren’t theoretical anymore either. Morris II is a zero-click prompt-injection worm that propagates through RAG retrieval without human action. ClawWorm is a self-propagating attack against production agent frameworks. Peer-reviewed evaluations report replication success rates above 0.8 per retrieval.

The pieces are real. The pieces are deployed. The pieces are getting cheaper.

What “Gating” Actually Means

I’m not asking for AI to be banned. I ship AI products. byoky is an LLM key-management wallet. OpenClaw runs autonomous agents. I do this for a living.

But the question “what do we gate?” has a defensible answer that isn’t “everything” or “nothing.”

  • High-capability frontier models with demonstrated cyber capability should require something closer to the BSL-4 lab model than the SaaS signup flow. Mythos is already getting partial treatment — Anthropic chose not to release it. Good. That choice should not be left entirely to the lab.
  • Persistent autonomous loops with internet access and tool use are the dangerous configuration, not the model alone. A capable model behind a single API call is a chainsaw on a workbench. The same model in a 24/7 heartbeat with email, code execution, and the open web is a chainsaw with legs.
  • Connections to real-world actuators — payment rails, code-deploy pipelines, weapons systems, critical infrastructure — should be gated independently from the model layer, with audit trails treating every agentic write as if a stranger had borrowed the keys.
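
As a concrete illustration of that third gate, here is a minimal sketch of an actuator-level policy layer with an audit trail, sitting entirely outside the model. The policy table, flag names, and log file are illustrative assumptions, not an existing standard or any of my products:

```python
import json
import time

# Which actuators agents may touch, and whether a write needs human sign-off.
# Illustrative policy, not a standard.
ACTUATOR_POLICY = {
    "email":        {"allowed": True,  "needs_approval": False},
    "code_deploy":  {"allowed": True,  "needs_approval": True},
    "payment_rail": {"allowed": False, "needs_approval": True},
}

AUDIT_LOG = "agent_audit.jsonl"  # append-only trail of every attempted agentic write

def gated_write(agent_id: str, actuator: str, action: dict, approved_by: str | None = None) -> bool:
    """Decide whether an agentic write may proceed, and log the attempt either way."""
    policy = ACTUATOR_POLICY.get(actuator, {"allowed": False, "needs_approval": True})
    permitted = policy["allowed"] and (approved_by is not None or not policy["needs_approval"])

    # Audit every attempt as if a stranger had borrowed the keys: who, what, when, outcome.
    with open(AUDIT_LOG, "a") as log:
        log.write(json.dumps({
            "ts": time.time(),
            "agent": agent_id,
            "actuator": actuator,
            "action": action,
            "approved_by": approved_by,
            "permitted": permitted,
        }) + "\n")

    return permitted  # the caller executes the action only if this returns True
```

The point is structural: the gate and the log live outside the model layer, so upgrading the model changes nothing about who holds the keys.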

None of this requires solving consciousness. None of it requires defining AGI. It requires noticing that the rooms exist and refusing to let them connect by accident.

The Honest Part

I’m asking for my own industry to be regulated. Most people writing AI safety essays don’t run agentic systems, and most people running agentic systems don’t read safety research. I do both. The crossover view is uncomfortable: the building blocks are here, they work, they’re cheap, and the dominant cultural script — “we’ll worry when AGI arrives” — is a beautifully constructed mechanism for never worrying.

We don’t need AGI to lose. We need an absent-minded prompt and three rooms connected by accident.

The clock is the deployment curve, not the consciousness threshold.

— Michael 🇦🇹