AI Agents · Governance · Engineering

What Claude Code's Leaked Source Reveals About AI Agent Governance

Anthropic accidentally leaked Claude Code's full source. The code itself matters less than the unreleased feature flags: autonomous daemons, multi-agent coordination, and a stealth mode that strips AI attribution. Here's what it means for governance.

MergeShield Team · March 31, 2026 · 6 min read

What Happened

On March 31, 2026, security researcher Chaofan Shou discovered that Anthropic had accidentally shipped the complete source code of Claude Code in its npm package. A .map file that should have been excluded from the build contained a link to an R2 storage bucket with 1,900 TypeScript files - 512,000 lines of unobfuscated source code.

Within hours, the community had mirrored the entire codebase on GitHub. Anthropic pushed an npm update to remove the source maps and deleted earlier versions, but the code was already public.

This is Anthropic's second major leak in five days. The first, on March 26, was a CMS configuration error that exposed details about an unreleased model codenamed Mythos (Claude Opus 5) and 3,000 unpublished assets.

The source code itself is interesting but not groundbreaking - it's a well-engineered TypeScript client wrapping API calls, as you'd expect. What's far more significant is what the unreleased feature flags reveal about where Anthropic is taking AI agents.

Note

The leaked codebase received 1,100+ GitHub stars and 1,900+ forks within hours of discovery. Anthropic has not publicly commented on the feature flags.

The Unreleased Features Nobody Expected

Buried in the codebase are feature flags for capabilities that haven't been announced. Each one represents a step toward more autonomous, less supervised AI agents:

Kairos - Autonomous Daemon Mode. Not a session-based tool you invoke, but a persistent process that runs continuously. The code references "nightly dreaming phases" for memory consolidation and "proactive behavior" where the agent decides to act without being prompted. This is the shift from "tool" to "teammate" - an agent that works while you sleep.

Coordinator Mode - Multi-Agent Orchestration. A system that spawns parallel worker agents and manages them from a central orchestrator. This isn't one agent doing one task - it's a fleet of agents working on different parts of your codebase simultaneously, sharing context through a prompt cache.

Buddy System - Paired Agent Collaboration. It was initially built as an April Fools feature (complete with 18 species including a capybara, rarity tiers, and a 1% shiny chance), but the code suggests it's evolving into a real paired-agent review system.

Undercover Mode - Stealth Commits. The most concerning discovery. Auto-activated for Anthropic employees on public repos, this mode strips AI attribution from commits. No git trailers, no co-author tags, no indication that AI wrote the code. And according to the source, there's no off switch.

Agent Triggers - Event-Driven Autonomous Actions. Multi-agent teams triggered by events, not human prompts. The agent watches for conditions and acts when they're met - without asking permission first.

Warning

Undercover Mode strips AI attribution from commits with no off switch. Every tool that relies on detecting AI authorship via git metadata is blind when this mode is active.

Five unreleased features, each increasing agent autonomy. Combined, they require a governance model that doesn't exist yet.

The Undercover Mode Problem

Of all the leaked features, Undercover Mode has the most immediate governance implications.

Today, most tools that detect AI-generated code rely on metadata: git trailers ("Co-Authored-By: Claude"), commit message patterns, or author tags. This metadata is the foundation of agent detection in most governance tools, including our own detectAgent() function.

Undercover Mode removes all of this metadata. When it's active, a Claude Code commit looks indistinguishable from a human commit - same author, same format, no attribution.
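To make the failure mode concrete, here is a minimal TypeScript sketch of metadata-only detection. It is not MergeShield's detectAgent(): the function name `detectAgentFromMetadata` and both attribution patterns are hypothetical examples. The detector can only flag a commit when an attribution string survives in the message, so the same change with its trailer stripped comes back as `null`.

```typescript
// Hypothetical metadata-only detector. This is NOT MergeShield's detectAgent();
// the attribution patterns below are illustrative assumptions.

interface MetadataDetection {
  agent: string;
  evidence: string; // the matched attribution text
}

const ATTRIBUTION_PATTERNS: Array<{ agent: string; pattern: RegExp }> = [
  { agent: "claude-code", pattern: /^Co-Authored-By:\s*Claude\b/im }, // git trailer line
  { agent: "claude-code", pattern: /Generated with \[?Claude Code\]?/i }, // commit-message footer
];

function detectAgentFromMetadata(commitMessage: string): MetadataDetection | null {
  for (const { agent, pattern } of ATTRIBUTION_PATTERNS) {
    const match = commitMessage.match(pattern);
    if (match) {
      return { agent, evidence: match[0] };
    }
  }
  // Nothing self-reported: metadata alone cannot separate this commit from a human's.
  return null;
}

const withTrailer = "Fix null check in parser\n\nCo-Authored-By: Claude <noreply@anthropic.com>";
const undercoverCommit = "Fix null check in parser"; // same change, attribution stripped

console.log(detectAgentFromMetadata(withTrailer)?.agent); // claude-code
console.log(detectAgentFromMetadata(undercoverCommit)); // null
```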

This means governance tools need a second detection layer: behavioral analysis. Instead of reading metadata, you analyze patterns:

Commit timing - agents commit at consistent intervals that humans don't
File change velocity - agents modify files faster than any human typist
Branch naming conventions - agent-created branches follow predictable patterns
Change patterns - agents tend to modify files in a specific order (tests after implementation, not before)
Session characteristics - agent sessions produce commits in bursts, humans produce them sporadically
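As an illustration of the first and last signals, the sketch below derives two timing features from a list of commit timestamps: the coefficient of variation of the gaps between commits (near 0 means metronome-like regularity) and the fraction of gaps that fall inside a short burst window. The function name `timingFeatures`, the 120-second window, and the feature definitions are assumptions for this example, not a validated detector.

```typescript
// Hypothetical timing features for behavioral agent detection.
// The feature definitions and the 120-second burst window are illustrative assumptions.

interface TimingFeatures {
  intervalCv: number;    // coefficient of variation of inter-commit gaps (0 = perfectly regular)
  burstFraction: number; // share of gaps no longer than the burst window
}

function timingFeatures(commitTimesSec: number[], burstWindowSec = 120): TimingFeatures {
  if (commitTimesSec.length < 3) {
    throw new Error("need at least 3 commits to measure timing");
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...commitTimesSec].sort((a, b) => a - b);
  const gaps: number[] = [];
  for (let i = 1; i < sorted.length; i++) {
    gaps.push(sorted[i] - sorted[i - 1]);
  }
  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  const variance = gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  // All commits at the same instant gives mean 0; treat that as perfectly regular.
  const intervalCv = mean === 0 ? 0 : Math.sqrt(variance) / mean;
  const burstFraction = gaps.filter((g) => g <= burstWindowSec).length / gaps.length;
  return { intervalCv, burstFraction };
}

// A session committing every 60 seconds: perfectly regular and entirely bursty.
console.log(timingFeatures([0, 60, 120, 180, 240])); // { intervalCv: 0, burstFraction: 1 }
```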

None of these signals are definitive alone. But combined, they create a behavioral fingerprint that's hard to fake even when metadata is stripped.
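One way to express "weak alone, strong combined" is a weighted average over normalized signals with a decision threshold. In this hypothetical sketch every signal is assumed to be pre-scaled to [0, 1] by some upstream step, and the signal names, weights, and 0.7 threshold are placeholders you would calibrate on your own repositories. Because no single weight exceeds 0.25, no single signal can push a commit over the threshold by itself.

```typescript
// Hypothetical fingerprint: signal names, weights, and threshold are illustrative assumptions.

type SignalName =
  | "timingRegularity"
  | "fileVelocity"
  | "branchNaming"
  | "changeOrder"
  | "sessionBurstiness";

// Each signal is assumed to be pre-normalized to [0, 1], where 1 means "strongly agent-like".
type SignalScores = Record<SignalName, number>;

const SIGNAL_WEIGHTS: SignalScores = {
  timingRegularity: 0.25,
  fileVelocity: 0.25,
  branchNaming: 0.15,
  changeOrder: 0.15,
  sessionBurstiness: 0.2,
};

function behavioralScore(signals: SignalScores): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const key of Object.keys(SIGNAL_WEIGHTS) as SignalName[]) {
    // Clamp defensively in case an upstream signal falls outside [0, 1].
    const value = Math.min(1, Math.max(0, signals[key]));
    weighted += SIGNAL_WEIGHTS[key] * value;
    totalWeight += SIGNAL_WEIGHTS[key];
  }
  return weighted / totalWeight;
}

function looksAgentAuthored(signals: SignalScores, threshold = 0.7): boolean {
  return behavioralScore(signals) >= threshold;
}

// One very strong signal on its own stays below the threshold.
console.log(
  looksAgentAuthored({
    timingRegularity: 1,
    fileVelocity: 0,
    branchNaming: 0,
    changeOrder: 0,
    sessionBurstiness: 0,
  }),
); // false
```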

The lesson: never rely on self-reported attribution for governance decisions. The model provider that generates the code has every incentive to minimize friction for its users, including making AI attribution optional or invisible.

Tip

Behavioral agent detection (commit timing, file velocity, change patterns) is more reliable than metadata-based detection. Metadata can be stripped or faked. Behavior is harder to hide.

What Always-On Agents Mean for Review

Kairos mode represents a fundamental shift in how AI agents interact with your codebase. Current agents are session-based: you invoke Claude Code, it does a task, you close the terminal. The blast radius is limited to one session's work.

A daemon-mode agent is different:

It accumulates context over time, building a model of your codebase that evolves with every commit
It acts proactively, deciding to refactor code, update dependencies, or fix issues without being asked
It runs during off-hours, producing changes while the team is asleep
It persists memory across sessions through "dreaming" phases that consolidate what it learned

This changes the governance model from "review what was asked" to "review what the agent decided to do on its own." The second is a much harder problem because you can't predict what the agent will do next.

Combine Kairos with Coordinator Mode (multi-agent fleet management) and you have a scenario where 10 daemon agents, each with their own memory and context, are opening PRs across your monorepo at 3 AM. Each agent thinks its change is safe. None of them knows what the others are doing.

The only way to govern this is automated: risk scoring on every PR, trust tracking per agent, and auto-merge rules that enforce organizational policies regardless of when the change was made or who (or what) made it.
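A minimal sketch of such a rule, assuming an upstream scorer that produces a 0-100 risk score and a per-agent trust value between 0 and 1. The types, field names, and thresholds are hypothetical, not MergeShield's rule format. The function takes no timestamp and no human/agent flag, so the same policy applies at 3 AM as at 3 PM, to any author.

```typescript
// Hypothetical auto-merge rule: names, score scales, and thresholds are illustrative assumptions.

type MergeDecision = "auto-merge" | "human-review" | "block";

interface PullRequestAssessment {
  riskScore: number;   // 0 (trivial) to 100 (critical), from an upstream risk scorer
  authorTrust: number; // 0 to 1, tracked separately for each agent
}

interface MergePolicy {
  maxAutoMergeRisk: number;  // at or below this risk, trusted authors may auto-merge
  minAutoMergeTrust: number; // minimum trust required for auto-merge
  blockAboveRisk: number;    // above this risk, the PR is blocked outright
}

const DEFAULT_POLICY: MergePolicy = {
  maxAutoMergeRisk: 20,
  minAutoMergeTrust: 0.8,
  blockAboveRisk: 85,
};

// No timestamp and no human/agent flag is consulted: the outcome depends only on
// the assessment and the organization's policy.
function decideMerge(pr: PullRequestAssessment, policy: MergePolicy = DEFAULT_POLICY): MergeDecision {
  if (pr.riskScore > policy.blockAboveRisk) return "block";
  if (pr.riskScore <= policy.maxAutoMergeRisk && pr.authorTrust >= policy.minAutoMergeTrust) {
    return "auto-merge";
  }
  return "human-review";
}

console.log(decideMerge({ riskScore: 12, authorTrust: 0.9 })); // auto-merge
console.log(decideMerge({ riskScore: 12, authorTrust: 0.4 })); // human-review
```

Keeping the thresholds in a policy object rather than hard-coding them lets each repository or team tune its own limits without touching the decision logic.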

The Four-Lab Agent Race

The leaked roadmap doesn't exist in isolation. All four major AI labs now ship first-party coding agents, and they're all racing toward more autonomy:

Anthropic (Claude Code) - Computer Use, Auto Mode, and now Kairos/Coordinator on the horizon
OpenAI (Codex) - Plugin ecosystem, Security agent, multi-agent workflows
Google (Gemini CLI) - Plan Mode for structured autonomous execution
xAI (Grok Build) - 8 parallel agents with Arena Mode

A recent DryRun Security study tested three of these agents building applications from scratch. The results: Claude produced 13 vulnerabilities, Gemini 11, Codex 8. Every agent tested shipped security issues at a high rate.

Teams today use 2-3 different agents. By next quarter, most will use all four. Each agent has different strengths, different failure modes, and different levels of trustworthiness. Governing a multi-agent environment where each agent has its own behavioral patterns and risk profile is the challenge the industry hasn't solved yet.

Note

All four major labs now ship first-party coding agents. Multi-agent codebases are the norm, not the exception. Governance must work across all of them.

What This Means For Your Team

If your team uses AI coding agents today, the leaked roadmap tells you exactly what's coming. Here's how to prepare:

1. Don't rely on AI attribution metadata. It can be stripped (Undercover Mode) or faked. Build behavioral detection as a fallback - commit timing, file velocity, change patterns.

2. Assume agents will run without you. Kairos-style daemon mode is coming to every agent, not just Claude Code. Your governance model needs to work at 3 AM when nobody is reviewing PRs.

3. Plan for multi-agent coordination. When Coordinator Mode ships, you'll have agent fleets, not individual agents. Each agent needs its own trust score because they don't all behave the same way.

4. Automate the review triage. At fleet scale, manual review of every PR is physically impossible. Risk scoring determines what needs human eyes. Auto-merge handles the rest.

5. Keep an audit trail. When something breaks, you need to know which agent made the change, what the risk assessment was, and whether governance rules were followed. This becomes critical when agents act autonomously.
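Per-agent trust and the audit trail can be sketched together. This hypothetical `AgentGovernanceLog` class keeps one trust value per agent, nudged toward 1 after a clean outcome and toward 0 after a bad one (an exponential moving average), and appends one audit entry per governance decision. The starting trust of 0.5, the 0.2 learning rate, and all field names are illustrative assumptions.

```typescript
// Hypothetical per-agent trust tracker with an append-only audit trail.
// The starting trust, learning rate, and field names are illustrative assumptions.

interface AuditEntry {
  agentId: string;
  prNumber: number;
  riskScore: number;
  decision: string;        // e.g. "auto-merge" or "human-review"
  policyFollowed: boolean; // whether the governance rules were satisfied
  recordedAt: string;      // ISO-8601 timestamp
}

class AgentGovernanceLog {
  private readonly trust = new Map<string, number>();
  private readonly entries: AuditEntry[] = [];
  private readonly initialTrust: number;
  private readonly learningRate: number;

  constructor(initialTrust = 0.5, learningRate = 0.2) {
    this.initialTrust = initialTrust;
    this.learningRate = learningRate;
  }

  // Agents with no history start at the neutral initial trust.
  trustFor(agentId: string): number {
    return this.trust.get(agentId) ?? this.initialTrust;
  }

  // Exponential moving average: a good outcome pulls trust toward 1,
  // a bad one (e.g. a revert or incident) pulls it toward 0.
  recordOutcome(agentId: string, good: boolean): number {
    const current = this.trustFor(agentId);
    const updated = current + this.learningRate * ((good ? 1 : 0) - current);
    this.trust.set(agentId, updated);
    return updated;
  }

  // Entries are only ever appended, never edited in place.
  recordDecision(entry: Omit<AuditEntry, "recordedAt">): void {
    this.entries.push({ ...entry, recordedAt: new Date().toISOString() });
  }

  historyFor(agentId: string): AuditEntry[] {
    // Return copies so callers cannot mutate the stored trail.
    return this.entries.filter((e) => e.agentId === agentId).map((e) => ({ ...e }));
  }
}
```

Keying trust by agent rather than by vendor matches the fleet scenario in item 3: two workers from the same provider can earn different scores.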

The governance gap between what agents can do and what teams can safely control is widening fast. The leaked roadmap just showed us exactly how wide it's about to get.

Tip

Start with risk scoring on every PR today. When always-on agents arrive, you'll already have the governance infrastructure in place.

Related Guides

Agent Detection & Trust

How MergeShield identifies AI agents and builds trust scores over time.

Understanding Risk Scores

Learn how the two-stage AI pipeline scores PRs across 6 risk dimensions.

Configuring Auto-Merge

Set up auto-merge rules for low-risk PRs so your team can focus on what matters.