
Agent Detection & Trust

Understand how MergeShield detects AI coding agents, builds trust scores over time, and how to harden your setup for maximum governance coverage.


What are AI Agents?

In the context of software development, AI agents are tools that autonomously write, modify, and submit code. These range from inline assistants like GitHub Copilot and Cursor that help individual developers, to fully autonomous agents like Claude Code and Devin that can independently plan, implement, and submit entire pull requests.

The key distinction for governance is that agent-authored PRs may have different risk profiles than human-authored ones. An agent might produce syntactically correct code that misses architectural context, or it might make changes faster than a team can review them.

MergeShield treats agent identity as a first-class concept so teams can apply different governance rules based on who (or what) authored the change. MergeShield ships with detection patterns for 11 popular AI coding agents:

  • Claude Code — Anthropic's autonomous coding agent
  • GitHub Copilot — GitHub's AI pair programmer
  • Cursor — AI-first code editor
  • Devin — Cognition's autonomous software engineer
  • OpenAI Codex — OpenAI's autonomous coding agent
  • Google Jules — Google's AI coding agent powered by Gemini
  • Amazon Q Developer — AWS's AI coding agent
  • Aider — Open-source AI pair programming CLI
  • Dependabot — GitHub's dependency update bot
  • Renovate — Automated dependency updates
  • Sweep — AI-powered code maintenance (legacy)

How Detection Works

Agent detection runs as part of the analysis pipeline, before the AI risk scoring begins. The detection engine examines four signals from the pull request:

  1. PR author — The GitHub username (e.g., dependabot[bot])
  2. Branch name — Pattern matching (e.g., dependabot/*)
  3. Commit messages — Keyword detection in commit text
  4. Git trailers — Metadata lines at the end of commits (e.g., Co-Authored-By: Claude)

Each agent has a set of detection patterns stored in the database. The detection engine tests each agent's patterns against the PR metadata and returns the first match.

When an agent is detected, the analysis records the agent identity including its agentId, agentSlug, and authorType. The author type is set to "agent" for AI coding tools and "bot" for automated services like Dependabot. This information is passed to the AI scoring pipeline as context and is used by trust scoring, auto-merge rules, and approval workflows.
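The detection flow above can be sketched as a small matcher. This is a minimal illustration only: the pattern fields, agent records, and function name are assumptions, not MergeShield's actual schema.

```python
import re

# Hypothetical detection patterns per agent (illustrative, not the real schema).
AGENTS = [
    {
        "agentId": 1, "agentSlug": "dependabot", "authorType": "bot",
        "authors": [r"^dependabot\[bot\]$"],
        "branches": [r"^dependabot/"],
        "commits": [], "trailers": [],
    },
    {
        "agentId": 2, "agentSlug": "claude-code", "authorType": "agent",
        "authors": [], "branches": [], "commits": [],
        "trailers": [r"Co-Authored-By: Claude"],
    },
]

def detect_agent(pr):
    """Test each agent's patterns against PR metadata; return the first match."""
    for agent in AGENTS:
        signals = [
            (agent["authors"], [pr["author"]]),
            (agent["branches"], [pr["branch"]]),
            (agent["commits"], pr["commit_messages"]),
            (agent["trailers"], pr["trailers"]),
        ]
        for patterns, values in signals:
            if any(re.search(p, v) for p in patterns for v in values):
                return {k: agent[k] for k in ("agentId", "agentSlug", "authorType")}
    # No pattern matched: the PR is classified as human-authored.
    return {"agentId": None, "agentSlug": None, "authorType": "human"}

pr = {
    "author": "dependabot[bot]",
    "branch": "dependabot/npm_and_yarn/lodash-4.17.21",
    "commit_messages": ["Bump lodash from 4.17.20 to 4.17.21"],
    "trailers": [],
}
print(detect_agent(pr))  # → {'agentId': 1, 'agentSlug': 'dependabot', 'authorType': 'bot'}
```

Note how an unmatched PR falls through to `authorType: "human"`, which is exactly the gap described under Detection Limitations below.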

Tip

You can add custom agents for internal bots and tools from the Agents page in the dashboard. Define detection patterns for author usernames, branch name patterns, and commit message patterns to match your internal tooling.

Trust Score Algorithm

Trust scores are calculated per-agent per-organization, meaning the same agent can have different trust levels in different organizations based on its track record. The score ranges from 0 to 100 and is recalculated after every analysis.

The algorithm combines four components:

  1. Base Score — min(totalPRs × 1, 25). An agent needs at least 25 PRs before reaching the maximum base score.
  2. History Bonus — (lowRiskPRs / totalPRs) × 45 × min(totalPRs / 5, 1). Rewards agents whose PRs are consistently low-risk, with a scaling factor to prevent premature high scores from small sample sizes.
  3. Risk Penalty — (highRiskPRs / totalPRs) × 50. Subtracts points for high-risk PRs.
  4. Decay Penalty — Up to 20 points for inactivity exceeding 30 days. Ensures inactive agents don't retain artificially high scores.

The full formula is:

trustScore = baseScore + historyBonus - riskPenalty - decayPenalty (clamped to 0–100)
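As a worked illustration, the formula can be computed directly from the four components above. The exact decay curve is not specified here, so the sketch takes decayPenalty as an input; the function name and sample numbers are hypothetical.

```python
def trust_score(total_prs, low_risk_prs, high_risk_prs, decay_penalty=0.0):
    """Per-agent, per-organization trust score (0-100), per the formula above."""
    if total_prs == 0:
        return 0  # newly detected agent with no PR history starts untrusted
    base_score = min(total_prs * 1, 25)
    history_bonus = (low_risk_prs / total_prs) * 45 * min(total_prs / 5, 1)
    risk_penalty = (high_risk_prs / total_prs) * 50
    score = base_score + history_bonus - risk_penalty - decay_penalty
    return max(0, min(100, score))  # clamp to 0-100

# Hypothetical agent: 20 PRs, 16 low-risk, 2 high-risk, recently active.
print(trust_score(20, 16, 2))  # → 51.0  (base 20 + history 36 - risk 5 - decay 0)
```

With 20 PRs the history bonus is already at full weight (min(20 / 5, 1) = 1), so further gains come mainly from the base score climbing toward its 25-point cap.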

Trust Levels

Trust scores map to five discrete levels:

  • Untrusted (score 0) — Starting state for newly detected agents with no PR history
  • Low (1–34) — Early-stage agents with limited history or mixed results
  • Medium (35–64) — Agents with a reasonable track record of mostly low-risk contributions
  • High (65+) — Agents with a strong history of consistently low-risk PRs and recent activity
  • Verified (manual override) — Set by an organization admin, useful for thoroughly vetted agents

The Verified level can only be set by an org admin as a manual override. This is useful for agents your team has thoroughly vetted and wants to trust regardless of their automated score. The override can be removed at any time.

Trust levels integrate directly with auto-merge rules. You can set a minimum trust level required for auto-merge eligibility — for example, requiring medium trust means only agents with a proven track record can have their PRs auto-merged.
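The score-to-level mapping described above can be sketched as a small helper. The function name is an assumption, and the verified override is modeled as a stored flag rather than a score.

```python
def trust_level(score, verified=False):
    """Map a 0-100 trust score to its discrete level; 'verified' is an admin override."""
    if verified:
        return "verified"   # manual override set by an org admin
    if score == 0:
        return "untrusted"  # starting state, no PR history
    if score <= 34:
        return "low"
    if score <= 64:
        return "medium"
    return "high"           # 65+

print(trust_level(72))                  # → high
print(trust_level(40, verified=True))   # → verified
```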

Example: Claude Code — AI coding agent, High trust (score 72/100). History: 47 PRs analyzed, 85% low risk, average risk score 18.

Custom Agents

While MergeShield includes 11 built-in agent profiles, many teams use internal bots, custom scripts, or less common AI tools that aren't detected by default. The Agents page in the dashboard lets you create custom agent definitions.

To create a custom agent, click Create Agent and fill in:

  • Name — Display name for the agent
  • Slug — Unique identifier (e.g., internal-deploy-bot)
  • Description (optional) — What this agent does
  • Category — AI assistant, autonomous agent, code review bot, dependency manager, or custom
  • Detection patterns — Author usernames, branch name patterns, and commit message patterns

Once created, the custom agent is immediately active for your organization. Any future PRs matching the detection patterns will be identified as that agent, and trust scoring will begin accumulating. You can edit or remove custom agents at any time.
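Put together, a custom agent definition might look like the following sketch. The field names mirror the form fields listed above but are illustrative; the bot username, branch pattern, and commit keyword are hypothetical examples of internal tooling.

```python
# Illustrative custom agent definition (field names are assumptions).
custom_agent = {
    "name": "Internal Deploy Bot",
    "slug": "internal-deploy-bot",
    "description": "Opens automated release PRs from our deploy pipeline",
    "category": "custom",
    "detection": {
        "authors": ["deploy-bot[bot]"],      # hypothetical bot account
        "branches": ["release/auto-*"],      # hypothetical branch pattern
        "commits": ["[auto-release]"],       # hypothetical commit keyword
    },
}
```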

Detection Limitations

Agent detection is pattern-based and cooperative — it relies on agents identifying themselves through their platform integrations. This is an industry-wide reality, not specific to MergeShield. Here are the known limitations:

Unregistered agents — A new AI tool that is not in the built-in list and has not been registered as a custom agent will default to authorType: human. This is the most common gap.

Omitted identifiers — An agent that does not include Co-Authored-By trailers, does not use a recognizable branch prefix, and operates under a human GitHub account will not be detected.

Human-AI collaboration — When a human developer uses AI assistance locally (e.g., Copilot autocomplete, Cursor tab-complete), the PR is authored by the human and correctly classified as human. This is a genuine gray area — the code is AI-assisted but human-authored.

No behavioral analysis — MergeShield does not use heuristics like code style fingerprinting, PR velocity, or file change patterns to infer whether a PR was AI-generated. Detection is purely based on explicit identity signals.

Importantly, even when detection fails, the AI risk analysis still runs on every PR. A dangerous change will be flagged regardless of whether the author is classified as human or agent.

Warning

No governance tool can cryptographically prove whether code was written by a human or AI. MergeShield's detection is strong for cooperative agents (which covers the vast majority of real-world usage), but should be complemented with risk-based rules that apply to all PRs.

Hardening Your Setup

To maximize governance coverage regardless of agent detection accuracy, follow these recommendations:

1. Enable approval workflows for all authors — Set Require for Human PRs to ON in your approval workflow settings. This ensures that high-risk PRs require human review whether the author is detected as an agent or not. This is now the default for new repositories.

2. Set a reasonable auto-merge risk threshold — Your maxRiskScore on auto-merge rules is a risk-based gate, not an identity-based gate. A well-calibrated threshold (e.g., 25–30) protects against dangerous changes from any source.

3. Register all AI tools your team uses — Visit the Agents page and register custom agents for any tools not in the built-in list. This takes 30 seconds and gives you full trust tracking.

4. Monitor the Team Analytics tab — The Team Analytics page shows PRs grouped by author and author type. If you see unexpected patterns from supposedly human authors, investigate and register any missing agents.
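Recommendation 2 can be made concrete: a risk-based gate applies to every PR, whether or not the author was detected as an agent, while a minimum-trust requirement only constrains further. This is a minimal sketch with hypothetical rule fields, not MergeShield's actual rule engine.

```python
TRUST_LEVELS = ["untrusted", "low", "medium", "high", "verified"]

def auto_merge_eligible(pr_risk_score, max_risk_score=30,
                        author_trust=None, min_trust=None):
    """Risk gate first (applies to all PRs); optional trust gate second."""
    if pr_risk_score > max_risk_score:
        return False  # risk-based gate: blocks dangerous changes from any source
    if min_trust is not None:
        if author_trust is None:
            return False  # undetected author has no trust record to satisfy the gate
        if TRUST_LEVELS.index(author_trust) < TRUST_LEVELS.index(min_trust):
            return False
    return True

print(auto_merge_eligible(18, max_risk_score=30))   # → True
print(auto_merge_eligible(55, max_risk_score=30))   # → False (risk gate)
print(auto_merge_eligible(18, min_trust="medium", author_trust="low"))  # → False
```

Note that the risk check runs first: even a verified agent's PR is blocked when its risk score exceeds the threshold.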

The key insight: MergeShield's risk analysis evaluates the code itself, not the author. The risk score, file-level attribution, and reasoning log work identically for human and agent PRs. Agent detection adds a governance *layer* (trust scores, agent-specific rules), but the core risk analysis is your strongest and most reliable defense.

Tip

Think of agent detection as a bonus governance signal, not a security boundary. Build your rules around risk scores and approval thresholds — these work for every PR regardless of who (or what) authored it.