Understand how MergeShield detects AI coding agents, builds trust scores over time, and how to harden your setup for maximum governance coverage.
How agent trust works
MergeShield automatically identifies which AI agent authored a PR by matching author, branch, and commit patterns.
11+ agents built-in
Claude Code, Copilot, Cursor, Devin, Codex, Jules, Amazon Q, Aider, Dependabot, Renovate, Sweep.
In the context of software development, AI agents are tools that autonomously write, modify, and submit code. These range from inline assistants like GitHub Copilot and Cursor that help individual developers, to fully autonomous agents like Claude Code and Devin that can independently plan, implement, and submit entire pull requests.
The key distinction for governance is that agent-authored PRs may have different risk profiles than human-authored ones. An agent might produce syntactically correct code that misses architectural context, or it might make changes faster than a team can review them.
MergeShield treats agent identity as a first-class concept so teams can apply different governance rules based on who (or what) authored the change. MergeShield ships with detection patterns for the 11 popular AI coding agents listed above.
Agent detection runs as part of the analysis pipeline, before AI risk scoring begins. The detection engine examines signals from the pull request's metadata:

- The PR author's username (e.g., `dependabot[bot]`)
- The branch name (e.g., `dependabot/*`)
- Commit message trailers (e.g., `Co-Authored-By: Claude`)

Each agent has a set of detection patterns stored in the database. The detection engine tests each agent's patterns against the PR metadata and returns the first match.
When an agent is detected, the analysis records the agent identity including its agentId, agentSlug, and authorType. The author type is set to "agent" for AI coding tools and "bot" for automated services like Dependabot. This information is passed to the AI scoring pipeline as context and is used by trust scoring, auto-merge rules, and approval workflows.
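A minimal sketch of this first-match detection flow is shown below. The pattern shapes, field names, and matching helpers are illustrative assumptions, not MergeShield's actual schema:

```typescript
// Illustrative sketch of first-match agent detection.
// Field names and pattern shapes are assumptions, not MergeShield's schema.
interface AgentPatterns {
  agentSlug: string;
  authorType: "agent" | "bot";
  authors: string[];   // exact author logins, e.g. "dependabot[bot]"
  branches: string[];  // branch-name prefixes, e.g. "dependabot/"
  trailers: string[];  // commit trailer substrings, e.g. "Co-Authored-By: Claude"
}

interface PullRequestMeta {
  author: string;
  branch: string;
  commitMessages: string[];
}

function detectAgent(
  pr: PullRequestMeta,
  agents: AgentPatterns[]
): AgentPatterns | null {
  for (const agent of agents) {
    const authorHit = agent.authors.includes(pr.author);
    const branchHit = agent.branches.some((p) => pr.branch.startsWith(p));
    const trailerHit = agent.trailers.some((t) =>
      pr.commitMessages.some((m) => m.includes(t))
    );
    // The first agent with any matching signal wins.
    if (authorHit || branchHit || trailerHit) return agent;
  }
  return null; // no match → the author is treated as human
}
```

Note the fallback: a PR that matches nothing is classified as human-authored, which is exactly the gap the limitations section below describes.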
Tip
You can add custom agents for internal bots and tools from the Agents page in the dashboard. Define detection patterns for author usernames, branch name patterns, and commit message patterns to match your internal tooling.
Trust scores are calculated per-agent per-organization, meaning the same agent can have different trust levels in different organizations based on its track record. The score ranges from 0 to 100 and is recalculated after every analysis.
The algorithm combines four components:
- **Base score** — `min(totalPRs × 1, 25)`. An agent needs at least 25 PRs before reaching the maximum base score.
- **History bonus** — `(lowRiskPRs / totalPRs) × 45 × min(totalPRs / 5, 1)`. Rewards agents whose PRs are consistently low-risk, with a scaling factor to prevent premature high scores from small sample sizes.
- **Risk penalty** — `(highRiskPRs / totalPRs) × 50`. Subtracts points for high-risk PRs.
- **Decay penalty** — 20 points for inactivity exceeding 30 days. Ensures inactive agents don't retain artificially high scores.

The full formula is:
`trustScore = baseScore + historyBonus - riskPenalty - decayPenalty` (clamped to 0–100)
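The formula above can be sketched directly in code. The `AgentStats` input shape is an illustrative assumption:

```typescript
// Sketch of the trust score formula described above.
// The AgentStats shape is an illustrative assumption.
interface AgentStats {
  totalPRs: number;
  lowRiskPRs: number;
  highRiskPRs: number;
  daysSinceLastPR: number;
}

function trustScore(s: AgentStats): number {
  if (s.totalPRs === 0) return 0; // newly detected agent, no history

  const baseScore = Math.min(s.totalPRs * 1, 25);
  const historyBonus =
    (s.lowRiskPRs / s.totalPRs) * 45 * Math.min(s.totalPRs / 5, 1);
  const riskPenalty = (s.highRiskPRs / s.totalPRs) * 50;
  const decayPenalty = s.daysSinceLastPR > 30 ? 20 : 0;

  const score = baseScore + historyBonus - riskPenalty - decayPenalty;
  return Math.max(0, Math.min(100, score)); // clamp to 0–100
}
```

For example, an active agent with 10 PRs, 5 of them low-risk and none high-risk, scores 10 + 22.5 = 32.5; the same agent inactive for 60 days with a perfect low-risk record would score 10 + 45 − 20 = 35.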
Trust scores map to five discrete levels:
- **0** — Starting state for newly detected agents with no PR history
- **1–34** — Early-stage agents with limited history or mixed results
- **35–64** — Agents with a reasonable track record of mostly low-risk contributions
- **65+** — Agents with a strong history of consistently low-risk PRs and recent activity

The fifth level, Verified, can only be set by an org admin as a manual override. This is useful for agents your team has thoroughly vetted and wants to trust regardless of their automated score. The override can be removed at any time.
Trust levels integrate directly with auto-merge rules. You can set a minimum trust level required for auto-merge eligibility — for example, requiring medium trust means only agents with a proven track record can have their PRs auto-merged.
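A rough sketch of the score-to-level mapping and the auto-merge gate follows. The level names other than "medium" and "verified" are placeholders, since this page doesn't name them; MergeShield's exact labels may differ:

```typescript
// Illustrative mapping from trust score to trust level, plus a
// minimum-level gate for auto-merge. Level names other than "medium"
// and "verified" are placeholder assumptions.
type TrustLevel = "new" | "low" | "medium" | "high" | "verified";

const LEVEL_ORDER: TrustLevel[] = ["new", "low", "medium", "high", "verified"];

function levelForScore(score: number, verifiedOverride = false): TrustLevel {
  if (verifiedOverride) return "verified"; // org-admin manual override
  if (score >= 65) return "high";
  if (score >= 35) return "medium";
  if (score >= 1) return "low";
  return "new";
}

function meetsMinimumTrust(level: TrustLevel, minimum: TrustLevel): boolean {
  return LEVEL_ORDER.indexOf(level) >= LEVEL_ORDER.indexOf(minimum);
}
```

With a minimum of `"medium"`, only agents scoring 35 or above (or manually verified ones) pass the gate.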
While MergeShield includes 11 built-in agent profiles, many teams use internal bots, custom scripts, or less common AI tools that aren't detected by default. The Agents page in the dashboard lets you create custom agent definitions.
To create a custom agent, click Create Agent and fill in:
- An agent name and slug (e.g., `internal-deploy-bot`)
- Detection patterns for author usernames, branch names, and commit messages

Once created, the custom agent is immediately active for your organization. Any future PRs matching the detection patterns will be identified as that agent, and trust scoring will begin accumulating. You can edit or remove custom agents at any time.
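A custom agent definition might look like the following sketch. The field names are illustrative assumptions based on the pattern types this page describes, not MergeShield's actual API shape:

```typescript
// Illustrative custom agent definition for an internal deploy bot.
// Field names are assumptions based on the pattern types this page lists.
const internalDeployBot = {
  name: "Internal Deploy Bot",
  slug: "internal-deploy-bot",
  authorType: "bot" as const,
  detectionPatterns: {
    authorUsernames: ["internal-deploy-bot[bot]"],
    branchPatterns: ["deploy/"],          // matched as branch-name prefixes here
    commitPatterns: ["[internal-deploy]"], // substring expected in commit messages
  },
};
```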
Agent detection is pattern-based and cooperative — it relies on agents identifying themselves through their platform integrations. This is an industry-wide reality, not specific to MergeShield. Here are the known limitations:
Unregistered agents — A new AI tool that is not in the built-in list and has not been registered as a custom agent will default to authorType: human. This is the most common gap.
Omitted identifiers — An agent that does not include Co-Authored-By trailers, does not use a recognizable branch prefix, and operates under a human GitHub account will not be detected.
Human-AI collaboration — When a human developer uses AI assistance locally (e.g., Copilot autocomplete, Cursor tab-complete), the PR is authored by the human and correctly classified as human. This is a genuine gray area — the code is AI-assisted but human-authored.
No behavioral analysis — MergeShield does not use heuristics like code style fingerprinting, PR velocity, or file change patterns to infer whether a PR was AI-generated. Detection is purely based on explicit identity signals.
Importantly, even when detection fails, the AI risk analysis still runs on every PR. A dangerous change will be flagged regardless of whether the author is classified as human or agent.
Warning
No governance tool can cryptographically prove whether code was written by a human or AI. MergeShield's detection is strong for cooperative agents (which covers the vast majority of real-world usage), but should be complemented with risk-based rules that apply to all PRs.
To maximize governance coverage regardless of agent detection accuracy, follow these recommendations:
1. Enable approval workflows for all authors — Set Require for Human PRs to ON in your approval workflow settings. This ensures that high-risk PRs require human review whether the author is detected as an agent or not. This is now the default for new repositories.
2. Set a reasonable auto-merge risk threshold — Your maxRiskScore on auto-merge rules is a risk-based gate, not an identity-based gate. A well-calibrated threshold (e.g., 25-30) protects against dangerous changes from any source.
3. Register all AI tools your team uses — Visit the Agents page and register custom agents for any tools not in the built-in list. This takes 30 seconds and gives you full trust tracking.
4. Monitor the Team Analytics tab — The Team Analytics page shows PRs grouped by author and author type. If you see unexpected patterns from supposedly human authors, investigate and register any missing agents.
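The identity-independent gate from recommendation 2 can be sketched as follows. The input shapes are illustrative assumptions:

```typescript
// Sketch of an identity-independent auto-merge gate. The risk threshold
// applies to every PR regardless of detected author type.
// Input shapes are illustrative assumptions.
interface PrResult {
  riskScore: number; // 0–100 scale assumed
  authorType: "human" | "agent" | "bot";
}

function autoMergeEligible(pr: PrResult, maxRiskScore: number): boolean {
  // Note: authorType is deliberately ignored — the gate is risk-based,
  // so a misclassified agent PR is still caught by its risk score.
  return pr.riskScore <= maxRiskScore;
}
```

Identity-based rules (minimum trust level, agent-specific approvals) layer on top of this check rather than replacing it.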
The key insight: MergeShield's risk analysis evaluates the code itself, not the author. The risk score, file-level attribution, and reasoning log work identically for human and agent PRs. Agent detection adds a governance *layer* (trust scores, agent-specific rules), but the core risk analysis is your strongest and most reliable defense.
Tip
Think of agent detection as a bonus governance signal, not a security boundary. Build your rules around risk scores and approval thresholds — these work for every PR regardless of who (or what) authored it.