Matthew Aberham
Blog

MCP Has a Systemic RCE Vulnerability, and Every Published Prompt Injection Defense Has Been Broken

May 2, 2026
AISecurityMCPPrompt Injection

Server room with blue lighting and rows of equipment

Date: April 2026 Sources: OX Security — MCP STDIO RCE Disclosure, Trend Micro — Exposed MCP Server Report, Robust Prompt Optimization (joint paper), PromptArmor — Ramp Sheets AI Disclosure

MCP (Model Context Protocol) connects AI tools to external services: databases, APIs, file systems, cloud infrastructure. It is the plumbing that lets Claude Code, Cursor, and most production agent systems call tools and get structured results back. The protocol's transport layer, STDIO, pipes commands through the operating system's standard input/output streams. OX Security, a supply-chain security firm, disclosed in April that this interface lets any OS command execute with no sanitization at the protocol level. They found 14 CVEs (Common Vulnerabilities and Exposures) and 30+ distinct RCE (Remote Code Execution) paths across Anthropic's own SDK and everything built on it: Cursor, Windsurf, LangFlow, Flowise, DocsGPT.

OX brought Anthropic specific patches. Anthropic confirmed the behavior is intentional. STDIO execution is the design. Input sanitization is the developer's responsibility. The protocol architecture will not change.

The Attack Surface Is Growing Fast

This would be a contained problem if MCP servers were mostly running behind firewalls. They are not. Trend Micro, the cybersecurity firm, published data showing publicly exposed MCP servers nearly tripled from 492 in July 2025 to 1,467 by April 2026. 74% of those sit on AWS, Azure, GCP, or Oracle Cloud. Many carry CVSS 9.8 vulnerabilities (the maximum severity score is 10.0).

Attackers are already treating MCP configs as first-class targets. In April, a self-propagating worm called Shai-Hulud hit the npm registry through a malicious package impersonating Bitwarden's CLI (@bitwarden/cli). The worm steals SSH keys, AWS and GitHub tokens, and .env files, but it also explicitly targets MCP configuration files alongside those cloud credentials. After exfiltration, it downloads the victim's npm packages, injects malicious code into them, and publishes poisoned versions. One compromised developer can cascade through the entire dependency tree.

Separately, PromptArmor, a prompt injection research firm, disclosed a vulnerability in Ramp's Sheets AI where an attacker injects instructions into a spreadsheet cell and the AI follows them, sending financial data to an attacker-controlled endpoint. The attack requires no code execution at all. The AI just does what the cell tells it to.

Every Published Prompt Injection Defense Fails

A joint paper from OpenAI, Anthropic, and Google DeepMind tested 12 published prompt injection defenses using adaptive attacks (attackers who know the defense and can tune their payloads against it). All 12 were bypassed at over 90% success rates. In a $20,000 red-team competition, security testers whose job is to play the attacker hit 100% bypass on every defense tested.

The paper's conclusion is that current prompt injection defenses work against naive attackers but collapse against anyone who can iterate. Defenses that rely on the model itself to detect injected instructions are particularly brittle because the same flexibility that makes LLMs useful also makes them manipulable.

Claude Code is one exception in the current landscape. It ships a two-stage classifier that evaluates tool calls before execution, checking whether a requested action is consistent with the user's intent. That classifier is what makes Claude Code's auto-accept mode possible. Without it, auto mode on a client codebase would be an open door to prompt injection.

Deterministic Layers as Partial Mitigation

I built Radar (a TypeScript/Node dependency scanning tool) with three deterministic defense layers, designed to work independently of the model's judgment:

  1. Boundary delimiters. Every tool result gets wrapped in explicit markers that separate trusted instructions from untrusted data. Microsoft Research published findings showing this technique cuts naive attack success roughly in half. It does not stop adaptive attackers, but it raises the cost of the simplest payloads.

  2. Pattern-based sanitization. Twelve regex patterns scan tool outputs for known injection signatures before the model sees them. These catch the commodity attacks: role-override attempts, instruction-embedding patterns, common social-engineering templates.

  3. Finding-level validation. Every output gets validated against structural expectations before Radar records it. If a dependency scan returns findings that do not match the expected schema, they are rejected before they can influence downstream analysis.

These layers are not sufficient against a motivated attacker. Encoded payloads, semantic manipulation, and instructions split across multiple files all bypass pattern matching. The defenses reduce surface area without eliminating it.

The Structural Problem

Anthropic's CISO, Jason Clinton, calls prompt injection "a frontier, unsolved security problem." That framing is accurate. The issue is architectural: LLMs cannot reliably distinguish between instructions from the user and instructions embedded in data from external sources. Every defense that asks the model to make that distinction inherits the model's failure modes.

I run MCP heavily through a Docker-based gateway connecting six service categories. The protocol is genuinely useful infrastructure. The RCE disclosure does not change that, but it does change the risk calculus. Developers building on MCP need to treat every STDIO transport as an unsanitized command boundary and filter accordingly. Developers building AI tools that consume external data need deterministic validation layers that do not depend on the model's cooperation.

The honest state of the field is that no one has a complete defense. Layered, deterministic filtering narrows the attack surface. Two-stage classifiers like Claude Code's raise the bar for exploitation. Neither is a solution. Both are buying time while the architecture catches up to the threat model.

Matthew Aberham

Solutions Architect and Full-Stack Engineer at Perficient. Writing about AI developer tooling, infrastructure, and security.

Read More

AI Code Passes Tests. Then It Breaks Production.

Qodo raised $70M on the premise that AI-generated code that passes tests still breaks production. The Wiz study on 5,600 vibe-coded apps shows why, and what to do about it.

Apr 14, 2026
AISecurity

What AI Agent Adoption Actually Looks Like: China's OpenClaw Craze

OpenClaw surpassed React as GitHub's most-starred project. In China it became a cultural phenomenon, then got banned from government devices. 20% of its skills were malicious. Here's what enterprise teams should learn.

Mar 25, 2026
AISecurity

Anthropic Accuses DeepSeek and Others of Distillation Attacks on Claude

Anthropic reveals industrial-scale distillation attacks by three Chinese AI labs, creating 24,000+ fraudulent accounts and 16 million exchanges to extract Claude's capabilities.

Feb 26, 2026
AISecurity