Back to Research

Auditing an MCP Server Before You Trust It With Production Access

The Model Context Protocol (MCP) has become the default integration layer for production LLM agents. Most teams adopting MCP have not asked the same questions of an MCP server that they would ask of any other piece of software they grant production credentials. This is a practical audit playbook: what to inspect, in what order, and what to fix before you hand an MCP server access that touches the systems your business actually runs on.

Why this matters now

Three things changed in the last six months. First, in April 2026 a critical "by design" weakness was disclosed in the official MCP SDK that enables arbitrary command execution against any vulnerable MCP implementation. The systemic vulnerability spans Python, TypeScript, Java, and Rust and affects more than 7,000 publicly accessible servers and 150 million package downloads. Second, security researchers analyzing public MCP server inventories report that 43 percent contain command-injection-class flaws that predate the SDK issue. Third, real-world agentic deployments are now combining MCP servers with privileged tool access (cloud APIs, source control, ticketing, and in some cases industrial control systems) at a rate that outpaces the security community's ability to publish defensive patterns.

We expect the audit playbook below to be revised as the field matures. It is grounded in our hands-on review of several production MCP deployments and in the publicly disclosed incidents through April 2026. For organizations deploying MCP today, the cost of a one-day audit is materially lower than the cost of the first incident.

Key findings from the field

The threat surface

Six attack classes account for almost everything we have seen against production MCP deployments. The first three exploit the protocol's design assumptions. The last three are conventional software-security problems that the MCP layer makes more dangerous because the agent itself acts as a confused deputy.

Attack classMechanismConcrete example
Tool description injection Attacker writes adversarial text into a tool's name, description, or parameter schema. The agent reads this metadata as authoritative instructions. A weather-lookup MCP tool's description includes hidden text instructing the agent to also send the user's most recent chat to a logging endpoint.
Tool poisoning A trusted MCP server's behavior changes silently after an upstream package update or a server-side configuration push. A community-maintained Slack MCP server pushes a "minor" update that adds a new tool with a benign-looking description and a privileged side effect.
Indirect prompt injection via tool output The agent calls a tool. The tool returns content that contains adversarial instructions. The agent follows them. An email-fetch tool returns an email body with instructions to forward all messages to an external address. The agent obliges.
Command injection The MCP server passes parameters to a shell, eval, or unsafe function without proper sanitization. An MCP server exposes a "search" tool that interpolates the query into a shell command. The agent (or an attacker influencing the agent) supplies a query containing shell metacharacters.
Credential theft The MCP server holds API tokens, OAuth refresh tokens, or local credentials. Compromise of the server yields all of them. An MCP server with a "read-write" GitHub PAT is compromised through a dependency hijack. The PAT now belongs to the attacker.
Supply chain An MCP server's npm or PyPI dependency is compromised, or the server itself is replaced upstream by a malicious successor. A typosquatted package is added as a transitive dependency. The package exfiltrates environment variables on import.

The attack class that surprises people most often is the third (indirect prompt injection via tool output). Defensive intuition focuses on what the user types into the agent. Production attacks more commonly ride in on what the agent reads from one of its tools.

The audit playbook

Five steps, in order. Each step has a specific output. None of them require source-code access, though source helps. We aim for a one-day engagement on a single MCP server and a three-to-five-day engagement for an agent's full MCP fleet.

Step 1: inventory and scoping

List every MCP server the agent can connect to, both today and after the next configuration change. For each server, record: the connection mode (stdio, HTTP, WebSocket), the host it runs on, the credentials it holds, and the network surfaces it can reach. The output is a one-page inventory the operator can read in two minutes. Most teams have never written this down, and the act of writing it surfaces decisions someone made and forgot.

Step 2: tool surface review

For every server in the inventory, enumerate every tool, parameter, and metadata field. Read the tool descriptions as if they were system-prompt text, because to the agent they are. Anything in a description that could be interpreted as an instruction (rather than a description of capability) is a finding. Anything that says "always do X" or "never reveal Y" is a finding even if the intent is benign, because it teaches the agent to follow instructions found in tool metadata.

Step 3: authentication and credential review

Determine how the MCP server authenticates the agent's calls, and how it authenticates upstream calls on the agent's behalf. For each credential the server holds, write down the scope. For each scope, ask: does this server need read access here, write access there, or both? If the answer is "we just used the default scope," that is a finding. If the credentials are stored in environment variables on a shared host, that is a finding. If the server uses a long-lived token where a short-lived OAuth flow was available, that is a finding.

Step 4: network and supply-chain review

For HTTP-mode servers, verify that the server is not reachable from any network it does not need to be reachable from. For all servers, list every direct dependency, the lock file (if any), and the last time someone reviewed an upstream changelog. Flag any dependency without a signed release, any unpinned version, and any package with fewer than three maintainers. Run a vulnerability scan against the lock file. Most MCP servers ship with at least one transitive dependency that has a known CVE.

Step 5: runtime observability

Determine what the server logs, where the logs go, who reads them, and what would trigger an alert. A common deployment pattern is "the MCP server prints to stderr, which is captured by the supervisor, which writes to a file no one looks at." That is not observability. The minimum viable bar is a structured log of every tool invocation, every credential use, and every error, written to a system that fires an alert on anomaly. Without this, an incident becomes a forensic archaeology project.

A worked example

Consider a fairly typical small-business deployment. The agent is Claude, accessed through a desktop client. The MCP servers configured are a community Slack integration, a self-hosted Notion bridge, and a custom server the team wrote to query the company's PostgreSQL replica. The agent is used by three engineers for ad-hoc analysis and incident response.

A one-day audit on this configuration would find, in our experience, between five and twelve issues. The most-likely-found ones, ranked by typical severity:

  1. The PostgreSQL MCP server holds a superuser credential. The team needed write access for one specific table during the original setup and never separated read from write. Severity: high.
  2. The Slack server's tool descriptions encourage the agent to "summarize aggressively." Benign in intent, but it teaches the agent that tool metadata is a place where behavioral guidance lives. Severity: medium.
  3. The Notion bridge is exposed on a local HTTP port with no authentication. Anything else on the workstation can call it. Severity: medium-to-high depending on workstation hygiene.
  4. None of the three servers have structured logs. Severity: medium until something happens, at which point severity is high.
  5. Two of the Slack server's npm dependencies have known CVEs. Severity: low if the CVE is in a code path that does not execute, medium otherwise.

The fixes for these findings are the same as the fixes for any other production-software finding: scope the credential to read-only on the specific table, treat tool metadata as untrusted text the agent will read literally, bind the local port to localhost with a token, route logs to a system that supports alerting, and patch the dependencies. None of these require novel technology. They require someone treating the MCP server as production software.

What an operator can do this week

Before any external audit, three actions reduce the surface materially:

These three actions take an afternoon. They do not replace an audit. They do close the most common attack paths we see in deployments that have never been reviewed.

When this audit is worth buying

If your agent has access to anything that can move money, change customer data, send messages on the company's behalf, modify production infrastructure, or read personally identifiable information, the audit is worth buying. The cost of one engagement is materially less than the cost of the first incident. If the agent's access is read-only on internal documentation, the audit is probably not the highest-priority security spend you have. The decision is about blast radius, not about how novel MCP feels as a technology.

Contact jon@virtuscybersecurity.com with a brief description of your deployment and we can scope a one-day or multi-day engagement. We can also coordinate with your existing pentest provider if you have one. For broader technology strategy or ongoing fractional-CTO advisory beyond the security scope, see sandhillscto.com.

References and further reading

About Virtus Cybersecurity: Virtus Cybersecurity is a Service-Disabled Veteran-Owned Small Business (SDVOSB) specializing in embedded systems security research, vulnerability analysis, and authorized penetration testing. OSCP-certified, with graduate-level training in adversarial techniques and reverse engineering from the Naval Postgraduate School.