How to Monitor MCP Tool Calls

When an AI agent invokes an MCP tool, it is executing real code against real systems — reading files, querying databases, calling external APIs. Without monitoring, you have no reliable way to know what ran, who authorized it, what it cost, or whether it behaved as intended. Effective MCP tool call monitoring means capturing a durable, attributed record of every invocation as it passes through your governance layer, then making that record queryable for cost analysis, security investigation, and compliance review.

What happens during an MCP tool call

The Model Context Protocol defines a standard interface through which a language model (or the agent framework wrapping it) requests that a named tool be executed. On the wire, the MCP client sends a JSON-RPC tools/call request carrying the tool name and its arguments. The MCP server runs the underlying action and returns a result, which the model incorporates into its next step.

From a monitoring standpoint, each invocation is an event with several attributes worth capturing:

Which agent made the call and under which session or task
Which tool was invoked on which MCP server
The input arguments passed to the tool
The result returned — including whether the call succeeded or failed
Latency from request to response
Cost attributed to the call, if the tool has a token or compute charge
The policy decision that governed whether the call was allowed to proceed
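To make the event concrete: in MCP a tool invocation travels as a JSON-RPC 2.0 request with the method tools/call, and the result carries a content list plus an optional isError flag. The Python sketch below pulls the tool name, arguments, and success status out of one request/response pair. The tool name crm_lookup, the argument values, and the helper extract_call_fields are made up for illustration. Note what the messages do not contain: agent identity, session, latency, cost, and the policy decision all have to be added by whatever layer sits around the call.

```python
import json

# Example MCP "tools/call" request (JSON-RPC 2.0); tool name and arguments are hypothetical.
request_json = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "crm_lookup", "arguments": {"customer_id": "C-1042"}},
})

# Example response: MCP tool results carry a "content" list and an optional "isError" flag.
response_json = json.dumps({
    "jsonrpc": "2.0",
    "id": 7,
    "result": {"content": [{"type": "text", "text": "found 1 record"}], "isError": False},
})


def extract_call_fields(request_text: str, response_text: str) -> dict:
    """Pull the monitoring-relevant fields out of one tools/call exchange."""
    req = json.loads(request_text)
    resp = json.loads(response_text)
    if req.get("method") != "tools/call":
        raise ValueError("not a tool invocation")
    params = req.get("params", {})
    # A JSON-RPC "error" member means the request failed at the protocol level;
    # "isError" inside the result means the tool itself reported a failure.
    failed = "error" in resp or bool(resp.get("result", {}).get("isError", False))
    return {
        "request_id": req.get("id"),
        "tool_name": params.get("name"),
        "arguments": params.get("arguments", {}),
        "succeeded": not failed,
    }


fields = extract_call_fields(request_json, response_json)
print(fields["tool_name"], fields["succeeded"])  # crm_lookup True
```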

The challenge is that most MCP clients do not emit this record automatically. Without an instrumentation layer between the agent and the MCP server, nothing outside the agent process observes the call, so from an operator's point of view it is invisible.

Why a proxy or gateway layer is the right collection point

You could instrument each MCP server individually, or add logging inside every agent's tool-calling code. Both approaches work at small scale, but they fragment as the number of agents and servers grows: each team logs different fields in different places, and every new server or agent starts with no logging until someone adds it. A proxy or gateway layer placed between agents and MCP servers solves this once:

Every call passes through a single choke point, regardless of which agent or which MCP server is involved
The governance decision (allow, deny, step up for approval) is made and recorded in the same place the call is logged
Attribution is consistent — the proxy knows the authenticated identity of the calling agent, not just the raw network request
No changes are required on individual MCP servers

The proxy does not need to modify the call semantics. It connects to the MCP server on behalf of the agent, forwards the tool invocation, receives the result, and records the forensic entry — all before returning the response to the agent.
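The flow just described can be sketched as a small wrapper: evaluate policy, forward only allowed calls, time the call, and append a record before anything is returned to the agent. This is a minimal Python sketch under stated assumptions, not a gateway implementation. GovernedProxy, the policy and forward callables, and the in-memory records list are all hypothetical stand-ins for a real policy engine, an MCP client connection, and a durable audit store.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List

# Hypothetical hooks: the policy returns "allow", "deny" or "step_up";
# the forwarder performs the real call to the MCP server.
PolicyFn = Callable[[str, str, str, dict], str]
ForwardFn = Callable[[str, str, dict], Any]


@dataclass
class GovernedProxy:
    policy: PolicyFn
    forward: ForwardFn
    records: List[dict] = field(default_factory=list)  # stand-in for a durable audit store

    def call_tool(self, agent_id: str, server_id: str, tool: str, arguments: dict) -> Any:
        # agent_id is assumed to come from the authenticated session, not the request body.
        decision = self.policy(agent_id, server_id, tool, arguments)
        record = {"timestamp": time.time(), "agent_id": agent_id, "server_id": server_id,
                  "tool": tool, "arguments": arguments, "decision": decision}
        if decision != "allow":
            # Denied and pending-approval calls are recorded but never forwarded.
            record.update(succeeded=False, latency_ms=0.0)
            self.records.append(record)
            raise PermissionError(f"{tool} on {server_id}: {decision}")
        start = time.perf_counter()
        record["succeeded"] = False  # overwritten below if the forward call returns
        try:
            result = self.forward(server_id, tool, arguments)
            record["succeeded"] = True
            return result
        finally:
            # Runs on success and on exceptions, so the record is written before
            # the result (or the error) reaches the agent.
            record["latency_ms"] = (time.perf_counter() - start) * 1000
            self.records.append(record)


proxy = GovernedProxy(
    policy=lambda agent, server, tool, args: "deny" if tool == "send_email" else "allow",
    forward=lambda server, tool, args: {"content": [{"type": "text", "text": "ok"}]},
)
proxy.call_tool("agent-a", "crm", "lookup", {"id": 1})
try:
    proxy.call_tool("agent-a", "mail", "send_email", {"to": "ops@example.com"})
except PermissionError:
    pass
print([(r["tool"], r["decision"], r["succeeded"]) for r in proxy.records])
# [('lookup', 'allow', True), ('send_email', 'deny', False)]
```

Using try/finally rather than logging after the call returns is the point of the design: a forwarded call that raises still leaves a record with its latency.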

The forensic record: what to store and how to keep it credible

A useful forensic record for an MCP tool call contains at minimum:

Field	Why it matters
Timestamp	Sequence reconstruction during incidents
Agent identity	Attributing actions to a specific, authenticated principal
Organization / tenant	Multi-tenant isolation of audit data
MCP server ID and name	Identifying which external system was touched
Tool name	The specific capability invoked
Input arguments	What was actually requested (subject to redaction for PII)
Result summary	Whether it succeeded and what was returned
Policy decision	What the governance layer decided before allowing the call
Latency (ms)	Performance baseline and anomaly detection
Cost (USD)	FinOps attribution
Task or session ID	Linking the call to the broader workflow that triggered it
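The table maps directly onto a record type. The Python sketch below is one possible shape; the field names are illustrative rather than any standard schema, and the "Result summary" row is split into a text summary plus a succeeded flag so failures are easy to query. Serializing with sorted keys gives a stable byte representation, which matters if rows are later hashed for tamper evidence.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ToolCallRecord:
    """One row per MCP tool invocation; field names are illustrative."""
    timestamp: str            # ISO 8601, UTC
    agent_id: str             # authenticated principal
    tenant_id: str            # organization / tenant, for isolation of audit data
    server_id: str
    server_name: str
    tool_name: str
    arguments: dict           # redact PII before constructing the record
    result_summary: str
    succeeded: bool
    policy_decision: str      # "allow", "deny", or "step_up"
    latency_ms: float
    cost_usd: float
    session_id: Optional[str] = None  # links the call to the task or session that triggered it

    def to_json(self) -> str:
        # sort_keys yields a stable serialization, useful if rows are hashed later.
        return json.dumps(asdict(self), sort_keys=True)


rec = ToolCallRecord(
    timestamp=datetime.now(timezone.utc).isoformat(),
    agent_id="agent-billing-01", tenant_id="acme", server_id="srv-42",
    server_name="crm", tool_name="lookup_customer",
    arguments={"customer_id": "C-1042"}, result_summary="1 record",
    succeeded=True, policy_decision="allow", latency_ms=118.4, cost_usd=0.0021,
    session_id="task-9",
)
print(rec.to_json())
```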

For the record to be credible in a compliance or security context, it should be append-only. Records that can be silently edited or deleted after the fact cannot support a meaningful audit trail. Tamper-evidence techniques make such changes detectable rather than impossible: each row can be signed with an asymmetric key, and each row's hash can incorporate the hash of the row before it, so that modifying or deleting any earlier row breaks the chain. Because removing rows from the end leaves the remaining chain intact, the latest hash also needs to be signed or published somewhere the log writer cannot rewrite. Certificate transparency logs and secure audit systems generally rely on the same idea of hash-linked, verifiable history; see tamper-evident audit logs with cryptographic proofs for a deeper treatment.
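The hash-chaining part of this can be sketched with Python's standard hashlib: each row stores the previous row's hash and a SHA-256 over that value plus a canonical JSON encoding of the row, and verification recomputes the chain from a fixed genesis value. The sketch deliberately omits signing, since the standard library has no asymmetric signature API, and on its own a chain does not detect rows removed from the end; signing or externally anchoring the newest hash is what covers that case. The function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed "previous hash" for the first row


def row_hash(prev_hash: str, record: dict) -> str:
    # Bind this row to its predecessor by hashing both together.
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": row_hash(prev, record)})


def verify(log: list) -> bool:
    # Recompute every link; any edited or deleted interior row breaks the chain.
    prev = GENESIS
    for row in log:
        if row["prev"] != prev or row["hash"] != row_hash(prev, row["record"]):
            return False
        prev = row["hash"]
    return True


log: list = []
for i, tool in enumerate(["search", "read_file", "send_email"]):
    append(log, {"seq": i, "tool": tool, "decision": "allow"})

print(verify(log))                  # True
log[1]["record"]["tool"] = "noop"   # a silent edit to the middle row
print(verify(log))                  # False
```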

Connecting monitoring to policy enforcement

Monitoring and policy enforcement belong at the same layer. If you collect tool-call records after the fact, you can detect policy violations in retrospect. If you enforce policy at the point of invocation, you can prevent violations — and record both the violation attempt and the governance decision in the same event.

The practical model is: before the proxy forwards a tool call to the MCP server, it evaluates the call against the applicable policy. The policy might say:

This agent is not permitted to call tools on this server at all (deny)
This call would exceed the agent's remaining budget (deny)
This call matches a sensitive pattern and requires human approval before proceeding (step up)
This call is within policy (allow) — log it and proceed

Recording the policy decision alongside the call means your audit log answers not just "what did the agent do" but "what was the agent permitted to do, and was that permission respected."
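The four outcomes above can be sketched as an ordered rule check in which the first matching rule wins, and the decision travels with a reason that can be stored in the forensic record. This is a minimal Python sketch, assuming a hypothetical AgentPolicy with an allowed-server set, a remaining budget, and plain substrings standing in for real sensitive-pattern matching.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AgentPolicy:
    """Hypothetical per-agent policy; a real policy engine would be richer."""
    allowed_servers: Set[str]
    remaining_budget_usd: float
    # Plain substrings standing in for "sensitive patterns" that need human approval.
    approval_patterns: List[str] = field(default_factory=list)


def evaluate(policy: AgentPolicy, server_id: str, tool: str,
             arguments: dict, est_cost_usd: float) -> Dict[str, str]:
    """Return a decision plus a reason; both belong in the call's forensic record."""
    if server_id not in policy.allowed_servers:
        return {"decision": "deny", "reason": "server not permitted for this agent"}
    if est_cost_usd > policy.remaining_budget_usd:
        return {"decision": "deny", "reason": "call would exceed remaining budget"}
    call_text = tool + " " + json.dumps(arguments, sort_keys=True)
    for pattern in policy.approval_patterns:
        if pattern in call_text:
            return {"decision": "step_up", "reason": f"matched sensitive pattern {pattern!r}"}
    return {"decision": "allow", "reason": "within policy"}


policy = AgentPolicy(allowed_servers={"crm", "docs"}, remaining_budget_usd=1.00,
                     approval_patterns=["DELETE", "send_email"])
print(evaluate(policy, "billing", "charge_card", {}, 0.01)["decision"])                 # deny
print(evaluate(policy, "crm", "run_sql", {"q": "DELETE FROM leads"}, 0.01)["decision"])  # step_up
print(evaluate(policy, "crm", "lookup", {"id": 7}, 0.01)["decision"])                    # allow
```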

Per-tool rate limits and spend attribution

Tool calls have cost. Some tools consume LLM tokens in their implementation. Others call paid external APIs. A few are compute-intensive and slow. Monitoring gives you the raw data; rate limits and budgets give you the enforcement.

Per-tool rate limits are more precise than per-agent or per-server limits because costs and risk vary by tool. A read-only search tool has a different risk profile than a tool that executes database writes or sends emails. Setting separate rate limits per tool lets you be permissive where it is safe and conservative where it is not. For a full framework on scoping tool permissions, see scoping MCP tool permissions with least privilege.
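One common way to implement per-tool limits is a token bucket keyed by agent and tool, with capacity and refill rate configured per tool. The Python sketch below assumes hypothetical tool names and limits; a permissive bucket for a read-only search tool sits next to a tight one for sending email. The optional now argument exists only to make the behavior easy to demonstrate and test.

```python
import time
from typing import Dict, Optional, Tuple


class PerToolRateLimiter:
    """Token bucket per (agent, tool); limits are configured per tool."""

    def __init__(self, limits: Dict[str, Tuple[float, float]], default: Tuple[float, float]):
        # limits maps tool name -> (bucket capacity, refill rate in calls per second)
        self.limits = limits
        self.default = default
        self.buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}  # key -> (tokens, last_refill)

    def allow(self, agent_id: str, tool: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        capacity, rate = self.limits.get(tool, self.default)
        # A bucket seen for the first time starts full.
        tokens, last = self.buckets.get((agent_id, tool), (capacity, now))
        tokens = min(capacity, tokens + (now - last) * rate)  # refill for elapsed time
        if tokens >= 1.0:
            self.buckets[(agent_id, tool)] = (tokens - 1.0, now)
            return True
        self.buckets[(agent_id, tool)] = (tokens, now)
        return False


limiter = PerToolRateLimiter(
    limits={"search": (20, 5.0), "send_email": (2, 1 / 60)},  # permissive vs conservative
    default=(10, 1.0),
)
results = [limiter.allow("agent-a", "send_email", now=0.0) for _ in range(3)]
print(results)  # [True, True, False]
```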

Spend attribution requires knowing the cost of each call at the time it is recorded. If your MCP servers report token usage, capture it. If they do not, a configured cost-per-call estimate is better than nothing. Aggregated over time, attributed spend answers questions like "which agent is responsible for 60% of our MCP costs this month" — information that is invisible without per-call records.
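A minimal Python sketch of that rule: use reported token usage when the server provides it, fall back to a configured per-call estimate otherwise, and aggregate attributed spend into per-agent shares. The price, the per-tool estimates, and the agent and tool names are all hypothetical placeholders, not real rates.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional

# Hypothetical prices: substitute your own provider rates and per-tool estimates.
PRICE_PER_1K_TOKENS_USD = 0.002
ESTIMATED_COST_PER_CALL_USD = {"web_search": 0.005, "geocode": 0.001}


def call_cost_usd(tool: str, tokens_used: Optional[int]) -> float:
    """Prefer reported token usage; otherwise fall back to a configured per-call estimate."""
    if tokens_used is not None:
        return tokens_used / 1000 * PRICE_PER_1K_TOKENS_USD
    # Unknown tools fall back to 0.0 here; in practice you may want to flag them instead.
    return ESTIMATED_COST_PER_CALL_USD.get(tool, 0.0)


def spend_share_by_agent(records: Iterable[dict]) -> Dict[str, float]:
    """Fraction of total attributed spend per agent."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[r["agent_id"]] += r["cost_usd"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {agent: cost / grand_total for agent, cost in totals.items()}


records = [
    {"agent_id": "research-bot", "cost_usd": call_cost_usd("summarize", 3000)},  # reported tokens
    {"agent_id": "research-bot", "cost_usd": call_cost_usd("web_search", None)},  # estimate
    {"agent_id": "support-bot", "cost_usd": call_cost_usd("geocode", None)},      # estimate
]
shares = spend_share_by_agent(records)
print({agent: round(share, 2) for agent, share in shares.items()})
# {'research-bot': 0.92, 'support-bot': 0.08}
```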

PII handling in tool-call arguments

Arguments and results can contain personal data. A tool call to a CRM lookup might pass a customer email address as an argument; the result might contain an address or financial record. Storing this verbatim serves monitoring and forensics but creates a data retention obligation.

The standard patterns are:

Redact before storage: strip or mask known PII patterns (email, phone, national ID formats) before writing the record. You lose some forensic detail but eliminate the retention problem for common cases.
Store with a short TTL and subject-erasure path: keep the full record for a short window (30–90 days), then purge argument and result content while retaining the structural metadata (tool name, agent, timestamp, policy decision, cost). If a data subject requests erasure, the content fields are cleared without losing the audit structure.
Encrypt argument content separately: store arguments encrypted under a key that can be rotated or destroyed, decoupling forensic access from raw data exposure.

None of these is universally correct. The right choice depends on your regulatory context and what you need the forensic record to prove.
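As an illustration of the first pattern, redaction before storage, here is a Python sketch that walks nested arguments and masks matches of a few regular expressions. The patterns are deliberately crude and only illustrative: they will miss PII in free text and can over-redact, for example a date written as 2024-05-01 matches the phone pattern. Order matters, which is why the SSN pattern runs before the looser phone pattern.

```python
import re
from typing import Any

# Illustrative patterns only; real deployments need locale-specific rules.
# Insertion order matters: the SSN rule runs before the looser phone rule.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(value: Any) -> Any:
    """Recursively mask PII in strings nested inside dicts and lists."""
    if isinstance(value, str):
        for label, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"[REDACTED:{label}]", value)
        return value
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value  # numbers, booleans, and None pass through unchanged


args = {"customer_email": "jane.doe@example.com", "note": "call +1 415 555 0100", "limit": 5}
print(redact(args))
# {'customer_email': '[REDACTED:email]', 'note': 'call [REDACTED:phone]', 'limit': 5}
```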

Common questions

Do I need to monitor MCP tool calls if my agents are in a sandboxed environment?

Sandboxing limits what a tool call can affect, but it does not tell you what was attempted. Monitoring is what produces the record that lets you verify the sandbox is working, detect calls that approach its limits, and demonstrate to auditors that controls were in place. Sandboxing and monitoring address different parts of the problem and are most effective when combined.

How do I attribute a tool call to the right cost center when multiple teams share agents?

Attribution requires that the call record carries identifiers for the organizational scope (team, project, or tenant) that the agent was acting on behalf of when the call was made. This means the agent must assert its organizational context at authentication time — not as a parameter it passes to the tool, but as a claim on its authenticated session. The monitoring layer then reads that claim from the session and attaches it to the record. Without this, cost aggregation can only be done at the agent level, not at the team or project level within a shared agent deployment.
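A Python sketch of that separation, under the assumption that the authentication step has already produced a verified set of claims. AuthenticatedSession, attribute, and cost_by_team_project are hypothetical names. The key detail is that the team and project come only from the session, so an agent that slips a "team" field into its tool arguments cannot redirect the cost.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class AuthenticatedSession:
    """Claims fixed at authentication time (e.g. from a verified token); names are hypothetical."""
    agent_id: str
    tenant_id: str
    team: str
    project: str


def attribute(session: AuthenticatedSession, tool: str, arguments: dict, cost_usd: float) -> dict:
    # Organizational scope is copied from the session only. A "team" value inside the
    # tool arguments is treated as ordinary input and never used for attribution.
    return {
        "agent_id": session.agent_id,
        "tenant_id": session.tenant_id,
        "team": session.team,
        "project": session.project,
        "tool": tool,
        "arguments": arguments,
        "cost_usd": cost_usd,
    }


def cost_by_team_project(records: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for r in records:
        totals[(r["team"], r["project"])] += r["cost_usd"]
    return dict(totals)


# One shared agent deployment acting for two teams in two separate sessions.
growth = AuthenticatedSession("shared-agent", "acme", team="growth", project="q3-campaign")
finance = AuthenticatedSession("shared-agent", "acme", team="finance", project="close")
records = [
    attribute(growth, "web_search", {"query": "competitors", "team": "finance"}, 0.005),
    attribute(finance, "web_search", {"query": "fx rates"}, 0.005),
    attribute(growth, "summarize", {"doc": "brief.txt"}, 0.012),
]
print(cost_by_team_project(records))
```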

How long should I retain MCP tool-call records?

This depends on your regulatory environment and your operational needs. For security investigation, 90 days of full records gives you coverage for most incident response timelines. For compliance in regulated industries, one to seven years of structural metadata (without raw argument content) is a common requirement. A tiered retention policy — full detail for 90 days, metadata-only for the required compliance window — balances these needs. Whatever window you choose, the decision should be explicit and enforced automatically rather than left to accumulate indefinitely.
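A tiered policy like this can be enforced by a scheduled job that applies one function to each record. The Python sketch below assumes a 90-day full-detail window and uses roughly seven years as a placeholder compliance window; both values, the field names, and apply_retention itself are illustrative. Records in the middle tier keep their structural metadata and lose their argument and result content.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FULL_DETAIL_WINDOW = timedelta(days=90)
# Placeholder compliance window (roughly seven years); set it from your regulatory context.
METADATA_WINDOW = timedelta(days=365 * 7)
CONTENT_FIELDS = ("arguments", "result")


def apply_retention(record: dict, now: datetime) -> Optional[dict]:
    """Return the record as it should be stored today, or None if it should be deleted."""
    age = now - record["timestamp"]
    if age > METADATA_WINDOW:
        return None  # past the compliance window: delete entirely
    if age > FULL_DETAIL_WINDOW:
        # Keep structural metadata (tool, agent, decision, cost); drop content fields.
        return {k: v for k, v in record.items() if k not in CONTENT_FIELDS}
    return record


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
rec = {"timestamp": now - timedelta(days=120), "tool": "lookup", "agent_id": "agent-a",
       "decision": "allow", "cost_usd": 0.01,
       "arguments": {"email": "jane.doe@example.com"}, "result": "1 record"}
print(sorted(apply_retention(rec, now)))
# ['agent_id', 'cost_usd', 'decision', 'timestamp', 'tool']
```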

How Praesidia approaches MCP tool-call monitoring

Praesidia sits between your agents and their MCP servers so that every tool invocation is subject to policy evaluation before it proceeds. Each call is assessed against the applicable per-tool policy, and a complete forensic record is written with attribution to the calling agent, the organizational tenant, and the active task or session. The record covers the governance decision, latency, inferred cost, and the tool arguments and result, so it captures both what happened and what was permitted. Per-tool rate limits let you set different thresholds for different risk profiles, and the analytics layer aggregates spend and volume so you can answer FinOps and capacity questions without building custom queries.

For more on the broader governance model that surrounds tool-call monitoring, see the Praesidia documentation. To understand how MCP server registration and governance fit together, see registering and governing MCP servers.