Securing the AI agent supply chain means establishing, before any external component acts on your behalf, three things: where it came from, what a trusted party has verified about its behavior, and how you will know if those properties change. Most organizations deploying AI agents consume models, tools, and scaffolding they did not build — each carries assumptions about behavior and data handling that cannot be inspected by looking at integration code alone. The controls that address this risk are provenance tracking, signed attestations, and runtime verification at dispatch.
This post covers the threat model, the three core controls, and the organizational process that makes them effective. For a broader view of how agents authenticate and earn trust, see "AI agent identity and credentials" and "Trust scores and attestations".
What is the AI agent supply chain?
When your organization deploys AI agents, you rarely build every component yourself. You may consume pre-trained models from a model provider, integrate third-party tools via Model Context Protocol (MCP) servers, download agent scaffolding from an open-source registry, or license purpose-built agents from a vendor. Each external component enters your environment carrying assumptions about behavior, data handling, and intent that you cannot directly inspect. That chain of external dependencies — from model weights to tooling to agent runtime — is your AI agent supply chain.
Why agents amplify supply chain risk
Software supply chain attacks are not new. Adversaries have demonstrated repeatedly that compromising an upstream package is an efficient path to reaching many downstream consumers at once. AI agents inherit this risk and introduce several amplifying factors.
First, agents are action-capable. A compromised library is typically confined to what its host process can reach, and the classic outcome is data leaking from a running server. A compromised agent can actively exfiltrate documents, escalate permissions, or instruct other agents in an agent-to-agent (A2A) chain to do things the original operator never intended. The blast radius of a supply chain compromise in an agentic system is therefore substantially larger than that of a conventional dependency compromise.
Second, agent behavior is harder to pin than a binary artifact. You can hash an executable and verify it has not changed. An agent's behavior emerges from a model, a system prompt, tool access, and runtime context — any of which can shift between deployments without an obvious artifact change. A model provider that updates weights quietly can alter behavior without touching integration code at all.
Third, multi-agent pipelines compound trust dependencies. When agents call other agents, a single compromised node can influence the outputs of every downstream participant. Without explicit, cryptographically grounded trust chains, a pipeline is only as trustworthy as its least-verified member.
Provenance, attestation, and verification
Securing an external agent or tool rests on three controls working together.
Provenance is the record of where a component came from and how it arrived. For a software package this means a signed manifest: who published it, when, and with which key. For an AI agent it means knowing which model version was used, which system prompt was active at deployment, which tools were granted, and who approved the registration. Provenance is the factual history of the component.
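As a rough illustration, the provenance facts listed above could be captured in an immutable record whose fingerprint changes whenever any recorded field changes. The field names, the `ProvenanceRecord` class, and the `fingerprint` helper are assumptions made for this sketch, not a prescribed schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical provenance record; field names are illustrative only."""
    agent_id: str
    model_version: str
    system_prompt_sha256: str        # hash of the prompt active at deployment
    granted_tools: tuple[str, ...]
    approved_by: str
    registered_at: str               # ISO 8601 timestamp

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) so that the same
        # record always produces the same hash.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Example values are made up for the sketch.
record = ProvenanceRecord(
    agent_id="agent-invoice-reader",
    model_version="model-2025-01",
    system_prompt_sha256=hashlib.sha256(b"You summarize invoices.").hexdigest(),
    granted_tools=("read_invoice", "search_docs"),
    approved_by="security-review",
    registered_at="2025-01-15T09:00:00+00:00",
)

# A quiet model update yields a different fingerprint even though
# nothing in the integration code changed.
drifted = replace(record, model_version="model-2025-03")
fingerprint_changed = record.fingerprint() != drifted.fingerprint()
```

Comparing the fingerprint stored at registration with one recomputed at dispatch is one simple way to notice that a recorded property has shifted.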
Attestation is a signed statement from a party you trust, asserting a property of the component. That party might be a certification body that ran behavioral testing, a security auditor who reviewed the agent's tool scopes, or your own internal approval authority. Attestations are not self-signed: they carry the signature of an authority other than the component's publisher, whose public key you have already enrolled. An attestation is worth no more than the trustworthiness of its signer, which is why the allow-list of signing authorities is itself a security boundary.
Verification is the runtime act of checking both. Before dispatching a task to an agent, your control plane should confirm that the agent's identity matches its registered provenance, that valid unexpired attestations from enrolled signers exist, and that the aggregate confidence level meets the threshold you have set for the task class.
The cryptographic mechanism behind attestations matters here. An attestation provider runs its evaluation — behavioral testing, code review, policy audit — and produces a signed payload containing the agent identifier, the attestation type, a score or claim, and supporting evidence. Your control plane verifies the asymmetric signature against the enrolled public key before accepting it. Curating the allow-list of enrolled signers is itself a security operation: adding a signer gives them material influence over which agents your platform trusts.
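The acceptance check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `SignedAttestation` envelope, the JSON payload with an `expires_at` field, and the `accept_attestation` function are hypothetical, and the actual signature check is passed in as a callable because the choice of asymmetric scheme and crypto library is left open here.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Mapping, Optional

# Hypothetical signature check: (public_key, message, signature) -> bool.
# In practice this would wrap an asymmetric scheme from a vetted crypto library.
SignatureVerifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class SignedAttestation:
    signer_id: str      # which authority claims to have signed this
    payload: bytes      # the exact bytes that were signed (JSON in this sketch)
    signature: bytes


def accept_attestation(
    att: SignedAttestation,
    enrolled_keys: Mapping[str, bytes],
    verify: SignatureVerifier,
    now: Optional[datetime] = None,
) -> Optional[dict]:
    """Return the attested claims if acceptable, otherwise None."""
    public_key = enrolled_keys.get(att.signer_id)
    if public_key is None:
        return None  # signer is not on the allow-list
    if not verify(public_key, att.payload, att.signature):
        return None  # signature does not match the enrolled key
    # Only parse the payload after the signature has been verified.
    claims = json.loads(att.payload)
    # Assumes expires_at is an ISO 8601 timestamp with a UTC offset.
    expires_at = datetime.fromisoformat(claims["expires_at"])
    now = now or datetime.now(timezone.utc)
    if expires_at <= now:
        return None  # expired attestations are not accepted
    return claims
```

The order of checks is deliberate: the allow-list lookup comes first, the signature second, and the payload is only read once it is known to be authentic.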
Trust scoring as a dispatch gate
A trust score is the practical instrument that turns provenance and attestation data into a gate at dispatch time. Rather than a binary allow or deny, a graduated score lets you match the required confidence level to the sensitivity of the task. You might allow a lower-trust agent to read public documentation but require a higher score before it can write to a production system or call an external API on behalf of a user.
Useful signals fall into several categories. Identity verification asks whether the agent's claimed identity is authenticated. Behavior history examines past execution records — anomalous tool calls, excessive resource consumption, or out-of-scope data access reduce trust. Compliance posture covers whether the agent meets declared organizational requirements. Reputation captures signals from peer organizations in a federation. Security posture reflects the hardening of the agent's runtime: least-privilege tool scopes, pinned model versions, and defined rollback procedures.
These signals combine into a composite score. Third-party attestations from enrolled certification bodies contribute to the trust score alongside identity, behavioral, and compliance signals — no single source dominates the composite result.
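One way to picture a composite in which no single source dominates is a weighted average of the base signals plus a capped contribution from attestations. The weights, the cap, and the `composite_score` function below are assumptions chosen for illustration; the post does not specify a formula.

```python
# Illustrative weights for the five signal categories; they sum to 1.0.
BASE_WEIGHTS = {
    "identity": 0.25,
    "behavior": 0.25,
    "compliance": 0.20,
    "reputation": 0.10,
    "security": 0.20,
}

# Assumed cap on how much attestations can add to the score.
MAX_ATTESTATION_BONUS = 0.10


def composite_score(signals: dict[str, float], attestation_scores: list[float]) -> float:
    """Combine base signals (each in [0, 1]) with a bounded attestation bonus."""
    base = sum(weight * signals[name] for name, weight in BASE_WEIGHTS.items())
    if attestation_scores:
        # Average the verified attestation scores, then scale by the cap.
        average = sum(attestation_scores) / len(attestation_scores)
        bonus = MAX_ATTESTATION_BONUS * average
    else:
        bonus = 0.0
    return min(1.0, base + bonus)
```

Because the bonus is capped, even a perfect attestation cannot lift an agent with weak identity or behavior signals past a high threshold on its own.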
Dispatch gates then enforce the score at multiple layers. At the connection level, a minimum trust level is required for each resource connection. At the task policy level, individual policies specify their own minimum. At the organization level, a trust floor ensures no agent operates below a baseline. An agent must satisfy all applicable thresholds simultaneously; if any gate is not met, the task is rejected before executing.
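The layered check can be sketched as a function that evaluates every applicable threshold and rejects the task if any one fails. The names `GateDecision` and `check_dispatch`, and the 0 to 1 score scale, are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    failed_gates: tuple[str, ...]   # names of the gates that rejected the task


def check_dispatch(
    score: float,
    org_floor: float,
    connection_min: float,
    policy_min: Optional[float] = None,
) -> GateDecision:
    """Allow dispatch only if the score meets every applicable threshold."""
    thresholds = {"organization": org_floor, "connection": connection_min}
    if policy_min is not None:
        # A task policy may declare its own minimum on top of the others.
        thresholds["policy"] = policy_min
    failed = tuple(name for name, minimum in thresholds.items() if score < minimum)
    return GateDecision(allowed=not failed, failed_gates=failed)
```

Returning the list of failed gates, rather than a bare boolean, gives the audit trail something concrete to record when a task is rejected.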
Monitoring trust over time
Trust is not static. An agent that behaved reliably last quarter may have had its underlying model updated, its tool scopes expanded, or its system prompt altered by the vendor. Score history and audit trails provide visibility into this drift.
Expired attestations are excluded from the trust score calculation, so the score falls back to the agent's remaining signals: identity, behavior history, compliance posture, reputation, and security posture. If the resulting score falls below the minimum required by a connection or policy, dispatch is blocked until the attestation is renewed or a fresh review is completed.
Reviewing trust timelines — score over time with contributing components — lets you detect when a previously stable agent begins declining. A downward trend in behavior history may indicate model drift or an expansion of tool access that has introduced new failure modes. This monitoring posture is the supply chain analogue of dependency update alerts in traditional software: you want to know when something upstream changed, even if nothing in your own integration code did.
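A simple way to flag a declining agent from its score history is to compare the average of the most recent scores with the average of the window before it. The `is_declining` function, the window size, and the drop threshold are assumptions for this sketch; a production system would likely use something more robust.

```python
from statistics import mean


def is_declining(scores: list[float], window: int = 5, min_drop: float = 0.05) -> bool:
    """Flag a drop between the previous window and the most recent window.

    `scores` is ordered oldest to newest.
    """
    if len(scores) < 2 * window:
        return False  # not enough history to compare two full windows
    previous = mean(scores[-2 * window:-window])
    recent = mean(scores[-window:])
    return previous - recent >= min_drop
```

Running the same comparison on each contributing component, such as behavior history, helps narrow down which signal is driving the decline.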
The organizational process behind the controls
Technical controls need an organizational process to be effective. Building an AI agent inventory is the natural starting point: a maintained record of every third-party agent and tool in use, including who approved it, under what conditions, which version is pinned, and when the approval expires. Without it, technical controls protect known agents while unknown ones operate without scrutiny.
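An inventory along these lines can start as a small structured record per agent or tool, plus a routine check for approvals that have lapsed. The `InventoryEntry` fields and the `expired_approvals` helper are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InventoryEntry:
    """Hypothetical inventory row for a third-party agent or tool."""
    name: str
    vendor: str
    pinned_version: str
    approved_by: str
    conditions: str            # conditions under which use was approved
    approval_expires: date


def expired_approvals(inventory: list[InventoryEntry], today: date) -> list[InventoryEntry]:
    """Return entries whose approval lapsed before `today`."""
    return [entry for entry in inventory if entry.approval_expires < today]
```

Running this check on a schedule turns approval expiry from something people have to remember into something the process surfaces.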
Add a vendor review practice. Before admitting an agent, understand what it does, what data it processes, how it is updated, and what the vendor's security disclosure process looks like.
Define a revocation procedure: clear steps for removing an agent when a concern arises, covering how dispatch is halted, how in-flight tasks are handled, and how the decision is recorded in the audit trail. Pairing this with governed connections between agents and resources ensures that even if a compromised agent is not immediately identified, its reach is bounded by the connection policy already in place.
How Praesidia approaches this
Praesidia is designed around the premise that trust should be explicit, verified, and continuously evaluated. The trust scoring system computes each agent's score from identity, behavioral, compliance, reputation, and security signals, with verified attestations from registered providers contributing a bounded adjustment. Attestation signatures are verified against the registered allow-list of enrolled signing authorities — unverified or self-asserted attestations do not affect the score.
At dispatch, independent gates check the score against per-connection, per-policy, and organization-wide thresholds. An agent that falls below the required level is blocked before a task executes. Score history and the component breakdown are persisted so teams can investigate what drove a change. Every registered connection carries its own constraints — cost caps, rate limits, guardrail rules — bounding the reach of any compromised external agent.
For how to configure trust floors, register attestation providers, and define per-connection policies, see "Trust scores and attestations for AI agents" and "Governed agent-resource connections".
Common questions
How is an attestation different from vendor security documentation?
A vendor can write anything in a document. An attestation is a cryptographically signed statement from a party whose public key is registered in your platform's trust root. You can verify independently that the statement is genuine, unmodified, and was issued by an entity you explicitly authorized to make that claim. Vendor documentation is useful context; a verified attestation is a checkable assertion.
What happens when an attestation expires?
Expired attestations are excluded from the trust score calculation. The agent's score falls back to its remaining signals: identity, behavior history, compliance posture, reputation, and security posture. If the resulting score falls below the minimum required by a connection or policy, dispatch is blocked until the attestation is renewed or a fresh review is completed.
Can internal agents use the same attestation model?
Yes, and it is worth doing. First-party agents can also drift, develop problematic behavioral patterns, or have their configurations changed inadvertently. Internal approval workflows, such as a security review or a staging-to-production sign-off, can be modeled as attestations issued by your own internal approval authority, giving first-party agents the same explicit, auditable trust standing as vetted external ones.
How does agent supply chain risk differ from standard software dependency risk?
In traditional software, a compromised dependency is typically confined to what its host process can reach, and the usual outcome is leaked data. A compromised agent is action-capable: it can exfiltrate documents, escalate permissions, or instruct downstream agents in a multi-agent pipeline. The blast radius is larger, and the behavioral changes that signal compromise are harder to detect than a changed binary hash. See "Threat model: agent credential theft" for the specific attack patterns to defend against.
What is the minimum set of controls for a small team?
At minimum: maintain a registry of all external agents and tools, verify that each has a documented vendor disclosure process, enforce least-privilege tool scopes on every connection, and block dispatch for any agent whose trust score falls below your organization floor. This baseline is achievable without a dedicated security team and addresses the most common failure modes. For how to apply least privilege specifically to MCP tool access, see "Scoping MCP tool permissions".