The OWASP LLM Top 10, Applied to AI Agents

Key takeaways

OWASP LLM risks are amplified in agentic deployments because agents take actions, call tools, and read external data — expanding every attack surface beyond a standalone chatbot.
LLM01 (prompt injection) and LLM08 (excessive agency) deserve the highest priority for agent deployments; LLM03 (training data poisoning) only applies if you use fine-tuning or writable RAG sources.
Most categories converge on the same underlying controls: strong agent identity, least-privilege access, in-band content inspection, spend budgets, and tamper-evident audit logs.
Content guardrails address LLM01, LLM02, and LLM06 directly but do not substitute for identity, access control, and budgeting controls that address LLM04, LLM05, LLM08, and LLM10.
The OWASP categories are a useful vocabulary for structuring a conversation, not a replacement for a threat model against your specific agent connections and permissions.

The OWASP LLM Top 10 is a widely used reference for identifying security risks in systems that use large language models. This article uses the category names and numbering from the 2023 edition (version 1.x) of the list; the 2025 edition renames and reorders several categories. When an LLM operates as an autonomous agent (taking actions, calling tools, reading external data, and communicating with other systems), each of these risk categories takes on additional surface area. The controls that matter for a standalone chatbot are still relevant, but agents introduce new attack paths that require governance controls beyond what a single model or API gateway typically provides.

This article walks each OWASP LLM risk category through an agent lens and describes the classes of control that address it. For a broader grounding in what makes AI agent security architecturally different from traditional application security, see why AI infrastructure needs a new security model.

LLM01: Prompt Injection

Prompt injection is the most discussed agent risk, and with good reason. Agents read from external sources (web pages, documents, database rows, tool outputs) that an attacker can craft to redirect the agent's behavior. This is indirect prompt injection: the malicious instruction is not in the user's input; it arrives through data the agent fetches.

Controls that apply: content inspection on inputs and outputs, with detection patterns tuned for instruction-hijacking attempts; least-privilege tool scope so a compromised agent cannot perform high-impact actions; agent identity and attribution so anomalous commands are auditable after the fact; and human-in-the-loop escalation for sensitive action classes. See how to detect and defend against prompt injection for a deeper treatment of detection approaches.

An inspection layer that sits in-band on the agent dispatch path — checking content before it reaches the model and before the model's response is acted on — is the structural requirement here. Combining several complementary detection techniques improves coverage across the range of injection styles an attacker might attempt.
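As a minimal sketch of what in-band means here, the following Python wraps a model call with an inspection step on both sides. The three regular expressions, the inspect_content and dispatch functions, and the call_model callable are all hypothetical names chosen for illustration. Pattern matching alone is only one of the complementary techniques mentioned above and is easy to evade by itself.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only. A real deployment would tune
# and extend these and pair them with other detection techniques.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the |your )?(system|developer) prompt", re.I),
    re.compile(r"you are now (in )?\w+ mode", re.I),
]


@dataclass
class InspectionResult:
    allowed: bool
    matches: list


def inspect_content(text: str) -> InspectionResult:
    """Report which illustrative patterns matched; any match blocks the content."""
    matches = [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
    return InspectionResult(allowed=not matches, matches=matches)


def dispatch(fetched: str, call_model) -> str:
    """Inspect fetched content before the model sees it, and the response before it is acted on."""
    if not inspect_content(fetched).allowed:
        raise PermissionError("fetched content blocked by inspection")
    response = call_model(fetched)
    if not inspect_content(response).allowed:
        raise PermissionError("model response blocked by inspection")
    return response
```

The point of the structure is that neither the model nor any downstream action can be reached without passing through the check, which is what distinguishes an in-band control from after-the-fact log analysis.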

LLM02: Insecure Output Handling

Agent outputs can flow into downstream systems: code interpreters, shell commands, email drafts, API calls, database writes. When an agent's response is treated as trusted input to a secondary process without validation, the downstream system is exposed.

Controls that apply: output validation before any secondary action is executed; sandboxing tool calls so the blast radius of a malicious output is contained; structured output schemas that reject free-form instructions in contexts where only structured data is expected.
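One way to picture the structured-output control is a strict parser that sits between the model and any secondary action. The sketch below assumes a hypothetical ticketing agent whose only legitimate output is a JSON object with an action and a ticket_id; the field names, the allowed actions, and the parse_agent_output function are illustrative assumptions, not part of any particular framework.

```python
import json

# Hypothetical schema: the only output shape this agent is allowed to emit.
ALLOWED_ACTIONS = {"create_ticket", "close_ticket"}
REQUIRED_FIELDS = {"action": str, "ticket_id": int}


def parse_agent_output(raw: str) -> dict:
    """Parse model output as JSON and reject anything outside the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    if set(data) != set(REQUIRED_FIELDS):
        raise ValueError(f"unexpected or missing fields: {sorted(data)}")
    for name, expected in REQUIRED_FIELDS.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(data[name], expected) or isinstance(data[name], bool):
            raise ValueError(f"field {name!r} has the wrong type")
    if data["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"action {data['action']!r} is not allowed")
    return data
```

Rejecting extra fields, rather than ignoring them, is a deliberate choice in this sketch: free-form text smuggled into an unexpected field never reaches the downstream system.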

LLM03: Training Data Poisoning

For agents that incorporate retrieval-augmented generation or that are fine-tuned on organization-specific data, the training and retrieval corpus is a trust boundary. Poisoned documents in a vector store or fine-tuning dataset can steer behavior at inference time.

Controls that apply: access controls on documents that feed retrieval pipelines, with the same multi-tenancy scoping that applies to primary data; provenance tracking so the origin of retrieved content is auditable; anomaly detection on output patterns that diverge from baseline behavior.
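The first two controls can be sketched as a filter applied to retrieval results before they reach the model. The Document fields, the trusted_sources set, and the scoped_retrieve function below are hypothetical; the sketch assumes each stored document already carries a tenant identifier and an ingestion source.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    tenant_id: str
    source: str  # where the document was ingested from
    text: str


def content_hash(doc: Document) -> str:
    """SHA-256 of the document text, recorded so later changes are detectable."""
    return hashlib.sha256(doc.text.encode("utf-8")).hexdigest()


def scoped_retrieve(candidates, tenant_id, trusted_sources):
    """Keep only documents from the caller's tenant and a trusted source.

    Returns the kept documents plus one provenance record per kept document.
    """
    kept, provenance = [], []
    for doc in candidates:
        if doc.tenant_id != tenant_id or doc.source not in trusted_sources:
            continue
        kept.append(doc)
        provenance.append(
            {"doc_id": doc.doc_id, "source": doc.source, "sha256": content_hash(doc)}
        )
    return kept, provenance
```

Writing the provenance records to the audit log alongside the agent's response makes it possible to trace a suspicious output back to the specific documents that influenced it.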

LLM04: Model Denial of Service

Agents are often triggered programmatically and can be called at machine speed. An adversary who can invoke an agent endpoint can exhaust compute, model API budget, or downstream service capacity. Unlike human-driven chatbots, agents may also trigger recursive or looping behaviors that amplify cost and resource consumption.

Controls that apply: rate limiting per agent, per connection, and per organization; spend budgets that enforce hard ceilings on model API consumption; circuit-breaker patterns on tool calls; concurrency limits on parallel agent runs. See Threat Model: Runaway Agent Spend for a concrete attack-path analysis of this failure mode.

Budget enforcement is especially important in agentic settings because a single runaway loop can consume orders of magnitude more model and tool spend than a single request. A rate limit measured only in requests per minute is insufficient when each request can trigger a multi-step, multi-tool workflow.
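To make the distinction concrete, here is a minimal sketch of a per-run budget that counts both steps and spend, so a single request that fans out into many tool calls is still bounded. RunBudget, BudgetExceeded, and run_agent are hypothetical names, and the per-step costs are assumed to be known or estimated by the caller.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class RunBudget:
    max_cost_usd: float
    max_steps: int
    spent_usd: float = 0.0
    steps: int = 0

    def charge(self, cost_usd: float) -> None:
        """Record one step, raising before either ceiling would be crossed."""
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.spent_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("spend ceiling reached")
        self.steps += 1
        self.spent_usd += cost_usd


def run_agent(step_costs, budget: RunBudget) -> int:
    """Run steps until they finish or the budget stops the run; return steps completed."""
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            break
    return budget.steps
```

The check happens before the step executes, so the ceiling is a hard limit rather than an alert that fires after the money is spent.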

LLM05: Supply Chain Vulnerabilities

Agents depend on a supply chain that includes the model provider, tool servers (MCP servers), third-party APIs, and the libraries that compose the agent framework itself. A compromised tool server or model API can return content designed to manipulate the agent's behavior or extract data.

Controls that apply: inventory and authentication of all connected tool servers and external APIs; signed or attested registrations so the runtime can verify that a connection is what it claims to be; trust scoring for agent-to-agent communication that reflects the connection's history and provenance; audit logs that record which external systems were called and what they returned. The full picture of supply chain risk in agent deployments is covered in securing the AI agent supply chain.
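A signed registration can be illustrated with a keyed hash over a tool server's manifest. The sketch below uses HMAC-SHA256 with a shared key purely as a simple stand-in; a production design would more likely use asymmetric signatures or a platform attestation mechanism. The manifest fields and the function names are assumptions made for the example.

```python
import hashlib
import hmac
import json


def sign_registration(manifest: dict, key: bytes) -> str:
    """Sign a canonical JSON encoding of a tool server manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_registration(manifest: dict, signature: str, key: bytes) -> bool:
    """Check the manifest against its signature using a constant-time comparison."""
    expected = sign_registration(manifest, key)
    return hmac.compare_digest(expected, signature)
```

Because the signature covers the declared tool list, a server that later advertises an extra tool no longer matches its registration and can be refused at connection time.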

LLM06: Sensitive Information Disclosure

Agents often have broad read access to organizational data in order to answer questions or complete tasks. They can inadvertently include sensitive data — credentials, PII, financial figures, internal system details — in responses, tool call parameters, or logs.

Controls that apply: PII detection and redaction applied to both inputs to the model and outputs from it; least-privilege data access so agents can only read the data they need for the current task; log scrubbing to prevent sensitive content from persisting in audit records longer than necessary; data retention policies that expire sensitive log content on a defined schedule.

A guardrail that classifies and redacts PII categories (names, contact details, financial identifiers, health information) before content leaves the controlled environment addresses this across both the prompt and the response path. For implementation patterns, see PII Detection and Redaction in AI Pipelines.
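As a rough illustration of the redaction step, the following sketch replaces a few PII categories with placeholder labels and reports how many of each were found. The three regular expressions are deliberately simple assumptions that will miss many real formats; production guardrails typically combine patterns with trained classifiers.

```python
import re

# Illustrative patterns only; they cover a small subset of real-world formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def redact(text: str):
    """Replace each match with a [LABEL] placeholder; return the text and per-label counts."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

The same function can be applied on the prompt path and the response path, and the counts (not the matched values) are what belong in the audit log.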

LLM07: Insecure Plugin Design

In agent contexts, "plugins" are tool calls — MCP tools, API connectors, function definitions — and their design has direct security implications. An over-scoped tool that accepts arbitrary parameters, lacks input validation, or does not enforce authorization becomes an attack surface for any agent that can invoke it.

Controls that apply: tool scope restriction at the connection level so an agent can only invoke the tools it needs; parameter validation before tool calls execute; authorization checks within tool handlers that verify the calling agent's identity and permissions; monitoring on tool call patterns to surface anomalies.

The principle of least privilege applies at the tool level just as it does at the data level. An agent that only needs to read a calendar should not have a tool that can also send email.
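The calendar example can be sketched as a dispatcher that checks the agent's tool scope first and the parameters second. The tool registry, the AGENT_SCOPES mapping, and the agent name below are hypothetical; the tool bodies are stubs that only return strings.

```python
def read_calendar(day: str) -> str:
    return f"events for {day}"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


# Hypothetical registry: each tool declares its handler and expected parameters.
TOOLS = {
    "read_calendar": (read_calendar, {"day": str}),
    "send_email": (send_email, {"to": str, "body": str}),
}

# Per-agent allowlist, defined at the connection level.
AGENT_SCOPES = {"scheduler-agent": {"read_calendar"}}


def invoke_tool(agent_id: str, tool: str, params: dict) -> str:
    """Enforce tool scope, then validate parameters, then call the handler."""
    if tool not in AGENT_SCOPES.get(agent_id, set()):
        raise PermissionError(f"{agent_id} may not call {tool}")
    handler, schema = TOOLS[tool]
    if set(params) != set(schema):
        raise ValueError("unexpected or missing parameters")
    for name, expected in schema.items():
        if not isinstance(params[name], expected):
            raise ValueError(f"parameter {name!r} has the wrong type")
    return handler(**params)
```

An agent with no entry in the scope mapping gets an empty allowlist, so the default in this sketch is deny rather than allow.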

LLM08: Excessive Agency

This category is the one most specific to agents: the model is given more autonomy, more tools, or more permissions than the task requires. When it then behaves unexpectedly, whether due to a prompt injection, a hallucination, or an ambiguous instruction, the consequences are larger than they needed to be.

Controls that apply: scoped, time-bound credentials rather than persistent broad-access tokens; permission sets defined at the agent or connection level rather than granted globally; human approval workflows for high-impact action classes; budget and rate limits that cap what any single agent run can consume.

Reducing excessive agency is primarily a design discipline, but it requires governance infrastructure to enforce: the platform needs to express per-agent permission boundaries and actually enforce them at runtime, not merely document them.
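Runtime enforcement of a permission boundary might look like the following sketch, which combines three of the controls above: an expiry time on the grant, an explicit action set, and a human approval requirement for high-impact actions. The Grant type, the HIGH_IMPACT set, and the authorize function are hypothetical, and the approval is represented only by an approver name rather than a real workflow.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical action classes that always require a human approver.
HIGH_IMPACT = {"delete_records", "transfer_funds"}


@dataclass(frozen=True)
class Grant:
    agent_id: str
    actions: frozenset
    expires_at: float  # Unix timestamp after which the grant is invalid


def authorize(
    grant: Grant,
    action: str,
    approved_by: Optional[str] = None,
    now: Optional[float] = None,
) -> bool:
    """Allow the action only if the grant is live, covers it, and any required approval exists."""
    now = time.time() if now is None else now
    if now >= grant.expires_at:
        return False
    if action not in grant.actions:
        return False
    if action in HIGH_IMPACT and approved_by is None:
        return False
    return True
```

Accepting the current time as a parameter is only a convenience for testing; the important property is that an expired or out-of-scope grant fails closed.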

LLM09: Overreliance

Overreliance is the failure mode where users or downstream systems accept agent outputs without applying judgment. In automated pipelines, this can mean that a hallucinated fact or a fabricated reference propagates through multiple systems before anyone notices.

Controls that apply: confidence and uncertainty signaling in agent outputs; mandatory human review for defined output classes (clinical, legal, financial decisions); audit trails that record what the agent produced and what action was taken on it, enabling retroactive correction; output monitoring that flags responses matching known error patterns.

Overreliance is difficult to address purely with technical controls because it is also a process and culture issue. Audit trails that make every agent output attributable and reviewable are the technical foundation; the governance process built on top determines whether that capability is used.

LLM10: Model Theft

Model theft in an agent deployment can manifest as unauthorized access to fine-tuned or distilled models, or as systematic extraction of model behavior through repeated queries. Either exposes intellectual property and, in fine-tuned models, may expose the training data.

Controls that apply: access controls on model endpoints with authentication and authorization enforced per caller; rate limiting and query pattern analysis to detect systematic extraction attempts; audit logs that record model access by identity, enabling forensic review if extraction is suspected.
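Query pattern analysis can start with something as simple as a per-caller sliding window, sketched below. ExtractionMonitor and its thresholds are hypothetical, and query volume is only a crude proxy for extraction: a real detector would also look at input diversity and coverage of the model's behavior.

```python
from collections import defaultdict, deque


class ExtractionMonitor:
    """Flag callers whose query count inside a sliding time window exceeds a threshold."""

    def __init__(self, window_seconds: float, max_queries: int):
        self.window_seconds = window_seconds
        self.max_queries = max_queries
        self._events = defaultdict(deque)

    def record(self, caller_id: str, timestamp: float) -> bool:
        """Record one query and return True if the caller should be flagged."""
        events = self._events[caller_id]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events) > self.max_queries
```

Keying the window on an authenticated caller identity, rather than on an IP address, is what ties this control back to the access and audit controls listed above.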

Cross-cutting controls

Several of the OWASP LLM categories converge on the same underlying control classes: strong agent identity so every action is attributable; least-privilege access at the data and tool level; in-band content inspection on both inputs and outputs; rate limiting and spend budgets; and comprehensive, tamper-evident audit logs.

These controls are more effective when they operate at the infrastructure layer rather than being reimplemented in each agent application. A governance control plane that sits between your applications, your agents, and their tool connections can enforce identity, inspect content, enforce budgets, and record audit events consistently, regardless of which agent framework or model provider is in use.

Such a control plane treats these controls as one integrated layer: identity and RBAC for agents and the applications that invoke them, per-connection guardrails that enforce content policy in-band, spend budgets, and append-only audit logs across the full agent interaction surface. For a structured view of how these controls map to a maturity model, see an AI governance maturity model.
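One common way to make an append-only log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is an illustration under that assumption, with hypothetical function names and event fields. It detects edits to existing entries, but it does not by itself detect truncation of the most recent entries, so the latest hash still needs to be anchored somewhere the log writer cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event whose hash covers both the event and the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_log(log: list) -> bool:
    """Recompute the chain from the start; any edited entry breaks verification."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A short usage example: append two events for the same agent, confirm verify_log returns True, then change the action recorded in the first entry and observe that verification fails.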

Common questions

Is the OWASP LLM Top 10 the right reference for our agent security program?

It is a useful starting point, particularly for building shared vocabulary across security and engineering teams. It does not replace a full threat model for your specific deployment — the relative weight of each category depends on what your agents can access and what actions they can take. Use the OWASP categories to structure a conversation, then do an inventory-based threat model against your actual agent connections and permissions.

Do all ten categories apply to every agent deployment?

Not equally. Categories like LLM01 (prompt injection), LLM04 (denial of service), and LLM08 (excessive agency) are specifically amplified in agentic settings and should be prioritized. Categories like LLM03 (training data poisoning) only apply if you are using fine-tuning or retrieval-augmented generation with writable data sources. The agent security assessment questionnaire can help you identify which categories carry the most weight for your deployment.

How do guardrails relate to these OWASP risks?

Content guardrails address LLM01 (prompt injection detection), LLM06 (PII/sensitive data disclosure), and LLM02 (output validation) most directly. They do not, by themselves, address LLM04, LLM05, LLM08, or LLM10 — those require identity, access control, budgeting, and audit controls that operate at a different layer. A complete posture addresses all ten categories, and most of them require governance infrastructure beyond content inspection alone.