A Glossary of AI Governance and Agent Security Terms

A shared vocabulary is a prerequisite for a shared security posture. When "guardrail" means a keyword filter to one team and a policy engine to another, requirements slip through the gap between them. This glossary defines the terms that come up most often in AI governance and agent security conversations, with enough precision for use in specifications, policy documents, and vendor evaluations.

Identity and Access

Principal — Any entity that can make a request: a human user, an application, or an AI agent. Good governance requires every principal to have an explicit identity, scoped credentials, and an accountability trail. Shared credentials break attribution and make revocation disruptive.

Authentication vs. authorization — Authentication verifies who a principal is. Authorization decides what an authenticated principal may do. They are evaluated in sequence and neither substitutes for the other. For agents, authentication typically uses signed tokens or API keys; authorization is expressed through scoped permissions attached to those credentials.

Least privilege — The principle that a principal should hold only the permissions required for its current task. For agents this matters especially: an over-permissioned agent that is manipulated or misbehaves causes far wider damage than one with a tight scope. The goal is as much to limit blast radius after a compromise as to prevent the compromise in the first place. See how to implement least privilege for AI agents for a practical implementation guide.
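As a rough illustration of task-scoped permissions, the sketch below issues a credential that carries only the scopes one task needs and denies everything else by default. The task names, scope strings, and function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

# Hypothetical mapping from a task to the scopes that task requires.
TASK_SCOPES = {
    "summarize_ticket": frozenset({"tickets:read"}),
    "close_ticket": frozenset({"tickets:read", "tickets:write"}),
}


@dataclass(frozen=True)
class Credential:
    agent_id: str
    scopes: frozenset


def issue_credential(agent_id: str, task: str) -> Credential:
    """Issue a credential holding only the scopes the given task requires."""
    return Credential(agent_id, TASK_SCOPES[task])


def is_allowed(cred: Credential, required_scope: str) -> bool:
    """Deny by default: a scope is allowed only if explicitly present."""
    return required_scope in cred.scopes


# An agent that only summarizes tickets never receives write access.
cred = issue_credential("agent-42", "summarize_ticket")
```

If this agent is manipulated into attempting a write, the check fails because the scope was never granted, which is the blast-radius limit the definition describes.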

RBAC and ABAC — Role-Based Access Control attaches permissions to roles and assigns roles to principals. It works well for stable job functions but accumulates bloat over time. Attribute-Based Access Control evaluates access based on attributes of the principal, the resource, and context — more expressive, harder to audit. Most production systems use a combination: RBAC for stable functions, ABAC for context-sensitive gates such as an agent's current trust score.
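One way to picture the combined model is a role check followed by an attribute check. The sketch below is illustrative only: the role names, permission strings, and the 0.8 trust threshold are assumptions, not values from any particular system.

```python
# Hypothetical role-to-permission table (the RBAC part).
ROLE_PERMISSIONS = {
    "support_agent": {"tickets:read", "tickets:write"},
    "billing_agent": {"invoices:read", "refunds:create"},
}

# Hypothetical attribute rule (the ABAC part): refunds need a minimum trust score.
MIN_TRUST_FOR_REFUNDS = 0.8


def rbac_allows(role: str, permission: str) -> bool:
    """Stable function check: does the role carry this permission at all?"""
    return permission in ROLE_PERMISSIONS.get(role, set())


def abac_allows(permission: str, attributes: dict) -> bool:
    """Context-sensitive gate evaluated on the principal's current attributes."""
    if permission == "refunds:create":
        return attributes.get("trust_score", 0.0) >= MIN_TRUST_FOR_REFUNDS
    return True


def authorize(role: str, permission: str, attributes: dict) -> bool:
    """Both layers must agree before the action is allowed."""
    return rbac_allows(role, permission) and abac_allows(permission, attributes)
```

For example, a billing agent with a trust score of 0.9 may create refunds, the same agent at 0.5 may not, and a support agent may not regardless of score.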

SSO and SCIM — Single Sign-On (SSO) lets users authenticate once with a central identity provider and access connected applications without separate credentials. SAML 2.0 and OpenID Connect are the dominant protocols. SCIM (System for Cross-domain Identity Management) automates user and group provisioning across systems, so joiner and leaver events propagate automatically. SSO handles authentication; SCIM handles lifecycle. Both are needed to keep access current.

Agent Identity and Trust

Agent identity — The credentials, claims, and metadata that uniquely identify an agent as a principal. A system where agents use borrowed human credentials cannot scope, revoke, or audit agents independently. Treating agents as first-class principals — with their own credentials and lifecycle — is a prerequisite for meaningful governance.

A2A (Agent-to-Agent Communication) — Direct interaction between two autonomous agents, within the same system or across organizational boundaries. Governed A2A requires both parties to authenticate independently, the calling agent to present an appropriate scope, and every hop to be logged. Ungoverned A2A — agents calling each other through shared secrets or inherited permissions — is a lateral-movement risk that compounds across longer call chains. For a deeper treatment of the threat surface, see threat model: agent-to-agent delegation abuse.

Trust score — A numeric or categorical rating that summarizes confidence in an agent's behavior at a point in time, derived from signals such as error rates, policy violations, attestation status, and behavioral drift. Most useful as a runtime gate: certain tools or action categories are available only to agents above a defined threshold. Trust scores should decay when an agent has not been recently reviewed.
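A trust score used as a runtime gate with decay might look like the following sketch. The exponential half-life model, the 90-day half-life, and the per-tool thresholds are all assumptions made for illustration; real systems may decay scores differently.

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 90  # assumed: the score halves for every 90 days without review

# Hypothetical minimum score required for each tool.
TOOL_THRESHOLDS = {"search": 0.2, "send_email": 0.6, "issue_refund": 0.85}


def decayed_score(base_score: float, last_reviewed: datetime, now: datetime) -> float:
    """Reduce the score exponentially with time since the last review."""
    age_days = max((now - last_reviewed).total_seconds() / 86400, 0.0)
    return base_score * 0.5 ** (age_days / HALF_LIFE_DAYS)


def tool_available(tool: str, base_score: float,
                   last_reviewed: datetime, now: datetime) -> bool:
    """Gate a tool on the decayed score; tools without a threshold are denied."""
    threshold = TOOL_THRESHOLDS.get(tool)
    if threshold is None:
        return False
    return decayed_score(base_score, last_reviewed, now) >= threshold


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recently_reviewed = now - timedelta(days=1)
long_unreviewed = now - timedelta(days=90)
```

With these assumed numbers, an agent scored 0.9 keeps access to the refund tool one day after review but loses it after 90 days without one, while low-risk search remains available.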

Attestation — A cryptographically signed claim asserting a verifiable property about an agent: its software version, configuration state, or the outcome of a security review. Attestations let a consuming system evaluate provenance without directly inspecting the agent. They are how "knowing your supply chain" becomes machine-checkable rather than a documentation exercise.
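The sign-then-verify flow behind an attestation can be sketched with the Python standard library. Note the simplification: this example uses an HMAC with a shared key as a stand-in, whereas production attestations would normally use asymmetric signatures so that verifiers never hold signing material. The claim fields and function names are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(claims: dict) -> bytes:
    """Serialize claims deterministically so signer and verifier hash the same bytes."""
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()


def sign_attestation(claims: dict, key: bytes) -> dict:
    """Attach a signature over the claims (HMAC-SHA256 as a stand-in)."""
    signature = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}


def verify_attestation(attestation: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(attestation["claims"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attestation["signature"])


demo_key = b"demo-key-for-illustration-only"
attestation = sign_attestation(
    {"agent_id": "agent-42", "software_version": "1.4.2", "review": "passed"},
    demo_key,
)
```

A consuming system that calls verify_attestation can trust the stated version and review outcome without inspecting the agent, and any change to the claims after signing causes verification to fail.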

Federation and trust manifests — Federation is a trust relationship between two organizations that allows each to accept the other's agents under defined conditions, established through signed manifests rather than shared secrets. A trust manifest expresses which actions are permitted, what spend is allowed, when the relationship expires, and how it can be revoked — making cross-org agent activity auditable and revocable without credential sharing.
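The four properties a trust manifest expresses (permitted actions, spend, expiry, revocation) map naturally onto a small data structure. The sketch below is a hypothetical shape, not a standard format, and it omits the signing step that would make the manifest verifiable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TrustManifest:
    partner_org: str
    allowed_actions: frozenset
    spend_limit: float       # assumed to be in a single agreed currency unit
    expires_at: datetime
    revoked: bool = False
    spent: float = 0.0

    def permits(self, action: str, cost: float, now: datetime) -> bool:
        """Allow an action only if the manifest is live, in scope, and within budget."""
        if self.revoked or now >= self.expires_at:
            return False
        if action not in self.allowed_actions:
            return False
        return self.spent + cost <= self.spend_limit

    def record_spend(self, cost: float) -> None:
        """Track cumulative spend so the limit applies across calls."""
        self.spent += cost


manifest = TrustManifest(
    partner_org="partner.example",
    allowed_actions=frozenset({"quote:request"}),
    spend_limit=100.0,
    expires_at=datetime(2024, 12, 31, tzinfo=timezone.utc),
)
```

Revocation here is a single flag flip on the manifest, which is what lets one side end the relationship without rotating any credentials.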

Guardrails and Content Safety

Guardrail — A policy-enforcement point that inspects content flowing into or out of an agent and takes a defined action when a rule matches. Guardrails are distinct from model safety training, which is baked into a model version by the provider. Guardrails are your runtime controls, applied at the I/O boundary, specific to your organization's policies. Common targets include PII patterns, credential formats, prohibited topics, and prompt injection sequences.

Block, redact, warn — The three standard guardrail enforcement actions. Block prevents content from passing — appropriate for high-confidence, high-severity matches. Redact allows content through but replaces matched spans with a placeholder, preserving interaction structure while removing sensitive payload. Warn passes content and flags the event for human review, appropriate when false positives would be costly.
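The three actions can be sketched as a small rule loop. The rules below are illustrative assumptions: the credential format and the codename are invented for this example, and the SSN pattern is deliberately simplified.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    pattern: re.Pattern
    action: str  # "block", "redact", or "warn"


@dataclass
class Result:
    allowed: bool
    content: str
    events: list = field(default_factory=list)


# Hypothetical rules, one per enforcement action.
RULES = [
    Rule("credential", re.compile(r"\bKEY-[0-9a-f]{16}\b"), "block"),
    Rule("us_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "redact"),
    Rule("internal_codename", re.compile(r"\bproject\s+atlas\b", re.I), "warn"),
]


def apply_guardrails(text: str, rules=RULES) -> Result:
    """Apply each rule in order and record an event for every match."""
    events = []
    for rule in rules:
        if not rule.pattern.search(text):
            continue
        events.append((rule.name, rule.action))
        if rule.action == "block":
            return Result(False, "", events)  # nothing passes
        if rule.action == "redact":
            text = rule.pattern.sub("[REDACTED]", text)
        # "warn" leaves the text unchanged; the event is kept for human review
    return Result(True, text, events)
```

Under these rules, a message containing an SSN passes with the number replaced by a placeholder, a message containing a credential is stopped entirely, and a mention of the codename passes unchanged with an event logged.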

Fail-open vs. fail-closed — What a system does when it cannot complete an evaluation due to a timeout or error. Fail-closed blocks the operation; fail-open lets it proceed. Fail-closed is appropriate for high-risk action categories; fail-open suits lower-risk monitoring paths where a transient error should not halt legitimate work.
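In code, the distinction comes down to what the wrapper returns when the evaluator itself fails. The following is a minimal sketch; the function names and mode constants are hypothetical.

```python
FAIL_CLOSED = "fail_closed"
FAIL_OPEN = "fail_open"


def guarded_call(evaluate, payload, mode: str) -> bool:
    """Return True if the operation may proceed.

    A completed evaluation is always honored. The mode only matters
    when the evaluation cannot complete (timeout or other error).
    """
    try:
        return bool(evaluate(payload))
    except Exception:
        return mode == FAIL_OPEN


def unavailable_evaluator(payload):
    """Stand-in for a policy engine that times out."""
    raise TimeoutError("policy engine unavailable")
```

Here guarded_call(unavailable_evaluator, {}, FAIL_CLOSED) returns False, while the same call with FAIL_OPEN returns True. An explicit deny from a working evaluator is respected in both modes.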

Prompt injection — An attack where malicious instructions embedded in content the agent processes — a document it reads, a tool response — override the agent's legitimate directive. Practical defenses include structurally separating instructions from data, monitoring for anomalous output patterns, and conservative tool scoping that limits what a manipulated agent can do.

Observability and Audit

Audit log — A time-ordered, append-only record of events: who did what, to which resource, at what time, and with what outcome. Usefulness depends on completeness (every material action recorded) and integrity (records cannot be altered after the fact). For AI governance, audit logs are the primary evidence for compliance reviews and incident investigations.

Tamper-evident log — An audit log structured so that modification of past records is detectable. Hash chaining — where each record includes a cryptographic digest of the prior one — is the standard technique: altering any record breaks the chain and invalidates all subsequent entries. Typically paired with external anchoring or periodic export to guard against wholesale deletion.

Trace — A record of the path of a single request or agent run across all services and tools it touches, annotated with timing. Distributed tracing is especially important for agentic systems, where a single task may fan out across multiple agents and external tools before returning a result.

Control Plane and Platform Terms

AI control plane — The management layer above AI agents and the applications that use them, providing centralized identity, authorization, policy enforcement, cost tracking, and lifecycle management. The control plane governs data-plane transactions without carrying inference traffic itself. Policies change in one place without touching individual integrations. See what is an AI control plane for a full explanation of the pattern and how it differs from an API gateway.

Multi-tenancy and RLS — Multi-tenancy means multiple organizations share underlying infrastructure while remaining isolated: one tenant cannot read or modify another's data or agents. Row-Level Security (RLS) is a database-layer mechanism that enforces tenant isolation at the query level, providing an additional layer of data separation alongside application-level controls.

MCP (Model Context Protocol) — An open protocol for connecting AI models to external tools, data sources, and services. An MCP server exposes named tools a model can call during inference. Governing MCP servers — authenticating callers, scoping tool access, rate-limiting, and logging every invocation — is a core concern for teams deploying tool-using agents.

Common Questions

What is the difference between a guardrail and a policy? A policy is the rule you want to enforce — "no social security numbers in agent output." A guardrail is the runtime mechanism that inspects content and acts when the policy fires. You can have a well-written policy with no effective guardrail enforcing it, which is a common gap in early-stage governance programmes.

Is A2A the same as API-to-API communication? At the network level, A2A is always an API call, but the term implies something more: both parties are autonomous principals with their own identity and accountability record, not service accounts sharing a static key. A2A can cross organizational boundaries, involve trust negotiation, and chain through multiple hops where permissions must be tracked independently at each step.

Where does attestation fit relative to authentication? Authentication confirms who a caller is at connection time. Attestation provides verifiable evidence about the caller's current state — software version, configuration, or security review outcome. A caller can authenticate successfully while running an outdated or misconfigured build. Attestation lets you evaluate those properties independently, which matters especially when accepting calls from external organizations. For a full explanation of how trust scores and attestations work together at runtime, see trust scores and attestations: deciding which agents to trust.