Applying the NIST AI RMF to AI Agents

The NIST AI Risk Management Framework gives organizations a structured vocabulary for managing AI risk, but translating its four functions into controls for autonomous, tool-using agents takes concrete guidance that did not exist when the framework was published. GOVERN, MAP, MEASURE, and MANAGE each map to a distinct set of technical and process controls when applied to agentic systems. This post walks through that translation with practical examples for each function.

What the NIST AI RMF expects

The NIST AI RMF, published in January 2023 and supported by a growing set of profiles and playbooks, is organized around four functions:

GOVERN — Establish the policies, roles, and culture that make risk management possible.
MAP — Identify and characterize AI risks in context.
MEASURE — Analyze and assess those risks using defined metrics and methods.
MANAGE — Prioritize and treat risks, then track residual risk over time.

These functions are intentionally technology-neutral. They describe organizational capability, not a product checklist. The sections below take each function in turn and describe the controls it implies for AI agents.

GOVERN: policy, roles, and accountability

GOVERN asks who is responsible for AI risk and whether they have the authority to act. For agentic systems, that translates into several concrete requirements.

Agent inventory. You cannot govern what you have not catalogued. GOVERN expects a maintained inventory of AI systems in use, their intended use cases, and the data they touch. For agents, this means knowing which agents are active, which organization or team owns each one, and which external tools or services each agent can reach. See building an AI agent inventory for a practical approach to cataloguing your agent estate.
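As a minimal sketch of what an inventory entry might hold, the Python below models the three things this paragraph asks for: which agents are active, who owns them, and what they can reach. The `AgentRecord` fields, the tool names, and both helper functions are illustrative assumptions, not something the RMF prescribes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentRecord:
    """One inventory entry. All field names are illustrative."""
    agent_id: str
    owner_team: str
    intended_use: str
    tools: frozenset[str] = field(default_factory=frozenset)
    data_categories: frozenset[str] = field(default_factory=frozenset)
    active: bool = True


def agents_reaching(inventory: list[AgentRecord], tool: str) -> list[str]:
    """Return IDs of active agents that can reach the given tool."""
    return [a.agent_id for a in inventory if a.active and tool in a.tools]


def unowned(inventory: list[AgentRecord]) -> list[str]:
    """Return IDs of agents with no recorded owner, a common inventory gap."""
    return [a.agent_id for a in inventory if not a.owner_team.strip()]


inventory = [
    AgentRecord("support-bot", "customer-success", "draft ticket replies",
                frozenset({"crm.read", "email.draft"}), frozenset({"customer_pii"})),
    AgentRecord("ledger-bot", "", "reconcile invoices",
                frozenset({"payments.submit", "crm.read"}), frozenset({"financial"})),
]

print(agents_reaching(inventory, "crm.read"))  # both agents can read the CRM
print(unowned(inventory))                      # ledger-bot has no owner recorded
```

Queries like these are the practical payoff of an inventory: when a tool is compromised or a team is reorganized, you can list the affected agents instead of guessing.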

Role-based access to agents. Policies about who may deploy, modify, or disable an agent need to be enforceable, not just documented. That means role and permission structures that prevent unauthorized changes and create a clear chain of accountability when something goes wrong.
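One way to make such a policy enforceable is a deny-by-default permission check that also records every decision. The sketch below assumes three hypothetical roles and a hard-coded matrix. In practice the matrix would be loaded from wherever your policy lives.

```python
from enum import Enum


class Action(Enum):
    DEPLOY = "deploy"
    MODIFY = "modify"
    DISABLE = "disable"


# Hypothetical role-to-permission matrix; real roles would come from policy.
ROLE_PERMISSIONS: dict[str, frozenset[Action]] = {
    "agent_owner": frozenset({Action.DEPLOY, Action.MODIFY, Action.DISABLE}),
    "operator": frozenset({Action.DISABLE}),
    "viewer": frozenset(),
}


def authorize(role: str, action: Action, audit: list[dict]) -> bool:
    """Deny by default and record every decision, allowed or not."""
    allowed = action in ROLE_PERMISSIONS.get(role, frozenset())
    audit.append({"role": role, "action": action.value, "allowed": allowed})
    return allowed


audit_log: list[dict] = []
print(authorize("operator", Action.DISABLE, audit_log))  # operators may disable
print(authorize("operator", Action.MODIFY, audit_log))   # but not modify
print(authorize("intern", Action.DEPLOY, audit_log))     # unknown roles get nothing
```

Recording denied attempts alongside allowed ones is what creates the chain of accountability: the log shows who tried to change an agent, not only who succeeded.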

Policy documentation. GOVERN requires that risk tolerances, acceptable use boundaries, and escalation paths be documented and reviewed. For agents, this includes budget limits, content policies, and the conditions under which an agent's output must be reviewed by a human before it takes effect. For a structured approach to documenting and assessing these controls, see An AI Governance Maturity Model.
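Policies of this kind are easier to review and enforce when they are written down as data rather than prose. The following is a sketch under assumed field names and thresholds (`AgentPolicy`, the dollar limits, and the example contact address are all hypothetical); it shows how a documented human-review condition can become a single testable function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    """Documented limits for one agent. Field names and values are illustrative."""
    daily_budget_usd: float
    blocked_content: frozenset[str]
    review_actions: frozenset[str]  # actions that always need human sign-off
    review_above_usd: float         # amounts above this need human sign-off
    escalation_contact: str


def needs_human_review(policy: AgentPolicy, action: str, amount_usd: float = 0.0) -> bool:
    """True when the policy requires review before the action takes effect."""
    return action in policy.review_actions or amount_usd > policy.review_above_usd


policy = AgentPolicy(
    daily_budget_usd=50.0,
    blocked_content=frozenset({"legal_advice"}),
    review_actions=frozenset({"refund.issue"}),
    review_above_usd=100.0,
    escalation_contact="ai-risk@example.com",
)

print(needs_human_review(policy, "refund.issue"))        # always reviewed
print(needs_human_review(policy, "invoice.pay", 250.0))  # reviewed because of amount
```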

Organizational culture. GOVERN also addresses whether the people building and deploying agents treat risk management as a real obligation rather than a compliance checkbox. Formal review processes, mandatory training, and documented incident histories are reasonable signals of that commitment.

MAP: identifying risks in context

MAP asks organizations to characterize the risks associated with a specific AI deployment, including who could be harmed, under what conditions, and with what likelihood. For agents, context matters enormously because the same underlying model can be low-risk or high-risk depending on what tools it has access to and what actions it can take autonomously.

Risk classification per agent. The EU AI Act introduced a formal risk tier structure; the NIST AI RMF is less prescriptive but equally insistent that risk be assessed per deployment context. An agent that drafts email suggestions has a different risk profile from one that submits financial transactions or modifies production databases.
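A simple way to make per-deployment classification repeatable is to derive the tier from what the agent can do. The sketch below is one possible rule, not an RMF requirement: the tier is set by the riskiest tool, unknown tools are treated as high risk, and acting without human approval raises the tier by one. The tool names and tier assignments are assumptions.

```python
from enum import IntEnum


class RiskTier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical mapping from tool capability to inherent risk.
CAPABILITY_TIER: dict[str, RiskTier] = {
    "email.draft": RiskTier.LOW,
    "crm.read": RiskTier.MEDIUM,
    "payments.submit": RiskTier.HIGH,
    "db.prod.write": RiskTier.HIGH,
}


def classify(tools: set[str], autonomous: bool) -> RiskTier:
    """Classify a deployment by its riskiest tool and its degree of autonomy."""
    # Tools missing from the mapping are assumed HIGH until someone assesses them.
    base = max((CAPABILITY_TIER.get(t, RiskTier.HIGH) for t in tools),
               default=RiskTier.LOW)
    if autonomous and base < RiskTier.HIGH:
        return RiskTier(base + 1)
    return base


print(classify({"email.draft"}, autonomous=False).name)
print(classify({"email.draft", "payments.submit"}, autonomous=False).name)
```

The same model with the same prompt lands in different tiers depending on its tools, which is exactly the point the MAP function makes about context.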

Threat modeling. MAP expects you to enumerate plausible failure modes. See the OWASP LLM Top 10 applied to AI agents for a complementary threat taxonomy. For agentic systems, the key categories are:

Failure mode	Description
Prompt injection	Adversarial content in tool outputs or external data hijacks agent behavior
Over-broad permissions	Agent can access or modify more than its task requires
Credential exposure	Agent secrets are stolen and used to impersonate the agent
Runaway spend	Agent enters a loop and consumes unbounded compute or API budget
Data leakage	Agent routes sensitive data to an unauthorized destination or logs it in plaintext
Delegation abuse	One agent grants another agent more authority than it was itself given

Mapping these to specific deployments—which ones apply, with what probability, and with what impact—is the substance of the MAP function.
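One common way to record that mapping is a likelihood-times-impact score per failure mode. The sketch below uses 1 to 5 scales and made-up ratings for a hypothetical payments agent; both the scales and the numbers are assumptions chosen for illustration, and the RMF does not mandate any particular scoring scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatAssessment:
    failure_mode: str
    applies: bool
    likelihood: int  # 1 (rare) to 5 (expected)
    impact: int      # 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        """Likelihood times impact, or 0 when the failure mode does not apply."""
        return self.likelihood * self.impact if self.applies else 0


def prioritize(assessments: list[ThreatAssessment]) -> list[ThreatAssessment]:
    """Return applicable failure modes, highest score first."""
    applicable = (a for a in assessments if a.applies)
    return sorted(applicable, key=lambda a: a.score, reverse=True)


# Illustrative ratings for a hypothetical agent that submits payments.
payments_agent = [
    ThreatAssessment("runaway_spend", True, likelihood=3, impact=2),
    ThreatAssessment("prompt_injection", True, likelihood=4, impact=4),
    ThreatAssessment("delegation_abuse", False, likelihood=1, impact=1),
    ThreatAssessment("over_broad_permissions", True, likelihood=2, impact=5),
]

for a in prioritize(payments_agent):
    print(a.failure_mode, a.score)
```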

Stakeholder impact. MAP also asks who is affected by the AI system's outputs, including people who are not direct users. For customer-facing agents, this includes the end customers whose data the agent processes. Documenting these populations is a prerequisite for the MEASURE function.

MEASURE: quantifying risk over time

MEASURE is where risk management becomes empirical rather than theoretical. It asks organizations to collect data that lets them assess whether actual risk levels match their expectations.

Behavioral metrics. For agents, useful measures include the rate of guardrail violations (how often does agent output get blocked or flagged?), the frequency of anomalous tool call sequences, and the distribution of spend per agent over time. Trends matter as much as point-in-time values.
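To illustrate why trends matter as much as point-in-time values, the sketch below computes a guardrail violation rate per time window and then a least-squares slope across windows. The event shape (a dict with a `blocked` flag) and the weekly windows are assumptions.

```python
def violation_rate(events: list[dict]) -> float:
    """Fraction of events in one window where a guardrail blocked the output."""
    if not events:
        return 0.0
    return sum(1 for e in events if e["blocked"]) / len(events)


def slope(values: list[float]) -> float:
    """Least-squares slope of values over equally spaced windows (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Three hypothetical weekly windows of 100 events each.
weeks = [
    [{"blocked": i < 1} for i in range(100)],  # 1 blocked
    [{"blocked": i < 2} for i in range(100)],  # 2 blocked
    [{"blocked": i < 3} for i in range(100)],  # 3 blocked
]
rates = [violation_rate(w) for w in weeks]
print(rates, slope(rates))
```

Each weekly rate here looks small on its own, but the positive slope shows the rate tripling over three weeks, which a single snapshot would hide.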

Audit trail completeness. MEASURE requires that you can reconstruct what happened. For agents, this means every tool invocation, every external API call, and every content decision needs to be recorded with enough context to answer: which agent, acting on whose behalf, called what, with what inputs, and with what result. Tamper-evident logs, in which each entry is cryptographically chained to the previous one so that deletions or modifications are detectable, are a common pattern for this. The implementation details are covered in tamper-evident audit logs with cryptographic proofs.
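The chaining idea can be sketched in a few lines with SHA-256 from Python's standard library. This is a minimal illustration, not a production design: the entry layout, the all-zero genesis value, and the function names are assumptions, and a real system would also need to anchor the latest hash somewhere the log writer cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[dict], record: dict) -> None:
    """Append a record, chaining it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev_hash": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited or removed interior entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append(log, {"agent": "support-bot", "on_behalf_of": "user-17", "tool": "crm.read"})
append(log, {"agent": "support-bot", "on_behalf_of": "user-17", "tool": "email.draft"})
print(verify(log))                       # chain is intact
log[0]["record"]["tool"] = "crm.delete"  # tamper with history
print(verify(log))                       # chain no longer verifies
```

Note the limitation: a chain alone detects edits and interior deletions, but truncating the most recent entries leaves a shorter chain that still verifies. Detecting that requires recording the head hash externally.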

Evaluation against baselines. MEASURE expects comparison against defined expectations. If an agent should invoke a tool no more than a certain number of times per session, deviations from that baseline are a measurable signal. Baselines must be documented before deployment, not after an incident.
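A baseline check can be as simple as comparing per-session tool call counts against documented limits. In the sketch below, the `BASELINE_MAX_CALLS` values and tool names are hypothetical, and tools with no documented baseline are treated as having a limit of zero so that unexpected tool use surfaces as a deviation.

```python
from collections import Counter

# Hypothetical limits, documented before deployment: max calls per tool per session.
BASELINE_MAX_CALLS: dict[str, int] = {"crm.read": 20, "email.draft": 5}


def baseline_deviations(tool_calls: list[str], baseline: dict[str, int]) -> dict[str, int]:
    """Return each tool that exceeded its baseline, with the number of excess calls."""
    counts = Counter(tool_calls)
    return {
        tool: n - baseline.get(tool, 0)
        for tool, n in counts.items()
        if n > baseline.get(tool, 0)
    }


session = ["crm.read"] * 22 + ["email.draft"] * 3 + ["payments.submit"]
print(baseline_deviations(session, BASELINE_MAX_CALLS))
```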

Human review sampling. Fully automated agents still benefit from periodic human review of sampled outputs. MEASURE includes qualitative assessment, not just quantitative metrics. Structured sampling programs—where a defined fraction of agent interactions is reviewed by a human evaluator on a regular schedule—provide a check that automated metrics can miss.
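One way to select that fraction is to hash each interaction ID and review the ones that fall below a threshold. This keeps selection reproducible, so an auditor can confirm after the fact which interactions should have been reviewed. The sketch below is an assumption-laden example: the salt value and function name are invented, and hashing is only one of several reasonable sampling methods.

```python
import hashlib


def selected_for_review(interaction_id: str, fraction: float,
                        salt: str = "review-salt") -> bool:
    """Deterministically select roughly `fraction` of interactions for human review."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{interaction_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform integer in [0, 2**64)
    # Compare as integers so that fraction=1.0 always selects and 0.0 never does.
    return bucket < int(fraction * 2**64)


ids = [f"interaction-{i}" for i in range(10_000)]
sampled = [i for i in ids if selected_for_review(i, 0.10)]
print(len(sampled) / len(ids))  # close to 0.10
```

Changing the salt on a schedule yields a different sample each period while keeping every period's selection verifiable.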

MANAGE: treating and tracking residual risk

MANAGE is the operational function: given what you know about risk, what do you do about it, and how do you verify that your treatments are working?

Preventive controls. The primary treatment for most agent risks is preventive: guardrails that block policy-violating content before it reaches users or external systems, permission scoping that prevents agents from accessing resources they do not need for the current task, and budget caps that halt agents before they cause financial damage. These controls need to be enforced at the platform level, not just documented in policy.
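Platform-level enforcement usually means putting a gateway between the agent and its tools, so that the agent cannot skip the checks. The sketch below shows permission scoping and a budget cap in that position. `ToolGateway`, its exception types, and the use of integer cents are all assumptions made for the example.

```python
from typing import Any, Callable


class ScopeDenied(Exception):
    """Raised when an agent calls a tool outside its allowlist."""


class BudgetExceeded(Exception):
    """Raised when a call would take the agent past its budget cap."""


class ToolGateway:
    """Sits between an agent and its tools and enforces scope and budget."""

    def __init__(self, allowed_tools: frozenset[str], budget_cents: int) -> None:
        self._allowed = allowed_tools
        self.remaining_cents = budget_cents

    def call(self, tool: str, cost_cents: int, fn: Callable[..., Any], *args: Any) -> Any:
        if tool not in self._allowed:
            raise ScopeDenied(tool)
        if cost_cents > self.remaining_cents:
            raise BudgetExceeded(f"{tool} costs {cost_cents}, {self.remaining_cents} left")
        # Deduct before running so a failing tool call still counts against budget.
        self.remaining_cents -= cost_cents
        return fn(*args)


gateway = ToolGateway(frozenset({"crm.read"}), budget_cents=100)
result = gateway.call("crm.read", 40, lambda customer_id: {"id": customer_id}, "c-1")
print(result, gateway.remaining_cents)
```

Because the checks run before the tool function is invoked, a blocked call has no side effects, which is what distinguishes a preventive control from a detective one.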

Detective controls. Prevention is not sufficient on its own. MANAGE expects that you can detect when something has gone wrong, even if it got past your preventive controls. Real-time alerting on anomalous behavior, automated flagging of policy violations, and regular audit log review are the standard detective measures.

Response and containment. MANAGE includes incident response: the ability to revoke an agent's credentials, suspend its operation, and preserve forensic evidence quickly when a problem is detected. For agents, this means revocation must take effect immediately and must not be bypassable with cached credentials. For a practical incident response playbook, see Incident Response for AI Agent Breaches.
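The cached-credential problem is easiest to see in code. In the sketch below, the agent checks a revocation registry on every tool call rather than once at session start, so a revocation affects the very next call. The `CredentialRegistry` and `Agent` classes are hypothetical, and the in-memory set stands in for whatever authoritative store a real platform would use.

```python
class CredentialRegistry:
    """Authoritative record of which credentials have been revoked."""

    def __init__(self) -> None:
        self._revoked: set[str] = set()

    def revoke(self, credential_id: str) -> None:
        self._revoked.add(credential_id)

    def is_active(self, credential_id: str) -> bool:
        return credential_id not in self._revoked


class Agent:
    def __init__(self, credential_id: str, registry: CredentialRegistry) -> None:
        self.credential_id = credential_id
        self._registry = registry
        self.evidence: list[str] = []  # kept after revocation for forensics

    def call_tool(self, tool: str) -> str:
        # Check on every call. Caching this result would let a revoked
        # agent keep working until the cache expired.
        if not self._registry.is_active(self.credential_id):
            self.evidence.append(f"denied:{tool}")
            raise PermissionError(f"credential {self.credential_id} is revoked")
        self.evidence.append(f"allowed:{tool}")
        return f"{tool}:ok"


registry = CredentialRegistry()
agent = Agent("cred-42", registry)
print(agent.call_tool("crm.read"))
registry.revoke("cred-42")
try:
    agent.call_tool("crm.read")
except PermissionError as exc:
    print(exc)
print(agent.evidence)
```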

Residual risk tracking. MANAGE does not expect zero risk; it expects that residual risk is known, accepted at the appropriate level, and tracked over time. That means documenting accepted risks and reviewing those decisions periodically as the threat environment and deployment context change.
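A risk register does not need to be elaborate to be useful. The sketch below records who accepted each risk and when it was last reviewed, then lists entries whose review is due. The field names, risk entries, and review intervals are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AcceptedRisk:
    risk_id: str
    description: str
    accepted_by: str
    last_reviewed: date
    review_every_days: int

    def next_review(self) -> date:
        return self.last_reviewed + timedelta(days=self.review_every_days)


def reviews_due(register: list[AcceptedRisk], today: date) -> list[str]:
    """Return IDs of accepted risks whose periodic review date has arrived."""
    return [r.risk_id for r in register if r.next_review() <= today]


register = [
    AcceptedRisk("R-1", "Support agent may see customer PII in tickets",
                 "head-of-support", date(2025, 1, 1), review_every_days=90),
    AcceptedRisk("R-2", "Drafting agent has no per-session spend cap",
                 "platform-lead", date(2025, 3, 1), review_every_days=90),
]

print(reviews_due(register, today=date(2025, 4, 15)))
```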

Mapping controls to RMF subcategories

The RMF core divides each function into categories and numbered subcategories, and the companion Playbook suggests actions for each one. A practical mapping for agentic systems looks like this:

RMF subcategory	What it covers (paraphrased)	Concrete agent control
GOVERN 1.4	Risk management process established through transparent policies and procedures	Agent acceptable use policy, per-role permission matrix
GOVERN 2.1	Roles, responsibilities, and lines of communication documented	Agent owner designated per deployment, incident escalation path defined
MAP 1.1	Intended purposes and deployment context understood and documented	Per-agent risk classification, threat model documented
MAP 2.3	Scientific integrity and test, evaluation, verification, and validation considerations documented	Guardrail design informed by known attack classes (prompt injection, credential theft)
MEASURE 1.3	Internal experts or independent assessors involved in regular assessments	Human review sampling program, red-team exercises
MEASURE 2.4	System functionality and behavior monitored in production	Behavioral metrics, audit log review, anomaly alerting
MANAGE 1.3	Responses to high-priority risks developed, planned, and documented	Severity tiers for guardrail violations, automated vs. manual response thresholds
MANAGE 1.4	Negative residual risks documented	Risk register with accepted risks, periodic review cadence

This is not exhaustive—the full RMF has over 70 subcategories—but it illustrates how the framework's abstractions translate into tangible agent controls.

How Praesidia supports RMF alignment

Praesidia approaches RMF alignment by providing the control infrastructure that the GOVERN, MAP, MEASURE, and MANAGE functions require. The compliance surface includes framework report generation against the NIST AI RMF control catalogue, evidence collection and gap tracking, and structured risk classification for each agent deployment. Audit logs are maintained with tamper-evident chaining, and the access review function supports the periodic credential and permission reviews that MANAGE expects.

For organizations that need to demonstrate RMF alignment to an auditor or a customer, this means the underlying evidence — log records, control assessments, classification decisions, policy documents — is organized and exportable rather than scattered across separate systems.

Common questions

Is the NIST AI RMF mandatory? The NIST AI RMF is a voluntary framework, not a regulation. However, it is increasingly referenced in federal agency procurement requirements and in sector-specific guidance from regulators. Organizations that align to the RMF are better positioned when mandatory requirements reference it, which is already happening in some US federal contexts.

How does the NIST AI RMF relate to the EU AI Act? They address overlapping concerns through different mechanisms. The EU AI Act is a regulation with mandatory requirements organized around risk tiers (unacceptable, high, limited, minimal). The NIST AI RMF is a voluntary framework organized around management functions. Many of the underlying controls — risk assessment, human oversight, audit trails, incident response — appear in both. Organizations subject to the EU AI Act often find that RMF alignment provides a useful operational structure for meeting the Act's requirements. See the EU AI Act explained for engineering teams for a side-by-side comparison of where the two frameworks converge.

Where should an organization start with the RMF? Start with GOVERN and MAP before investing in MEASURE and MANAGE. Without a clear policy foundation and a documented understanding of which risks are relevant to your specific agent deployments, measurement and management efforts tend to target the wrong things. A practical first step is an agent inventory: know what you have deployed, who owns it, and what it can do.

How does the RMF handle agents that delegate tasks to other agents? Delegation chains are one of the harder MAP challenges. The RMF does not prescribe a specific control, but the underlying principle — that risk must be assessed per deployment context — means each agent in a chain needs its own risk classification, and the combined chain needs a threat model that covers delegation abuse specifically. An agent that is individually low-risk can become high-risk if it can grant its authority to a downstream agent that has broader tool access. Documenting and enforcing delegation boundaries is therefore a GOVERN and MAP obligation, not just a runtime one.
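A delegation boundary can be enforced with a subset check: a downstream agent may receive only scopes the delegating agent already holds, and chains are capped at a fixed depth. The sketch below shows that rule in isolation. The scope names and the `max_depth` default are assumptions, and a real system would bind the check to signed credentials rather than plain sets.

```python
def delegate(parent_scopes: frozenset[str], requested: frozenset[str],
             depth: int, max_depth: int = 2) -> frozenset[str]:
    """Grant `requested` to a downstream agent only if it never exceeds the parent.

    `depth` is the number of delegation hops already taken to reach the parent.
    """
    if depth >= max_depth:
        raise PermissionError("delegation chain too deep")
    excess = requested - parent_scopes
    if excess:
        raise PermissionError(f"delegation would escalate: {sorted(excess)}")
    return requested


planner = frozenset({"crm.read", "email.draft"})

# Narrowing is allowed: the child gets a subset of the planner's scopes.
child = delegate(planner, frozenset({"crm.read"}), depth=0)
print(sorted(child))

# Escalation is refused: the planner never held payments.submit.
try:
    delegate(planner, frozenset({"crm.read", "payments.submit"}), depth=0)
except PermissionError as exc:
    print(exc)
```

Because each hop can only narrow authority, the effective permissions of any agent in the chain are bounded by those of the first agent, which keeps the chain's threat model tractable.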