Budgets and Quotas: Preventing Runaway Agent Costs

AI agents consume resources on your behalf without a human approving each action. Every LLM call, every tool invocation, every sub-agent spawned adds to a bill that can compound quickly when something goes wrong. The answer is to treat agent spend the way you treat infrastructure spend — define budgets upfront, enforce them automatically, and plan your response to crossing a threshold before it happens. This post covers how to design that system. For an introduction to the broader FinOps discipline for AI, see FinOps for AI Agents: Controlling Token and Tool Costs.

Why informal cost controls fail

Most teams start with informal cost controls: watch the dashboard, set a billing alert from the LLM provider, and hope someone notices before the number gets large. This has three structural weaknesses.

First, provider-level alerts fire after the fact. By the time the email arrives, the spend has happened, and if the agent is still running it keeps spending until someone intervenes manually. Second, provider alerts are aggregate: they tell you the organization spent too much, not which agent or team caused it. Without attribution you cannot take targeted action; cutting off access to stop one runaway agent also cuts off everything else. For the specific failure modes that lead to runaway spend, see Threat Model: Runaway Agent Spend.

Finally, a single provider alert cannot express per-team or per-workflow limits. There is no way to say "team A can spend $200 per week and team B can spend $500 per month" with a generic billing threshold.

The anatomy of a budget policy

A useful budget policy has four components: a scope, a target, a period, and one or more threshold actions.

Scope and target together define what the budget applies to: the scope names the level (agent, workflow, team, or organization) and the target names the specific entity at that level. The most granular scope is a single agent, which is useful when one expensive agent should be isolated from the rest of the fleet. Wider scopes can cover a workflow, a team, or the entire organization. Layering these scopes lets you express policies like "each agent is capped at $50/week, and the whole organization is capped at $2,000/month."

Period defines when spend resets. Common choices are daily, weekly, monthly, and total (lifetime, no reset). The right period depends on your billing cycle and how predictable your workload is. A workflow triggered by external events might need a daily cap to contain blast radius from a spike. A long-running research workflow might need a total cap so it does not exceed a fixed project budget.
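As a rough illustration of how a reset boundary might be computed, the sketch below returns the start of the current budget window for each period. The function name `period_start`, the use of UTC, and the Monday week start are assumptions made for this example; the post does not prescribe any of them.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def period_start(period: str, now: datetime) -> Optional[datetime]:
    """Return when the current budget window began, or None for a lifetime budget.

    Hypothetical helper: assumes UTC timestamps and weeks that start on Monday.
    """
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight
    if period == "weekly":
        # weekday() is 0 for Monday, so this steps back to Monday 00:00.
        return midnight - timedelta(days=midnight.weekday())
    if period == "monthly":
        return midnight.replace(day=1)
    if period == "total":
        # A lifetime budget never resets, so there is no window start.
        return None
    raise ValueError(f"unknown period: {period!r}")


now = datetime(2024, 5, 15, 13, 30, tzinfo=timezone.utc)
print(period_start("weekly", now))
```

Spend recorded at or after the returned timestamp counts toward the current window; anything earlier belongs to a previous one.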

Threshold actions are what happens when spend crosses a percentage of the budget. There are broadly four useful actions:

Alert — send a notification to configured channels. No agent is stopped; the team is simply informed. Appropriate at 70–80% of budget so there is time to react.
Throttle — allow the agent to continue but impose rate constraints that slow its spend rate. Useful when the task is time-sensitive and a hard stop would be worse than a slowdown.
Pause — halt any in-progress workflow runs and prevent new ones from starting. The agent is preserved; it just cannot act. Appropriate when you want a human to review before resuming.
Block — prevent any further agent dispatch outright until the budget is reset or raised. The strictest option, and the right default for production systems where cost certainty matters.

These actions are not mutually exclusive: a single policy can attach a different action to each of several thresholds. A well-designed policy might alert at 80%, throttle at 90%, and block at 100%.
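One way to picture the four components and graduated thresholds is as a small data structure. This is a minimal sketch, not Praesidia's schema: the class name `BudgetPolicy`, its field names, and the `triggered_actions` method are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALERT = "alert"
    THROTTLE = "throttle"
    PAUSE = "pause"
    BLOCK = "block"


@dataclass(frozen=True)
class BudgetPolicy:
    """Hypothetical policy record: scope, target, period, and threshold actions."""
    scope: str        # "agent", "workflow", "team", or "organization"
    target: str       # identifier of the entity the policy applies to
    period: str       # "daily", "weekly", "monthly", or "total"
    limit_usd: float
    # Each entry pairs a fraction of the limit with the action taken at that level.
    thresholds: tuple[tuple[float, Action], ...] = ()

    def triggered_actions(self, spent_usd: float) -> list[Action]:
        """Return every action whose threshold the current spend has reached."""
        fraction = spent_usd / self.limit_usd
        ordered = sorted(self.thresholds, key=lambda pair: pair[0])
        return [action for level, action in ordered if fraction >= level]


policy = BudgetPolicy(
    scope="agent",
    target="research-bot",  # made-up agent name
    period="weekly",
    limit_usd=50.0,
    thresholds=((0.8, Action.ALERT), (0.9, Action.THROTTLE), (1.0, Action.BLOCK)),
)
print(policy.triggered_actions(46.0))  # spend is at 92% of the limit
```

At $46 of a $50 limit, the alert and throttle thresholds have both been reached, while the block threshold has not.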

Reservation-based enforcement

A naive implementation counts spend after the fact: the agent runs, the cost is recorded, and the counter is compared to the budget. The problem is concurrency. If ten workflow runs start simultaneously, each starting below the budget threshold, all ten can complete and collectively push spend well past the limit before any individual run sees a block.

The correct model is reservation-based enforcement. Before a task is dispatched, the system reserves the estimated cost against every applicable budget policy. If the reservation would push any policy past its blocking threshold, the dispatch is denied before any LLM call is made. If the task completes, the actual cost is committed and the reservation released. If the task fails or is cancelled, the reservation is released without committing.

This approach provides two properties that tracking alone cannot: concurrency safety (no double-spending through simultaneous dispatch) and pre-emptive blocking (tasks are stopped before they spend, not after).
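The reserve, commit, and release cycle can be sketched for a single budget as follows. This is an in-memory illustration under stated assumptions: `BudgetLedger` and its method names are hypothetical, a `threading.Lock` stands in for whatever atomic store a real system would use, and a production version would reserve against every applicable policy rather than one.

```python
import threading


class BudgetLedger:
    """Hypothetical single-policy ledger with reservation-based enforcement."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.committed_usd = 0.0
        self._reserved: dict[str, float] = {}  # task_id -> estimated cost
        self._lock = threading.Lock()

    def reserve(self, task_id: str, estimate_usd: float) -> bool:
        """Hold the estimated cost; deny if it would push spend past the limit."""
        with self._lock:
            held = sum(self._reserved.values())
            if self.committed_usd + held + estimate_usd > self.limit_usd:
                return False  # denied before any LLM call is made
            self._reserved[task_id] = estimate_usd
            return True

    def commit(self, task_id: str, actual_usd: float) -> None:
        """Task completed: drop the reservation and record the actual cost."""
        with self._lock:
            self._reserved.pop(task_id)  # KeyError if nothing was reserved
            self.committed_usd += actual_usd

    def release(self, task_id: str) -> None:
        """Task failed or was cancelled: drop the reservation, record nothing."""
        with self._lock:
            self._reserved.pop(task_id, None)


ledger = BudgetLedger(limit_usd=10.0)
granted = [ledger.reserve(f"run-{i}", 3.0) for i in range(10)]
print(sum(granted))  # number of dispatches admitted out of ten
```

Because the check and the hold happen under one lock, ten simultaneous dispatch attempts cannot each see the same headroom. Committing the actual cost rather than the estimate is also where estimation error is absorbed.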

The accuracy of blocking depends on the quality of cost estimation. Estimating token counts before an LLM call is inherently approximate, which is why most systems combine reservation-based blocking with a periodic reconciliation step that corrects for estimation error after actual costs are known. For a deeper look at how hard caps interact with rate limiting as a complementary control, see Budgets vs Rate Limits: Controlling Agent Consumption.

Scoping budgets to match your org structure

The most common mistake in budget design is choosing a scope that is too coarse. A single organization-wide cap tells you nothing about where spend is going and makes it impossible to give individual teams meaningful autonomy.

A more useful mental model is a three-level hierarchy: an organization cap as the absolute ceiling, team or workflow caps as allocations within that ceiling, and per-agent caps that isolate experimental or high-variance agents from stable production ones. A runaway agent hits its own cap first; the team cap is the second line of defense; the org cap is the backstop. For how to set those per-agent caps in practice, see Budget Policies: Hard Spend Caps for AI Agents.

When multiple scopes apply to a single dispatch — which is typical — the most restrictive applies. If an agent's individual cap is healthy but the team cap is exhausted, the dispatch is blocked.
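The "most restrictive applies" rule amounts to checking the estimate against every budget in the hierarchy and allowing the dispatch only if none would be exceeded. The sketch below assumes hypothetical names (`ScopeBudget`, `blocking_budgets`) and made-up figures.

```python
from dataclasses import dataclass


@dataclass
class ScopeBudget:
    """Hypothetical snapshot of one budget in the hierarchy."""
    name: str
    limit_usd: float
    spent_usd: float

    @property
    def headroom_usd(self) -> float:
        return self.limit_usd - self.spent_usd


def blocking_budgets(estimate_usd: float, budgets: list[ScopeBudget]) -> list[str]:
    """Return the names of budgets the dispatch would exceed; empty means allowed."""
    return [b.name for b in budgets if estimate_usd > b.headroom_usd]


hierarchy = [
    ScopeBudget("agent", limit_usd=50.0, spent_usd=10.0),
    ScopeBudget("team", limit_usd=200.0, spent_usd=199.0),
    ScopeBudget("organization", limit_usd=2000.0, spent_usd=500.0),
]
print(blocking_budgets(2.0, hierarchy))
```

Here the agent and organization budgets both have room, but the team budget has only $1 of headroom, so a $2 dispatch is blocked by the team policy alone.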

Raising budgets and handling resets

Budget enforcement creates an operational question: what happens when a legitimate workflow is blocked because the budget ran out mid-month? You need a resolution path that does not require disabling enforcement entirely.

The correct answer is a controlled raise operation: an authorized user increases the ceiling (or resets the period), and the system automatically resumes any paused runs that were waiting on that policy. The failure mode to avoid is engineers switching enforcement off because that is easier than requesting a budget increase. If raising a budget is a normal, auditable action any billing admin can perform, enforcement becomes part of operations rather than an obstacle.

Budget period resets introduce one common edge case: a long-running task that spans the reset boundary. The simplest convention is to count all cost in the period the task started, though this can distort period-boundary reports. Whatever you choose, apply it consistently and document it so billing summaries are predictable.
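The "count it where it started" convention is simple to express in code. The sketch below is illustrative only: `attribute_cost`, the `YYYY-MM` bucket key, and the example timestamps are assumptions, and it covers monthly periods in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def attribute_cost(spend_by_month: dict, started_at: datetime, cost_usd: float) -> str:
    """Add the whole cost to the monthly bucket in which the task started."""
    bucket = started_at.strftime("%Y-%m")
    spend_by_month[bucket] += cost_usd
    return bucket


monthly_spend = defaultdict(float)

# Hypothetical task that starts ten minutes before the monthly reset
# and finishes twenty minutes after it.
started = datetime(2024, 1, 31, 23, 50, tzinfo=timezone.utc)
finished = datetime(2024, 2, 1, 0, 20, tzinfo=timezone.utc)

bucket = attribute_cost(monthly_spend, started, 4.20)
print(bucket)  # the finish time plays no part in the attribution
```

The full $4.20 lands in January even though part of the work ran in February, which is the distortion in period-boundary reports that the convention accepts in exchange for simplicity.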

How Praesidia approaches budget enforcement

Praesidia's budget enforcement is built around the reservation model described above. Policies are scoped to an organization, team, workflow, or individual agent, with configurable thresholds that map to alert, throttle, pause, or block actions. Every dispatch is evaluated against all applicable policies before any task is queued. Raising a budget clears the armed state and automatically resumes paused runs.

The budget surface is available to organization owners and billing administrators. Cost summaries and policy status are visible from the monitoring dashboard. See Credits and Cost Monitoring for Agent Spend for guidance on understanding your spend baseline before defining policies.

The goal of enforcement is not to make agents artificially cheap — it is to make costs predictable and to bound the consequences of a misconfiguration. An agent that runs correctly will spend within its budget naturally. Treat budget policies as a first-class part of your deployment checklist: define the scope, choose a period, configure graduated threshold actions, and revisit the policy during regular operational reviews.

Common questions

Can I set a budget on a single workflow run rather than across all runs?

Per-run budgets are a distinct concept from aggregate budget policies. A per-run cap limits how much a single execution of a workflow can spend before being stopped, regardless of how much headroom the team or org-level policy has. This keeps any individual run from becoming a runaway even when overall budget health is good. Per-run caps and aggregate policies work together: both must be satisfied for a dispatch to proceed.
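The "both must be satisfied" rule can be sketched as a single check. The names `RunMeter` and `may_proceed` are hypothetical, and the aggregate headroom is passed in as a plain number for brevity.

```python
from dataclasses import dataclass


@dataclass
class RunMeter:
    """Hypothetical tracker for what one workflow run has spent so far."""
    run_cap_usd: float
    spent_usd: float = 0.0


def may_proceed(meter: RunMeter, step_cost_usd: float, aggregate_headroom_usd: float) -> bool:
    """Allow the next step only if it fits both the per-run cap and the aggregate budget."""
    within_run_cap = meter.spent_usd + step_cost_usd <= meter.run_cap_usd
    within_aggregate = step_cost_usd <= aggregate_headroom_usd
    return within_run_cap and within_aggregate


# A run near its own cap is stopped even though the team budget has plenty of room.
near_cap = RunMeter(run_cap_usd=5.0, spent_usd=4.5)
print(may_proceed(near_cap, step_cost_usd=1.0, aggregate_headroom_usd=1000.0))
```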

What happens to a paused run when the budget is raised?

When a budget ceiling is raised or a period is reset, any workflow runs that were paused specifically because of that policy are eligible to resume automatically. The enforcement layer checks whether the new ceiling provides enough headroom for the reserved cost of each paused run and resumes those that fit. Runs paused for other reasons, such as a manual pause or another policy still over threshold, are not affected.
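A minimal sketch of that resume check follows. The `PausedRun` record, the `runs_to_resume` function, and the first-come-first-served ordering are assumptions for illustration; the post does not say in what order paused runs are considered.

```python
from dataclasses import dataclass


@dataclass
class PausedRun:
    """Hypothetical record of a paused workflow run."""
    run_id: str
    paused_by: str       # id of the policy that paused it, or "manual"
    reserved_usd: float  # estimated cost the run still needs


def runs_to_resume(paused: list[PausedRun], policy_id: str, headroom_usd: float) -> list[str]:
    """Pick the runs paused by this policy that fit in the new headroom, in list order."""
    resumed = []
    for run in paused:
        if run.paused_by != policy_id:
            continue  # paused for another reason; leave it alone
        if run.reserved_usd <= headroom_usd:
            resumed.append(run.run_id)
            headroom_usd -= run.reserved_usd  # later runs see less headroom
    return resumed


paused = [
    PausedRun("r1", "team-a", 30.0),
    PausedRun("r2", "manual", 5.0),
    PausedRun("r3", "team-a", 80.0),
    PausedRun("r4", "team-a", 20.0),
]
print(runs_to_resume(paused, "team-a", headroom_usd=60.0))
```

With $60 of new headroom, `r1` and `r4` fit, `r3` is too large and stays paused, and the manually paused `r2` is never considered.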

How do I know which policy caused a block?

When a dispatch is blocked, the enforcement layer records which policy or policies triggered the block, along with the current spend, the threshold, and the time of the event. That record appears in the audit trail and is surfaced in the agent's status. You should never be in a position where an agent is blocked and you cannot determine why.
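To make that concrete, a block record could carry the fields listed above in a form that is easy to write to an audit log. The shape below is a hypothetical sketch, not Praesidia's actual audit schema; every field name and identifier is invented for the example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class BlockRecord:
    """Hypothetical audit entry written when a dispatch is blocked."""
    agent_id: str
    policy_ids: tuple[str, ...]  # every policy that triggered the block
    spend_usd: float             # spend at the time of the block
    threshold_usd: float         # the threshold that was crossed
    blocked_at: str              # ISO 8601 timestamp


def make_block_record(agent_id: str, policy_ids: list[str], spend_usd: float,
                      threshold_usd: float, now: Optional[datetime] = None) -> BlockRecord:
    """Build a record, defaulting the timestamp to the current UTC time."""
    now = now or datetime.now(timezone.utc)
    return BlockRecord(agent_id, tuple(policy_ids), spend_usd, threshold_usd, now.isoformat())


record = make_block_record("research-bot", ["team-a-monthly"], 200.0, 200.0)
print(json.dumps(asdict(record)))  # one JSON line per block event
```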