Billing reliability in a SaaS platform depends on more than accepting a payment. Every event that Stripe sends (subscription changes, failed charges, disputes, refunds) must be applied in the right order, be processed exactly once, and leave your local state consistent with Stripe's. When those guarantees break down, you end up with ghost subscriptions, double credits, or customers who paid but are still blocked. Signed webhooks, periodic reconciliation, and an explicit dunning state machine are the three mechanisms that prevent that from happening. For the broader context of how subscriptions and invoices fit together, see subscriptions, invoices, and contracts for AI platforms.

Why Webhooks Alone Are Not Enough

Stripe's webhook delivery is reliable but not guaranteed to arrive in order, and not guaranteed to arrive at all if your endpoint is temporarily unreachable. An event can be retried multiple times. If your handler is not idempotent, a retried invoice.paid event can credit an account twice. If your endpoint is down during a customer.subscription.deleted event, you may keep a customer on a paid plan indefinitely.

This is why a robust billing integration needs three separate layers working together: secure ingestion of webhook events, a reconciliation process that periodically checks local state against Stripe's API, and a dunning process that manages the lifecycle of accounts with failed payments.

Verifying the Signature Before Doing Anything

Every Stripe webhook carries an HMAC-SHA256 signature in the Stripe-Signature header, alongside the timestamp at which the event was signed. The signature is computed over that timestamp and the raw request body, joined by a period, using your webhook signing secret. Verifying this signature before processing the payload is the most important step: it prevents an attacker from crafting a fake invoice.paid event and crediting an account without actually paying.

The verification must be done on the raw, unmodified request body, not a parsed JSON representation, because even a single whitespace difference will invalidate the signature. Verification should also include a timestamp check to limit replay attacks: Stripe's official libraries reject signatures whose timestamp is more than five minutes old by default, and a hand-written handler should apply a similar tolerance.

Praesidia's webhook endpoint verifies the Stripe signature before any processing runs. Events that fail verification are rejected at the door and never acted on. Only after the signature check passes does the system look at the event type and route it to the appropriate handler.
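The check described above can be sketched with the Python standard library alone. This is a minimal illustration, not Praesidia's implementation: the function name verify_stripe_signature and its parameters are hypothetical, and the 300-second tolerance is an assumed default. In production, the verification helper in Stripe's official client library is usually the safer choice.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            tolerance: int = 300,
                            now: Optional[float] = None) -> bool:
    """Return True only if the header carries a valid, recent v1 signature."""
    timestamp = None
    candidates = []
    # The header looks like "t=<unix time>,v1=<hex digest>[,v1=...]".
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)

    try:
        signed_at = int(timestamp)
    except (TypeError, ValueError):
        return False
    if not candidates:
        return False

    # Reject stale timestamps to limit replay attacks.
    current = time.time() if now is None else now
    if abs(current - signed_at) > tolerance:
        return False

    # The signed payload is "<timestamp>.<raw body>", byte for byte.
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()

    # Constant-time comparison against every v1 signature in the header.
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )
```

The function takes the body as bytes on purpose: the caller has to pass the request body exactly as received, before any JSON parsing or re-serialization.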

Idempotent Event Handlers Across 12 Event Types

Stripe can deliver the same event more than once. An idempotent handler produces the same outcome whether it runs once or ten times. The standard pattern is to tie each credit or state change to a unique identifier that comes from Stripe—the event ID, the invoice ID, or the charge ID—and enforce uniqueness at the storage level. An attempt to insert a duplicate is a no-op rather than an error.
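As a rough sketch of that pattern, the example below uses SQLite with a uniqueness constraint on an organization and reference pair. The table name credit_ledger, its columns, and the apply_credit helper are assumptions made for illustration, not a description of any particular platform's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE credit_ledger (
        org_id    TEXT NOT NULL,
        reference TEXT NOT NULL,      -- e.g. a Stripe invoice or charge ID
        amount    INTEGER NOT NULL,   -- smallest currency unit
        UNIQUE (org_id, reference)
    )
""")


def apply_credit(conn: sqlite3.Connection, org_id: str,
                 reference: str, amount: int) -> bool:
    """Record a credit once; return False if this reference was already applied."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO credit_ledger (org_id, reference, amount) "
            "VALUES (?, ?, ?)",
            (org_id, reference, amount),
        )
    # rowcount is 1 when the row was inserted and 0 when the constraint
    # caused the insert to be skipped.
    return cursor.rowcount == 1


# A retried invoice.paid event carries the same invoice ID, so the second
# call changes nothing.
first = apply_credit(conn, "org_1", "in_123", 500)
second = apply_credit(conn, "org_1", "in_123", 500)
```

Because the database enforces uniqueness, two handler processes receiving the same event concurrently still cannot both apply the credit.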

Praesidia handles the full set of billing-relevant events: subscription changes and deletions, invoice paid and payment failed, payment method detachment, the complete dispute lifecycle (created, updated, closed, funds withdrawn, funds reinstated), and refund events. Each handler is designed so that receiving the same event twice does not produce double credits or duplicate state transitions.

A dead-letter queue captures events whose processing fails on the initial attempt. This means a transient database error or downstream failure does not silently discard an event; it is queued for retry, giving the system a chance to process it once the underlying issue is resolved.
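The capture-and-retry behavior can be sketched as follows. The DeadLetterQueue class, its method names, and the max_attempts limit are hypothetical. The queue is held in memory only to keep the example short; a real one would need durable storage so that queued events survive a restart.

```python
from collections import deque
from typing import Callable, Deque, List

Handler = Callable[[dict], None]


class DeadLetterQueue:
    """Holds events whose handler raised, so they can be retried later."""

    def __init__(self, max_attempts: int = 5) -> None:
        self.pending: Deque[dict] = deque()
        self.abandoned: List[dict] = []  # needs human attention
        self.max_attempts = max_attempts

    def process(self, event: dict, handler: Handler) -> bool:
        """Run the handler once; on failure, queue the event instead of dropping it."""
        try:
            handler(event)
            return True
        except Exception as exc:
            self.pending.append(
                {"event": event, "attempts": 1, "last_error": repr(exc)}
            )
            return False

    def retry_pending(self, handler: Handler) -> int:
        """Retry each queued event once; return how many succeeded."""
        succeeded = 0
        for _ in range(len(self.pending)):
            entry = self.pending.popleft()
            try:
                handler(entry["event"])
                succeeded += 1
            except Exception as exc:
                entry["attempts"] += 1
                entry["last_error"] = repr(exc)
                if entry["attempts"] >= self.max_attempts:
                    self.abandoned.append(entry)
                else:
                    self.pending.append(entry)
        return succeeded
```

Retrying is only safe because the handlers are idempotent: an event that partly succeeded before failing can be replayed without applying its effects twice.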

Reconciliation: The Safety Net Between Webhooks

Even with idempotent handlers and a dead-letter queue, drift can accumulate. A webhook that was never delivered, a handler correction applied after the fact, a Stripe-side adjustment—any of these can leave local state out of sync with Stripe's ledger. For a complementary perspective on keeping financial records accurate, see revenue monitoring and payouts for AI marketplaces.

Reconciliation is a scheduled process that runs independently of the webhook path. It queries Stripe's API for the current state of subscriptions and invoices, compares that against the platform's recorded state, and corrects any discrepancies it finds. Because it runs on a schedule rather than in response to events, it catches problems that the event stream missed.

The reconciliation process has no HTTP surface: it is triggered by a scheduler, not by inbound requests. This keeps it isolated from the webhook path, so a disruption in one does not affect the other.
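The comparison at the heart of reconciliation can be sketched as a pure function over two snapshots. The example assumes the remote snapshot has already been fetched from Stripe's API and reduced to a mapping of subscription ID to status. The Correction type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Correction:
    subscription_id: str
    local_status: Optional[str]   # None: we have no record of it
    remote_status: Optional[str]  # None: Stripe did not return it


def reconcile(local: Dict[str, str], remote: Dict[str, str]) -> List[Correction]:
    """List every subscription whose local status disagrees with Stripe's."""
    corrections = []
    for sub_id in sorted(local.keys() | remote.keys()):
        local_status = local.get(sub_id)
        remote_status = remote.get(sub_id)
        if local_status != remote_status:
            corrections.append(Correction(sub_id, local_status, remote_status))
    return corrections


def apply_corrections(local: Dict[str, str],
                      corrections: List[Correction]) -> List[Correction]:
    """Adopt Stripe's status where it has one; return the rest for review."""
    needs_review = []
    for correction in corrections:
        if correction.remote_status is None:
            # Present locally but absent remotely: do not guess, flag it.
            needs_review.append(correction)
        else:
            local[correction.subscription_id] = correction.remote_status
    return needs_review
```

Keeping detection separate from correction makes it possible to log or alert on drift before anything is changed, which helps when the drift points to a handler bug rather than a missed event.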

The Dunning State Machine

Dunning is the process of handling accounts with failed payments. A naive approach is to immediately suspend an account when a payment fails. A better approach is a structured state machine with a grace period: give the customer time to update their payment method before taking action, then escalate if the situation is not resolved.

The standard state sequence is: payment fails → account enters a past-due state → a grace period begins → if no successful payment arrives within the grace window, the account is suspended → suspended accounts are downgraded and blocked from further purchases.

Praesidia's dunning state machine is driven by invoice.payment_failed and invoice.paid events from Stripe, combined with a scheduled process that advances state when a grace period expires. When an account is suspended, the platform enforces that suspension by preventing further billable activity and surfacing a clear, actionable message to the user rather than failing silently. If a payment does succeed, the state machine moves the account back to good standing immediately.
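A state machine of this shape might look like the sketch below, with event-driven transitions and a scheduled tick that expires the grace period. The state names, the seven-day GRACE_PERIOD, and the Account type are assumptions made for illustration. The downgrade and purchase block that accompany suspension are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class DunningState(Enum):
    GOOD_STANDING = "good_standing"
    PAST_DUE = "past_due"
    SUSPENDED = "suspended"


GRACE_PERIOD = timedelta(days=7)  # assumed value; set by policy in practice


@dataclass
class Account:
    state: DunningState = DunningState.GOOD_STANDING
    grace_deadline: Optional[datetime] = None


def on_event(account: Account, event_type: str, now: datetime) -> None:
    """Apply a Stripe event to the account's dunning state."""
    if event_type == "invoice.payment_failed":
        # Only the first failure starts the grace window; repeated or
        # retried failure events do not extend it.
        if account.state is DunningState.GOOD_STANDING:
            account.state = DunningState.PAST_DUE
            account.grace_deadline = now + GRACE_PERIOD
    elif event_type == "invoice.paid":
        # A successful payment restores good standing from any state.
        account.state = DunningState.GOOD_STANDING
        account.grace_deadline = None


def on_tick(account: Account, now: datetime) -> None:
    """Scheduled check: suspend accounts whose grace period has expired."""
    if (account.state is DunningState.PAST_DUE
            and account.grace_deadline is not None
            and now >= account.grace_deadline):
        account.state = DunningState.SUSPENDED
```

Both functions are idempotent: replaying invoice.payment_failed or running the tick twice leaves the account in the same state.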

The platform surfaces context-specific guidance to authenticated users when their access is blocked for a billing reason, so they understand what action is needed rather than seeing a generic access denied message.

Refund Durability

Refunds present a specific reliability challenge: if the request to Stripe succeeds but the local acknowledgment fails, you can end up with a refund that Stripe issued but that your ledger does not reflect. The opposite problem—issuing two refunds because the first was not confirmed—is even worse.

The solution is to record the refund obligation in your database before issuing it to Stripe, and to use a retry process that is safe to run repeatedly. Praesidia tracks refund obligations as durable records with automatic retry handling, so a failed acknowledgment does not result in a lost refund or a double refund.
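One way to sketch the record-first approach is shown below. The refund_obligations table, the function names, and the issue_refund callable are hypothetical. The callable stands in for a Stripe client call and is assumed to honor an idempotency key, which Stripe's API accepts on POST requests, so that repeating a request with the same key does not create a second refund.

```python
import sqlite3
import uuid
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE refund_obligations (
        id               TEXT PRIMARY KEY,  -- doubles as the idempotency key
        charge_id        TEXT NOT NULL,
        amount           INTEGER NOT NULL,
        status           TEXT NOT NULL DEFAULT 'pending',
        stripe_refund_id TEXT
    )
""")

# issue_refund(charge_id, amount, idempotency_key) -> refund ID
RefundCall = Callable[[str, int, str], str]


def record_refund(conn: sqlite3.Connection, charge_id: str, amount: int) -> str:
    """Persist the obligation before any request is sent to Stripe."""
    obligation_id = str(uuid.uuid4())
    with conn:
        conn.execute(
            "INSERT INTO refund_obligations (id, charge_id, amount) VALUES (?, ?, ?)",
            (obligation_id, charge_id, amount),
        )
    return obligation_id


def process_pending(conn: sqlite3.Connection, issue_refund: RefundCall) -> int:
    """Try every pending refund; safe to run repeatedly. Returns refunds confirmed."""
    rows = conn.execute(
        "SELECT id, charge_id, amount FROM refund_obligations "
        "WHERE status = 'pending'"
    ).fetchall()
    confirmed = 0
    for obligation_id, charge_id, amount in rows:
        try:
            # Reusing the obligation ID as the key makes a retry a repeat
            # of the same request rather than a new refund.
            refund_id = issue_refund(charge_id, amount, obligation_id)
        except Exception:
            continue  # stays pending; the next run tries again
        with conn:
            conn.execute(
                "UPDATE refund_obligations "
                "SET status = 'issued', stripe_refund_id = ? WHERE id = ?",
                (refund_id, obligation_id),
            )
        confirmed += 1
    return confirmed
```

If the response from Stripe is lost after the refund was actually issued, the row stays pending and the next run repeats the request with the same key, so the ledger catches up without a second refund being created.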

Common questions

What happens if our webhook endpoint is down for several hours? Stripe retries delivery for up to 72 hours with exponential backoff, so events redelivered after your endpoint recovers are processed as usual. Any events that exhaust their retries are caught by the reconciliation process, which compares local state against Stripe's API on a schedule. The combination of retry, dead-letter capture, and reconciliation means an outage of this length does not produce lasting inconsistencies.

How does the system prevent the same payment event from crediting an account twice? Each credit transaction is recorded with a reference identifier derived from the Stripe event: invoice ID, charge ID, or similar. A uniqueness constraint on the combination of organization and reference identifier means a duplicate attempt is rejected at the storage level and treated as a no-op, rather than silently applying a second credit.

When does a suspended account get automatically downgraded? The dunning state machine advances through its states on a combination of event triggers (payment failed, payment succeeded) and scheduled checks. A suspended account is downgraded to a lower plan tier automatically if the suspension persists beyond the configured grace window. The exact timing is governed by your organization's dunning policy, which Praesidia applies consistently across all tenants. For more on how plan tiers and feature access interact, see plan gating and feature flags.

How do disputes interact with the dunning state machine? Disputes (chargebacks) follow their own lifecycle — created, updated, funds withdrawn, funds reinstated, closed — and each transition is handled by a dedicated event handler. A dispute does not automatically trigger the dunning flow, but it does put the account into a hold state that blocks further purchases until the dispute resolves. The hold is released when Stripe sends a funds-reinstated or closed event in the customer's favor. For how billing state connects to subscription access and plan gating, see plan gating and feature flags.

Keeping Billing State Correct

The combination of signature-verified webhook ingestion, idempotent handlers covering all relevant event types, a dead-letter queue for failed deliveries, periodic reconciliation against the Stripe API, and an explicit dunning state machine is what keeps billing state reliable in practice. Each layer addresses a different failure mode; removing any one of them leaves a gap that will eventually surface as a billing inconsistency. For the broader FinOps picture of how spend is monitored and attributed across your AI estate, see credits and cost monitoring for agent spend.

Praesidia is designed to handle this complexity as infrastructure rather than application code, so your team can reason about billing state without building and maintaining all of these mechanisms from scratch.