Stripe Webhook Inbox Pattern
Summary
The Stripe webhook handler (/api/stripe/webhook) is a thin ack-and-enqueue endpoint. All real provisioning work happens in the cron worker at /api/cron/process-stripe-webhooks.
Why: Inline provisioning inside the webhook handler is fragile — cold starts, long-running LiveKit/Telnyx/Resend calls, transient DB errors — and has silently lost paid customers. The inbox pattern durably queues every event so Stripe always gets a fast 200, and the cron retries failed processing with exponential backoff.
Flow
Stripe sends event
→ POST /api/stripe/webhook
→ Verify HMAC signature (whsec_...)
→ INSERT into stripe_webhook_inbox (idempotent via ON CONFLICT on stripe_event_id)
→ Return 200 immediately
→ Fire inline fast-path via Promise.resolve().then(processStripeEvent) — non-blocking
→ /api/cron/process-stripe-webhooks (every 1 min)
→ SELECT status='pending' rows from stripe_webhook_inbox
→ Call processStripeEvent(event) for each
→ On success: mark status='succeeded'
→ On failure: mark status='failed', increment attempt_count, schedule next_retry_at
→ Abandon after 3 failures: P0 alert via reportError
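The failure branch of the flow above can be sketched as a pure state transition. This is a minimal illustration, not the worker's actual code: the `InboxRow` shape, the `recordFailure` helper, and the one-minute base delay that doubles per attempt are all assumptions. Only the column names and the three-failure limit come from this document.

```typescript
// Hypothetical row shape using a subset of the stripe_webhook_inbox columns.
interface InboxRow {
  stripe_event_id: string;
  status: "pending" | "succeeded" | "failed";
  attempt_count: number;
  next_retry_at: Date | null;
  last_error: string | null;
}

const MAX_ATTEMPTS = 3;      // from the flow: abandon after 3 failures
const BASE_DELAY_MS = 60000; // assumption: 1 minute, doubled on each further failure

// Returns the updated row plus a flag telling the caller to raise the P0 alert.
function recordFailure(
  row: InboxRow,
  error: string,
  now: Date,
): { row: InboxRow; abandoned: boolean } {
  const attemptCount = row.attempt_count + 1;
  const abandoned = attemptCount >= MAX_ATTEMPTS;
  // Exponential backoff: 1x, 2x, 4x, ... the base delay.
  const delayMs = BASE_DELAY_MS * Math.pow(2, attemptCount - 1);
  return {
    row: {
      ...row,
      status: "failed",
      attempt_count: attemptCount,
      last_error: error,
      // No further retry is scheduled once the row is abandoned.
      next_retry_at: abandoned ? null : new Date(now.getTime() + delayMs),
    },
    abandoned,
  };
}
```

In this sketch the caller would invoke reportError when `abandoned` is true; keeping the transition pure makes the backoff schedule easy to unit-test without a database.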
B2C Products Guard
B2C_PRODUCTS is a module-scope exported Set in webhook/route.ts. It contains plan keys for B2C products that store subscription state in user_subscriptions + profiles, NOT premium_churches:
sermon_pro, share_wise_pro, share_wise_business, share_wise_agency, ai_starter_kit, social
Why module-scope: Before 2026-05-11 this set was duplicated inline in processStripeEvent() and also in stripe-supabase-reconciliation/route.ts. Duplication risked drift whenever a new B2C product was added. It is now defined once and imported everywhere.
False-positive history (2026-05-08/09): Two P0 ops_errors rows fired with the message "checkout.session.completed missing fields: churchId=undefined" for SermonWise (sermon_pro) checkouts. Root cause: the legacy-flow reportError call inside the checkout.session.completed switch case was positioned BEFORE the B2C product break, so B2C checkouts, which never carry a churchId, were reported as broken. PR #386 (2026-05-09) moved the B2C break above the legacy-flow reportError, eliminating the false positive. The two stale P0 rows were closed via migration 2026-05-11-close-stale-p0-and-dedup-p1.sql.
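The ordering fix can be illustrated with a small sketch. The plan keys are the ones listed above; the `classifyCheckout` helper, its return values, and the injected `reportError` callback are hypothetical and stand in for the real switch case, which is not shown in this document.

```typescript
// Plan keys from this document. The real set is exported from webhook/route.ts;
// the export keyword is omitted here so the sketch runs as a standalone script.
const B2C_PRODUCTS: ReadonlySet<string> = new Set([
  "sermon_pro",
  "share_wise_pro",
  "share_wise_business",
  "share_wise_agency",
  "ai_starter_kit",
  "social",
]);

type CheckoutOutcome = "b2c" | "church" | "missing_fields";

// Hypothetical stand-in for the checkout.session.completed case.
function classifyCheckout(
  planKey: string,
  churchId: string | undefined,
  reportError: (message: string) => void,
): CheckoutOutcome {
  // The B2C check runs first: these plans live in user_subscriptions + profiles
  // and never carry a churchId, so they must exit before the legacy check.
  if (B2C_PRODUCTS.has(planKey)) {
    return "b2c";
  }
  // Legacy church flow: a missing churchId here is a real error.
  if (!churchId) {
    reportError(`checkout.session.completed missing fields: churchId=${churchId}`);
    return "missing_fields";
  }
  return "church";
}
```

With the guard first, a `sermon_pro` checkout without a churchId returns `"b2c"` and never reaches reportError, which is the behavior the fix was meant to restore.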
Auto-Resolver
resolveWebhookMissingFieldsFalsePositives() in stripe-supabase-reconciliation/route.ts runs on every reconciliation cron tick. It pattern-matches ops_errors.message for the false-positive prefix, verifies the corresponding stripe_webhook_inbox row shows status='succeeded', and auto-resolves the ops_error row with a documented fix_summary. This handles any future pre-fix artifacts without manual SQL.
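The matching logic described above might look roughly like the following. This is a sketch under stated assumptions: the `OpsErrorRow` shape, the `stripe_event_id` link between an ops_errors row and its inbox row, the exact prefix string, and the fix_summary wording are all hypothetical, and the real function reads from and writes to the database rather than working on arrays.

```typescript
// Hypothetical shape; the real ops_errors schema is not shown in this document.
interface OpsErrorRow {
  id: number;
  message: string;
  status: string;
  stripe_event_id: string | null; // assumed link to stripe_webhook_inbox
  fix_summary?: string;
}

// Assumed prefix, taken from the false-positive message quoted in this document.
const FALSE_POSITIVE_PREFIX = "checkout.session.completed missing fields";

// Returns a new array in which confirmed false positives are marked resolved.
function resolveFalsePositives(
  errors: OpsErrorRow[],
  inboxStatusByEventId: Map<string, string>,
): OpsErrorRow[] {
  return errors.map((row) => {
    const matchesPattern =
      row.status !== "resolved" &&
      row.stripe_event_id !== null &&
      row.message.startsWith(FALSE_POSITIVE_PREFIX);
    if (!matchesPattern) return row;

    // Resolve only when the inbox confirms the event was processed successfully.
    const inboxStatus = inboxStatusByEventId.get(row.stripe_event_id as string);
    if (inboxStatus !== "succeeded") return row;

    return {
      ...row,
      status: "resolved",
      fix_summary: "Auto-resolved: false positive, inbox row succeeded",
    };
  });
}
```

Requiring the inbox row to show `succeeded` is what keeps the resolver from hiding a genuine failure that happens to share the message prefix.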
Staleness Decay
The daily-audit cron (/api/cron/daily-audit) runs ops_errors staleness decay: any row with status='new' AND last_seen_at < now() - 30 days is flipped to status='stale'. This prevents ancient never-recurring errors from polluting P0/P1 counts in the morning brief. status is a plain text column — no enum migration required.
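As an illustration of the decay rule, the sketch below applies it to in-memory rows. The `StaleCandidate` shape and the `applyStalenessDecay` helper are hypothetical; the actual cron presumably performs the equivalent update in the database.

```typescript
// Hypothetical shape holding only the two columns the rule depends on.
interface StaleCandidate {
  id: number;
  status: string; // plain text column, so "stale" needs no enum change
  last_seen_at: Date;
}

const STALE_AFTER_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// Flips status='new' rows not seen within the last 30 days to status='stale'.
function applyStalenessDecay(rows: StaleCandidate[], now: Date): StaleCandidate[] {
  const cutoff = now.getTime() - STALE_AFTER_MS;
  return rows.map((row) =>
    row.status === "new" && row.last_seen_at.getTime() < cutoff
      ? { ...row, status: "stale" }
      : row,
  );
}
```

Only rows still in `new` are touched, so a row already marked `dispatched`, `resolved`, or `needs_human` keeps its status regardless of age.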
Key Tables
| Table | Role |
|---|---|
| stripe_webhook_inbox | Durable queue. Columns: stripe_event_id, event_type, event_data, status, attempt_count, next_retry_at, processed_at, last_error. |
| ops_errors | WatchTower error log. Statuses: new, dispatched, resolved, needs_human, fixed, stale. |
| founder_action_items | P0/P1 action items surfaced in the morning brief. |
Ops Monitoring
The morning brief includes a Drift/Ops line showing:
- Count of open P0 and P1 ops_errors
- Webhook 7-day success rate from stripe_webhook_inbox
See src/lib/morning-brief/compose.ts:fetchDriftOpsSnapshot().
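For orientation, a 7-day success rate could be computed along these lines. This sketch does not reproduce fetchDriftOpsSnapshot(): the `InboxStatusRow` shape, the `received_at` timestamp used for the window, and the choice to count every row in the window in the denominator are all assumptions.

```typescript
// Hypothetical shape; received_at is an assumed timestamp, not a documented column.
interface InboxStatusRow {
  status: string;
  received_at: Date;
}

const WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// Returns succeeded / total for rows inside the window, or null when there are none.
function webhookSuccessRate(rows: InboxStatusRow[], now: Date): number | null {
  const cutoff = now.getTime() - WINDOW_MS;
  const recent = rows.filter((row) => row.received_at.getTime() >= cutoff);
  if (recent.length === 0) return null; // avoid reporting 0% for an empty week
  const succeeded = recent.filter((row) => row.status === "succeeded").length;
  return succeeded / recent.length;
}
```

Returning null for an empty window lets the brief distinguish "no webhooks this week" from "every webhook failed".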