Email Health Monitoring

Why this exists

On 2026-04-10 the founder made a change in admin.google.com that resulted in churchwiseai.com's MX records being deleted. Every email sent to john@churchwiseai.com between April 10 and April 12 bounced or was queued on sender servers. Potential losses: customer signup replies, vendor invoices (Stripe, Vercel, Supabase, Anthropic, Telnyx, LiveKit, Porkbun domain renewals, Google Workspace itself), and any personal correspondence.

WatchTower did not detect this because its existing checks only covered HTTP sites and DB table row counts. This runbook describes the email-health checks that now close that gap.

What gets checked

Five domains, each with an explicit expected MX provider:

| Domain | Expected | Why |
| --- | --- | --- |
| churchwiseai.com | google | Primary founder inbox. Must always route to Google Workspace. |
| pewsearch.com | none | Outbound mail only (via the send. subdomain on SES and the mail. subdomain on Mailgun). No root inbox. |
| illustratetheword.com | none | Same: outbound only. |
| sermonwise.ai | none | Same: outbound only. |
| sharewiseai.com | porkbun-forwarder | Email forwarding via Porkbun. Any drift means forwarding is broken. |

The "Expected" value is compared against the provider detected from each domain's live MX records. Any mismatch marks the domain unhealthy.
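The comparison can be sketched roughly as follows. This is a minimal illustration, not the code in the route files: the type names, `detectProvider`, `checkDomain`, and the extra "unknown" label for unrecognized MX hosts are all assumptions made for the example. Only `resolveMx` from Node's `dns/promises` module is a real API.

```typescript
import { resolveMx } from "node:dns/promises";

// Hypothetical types; the real registry may be shaped differently.
type ExpectedMx = "google" | "porkbun-forwarder" | "none";
type DetectedMx = ExpectedMx | "unknown"; // "unknown" is an assumption for unrecognized hosts

interface MxRecord {
  exchange: string;
  priority: number;
}

// Classify a set of MX records by the hostnames they point at.
function detectProvider(records: MxRecord[]): DetectedMx {
  if (records.length === 0) return "none";
  // Normalize case and strip any trailing dot before matching.
  const hosts = records.map((r) => r.exchange.toLowerCase().replace(/\.$/, ""));
  if (hosts.every((h) => h.endsWith(".google.com"))) return "google";
  if (hosts.every((h) => h.endsWith(".porkbun.com"))) return "porkbun-forwarder";
  return "unknown";
}

// Resolve live MX records and compare the detected provider to the expected one.
async function checkDomain(domain: string, expected: ExpectedMx) {
  let records: MxRecord[] = [];
  try {
    records = await resolveMx(domain);
  } catch (err) {
    // ENODATA means the name exists but has no MX records; treat that as an empty set.
    // Any other DNS error (for example ENOTFOUND) is rethrown rather than read as "none".
    if ((err as { code?: string }).code !== "ENODATA") throw err;
  }
  const detected = detectProvider(records);
  return { domain, expected, detected, records, healthy: detected === expected };
}
```

Keeping the classification in a pure function separate from the DNS lookup makes it testable without network access.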

Where the checks live

  • src/app/api/founder/watchtower/health-checks/route.ts — on-demand check exposed in the founder WatchTower UI. Shows expected vs. detected provider, MX record list, and healthy/unhealthy badge per domain.
  • src/app/api/cron/daily-audit/route.ts — runs every morning. Any unhealthy domain becomes a drift issue (P0 founder_action_item). If churchwiseai.com is the broken domain, an SMS is sent via Telnyx as a fallback (see below).
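As a rough sketch of the daily-audit step, the per-domain results could be turned into drift issues and an SMS decision like this. The `DomainHealth` and `DriftIssue` shapes, the `summarizeAudit` name, and the issue title format are hypothetical; only the P0 `founder_action_item` classification and the churchwiseai.com special case come from this runbook.

```typescript
// Hypothetical result shape produced by the per-domain MX check.
interface DomainHealth {
  domain: string;
  expected: string;
  detected: string;
  healthy: boolean;
}

// Hypothetical drift-issue shape; the real record likely has more fields.
interface DriftIssue {
  priority: "P0";
  kind: "founder_action_item";
  title: string;
}

// The only domain whose failure also breaks the alert email itself.
const SMS_FALLBACK_DOMAIN = "churchwiseai.com";

function summarizeAudit(results: DomainHealth[]): { issues: DriftIssue[]; sendSms: boolean } {
  const unhealthy = results.filter((r) => !r.healthy);
  const issues = unhealthy.map((r): DriftIssue => ({
    priority: "P0",
    kind: "founder_action_item",
    title: `MX drift on ${r.domain}: expected ${r.expected}, detected ${r.detected}`,
  }));
  // SMS is reserved for the case where email alerts cannot be delivered.
  const sendSms = unhealthy.some((r) => r.domain === SMS_FALLBACK_DOMAIN);
  return { issues, sendSms };
}
```

Every unhealthy domain raises an issue, but only a churchwiseai.com failure sets `sendSms`, since that is the one case where the email alert cannot reach the founder.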

SMS fallback for email-is-broken

The circular failure mode: if churchwiseai.com MX is broken, the alert email itself bounces. The founder would never learn that email is down.

Fix: When daily-audit detects that churchwiseai.com specifically is unhealthy, src/lib/alert-sms.ts sends an SMS to OPS_ALERT_PHONE via Telnyx. SMS delivery does not depend on churchwiseai.com's MX records, so it reaches the founder regardless of inbox state.

Required env vars:

  • TELNYX_API_KEY — already set (used by voice-provisioning)
  • OPS_ALERT_PHONE — founder phone in E.164 format, already set
  • TELNYX_SMS_FROM — optional; defaults to +14144007103
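For orientation, a send using these variables might look like the sketch below. It is not the contents of src/lib/alert-sms.ts: `buildAlertSms` and `sendAlertSms` are hypothetical names, and the sketch assumes Telnyx's v2 Messages endpoint (`POST https://api.telnyx.com/v2/messages` with a Bearer token and a `from`/`to`/`text` JSON body) and a runtime with a global `fetch`.

```typescript
interface SmsRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the HTTP request from env vars without sending it (easy to unit test).
function buildAlertSms(env: Record<string, string | undefined>, text: string): SmsRequest {
  const apiKey = env.TELNYX_API_KEY;
  const to = env.OPS_ALERT_PHONE;
  if (!apiKey || !to) {
    throw new Error("TELNYX_API_KEY and OPS_ALERT_PHONE must be set");
  }
  // TELNYX_SMS_FROM is optional; fall back to the documented default number.
  const from = env.TELNYX_SMS_FROM || "+14144007103";
  return {
    url: "https://api.telnyx.com/v2/messages",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ from, to, text }),
    },
  };
}

// Send the alert; a non-2xx response is raised so the cron run records the failure.
async function sendAlertSms(text: string): Promise<void> {
  const { url, init } = buildAlertSms(process.env, text);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Telnyx responded with HTTP ${res.status}`);
  }
}
```

Failing loudly when a required variable is missing matters here, because a silently skipped SMS would recreate the blind spot this fallback exists to close.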

Adding a new domain

  1. Edit EMAIL_HEALTH_DOMAINS in both files (the cron and the watchtower endpoint). Keep them in sync — this is a deliberate duplication because the list rarely changes.
  2. Set expected to one of: google, porkbun-forwarder, none.
  3. Deploy.
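To illustrate the steps above, a registry entry might look like the following. The exact shape of EMAIL_HEALTH_DOMAINS is not shown in this runbook, so the `EmailHealthDomain` interface is an assumption, and `registryDiff` is a hypothetical helper that a unit test could use to catch the two copies drifting apart.

```typescript
type ExpectedMx = "google" | "porkbun-forwarder" | "none";

// Assumed entry shape; adjust to match the real constant in both files.
interface EmailHealthDomain {
  domain: string;
  expected: ExpectedMx;
}

const EMAIL_HEALTH_DOMAINS: EmailHealthDomain[] = [
  { domain: "churchwiseai.com", expected: "google" },
  { domain: "pewsearch.com", expected: "none" },
  { domain: "illustratetheword.com", expected: "none" },
  { domain: "sermonwise.ai", expected: "none" },
  { domain: "sharewiseai.com", expected: "porkbun-forwarder" },
];

// Return the domains that are missing from one list or have different expected values.
function registryDiff(a: EmailHealthDomain[], b: EmailHealthDomain[]): string[] {
  const toMap = (list: EmailHealthDomain[]) =>
    new Map<string, ExpectedMx>(list.map((d): [string, ExpectedMx] => [d.domain, d.expected]));
  const mapA = toMap(a);
  const mapB = toMap(b);
  const domains = new Set<string>(Array.from(mapA.keys()).concat(Array.from(mapB.keys())));
  return Array.from(domains)
    .filter((d) => mapA.get(d) !== mapB.get(d))
    .sort();
}
```

An empty result from `registryDiff(cronList, watchtowerList)` means the two copies agree, which keeps the deliberate duplication cheap to verify.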

Manually running the check

curl "https://churchwiseai.com/api/founder/watchtower/health-checks?token=$FOUNDER_TOKEN" | jq '.email'

Or, from the founder dashboard → WatchTower tab → "Check Now" button.

What to do when a domain shows unhealthy

  1. churchwiseai.com unhealthy — restore MX records immediately in Porkbun DNS. Google Workspace 5-record setup:

    • Priority 1: ASPMX.L.GOOGLE.COM
    • Priority 5: ALT1.ASPMX.L.GOOGLE.COM, ALT2.ASPMX.L.GOOGLE.COM
    • Priority 10: ALT3.ASPMX.L.GOOGLE.COM, ALT4.ASPMX.L.GOOGLE.COM
    • (Modern alternative: a single record, priority 1, SMTP.GOOGLE.COM.)
  2. sharewiseai.com unhealthy — restore Porkbun forwarder records (fwd1.porkbun.com, fwd2.porkbun.com) in Porkbun DNS.

  3. Any other domain unhealthy — either the MX config changed or the expected value in the registry is wrong. Investigate before "fixing" — the domain may have legitimately gained a new inbox that should now be expected='google'.
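After restoring churchwiseai.com's records, it can help to confirm that the live set matches one of the two Google layouts listed in step 1. The sketch below is illustrative only: `matchesGoogleSetup` and its helpers are hypothetical names, and the input is assumed to be the `{ exchange, priority }` objects that Node's `dns.promises.resolveMx` returns.

```typescript
interface MxRecord {
  exchange: string;
  priority: number;
}

// The five-record layout from step 1.
const GOOGLE_FIVE_RECORD_MX: MxRecord[] = [
  { priority: 1, exchange: "aspmx.l.google.com" },
  { priority: 5, exchange: "alt1.aspmx.l.google.com" },
  { priority: 5, exchange: "alt2.aspmx.l.google.com" },
  { priority: 10, exchange: "alt3.aspmx.l.google.com" },
  { priority: 10, exchange: "alt4.aspmx.l.google.com" },
];

// The single-record alternative from step 1.
const GOOGLE_SINGLE_RECORD_MX: MxRecord[] = [{ priority: 1, exchange: "smtp.google.com" }];

// Normalize a record to "priority host" so comparison ignores case and trailing dots.
const recordKey = (r: MxRecord): string =>
  `${r.priority} ${r.exchange.toLowerCase().replace(/\.$/, "")}`;

// True when both sets contain exactly the same priority/host pairs, in any order.
function sameRecords(actual: MxRecord[], wanted: MxRecord[]): boolean {
  const a = actual.map(recordKey).sort();
  const w = wanted.map(recordKey).sort();
  return a.length === w.length && a.every((k, i) => k === w[i]);
}

function matchesGoogleSetup(actual: MxRecord[]): boolean {
  return sameRecords(actual, GOOGLE_FIVE_RECORD_MX) || sameRecords(actual, GOOGLE_SINGLE_RECORD_MX);
}
```

This check is stricter than the provider detection used for the health badge: it also catches a partially restored set, such as only three of the five records.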

Known gaps

  • SPF/DKIM/DMARC are not checked. Email deliverability (sending reputation) is a separate concern tracked in knowledge/runbooks/content-ops/.
  • Domain expiry (Porkbun renewal dates) is not checked. Adding that is P1. An expired domain would cascade into broken MX, so this alert would still fire, but only as a late, secondary signal. Direct domain-expiry monitoring would give earlier warning.