Voice Agent Fallback Strategy
This document covers resilience mechanisms for the voice agent: what happens when a provider fails, how routing falls back, and what's planned for future hardening.
Currently Implemented Fallbacks
1. Routing Fallback — Unknown Church
If a call comes in on a number that isn't in PHONE_REGISTRY (static map) AND isn't found in church_voice_agents (DB lookup), the agent falls back to the Sales Agent rather than dropping the call.
Unknown number
→ DB lookup: church_voice_agents WHERE twilio_phone_number = dialed
→ Not found → Fall back to Sales Agent (never drop the call)
Similarly, if a church's data fails to load (DB error or call limit exceeded):
load_church_data() returns None
→ Fall back to Sales Agent
Where it lives: main.py → _build_church_path()
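The routing decision above can be sketched as a single function. This is a minimal illustration, not the contents of `_build_church_path()`: the `Route` type, `resolve_route`, the injected `db_lookup` and `load_church_data` callables, and the sample registry entry are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical static map; the real PHONE_REGISTRY contents are not shown in this doc.
PHONE_REGISTRY: dict[str, str] = {"+15550000001": "church-a"}

SALES_AGENT = "sales"
CHURCH_AGENT = "church"


@dataclass
class Route:
    agent: str                       # "church" or "sales"
    church_id: Optional[str] = None  # set only for church routes


def resolve_route(
    dialed: str,
    db_lookup: Callable[[str], Optional[str]],
    load_church_data: Callable[[str], Optional[dict]],
) -> Route:
    """Pick an agent for an inbound call; every path returns a route."""
    # 1. Static registry first, then the church_voice_agents lookup.
    church_id = PHONE_REGISTRY.get(dialed)
    if church_id is None:
        church_id = db_lookup(dialed)

    # 2. Unknown number: fall back to the Sales Agent.
    if church_id is None:
        return Route(agent=SALES_AGENT)

    # 3. Known church whose data failed to load (None): same fallback.
    if load_church_data(church_id) is None:
        return Route(agent=SALES_AGENT)

    return Route(agent=CHURCH_AGENT, church_id=church_id)
```

Passing the lookups in as callables keeps the sketch testable without a database; the production code presumably calls its own helpers directly.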
2. Cache Stale Fallback — Supabase Errors
Church data and product knowledge are cached in-memory with TTLs. If Supabase throws an error during a cache refresh, the stale cached value is served rather than failing the call.
| Data | TTL | Stale behavior |
|---|---|---|
| Church data | 5 min | Serve stale on error |
| Product knowledge | 15 min | Serve stale on error |
| Inline FAQs | 5 min | Serve stale on error |
| Church DB lookup | 5 min | Serve stale on error |
Where it lives: session.py → cache_get() / cache_set()
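A serve-stale-on-error cache can be sketched as follows. This is an assumption-laden illustration of the pattern, not the actual `cache_get()` / `cache_set()` code: `get_with_stale_fallback`, the `_cache` dict, and the injectable `now` clock are hypothetical names.

```python
import time
from typing import Any, Callable

# key -> (value, stored_at). Hypothetical stand-in for the in-memory cache.
_cache: dict[str, tuple[Any, float]] = {}


def get_with_stale_fallback(
    key: str,
    ttl_seconds: float,
    fetch: Callable[[], Any],
    now: Callable[[], float] = time.monotonic,
) -> Any:
    """Return a fresh value if possible, otherwise the last known value."""
    entry = _cache.get(key)
    if entry is not None and now() - entry[1] < ttl_seconds:
        return entry[0]  # still within TTL: no refresh needed

    try:
        value = fetch()  # e.g. a Supabase query
    except Exception:
        if entry is not None:
            return entry[0]  # refresh failed: serve the stale value
        raise  # nothing cached yet, so the caller must handle the error

    _cache[key] = (value, now())
    return value
```

Note the cold-cache case: with no previous value there is nothing stale to serve, so the error propagates and the caller needs its own fallback (in this system, the Sales Agent route).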
3. Per-Turn RAG Timeout
If the Supabase vector search for per-turn RAG takes longer than 500 ms, it is skipped entirely for that turn. The call continues without the extra context rather than making the caller wait.
fetch_turn_rag()
→ asyncio.wait_for(..., timeout=0.5)
→ TimeoutError → return "" (empty RAG context)
→ Call continues normally
Where it lives: core/rag.py → fetch_turn_rag()
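The timeout flow above maps onto `asyncio.wait_for` roughly like this. It is a minimal sketch: the real `fetch_turn_rag()` signature is not shown in this document, so the injected `search` coroutine and the `timeout` parameter are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable

RAG_TIMEOUT_SECONDS = 0.5  # the 500 ms budget described above


async def fetch_turn_rag(
    search: Callable[[str], Awaitable[str]],
    query: str,
    timeout: float = RAG_TIMEOUT_SECONDS,
) -> str:
    """Return RAG context for this turn, or "" if the search is too slow."""
    try:
        # wait_for cancels the pending search once the timeout elapses.
        return await asyncio.wait_for(search(query), timeout=timeout)
    except asyncio.TimeoutError:
        return ""  # skip RAG for this turn; the call continues normally
```

Returning an empty string rather than raising lets the caller concatenate the result into the prompt unconditionally.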
4. Non-Fatal Everything
All Supabase writes (prayer requests, callback requests, visitor contacts, call log updates, moderation violations) are wrapped in try/except. A DB write failure never drops a live call. The log entry is lost in the worst case, but the caller's conversation continues uninterrupted.
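One way to apply this pattern consistently is a small decorator. The sketch below is illustrative only: `non_fatal`, `save_prayer_request`, the `client.insert(...)` method, and the `prayer_requests` table name are hypothetical, and the codebase may simply use inline try/except blocks instead.

```python
import functools
import logging

logger = logging.getLogger("voice_agent.writes")


def non_fatal(func):
    """Decorator: log and swallow any exception raised by an async write."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception:
            # The record is lost, but the live call is never interrupted.
            logger.exception("non-fatal write failed: %s", func.__name__)
            return None
    return wrapper


@non_fatal
async def save_prayer_request(client, payload: dict) -> None:
    # `client.insert` is a hypothetical async DB call, not a real Supabase API.
    await client.insert("prayer_requests", payload)
```

Logging with `logger.exception` keeps the stack trace, so lost writes remain visible in worker logs even though the caller never notices them.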
Provider Configuration (Current)
| Component | Primary Provider | Secondary | Notes |
|---|---|---|---|
| LLM — Coordinator | google/gemini-2.5-flash | None configured | Both Google and Anthropic plugins installed |
| LLM — Care Agent | anthropic/claude-haiku-4-5-20251001 | None configured | Haiku chosen for empathy, lower temperature |
| LLM — Sales Agent | google/gemini-2.5-flash | None configured | Same as Coordinator |
| STT | deepgram/nova-3 | None configured | livekit-plugins-deepgram~=1.5 |
| TTS | cartesia/sonic-3:{voice_id} | None configured | livekit-plugins-cartesia~=1.5 |
| VAD | Silero (pre-warmed) | None | Pre-loaded once per worker process |
| SIP Provider | Twilio | Telnyx (planned) | See Telnyx Migration section |
Note: The LiveKit Agents SDK (livekit-agents~=1.5) provides FallbackAdapter classes for each component (stt.FallbackAdapter, tts.FallbackAdapter, and llm.FallbackAdapter), but none of them is configured in the current codebase.
Planned Fallbacks (Not Yet Implemented)
LLM Fallback
The intention (reflected in existing docs) is:
- Coordinator: Gemini 2.5 Flash primary → Claude Haiku 4.5 fallback
- Care Agent: Claude Haiku 4.5 primary → Gemini 2.5 Flash fallback
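This would be implemented with the LiveKit Agents SDK's llm.FallbackAdapter, which takes a list of LLMs in priority order (Coordinator shown):
    from livekit.agents import llm
    from livekit.plugins import anthropic, google

    # Gemini is tried first; Claude Haiku takes over if Gemini is unavailable.
    coordinator_llm = llm.FallbackAdapter([
        google.LLM(model="gemini-2.5-flash"),
        anthropic.LLM(model="claude-haiku-4-5-20251001"),
    ])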
Priority: Implement before scaling beyond ~20 concurrent churches. A Gemini outage currently takes down every Coordinator and Sales agent at once, because both run on Gemini with no secondary configured.
STT Fallback
- Primary: Deepgram Nova-3 (most reliable for phone audio)
- Planned fallback: A second Deepgram model or alternative provider
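    from livekit.agents import stt
    from livekit.plugins import deepgram

    # Listed in priority order; the second entry is used if the first fails.
    fallback_stt = stt.FallbackAdapter([
        deepgram.STT(model="nova-3"),
        deepgram.STT(model="nova-2"),  # older but battle-tested
    ])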
TTS Fallback
- Primary: Cartesia Sonic-3 (highest quality voice)
- Planned fallback: Cartesia Sonic-2 or ElevenLabs
Cartesia had a 5-day outage in early 2026 (see project memory: project_cartesia_vendor_risk.md) which would have taken down all TTS if the system had been live. A TTS fallback is especially important because TTS failure is immediately audible to the caller.
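    from livekit.agents import tts
    from livekit.plugins import cartesia

    # voice_id and fallback_voice_id are defined elsewhere.
    fallback_tts = tts.FallbackAdapter([
        cartesia.TTS(model="sonic-3", voice=voice_id),
        cartesia.TTS(model="sonic-2", voice=fallback_voice_id),
    ])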
Telnyx Migration (Planned)
Why Telnyx
When churches start signing up at scale, Twilio will be replaced by Telnyx as the SIP trunk provider for church phone numbers. Telnyx offers:
- Lower per-minute cost at volume
- Elastic SIP trunking (no fixed trunk capacity)
- Built-in number management API (enables auto-provisioning)
- Same SIP INVITE format as Twilio — LiveKit sees no difference
What Changes
The voice agent itself requires no code changes. LiveKit Cloud receives SIP calls identically from Twilio or Telnyx. What changes:
- Number provisioning: Buy numbers via Telnyx API instead of Twilio dashboard
- SIP trunk config: Point Telnyx trunk to the cwa-voice-9x077mph.livekit.cloud SIP endpoint
- session.py PHONE_REGISTRY: Telnyx numbers are added the same way as Twilio numbers
- church_voice_agents table: the twilio_phone_number column stores the E.164 number (Twilio or Telnyx; the column name is a legacy label)
What Stays Twilio
Demo lines and sales numbers are currently Twilio numbers. These may or may not migrate:
- Toll-free (+18886030316): may stay Twilio (toll-free porting is complex)
- Demo lines (+14696152221, +13658254095): candidates for Telnyx
Migration Path (When Ready)
- Set up Telnyx account and elastic SIP trunk
- Configure trunk to forward calls to LiveKit Cloud SIP endpoint
- For each new church signup: provision number via Telnyx API, update church_voice_agents
- For existing Twilio numbers: port when contracts allow
- No voice agent code changes required
Current State
Telnyx is not yet set up. All numbers are Twilio. The main.py docstring already acknowledges Telnyx as a planned path:
# Phone call → Twilio/Telnyx SIP trunk → LiveKit Cloud SIP gateway
Outage Response
See runbooks/voice-ops/cartesia-outage.md for the runbook covering LiveKit Cloud, Cartesia TTS, and Railway worker outages.
Quick Reference
| Failure | Impact | Response |
|---|---|---|
| LiveKit Cloud SIP down | All calls drop | Check LiveKit status; calls fail silently to caller |
| Deepgram STT down | STT fails; agents can't hear callers | No in-code fallback yet; monitor Deepgram status |
| Cartesia TTS down | No audio output | No in-code fallback yet; monitor Cartesia status |
| Google Gemini down | Coordinator/Sales agents fail | No in-code fallback yet; Care Agent (Haiku) still works |
| Anthropic Claude down | Care Agent fails | No in-code fallback yet; Coordinator still works |
| Railway worker down | New calls not answered | Redeploy Railway; existing calls may have already dropped |
| Supabase down | Church data load fails; calls fall back to Sales Agent | Stale cache serves up to 15 min; then graceful degradation |
| Twilio SIP down | No new inbound calls | No fallback until Telnyx migration complete |