Voice Agent Resilience
Overview
The voice agent uses LiveKit's native FallbackAdapter for automatic provider failover across all three provider types (LLM, TTS, STT). If a primary provider fails, the adapter transparently switches to the backup. It marks the failed provider unhealthy, routes subsequent requests to the backup, periodically health-checks the primary, and resumes using it when recovered.
Additionally, the entrypoint has crash recovery that speaks an apology message to the caller before disconnecting.
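As a rough illustration of the failover behavior described in this overview, the sketch below models an adapter that tries providers in priority order, marks a failing one unhealthy, and lets it back in after a cooldown. It is not LiveKit's implementation. The Provider and SimpleFallback classes, the retry_after cooldown, and the lazy "next request doubles as the probe" recovery are assumptions made for illustration; the real adapter health-checks the primary periodically.

```python
import time
from typing import Callable, List, Optional


class Provider:
    """A named callable plus the health state tracked for it (hypothetical)."""

    def __init__(self, name: str, call: Callable[[str], str]) -> None:
        self.name = name
        self.call = call
        self.healthy = True
        self.failed_at: Optional[float] = None


class SimpleFallback:
    """Illustrative only: try providers in priority order, skipping unhealthy ones."""

    def __init__(
        self,
        providers: List[Provider],
        retry_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.providers = providers
        self.retry_after = retry_after  # assumed cooldown, in seconds
        self.clock = clock

    def _eligible(self, provider: Provider) -> bool:
        # Healthy providers are always eligible. An unhealthy one is retried
        # only after the cooldown, so the next request acts as the probe.
        if provider.healthy:
            return True
        return self.clock() - provider.failed_at >= self.retry_after

    def request(self, prompt: str) -> str:
        for provider in self.providers:
            if not self._eligible(provider):
                continue
            try:
                result = provider.call(prompt)
            except Exception:
                # Mark the provider unhealthy and fall through to the next one.
                provider.healthy = False
                provider.failed_at = self.clock()
                continue
            # A successful call restores the provider to healthy.
            provider.healthy = True
            provider.failed_at = None
            return result
        raise RuntimeError("all providers failed")
```

Because the primary sits first in the list, it is preferred again as soon as it passes a probe, which matches the "resume when recovered" behavior described above.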
Provider Fallback Architecture
Provider Fallback Chains
LLM (per-agent FallbackAdapter)
Each agent class constructs its own llm.FallbackAdapter with the correct provider priority. This restores the cross-fallback design from the Cartesia LINE SDK (LlmConfig.fallbacks) that was lost during the LiveKit migration.
| Agent | Primary | Fallback |
|---|---|---|
| CoordinatorAgent | Gemini 2.5 Flash | Claude Haiku 4.5 |
| CareAgent | Claude Haiku 4.5 | Gemini 2.5 Flash |
| SalesAgent | Gemini 2.5 Flash | Claude Haiku 4.5 |
| DemoAgent | Gemini 2.5 Flash | Claude Haiku 4.5 |
| DemoRouterAgent | Gemini 2.5 Flash | Claude Haiku 4.5 |
The chains are built in verticals/church/agents.py (_coordinator_llm(), _care_llm()) and verticals/sales/agents.py (_sales_llm()), and each agent passes its chain to super().__init__(llm=...).
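The per-agent wiring can be pictured with a small stand-in, assuming nothing about the real classes beyond what the table and the function names above state. BaseAgent is hypothetical, the model strings are placeholder labels rather than verified model IDs, and a plain list stands in for the llm.FallbackAdapter that the real helpers construct.

```python
from typing import Dict, List

GEMINI = "gemini-2.5-flash"  # placeholder label, not a verified model ID
HAIKU = "claude-haiku-4.5"   # placeholder label, not a verified model ID

# Priority order per agent, copied from the table above (primary first).
LLM_PRIORITY: Dict[str, List[str]] = {
    "CoordinatorAgent": [GEMINI, HAIKU],
    "CareAgent": [HAIKU, GEMINI],
    "SalesAgent": [GEMINI, HAIKU],
    "DemoAgent": [GEMINI, HAIKU],
    "DemoRouterAgent": [GEMINI, HAIKU],
}


def _coordinator_llm() -> List[str]:
    # Stand-in body: the real helper returns a FallbackAdapter, not a list.
    return list(LLM_PRIORITY["CoordinatorAgent"])


def _care_llm() -> List[str]:
    # CareAgent is the only agent with the priority order reversed.
    return list(LLM_PRIORITY["CareAgent"])


class BaseAgent:
    """Hypothetical base class that just stores the chain it is given."""

    def __init__(self, llm: List[str]) -> None:
        self.llm = llm


class CoordinatorAgent(BaseAgent):
    def __init__(self) -> None:
        super().__init__(llm=_coordinator_llm())


class CareAgent(BaseAgent):
    def __init__(self) -> None:
        super().__init__(llm=_care_llm())
```

Keeping the order inside each agent's own helper means a handoff between agents also switches the provider priority, with no session-level state to update.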
TTS (session-level + per-agent override)
| Scope | Primary | Fallback |
|---|---|---|
| Session (Coordinator, Sales) | Cartesia Sonic 3 (church voice) | Google TTS |
| CareAgent override | Cartesia Sonic 3 (care voice) | Google TTS |
The session-level chain is set in main.py (_run_call()); the CareAgent override is set in verticals/church/agents.py (_care_tts()).
When CareAgent becomes active via handoff, its per-agent TTS FallbackAdapter replaces the session-level one. This preserves the voice-switching design, in which CareAgent speaks with a different-gender voice.
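The override rule above amounts to "use the agent's TTS if it has one, otherwise the session's". The sketch below shows only that resolution logic; the Agent and Session dataclasses and the string chain labels are hypothetical stand-ins, not the LiveKit types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    name: str
    tts: Optional[str] = None  # None means "inherit the session-level TTS"


@dataclass
class Session:
    tts: str  # session-level chain, e.g. church voice with a Google TTS fallback

    def active_tts(self, agent: Agent) -> str:
        # A per-agent override wins over the session default.
        return agent.tts if agent.tts is not None else self.tts


session = Session(tts="church-voice-chain")
coordinator = Agent("CoordinatorAgent")
care = Agent("CareAgent", tts="care-voice-chain")
```

Here session.active_tts(coordinator) resolves to the session chain, while session.active_tts(care) resolves to the care chain, so a handoff changes both the voice and its fallback.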
STT (session-level only)
| Primary | Fallback |
|---|---|
| Deepgram Nova-3 | Google STT |
Set in main.py (_run_call()). No agent overrides STT.
Crash Recovery
If any unhandled exception occurs during call setup or the main call pipeline, the entrypoint catches it and:
- Logs the full exception with traceback
- Creates a minimal TTS-only AgentSession (no LLM/STT needed)
- Speaks: "We're sorry, we're experiencing technical difficulties right now. Please try calling back in a few minutes, or contact the church directly. We apologize for the inconvenience."
- Waits 1 second for TTS to finish playing
- Deletes the LiveKit room (cleanly disconnects the call)
If even TTS fails (e.g., Cartesia is completely down), it logs the error and still cleans up the room (caller gets silence + disconnect, but no stuck room).
Implemented in main.py: _speak_error_and_hangup() and the try/except wrapper around _run_call().
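The control flow of that wrapper might look roughly like the following. This is a sketch under stated assumptions, not the code in main.py: run_with_crash_recovery and its three callable parameters are hypothetical, stdlib logging stands in for Loguru, and the TTS session and room deletion are reduced to injected coroutines.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("voice-agent")

APOLOGY = (
    "We're sorry, we're experiencing technical difficulties right now. "
    "Please try calling back in a few minutes, or contact the church directly. "
    "We apologize for the inconvenience."
)


async def run_with_crash_recovery(
    run_call: Callable[[], Awaitable[None]],
    speak: Callable[[str], Awaitable[None]],
    delete_room: Callable[[], Awaitable[None]],
    drain_seconds: float = 1.0,
) -> None:
    try:
        await run_call()
    except Exception:
        # Log the full traceback of the original failure.
        logger.exception("call pipeline crashed")
        try:
            await speak(APOLOGY)
            # Give the audio time to finish playing before hanging up.
            await asyncio.sleep(drain_seconds)
        except Exception:
            # TTS itself is down: the caller hears silence, then a disconnect.
            logger.exception("apology TTS failed")
        finally:
            # Always delete the room so no call is left stuck.
            await delete_room()
```

Putting the room deletion in a finally block is what guarantees cleanup even when the apology cannot be spoken.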
Post-Call Classification Fallback
Separate from the live conversation fallbacks, post-call classification has its own chain:
- Gemini 2.5 Flash (primary)
- Gemini 2.0 Flash (older model, same provider)
- Keyword-based heuristic (no LLM — detects crisis, pastoral, prayer patterns)
Implemented in session.py classify_call().
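A chain like this one can be sketched as "try each LLM classifier in order, then fall back to keywords". Everything below is illustrative: classify_with_fallback, the keyword lists, the label names, and the "general" default are assumptions, and the real classify_call() may differ in signature and behavior.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical keyword lists; crisis is checked first because dicts keep insertion order.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "crisis": ("suicide", "emergency", "hurt myself"),
    "pastoral": ("pastor", "counseling"),
    "prayer": ("pray",),
}


def keyword_classify(transcript: str) -> str:
    """Last-resort heuristic that needs no LLM."""
    text = transcript.lower()
    for label, words in KEYWORDS.items():
        if any(word in text for word in words):
            return label
    return "general"  # assumed default label


def classify_with_fallback(
    transcript: str,
    llm_classifiers: Sequence[Callable[[str], str]],
) -> str:
    """Try each LLM classifier in order; use the keyword heuristic if all fail."""
    for classify in llm_classifiers:
        try:
            label = classify(transcript)
        except Exception:
            continue  # provider error: move on to the next link in the chain
        if label:
            return label
    return keyword_classify(transcript)
```

In this arrangement the two Gemini models would be passed as the first and second entries of llm_classifiers, so a provider outage degrades classification quality rather than dropping the result.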
What Is NOT Covered
- No redundant telephony path. If Telnyx/Twilio or LiveKit's SIP gateway is down, calls fail at the carrier level (busy signal or "not in service"). This is outside application control.
- No geographic failover. The agent runs in us-east only. LiveKit Cloud handles infrastructure redundancy within that region.
- No automatic caller callback. If a call fails, the system does not attempt to call the person back.
Monitoring
| Mechanism | What it detects | Frequency |
|---|---|---|
| Voice health cron (/api/cron/voice-health) | Missing trunks, stuck rooms, schema drift | Every 15 min |
| LiveKit Cloud dashboard | Agent online/offline, room count | Real-time |
| Loguru logging | All provider errors, fallback activations | Real-time (LiveKit logs) |
| FallbackAdapter internal | Provider health state (healthy/unhealthy) | Automatic |
History
- Pre-March 2026: Cartesia LINE SDK had LlmConfig(fallbacks=[...]) for LLM failover only
- March 26, 2026: LiveKit migration; fallbacks lost (single provider per service)
- April 1, 2026: Crash recovery added (_speak_error_and_hangup)
- April 1, 2026: Full fallback chains restored via LiveKit FallbackAdapter (LLM + TTS + STT)