Expected Output Specification Methodology

The Problem This Solves

The knowledge system documents what the system DOES (features, flows, APIs). But no document defines what the customer SEES and EXPERIENCES. This gap means:

Agents build features that technically work but show wrong content for specific tiers
Tests verify code paths but not customer experience
Bugs like "Starter customer sees ElevenLabs voice picker" ship because no spec said "this should be hidden"

No one can answer "what does a Starter Chat customer's first week look like, step by step?"
The full setup journey (getting a product configured and working) is undocumented

The Three Layers

Layer 1: Feature Matrix (EXISTS — features.yaml, tier-restrictions.md)
  "Starter gets 12 tools, 2 agents, no embed widget"

Layer 2: Process Flows (EXISTS — onboarding.md, checkout-flow.md, etc.)
  "POST /api/onboard creates premium_churches row, sends email, syncs MailerLite"

Layer 3: Expected Outputs (MISSING — this is what we're building)
  The COMPLETE customer journey, documented visually:
  - Every way a customer discovers the product
  - Every screen they see from discovery → signup → email → dashboard → setup → working product
  - Every email they receive and when
  - What "success" looks like at the end — the product is live and working as expected
  - What each tier SEES vs what's hidden, at every step

How to Build an Expected Output Spec

Phase 1: Enumerate User States

For each product, list every possible customer state:

State	Plan	Channel	Status	Key Flags
Starter Chat (preview)	starter	chat	preview	chatbot_enabled, care_enabled
Starter Chat (active)	starter	chat	active	chatbot_enabled, care_enabled
Starter Voice (active)	starter	voice	active	has voice_agent row
Starter Both (active)	starter	both	active	chatbot + voice
Pro Chat (active)	pro	chat	active	all chatbot features
Pro Both (active)	pro	both	active	all features
Suite Chat (active)	suite	chat	active	everything except voice
Suite Both (active)	suite	both	active	everything
Pro Website	pro_website	chat	active	restricted chatbot, PewSearch template
Trial expired	any	any	preview (expired)	chatbot should be offline
Cancelled	any	any	cancelled	dashboard accessible, chatbot offline
Past due	any	any	past_due	grace period behavior
Free (PewSearch claim)	free	-	preview	basic chatbot only
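
The table can also be kept as typed data so that later phases can select states programmatically. The following is a minimal sketch, not the project's actual schema: the type and field names (`CustomerState`, `accountStatus`, `keyFlags`) are hypothetical, and only a few rows from the table are copied in.

```typescript
// Hypothetical types mirroring the table columns; all names are illustrative.
type PlanName = "free" | "starter" | "pro" | "suite" | "pro_website" | "any";
type ChannelName = "chat" | "voice" | "both" | "any" | "-";
type AccountStatus = "preview" | "active" | "cancelled" | "past_due";

interface CustomerState {
  label: string;
  plan: PlanName;
  channel: ChannelName;
  accountStatus: AccountStatus;
  keyFlags: string[];
}

// A few rows copied from the table above (not the full list).
const customerStates: CustomerState[] = [
  { label: "Starter Chat (preview)", plan: "starter", channel: "chat", accountStatus: "preview", keyFlags: ["chatbot_enabled", "care_enabled"] },
  { label: "Starter Voice (active)", plan: "starter", channel: "voice", accountStatus: "active", keyFlags: ["has voice_agent row"] },
  { label: "Pro Both (active)", plan: "pro", channel: "both", accountStatus: "active", keyFlags: ["all features"] },
  { label: "Cancelled", plan: "any", channel: "any", accountStatus: "cancelled", keyFlags: ["dashboard accessible, chatbot offline"] },
];

// Select the states where voice touchpoints (such as the Calls tab) apply.
function statesWithVoice(states: CustomerState[]): CustomerState[] {
  return states.filter((s) => s.channel === "voice" || s.channel === "both");
}
```

A touchpoint's "State:" line could then be filled from a filter like `statesWithVoice(customerStates)` rather than typed by hand.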

Phase 2: Enumerate Touchpoints

For each state, walk through EVERY customer touchpoint as a sequential journey — the order a real customer would experience them.

A. Discovery Paths — How They Find This Product

Document EVERY way a customer could arrive at this product. Each path is a mini-journey:

1. Google search → which page do they land on?
2. PewSearch directory → banner/CTA they see → where it takes them
3. Denomination landing page (e.g., /ai-for/baptist) → CTA → destination
4. Blog post → CTA → destination
5. Pricing page → which plan card → what CTA text
6. Homepage → which section → what CTA
7. Chatbot/voice product page → CTA → destination
8. Peer referral / direct URL → where they land
9. Facebook/Instagram ad → landing page
10. PewSearch admin banner (existing PewSearch customer) → upsell CTA

For each path: screenshot the entry page, the CTA, and where it leads.

B. Pre-Purchase Journey (sequential)

11. Marketing/landing page they see (exact copy, images, CTA placement)
12. Onboard form Step 1: Search for church (what results look like, "already claimed" state)
13. Onboard form Step 2: Select/create church (what fields, what's pre-filled)
14. Onboard form Step 3: Contact info (what fields, plan pre-selected?)
15. Stripe checkout page (trial badge? amount shown? promo code field?)
16. Checkout success/confirmation page (what do they see immediately after paying?)

C. Email Journey (in order of receipt)

17. Pre-checkout welcome email (Resend): subject, copy, CTA
18. Post-checkout welcome email with magic link (Resend): subject, copy, link text
19. Stripe receipt email: what it shows
20. Lifecycle Email System (Resend + cron), Day 0 welcome + starter kit (immediate): subject, copy, what it encourages them to do
21. Lifecycle Email System (Resend + cron), Day 2 setup nudge: subject, copy
22. Lifecycle Email System (Resend + cron), Day 7 activation check: subject, copy
23. Lifecycle Email System (Resend + cron), Day 13 trial reminder: subject, copy
24. Notification emails (prayer request received, visitor contact, callback request, care escalation)

D. First Login & Dashboard Discovery (sequential)

25. Magic link click → what page loads
26. Dashboard header (plan badge, status badge, View Page link, upgrade button)
27. Tab navigation (which tabs visible, which order)
28. Overview tab: first impression (stats cards at zero, getting started prompts, share links, upsell cards)
29. Getting Started checklist: what tasks appear, what order, progress indication

E. Setup Journey — Getting the Product Working (sequential)

This is the most critical section. Walk through every step a customer takes to get their product configured and live:

30. Training > Church Knowledge: adding church description, service times, staff info
31. Training > This Week: adding upcoming events and announcements
32. Training > FAQs: adding custom Q&A (locked for some tiers?)
33. Training > Theology: selecting denomination/tradition
34. Training > Agents: configuring agent personalities (which agents visible? voice picker?)
35. Training > Safety: reviewing crisis protocols, notification settings
36. Training > Simulator: testing the chatbot (what does the test interface look like?)
37. Training > Training Progress: checklist showing setup completion
38. Settings > Church Profile: uploading logo, editing name/description
39. Settings > Hours: adding service times and office hours
40. Settings > Notifications: configuring who receives alerts (crisis protocol visible?)
41. Settings > Integrations: connecting Cal.com, Planning Center (which locked by tier?)
42. Settings > Team: inviting team members (what roles available?)

F. Public-Facing Product Pages (what the church's visitors see)

43. Hosted chat page /chat/[slug]: layout, branding, input field
44. Care hub /care/[slug]: layout, agent cards, subscribe option
45. Care subscribe /care/[slug]/subscribe: email signup form
46. Agent-specific chat /care/[slug]/[agent]: direct agent access
47. Embed widget on church website (if applicable): positioning, branding badge, mobile behavior
48. Pro Website vanity page (if applicable): hero, sections, chatbot widget placement

G. Ongoing Dashboard Use

49. Calls tab (visible? call history, transcripts, tools used)
50. Requests tab (prayer requests, callbacks, visitor contacts, all in one view)
51. Care tab (care_enabled? broadcast messaging, member list)
52. Social tab (connected accounts, scheduled posts, or locked?)
53. Upgrade tab (current plan display, comparison table, upgrade buttons)
54. Analytics (if applicable: chat volume, response quality, tool usage)

H. Lifecycle Events

55. Approaching usage limit (what warning banner/email?)
56. Usage limit reached (what message, what's disabled?)
57. Trial expiring, Day 10 (email + dashboard banner)
58. Trial expired, Day 15 (chatbot offline? dashboard access? upgrade CTA?)
59. Payment failed (email, dashboard state, grace period?)
60. Cancellation (email, dashboard state, data retention, chatbot offline?)
61. Upgrade (what changes immediately? new tabs appear? features unlock?)
62. Downgrade (what gets locked? existing data preserved? upgrade CTAs appear?)
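
The eight categories and their touchpoint numbers can be captured as data, which makes it easy to look up where a touchpoint belongs and to confirm the total. This is a sketch under one stated assumption: the ranges for A (1 to 10) and E (30 to 42) are inferred from the numbering of the neighboring sections, since the source does not number those items explicitly. The names `TouchpointCategory`, `categoryOf`, and `totalTouchpoints` are hypothetical.

```typescript
// Touchpoint ranges per category. Ranges for A and E are inferred, not stated.
interface TouchpointCategory {
  letter: string;
  title: string;
  first: number;
  last: number;
}

const touchpointCategories: TouchpointCategory[] = [
  { letter: "A", title: "Discovery Paths", first: 1, last: 10 },
  { letter: "B", title: "Pre-Purchase Journey", first: 11, last: 16 },
  { letter: "C", title: "Email Journey", first: 17, last: 24 },
  { letter: "D", title: "First Login & Dashboard Discovery", first: 25, last: 29 },
  { letter: "E", title: "Setup Journey", first: 30, last: 42 },
  { letter: "F", title: "Public-Facing Product Pages", first: 43, last: 48 },
  { letter: "G", title: "Ongoing Dashboard Use", first: 49, last: 54 },
  { letter: "H", title: "Lifecycle Events", first: 55, last: 62 },
];

// Look up which category a touchpoint number belongs to.
function categoryOf(touchpoint: number): TouchpointCategory | undefined {
  return touchpointCategories.filter(
    (c) => touchpoint >= c.first && touchpoint <= c.last
  )[0];
}

// Count touchpoints across all categories.
function totalTouchpoints(): number {
  return touchpointCategories.reduce((sum, c) => sum + (c.last - c.first + 1), 0);
}
```

If a category gains or loses an item, updating its range here keeps the total and the lookups in step with the list.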

Phase 3: Define Expected Output for Each Touchpoint

For each touchpoint, the agent pre-populates a draft by reading the code and production site, then the founder confirms or corrects. This is NOT open-ended — it's "here's what I see, is this right?"

Template:

## [Touchpoint Name]
State: [which user states this applies to]
Page/Component: [URL or component name]
Screenshot: [path to screenshot in knowledge/acceptance/screenshots/]

### Should See:
- [exact element, text, link, or behavior]

### Should NOT See:
- [exact element that must be hidden/absent]

### Conditional:
- IF [condition]: show [X]
- IF [condition]: hide [Y]

### Links:
- [button/link name] → [exact URL it should go to]

### Copy:
- [exact text that should appear, especially for tier-specific messaging]

### Success Criteria:
- [what "done right" looks like for this touchpoint — the customer's expectation]
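
A filled-in template can be checked mechanically before it reaches the founder. The sketch below assumes a hypothetical `TouchpointSpec` shape whose fields follow the template headings (Conditional and Copy are left out for brevity). The example values are illustrative and drawn from statements elsewhere in this document, not from a real spec file.

```typescript
// Hypothetical shape for one filled-in touchpoint template.
interface TouchpointSpec {
  touchpointName: string;
  appliesToStates: string[];
  pageOrComponent: string;
  screenshotPath: string;
  shouldSee: string[];
  shouldNotSee: string[];
  links: { label: string; url: string }[];
  successCriteria: string[];
}

const SCREENSHOT_ROOT = "knowledge/acceptance/screenshots/";

// Return a list of problems; an empty list means the draft is ready for review.
function lintTouchpointSpec(spec: TouchpointSpec): string[] {
  const problems: string[] = [];
  if (spec.appliesToStates.length === 0) {
    problems.push("State: no user states listed");
  }
  if (spec.screenshotPath.indexOf(SCREENSHOT_ROOT) !== 0) {
    problems.push("Screenshot: path is outside " + SCREENSHOT_ROOT);
  }
  if (spec.shouldSee.length === 0 && spec.shouldNotSee.length === 0) {
    problems.push("Should See / Should NOT See: both sections are empty");
  }
  if (spec.successCriteria.length === 0) {
    problems.push("Success Criteria: missing");
  }
  return problems;
}

// Illustrative example only.
const dashboardHeaderSpec: TouchpointSpec = {
  touchpointName: "Dashboard header",
  appliesToStates: ["Starter Chat (active)"],
  pageOrComponent: "dashboard header",
  screenshotPath: SCREENSHOT_ROOT + "starter-chat/dashboard-overview-first-visit.png",
  shouldSee: ["'View Chat Page' link"],
  shouldNotSee: ["Calls tab"],
  links: [{ label: "View Chat Page", url: "churchwiseai.com/chat/[slug]" }],
  successCriteria: ["Customer can open their public chat page from the header"],
};
```

Running `lintTouchpointSpec` over every touchpoint in a draft gives the interviewer a short list of entries that still need attention.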

Discovery Path Template:

## Discovery Path: [Name]
Entry point: [Google ad / PewSearch banner / denomination page / etc.]
Landing URL: [exact URL they arrive at]

### Journey:
1. [What they see first] — Screenshot: [path]
2. [What CTA they click] — text: "[exact button text]"
3. [Where it takes them] — Screenshot: [path]
4. [Next action] → leads to Touchpoint [N] (onboard form)

Setup Step Template:

## Setup Step [N]: [Name]
Tab: [Training > Agents / Settings > Hours / etc.]
Page/Component: [component name]
Screenshot (before): [empty/default state]
Screenshot (after): [configured state]

### What the customer does:
- [step-by-step actions they take]

### What they see when done:
- [confirmation message, updated UI, progress indicator]

### Success Criteria:
- [what "working correctly" looks like from the customer's perspective]

Phase 4: Build Tests from Specs

Each expected output becomes one or more Playwright assertions:

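import { test, expect } from '@playwright/test';

test('chat-only dashboard: Calls tab hidden, View Chat Page link shown', async ({ page }) => {
  // Assumes the dashboard is already loaded; sign-in and navigation are omitted.
  // From spec: "Calls tab: MUST NOT be visible (no voice agent)"
  await expect(page.getByRole('tab', { name: 'Calls' })).not.toBeVisible();

  // From spec: "Header shows 'View Chat Page' linking to churchwiseai.com/chat/[slug]"
  const viewPageLink = page.getByRole('link', { name: 'View Chat Page' });
  await expect(viewPageLink).toBeVisible();
  await expect(viewPageLink).toHaveAttribute('href', /churchwiseai\.com\/chat\//);
});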
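
Because "Should See" and "Should NOT See" entries are so regular, the mapping from spec line to assertion can be mechanical. The sketch below generates assertion source text from a hypothetical `VisibilityRule` record. The record shape and function name are assumptions. The emitted text uses only Playwright calls already shown above (`page.getByRole`, `toBeVisible`).

```typescript
// Hypothetical spec entry: one visibility rule for an element found by ARIA role.
interface VisibilityRule {
  role: "tab" | "link" | "button";
  accessibleName: string;
  visible: boolean;
}

// Emit one Playwright assertion line for a rule. JSON.stringify quotes and
// escapes the name so apostrophes in UI text cannot break the generated code.
function toPlaywrightAssertion(rule: VisibilityRule): string {
  const locator = `page.getByRole('${rule.role}', { name: ${JSON.stringify(rule.accessibleName)} })`;
  const matcher = rule.visible ? "toBeVisible()" : "not.toBeVisible()";
  return `await expect(${locator}).${matcher};`;
}

// Rules taken from the two spec lines quoted above.
const starterChatRules: VisibilityRule[] = [
  { role: "tab", accessibleName: "Calls", visible: false },
  { role: "link", accessibleName: "View Chat Page", visible: true },
];

const generatedAssertions: string[] = starterChatRules.map(toPlaywrightAssertion);
```

The generated lines still need a human pass, since link targets and tier-specific copy require assertions beyond simple visibility.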

Phase 5: Maintain

When ANY code change affects a touchpoint:

Update the expected output spec FIRST
Update the E2E test to match
Then change the code
Run the test to verify
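
The spec-first ordering could be backed by a coarse check over the files changed in a commit or pull request. This is only a sketch under loud assumptions: it treats any `.md` file directly under knowledge/acceptance/ as a spec, treats every non-test `.ts`/`.tsx` file as code that might affect a touchpoint, and takes the changed-file list as input rather than reading it from version control. All function names are hypothetical.

```typescript
// Assumption: specs are the .md files directly under knowledge/acceptance/.
function isSpecFile(path: string): boolean {
  return /^knowledge\/acceptance\/[^/]+\.md$/.test(path);
}

// Assumption: test files end in .spec.ts(x) or .test.ts(x).
function isTestFile(path: string): boolean {
  return /\.(spec|test)\.tsx?$/.test(path);
}

// Assumption: any other .ts/.tsx file may affect a touchpoint.
function isAppCode(path: string): boolean {
  return /\.tsx?$/.test(path) && !isTestFile(path);
}

// Returns the code files that changed without any accompanying spec change.
function codeChangedWithoutSpec(changedPaths: string[]): string[] {
  if (changedPaths.some(isSpecFile)) {
    return [];
  }
  return changedPaths.filter(isAppCode);
}
```

A non-empty result is a prompt to ask whether a touchpoint is affected, not proof that the rule was broken, because many code changes touch no customer-visible output.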

File Structure

Expected output specs live in knowledge/acceptance/:

knowledge/acceptance/
  README.md
  starter-chat.md         ← First one we build
  starter-voice.md
  starter-both.md
  pro-chat.md
  pro-both.md
  suite-chat.md
  suite-both.md
  pro-website.md
  free-claim.md
  trial-expired.md
  cancelled.md
  screenshots/            ← Visual documentation
    starter-chat/
      discovery-*.png
      onboard-*.png
      checkout-*.png
      email-*.png
      dashboard-*.png
      setup-*.png
      public-*.png
    pro-both/
      ...

Each spec file covers 62 touchpoints across 8 categories (A-H), each with screenshots, and follows the Phase 3 templates above.

The Interview Process

Building a spec is a 3-stage process: research, interview, verification.

Stage 1: Agent Research (subagents, no founder needed)

Before the interview, agents pre-populate the entire spec by reading code and production pages:

Discovery researcher — Visit every marketing page, landing page, denomination page, pricing page. Document every CTA that leads to this product. Screenshot each discovery path.
Dashboard researcher — Read all dashboard components for this tier. For each tab/sub-tab, document what's visible and what's hidden based on tier-config.ts, tool-config.ts, agent-type-config.ts. Screenshot the production dashboard.
Journey researcher — Walk the full journey: onboard form → checkout → email templates → first dashboard load → setup steps. Screenshot each step.
Completeness validator — Cross-check all 62 touchpoints are accounted for. Flag any gaps. Verify every property (churchwiseai.com, pewsearch.com, etc.) is covered.

Output: A DRAFT spec with every touchpoint pre-filled and screenshots captured.
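
The completeness validator's cross-check reduces to finding which touchpoint numbers a draft does not yet cover. A minimal sketch, assuming the draft has already been parsed into a list of documented touchpoint numbers (the parsing step and the function name are hypothetical):

```typescript
const TOTAL_TOUCHPOINTS = 62;

// Return the touchpoint numbers (1..total) that the draft does not document.
function missingTouchpoints(
  documented: number[],
  total: number = TOTAL_TOUCHPOINTS
): number[] {
  const missing: number[] = [];
  for (let n = 1; n <= total; n++) {
    if (documented.indexOf(n) === -1) {
      missing.push(n);
    }
  }
  return missing;
}
```

An empty result means every numbered touchpoint has at least a draft entry. It says nothing about whether each entry is correct, which is what the founder interview is for.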

Stage 2: Founder Interview (rapid yes/no/tweak)

Walk through the draft spec with the founder. For each touchpoint:

Present what was found: "Touchpoint 34 (Training tab, Agents): I see Care Agent (expandable), Coordinator (expandable), Discipleship (locked), Stewardship (locked). No voice picker. Correct?"
Founder says: yes / no, change X / also add Y
Capture corrections immediately in the spec

This turns a 90-minute open-ended interview into ~30 minutes of confirmation.

Stage 3: Test Generation & Verification

Generate Playwright E2E test from the approved spec
Run the test against production
Fix any failures — the spec is right, the code is wrong
Commit both spec and test together

Screenshot Convention

Screenshots live in knowledge/acceptance/screenshots/[tier]/:

knowledge/acceptance/screenshots/
  starter-chat/
    discovery-pricing-page.png
    onboard-step1-search.png
    onboard-step2-contact.png
    checkout-stripe.png
    email-welcome.png
    dashboard-overview-first-visit.png
    dashboard-training-agents.png
    setup-hours-before.png
    setup-hours-after.png
    public-chat-page.png
    ...
  pro-both/
    ...

Screenshots should be captured from production at the time of spec creation and updated whenever the spec is updated.
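
The naming convention can be checked with a small validator. The sketch below assumes the allowed filename prefixes are exactly the ones listed under File Structure (discovery, onboard, checkout, email, dashboard, setup, public) and that names use lowercase letters, digits, and hyphens. Both rules are inferred from the examples in this document, not from a stated policy.

```typescript
// Prefixes taken from the File Structure listing; treated here as exhaustive.
const screenshotPrefixes: string[] = [
  "discovery", "onboard", "checkout", "email", "dashboard", "setup", "public",
];

// True when the path is knowledge/acceptance/screenshots/[tier]/[prefix]-*.png.
function isConventionalScreenshotPath(path: string): boolean {
  const match = /^knowledge\/acceptance\/screenshots\/([a-z0-9-]+)\/([a-z0-9-]+)\.png$/.exec(path);
  if (match === null) {
    return false;
  }
  const fileStem = match[2];
  const prefix = fileStem.split("-")[0];
  return screenshotPrefixes.indexOf(prefix) !== -1;
}
```

For example, `isConventionalScreenshotPath("knowledge/acceptance/screenshots/starter-chat/setup-hours-before.png")` returns `true`, while a file named `hours.png` in the same folder returns `false`.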

Key Principles

The FOUNDER defines expected outputs, not agents — but agents propose drafts from code research
Expected outputs are the source of truth for BOTH code and tests
If code doesn't match the spec, the code is wrong (not the spec)
If a spec is missing, no agent should build the feature until the spec exists
Specs are living documents — update them BEFORE changing code
Every touchpoint should have a screenshot — visual documentation is not optional
The full journey matters — not just individual screens, but the sequential experience from discovery to working product
