5-Question AI-Powered Goal-Based Testing

Why This Exists

Traditional testing asks "does it work?" Goal-based testing asks "does it succeed?" A page can load without errors, pass every Playwright assertion, display all the right elements, and still fail the customer. The button exists but the pastor doesn't see it. The pricing is accurate but incomprehensible to a board of deacons. The chatbot replies but doesn't move the visitor toward joining a small group.

This methodology bridges the gap between mechanical correctness (Layer A) and actual customer outcomes (Layer C) by adding a human-judgment layer that only AI can scale.

The 3-Layer Testing Architecture

Layer A: Mechanical / Playwright
  "Does the page load? Are there JS errors? Do links work?"
  Tools: Playwright specs, Lighthouse, axe-core
  Frequency: Every deploy (CI/CD)

Layer B: AI Goal-Based (THIS DOCUMENT)
  "Would a real person achieve their goal on this page?"
  Tools: 5-Question Framework + journey YAML + AI agent evaluation
  Frequency: Weekly + before launch + after major changes

Layer C: Outcome Verification
  "Did the database record get created? Did the email send? Did Stripe charge?"
  Tools: Supabase queries, Stripe API, email receipt checks
  Frequency: Per-journey (automated after Layer B)

Layer B is the missing middle. Layer A catches broken pages. Layer C catches broken backends. Layer B catches broken experiences.

The 5 Questions

At every step of every user journey, the AI agent asks these five questions:

Q1: "What do I SEE?" (Observation)

Take a screenshot. Describe what human eyes register in the first 3 seconds:

What's the dominant visual element?
What text is largest/boldest?
What colors draw the eye?
Is there a clear visual hierarchy?
What is above the fold vs. below?

Output: A plain-language description of the page as a first-time visitor would perceive it. Not HTML structure -- visual impression.

Example: "A dark hero section with a gold-accented headline reading 'Your Church, Always Available.' Below it, a subheading about AI-powered voice and chat. Two CTA buttons: gold 'Start Free Trial' and outlined 'Watch Demo.' Three trust logos below the fold."

Q2: "Does this match what SHOULD be here?" (Spec Compliance)

Compare the observation from Q1 against the acceptance spec (from knowledge/acceptance/) line by line:

Is every required element present?
Is every forbidden element absent?
Do prices match PRICING.md?
Do feature claims match features.yaml?
Are tier-gated elements correctly shown/hidden?

Output: A pass/fail checklist with specific citations to the spec.

Example:

[PASS] Hero heading mentions "church" and "AI" — spec A01 requires both
[PASS] Starter price shows $14.95/mo — matches PRICING.md
[FAIL] Pricing card says "4 agents" — spec says 2 agents (Coordinator + Care)
[PASS] "Most Popular" badge on Pro tier — spec requires recommendation indicator
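
One way to make the Q2 output machine-readable is to model each spec check as a record and render it in the checklist style shown above. This is a minimal sketch only: the `SpecCheck` type and the `formatCheck` and `q2Passed` helpers are hypothetical names, not part of the existing tooling, and treating `[CHECK]` items as non-failures is an assumption.

```typescript
// Hypothetical shape for a single Q2 spec check (all names are assumptions).
type CheckStatus = "PASS" | "FAIL" | "CHECK";

interface SpecCheck {
  status: CheckStatus;
  observed: string; // what the agent saw on the page
  citation: string; // the spec line or file the check is judged against
}

// Render one check in the "[STATUS] observed -- citation" checklist style.
function formatCheck(check: SpecCheck): string {
  return `[${check.status}] ${check.observed} -- ${check.citation}`;
}

// Q2 passes only if no check failed. "CHECK" items need follow-up but are
// not counted as failures in this sketch.
function q2Passed(checks: SpecCheck[]): boolean {
  return checks.every((c) => c.status !== "FAIL");
}

const pricingChecks: SpecCheck[] = [
  { status: "PASS", observed: "Starter price shows $14.95/mo", citation: "matches PRICING.md" },
  { status: "FAIL", observed: 'Pricing card says "4 agents"', citation: "spec says 2 agents (Coordinator + Care)" },
];

const checklist = pricingChecks.map(formatCheck);
console.log(checklist.join("\n"));
console.log(`Q2 passed: ${q2Passed(pricingChecks)}`);
```

Keeping the citation as its own field means every failure in a report can point back to the spec line it was judged against.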

Q3: "If I were [this persona], would I know what to do next?" (Persona Empathy)

Step into the shoes of the journey's assigned persona. Consider their:

Tech comfort level — Will they understand what to click?
Key concern — Is their primary worry addressed on this page?
Context — How did they get here? What are they expecting?
Patience — How many clicks will they tolerate before leaving?
Trust — Does this page build or erode trust?

Output: A persona-voiced assessment.

Example: "Pastor Ruth (62, tiny rural church, low tech comfort) landed on the pricing page. She sees 6 pricing cards and feels overwhelmed. She doesn't know what 'RAG tools' means. She's looking for something simple and cheap. The $14.95 Starter plan is buried below the fold. She would probably leave."

Q4: "Forget the spec -- what would make this BETTER?" (AI Creative Judgment)

Set aside the acceptance spec. Using AI judgment about UX best practices, conversion optimization, and empathy, identify improvements that the spec didn't think of:

Is the copy compelling or generic?
Could the layout be clearer?
Is there unnecessary friction?
Are emotional triggers appropriate?
Is there a missed opportunity for trust-building?

This is floor vs. ceiling thinking. Q2 checks the floor (minimum spec). Q4 aims for the ceiling (maximum impact).

Output: Concrete, actionable improvement suggestions ranked by estimated impact.

Example: "The pricing page meets spec but could be significantly better. (1) Add a 'Which plan is right for me?' quiz -- board-level buyers hate choosing without guidance. (2) The annual pricing toggle is easy to miss; make it a pill switch above the cards. (3) Add a '30-second summary' line under each plan name for quick scanning."

Q5: "Is this page moving me toward THE GOAL?" (Goal Achievement)

Zoom out from the individual page to the user's original intent. The goal is defined at the journey level (e.g., "Generate my first sermon using Reformed theology"). Ask:

Am I closer to achieving my goal than I was at the previous step?
Is the path to the goal clear from here?
How many steps remain?
Is there any risk I'll abandon the journey from this point?

Output: A goal-distance assessment.

Example: "Pastor Rachel's goal is to generate her first sermon. She has completed 4 of 6 steps. She has selected her tradition (Reformed), entered her Scripture text (Romans 8:28-30), and chosen a sermon structure. She is now on the generation page waiting for output. Goal achievement is on track -- one step away."

The Mechanical Layer

In addition to the 5 Questions, every page evaluation includes a mechanical check:

Check	How	Severity if failed
JavaScript errors	Browser console log	SPEC VIOLATION
Broken images	`naturalWidth === 0`	SPEC VIOLATION
404 links	Click all links, check status	SPEC VIOLATION
Load time	`performance.timing` < 3s	PERSONA RISK
Mobile responsive	Viewport 375x812	PERSONA RISK
Accessibility	axe-core scan	PERSONA RISK
HTTPS	Protocol check	SPEC VIOLATION
No secrets in HTML	Pattern match against `SECRET_PATTERNS`	SPEC VIOLATION

The mechanical layer runs automatically. It does not require AI judgment. It catches what the 5 Questions assume is already working.
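
The table above can be read as a set of pure rules over data collected from the browser. The sketch below illustrates that idea under several assumptions: the `PageSnapshot` shape and `runMechanicalChecks` function are hypothetical, the two regular expressions are placeholders (the real `SECRET_PATTERNS` list is not shown in this document), and the mobile viewport and axe-core checks are left out because they need a real browser session.

```typescript
type MechanicalSeverity = "SPEC VIOLATION" | "PERSONA RISK";

// Data a browser driver would collect for one page (hypothetical shape).
interface PageSnapshot {
  url: string;
  consoleErrors: string[];
  imageNaturalWidths: number[]; // naturalWidth of each <img> on the page
  linkStatuses: number[]; // HTTP status returned by each link
  loadTimeMs: number;
  html: string;
}

interface MechanicalFinding {
  check: string;
  severity: MechanicalSeverity;
}

// Placeholder patterns for illustration only; not the project's real list.
const SECRET_PATTERNS: RegExp[] = [/sk_live_[A-Za-z0-9]+/, /-----BEGIN [A-Z ]*PRIVATE KEY-----/];

// Apply each rule from the table and collect the checks that failed.
function runMechanicalChecks(page: PageSnapshot): MechanicalFinding[] {
  const findings: MechanicalFinding[] = [];
  const fail = (check: string, severity: MechanicalSeverity): void => {
    findings.push({ check, severity });
  };

  if (page.consoleErrors.length > 0) fail("JavaScript errors", "SPEC VIOLATION");
  if (page.imageNaturalWidths.some((w) => w === 0)) fail("Broken images", "SPEC VIOLATION");
  if (page.linkStatuses.some((s) => s === 404)) fail("404 links", "SPEC VIOLATION");
  if (page.loadTimeMs >= 3000) fail("Load time", "PERSONA RISK");
  if (page.url.indexOf("https://") !== 0) fail("HTTPS", "SPEC VIOLATION");
  if (SECRET_PATTERNS.some((p) => p.test(page.html))) fail("No secrets in HTML", "SPEC VIOLATION");
  return findings;
}

// Example: one broken image, a slow load, and a non-HTTPS URL.
const samplePage: PageSnapshot = {
  url: "http://example.com/pricing",
  consoleErrors: [],
  imageNaturalWidths: [640, 0],
  linkStatuses: [200, 200],
  loadTimeMs: 3400,
  html: "<html><body>Pricing</body></html>",
};

const mechanicalFindings = runMechanicalChecks(samplePage);
console.log(mechanicalFindings.map((f) => `${f.check}: ${f.severity}`).join("\n"));
```

Separating data collection from rule evaluation keeps the rules testable without launching a browser.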

Finding Severity Levels

Every finding from the 5-Question evaluation gets one of four severity levels:

Severity	Definition	Action	Example
SPEC VIOLATION	The page contradicts the acceptance spec or shows incorrect data	Must fix before deploy	Pricing says $9.95 but spec says $14.95
GOAL BLOCKED	The user cannot achieve their goal from this page	Must fix before deploy	"Start Free Trial" button is broken / leads to 404
PERSONA RISK	A specific persona would likely abandon the journey at this point	Should fix before launch	Pastor Ruth can't find the Starter plan because it's below the fold
IMPROVEMENT	The page works but could deliver a better experience	Could fix (backlog)	Adding a "Which plan is right for me?" quiz

Priority order: SPEC VIOLATION = GOAL BLOCKED > PERSONA RISK > IMPROVEMENT
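
The priority order can be encoded as a rank table so that findings sort consistently and the "Must fix before deploy" rule becomes a single check. This is a sketch under assumptions: the `Finding` type, `SEVERITY_RANK`, `sortByPriority`, and `blocksDeploy` are illustrative names rather than existing code.

```typescript
type Severity = "SPEC VIOLATION" | "GOAL BLOCKED" | "PERSONA RISK" | "IMPROVEMENT";

interface Finding {
  severity: Severity;
  summary: string;
}

// Lower rank means higher priority. SPEC VIOLATION and GOAL BLOCKED share
// rank 0 because the priority order treats them as equal.
const SEVERITY_RANK: Record<Severity, number> = {
  "SPEC VIOLATION": 0,
  "GOAL BLOCKED": 0,
  "PERSONA RISK": 1,
  "IMPROVEMENT": 2,
};

// Return a sorted copy. Array.prototype.sort is stable in modern engines,
// so findings of equal rank keep their original relative order.
function sortByPriority(findings: Finding[]): Finding[] {
  return findings.slice().sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}

// Both rank-0 severities carry the action "Must fix before deploy".
function blocksDeploy(findings: Finding[]): boolean {
  return findings.some((f) => SEVERITY_RANK[f.severity] === 0);
}

const found: Finding[] = [
  { severity: "IMPROVEMENT", summary: 'Add a "Which plan is right for me?" quiz' },
  { severity: "GOAL BLOCKED", summary: '"Start Free Trial" button leads to 404' },
  { severity: "PERSONA RISK", summary: "Starter plan is below the fold" },
  { severity: "SPEC VIOLATION", summary: "Pricing says $9.95 but spec says $14.95" },
];

const ordered = sortByPriority(found);
console.log(ordered.map((f) => f.severity).join(" > "));
console.log(`Blocks deploy: ${blocksDeploy(found)}`);
```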

When to Run

Trigger	What runs	Who runs it
Every deploy (CI/CD)	Mechanical layer only	Automated (Playwright in CI)
Weekly QA sweep	Full 5-Question on all 10 journeys	AI agent via `/qa goals`
Before launch	Full audit -- all journeys, all personas, all properties	AI agent + founder review
After major changes	Affected journeys only	AI agent triggered by change detection
New feature ships	New journey created + run	AI agent + founder approval

Journey Definition Format

Journeys are defined as YAML files in knowledge/tests/journeys/. Each file describes a complete user journey from entry to outcome.

journey: kebab-case-name
goal: "User's words -- what they're trying to accomplish"
persona:
  name: Name
  age: 30
  role: Role description
  tech_comfort: low|medium|high
  key_concern: "What matters most to them"
property: domain.com
entry_point: https://domain.com/page
steps:
  - page: page-name
    url: /path
    action: "What the user does -- click, type, scroll, etc."
    spec_ref: "acceptance/spec-file.md#section"
    q2_spec: "What the spec says must be here"
    q3_persona: "Key persona concern at this step"
    q5_goal: "Is this moving toward the goal?"
  - page: next-page
    url: /next-path
    action: "Next action"
    spec_ref: "acceptance/spec-file.md#section"
    q2_spec: "Expected elements"
    q3_persona: "Persona worry"
    q5_goal: "Goal progress"
outcome_verification:
  - type: page_content|database|email|api
    check: "What proves the goal was achieved"
failure_modes:
  - "What could go wrong at each step"
last_run: null
last_result: null

Field Reference

Field	Required	Description
`journey`	Yes	Unique kebab-case identifier
`goal`	Yes	The user's intent in their own words
`persona`	Yes	Who is attempting this journey
`property`	Yes	Which domain this journey tests
`entry_point`	Yes	Full URL where the journey begins
`steps`	Yes	Ordered list of page visits and actions
`steps[].page`	Yes	Human-readable page name
`steps[].url`	Yes	URL path for this step
`steps[].action`	Yes	What the user does on this page
`steps[].spec_ref`	No	Link to acceptance spec section
`steps[].q2_spec`	No	What the spec says should be here
`steps[].q3_persona`	Yes	The persona's concern at this step
`steps[].q5_goal`	Yes	Goal proximity assessment
`outcome_verification`	Yes	How to confirm the goal was achieved
`failure_modes`	Yes	Known risks at each step
`last_run`	Auto	ISO timestamp of last execution
`last_result`	Auto	pass/fail/partial + finding count
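
The required fields in the table lend themselves to a small validation pass before a journey is run. The sketch below assumes the YAML file has already been parsed into a plain object by whatever YAML library the project uses; `validateJourney` and the two constant lists are hypothetical names, and the kebab-case pattern is one reasonable reading of "kebab-case identifier".

```typescript
// Required fields, copied from the Field Reference table.
const REQUIRED_TOP_LEVEL = [
  "journey", "goal", "persona", "property", "entry_point",
  "steps", "outcome_verification", "failure_modes",
] as const;

const REQUIRED_STEP_FIELDS = ["page", "url", "action", "q3_persona", "q5_goal"] as const;

// Returns human-readable problems; an empty list means every required
// field is present. Optional fields (spec_ref, q2_spec) are not checked.
function validateJourney(doc: Record<string, unknown>): string[] {
  const problems: string[] = [];

  for (const field of REQUIRED_TOP_LEVEL) {
    if (doc[field] === undefined || doc[field] === null) problems.push(`missing ${field}`);
  }

  const name = doc["journey"];
  if (typeof name === "string" && !/^[a-z0-9]+(-[a-z0-9]+)*$/.test(name)) {
    problems.push("journey must be kebab-case");
  }

  const rawSteps = doc["steps"];
  const steps: unknown[] = Array.isArray(rawSteps) ? rawSteps : [];
  steps.forEach((step, i) => {
    const fields = (typeof step === "object" && step !== null ? step : {}) as Record<string, unknown>;
    for (const field of REQUIRED_STEP_FIELDS) {
      if (typeof fields[field] !== "string") problems.push(`steps[${i}] missing ${field}`);
    }
  });

  return problems;
}

// Example draft with three mistakes: a non-kebab-case name, a step without
// q5_goal, and no failure_modes list.
const draft: Record<string, unknown> = {
  journey: "SermonWise First Sermon",
  goal: "Generate my first sermon using Reformed theology",
  persona: { name: "Pastor Rachel", tech_comfort: "medium" },
  property: "sermonwise.ai",
  entry_point: "https://sermonwise.ai",
  steps: [{ page: "landing", url: "/", action: "Click Start Free", q3_persona: "Is my tradition supported?" }],
  outcome_verification: [{ type: "page_content", check: "Sermon is displayed" }],
};

const journeyProblems = validateJourney(draft);
console.log(journeyProblems.join("\n"));
```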

How to Add New Journeys

Identify a real user goal that isn't covered by existing journeys. Ask: "What is a real person trying to accomplish that we haven't tested?"
Choose or create a persona. The persona should represent the most challenging user for this journey (lowest tech comfort, highest skepticism, most edge-case constraints).
Walk the journey yourself on the production site. Document every page, every click, every decision point.
Create the YAML file in knowledge/tests/journeys/ following the format above.
Map to acceptance specs. For each step, reference the relevant acceptance spec section in knowledge/acceptance/. If no spec exists for a step, flag it.
Define outcome verification. What proves the goal was achieved? A database record? A page rendering correctly? An email received?
List failure modes. What could go wrong? 404 pages, confusing copy, missing CTAs, tier restrictions, broken forms.
Register the journey in knowledge/tests/baselines/suite-baselines.json under the journeys key.
Run it via /qa goals [journey-name] or as part of the weekly sweep.

How to Add New Personas

Personas live in two places:

Journey YAML files -- inline persona definition for journey-specific context
Playwright spec files -- e2e/delivers/personas/ for automated persona tests

Persona Design Principles

Give them a name, age, and role. Not "User Type A" but "Pastor Ruth, 62, solo pastor of a 45-member rural church."
Define their tech comfort. This determines what they can figure out without help.
Define their key concern. This is the lens through which they evaluate everything.
Make them the hardest case. The persona who would struggle most reveals the most bugs.
Give them a backstory. "Ruth has been burned by two previous church software purchases" changes how she evaluates the product.

Current Persona Library

Persona	Role	Property	Key Concern	Tech
Pastor Ruth	Solo pastor, tiny rural church	CWA	Cost, simplicity	Low
Board Leader Mark	Church operations chair, retired exec	CWA	ROI, security, staff adoption	High
Pastor Maria	Catholic priest needing homily	SermonWise	Theological accuracy, liturgical calendar	Medium
Deacon Bob	Board evaluator, retired engineer	CWA	Defensible recommendation	High
Karen	Church admin, former compliance officer	CWA	Data privacy, AI disclosure	High
Youth Pastor Jake	Youth ministry, mobile-first	CWA	Mobile experience, relevance	High
Pastor Ezekiel	AI skeptic, traditional pastor	CWA	"Will this replace me?"	Low
Pastor Steve	Burned by past tech purchases	CWA	Trust, proof, easy cancellation	Medium
Committee Buyer	5 deacons reviewing a recommendation	CWA	Quick understanding, simple pricing	Mixed
Mark IT Director	Mega-church IT administrator	CWA	Enterprise features, integrations	High
Pastor Rachel	Reformed pastor, first sermon generation	SermonWise	Theological tradition fidelity	Medium
Church Admin Linda	Office admin, Monday morning routine	CWA	Dashboard clarity, actionable data	Medium
Pastor James	Preparing Sunday sermon, needs illustration	ITW	Relevance, theological depth	Medium
Sarah	Unchurched, looking for a church	PewSearch	Location, denomination, welcoming vibe	Medium
Pastor David	Claiming his church's PewSearch listing	PewSearch	Easy claim process, control over listing	Low

SermonWise Example Walkthrough

Journey: sermonwise-first-sermon
Goal: "I need to prepare a Reformed sermon on Romans 8:28-30 for this Sunday."
Persona: Pastor Rachel, 38, Reformed tradition, medium tech comfort

Step 1: Landing Page (sermonwise.ai)

Q1 -- What do I SEE? A clean landing page with the headline "AI-Powered Sermon Preparation -- Aligned with Your Tradition." A hero image showing a pastor at a desk. Navigation with Home, Showcase, Templates, Pricing, Login. A gold "Start Free" CTA button.

Q2 -- Does this match the spec?

[PASS] Headline mentions "tradition" -- spec requires tradition-awareness in hero
[PASS] "Start Free" CTA present -- spec requires low-friction entry
[CHECK] Does the page mention 17 traditions? Spec says tradition count must appear

Q3 -- If I were Pastor Rachel, would I know what to do next? Rachel is a Reformed pastor looking for a sermon tool that respects her tradition. She sees "Aligned with Your Tradition" and feels seen. She would click "Start Free" or look for a "Reformed" mention to confirm this tool knows her tradition. The CTA is clear.

Q4 -- What would make this BETTER? The landing page could show tradition badges (Reformed, Catholic, Baptist, etc.) in the hero area so Rachel immediately sees her tradition is supported without scrolling.

Q5 -- Is this moving toward THE GOAL? Rachel's goal is to generate a sermon. She's on the landing page. This is Step 1 of 6. She has not started yet, but the page communicates that she's in the right place. On track.

Step 2: Signup / Login

Q1: A simple signup form with email, password, and tradition selector dropdown. Google OAuth option available.

Q2:

[PASS] Tradition selector present during signup
[PASS] Reformed is in the dropdown
[PASS] No credit card required for free tier

Q3: Rachel selects "Reformed" from the dropdown. She appreciates not needing a credit card. She signs up with her email. Low friction.

Q4: The tradition selector could show a brief description of each tradition (e.g., "Reformed -- Emphasizes God's sovereignty, TULIP, covenant theology") to build confidence.

Q5: Step 2 of 6. Signup is a necessary gate. On track.

Step 3: Dashboard / New Sermon

Q1: A dashboard with a prominent "New Sermon" button. Left sidebar with navigation. A getting-started banner for first-time users.

Q2:

[PASS] "New Sermon" button visible
[PASS] Getting started guide for first use
[CHECK] Is tradition shown in the header/profile area?

Q3: Rachel clicks "New Sermon" immediately. The getting-started guide is helpful but she already knows what she wants. She wants to get to the generation form fast.

Q4: For power users like Rachel, add a keyboard shortcut (Cmd+N) for new sermon. Show her tradition badge in the header so she knows it's remembered.

Q5: Step 3 of 6. She's now entering the creation flow. On track.

Step 4: Sermon Configuration

Q1: A form with fields for Scripture reference, sermon title (optional), structure type (expository, topical, narrative), length target, and tradition confirmation (pre-filled as Reformed).

Q2:

[PASS] Scripture input field present
[PASS] Tradition pre-filled from signup
[PASS] Structure options include expository (required for Reformed)
[CHECK] Does it support verse-range input like "Romans 8:28-30"?

Q3: Rachel enters "Romans 8:28-30" and sees "Expository" already selected (appropriate for Reformed). She adjusts length to 25 minutes. She feels confident this will produce something tradition-appropriate.

Q4: Show a preview of what the structure will look like ("Introduction > Historical Context > Verse-by-verse Exposition > Application > Conclusion") before generating, so Rachel can adjust.

Q5: Step 4 of 6. Configuration complete, generation next. On track.

Step 5: Sermon Generation

Q1: A loading indicator ("Crafting your sermon...") followed by a generated sermon with clear sections: Title, Introduction, Body (verse-by-verse exposition of Romans 8:28-30), Application, Conclusion.

Q2:

[PASS] Sermon generates without error
[CHECK] Does the sermon reference Reformed distinctives (sovereignty, predestination, perseverance)?
[CHECK] Is the ESV translation used (standard for Reformed)?

Q3: Rachel reads the introduction and immediately checks for theological accuracy. Does it handle predestination (v.29-30) with the nuance her congregation expects? Does it mention the golden chain of redemption? She's reading critically, not casually.

Q4: Add inline theological notes (expandable) that explain why specific Reformed framings were chosen. This builds trust and helps Rachel understand the AI's reasoning.

Q5: Step 5 of 6. The sermon exists. Goal nearly achieved. Rachel needs to review and possibly edit.

Step 6: Review, Edit, and Export

Q1: The generated sermon displayed in an editable rich text editor. Export options visible: Copy, Download PDF, Download DOCX. A "Refine" button for AI-assisted revision.

Q2:

[PASS] Edit capability present
[PASS] Export to PDF available
[CHECK] Does the PDF include proper formatting (headings, Scripture references)?
[CHECK] Is there a "Save to Library" option?

Q3: Rachel makes minor edits to the application section, adjusting it for her specific congregation. She exports to PDF for her sermon notes binder. She's satisfied.

Q4: Add a "Share with Elders" option -- Reformed churches often have elder review of sermons. A shareable link (read-only) would serve this workflow.

Q5: Step 6 of 6. GOAL ACHIEVED. Pastor Rachel has a Reformed sermon on Romans 8:28-30 ready for Sunday.

Integration with QA Orchestrator

The QA Orchestrator skill (/qa) supports goal-based testing via the goals domain:

/qa goals                      -- Run all 10 journeys
/qa goals sermonwise           -- Run SermonWise journeys only
/qa goals cwa                  -- Run ChurchWiseAI journeys only
/qa goals pewsearch            -- Run PewSearch journeys only
/qa goals itw                  -- Run ITW journeys only
/qa goals cross                -- Run cross-property journeys only
/qa goals sermonwise-first-sermon  -- Run a single journey by name
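
The command forms above suggest a simple resolution rule: no argument runs everything, a property alias runs that group, and anything else is treated as a single journey name. The sketch below illustrates that rule under assumptions. How the QA Orchestrator actually resolves its argument is not described here, and `JourneyRef`, `PROPERTY_GROUPS`, `selectJourneys`, and the `cwa-pricing-evaluation` journey name are all hypothetical.

```typescript
// Hypothetical registry entry: "group" is the alias used on the command line.
interface JourneyRef {
  name: string;
  group: string;
}

// Aliases taken from the command examples; treating them as a fixed list
// is an assumption.
const PROPERTY_GROUPS = ["sermonwise", "cwa", "pewsearch", "itw", "cross"];

// No argument selects everything, a known alias selects that group, and
// any other value is matched against journey names.
function selectJourneys(all: JourneyRef[], arg?: string): JourneyRef[] {
  if (arg === undefined) return all;
  if (PROPERTY_GROUPS.indexOf(arg) !== -1) return all.filter((j) => j.group === arg);
  return all.filter((j) => j.name === arg);
}

const registry: JourneyRef[] = [
  { name: "sermonwise-first-sermon", group: "sermonwise" },
  { name: "cwa-pricing-evaluation", group: "cwa" }, // made-up journey name
];

console.log(selectJourneys(registry).map((j) => j.name).join(", "));
console.log(selectJourneys(registry, "cwa").map((j) => j.name).join(", "));
console.log(selectJourneys(registry, "sermonwise-first-sermon").map((j) => j.name).join(", "));
```

Checking aliases before names means a journey should never be named exactly like a property alias.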

Execution Flow

1. Load journey YAML from knowledge/tests/journeys/
2. For each step:
   a. Navigate to the URL
   b. Take a screenshot
   c. Run the Mechanical Layer checks
   d. Apply all 5 Questions
   e. Log findings with severity levels
3. Run outcome verification (database checks, API calls, page content)
4. Generate report with pass/fail per step, all findings, and overall journey status
5. Update baselines in suite-baselines.json
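
The per-step loop in the flow above can be sketched with the browser and AI calls injected behind an interface, which keeps the orchestration logic testable on its own. Everything here is illustrative: `StepDriver`, `runJourney`, and the related types are hypothetical, the calls are synchronous for brevity where a real driver would be asynchronous, and the rule that a step fails only on SPEC VIOLATION or GOAL BLOCKED is an assumption. Outcome verification and the baseline update are omitted.

```typescript
type FlowSeverity = "SPEC VIOLATION" | "GOAL BLOCKED" | "PERSONA RISK" | "IMPROVEMENT";

interface FlowFinding { step: string; severity: FlowSeverity; note: string; }
interface FlowStep { page: string; url: string; }

// Anything that touches a browser or an AI model is injected here.
interface StepDriver {
  navigate(url: string): void;
  screenshot(): string; // returns a file path or identifier
  mechanicalChecks(step: FlowStep): FlowFinding[];
  fiveQuestions(step: FlowStep, screenshot: string): FlowFinding[];
}

interface JourneyResult {
  stepStatus: { page: string; passed: boolean }[];
  findings: FlowFinding[];
  status: "PASS" | "FAIL";
}

function runJourney(steps: FlowStep[], driver: StepDriver): JourneyResult {
  const findings: FlowFinding[] = [];
  const stepStatus: { page: string; passed: boolean }[] = [];

  for (const step of steps) {
    driver.navigate(step.url); // a. navigate
    const shot = driver.screenshot(); // b. screenshot
    const stepFindings = driver
      .mechanicalChecks(step) // c. mechanical layer
      .concat(driver.fiveQuestions(step, shot)); // d. five questions

    // e. log findings; in this sketch only must-fix severities fail a step
    const passed = stepFindings.every(
      (f) => f.severity !== "SPEC VIOLATION" && f.severity !== "GOAL BLOCKED",
    );
    findings.push(...stepFindings);
    stepStatus.push({ page: step.page, passed });
  }

  const status = stepStatus.every((s) => s.passed) ? "PASS" : "FAIL";
  return { stepStatus, findings, status };
}

// A fake driver that reports one persona risk on the pricing page.
const fakeDriver: StepDriver = {
  navigate: () => {},
  screenshot: () => "shot.png",
  mechanicalChecks: () => [],
  fiveQuestions: (step) =>
    step.page === "pricing"
      ? [{ step: step.page, severity: "PERSONA RISK", note: "Starter plan is below the fold" }]
      : [],
};

const journeyResult = runJourney(
  [{ page: "landing", url: "/" }, { page: "pricing", url: "/pricing" }],
  fakeDriver,
);
console.log(journeyResult.status, journeyResult.findings.length);
```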

Report Format

JOURNEY: sermonwise-first-sermon
PERSONA: Pastor Rachel (38, Reformed, medium tech)
GOAL: Generate a Reformed sermon on Romans 8:28-30
STATUS: PASS (6/6 steps passed)

Step 1: Landing Page ................. PASS
  Q1: Hero communicates tradition-aware sermon tool
  Q2: 3/3 spec checks passed
  Q3: Rachel would click "Start Free" -- clear CTA
  Q5: On track (1/6)

Step 2: Signup ...................... PASS
  Q1: Clean signup with tradition selector
  Q2: 3/3 spec checks passed
  Q3: Low friction, no credit card required
  Q5: On track (2/6)

...

FINDINGS:
  0 SPEC VIOLATIONS
  0 GOAL BLOCKED
  0 PERSONA RISK
  3 IMPROVEMENTS: Tradition badges in hero (Step 1, Q4), keyboard shortcut for new sermon, elder sharing option

OUTCOME VERIFICATION:
  [PASS] Sermon content includes Reformed theological markers
  [PASS] Export to PDF generates valid document

Relationship to Expected Output Methodology

The Expected Output Methodology (knowledge/processes/expected-output-methodology.md) defines what each page SHOULD look like for each customer tier. It is the spec that Q2 checks against.

The 5-Question Framework (this document) defines HOW to evaluate pages beyond mere spec compliance. It adds persona empathy (Q3), creative improvement (Q4), and goal-tracking (Q5) on top of spec checking (Q2).

They work together:

Expected Output Spec   -->  "The pricing page must show $14.95 for Starter"
5-Question Framework   -->  "The pricing page shows $14.95 (Q2 PASS),
                             but Pastor Ruth can't find it because it's
                             below the fold (Q3 PERSONA RISK),
                             and a 'Which plan?' quiz would help (Q4 IMPROVEMENT),
                             and she's 2 steps from signup but at risk
                             of abandoning (Q5 AT RISK)"

Without the Expected Output Spec, Q2 has nothing to check against. Without the 5-Question Framework, the spec is just a checklist with no human judgment.

File Locations

What	Where
This methodology	`knowledge/processes/5-question-testing.md`
Journey YAML files	`knowledge/tests/journeys/*.yaml`
Journey baselines	`knowledge/tests/baselines/suite-baselines.json`
Acceptance specs	`knowledge/acceptance/*.md`
Persona Playwright tests	`churchwiseai-web/e2e/delivers/personas/*.spec.ts`
Existing journey Playwright tests	`churchwiseai-web/e2e/journeys/*.spec.ts`
QA Orchestrator skill	`/qa goals`
