5-Question AI-Powered Goal-Based Testing
Why This Exists
Traditional testing asks "does it work?" Goal-based testing asks "does it succeed?" A page can load without errors, pass every Playwright assertion, display all the right elements, and still fail the customer. The button exists but the pastor doesn't see it. The pricing is accurate but incomprehensible to a board of deacons. The chatbot replies but doesn't move the visitor toward joining a small group.
This methodology bridges the gap between mechanical correctness (Layer A) and actual customer outcomes (Layer C) by adding a human-judgment layer that only AI can scale.
The 3-Layer Testing Architecture
Layer A: Mechanical / Playwright
"Does the page load? Are there JS errors? Do links work?"
Tools: Playwright specs, lighthouse, axe-core
Frequency: Every deploy (CI/CD)
Layer B: AI Goal-Based (THIS DOCUMENT)
"Would a real person achieve their goal on this page?"
Tools: 5-Question Framework + journey YAML + AI agent evaluation
Frequency: Weekly + before launch + after major changes
Layer C: Outcome Verification
"Did the database record get created? Did the email send? Did Stripe charge?"
Tools: Supabase queries, Stripe API, email receipt checks
Frequency: Per-journey (automated after Layer B)
Layer B is the missing middle. Layer A catches broken pages. Layer C catches broken backends. Layer B catches broken experiences.
The 5 Questions
At every step of every user journey, the AI agent asks these five questions:
Q1: "What do I SEE?" (Observation)
Take a screenshot. Describe what human eyes register in the first 3 seconds:
- What's the dominant visual element?
- What text is largest/boldest?
- What colors draw the eye?
- Is there a clear visual hierarchy?
- What is above the fold vs. below?
Output: A plain-language description of the page as a first-time visitor would perceive it. Not HTML structure -- visual impression.
Example: "A dark hero section with a gold-accented headline reading 'Your Church, Always Available.' Below it, a subheading about AI-powered voice and chat. Two CTA buttons: gold 'Start Free Trial' and outlined 'Watch Demo.' Three trust logos below the fold."
Q2: "Does this match what SHOULD be here?" (Spec Compliance)
Compare the observation from Q1 against the acceptance spec (from knowledge/acceptance/) line by line:
- Is every required element present?
- Is every forbidden element absent?
- Do prices match PRICING.md?
- Do feature claims match features.yaml?
- Are tier-gated elements correctly shown/hidden?
Output: A pass/fail checklist with specific citations to the spec.
Example:
[PASS] Hero heading mentions "church" and "AI" — spec A01 requires both
[PASS] Starter price shows $14.95/mo — matches PRICING.md
[FAIL] Pricing card says "4 agents" — spec says 2 agents (Coordinator + Care)
[PASS] "Most Popular" badge on Pro tier — spec requires recommendation indicator
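The checklist above could be represented as structured data so that a run can be stored and compared later. The following is a minimal sketch, not the existing tooling: the `SpecCheck` type, its field names, and the `formatCheck` and `q2Passed` helpers are all hypothetical.

```typescript
// Hypothetical shape for one Q2 checklist entry; field names are assumptions.
type CheckStatus = "PASS" | "FAIL" | "CHECK";

interface SpecCheck {
  status: CheckStatus;
  observed: string; // what the agent saw on the page
  citation: string; // the spec line or file the check is based on
}

// Render one entry in a "[STATUS] observed -- citation" style.
function formatCheck(check: SpecCheck): string {
  return `[${check.status}] ${check.observed} -- ${check.citation}`;
}

// Assumed rule: Q2 passes only when no entry is a FAIL.
// CHECK entries do not fail the step but still need follow-up.
function q2Passed(checks: SpecCheck[]): boolean {
  return checks.every((c) => c.status !== "FAIL");
}

const checks: SpecCheck[] = [
  { status: "PASS", observed: "Starter price shows $14.95/mo", citation: "matches PRICING.md" },
  { status: "FAIL", observed: 'Pricing card says "4 agents"', citation: "spec says 2 agents" },
];

console.log(checks.map(formatCheck).join("\n"));
console.log(q2Passed(checks)); // false, because one entry is a FAIL
```

Keeping the citation as its own field makes it harder for a checklist entry to be recorded without pointing back at the spec.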
Q3: "If I were [this persona], would I know what to do next?" (Persona Empathy)
Step into the shoes of the journey's assigned persona. Consider their:
- Tech comfort level — Will they understand what to click?
- Key concern — Is their primary worry addressed on this page?
- Context — How did they get here? What are they expecting?
- Patience — How many clicks will they tolerate before leaving?
- Trust — Does this page build or erode trust?
Output: A persona-voiced assessment.
Example: "Pastor Ruth (62, tiny rural church, low tech comfort) landed on the pricing page. She sees 6 pricing cards and feels overwhelmed. She doesn't know what 'RAG tools' means. She's looking for something simple and cheap. The $14.95 Starter plan is buried below the fold. She would probably leave."
Q4: "Forget the spec -- what would make this BETTER?" (AI Creative Judgment)
Set aside the acceptance spec. Using AI judgment about UX best practices, conversion optimization, and empathy, identify improvements that the spec didn't think of:
- Is the copy compelling or generic?
- Could the layout be clearer?
- Is there unnecessary friction?
- Are emotional triggers appropriate?
- Is there a missed opportunity for trust-building?
This is floor vs. ceiling thinking. Q2 checks the floor (minimum spec). Q4 aims for the ceiling (maximum impact).
Output: Concrete, actionable improvement suggestions ranked by estimated impact.
Example: "The pricing page meets spec but could be significantly better. (1) Add a 'Which plan is right for me?' quiz -- board-level buyers hate choosing without guidance. (2) The annual pricing toggle is easy to miss; make it a pill switch above the cards. (3) Add a '30-second summary' line under each plan name for quick scanning."
Q5: "Is this page moving me toward THE GOAL?" (Goal Achievement)
Zoom out from the individual page to the user's original intent. The goal is defined at the journey level (e.g., "Generate my first sermon using Reformed theology"). Ask:
- Am I closer to achieving my goal than I was at the previous step?
- Is the path to the goal clear from here?
- How many steps remain?
- Is there any risk I'll abandon the journey from this point?
Output: A goal-distance assessment.
Example: "Pastor Rachel's goal is to generate her first sermon. She has completed 4 of 6 steps. She has selected her tradition (Reformed), entered her Scripture text (Romans 8:28-30), and chosen a sermon structure. She is now on the generation page waiting for output. Goal achievement is on track -- one step away."
The Mechanical Layer
In addition to the 5 Questions, every page evaluation includes a mechanical check:
| Check | How | Severity if failed |
|---|---|---|
| JavaScript errors | Browser console log | SPEC VIOLATION |
| Broken images | naturalWidth === 0 | SPEC VIOLATION |
| 404 links | Click all links, check status | SPEC VIOLATION |
| Load time | Page load duration under 3s, from the browser's Navigation Timing data (e.g. performance.timing) | PERSONA RISK |
| Mobile responsive | Viewport 375x812 | PERSONA RISK |
| Accessibility | axe-core scan | PERSONA RISK |
| HTTPS | Protocol check | SPEC VIOLATION |
| No secrets in HTML | Pattern match against SECRET_PATTERNS | SPEC VIOLATION |
The mechanical layer runs automatically. It does not require AI judgment. It catches what the 5 Questions assume is already working.
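As a rough illustration of how these checks map to severities, the sketch below evaluates the subset that needs only plain data (URL, HTML, load time, console errors) and no live browser. The function name, the input shape, and especially the two regular expressions are assumptions: the real SECRET_PATTERNS list is not shown in this document.

```typescript
// Severity names come from the table above; everything else is illustrative.
type MechanicalSeverity = "SPEC VIOLATION" | "PERSONA RISK";

interface MechanicalFinding {
  check: string;
  severity: MechanicalSeverity;
  detail: string;
}

// HYPOTHETICAL patterns standing in for the real SECRET_PATTERNS list.
const SECRET_PATTERNS: RegExp[] = [
  /sk_live_[A-Za-z0-9]+/, // assumed: Stripe-style live secret key
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // assumed: PEM private key header
];

const MAX_LOAD_MS = 3000; // the 3s budget from the table

interface PageSnapshot {
  url: string;
  html: string;
  loadTimeMs: number;
  consoleErrors: string[];
}

function runMechanicalChecks(page: PageSnapshot): MechanicalFinding[] {
  const findings: MechanicalFinding[] = [];
  if (!/^https:\/\//.test(page.url)) {
    findings.push({ check: "HTTPS", severity: "SPEC VIOLATION", detail: page.url });
  }
  for (const message of page.consoleErrors) {
    findings.push({ check: "JavaScript errors", severity: "SPEC VIOLATION", detail: message });
  }
  if (page.loadTimeMs >= MAX_LOAD_MS) {
    findings.push({ check: "Load time", severity: "PERSONA RISK", detail: `${page.loadTimeMs}ms` });
  }
  if (SECRET_PATTERNS.some((pattern) => pattern.test(page.html))) {
    findings.push({ check: "No secrets in HTML", severity: "SPEC VIOLATION", detail: "pattern matched" });
  }
  return findings;
}

const sample = runMechanicalChecks({
  url: "http://example.com/pricing",
  html: "<p>Pricing</p>",
  loadTimeMs: 3400,
  consoleErrors: [],
});
console.log(sample.map((f) => `${f.check}: ${f.severity}`));
```

The broken-image, 404-link, viewport, and axe-core checks are left out because they need a real browser session.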
Finding Severity Levels
Every finding from the 5-Question evaluation gets one of four severity levels:
| Severity | Definition | Action | Example |
|---|---|---|---|
| SPEC VIOLATION | The page contradicts the acceptance spec or shows incorrect data | Must fix before deploy | Pricing says $9.95 but spec says $14.95 |
| GOAL BLOCKED | The user cannot achieve their goal from this page | Must fix before deploy | "Start Free Trial" button is broken / leads to 404 |
| PERSONA RISK | A specific persona would likely abandon the journey at this point | Should fix before launch | Pastor Ruth can't find the Starter plan because it's below the fold |
| IMPROVEMENT | The page works but could deliver a better experience | Could fix (backlog) | Adding a "Which plan is right for me?" quiz |
Priority order: SPEC VIOLATION = GOAL BLOCKED > PERSONA RISK > IMPROVEMENT
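The priority order can be encoded as a rank table so that findings sort consistently and a deploy gate can be derived from it. This is a sketch under assumed names (`SEVERITY_RANK`, `sortByPriority`, `blocksDeploy`); only the four severity labels and their ordering come from this document.

```typescript
type Severity = "SPEC VIOLATION" | "GOAL BLOCKED" | "PERSONA RISK" | "IMPROVEMENT";

// Lower rank means higher priority. SPEC VIOLATION and GOAL BLOCKED share a rank.
const SEVERITY_RANK: Record<Severity, number> = {
  "SPEC VIOLATION": 0,
  "GOAL BLOCKED": 0,
  "PERSONA RISK": 1,
  "IMPROVEMENT": 2,
};

interface Finding {
  severity: Severity;
  summary: string;
}

// Returns a new array ordered by priority; equal ranks keep their input order
// because Array.prototype.sort is stable.
function sortByPriority(findings: Finding[]): Finding[] {
  return [...findings].sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}

// Rank-0 findings carry the "Must fix before deploy" action.
function blocksDeploy(findings: Finding[]): boolean {
  return findings.some((f) => SEVERITY_RANK[f.severity] === 0);
}

const example: Finding[] = [
  { severity: "IMPROVEMENT", summary: "Add a plan-selection quiz" },
  { severity: "PERSONA RISK", summary: "Starter plan is below the fold" },
  { severity: "GOAL BLOCKED", summary: "Start Free Trial leads to 404" },
];

console.log(sortByPriority(example).map((f) => f.severity));
console.log(blocksDeploy(example)); // true, because of the GOAL BLOCKED finding
```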
When to Run
| Trigger | What runs | Who runs it |
|---|---|---|
| Every deploy (CI/CD) | Mechanical layer only | Automated (Playwright in CI) |
| Weekly QA sweep | Full 5-Question on all 10 journeys | AI agent via /qa goals |
| Before launch | Full audit -- all journeys, all personas, all properties | AI agent + founder review |
| After major changes | Affected journeys only | AI agent triggered by change detection |
| New feature ships | New journey created + run | AI agent + founder approval |
Journey Definition Format
Journeys are defined as YAML files in knowledge/tests/journeys/. Each file describes a complete user journey from entry to outcome.
journey: kebab-case-name
goal: "User's words -- what they're trying to accomplish"
persona:
  name: Name
  age: 30
  role: Role description
  tech_comfort: low|medium|high
  key_concern: "What matters most to them"
property: domain.com
entry_point: https://domain.com/page
steps:
  - page: page-name
    url: /path
    action: "What the user does -- click, type, scroll, etc."
    spec_ref: "acceptance/spec-file.md#section"
    q2_spec: "What the spec says must be here"
    q3_persona: "Key persona concern at this step"
    q5_goal: "Is this moving toward the goal?"
  - page: next-page
    url: /next-path
    action: "Next action"
    spec_ref: "acceptance/spec-file.md#section"
    q2_spec: "Expected elements"
    q3_persona: "Persona worry"
    q5_goal: "Goal progress"
outcome_verification:
  - type: page_content|database|email|api
    check: "What proves the goal was achieved"
failure_modes:
  - "What could go wrong at each step"
last_run: null
last_result: null
Field Reference
| Field | Required | Description |
|---|---|---|
| journey | Yes | Unique kebab-case identifier |
| goal | Yes | The user's intent in their own words |
| persona | Yes | Who is attempting this journey |
| property | Yes | Which domain this journey tests |
| entry_point | Yes | Full URL where the journey begins |
| steps | Yes | Ordered list of page visits and actions |
| steps[].page | Yes | Human-readable page name |
| steps[].url | Yes | URL path for this step |
| steps[].action | Yes | What the user does on this page |
| steps[].spec_ref | No | Link to acceptance spec section |
| steps[].q2_spec | No | What the spec says should be here |
| steps[].q3_persona | Yes | The persona's concern at this step |
| steps[].q5_goal | Yes | Goal proximity assessment |
| outcome_verification | Yes | How to confirm the goal was achieved |
| failure_modes | Yes | Known risks at each step |
| last_run | Auto | ISO timestamp of last execution |
| last_result | Auto | pass/fail/partial + finding count |
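The required/optional split in the table lends itself to a small validation pass before a journey is run. The sketch below assumes the YAML has already been parsed into a plain object (the parser is not shown); `validateJourney` and the constant names are hypothetical, and the project's real schema may differ.

```typescript
// Sketch only: the step type mirrors the Field Reference table.
interface JourneyStep {
  page: string;
  url: string;
  action: string;
  spec_ref?: string; // optional per the table
  q2_spec?: string; // optional per the table
  q3_persona: string;
  q5_goal: string;
}

const REQUIRED_TOP_LEVEL = [
  "journey", "goal", "persona", "property", "entry_point",
  "steps", "outcome_verification", "failure_modes",
];
const REQUIRED_PER_STEP: (keyof JourneyStep)[] = ["page", "url", "action", "q3_persona", "q5_goal"];
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

// Returns a list of problems; an empty list means the journey looks valid.
function validateJourney(raw: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_TOP_LEVEL) {
    if (raw[key] === undefined || raw[key] === null) {
      problems.push(`missing required field: ${key}`);
    }
  }
  const name = raw["journey"];
  if (typeof name === "string" && !KEBAB_CASE.test(name)) {
    problems.push("journey must be kebab-case");
  }
  const rawSteps = raw["steps"];
  const steps: unknown[] = Array.isArray(rawSteps) ? rawSteps : [];
  steps.forEach((step, index) => {
    const fields = (step ?? {}) as Partial<JourneyStep>;
    for (const key of REQUIRED_PER_STEP) {
      if (fields[key] === undefined) {
        problems.push(`steps[${index}] missing required field: ${key}`);
      }
    }
  });
  return problems;
}

const problems = validateJourney({
  journey: "Sermonwise First Sermon", // not kebab-case
  goal: "Prepare a Reformed sermon",
  steps: [{ page: "landing", url: "/" }],
});
console.log(problems);
```

last_run and last_result are not validated here because the table marks them as automatically maintained.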
How to Add New Journeys
- Identify a real user goal that isn't covered by existing journeys. Ask: "What is a real person trying to accomplish that we haven't tested?"
- Choose or create a persona. The persona should represent the most challenging user for this journey (lowest tech comfort, highest skepticism, most edge-case constraints).
- Walk the journey yourself on the production site. Document every page, every click, every decision point.
- Create the YAML file in knowledge/tests/journeys/ following the format above.
- Map to acceptance specs. For each step, reference the relevant acceptance spec section in knowledge/acceptance/. If no spec exists for a step, flag it.
- Define outcome verification. What proves the goal was achieved? A database record? A page rendering correctly? An email received?
- List failure modes. What could go wrong? 404 pages, confusing copy, missing CTAs, tier restrictions, broken forms.
- Register the journey in knowledge/tests/baselines/suite-baselines.json under the journeys key.
- Run it via /qa goals [journey-name] or as part of the weekly sweep.
How to Add New Personas
Personas live in two places:
- Journey YAML files -- inline persona definition for journey-specific context
- Playwright spec files -- e2e/delivers/personas/ for automated persona tests
Persona Design Principles
- Give them a name, age, and role. Not "User Type A" but "Pastor Ruth, 62, solo pastor of a 45-member rural church."
- Define their tech comfort. This determines what they can figure out without help.
- Define their key concern. This is the lens through which they evaluate everything.
- Make them the hardest case. The persona who would struggle most reveals the most bugs.
- Give them a backstory. "Ruth has been burned by two previous church software purchases" changes how she evaluates the product.
Current Persona Library
| Persona | Role | Property | Key Concern | Tech |
|---|---|---|---|---|
| Pastor Ruth | Solo pastor, tiny rural church | CWA | Cost, simplicity | Low |
| Board Leader Mark | Church operations chair, retired exec | CWA | ROI, security, staff adoption | High |
| Pastor Maria | Catholic priest needing homily | SermonWise | Theological accuracy, liturgical calendar | Medium |
| Deacon Bob | Board evaluator, retired engineer | CWA | Defensible recommendation | High |
| Karen | Church admin, former compliance officer | CWA | Data privacy, AI disclosure | High |
| Youth Pastor Jake | Youth ministry, mobile-first | CWA | Mobile experience, relevance | High |
| Pastor Ezekiel | AI skeptic, traditional pastor | CWA | "Will this replace me?" | Low |
| Pastor Steve | Burned by past tech purchases | CWA | Trust, proof, easy cancellation | Medium |
| Committee Buyer | 5 deacons reviewing a recommendation | CWA | Quick understanding, simple pricing | Mixed |
| Mark IT Director | Mega-church IT administrator | CWA | Enterprise features, integrations | High |
| Pastor Rachel | Reformed pastor, first sermon generation | SermonWise | Theological tradition fidelity | Medium |
| Church Admin Linda | Office admin, Monday morning routine | CWA | Dashboard clarity, actionable data | Medium |
| Pastor James | Preparing Sunday sermon, needs illustration | ITW | Relevance, theological depth | Medium |
| Sarah | Unchurched, looking for a church | PewSearch | Location, denomination, welcoming vibe | Medium |
| Pastor David | Claiming his church's PewSearch listing | PewSearch | Easy claim process, control over listing | Low |
SermonWise Example Walkthrough
Journey: sermonwise-first-sermon
Goal: "I need to prepare a Reformed sermon on Romans 8:28-30 for this Sunday."
Persona: Pastor Rachel, 38, Reformed tradition, medium tech comfort
Step 1: Landing Page (sermonwise.ai)
Q1 -- What do I SEE? A clean landing page with the headline "AI-Powered Sermon Preparation -- Aligned with Your Tradition." A hero image showing a pastor at a desk. Navigation with Home, Showcase, Templates, Pricing, Login. A gold "Start Free" CTA button.
Q2 -- Does this match the spec?
[PASS] Headline mentions "tradition" -- spec requires tradition-awareness in hero
[PASS] "Start Free" CTA present -- spec requires low-friction entry
[CHECK] Does the page mention 17 traditions? Spec says tradition count must appear
Q3 -- If I were Pastor Rachel, would I know what to do next? Rachel is a Reformed pastor looking for a sermon tool that respects her tradition. She sees "Aligned with Your Tradition" and feels seen. She would click "Start Free" or look for a "Reformed" mention to confirm this tool knows her tradition. The CTA is clear.
Q4 -- What would make this BETTER? The landing page could show tradition badges (Reformed, Catholic, Baptist, etc.) in the hero area so Rachel immediately sees her tradition is supported without scrolling.
Q5 -- Is this moving toward THE GOAL? Rachel's goal is to generate a sermon. She's on the landing page. This is Step 1 of 6. She has not started yet, but the page communicates that she's in the right place. On track.
Step 2: Signup / Login
Q1: A simple signup form with email, password, and tradition selector dropdown. Google OAuth option available.
Q2:
[PASS] Tradition selector present during signup
[PASS] Reformed is in the dropdown
[PASS] No credit card required for free tier
Q3: Rachel selects "Reformed" from the dropdown. She appreciates not needing a credit card. She signs up with her email. Low friction.
Q4: The tradition selector could show a brief description of each tradition (e.g., "Reformed -- Emphasizes God's sovereignty, TULIP, covenant theology") to build confidence.
Q5: Step 2 of 6. Signup is a necessary gate. On track.
Step 3: Dashboard / New Sermon
Q1: A dashboard with a prominent "New Sermon" button. Left sidebar with navigation. A getting-started banner for first-time users.
Q2:
[PASS] "New Sermon" button visible
[PASS] Getting started guide for first use
[CHECK] Is tradition shown in the header/profile area?
Q3: Rachel clicks "New Sermon" immediately. The getting-started guide is helpful but she already knows what she wants. She wants to get to the generation form fast.
Q4: For power users like Rachel, add a keyboard shortcut (Cmd+N) for new sermon. Show her tradition badge in the header so she knows it's remembered.
Q5: Step 3 of 6. She's now entering the creation flow. On track.
Step 4: Sermon Configuration
Q1: A form with fields for Scripture reference, sermon title (optional), structure type (expository, topical, narrative), length target, and tradition confirmation (pre-filled as Reformed).
Q2:
[PASS] Scripture input field present
[PASS] Tradition pre-filled from signup
[PASS] Structure options include expository (required for Reformed)
[CHECK] Does it support verse-range input like "Romans 8:28-30"?
Q3: Rachel enters "Romans 8:28-30" and sees "Expository" already selected (appropriate for Reformed). She adjusts length to 25 minutes. She feels confident this will produce something tradition-appropriate.
Q4: Show a preview of what the structure will look like ("Introduction > Historical Context > Verse-by-verse Exposition > Application > Conclusion") before generating, so Rachel can adjust.
Q5: Step 4 of 6. Configuration complete, generation next. On track.
Step 5: Sermon Generation
Q1: A loading indicator ("Crafting your sermon...") followed by a generated sermon with clear sections: Title, Introduction, Body (verse-by-verse exposition of Romans 8:28-30), Application, Conclusion.
Q2:
[PASS] Sermon generates without error
[CHECK] Does the sermon reference Reformed distinctives (sovereignty, predestination, perseverance)?
[CHECK] Is the ESV translation used (standard for Reformed)?
Q3: Rachel reads the introduction and immediately checks for theological accuracy. Does it handle predestination (v.29-30) with the nuance her congregation expects? Does it mention the golden chain of redemption? She's reading critically, not casually.
Q4: Add inline theological notes (expandable) that explain why specific Reformed framings were chosen. This builds trust and helps Rachel understand the AI's reasoning.
Q5: Step 5 of 6. The sermon exists. Goal nearly achieved. Rachel needs to review and possibly edit.
Step 6: Review, Edit, and Export
Q1: The generated sermon displayed in an editable rich text editor. Export options visible: Copy, Download PDF, Download DOCX. A "Refine" button for AI-assisted revision.
Q2:
[PASS] Edit capability present
[PASS] Export to PDF available
[CHECK] Does the PDF include proper formatting (headings, Scripture references)?
[CHECK] Is there a "Save to Library" option?
Q3: Rachel makes minor edits to the application section, adjusting it for her specific congregation. She exports to PDF for her sermon notes binder. She's satisfied.
Q4: Add a "Share with Elders" option -- Reformed churches often have elder review of sermons. A shareable link (read-only) would serve this workflow.
Q5: Step 6 of 6. GOAL ACHIEVED. Pastor Rachel has a Reformed sermon on Romans 8:28-30 ready for Sunday.
Integration with QA Orchestrator
The QA Orchestrator skill (/qa) supports goal-based testing via the goals domain:
/qa goals -- Run all 10 journeys
/qa goals sermonwise -- Run SermonWise journeys only
/qa goals cwa -- Run ChurchWiseAI journeys only
/qa goals pewsearch -- Run PewSearch journeys only
/qa goals itw -- Run ITW journeys only
/qa goals cross -- Run cross-property journeys only
/qa goals sermonwise-first-sermon -- Run a single journey by name
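The same argument position accepts either a property group (such as sermonwise) or a single journey name (such as sermonwise-first-sermon), so the runner has to tell them apart. One possible resolution order is sketched below. The group keys come from the command list above; the `JourneyRef` type, its `group` tag, and the `cwa-pricing-review` journey name are assumptions made for illustration, since the journey YAML format has no group field.

```typescript
// Group keys taken from the command list above.
const GROUPS = ["sermonwise", "cwa", "pewsearch", "itw", "cross"] as const;
type Group = (typeof GROUPS)[number];

// ASSUMPTION: each journey carries a group tag; how the real skill maps
// journeys to groups is not described in this document.
interface JourneyRef {
  name: string;
  group: Group;
}

// No argument selects everything; a group key selects that group;
// anything else is treated as an exact journey name.
function selectJourneys(all: JourneyRef[], arg?: string): JourneyRef[] {
  if (arg === undefined || arg === "") return all;
  if ((GROUPS as readonly string[]).indexOf(arg) !== -1) {
    return all.filter((j) => j.group === arg);
  }
  const match = all.filter((j) => j.name === arg);
  if (match.length === 0) {
    throw new Error(`Unknown journey or group: ${arg}`);
  }
  return match;
}

const registry: JourneyRef[] = [
  { name: "sermonwise-first-sermon", group: "sermonwise" },
  { name: "cwa-pricing-review", group: "cwa" }, // hypothetical journey name
];

console.log(selectJourneys(registry, "sermonwise").map((j) => j.name));
console.log(selectJourneys(registry, "sermonwise-first-sermon").map((j) => j.name));
```

Checking group keys before journey names means a journey can never shadow a group, at the cost of reserving those five names.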
Execution Flow
- Load journey YAML from knowledge/tests/journeys/
- For each step:
  - Navigate to the URL
  - Take a screenshot
  - Run the Mechanical Layer checks
  - Apply all 5 Questions
  - Log findings with severity levels
- Run outcome verification (database checks, API calls, page content)
- Generate report with pass/fail per step, all findings, and overall journey status
- Update baselines in suite-baselines.json
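The per-step loop and the outcome check can be sketched as a small orchestration function. To keep the example runnable without a browser or an AI agent, every external capability is injected through a hypothetical `StepTools` interface; none of these names come from the actual QA Orchestrator, and report generation and the baseline update are left out.

```typescript
interface StepFinding {
  step: string;
  severity: string;
  summary: string;
}

interface FlowStep {
  page: string;
  url: string;
}

// HYPOTHETICAL adapter: real implementations would wrap a browser driver
// and the AI evaluation call.
interface StepTools {
  navigate(url: string): Promise<void>;
  screenshot(): Promise<string>; // returns an id or path for the capture
  mechanicalChecks(): Promise<StepFinding[]>;
  fiveQuestions(screenshotId: string, step: FlowStep): Promise<StepFinding[]>;
}

interface JourneyRun {
  findings: StepFinding[];
  outcomeVerified: boolean;
}

async function runJourney(
  steps: FlowStep[],
  tools: StepTools,
  verifyOutcome: () => Promise<boolean>,
): Promise<JourneyRun> {
  const findings: StepFinding[] = [];
  for (const step of steps) {
    await tools.navigate(step.url); // navigate to the URL
    const shot = await tools.screenshot(); // take a screenshot
    findings.push(...(await tools.mechanicalChecks())); // mechanical layer
    findings.push(...(await tools.fiveQuestions(shot, step))); // 5 Questions
  }
  // Outcome verification runs once, after every step has been evaluated.
  const outcomeVerified = await verifyOutcome();
  return { findings, outcomeVerified };
}

// Stub tools so the flow can be exercised without external systems.
const stubTools: StepTools = {
  navigate: async () => {},
  screenshot: async () => "shot-1",
  mechanicalChecks: async () => [],
  fiveQuestions: async (_shot, step) => [
    { step: step.page, severity: "IMPROVEMENT", summary: "stub finding" },
  ],
};

runJourney([{ page: "landing", url: "/" }], stubTools, async () => true).then((run) => {
  console.log(run.findings.length, run.outcomeVerified);
});
```

Injecting the tools also makes the flow testable in CI with stubs, independent of the weekly AI-driven sweep.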
Report Format
JOURNEY: sermonwise-first-sermon
PERSONA: Pastor Rachel (38, Reformed, medium tech)
GOAL: Generate a Reformed sermon on Romans 8:28-30
STATUS: PASS (6/6 steps passed)
Step 1: Landing Page ................. PASS
Q1: Hero communicates tradition-aware sermon tool
Q2: 3/3 spec checks passed
Q3: Rachel would click "Start Free" -- clear CTA
Q5: On track (1/6)
Step 2: Signup ...................... PASS
Q1: Clean signup with tradition selector
Q2: 3/3 spec checks passed
Q3: Low friction, no credit card required
Q5: On track (2/6)
...
FINDINGS:
0 SPEC VIOLATIONS
0 GOAL BLOCKED
1 PERSONA RISK: Tradition badges missing from hero (Step 1, Q4)
2 IMPROVEMENTS: Keyboard shortcut for new sermon, elder sharing option
OUTCOME VERIFICATION:
[PASS] Sermon content includes Reformed theological markers
[PASS] Export to PDF generates valid document
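The STATUS line and the FINDINGS counts in the report above can be derived from per-step results. The sketch below uses hypothetical helper names, and it assumes that a journey passes only when every step passes; this document does not define that rule, nor when the partial result applies.

```typescript
type ReportSeverity = "SPEC VIOLATION" | "GOAL BLOCKED" | "PERSONA RISK" | "IMPROVEMENT";

interface StepResult {
  name: string;
  passed: boolean;
}

interface ReportFinding {
  severity: ReportSeverity;
  summary: string;
}

// ASSUMPTION: PASS requires every step to pass; anything else is FAIL.
function statusLine(steps: StepResult[]): string {
  const passed = steps.filter((s) => s.passed).length;
  const status = passed === steps.length ? "PASS" : "FAIL";
  return `STATUS: ${status} (${passed}/${steps.length} steps passed)`;
}

// Tally findings per severity, including zero counts for the report.
function countBySeverity(findings: ReportFinding[]): Record<ReportSeverity, number> {
  const counts: Record<ReportSeverity, number> = {
    "SPEC VIOLATION": 0,
    "GOAL BLOCKED": 0,
    "PERSONA RISK": 0,
    "IMPROVEMENT": 0,
  };
  for (const finding of findings) {
    counts[finding.severity] += 1;
  }
  return counts;
}

const stepResults: StepResult[] = [
  { name: "Landing Page", passed: true },
  { name: "Signup", passed: true },
];
const reportFindings: ReportFinding[] = [
  { severity: "IMPROVEMENT", summary: "Keyboard shortcut for new sermon" },
  { severity: "IMPROVEMENT", summary: "Elder sharing option" },
];

console.log(statusLine(stepResults)); // STATUS: PASS (2/2 steps passed)
console.log(countBySeverity(reportFindings));
```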
Relationship to Expected Output Methodology
The Expected Output Methodology (knowledge/processes/expected-output-methodology.md) defines what each page SHOULD look like for each customer tier. It is the spec that Q2 checks against.
The 5-Question Framework (this document) defines HOW to evaluate pages beyond mere spec compliance. It adds persona empathy (Q3), creative improvement (Q4), and goal-tracking (Q5) on top of spec checking (Q2).
They work together:
Expected Output Spec --> "The pricing page must show $14.95 for Starter"
5-Question Framework --> "The pricing page shows $14.95 (Q2 PASS),
but Pastor Ruth can't find it because it's
below the fold (Q3 PERSONA RISK),
and a 'Which plan?' quiz would help (Q4 IMPROVEMENT),
and she's 2 steps from signup but at risk
of abandoning (Q5 AT RISK)"
Without the Expected Output Spec, Q2 has nothing to check against. Without the 5-Question Framework, the spec is just a checklist with no human judgment.
File Locations
| What | Where |
|---|---|
| This methodology | knowledge/processes/5-question-testing.md |
| Journey YAML files | knowledge/tests/journeys/*.yaml |
| Journey baselines | knowledge/tests/baselines/suite-baselines.json |
| Acceptance specs | knowledge/acceptance/*.md |
| Persona Playwright tests | churchwiseai-web/e2e/delivers/personas/*.spec.ts |
| Existing journey Playwright tests | churchwiseai-web/e2e/journeys/*.spec.ts |
| QA Orchestrator skill | /qa goals |