5-Question AI-Powered Goal-Based Testing

Why This Exists

Traditional testing asks "does it work?" Goal-based testing asks "does it succeed?" A page can load without errors, pass every Playwright assertion, display all the right elements, and still fail the customer. The button exists but the pastor doesn't see it. The pricing is accurate but incomprehensible to a board of deacons. The chatbot replies but doesn't move the visitor toward joining a small group.

This methodology bridges the gap between mechanical correctness (Layer A) and actual customer outcomes (Layer C) by adding a human-judgment layer that only AI can scale.

The 3-Layer Testing Architecture

Layer A: Mechanical / Playwright
"Does the page load? Are there JS errors? Do links work?"
Tools: Playwright specs, Lighthouse, axe-core
Frequency: Every deploy (CI/CD)

Layer B: AI Goal-Based (THIS DOCUMENT)
"Would a real person achieve their goal on this page?"
Tools: 5-Question Framework + journey YAML + AI agent evaluation
Frequency: Weekly + before launch + after major changes

Layer C: Outcome Verification
"Did the database record get created? Did the email send? Did Stripe charge?"
Tools: Supabase queries, Stripe API, email receipt checks
Frequency: Per-journey (automated after Layer B)

Layer B is the missing middle. Layer A catches broken pages. Layer C catches broken backends. Layer B catches broken experiences.


The 5 Questions

At every step of every user journey, the AI agent asks these five questions:

Q1: "What do I SEE?" (Observation)

Take a screenshot. Describe what human eyes register in the first 3 seconds:

  • What's the dominant visual element?
  • What text is largest/boldest?
  • What colors draw the eye?
  • Is there a clear visual hierarchy?
  • What is above the fold vs. below?

Output: A plain-language description of the page as a first-time visitor would perceive it. Not HTML structure -- visual impression.

Example: "A dark hero section with a gold-accented headline reading 'Your Church, Always Available.' Below it, a subheading about AI-powered voice and chat. Two CTA buttons: gold 'Start Free Trial' and outlined 'Watch Demo.' Three trust logos below the fold."

Q2: "Does this match what SHOULD be here?" (Spec Compliance)

Compare the observation from Q1 against the acceptance spec (from knowledge/acceptance/) line by line:

  • Is every required element present?
  • Is every forbidden element absent?
  • Do prices match PRICING.md?
  • Do feature claims match features.yaml?
  • Are tier-gated elements correctly shown/hidden?

Output: A pass/fail checklist with specific citations to the spec.

Example:

[PASS] Hero heading mentions "church" and "AI" — spec A01 requires both
[PASS] Starter price shows $14.95/mo — matches PRICING.md
[FAIL] Pricing card says "4 agents" — spec says 2 agents (Coordinator + Care)
[PASS] "Most Popular" badge on Pro tier — spec requires recommendation indicator
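The value-comparison part of Q2 (prices, agent counts) can be sketched mechanically. The example below is a minimal illustration, not the framework's actual implementation: the `SpecExpectation` and `ObservedValues` shapes and the `runSpecChecklist` helper are hypothetical, since this document does not define how spec expectations are stored.

```typescript
// Hypothetical shapes: the document does not define how spec expectations are stored.
type SpecExpectation = { id: string; label: string; expected: string };
type ObservedValues = Record<string, string | undefined>;

// Produce one [PASS]/[FAIL] line per expectation, citing the spec value.
function runSpecChecklist(spec: SpecExpectation[], observed: ObservedValues): string[] {
  return spec.map((item) => {
    const actual = observed[item.id];
    if (actual === item.expected) {
      return `[PASS] ${item.label} -- matches spec (${item.expected})`;
    }
    const shown = actual === undefined ? "nothing" : actual;
    return `[FAIL] ${item.label} -- page shows ${shown}, spec says ${item.expected}`;
  });
}

const checklist = runSpecChecklist(
  [
    { id: "starter_price", label: "Starter price", expected: "$14.95/mo" },
    { id: "agent_count", label: "Pricing card agent count", expected: "2 agents" },
  ],
  { starter_price: "$14.95/mo", agent_count: "4 agents" },
);
console.log(checklist.join("\n"));
```

Checks that need judgment (for example, whether a hero heading "mentions" church and AI) would still go to the AI agent; only exact-match values fit this kind of comparison.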

Q3: "If I were [this persona], would I know what to do next?" (Persona Empathy)

Step into the shoes of the journey's assigned persona. Consider their:

  • Tech comfort level — Will they understand what to click?
  • Key concern — Is their primary worry addressed on this page?
  • Context — How did they get here? What are they expecting?
  • Patience — How many clicks will they tolerate before leaving?
  • Trust — Does this page build or erode trust?

Output: A persona-voiced assessment.

Example: "Pastor Ruth (62, tiny rural church, low tech comfort) landed on the pricing page. She sees 6 pricing cards and feels overwhelmed. She doesn't know what 'RAG tools' means. She's looking for something simple and cheap. The $14.95 Starter plan is buried below the fold. She would probably leave."

Q4: "Forget the spec -- what would make this BETTER?" (AI Creative Judgment)

Set aside the acceptance spec. Using AI judgment about UX best practices, conversion optimization, and empathy, identify improvements that the spec didn't think of:

  • Is the copy compelling or generic?
  • Could the layout be clearer?
  • Is there unnecessary friction?
  • Are emotional triggers appropriate?
  • Is there a missed opportunity for trust-building?

This is floor vs. ceiling thinking. Q2 checks the floor (minimum spec). Q4 aims for the ceiling (maximum impact).

Output: Concrete, actionable improvement suggestions ranked by estimated impact.

Example: "The pricing page meets spec but could be significantly better. (1) Add a 'Which plan is right for me?' quiz -- board-level buyers hate choosing without guidance. (2) The annual pricing toggle is easy to miss; make it a pill switch above the cards. (3) Add a '30-second summary' line under each plan name for quick scanning."

Q5: "Is this page moving me toward THE GOAL?" (Goal Achievement)

Zoom out from the individual page to the user's original intent. The goal is defined at the journey level (e.g., "Generate my first sermon using Reformed theology"). Ask:

  • Am I closer to achieving my goal than I was at the previous step?
  • Is the path to the goal clear from here?
  • How many steps remain?
  • Is there any risk I'll abandon the journey from this point?

Output: A goal-distance assessment.

Example: "Pastor Rachel's goal is to generate her first sermon. She has completed 4 of 6 steps. She has selected her tradition (Reformed), entered her Scripture text (Romans 8:28-30), and chosen a sermon structure. She is now on the generation page waiting for output. Goal achievement is on track -- one step away."


The Mechanical Layer

In addition to the 5 Questions, every page evaluation includes a mechanical check:

| Check | How | Severity if failed |
|---|---|---|
| JavaScript errors | Browser console log | SPEC VIOLATION |
| Broken images | naturalWidth === 0 | SPEC VIOLATION |
| 404 links | Click all links, check status | SPEC VIOLATION |
| Load time | performance.timing < 3s | PERSONA RISK |
| Mobile responsive | Viewport 375x812 | PERSONA RISK |
| Accessibility | axe-core scan | PERSONA RISK |
| HTTPS | Protocol check | SPEC VIOLATION |
| No secrets in HTML | Pattern match against SECRET_PATTERNS | SPEC VIOLATION |

The mechanical layer runs automatically. It does not require AI judgment. It catches what the 5 Questions assume is already working.
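As a rough sketch of how mechanical results could be turned into findings, the example below maps each failed check to the severity listed in the table above. The two regular expressions are illustrative placeholders only; the real contents of SECRET_PATTERNS are not shown in this document, and `containsSecret` and `mechanicalFindings` are hypothetical helper names.

```typescript
// Severity per check, copied from the mechanical check table.
const MECHANICAL_SEVERITY = {
  "JavaScript errors": "SPEC VIOLATION",
  "Broken images": "SPEC VIOLATION",
  "404 links": "SPEC VIOLATION",
  "Load time": "PERSONA RISK",
  "Mobile responsive": "PERSONA RISK",
  "Accessibility": "PERSONA RISK",
  "HTTPS": "SPEC VIOLATION",
  "No secrets in HTML": "SPEC VIOLATION",
} as const;
type MechanicalCheck = keyof typeof MECHANICAL_SEVERITY;

// Illustrative patterns only; the real SECRET_PATTERNS list is not shown in this document.
const SECRET_PATTERNS: RegExp[] = [
  /sk_live_[A-Za-z0-9]+/,
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
];

function containsSecret(html: string): boolean {
  return SECRET_PATTERNS.some((pattern) => pattern.test(html));
}

// `results` maps a check name to true (passed) or false (failed).
function mechanicalFindings(results: Partial<Record<MechanicalCheck, boolean>>): string[] {
  const checks = Object.keys(MECHANICAL_SEVERITY) as MechanicalCheck[];
  return checks
    .filter((check) => results[check] === false)
    .map((check) => `${MECHANICAL_SEVERITY[check]}: ${check}`);
}

const pageHtml = '<script>const key = "sk_live_abc123";</script>';
const mechanical = mechanicalFindings({
  "Load time": false,
  "HTTPS": true,
  "No secrets in HTML": !containsSecret(pageHtml),
});
console.log(mechanical);
```

Keeping the severity mapping in one table-shaped constant means a change to the table only has to be mirrored in one place.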


Finding Severity Levels

Every finding from the 5-Question evaluation gets one of four severity levels:

| Severity | Definition | Action | Example |
|---|---|---|---|
| SPEC VIOLATION | The page contradicts the acceptance spec or shows incorrect data | Must fix before deploy | Pricing says $9.95 but spec says $14.95 |
| GOAL BLOCKED | The user cannot achieve their goal from this page | Must fix before deploy | "Start Free Trial" button is broken / leads to 404 |
| PERSONA RISK | A specific persona would likely abandon the journey at this point | Should fix before launch | Pastor Ruth can't find the Starter plan because it's below the fold |
| IMPROVEMENT | The page works but could deliver a better experience | Could fix (backlog) | Adding a "Which plan is right for me?" quiz |

Priority order: SPEC VIOLATION = GOAL BLOCKED > PERSONA RISK > IMPROVEMENT
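The priority order can be expressed as a numeric rank, which makes both triage sorting and a deploy gate straightforward. This is a minimal sketch under the assumption that findings are plain objects with a `severity` and a `note`; `sortByPriority` and `blocksDeploy` are hypothetical names.

```typescript
type Severity = "SPEC VIOLATION" | "GOAL BLOCKED" | "PERSONA RISK" | "IMPROVEMENT";
type Finding = { severity: Severity; note: string };

// Lower rank = higher priority. SPEC VIOLATION and GOAL BLOCKED share the top rank.
const SEVERITY_RANK: Record<Severity, number> = {
  "SPEC VIOLATION": 0,
  "GOAL BLOCKED": 0,
  "PERSONA RISK": 1,
  "IMPROVEMENT": 2,
};

// Return a sorted copy so the caller's array is left untouched.
function sortByPriority(findings: Finding[]): Finding[] {
  return findings.slice().sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}

// Both top-rank severities are "must fix before deploy" in the table above.
function blocksDeploy(findings: Finding[]): boolean {
  return findings.some((f) => SEVERITY_RANK[f.severity] === 0);
}

const triaged = sortByPriority([
  { severity: "IMPROVEMENT", note: 'Add a "Which plan is right for me?" quiz' },
  { severity: "PERSONA RISK", note: "Starter plan is below the fold" },
  { severity: "SPEC VIOLATION", note: "Pricing says $9.95 but spec says $14.95" },
]);
console.log(triaged.map((f) => f.severity), blocksDeploy(triaged));
```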


When to Run

| Trigger | What runs | Who runs it |
|---|---|---|
| Every deploy (CI/CD) | Mechanical layer only | Automated (Playwright in CI) |
| Weekly QA sweep | Full 5-Question on all 10 journeys | AI agent via /qa goals |
| Before launch | Full audit -- all journeys, all personas, all properties | AI agent + founder review |
| After major changes | Affected journeys only | AI agent triggered by change detection |
| New feature ships | New journey created + run | AI agent + founder approval |
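This document does not describe how change detection picks the "affected journeys". One plausible approach, sketched below purely as an assumption, is to index each journey by the URL paths its steps visit and select every journey that touches a changed path. The `JourneyPaths` shape, the `affectedJourneys` helper, the paths, and the second journey name are all hypothetical.

```typescript
// Hypothetical index: each journey name mapped to the URL paths its steps visit.
type JourneyPaths = { journey: string; paths: string[] };

// Select journeys with at least one step on a changed path.
function affectedJourneys(index: JourneyPaths[], changedPaths: string[]): string[] {
  return index
    .filter((entry) => entry.paths.some((path) => changedPaths.indexOf(path) !== -1))
    .map((entry) => entry.journey);
}

const journeyIndex: JourneyPaths[] = [
  // Paths here are illustrative, not taken from the real journey file.
  { journey: "sermonwise-first-sermon", paths: ["/", "/signup", "/dashboard"] },
  { journey: "example-pricing-journey", paths: ["/pricing", "/signup"] }, // hypothetical name
];
console.log(affectedJourneys(journeyIndex, ["/pricing"]));
```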

Journey Definition Format

Journeys are defined as YAML files in knowledge/tests/journeys/. Each file describes a complete user journey from entry to outcome.

journey: kebab-case-name
goal: "User's words -- what they're trying to accomplish"
persona:
  name: Name
  age: 30
  role: Role description
  tech_comfort: low|medium|high
  key_concern: "What matters most to them"
property: domain.com
entry_point: https://domain.com/page
steps:
  - page: page-name
    url: /path
    action: "What the user does -- click, type, scroll, etc."
    spec_ref: "acceptance/spec-file.md#section"
    q2_spec: "What the spec says must be here"
    q3_persona: "Key persona concern at this step"
    q5_goal: "Is this moving toward the goal?"
  - page: next-page
    url: /next-path
    action: "Next action"
    spec_ref: "acceptance/spec-file.md#section"
    q2_spec: "Expected elements"
    q3_persona: "Persona worry"
    q5_goal: "Goal progress"
outcome_verification:
  - type: page_content|database|email|api
    check: "What proves the goal was achieved"
failure_modes:
  - "What could go wrong at each step"
last_run: null
last_result: null

Field Reference

| Field | Required | Description |
|---|---|---|
| journey | Yes | Unique kebab-case identifier |
| goal | Yes | The user's intent in their own words |
| persona | Yes | Who is attempting this journey |
| property | Yes | Which domain this journey tests |
| entry_point | Yes | Full URL where the journey begins |
| steps | Yes | Ordered list of page visits and actions |
| steps[].page | Yes | Human-readable page name |
| steps[].url | Yes | URL path for this step |
| steps[].action | Yes | What the user does on this page |
| steps[].spec_ref | No | Link to acceptance spec section |
| steps[].q2_spec | No | What the spec says should be here |
| steps[].q3_persona | Yes | The persona's concern at this step |
| steps[].q5_goal | Yes | Goal proximity assessment |
| outcome_verification | Yes | How to confirm the goal was achieved |
| failure_modes | Yes | Known risks at each step |
| last_run | Auto | ISO timestamp of last execution |
| last_result | Auto | pass/fail/partial + finding count |
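The required fields above lend themselves to a simple pre-flight check on a journey file after it has been parsed from YAML. The sketch below assumes the parsed document arrives as a plain object; `missingJourneyFields` is a hypothetical helper, and the values in `draft` are illustrative rather than copied from a real journey file.

```typescript
// Required fields, taken from the Field Reference table.
const REQUIRED_TOP_LEVEL = [
  "journey", "goal", "persona", "property", "entry_point",
  "steps", "outcome_verification", "failure_modes",
];
const REQUIRED_PER_STEP = ["page", "url", "action", "q3_persona", "q5_goal"];

function isMissing(value: unknown): boolean {
  return value === undefined || value === null || value === "";
}

// Return the dotted names of every required field that is absent or empty.
function missingJourneyFields(doc: Record<string, unknown>): string[] {
  const missing: string[] = REQUIRED_TOP_LEVEL.filter((key) => isMissing(doc[key]));
  const rawSteps = doc["steps"];
  const steps: unknown[] = Array.isArray(rawSteps) ? rawSteps : [];
  steps.forEach((step, i) => {
    const fields = (typeof step === "object" && step !== null ? step : {}) as Record<string, unknown>;
    REQUIRED_PER_STEP.forEach((key) => {
      if (isMissing(fields[key])) missing.push(`steps[${i}].${key}`);
    });
  });
  return missing;
}

// Illustrative draft with two deliberate gaps: no goal, and no q5_goal on the first step.
const draft = {
  journey: "sermonwise-first-sermon",
  persona: { name: "Pastor Rachel", age: 38 },
  property: "sermonwise.ai",
  entry_point: "https://sermonwise.ai",
  steps: [{ page: "Landing Page", url: "/", action: "Click Start Free", q3_persona: "Is my tradition supported?" }],
  outcome_verification: [{ type: "page_content", check: "Sermon renders" }],
  failure_modes: ["Signup form fails"],
};
console.log(missingJourneyFields(draft));
```

The optional fields (spec_ref, q2_spec) and the auto-populated ones (last_run, last_result) are intentionally not checked.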

How to Add New Journeys

  1. Identify a real user goal that isn't covered by existing journeys. Ask: "What is a real person trying to accomplish that we haven't tested?"

  2. Choose or create a persona. The persona should represent the most challenging user for this journey (lowest tech comfort, highest skepticism, most edge-case constraints).

  3. Walk the journey yourself on the production site. Document every page, every click, every decision point.

  4. Create the YAML file in knowledge/tests/journeys/ following the format above.

  5. Map to acceptance specs. For each step, reference the relevant acceptance spec section in knowledge/acceptance/. If no spec exists for a step, flag it.

  6. Define outcome verification. What proves the goal was achieved? A database record? A page rendering correctly? An email received?

  7. List failure modes. What could go wrong? 404 pages, confusing copy, missing CTAs, tier restrictions, broken forms.

  8. Register the journey in knowledge/tests/baselines/suite-baselines.json under the journeys key.

  9. Run it via /qa goals [journey-name] or as part of the weekly sweep.


How to Add New Personas

Personas live in two places:

  • Journey YAML files -- inline persona definition for journey-specific context
  • Playwright spec files -- e2e/delivers/personas/ for automated persona tests

Persona Design Principles

  1. Give them a name, age, and role. Not "User Type A" but "Pastor Ruth, 62, solo pastor of a 45-member rural church."
  2. Define their tech comfort. This determines what they can figure out without help.
  3. Define their key concern. This is the lens through which they evaluate everything.
  4. Make them the hardest case. The persona who would struggle most reveals the most bugs.
  5. Give them a backstory. "Ruth has been burned by two previous church software purchases" changes how she evaluates the product.

Current Persona Library

| Persona | Role | Property | Key Concern | Tech Comfort |
|---|---|---|---|---|
| Pastor Ruth | Solo pastor, tiny rural church | CWA | Cost, simplicity | Low |
| Board Leader Mark | Church operations chair, retired exec | CWA | ROI, security, staff adoption | High |
| Pastor Maria | Catholic priest needing homily | SermonWise | Theological accuracy, liturgical calendar | Medium |
| Deacon Bob | Board evaluator, retired engineer | CWA | Defensible recommendation | High |
| Karen | Church admin, former compliance officer | CWA | Data privacy, AI disclosure | High |
| Youth Pastor Jake | Youth ministry, mobile-first | CWA | Mobile experience, relevance | High |
| Pastor Ezekiel | AI skeptic, traditional pastor | CWA | "Will this replace me?" | Low |
| Pastor Steve | Burned by past tech purchases | CWA | Trust, proof, easy cancellation | Medium |
| Committee Buyer | 5 deacons reviewing a recommendation | CWA | Quick understanding, simple pricing | Mixed |
| Mark IT Director | Mega-church IT administrator | CWA | Enterprise features, integrations | High |
| Pastor Rachel | Reformed pastor, first sermon generation | SermonWise | Theological tradition fidelity | Medium |
| Church Admin Linda | Office admin, Monday morning routine | CWA | Dashboard clarity, actionable data | Medium |
| Pastor James | Preparing Sunday sermon, needs illustration | ITW | Relevance, theological depth | Medium |
| Sarah | Unchurched, looking for a church | PewSearch | Location, denomination, welcoming vibe | Medium |
| Pastor David | Claiming his church's PewSearch listing | PewSearch | Easy claim process, control over listing | Low |

SermonWise Example Walkthrough

Journey: sermonwise-first-sermon
Goal: "I need to prepare a Reformed sermon on Romans 8:28-30 for this Sunday."
Persona: Pastor Rachel, 38, Reformed tradition, medium tech comfort

Step 1: Landing Page (sermonwise.ai)

Q1 -- What do I SEE? A clean landing page with the headline "AI-Powered Sermon Preparation -- Aligned with Your Tradition." A hero image showing a pastor at a desk. Navigation with Home, Showcase, Templates, Pricing, Login. A gold "Start Free" CTA button.

Q2 -- Does this match the spec?

[PASS] Headline mentions "tradition" -- spec requires tradition-awareness in hero
[PASS] "Start Free" CTA present -- spec requires low-friction entry
[CHECK] Does the page mention 17 traditions? Spec says tradition count must appear

Q3 -- If I were Pastor Rachel, would I know what to do next? Rachel is a Reformed pastor looking for a sermon tool that respects her tradition. She sees "Aligned with Your Tradition" and feels seen. She would click "Start Free" or look for a "Reformed" mention to confirm this tool knows her tradition. The CTA is clear.

Q4 -- What would make this BETTER? The landing page could show tradition badges (Reformed, Catholic, Baptist, etc.) in the hero area so Rachel immediately sees her tradition is supported without scrolling.

Q5 -- Is this moving toward THE GOAL? Rachel's goal is to generate a sermon. She's on the landing page. This is Step 1 of 6. She has not started yet, but the page communicates that she's in the right place. On track.

Step 2: Signup / Login

Q1: A simple signup form with email, password, and tradition selector dropdown. Google OAuth option available.

Q2:

[PASS] Tradition selector present during signup
[PASS] Reformed is in the dropdown
[PASS] No credit card required for free tier

Q3: Rachel selects "Reformed" from the dropdown. She appreciates not needing a credit card. She signs up with her email. Low friction.

Q4: The tradition selector could show a brief description of each tradition (e.g., "Reformed -- Emphasizes God's sovereignty, TULIP, covenant theology") to build confidence.

Q5: Step 2 of 6. Signup is a necessary gate. On track.

Step 3: Dashboard / New Sermon

Q1: A dashboard with a prominent "New Sermon" button. Left sidebar with navigation. A getting-started banner for first-time users.

Q2:

[PASS] "New Sermon" button visible
[PASS] Getting started guide for first use
[CHECK] Is tradition shown in the header/profile area?

Q3: Rachel clicks "New Sermon" immediately. The getting-started guide is helpful but she already knows what she wants. She wants to get to the generation form fast.

Q4: For power users like Rachel, add a keyboard shortcut (Cmd+N) for new sermon. Show her tradition badge in the header so she knows it's remembered.

Q5: Step 3 of 6. She's now entering the creation flow. On track.

Step 4: Sermon Configuration

Q1: A form with fields for Scripture reference, sermon title (optional), structure type (expository, topical, narrative), length target, and tradition confirmation (pre-filled as Reformed).

Q2:

[PASS] Scripture input field present
[PASS] Tradition pre-filled from signup
[PASS] Structure options include expository (required for Reformed)
[CHECK] Does it support verse-range input like "Romans 8:28-30"?

Q3: Rachel enters "Romans 8:28-30" and sees "Expository" already selected (appropriate for Reformed). She adjusts length to 25 minutes. She feels confident this will produce something tradition-appropriate.

Q4: Show a preview of what the structure will look like ("Introduction > Historical Context > Verse-by-verse Exposition > Application > Conclusion") before generating, so Rachel can adjust.

Q5: Step 4 of 6. Configuration complete, generation next. On track.

Step 5: Sermon Generation

Q1: A loading indicator ("Crafting your sermon...") followed by a generated sermon with clear sections: Title, Introduction, Body (verse-by-verse exposition of Romans 8:28-30), Application, Conclusion.

Q2:

[PASS] Sermon generates without error
[CHECK] Does the sermon reference Reformed distinctives (sovereignty, predestination, perseverance)?
[CHECK] Is the ESV translation used (standard for Reformed)?

Q3: Rachel reads the introduction and immediately checks for theological accuracy. Does it handle predestination (vv. 29-30) with the nuance her congregation expects? Does it mention the golden chain of redemption? She's reading critically, not casually.

Q4: Add inline theological notes (expandable) that explain why specific Reformed framings were chosen. This builds trust and helps Rachel understand the AI's reasoning.

Q5: Step 5 of 6. The sermon exists. Goal nearly achieved. Rachel needs to review and possibly edit.

Step 6: Review, Edit, and Export

Q1: The generated sermon displayed in an editable rich text editor. Export options visible: Copy, Download PDF, Download DOCX. A "Refine" button for AI-assisted revision.

Q2:

[PASS] Edit capability present
[PASS] Export to PDF available
[CHECK] Does the PDF include proper formatting (headings, Scripture references)?
[CHECK] Is there a "Save to Library" option?

Q3: Rachel makes minor edits to the application section, adjusting it for her specific congregation. She exports to PDF for her sermon notes binder. She's satisfied.

Q4: Add a "Share with Elders" option -- Reformed churches often have elder review of sermons. A shareable link (read-only) would serve this workflow.

Q5: Step 6 of 6. GOAL ACHIEVED. Pastor Rachel has a Reformed sermon on Romans 8:28-30 ready for Sunday.


Integration with QA Orchestrator

The QA Orchestrator skill (/qa) supports goal-based testing via the goals domain:

/qa goals -- Run all 10 journeys
/qa goals sermonwise -- Run SermonWise journeys only
/qa goals cwa -- Run ChurchWiseAI journeys only
/qa goals pewsearch -- Run PewSearch journeys only
/qa goals itw -- Run ITW journeys only
/qa goals cross -- Run cross-property journeys only
/qa goals sermonwise-first-sermon -- Run a single journey by name
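The argument handling implied by these commands can be sketched as follows: no argument selects everything, a known group key selects that group, and anything else is treated as a single journey name. This assumes each registered journey carries a group tag, which the document does not confirm; `RegisteredJourney`, `selectJourneys`, and the second journey name are hypothetical.

```typescript
// Assumption: each registered journey is tagged with one of the group keys below.
type RegisteredJourney = { journey: string; group: string };
const GOAL_GROUPS = ["sermonwise", "cwa", "pewsearch", "itw", "cross"];

// Resolve the optional argument of `/qa goals [arg]` to a list of journey names.
function selectJourneys(registry: RegisteredJourney[], arg?: string): string[] {
  let selected = registry;
  if (arg !== undefined) {
    selected = GOAL_GROUPS.indexOf(arg) !== -1
      ? registry.filter((j) => j.group === arg)
      : registry.filter((j) => j.journey === arg);
  }
  return selected.map((j) => j.journey);
}

const registry: RegisteredJourney[] = [
  { journey: "sermonwise-first-sermon", group: "sermonwise" },
  { journey: "example-cwa-journey", group: "cwa" }, // hypothetical name
];
console.log(selectJourneys(registry));
console.log(selectJourneys(registry, "cwa"));
console.log(selectJourneys(registry, "sermonwise-first-sermon"));
```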

Execution Flow

  1. Load journey YAML from knowledge/tests/journeys/
  2. For each step:
       a. Navigate to the URL
       b. Take a screenshot
       c. Run the Mechanical Layer checks
       d. Apply all 5 Questions
       e. Log findings with severity levels
  3. Run outcome verification (database checks, API calls, page content)
  4. Generate report with pass/fail per step, all findings, and overall journey status
  5. Update baselines in suite-baselines.json
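The per-step loop in the flow above can be sketched with the browser work stubbed out. In this simplified, synchronous illustration each evaluator stands in for one stage (mechanical checks or the 5 Questions); real evaluators would drive a browser and be asynchronous. Navigation, screenshots, outcome verification, and the baseline update are omitted, and the rule that a step fails only on a must-fix severity is an assumption, as are all type and function names.

```typescript
type StepFinding = { step: string; severity: string; note: string };
type FlowStep = { page: string; url: string };
// Each evaluator stands in for one stage (mechanical checks, Q1-Q5).
type StepEvaluator = (step: FlowStep) => StepFinding[];

function runJourney(steps: FlowStep[], evaluators: StepEvaluator[]) {
  const findings: StepFinding[] = [];
  const stepResults: { page: string; passed: boolean }[] = [];
  for (const step of steps) {
    let stepFindings: StepFinding[] = [];
    for (const evaluate of evaluators) {
      stepFindings = stepFindings.concat(evaluate(step));
    }
    // Assumption: a step fails only on a must-fix severity.
    const blocked = stepFindings.some(
      (f) => f.severity === "SPEC VIOLATION" || f.severity === "GOAL BLOCKED",
    );
    stepResults.push({ page: step.page, passed: !blocked });
    stepFindings.forEach((f) => findings.push(f));
  }
  return { stepResults, findings };
}

// Stubs: the mechanical layer finds nothing; the questions raise one improvement.
const mechanicalStub: StepEvaluator = () => [];
const questionsStub: StepEvaluator = (step) =>
  step.page === "Landing Page"
    ? [{ step: step.page, severity: "IMPROVEMENT", note: "Show tradition badges in hero" }]
    : [];

const run = runJourney(
  [{ page: "Landing Page", url: "/" }, { page: "Signup", url: "/signup" }], // illustrative paths
  [mechanicalStub, questionsStub],
);
console.log(run.stepResults, run.findings.length);
```

Passing evaluators in as functions keeps the loop testable without a browser, which is the main reason for this shape.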

Report Format

JOURNEY: sermonwise-first-sermon
PERSONA: Pastor Rachel (38, Reformed, medium tech)
GOAL: Generate a Reformed sermon on Romans 8:28-30
STATUS: PASS (6/6 steps passed)

Step 1: Landing Page ................. PASS
  Q1: Hero communicates tradition-aware sermon tool
  Q2: 3/3 spec checks passed
  Q3: Rachel would click "Start Free" -- clear CTA
  Q5: On track (1/6)

Step 2: Signup ...................... PASS
  Q1: Clean signup with tradition selector
  Q2: 3/3 spec checks passed
  Q3: Low friction, no credit card required
  Q5: On track (2/6)

...

FINDINGS:
0 SPEC VIOLATIONS
0 GOAL BLOCKED
0 PERSONA RISKS
3 IMPROVEMENTS: Tradition badges in hero (Step 1, Q4), keyboard shortcut for new sermon, elder sharing option

OUTCOME VERIFICATION:
[PASS] Sermon content includes Reformed theological markers
[PASS] Export to PDF generates valid document
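Two lines of this report are easy to generate from step results: the dot-leader step line and the overall status line. The sketch below is illustrative only. The column width is a guess (the sample above is not perfectly aligned), and the mapping onto PASS, FAIL, and PARTIAL is an assumption based on the pass/fail/partial values listed for last_result.

```typescript
const REPORT_WIDTH = 38; // assumed column where the dot leader ends

// Build a line such as "Step 1: Landing Page ..... PASS".
function stepLine(index: number, page: string, passed: boolean): string {
  const label = `Step ${index}: ${page} `;
  const dotCount = Math.max(3, REPORT_WIDTH - label.length);
  const dots = new Array(dotCount + 1).join("."); // yields dotCount dots
  return `${label}${dots} ${passed ? "PASS" : "FAIL"}`;
}

// Summarize per-step pass flags into the STATUS line.
function statusLine(stepsPassed: boolean[]): string {
  const passedCount = stepsPassed.filter((p) => p).length;
  const total = stepsPassed.length;
  // Assumed mapping onto the pass/fail/partial values of last_result.
  const verdict = passedCount === total ? "PASS" : passedCount === 0 ? "FAIL" : "PARTIAL";
  return `STATUS: ${verdict} (${passedCount}/${total} steps passed)`;
}

console.log(statusLine([true, true, true, true, true, true]));
console.log(stepLine(1, "Landing Page", true));
console.log(stepLine(2, "Signup", true));
```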

Relationship to Expected Output Methodology

The Expected Output Methodology (knowledge/processes/expected-output-methodology.md) defines what each page SHOULD look like for each customer tier. It is the spec that Q2 checks against.

The 5-Question Framework (this document) defines HOW to evaluate pages beyond mere spec compliance. It adds persona empathy (Q3), creative improvement (Q4), and goal-tracking (Q5) on top of spec checking (Q2).

They work together:

Expected Output Spec --> "The pricing page must show $14.95 for Starter"
5-Question Framework --> "The pricing page shows $14.95 (Q2 PASS),
                          but Pastor Ruth can't find it because it's
                          below the fold (Q3 PERSONA RISK),
                          and a 'Which plan?' quiz would help (Q4 IMPROVEMENT),
                          and she's 2 steps from signup but at risk
                          of abandoning (Q5 AT RISK)"

Without the Expected Output Spec, Q2 has nothing to check against. Without the 5-Question Framework, the spec is just a checklist with no human judgment.


File Locations

| What | Where |
|---|---|
| This methodology | knowledge/processes/5-question-testing.md |
| Journey YAML files | knowledge/tests/journeys/*.yaml |
| Journey baselines | knowledge/tests/baselines/suite-baselines.json |
| Acceptance specs | knowledge/acceptance/*.md |
| Persona Playwright tests | churchwiseai-web/e2e/delivers/personas/*.spec.ts |
| Existing journey Playwright tests | churchwiseai-web/e2e/journeys/*.spec.ts |
| QA Orchestrator skill | /qa goals |