Knowledge Derivation Pipeline

pnpm derive propagates changes from canonical source files (YAML and Markdown in knowledge/) to all downstream targets (TypeScript files, marketing pages, database records, Google Drive). It ensures that when pricing changes in one place, it changes everywhere.


Overview

The system has three concepts:

  1. Sources -- canonical YAML or Markdown files in knowledge/ (e.g., data/pricing.yaml)
  2. Targets -- downstream files or systems that must reflect the source (e.g., src/lib/pricing.ts, product_knowledge table, marketing pages)
  3. Manifest -- knowledge/manifest.yaml maps every source to its targets, including the script to run and the operation type

Manifest structure

The manifest (knowledge/manifest.yaml) defines which sources derive to which targets:

sources:
  data/pricing.yaml:
    derives:
      - target: churchwiseai-web/src/lib/pricing.ts
        script: derive-pricing
        type: regenerate

      - target: supabase:product_knowledge
        script: derive-pricing
        type: upsert
        filter: "category IN ('billing', 'churchwiseai')"

      - target: churchwiseai-web/src/app/pricing/page.tsx
        script: derive-pricing
        type: verify

      - target: "gdrive:03-Strategy/Pricing/"
        script: derive-gdrive
        type: nightly
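
To make the manifest shape concrete, here is a minimal TypeScript sketch of how the entries above could be typed and queried. The type names and the helpers `parseTarget` and `derivableTargets` are hypothetical, introduced only for illustration; the real schema and loader may differ.

```typescript
// Hypothetical types mirroring the manifest excerpt; not the project's actual schema.
type TargetType = "regenerate" | "verify" | "upsert" | "update-section" | "nightly";

interface DeriveTarget {
  target: string; // file path, or a prefixed target such as "supabase:<table>" or "gdrive:<folder>"
  script: string; // generator script name, e.g. "derive-pricing"
  type: TargetType;
  filter?: string; // optional SQL filter, seen on the upsert target
}

interface Manifest {
  sources: Record<string, { derives: DeriveTarget[] }>;
}

// Split a target string into a scheme and a location.
// Targets without a known prefix are treated as plain file paths.
function parseTarget(target: string): { scheme: "file" | "supabase" | "gdrive"; location: string } {
  for (const scheme of ["supabase", "gdrive"] as const) {
    const prefix = `${scheme}:`;
    if (target.startsWith(prefix)) {
      return { scheme, location: target.slice(prefix.length) };
    }
  }
  return { scheme: "file", location: target };
}

// Return the targets a derive run would process (nightly targets are skipped).
function derivableTargets(manifest: Manifest, source: string): DeriveTarget[] {
  const entry = manifest.sources[source];
  if (!entry) throw new Error(`Source not in manifest: ${source}`);
  return entry.derives.filter((t) => t.type !== "nightly");
}

// The manifest excerpt, written out as a literal instead of parsed from YAML.
const manifest: Manifest = {
  sources: {
    "data/pricing.yaml": {
      derives: [
        { target: "churchwiseai-web/src/lib/pricing.ts", script: "derive-pricing", type: "regenerate" },
        { target: "supabase:product_knowledge", script: "derive-pricing", type: "upsert",
          filter: "category IN ('billing', 'churchwiseai')" },
        { target: "churchwiseai-web/src/app/pricing/page.tsx", script: "derive-pricing", type: "verify" },
        { target: "gdrive:03-Strategy/Pricing/", script: "derive-gdrive", type: "nightly" },
      ],
    },
  },
};
```

With this literal, `derivableTargets(manifest, "data/pricing.yaml")` returns the three non-nightly entries, and `parseTarget("supabase:product_knowledge")` separates the table name from its prefix.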

Target types

| Type | What it does | When changes are applied |
| --- | --- | --- |
| regenerate | Generates the entire target file from the source YAML | Immediately on commit |
| verify | Checks that expected values exist in the target file (does not modify it) | Read-only -- flags drift |
| upsert | Generates SQL INSERT/UPDATE statements for the product_knowledge table | Prints SQL for manual execution |
| update-section | Replaces a specific marked section in a target file | Immediately on commit |
| nightly | Queued for Google Drive sync (runs separately, not during derive) | Skipped during derive |
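
The update-section type is the least obvious of these, so a rough sketch may help. The function name and argument order follow the `updateSection(currentContent, sectionId, narrativeContent)` call used in the dry-run phase, but the marker format below (`<!-- derive:begin ID -->` / `<!-- derive:end ID -->`) is purely an assumption; this page does not say how sections are marked.

```typescript
// Assumed marker format (not confirmed by this page):
//   <!-- derive:begin brand-architecture -->
//   ...section body...
//   <!-- derive:end brand-architecture -->
function updateSection(currentContent: string, sectionId: string, narrativeContent: string): string {
  const begin = `<!-- derive:begin ${sectionId} -->`;
  const end = `<!-- derive:end ${sectionId} -->`;

  const start = currentContent.indexOf(begin);
  const stop = currentContent.indexOf(end, start + begin.length);
  if (start === -1 || stop === -1) {
    throw new Error(`Section markers not found for "${sectionId}"`);
  }

  // Keep both markers and everything outside them; swap only the body in between.
  return (
    currentContent.slice(0, start + begin.length) +
    "\n" +
    narrativeContent.trim() +
    "\n" +
    currentContent.slice(stop)
  );
}
```

Because the markers are preserved, running the function twice with the same narrative content returns the same text, which is what lets a dry-run report "UP TO DATE" after a commit.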

Current sources and their targets

| Source | Target count | Key targets |
| --- | --- | --- |
| data/pricing.yaml | 17 targets | pricing.ts (regenerate), product_knowledge (upsert), 10+ marketing pages (verify), PRICING.md (regenerate), voice agent prompts (verify), PewSearch/ITW pricing pages (verify) |
| data/features.yaml | 3 targets | pricing.ts (regenerate), product_knowledge (upsert), Google Drive (nightly) |
| data/products.yaml | 3 targets | brand.ts (verify), product_knowledge (upsert), Google Drive (nightly) |
| data/policies.yaml | 4 targets | terms page (verify), privacy page (verify), product_knowledge (upsert), Google Drive (nightly) |
| narrative/vision.md | 3 targets | CLAUDE.md brand-architecture section (update-section), product_knowledge (upsert), Google Drive (nightly) |
| narrative/competitive.md | 2 targets | compare/[slug] pages (verify), Google Drive (nightly) |
| narrative/sales-playbook.md | 2 targets | product_knowledge (upsert), Google Drive (nightly) |
| narrative/brand.md | 1 target | Google Drive (nightly) |
| narrative/strategy.md | 1 target | Google Drive (nightly) |
| narrative/customer-journey.md | 1 target | Google Drive (nightly) |
| narrative/operations.md | 1 target | Google Drive (nightly) |
| data/tools.yaml | 5 targets | chatbot-tools.ts (verify), voice church/sales/core tools (verify), features.yaml tool counts (verify) |

Full pipeline: pnpm derive data/pricing.yaml

When run without flags, the pipeline runs all four phases automatically.

Phase 1: Dry-run (preview changes)

FOR each target defined for this source in the manifest:

  Load the source YAML file and parse it

  IF target type = "regenerate":
    1. Call the generator function (e.g., generatePricingTs(pricingYaml, featuresYaml))
    2. Read the current file on disk
    3. Normalize line endings (CRLF -> LF for consistent comparison on Windows)
    4. Compare generated output with current file
    5. IF identical: report "UP TO DATE"
    6. IF different: report "DRIFT DETECTED" and show a line-by-line diff
       (Diff output is capped at 30 lines to keep it readable)

  IF target type = "verify":
    1. Read the target file
    2. Extract expected values from the source YAML
       Example: for pricing, extract all price strings ("$14.95", "$34.95", etc.)
    3. Check that each expected value appears in the target file content
    4. Report per-value: PASS (found), FAIL (missing), WARN (ambiguous)

  IF target type = "upsert":
    1. Call the SQL generator (e.g., generateProductKnowledgeSQL())
    2. Count the number of INSERT/UPDATE statements
    3. Report: "X SQL statement(s) would be executed"

  IF target type = "update-section":
    1. Read the target file
    2. Read the narrative source content
    3. Call updateSection(currentContent, sectionId, narrativeContent)
    4. Compare the result with the current file
    5. IF identical: "UP TO DATE"
    6. IF different: "WOULD UPDATE"

  IF target type = "nightly":
    SKIP (handled by Drive sync cron, not derive)
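
The regenerate branch of the dry-run can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the function name `compareRegenerated` is hypothetical, and the diff is a naive line-by-line positional comparison, since this page does not say which diff algorithm is used. Only the CRLF normalization, the two status strings, and the 30-line cap come from the steps above.

```typescript
type RegenerateResult =
  | { status: "UP TO DATE" }
  | { status: "DRIFT DETECTED"; diff: string[] };

const MAX_DIFF_LINES = 30; // cap taken from the dry-run description

// Convert CRLF to LF so Windows checkouts do not report false drift.
const normalizeLineEndings = (text: string): string => text.replace(/\r\n/g, "\n");

function compareRegenerated(generated: string, onDisk: string): RegenerateResult {
  const want = normalizeLineEndings(generated);
  const have = normalizeLineEndings(onDisk);
  if (want === have) return { status: "UP TO DATE" };

  // Naive positional diff: compares line i with line i.
  // Good enough for a preview; a real tool would likely use an LCS-based diff.
  const wantLines = want.split("\n");
  const haveLines = have.split("\n");
  const diff: string[] = [];
  const lineCount = Math.max(wantLines.length, haveLines.length);
  for (let i = 0; i < lineCount && diff.length < MAX_DIFF_LINES; i++) {
    if (wantLines[i] === haveLines[i]) continue;
    if (haveLines[i] !== undefined) diff.push(`- ${haveLines[i]}`);
    if (wantLines[i] !== undefined) diff.push(`+ ${wantLines[i]}`);
  }
  return { status: "DRIFT DETECTED", diff: diff.slice(0, MAX_DIFF_LINES) };
}
```

A file that differs from the generated output only in line endings compares as "UP TO DATE", while any real content difference yields "DRIFT DETECTED" with at most 30 diff lines.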

After all targets are processed, write a lockfile (.derive-lock.json) with:

  • Source file name
  • Source file MD5 hash
  • Timestamp
  • All results (pass/fail/skip per target)
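
As an illustration of the lockfile contents listed above, the sketch below builds the record with Node's built-in `crypto` module. The interface and function names are hypothetical, and the ISO 8601 timestamp format is an assumption; only the four fields and the use of MD5 come from this page.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes; field names follow the { source, hash, timestamp, results } record.
interface TargetResult {
  target: string;
  status: "pass" | "fail" | "skip";
}

interface DeriveLock {
  source: string;
  hash: string; // MD5 of the source file content, hex-encoded
  timestamp: string; // assumed ISO 8601; the real format is not documented here
  results: TargetResult[];
}

function md5Hex(content: string): string {
  return createHash("md5").update(content, "utf8").digest("hex");
}

function buildLock(
  source: string,
  sourceContent: string,
  results: TargetResult[],
  now: Date = new Date(),
): DeriveLock {
  return { source, hash: md5Hex(sourceContent), timestamp: now.toISOString(), results };
}
```

Serializing the result with `JSON.stringify(lock, null, 2)` would give the content to write to .derive-lock.json. MD5 is used here only as a change detector, not as a security measure.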

Phase 2: Commit (apply changes)

FOR each target:

  IF target type = "regenerate":
    Generate the content and WRITE it to disk
    Report: "WRITTEN"

  IF target type = "verify":
    Re-run the verification checks (read-only, same as dry-run)
    Report: pass/fail counts

  IF target type = "upsert":
    Generate the SQL statements
    PRINT the SQL to console for manual execution
    (v1 does not auto-execute SQL -- operator runs it via Supabase MCP or SQL editor)
    Report: "X SQL statement(s) generated -- execute manually via Supabase"

  IF target type = "update-section":
    Generate the updated content and WRITE it to disk
    Report: "WRITTEN"

  IF target type = "nightly":
    SKIP
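
The verify check, which runs identically in dry-run and commit, could look something like the sketch below. It assumes the expected values have already been extracted from the source YAML as plain strings and uses simple substring matching. The WARN (ambiguous) status is left out because this page does not define what counts as ambiguous. All names are hypothetical.

```typescript
interface ValueCheck {
  value: string;
  status: "PASS" | "FAIL";
}

// Check that every expected value appears somewhere in the target file content.
// Read-only: the target content is never modified.
function verifyValues(expected: string[], targetContent: string): ValueCheck[] {
  return expected.map((value): ValueCheck => ({
    value,
    status: targetContent.includes(value) ? "PASS" : "FAIL",
  }));
}

// Summarize in the "8/8 checks passed" style used by the changelog entries.
function summarizeChecks(checks: ValueCheck[]): string {
  const passed = checks.filter((c) => c.status === "PASS").length;
  return `${passed}/${checks.length} checks passed`;
}
```

For example, checking `["$14.95", "$34.95"]` against a page that mentions only "$14.95" produces one PASS, one FAIL, and the summary "1/2 checks passed".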

Phase 3: Verify (confirm changes took effect)

Re-run dry-run checks on all regenerate and verify targets
to confirm the committed changes are correct.

FOR upsert targets: SKIP (manual SQL execution cannot be verified automatically)
FOR update-section targets: re-run the dry-run comparison

Phase 4: Append to changelog

Build a summary entry with timestamp and per-target status:

  "## 2026-03-25 14:30 -- data/pricing.yaml"
  "- churchwiseai-web/src/lib/pricing.ts: regenerate PASS"
  "- supabase:product_knowledge: upsert PASS (3 SQL statements)"
  "- churchwiseai-web/src/app/pricing/page.tsx: verify PASS (8/8 checks passed)"

Append to knowledge/changelog.md
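
A small formatter that produces entries in the shape shown above might look like this. The names are hypothetical, and the use of local time for the heading is an assumption, since this page does not state a time zone.

```typescript
interface TargetOutcome {
  target: string;
  type: string; // e.g. "regenerate", "upsert", "verify"
  status: "PASS" | "FAIL" | "SKIP";
  detail?: string; // e.g. "3 SQL statements" or "8/8 checks passed"
}

function formatChangelogEntry(source: string, outcomes: TargetOutcome[], when: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  // Local time is assumed here.
  const stamp =
    `${when.getFullYear()}-${pad(when.getMonth() + 1)}-${pad(when.getDate())} ` +
    `${pad(when.getHours())}:${pad(when.getMinutes())}`;

  const lines = outcomes.map(
    (o) => `- ${o.target}: ${o.type} ${o.status}${o.detail ? ` (${o.detail})` : ""}`,
  );
  // Trailing empty string yields a final newline so consecutive entries stay separated.
  return [`## ${stamp} -- ${source}`, ...lines, ""].join("\n");
}
```

The returned string can then be appended to knowledge/changelog.md, for instance with `fs.appendFileSync`.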

Lockfile mechanics

The lockfile (.derive-lock.json) prevents stale commits:

WHEN --dry-run is run:
  Write lockfile with: { source, hash, timestamp, results }

WHEN --commit is run:
  1. Read lockfile
  2. IF lockfile doesn't exist: ERROR "Run --dry-run first"
  3. IF lockfile source doesn't match requested source: ERROR
  4. IF lockfile is older than 10 minutes: ERROR "Re-run --dry-run"
  5. IF source file MD5 has changed since dry-run: ERROR "Source changed -- re-run --dry-run"
  6. PROCEED with commit

The 10-minute expiry prevents applying changes based on a stale preview.
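
The commit-time checks can be expressed as one pure function that returns an error message or `null`. This is a sketch under assumptions: the names are hypothetical, the timestamp is assumed to be an ISO 8601 string, and the wording of the source-mismatch error is invented because the steps above do not give one. The order of the checks and the other messages follow the list.

```typescript
interface DeriveLock {
  source: string;
  hash: string;
  timestamp: string; // assumed ISO 8601
}

const LOCK_TTL_MS = 10 * 60 * 1000; // 10-minute expiry

// Returns an error message if the commit must be refused, or null if it may proceed.
function validateLock(
  lock: DeriveLock | null,
  requestedSource: string,
  currentSourceHash: string,
  now: Date,
): string | null {
  if (lock === null) return "Run --dry-run first";
  if (lock.source !== requestedSource) {
    // Message wording is hypothetical.
    return `Lockfile is for ${lock.source}, not ${requestedSource}`;
  }
  if (now.getTime() - Date.parse(lock.timestamp) > LOCK_TTL_MS) return "Re-run --dry-run";
  if (lock.hash !== currentSourceHash) return "Source changed -- re-run --dry-run";
  return null;
}
```

Passing `now` as a parameter keeps the expiry rule easy to test. A lock written at 14:30 validates at 14:35 but is rejected at 14:41.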


CLI usage

pnpm derive data/pricing.yaml             # Full pipeline (dry-run + commit + verify + changelog)
pnpm derive data/pricing.yaml --dry-run   # Phase 1 only: preview what would change
pnpm derive data/pricing.yaml --commit    # Phases 2-4: apply from lockfile
pnpm derive --check                       # Drift detection: scan ALL sources, exit 1 if drift
pnpm derive --all                         # Full pipeline for every source in manifest

Drift detection mode (--check)

WHEN --check is run:
  FOR each source in manifest:
    1. Filter out nightly-only targets
    2. Run dry-run on remaining targets
    3. Collect failures

  IF any source has failures:
    Print drift details
    EXIT 1 (non-zero exit code for CI integration)
  ELSE:
    Print "All sources in sync"
    EXIT 0

This can be used in CI/CD to block deploys when knowledge has drifted.
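
The --check loop can be sketched as a function that returns an exit code and a report instead of exiting directly. All names here are hypothetical, and the per-source dry-run is injected as a callback so the sketch stays self-contained; the real pipeline presumably calls its own dry-run implementation.

```typescript
type TargetType = "regenerate" | "verify" | "upsert" | "update-section" | "nightly";

interface CheckTarget {
  target: string;
  type: TargetType;
}

interface CheckFailure {
  source: string;
  target: string;
  message: string;
}

// Hypothetical signature for the per-source dry-run.
type DryRunFn = (source: string, targets: CheckTarget[]) => CheckFailure[];

function checkAll(
  sources: Record<string, CheckTarget[]>,
  dryRun: DryRunFn,
): { exitCode: 0 | 1; report: string[] } {
  const failures: CheckFailure[] = [];
  for (const [source, targets] of Object.entries(sources)) {
    // Nightly targets belong to the Drive sync cron, so they are never checked here.
    const checkable = targets.filter((t) => t.type !== "nightly");
    if (checkable.length === 0) continue;
    failures.push(...dryRun(source, checkable));
  }

  if (failures.length > 0) {
    return {
      exitCode: 1,
      report: failures.map((f) => `DRIFT ${f.source} -> ${f.target}: ${f.message}`),
    };
  }
  return { exitCode: 0, report: ["All sources in sync"] };
}
```

A CLI wrapper would print `report` and set `process.exitCode` to `exitCode`, so a CI job fails whenever any source has drifted.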


Generator scripts

Each script handles a specific source type:

| Script | Source | What it generates |
| --- | --- | --- |
| derive-pricing | pricing.yaml + features.yaml | pricing.ts (full file), product_knowledge SQL, price verification |
| derive-products | products.yaml | brand.ts verification, product_knowledge SQL |
| derive-policies | policies.yaml | terms/privacy page verification, policy knowledge SQL |
| derive-narrative | vision.md, competitive.md, etc. | CLAUDE.md section updates, narrative knowledge SQL |
| derive-tools | tools.yaml | Tool schema verification across chatbot and voice codebases |
| derive-gdrive | all sources | Google Drive sync (nightly cron, not run during derive) |

Cross-source dependencies

Some generators need multiple source files:

  • generatePricingTs() requires BOTH pricing.yaml AND features.yaml
  • The script auto-loads the companion file regardless of which source triggered the derive

Safety properties

  1. Verify targets are read-only -- they never modify the target file, only report drift
  2. Upsert targets print SQL but don't execute it -- the operator reviews and runs manually
  3. Regenerate targets overwrite the entire file -- the generator is the source of truth
  4. Line endings are normalized (CRLF to LF) before comparison to avoid false drift on Windows
  5. Lockfile expiry (10 minutes) prevents applying stale changes
  6. Source hash check prevents committing when the source changed after dry-run
  7. No concurrent execution lock -- agents should use separate git branches to avoid conflicts