
Knowledge Derivation Pipeline

`pnpm derive` propagates changes from canonical source files (YAML and Markdown in knowledge/) to all downstream targets (TypeScript files, marketing pages, database records, Google Drive). A value changed once in a source, such as a price, is then regenerated or verified in every target that depends on it.

Overview

The system has three concepts:

Sources -- canonical YAML or Markdown files in knowledge/ (e.g., data/pricing.yaml)
Targets -- downstream files or systems that must reflect the source (e.g., src/lib/pricing.ts, product_knowledge table, marketing pages)
Manifest -- knowledge/manifest.yaml maps every source to its targets, including the script to run and the operation type

Manifest structure

The manifest (knowledge/manifest.yaml) defines which sources derive to which targets:

sources:
  data/pricing.yaml:
    derives:
      - target: churchwiseai-web/src/lib/pricing.ts
        script: derive-pricing
        type: regenerate

      - target: supabase:product_knowledge
        script: derive-pricing
        type: upsert
        filter: "category IN ('billing', 'churchwiseai')"

      - target: churchwiseai-web/src/app/pricing/page.tsx
        script: derive-pricing
        type: verify

      - target: "gdrive:03-Strategy/Pricing/"
        script: derive-gdrive
        type: nightly
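
The manifest layout above can be modeled with a few types. This is a minimal sketch, not the pipeline's actual code: the names `DeriveTarget`, `Manifest`, and `targetsFor` are hypothetical, and the manifest is assumed to be already parsed from YAML into a plain object.

```typescript
// Hypothetical types mirroring the manifest example above.
type TargetType = "regenerate" | "verify" | "upsert" | "update-section" | "nightly";

interface DeriveTarget {
  target: string;  // file path, or a prefixed reference such as "supabase:..." or "gdrive:..."
  script: string;  // generator script name, e.g. "derive-pricing"
  type: TargetType;
  filter?: string; // optional; in the example it appears only on the upsert entry
}

interface Manifest {
  sources: Record<string, { derives: DeriveTarget[] }>;
}

// Return the targets a derive run acts on for one source.
// Nightly targets are dropped because Drive sync runs separately.
function targetsFor(manifest: Manifest, source: string): DeriveTarget[] {
  const entry = manifest.sources[source];
  if (!entry) throw new Error(`Source not in manifest: ${source}`);
  return entry.derives.filter((t) => t.type !== "nightly");
}

// The example manifest from above, as a parsed object.
const manifest: Manifest = {
  sources: {
    "data/pricing.yaml": {
      derives: [
        { target: "churchwiseai-web/src/lib/pricing.ts", script: "derive-pricing", type: "regenerate" },
        {
          target: "supabase:product_knowledge",
          script: "derive-pricing",
          type: "upsert",
          filter: "category IN ('billing', 'churchwiseai')",
        },
        { target: "churchwiseai-web/src/app/pricing/page.tsx", script: "derive-pricing", type: "verify" },
        { target: "gdrive:03-Strategy/Pricing/", script: "derive-gdrive", type: "nightly" },
      ],
    },
  },
};

console.log(targetsFor(manifest, "data/pricing.yaml").map((t) => t.type));
```

Typing `type` as a union lets the compiler flag a manifest entry whose operation the pipeline does not know about.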

Target types

Type	What it does	When changes are applied
`regenerate`	Generates the entire target file from the source YAML	Immediately on commit
`verify`	Checks that expected values exist in the target file (does not modify it)	Read-only -- flags drift
`upsert`	Generates SQL INSERT/UPDATE statements for the product_knowledge table	Prints SQL for manual execution
`update-section`	Replaces a specific marked section in a target file	Immediately on commit
`nightly`	Queued for Google Drive sync (runs separately, not during derive)	Skipped during derive

Current sources and their targets

Source	Target count	Key targets
`data/pricing.yaml`	17 targets	pricing.ts (regenerate), product_knowledge (upsert), 10+ marketing pages (verify), PRICING.md (regenerate), voice agent prompts (verify), PewSearch/ITW pricing pages (verify)
`data/features.yaml`	3 targets	pricing.ts (regenerate), product_knowledge (upsert), Google Drive (nightly)
`data/products.yaml`	3 targets	brand.ts (verify), product_knowledge (upsert), Google Drive (nightly)
`data/policies.yaml`	4 targets	terms page (verify), privacy page (verify), product_knowledge (upsert), Google Drive (nightly)
`narrative/vision.md`	3 targets	CLAUDE.md brand-architecture section (update-section), product_knowledge (upsert), Google Drive (nightly)
`narrative/competitive.md`	2 targets	compare/[slug] pages (verify), Google Drive (nightly)
`narrative/sales-playbook.md`	2 targets	product_knowledge (upsert), Google Drive (nightly)
`narrative/brand.md`	1 target	Google Drive (nightly)
`narrative/strategy.md`	1 target	Google Drive (nightly)
`narrative/customer-journey.md`	1 target	Google Drive (nightly)
`narrative/operations.md`	1 target	Google Drive (nightly)
`data/tools.yaml`	5 targets	chatbot-tools.ts (verify), voice church/sales/core tools (verify), features.yaml tool counts (verify)

Full pipeline: `pnpm derive data/pricing.yaml`

When run without flags, the pipeline runs all four phases automatically.

Phase 1: Dry-run (preview changes)

FOR each target defined for this source in the manifest:

  Load the source YAML file and parse it

  IF target type = "regenerate":
    1. Call the generator function (e.g., generatePricingTs(pricingYaml, featuresYaml))
    2. Read the current file on disk
    3. Normalize line endings (CRLF -> LF for consistent comparison on Windows)
    4. Compare generated output with current file
    5. IF identical: report "UP TO DATE"
    6. IF different: report "DRIFT DETECTED" and show a line-by-line diff
       (Diff output is capped at 30 lines to keep it readable)

  IF target type = "verify":
    1. Read the target file
    2. Extract expected values from the source YAML
       Example: for pricing, extract all price strings ("$14.95", "$34.95", etc.)
    3. Check that each expected value appears in the target file content
    4. Report per-value: PASS (found), FAIL (missing), WARN (ambiguous)

  IF target type = "upsert":
    1. Call the SQL generator (e.g., generateProductKnowledgeSQL())
    2. Count the number of INSERT/UPDATE statements
    3. Report: "X SQL statement(s) would be executed"

  IF target type = "update-section":
    1. Read the target file
    2. Read the narrative source content
    3. Call updateSection(currentContent, sectionId, narrativeContent)
    4. Compare the result with the current file
    5. IF identical: "UP TO DATE"
    6. IF different: "WOULD UPDATE"

  IF target type = "nightly":
    SKIP (handled by Drive sync cron, not derive)
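
The regenerate comparison in the dry-run can be sketched as follows. This is an illustration under assumptions: `compareRegenerate` and `normalizeEol` are hypothetical names, and the diff is a simple position-by-position comparison rather than whatever diff algorithm the real pipeline uses. Only the CRLF normalization, the two status strings, and the 30-line cap come from the description above.

```typescript
const MAX_DIFF_LINES = 30; // diff output cap described above

interface DriftReport {
  status: "UP TO DATE" | "DRIFT DETECTED";
  diff: string[];
}

// Convert CRLF to LF so a Windows checkout does not register as drift.
function normalizeEol(text: string): string {
  return text.replace(/\r\n/g, "\n");
}

// Compare generated output with the file content read from disk.
function compareRegenerate(generated: string, onDisk: string): DriftReport {
  const want = normalizeEol(generated).split("\n");
  const have = normalizeEol(onDisk).split("\n");
  const diff: string[] = [];
  for (let i = 0; i < Math.max(want.length, have.length); i++) {
    if (want[i] === have[i]) continue;
    if (have[i] !== undefined) diff.push(`-${i + 1}: ${have[i]}`); // line currently on disk
    if (want[i] !== undefined) diff.push(`+${i + 1}: ${want[i]}`); // line the generator wants
  }
  return {
    status: diff.length === 0 ? "UP TO DATE" : "DRIFT DETECTED",
    diff: diff.slice(0, MAX_DIFF_LINES),
  };
}

console.log(compareRegenerate("a\nb", "a\nc"));
```

A positional comparison reports every line after an insertion as changed; a real implementation would more likely use a proper diff library, with the cap applied the same way.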

After all targets are processed, write a lockfile (.derive-lock.json) with:

Source file name
Source file MD5 hash
Timestamp
All results (pass/fail/skip per target)
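
A lockfile record with those four fields might be built like this. The field names follow the `{ source, hash, timestamp, results }` shape mentioned in this document. The `TargetResult` shape, the ISO timestamp format, and the helper names are assumptions. `createHash` is the standard Node.js crypto API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical per-target result; the real result objects may carry more detail.
interface TargetResult {
  target: string;
  status: "pass" | "fail" | "skip";
}

interface DeriveLock {
  source: string;
  hash: string;      // MD5 of the source file content
  timestamp: string; // ISO 8601 here; the real format is an assumption
  results: TargetResult[];
}

// Hex-encoded MD5, used only to detect edits to the source, not for security.
function md5(content: string): string {
  return createHash("md5").update(content).digest("hex");
}

function buildLock(
  source: string,
  sourceContent: string,
  results: TargetResult[],
  now: Date = new Date(),
): DeriveLock {
  return { source, hash: md5(sourceContent), timestamp: now.toISOString(), results };
}

const lock = buildLock("data/pricing.yaml", "hello", [
  { target: "churchwiseai-web/src/lib/pricing.ts", status: "pass" },
]);
console.log(JSON.stringify(lock, null, 2)); // the content that would go into .derive-lock.json
```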

Phase 2: Commit (apply changes)

FOR each target:

  IF target type = "regenerate":
    Generate the content and WRITE it to disk
    Report: "WRITTEN"

  IF target type = "verify":
    Re-run the verification checks (read-only, same as dry-run)
    Report: pass/fail counts

  IF target type = "upsert":
    Generate the SQL statements
    PRINT the SQL to console for manual execution
    (v1 does not auto-execute SQL -- operator runs it via Supabase MCP or SQL editor)
    Report: "X SQL statement(s) generated -- execute manually via Supabase"

  IF target type = "update-section":
    Generate the updated content and WRITE it to disk
    Report: "WRITTEN"

  IF target type = "nightly":
    SKIP
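
For update-section targets, a section replacement could look like the sketch below. The `updateSection(currentContent, sectionId, narrativeContent)` signature comes from the dry-run description. The marker format is an assumption made for illustration: this document does not say how a "marked section" is delimited, so the HTML comment markers here are hypothetical.

```typescript
// Replace the text between two section markers, keeping the markers themselves.
// ASSUMPTION: sections are delimited by HTML comments of this shape.
function updateSection(currentContent: string, sectionId: string, narrativeContent: string): string {
  const start = `<!-- derive:start ${sectionId} -->`;
  const end = `<!-- derive:end ${sectionId} -->`;
  const from = currentContent.indexOf(start);
  const to = from === -1 ? -1 : currentContent.indexOf(end, from + start.length);
  if (from === -1 || to === -1) {
    throw new Error(`Section markers not found for "${sectionId}"`);
  }
  return (
    currentContent.slice(0, from + start.length) +
    "\n" + narrativeContent.trim() + "\n" +
    currentContent.slice(to)
  );
}

const doc = [
  "intro",
  "<!-- derive:start brand-architecture -->",
  "old text",
  "<!-- derive:end brand-architecture -->",
  "outro",
  "",
].join("\n");

const updated = updateSection(doc, "brand-architecture", "new text");
console.log(updated === doc ? "UP TO DATE" : "WOULD UPDATE");
```

Because the markers are preserved, running the function again on its own output returns the same string, which is what lets the dry-run report "UP TO DATE" after a commit.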

Phase 3: Verify (confirm changes took effect)

Re-run dry-run checks on all regenerate and verify targets
to confirm the committed changes are correct.

FOR upsert targets: SKIP (manual SQL execution cannot be verified automatically)
FOR update-section targets: re-run the dry-run comparison
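
The read-only verify check that is re-run in this phase can be sketched as a substring test per expected value. `verifyValues` and `summarize` are hypothetical names, and the WARN status is left out because this document does not define when a match counts as ambiguous.

```typescript
interface ValueCheck {
  value: string;
  status: "PASS" | "FAIL";
}

// Check that every expected value occurs somewhere in the target file content.
// The target is never modified.
function verifyValues(expected: string[], targetContent: string): ValueCheck[] {
  return expected.map((value): ValueCheck => ({
    value,
    status: targetContent.includes(value) ? "PASS" : "FAIL",
  }));
}

// Summary in the style of the changelog example ("8/8 checks passed").
function summarize(checks: ValueCheck[]): string {
  const passed = checks.filter((c) => c.status === "PASS").length;
  return `${passed}/${checks.length} checks passed`;
}

// Illustrative page content; the price strings are the examples quoted in Phase 1.
const page = "Starter $14.95/mo, Pro $34.95/mo";
const checks = verifyValues(["$14.95", "$34.95", "$99.00"], page);
console.log(summarize(checks));
```

A plain substring test also matches "$14.95" inside "$114.95", so a stricter implementation might require a boundary around each value.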

Phase 4: Append to changelog

Build a summary entry with timestamp and per-target status:
  "## 2026-03-25 14:30 -- data/pricing.yaml"
  "- churchwiseai-web/src/lib/pricing.ts: regenerate PASS"
  "- supabase:product_knowledge: upsert PASS (3 SQL statements)"
  "- churchwiseai-web/src/app/pricing/page.tsx: verify PASS (8/8 checks passed)"

Append to knowledge/changelog.md
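
Building the entry text is straightforward string formatting. In this sketch the `ChangelogLine` shape and function names are hypothetical, and the timestamp is rendered in UTC, which is an assumption since the document does not state a time zone.

```typescript
interface ChangelogLine {
  target: string;
  type: string;    // operation type, e.g. "regenerate"
  status: string;  // e.g. "PASS"
  detail?: string; // e.g. "3 SQL statements"
}

// "2026-03-25 14:30" style timestamp, in UTC (assumption).
function formatTimestamp(d: Date): string {
  return d.toISOString().slice(0, 16).replace("T", " ");
}

function buildChangelogEntry(source: string, lines: ChangelogLine[], now: Date): string {
  const header = `## ${formatTimestamp(now)} -- ${source}`;
  const body = lines.map(
    (l) => `- ${l.target}: ${l.type} ${l.status}${l.detail ? ` (${l.detail})` : ""}`,
  );
  return [header, ...body].join("\n") + "\n";
}

const entry = buildChangelogEntry(
  "data/pricing.yaml",
  [
    { target: "churchwiseai-web/src/lib/pricing.ts", type: "regenerate", status: "PASS" },
    { target: "supabase:product_knowledge", type: "upsert", status: "PASS", detail: "3 SQL statements" },
  ],
  new Date("2026-03-25T14:30:00Z"),
);
console.log(entry);
```

The returned string would then be appended to knowledge/changelog.md, for example with `appendFileSync` from `node:fs`.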

Lockfile mechanics

The lockfile (.derive-lock.json) prevents stale commits:

WHEN --dry-run is run:
  Write lockfile with: { source, hash, timestamp, results }

WHEN --commit is run:
  1. Read lockfile
  2. IF lockfile doesn't exist: ERROR "Run --dry-run first"
  3. IF lockfile source doesn't match requested source: ERROR
  4. IF lockfile is older than 10 minutes: ERROR "Re-run --dry-run"
  5. IF source file MD5 has changed since dry-run: ERROR "Source changed -- re-run --dry-run"
  6. PROCEED with commit

The 10-minute expiry prevents applying changes based on a stale preview.
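
The commit-time checks can be expressed as one pure function that returns the first failing reason. The function name and the wording of the source-mismatch message are hypothetical; the other messages and the check order follow the steps above.

```typescript
const LOCK_TTL_MS = 10 * 60 * 1000; // 10-minute expiry

interface DeriveLock {
  source: string;
  hash: string;      // MD5 of the source at dry-run time
  timestamp: string; // assumed parseable by Date.parse, e.g. ISO 8601
}

// Returns an error message if the commit must be refused, or null if it may proceed.
function checkLock(lock: DeriveLock | null, source: string, currentHash: string, now: Date): string | null {
  if (lock === null) return "Run --dry-run first";
  if (lock.source !== source) return `Lockfile is for ${lock.source}, not ${source}`;
  if (now.getTime() - Date.parse(lock.timestamp) > LOCK_TTL_MS) return "Re-run --dry-run";
  if (lock.hash !== currentHash) return "Source changed -- re-run --dry-run";
  return null;
}

const lock: DeriveLock = { source: "data/pricing.yaml", hash: "abc", timestamp: "2026-03-25T14:30:00.000Z" };
console.log(checkLock(lock, "data/pricing.yaml", "abc", new Date("2026-03-25T14:35:00Z")));
```

Passing `now` and `currentHash` in as arguments keeps the checks free of file and clock access, so each rule can be tested in isolation.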

CLI usage

pnpm derive data/pricing.yaml              Full pipeline (dry-run + commit + verify + changelog)
pnpm derive data/pricing.yaml --dry-run    Phase 1 only: preview what would change
pnpm derive data/pricing.yaml --commit     Phases 2-4: apply changes, guarded by the lockfile
pnpm derive --check                        Drift detection: scan ALL sources, exit 1 if drift
pnpm derive --all                          Full pipeline for every source in manifest

Drift detection mode (--check)

WHEN --check is run:
  FOR each source in manifest:
    1. Filter out nightly-only targets
    2. Run dry-run on remaining targets
    3. Collect failures

  IF any source has failures:
    Print drift details
    EXIT 1 (non-zero exit code for CI integration)
  ELSE:
    Print "All sources in sync"
    EXIT 0

This can be used in CI/CD to block deploys when knowledge has drifted.
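
The control flow of `--check` reduces to collecting failures and mapping them to an exit code. In this sketch the dry-run is injected as a callback that returns failure messages for a source's non-nightly targets. That callback and the `checkAll` name are hypothetical stand-ins for the real pipeline internals.

```typescript
interface CheckOutcome {
  exitCode: 0 | 1;
  lines: string[]; // what would be printed
}

// Run the dry-run for every source and report drift across all of them.
function checkAll(sources: string[], dryRun: (source: string) => string[]): CheckOutcome {
  const lines: string[] = [];
  for (const source of sources) {
    for (const failure of dryRun(source)) {
      lines.push(`${source}: ${failure}`);
    }
  }
  if (lines.length > 0) return { exitCode: 1, lines };
  return { exitCode: 0, lines: ["All sources in sync"] };
}

// Stubbed dry-run: only the pricing source reports drift.
const outcome = checkAll(["data/pricing.yaml", "data/tools.yaml"], (source) =>
  source === "data/pricing.yaml" ? ["churchwiseai-web/src/lib/pricing.ts: DRIFT DETECTED"] : [],
);
console.log(outcome.lines.join("\n"));
```

A CLI wrapper would print `outcome.lines` and set `process.exitCode = outcome.exitCode`, which is what lets a CI step fail on drift.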

Generator scripts

Each script handles a specific source type:

Script	Source	What it generates
`derive-pricing`	pricing.yaml + features.yaml	pricing.ts (full file), product_knowledge SQL, price verification
`derive-products`	products.yaml	brand.ts verification, product_knowledge SQL
`derive-policies`	policies.yaml	terms/privacy page verification, policy knowledge SQL
`derive-narrative`	vision.md, competitive.md, etc.	CLAUDE.md section updates, narrative knowledge SQL
`derive-tools`	tools.yaml	Tool schema verification across chatbot and voice codebases
`derive-gdrive`	all sources	Google Drive sync (nightly cron, not run during derive)

Cross-source dependencies

Some generators need multiple source files:

generatePricingTs() requires BOTH pricing.yaml AND features.yaml
The script auto-loads the companion file regardless of which source triggered the derive

Safety properties

Verify targets are read-only -- they never modify the target file, only report drift
Upsert targets print SQL but don't execute it -- the operator reviews and runs manually
Regenerate targets overwrite the entire file -- the generator is the source of truth
Line endings are normalized (CRLF to LF) before comparison to avoid false drift on Windows
Lockfile expiry (10 minutes) prevents applying stale changes
Source hash check prevents committing when the source changed after dry-run
There is no lock against concurrent runs -- agents should work on separate git branches to avoid conflicting writes
