Skip to main content

Knowledge > Products > ITW Premium > Content Pipeline

ITW Content Pipeline

Pipeline Overview

Content flows through a six-stage pipeline from generation to website display. Two codebases share ownership: ai-sermon-assistant (legacy, reference-only) owns content generation and enrichment, while sermon-illustrations owns visibility assignment, stub regeneration, and website queries.

[1] Content Import/Generation (ai-sermon-assistant)
|
v
[2] Enrichment (ai-sermon-assistant)
|
v
[3] Embedding Generation (ai-sermon-assistant / sermon-illustrations)
|
v
[4] Vetting & Approval (ai-sermon-assistant admin)
|
v
[5] Materialized View Refresh (pg_cron, every 15 min)
|
v
[6] Website Reads View (sermon-illustrations)

Each stage is described in detail below.

Stage 1: Content Import/Generation

Owner: ai-sermon-assistant (legacy -- read for reference only, never modify)

Content enters unified_rag_content through multiple paths:

SourceMethodContent TypesVolume
AI generationClaude Max CLI (claude -p)All 26 typesBulk batches
Web scrapingCustom scrapers with exclusion rulessermon_illustration, quote, historical_illustrationPeriodic
Manual entryAdmin dashboardAny typeAd hoc
Partner contentAPI importVariesRare

AI Generation Rules

  • ALWAYS use Claude Max CLI (claude -p) for batch content generation -- never Anthropic/OpenAI APIs
  • The founder pays $200/mo for Claude Max; API costs must be avoided for batch operations
  • For nested CLI invocation from scripts: delete CLAUDE_CODE_ENTRYPOINT and CLAUDECODE env vars before spawning
  • Content must meet minimum quality: word_count >= 100, quality_score >= 0.5

Row Structure on Insert

Every new row in unified_rag_content must include:

FieldRequiredDescription
contentYesFull illustration text
content_typeYesOne of 26 recognized types
source_typeYes'ai_generated', 'scraped', 'manual', 'partner'
content_categoryYes'illustration' for ITW content
content_formatYes'text' for standard content
titleYesDisplay title
summaryYes1-2 sentence summary for cards
word_countYesComputed from content
quality_scoreYes0.0-1.0 quality rating
embeddingYes1536-dim vector (text-embedding-3-small)
embedding_modelYes'text-embedding-3-small'
slugYesURL-safe slug (see slugification rules below)
created_atAutoTimestamp
updated_atAutoTimestamp

Stage 2: Enrichment

Owner: ai-sermon-assistant (legacy -- reference only)

After initial import, content is enriched with structured metadata:

FieldTypeDescription
topicstext[]Subject topics (Faith, Grace, Love, etc.)
themestext[]Thematic tags
emotionstext[]Emotional register (Joy, Hope, Comfort, etc.)
human_needstext[]Needs addressed (belonging, meaning, healing)
keywordstext[]Search keywords
doctrinestext[]Theological doctrines referenced
scripture_referencestext[]Bible passages (e.g., ['John 3:16', 'Romans 8:28'])
theological_lens_idUUIDFK to sai_theological_lenses (optional)
is_universalbooleanTrue if applicable across all traditions
tonetextWriting tone (inspirational, reflective, challenging)
audience_primarytextTarget audience (general, youth, seniors, etc.)
audience_suitabilitytext[]Suitable audiences
sermon_sectionstext[]Where in a sermon this fits (opening, body, closing)
application_pointstext[]Practical takeaways
spiritual_disciplinestext[]Related spiritual practices
creative_approachtextHow the content approaches its subject
source_attributiontextOriginal source credit

Enrichment happens in bulk via AI pipelines in ai-sermon-assistant. The enrichment pipeline is NOT run from sermon-illustrations/.

Stage 3: Embedding Generation

Owner: Both codebases (generation in ai-sermon-assistant, regeneration in sermon-illustrations)

Embeddings use OpenAI text-embedding-3-small (1536 dimensions). The text format for embedding is:

"Scripture: {scripture_references}\n\nAuthor: {primary_author}\n\nSource: {source_attribution}\n\nContent: {content}"

Embedding Rules

  • Every row MUST have a non-null embedding column
  • embedding_model MUST be set to 'text-embedding-3-small'
  • When content is updated (stub regeneration, edits), the embedding MUST be regenerated
  • Embedding dimension: 1536 (enforced by pgvector column type)
  • Never insert a row with a NULL embedding -- the materialized view and RAG queries depend on it

Embedding Generation Pseudocode

async function generateEmbedding(row) {
text = buildEmbeddingText(row)
// text = "Scripture: John 3:16\n\nAuthor: C.S. Lewis\n\nSource: Mere Christianity\n\nContent: ..."

response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
})

return response.data[0].embedding // float[1536]
}

function buildEmbeddingText(row) {
parts = []
if (row.scripture_references?.length)
parts.push(`Scripture: ${row.scripture_references.join(', ')}`)
if (row.primary_author)
parts.push(`Author: ${row.primary_author}`)
if (row.source_attribution)
parts.push(`Source: ${row.source_attribution}`)
parts.push(`Content: ${row.content}`)
return parts.join('\n\n')
}

Stage 4: Vetting & Approval

Owner: ai-sermon-assistant admin dashboard (legacy -- reference only)

Content goes through quality gates before appearing on the website:

GateCriteriaOutcome
Minimum word countword_count >= 30 (lower threshold for stubs)Below → flagged as stub
Quality scorequality_score >= 0.5Below → held for review
Content reviewManual spot-check of AI-generated contentApprove or regenerate
Slug checkValid slug, no duplicatesAuto-generated if missing

Approved content appears in the dir_illustrations view immediately — it reflects live data from unified_rag_content.

Stage 5: View Read Layer

Owner: Supabase

The dir_illustrations SQL view (not a materialized view) is the read layer for the ITW website. It reflects live data from unified_rag_content — no refresh cycle is needed.

What the View Contains

The view filters and shapes unified_rag_content for website display:

  • Includes only content with content_category = 'illustration'
  • Excludes rows below quality thresholds
  • Excludes rows without embeddings
  • Indexes by content_type, visibility_tier, quality_score for efficient browse queries
  • Contains ~50,000 rows (subset of ~327K total)

Latency

New or updated content appears on the website immediately — the view is a live SQL view over unified_rag_content, not a materialized snapshot.

Historical note: Earlier docs described a pg_cron-refreshed materialized view with a 15-minute latency. The current production schema uses a regular SQL view with no refresh needed.

Stage 6: Website Reads View

Owner: sermon-illustrations

The ITW website queries dir_illustrations exclusively. It NEVER writes to unified_rag_content directly during normal operation (only scripts like regenerate-stubs.mjs write).

Query functions in src/lib/queries/illustrations.ts handle all reads:

FunctionPurposeTable
getFeaturedIllustrations()Homepage featured contentdir_illustrations
getIllustrationsByTopic()Topic browse pagesdir_illustrations
getIllustrationsByEmotion()Emotion browse pagesdir_illustrations
getIllustrationsByTradition()Tradition browse pagesdir_illustrations
getIllustrationBySlug()Detail pagedir_illustrations
getContentTypeCounts()Browse-by-type pagedir_illustrations
Full-text searchSearch resultsdir_illustrations (search_vector)

Stub Regeneration System

2,030 AI-generated stubs (word_count <= 30) exist in the database. These are placeholder rows created during bulk import that need full content. The regeneration system upgrades them:

Pipeline

[1] Read stubs from unified_rag_content
WHERE source_type = 'ai_generated' AND word_count <= 30

[2] For each stub:
a. Generate new full-length content via Claude Max CLI
(generateWithClaudeMax() from scripts/lib/shared.mjs)
b. Generate new summary
c. Compute new word_count
d. Generate new embedding via text-embedding-3-small
e. Assign new quality_score

[3] Update unified_rag_content:
SET content = new_content,
summary = new_summary,
word_count = new_word_count,
quality_score = new_quality_score,
embedding = new_embedding,
embedding_model = 'text-embedding-3-small',
updated_at = now()
WHERE id = stub.id

Safety Rules for Stub Regeneration

  • Only update stubs: source_type = 'ai_generated' AND word_count <= 30
  • Never delete rows -- 327K records are irreplaceable
  • Always update embedding when content changes
  • Always update word_count and summary when content changes
  • Always set updated_at to track when regeneration occurred
  • Use service role key for database writes (RLS bypass)
  • Rate limit Claude CLI calls to avoid overwhelming the session

Script Location

sermon-illustrations/scripts/regenerate-stubs.mjs
|-- imports: scripts/lib/shared.mjs (generateWithClaudeMax, createEmbedding)
|-- reads: unified_rag_content (stubs only)
|-- writes: unified_rag_content (content, summary, word_count, quality_score,
| embedding, embedding_model, updated_at)

Slugification Rules

Illustration slugs follow this pattern:

function generateSlug(title, id) {
slug = title
.toLowerCase()
.replace(/[^a-z0-9\s-]/g, '') // strip special chars
.replace(/\s+/g, '-') // spaces to hyphens
.replace(/-+/g, '-') // collapse multiple hyphens
.trim()

// Append first 8 chars of UUID for uniqueness
slug += '-' + id.substring(0, 8)

return slug
}

// Example: "God's Grace in the Storm" + UUID "a1b2c3d4-..."
// Result: "gods-grace-in-the-storm-a1b2c3d4"

The 8-character UUID suffix prevents slug collisions when multiple illustrations have similar titles.

Database Write Rules

MUST DO When Writing to unified_rag_content

  1. Update embedding AND embedding_model when content changes
  2. Update word_count when content changes
  3. Update summary when content changes
  4. Set updated_at = now() on every write
  5. Use Supabase service role key (bypasses RLS)
  6. Build embedding text with the proper format (Scripture/Author/Source/Content)
  7. Validate word_count >= 30 (minimum for non-stub content)
  8. Ensure quality_score is between 0.0 and 1.0

MUST NOT Do

  1. Delete rows -- 327K records are irreplaceable
  2. Change metadata owned by enrichment pipeline -- topics, themes, emotions, human_needs, keywords, doctrines (owned by ai-sermon-assistant enrichment)
  3. Change immutable fields -- source_type, content_type, content_category, content_format
  4. Write NULL embeddings -- every row must have a valid 1536-dim vector
  5. Update non-stub rows from regeneration scripts -- only source_type = 'ai_generated' AND word_count <= 30
  6. Insert without required fields -- embedding, summary, word_count, quality_score are all mandatory
  7. Bulk delete -- even for cleanup, archive instead

Content Script Ownership

ScriptOwnerPurpose
assign-visibility-tiers.mjssermon-illustrationsAssigns public/free_signup/premium visibility tiers
regenerate-stubs.mjssermon-illustrationsUpgrades AI stubs to full content
generate-*.mjs variantssermon-illustrationsContent generation utilities
audit-content-quality.mjssermon-illustrationsAudits quality scores and flags issues
Content import/generationai-sermon-assistantBulk content creation (legacy, reference only)
Enrichment pipelineai-sermon-assistantMetadata extraction (legacy, reference only)
Vetting/approvalai-sermon-assistantQuality review dashboard (legacy, reference only)

Monitoring & Health Checks

Key Metrics to Watch

MetricExpectedQuery
Total rows~327KSELECT count(*) FROM unified_rag_content
View rows~50KSELECT count(*) FROM dir_illustrations
Stubs remaining~2,030SELECT count(*) FROM unified_rag_content WHERE source_type='ai_generated' AND word_count<=30
NULL embeddings0SELECT count(*) FROM unified_rag_content WHERE embedding IS NULL
Last view refresh< 15 min agoCheck pg_cron logs

View Staleness Check

-- dir_illustrations is a live SQL view — content is always current.
-- If content appears missing, verify it's in unified_rag_content:
SELECT count(*) FROM unified_rag_content
WHERE content_category = 'illustration' AND is_active = true;

-- Verify view row count matches expected ~50K:
SELECT count(*) FROM dir_illustrations;

Reference Documents

  • sermon-illustrations/internal/content-rules.md -- editorial quality standards, tone guidelines, content type definitions
  • sermon-illustrations/internal/shared-database-contract.md -- table ownership map, write permissions, cross-project rules
  • sermon-illustrations/internal/content-pipeline.md -- original pipeline documentation (may be outdated, this doc is canonical)

See Also