Knowledge > Products > ITW Premium > Content Pipeline
ITW Content Pipeline
Pipeline Overview
Content flows through a six-stage pipeline from generation to website display. Two codebases share ownership: ai-sermon-assistant (legacy, reference-only) owns content generation and enrichment, while sermon-illustrations owns visibility assignment, stub regeneration, and website queries.
[1] Content Import/Generation (ai-sermon-assistant)
|
v
[2] Enrichment (ai-sermon-assistant)
|
v
[3] Embedding Generation (ai-sermon-assistant / sermon-illustrations)
|
v
[4] Vetting & Approval (ai-sermon-assistant admin)
|
v
[5] Materialized View Refresh (pg_cron, every 15 min)
|
v
[6] Website Reads View (sermon-illustrations)
Each stage is described in detail below.
Stage 1: Content Import/Generation
Owner: ai-sermon-assistant (legacy -- read for reference only, never modify)
Content enters unified_rag_content through multiple paths:
| Source | Method | Content Types | Volume |
|---|---|---|---|
| AI generation | Claude Max CLI (claude -p) | All 26 types | Bulk batches |
| Web scraping | Custom scrapers with exclusion rules | sermon_illustration, quote, historical_illustration | Periodic |
| Manual entry | Admin dashboard | Any type | Ad hoc |
| Partner content | API import | Varies | Rare |
AI Generation Rules
- ALWAYS use Claude Max CLI (
claude -p) for batch content generation -- never Anthropic/OpenAI APIs - The founder pays $200/mo for Claude Max; API costs must be avoided for batch operations
- For nested CLI invocation from scripts: delete
CLAUDE_CODE_ENTRYPOINTandCLAUDECODEenv vars before spawning - Content must meet minimum quality: word_count >= 100, quality_score >= 0.5
Row Structure on Insert
Every new row in unified_rag_content must include:
| Field | Required | Description |
|---|---|---|
content | Yes | Full illustration text |
content_type | Yes | One of 26 recognized types |
source_type | Yes | 'ai_generated', 'scraped', 'manual', 'partner' |
content_category | Yes | 'illustration' for ITW content |
content_format | Yes | 'text' for standard content |
title | Yes | Display title |
summary | Yes | 1-2 sentence summary for cards |
word_count | Yes | Computed from content |
quality_score | Yes | 0.0-1.0 quality rating |
embedding | Yes | 1536-dim vector (text-embedding-3-small) |
embedding_model | Yes | 'text-embedding-3-small' |
slug | Yes | URL-safe slug (see slugification rules below) |
created_at | Auto | Timestamp |
updated_at | Auto | Timestamp |
Stage 2: Enrichment
Owner: ai-sermon-assistant (legacy -- reference only)
After initial import, content is enriched with structured metadata:
| Field | Type | Description |
|---|---|---|
topics | text[] | Subject topics (Faith, Grace, Love, etc.) |
themes | text[] | Thematic tags |
emotions | text[] | Emotional register (Joy, Hope, Comfort, etc.) |
human_needs | text[] | Needs addressed (belonging, meaning, healing) |
keywords | text[] | Search keywords |
doctrines | text[] | Theological doctrines referenced |
scripture_references | text[] | Bible passages (e.g., ['John 3:16', 'Romans 8:28']) |
theological_lens_id | UUID | FK to sai_theological_lenses (optional) |
is_universal | boolean | True if applicable across all traditions |
tone | text | Writing tone (inspirational, reflective, challenging) |
audience_primary | text | Target audience (general, youth, seniors, etc.) |
audience_suitability | text[] | Suitable audiences |
sermon_sections | text[] | Where in a sermon this fits (opening, body, closing) |
application_points | text[] | Practical takeaways |
spiritual_disciplines | text[] | Related spiritual practices |
creative_approach | text | How the content approaches its subject |
source_attribution | text | Original source credit |
Enrichment happens in bulk via AI pipelines in ai-sermon-assistant. The enrichment pipeline is NOT run from sermon-illustrations/.
Stage 3: Embedding Generation
Owner: Both codebases (generation in ai-sermon-assistant, regeneration in sermon-illustrations)
Embeddings use OpenAI text-embedding-3-small (1536 dimensions). The text format for embedding is:
"Scripture: {scripture_references}\n\nAuthor: {primary_author}\n\nSource: {source_attribution}\n\nContent: {content}"
Embedding Rules
- Every row MUST have a non-null
embeddingcolumn embedding_modelMUST be set to'text-embedding-3-small'- When content is updated (stub regeneration, edits), the embedding MUST be regenerated
- Embedding dimension: 1536 (enforced by pgvector column type)
- Never insert a row with a NULL embedding -- the materialized view and RAG queries depend on it
Embedding Generation Pseudocode
async function generateEmbedding(row) {
text = buildEmbeddingText(row)
// text = "Scripture: John 3:16\n\nAuthor: C.S. Lewis\n\nSource: Mere Christianity\n\nContent: ..."
response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text
})
return response.data[0].embedding // float[1536]
}
function buildEmbeddingText(row) {
parts = []
if (row.scripture_references?.length)
parts.push(`Scripture: ${row.scripture_references.join(', ')}`)
if (row.primary_author)
parts.push(`Author: ${row.primary_author}`)
if (row.source_attribution)
parts.push(`Source: ${row.source_attribution}`)
parts.push(`Content: ${row.content}`)
return parts.join('\n\n')
}
Stage 4: Vetting & Approval
Owner: ai-sermon-assistant admin dashboard (legacy -- reference only)
Content goes through quality gates before appearing on the website:
| Gate | Criteria | Outcome |
|---|---|---|
| Minimum word count | word_count >= 30 (lower threshold for stubs) | Below → flagged as stub |
| Quality score | quality_score >= 0.5 | Below → held for review |
| Content review | Manual spot-check of AI-generated content | Approve or regenerate |
| Slug check | Valid slug, no duplicates | Auto-generated if missing |
Approved content appears in the dir_illustrations view immediately — it reflects live data from unified_rag_content.
Stage 5: View Read Layer
Owner: Supabase
The dir_illustrations SQL view (not a materialized view) is the read layer for the ITW website. It reflects live data from unified_rag_content — no refresh cycle is needed.
What the View Contains
The view filters and shapes unified_rag_content for website display:
- Includes only content with
content_category = 'illustration' - Excludes rows below quality thresholds
- Excludes rows without embeddings
- Indexes by content_type, visibility_tier, quality_score for efficient browse queries
- Contains ~50,000 rows (subset of ~327K total)
Latency
New or updated content appears on the website immediately — the view is a live SQL view over unified_rag_content, not a materialized snapshot.
Historical note: Earlier docs described a pg_cron-refreshed materialized view with a 15-minute latency. The current production schema uses a regular SQL view with no refresh needed.
Stage 6: Website Reads View
Owner: sermon-illustrations
The ITW website queries dir_illustrations exclusively. It NEVER writes to unified_rag_content directly during normal operation (only scripts like regenerate-stubs.mjs write).
Query functions in src/lib/queries/illustrations.ts handle all reads:
| Function | Purpose | Table |
|---|---|---|
getFeaturedIllustrations() | Homepage featured content | dir_illustrations |
getIllustrationsByTopic() | Topic browse pages | dir_illustrations |
getIllustrationsByEmotion() | Emotion browse pages | dir_illustrations |
getIllustrationsByTradition() | Tradition browse pages | dir_illustrations |
getIllustrationBySlug() | Detail page | dir_illustrations |
getContentTypeCounts() | Browse-by-type page | dir_illustrations |
| Full-text search | Search results | dir_illustrations (search_vector) |
Stub Regeneration System
2,030 AI-generated stubs (word_count <= 30) exist in the database. These are placeholder rows created during bulk import that need full content. The regeneration system upgrades them:
Pipeline
[1] Read stubs from unified_rag_content
WHERE source_type = 'ai_generated' AND word_count <= 30
[2] For each stub:
a. Generate new full-length content via Claude Max CLI
(generateWithClaudeMax() from scripts/lib/shared.mjs)
b. Generate new summary
c. Compute new word_count
d. Generate new embedding via text-embedding-3-small
e. Assign new quality_score
[3] Update unified_rag_content:
SET content = new_content,
summary = new_summary,
word_count = new_word_count,
quality_score = new_quality_score,
embedding = new_embedding,
embedding_model = 'text-embedding-3-small',
updated_at = now()
WHERE id = stub.id
Safety Rules for Stub Regeneration
- Only update stubs:
source_type = 'ai_generated' AND word_count <= 30 - Never delete rows -- 327K records are irreplaceable
- Always update embedding when content changes
- Always update word_count and summary when content changes
- Always set updated_at to track when regeneration occurred
- Use service role key for database writes (RLS bypass)
- Rate limit Claude CLI calls to avoid overwhelming the session
Script Location
sermon-illustrations/scripts/regenerate-stubs.mjs
|-- imports: scripts/lib/shared.mjs (generateWithClaudeMax, createEmbedding)
|-- reads: unified_rag_content (stubs only)
|-- writes: unified_rag_content (content, summary, word_count, quality_score,
| embedding, embedding_model, updated_at)
Slugification Rules
Illustration slugs follow this pattern:
function generateSlug(title, id) {
slug = title
.toLowerCase()
.replace(/[^a-z0-9\s-]/g, '') // strip special chars
.replace(/\s+/g, '-') // spaces to hyphens
.replace(/-+/g, '-') // collapse multiple hyphens
.trim()
// Append first 8 chars of UUID for uniqueness
slug += '-' + id.substring(0, 8)
return slug
}
// Example: "God's Grace in the Storm" + UUID "a1b2c3d4-..."
// Result: "gods-grace-in-the-storm-a1b2c3d4"
The 8-character UUID suffix prevents slug collisions when multiple illustrations have similar titles.
Database Write Rules
MUST DO When Writing to unified_rag_content
- Update
embeddingANDembedding_modelwhen content changes - Update
word_countwhen content changes - Update
summarywhen content changes - Set
updated_at = now()on every write - Use Supabase service role key (bypasses RLS)
- Build embedding text with the proper format (Scripture/Author/Source/Content)
- Validate word_count >= 30 (minimum for non-stub content)
- Ensure quality_score is between 0.0 and 1.0
MUST NOT Do
- Delete rows -- 327K records are irreplaceable
- Change metadata owned by enrichment pipeline -- topics, themes, emotions, human_needs, keywords, doctrines (owned by ai-sermon-assistant enrichment)
- Change immutable fields -- source_type, content_type, content_category, content_format
- Write NULL embeddings -- every row must have a valid 1536-dim vector
- Update non-stub rows from regeneration scripts -- only
source_type = 'ai_generated' AND word_count <= 30 - Insert without required fields -- embedding, summary, word_count, quality_score are all mandatory
- Bulk delete -- even for cleanup, archive instead
Content Script Ownership
| Script | Owner | Purpose |
|---|---|---|
assign-visibility-tiers.mjs | sermon-illustrations | Assigns public/free_signup/premium visibility tiers |
regenerate-stubs.mjs | sermon-illustrations | Upgrades AI stubs to full content |
generate-*.mjs variants | sermon-illustrations | Content generation utilities |
audit-content-quality.mjs | sermon-illustrations | Audits quality scores and flags issues |
| Content import/generation | ai-sermon-assistant | Bulk content creation (legacy, reference only) |
| Enrichment pipeline | ai-sermon-assistant | Metadata extraction (legacy, reference only) |
| Vetting/approval | ai-sermon-assistant | Quality review dashboard (legacy, reference only) |
Monitoring & Health Checks
Key Metrics to Watch
| Metric | Expected | Query |
|---|---|---|
| Total rows | ~327K | SELECT count(*) FROM unified_rag_content |
| View rows | ~50K | SELECT count(*) FROM dir_illustrations |
| Stubs remaining | ~2,030 | SELECT count(*) FROM unified_rag_content WHERE source_type='ai_generated' AND word_count<=30 |
| NULL embeddings | 0 | SELECT count(*) FROM unified_rag_content WHERE embedding IS NULL |
| Last view refresh | < 15 min ago | Check pg_cron logs |
View Staleness Check
-- dir_illustrations is a live SQL view — content is always current.
-- If content appears missing, verify it's in unified_rag_content:
SELECT count(*) FROM unified_rag_content
WHERE content_category = 'illustration' AND is_active = true;
-- Verify view row count matches expected ~50K:
SELECT count(*) FROM dir_illustrations;
Reference Documents
sermon-illustrations/internal/content-rules.md-- editorial quality standards, tone guidelines, content type definitionssermon-illustrations/internal/shared-database-contract.md-- table ownership map, write permissions, cross-project rulessermon-illustrations/internal/content-pipeline.md-- original pipeline documentation (may be outdated, this doc is canonical)
See Also
- ITW Premium Overview -- product overview, pricing, visibility tiers
- Search & Browse -- how the website queries and displays content
- Auth Flow -- signup, subscription, premium gating
- RAG Query Process -- how chatbot and voice agent query the same content