Turning thousands of plays into a discovery engine for the theater industry

Results
500+ plays integrated, 5,000+ enrichments produced, a mobile-ready search experience, and a custom AI pipeline built to last: synthetics, a declarative framework that processes 20 enrichment types per play with full traceability, cost tracking, and rollback.
Overview
Finding a play to perform is, for most theater troupes, an exercise in frustration. You read blindly, qualify manually, debate from gut feeling, and try to imagine how a text might come alive on stage. There's no structured tool for this. It's throwing darts in the dark.
Damien has spent 8 years in this world. His company, Imparato, is a platform where actors and troupes rehearse their texts using AI-generated voice. Users upload their plays into Imparato to practice — which means the platform sits on a living, growing database of theatrical texts, fed by the users themselves.
PAJ — Pièces À Jouer — was born at the intersection of a Ministry of Culture grant and a strategic insight: what if the same catalog that feeds rehearsals could also help people discover what to perform next? Search a play, shortlist it with your troupe, debate and vote, then rehearse it on Imparato. Discovery as an acquisition channel. A product within a product.
When the engagement started, there was a deck, a clear vision in Damien's head, and a grant application. No code.
The 2.5-month thinking phase
The first move was not to build. It was to think.
For the first month and a half, the work was product conception — deep data modeling sessions on Whimsical, defining the canonical dimensions that would eventually become a 787-dimension classification system, and mapping how AI enrichment would flow through the data model.
Then the designers arrived. Super Serif spent another month and a half on product design: a complete Figma prototype, mobile-first UX designed for the thumb, and full branding. By the end of the year, the design was delivered, the data model was locked, the vision was stable — and the synthetics enrichment framework was already in development.
Two and a half months of zero code. This is unusual. Most projects rush to build. This one rushed to understand.
The build
With the foundation solid, the build moved fast. 308 commits in 4 months across a 3-person team. The backend, data model, AI pipeline, and architecture sat on one side. The user-facing React frontend — chat interface, filter system, mobile experience — on the other, built by Nicolas Carrasco. Thibault contributed the UI component library.
The separation was clean and deliberate: backend/data/AI on one side, frontend/UX on the other. Monday mornings for technical unblocking, Thursday mornings for business and tech sync with Damien. Architecture decisions were non-negotiable; execution within those guardrails was autonomous.
Scope
Phase 1: Conception & Data Modeling
The product started where most products should but few do: with the domain.
Damien brought 8 years of theater expertise. In a burst of creative energy, he used LLMs to generate a massive classification system — hundreds of dimensions covering everything from thematic depth to staging complexity to character archetypes. The job was to take that raw creative output and make it real: structure it, formalize it, model it in PostgreSQL, and connect it to the enrichment pipeline.
The result is a 787-dimension ontology across 8 types: themes, genres, periods, perception, quality, and staging for plays; traits and skills for characters. Each dimension is hierarchical — 74 root categories branching into 713 children. Some are presence-based (is this theme present?), others are spectrum-based (how strong is the dramatic cohesion, from "uneven and confused" to "solid and masterful"?).
This isn't a tagging system. It's a structured attempt to formalize how humans experience dramatic works. Potentially the most granular classification ever built for a theatrical corpus.
Under the hood: the 787-dimension ontology
Eight ontology types span two models:
Play dimensions — 6 types, 439 dimensions:
- Theme — 12 roots → 252 total
- Genre — 12 → 82
- Period — 10 → 48
- Perception — 7 → 42
- Quality — 10, flat
- Staging — 5, flat
Character dimensions — 2 types, 348 dimensions:
- Trait — 12 roots → 252 total
- Skill — 6 → 96
Two evaluation modes govern how dimensions are scored:
- Presence: binary to graduated scale (Absent → Très présent). Used for themes, genres, periods, traits, skills.
- Spectrum: min-max labels (e.g., "Inégale, confuse" → "Solide, maîtrisée"). Used for perception, quality, staging.
Storage: a one-level-deep hierarchy (parent → children), stored as self-referential records via the CanonicalPlayDimension and CanonicalCharacterDimension models. Seed data lives in CSV and JSON trees at /ontology/.
Traceability: every dimension evaluation is produced by the synthetics enrichment pipeline and recorded as a PlayDimension or CharacterDimension record — auditable and reversible.
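The self-referential storage above can be sketched in plain Ruby (no Rails, no database). The model name mirrors the case study's CanonicalPlayDimension, but the attribute names and the example dimensions are assumptions, not the real schema:

```ruby
# Plain-Ruby sketch of the one-level-deep, self-referential hierarchy.
# Attribute names and example rows are illustrative assumptions.
CanonicalPlayDimension = Struct.new(
  :id, :name, :dimension_type, :evaluation_mode, :parent_id,
  keyword_init: true
)

dimensions = [
  CanonicalPlayDimension.new(id: 1, name: "Amour", dimension_type: :theme,
                             evaluation_mode: :presence, parent_id: nil),
  CanonicalPlayDimension.new(id: 2, name: "Amour interdit", dimension_type: :theme,
                             evaluation_mode: :presence, parent_id: 1),
  CanonicalPlayDimension.new(id: 3, name: "Cohésion dramatique", dimension_type: :quality,
                             evaluation_mode: :spectrum, parent_id: nil)
]

# Root categories have no parent; children point back to their root.
roots       = dimensions.select { |d| d.parent_id.nil? }
children_of = ->(root) { dimensions.select { |d| d.parent_id == root.id } }
```

The same parent/children shape covers both presence-based and spectrum-based dimensions, which is what keeps 787 dimensions queryable with ordinary SQL joins.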
Phase 2: The synthetics enrichment framework
Early in the project, a fundamental tension became clear: LLMs are non-deterministic. They give slightly different answers every time. But the product needs deterministic, queryable data — SQL rows that can be filtered, aggregated, compared.
The bridge between those two worlds is synthetics.
Inspired by Rails' own ActiveStorage pattern, synthetics was designed as a standalone framework living inside the application: a declarative DSL where any model can declare its enrichments, a processing engine that handles async execution with retry logic, cost tracking per enrichment, versioned prompts with structured output schemas, and a full audit trail with rollback capability.
The principle was clear: don't pollute the business logic with the AI pipeline. The domain models know nothing about LLMs. synthetics handles everything — from prompt rendering to output validation to data materialization — in its own namespace.
Today, synthetics processes 20 enrichment types across 3 models. A single play triggers 12 enrichments; each character triggers 6 more; each scene triggers 2. Across a 559-play catalog, that's a projected 25,000 to 40,000 enrichments — each one an LLM call with structured JSON output, cost tracking, and a reversible integration.
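The volume projection follows from the per-entity counts. The enrichments per play, character, and scene come from the numbers above; the average character and scene counts per play are illustrative assumptions, which is why the result is a range rather than a figure:

```ruby
# Back-of-envelope projection of enrichment volume across the catalog.
# Per-entity counts come from the case study; the averages are assumptions.
PLAYS         = 559
PER_PLAY      = 12
PER_CHARACTER = 6
PER_SCENE     = 2

avg_characters = 5   # assumption
avg_scenes     = 10  # assumption

total = PLAYS * (PER_PLAY + avg_characters * PER_CHARACTER + avg_scenes * PER_SCENE)
# With these averages the projection lands inside the quoted 25,000-40,000 range.
```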
Under the hood: the synthetics library
Architecture: synthetics is structured like a gem inside the app (app/lib/synthetics/), loaded via initializer, with its own models namespace:
- Synthetics::Enrichment
- Synthetics::Prompt / Synthetics::PromptVersion
- Synthetics::Integration
- Synthetics::DataLock
- Synthetics::Session
Core abstraction: the enriches macro — a class-level DSL that declares what to enrich, how to enrich it, and what to do with the result:
```ruby
enriches :audience_synopsis,
         prompt: "play_audience_synopsis",
         scope: :field_value_update,
         fields: [:audience_synopsis]
```
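A class-level macro like this is ordinary Ruby metaprogramming. The sketch below shows one way such a DSL can be wired; the real synthetics internals are not shown in the case study, so the module and method bodies here are assumptions that only illustrate the declarative shape:

```ruby
# Hedged sketch of a class-level `enriches` macro (not the real
# synthetics implementation).
module Enrichable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Registry of declared enrichments, keyed by name.
    def enrichment_definitions
      @enrichment_definitions ||= {}
    end

    # Declares what to enrich, which prompt to use, and how to
    # integrate the result.
    def enriches(name, prompt:, scope:, fields: [])
      enrichment_definitions[name] = { prompt: prompt, scope: scope, fields: fields }
    end
  end
end

class Play
  include Enrichable

  enriches :audience_synopsis,
           prompt: "play_audience_synopsis",
           scope: :field_value_update,
           fields: [:audience_synopsis]
end
```

The payoff of this shape is the separation the text describes: Play declares intent in one line, and everything about execution lives in the framework's namespace.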
Versioned prompts:
- ERB templates with JSON Schema output validation
- Provider/model configuration per version
- Activation/deactivation controls
Three integrator types:
- FieldUpdate — update model fields directly
- RecordCreation — create child records
- RecordUpdate — modify related records
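Two of these three shapes can be sketched with plain hashes standing in for records. The class names echo the integrator types above, but the APIs are illustrative assumptions, not the real synthetics classes:

```ruby
# Illustrative integrator shapes; hashes stand in for ActiveRecord rows.
class FieldUpdateIntegrator
  # Writes selected keys of the LLM output onto the record itself.
  def integrate(record, output, fields:)
    fields.each { |f| record[f] = output.fetch(f) }
    record
  end
end

class RecordCreationIntegrator
  # Appends the LLM output as a new child record under the parent.
  def integrate(parent, output, collection:)
    (parent[collection] ||= []) << output
    parent
  end
end

play = { title: "Antigone" }
FieldUpdateIntegrator.new.integrate(
  play, { audience_synopsis: "Une héroïne face à la loi." },
  fields: [:audience_synopsis]
)
RecordCreationIntegrator.new.integrate(play, { name: "Créon" }, collection: :characters)
```

RecordUpdate would follow the same pattern against a related record rather than the parent.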
Preprocessing pattern:
- Send the full play text to the LLM once
- Receive a compressed interaction matrix
- 7+ downstream enrichments reuse that compressed output
Saves ~6 redundant full-text LLM calls per play.
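The savings come from call counting: the expensive full-text call happens once, and downstream enrichments consume the compressed output. The fake "LLM" below just tallies calls and extracts speaker names; the real compression is out of scope:

```ruby
# Sketch of the preprocessing reuse: one full-text call, seven reuses.
full_text_calls = 0

compress_interactions = lambda do |full_text|
  full_text_calls += 1
  # Toy "interaction matrix": unique speakers at line starts.
  { speakers: full_text.scan(/^([A-ZÉ]+)\./).flatten.uniq }
end

full_text = "ANTIGONE. Je l'ai fait.\nCRÉON. Tu savais la loi.\nANTIGONE. Oui."
matrix = compress_interactions.call(full_text) # the single full-text call

# Downstream enrichments read the compressed matrix, never the full text.
downstream = 7.times.map { |i| [:enrichment, i, matrix[:speakers].size] }
```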
Safeguards:
- Data locking: a Lockable module prevents AI from overwriting human edits at the field level
- Cost tracking: every enrichment records input/output tokens and a cost snapshot with pricing metadata
- Lifecycle auditing: append-only JSONB event arrays (pending → running → completed/failed), with integration snapshots for full rollback
- Freshness checking: SHA256 digest of input data + prompt version ID comparison before reuse
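The freshness check is a cheap comparison before any expensive reuse: hash the input together with the prompt version, and only trust a cached enrichment when the fingerprint matches. The fingerprint format below is an assumption; only the SHA256-plus-version-ID idea comes from the text:

```ruby
require "digest"

# Reuse a cached enrichment only when neither the input text nor the
# prompt version has changed. Fingerprint format is illustrative.
def enrichment_fingerprint(input_text, prompt_version_id)
  Digest::SHA256.hexdigest("#{prompt_version_id}:#{input_text}")
end

cached_fp = enrichment_fingerprint("Acte I, scène 1", 3)

fresh = enrichment_fingerprint("Acte I, scène 1", 3) == cached_fp
stale = enrichment_fingerprint("Acte I, scène 1", 4) == cached_fp # prompt version bumped
```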
Admin backoffice (30 Hotwire/Stimulus views):
- Enrichment browsing with advanced filters
- Bulk retry/failure actions
- Output overrides for manual corrections
- Prompt version management
- Data lock administration
Phase 3: The Three-System Architecture
The original plan was simple: connect Imparato's database directly to PAJ via webhooks using n8n. It didn't work. The connection was brittle, opaque, and unstable.
The structural fix was to introduce a middle layer: PAJ-SAS (Staging & Curation). Three systems, each with a clear role:
- Imparato — the source of truth. Raw play texts in their original format.
- PAJ-SAS — the staging layer. Parses raw texts, normalizes titles, manages a status lifecycle. Built in Next.js + Prisma. Damien manages this autonomously through an admin interface.
- PAJ — the intelligence layer. Reads from SAS every 5 minutes, enriches, classifies, searches. Never modifies source data.
Data flows one way: Imparato → SAS → PAJ. The only thing flowing back is a status confirmation. Each system is hermetic. If one breaks, the others keep running.
This wasn't the original architecture. It emerged from a real problem — and it's a better system for it.
Under the hood: the three-system pipeline
Imparato (upstream) — Source of truth. Raw play texts in V0 format, living in Imparato's production database.
PAJ-SAS (middle layer) — Staging and curation. Built in Next.js + Prisma + PostgreSQL on Vercel/CleverCloud:
- Parses V0 texts into structured content
- Normalizes titles via daily n8n workflow (Gemini AI)
- Manages a status lifecycle: ready → pending_update → pending_unpublish
- Damien operates this layer independently through an admin interface
PAJ (intelligence layer) — Rails 8 + Mistral + SolidQueue:
- Sync job polls SAS every 5 minutes for actionable statuses
- Dispatches import/update/unpublish jobs
- Fetches full play data + structured content
- Creates Play/Character/Segment records
- Immediately queues all 12 enrichments per play
Key constraint: PAJ never modifies source data. It reads from SAS and writes intelligence into its own database. The only thing flowing back upstream is a status confirmation (published / unpublished).
Resilience: each system is hermetic. If SAS goes down, PAJ keeps serving its existing enriched data. If PAJ goes down, Imparato and SAS are unaffected.
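The dispatch step of the sync poll reduces to a status-to-job mapping. The status names come from the SAS lifecycle above; the class name and the plan-only shape (returning jobs instead of enqueuing them) are illustrative assumptions:

```ruby
# Sketch of the 5-minute poll's dispatch logic: actionable SAS
# statuses map to jobs; everything else is ignored.
class SasSyncPlanner
  ACTION_FOR = {
    "ready"             => :import,
    "pending_update"    => :update,
    "pending_unpublish" => :unpublish
  }.freeze

  def plan(sas_plays)
    sas_plays.filter_map do |play|
      action = ACTION_FOR[play[:status]]
      [action, play[:id]] if action
    end
  end
end

jobs = SasSyncPlanner.new.plan([
  { id: 1, status: "ready" },
  { id: 2, status: "published" },        # already synced: no-op
  { id: 3, status: "pending_unpublish" }
])
```

Keeping dispatch pure like this is what makes the one-way flow easy to audit: the only write back upstream is the status confirmation.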
Phase 4: Search & Discovery (In Progress)
The user-facing product is a conversational search experience: a mobile-first chat interface where actors describe what they're looking for, and the system surfaces matching plays using both deterministic filters (author, cast size, duration) and semantic AI reasoning.
The infrastructure is fully wired — the search model, the message system, the filter architecture with 8 classifications, the React chat UI with speech bubbles and a bottom drawer for filter management. The LLM integration is mocked with 14 pre-written French responses that demonstrate the interaction pattern.
The next milestone is activating the real AI search, connecting Mistral's conversational capabilities to the structured data that synthetics has already produced. The foundation is there; the intelligence just needs to be switched on.
Testimonial coming soon.

Impact
- An AI framework built to outlive the project
synthetics isn't glue code. It's a standalone enrichment framework with a declarative DSL, versioned prompts, and full audit trails. When PAJ evolves — new enrichment types, new models, new providers — the framework absorbs the change.
- Domain expertise made computational
Damien's 8 years of theater knowledge, formalized into a 787-dimension ontology, connected to an AI pipeline that can classify any play in the corpus. The knowledge isn't trapped in one person's head anymore — it's structured, queryable, and growing.
- A team that ships without friction
308 commits across 3 developers in 4 months. Clean backend/frontend separation, PR-based workflow, weekly sync rhythms. Architecture decisions set the rails; developers run autonomously within them.
- Conception before construction
2.5 months of product design, data modeling, and UX work before writing code. The result: a build phase with almost no backtracking, no major pivots, no architectural regrets.
- A client who stayed in control
The three-system architecture gives Damien direct access to the curation layer. He manages play statuses, monitors the pipeline, and operates independently — without touching the intelligence layer or needing a developer.