Turning thousands of plays into a discovery engine for the theater industry

Results
500+ plays integrated, 5,000+ enrichments produced, a mobile-ready search experience, and a custom AI pipeline built to last: synthetics, a declarative framework that processes 20 enrichment types per play with full traceability, cost tracking, and rollback.
Overview
Finding a play to perform is, for most theater troupes, an exercise in frustration. You read blindly, qualify manually, debate from gut feeling, and try to imagine how a text might come alive on stage. There's no structured tool for this. It's throwing darts in the dark.
Damien has spent 8 years in this world. His company, Imparato, is a platform where actors and troupes rehearse their texts using AI-generated voice. Users upload their plays into Imparato to practice — which means the platform sits on a living, growing database of theatrical texts, fed by the users themselves.
PAJ — Pièces À Jouer — was born at the intersection of a Ministry of Culture grant and a strategic insight: what if the same catalog that feeds rehearsals could also help people discover what to perform next? Search a play, shortlist it with your troupe, debate and vote, then rehearse it on Imparato. Discovery as an acquisition channel. A product within a product.
When the engagement started, there was a deck, a clear vision in Damien's head, and a grant application. No code.
The 2.5-month thinking phase
The first move was not to build. It was to think.
For the first month and a half, the work was product conception — deep data modeling sessions on Whimsical, defining the canonical dimensions that would eventually become a 787-dimension classification system, and mapping how AI enrichment would flow through the data model.
Then the designers arrived. Super Serif spent another month and a half on product design: a complete Figma prototype, mobile-first UX designed for the thumb, and full branding. By the end of the year, the design was delivered, the data model was locked, the vision was stable — and the synthetics enrichment framework was already in development.
Two and a half months of zero code. This is unusual. Most projects rush to build. This one rushed to understand.
The build
With the foundation solid, the build moved fast. 308 commits in 4 months across a 3-person team. The backend, data model, AI pipeline, and architecture sat on one side. The user-facing React frontend — chat interface, filter system, mobile experience — on the other, built by Nicolas Carrasco. Thibault contributed the UI component library.
The separation was clean and deliberate: backend/data/AI on one side, frontend/UX on the other. Monday mornings for technical unblocking, Thursday mornings for business and tech sync with Damien. Architecture decisions were non-negotiable; execution within those guardrails was autonomous.
Scope
Phase 1: Conception & Data Modeling
The product started where most products should but few do: with the domain.
Damien brought 8 years of theater expertise. In a burst of creative energy, he used LLMs to generate a massive classification system — hundreds of dimensions covering everything from thematic depth to staging complexity to character archetypes. The job was to take that raw creative output and make it real: structure it, formalize it, model it in PostgreSQL, and connect it to the enrichment pipeline.
The result is a 787-dimension ontology across 8 types: themes, genres, periods, perception, quality, and staging for plays; traits and skills for characters. Each dimension is hierarchical — 74 root categories branching into 713 children. Some are presence-based (is this theme present?), others are spectrum-based (how strong is the dramatic cohesion, from "uneven and confused" to "solid and masterful"?).
This isn't a tagging system. It's a structured attempt to formalize how humans experience dramatic works. Potentially the most granular classification ever built for a theatrical corpus.
Under the hood: the 787-dimension ontology
Eight ontology types span two models:
Play dimensions — 6 types, 439 dimensions:
- Theme — 12 roots → 252 total
- Genre — 12 → 82
- Period — 10 → 48
- Perception — 7 → 42
- Quality — 10, flat
- Staging — 5, flat
Character dimensions — 2 types, 348 dimensions:
- Trait — 12 roots → 252 total
- Skill — 6 → 96
Two evaluation modes govern how dimensions are scored:
- Presence: binary to graduated scale (Absent → Très présent). Used for themes, genres, periods, traits, skills.
- Spectrum: min-max labels (e.g., "Inégale, confuse" → "Solide, maîtrisée"). Used for perception, quality, staging.
Storage: a one-level-deep hierarchy (parent → children), stored as self-referential records via the CanonicalPlayDimension and CanonicalCharacterDimension models. Seed data lives in CSV and JSON trees at /ontology/.
Traceability: every dimension evaluation is produced by the synthetics enrichment pipeline and recorded as a PlayDimension or CharacterDimension record — auditable and reversible.
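The self-referential storage above can be sketched in plain Ruby (no Rails, no database). The model name mirrors the case study's CanonicalPlayDimension, but the attribute names and the example dimensions are assumptions, not the real schema:

```ruby
# Plain-Ruby sketch of the one-level-deep, self-referential hierarchy.
# Attribute names and example rows are illustrative assumptions.
CanonicalPlayDimension = Struct.new(
  :id, :name, :dimension_type, :evaluation_mode, :parent_id,
  keyword_init: true
)

dimensions = [
  CanonicalPlayDimension.new(id: 1, name: "Amour", dimension_type: :theme,
                             evaluation_mode: :presence, parent_id: nil),
  CanonicalPlayDimension.new(id: 2, name: "Amour interdit", dimension_type: :theme,
                             evaluation_mode: :presence, parent_id: 1),
  CanonicalPlayDimension.new(id: 3, name: "Cohésion dramatique", dimension_type: :quality,
                             evaluation_mode: :spectrum, parent_id: nil)
]

# Root categories have no parent; children point back to their root.
roots       = dimensions.select { |d| d.parent_id.nil? }
children_of = ->(root) { dimensions.select { |d| d.parent_id == root.id } }
```

The same parent/children shape covers both presence-based and spectrum-based dimensions, which is what keeps 787 dimensions queryable with ordinary SQL joins.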
Phase 2: The synthetics enrichment framework
Early in the project, a fundamental tension became clear: LLMs are non-deterministic. They give slightly different answers every time. But the product needs deterministic, queryable data — SQL rows that can be filtered, aggregated, compared.
The bridge between those two worlds is synthetics.
Inspired by Rails' own ActiveStorage pattern, synthetics was designed as a standalone framework living inside the application: a declarative DSL where any model can declare its enrichments, a processing engine that handles async execution with retry logic, cost tracking per enrichment, versioned prompts with structured output schemas, and a full audit trail with rollback capability.
The principle was clear: don't pollute the business logic with the AI pipeline. The domain models know nothing about LLMs. synthetics handles everything — from prompt rendering to output validation to data materialization — in its own namespace.
Today, synthetics processes 20 enrichment types across 3 models. A single play triggers 12 enrichments; each character triggers 6 more; each scene triggers 2. Across a 559-play catalog, that's a projected 25,000 to 40,000 enrichments — each one an LLM call with structured JSON output, cost tracking, and a reversible integration.
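The volume projection follows from the per-entity counts. The enrichments per play, character, and scene come from the numbers above; the average character and scene counts per play are illustrative assumptions, which is why the result is a range rather than a figure:

```ruby
# Back-of-envelope projection of enrichment volume across the catalog.
# Per-entity counts come from the case study; the averages are assumptions.
PLAYS         = 559
PER_PLAY      = 12
PER_CHARACTER = 6
PER_SCENE     = 2

avg_characters = 5   # assumption
avg_scenes     = 10  # assumption

total = PLAYS * (PER_PLAY + avg_characters * PER_CHARACTER + avg_scenes * PER_SCENE)
# With these averages the projection lands inside the quoted 25,000-40,000 range.
```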
Under the hood: the synthetics library
Architecture: synthetics is structured like a gem inside the app (app/lib/synthetics/), loaded via initializer, with its own models namespace:
- Synthetics::Enrichment
- Synthetics::Prompt / Synthetics::PromptVersion
- Synthetics::Integration
- Synthetics::DataLock
- Synthetics::Session
Core abstraction: the enriches macro — a class-level DSL that declares what to enrich, how to enrich it, and what to do with the result:
```ruby
enriches :audience_synopsis,
         prompt: "play_audience_synopsis",
         scope: :field_value_update,
         fields: [:audience_synopsis]
```
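A class-level macro like this is ordinary Ruby metaprogramming. The sketch below shows one way such a DSL can be wired; the real synthetics internals are not shown in the case study, so the module and method bodies here are assumptions that only illustrate the declarative shape:

```ruby
# Hedged sketch of a class-level `enriches` macro (not the real
# synthetics implementation).
module Enrichable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Registry of declared enrichments, keyed by name.
    def enrichment_definitions
      @enrichment_definitions ||= {}
    end

    # Declares what to enrich, which prompt to use, and how to
    # integrate the result.
    def enriches(name, prompt:, scope:, fields: [])
      enrichment_definitions[name] = { prompt: prompt, scope: scope, fields: fields }
    end
  end
end

class Play
  include Enrichable

  enriches :audience_synopsis,
           prompt: "play_audience_synopsis",
           scope: :field_value_update,
           fields: [:audience_synopsis]
end
```

The payoff of this shape is the separation the text describes: Play declares intent in one line, and everything about execution lives in the framework's namespace.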
Versioned prompts:
- ERB templates with JSON Schema output validation
- Provider/model configuration per version
- Activation/deactivation controls
Three integrator types:
- FieldUpdate — update model fields directly
- RecordCreation — create child records
- RecordUpdate — modify related records
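Two of these three shapes can be sketched with plain hashes standing in for records. The class names echo the integrator types above, but the APIs are illustrative assumptions, not the real synthetics classes:

```ruby
# Illustrative integrator shapes; hashes stand in for ActiveRecord rows.
class FieldUpdateIntegrator
  # Writes selected keys of the LLM output onto the record itself.
  def integrate(record, output, fields:)
    fields.each { |f| record[f] = output.fetch(f) }
    record
  end
end

class RecordCreationIntegrator
  # Appends the LLM output as a new child record under the parent.
  def integrate(parent, output, collection:)
    (parent[collection] ||= []) << output
    parent
  end
end

play = { title: "Antigone" }
FieldUpdateIntegrator.new.integrate(
  play, { audience_synopsis: "Une héroïne face à la loi." },
  fields: [:audience_synopsis]
)
RecordCreationIntegrator.new.integrate(play, { name: "Créon" }, collection: :characters)
```

RecordUpdate would follow the same pattern against a related record rather than the parent.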
Preprocessing pattern:
- Send the full play text to the LLM once
- Receive a compressed interaction matrix
- 7+ downstream enrichments reuse that compressed output
Saves ~6 redundant full-text LLM calls per play.
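The savings come from call counting: the expensive full-text call happens once, and downstream enrichments consume the compressed output. The fake "LLM" below just tallies calls and extracts speaker names; the real compression is out of scope:

```ruby
# Sketch of the preprocessing reuse: one full-text call, seven reuses.
full_text_calls = 0

compress_interactions = lambda do |full_text|
  full_text_calls += 1
  # Toy "interaction matrix": unique speakers at line starts.
  { speakers: full_text.scan(/^([A-ZÉ]+)\./).flatten.uniq }
end

full_text = "ANTIGONE. Je l'ai fait.\nCRÉON. Tu savais la loi.\nANTIGONE. Oui."
matrix = compress_interactions.call(full_text) # the single full-text call

# Downstream enrichments read the compressed matrix, never the full text.
downstream = 7.times.map { |i| [:enrichment, i, matrix[:speakers].size] }
```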
Safeguards:
- Data locking: a Lockable module prevents AI from overwriting human edits at the field level
- Cost tracking: every enrichment records input/output tokens and a cost snapshot with pricing metadata
- Lifecycle auditing: append-only JSONB event arrays (pending → running → completed/failed), with integration snapshots for full rollback
- Freshness checking: SHA256 digest of input data + prompt version ID comparison before reuse
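The freshness check is a cheap comparison before any expensive reuse: hash the input together with the prompt version, and only trust a cached enrichment when the fingerprint matches. The fingerprint format below is an assumption; only the SHA256-plus-version-ID idea comes from the text:

```ruby
require "digest"

# Reuse a cached enrichment only when neither the input text nor the
# prompt version has changed. Fingerprint format is illustrative.
def enrichment_fingerprint(input_text, prompt_version_id)
  Digest::SHA256.hexdigest("#{prompt_version_id}:#{input_text}")
end

cached_fp = enrichment_fingerprint("Acte I, scène 1", 3)

fresh = enrichment_fingerprint("Acte I, scène 1", 3) == cached_fp
stale = enrichment_fingerprint("Acte I, scène 1", 4) == cached_fp # prompt version bumped
```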
Admin backoffice (30 Hotwire/Stimulus views):
- Enrichment browsing with advanced filters
- Bulk retry/failure actions
- Output overrides for manual corrections
- Prompt version management
- Data lock administration
Phase 3: The Three-System Architecture
The original plan was simple: connect Imparato's database directly to PAJ via webhooks using n8n. It didn't work. The connection was brittle, opaque, and unstable.
The structural fix was to introduce a middle layer: PAJ-SAS (Staging & Curation). Three systems, each with a clear role:
- Imparato — the source of truth. Raw play texts in their original format.
- PAJ-SAS — the staging layer. Parses raw texts, normalizes titles, manages a status lifecycle. Built in Next.js + Prisma. Damien manages this autonomously through an admin interface.
- PAJ — the intelligence layer. Reads from SAS every 5 minutes, enriches, classifies, searches. Never modifies source data.
Data flows one way: Imparato → SAS → PAJ. The only thing flowing back is a status confirmation. Each system is hermetic. If one breaks, the others keep running.
This wasn't the original architecture. It emerged from a real problem — and it's a better system for it.
Under the hood: the three-system pipeline
Imparato (upstream) — Source of truth. Raw play texts in V0 format, living in Imparato's production database.
PAJ-SAS (middle layer) — Staging and curation. Built in Next.js + Prisma + PostgreSQL on Vercel/CleverCloud:
- Parses V0 texts into structured content
- Normalizes titles via daily n8n workflow (Gemini AI)
- Manages a status lifecycle: ready → pending_update → pending_unpublish
- Damien operates this layer independently through an admin interface
PAJ (intelligence layer) — Rails 8 + Mistral + SolidQueue:
- Sync job polls SAS every 5 minutes for actionable statuses
- Dispatches import/update/unpublish jobs
- Fetches full play data + structured content
- Creates Play/Character/Segment records
- Immediately queues all 12 enrichments per play
Key constraint: PAJ never modifies source data. It reads from SAS and writes intelligence into its own database. The only thing flowing back upstream is a status confirmation (published / unpublished).
Resilience: each system is hermetic. If SAS goes down, PAJ keeps serving its existing enriched data. If PAJ goes down, Imparato and SAS are unaffected.
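The dispatch step of the sync poll reduces to a status-to-job mapping. The status names come from the SAS lifecycle above; the class name and the plan-only shape (returning jobs instead of enqueuing them) are illustrative assumptions:

```ruby
# Sketch of the 5-minute poll's dispatch logic: actionable SAS
# statuses map to jobs; everything else is ignored.
class SasSyncPlanner
  ACTION_FOR = {
    "ready"             => :import,
    "pending_update"    => :update,
    "pending_unpublish" => :unpublish
  }.freeze

  def plan(sas_plays)
    sas_plays.filter_map do |play|
      action = ACTION_FOR[play[:status]]
      [action, play[:id]] if action
    end
  end
end

jobs = SasSyncPlanner.new.plan([
  { id: 1, status: "ready" },
  { id: 2, status: "published" },        # already synced: no-op
  { id: 3, status: "pending_unpublish" }
])
```

Keeping dispatch pure like this is what makes the one-way flow easy to audit: the only write back upstream is the status confirmation.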
Phase 4: Search & Discovery (In Progress)
The user-facing product is a conversational search experience: a mobile-first chat interface where actors describe what they're looking for, and the system surfaces matching plays using both deterministic filters (author, cast size, duration) and semantic AI reasoning.
The infrastructure is fully wired — the search model, the message system, the filter architecture with 8 classifications, the React chat UI with speech bubbles and a bottom drawer for filter management. The LLM integration is mocked with 14 pre-written French responses that demonstrate the interaction pattern.
The next milestone is activating the real AI search, connecting Mistral's conversational capabilities to the structured data that synthetics has already produced. The foundation is there; the intelligence just needs to be switched on.
Testimonial coming soon.

Impact
- An AI framework built to outlive the project
synthetics isn't glue code. It's a standalone enrichment framework with a declarative DSL, versioned prompts, and full audit trails. When PAJ evolves — new enrichment types, new models, new providers — the framework absorbs the change.
- Domain expertise made computational
Damien's 8 years of theater knowledge, formalized into a 787-dimension ontology, connected to an AI pipeline that can classify any play in the corpus. The knowledge isn't trapped in one person's head anymore — it's structured, queryable, and growing.
- A team that ships without friction
308 commits across 3 developers in 4 months. Clean backend/frontend separation, PR-based workflow, weekly sync rhythms. Architecture decisions set the rails; developers run autonomously within them.
- Conception before construction
2.5 months of product design, data modeling, and UX work before writing code. The result: a build phase with almost no backtracking, no major pivots, no architectural regrets.
- A client who stayed in control
The three-system architecture gives Damien direct access to the curation layer. He manages play statuses, monitors the pipeline, and operates independently — without touching the intelligence layer or needing a developer.