v0.1 · public preview

The AI workflow,
treated as software.

A self-improving workbench for prompt engineering, RAG, synthetic data, and evals. One realistic task threads through every stage — orchestrated by an agent that explains its reasoning out loud.

Start with ticket classification →
See the pipeline
$ pnpm dlx agentic init

orchestrator.log ● running

// anchor task: classify support tickets by urgency
▶ eval prompt.v3 → accuracy 0.72 · tone 0.81 · p95 312ms
↳ regression on 14/200 examples in {billing, churn}
// orchestrator reasoning
decide → retrieval likely the bottleneck. switching naive → semantic chunking.
▶ rag.rebuild chunker=semantic k=6
▶ eval prompt.v3+rag.v2 → accuracy 0.89 (+0.17) ↑ graduating to v4

The pipeline

Four stages. One anchor task.

Each stage emits typed artifacts the next one consumes. Pick up at any node — the orchestrator routes around what you skip.
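As a rough illustration of what "typed artifacts" could look like, here is a minimal TypeScript sketch. The type names, fields, and the `describeArtifact` helper are all assumptions made for this example, not the project's actual schema. The idea is only that each stage's output carries a `kind` tag, so the next stage can check what it received at compile time.

```typescript
// Hypothetical artifact shapes, invented for illustration only.
type PromptArtifact = { kind: "prompt"; version: number; text: string };
type IndexArtifact = { kind: "index"; chunker: "naive" | "semantic"; k: number };
type EvalArtifact = { kind: "eval"; accuracy: number; tone: number; p95Ms: number };

// A discriminated union: the `kind` field tells the compiler which shape it has.
type Artifact = PromptArtifact | IndexArtifact | EvalArtifact;

// A stage declares which artifact it consumes and which one it emits.
interface Stage<In extends Artifact, Out extends Artifact> {
  name: string;
  run(input: In): Out;
}

// Exhaustive switch: adding a new artifact kind without handling it here
// becomes a compile-time error under strict settings.
function describeArtifact(a: Artifact): string {
  switch (a.kind) {
    case "prompt":
      return `prompt.v${a.version}`;
    case "index":
      return `rag chunker=${a.chunker} k=${a.k}`;
    case "eval":
      return `accuracy ${a.accuracy.toFixed(2)}`;
  }
}

// A stub stage that turns a prompt into a fixed eval result (no model call).
const stubEval: Stage<PromptArtifact, EvalArtifact> = {
  name: "eval",
  run: () => ({ kind: "eval", accuracy: 0.72, tone: 0.81, p95Ms: 312 }),
};
```

Because the stage signature names its input type, passing an index artifact to `stubEval.run` would be rejected by the compiler rather than failing at runtime.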

Prompt Studio

Version, diff, and auto-refine prompts with a critic LLM.

prompt.v3.md · diff · critic.log

RAG Builder

Chunk, embed, index. Inspect retrieval traces and grounding.

chunks.jsonl · pgvector · trace.tree
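To make the retrieval step concrete, the sketch below ranks chunks by cosine similarity in memory. It is a simplified stand-in under stated assumptions: the `Chunk` shape and the `topK` function are hypothetical, the embeddings are toy two-dimensional vectors, and a real deployment would presumably let pgvector do the ranking inside Postgres.

```typescript
// Hypothetical chunk record; real embeddings would have hundreds of dimensions.
type Chunk = { id: string; text: string; embedding: number[] };

// Cosine similarity of two equal-length vectors; returns 0 for a zero vector.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query and keep the k most similar.
function topK(query: number[], chunks: Chunk[], k: number): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const demoChunks: Chunk[] = [
  { id: "a", text: "refund was charged twice", embedding: [1, 0] },
  { id: "b", text: "how do I change my avatar", embedding: [0, 1] },
  { id: "c", text: "billing question about plan", embedding: [1, 1] },
];

const hits = topK([1, 0], demoChunks, 2);
```

A retrieval trace is then little more than the list of returned ids and scores, which is what makes it inspectable after the fact.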

Synthetic Data

Generate training pairs from approved prompts. Curate inline.

pairs.parquet · curate.tsx · fine-tune.job

Eval Harness

Score accuracy, tone, latency. Flag regressions per commit.

suite.ts · scores.csv · regressions
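One way to flag regressions per commit is to compare a candidate's scores against a baseline with a small tolerance. The sketch below assumes this approach. The `Scores` shape, the `findRegressions` function, and the tolerance values are illustrative assumptions, not the harness's actual API or thresholds.

```typescript
// Hypothetical score record mirroring the three metrics named above.
type Scores = { accuracy: number; tone: number; p95Ms: number };

// Assumed tolerances: quality may drop by 0.01 and p95 latency may grow by 25 ms
// before a metric counts as regressed. These numbers are made up for the example.
type Tolerance = { quality: number; latencyMs: number };

function findRegressions(
  baseline: Scores,
  candidate: Scores,
  tol: Tolerance = { quality: 0.01, latencyMs: 25 },
): string[] {
  const flagged: string[] = [];
  // Higher is better for accuracy and tone.
  if (candidate.accuracy < baseline.accuracy - tol.quality) flagged.push("accuracy");
  if (candidate.tone < baseline.tone - tol.quality) flagged.push("tone");
  // Lower is better for latency.
  if (candidate.p95Ms > baseline.p95Ms + tol.latencyMs) flagged.push("p95Ms");
  return flagged;
}

// Accuracy improves and latency stays within tolerance, but tone slips by 0.03.
const flaggedMetrics = findRegressions(
  { accuracy: 0.72, tone: 0.81, p95Ms: 312 },
  { accuracy: 0.89, tone: 0.78, p95Ms: 320 },
);
```

A CI step could fail the commit whenever the returned list is non-empty, which is one plausible reading of "block" in this context.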

Orchestrator agent

Watches eval scores, picks the next technique, narrates the decision. The glue between stages.

claude-opus-4-7
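The orchestrator itself is described as a model-driven agent, so the following is only a deterministic stand-in that shows the shape of the decision. The `EvalReport` and `Decision` types, the `decide` function, and the 0.85 accuracy target are assumptions for illustration. The rules loosely mirror the sample log near the top of the page: a regression on a naive chunker points at retrieval first.

```typescript
// Hypothetical summary of one eval run.
type EvalReport = {
  accuracy: number;
  failingTags: string[]; // categories where examples regressed, e.g. "billing"
  chunker: "naive" | "semantic";
};

type Decision = {
  action: "graduate" | "rebuild_rag" | "refine_prompt";
  reason: string; // the narrated explanation shown to the user
};

// Rule-based stand-in for the agent. The target of 0.85 is an assumed value.
function decide(report: EvalReport, target = 0.85): Decision {
  if (report.accuracy >= target) {
    return { action: "graduate", reason: `accuracy ${report.accuracy} meets target ${target}` };
  }
  if (report.chunker === "naive" && report.failingTags.length > 0) {
    return {
      action: "rebuild_rag",
      reason: `regressions in {${report.failingTags.join(", ")}}; retrieval likely the bottleneck`,
    };
  }
  return { action: "refine_prompt", reason: "retrieval already semantic; refine the prompt next" };
}

const first = decide({ accuracy: 0.72, failingTags: ["billing", "churn"], chunker: "naive" });
const second = decide({ accuracy: 0.89, failingTags: [], chunker: "semantic" });
```

Returning a `reason` string alongside the action keeps the decision loggable, which is what makes the narration possible whether the chooser is a rule or a model.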

Design principles

Built the way you'd build production software.

Versioned prompts. Typed artifacts. Eval suites that block bad regressions. The agentic parts are observable, not magic.

Continuity: One anchor task threads through every stage — no disconnected demos.
Observability: Every LLM call is logged, scored, and diffable across versions.
Agentic, not scripted: An orchestrator picks the next technique based on real eval scores.
Teaches as it runs: Narrated reasoning explains why each technique applies right now.

Under the hood

Stack you already trust.

Boring foundations on purpose. The interesting parts live in the orchestrator.

Runtime

Next.js 16
React 19
TypeScript 5

AI SDK v6
AI Gateway
Claude Opus / Sonnet

Data

Neon Postgres
pgvector
Drizzle ORM

Infra

Vercel Fluid
Cloudflare R2
Inngest

Stop guessing whether your prompt got better.

Run the whole loop — prompt → retrieval → eval — and let the orchestrator tell you what to try next.

Open the studio →
Read the docs
MIT · self-host friendly
