agentic.bot
v0.1 · public preview

The AI workflow,
treated as software.

A self-improving workbench for prompt engineering, RAG, synthetic data, and evals. One realistic task threads through every stage — orchestrated by an agent that explains its reasoning out loud.

orchestrator.log · ● running
// anchor task: classify support tickets by urgency
 eval prompt.v3 → accuracy 0.72 · tone 0.81 · p95 312ms
 regression on 14/200 examples in {billing, churn}
// orchestrator reasoning
decide  retrieval likely the bottleneck. switching naive → semantic chunking.
 rag.rebuild chunker=semantic k=6
 eval prompt.v3+rag.v2 → accuracy 0.89 (+0.17) ↑ graduating to v4

The pipeline

Four stages. One anchor task.

01

Prompt Studio

Version, diff, and auto-refine prompts with a critic LLM.

prompt.v3.md · diff · critic.log
02

RAG Builder

Chunk, embed, index. Inspect retrieval traces and grounding.

chunks.jsonl · pgvector · trace.tree
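
To make the retrieval step concrete, here is a minimal in-memory sketch of top-k retrieval that returns a scored trace. It is a stand-in, not the product's implementation: the page names pgvector as the index, and the `Chunk` and `TraceEntry` shapes, the `retrieve` function, and the toy three-dimensional embeddings are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real chunk and trace formats are not shown on this page.
interface Chunk {
  id: string;
  text: string;
  embedding: number[];
}

interface TraceEntry {
  chunkId: string;
  score: number;
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the k best, highest first.
function retrieve(query: number[], chunks: Chunk[], k: number): TraceEntry[] {
  return chunks
    .map((c) => ({ chunkId: c.id, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: real embeddings would come from an embedding model.
const chunks: Chunk[] = [
  { id: "refund-policy", text: "Refunds are issued within 5 days.", embedding: [1, 0, 0] },
  { id: "login-help", text: "Reset your password from settings.", embedding: [0, 1, 0] },
  { id: "billing-faq", text: "Invoices are sent monthly.", embedding: [1, 1, 0] },
];

const trace = retrieve([1, 0, 0], chunks, 2);
console.log(trace.map((t) => t.chunkId).join(", ")); // refund-policy, billing-faq
```

Returning the scores alongside the chunk ids is what makes the trace inspectable: a low top score is a hint that the answer was never in reach of the prompt.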
03

Synthetic Data

Generate training pairs from approved prompts. Curate inline.

pairs.parquet · curate.tsx · fine-tune.job
04

Eval Harness

Score accuracy, tone, latency. Flag regressions per commit.

suite.ts · scores.csv · regressions
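
One way to flag regressions between two commits is to compare per-example results and group the newly failing examples by tag. The sketch below assumes each eval example carries an `id`, a `tag`, and a `correct` flag; those field names and the `findRegressions` helper are hypothetical, not taken from the actual harness.

```typescript
// Hypothetical result shape for one eval example.
interface ExampleResult {
  id: string;
  tag: string; // e.g. "billing" or "churn"
  correct: boolean;
}

interface RegressionReport {
  regressed: string[]; // ids that were correct in the baseline but fail now
  byTag: Record<string, number>;
  total: number;
}

// An example counts as a regression only if the baseline got it right
// and the candidate gets it wrong.
function findRegressions(
  baseline: ExampleResult[],
  candidate: ExampleResult[],
): RegressionReport {
  const before = new Map(baseline.map((r) => [r.id, r.correct] as const));
  const regressed: string[] = [];
  const byTag: Record<string, number> = {};
  for (const r of candidate) {
    if (before.get(r.id) === true && !r.correct) {
      regressed.push(r.id);
      byTag[r.tag] = (byTag[r.tag] ?? 0) + 1;
    }
  }
  return { regressed, byTag, total: candidate.length };
}

const baseline: ExampleResult[] = [
  { id: "t1", tag: "billing", correct: true },
  { id: "t2", tag: "churn", correct: true },
  { id: "t3", tag: "login", correct: true },
];
const candidate: ExampleResult[] = [
  { id: "t1", tag: "billing", correct: false },
  { id: "t2", tag: "churn", correct: false },
  { id: "t3", tag: "login", correct: true },
];

const report = findRegressions(baseline, candidate);
console.log(
  `regression on ${report.regressed.length}/${report.total} examples in {${Object.keys(report.byTag).join(", ")}}`,
);
// prints: regression on 2/3 examples in {billing, churn}
```

Grouping by tag matters more than the raw count: regressions clustered in a few topics point at a different fix than regressions spread evenly across the suite.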
00

Orchestrator agent

Watches eval scores, picks the next technique, narrates the decision. The glue between stages.
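
The page does not say how the orchestrator actually chooses its next step, and it is described as an agent rather than a rule table. As a rough illustration of the loop only, here is a rule-based stand-in; the `Technique` names, the `decide` function, and both thresholds are assumptions.

```typescript
// Hypothetical action names; the real orchestrator's vocabulary is not documented here.
type Technique =
  | "rebuild-rag"
  | "refine-prompt"
  | "generate-synthetic-data"
  | "graduate";

interface EvalScores {
  accuracy: number;
  tone: number;
  p95Ms: number;
  regressedTags: string[]; // topics where examples newly fail
}

interface Decision {
  next: Technique;
  reason: string; // the narrated explanation
}

// Assumed targets, chosen only for illustration.
const ACCURACY_TARGET = 0.85;
const TONE_TARGET = 0.8;

function decide(scores: EvalScores): Decision {
  // Failures clustered in specific topics suggest missing domain context.
  if (scores.regressedTags.length > 0) {
    return {
      next: "rebuild-rag",
      reason: `regressions in {${scores.regressedTags.join(", ")}}: retrieval likely the bottleneck`,
    };
  }
  if (scores.tone < TONE_TARGET) {
    return { next: "refine-prompt", reason: "tone below target" };
  }
  if (scores.accuracy < ACCURACY_TARGET) {
    return { next: "generate-synthetic-data", reason: "accuracy below target" };
  }
  return { next: "graduate", reason: "all targets met, no regressions" };
}

const decision = decide({
  accuracy: 0.72,
  tone: 0.81,
  p95Ms: 312,
  regressedTags: ["billing", "churn"],
});
console.log(`decide  ${decision.next}: ${decision.reason}`);
```

Each decision carries its own `reason` string so the narration is produced by the same code path as the choice, rather than reconstructed afterwards.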

Design principles

Built the way you'd build production software.

Versioned prompts. Typed artifacts. Eval suites that block bad regressions. The agentic parts are observable, not magic.

Continuity
One anchor task threads through every stage — no disconnected demos.
Observability
Every LLM call is logged, scored, and diffable across versions.
Agentic, not scripted
An orchestrator picks the next technique based on real eval scores.
Teaches as it runs
Narrated reasoning explains why each technique applies right now.

Under the hood

Stack you already trust.

Runtime

  • Next.js 16
  • React 19
  • TypeScript 5

AI

  • AI SDK v6
  • AI Gateway
  • Claude Opus / Sonnet

Data

  • Neon Postgres
  • pgvector
  • Drizzle ORM

Infra

  • Vercel Fluid
  • Cloudflare R2
  • Inngest

Stop guessing whether your prompt got better.

Run the whole loop — prompt → retrieval → eval — and let the orchestrator tell you what to try next.

Open the studio →
Read the docs
MIT · self-host friendly