Assistants lessons

The assistants lessons walk through the change-explainer agent end to end, along with the five-scorer Evalite rubric that gates it in CI. All code lives under editor/ because ADR-0020 places editorial assistants in the TypeScript half, behind editor endpoints, rather than as freelance AI features bolted onto the recommender.

L01 — Evalite bootstrap

evalite, vitest, and autoevals are wired into the editor workspace via editor/evalite.config.ts. Run the full eval:

pnpm --filter editor eval

Run the Evalite UI:

pnpm --filter editor eval:dev

L02 — The change-explainer agent

The agent lives at editor/src/agents/change-explainer.ts with the prompt at editor/src/agents/prompt.ts.

It takes:

{ before: RankedList, after: RankedList, constraint_diff, platform_facts }

and returns:

{ headline, moved_down[], moved_up[], summary }
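
As a rough illustration of the input and output shapes above, the following sketch writes them as TypeScript types. Only the top-level field names come from this lesson; the element types (RankedItem, MovedItem, the layout of constraint_diff and platform_facts) and the two helper functions are assumptions made for the example, not the definitions in change-explainer.ts.

```typescript
// Hypothetical element type: the lesson names RankedList but not its fields.
interface RankedItem {
  article_id: string;
  source: string;
  topic: string;
  rank: number; // 1 = top of the list (assumed convention)
}
type RankedList = RankedItem[];

// Top-level keys follow the lesson text; value types are assumptions.
interface ExplainerInput {
  before: RankedList;
  after: RankedList;
  constraint_diff: Record<string, { from: number; to: number }>;
  platform_facts: Record<string, unknown>;
}

// Hypothetical shape for one entry of moved_up[] / moved_down[].
interface MovedItem {
  article_id: string;
  reason: string;
}

interface ExplainerOutput {
  headline: string;
  moved_down: MovedItem[];
  moved_up: MovedItem[];
  summary: string;
}

// Rank change per article present in both lists.
// A positive delta means the article moved down the list.
function rankDeltas(input: ExplainerInput): Map<string, number> {
  const beforeRank = new Map<string, number>();
  for (const item of input.before) beforeRank.set(item.article_id, item.rank);
  const deltas = new Map<string, number>();
  for (const item of input.after) {
    const prev = beforeRank.get(item.article_id);
    if (prev !== undefined) deltas.set(item.article_id, item.rank - prev);
  }
  return deltas;
}

// Runtime guard for parsed model output, since JSON from an LLM is untyped.
function isExplainerOutput(value: unknown): value is ExplainerOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const isMovedList = (x: unknown): boolean =>
    Array.isArray(x) &&
    x.every(
      (m) =>
        typeof m === "object" &&
        m !== null &&
        typeof (m as Record<string, unknown>).article_id === "string" &&
        typeof (m as Record<string, unknown>).reason === "string",
    );
  return (
    typeof v.headline === "string" &&
    typeof v.summary === "string" &&
    isMovedList(v.moved_down) &&
    isMovedList(v.moved_up)
  );
}
```

A guard like isExplainerOutput is one way to reject malformed model output before any scorer sees it; a schema library would serve the same purpose.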

The LLM call is wrapped with traceAISDKModel so traces appear in the Evalite UI for every fixture row.

L03 — Programmatic fixtures

editor/src/fixtures/generator.ts generates fixtures by calling the FastAPI /preview endpoint twice (once with the “before” config, once with the “after” config) for each of the N×M (user, constraint-diff) combinations. The default is 40 fixtures. The generator is seedable: the same seed produces the same fixtures every time. CI reproducibility is the discipline this generator enforces.
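
To make the seeding idea concrete, here is a minimal sketch of how a seed could pin down which (user, constraint-diff) combinations are used. It covers only the selection step, not the two /preview calls. The names planFixtures and FixtureSpec, the mulberry32-style PRNG, and the shuffle-then-slice strategy are all assumptions for illustration; the actual generator may sample differently.

```typescript
// Small seeded PRNG in the mulberry32 style; returns floats in [0, 1).
// Math.random() cannot be seeded, so a reproducible generator needs its own.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical description of one fixture to generate.
interface FixtureSpec {
  userId: string;
  diffId: string;
}

// Enumerate all N×M combinations, shuffle them with the seeded PRNG
// (Fisher-Yates), and keep the first `count`.
function planFixtures(
  users: string[],
  diffs: string[],
  count: number,
  seed: number,
): FixtureSpec[] {
  const combos: FixtureSpec[] = [];
  for (const userId of users) {
    for (const diffId of diffs) combos.push({ userId, diffId });
  }
  const rand = mulberry32(seed);
  for (let i = combos.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = combos[i]!;
    combos[i] = combos[j]!;
    combos[j] = tmp;
  }
  return combos.slice(0, count);
}
```

Because every source of randomness flows through the one seeded function, two runs with the same seed and the same inputs select the same combinations in the same order.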

The editor/fixtures/curated/README.md file is the explicit extension point for hand-pinned cases from a domain expert (per the feedback-showcase-humility memory). It’s empty in v1 — the tutorial does not claim editorial expertise.

L04 — Four deterministic scorers

editor/src/scorers/ holds the deterministic rubric:

Scorer	What it checks	Source
grounded-entities	Every article_id / source / topic cited in the output appears in `before` or `after`. Score = `1 - hallucinated/total`	`grounded-entities.ts`
reason-validity	Each move-up / move-down reason is verifiable against `platform_facts`	`reason-validity.ts`
constraint-coverage	The changed constraint is named in the summary. Binary.	`constraint-coverage.ts`
length	`summary ≤ 60` words; per-item reasons `≤ 25` words	`length.ts`

Each scorer is a pure function with its own Vitest unit tests.
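
The sketch below shows what two of these pure functions could look like, using the formulas in the table: `1 - hallucinated/total` for grounded-entities and the 60/25 word limits for length. The function names and signatures are hypothetical. Treating an output with no citations as fully grounded and making the length score binary are assumptions; the files in editor/src/scorers/ may decide both differently.

```typescript
// grounded-entities: share of cited ids that appear in `before` or `after`.
// `known` is the set of ids collected from both ranked lists.
function groundedEntitiesScore(cited: string[], known: Set<string>): number {
  // Assumption: nothing cited means nothing hallucinated.
  if (cited.length === 0) return 1;
  const hallucinated = cited.filter((id) => !known.has(id)).length;
  return 1 - hallucinated / cited.length;
}

// Whitespace-delimited word count; an empty string counts as zero words.
function wordCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// length: summary of at most 60 words, each per-item reason at most 25.
// Assumption: binary score (1 if every limit holds, otherwise 0).
function lengthScore(summary: string, reasons: string[]): number {
  const withinLimits =
    wordCount(summary) <= 60 && reasons.every((r) => wordCount(r) <= 25);
  return withinLimits ? 1 : 0;
}
```

Since neither function touches the network or the clock, a Vitest case can call each one directly with literal inputs and compare against a fixed number.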

L05 — LLM-as-judge: editorial register

editor/src/scorers/editorial-register.ts is a fifth scorer, built with createScorer from Evalite. It judges whether the summary reads the way a Danish news editor would write it: concise, specific, no AI slop, no hedging.

The scorer is gated: it runs only when grounded-entities ≥ 0.95. Otherwise it returns n/a and is excluded from the row’s average. This avoids judging gibberish for register and saves model spend on rows that are already failing.
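
The gating rule can be sketched as two small functions: one that skips the judge below the grounding threshold, and one that averages a row while leaving n/a scores out. Everything here is illustrative. The names gateOnGrounding and rowAverage are hypothetical, null stands in for n/a, and returning 0 for a row with no applicable scores is an assumption.

```typescript
// null stands in for "n/a" in this sketch.
type ScoreValue = number | null;

// Invoke the (expensive) judge only when the row is grounded enough.
// `runJudge` is a callback so that a skipped row costs no model call;
// in practice it would return a Promise from the LLM judge.
function gateOnGrounding<T>(
  groundedScore: number,
  runJudge: () => T,
  threshold = 0.95,
): T | null {
  if (groundedScore < threshold) return null;
  return runJudge();
}

// Mean of the scores that apply to this row; n/a entries are excluded
// from both the sum and the count.
function rowAverage(scores: Record<string, ScoreValue>): number {
  const applicable: number[] = [];
  for (const key of Object.keys(scores)) {
    const s = scores[key];
    if (typeof s === "number") applicable.push(s);
  }
  // Assumption: a row with no applicable scores averages to 0.
  if (applicable.length === 0) return 0;
  return applicable.reduce((sum, s) => sum + s, 0) / applicable.length;
}
```

Excluding n/a from the denominator matters: counting it as 0 would penalise a row twice, once for the grounding failure and once for a register score that was never computed.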

L06 — The Evalite suite + CI threshold

editor/evals/change-explainer.eval.ts wires together the fixtures, the agent, and the scorers. The CI threshold has two conditions:

Average across all scorers ≥ 0.85
grounded-entities ≥ 0.95 on every row (one hallucination = suite failure)

Threshold enforcement lives in editor/scripts/eval-summary.ts — a wrapper that exits non-zero when the rubric fails, so a regression shows up as a red CI build.
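
A minimal sketch of the two-part check such a wrapper could apply is shown below. The names RowScores, RubricVerdict, and checkRubric are hypothetical, and computing the suite average as the mean of per-row averages is an assumption; eval-summary.ts may aggregate differently.

```typescript
// Hypothetical per-row summary: the grounded-entities score plus the
// row's average across its applicable scorers.
interface RowScores {
  groundedEntities: number;
  average: number;
}

interface RubricVerdict {
  passed: boolean;
  failures: string[];
}

// Apply both CI conditions: suite average >= 0.85, and
// grounded-entities >= 0.95 on every single row.
function checkRubric(
  rows: RowScores[],
  minAverage = 0.85,
  minGrounded = 0.95,
): RubricVerdict {
  const failures: string[] = [];
  if (rows.length === 0) {
    // An empty run should never count as a pass.
    failures.push("no rows were scored");
    return { passed: false, failures };
  }
  const suiteAverage =
    rows.reduce((sum, row) => sum + row.average, 0) / rows.length;
  if (suiteAverage < minAverage) {
    failures.push(`suite average ${suiteAverage.toFixed(3)} < ${minAverage}`);
  }
  rows.forEach((row, index) => {
    if (row.groundedEntities < minGrounded) {
      failures.push(
        `row ${index}: grounded-entities ${row.groundedEntities} < ${minGrounded}`,
      );
    }
  });
  return { passed: failures.length === 0, failures };
}

// A CLI wrapper could then turn the verdict into an exit status, e.g.:
//   const verdict = checkRubric(rows);
//   if (!verdict.passed) process.exitCode = 1;
```

Keeping the check pure and leaving the exit code to a thin wrapper means the threshold logic itself can be unit-tested without spawning a process.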

L07 — Editor integration

When an editor moves a slider, two HTMX partials fire in parallel: the recommendations partial AND the change-explainer partial. The partial route is src/routes/change-explainer-partial.ts. The editor sees the new list and the editorial reason for the change side by side.

The dbt/Evalite split

The deterministic constraint metrics (diversity, recency, source mix, sensitive-topic exposure) stay in dbt because they are reproducible checks over materialised platform outputs. The non-deterministic assistant behaviours (faithfulness, register, intent-translation accuracy) live in Evalite. The split is recorded in ADR-0020.

Planned future suites

The same pattern applies to two more editorial assistants, deferred to a later phase:

constraint-translator — turns editor intent (“make today’s front more varied”) into specific constraint weight changes.
alternative-suggester — proposes alternative slates for a given context.

Both are explicitly named in ADR-0020 as not-yet-built so a reader knows they aren’t lost work.