
Assistants lessons

The assistants lessons walk through the change-explainer agent end to end, plus the five-scorer Evalite rubric that gates it in CI. All code lives under editor/ because ADR-0020 puts editorial assistants in the TypeScript half: behind editor endpoints, not as freelance AI features bolted onto the recommender.

evalite, vitest, and autoevals are wired into the editor workspace via editor/evalite.config.ts. Run the full eval:

pnpm --filter editor eval

Run the Evalite UI:

pnpm --filter editor eval:dev

The agent lives at editor/src/agents/change-explainer.ts with the prompt at editor/src/agents/prompt.ts.

It takes:

{ before: RankedList, after: RankedList, constraint_diff, platform_facts }

and returns:

{ headline, moved_down[], moved_up[], summary }
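
As a rough illustration of that contract, the shapes could be typed as below. Only the top-level field names come from the lesson. The item fields (`article_id`, `source`, `topic`, `rank`), the `Move` shape, the value types of `constraint_diff` and `platform_facts`, and the `isExplainerOutput` guard are assumptions made for this sketch, not the actual definitions in editor/src/agents/.

```typescript
// Hypothetical shapes: everything below the top-level field names is assumed.
interface RankedItem {
  article_id: string;
  source: string;
  topic: string;
  rank: number;
}
type RankedList = RankedItem[];

interface ExplainerInput {
  before: RankedList;
  after: RankedList;
  constraint_diff: Record<string, { from: number; to: number }>; // assumed shape
  platform_facts: Record<string, unknown>; // assumed shape
}

interface Move {
  article_id: string;
  reason: string;
}

interface ExplainerOutput {
  headline: string;
  moved_down: Move[];
  moved_up: Move[];
  summary: string;
}

function isMove(value: unknown): value is Move {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.article_id === "string" && typeof v.reason === "string";
}

// Runtime guard: checks that a parsed model response has the expected fields.
function isExplainerOutput(value: unknown): value is ExplainerOutput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.headline === "string" &&
    typeof v.summary === "string" &&
    Array.isArray(v.moved_down) &&
    v.moved_down.every(isMove) &&
    Array.isArray(v.moved_up) &&
    v.moved_up.every(isMove)
  );
}
```

A guard like this is one way to reject malformed model output before any scorer runs.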

The LLM call is wrapped with traceAISDKModel so traces appear in the Evalite UI for every fixture row.

editor/src/fixtures/generator.ts generates fixtures by calling the FastAPI /preview endpoint twice (once for the “before” config, once for “after”) for each of N×M (user, constraint-diff) combinations. The default is 40 fixtures. The generator is seedable: the same seed produces the same fixtures every time, which is what keeps CI runs reproducible.
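
The seeding idea can be sketched as follows. This is not the code in generator.ts: `mulberry32` is a common small seedable PRNG used here as a stand-in, and `planFixtures`, `FixturePlan`, and the choice to shuffle the N×M combinations and take the first `count` are all assumptions for illustration. The two /preview calls per combination are left out so the sketch runs without a server.

```typescript
// mulberry32: a small seedable PRNG (stand-in; the lesson does not say which PRNG is used).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical plan entry: which user and which constraint diff a fixture covers.
interface FixturePlan {
  userId: string;
  diffId: string;
}

// Enumerate all N×M combinations, shuffle them with the seeded PRNG, keep `count`.
function planFixtures(
  users: string[],
  diffs: string[],
  count: number,
  seed: number,
): FixturePlan[] {
  const rand = mulberry32(seed);
  const combos: FixturePlan[] = users.flatMap((userId) =>
    diffs.map((diffId) => ({ userId, diffId })),
  );
  // Fisher-Yates shuffle driven only by the seeded PRNG, never Math.random().
  for (let i = combos.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = combos[i];
    combos[i] = combos[j];
    combos[j] = tmp;
  }
  return combos.slice(0, count);
}
```

Because every random choice flows from the seed, two runs with the same seed plan the same fixtures in the same order.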

The editor/fixtures/curated/README.md file is the explicit extension point for hand-pinned cases from a domain expert (per the feedback-showcase-humility memory). It’s empty in v1 — the tutorial does not claim editorial expertise.

editor/src/scorers/ holds the deterministic rubric:

| Scorer | What it checks | Source |
| grounded-entities | Every article_id / source / topic cited in the output appears in before or after. Score = 1 - hallucinated/total | grounded-entities.ts |
| reason-validity | Each move-up / move-down reason is verifiable against platform_facts | reason-validity.ts |
| constraint-coverage | The changed constraint is named in the summary. Binary. | constraint-coverage.ts |
| length | summary ≤ 60 words; per-item reasons ≤ 25 words | length.ts |

Each scorer is a pure function with its own Vitest unit tests.
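
A minimal sketch of what a pure scorer of this kind might look like, using the grounded-entities formula (score = 1 - hallucinated/total). The function signature, the `Item` shape, and the choice to return 1 when nothing is cited are assumptions. The real grounded-entities.ts may extract and compare entities differently.

```typescript
// Hypothetical item shape; only the three entity kinds come from the lesson.
interface Item {
  article_id: string;
  source: string;
  topic: string;
}

// Pure function: same inputs always give the same score, no I/O, no model call.
function groundedEntities(cited: string[], before: Item[], after: Item[]): number {
  // Collect every entity that legitimately appears in either list.
  const known = new Set<string>();
  for (const item of [...before, ...after]) {
    known.add(item.article_id);
    known.add(item.source);
    known.add(item.topic);
  }
  // Assumption: an output that cites nothing has hallucinated nothing.
  if (cited.length === 0) return 1;
  const hallucinated = cited.filter((entity) => !known.has(entity)).length;
  return 1 - hallucinated / cited.length;
}
```

Keeping the scorer free of side effects is what makes it straightforward to cover with plain Vitest unit tests.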

editor/src/scorers/editorial-register.ts is a fifth scorer, built with createScorer from Evalite. It judges whether the summary reads the way a Danish news editor would write: concise, specific, no AI-slop, no hedging.

The scorer is gated: it runs only when grounded-entities ≥ 0.95. Otherwise it returns n/a and is excluded from the row’s average. This avoids judging gibberish for register and saves model spend on rows that are already failing.
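
The gating rule and the n/a exclusion can be sketched like this. Here `null` stands in for n/a, and `gatedRegisterScore`, `rowAverage`, and the synchronous `judge` callback are hypothetical names for this sketch. The real judge would be an asynchronous model call, and the lesson does not say how n/a is represented.

```typescript
// null stands in for "n/a" (assumed representation).
type Score = number | null;

const GROUNDING_GATE = 0.95;

// Only invoke the (expensive) judge when the row is already well grounded.
function gatedRegisterScore(grounded: number, judge: () => number): Score {
  return grounded >= GROUNDING_GATE ? judge() : null;
}

// Average over the scorers that actually produced a value; n/a rows are skipped.
function rowAverage(scores: Score[]): number {
  const applicable = scores.filter((s): s is number => s !== null);
  // Assumption: a row with no applicable scores counts as 0.
  if (applicable.length === 0) return 0;
  return applicable.reduce((sum, s) => sum + s, 0) / applicable.length;
}
```

Passing the judge as a callback means a row that fails the gate never triggers the model call at all.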

editor/evals/change-explainer.eval.ts wires together the fixtures, the agent, and the scorers. CI enforces two thresholds:

  • Average across all scorers ≥ 0.85
  • grounded-entities ≥ 0.95 on every row (one hallucination = suite failure)

Threshold enforcement lives in editor/scripts/eval-summary.ts — a wrapper that exits non-zero when the rubric fails, so a regression shows up as a red CI build.
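
A sketch of the pass/fail decision such a wrapper has to make, under two assumptions: that each row has already been reduced to a `groundedEntities` score and a per-row `average`, and that the suite average is the mean of the row averages. `RowResult`, `rubricPasses`, and `exitCodeFor` are hypothetical names. The actual eval-summary.ts may read Evalite's results in a different shape.

```typescript
// Hypothetical per-row summary of the scorer results.
interface RowResult {
  groundedEntities: number;
  average: number; // mean of the row's applicable scorers
}

const MIN_SUITE_AVERAGE = 0.85;
const MIN_ROW_GROUNDING = 0.95;

function rubricPasses(rows: RowResult[]): boolean {
  // Assumption: an empty run is treated as a failure, not a vacuous pass.
  if (rows.length === 0) return false;
  const suiteAverage = rows.reduce((sum, r) => sum + r.average, 0) / rows.length;
  // A single under-grounded row fails the whole suite.
  const everyRowGrounded = rows.every((r) => r.groundedEntities >= MIN_ROW_GROUNDING);
  return suiteAverage >= MIN_SUITE_AVERAGE && everyRowGrounded;
}

// Map the decision to a process exit code: 0 for pass, 1 for fail.
function exitCodeFor(rows: RowResult[]): 0 | 1 {
  return rubricPasses(rows) ? 0 : 1;
}
```

In a Node script, the result could be assigned to `process.exitCode` so that CI marks the build red on failure.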

When an editor moves a slider, two HTMX partials fire in parallel: the recommendations partial and the change-explainer partial. The partial route is src/routes/change-explainer-partial.ts. The editor sees the new list and the editorial reason for the change side by side.

The deterministic constraint metrics (diversity, recency, source mix, sensitive-topic exposure) stay in dbt because they are reproducible checks over materialised platform outputs. The non-deterministic assistant behaviours (faithfulness, register, intent-translation accuracy) live in Evalite. ADR-0020 records this split.

The same pattern applies to two more editorial assistants, deferred to a later phase:

  • constraint-translator — turns editor intent (“make today’s front more varied”) into specific constraint weight changes.
  • alternative-suggester — proposes alternative slates for a given context.

Both are explicitly named in ADR-0020 as not-yet-built so a reader knows they aren’t lost work.