Skip to content

Evaluation lessons

The full evaluation lesson uses the checked-in holdout fixture to exercise the same ranker the serving API uses. It evaluates eight pure metrics: NDCG@10, MRR, hit-rate@10, intra-list diversity, catalog coverage, median recency, sentiment-distribution divergence, and sensitive-topic exposure.

This lesson prose is deliberately table-shaped: every metric and grid point maps back to rows in the analytical contract.

The sweep grid crosses all three soft editorial weights:

  • diversity_weight: 0.00, 0.15, 0.30, 0.60, 1.00
  • recency_weight: 0.00, 0.20, 0.40, 0.70, 1.00
  • sentiment_weight: 0.00, 0.10, 0.20, 0.50, 1.00

This is a 5 by 5 by 5 sweep. It includes the click-only baseline and the platform default, then writes eval_sweep_results as the analytical contract for SQL, notebooks, and chart generation.