A tour of the tutorial
This page is a 10-minute walk through the whole tutorial. It is the page to read second — after the thesis essay — if you want to see what was built before you read why.
The tutorial is a working platform, not a write-up. The lessons across the modules build up that platform from zero. This tour zooms out so you can see the parts in one place.
The shape of the platform
flowchart LR
subgraph platform["Data platform (Python)"]
ingest[("dlt sources<br/>EB-NeRD · Adressa · MIND")]
parquet["Partitioned Parquet"]
duckdb["DuckDB + dbt models"]
dagster["Dagster orchestration"]
ingest --> parquet --> duckdb --> dagster
end
appContract{{"App contract<br/>FastAPI + OpenAPI"}}
analyticalContract{{"Analytical contract<br/>DuckDB tables + dbt docs"}}
editor["TypeScript editor<br/>Express + HTMX"]
swap["Streamlit swap-demo"]
notebook["Analyst notebook / SQL"]
duckdb --> appContract
duckdb --> analyticalContract
appContract --> editor
appContract --> swap
analyticalContract --> notebook
Two contracts, two audiences. Editors talk to the app contract through a small editorial UI. Analysts talk to the analytical contract in SQL or a notebook. Both contracts read the same Parquet-backed DuckDB platform. That is the architectural claim from ADR-0006.
What happens when an editor moves a slider
The most concrete way to see the platform working is to follow a single slider change through every layer:
sequenceDiagram
autonumber
participant Editor as Editor (browser)
participant TSApp as TS editor (Express)
participant API as FastAPI app contract
participant Ranker as Ranker (pure fn)
participant DuckDB as DuckDB + Parquet
Editor->>TSApp: drag diversity slider → HTMX hx-post
TSApp->>API: POST /preview (config inline)
API->>DuckDB: read candidate articles + user embedding
DuckDB-->>API: candidate set (~100 articles)
API->>Ranker: rank(candidate_set, config)
Ranker-->>API: ranked list (≤10)
API-->>TSApp: ranked recommendations
TSApp-->>Editor: HTMX swaps the preview pane
No page reload. No JavaScript framework. The slider update produces a fresh recommendation list that reflects the new editorial configuration — visible to the editor before they commit the change. This is the editorial accountability promise made operational.
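The API side of that sequence (steps 3 to 7) can be sketched as one function with its dependencies passed in. This is a minimal illustration, not the tutorial's actual handler: the names `preview`, `load_candidates`, and `rank`, the `user_id` parameter, and the dictionary shapes are all assumptions, and the FastAPI wiring is left out so the sketch runs with the standard library alone.

```python
from typing import Callable, Dict, List

Article = Dict[str, object]       # hypothetical article shape
ConfigDict = Dict[str, object]    # hypothetical constraint configuration


def preview(
    user_id: str,
    config: ConfigDict,
    load_candidates: Callable[[str], List[Article]],
    rank: Callable[[List[Article], ConfigDict], List[Article]],
    limit: int = 10,
) -> List[Article]:
    """Rank a candidate set under an inline config without saving the config."""
    candidates = load_candidates(user_id)   # steps 3-4: read from DuckDB
    ranked = rank(candidates, config)       # steps 5-6: pure ranker call
    return ranked[:limit]                   # step 7: at most `limit` items


# Stand-ins for the DuckDB read and the real ranker (both hypothetical).
def fake_load_candidates(user_id: str) -> List[Article]:
    return [{"article_id": f"a{i}", "relevance": i / 100} for i in range(100)]


def fake_rank(candidates: List[Article], config: ConfigDict) -> List[Article]:
    return sorted(candidates, key=lambda a: a["relevance"], reverse=True)


result = preview("u1", {"diversity_weight": 0.3}, fake_load_candidates, fake_rank)
print([a["article_id"] for a in result])
```

Because the config travels with the request and nothing is written back, the same call can be repeated on every slider movement.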
How the five constraints combine
The ranker is the heart of the platform. It is a pure function from
(candidate_set, constraint_configuration) to a ranked list. The
combination model is mixed enforcement (ADR-0010) — soft weights for tunable
preferences, hard rules for editorial commitments:
flowchart TB
candidates["Candidate set<br/>~100 articles"] --> score
subgraph soft["Soft constraints (weighted score)"]
diversity["Topical diversity<br/>weight 0..1"]
recency["Recency / freshness<br/>weight + half-life"]
sentiment["Sentiment balance<br/>weight + target"]
end
soft --> score["score = relevance<br/>+ Σ wᵢ · soft_termᵢ"]
score --> sortStep["Sort descending"]
sortStep --> hardStep
subgraph hard["Hard rules (applied after sort)"]
promotion["Editorial promotion<br/>insert at fixed positions"]
sensitive["Sensitive-topic guard<br/>hard cap"]
end
hardStep[hard rules] --> final["Ranked list ≤10"]
The formulas live in ADR-0015. The ranker is implemented as a pure function precisely because that boundary makes the editorial transformation testable without a database, a web server, or a UI.
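A minimal sketch of that shape follows: a weighted soft score, a descending sort, then hard rules. It does not reproduce the ADR-0015 formulas. The soft terms are assumed to be precomputed per article, and the `Candidate` and `Config` types, their field names, and the way the two hard rules interact are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Candidate:
    article_id: str
    relevance: float
    soft_terms: Dict[str, float]   # hypothetical precomputed soft terms
    sensitive: bool = False


@dataclass(frozen=True)
class Config:
    weights: Dict[str, float]                                  # soft weights w_i
    promotions: Dict[int, str] = field(default_factory=dict)   # position -> article_id
    sensitive_cap: int = 1                                     # hard cap (assumed default)
    limit: int = 10


def rank(candidates: Sequence[Candidate], config: Config) -> List[str]:
    """Pure function: the same inputs always produce the same ranked ids."""

    def score(c: Candidate) -> float:
        # score = relevance + sum of w_i * soft_term_i
        return c.relevance + sum(
            w * c.soft_terms.get(name, 0.0) for name, w in config.weights.items()
        )

    ordered = sorted(candidates, key=score, reverse=True)

    # Hard rule: cap sensitive articles. Promoted articles are set aside here
    # and do not count against the cap (a simplification in this sketch).
    promoted_ids = set(config.promotions.values())
    kept: List[str] = []
    sensitive_seen = 0
    for c in ordered:
        if c.article_id in promoted_ids:
            continue
        if c.sensitive:
            if sensitive_seen >= config.sensitive_cap:
                continue
            sensitive_seen += 1
        kept.append(c.article_id)

    # Hard rule: insert promoted articles at their fixed positions.
    for position in sorted(config.promotions):
        kept.insert(min(position, len(kept)), config.promotions[position])

    return kept[: config.limit]


candidates = [
    Candidate("a", 0.9, {"diversity": 0.0}),
    Candidate("b", 0.5, {"diversity": 1.0}),
    Candidate("c", 0.8, {"diversity": 0.0}, sensitive=True),
    Candidate("d", 0.7, {"diversity": 0.0}, sensitive=True),
]
print(rank(candidates, Config(weights={"diversity": 0.0})))  # ['a', 'c', 'b']
print(rank(candidates, Config(weights={"diversity": 0.5})))  # ['b', 'a', 'c']
```

Nothing in the example touches a database, a server, or a UI, which is the testability property the pure-function boundary is meant to provide.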
Module map
The tutorial is grouped by role, not by tool. A module is one role on the platform; a lesson inside it is one step of that role.
flowchart LR
foundations[Foundations] --> ingestion[Ingestion]
ingestion --> transformation[Transformation]
transformation --> orchestration[Orchestration]
transformation --> lakehouse[Lakehouse / Storage]
transformation --> modeling[Modeling]
modeling --> editorial[Editorial]
editorial --> serving[Serving]
serving --> editor[Editor]
serving --> assistants[Assistants]
editorial --> evaluation[Evaluation]
modeling --> evaluation
Reading order is left-to-right, but lessons are runnable independently. Each module page in the Modules section is the README for that role; its Lesson sub-page is where the prose walkthrough lives.
Module roles in plain English:
| Module | What this role is doing |
|---|---|
| Foundations | Establish the local DuckDB shape and the news-event vocabulary. Read this if you have never touched DuckDB. |
| Ingestion | Pull EB-NeRD, Adressa, and MIND into the platform with dlt. Three publishers, one canonical staging shape. |
| Transformation | dbt models: staging, marts, tests, and column-level docs. |
| Orchestration | Wrap dlt + dbt as Dagster assets. Daily schedule + new-file sensor. |
| Lakehouse / Storage | Partitioned Parquet layout, DuckDB scans, scale measurements, cloud-migration pathway (no actual deploy). |
| Modeling | Sentence-embedding candidate generation. Modest model with cold-start fallback. |
| Editorial | The five constraints + the mixed-enforcement ranker. Pure-function ranker is the headline TDD module. |
| Serving | FastAPI app contract: /articles/{id}, /recommendations/{user_id}, /preview, /constraint-configurations. Plus the Streamlit swap-demo. |
| Editor | TypeScript Express + HTMX. The reference editor interface. |
| Assistants | Editorial AI assistants behind editor endpoints. Phase 1 is change-explainer, evaluated with Evalite. |
| Evaluation | Constraint-configuration sweeps, NDCG@10 vs diversity Pareto charts across publishers. |
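For readers unfamiliar with the metrics named in the Evaluation row, the sketch below shows one common definition of NDCG@10 (linear gain with a log2 discount) and a simple Pareto-front filter over (NDCG, diversity) pairs. The tutorial's own metric definitions may differ; the function names and the example numbers are illustrative assumptions, not results from the evaluation module.

```python
import math
from typing import List, Sequence, Tuple


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG divided by the DCG of the ideal ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def pareto_front(points: Sequence[Tuple[str, float, float]]) -> List[str]:
    """Keep configurations that no other configuration dominates.

    Each point is (config_name, ndcg, diversity); higher is better on both.
    """
    front = []
    for name, ndcg, diversity in points:
        dominated = any(
            n2 >= ndcg and d2 >= diversity and (n2 > ndcg or d2 > diversity)
            for _, n2, d2 in points
        )
        if not dominated:
            front.append(name)
    return front


# Made-up sweep results, for illustration only.
sweep = [("relevance-only", 0.9, 0.2), ("balanced", 0.7, 0.6), ("weak", 0.6, 0.5)]
print(pareto_front(sweep))
```

A configuration on the front is one where no alternative is at least as good on both axes and strictly better on one.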
Where to start, by who you are
- An interviewer at JP/Politikens Hus: read the thesis essay, glance at this tour, then look at the evaluation module for the Pareto chart and the serving module for what the editor surface actually does.
- A data engineer kicking the tyres: start at foundations, then walk the modules left-to-right.
- An analyst: go straight to the data reference for dbt-generated lineage and column docs, then dip into evaluation for the metric definitions.
- A developer integrating a different editor client: open the API reference. Any client that speaks the OpenAPI contract is interchangeable; the Streamlit swap-demo proves it.
- A future maintainer: read "how this was built" for the lab, Sandcastle, and the ADR-driven grilling workflow that produced the artefact.
Where things live in the repo
| Path | What’s in it |
|---|---|
| tutorial/foundations/ | First SQL lesson: local DuckDB foundations. |
| tutorial/serving/ | FastAPI + ranker + dbt project + Streamlit swap-demo. |
| tutorial/storage/ | Partitioned-Parquet substrate + scale test + cloud-migration prose. |
| editor/ | TypeScript Express + HTMX editor interface. |
| docs-site/ | This site (Astro Starlight). |
| docs/adr/ | Architecture Decision Records 0001–0016. Authoritative. |
| CONTEXT.md | Domain glossary. The terminology contract for the whole repo. |
| .sandcastle/ | The Sandcastle parallel-planner runner that built most of this. |
The argument behind every layer
If you take one thing from this tour, take this. Every layer above is in service of a single architectural claim: the editorial questions live in the platform, not in the model. The recommender model is deliberately simple. The contracts are deliberately separated. The constraints are deliberately mixed-enforcement. The editor interface is deliberately ordinary. Each “deliberately” is what gives the platform leverage.
The thesis essay makes that argument in full prose. This tour makes it in pictures. The rest of the modules show it in code.