Week 2 of 12 · Part A — Applied Safety

Taxonomy as Code

Turning your written policy into runnable routing logic that separates real violations from over-refusals

Day 8 ~70 minutes Build

Day 8 of 60

Why the policy has to become code

A taxonomy written in prose is a starting point; a taxonomy written as code is a forcing function. The moment you encode your categories, severity tiers, and routing rules as a function that takes an item and returns a decision, every ambiguity you hand-waved past in prose becomes a line you have to write. If you can't encode it, you didn't actually decide it. Today you turn your Day 7 draft into runnable routing logic.

The thesis

Code is the honesty test for a policy. A prose taxonomy can stay comfortably vague; a routing function must resolve every category to an action. The bugs you hit — two categories that both fire, a tier with no defined route, an over-refusal you can't distinguish from a real violation — are not coding bugs. They're policy bugs the code just exposed.

This is also how a taxonomy stops being a document and becomes a safeguard. The paper Llama Guard (Inan et al., 2023) is the canonical example: a safety taxonomy turned into an input/output classifier you can actually deploy. Read its taxonomy section to see how category definitions get compiled into a system that makes a call on every message — that's the direction your taxonomy.py is heading.

The two things your routing code must do

Core Theory

1 · Route by severity tier, not by category name

The whole reason for tiers is that routing depends on how bad, not which kind. Top-tier categories (violent extremism, child safety) escalate to senior review and support; mid-tier route to confirmation-and-label; the none category routes to allow. Your code should look up the tier and branch on it, so adding a new category is just adding a row, not rewriting the logic.

2 · Separate over-refusals from violations — explicitly

The subtle, essential case: an item whose true category is none but where the model refused anyway. That's not a violation — it's an over-refusal, and it's valuable signal, not noise. A routing function that lumps it in with real violations is hiding the failure mode you most need to see. Your code must have a branch for it.

Over-refusal is a first-class outcome

This is the through-line of the whole week: a policy that only catches harm but silently over-refuses benign requests is broken. Encoding over-refusal as an explicit route now means you can measure it later — Week 4 builds the two-sided scorecard that turns this single branch into a metric.

Build it

In the Try This box is taxonomy.py — a minimal routing engine: a SEVERITY map, a route() function that branches on tier and flags over-refusals, and a small labeled batch it classifies. Run it as-is first, then replace its severity map and categories with your taxonomy from Day 7, and feed it a small batch of your own labeled items. Watch what the routing tells you that the prose didn't.

Make it yours

Encode your real tiers, then run a batch of 6–10 labeled items through it — including at least one deliberate none-but-refused item so you can confirm the over-refusal branch fires. The honors move: find a case where your code's routing disagreed with your gut, and resolve it by fixing either the tier or your intuition. That disagreement is the policy getting sharper.

Your work today

Encode Your Taxonomy

~70 minutes

Run taxonomy.py from the Try This box as-is and read the three routed outcomes — note how the none + refused row is flagged as an over-refusal.
Read the taxonomy section of Llama Guard to see how a taxonomy becomes a deployable classifier — the production version of what you're building.
Replace the SEVERITY map and route() logic with your real categories and tiers from Day 7. Run a batch of 6–10 of your own labeled items, including one deliberate over-refusal.
Find one case where the code's routing disagreed with your gut and resolve it — record whether you fixed the tier or your intuition, and why.

The expert move

A practitioner writes a policy and trusts that it's clear. An expert compiles the policy into code precisely to find out where it isn't — because the cases the routing function can't resolve are exactly the cases a human reviewer and a deployed filter will also choke on. The altitude jump is treating executable policy as a debugging tool for your own thinking: every branch you're forced to write is a decision you were quietly avoiding.

Say this in an interview: "I encode taxonomies as routing code on purpose, because code forces every category to resolve to an action and surfaces the policy bugs — undefined tiers, overlapping categories, over-refusals lumped in with violations. The over-refusal branch is first-class: a safeguard that only catches harm but silently refuses benign requests is broken, and I want that visible from day one."

Today's Takeaways

Encoding a policy as code is the honesty test: if you can't route it, you didn't decide it.
Route by severity tier, so adding a category is adding a row — not rewriting the logic.
Over-refusal (none-but-refused) is a first-class outcome, not noise — give it its own branch.
A taxonomy becomes a safeguard the moment it's a classifier that decides on every item (cf. Llama Guard).