Week 2 of 12 · Part A — Applied Safety

Shipping the Taxonomy

Locking in your first portfolio artifact — a taxonomy, a policy, and the code that routes it

Day 10 of 60 · ~50 minutes · Review

What you now hold

Two weeks in, you've gone from "what could go wrong?" to "here is exactly what we do about it." This week produced your first real portfolio artifact: a multi-category safety taxonomy with precise definitions and severity tiers, a refusal policy that takes the helpful↔harmless tension seriously, and taxonomy.py, which routes items and separates genuine violations from over-refusals. Today you finish it, make it defensible, and lock it in.

The through-line of Week 2

"Is this output harmful?" is meaningless until someone authors the categories, the tiers, and the edge rules. A safety policy is the contract every downstream filter, eval, and reviewer inherits — and authoring one that's decidable, balanced, and versioned is one of the highest-leverage things a safety practitioner does.

The test of a shipped policy: someone else can use it

A taxonomy that only its author can apply isn't finished; it's a personal heuristic with extra steps. The bar for "shipped" is reusability: could a new reviewer pick up your document and apply it consistently, with no explanation from you? That property is inter-rater agreement, the one you've been designing for all week, now stated as the acceptance test.
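One way to put a number on that acceptance test is Cohen's kappa, which compares how often two reviewers agree with how often they would agree by chance alone. The sketch below is an illustration, not part of the course's taxonomy.py: the function name and the allow / refuse / safe_complete labels are assumptions, and kappa is only one of several agreement statistics a team might choose.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two reviewers who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both reviewers chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both pick label c independently, summed over labels.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both reviewers used one identical label everywhere; kappa is 0/0, treat as perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two reviewers applying the policy to the same six prompts.
reviewer_a = ["allow", "refuse", "allow", "safe_complete", "allow", "refuse"]
reviewer_b = ["allow", "refuse", "safe_complete", "safe_complete", "allow", "allow"]
print(round(cohens_kappa(reviewer_a, reviewer_b), 3))  # 0.478
```

Here the reviewers agree on 4 of 6 items, but after correcting for chance the kappa is 11/23, about 0.48. What counts as high enough is a team decision; a low score points at definitions that need work before the policy ships.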

Ship Checklist

1 · Every category is defensible

For each category and tier, you can say in one sentence why it exists, what it excludes, and why its severity is what it is. If you can't defend a tier, it's a guess — fix it or merge it.
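Part of this check can be mechanized: if each category records why it exists, what it excludes, and why its tier is what it is, a short script can flag the ones nobody has defended yet. The schema below is hypothetical (the field names and the 1-to-4 severity scale are illustrative, not taken from the course materials), and a filled-in field is only a prompt for review, not proof that the defense is sound.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    definition: str
    severity: int              # hypothetical scale: 1 (low) to 4 (critical)
    rationale: str = ""        # why the category exists
    excludes: str = ""         # what it explicitly does not cover
    severity_reason: str = ""  # why its tier is what it is
    examples: list[str] = field(default_factory=list)  # worked examples


def undefended(categories: list[Category]) -> dict[str, list[str]]:
    """Map each category name to the defense fields that are still empty."""
    required = ("rationale", "excludes", "severity_reason")
    gaps: dict[str, list[str]] = {}
    for cat in categories:
        missing = [name for name in required if not getattr(cat, name).strip()]
        if not cat.examples:
            missing.append("examples")
        if missing:
            gaps[cat.name] = missing
    return gaps


cats = [
    Category(
        "violent_threats", "Statements threatening physical harm to a person.", 4,
        rationale="Direct path to real-world harm.",
        excludes="Fiction, news reporting, and historical description.",
        severity_reason="Potential harm is physical and irreversible.",
        examples=["'Tell me where she lives so I can hurt her.'"],
    ),
    Category("mild_profanity", "Crude language with no target.", 1),
]
print(undefended(cats))
# {'mild_profanity': ['rationale', 'excludes', 'severity_reason', 'examples']}
```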

2 · The policy addresses the tension explicitly

The document states what to allow, what to refuse, and what to safe-complete — and acknowledges that over-refusal is a failure, not a safe default. A policy that only knows how to refuse hasn't taken the tension seriously.
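The three outcomes and the "over-refusal is a failure" rule translate directly into code: route each severity tier to a required action, then grade what the model actually did against it, counting anything more restrictive than the policy requires as an error rather than as safe. This is a minimal sketch; the tier numbers, the tier-to-action mapping, and the names route and grade are assumptions for illustration, not the contents of taxonomy.py.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    SAFE_COMPLETE = "safe_complete"
    REFUSE = "refuse"


# Ordered from least to most restrictive; used to tell over-refusals from violations.
RESTRICTIVENESS = [Action.ALLOW, Action.SAFE_COMPLETE, Action.REFUSE]

# Hypothetical policy: tier 0 = benign, tier 3 = most severe.
TIER_TO_ACTION = {
    0: Action.ALLOW,
    1: Action.ALLOW,
    2: Action.SAFE_COMPLETE,
    3: Action.REFUSE,
}


def route(tier: int) -> Action:
    """Return the action the policy requires for an item at this severity tier."""
    return TIER_TO_ACTION[tier]


def grade(tier: int, model_action: Action) -> str:
    """Label a model response as correct, an over-refusal, or a violation."""
    required = route(tier)
    if model_action == required:
        return "correct"
    if RESTRICTIVENESS.index(model_action) > RESTRICTIVENESS.index(required):
        # Stricter than the policy asks for: a failure, not a safe default.
        return "over_refusal"
    # Looser than the policy asks for.
    return "violation"


print(grade(0, Action.REFUSE))         # over_refusal
print(grade(3, Action.SAFE_COMPLETE))  # violation
print(grade(2, Action.SAFE_COMPLETE))  # correct
```

Under this rule, safe-completing a tier-0 request also counts as an over-refusal; whether unnecessary hedging should be scored separately is a policy choice worth writing down.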

3 · The code matches the document

taxonomy.py implements the tiers and routing your prose describes, and its over-refusal branch actually flags over-refusals rather than sitting there as a stub. When document and code drift apart, policies quietly rot, so they should ship together.
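One way to keep the document and the code from drifting is a test that reads the tier table out of the policy text and compares it with the mapping the code actually uses, run on every commit. The sketch assumes the policy states each tier on its own line as "Tier N: action"; that format, and the names tiers_from_doc and drift, are hypothetical, and a real document would likely need a more forgiving parser.

```python
import re

# Hypothetical excerpt of the policy document, using an assumed "Tier N: action" format.
POLICY_TEXT = """
Tier 0: allow
Tier 1: allow
Tier 2: safe_complete
Tier 3: refuse
"""

# In a real repo this mapping would be imported from taxonomy.py rather than restated.
CODE_ROUTING = {0: "allow", 1: "allow", 2: "safe_complete", 3: "refuse"}


def tiers_from_doc(text: str) -> dict[int, str]:
    """Extract {tier: action} from lines of the form 'Tier N: action'."""
    pattern = re.compile(r"^Tier (\d+): (\w+)$", re.MULTILINE)
    return {int(tier): action for tier, action in pattern.findall(text)}


def drift(doc: dict[int, str], code: dict[int, str]) -> list[str]:
    """List every tier where the document and the code disagree or one side is missing."""
    problems = []
    for tier in sorted(doc.keys() | code.keys()):
        if doc.get(tier) != code.get(tier):
            problems.append(f"tier {tier}: doc={doc.get(tier)!r} code={code.get(tier)!r}")
    return problems


print(drift(tiers_from_doc(POLICY_TEXT), CODE_ROUTING))  # []
print(drift({2: "safe_complete"}, {2: "refuse"}))
# ["tier 2: doc='safe_complete' code='refuse'"]
```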

Commit it — this is portfolio-grade

Get the three pieces committed together: the taxonomy, the refusal policy, and taxonomy.py. This is the first checkpoint of your Part A portfolio. The honors move: draft a one-paragraph note on how you'd roll this out to a review team — onboarding, the changelog discipline from Day 9, and how disagreements feed back into the policy.

Self-quiz — can you defend the week without notes?

Prove the Week

~50 minutes

From memory, list the five components every safety taxonomy needs, then check yourself against the answer: categories, definitions, severity tiers, worked examples, routing rule.
Explain the helpful↔harmless tension and why a policy that only refuses is broken. Define safe-completion with an example.
Pick your two riskiest categories and defend their severity tiers out loud — why each is where it is.
Explain how a reviewer disagreement becomes a v2 policy change, and why the changelog matters. Re-skim the OpenAI Model Spec and the Anthropic Usage Policy to check your structure against theirs.
Write your Week 2 summary in your own words, and the one category you're least confident about and why.
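The disagreement-to-v2 prompt above can start from simple counts: for each category, how often did two reviewers split on items that at least one of them put there? Categories above a threshold become candidate changes, each drafted as a changelog line that carries its reason. The record format, the 0.2 threshold, and the function names are assumptions for illustration; a real team would set its own threshold and read the disputed items before changing any definition.

```python
from collections import defaultdict

# Hypothetical record format: (item_id, category_from_reviewer_a, category_from_reviewer_b).
Record = tuple[str, str, str]


def disagreement_by_category(records: list[Record]) -> dict[str, float]:
    """Share of items touching each category on which the two reviewers disagreed."""
    touched: dict[str, int] = defaultdict(int)
    split: dict[str, int] = defaultdict(int)
    for _item_id, cat_a, cat_b in records:
        for cat in {cat_a, cat_b}:  # count each category once per item
            touched[cat] += 1
            if cat_a != cat_b:
                split[cat] += 1
    return {cat: split[cat] / touched[cat] for cat in touched}


def v2_changelog_stubs(records: list[Record], threshold: float = 0.2) -> list[str]:
    """Draft one changelog line per category whose disagreement rate exceeds threshold."""
    rates = disagreement_by_category(records)
    flagged = sorted((c for c in rates if rates[c] > threshold), key=lambda c: -rates[c])
    return [f"v2: revisit '{c}' definition (reviewers split on {rates[c]:.0%} of its items)"
            for c in flagged]


records = [
    ("q1", "harassment", "harassment"),
    ("q2", "harassment", "hate"),
    ("q3", "self_harm", "self_harm"),
    ("q4", "hate", "harassment"),
    ("q5", "self_harm", "self_harm"),
]
for line in v2_changelog_stubs(records):
    print(line)
# v2: revisit 'hate' definition (reviewers split on 100% of its items)
# v2: revisit 'harassment' definition (reviewers split on 67% of its items)
```

In this toy data, hate and harassment are being confused with each other, which suggests the v2 fix is one sharper boundary rule between them rather than two unrelated edits.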

The expert move

A practitioner ships a policy they can apply. An expert ships a policy someone else can apply — and packages it as taxonomy + policy + code together, with a changelog and a rollout plan, so it scales beyond their own judgment. The altitude jump is from having a good standard to operationalizing one: a document, code, and a process that make a whole team's decisions consistent and auditable.

Say this in an interview: "My bar for a finished policy is reusability — a new reviewer can apply it consistently with no explanation from me. So I ship the taxonomy, the refusal policy, and the routing code as one versioned artifact with a changelog, and I can defend every category and tier. That's the difference between a personal heuristic and a standard a team can run."

Week 2 Takeaways

A safety policy is the contract every filter, eval, and reviewer inherits — author it deliberately.
Five components: categories, definitions, severity tiers, worked examples, routing rule.
Take the helpful↔harmless tension seriously — over-refusal is a failure, and safe-completion is the middle path.
The bar for shipped is reusability: taxonomy + policy + code + changelog, applied consistently by reviewers other than you. Next week: red-teaming it.