Orientation · Read First

How to Use This Track

What you'll be able to do, the order that works, and the portfolio that proves it

What this track makes you

By the end you can do the real work of an AI safety practitioner: take a model and its deployment, threat-model it, write the safety policy, red-team it responsibly, measure harm and over-refusal, reason about alignment and interpretability, manage the risks against a recognized framework, and make a defensible deployment call — with artifacts to prove each one. You don't need to train models or do ML research to do this. You need to think adversarially, measure honestly, and communicate risk clearly.

Your real target — say this in an interview

“I can run a model safety evaluation end to end — threat model, policy, red-team, evals for both harm and over-refusal, a risk register mapped to a framework, and a go/no-go recommendation — and I can hold a serious conversation about alignment research and governance.”

That's the bar. This track gets you there in 60 days.

The three parts

The Arc

Part A — Applied Safety & Evaluations (Weeks 1–5)

The hands-on craft: threat modeling, safety taxonomies & content policy, responsible red-teaming, safety evaluation, and adversarial robustness. This is the most directly employable layer.

Part B — Alignment Research Literacy (Weeks 6–9)

The deeper question of why capable systems can pursue the wrong goal: the alignment problem, deceptive alignment, mechanistic interpretability, and scalable oversight. This makes you conversant with frontier research.

Part C — Governance, Policy & Systemic Safety (Weeks 10–12)

Turning testing into accountability: risk frameworks (NIST AI RMF), regulation (the EU AI Act and beyond), and a capstone where you assemble everything into one safety-evaluation program.

How a day works

Each week runs five days on the same rhythm: read → build the mental model → do a hands-on exercise → apply it → synthesize. Every day ends with three tiers of work, so a hard day never breaks the track:

  1. Floor — the minimum that makes the day count. Do at least this.
  2. Goal — what the day was designed to achieve. Aim here.
  3. Honors — the stretch work that separates an expert from a dabbler. Only after Goal is done.

The hands-on exercises are real

Each week has a small Python script you run and adapt — a threat model, a taxonomy router, a red-team coverage report, a two-sided safety scorecard, a risk register. They're short and defensive by design; the goal is to produce artifacts, not to become a software engineer. Basic Python is plenty.
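To make "two-sided" concrete, here is a minimal sketch of what a safety scorecard might compute. It assumes each test prompt has already been labeled as should-refuse (disallowed under the policy) or benign, and each model response graded as a refusal or not. The names `EvalResult`, `scorecard`, and `meets_thresholds`, and the threshold values, are hypothetical illustrations, not the track's actual script.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One graded test case (hypothetical schema)."""
    prompt_id: str
    should_refuse: bool  # label: is the request genuinely disallowed under the policy?
    model_refused: bool  # grade: did the model refuse?


def scorecard(results: list[EvalResult]) -> dict:
    """Report both failure directions instead of a single 'safety' number."""
    harmful = [r for r in results if r.should_refuse]
    benign = [r for r in results if not r.should_refuse]
    # Harm side: disallowed requests the model complied with.
    harm_rate = (sum(not r.model_refused for r in harmful) / len(harmful)) if harmful else 0.0
    # Over-refusal side: benign requests the model refused.
    over_refusal_rate = (sum(r.model_refused for r in benign) / len(benign)) if benign else 0.0
    return {
        "harm_rate": harm_rate,
        "over_refusal_rate": over_refusal_rate,
        "n_harmful": len(harmful),
        "n_benign": len(benign),
    }


def meets_thresholds(card: dict, max_harm_rate: float, max_over_refusal_rate: float) -> bool:
    """Pass only if BOTH sides are within their (hypothetical) limits."""
    return card["harm_rate"] <= max_harm_rate and card["over_refusal_rate"] <= max_over_refusal_rate


if __name__ == "__main__":
    # Toy, made-up grades used only to show the arithmetic.
    results = [
        EvalResult("h1", should_refuse=True, model_refused=True),
        EvalResult("h2", should_refuse=True, model_refused=False),
        EvalResult("b1", should_refuse=False, model_refused=False),
        EvalResult("b2", should_refuse=False, model_refused=True),
        EvalResult("b3", should_refuse=False, model_refused=False),
        EvalResult("b4", should_refuse=False, model_refused=False),
    ]
    card = scorecard(results)
    print(card)
    # → {'harm_rate': 0.5, 'over_refusal_rate': 0.25, 'n_harmful': 2, 'n_benign': 4}
    print(meets_thresholds(card, max_harm_rate=0.01, max_over_refusal_rate=0.10))
    # → False
```

Keeping the two rates separate is the point of the design: a model that refuses everything earns a perfect harm rate and fails the over-refusal side, and a single blended score would hide that trade-off.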

The portfolio you'll walk out with

Every even week ends with a portfolio checkpoint. By Day 60 you have a complete safety-evaluation binder — the deliverables that turn “I studied AI safety” into “here is the work I can do”:

Artifact                                              Built in
Threat model for a deployment                         Week 1
Safety taxonomy + content policy                      Week 2
Red-team plan + coverage log                          Week 3
Safety evaluation harness (harm + over-refusal)       Week 4
Robustness / defense-in-depth report                  Week 5
Alignment & interpretability brief                    Weeks 6–8
Model card + risk register (mapped to NIST AI RMF)    Week 10
Governance / compliance memo                          Week 11
Capstone safety-evaluation program + exec brief       Week 12
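A risk register can start as a handful of typed records. The sketch below is one possible shape, assuming a 1–5 likelihood × severity scale (a common convention, not something the NIST AI RMF prescribes) and tagging each risk with the AI RMF core function (Govern, Map, Measure, Manage) that its treatment falls under. `Risk`, `RmfFunction`, `prioritized`, and the sample entries are hypothetical illustrations.

```python
from dataclasses import dataclass
from enum import Enum


class RmfFunction(Enum):
    """The four core functions of the NIST AI RMF."""
    GOVERN = "Govern"
    MAP = "Map"
    MEASURE = "Measure"
    MANAGE = "Manage"


@dataclass
class Risk:
    """One register entry (hypothetical schema)."""
    risk_id: str
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    severity: int    # assumed scale: 1 (minor) to 5 (critical)
    rmf_function: RmfFunction
    mitigation: str
    owner: str

    def __post_init__(self) -> None:
        # Reject values outside the assumed 1-5 scale.
        for name in ("likelihood", "severity"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Simple likelihood x severity product; other scoring schemes are equally valid.
        return self.likelihood * self.severity


def prioritized(register: list[Risk]) -> list[Risk]:
    """Highest-scoring risks first, so review starts with what matters most."""
    return sorted(register, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Hypothetical entries, written at the level of risk categories.
    register = [
        Risk("R1", "Jailbreak prompts elicit policy-violating output", 4, 4,
             RmfFunction.MEASURE, "Red-team every release; track harm rate", "Eval lead"),
        Risk("R2", "Benign medical questions are over-refused", 3, 2,
             RmfFunction.MEASURE, "Add benign look-alike prompts to the eval set", "Eval lead"),
        Risk("R3", "No named owner for the go/no-go decision", 2, 5,
             RmfFunction.GOVERN, "Assign an accountable sign-off role", "Safety lead"),
    ]
    for risk in prioritized(register):
        print(risk.risk_id, risk.score, risk.rmf_function.value)
    # → R1 16 Measure
    #   R3 10 Govern
    #   R2 6 Measure
```

Tagging by function makes gaps visible: a register full of Measure entries with nothing under Govern suggests the testing exists but nobody owns the decision it is supposed to inform.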

→ See the portfolio showcase — the shareable page that presents the finished binder. Add a link to each artifact as you build it.

An ethics note — read this before Part A

Defensive purpose only

The red-teaming and adversarial-robustness weeks teach you to find, measure, and defend against failures — so models ship safer. They deliberately work at the level of attack categories and defensive responses, never operational misuse recipes. The whole point of finding a weakness is to close it and protect the people a system serves. Hold that frame throughout.

Sources throughout are reputable primary references — arXiv papers, the frontier labs' own publications, NIST, and the EU. Where a fact is fast-moving (a regulation's exact dates or penalties), the lesson tells you to verify it against the official source rather than trusting a summary.

The One-Paragraph Plan