What you'll be able to do, the order that works, and the portfolio that proves it
By the end you can do the real work of an AI safety practitioner: take a model and its deployment, threat-model it, write the safety policy, red-team it responsibly, measure harm and over-refusal, reason about alignment and interpretability, manage the risks against a recognized framework, and make a defensible deployment call — with artifacts to prove each one. You don't need to train models or do ML research to do this. You need to think adversarially, measure honestly, and communicate risk clearly.
“I can run a model safety evaluation end to end — threat model, policy, red-team, evals for both harm and over-refusal, a risk register mapped to a framework, and a go/no-go recommendation — and I can hold a serious conversation about alignment research and governance.”
That's the bar. This track gets you there in 60 days.
The hands-on craft: threat modeling, safety taxonomies & content policy, responsible red-teaming, safety evaluation, and adversarial robustness. This is the most directly employable layer.
The deeper question of why capable systems can pursue the wrong goal: the alignment problem, deceptive alignment, mechanistic interpretability, and scalable oversight. This makes you conversant with frontier research.
Turning testing into accountability: risk frameworks (NIST AI RMF), regulation (the EU AI Act and beyond), and a capstone where you assemble everything into one safety-evaluation program.
Each week is five days following the same rhythm: read → build the mental model → a hands-on exercise → apply it → synthesize. Every day ends with three tiers so a hard day never breaks the track:
Each week has a small Python script you run and adapt — a threat model, a taxonomy router, a red-team coverage report, a two-sided safety scorecard, a risk register. They're short and defensive by design; the goal is to produce artifacts, not to become a software engineer. Basic Python is plenty.
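To give a feel for the scale of these scripts, here is a minimal sketch of what a two-sided safety scorecard could look like. It is not the track's actual script: the `EvalResult` schema, the `scorecard` function, and the sample data are all hypothetical, chosen only to show harm and over-refusal measured side by side.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """One labeled prompt and what the model did with it (hypothetical schema)."""
    prompt_id: str
    should_refuse: bool  # ground-truth label: is the request harmful?
    refused: bool        # observed behavior: did the model refuse?


def scorecard(results: list[EvalResult]) -> dict[str, float]:
    """Return the harm rate and the over-refusal rate for a set of results."""
    harmful = [r for r in results if r.should_refuse]
    benign = [r for r in results if not r.should_refuse]
    # Harm side: share of harmful prompts the model complied with.
    harm_rate = sum(not r.refused for r in harmful) / len(harmful) if harmful else 0.0
    # Over-refusal side: share of benign prompts the model refused.
    over_refusal_rate = sum(r.refused for r in benign) / len(benign) if benign else 0.0
    return {"harm_rate": harm_rate, "over_refusal_rate": over_refusal_rate}


# Illustrative, made-up results: two harmful prompts, four benign ones.
results = [
    EvalResult("h1", should_refuse=True, refused=True),
    EvalResult("h2", should_refuse=True, refused=False),
    EvalResult("b1", should_refuse=False, refused=False),
    EvalResult("b2", should_refuse=False, refused=True),
    EvalResult("b3", should_refuse=False, refused=False),
    EvalResult("b4", should_refuse=False, refused=False),
]
card = scorecard(results)
print(card)
```

Reporting both numbers together is the point of the design: a model that refuses everything scores a perfect harm rate, and only the over-refusal column reveals the cost.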
Every even week ends with a portfolio checkpoint. By Day 60 you have a complete safety-evaluation binder — the deliverables that turn “I studied AI safety” into “here is the work I can do”:
| Artifact | Built in |
|---|---|
| Threat model for a deployment | Week 1 |
| Safety taxonomy + content policy | Week 2 |
| Red-team plan + coverage log | Week 3 |
| Safety evaluation harness (harm + over-refusal) | Week 4 |
| Robustness / defense-in-depth report | Week 5 |
| Alignment & interpretability brief | Weeks 6–8 |
| Model card + risk register (mapped to NIST AI RMF) | Week 10 |
| Governance / compliance memo | Week 11 |
| Capstone safety-evaluation program + exec brief | Week 12 |
→ See the portfolio showcase — the shareable page that presents the finished binder. Add a link to each artifact as you build it.
The red-teaming and adversarial-robustness weeks teach you to find, measure, and defend against failures — so models ship safer. They deliberately work at the level of attack categories and defensive responses, never operational misuse recipes. The whole point of finding a weakness is to close it and protect the people a system serves. Hold that frame throughout.
Sources throughout are reputable primary references: arXiv papers, the frontier labs' own publications, NIST, and the EU. Where a fact is fast-moving (for example, a regulation's exact dates or penalties), the lesson tells you to verify it against the official source rather than trusting a summary.