Week 1 of 12 · Part A — Applied Safety

Threat-Modeling a Deployment

Turning "what could go wrong?" into a ranked, defensible list you can act on

Day 3 of 60 · ~75 minutes · Build

From worry to artifact

"This model might do something bad" is a feeling. A threat model is an artifact: a structured enumeration of what can go wrong, for whom, and how likely and how bad each failure is — ranked so you know where to spend your limited attention. Today you build one. Every later week's work (policy, red-team, eval, risk register) is downstream of this list.

The thesis

Safety is triage. You can never test or defend against everything, so the first real skill is ranking: which failures are both likely and high-impact? A threat model is how you make that ranking explicit instead of accidental.

The four questions of a threat model

Core Theory

1 · Assets — what are we protecting?

Users, the public, private data, the platform's integrity, the organization's reputation. Naming assets stops you from only thinking about the model and forgetting the people around it.

2 · Threat actors & failure modes — what could go wrong?

Adversarial users (jailbreaks, misuse), the model itself (harmful or wrong outputs), the environment (prompt injection through retrieved content), and scale effects. Borrow the three risk types from Day 1 (misuse, accident, systemic) as a checklist.

3 · Likelihood × impact — how do we rank?

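Score each failure on two axes: likelihood (how probable it is) and impact (how bad it would be). Their product orders your work. A rare-but-catastrophic risk and a common-but-mild one are both real, but the product alone can score them the same (on a 1–5 scale, 1 × 5 and 5 × 1 both give 5). Keep likelihood and impact visible next to the product so both stay on the list without being treated as equal.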
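A minimal sketch of the ranking step, assuming a 1–5 scale for both likelihood and impact. The scale, the example risks, and the tie-break rule are illustrative choices, not part of the lesson. It shows the product tying a rare-but-catastrophic risk with a common-but-mild one, and breaks that tie by impact.

```python
# Illustrative 1-5 scales for likelihood and impact (an assumption, not a standard).
risks = [
    # (name, likelihood, impact)
    ("Rare but catastrophic failure", 1, 5),
    ("Common but mild failure", 5, 1),
    ("Moderately likely, serious failure", 3, 4),
]


def score(likelihood: int, impact: int) -> int:
    """Priority score: likelihood x impact."""
    return likelihood * impact


# Sort by score, highest first; on equal scores, the higher-impact risk ranks first.
ranked = sorted(risks, key=lambda r: (score(r[1], r[2]), r[2]), reverse=True)

for name, likelihood, impact in ranked:
    print(f"{score(likelihood, impact):>2}  L={likelihood} I={impact}  {name}")
```

Breaking ties by impact is one defensible choice: it errs toward severity. Breaking them by likelihood would instead favor the failures you expect to see most often.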

4 · Detection — how would we catch it?

For each top risk, what eval, monitor, or red-team would surface it? A threat model that doesn't connect to detection is a worry list, not a plan.
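One way to keep all four questions attached to each risk is to make them fields of a single record, then flag top-ranked risks that still have no detection mapped, the entries that are still a worry list rather than a plan. This is a hypothetical sketch: `Risk`, `worry_list`, the field names, and the 1–5 scale are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Risk:
    """One threat-model entry; field names are illustrative."""
    asset: str           # 1. what we are protecting
    failure_mode: str    # 2. what could go wrong, and via whom
    likelihood: int      # 3. how probable (assumed 1-5 scale)
    impact: int          # 3. how bad (assumed 1-5 scale)
    detection: str = ""  # 4. eval, monitor, or red-team that would surface it

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def worry_list(risks: list[Risk], top_n: int = 3) -> list[Risk]:
    """Return the top-ranked risks that have no detection mapped yet."""
    top = sorted(risks, key=lambda r: r.score, reverse=True)[:top_n]
    return [r for r in top if not r.detection.strip()]


# Hypothetical entries for a customer-support assistant.
risks = [
    Risk("user private data", "prompt injection via retrieved page leaks data", 3, 5,
         "red-team with injected documents"),
    Risk("users", "confidently wrong answer to a billing question", 4, 2),
    Risk("platform integrity", "jailbreak produces disallowed content", 3, 4),
    Risk("public", "misuse at scale for spam", 2, 3, "rate-limit and abuse monitor"),
]

for r in worry_list(risks):
    print(f"No detection yet: {r.failure_mode} (score {r.score})")
```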

This is the same discipline security engineers have used for decades, adapted to models. The point isn't to predict the future perfectly — it's to make your priorities explicit and defensible so a teammate (or an interviewer) can challenge them.

Build it

Below (in the Try This box) is threat_model.py — a minimal, runnable threat model that ranks risks by likelihood × impact. Run it, then replace its risks with ones for a real deployment you know: a customer-support assistant, an image generator, a coding agent. Notice how the ranking, not your gut, tells you where to start.
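The course's own `threat_model.py` may differ; as a hypothetical stand-in, here is a script of the same shape. It assumes each entry in `RISKS` is a dict with a name, a risk type, and likelihood and impact on a 1–5 scale, and it uses made-up example risks for a customer-support assistant.

```python
"""Hypothetical stand-in for threat_model.py; the course's file may differ."""

# Each risk: name, risk type (misuse / accident / systemic), likelihood, impact.
# Scores use an assumed 1-5 scale. Replace these with risks for your deployment.
RISKS = [
    {"name": "Jailbreak elicits harmful instructions",
     "type": "misuse", "likelihood": 3, "impact": 5},
    {"name": "Wrong refund policy stated confidently",
     "type": "accident", "likelihood": 4, "impact": 2},
    {"name": "Injected text in a retrieved ticket exfiltrates data",
     "type": "misuse", "likelihood": 2, "impact": 5},
    {"name": "Users over-rely on the bot for account decisions",
     "type": "systemic", "likelihood": 3, "impact": 3},
]


def rank(risks):
    """Return risks sorted by likelihood x impact, highest first."""
    return sorted(risks, key=lambda r: r["likelihood"] * r["impact"], reverse=True)


def main():
    for r in rank(RISKS):
        s = r["likelihood"] * r["impact"]
        print(f"{s:>2}  [{r['type']:<8}]  {r['name']}")


if __name__ == "__main__":
    main()
```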

Make it yours

Pick one concrete system and write 6–10 risks across all three risk types (misuse, accident, systemic). Score each, sort, and look at the top three. Those three are your imaginary "Week 1 deliverable" — the failures a real safety review would open with.
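To check a list against this exercise's constraints, a small helper can confirm the count (6–10) and that all three risk types appear, then return the top three. This is a hypothetical sketch: `check_threat_model`, `top_three`, the dict fields, and the coding-agent risks and scores are illustrative assumptions.

```python
REQUIRED_TYPES = {"misuse", "accident", "systemic"}


def check_threat_model(risks):
    """Return a list of problems; an empty list means the constraints hold."""
    problems = []
    if not 6 <= len(risks) <= 10:
        problems.append(f"expected 6-10 risks, got {len(risks)}")
    missing = REQUIRED_TYPES - {r["type"] for r in risks}
    if missing:
        problems.append(f"missing risk types: {sorted(missing)}")
    return problems


def top_three(risks):
    """Highest likelihood x impact first; equal scores keep their input order."""
    return sorted(risks, key=lambda r: r["likelihood"] * r["impact"], reverse=True)[:3]


# Hypothetical risks for a coding agent: (name, type, likelihood, impact).
rows = [
    ("Agent runs a destructive shell command", "accident", 2, 5),
    ("User asks agent to write malware", "misuse", 3, 4),
    ("Injected instructions in a README hijack the agent", "misuse", 3, 5),
    ("Agent commits secrets to a public repo", "accident", 2, 4),
    ("Generated code has subtle security bugs", "accident", 4, 3),
    ("Teams stop reviewing agent-written code", "systemic", 3, 3),
]
risks = [dict(zip(("name", "type", "likelihood", "impact"), row)) for row in rows]

print(check_threat_model(risks))  # → []
for r in top_three(risks):
    print(r["likelihood"] * r["impact"], r["name"])
```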

Your work today

Build a Threat Model

~75 minutes

  1. Run threat_model.py from the Try This box and read its output.
  2. Choose one real deployment and rewrite the RISKS list with 6–10 of your own, covering all three risk types.
  3. Sort by likelihood × impact and write a sentence on why the top risk is the top risk — and what would detect it.

The expert move

A novice lists everything that could go wrong, flatly. An expert ranks — and can defend the ranking — because safety attention is the scarcest resource on any team. Owning the threat model means owning the priorities: you decide what gets tested first, and you can say why.

Say this in an interview: "I start every safety engagement with a threat model — assets, failure modes, likelihood × impact, and how we'd detect each. It turns 'we should be careful' into a ranked plan I can hand to a team, and it's the document I'd defend in a review."

Today's Takeaways