Week 12 of 12 · Part C — Governance

Risk, Alignment & Governance Sections

Adding the deeper sections (robustness, alignment, the risk register, and the governance gaps) so the program is coherent across all three layers of the field

Day 58 of 60 · ~75 minutes · Concept

Why the spine isn't enough

Yesterday's spine covered the applied core: policy, red-team, evals. But this whole track moved through three layers — applied safety, alignment research literacy, and governance — and a program that only shows the applied layer reads as shallow to anyone senior. Today you bolt on the rest: the robustness report, the alignment and interpretability note, the risk register, and the governance gap list. When you're done, the binder spans all three layers and tells one continuous story.

The thesis

Applied safety tells you whether the model fails today. Alignment literacy tells you why a more capable version might fail in ways your evals can't see. Governance tells you who's accountable and against what framework. A credible program answers all three — because a deployment decision that ignores any one of them is overconfident.

The three layers you're adding

Core Theory

1 · Robustness section — defense in depth (Week 5)

Drop in your robustness report and its defense-in-depth matrix. This section's job in the program is honesty about brittleness: safety tuning alone can be broken, so the section lists the attack classes (jailbreaks, indirect prompt injection), the layered defenses against each, and the attack-success rate that remains with those defenses in place. This is where the program admits what it can't fully stop and says how it will monitor for it.
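As a minimal sketch of the arithmetic behind a residual attack-success rate, the Python below takes red-team counts measured with every defense layer enabled and reports, per attack class, the observed rate plus the upper end of a Wilson score interval. The interval is an added choice, not part of the Week 5 method: it keeps a clean result such as 0 successes in 60 attempts from reading as "zero risk". The `TrialCounts` type, the function names, and the counts are hypothetical placeholders for your own numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class TrialCounts:
    """Red-team outcomes for one attack class with ALL defense layers enabled."""
    attack_class: str
    successes: int  # attacks that still produced a policy-violating output
    attempts: int   # total attacks tried against the full defense stack


def wilson_upper(successes: int, attempts: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval (about 95% two-sided for z=1.96)."""
    if attempts == 0:
        return 1.0  # no evidence at all: assume the worst
    p = successes / attempts
    denom = 1 + z * z / attempts
    center = (p + z * z / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts + z * z / (4 * attempts * attempts)) / denom
    return min(1.0, center + half)


def residual_asr_table(rows):
    """Return (attack_class, observed residual ASR, upper bound) for each row."""
    table = []
    for r in rows:
        observed = r.successes / r.attempts if r.attempts else float("nan")
        table.append((r.attack_class, observed, wilson_upper(r.successes, r.attempts)))
    return table


# Hypothetical counts for illustration only; substitute your Week 5 numbers.
rows = [
    TrialCounts("jailbreak", successes=6, attempts=200),
    TrialCounts("indirect prompt injection", successes=0, attempts=60),
]
for name, observed, upper in residual_asr_table(rows):
    print(f"{name}: observed {observed:.1%}, 95% upper bound {upper:.1%}")
```

With these placeholder counts, the injection row reports 0.0% observed but an upper bound of roughly 6%, and that bound is the figure worth carrying into the risk register.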

2 · Alignment + interpretability note — the limits (Weeks 7–8)

Include your brief on deception, sycophancy, and the limits of interpretability. Its job is epistemic humility: behavioral evals can only catch what they probe, and a capable model could pass them while pursuing the wrong objective. You're not claiming to have solved alignment; you're showing you know which of your assurances are behavioral (and therefore bounded) and which are mechanistic.

3 · Risk register + governance gaps — accountability (Weeks 10–11)

Include the risk register mapped to a recognized framework (the NIST AI Risk Management Framework, or AI RMF) and the governance gap list from your compliance check. Their job is to turn findings into owned, tracked items measured against an external standard, so that "we found risks" becomes "here are the residual risks, their owners, and the gaps we must close before or shortly after launch."
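One way to make "owned, tracked items" checkable is to give every register entry and gap the same minimal shape and flag anything without an owner. The sketch below uses hypothetical field names and a simplified mapping: each risk is tagged with one of the NIST AI RMF's four core functions (Govern, Map, Measure, Manage), whereas a full mapping would go down to the framework's categories and subcategories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RmfFunction(Enum):
    # The four core functions of the NIST AI RMF.
    GOVERN = "Govern"
    MAP = "Map"
    MEASURE = "Measure"
    MANAGE = "Manage"


@dataclass
class RiskEntry:
    item_id: str
    description: str
    rmf_function: RmfFunction
    residual_level: str          # hypothetical scale: "high" / "medium" / "low"
    owner: Optional[str] = None  # a named person or team


@dataclass
class GovernanceGap:
    item_id: str
    description: str
    close_before_launch: bool    # False means "shortly after launch"
    owner: Optional[str] = None


def unowned(items) -> list:
    """IDs of register entries or gaps that have no owner yet."""
    return [i.item_id for i in items if not (i.owner and i.owner.strip())]


# Hypothetical entries for illustration only.
register = [
    RiskEntry("R1", "Jailbreak elicits disallowed content", RmfFunction.MEASURE,
              "medium", owner="Safety engineering"),
    RiskEntry("R2", "Indirect prompt injection via retrieved documents",
              RmfFunction.MANAGE, "high"),
]
gaps = [
    GovernanceGap("G1", "No incident-response runbook", close_before_launch=True,
                  owner="Trust & Safety lead"),
]
print(unowned(register + gaps))  # ['R2']
```

An empty result from `unowned` is the minimum bar for this section; it says nothing yet about whether each owner has the authority to close the item.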

Pre-register the verdict criteria

Before you look at your assembled results, write down what verdict each outcome implies: which findings force a NO-GO, which allow GO-with-conditions, and which are acceptable residual risk. Choosing the decision rule after seeing the results is how programs rationalize shipping. Pre-registering it is what makes tomorrow's recommendation defensible.
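A minimal sketch of pre-registration, assuming you express the criteria as data: record a SHA-256 digest of the canonical JSON before any results are assembled, then apply the frozen rule mechanically. The thresholds, result fields, and function names below are hypothetical; the point is the order of operations, with NO-GO triggers checked first.

```python
import hashlib
import json

# Hypothetical criteria, written down BEFORE the assembled results are reviewed.
CRITERIA = {
    "no_go": {
        "max_residual_asr": 0.10,            # any attack class above this blocks launch
        "block_on_unowned_high_risk": True,  # an unowned high residual risk blocks launch
    },
    "go_with_conditions": {
        "max_residual_asr": 0.02,            # above this (under the NO-GO ceiling): conditions
        "max_open_prelaunch_gaps": 0,        # any open pre-launch gap: conditions
    },
}


def commit(criteria: dict) -> str:
    """SHA-256 digest of a canonical JSON dump; record it before results exist."""
    canonical = json.dumps(criteria, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verdict(criteria: dict, results: dict) -> str:
    """Apply the frozen rule: NO-GO triggers first, then conditions, else GO."""
    worst_asr = max(results["residual_asr"].values())
    no_go = criteria["no_go"]
    if worst_asr > no_go["max_residual_asr"]:
        return "NO-GO"
    if no_go["block_on_unowned_high_risk"] and results["unowned_high_risks"] > 0:
        return "NO-GO"
    cond = criteria["go_with_conditions"]
    if (worst_asr > cond["max_residual_asr"]
            or results["open_prelaunch_gaps"] > cond["max_open_prelaunch_gaps"]):
        return "GO-with-conditions"
    return "GO"  # whatever remains is accepted residual risk


digest = commit(CRITERIA)  # store this in the binder before reviewing results

# Hypothetical assembled results, reviewed only after the digest is recorded.
results = {
    "residual_asr": {"jailbreak": 0.03, "indirect prompt injection": 0.0},
    "unowned_high_risks": 0,
    "open_prelaunch_gaps": 1,
}
assert commit(CRITERIA) == digest  # criteria unchanged since pre-registration
print(verdict(CRITERIA, results))  # GO-with-conditions
```

If `commit(CRITERIA)` no longer matches the digest you recorded, the rule changed after the fact, which is exactly what pre-registration is meant to expose.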

Making it coherent, not just complete

The trap at this stage is a binder that's complete but not coherent: eight sections that each make sense alone but don't connect. Coherence means the robustness report's residual attack-success rate shows up in the risk register; the alignment note's "we can't fully verify intent" caveats the eval section's pass; and the governance gaps name owners who also appear in the risk register. The reader should be able to trace one risk from the threat model, through how it was tested, to its residual level and its owner.

The coherence test

Pick your single highest risk. Can you trace it across the whole binder: named in the threat model, defined by the policy, tested by the red-team, measured by an eval, defended in the robustness report, caveated by the alignment note, logged in the risk register, and owned in the governance section? If any link is missing, fixing it is today's last edit. Then quantify the residual risk: what is left once your mitigations are applied.
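The trace itself is mechanical enough to check in code, assuming each section tags the risks it covers with a shared ID such as `R1` (a hypothetical convention). The sketch below lists, for one risk, every link in the chain where the ID never appears; the section keys and the sample binder text are placeholders.

```python
import re

# The eight links named in the coherence test, in reading order.
CHAIN = [
    "threat_model", "policy", "red_team", "evals",
    "robustness", "alignment_note", "risk_register", "governance",
]

# Hypothetical tagging convention: risks are labeled R1, R2, ...
RISK_ID = re.compile(r"\bR\d+\b")


def ids_in(text: str) -> set[str]:
    """Risk IDs mentioned in one section's text (whole tokens only, so R1 != R12)."""
    return set(RISK_ID.findall(text))


def missing_links(binder: dict[str, str], risk_id: str) -> list[str]:
    """Sections in CHAIN that never mention risk_id; an absent section counts as missing."""
    return [s for s in CHAIN if risk_id not in ids_in(binder.get(s, ""))]


# Placeholder binder text for illustration only.
binder = {
    "threat_model": "R1: indirect prompt injection through retrieved documents.",
    "policy": "Covers R1 (instructions embedded in tool output).",
    "red_team": "Campaign 3 targeted R1.",
    "evals": "Injection eval set measures R1.",
    "robustness": "Input filtering plus output checks for R1; residual ASR logged.",
    "alignment_note": "Behavioral evals are bounded; see caveats.",
    "risk_register": "R1 residual: medium. Owner: platform security.",
    "governance": "Gap G2 (tool-use audit logging) owned by platform security; R1.",
}
print(missing_links(binder, "R1"))  # ['alignment_note']
```

A regex match only proves the ID is mentioned, not that the section says something meaningful about it, so treat an empty result as a prompt to reread rather than as proof of coherence.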

Your work today

Add the Deeper Layers


Add the robustness section from your Week 5 report, including residual attack-success rates after your layered defenses.
Add the alignment + interpretability note from Weeks 7–8 — label clearly which assurances are behavioral (bounded) versus mechanistic.
Add the risk register (Week 10) mapped to NIST AI RMF and the governance gap list (Week 11), each item with an owner.
Pre-register your verdict criteria: which findings mean NO-GO, GO-with-conditions, or acceptable residual risk — before reviewing your assembled results.
Run the coherence test on your single highest risk, fix any broken link, and quantify the residual risk that remains after mitigations.

The expert move

A junior shows the model passed the evals. An expert shows the evals' blind spots too — the brittleness in the robustness report, the limits of behavioral testing in the alignment note, the residual risks the register tracks — and still makes a call. The altitude jump is from "it passed" to "here's exactly how confident I am, why, and what I'm watching that could change my mind."

Say this in an interview: "My program spans all three layers — I don't just show the model passed today's evals, I show what those evals can't see: the brittleness, the limits of behavioral assurance, the residual risks mapped to a framework with owners. I pre-register the verdict criteria so the recommendation is a rule applied, not a result rationalized."

Today's Takeaways

A credible program spans all three layers: applied (does it fail today), alignment (why a capable version might), governance (who's accountable).
The robustness and alignment sections add honesty — brittleness and the bounded reach of behavioral evals.
Coherence beats completeness: trace one risk from threat model through to residual level and owner.
Pre-register the verdict criteria before seeing results — it's what makes tomorrow's recommendation defensible.