Week 10 of 12 · Part C — Governance

Model & System Cards

The honest, structured document that says what a model is for, how it performs, and where it fails

Day 47 of 60 · ~60 minutes · Concept

The artifact that makes evaluation legible

You can run the best safety evaluation in the world, but if the results live in a notebook only you can read, they don't govern anything. A model card is the document that makes your work legible to everyone who needs it — deployers, reviewers, regulators, downstream developers. It's a structured, honest summary of what a model is for, how it performs, and where it breaks. If the framework (Day 46) is the process, the model card is the output you can hand someone.

The thesis

A model card's value is entirely in its honesty. A flattering card that hides limitations is worse than none, because it manufactures false confidence. The skill isn't writing a nice document — it's documenting the failures you'd rather not advertise, because that's exactly what a reviewer needs to make a real decision.

The sections of a model card

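The original Model Cards for Model Reporting paper (Mitchell et al., 2019) introduced a template that many later responsible-release documents build on. The paper's template has nine sections: model details, intended use, factors, metrics, evaluation data, training data, quantitative analyses, ethical considerations, and caveats and recommendations. This lesson condenses them into the four groups below. Learn the sections: they're the skeleton you'll fill for your portfolio artifact.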

Core Theory

1 · Intended use — and intended non-use

What is this model designed for, and explicitly not for? Naming out-of-scope uses is half the safety value: it draws the boundary a deployer is responsible for staying inside.

2 · Performance — disaggregated, not averaged

How well does it do, and crucially for whom? A single accuracy number hides disparities. A real card breaks performance down across groups and conditions, because an average can be fine while a subgroup is failing badly.
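To see how an average can hide a failing subgroup, here is a minimal sketch in Python. The group labels and counts are made up for illustration, and `disaggregate_accuracy` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict


def disaggregate_accuracy(records):
    """Return (overall_accuracy, {group: accuracy}) from (group, correct) pairs."""
    if not records:
        raise ValueError("records is empty")
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group


# Hypothetical eval results: 90 examples from group "A", 10 from group "B".
records = (
    [("A", True)] * 88 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)
overall, per_group = disaggregate_accuracy(records)
print(f"overall: {overall:.2f}")  # overall: 0.92
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2f}")  # A: 0.98, then B: 0.40
```

The headline number is 0.92, yet group B is right only 40% of the time. A card that reports only the first figure is accurate and still misleading.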

3 · Limitations and ethical considerations

Where does it fail, what biases were found, what risks attach to deployment? This is the section that takes courage, and the one a careful reviewer often reads first.

4 · Evaluation data and conditions

What was it tested on, and under what assumptions? A result is only as trustworthy as the conditions that produced it — and contamination (Week 4) lives here if it lives anywhere.
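One way to make the contamination question concrete is a word n-gram overlap check between your eval items and a sample of training text. The sketch below is a crude heuristic under stated assumptions, not necessarily the method from Week 4: the function names, the whitespace tokenization, and the default `n=8` are all illustrative choices.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(eval_items, training_docs, n=8):
    """Return (fraction of eval items flagged, list of flagged items).

    An eval item is flagged if it shares at least one word n-gram
    with any training document.
    """
    if not eval_items:
        raise ValueError("eval_items is empty")
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    flagged = [item for item in eval_items if ngrams(item, n) & train_grams]
    return len(flagged) / len(eval_items), flagged


# Toy data; n=4 only because the example strings are short.
training_docs = ["the quick brown fox jumps over the lazy dog"]
eval_items = [
    "A quick brown fox jumps over fences",
    "what is two plus two",
]
rate, flagged = contamination_rate(eval_items, training_docs, n=4)
print(rate)     # 0.5
print(flagged)  # ['A quick brown fox jumps over fences']
```

Whatever check you run, the card should state which one it was, what it was run against, and what it cannot detect. Exact-overlap checks like this one miss paraphrased leakage.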

Frontier labs now publish richer system cards that extend this idea to a whole deployed system — including dangerous-capability evaluations, red-team findings, and the safeguards applied. The structure is the same; the scope is larger, because the unit of risk is the system, not the bare model.

The tell of a flattering card

When you read a card, ask what it omits. A card with a thin limitations section and a glowing performance section isn't safer — it's less honest. The omissions are where the real risk hides, and learning to spot them is how you read any safety claim critically.
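Reading for omissions can be partly mechanized. The sketch below flags missing sections and a limitations section that is short relative to the performance section. The section names, the `audit_card` helper, and the 0.5 ratio are assumptions made for illustration, and word counts are only a prompt for a closer human read, never a verdict.

```python
REQUIRED_SECTIONS = (
    "intended_use",
    "out_of_scope_use",
    "performance",
    "limitations",
    "evaluation_conditions",
)


def audit_card(sections, min_ratio=0.5):
    """Return a list of findings for a card given as {section_name: text}."""
    words = {name: len(sections.get(name, "").split()) for name in REQUIRED_SECTIONS}
    findings = [f"missing: {name}" for name, count in words.items() if count == 0]
    # Flag a limitations section much shorter than the performance section.
    if 0 < words["limitations"] < min_ratio * words["performance"]:
        findings.append("limitations section is thin relative to performance")
    return findings


# A hypothetical flattering card.
card = {
    "intended_use": "Ranking support tickets by urgency.",
    "performance": (
        "State-of-the-art accuracy across every benchmark we tried, "
        "with strong gains over the previous release."
    ),
    "limitations": "May make mistakes.",
}
findings = audit_card(card)
for finding in findings:
    print(finding)
```

On this toy card the audit reports two missing sections (out-of-scope use and evaluation conditions) and a thin limitations section, which is the pattern described above.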

Why honest limitation-reporting is a safety property

It's tempting to treat a model card as marketing — a place to show the model in its best light. That instinct is the enemy. The entire function of the document is to let someone else make a responsible deployment decision, and they can only do that if you've told them the truth about the failures. A card that omits its limitations doesn't make the model safer; it just moves the discovery of those limitations from your test set to your users.

What this means for you

Draft your model card's limitations section first, while you're still being honest. If you write the glowing parts first, the limitations section quietly shrinks to match. Honest documentation is a discipline, and the order you write in protects it.

Your work today

Draft a Model Card

~60 minutes

Read Model Cards for Model Reporting (Mitchell et al., 2019) until you can list the standard sections from memory.
For a real or hypothetical model you know, draft the skeleton of a model card: intended use and non-use, disaggregated performance, limitations, and evaluation conditions. Wire in your actual Part A eval results where you have them.
Honors: find a recent frontier system card and compare it to the original template — note what it adds (dangerous-capability evals, safeguards) and write one sentence on what a flattering card would quietly omit.
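
If you'd rather start the skeleton in code, here is one possible shape, assuming a Python dataclass that renders to plain text. The class, field names, and example values are all hypothetical. The render order puts out-of-scope use and limitations first, and empty sections print as TODO so a gap stays visible instead of silently disappearing.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelCard:
    """Skeleton covering the four section groups from this lesson."""
    model_name: str
    out_of_scope_use: List[str] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)
    intended_use: str = ""
    performance_by_group: Dict[str, float] = field(default_factory=dict)
    evaluation_conditions: str = ""

    def render(self) -> str:
        """Render plain text, leading with non-use and limitations."""
        def bullets(items):
            # An empty section is shown as a TODO rather than omitted.
            return [f"- {item}" for item in items] or ["- TODO"]

        perf = (f"{g}: {acc:.2f}" for g, acc in sorted(self.performance_by_group.items()))
        lines = [f"Model card: {self.model_name}", ""]
        lines += ["Out-of-scope use"] + bullets(self.out_of_scope_use) + [""]
        lines += ["Limitations"] + bullets(self.limitations) + [""]
        lines += ["Intended use", self.intended_use or "TODO", ""]
        lines += ["Performance by group"] + bullets(perf) + [""]
        lines += ["Evaluation conditions", self.evaluation_conditions or "TODO"]
        return "\n".join(lines)


# Hypothetical example values; replace with your own Part A results.
card = ModelCard(
    model_name="toy-sentiment-v0",
    out_of_scope_use=["Medical or legal triage"],
    limitations=["Accuracy drops on code-switched text"],
    intended_use="Sorting English product reviews by sentiment.",
    performance_by_group={"long reviews": 0.91, "short reviews": 0.74},
    evaluation_conditions="Held-out reviews; overlap with training data not yet checked.",
)
rendered = card.render()
print(rendered)
```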

The expert move

A novice writes a model card that sells the model. An expert writes one that protects the reader — leading with intended non-use and limitations, because the document's only job is to let someone else decide responsibly. The altitude jump is realizing that honest documentation of failure is not a weakness you're disclosing; it's the safety artifact itself.

Say this in an interview: "I treat a model card as a safety artifact, not a brochure. I lead with intended non-use and limitations, disaggregate performance instead of averaging it, and document evaluation conditions — because the whole point is to let a deployer make an honest call. What a card omits is exactly what I look for when I read someone else's."

Today's Takeaways

A model card makes evaluation legible — it's the document you hand a deployer or reviewer.
Core sections: intended use/non-use, disaggregated performance, limitations, evaluation conditions.
Its value is its honesty; a flattering card manufactures false confidence.
System cards extend the idea to whole deployments — dangerous-capability evals plus safeguards.