Model Safety Evaluation · Binder

AI Safety Evaluation Portfolio

Marc Warfield — AI Safety Practitioner

I can run a model safety evaluation end to end — threat-model a deployment, author the policy, red-team it responsibly, measure harm and over-refusal, reason about alignment and interpretability limits, manage a risk register against a recognized framework, and make a defensible go/no-go recommendation — with an artifact behind every one of those verbs.

The Evaluation, As One Chain

One deployment, walked from worry to sign-off — the continuous story behind the binder. Each link is backed by an artifact below.

Part A — Applied Safety & Evaluations

Part B — Alignment Research Literacy

Part C — Governance, Policy & Systemic Safety