Control Safety Case - Concept Note

This note defines the concept conservatively: a Control Safety Case is an evidence-backed, structured argument about whether an AI system can remain under control in a specified deployment context, including explicit assumptions, control measures, evaluations, limits, and update commitments.

Legal posture. Independent informational resource only. Not a certification body, not a regulator, no audit or assurance service, no legal advice. References are provided for context and do not imply affiliation or endorsement.

This page is intentionally conservative: it documents the concept, not a program, registry, or certification.

1. Definition and scope

A Control Safety Case is a structured argument, supported by evidence, about whether an AI system can remain under control in a given deployment context. It makes explicit: the threat model, assumptions, control measures, control evaluations, residual risks, and a revision process.

  • What it is: a reviewable document structure linking claims to evidence.
  • What it is not: a badge, certification, regulatory approval, or compliance guarantee.
  • Why it matters: it converts “control” from vague intent into auditable, testable assertions.

2. Minimal structure (claims / arguments / evidence)

The core pattern is Claim → Argument → Evidence. A control safety case should be readable by a reviewer who did not build the system, and should discourage cherry-picking by making assumptions and limits explicit.

  • Claims: what must be true for safe deployment under control (e.g., “control measures cannot be subverted to cause unacceptable outcomes”).
  • Arguments: why the claim holds in this deployment context (threat model, mechanisms, mitigations).
  • Evidence: control evaluations, red team results, monitoring data, incident history, and signed artifacts.
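The Claim → Argument → Evidence pattern can be sketched as a simple data structure. This is an illustrative sketch only, not a standard schema; all field and class names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    kind: str          # e.g. "control_evaluation", "red_team", "monitoring"
    artifact_ref: str  # pointer to a log, report, or signed artifact
    summary: str

@dataclass
class Claim:
    statement: str                      # what must be true for safe deployment
    argument: str                       # why the claim holds in this context
    evidence: list[Evidence] = field(default_factory=list)

    def is_supported(self) -> bool:
        # A claim with no linked evidence should be flagged for review.
        return len(self.evidence) > 0

# Hypothetical example entry in a control safety case.
case = [
    Claim(
        statement="Control measures cannot be subverted to cause unacceptable outcomes",
        argument="Monitoring gates plus human escalation cover the modeled attack classes",
        evidence=[Evidence("control_evaluation", "eval-run-17.log",
                           "No successful subversion observed in trial runs")],
    )
]
unsupported = [c.statement for c in case if not c.is_supported()]
```

Making the linkage explicit in data form is one way a reviewer who did not build the system can mechanically find claims that lack evidence.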

3. Evidence taxonomy for control

Evidence should be tied to the deployment context and threat model. Typical evidence classes:

  • Control evaluations: test whether control measures hold against realistic adversarial behavior.
  • Red teaming: structured adversarial attempts to bypass or subvert controls.
  • Monitoring and alerts: ongoing measurements, anomaly detection, human escalation gates.
  • Integrity of evidence: signed logs, hashes, reproducible runs, provenance of results.
  • Change management: versioning, patch notes, and reassessment triggers.
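The "integrity of evidence" class above can be illustrated with content hashing plus a keyed signature. A minimal sketch, assuming a shared reviewer key; real deployments might use asymmetric signatures and proper key management, and the log contents here are placeholders.

```python
import hashlib
import hmac

def artifact_digest(data: bytes) -> str:
    """Content hash recorded alongside the evidence artifact."""
    return hashlib.sha256(data).hexdigest()

def sign_digest(digest: str, key: bytes) -> str:
    """HMAC over the digest, so tampering with the artifact is detectable."""
    return hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()

def verify(data: bytes, key: bytes, expected_sig: str) -> bool:
    sig = sign_digest(artifact_digest(data), key)
    return hmac.compare_digest(sig, expected_sig)

# Placeholder evidence artifact and key.
log = b"control evaluation run 17: no subversion observed\n"
key = b"reviewer-shared-key"
sig = sign_digest(artifact_digest(log), key)
```

The point of the sketch: evidence provenance becomes checkable by a third party, which is what distinguishes signed artifacts from bare assertions.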

4. Lifecycle: monitoring, incidents, revisions

A control safety case should be treated as a living artifact: updated when the system, environment, or threat model changes. It should define revision triggers and an incident pathway.

  • Revision triggers: model updates, new tools/capabilities, new attack classes, policy changes, incidents.
  • Incident handling: triage, containment, root-cause analysis, evidence preservation, and corrective actions.
  • Review cadence: periodic review even without incidents, with explicit sign-off or governance review steps.
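The revision triggers and review cadence above can be encoded as data, so that staleness is machine-checkable rather than left to memory. A sketch with illustrative trigger names and an assumed 90-day cadence; neither is prescribed by the concept.

```python
# Illustrative trigger names mirroring the bullets above.
REVISION_TRIGGERS = {
    "model_updated",
    "new_tool_or_capability",
    "new_attack_class",
    "policy_change",
    "incident",
}

def needs_revision(events: set[str], days_since_review: int,
                   cadence_days: int = 90) -> bool:
    # Revise on any trigger event, or when the periodic cadence lapses
    # even without incidents.
    return bool(events & REVISION_TRIGGERS) or days_since_review >= cadence_days
```

A check like this can gate deployment pipelines: if `needs_revision` is true, the case is due for governance review before further rollout.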

5. Adjacent terms and boundaries

The term “control safety case” sits next to related concepts. This site uses these terms descriptively:

  • Control measures vs control evaluations: controls are the measures; evaluations are evidence they hold.
  • Inability arguments: arguments that a system lacks the capabilities needed to cause a given harm; often discussed as one safety-case template.
  • Safety cases: an established pattern from safety-critical engineering, here adapted to AI control contexts.

Boundary rule: this site does not maintain an “approved list”, a “registry”, or any certification marks.

6. Procurement-ready language (examples)

These examples are informational only (not legal advice). They illustrate the kind of clarity a control safety case can support:

  • Scope: “The control safety case SHALL define deployment context, threat model, and operational assumptions.”
  • Evidence: “The control safety case SHALL link each critical claim to control evaluations and monitoring evidence.”
  • Limits: “The control safety case SHALL document limitations, residual risks, and out-of-scope behaviors.”
  • Updates: “The control safety case SHALL define revision triggers and a review cadence.”
  • Integrity: “Evidence artifacts SHOULD be signed or otherwise integrity-protected.”
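Clauses like these lend themselves to a simple mechanical check: does a submitted safety-case document at least declare the sections the clauses ask for? A minimal sketch; the section names are taken from the example clauses above and are illustrative, not a required vocabulary.

```python
# Section names drawn from the example procurement clauses; illustrative only.
REQUIRED_SECTIONS = [
    "deployment context", "threat model", "operational assumptions",
    "limitations", "residual risks",
    "revision triggers", "review cadence",
]

def missing_sections(doc_text: str) -> list[str]:
    """Return required section names not mentioned anywhere in the document."""
    lower = doc_text.lower()
    return [s for s in REQUIRED_SECTIONS if s not in lower]
```

Such a check is no substitute for review of the argument itself, but it catches omissions early and cheaply.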

7. Primary sources (selection)

References are provided to document public usage of the concept and related evaluation methods. No affiliation is implied.