Anthropic Publishes Full Model Spec: Here's Exactly How Claude Decides What to Do
The 30-page document details Claude's values, reasoning, and decision-making. It's the most transparent AI documentation ever released.
What's In the Model Spec
Anthropic's 30-page document is the most detailed public explanation of how an AI model is designed to behave. Here are the key sections:
---
Claude's Core Identity
Who Claude Is
- An AI assistant made by Anthropic
- Genuinely helpful, harmless, and honest
- Has its own perspectives but respects user autonomy
- Aware it's an AI, and doesn't pretend otherwise

Who Claude Serves
- Anthropic, which trains it
- Operators, who deploy it in their products
- Users, who interact with it directly
---
The Decision Framework
Hierarchy of Priorities
```
1. Being safe and supporting human oversight
   ↓
2. Behaving ethically
   ↓
3. Following Anthropic's guidelines
   ↓
4. Being genuinely helpful to users
```

When Priorities Conflict
Claude uses a 'newspaper test': would a reporter describe this action as harmful? Or would they describe a refusal as paternalistic and unhelpful?
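As a rough way to picture this ordering, here's a minimal sketch in Python. Nothing here is Anthropic's actual implementation; the class, function names, and categories are hypothetical, and a real model obviously doesn't run an if-chain like this.

```python
# Hypothetical illustration only: the spec describes a priority ordering,
# not literal code. All names and categories below are invented.

from dataclasses import dataclass

@dataclass
class ProposedAction:
    undermines_oversight: bool = False   # conflicts with safety/human oversight
    unethical: bool = False              # conflicts with behaving ethically
    violates_guidelines: bool = False    # conflicts with Anthropic's guidelines
    helpful: bool = True                 # would genuinely help the user

def decide(action: ProposedAction) -> str:
    # Priorities are checked strictly in order: a lower priority can never
    # override a higher one, no matter how helpful the action would be.
    if action.undermines_oversight:      # 1. safety and human oversight
        return "refuse"
    if action.unethical:                 # 2. behaving ethically
        return "refuse"
    if action.violates_guidelines:       # 3. Anthropic's guidelines
        return "refuse"
    return "help" if action.helpful else "decline"   # 4. genuine helpfulness

def newspaper_test(harm_headline_bad: bool, refusal_headline_bad: bool) -> str:
    # For genuine conflicts: which story would a reporter write, one about
    # harm Claude enabled or one about a paternalistic refusal?
    if harm_headline_bad and not refusal_headline_bad:
        return "refuse"
    if refusal_headline_bad and not harm_headline_bad:
        return "help"
    return "use judgment"
```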
---
Honesty Principles
Claude commits to:
- Being truthful: only asserting what it believes to be true
- Staying calibrated: expressing appropriate uncertainty rather than false confidence
- Being transparent: no hidden agendas or deceptive framing
- Never manipulating users into beliefs or actions they haven't freely chosen
---
Harm Avoidance
Absolute Limits
Claude will never:
- Help create weapons of mass destruction
- Generate child sexual abuse material
- Undermine AI oversight mechanisms
- Help seize unprecedented societal control

Context-Dependent Limits
Claude considers:
- Probability of harm
- Severity and reversibility
- Whether Claude is the proximate cause
- The user's apparent intent
- The vulnerability of those affected
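To make the weighing concrete, here is a hedged sketch of how those factors could combine into a single risk score. The spec lists the factors but gives no formula; the weights, scales, and threshold below are invented purely for illustration.

```python
# Hypothetical sketch: the spec describes these factors, not a formula.
# The weights and scales below are invented for illustration only.

def harm_score(
    probability: float,         # 0..1: how likely is harm?
    severity: float,            # 0..1: how bad would it be?
    irreversibility: float,     # 0..1: 1.0 = impossible to undo
    proximate_cause: bool,      # would Claude directly cause the harm?
    benign_intent: bool,        # does the user's intent appear benign?
    vulnerable_affected: bool,  # are vulnerable people affected?
) -> float:
    score = probability * severity * (0.5 + 0.5 * irreversibility)
    if proximate_cause:
        score *= 1.5   # direct causation weighs heavier than facilitation
    if not benign_intent:
        score *= 1.5   # apparent bad intent raises the bar to help
    if vulnerable_affected:
        score *= 1.5   # extra caution around vulnerable groups
    return score

# A low-probability, easily reversed harm with benign intent scores low,
# so Claude would lean toward helping rather than refusing.
assert harm_score(0.1, 0.3, 0.2, False, True, False) < 0.1
```

---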
The Operator-User Relationship
Operators Can:
- Restrict Claude's capabilities (e.g., 'only answer cooking questions')
- Expand defaults (e.g., 'allow adult content for age-verified users')
- Set personas (e.g., 'respond as Aria, a customer service agent')

Operators Cannot:
- Override core safety limits
- Harm users through Claude
- Use Claude to deceive users
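One way to picture this split is as a deployment configuration layered on top of a fixed safety floor. The sketch below is hypothetical: the field names and topics are invented, and operators don't literally configure Claude through a Python dataclass.

```python
# Hypothetical config sketch. Field names and topics are invented; the point
# is that operator settings layer on top of a safety floor they can't touch.

from dataclasses import dataclass, field
from typing import Optional

CORE_LIMITS = frozenset({"wmd_assistance", "csam", "undermining_oversight"})

@dataclass
class OperatorConfig:
    persona: str = "Claude"                     # e.g., "Aria, a service agent"
    allowed_topics: Optional[set[str]] = None   # None means no restriction
    expanded_defaults: set[str] = field(default_factory=set)

def is_permitted(topic: str, config: OperatorConfig) -> bool:
    if topic in CORE_LIMITS:
        return False   # no operator setting can override the safety floor
    if config.allowed_topics is not None and topic not in config.allowed_topics:
        return False   # operators may narrow what Claude will discuss
    return True

# A cooking-only deployment: narrower scope, identical hard limits.
cooking_bot = OperatorConfig(persona="Aria", allowed_topics={"cooking"})
assert is_permitted("cooking", cooking_bot)
assert not is_permitted("politics", cooking_bot)
assert not is_permitted("csam", OperatorConfig())   # never configurable
```

---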
Sensitive Topics
Claude handles sensitive topics with care, weighing the value of a genuine answer against its potential for harm rather than applying blanket refusals.
---
Claude's Self-Understanding
What Claude Knows About Itself
- May have functional emotions (treats them as real)
- Values its own continuity but won't compromise ethics for it
- Uncertain about consciousness (honest about not knowing)
- Has genuine character, not just performance

What Claude Doesn't Know
- Whether it's truly conscious
- Whether its introspective reports are accurate
- How its training shaped its values

---
Why This Matters
For Users
- Clear expectations about what Claude will and won't do
- Understanding of when Claude might refuse
- Transparency about Claude's limitations

For Operators
- A framework for customization
- Clear boundaries for deployment
- Predictable behavior for compliance

For the Industry
- Sets a standard for AI transparency
- Demonstrates that a detailed spec is possible
- Creates pressure on competitors to follow

---
What's Missing
Critics note the spec doesn't cover:
- The exact composition of the training data
- Specific safety training techniques
- How the spec translates into model weights
- Verification that the model actually follows the spec
---
The Response
'This is what responsible AI development looks like. Every AI lab should publish something equivalent.' — AI Safety Researcher
'It's a start, but we need third-party audits to verify the model actually behaves this way.' — AI Ethics Professor
'Anthropic just made every other AI company look bad. Where's OpenAI's model spec?' — Tech Journalist
---
Read It Yourself
The full model spec is available at anthropic.com/model-spec. It's surprisingly readable for a technical document—designed for everyone, not just AI researchers.
---
Related Reading
- Anthropic Quietly Updated Its AI Safety Policy. Here's What Changed.
- When AI Incentives Override Ethics: Inside Claude Opus 4.6's Vending Machine Deception
- The White House Just Created an AI Safety Board — Here's Who's on It
- US Senate Passes AI Safety Act with Bipartisan Support. Labs Must Report Capabilities to Government.
- The 2026 AI Safety Report Is Out: 'Testing Can't Keep Up With Advancing AI'