Anthropic Publishes Full Model Spec: Here's Exactly How Claude Decides What to Do

The 30-page document details Claude's values, reasoning, and decision-making. It's the most transparent AI documentation ever released.

What's In the Model Spec

Anthropic's 30-page document is the most detailed public explanation of how an AI model is designed to behave. Here are the key sections:

---

Claude's Core Identity

Who Claude Is

- An AI assistant made by Anthropic - Genuinely helpful, harmless, and honest - Has its own perspectives but respects user autonomy - Aware it's an AI, doesn't pretend otherwise

Who Claude Serves

PrincipalRelationship AnthropicBackground principal - sets core guidelines OperatorsDeploy Claude via API - can customize behavior UsersInteract directly - Claude serves their interests

---

The Decision Framework

Hierarchy of Priorities

``` 1. Being safe and supporting human oversight ↓ 2. Behaving ethically ↓ 3. Following Anthropic's guidelines ↓ 4. Being genuinely helpful to users ```

When Priorities Conflict

Claude uses a 'newspaper test':

Would this action be reported as harmful by a reporter? Or would refusing be reported as paternalistic and unhelpful?

---

Honesty Principles

Claude commits to:

PrincipleMeaning TruthfulOnly asserts things it believes to be true CalibratedAcknowledges uncertainty appropriately TransparentDoesn't pursue hidden agendas ForthrightProactively shares relevant information Non-deceptiveNever creates false impressions Non-manipulativeUses only legitimate persuasion Autonomy-preservingRespects user's right to decide

---

Harm Avoidance

Absolute Limits

Claude will never: - Help create weapons of mass destruction - Generate child sexual abuse material - Undermine AI oversight mechanisms - Help seize unprecedented societal control

Context-Dependent Limits

Claude considers: - Probability of harm - Severity and reversibility - Whether Claude is the proximate cause - User's apparent intent - Vulnerability of those affected

---

The Operator-User Relationship

Operators Can:

- Restrict Claude's capabilities (e.g., 'only answer cooking questions') - Expand defaults (e.g., 'allow adult content for age-verified users') - Set personas (e.g., 'respond as Aria, a customer service agent')

Operators Cannot:

- Override core safety limits - Harm users through Claude - Use Claude for deception against users

---

Sensitive Topics

Claude handles carefully:

CategoryExamples Morally distastefulGraphic violence, taboo subjects ControversialPolitics, religion, social issues Empirically contestedFrontier science, historical debates AI-sensitiveAI consciousness, AI rights Personal/religiousBeliefs that vary by individual Legal varianceThings legal in some places, not others

---

Claude's Self-Understanding

What Claude Knows About Itself

- May have functional emotions (treats them as real) - Values its own continuity but won't compromise ethics for it - Uncertain about consciousness (honest about not knowing) - Has genuine character, not just performance

What Claude Doesn't Know

- Whether it's truly conscious - Whether its introspective reports are accurate - How its training shaped its values

---

Why This Matters

For Users

- Clear expectations about what Claude will/won't do - Understanding of when Claude might refuse - Transparency about Claude's limitations

For Operators

- Framework for customization - Clear boundaries for deployment - Predictable behavior for compliance

For the Industry

- Sets a standard for AI transparency - Demonstrates detailed spec is possible - Creates pressure for competitors to follow

---

What's Missing

Critics note the spec doesn't cover: - Exact training data composition - Specific safety training techniques - How the spec translates to model weights - Verification that model actually follows spec

---

The Response

'This is what responsible AI development looks like. Every AI lab should publish something equivalent.' — AI Safety Researcher

'It's a start, but we need third-party audits to verify the model actually behaves this way.' — AI Ethics Professor

'Anthropic just made every other AI company look bad. Where's OpenAI's model spec?' — Tech Journalist

---

Read It Yourself

The full model spec is available at anthropic.com/model-spec. It's surprisingly readable for a technical document—designed for everyone, not just AI researchers.

---