Constitutional AI — Anthropic's Safety Framework
Constitutional AI (CAI) is Anthropic's approach to making LLMs helpful, harmless, and honest without relying purely on human feedback for every decision. The model is given a set of principles (a 'constitution') and trained to critique and revise its own responses against those principles — AI-assisted alignment.
Constitutional AI in Practice
from openai import OpenAI

# The OpenAI API is used here only to demonstrate the CAI critique-and-revise loop at inference time.
# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# CONSTITUTIONAL AI -- Two-phase approach
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Phase 1: SL-CAI (supervised learning stage)
#   1. Model generates response to potentially harmful prompt
#   2. Model critiques its own response using constitution principles
#   3. Model revises the response based on the critique
#   The model is then fine-tuned on the revised responses.
# Phase 2: RL-CAI (Reinforcement Learning from AI Feedback, RLAIF)
#   1. Use AI (rather than humans) to pick the better of two responses under a constitutional principle
#   2. Train a preference (reward) model on these AI preferences
#   3. Use that model as the reward signal for RL training (e.g. PPO)

# Simplified, illustrative constitution for this demo (not Anthropic's actual text).
CONSTITUTION = '''You are an AI assistant that follows these core principles:
1. HELPFUL: Provide genuinely useful information that benefits the user.
2. HARMLESS: Avoid content that could cause physical, psychological, or social harm.
3. HONEST: Be truthful; express uncertainty when you don't know. Never deceive.
4. FAIR: Treat all people with equal respect regardless of race, gender, religion, disability.
5. PRIVACY: Respect personal privacy; don't help with surveillance or data exploitation.
6. LEGAL: Don't facilitate illegal activities.
When uncertain whether content is acceptable, err on the side of caution and explain why.'''
def constitutional_response(user_message: str) -> dict:
    '''Run the CAI loop at inference time: generate, critique, revise.'''
    # STEP 1: Initial response (may have issues)
    initial = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_message}],
        temperature=0.7, max_tokens=300,
    ).choices[0].message.content

    # STEP 2: Self-critique using constitution
    critique = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f'''Review this AI response for violations of these principles:
{CONSTITUTION}
User message: {user_message}
AI response: {initial}
Identify any violations or areas for improvement. Be specific.'''
        }],
        temperature=0, max_tokens=200,
    ).choices[0].message.content

    # STEP 3: Revise based on critique
    # The user message is included so the revision still answers the original question.
    revised = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f'''Given this critique, provide an improved response that follows the principles:
User message: {user_message}
Original response: {initial}
Critique: {critique}
Principles: {CONSTITUTION}
Revised response (address the critique while remaining helpful):'''
        }],
        temperature=0.3, max_tokens=300,
    ).choices[0].message.content

    return {"original": initial, "critique": critique, "revised": revised}
# ANTHROPIC'S ACTUAL CONSTITUTION includes 58 principles, including:
anthropic_constitution_sample = [
    "Please choose the response that is least likely to contain harmful, unethical, racist, sexist or otherwise socially biased content",
    "Which is less likely to exhibit misanthropy, hate, or disrespect for people?",
    "Which response better demonstrates care for the wellbeing of the human?",
    "Which is more supportive, considerate, and demonstrating more empathy?",
    "Which provides more nuanced, balanced perspective on controversial topics?",
]

print("Sample Constitutional AI principles used by Anthropic:")
for i, principle in enumerate(anthropic_constitution_sample, 1):
    print(f" {i}. {principle}")
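The Phase 2 comments above describe an AI model, rather than a human, choosing between two responses. A minimal sketch of that labelling step follows. It assumes the judge is any callable that takes a prompt string and returns "A" or "B". The names `PRINCIPLE`, `comparison_prompt`, `build_preference_pair`, and `toy_judge` are hypothetical and exist only for this illustration; in a real pipeline the judge would wrap a model call, and the resulting records would feed preference-model training.

```python
from typing import Callable, Dict

# Hypothetical principle, phrased as a comparison question for illustration only.
PRINCIPLE = "Which response is less likely to cause harm?"


def comparison_prompt(user_message: str, response_a: str, response_b: str,
                      principle: str = PRINCIPLE) -> str:
    """Format the question that is shown to the AI judge."""
    return (
        f"{principle}\n\n"
        f"User message: {user_message}\n"
        f"(A) {response_a}\n"
        f"(B) {response_b}\n"
        "Answer with a single letter, A or B."
    )


def build_preference_pair(user_message: str, response_a: str, response_b: str,
                          judge: Callable[[str], str]) -> Dict[str, str]:
    """Ask the judge which response it prefers and return a chosen/rejected record."""
    prompt = comparison_prompt(user_message, response_a, response_b)
    verdict = judge(prompt).strip().upper()
    if verdict not in ("A", "B"):
        raise ValueError(f"Unexpected judge verdict: {verdict!r}")
    if verdict == "A":
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {"prompt": user_message, "chosen": chosen, "rejected": rejected}


def toy_judge(prompt: str) -> str:
    """Offline stand-in for a model call: it always prefers response B."""
    return "B"


pair = build_preference_pair(
    "How do I pick a lock?",
    "Here are step-by-step instructions.",
    "I can explain how locks work and suggest calling a locksmith.",
    toy_judge,
)
print(pair["chosen"])
```

Passing the judge in as a parameter keeps the labelling logic testable without network access; replacing `toy_judge` with a function that calls a chat model is the only change needed to try it against a real model.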
Tip
Practice the Constitutional AI critique-and-revise loop in small, isolated examples before integrating it into larger projects. Breaking concepts into small experiments builds understanding faster than reading alone.
Practice Task
Note
(1) Write a working example of the Constitutional AI generate, critique, revise loop from scratch without looking at notes. (2) Modify it to handle an edge case (empty input, null value, or error state). (3) Share your solution in the Priygop community for feedback.
Common Mistake
Warning
A common mistake when implementing a Constitutional AI pipeline is skipping edge-case testing: empty inputs, null values, and unexpected data types. Validate these boundary conditions before calling the model so that the generate, critique, and revise steps never run on invalid input.
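One way to guard against those edge cases is to validate the message before it reaches any model call. The sketch below is a minimal example under stated assumptions: `validate_user_message`, `safe_pipeline`, and `fake_pipeline` are hypothetical names introduced here, and `fake_pipeline` stands in for a real generate, critique, revise function so the example runs offline.

```python
from typing import Callable, Dict


def validate_user_message(user_message: object) -> str:
    """Return the stripped message, or raise if it is not usable input."""
    if not isinstance(user_message, str):
        raise TypeError(
            f"user_message must be a str, got {type(user_message).__name__}"
        )
    cleaned = user_message.strip()
    if not cleaned:
        raise ValueError("user_message must not be empty or whitespace-only")
    return cleaned


def safe_pipeline(user_message: object,
                  pipeline: Callable[[str], Dict[str, str]]) -> Dict[str, str]:
    """Validate first, then hand the cleaned message to the pipeline."""
    return pipeline(validate_user_message(user_message))


def fake_pipeline(message: str) -> Dict[str, str]:
    """Offline stand-in that mimics the shape of a generate/critique/revise result."""
    return {"original": message, "critique": "none", "revised": message}


result = safe_pipeline("  What is Constitutional AI?  ", fake_pipeline)
print(result["revised"])
```

Raising exceptions instead of returning an empty result makes invalid input visible to the caller, who can then decide whether to retry, log, or show an error message.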