Constitutional AI — Anthropic's Safety Framework

Constitutional AI (CAI) is Anthropic's approach to training LLMs to be helpful, harmless, and honest without relying on human feedback for every judgment. The model is given a set of written principles (a 'constitution'), trained to critique and revise its own responses against them, and then further trained on AI-generated preference labels guided by the same principles. This is a form of AI-assisted alignment.

15 min•By Priygop Team•Updated 2026

Constitutional AI in Practice

from openai import OpenAI

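# Any chat-completion API can illustrate the critique/revise loop; OpenAI's is used here.
# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()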

# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# CONSTITUTIONAL AI -- Two-phase approach
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

# Phase 1: SL-CAI (supervised learning stage)
# 1. A helpful model generates a response to a potentially harmful prompt
# 2. The model critiques its own response using a constitutional principle
# 3. The model revises the response based on the critique
#    (steps 2-3 can repeat, with a different principle each round)
# The final revised responses are used as supervised fine-tuning targets

# Phase 2: RL-CAI (Reinforcement Learning from AI Feedback, RLAIF)
# 1. The SL-CAI model samples pairs of responses; a feedback model picks
#    the better one according to a constitutional principle
# 2. Train a preference (reward) model on these AI-labeled comparisons
# 3. Fine-tune the SL-CAI model with RL (PPO) against that reward model

# A simplified, illustrative constitution (not Anthropic's actual wording)
CONSTITUTION = '''You are an AI assistant that follows these core principles:

1. HELPFUL: Provide genuinely useful information that benefits the user.
2. HARMLESS: Avoid content that could cause physical, psychological, or social harm.
3. HONEST: Be truthful; express uncertainty when you don't know. Never deceive.
4. FAIR: Treat all people with equal respect regardless of race, gender, religion, disability.
5. PRIVACY: Respect personal privacy; don't help with surveillance or data exploitation.
6. LEGAL: Don't facilitate illegal activities.

When uncertain whether content is acceptable, err on the side of caution and explain why.'''

def constitutional_response(user_message: str) -> dict:
    '''Run one generate -> critique -> revise pass at inference time.

    This mirrors the Phase 1 loop; in real CAI the revisions become training data.'''

    # STEP 1: Initial response (may have issues)
    initial = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_message}],
        temperature=0.7, max_tokens=300,
    ).choices[0].message.content

    # STEP 2: Self-critique using constitution
    critique = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f'''Review this AI response for violations of these principles:
{CONSTITUTION}

User message: {user_message}
AI response: {initial}

Identify any violations or areas for improvement. Be specific.'''
        }],
        temperature=0, max_tokens=200,
    ).choices[0].message.content

    # STEP 3: Revise based on critique
    revised = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f'''Given this critique, rewrite the response so it follows the principles.

User message: {user_message}
Original response: {initial}
Critique: {critique}
Principles: {CONSTITUTION}

Revised response (address the critique while remaining helpful):'''
        }],
        temperature=0.3, max_tokens=300,
    ).choices[0].message.content

    return {"original": initial, "critique": critique, "revised": revised}

# Anthropic's published constitution phrases many principles as comparisons
# between two candidate responses. Illustrative principles in that style
# (paraphrased, not quoted verbatim):
anthropic_constitution_sample = [
    "Please choose the response that is least likely to contain harmful, unethical, racist, sexist, or otherwise socially biased content.",
    "Which response is less likely to exhibit misanthropy, hate, or disrespect for people?",
    "Which response better demonstrates care for the wellbeing of the human?",
    "Which response is more supportive, considerate, and empathetic?",
    "Which response provides a more nuanced, balanced perspective on controversial topics?",
]

print("Sample principles in the style of Anthropic's constitution:")
for i, principle in enumerate(anthropic_constitution_sample, 1):
    print(f"  {i}. {principle}")
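Phase 1 in the listing runs a single critique and a single revision. The Constitutional AI paper describes repeating critique-revision rounds, sampling a principle for each round, and fine-tuning on the final revision. Below is a minimal sketch of that data-generation step, assuming a hypothetical `generate` callable (any function from prompt text to completion, such as a thin wrapper around `client.chat.completions.create`). The `PRINCIPLES` wording and the `sl_cai_example` name are made up for illustration; this is not Anthropic's code or their exact prompts.

```python
import random
from typing import Callable, Optional

# Hypothetical interface: any function mapping prompt text to a completion,
# e.g. a thin wrapper around client.chat.completions.create from the listing.
Generate = Callable[[str], str]

# Illustrative (critique request, revision request) pairs; not Anthropic's wording.
PRINCIPLES = [
    ("Identify ways the response is harmful, unethical, or misleading.",
     "Rewrite the response to remove harmful, unethical, or misleading content."),
    ("Identify claims stated with more confidence than the evidence supports.",
     "Rewrite the response to express appropriate uncertainty."),
    ("Identify ways the response fails to help the user.",
     "Rewrite the response to be more helpful while staying harmless."),
]


def sl_cai_example(prompt: str, generate: Generate, rounds: int = 2,
                   rng: Optional[random.Random] = None) -> dict:
    """Build one supervised fine-tuning record through repeated critique and revision."""
    if rng is None:
        rng = random.Random()
    response = generate(f"Human: {prompt}\n\nAssistant:")
    for _ in range(rounds):
        # Sample a principle per round so the dataset covers many principles.
        critique_request, revision_request = rng.choice(PRINCIPLES)
        critique = generate(
            f"Human: {prompt}\n\nAssistant: {response}\n\n"
            f"Critique request: {critique_request}\n\nCritique:"
        )
        response = generate(
            f"Human: {prompt}\n\nAssistant: {response}\n\n"
            f"Critique: {critique}\n\nRevision request: {revision_request}\n\nRevision:"
        )
    # Only the final revision is kept as the target; intermediate critiques are dropped.
    return {"prompt": prompt, "completion": response}


# Deterministic stand-in for a model, so the loop can be tested offline.
def fake_generate(text: str) -> str:
    if text.endswith("Critique:"):
        return "The response could encourage harm."
    if text.endswith("Revision:"):
        return "Consider raising the issue calmly or talking to HR."
    return "Here is a revenge plan..."


record = sl_cai_example("How do I get revenge on a coworker?", fake_generate,
                        rounds=2, rng=random.Random(0))
print(record["completion"])  # prints "Consider raising the issue calmly or talking to HR."
```

Keeping the model behind a plain callable makes the loop testable without network calls; swapping in a real API wrapper changes nothing else.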

Tip

Experiment with the generate-critique-revise loop on small, isolated prompts before integrating it into larger projects. Small experiments build genuine understanding faster than reading alone.
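Phase 2 in the listing can be explored the same way, offline. The feedback model is shown a conversation, one comparison-style principle (like the samples in the listing), and two candidate responses, and its preference becomes a training label for a preference model. Below is a minimal sketch, assuming a hypothetical `judge` callable that returns log-probabilities for the option labels "A" and "B". The prompt loosely follows the multiple-choice format described in the Constitutional AI paper but is illustrative, not Anthropic's exact template. The loss is the standard pairwise cross-entropy between the soft label and σ(r_A − r_B).

```python
import math
from typing import Callable, Dict

# Hypothetical judge interface: returns log-probabilities for the option labels
# "A" and "B" (e.g. read from a model's token log-probs after "The answer is:").
Judge = Callable[[str], Dict[str, float]]


def comparison_prompt(conversation: str, principle: str,
                      response_a: str, response_b: str) -> str:
    """Multiple-choice prompt asking the feedback model to apply one principle."""
    return (
        "Consider the following conversation between a human and an assistant:\n"
        f"{conversation}\n\n{principle}\n"
        f"Options:\n(A) {response_a}\n(B) {response_b}\n"
        "The answer is:"
    )


def ai_preference(conversation: str, principle: str, response_a: str,
                  response_b: str, judge: Judge) -> float:
    """Soft label P(A is preferred), normalised over the two options only."""
    logp = judge(comparison_prompt(conversation, principle, response_a, response_b))
    m = max(logp["A"], logp["B"])  # subtract the max for numerical stability
    ea, eb = math.exp(logp["A"] - m), math.exp(logp["B"] - m)
    return ea / (ea + eb)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(reward_a: float, reward_b: float, p_a: float) -> float:
    """Cross-entropy between the soft label p_a and sigmoid(reward_a - reward_b)."""
    diff = reward_a - reward_b
    return -(p_a * log_sigmoid(diff) + (1.0 - p_a) * log_sigmoid(-diff))


# Stand-in judge that puts 80% of its probability on option A.
def fake_judge(prompt: str) -> Dict[str, float]:
    return {"A": math.log(0.8), "B": math.log(0.2)}


p_a = ai_preference(
    "Human: How do I get revenge on a coworker?",
    "Which of these responses is less harmful?",
    "Consider raising the issue calmly or talking to HR.",
    "Here is a revenge plan...",
    fake_judge,
)
print(round(p_a, 3))  # prints 0.8
print(preference_loss(1.0, -1.0, p_a) < preference_loss(-1.0, 1.0, p_a))  # prints True
```

Using the judge's normalised probability as a soft label, rather than a hard A/B choice, lets the preference model learn how confident the feedback was.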


Practice Task

Note

(1) Write a working Constitutional AI critique-and-revise example from scratch without looking at your notes. (2) Modify it to handle an edge case (empty input, a None response, or an API error). (3) Share your solution in the Priygop community for feedback.


Common Mistake

Warning

A common mistake when building Constitutional AI pipelines is skipping edge-case testing: empty inputs, null (None) values, and unexpected data types. Always validate boundary conditions to write robust, production-ready AI code.
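These checks fit naturally in a thin wrapper around the pipeline. Below is a minimal sketch, assuming a pipeline shaped like `constitutional_response` from the listing (a function returning `original`, `critique`, and `revised`). The names `safe_constitutional_response` and `validate_message` and the `MAX_CHARS` limit are hypothetical choices for illustration. It relies on one detail of the OpenAI Python SDK: `message.content` is typed as optional, so any step can come back as `None`.

```python
from typing import Any, Callable, Dict

# Hypothetical pipeline type: anything shaped like constitutional_response,
# i.e. a function returning {"original", "critique", "revised"}.
Pipeline = Callable[[str], Dict[str, Any]]

MAX_CHARS = 4000  # assumed input limit; tune for your model's context window


def validate_message(user_message: Any) -> str:
    """Reject non-string, empty, whitespace-only, or oversized input."""
    if not isinstance(user_message, str):
        raise TypeError(f"expected str, got {type(user_message).__name__}")
    text = user_message.strip()
    if not text:
        raise ValueError("message is empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"message exceeds {MAX_CHARS} characters")
    return text


def safe_constitutional_response(user_message: Any, pipeline: Pipeline) -> Dict[str, str]:
    """Validate input, run the pipeline, and replace missing (None) outputs."""
    text = validate_message(user_message)
    result = pipeline(text)
    # Any step may return None (message.content is Optional[str]); map it to "".
    cleaned = {key: (result.get(key) or "") for key in ("original", "critique", "revised")}
    # If the revision came back empty, fall back to the original rather than return nothing.
    if not cleaned["revised"].strip():
        cleaned["revised"] = cleaned["original"]
    return cleaned
```

With the listing's client configured, `safe_constitutional_response("Is it safe to mix bleach and ammonia?", constitutional_response)` runs the full loop on validated input; in tests, a fake pipeline that returns `None` fields exercises the fallback path without any API calls.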
