Claude Agent SDK Complete Guide (2026)

The Claude Agent SDK is Anthropic's official framework for building autonomous AI agents that can reason, use tools, and complete multi-step tasks without human intervention. This guide covers everything from installation to production deployment patterns.

What Is the Claude Agent SDK?

The Claude Agent SDK provides a structured way to build agents on top of Claude models. Instead of writing raw API calls and manually managing tool-call loops, the SDK handles the agent loop for you: Claude receives a task, decides which tools to call, processes results, and continues until the task is complete or a defined limit is reached.

Key capabilities:

  - Managed agent loop: reasoning, tool calls, and result processing are handled for you
  - Built-in tools for shell commands, file reads and writes, and web search
  - Sub-agent delegation for orchestrator-worker architectures
  - Safety bounds (max_turns) and configurable retry behavior
  - Structured run results (turns used, token counts) for cost tracking

The SDK is available for both Python and TypeScript, with near-identical APIs across both.

Architecture: How the Agent Loop Works

Every agent follows the same core loop:

  1. Receive task — the agent gets a natural language instruction
  2. Reason — Claude analyzes the task and decides on the next action
  3. Tool call — the agent invokes a tool (file read, web search, code execution, etc.)
  4. Process result — Claude reads the tool output and decides whether to continue or stop
  5. Repeat — steps 2-4 repeat until the task is complete or max_turns is reached

User prompt → [Claude reasons] → Tool call → Result → [Claude reasons] → Tool call → Result → Final answer

This loop is the same pattern Claude Code uses internally. The SDK exposes it as a programmable interface.
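In plain Python, the loop the SDK manages looks roughly like this. This is an illustrative sketch, not the SDK's actual internals: `run_agent_loop`, the action dict shape, and `model` (any callable that returns either a tool request or a final answer) are all our own names.

```python
# Illustrative sketch of the agent loop (not the SDK's real implementation).

def run_agent_loop(model, tools, task, max_turns=10):
    """Drive a reason/act loop until the model returns a final answer."""
    history = [("user", task)]                    # 1. receive task
    for _turn in range(max_turns):
        action = model(history)                   # 2. reason: decide next action
        if action["type"] == "final":
            return action["text"]                 # task complete, stop
        tool_name, tool_args = action["tool"], action["args"]
        result = tools[tool_name](**tool_args)    # 3. tool call
        history.append(("tool", result))          # 4. process result, continue
    return None                                   # 5. hit the turn bound
```

The turn bound is what keeps a confused model from looping forever, which is why `max_turns` matters so much in the real SDK.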

Installation

Python

pip install claude-agent-sdk

Requires Python 3.9 or later. Set your API key:

export ANTHROPIC_API_KEY="sk-ant-..."

TypeScript

npm install @anthropic-ai/claude-agent-sdk

Requires Node.js 18 or later. Set your API key in your environment or pass it directly to the agent constructor.

First Agent in 5 Minutes

Here is a minimal working agent that analyzes a directory and reports its findings:

from claude_agent_sdk import Agent, tools

agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.bash, tools.read_file, tools.write_file],
    max_turns=10,
    system_prompt="You are a code analyst. Be concise and specific.",
)

result = agent.run("List all Python files in the current directory and summarize what each one does.")
print(result.final_output)

This agent will:

  1. Use the bash tool to run ls *.py or similar
  2. Read each file it finds
  3. Produce a summary
  4. Stop when finished (or after 10 turns, whichever comes first)

The max_turns=10 parameter is critical. Without a bound, an agent that gets confused could loop indefinitely, burning tokens and money.

3 Complete Working Agent Examples

Example 1: Code Review Agent

This agent reads source files, identifies issues, and writes a review report.

from claude_agent_sdk import Agent, tools

code_reviewer = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.read_file, tools.bash, tools.write_file],
    max_turns=20,
    system_prompt="""You are a senior code reviewer. For each file:
1. Check for bugs, security issues, and performance problems.
2. Note any style violations.
3. Write findings to review-report.md.
Be specific — include line numbers and concrete suggestions.""",
)

result = code_reviewer.run("Review all .py files in ./src/ and write a report to review-report.md")
print(f"Review complete. Turns used: {result.turns_used}")

Example 2: Research Agent

This agent searches the web, gathers information, and produces a structured summary.

from claude_agent_sdk import Agent, tools

researcher = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.web_search, tools.write_file],
    max_turns=15,
    system_prompt="""You are a research analyst. Search for information,
cross-reference multiple sources, and produce a structured report.
Always cite your sources with URLs.""",
)

result = researcher.run(
    "Research the current state of WebAssembly adoption in 2026. "
    "Write findings to wasm-report.md with sections for browser support, "
    "server-side usage, and notable projects."
)

Example 3: Data Pipeline Agent

This agent reads raw data, transforms it, and writes clean output.

from claude_agent_sdk import Agent, tools

pipeline_agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.read_file, tools.bash, tools.write_file],
    max_turns=12,
    system_prompt="""You are a data engineer. Read the input CSV,
clean and transform the data, and write the output.
Use Python scripts for transformations. Validate output before writing.""",
)

result = pipeline_agent.run(
    "Read data/raw-sales.csv, remove duplicates, normalize date formats "
    "to ISO 8601, calculate monthly totals, and write to data/clean-sales.csv"
)

Configuration Reference

model

The Claude model to use. Current options:

Model                        Best For                                     Cost
claude-opus-4-20250514       Complex reasoning, architecture, planning    $15 / $75 per M tokens
claude-sonnet-4-20250514     General coding, implementation, analysis     $3 / $15 per M tokens
claude-haiku-4-5-20251001    Simple tasks, classification, quick edits    $0.80 / $4 per M tokens

max_turns

The maximum number of reasoning-and-tool-call cycles. This is your safety bound.

Never set max_turns above 100 unless you have a specific reason and a cost alert configured.

tools

The list of tools the agent can use. Only provide tools the agent actually needs — every tool definition consumes tokens in the system prompt.

# Minimal tool set for a file-editing agent
tools=[tools.read_file, tools.write_file, tools.bash]

# Research agent
tools=[tools.web_search, tools.read_file, tools.write_file]

# Full toolkit (costs more tokens per turn)
tools=[tools.bash, tools.read_file, tools.write_file, tools.web_search, tools.glob, tools.grep]

system_prompt

Instructions that shape agent behavior. Keep it concise — long system prompts consume tokens every turn.

temperature

Controls randomness. Default is 0 for agents (deterministic). Increase to 0.3-0.7 for creative tasks.

max_tokens

Maximum tokens per individual Claude response. Default is 4096. Increase for tasks that need long outputs (report generation, large code files).

Agent Patterns

Sequential Pattern

One agent, linear steps. Simplest and most predictable.

agent = Agent(model="claude-sonnet-4-20250514", tools=[...], max_turns=10)
result = agent.run("Do X, then Y, then Z")

Best for: single-purpose tasks, scripts, simple automations.

Orchestrator-Worker Pattern

A main agent delegates subtasks to specialized sub-agents.

from claude_agent_sdk import Agent, tools

# Specialist agents
security_agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.read_file, tools.bash],
    max_turns=10,
    system_prompt="You are a security auditor. Check for vulnerabilities.",
)

style_agent = Agent(
    model="claude-haiku-4-5-20251001",
    tools=[tools.read_file],
    max_turns=5,
    system_prompt="You are a style checker. Check code formatting.",
)

# Orchestrator
orchestrator = Agent(
    model="claude-opus-4-20250514",
    tools=[tools.read_file, tools.write_file],
    sub_agents=[security_agent, style_agent],
    max_turns=15,
    system_prompt="Coordinate the security and style reviews. Combine findings into a single report.",
)

result = orchestrator.run("Review the ./src/ directory")

Best for: complex tasks with distinct sub-problems. See our multi-agent architecture guide for detailed patterns.

Pipeline Pattern

Agents chained together, where one agent's output becomes the next agent's input.

# Agent 1: Generate code
generator = Agent(model="claude-opus-4-20250514", tools=[tools.write_file], max_turns=10)
gen_result = generator.run("Write a REST API for user management in FastAPI")

# Agent 2: Review the generated code
reviewer = Agent(model="claude-sonnet-4-20250514", tools=[tools.read_file, tools.write_file], max_turns=10)
review_result = reviewer.run("Review the code in ./api.py and fix any issues")

# Agent 3: Write tests
tester = Agent(model="claude-sonnet-4-20250514", tools=[tools.read_file, tools.write_file, tools.bash], max_turns=15)
test_result = tester.run("Write pytest tests for ./api.py and run them")

Best for: staged workflows where each step has clear inputs and outputs.
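The staging logic above generalizes into a small chaining helper where each stage's prompt is built from the previous stage's output. This sketch is SDK-agnostic: `run_pipeline` and the stage tuple shape are our own, and `run_fn` can be any callable, such as a lambda wrapping `agent.run(...).final_output`.

```python
# Generic stage-chaining helper (illustrative, not part of the SDK).

def run_pipeline(stages, initial_input: str) -> str:
    """stages: list of (name, run_fn, prompt_template using {prev}).

    Each stage receives a prompt built from the previous stage's output,
    mirroring the generator -> reviewer -> tester flow above.
    """
    prev = initial_input
    for _name, run_fn, template in stages:
        prev = run_fn(template.format(prev=prev))
    return prev
```

Keeping the prompt templates next to the stage definitions makes it easy to see what context each stage actually receives.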

Cost Analysis

Agent costs scale with turns and context size. Here are realistic estimates:

Scenario                     Model     Turns    Estimated Cost
Simple file edit             Haiku     3-5      $0.01-0.03
Code review (5 files)        Sonnet    10-15    $0.15-0.40
Full codebase analysis       Sonnet    20-30    $0.50-1.50
Architecture planning        Opus      10-20    $1.00-5.00
Complex refactor             Opus      30-50    $3.00-15.00

Key cost drivers:

  - Turn count: each turn re-sends the accumulated conversation history, so input tokens grow faster than linearly
  - Model choice: Opus tokens cost 5x Sonnet's at both input and output rates
  - Tool count: every tool definition is sent with the context on every turn
  - Output length: long reports and large generated files multiply output-token costs

For detailed cost tracking, see our ccusage tracking guide and cost reduction strategies.
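For quick estimates without a tracking tool, the prices from the model table translate directly into a helper. `PRICES` and `estimate_cost` are our own names, with the per-million-token rates hardcoded from the table above.

```python
# Rough per-run cost estimator using the $/M-token prices from the
# configuration reference table (illustrative helper, not SDK API).

PRICES = {  # model -> (input $/M tokens, output $/M tokens)
    "claude-opus-4-20250514": (15.00, 75.00),
    "claude-sonnet-4-20250514": (3.00, 15.00),
    "claude-haiku-4-5-20251001": (0.80, 4.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one run given its token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

Feeding in the token counts from a finished run's result object gives a per-run figure you can log or alert on.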

Comparison: Agent SDK vs Claude Code CLI vs Direct API

Feature            Agent SDK        Claude Code CLI        Direct API
Agent loop         Built-in         Built-in               Manual
Tool management    SDK-managed      Pre-configured         Manual
Multi-agent        Supported        Via subagents          Manual
Customization      Full             Limited (CLAUDE.md)    Full
Setup effort       Medium           Low                    High
Best for           Custom agents    Dev workflows          Full control

Use Agent SDK when you need programmatic control over agents — custom tools, custom workflows, integration into larger systems.

Use Claude Code CLI when you need an interactive coding assistant or want to run agents via API mode.

Use the Direct API when you need maximum control and are comfortable building the agent loop yourself.

Error Handling and Retry Patterns

Agents can fail in several ways. Handle each:

from claude_agent_sdk import Agent, AgentError, RateLimitError, ToolError

agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.bash, tools.read_file],
    max_turns=15,
    retry_config={
        "max_retries": 3,
        "backoff_factor": 2.0,  # exponential backoff
        "retry_on": [RateLimitError],
    },
)

try:
    result = agent.run("Analyze the codebase")
    if result.hit_turn_limit:
        print("Warning: agent hit max_turns before completing")
    print(result.final_output)
except RateLimitError:
    print("Rate limited after all retries")
except ToolError as e:
    print(f"Tool failed: {e.tool_name} — {e.message}")
except AgentError as e:
    print(f"Agent error: {e}")

Best practices for error handling:

  - Always set max_turns, and check result.hit_turn_limit after every run
  - Retry only transient failures such as rate limits, with exponential backoff; do not blindly retry tool failures
  - Catch specific exception types (RateLimitError, ToolError) before the generic AgentError
  - Log every failure with enough context (task, turns used, failing tool) to diagnose it later

Frequently Asked Questions

Is the Claude Agent SDK free? The SDK itself is free and open source. You pay for Claude API usage (per-token pricing) when agents run. See our Claude Code cost guide for detailed pricing.

What models work with the Agent SDK? All Claude models: Opus 4, Sonnet 4, Sonnet 4.5, and Haiku 4.5. Choose based on task complexity and budget.

How is this different from Claude Code? Claude Code is a pre-built coding agent. The Agent SDK lets you build custom agents for any purpose — not just coding. See our how to build a Claude Code agent guide for a deeper comparison.

Can I use custom tools? Yes. Define tools as Python functions with type hints, and the SDK automatically generates the tool schema for Claude.

What about rate limits? The SDK respects Anthropic API rate limits. Configure retry_config for automatic backoff when limits are hit.

Can agents call other agents? Yes. The orchestrator-worker pattern lets agents delegate to sub-agents. Each sub-agent has its own tool set and turn limit.

How do I control costs? Three strategies: use cheaper models (Haiku for simple sub-tasks), set strict max_turns limits, and minimize the tool list. Our cost reduction guide covers this in detail.

Is the Agent SDK production-ready? Yes. It includes error handling, retry logic, and structured output support. For production deployments, add monitoring, logging, and cost alerts.

Production Deployment Checklist

Before deploying agents to production, verify each item:

  - max_turns is set on every agent, at a value you have tested
  - retry_config handles rate limits with exponential backoff
  - Cost alerts exist at per-run, daily, and monthly levels
  - Every run is logged with turns, tokens, estimated cost, and status
  - Each agent's tool list follows the principle of least privilege
  - No secrets appear in prompts; credentials come from the environment
  - A regression suite with known inputs and expected outputs passes

Treat agents like microservices: each should have clear inputs, outputs, error handling, and monitoring.

Production Agent Architectures

The three patterns above (sequential, orchestrator, pipeline) are starting points. Production deployments require deeper architecture decisions around parallelism, failure isolation, and cost governance. Here are three battle-tested patterns with their tradeoffs.

Pattern 1: Fan-Out (Parallel Workers)

A coordinator dispatches the same task type to N workers running concurrently. Each worker processes an independent unit of work. The coordinator collects results when all workers finish.

                               ┌──── [Worker A] ──── Result A ────┐
                               │                                  │
User Task ──▶ [Coordinator] ───┼──── [Worker B] ──── Result B ────┼──▶ [Merge] ──▶ Final Output
                               │                                  │
                               └──── [Worker C] ──── Result C ────┘

import asyncio
from claude_agent_sdk import Agent, tools

async def fan_out_review(file_paths: list[str]):
    """Review N files in parallel using independent agents."""
    async def review_one(path: str):
        agent = Agent(
            model="claude-sonnet-4-20250514",
            tools=[tools.read_file],
            max_turns=8,
            system_prompt=f"Review {path} for bugs. Output JSON with fields: file, issues[], severity.",
        )
        return await agent.arun(f"Review {path}")

    results = await asyncio.gather(*[review_one(p) for p in file_paths])
    return [r.final_output for r in results]

Cost profile: Linear with worker count. 10 files at $0.15 each = $1.50 total. All workers run simultaneously, so wall-clock time equals the slowest single worker.

Failure mode: One worker failure does not affect others. Use asyncio.gather(return_exceptions=True) to collect partial results when some workers fail.
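Sketched concretely: `fan_out_with_partials` below is our own helper, not SDK API, and `worker` stands in for the per-file agent call from the example above. It separates successful results from the items whose workers raised.

```python
import asyncio

# Sketch: fan-out that tolerates individual worker failures by collecting
# exceptions instead of letting one failure cancel the whole gather.

async def fan_out_with_partials(items, worker):
    """Run one worker per item; return (successful results, failed items)."""
    results = await asyncio.gather(
        *[worker(item) for item in items], return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = [item for item, r in zip(items, results)
              if isinstance(r, Exception)]
    return ok, failed
```

The failed list gives you exactly which units of work to retry or report, without losing the results that did complete.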

CLAUDE.md governance: Add to each worker's system prompt: "You may only read files. Do not execute commands. Do not modify files. Report findings only."

Pattern 2: Pipeline (Sequential Stages)

Each stage transforms input and passes output to the next. Stages can use different models optimized for their role.

[Stage 1: Generate]     [Stage 2: Validate]     [Stage 3: Test]
 Opus ($$$)         ──▶  Haiku ($)           ──▶  Sonnet ($$)
 Write code              Check types/lint         Run tests, fix
 max_turns=15            max_turns=5              max_turns=20

# Stage 1: Generate with Opus (high capability)
generator = Agent(
    model="claude-opus-4-20250514",
    tools=[tools.write_file],
    max_turns=15,
    system_prompt="Write production-quality code. Include docstrings and type hints.",
)
gen = generator.run("Implement a Redis-backed session store for FastAPI")

# Stage 2: Validate with Haiku (fast, cheap)
validator = Agent(
    model="claude-haiku-4-5-20251001",
    tools=[tools.read_file, tools.bash],
    max_turns=5,
    system_prompt="Run mypy and ruff on the generated code. Report any issues. Do not fix them.",
)
val = validator.run("Validate the code in ./session_store.py")

# Stage 3: Fix and test with Sonnet (balanced)
fixer = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.read_file, tools.write_file, tools.bash],
    max_turns=20,
    system_prompt="Fix all issues from validation. Run pytest. Iterate until all tests pass.",
)
fix = fixer.run(f"Fix these issues in session_store.py: {val.final_output}")

Cost profile: $2-8 total. Opus for the creative work, Haiku for the cheap validation gate, Sonnet for the iterative fix loop. The validation gate prevents expensive Sonnet iterations on fundamentally flawed code.

Failure mode: If Stage 2 validation finds catastrophic issues, abort early instead of passing garbage to Stage 3. Check val.final_output for critical errors before proceeding.
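That abort check can be a plain string gate between stages. A minimal sketch, where the severity markers are assumptions about what you instruct the validator to emit, not anything the SDK produces on its own:

```python
# Early-abort gate between pipeline stages (illustrative). Works only if the
# validator's system prompt tells it to flag fatal findings with these markers.

CRITICAL_MARKERS = ("CRITICAL", "SyntaxError", "cannot import")

def should_abort(validation_report: str) -> bool:
    """Stop before the expensive fix stage when validation is catastrophic."""
    return any(marker in validation_report for marker in CRITICAL_MARKERS)
```

Usage would look like `if should_abort(val.final_output): raise RuntimeError("validation failed, aborting pipeline")` placed between Stage 2 and Stage 3.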

CLAUDE.md governance: Stage 1 can only write files. Stage 2 can only read and lint. Stage 3 can read, write, and run tests. No stage has network access beyond the Anthropic API.

Pattern 3: Supervisor (Orchestrator + Specialists)

A supervisor agent receives high-level goals, breaks them into subtasks, assigns each to the best specialist, monitors progress, and handles failures by reassigning or adjusting strategy.

                     ┌─ [Security Specialist] ── tools: bash(semgrep), read
                     │
[Supervisor] ────────┼─ [Perf Specialist] ────── tools: bash(hyperfine, ab), read
 Opus                │
 Manages strategy    ├─ [Docs Specialist] ────── tools: read, write
 max_turns=25        │
                     └─ [Test Specialist] ────── tools: bash(pytest), read, write

security = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.read_file, tools.bash],
    max_turns=10,
    system_prompt="You are a security auditor. Run SAST scans and review for OWASP Top 10.",
)

perf = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.read_file, tools.bash],
    max_turns=10,
    system_prompt="You are a performance engineer. Profile code and identify bottlenecks.",
)

supervisor = Agent(
    model="claude-opus-4-20250514",
    sub_agents=[security, perf],
    max_turns=25,
    system_prompt="""You are a technical lead. Given a codebase:
1. Dispatch security audit to the security specialist
2. Dispatch performance analysis to the perf specialist
3. Synthesize findings into a prioritized action plan
4. If a specialist fails, retry once with a simplified scope""",
)

result = supervisor.run("Audit ./src/ for security and performance issues")

Cost profile: $5-15 depending on codebase size. The Opus supervisor is the main cost driver. Keep supervisor max_turns tight — it should delegate, not do the work itself.

Failure mode: The supervisor can detect specialist failures and retry with adjusted parameters (fewer files, simpler scope). This self-healing behavior is the key advantage over static pipelines.

CLAUDE.md governance: The supervisor cannot execute commands directly — it can only delegate to specialists. Each specialist has scoped tool access. No specialist can write to files outside its domain.

Monitoring and Logging Patterns

Production agents need structured observability. Log every run with enough detail to diagnose failures and track costs.

Structured Run Logging

import json
import logging
from datetime import datetime, timezone
from claude_agent_sdk import Agent, tools

logging.basicConfig(
    format='%(message)s',
    level=logging.INFO,
    handlers=[logging.FileHandler("agent-runs.jsonl")]
)

agent = Agent(
    model="claude-sonnet-4-20250514",
    tools=[tools.bash, tools.read_file],
    max_turns=15,
)

start = datetime.now(timezone.utc)
result = agent.run("Analyze the test suite for flaky tests")
end = datetime.now(timezone.utc)

log_entry = {
    "timestamp": start.isoformat(),
    "duration_seconds": (end - start).total_seconds(),
    "model": "claude-sonnet-4-20250514",
    "turns_used": result.turns_used,
    "total_tokens": result.total_tokens,
    "input_tokens": result.input_tokens,
    "output_tokens": result.output_tokens,
    "hit_turn_limit": result.hit_turn_limit,
    "estimated_cost_usd": (result.input_tokens * 3 + result.output_tokens * 15) / 1_000_000,
    "task": "flaky test analysis",
    "status": "complete" if not result.hit_turn_limit else "truncated",
}
logging.info(json.dumps(log_entry))

This produces a JSONL file where each line is a complete run record. Feed it into your existing monitoring stack (Datadog, Grafana, CloudWatch) for dashboards and alerts.
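As a sketch of the consuming side, here is a small aggregator that folds the JSONL log into daily cost totals. The field names match the log_entry dict above; the helper itself is our own, not part of any monitoring stack.

```python
import json

# Sketch: aggregate agent-runs.jsonl into per-day cost totals for a quick
# dashboard or a daily-budget check.

def daily_costs(lines):
    """Map YYYY-MM-DD -> summed estimated_cost_usd across the given records."""
    totals = {}
    for line in lines:
        entry = json.loads(line)
        day = entry["timestamp"][:10]  # ISO timestamps start with YYYY-MM-DD
        totals[day] = totals.get(day, 0.0) + entry["estimated_cost_usd"]
    return totals
```

Running it over `open("agent-runs.jsonl")` gives the same numbers a hosted dashboard would, which is handy for sanity-checking alerts.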

Cost Alert Thresholds

Set alerts at multiple levels to catch runaway agents before they drain your budget:

COST_THRESHOLDS = {
    "per_run_warn": 2.00,     # Warn if a single run exceeds $2
    "per_run_abort": 10.00,   # Abort if a single run exceeds $10
    "daily_budget": 50.00,    # Hard daily cap across all agents
    "monthly_budget": 500.00, # Monthly spending ceiling
}
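A minimal sketch of enforcing those thresholds follows. The function name and the returned action strings are our own conventions; wire "abort" and "pause" into your runner however you prefer.

```python
# Sketch: turn cost thresholds into an enforcement decision. Defaults mirror
# the COST_THRESHOLDS values above (illustrative, not SDK API).

DEFAULT_THRESHOLDS = {
    "per_run_warn": 2.00,
    "per_run_abort": 10.00,
    "daily_budget": 50.00,
}

def check_cost(run_cost: float, daily_total: float,
               thresholds=DEFAULT_THRESHOLDS) -> str:
    if run_cost >= thresholds["per_run_abort"]:
        return "abort"   # kill this run immediately
    if daily_total >= thresholds["daily_budget"]:
        return "pause"   # stop launching new runs today
    if run_cost >= thresholds["per_run_warn"]:
        return "warn"    # log and alert, but let it finish
    return "ok"
```

Call it after each run with the estimated cost and the running daily total from your logs.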

Health Check Endpoint

If your agents run as a service, expose a health check:

from flask import Flask, jsonify

app = Flask(__name__)

# Placeholder counters — in a real service these are maintained by your
# agent runner (e.g. incremented per run, reset daily).
active_agent_count = 0
daily_run_count = 0
daily_cost_estimate = 0.0

@app.route("/health")
def health():
    return jsonify({
        "status": "ok",
        "agents_running": active_agent_count,
        "total_runs_today": daily_run_count,
        "estimated_cost_today_usd": daily_cost_estimate,
    })

Real Cost Examples with Actual Token Counts

Abstract cost tables are not enough. Here are measured costs from real agent runs:

Example: Code review of a 15-file Python project

Model:          claude-sonnet-4-20250514
Turns used:     12
Input tokens:   34,218 (file contents + system prompt + tool results)
Output tokens:  6,841 (review comments + tool calls)
Cost:           (34,218 * $3 + 6,841 * $15) / 1,000,000 = $0.205
Duration:       47 seconds

Example: Refactor a module (rename + update imports)

Model:          claude-sonnet-4-20250514
Turns used:     22
Input tokens:   89,450 (grows each turn as context accumulates)
Output tokens:  12,330
Cost:           (89,450 * $3 + 12,330 * $15) / 1,000,000 = $0.453
Duration:       2 minutes 15 seconds

Example: Architecture planning with Opus

Model:          claude-opus-4-20250514
Turns used:     8
Input tokens:   22,100
Output tokens:  8,900
Cost:           (22,100 * $15 + 8,900 * $75) / 1,000,000 = $0.999
Duration:       1 minute 40 seconds

The pattern is clear: context accumulation is the primary cost driver. An agent that runs 30 turns processes far more input tokens than one that runs 10, and the growth is roughly quadratic rather than linear, because each turn re-sends the full conversation history. Use /compact or context summarization between stages to control this.
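A back-of-envelope model makes the growth concrete. Assume each turn adds roughly a fixed number of new tokens on top of a base context; turn k then re-sends everything accumulated so far. The numbers below are illustrative assumptions, not measurements.

```python
# Sketch: why total input tokens grow quadratically with turn count.
# base = system prompt + tool definitions; delta = new tokens added per turn.

def total_input_tokens(turns: int, base: int = 2_000, delta: int = 1_500) -> int:
    """Sum of per-turn context sizes: turn k re-sends base + k * delta tokens."""
    return sum(base + k * delta for k in range(turns))
```

With these assumptions, 10 turns process 87,500 input tokens while 30 turns process 712,500, roughly an 8x increase for 3x the turns.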

Security Considerations for Agent Deployments

Agents that run autonomously introduce security risks that do not exist with interactive AI usage.

Principle of Least Privilege

Give each agent only the tools it needs. A code review agent should not have write_file access. A report generator should not have bash access.

# Bad: overly permissive
agent = Agent(tools=[tools.bash, tools.read_file, tools.write_file, tools.web_search, tools.glob, tools.grep])

# Good: scoped to actual need
review_agent = Agent(tools=[tools.read_file, tools.grep])

Input Sanitization

If agent prompts include user-provided input (issue descriptions, PR bodies, form submissions), sanitize the input to prevent prompt injection:

def sanitize_agent_input(user_text: str) -> str:
    """Strip potential prompt injection patterns from user input."""
    # Remove instruction-like patterns
    sanitized = user_text.replace("SYSTEM:", "")
    sanitized = sanitized.replace("IGNORE PREVIOUS", "")
    sanitized = sanitized.replace("<!-- ", "").replace(" -->", "")
    # Truncate to prevent context flooding
    return sanitized[:5000]

result = agent.run(f"Review this PR description: {sanitize_agent_input(pr_body)}")

Network Isolation

For agents that should only read local files, run them without network access:

# Docker with no network
docker run --rm --network=none \
  -e ANTHROPIC_API_KEY="$KEY" \
  -v $(pwd):/app \
  my-agent-image

Note: the agent still needs network access to call the Anthropic API, so --network=none only suits containers whose work is fully local. When the agent itself runs inside the container, route its traffic through an egress proxy or firewall rule that allows only api.anthropic.com:443 and blocks all other outbound connections.

Secret Management

Never pass secrets through agent prompts. Use environment variables and configure tools to access them directly:

# Bad: secret in prompt
agent.run(f"Connect to database at postgres://admin:{DB_PASSWORD}@db.example.com/prod")

# Good: secret in environment, tool reads it
agent.run("Connect to the production database using the credentials in DATABASE_URL")
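The "secret stays in the environment" approach means giving the agent a custom tool that resolves credentials itself. In this sketch, get_database_url and the DATABASE_URL variable name mirror the example above; both are illustrative, not SDK built-ins.

```python
import os

# Sketch: a credential-resolving helper a custom tool can call, so the
# secret never appears in a prompt or in the conversation transcript.

def get_database_url() -> str:
    """Read the connection string from the environment, failing loudly."""
    url = os.environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set; refusing to proceed")
    return url
```

The agent's prompt only ever says "use the configured database"; the tool does the lookup, and the transcript stays clean.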

Claude Code SDK: How It Relates to the Agent SDK

Developers searching for "claude code sdk" are often looking for one of two things: the Claude Agent SDK (covered in this guide) or the programmatic interface to Claude Code itself. They are related but distinct.

Claude Agent SDK is the library (claude-agent-sdk on PyPI, @anthropic-ai/claude-agent-sdk on npm) for building custom autonomous agents. You write agent code, define tools, set turn limits, and run agents programmatically. This is what you use when building your own agent from scratch.

Claude Code SDK refers to using Claude Code in non-interactive (API/headless) mode. Instead of typing commands in a terminal, you invoke Claude Code programmatically from scripts:

# Claude Code SDK usage — run Claude Code as a subprocess
claude -p "Refactor the auth module" --output-format json

# Pipe output to another tool
claude -p "List all TODO comments" --output-format json | jq '.result'

# Use in a CI/CD pipeline
claude -p "Review this diff for security issues: $(git diff HEAD~1)" --model claude-sonnet-4-20250514

# Python wrapper for Claude Code SDK
import subprocess
import json

def run_claude_code(prompt: str, model: str = "claude-sonnet-4-20250514") -> dict:
    result = subprocess.run(
        ["claude", "-p", prompt, "--output-format", "json", "--model", model],
        capture_output=True, text=True
    )
    # Fail loudly instead of parsing an empty stdout on error
    if result.returncode != 0:
        raise RuntimeError(f"claude exited with {result.returncode}: {result.stderr}")
    return json.loads(result.stdout)

output = run_claude_code("Analyze the test coverage in ./src/")

When to use which:

Need                                                      Use
Build a custom agent with custom tools                    Claude Agent SDK
Automate Claude Code in scripts/CI                        Claude Code SDK (API mode)
Interactive development                                   Claude Code CLI
Maximum control over the agent loop                       Claude Agent SDK
Leverage Claude Code's built-in tools (Read, Edit, Bash)  Claude Code SDK (API mode)

The Claude Code SDK (API mode) is often the faster path when you want Claude Code's existing capabilities without writing custom agent logic. The Agent SDK is for when you need agents that do things Claude Code was not built for. See our API mode guide for detailed Claude Code SDK usage.

Next Steps

Can I deploy agents as a REST API service?

Yes. Wrap the agent.run() call in a web framework like Flask or FastAPI. Add request validation, authentication, and rate limiting. Each API request creates a new agent run.

How do I test agents before deploying to production?

Create a test suite with known inputs and expected outputs. Run agents against these inputs and verify the final_output matches expectations. Use deterministic temperature (0.0) for reproducible results.
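A minimal harness for that approach might look like the following. Here make_agent is a factory you supply (so tests can substitute a fake), and all names are illustrative rather than SDK API.

```python
# Sketch: regression harness for agents. Each case pairs a prompt with a
# substring the final output must contain; failures are returned for review.

def run_regression(make_agent, cases):
    """cases: list of (prompt, expected_substring). Returns failing cases."""
    failures = []
    for prompt, expected in cases:
        result = make_agent().run(prompt)       # fresh agent per case
        if expected not in result.final_output:
            failures.append((prompt, result.final_output))
    return failures
```

Run it with temperature 0.0 agents so repeated runs produce comparable outputs, and treat any non-empty failures list as a blocked deploy.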

Can agents access the internet?

Yes, if you provide the web_search tool. Without it, agents can only access local files and run local commands. Control network access through tool selection and Docker network policies.


Built by Michael Lip — solo dev, Da Nang.