Claude API Pricing Complete Guide (2026)

Anthropic offers Claude through consumer subscriptions (Free, Pro, Max) and developer API access with per-token billing. The pricing structure has evolved significantly through 2025 and 2026, with the introduction of new model tiers, batch processing discounts, and prompt caching.

This guide breaks down every pricing dimension so you can calculate your actual costs before committing to a plan or API integration.

Consumer Plans Overview

These plans give you access to Claude through claude.ai and mobile apps. They do not provide API access.

Free Tier

$0/month. Limited daily usage on claude.ai and the mobile apps.

Pro Plan

$20/month. Substantially higher usage limits than Free, plus access to Claude Code at Pro-level limits.

Max Plan (5x)

$100/month. Roughly 5x Pro usage limits.

Max Plan (20x)

$200/month. Roughly 20x Pro usage limits.

API Pricing by Model

API pricing is per-token, measured separately for input and output tokens. All prices are per million tokens (MTok).

Claude Opus 4 (claude-opus-4-20250514)

Anthropic's most capable model. Best for complex reasoning, multi-step analysis, and agentic workflows.

| Metric | Value |
| --- | --- |
| Input tokens | $15.00 / MTok |
| Output tokens | $75.00 / MTok |
| Context window | 200K tokens |
| Max output | 32K tokens |
| Training data cutoff | Early 2025 |

Cost example: A 2,000-token prompt with a 1,000-token response costs (2,000 × $15.00 + 1,000 × $75.00) / 1,000,000 = $0.105.

Claude Sonnet 4 (claude-sonnet-4-20250514)

Balanced model for most production workloads. Strong coding, analysis, and writing capabilities at a lower price point than Opus.

| Metric | Value |
| --- | --- |
| Input tokens | $3.00 / MTok |
| Output tokens | $15.00 / MTok |
| Context window | 200K tokens |
| Max output | 16K tokens |
| Training data cutoff | Early 2025 |

Cost example: The same 2,000-token prompt with a 1,000-token response costs (2,000 × $3.00 + 1,000 × $15.00) / 1,000,000 = $0.021.

Claude Haiku 3.5 (claude-3-5-haiku-20241022)

Fastest model. Best for high-volume, latency-sensitive tasks like classification, extraction, and simple Q&A.

| Metric | Value |
| --- | --- |
| Input tokens | $0.80 / MTok |
| Output tokens | $4.00 / MTok |
| Context window | 200K tokens |
| Max output | 8K tokens |
| Training data cutoff | Early 2024 |

Cost example: The same prompt costs (2,000 × $0.80 + 1,000 × $4.00) / 1,000,000 = $0.0056.

Model Comparison Table

| Model | Input/MTok | Output/MTok | Speed | Best For |
| --- | --- | --- | --- | --- |
| Opus 4 | $15.00 | $75.00 | Slowest | Complex reasoning, agentic tasks |
| Sonnet 4 | $3.00 | $15.00 | Medium | General production, coding |
| Haiku 3.5 | $0.80 | $4.00 | Fastest | High volume, classification |
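
The per-request cost for any row in this table can be computed directly. A minimal sketch in Python (prices hard-coded from this guide; verify against Anthropic's current pricing page):

```python
# Per-MTok prices from the comparison table in this guide (USD).
PRICES = {
    "opus-4":    {"input": 15.00, "output": 75.00},
    "sonnet-4":  {"input": 3.00,  "output": 15.00},
    "haiku-3.5": {"input": 0.80,  "output": 4.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The guide's running example: 2,000 input tokens, 1,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f}")
# opus-4: $0.1050, sonnet-4: $0.0210, haiku-3.5: $0.0056
```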

Prompt Caching

Prompt caching reduces costs when you send the same prompt prefix repeatedly. This is critical for applications that include large system prompts, CLAUDE.md files, or static context in every request.

How Prompt Caching Works

  1. First request: all tokens are billed at full price, and tokens written to the cache incur an additional 25% write surcharge.
  2. Subsequent requests: Cached portion is charged at a reduced read rate.
  3. Cache lifetime: 5 minutes from last use (extends with each hit).

Prompt Caching Prices

| Model | Cache Write (per MTok) | Cache Read (per MTok) | Savings vs Input |
| --- | --- | --- | --- |
| Opus 4 | $18.75 (+25%) | $1.50 | 90% on reads |
| Sonnet 4 | $3.75 (+25%) | $0.30 | 90% on reads |
| Haiku 3.5 | $1.00 (+25%) | $0.08 | 90% on reads |

Cache Savings Example

Imagine a chatbot with a 4,000-token system prompt using Sonnet 4:

Without caching (100 requests): 100 × 4,000 tokens × $3.00 / MTok = $1.20 spent on the system prompt alone.

With caching (100 requests): one cache write (4,000 × $3.75 / MTok = $0.015) plus 99 cache reads (99 × 4,000 × $0.30 / MTok ≈ $0.119) comes to about $0.134 in total, roughly 89% cheaper.
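
The savings can be computed directly from the caching table. A runnable sketch using the Sonnet 4 rates (numbers from this guide):

```python
# Sonnet 4 rates in USD per MTok, from the prompt caching table.
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

system_tokens, requests = 4_000, 100

without_cache = requests * system_tokens * INPUT / 1e6
with_cache = (system_tokens * CACHE_WRITE              # first request writes the cache
              + (requests - 1) * system_tokens * CACHE_READ) / 1e6

savings = 1 - with_cache / without_cache
print(f"without: ${without_cache:.2f}  with: ${with_cache:.4f}  saved: {savings:.0%}")
# without: $1.20  with: $0.1338  saved: 89%
```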

Cache Requirements

Prompts must meet a minimum length to be cached: 1,024 tokens for Opus and Sonnet models, 2,048 tokens for Haiku. You can mark up to four cache breakpoints per request with cache_control, and only exact prefix matches count as cache hits.

Batch API

The Batch API processes requests asynchronously at a 50% discount. Results are returned within 24 hours.

Batch API Prices

| Model | Input/MTok | Output/MTok | Discount |
| --- | --- | --- | --- |
| Opus 4 | $7.50 | $37.50 | 50% |
| Sonnet 4 | $1.50 | $7.50 | 50% |
| Haiku 3.5 | $0.40 | $2.00 | 50% |
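
Submitting a batch through the Python SDK looks roughly like the sketch below. The build_batch helper and the custom_id scheme are illustrative; check the SDK's Message Batches documentation for the exact interface.

```python
def build_batch(prompts, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Assemble the request list expected by the Message Batches endpoint."""
    return [
        {
            "custom_id": f"req-{i}",  # your key for matching results later
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": p}],
            },
        }
        for i, p in enumerate(prompts)
    ]

requests = build_batch(["Summarize document A", "Summarize document B"])

# Submitting requires an API key; results arrive within 24 hours:
# import anthropic
# client = anthropic.Anthropic()
# batch = client.messages.batches.create(requests=requests)
# print(batch.id, batch.processing_status)
```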

When to Use Batch API

Good fits include bulk classification, dataset labeling, evaluation runs, and content-generation pipelines. In short: any workload where results are consumed later rather than shown to a waiting user.

Batch API Limitations

Batch requests cannot stream, and results can take up to 24 hours, so interactive use cases are out. Plan for asynchronous retrieval and idempotent processing of results.

Extended Thinking Pricing

Extended thinking allows Claude to "think" before responding, improving performance on complex tasks. Thinking tokens are billed as output tokens.

| Model | Thinking Token Price | Same As |
| --- | --- | --- |
| Opus 4 | $75.00 / MTok | Output rate |
| Sonnet 4 | $15.00 / MTok | Output rate |

Important: Extended thinking can generate thousands of thinking tokens before the visible response. A request that appears to have a 500-token response may actually consume 5,000+ tokens when thinking tokens are included.

Controlling Extended Thinking Costs

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # Cap thinking tokens
    },
    messages=[{"role": "user", "content": "Your prompt"}]
)

Setting budget_tokens caps the thinking cost. Without it, Claude may use the full output token limit for thinking.

Rate Limits

Rate limits vary by plan tier and are measured in requests per minute (RPM) and tokens per minute (TPM).

API Rate Limits by Tier

| Tier | RPM | Input TPM | Output TPM | Spend Requirement |
| --- | --- | --- | --- | --- |
| Tier 1 (Free) | 50 | 40,000 | 8,000 | $0 |
| Tier 2 | 1,000 | 80,000 | 16,000 | $40 credit |
| Tier 3 | 2,000 | 160,000 | 32,000 | $200 credit |
| Tier 4 | 4,000 | 400,000 | 80,000 | $400+ spend |

Rate limit headers are included in every API response:

x-ratelimit-limit-requests: 1000
x-ratelimit-limit-tokens: 80000
x-ratelimit-remaining-requests: 999
x-ratelimit-remaining-tokens: 79000
x-ratelimit-reset-requests: 2026-04-24T12:00:01Z
x-ratelimit-reset-tokens: 2026-04-24T12:00:01Z
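
These headers allow proactive throttling instead of waiting for a 429. A minimal sketch (header names taken from the listing above; the token threshold is an arbitrary assumption):

```python
def seconds_to_wait(headers: dict, min_tokens: int = 2_000) -> float:
    """Return a pause in seconds when the remaining token budget runs low."""
    remaining = int(headers.get("x-ratelimit-remaining-tokens", min_tokens))
    if remaining >= min_tokens:
        return 0.0
    # A fixed conservative pause; sleeping until x-ratelimit-reset-tokens
    # would be more precise.
    return 1.0

print(seconds_to_wait({"x-ratelimit-remaining-tokens": "79000"}))  # 0.0
print(seconds_to_wait({"x-ratelimit-remaining-tokens": "500"}))    # 1.0
```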

Handling Rate Limits

When you hit a rate limit, the API returns a 429 error. Best practices:

  1. Implement exponential backoff with jitter
  2. Track remaining tokens from response headers
  3. Use request queuing for high-volume applications
  4. Consider the Batch API for non-real-time workloads

import time
import random

import anthropic

def call_with_backoff(func, max_retries=5):
    """Retry func on rate-limit errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return func()
        except anthropic.RateLimitError:
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)
    raise Exception("Max retries exceeded")

Claude Code Pricing

Claude Code uses the API under the hood. When you use Claude Code with your own API key, you pay standard API rates for whichever model you select.

Claude Code with Subscription Plans

Claude Code with API Key

When using ANTHROPIC_API_KEY, you pay per-token at standard API rates. A typical Claude Code session costs:

Track your Claude Code costs with token usage auditing.

Claude Code with OpenRouter

You can route Claude Code through OpenRouter for potentially different pricing and access to multiple model providers through a single API key.

Cost Optimization Strategies

1. Choose the Right Model

Use Haiku 3.5 for simple tasks, Sonnet 4 for most work, and Opus 4 only when you need maximum quality. A common pattern:

# Route by task complexity
def get_model(task_type):
    if task_type == "classification":
        return "claude-3-5-haiku-20241022"  # $0.80/MTok input
    elif task_type == "coding":
        return "claude-sonnet-4-20250514"   # $3.00/MTok input
    elif task_type == "architecture":
        return "claude-opus-4-20250514"     # $15.00/MTok input
    return "claude-sonnet-4-20250514"       # sensible default

2. Maximize Prompt Caching

Structure your prompts with static content first:

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": large_static_system_prompt,  # Cached
                "cache_control": {"type": "ephemeral"}
            },
            {
                "type": "text",
                "text": dynamic_user_query  # Not cached
            }
        ]
    }
]

3. Use Batch API for Non-Urgent Work

If your workload can tolerate 24-hour latency, the 50% batch discount is substantial:

| Monthly Volume | Real-time (Sonnet) | Batch (Sonnet) | Savings |
| --- | --- | --- | --- |
| 10M input tokens | $30.00 | $15.00 | $15.00 |
| 10M output tokens | $150.00 | $75.00 | $75.00 |
| Total | $180.00 | $90.00 | $90.00 |

4. Optimize Token Usage

Trim what you send: summarize or truncate old conversation turns, keep system prompts and tool definitions lean, and avoid resending large documents that could be cached instead.

5. Monitor and Alert

Set up cost alerts to catch unexpected usage spikes before they become expensive.
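
A minimal in-process tracker is enough to start. The sketch below assumes the SDK response exposes usage.input_tokens and usage.output_tokens, and hard-codes the Sonnet 4 rates from this guide:

```python
class CostTracker:
    INPUT, OUTPUT = 3.00, 15.00  # Sonnet 4, USD per MTok

    def __init__(self, alert_at_usd: float):
        self.alert_at_usd = alert_at_usd
        self.total_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Accumulate one request's cost; True once the alert threshold is crossed."""
        self.total_usd += (input_tokens * self.INPUT
                           + output_tokens * self.OUTPUT) / 1e6
        return self.total_usd >= self.alert_at_usd

tracker = CostTracker(alert_at_usd=10.0)
tracker.record(2_000, 1_000)          # feed it response.usage after each call
print(f"${tracker.total_usd:.3f}")    # $0.021
```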

Comparing Claude API to Competitors

Claude vs OpenAI GPT-4o

| Metric | Claude Sonnet 4 | GPT-4o |
| --- | --- | --- |
| Input/MTok | $3.00 | $2.50 |
| Output/MTok | $15.00 | $10.00 |
| Context window | 200K | 128K |
| Batch discount | 50% | 50% |
| Caching discount | 90% on reads | 50% on reads |

Claude vs Google Gemini 2.5

| Metric | Claude Sonnet 4 | Gemini 2.5 Pro |
| --- | --- | --- |
| Input/MTok | $3.00 | $1.25 |
| Output/MTok | $15.00 | $10.00 |
| Context window | 200K | 1M |
| Batch discount | 50% | None |
| Caching | 90% on reads | Free after 128K |

When Claude Is More Cost-Effective

Based on the tables above: when prompt caching applies (a 90% read discount versus GPT-4o's 50%), when batch and caching discounts stack, or when the 200K context window avoids the chunking overhead a 128K window would force.

Token Counting: What Costs Money

Understanding what counts as tokens is critical for cost prediction. Many developers are surprised by the actual token counts in their API calls.

Text Tokens

English text averages 4 characters per token, so a 400-character paragraph is roughly 100 tokens.

Code is less token-efficient than prose because of syntax characters and whitespace; expect closer to 3 characters per token.
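
The characters-per-token rule is enough for rough budgeting. The divisor for code is an assumption here; for exact counts, recent Python SDK versions expose a token counting endpoint (client.messages.count_tokens).

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; use ~3 chars/token for code-heavy text."""
    return round(len(text) / chars_per_token)

paragraph = "word " * 80            # 400 characters of prose-like text
print(estimate_tokens(paragraph))   # 100
```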

System Prompt Tokens

If your application sets a system prompt, it is sent and billed as input tokens on every single request.

For an application with a 2,000-token system prompt making 1,000 API calls per day on Sonnet 4, that is 2M input tokens per day on the system prompt alone: 2 MTok × $3.00 = $6.00/day, or about $180/month.

This is why prompt caching is critical for production applications.

Conversation History Tokens

In multi-turn conversations, the entire history is sent with each new message. In a conversation of 10 turns averaging 500 tokens each, turn 1 sends 500 input tokens, turn 2 sends 1,000, and turn 10 sends 5,000.

Total input tokens for 10 turns: 27,500 (not 5,000).

This quadratic growth is why long conversations get expensive fast, and why context management matters.
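
The growth is easy to verify: if turn n resends all prior turns plus its own message, total input across the conversation is the sum 1 + 2 + ... + n times the per-turn size.

```python
def total_input_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    """Total input tokens when each turn resends the full history."""
    # Turn n sends n * tokens_per_turn of input (history + new message).
    return sum(n * tokens_per_turn for n in range(1, turns + 1))

print(total_input_tokens(10))   # 27500, matching the example above
print(total_input_tokens(20))   # 105000: doubling the turns ~4x the cost
```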

Tool Use Tokens

When you define tools for Claude, the tool definitions are included as input tokens on every request. Each definition costs roughly 200-500 tokens, so defining 10 tools adds roughly 2,000-5,000 input tokens to every single call.

When Claude calls a tool, the tool-call JSON is billed as output tokens, and the tool result you send back is billed as input tokens.

Image Tokens

Images are converted to tokens based on resolution:

| Image Size | Approximate Tokens |
| --- | --- |
| 100x100 | ~100 tokens |
| 512x512 | ~800 tokens |
| 1024x1024 | ~1,600 tokens |
| 2048x2048 | ~3,200 tokens |
| Full screenshot (1920x1080) | ~2,400 tokens |

For image-heavy applications, these token costs add up quickly.

PDF Tokens

PDF pages are converted to a combination of text and image tokens. A typical document page costs 1,000-3,000 tokens depending on content complexity.

Cost Calculator Examples

Example 1: Customer Support Chatbot

Without caching:

With prompt caching (system prompt cached):

Example 2: Code Review Automation

Real-time:

Batch API (50% discount):

Example 3: Document Analysis Pipeline

Same pipeline with Sonnet 4 would cost $4.50/day ($135/month). Model choice matters.

Billing and Payment

API Billing

API usage is prepaid: you buy credits in the Anthropic Console and usage draws them down, with optional auto-reload when your balance runs low.

Subscription Billing

Consumer plans (Pro, Max) bill monthly through claude.ai and are entirely separate from API credits; a subscription does not include API usage.

Frequently Asked Questions

How do I check my current API spending?

Visit the Anthropic Console at console.anthropic.com. The Usage page shows real-time token counts and costs broken down by model and day.

Are there free API credits for new accounts?

Anthropic occasionally offers trial credits. Check the Console for current promotions. The free tier gives limited access without a credit card.

Can I set a spending limit?

Yes. In the Console, navigate to Settings and set a monthly spending limit. The API will return errors once the limit is reached.

Do thinking tokens count against rate limits?

Yes. Extended thinking tokens count as output tokens for both billing and rate limit purposes.

Is there a difference between MTok pricing and per-token pricing?

No. MTok (million tokens) is just the unit. $15.00/MTok equals $0.000015 per token. The per-million convention avoids tiny decimal numbers.

How are images priced?

Images are converted to tokens based on their dimensions. A typical 1024x1024 image costs approximately 1,600 tokens at input rates.

Can I negotiate volume discounts?

For large-scale enterprise usage, contact Anthropic's sales team. Volume commitments can lead to custom pricing.

What happens if I exceed my rate limit?

You receive a 429 HTTP error. The response headers indicate when the limit resets. Implement retry logic to handle this gracefully.

How does Claude API pricing compare to running local models?

Local models (via Ollama, llama.cpp) have zero per-token cost but require GPU hardware ($2,000-$10,000+). For most developers, API access is cheaper unless you process millions of tokens daily. The breakeven point depends on your hardware cost, electricity, and usage volume.
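
A back-of-the-envelope break-even check makes the tradeoff concrete. Every number below is an illustrative assumption, not a measurement:

```python
hardware_usd = 5_000.0          # assumed GPU workstation cost
blended_usd_per_mtok = 6.0      # assumed mix of Sonnet 4 input ($3) and output ($15)

breakeven_mtok = hardware_usd / blended_usd_per_mtok
print(f"{breakeven_mtok:.0f} MTok of usage before the hardware pays for itself")
# 833 MTok, before counting electricity, maintenance, and quality differences
```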

Can I use multiple API keys for higher rate limits?

Each API key shares the organization-level rate limit. Multiple keys do not increase your total throughput. To get higher limits, upgrade your API tier by adding prepaid credits.


Built by Michael Lip — solo dev, Da Nang.