Claude API Pricing Complete Guide (2026)

Anthropic offers Claude through consumer subscriptions (Free, Pro, Max) and developer API access with per-token billing. The pricing structure has evolved significantly through 2025 and 2026, with the introduction of new model tiers, batch processing discounts, and prompt caching.

This guide breaks down every pricing dimension so you can calculate your actual costs before committing to a plan or API integration.

Consumer Plans Overview

These plans give you access to Claude through claude.ai and mobile apps. They do not provide API access.

Free Tier

$0/month. Limited daily usage on claude.ai and the mobile apps.

Pro Plan

$20/month. Substantially higher usage limits than Free, plus access to Claude Code at Pro-level limits.

Max Plan (5x)

$100/month. Roughly 5x Pro usage limits.

Max Plan (20x)

$200/month. Roughly 20x Pro usage limits.

API Pricing by Model

API pricing is per-token, measured separately for input and output tokens. All prices are per million tokens (MTok).

Claude Opus 4 (claude-opus-4-20250514)

Anthropic's most capable model. Best for complex reasoning, multi-step analysis, and agentic workflows.

| Metric | Value |
| --- | --- |
| Input tokens | $15.00 / MTok |
| Output tokens | $75.00 / MTok |
| Context window | 200K tokens |
| Max output | 32K tokens |
| Training data cutoff | Early 2025 |

Cost example: A 2,000-token prompt with a 1,000-token response costs (2,000 × $15.00 + 1,000 × $75.00) / 1,000,000 = $0.105.

Claude Sonnet 4 (claude-sonnet-4-20250514)

Balanced model for most production workloads. Strong coding, analysis, and writing capabilities at a lower price point than Opus.

| Metric | Value |
| --- | --- |
| Input tokens | $3.00 / MTok |
| Output tokens | $15.00 / MTok |
| Context window | 200K tokens |
| Max output | 16K tokens |
| Training data cutoff | Early 2025 |

Cost example: The same 2,000-token prompt with a 1,000-token response costs (2,000 × $3.00 + 1,000 × $15.00) / 1,000,000 = $0.021.

Claude Haiku 3.5 (claude-3-5-haiku-20241022)

Fastest model. Best for high-volume, latency-sensitive tasks like classification, extraction, and simple Q&A.

| Metric | Value |
| --- | --- |
| Input tokens | $0.80 / MTok |
| Output tokens | $4.00 / MTok |
| Context window | 200K tokens |
| Max output | 8K tokens |
| Training data cutoff | Early 2024 |

Cost example: The same prompt costs (2,000 × $0.80 + 1,000 × $4.00) / 1,000,000 = $0.0056.

Model Comparison Table

| Model | Input/MTok | Output/MTok | Speed | Best For |
| --- | --- | --- | --- | --- |
| Opus 4 | $15.00 | $75.00 | Slowest | Complex reasoning, agentic tasks |
| Sonnet 4 | $3.00 | $15.00 | Medium | General production, coding |
| Haiku 3.5 | $0.80 | $4.00 | Fastest | High volume, classification |
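
The per-request cost for any row in this table can be computed directly. A minimal sketch in Python (prices hard-coded from this guide; verify against Anthropic's current pricing page):

```python
# Per-MTok prices from the comparison table in this guide (USD).
PRICES = {
    "opus-4":    {"input": 15.00, "output": 75.00},
    "sonnet-4":  {"input": 3.00,  "output": 15.00},
    "haiku-3.5": {"input": 0.80,  "output": 4.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The guide's running example: 2,000 input tokens, 1,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f}")
# opus-4: $0.1050, sonnet-4: $0.0210, haiku-3.5: $0.0056
```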

Prompt Caching

Prompt caching reduces costs when you send the same prompt prefix repeatedly. This is critical for applications that include large system prompts, CLAUDE.md files, or static context in every request.

How Prompt Caching Works

  1. First request: all tokens are billed at full price, and tokens written to the cache incur an additional 25% write surcharge.
  2. Subsequent requests: Cached portion is charged at a reduced read rate.
  3. Cache lifetime: 5 minutes from last use (extends with each hit).

Prompt Caching Prices

| Model | Cache Write (per MTok) | Cache Read (per MTok) | Savings vs Input |
| --- | --- | --- | --- |
| Opus 4 | $18.75 (+25%) | $1.50 | 90% on reads |
| Sonnet 4 | $3.75 (+25%) | $0.30 | 90% on reads |
| Haiku 3.5 | $1.00 (+25%) | $0.08 | 90% on reads |

Cache Savings Example

Imagine a chatbot with a 4,000-token system prompt using Sonnet 4:

Without caching (100 requests): 100 × 4,000 tokens × $3.00 / MTok = $1.20 spent on the system prompt alone.

With caching (100 requests): one cache write (4,000 × $3.75 / MTok = $0.015) plus 99 cache reads (99 × 4,000 × $0.30 / MTok ≈ $0.119) comes to about $0.134 in total, roughly 89% cheaper.
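
The savings can be computed directly from the caching table. A runnable sketch using the Sonnet 4 rates (numbers from this guide):

```python
# Sonnet 4 rates in USD per MTok, from the prompt caching table.
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30

system_tokens, requests = 4_000, 100

without_cache = requests * system_tokens * INPUT / 1e6
with_cache = (system_tokens * CACHE_WRITE              # first request writes the cache
              + (requests - 1) * system_tokens * CACHE_READ) / 1e6

savings = 1 - with_cache / without_cache
print(f"without: ${without_cache:.2f}  with: ${with_cache:.4f}  saved: {savings:.0%}")
# without: $1.20  with: $0.1338  saved: 89%
```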

Cache Requirements

Prompts must meet a minimum length to be cached: 1,024 tokens for Opus and Sonnet models, 2,048 tokens for Haiku. You can mark up to four cache breakpoints per request with cache_control, and only exact prefix matches count as cache hits.

Batch API

The Batch API processes requests asynchronously at a 50% discount. Results are returned within 24 hours.

Batch API Prices

| Model | Input/MTok | Output/MTok | Discount |
| --- | --- | --- | --- |
| Opus 4 | $7.50 | $37.50 | 50% |
| Sonnet 4 | $1.50 | $7.50 | 50% |
| Haiku 3.5 | $0.40 | $2.00 | 50% |
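
Submitting a batch through the Python SDK looks roughly like the sketch below. The build_batch helper and the custom_id scheme are illustrative; check the SDK's Message Batches documentation for the exact interface.

```python
def build_batch(prompts, model="claude-sonnet-4-20250514", max_tokens=1024):
    """Assemble the request list expected by the Message Batches endpoint."""
    return [
        {
            "custom_id": f"req-{i}",  # your key for matching results later
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": p}],
            },
        }
        for i, p in enumerate(prompts)
    ]

requests = build_batch(["Summarize document A", "Summarize document B"])

# Submitting requires an API key; results arrive within 24 hours:
# import anthropic
# client = anthropic.Anthropic()
# batch = client.messages.batches.create(requests=requests)
# print(batch.id, batch.processing_status)
```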

When to Use Batch API

Good fits include bulk classification, dataset labeling, evaluation runs, and content-generation pipelines. In short: any workload where results are consumed later rather than shown to a waiting user.

Batch API Limitations

Batch requests cannot stream, and results can take up to 24 hours, so interactive use cases are out. Plan for asynchronous retrieval and idempotent processing of results.

Extended Thinking Pricing

Extended thinking allows Claude to "think" before responding, improving performance on complex tasks. Thinking tokens are billed as output tokens.

| Model | Thinking Token Price | Same As |
| --- | --- | --- |
| Opus 4 | $75.00 / MTok | Output rate |
| Sonnet 4 | $15.00 / MTok | Output rate |

Important: Extended thinking can generate thousands of thinking tokens before the visible response. A request that appears to have a 500-token response may actually consume 5,000+ tokens when thinking tokens are included.

Controlling Extended Thinking Costs

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000  # Cap thinking tokens
    },
    messages=[{"role": "user", "content": "Your prompt"}]
)

Setting budget_tokens caps the thinking cost. Without it, Claude may use the full output token limit for thinking.

Rate Limits

Rate limits vary by plan tier and are measured in requests per minute (RPM) and tokens per minute (TPM).

API Rate Limits by Tier

| Tier | RPM | Input TPM | Output TPM | Spend Requirement |
| --- | --- | --- | --- | --- |
| Tier 1 (Free) | 50 | 40,000 | 8,000 | $0 |
| Tier 2 | 1,000 | 80,000 | 16,000 | $40 credit |
| Tier 3 | 2,000 | 160,000 | 32,000 | $200 credit |
| Tier 4 | 4,000 | 400,000 | 80,000 | $400+ spend |

Rate limit headers are included in every API response:

x-ratelimit-limit-requests: 1000
x-ratelimit-limit-tokens: 80000
x-ratelimit-remaining-requests: 999
x-ratelimit-remaining-tokens: 79000
x-ratelimit-reset-requests: 2026-04-24T12:00:01Z
x-ratelimit-reset-tokens: 2026-04-24T12:00:01Z
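
These headers allow proactive throttling instead of waiting for a 429. A minimal sketch (header names taken from the listing above; the token threshold is an arbitrary assumption):

```python
def seconds_to_wait(headers: dict, min_tokens: int = 2_000) -> float:
    """Return a pause in seconds when the remaining token budget runs low."""
    remaining = int(headers.get("x-ratelimit-remaining-tokens", min_tokens))
    if remaining >= min_tokens:
        return 0.0
    # A fixed conservative pause; sleeping until x-ratelimit-reset-tokens
    # would be more precise.
    return 1.0

print(seconds_to_wait({"x-ratelimit-remaining-tokens": "79000"}))  # 0.0
print(seconds_to_wait({"x-ratelimit-remaining-tokens": "500"}))    # 1.0
```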

Handling Rate Limits

When you hit a rate limit, the API returns a 429 error. Best practices:

  1. Implement exponential backoff with jitter
  2. Track remaining tokens from response headers
  3. Use request queuing for high-volume applications
  4. Consider the Batch API for non-real-time workloads

import time
import random

import anthropic

def call_with_backoff(func, max_retries=5):
    """Retry func on rate-limit errors with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            return func()
        except anthropic.RateLimitError:
            wait = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait)
    raise Exception("Max retries exceeded")

Claude Code Pricing

Claude Code uses the API under the hood. When you use Claude Code with your own API key, you pay standard API rates for whichever model you select.

Claude Code with Subscription Plans

Claude Code with API Key

When using ANTHROPIC_API_KEY, you pay per-token at standard API rates. A typical Claude Code session costs:

Track your Claude Code costs with token usage auditing.

Claude Code with OpenRouter

You can route Claude Code through OpenRouter for potentially different pricing and access to multiple model providers through a single API key.

Cost Optimization Strategies

1. Choose the Right Model

Use Haiku 3.5 for simple tasks, Sonnet 4 for most work, and Opus 4 only when you need maximum quality. A common pattern:

# Route by task complexity
def get_model(task_type):
    if task_type == "classification":
        return "claude-3-5-haiku-20241022"  # $0.80/MTok input
    elif task_type == "coding":
        return "claude-sonnet-4-20250514"   # $3.00/MTok input
    elif task_type == "architecture":
        return "claude-opus-4-20250514"     # $15.00/MTok input
    return "claude-sonnet-4-20250514"       # sensible default

2. Maximize Prompt Caching

Structure your prompts with static content first:

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": large_static_system_prompt,  # Cached
                "cache_control": {"type": "ephemeral"}
            },
            {
                "type": "text",
                "text": dynamic_user_query  # Not cached
            }
        ]
    }
]

3. Use Batch API for Non-Urgent Work

If your workload can tolerate 24-hour latency, the 50% batch discount is substantial:

| Monthly Volume | Real-time (Sonnet) | Batch (Sonnet) | Savings |
| --- | --- | --- | --- |
| 10M input tokens | $30.00 | $15.00 | $15.00 |
| 10M output tokens | $150.00 | $75.00 | $75.00 |
| Total | $180.00 | $90.00 | $90.00 |

4. Optimize Token Usage

Trim what you send: summarize or truncate old conversation turns, keep system prompts and tool definitions lean, and avoid resending large documents that could be cached instead.

5. Monitor and Alert

Set up cost alerts to catch unexpected usage spikes before they become expensive.
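
A minimal in-process tracker is enough to start. The sketch below assumes the SDK response exposes usage.input_tokens and usage.output_tokens, and hard-codes the Sonnet 4 rates from this guide:

```python
class CostTracker:
    INPUT, OUTPUT = 3.00, 15.00  # Sonnet 4, USD per MTok

    def __init__(self, alert_at_usd: float):
        self.alert_at_usd = alert_at_usd
        self.total_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Accumulate one request's cost; True once the alert threshold is crossed."""
        self.total_usd += (input_tokens * self.INPUT
                           + output_tokens * self.OUTPUT) / 1e6
        return self.total_usd >= self.alert_at_usd

tracker = CostTracker(alert_at_usd=10.0)
tracker.record(2_000, 1_000)          # feed it response.usage after each call
print(f"${tracker.total_usd:.3f}")    # $0.021
```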

Comparing Claude API to Competitors

Claude vs OpenAI GPT-4o

| Metric | Claude Sonnet 4 | GPT-4o |
| --- | --- | --- |
| Input/MTok | $3.00 | $2.50 |
| Output/MTok | $15.00 | $10.00 |
| Context window | 200K | 128K |
| Batch discount | 50% | 50% |
| Caching discount | 90% on reads | 50% on reads |

Claude vs Google Gemini 2.5

| Metric | Claude Sonnet 4 | Gemini 2.5 Pro |
| --- | --- | --- |
| Input/MTok | $3.00 | $1.25 |
| Output/MTok | $15.00 | $10.00 |
| Context window | 200K | 1M |
| Batch discount | 50% | None |
| Caching | 90% on reads | Free after 128K |

When Claude Is More Cost-Effective

Based on the tables above: when prompt caching applies (a 90% read discount versus GPT-4o's 50%), when batch and caching discounts stack, or when the 200K context window avoids the chunking overhead a 128K window would force.

Token Counting: What Costs Money

Understanding what counts as tokens is critical for cost prediction. Many developers are surprised by the actual token counts in their API calls.

Text Tokens

English text averages 4 characters per token, so a 400-character paragraph is roughly 100 tokens.

Code is less token-efficient than prose because of syntax characters and whitespace; expect closer to 3 characters per token.
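
The characters-per-token rule is enough for rough budgeting. The divisor for code is an assumption here; for exact counts, recent Python SDK versions expose a token counting endpoint (client.messages.count_tokens).

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; use ~3 chars/token for code-heavy text."""
    return round(len(text) / chars_per_token)

paragraph = "word " * 80            # 400 characters of prose-like text
print(estimate_tokens(paragraph))   # 100
```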

System Prompt Tokens

If your application sets a system prompt, it is sent and billed as input tokens on every single request.

For an application with a 2,000-token system prompt making 1,000 API calls per day on Sonnet 4, that is 2M input tokens per day on the system prompt alone: 2 MTok × $3.00 = $6.00/day, or about $180/month.

This is why prompt caching is critical for production applications.

Conversation History Tokens

In multi-turn conversations, the entire history is sent with each new message. In a conversation of 10 turns averaging 500 tokens each, turn 1 sends 500 input tokens, turn 2 sends 1,000, and turn 10 sends 5,000.

Total input tokens for 10 turns: 27,500 (not 5,000).

This quadratic growth is why long conversations get expensive fast, and why context management matters.
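
The growth is easy to verify: if turn n resends all prior turns plus its own message, total input across the conversation is the sum 1 + 2 + ... + n times the per-turn size.

```python
def total_input_tokens(turns: int, tokens_per_turn: int = 500) -> int:
    """Total input tokens when each turn resends the full history."""
    # Turn n sends n * tokens_per_turn of input (history + new message).
    return sum(n * tokens_per_turn for n in range(1, turns + 1))

print(total_input_tokens(10))   # 27500, matching the example above
print(total_input_tokens(20))   # 105000: doubling the turns ~4x the cost
```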

Tool Use Tokens

When you define tools for Claude, the tool definitions are included as input tokens on every request. Each definition costs roughly 200-500 tokens, so defining 10 tools adds roughly 2,000-5,000 input tokens to every single call.

When Claude calls a tool, the tool-call JSON is billed as output tokens, and the tool result you send back is billed as input tokens.

Image Tokens

Images are converted to tokens based on resolution:

| Image Size | Approximate Tokens |
| --- | --- |
| 100x100 | ~100 tokens |
| 512x512 | ~800 tokens |
| 1024x1024 | ~1,600 tokens |
| 2048x2048 | ~3,200 tokens |
| Full screenshot (1920x1080) | ~2,400 tokens |

For image-heavy applications, these token costs add up quickly.

PDF Tokens

PDF pages are converted to a combination of text and image tokens. A typical document page costs 1,000-3,000 tokens depending on content complexity.

Cost Calculator Examples

Example 1: Customer Support Chatbot

Without caching:

With prompt caching (system prompt cached):

Example 2: Code Review Automation

Real-time:

Batch API (50% discount):

Example 3: Document Analysis Pipeline

Same pipeline with Sonnet 4 would cost $4.50/day ($135/month). Model choice matters.

Billing and Payment

API Billing

API usage is prepaid: you buy credits in the Anthropic Console and usage draws them down, with optional auto-reload when your balance runs low.

Subscription Billing

Consumer plans (Pro, Max) bill monthly through claude.ai and are entirely separate from API credits; a subscription does not include API usage.

Frequently Asked Questions

How do I check my current API spending?

Visit the Anthropic Console at console.anthropic.com. The Usage page shows real-time token counts and costs broken down by model and day.

Are there free API credits for new accounts?

Anthropic occasionally offers trial credits. Check the Console for current promotions. The free tier gives limited access without a credit card.

Can I set a spending limit?

Yes. In the Console, navigate to Settings and set a monthly spending limit. The API will return errors once the limit is reached.

Do thinking tokens count against rate limits?

Yes. Extended thinking tokens count as output tokens for both billing and rate limit purposes.

Is there a difference between MTok pricing and per-token pricing?

No. MTok (million tokens) is just the unit. $15.00/MTok equals $0.000015 per token. The per-million convention avoids tiny decimal numbers.

How are images priced?

Images are converted to tokens based on their dimensions. A typical 1024x1024 image costs approximately 1,600 tokens at input rates.

Can I negotiate volume discounts?

For large-scale enterprise usage, contact Anthropic's sales team. Volume commitments can lead to custom pricing.

What happens if I exceed my rate limit?

You receive a 429 HTTP error. The response headers indicate when the limit resets. Implement retry logic to handle this gracefully.

How does Claude API pricing compare to running local models?

Local models (via Ollama, llama.cpp) have zero per-token cost but require GPU hardware ($2,000-$10,000+). For most developers, API access is cheaper unless you process millions of tokens daily. The breakeven point depends on your hardware cost, electricity, and usage volume.
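
A back-of-the-envelope break-even check makes the tradeoff concrete. Every number below is an illustrative assumption, not a measurement:

```python
hardware_usd = 5_000.0          # assumed GPU workstation cost
blended_usd_per_mtok = 6.0      # assumed mix of Sonnet 4 input ($3) and output ($15)

breakeven_mtok = hardware_usd / blended_usd_per_mtok
print(f"{breakeven_mtok:.0f} MTok of usage before the hardware pays for itself")
# 833 MTok, before counting electricity, maintenance, and quality differences
```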

Can I use multiple API keys for higher rate limits?

Each API key shares the organization-level rate limit. Multiple keys do not increase your total throughput. To get higher limits, upgrade your API tier by adding prepaid credits.


Built by Michael Lip — solo dev, Da Nang.