Fix Claude Code 429 Rate Limit with Retry-After

May 2026 · ClaudHQ

The Error

Error 429: Too Many Requests
{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Number of request tokens has exceeded your per-minute rate limit"
  }
}

# Headers include:
retry-after: 34
anthropic-ratelimit-requests-limit: 1000
anthropic-ratelimit-requests-remaining: 0

The Fix

1. Parse and respect the retry-after header

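import anthropic
import time

def call_with_rate_limit(client, max_retries=5, **kwargs):
    """Call client.messages.create, sleeping for retry-after on each 429."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                break  # out of attempts; do not sleep again before failing
            try:
                # retry-after is normally a number of seconds
                retry_after = float(e.response.headers.get("retry-after", 30))
            except ValueError:
                retry_after = 30.0  # fall back if the header is not numeric
            print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
            time.sleep(retry_after)
    raise RuntimeError("Rate limit exceeded after max retries")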
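HTTP allows Retry-After to be either a number of seconds or an absolute date, so a stricter parser can be useful if requests pass through proxies or gateways. The following is a minimal sketch, not part of the Anthropic SDK: the helper name parse_retry_after and its default and now parameters are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=30.0, now=None):
    """Return the seconds to wait for a Retry-After header value.

    Hypothetical helper: accepts delta-seconds ("34") or an HTTP date,
    and falls back to `default` when the header is missing or unparseable.
    """
    if value is None:
        return default
    value = value.strip()
    try:
        # Most common form: a number of seconds, e.g. "34" or "1.5".
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        # Alternative form: an absolute date such as
        # "Wed, 21 Oct 2015 07:28:00 GMT".
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if target.tzinfo is None:
        # Treat a date without zone information as UTC.
        target = target.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # Never return a negative wait for a date already in the past.
    return max(0.0, (target - now).total_seconds())
```

Inside the except block of a retry loop, this could replace the direct conversion, for example time.sleep(parse_retry_after(e.response.headers.get("retry-after"))).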

2. Check your current rate limit status

curl -s -D - https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-haiku-4-5-20251001","max_tokens":1,"messages":[{"role":"user","content":"hi"}]}' \
  2>&1 | grep -i "ratelimit"
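The same check can be done from Python by filtering the response headers. This is a rough sketch: rate_limit_headers is a hypothetical helper, and the usage shown in the trailing comment assumes the Anthropic Python SDK's with_raw_response accessor, which is not executed here.

```python
def rate_limit_headers(headers):
    """Pick out retry-after and any *ratelimit* headers, case-insensitively.

    `headers` is any mapping of header name to value, such as a dict or
    the headers object of an HTTP response.
    """
    picked = {}
    for name, value in headers.items():
        lower = name.lower()
        if lower == "retry-after" or "ratelimit" in lower:
            picked[lower] = value
    return picked


# Assumed usage with the Anthropic Python SDK (requires a real API key):
#   raw = client.messages.with_raw_response.create(
#       model=..., max_tokens=1,
#       messages=[{"role": "user", "content": "hi"}],
#   )
#   print(rate_limit_headers(raw.headers))
```

Matching on the substring "ratelimit" rather than a fixed prefix keeps the helper working even if the exact header names differ from what you expect.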

Why This Happens

Anthropic enforces per-minute rate limits on both request count and token count. The limits vary by plan tier (free, build, scale) and model. When you exceed either limit, the API returns 429 with a retry-after header indicating how many seconds to wait.

Common triggers: parallel requests from multiple Claude Code sessions, batch scripts without throttling, or a single large prompt that exceeds the tokens-per-minute budget.

If That Does Not Work

Reduce parallelism: Run one Claude Code session at a time or add time.sleep(2) between requests
Use the Batch API: It has separate, higher limits via client.messages.batches.create(...)
Check your limits: Visit console.anthropic.com/settings/limits to see your exact per-minute limits
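For the Batch API option, each request in a batch is an entry with a custom_id and a params object that mirrors a normal Messages call. Below is a minimal sketch of building that list. The function name build_batch_requests and the "req-N" id scheme are assumptions for illustration, and the submission call in the comment is not executed here.

```python
def build_batch_requests(prompts, model, max_tokens=1024):
    """Turn a list of prompt strings into Message Batches request entries.

    Hypothetical helper: each entry gets a custom_id ("req-0", "req-1", ...)
    so results can be matched back to the prompt that produced them.
    """
    return [
        {
            "custom_id": f"req-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


# Assumed usage with the Anthropic Python SDK (requires a real API key):
#   batch = client.messages.batches.create(
#       requests=build_batch_requests(prompts, model="<your model id>")
#   )
#   print(batch.id)
```

Batches are processed asynchronously, so this suits work that does not need an immediate reply.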

Prevention

Always respect retry-after headers on 429 responses. Default to 2-second delays between API calls in batch scripts. Use a token counter to stay under per-minute token limits. Prefer the Batch API for workloads exceeding 50 requests per minute.
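One way to stay under both per-minute limits in a batch script is to track recent requests and tokens on the client side. The sketch below is an assumption-laden illustration: MinuteBudget is a hypothetical class, it uses a simple sliding window, and the server's own accounting may differ, so a 429 can still occur and retry-after should still be honored.

```python
import time
from collections import deque


class MinuteBudget:
    """Client-side sliding-window budget for requests and tokens."""

    def __init__(self, max_requests, max_tokens, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) per recorded request

    def _expire(self, now):
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def wait_time(self, tokens):
        """Seconds to wait before a request of `tokens` fits the budget."""
        if tokens > self.max_tokens:
            raise ValueError("single request exceeds the per-minute token budget")
        now = self.clock()
        self._expire(now)
        used_requests = len(self.events)
        used_tokens = sum(spent for _, spent in self.events)
        wait = 0.0
        for stamp, spent in self.events:
            if used_requests < self.max_requests and used_tokens + tokens <= self.max_tokens:
                break
            # Not enough room yet: wait until this event leaves the window.
            used_requests -= 1
            used_tokens -= spent
            wait = stamp + self.window - now
        return max(0.0, wait)

    def record(self, tokens):
        """Record a request that was just sent."""
        self.events.append((self.clock(), tokens))
```

A script would call time.sleep(budget.wait_time(n)) before each request and budget.record(n) after sending it, where n is a token estimate for that request obtained from whatever token counter you use.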

Written by the ClaudHQ team · Expert Claude Code guides and tools