Fix Claude Code 429 Rate Limit with Retry-After
The Error
Error 429: Too Many Requests
{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Number of request tokens has exceeded your per-minute rate limit"
  }
}
# Headers include:
retry-after: 34
anthropic-ratelimit-requests-limit: 1000
anthropic-ratelimit-requests-remaining: 0
The Fix
1. Parse and respect the retry-after header
import time

import anthropic


def call_with_rate_limit(client, max_retries=5, **kwargs):
    """Call client.messages.create, sleeping for retry-after on each 429."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as e:
            # retry-after is given in seconds; fall back to 30s if the header is missing.
            # float() also accepts fractional values such as "1.5".
            retry_after = float(e.response.headers.get("retry-after", 30))
            print(f"Rate limited. Waiting {retry_after}s (attempt {attempt + 1})")
            time.sleep(retry_after)
    raise RuntimeError("Rate limit exceeded after max retries")
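When the retry-after header is missing or cannot be parsed as a number, a common fallback is exponential backoff with jitter. The following is a minimal sketch of that idea, not part of the Anthropic SDK: the helper name `wait_seconds` and the `base` and `cap` defaults are assumptions chosen for illustration.

```python
import random


def wait_seconds(headers, attempt, base=1.0, cap=60.0):
    """Return how many seconds to sleep before the next retry.

    `headers` is any mapping of header names to values. A plain dict is
    case-sensitive, so this sketch assumes lower-cased header names.
    """
    raw = headers.get("retry-after")
    if raw is not None:
        try:
            # Prefer the server's own hint when it is a plain number of seconds.
            return max(0.0, float(raw))
        except ValueError:
            pass  # not a number; fall through to backoff
    # Exponential backoff with full jitter: a random wait between 0 and
    # min(cap, base * 2**attempt), so parallel clients do not retry in lockstep.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Inside a retry loop this could replace the fixed default, for example `time.sleep(wait_seconds(e.response.headers, attempt))`.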
2. Check your current rate limit status
curl -s -D - https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-3-5-haiku-20241022","max_tokens":1,"messages":[{"role":"user","content":"hi"}]}' \
  2>&1 | grep -i -E "ratelimit|retry-after"
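The same check can be done from Python by filtering a response's headers. The sketch below works on any mapping of header names to values; the function name `ratelimit_headers` and the sample values are hypothetical. In the Python SDK the raw headers should be reachable through `client.messages.with_raw_response.create(...)`, but treat that as an assumption and confirm it against the SDK documentation.

```python
def ratelimit_headers(headers):
    """Return only the rate-limit related headers, with lower-cased names."""
    picked = {}
    for name, value in headers.items():
        key = name.lower()
        # Matches any header containing "ratelimit", plus retry-after itself.
        if "ratelimit" in key or key == "retry-after":
            picked[key] = value
    return picked


# Made-up sample headers, for illustration only.
sample = {
    "Content-Type": "application/json",
    "Retry-After": "34",
    "anthropic-ratelimit-requests-remaining": "0",
}
print(ratelimit_headers(sample))
```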
Why This Happens
Anthropic enforces per-minute rate limits on both request count and token count. The limits vary by plan tier (free, build, scale) and model. When you exceed either limit, the API returns 429 with a retry-after header indicating how many seconds to wait.
Common triggers: parallel requests from multiple Claude Code sessions, batch scripts without throttling, or a single large prompt that exceeds the tokens-per-minute budget.
If That Does Not Work
- Reduce parallelism: Run one Claude Code session at a time or add time.sleep(2) between requests
- Use the Batch API: It has separate, higher limits via client.messages.batches.create(...)
- Check your limits: Visit console.anthropic.com/settings/limits to see your exact per-minute limits
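A fixed delay between requests can be wrapped in a small helper so that a script only sleeps for the time it still owes. This is a sketch under assumptions: the class name `MinIntervalThrottle` is hypothetical, and the 2-second default simply mirrors the delay suggested in this article.

```python
import time


class MinIntervalThrottle:
    """Enforce a minimum gap between consecutive calls to wait()."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Sleep only for the part of the interval not yet elapsed.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a batch script, call `throttle.wait()` immediately before each API request; the first call returns at once and later calls block only as long as needed.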
Prevention
Always respect retry-after headers on 429 responses. Default to 2-second delays between API calls in batch scripts. Use a token counter to stay under per-minute token limits. Prefer the Batch API for workloads exceeding 50 requests per minute.
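One way to stay under a per-minute token limit is to track recent usage locally and wait when the next request would not fit. The sketch below uses a 60-second sliding window, which is only a local approximation and may not match how the API itself accounts for usage. The class name `TokenBudget` and its methods are hypothetical, and the token counts are assumed to come from whatever token counter you already use.

```python
import time
from collections import deque


class TokenBudget:
    """Track tokens spent in the last 60 seconds against a per-minute limit."""

    def __init__(self, tokens_per_minute, clock=time.monotonic):
        self.limit = tokens_per_minute
        self._clock = clock        # injectable for testing
        self._events = deque()     # (timestamp, tokens), oldest first
        self._used = 0

    def _expire(self, now):
        # Drop usage that is at least 60 seconds old.
        while self._events and now - self._events[0][0] >= 60.0:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def seconds_until_fits(self, tokens):
        """Return 0.0 if `tokens` fits now, else the wait until it will."""
        if tokens > self.limit:
            raise ValueError("request is larger than the whole per-minute budget")
        now = self._clock()
        self._expire(now)
        used = self._used
        if used + tokens <= self.limit:
            return 0.0
        # Walk the oldest entries until enough usage would have expired.
        for timestamp, spent in self._events:
            used -= spent
            if used + tokens <= self.limit:
                return timestamp + 60.0 - now
        return 0.0  # not reached: an empty window always fits

    def record(self, tokens):
        """Register tokens that were just sent."""
        now = self._clock()
        self._expire(now)
        self._events.append((now, tokens))
        self._used += tokens
```

Before each request, sleep for `budget.seconds_until_fits(n)` seconds and then call `budget.record(n)`, where `n` is the estimated token count of the request.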
Estimate your usage with our Cost Calculator to stay under rate limits.
Written by the ClaudHQ team · Expert Claude Code guides and tools