Claude API Timeout Errors: Handling and Retry Guide
When you integrate the Claude API into a production application, timeout errors are inevitable. Network interruptions, high-latency regions, large prompts, and long-running completions all create situations where API calls exceed their time budget. The difference between an application that handles these failures gracefully and one that surfaces raw errors to users comes down to timeout configuration and retry logic.
I have built several production services that rely on the Claude API, and timeout handling is consistently one of the first things I get right in the architecture. This guide covers the practical patterns I use.
Why Claude API Calls Time Out
API timeouts fall into three categories, and each requires a different response strategy.
Connection Timeouts
The client cannot establish a TCP connection to the Anthropic API endpoint. Causes include DNS resolution failures, network partitions, firewall rules blocking outbound HTTPS, or the API endpoint being temporarily unreachable. Connection timeouts are fast (typically under 10 seconds) and almost always transient.
Read Timeouts
The connection is established, the request is sent, but the response takes too long to arrive. This is the most common timeout type with the Claude API because response generation time scales with output length. A request that produces 4,000 tokens takes meaningfully longer than one producing 200 tokens. Read timeouts are the ones you need to tune most carefully.
Overall Request Timeouts
Some HTTP client libraries enforce a single timeout covering the entire request lifecycle, from DNS lookup through the last byte of the response body. If your client's default is 30 seconds and the Claude API needs 45 seconds for a complex generation, the client kills the connection before the response completes.
Setting Timeout Values in the Anthropic SDK
Python
The official Anthropic Python SDK accepts a timeout parameter that controls the overall request timeout:
import anthropic
client = anthropic.Anthropic(
timeout=120.0, # seconds
)
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[{"role": "user", "content": "Analyze this codebase..."}]
)
For more granular control, you can use httpx.Timeout directly:
import httpx
import anthropic
client = anthropic.Anthropic(
timeout=httpx.Timeout(
connect=10.0, # time to establish connection
read=120.0, # time to receive response
write=30.0, # time to send request
pool=10.0 # time to acquire connection from pool
)
)
I recommend setting the read timeout to at least 120 seconds for any request where max_tokens exceeds 2,000. Complex analysis prompts with large context windows can legitimately take 60-90 seconds.
TypeScript/Node.js
The Anthropic TypeScript SDK follows a similar pattern:
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
timeout: 120 * 1000, // milliseconds
});
const response = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [{ role: "user", content: "Analyze this codebase..." }],
});
For per-request timeout overrides:
const response = await client.messages.create(
{
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [{ role: "user", content: prompt }],
},
{ timeout: 180 * 1000 }
);
Per-request overrides are valuable when your application makes both fast (summarization) and slow (code generation) API calls. Set a conservative default and override for known-slow operations.
Implementing Retry Logic with Exponential Backoff
Not every timeout should be retried. Connection timeouts are almost always safe to retry. Read timeouts on non-streaming requests are safe because the API is stateless. But retrying after a partial streaming response requires more care to avoid duplicate processing.
Python Implementation
import time
import anthropic
from anthropic import APITimeoutError, APIConnectionError
def call_with_retry(client, messages, max_retries=3, base_delay=2.0):
for attempt in range(max_retries + 1):
try:
return client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=messages,
)
except APITimeoutError as e:
if attempt == max_retries:
raise
delay = base_delay * (2 ** attempt) # 2s, 4s, 8s
print(f"Timeout on attempt {attempt + 1}, retrying in {delay}s")
time.sleep(delay)
except APIConnectionError as e:
if attempt == max_retries:
raise
delay = base_delay * (2 ** attempt)
print(f"Connection error on attempt {attempt + 1}, retrying in {delay}s")
time.sleep(delay)
The exponential backoff pattern (2 seconds, 4 seconds, 8 seconds) avoids hammering the API during temporary issues. I add jitter in production to prevent thundering herd problems when multiple instances retry simultaneously:
import random
delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
TypeScript Implementation
import Anthropic from "@anthropic-ai/sdk";
async function callWithRetry(
client: Anthropic,
messages: Anthropic.MessageParam[],
maxRetries = 3,
baseDelay = 2000
): Promise<Anthropic.Message> {
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
return await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages,
});
} catch (err) {
if (
attempt === maxRetries ||
!(err instanceof Anthropic.APIConnectionTimeoutError ||
err instanceof Anthropic.APIConnectionError)
) {
throw err;
}
const delay = baseDelay * Math.pow(2, attempt) + Math.random() * 1000;
console.log(`Attempt ${attempt + 1} failed, retrying in ${delay}ms`);
await new Promise((resolve) => setTimeout(resolve, delay));
}
}
throw new Error("Exhausted retries");
}
Handling Timeouts with Streaming Responses
Streaming significantly reduces perceived timeout risk because you receive tokens as they are generated. However, streaming introduces a different failure mode: the stream can stall mid-response.
Python Streaming with Timeout Recovery
import anthropic
client = anthropic.Anthropic(timeout=120.0)
collected_text = ""
try:
with client.messages.stream(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[{"role": "user", "content": prompt}],
) as stream:
for text in stream.text_stream:
collected_text += text
print(text, end="", flush=True)
except anthropic.APITimeoutError:
if collected_text:
print(f"\nStream timed out after receiving {len(collected_text)} chars")
# You have partial content - decide whether to use it or retry
else:
print("Stream timed out before any content was received")
# Safe to retry the full request
When a stream stalls, you have partial output. Whether you use that partial output or retry depends on your application. For user-facing chat, displaying what you received plus an error message is usually better than silently retrying and making the user wait again.
Production Patterns for Timeout Management
Circuit Breaker Pattern
If the Claude API is consistently timing out, continuing to send requests wastes resources and degrades user experience. A circuit breaker stops making calls after repeated failures and periodically tests whether the service has recovered:
class CircuitBreaker:
def __init__(self, failure_threshold=5, recovery_timeout=60):
self.failure_count = 0
self.failure_threshold = failure_threshold
self.recovery_timeout = recovery_timeout
self.last_failure_time = 0
self.state = "closed" # closed = healthy, open = failing
def can_execute(self):
if self.state == "closed":
return True
if time.time() - self.last_failure_time > self.recovery_timeout:
self.state = "half-open"
return True
return False
def record_success(self):
self.failure_count = 0
self.state = "closed"
def record_failure(self):
self.failure_count += 1
self.last_failure_time = time.time()
if self.failure_count >= self.failure_threshold:
self.state = "open"
This pairs well with rate limit handling strategies you should already have in place. Rate limit errors (429) and timeout errors often co-occur during API degradation events.
Timeout Budget Allocation
When your application makes multiple sequential API calls (for example, a chain where one response feeds into the next prompt), allocate a total time budget and reduce the timeout for each subsequent call:
import time
total_budget = 300 # 5 minutes for the entire chain
start_time = time.time()
for step in chain_steps:
elapsed = time.time() - start_time
remaining = total_budget - elapsed
if remaining < 30:
raise TimeoutError("Insufficient budget for remaining steps")
step_timeout = min(remaining * 0.5, 120)
client_with_timeout = anthropic.Anthropic(timeout=step_timeout)
result = call_with_retry(client_with_timeout, step.messages)
This prevents a single slow step from consuming the entire budget and leaving no time for later steps. The timeout budget workflow tutorial covers similar patterns in the context of Claude Code skill execution.
Choosing the Right Timeout Value
There is no universal correct timeout. The right value depends on your use case:
For interactive chat (user waiting for response): 60-90 seconds. Users will not wait longer than this. If the request cannot complete in this window, scope down the prompt.
For background processing (async jobs, batch operations): 120-300 seconds. Latency matters less, so give the API room to complete complex generations.
For real-time integrations (webhooks, middleware): 15-30 seconds. The upstream caller likely has its own timeout. Keep your API call well within that budget.
For agentic loops with tool use: 120-180 seconds per turn. Tool-using conversations involve multiple round trips within a single API call, each consuming time. See the main timeout guide for strategies on scoping agentic tasks.
Monitoring and Alerting
In production, track these metrics for Claude API calls:
- p50, p95, and p99 latency per endpoint
- Timeout rate as a percentage of total requests
- Retry success rate (what percentage of retries eventually succeed)
- Circuit breaker state transitions
A sudden spike in timeout rate often indicates an API-side issue. A gradual increase over weeks may indicate that your prompts are growing in size or complexity. Either way, these metrics give you the signal to act before users notice.
Summary
Handling Claude API timeouts reliably requires three layers:
- Correct timeout configuration in your SDK client, with separate values for connection and read timeouts
- Retry logic with exponential backoff and jitter for transient failures
- Application-level patterns (circuit breakers, timeout budgets, streaming) for sustained reliability
The defaults in most HTTP clients are too short for Claude API calls that generate long outputs. Set explicit timeouts, implement retries, and monitor the results.
Know your costs: Use our Claude Code Cost Calculator to estimate your monthly spend.
Related Reading
- Claude Code Skill Timeout Error: How to Increase the Limit — Comprehensive guide to all timeout causes and workarounds in Claude Code