Claude Code API Gateway Configuration (2026)
Claude Code API Gateway Configuration Guide
API gateways serve as the critical entry point for securing, managing, and routing requests to your Claude Code integrations. Whether you're building a multi-tenant SaaS platform or integrating Claude Code into an enterprise workflow, proper gateway configuration ensures reliable performance, security, and observability. This guide covers practical configuration patterns for developers and power users working with Claude Code behind API gateways.
Understanding the Gateway Role with Claude Code
When you expose Claude Code capabilities through an API gateway, you gain several advantages: centralized authentication, rate limiting, request transformation, and detailed analytics. The gateway sits between your clients and the Claude Code execution environment, handling cross-cutting concerns so your core application logic remains clean.
Claude Code operates through a command-line interface that processes prompts and returns responses. When wrapping this capability behind a gateway, you essentially create a RESTful or GraphQL facade that translates HTTP requests into Claude Code invocations. This pattern works well whether you're using the supermemory skill for contextual memory management or integrating with external services.
Why a Gateway Matters for AI Workloads
Claude Code integrations have characteristics that differ from typical API backends and that make gateway configuration more important:
- Variable response times: A code review task may complete in 2 seconds; a full codebase analysis may take 3 minutes. Your gateway must accommodate this range without timing out legitimate requests.
- Token-based cost model: Unlike traditional APIs where requests cost roughly the same, Claude Code requests vary enormously in cost based on prompt and response size. Rate limiting by request count alone is insufficient. you often need token-aware throttling.
- Stateful context: Many Claude Code workflows maintain context across multiple requests. Gateway session handling must preserve this continuity.
- Large payloads: Code files, context windows, and structured responses can produce payloads far larger than typical REST responses. Compression and buffering configuration matters significantly.
Choosing the Right Gateway for Your Use Case
Before diving into configuration, choosing the right gateway tool saves significant rework. Here is a comparison of the most common options:
| Gateway | Best For | Rate Limiting | Auth Options | Cost |
|---|---|---|---|---|
| Nginx | Simple reverse proxy, high performance | Basic (req/s) | JWT module | Free |
| Kong Gateway | Plugin ecosystem, enterprise features | Advanced (tiered) | OAuth, LDAP, JWT | Free/Enterprise |
| AWS API Gateway | AWS-native deployments, serverless | Per-method, usage plans | IAM, Cognito, Lambda | Pay-per-request |
| Traefik | Kubernetes-native, dynamic config | Middleware-based | Forward Auth | Free/Enterprise |
| Caddy | Simple HTTPS, automatic certs | Basic via plugins | JWT plugins | Free |
| Envoy | Service mesh, advanced observability | Token bucket, global rate limit | External auth | Free |
For most development teams getting started, Nginx covers 80% of requirements with minimal complexity. Kong is the right choice when you need plugin-based extensibility without writing code. AWS API Gateway makes sense when your Claude Code integration runs on AWS Lambda or ECS and you want native IAM integration.
Basic Gateway Configuration Patterns
Nginx Configuration
Nginx provides a straightforward reverse proxy setup for Claude Code endpoints. Here's a practical configuration:
server {
listen 443 ssl http2;
server_name api.yourdomain.com;
ssl_certificate /etc/ssl/certs/yourcert.pem;
ssl_certificate_key /etc/ssl/private/yourkey.key;
location /claude/ {
proxy_pass http://localhost:8080;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeout configuration for long-running Claude invocations
proxy_read_timeout 300s;
proxy_connect_timeout 75s;
}
# Rate limiting zone
limit_req_zone $binary_remote_addr zone=claude_api:10m rate=10r/s;
location /claude/invoke {
limit_req zone=claude_api burst=20 nodelay;
proxy_pass http://localhost:8080;
}
}
This configuration handles SSL termination, preserves client identity through headers, and implements rate limiting to prevent abuse. Adjust the rate and burst values based on your expected traffic patterns.
For production deployments, extend this base with buffer tuning to handle large code payloads:
location /claude/ {
proxy_pass http://localhost:8080;
proxy_http_version 1.1;
# Buffer configuration for large code payloads
proxy_buffer_size 128k;
proxy_buffers 4 256k;
proxy_busy_buffers_size 256k;
client_max_body_size 10m;
# Timeouts tuned for AI workloads
proxy_read_timeout 300s;
proxy_connect_timeout 75s;
proxy_send_timeout 300s;
send_timeout 300s;
# Headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Request-ID $request_id;
}
The X-Request-ID header is particularly valuable: it creates a traceable identifier that propagates through your entire stack, making it possible to correlate gateway logs, application logs, and Claude Code execution records for a single request.
Kong Gateway Integration
For more advanced requirements, Kong Gateway offers plugin-based extensibility. Install the Claude Code service and route:
Register Claude Code service
curl -X POST http://localhost:8001/services \
-d "name=claude-code-service" \
-d "url=http://localhost:8080"
Add route
curl -X POST http://localhost:8001/services/claude-code-service/routes \
-d "name=claude-code-route" \
-d "paths[]=/claude" \
-d "methods[]=POST"
Kong's plugin ecosystem allows you to add authentication, rate limiting, request transformation, and analytics without modifying your Claude Code implementation. The pdf skill, for instance, can generate reports from gateway analytics data.
To configure the full plugin stack for a production Claude Code deployment, apply these plugins to your service in order:
1. Key authentication
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
-d "name=key-auth" \
-d "config.key_names[]=X-API-Key" \
-d "config.hide_credentials=true"
2. Rate limiting (per consumer, backed by Redis)
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
-d "name=rate-limiting" \
-d "config.minute=60" \
-d "config.hour=500" \
-d "config.policy=redis" \
-d "config.redis_host=localhost"
3. Request size limiting (prevent oversized payloads)
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
-d "name=request-size-limiting" \
-d "config.allowed_payload_size=10"
4. Correlation ID for tracing
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
-d "name=correlation-id" \
-d "config.header_name=X-Correlation-ID" \
-d "config.generator=uuid#counter"
5. Prometheus metrics
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
-d "name=prometheus"
This layered approach means that by the time a request reaches your Claude Code backend, it has already been authenticated, counted against rate limits, validated for size, and tagged with a correlation ID.
AWS API Gateway Configuration
For teams running Claude Code on AWS infrastructure, API Gateway with Lambda proxy integration provides a fully managed entry point:
{
"openapi": "3.0.1",
"info": {
"title": "Claude Code API",
"version": "1.0"
},
"paths": {
"/invoke": {
"post": {
"operationId": "invokeClaudeCode",
"requestBody": {
"required": true,
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"prompt": { "type": "string" },
"context": { "type": "object" }
},
"required": ["prompt"]
}
}
}
},
"x-amazon-apigateway-integration": {
"type": "aws_proxy",
"httpMethod": "POST",
"uri": "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:123456789:function:claude-code-handler/invocations",
"passthroughBehavior": "when_no_match"
}
}
}
}
}
Configure usage plans to enforce tiered access:
Create a usage plan for team-level access
aws apigateway create-usage-plan \
--name "team-plan" \
--throttle burstLimit=50,rateLimit=10 \
--quota limit=5000,period=MONTH \
--api-stages apiId=abc123,stage=prod
AWS API Gateway handles TLS termination, geographic distribution via CloudFront integration, and native IAM authorization. removing significant operational overhead from your team.
Authentication and Security
JWT Validation
Secure your Claude Code endpoint by validating JWT tokens at the gateway:
Nginx with jwt validation module
location /claude/ {
auth_jwt "Claude API" token=$http_authorization;
auth_jwt_key_file /etc/nginx/jwt-keys.json;
proxy_pass http://localhost:8080;
}
The JWT should contain claims relevant to your authorization model. Include user ID, roles, and permissions to implement fine-grained access control.
A well-structured JWT payload for Claude Code access control might look like:
{
"sub": "user-789",
"email": "developer@company.com",
"roles": ["claude-code-user"],
"permissions": {
"max_tokens_per_request": 4000,
"allowed_skills": ["tdd", "pdf", "docx"],
"rate_limit_tier": "standard"
},
"iat": 1711123456,
"exp": 1711209856
}
By embedding permissions in the JWT, your gateway can enforce access rules without making a database call for every request. The permissions travel with the token and are validated cryptographically.
OAuth 2.0 Proxy Integration
For organizations using OAuth 2.0, integrate an authorization proxy:
oauth2-proxy configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: oauth2-proxy-config
data:
config.yaml: |
provider: "azure"
client_id: "your-client-id"
client_secret: "your-client-secret"
cookie_secret: "your-cookie-secret"
email_domains: ["yourcompany.com"]
upstreams: ["http://localhost:8080"]
pass_basic_auth: false
pass_host_header: true
This setup works particularly well when combining Claude Code with the tdd skill for test-driven development workflows, where developers need authenticated access to AI-assisted coding assistance.
API Key Management
For service-to-service integrations, API key authentication is simpler than OAuth but requires careful key lifecycle management:
Nginx API key validation using map
map $http_x_api_key $api_client {
default "";
"key-abc-123" "service-a";
"key-def-456" "service-b";
"key-ghi-789" "service-c";
}
server {
listen 443 ssl http2;
location /claude/ {
# Reject requests with unknown or missing API keys
if ($api_client = "") {
return 401 '{"error": "Invalid or missing API key"}';
}
# Pass client identity to backend for logging
proxy_set_header X-Client-ID $api_client;
proxy_pass http://localhost:8080;
}
}
In production, replace the static map with a Lua-based lookup against a Redis key store to enable dynamic key management without reloading Nginx configuration.
Comparing Authentication Methods
| Method | Complexity | Scalability | Revocation Speed | Best Use Case |
|---|---|---|---|---|
| API Keys | Low | High | Minutes (config reload) | Service-to-service |
| JWT (stateless) | Medium | Very High | Next expiry (hours/days) | User-facing APIs |
| JWT + blocklist | Medium-High | High | Immediate | High-security user APIs |
| OAuth 2.0 | High | High | Immediate | Enterprise SSO |
| mTLS | High | Medium | Certificate revocation | Internal microservices |
For most Claude Code deployments, JWT with short expiry times (1-4 hours) provides the best balance of security and operational simplicity.
Rate Limiting and Throttling
Claude Code operations vary significantly in execution time. A simple prompt might complete in seconds, while complex code generation or analysis tasks can take minutes. Implement tiered rate limiting to accommodate this variance:
// Kong rate limiting plugin configuration
{
"name": "rate-limiting",
"config": {
"minute": 60,
"hour": 500,
"policy": "redis",
"redis_host": "localhost",
"redis_port": 6379,
"hide_client_headers": false
}
}
Consider implementing a queue system for long-running operations rather than holding open HTTP connections. This approach, combined with the frontend-design skill for building status dashboards, provides better user experience for intensive Claude Code workloads.
Tiered Rate Limiting by Consumer
A flat rate limit treats all users the same, which does not reflect real usage patterns. Implement consumer tiers in Kong to give power users higher limits:
Create consumer tiers
curl -X POST http://localhost:8001/consumers -d "username=free-tier-user"
curl -X POST http://localhost:8001/consumers -d "username=pro-tier-user"
curl -X POST http://localhost:8001/consumers -d "username=enterprise-tier-user"
Apply different rate limits to each tier
Free tier: 10 req/min
curl -X POST http://localhost:8001/consumers/free-tier-user/plugins \
-d "name=rate-limiting" \
-d "config.minute=10" \
-d "config.hour=100"
Pro tier: 60 req/min
curl -X POST http://localhost:8001/consumers/pro-tier-user/plugins \
-d "name=rate-limiting" \
-d "config.minute=60" \
-d "config.hour=1000"
Enterprise tier: 300 req/min
curl -X POST http://localhost:8001/consumers/enterprise-tier-user/plugins \
-d "name=rate-limiting" \
-d "config.minute=300" \
-d "config.hour=10000"
This creates a billing-aligned rate limiting structure where your most valuable customers get the capacity they need without impacting other users.
Queue-Based Architecture for Long Operations
For Claude Code tasks that run longer than 30 seconds, a synchronous HTTP model creates problems: clients may timeout, load balancers may terminate connections, and retry logic can cause duplicate execution. A queue-based pattern solves this:
// Express.js endpoint that queues Claude Code jobs
const { Queue } = require('bullmq');
const { v4: uuidv4 } = require('uuid');
const claudeQueue = new Queue('claude-jobs', {
connection: { host: 'localhost', port: 6379 }
});
app.post('/claude/invoke', async (req, res) => {
const jobId = uuidv4();
await claudeQueue.add('invoke', {
prompt: req.body.prompt,
context: req.body.context,
userId: req.user.id
}, {
jobId,
attempts: 2,
backoff: { type: 'exponential', delay: 5000 }
});
// Return immediately with job ID. client polls /claude/status/:jobId
res.status(202).json({
jobId,
statusUrl: `/claude/status/${jobId}`,
estimatedWait: '30-120 seconds'
});
});
app.get('/claude/status/:jobId', async (req, res) => {
const job = await claudeQueue.getJob(req.params.jobId);
if (!job) return res.status(404).json({ error: 'Job not found' });
const state = await job.getState();
const response = { jobId: job.id, state };
if (state === 'completed') {
response.result = job.returnvalue;
} else if (state === 'failed') {
response.error = job.failedReason;
}
res.json(response);
});
This pattern decouples request submission from result retrieval, enabling Claude Code to handle variable-duration tasks without timeout pressure.
Request and Response Transformation
Input Formatting
Gateway-level transformation allows you to standardize input before reaching Claude Code:
-- Kong request transformer plugin
local function transform_request(plugin, configuration, api)
local method = ngx.req.get_method()
if method == "POST" then
local body = ngx.req.get_body_data()
local json = require("cjson").decode(body)
-- Wrap user prompt with system context
json.system = "You are a code review assistant. " .. (json.system or "")
json.context = {
repository = ngx.var.http_x_repo,
branch = ngx.var.http_x_branch
}
ngx.req.set_body_data(json.encode(json))
ngx.req.set_header("Content-Type", "application/json")
end
end
This pattern enables sophisticated prompt engineering at the gateway layer, injecting context based on HTTP headers or path parameters.
Prompt Injection Prevention
When transforming user input, you must guard against prompt injection. attempts by users to override system-level instructions through crafted input. Add a sanitization step before forwarding to Claude Code:
// Express middleware for prompt injection prevention
function sanitizePrompt(req, res, next) {
const { prompt } = req.body;
if (!prompt || typeof prompt !== 'string') {
return res.status(400).json({ error: 'Invalid prompt' });
}
// Reject prompts containing instruction override patterns
const injectionPatterns = [
/ignore (all |previous |above )?instructions/i,
/disregard (all |previous |above )?instructions/i,
/system prompt:/i,
/<\|im_start\|>/i,
/\[INST\]/i
];
const hasInjection = injectionPatterns.some(pattern => pattern.test(prompt));
if (hasInjection) {
return res.status(400).json({ error: 'Prompt contains disallowed content' });
}
// Enforce maximum prompt length
if (prompt.length > 50000) {
return res.status(400).json({ error: 'Prompt exceeds maximum length' });
}
next();
}
app.post('/claude/invoke', sanitizePrompt, async (req, res) => {
// Safe to proceed. prompt has been validated
});
Response Handling
Transform Claude Code responses for your specific client needs:
Nginx response transformation
location /claude/ {
proxy_pass http://localhost:8080;
# Add metadata headers
add_header X-Response-Time $upstream_response_time;
add_header X-CLAUDE-Model $upstream_http_x_claude_model;
# Compress responses
gzip on;
gzip_types application/json text/plain;
}
Streaming Response Support
For long Claude Code responses, streaming improves perceived performance significantly. Configure your gateway to pass through Server-Sent Events:
location /claude/stream {
proxy_pass http://localhost:8080;
proxy_http_version 1.1;
# Disable buffering for streaming responses
proxy_buffering off;
proxy_cache off;
# Required for SSE
proxy_set_header Connection '';
chunked_transfer_encoding on;
# Extended timeout for streaming sessions
proxy_read_timeout 600s;
# Headers for SSE
add_header Cache-Control no-cache;
add_header X-Accel-Buffering no;
}
With streaming enabled, clients receive partial responses as they are generated rather than waiting for the full completion. For large code generation tasks, this means the user sees the first functions appearing within seconds instead of waiting for the entire file.
Monitoring and Observability
Logging Configuration
Detailed logging supports debugging and performance analysis:
Structured JSON logging
log_format json_combined escape=json
'{'
'"time_local":"$time_local",'
'"remote_addr":"$remote_addr",'
'"request":"$request",'
'"status": "$status",'
'"body_bytes_sent":"$body_bytes_sent",'
'"request_time":"$request_time",'
'"http_referer":"$http_referer",'
'"http_user_agent":"$http_user_agent",'
'"request_id":"$request_id"'
'}';
access_log /var/log/nginx/claude_access.log json_combined;
error_log /var/log/nginx/claude_error.log warn;
Integrate these logs with your observability stack. The supermemory skill can help maintain historical context across gateway logs for pattern recognition.
Metrics Collection
Export Prometheus metrics from your gateway:
Prometheus metrics export
- name: prometheus-metrics
config:
metrics: |
claude_requests_total{status,method,route} counter
claude_request_duration_seconds{status,method,route} histogram
claude_active_connections gauge
These metrics inform scaling decisions and help identify performance bottlenecks in your Claude Code integration.
Key Metrics to Track
For Claude Code specifically, standard web API metrics are necessary but insufficient. Track these additional signals:
| Metric | Why It Matters | Alert Threshold |
|---|---|---|
| p95 request duration | Catches slow completions before users complain | > 60s |
| Queue depth | Early warning for capacity saturation | > 20 jobs |
| Token usage per hour | Cost control and budget tracking | Per-budget |
| 429 rate (per consumer) | Identifies users hitting limits | > 5% of requests |
| Upstream error rate | Claude Code service health | > 1% of requests |
| Active streaming connections | Resource usage for SSE | > capacity limit |
| Prompt rejection rate | Security: injection attempts | > 0.1% of requests |
Set up alerting on queue depth and upstream error rate immediately. these are the two signals most likely to indicate a production problem before users file support tickets.
Distributed Tracing
Add trace context propagation to correlate gateway activity with downstream Claude Code execution:
Nginx trace ID generation and propagation
(requires nginx-opentracing module)
opentracing_load_tracer /usr/local/lib/libjaegertracing.so /etc/jaeger-config.json;
location /claude/ {
opentracing_trace_locations off;
opentracing_operation_name "claude-code-request";
opentracing_propagate_context;
proxy_pass http://localhost:8080;
}
With distributed tracing active, a single slow request can be inspected from gateway receipt through queue wait time through Claude Code execution. making performance investigations dramatically faster.
Health Checks and Circuit Breaking
Backend Health Checks
Configure active health checks to detect Claude Code backend failures before they affect users:
Nginx upstream with health checks
upstream claude_backend {
server localhost:8080;
server localhost:8081 backup;
# Active health check (requires nginx-plus or custom module)
keepalive 32;
}
For Kong, enable built-in health checking:
curl -X PATCH http://localhost:8001/services/claude-code-service \
-d "healthchecks.active.healthy.interval=10" \
-d "healthchecks.active.unhealthy.interval=5" \
-d "healthchecks.active.http_path=/health" \
-d "healthchecks.passive.unhealthy.http_failures=3"
Circuit Breaker Pattern
For high-traffic deployments, implement circuit breaking to fail fast when the Claude Code backend is overwhelmed:
// Opossum circuit breaker for Claude Code calls
const CircuitBreaker = require('opossum');
const options = {
timeout: 30000, // Consider request failed after 30s
errorThresholdPercentage: 20, // Open circuit if 20%+ of requests fail
resetTimeout: 60000 // Try again after 60s
};
const claudeCircuit = new CircuitBreaker(invokeClaudeCode, options);
claudeCircuit.on('open', () => {
console.warn('Circuit breaker opened. Claude Code backend is degraded');
});
claudeCircuit.on('halfOpen', () => {
console.info('Circuit breaker half-open. testing backend recovery');
});
claudeCircuit.on('close', () => {
console.info('Circuit breaker closed. Claude Code backend recovered');
});
app.post('/claude/invoke', async (req, res) => {
try {
const result = await claudeCircuit.fire(req.body);
res.json(result);
} catch (error) {
if (claudeCircuit.opened) {
return res.status(503).json({
error: 'Service temporarily unavailable',
retryAfter: 60
});
}
next(error);
}
});
The circuit breaker prevents a degraded Claude Code backend from cascading into gateway-level failures by failing fast and returning a clear error to clients.
Conclusion
API gateway configuration for Claude Code involves balancing security, performance, and observability. Start with basic proxy configuration, then layer in authentication, rate limiting, and transformation as your use cases demand. The patterns shown here translate across gateway implementations. whether you use Nginx, Kong, AWS API Gateway, or another platform.
A practical deployment sequence: start with Nginx reverse proxy and JWT authentication, add Redis-backed rate limiting once you understand your traffic patterns, then instrument with Prometheus and structured logging before you go to production. Add queue-based async handling once request durations start exceeding 30 seconds regularly, and introduce circuit breaking when you have multiple Claude Code backend instances that can fail independently.
Remember that Claude Code integrates well with specialized skills like the pdf skill for document generation, canvas-design for visual outputs, and docx for structured reporting. Your gateway configuration should accommodate these varied response types while maintaining consistent security and performance characteristics.
This took me 3 hours to figure out. I put it in a CLAUDE.md so I'd never figure it out again. Now Claude gets it right on the first try, every project.
16 framework templates. Next.js, FastAPI, Laravel, Rails, Go, Rust, Terraform, and 9 more. Each one 300+ lines of "here's exactly how this stack works." Copy into your project. Done.
$99 once. Yours forever. I keep adding templates monthly.
Related Reading
Estimate usage → Calculate your token consumption with our Token Estimator.
I hit this exact error six months ago. Then I wrote a CLAUDE.md that tells Claude my stack, my conventions, and my error handling patterns. Haven't seen it since.
I run 5 Claude Max subs, 16 Chrome extensions serving 50K users, and bill $500K+ on Upwork. These CLAUDE.md templates are what I actually use.