Claude Code API Gateway Configuration (2026)

May 2026 · ClaudHQ

Claude Code API Gateway Configuration Guide

API gateways serve as the critical entry point for securing, managing, and routing requests to your Claude Code integrations. Whether you're building a multi-tenant SaaS platform or integrating Claude Code into an enterprise workflow, proper gateway configuration ensures reliable performance, security, and observability. This guide covers practical configuration patterns for developers and power users working with Claude Code behind API gateways.

Understanding the Gateway Role with Claude Code

When you expose Claude Code capabilities through an API gateway, you gain several advantages: centralized authentication, rate limiting, request transformation, and detailed analytics. The gateway sits between your clients and the Claude Code execution environment, handling cross-cutting concerns so your core application logic remains clean.

Claude Code operates through a command-line interface that processes prompts and returns responses. When wrapping this capability behind a gateway, you essentially create a RESTful or GraphQL facade that translates HTTP requests into Claude Code invocations. This pattern works well whether you're using the supermemory skill for contextual memory management or integrating with external services.

Why a Gateway Matters for AI Workloads

Claude Code integrations have characteristics that differ from typical API backends and that make gateway configuration more important:

Variable response times: A code review task may complete in 2 seconds; a full codebase analysis may take 3 minutes. Your gateway must accommodate this range without timing out legitimate requests.
Token-based cost model: Unlike traditional APIs where requests cost roughly the same, Claude Code requests vary enormously in cost based on prompt and response size. Rate limiting by request count alone is insufficient. you often need token-aware throttling.
Stateful context: Many Claude Code workflows maintain context across multiple requests. Gateway session handling must preserve this continuity.
Large payloads: Code files, context windows, and structured responses can produce payloads far larger than typical REST responses. Compression and buffering configuration matters significantly.

Choosing the Right Gateway for Your Use Case

Before diving into configuration, choosing the right gateway tool saves significant rework. Here is a comparison of the most common options:

Gateway	Best For	Rate Limiting	Auth Options	Cost
Nginx	Simple reverse proxy, high performance	Basic (req/s)	JWT module	Free
Kong Gateway	Plugin ecosystem, enterprise features	Advanced (tiered)	OAuth, LDAP, JWT	Free/Enterprise
AWS API Gateway	AWS-native deployments, serverless	Per-method, usage plans	IAM, Cognito, Lambda	Pay-per-request
Traefik	Kubernetes-native, dynamic config	Middleware-based	Forward Auth	Free/Enterprise
Caddy	Simple HTTPS, automatic certs	Basic via plugins	JWT plugins	Free
Envoy	Service mesh, advanced observability	Token bucket, global rate limit	External auth	Free

For most development teams getting started, Nginx covers 80% of requirements with minimal complexity. Kong is the right choice when you need plugin-based extensibility without writing code. AWS API Gateway makes sense when your Claude Code integration runs on AWS Lambda or ECS and you want native IAM integration.

Basic Gateway Configuration Patterns

Nginx Configuration

Nginx provides a straightforward reverse proxy setup for Claude Code endpoints. Here's a practical configuration:

server {
 listen 443 ssl http2;
 server_name api.yourdomain.com;

 ssl_certificate /etc/ssl/certs/yourcert.pem;
 ssl_certificate_key /etc/ssl/private/yourkey.key;

 location /claude/ {
 proxy_pass http://localhost:8080;
 proxy_http_version 1.1;
 proxy_set_header Host $host;
 proxy_set_header X-Real-IP $remote_addr;
 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
 proxy_set_header X-Forwarded-Proto $scheme;

 # Timeout configuration for long-running Claude invocations
 proxy_read_timeout 300s;
 proxy_connect_timeout 75s;
 }

 # Rate limiting zone
 limit_req_zone $binary_remote_addr zone=claude_api:10m rate=10r/s;

 location /claude/invoke {
 limit_req zone=claude_api burst=20 nodelay;
 proxy_pass http://localhost:8080;
 }
}

This configuration handles SSL termination, preserves client identity through headers, and implements rate limiting to prevent abuse. Adjust the rate and burst values based on your expected traffic patterns.

For production deployments, extend this base with buffer tuning to handle large code payloads:

location /claude/ {
 proxy_pass http://localhost:8080;
 proxy_http_version 1.1;

 # Buffer configuration for large code payloads
 proxy_buffer_size 128k;
 proxy_buffers 4 256k;
 proxy_busy_buffers_size 256k;
 client_max_body_size 10m;

 # Timeouts tuned for AI workloads
 proxy_read_timeout 300s;
 proxy_connect_timeout 75s;
 proxy_send_timeout 300s;
 send_timeout 300s;

 # Headers
 proxy_set_header Host $host;
 proxy_set_header X-Real-IP $remote_addr;
 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
 proxy_set_header X-Forwarded-Proto $scheme;
 proxy_set_header X-Request-ID $request_id;
}

The X-Request-ID header is particularly valuable: it creates a traceable identifier that propagates through your entire stack, making it possible to correlate gateway logs, application logs, and Claude Code execution records for a single request.

Kong Gateway Integration

For more advanced requirements, Kong Gateway offers plugin-based extensibility. Install the Claude Code service and route:

Register Claude Code service
curl -X POST http://localhost:8001/services \
 -d "name=claude-code-service" \
 -d "url=http://localhost:8080"

Add route
curl -X POST http://localhost:8001/services/claude-code-service/routes \
 -d "name=claude-code-route" \
 -d "paths[]=/claude" \
 -d "methods[]=POST"

Kong's plugin ecosystem allows you to add authentication, rate limiting, request transformation, and analytics without modifying your Claude Code implementation. The pdf skill, for instance, can generate reports from gateway analytics data.

To configure the full plugin stack for a production Claude Code deployment, apply these plugins to your service in order:

1. Key authentication
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
 -d "name=key-auth" \
 -d "config.key_names[]=X-API-Key" \
 -d "config.hide_credentials=true"

2. Rate limiting (per consumer, backed by Redis)
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
 -d "name=rate-limiting" \
 -d "config.minute=60" \
 -d "config.hour=500" \
 -d "config.policy=redis" \
 -d "config.redis_host=localhost"

3. Request size limiting (prevent oversized payloads)
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
 -d "name=request-size-limiting" \
 -d "config.allowed_payload_size=10"

4. Correlation ID for tracing
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
 -d "name=correlation-id" \
 -d "config.header_name=X-Correlation-ID" \
 -d "config.generator=uuid#counter"

5. Prometheus metrics
curl -X POST http://localhost:8001/services/claude-code-service/plugins \
 -d "name=prometheus"

This layered approach means that by the time a request reaches your Claude Code backend, it has already been authenticated, counted against rate limits, validated for size, and tagged with a correlation ID.

AWS API Gateway Configuration

For teams running Claude Code on AWS infrastructure, API Gateway with Lambda proxy integration provides a fully managed entry point:

{
 "openapi": "3.0.1",
 "info": {
 "title": "Claude Code API",
 "version": "1.0"
 },
 "paths": {
 "/invoke": {
 "post": {
 "operationId": "invokeClaudeCode",
 "requestBody": {
 "required": true,
 "content": {
 "application/json": {
 "schema": {
 "type": "object",
 "properties": {
 "prompt": { "type": "string" },
 "context": { "type": "object" }
 },
 "required": ["prompt"]
 }
 }
 }
 },
 "x-amazon-apigateway-integration": {
 "type": "aws_proxy",
 "httpMethod": "POST",
 "uri": "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:123456789:function:claude-code-handler/invocations",
 "passthroughBehavior": "when_no_match"
 }
 }
 }
 }
}

Configure usage plans to enforce tiered access:

Create a usage plan for team-level access
aws apigateway create-usage-plan \
 --name "team-plan" \
 --throttle burstLimit=50,rateLimit=10 \
 --quota limit=5000,period=MONTH \
 --api-stages apiId=abc123,stage=prod

AWS API Gateway handles TLS termination, geographic distribution via CloudFront integration, and native IAM authorization. removing significant operational overhead from your team.

Authentication and Security

JWT Validation

Secure your Claude Code endpoint by validating JWT tokens at the gateway:

Nginx with jwt validation module
location /claude/ {
 auth_jwt "Claude API" token=$http_authorization;
 auth_jwt_key_file /etc/nginx/jwt-keys.json;

 proxy_pass http://localhost:8080;
}

The JWT should contain claims relevant to your authorization model. Include user ID, roles, and permissions to implement fine-grained access control.

A well-structured JWT payload for Claude Code access control might look like:

{
 "sub": "user-789",
 "email": "developer@company.com",
 "roles": ["claude-code-user"],
 "permissions": {
 "max_tokens_per_request": 4000,
 "allowed_skills": ["tdd", "pdf", "docx"],
 "rate_limit_tier": "standard"
 },
 "iat": 1711123456,
 "exp": 1711209856
}

By embedding permissions in the JWT, your gateway can enforce access rules without making a database call for every request. The permissions travel with the token and are validated cryptographically.

OAuth 2.0 Proxy Integration

For organizations using OAuth 2.0, integrate an authorization proxy:

oauth2-proxy configuration
apiVersion: v1
kind: ConfigMap
metadata:
 name: oauth2-proxy-config
data:
 config.yaml: |
 provider: "azure"
 client_id: "your-client-id"
 client_secret: "your-client-secret"
 cookie_secret: "your-cookie-secret"
 email_domains: ["yourcompany.com"]
 upstreams: ["http://localhost:8080"]
 pass_basic_auth: false
 pass_host_header: true

This setup works particularly well when combining Claude Code with the tdd skill for test-driven development workflows, where developers need authenticated access to AI-assisted coding assistance.

API Key Management

For service-to-service integrations, API key authentication is simpler than OAuth but requires careful key lifecycle management:

Nginx API key validation using map
map $http_x_api_key $api_client {
 default "";
 "key-abc-123" "service-a";
 "key-def-456" "service-b";
 "key-ghi-789" "service-c";
}

server {
 listen 443 ssl http2;

 location /claude/ {
 # Reject requests with unknown or missing API keys
 if ($api_client = "") {
 return 401 '{"error": "Invalid or missing API key"}';
 }

 # Pass client identity to backend for logging
 proxy_set_header X-Client-ID $api_client;
 proxy_pass http://localhost:8080;
 }
}

In production, replace the static map with a Lua-based lookup against a Redis key store to enable dynamic key management without reloading Nginx configuration.

Comparing Authentication Methods

Method	Complexity	Scalability	Revocation Speed	Best Use Case
API Keys	Low	High	Minutes (config reload)	Service-to-service
JWT (stateless)	Medium	Very High	Next expiry (hours/days)	User-facing APIs
JWT + blocklist	Medium-High	High	Immediate	High-security user APIs
OAuth 2.0	High	High	Immediate	Enterprise SSO
mTLS	High	Medium	Certificate revocation	Internal microservices

For most Claude Code deployments, JWT with short expiry times (1-4 hours) provides the best balance of security and operational simplicity.

Rate Limiting and Throttling

Claude Code operations vary significantly in execution time. A simple prompt might complete in seconds, while complex code generation or analysis tasks can take minutes. Implement tiered rate limiting to accommodate this variance:

// Kong rate limiting plugin configuration
{
 "name": "rate-limiting",
 "config": {
 "minute": 60,
 "hour": 500,
 "policy": "redis",
 "redis_host": "localhost",
 "redis_port": 6379,
 "hide_client_headers": false
 }
}

Consider implementing a queue system for long-running operations rather than holding open HTTP connections. This approach, combined with the frontend-design skill for building status dashboards, provides better user experience for intensive Claude Code workloads.

Tiered Rate Limiting by Consumer

A flat rate limit treats all users the same, which does not reflect real usage patterns. Implement consumer tiers in Kong to give power users higher limits:

Create consumer tiers
curl -X POST http://localhost:8001/consumers -d "username=free-tier-user"
curl -X POST http://localhost:8001/consumers -d "username=pro-tier-user"
curl -X POST http://localhost:8001/consumers -d "username=enterprise-tier-user"

Apply different rate limits to each tier
Free tier: 10 req/min
curl -X POST http://localhost:8001/consumers/free-tier-user/plugins \
 -d "name=rate-limiting" \
 -d "config.minute=10" \
 -d "config.hour=100"

Pro tier: 60 req/min
curl -X POST http://localhost:8001/consumers/pro-tier-user/plugins \
 -d "name=rate-limiting" \
 -d "config.minute=60" \
 -d "config.hour=1000"

Enterprise tier: 300 req/min
curl -X POST http://localhost:8001/consumers/enterprise-tier-user/plugins \
 -d "name=rate-limiting" \
 -d "config.minute=300" \
 -d "config.hour=10000"

This creates a billing-aligned rate limiting structure where your most valuable customers get the capacity they need without impacting other users.

Queue-Based Architecture for Long Operations

For Claude Code tasks that run longer than 30 seconds, a synchronous HTTP model creates problems: clients may timeout, load balancers may terminate connections, and retry logic can cause duplicate execution. A queue-based pattern solves this:

// Express.js endpoint that queues Claude Code jobs
const { Queue } = require('bullmq');
const { v4: uuidv4 } = require('uuid');

const claudeQueue = new Queue('claude-jobs', {
 connection: { host: 'localhost', port: 6379 }
});

app.post('/claude/invoke', async (req, res) => {
 const jobId = uuidv4();

 await claudeQueue.add('invoke', {
 prompt: req.body.prompt,
 context: req.body.context,
 userId: req.user.id
 }, {
 jobId,
 attempts: 2,
 backoff: { type: 'exponential', delay: 5000 }
 });

 // Return immediately with job ID. client polls /claude/status/:jobId
 res.status(202).json({
 jobId,
 statusUrl: `/claude/status/${jobId}`,
 estimatedWait: '30-120 seconds'
 });
});

app.get('/claude/status/:jobId', async (req, res) => {
 const job = await claudeQueue.getJob(req.params.jobId);

 if (!job) return res.status(404).json({ error: 'Job not found' });

 const state = await job.getState();
 const response = { jobId: job.id, state };

 if (state === 'completed') {
 response.result = job.returnvalue;
 } else if (state === 'failed') {
 response.error = job.failedReason;
 }

 res.json(response);
});

This pattern decouples request submission from result retrieval, enabling Claude Code to handle variable-duration tasks without timeout pressure.

Request and Response Transformation

Input Formatting

Gateway-level transformation allows you to standardize input before reaching Claude Code:

-- Kong request transformer plugin
local function transform_request(plugin, configuration, api)
 local method = ngx.req.get_method()

 if method == "POST" then
 local body = ngx.req.get_body_data()
 local json = require("cjson").decode(body)

 -- Wrap user prompt with system context
 json.system = "You are a code review assistant. " .. (json.system or "")
 json.context = {
 repository = ngx.var.http_x_repo,
 branch = ngx.var.http_x_branch
 }

 ngx.req.set_body_data(json.encode(json))
 ngx.req.set_header("Content-Type", "application/json")
 end
end

This pattern enables sophisticated prompt engineering at the gateway layer, injecting context based on HTTP headers or path parameters.

Prompt Injection Prevention

When transforming user input, you must guard against prompt injection. attempts by users to override system-level instructions through crafted input. Add a sanitization step before forwarding to Claude Code:

// Express middleware for prompt injection prevention
function sanitizePrompt(req, res, next) {
 const { prompt } = req.body;

 if (!prompt || typeof prompt !== 'string') {
 return res.status(400).json({ error: 'Invalid prompt' });
 }

 // Reject prompts containing instruction override patterns
 const injectionPatterns = [
 /ignore (all |previous |above )?instructions/i,
 /disregard (all |previous |above )?instructions/i,
 /system prompt:/i,
 /<\|im_start\|>/i,
 /\[INST\]/i
 ];

 const hasInjection = injectionPatterns.some(pattern => pattern.test(prompt));
 if (hasInjection) {
 return res.status(400).json({ error: 'Prompt contains disallowed content' });
 }

 // Enforce maximum prompt length
 if (prompt.length > 50000) {
 return res.status(400).json({ error: 'Prompt exceeds maximum length' });
 }

 next();
}

app.post('/claude/invoke', sanitizePrompt, async (req, res) => {
 // Safe to proceed. prompt has been validated
});

Response Handling

Transform Claude Code responses for your specific client needs:

Nginx response transformation
location /claude/ {
 proxy_pass http://localhost:8080;

 # Add metadata headers
 add_header X-Response-Time $upstream_response_time;
 add_header X-CLAUDE-Model $upstream_http_x_claude_model;

 # Compress responses
 gzip on;
 gzip_types application/json text/plain;
}

Streaming Response Support

For long Claude Code responses, streaming improves perceived performance significantly. Configure your gateway to pass through Server-Sent Events:

location /claude/stream {
 proxy_pass http://localhost:8080;
 proxy_http_version 1.1;

 # Disable buffering for streaming responses
 proxy_buffering off;
 proxy_cache off;

 # Required for SSE
 proxy_set_header Connection '';
 chunked_transfer_encoding on;

 # Extended timeout for streaming sessions
 proxy_read_timeout 600s;

 # Headers for SSE
 add_header Cache-Control no-cache;
 add_header X-Accel-Buffering no;
}

With streaming enabled, clients receive partial responses as they are generated rather than waiting for the full completion. For large code generation tasks, this means the user sees the first functions appearing within seconds instead of waiting for the entire file.

Monitoring and Observability

Logging Configuration

Detailed logging supports debugging and performance analysis:

Structured JSON logging
log_format json_combined escape=json
 '{'
 '"time_local":"$time_local",'
 '"remote_addr":"$remote_addr",'
 '"request":"$request",'
 '"status": "$status",'
 '"body_bytes_sent":"$body_bytes_sent",'
 '"request_time":"$request_time",'
 '"http_referer":"$http_referer",'
 '"http_user_agent":"$http_user_agent",'
 '"request_id":"$request_id"'
 '}';

access_log /var/log/nginx/claude_access.log json_combined;
error_log /var/log/nginx/claude_error.log warn;

Integrate these logs with your observability stack. The supermemory skill can help maintain historical context across gateway logs for pattern recognition.

Metrics Collection

Export Prometheus metrics from your gateway:

Prometheus metrics export
- name: prometheus-metrics
 config:
 metrics: |
 claude_requests_total{status,method,route} counter
 claude_request_duration_seconds{status,method,route} histogram
 claude_active_connections gauge

These metrics inform scaling decisions and help identify performance bottlenecks in your Claude Code integration.

Key Metrics to Track

For Claude Code specifically, standard web API metrics are necessary but insufficient. Track these additional signals:

Metric	Why It Matters	Alert Threshold
p95 request duration	Catches slow completions before users complain	> 60s
Queue depth	Early warning for capacity saturation	> 20 jobs
Token usage per hour	Cost control and budget tracking	Per-budget
429 rate (per consumer)	Identifies users hitting limits	> 5% of requests
Upstream error rate	Claude Code service health	> 1% of requests
Active streaming connections	Resource usage for SSE	> capacity limit
Prompt rejection rate	Security: injection attempts	> 0.1% of requests

Set up alerting on queue depth and upstream error rate immediately. these are the two signals most likely to indicate a production problem before users file support tickets.

Distributed Tracing

Add trace context propagation to correlate gateway activity with downstream Claude Code execution:

Nginx trace ID generation and propagation
(requires nginx-opentracing module)
opentracing_load_tracer /usr/local/lib/libjaegertracing.so /etc/jaeger-config.json;

location /claude/ {
 opentracing_trace_locations off;
 opentracing_operation_name "claude-code-request";
 opentracing_propagate_context;

 proxy_pass http://localhost:8080;
}

With distributed tracing active, a single slow request can be inspected from gateway receipt through queue wait time through Claude Code execution. making performance investigations dramatically faster.

Health Checks and Circuit Breaking

Backend Health Checks

Configure active health checks to detect Claude Code backend failures before they affect users:

Nginx upstream with health checks
upstream claude_backend {
 server localhost:8080;
 server localhost:8081 backup;

 # Active health check (requires nginx-plus or custom module)
 keepalive 32;
}

For Kong, enable built-in health checking:

curl -X PATCH http://localhost:8001/services/claude-code-service \
 -d "healthchecks.active.healthy.interval=10" \
 -d "healthchecks.active.unhealthy.interval=5" \
 -d "healthchecks.active.http_path=/health" \
 -d "healthchecks.passive.unhealthy.http_failures=3"

Circuit Breaker Pattern

For high-traffic deployments, implement circuit breaking to fail fast when the Claude Code backend is overwhelmed:

// Opossum circuit breaker for Claude Code calls
const CircuitBreaker = require('opossum');

const options = {
 timeout: 30000, // Consider request failed after 30s
 errorThresholdPercentage: 20, // Open circuit if 20%+ of requests fail
 resetTimeout: 60000 // Try again after 60s
};

const claudeCircuit = new CircuitBreaker(invokeClaudeCode, options);

claudeCircuit.on('open', () => {
 console.warn('Circuit breaker opened. Claude Code backend is degraded');
});

claudeCircuit.on('halfOpen', () => {
 console.info('Circuit breaker half-open. testing backend recovery');
});

claudeCircuit.on('close', () => {
 console.info('Circuit breaker closed. Claude Code backend recovered');
});

app.post('/claude/invoke', async (req, res) => {
 try {
 const result = await claudeCircuit.fire(req.body);
 res.json(result);
 } catch (error) {
 if (claudeCircuit.opened) {
 return res.status(503).json({
 error: 'Service temporarily unavailable',
 retryAfter: 60
 });
 }
 next(error);
 }
});

The circuit breaker prevents a degraded Claude Code backend from cascading into gateway-level failures by failing fast and returning a clear error to clients.

Conclusion

API gateway configuration for Claude Code involves balancing security, performance, and observability. Start with basic proxy configuration, then layer in authentication, rate limiting, and transformation as your use cases demand. The patterns shown here translate across gateway implementations. whether you use Nginx, Kong, AWS API Gateway, or another platform.

A practical deployment sequence: start with Nginx reverse proxy and JWT authentication, add Redis-backed rate limiting once you understand your traffic patterns, then instrument with Prometheus and structured logging before you go to production. Add queue-based async handling once request durations start exceeding 30 seconds regularly, and introduce circuit breaking when you have multiple Claude Code backend instances that can fail independently.

Remember that Claude Code integrates well with specialized skills like the pdf skill for document generation, canvas-design for visual outputs, and docx for structured reporting. Your gateway configuration should accommodate these varied response types while maintaining consistent security and performance characteristics.

This took me 3 hours to figure out. I put it in a CLAUDE.md so I'd never figure it out again. Now Claude gets it right on the first try, every project.

16 framework templates. Next.js, FastAPI, Laravel, Rails, Go, Rust, Terraform, and 9 more. Each one 300+ lines of "here's exactly how this stack works." Copy into your project. Done.

See the templates →

$99 once. Yours forever. I keep adding templates monthly.