Rate Limiting Architecture
A²D implements in-memory rate limiting for MCP endpoints to prevent abuse and ensure fair usage across all organizations.
Overview
Rate limiting protects the platform from excessive requests while maintaining high performance for normal usage patterns.
Key Features:
- In-Memory Storage: Fast, no external dependencies
- Sliding Window Algorithm: Fair distribution of requests over time
- Environment-Aware: Only enforced in production
- Automatic Cleanup: Prevents memory leaks in long-running processes
- Detailed Logging: Track rate limit events with structured logging
Rate Limit Tiers
Rate limits are now fully configurable via environment variables and database settings:
| Endpoint | Identifier | Default Limit | Window | Configuration Method |
|---|---|---|---|---|
| /api/platform/[id]/mcp | IP Address | 100 req/min | 60s | Two-tier: Global max + per-org override |
| /api/platform-mcp/mcp | Organization ID | 500 req/min | 60s | Environment variable only |
Why Different Limits?
Public Endpoints (100/min default):
- Accessed by external clients
- Rate limited by IP to prevent individual abuse
- Lower default protects against DDoS and scraping
- Configurable per-organization for premium tiers
Platform Endpoints (500/min default):
- Accessed by authenticated organizations
- Higher limit for legitimate high-volume usage
- Organization-scoped for fair multi-tenant usage
- Simpler configuration (environment variable only)
Configuration
A²D uses a flexible two-tier configuration system for rate limiting that balances infrastructure protection with organizational flexibility.
Public MCP Endpoints
Public endpoints (/api/platform/[id]/mcp) use a two-tier system:
Tier 1: Global Maximum (Environment Variable)
Sets the absolute ceiling that no organization can exceed:
MAX_PUBLIC_RATE_LIMIT=1000 # Default: 100, Recommended: 1000
Add this to your .env file or configure it in your deployment platform (Vercel, AWS, etc.):
# .env or Vercel environment variables
MAX_PUBLIC_RATE_LIMIT=1000
Purpose:
- Infrastructure protection: Prevents any single organization from overwhelming the system
- Global policy enforcement: Ensures fair resource allocation across all tenants
- Graceful fallback: Used when organization-specific limit is not set
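A minimal sketch of reading this variable defensively, assuming a hypothetical helper named `parseRateLimit` (it is not part of A²D). The `parseInt(process.env.MAX_PUBLIC_RATE_LIMIT || '100', 10)` pattern used elsewhere in this document yields `NaN` for a non-numeric value, so the sketch falls back to the default in that case as well:

```typescript
// Hypothetical helper, not part of the A²D codebase.
// Parses a rate limit from an environment variable value and returns
// `fallback` when the value is missing, non-numeric, or not positive.
function parseRateLimit(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback
  const parsed = Number.parseInt(raw.trim(), 10)
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback
}

// Usage in Node.js (assumed call site):
//   const globalMax = parseRateLimit(process.env.MAX_PUBLIC_RATE_LIMIT, 100)
```

Rejecting zero and negative values is a design choice of this sketch; a deployment that wants to block all public traffic would need a different rule.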
Tier 2: Per-Organization Override (Database)
Organizations can set custom limits via the max_public_rate_limit column in the organizations table:
-- Set organization-specific limit
UPDATE organizations
SET max_public_rate_limit = 500
WHERE id = 'org-uuid-here';
-- Set different limits for different tiers
UPDATE organizations
SET max_public_rate_limit = 1000
WHERE name LIKE 'Premium%';
UPDATE organizations
SET max_public_rate_limit = 200
WHERE name LIKE 'Basic%';
-- View all organization-specific limits
SELECT
name,
max_public_rate_limit,
created_at
FROM organizations
WHERE max_public_rate_limit IS NOT NULL
ORDER BY max_public_rate_limit DESC;
-- Remove override (use global MAX_PUBLIC_RATE_LIMIT)
UPDATE organizations
SET max_public_rate_limit = NULL
WHERE id = 'org-uuid-here';
Limit Calculation Logic:
The actual rate limit applied is the minimum of the two tiers:
actual_limit = min(org.max_public_rate_limit, MAX_PUBLIC_RATE_LIMIT)
If org.max_public_rate_limit is NULL, the global MAX_PUBLIC_RATE_LIMIT is used.
Example Scenarios:
| Global Max | Org Override | Applied Limit | Explanation |
|---|---|---|---|
| 1000 | 500 | 500 | Org limit is lower, org gets 500/min |
| 1000 | 1500 | 1000 | Org limit exceeds global max, capped at 1000/min |
| 1000 | NULL | 1000 | No org override, uses global default of 1000/min |
| 100 | NULL | 100 | No org override, uses global default of 100/min |
| 1000 | 250 | 250 | Org requests lower limit, gets 250/min |
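The table above can be checked with a small function. This is a sketch of the stated rule only; the name `resolvePublicLimit` is illustrative and does not come from the A²D source:

```typescript
// Sketch of the two-tier rule: min(org override, global max),
// where a NULL override means "use the global max".
function resolvePublicLimit(orgOverride: number | null, globalMax: number): number {
  if (orgOverride === null) return globalMax
  return Math.min(orgOverride, globalMax)
}

// The rows of the scenario table, as [globalMax, orgOverride, appliedLimit].
const scenarios: Array<[number, number | null, number]> = [
  [1000, 500, 500],
  [1000, 1500, 1000],
  [1000, null, 1000],
  [100, null, 100],
  [1000, 250, 250],
]

// True when every row of the table agrees with the function.
const allScenariosMatch = scenarios.every(
  ([globalMax, orgOverride, applied]) => resolvePublicLimit(orgOverride, globalMax) === applied
)
```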
Use Cases:
- Premium tiers: Set higher limits (up to global max) for paying customers
- Basic tiers: Set lower limits for free or trial accounts
- Throttling: Temporarily reduce limits for misbehaving organizations
- Testing: Set higher limits for internal testing organizations
Platform MCP Endpoint
Platform endpoints (/api/platform-mcp/mcp) use a simple environment variable configuration:
PLATFORM_RATE_LIMIT=500 # Default: 500
Add to your .env file:
# .env or Vercel environment variables
PLATFORM_RATE_LIMIT=500
Why No Per-Organization Override?
- Platform endpoints are already authenticated and organization-scoped
- Simpler configuration reduces complexity
- All authenticated organizations are trusted equally
- Can be added in the future if needed
Caching Strategy
To optimize performance and reduce database load, organization-specific rate limits are cached for 5 minutes:
Cache Behavior:
- First request (cache miss): Queries the database for org.max_public_rate_limit
- Subsequent requests: Uses the cached value for 5 minutes
- No override set (NULL): Falls back to the global MAX_PUBLIC_RATE_LIMIT
- Database error: Gracefully falls back to the global limit
Implementation:
interface OrgRateLimitCache {
limit: number
fetchedAt: number
}
const orgLimitCache = new Map<string, OrgRateLimitCache>()
const CACHE_TTL_MS = 5 * 60 * 1000 // 5 minutes
async function getOrgRateLimit(orgId: string): Promise<number> {
const now = Date.now()
const cached = orgLimitCache.get(orgId)
// Return cached value if fresh
if (cached && now - cached.fetchedAt < CACHE_TTL_MS) {
return cached.limit
}
// Query database for org-specific limit
const { data, error } = await supabase
.from('organizations')
.select('max_public_rate_limit')
.eq('id', orgId)
.single()
// Fallback to global max on error
if (error) {
return parseInt(process.env.MAX_PUBLIC_RATE_LIMIT || '100', 10)
}
// Calculate actual limit (cap at global max)
const globalMax = parseInt(process.env.MAX_PUBLIC_RATE_LIMIT || '100', 10)
const orgLimit = data.max_public_rate_limit || globalMax
const actualLimit = Math.min(orgLimit, globalMax)
// Cache the result
orgLimitCache.set(orgId, { limit: actualLimit, fetchedAt: now })
return actualLimit
}
Cache Performance:
- Database queries: Only on cache miss (once per 5 minutes per org)
- Memory overhead: ~100 bytes per cached organization
- Typical memory usage: ~100 KB for 1000 organizations
- Automatic cleanup: Stale entries removed with rate limit store cleanup
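To make the 5-minute freshness rule easy to exercise, here is a simplified, synchronous sketch of the same lookup. The names `getCachedLimit` and `loadLimit` are hypothetical: `loadLimit` stands in for the database query, and the clock is passed in as `now` so no real waiting is needed.

```typescript
interface CachedLimit {
  limit: number
  fetchedAt: number
}

const TTL_MS = 5 * 60 * 1000 // 5 minutes

// Hypothetical synchronous variant of the cache lookup.
function getCachedLimit(
  cache: Map<string, CachedLimit>,
  orgId: string,
  now: number,
  loadLimit: (orgId: string) => number
): number {
  const cached = cache.get(orgId)
  if (cached !== undefined && now - cached.fetchedAt < TTL_MS) {
    return cached.limit // fresh entry: no load
  }
  const limit = loadLimit(orgId) // missing or stale: reload and re-cache
  cache.set(orgId, { limit, fetchedAt: now })
  return limit
}

// Walk through one org: load at t=0, cache hit at t=4 min, reload at t=5 min.
const limitCache = new Map<string, CachedLimit>()
let loadCount = 0
const countingLoad = (): number => {
  loadCount++
  return 500
}
getCachedLimit(limitCache, 'org-1', 0, countingLoad)
getCachedLimit(limitCache, 'org-1', 4 * 60 * 1000, countingLoad)
getCachedLimit(limitCache, 'org-1', 5 * 60 * 1000, countingLoad)
// loadCount is now 2: the entry is treated as stale once it is exactly 5 minutes old
```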
Configuration Examples
Example 1: Default Setup (No Custom Limits)
MAX_PUBLIC_RATE_LIMIT=100
PLATFORM_RATE_LIMIT=500
-- All organizations have NULL max_public_rate_limit
-- Result: All orgs get 100 req/min for public endpoints
Example 2: Tiered Service Levels
MAX_PUBLIC_RATE_LIMIT=1000
PLATFORM_RATE_LIMIT=500
-- Premium tier: Higher limits
UPDATE organizations
SET max_public_rate_limit = 1000
WHERE subscription_tier = 'premium';
-- Standard tier: Medium limits
UPDATE organizations
SET max_public_rate_limit = 500
WHERE subscription_tier = 'standard';
-- Free tier: Lower limits
UPDATE organizations
SET max_public_rate_limit = 100
WHERE subscription_tier = 'free';
Example 3: Gradual Rollout
# Start with conservative global max
MAX_PUBLIC_RATE_LIMIT=200
PLATFORM_RATE_LIMIT=500
-- Pre-set higher limits for beta testers (still capped at the global max of 200 for now)
UPDATE organizations
SET max_public_rate_limit = 1000
WHERE id IN (SELECT organization_id FROM beta_program);
Then raise the global max once stable, so the beta overrides take full effect:
# Increase global max after validation
MAX_PUBLIC_RATE_LIMIT=1000
PLATFORM_RATE_LIMIT=500
Example 4: Throttling Misbehaving Organization
-- Temporarily reduce limit for problematic org
UPDATE organizations
SET max_public_rate_limit = 50
WHERE id = 'misbehaving-org-uuid';
-- Monitor behavior, then restore
UPDATE organizations
SET max_public_rate_limit = NULL -- Uses global default
WHERE id = 'misbehaving-org-uuid';
Sliding Window Algorithm
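A²D describes its rate limiter as a sliding window. In the implementation shown below, each key gets its own 60-second window that opens on the key's first request, plus a counter that resets once that window ends, so windows follow each client's activity rather than fixed clock minutes.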
How It Works
Algorithm Steps:
- First Request: Creates new entry with count=1, resetAt=now+60s
- Subsequent Requests: Increments count if resetAt > now
- Window Expired: Resets count to 1, new resetAt=now+60s
- Limit Exceeded: Returns 429 with retry information
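The steps above can be sketched as a function that takes the current time as a parameter, which makes the behaviour replayable without waiting. The names (`decide`, `WindowEntry`, `LimitDecision`) are illustrative, and the sketch assumes a window counts as expired as soon as `resetAt` is reached:

```typescript
interface WindowEntry {
  count: number
  resetAt: number
}

interface LimitDecision {
  allowed: boolean
  remaining: number
  retryAfter?: number // seconds, only set when blocked
}

// Illustrative sketch; `now` is epoch milliseconds supplied by the caller.
function decide(
  store: Map<string, WindowEntry>,
  key: string,
  limit: number,
  windowMs: number,
  now: number
): LimitDecision {
  const entry = store.get(key)
  // First request, or window expired: start a new window with count = 1.
  if (entry === undefined || entry.resetAt <= now) {
    store.set(key, { count: 1, resetAt: now + windowMs })
    return { allowed: true, remaining: limit - 1 }
  }
  // Subsequent request inside the window and under the limit.
  if (entry.count < limit) {
    entry.count++
    return { allowed: true, remaining: limit - entry.count }
  }
  // Limit exceeded: the caller would answer with HTTP 429.
  return { allowed: false, remaining: 0, retryAfter: Math.ceil((entry.resetAt - now) / 1000) }
}

// Replay: 100 requests spread over the first 59 s, one more at 59 s, one at 60 s.
const windowStore = new Map<string, WindowEntry>()
const clientKey = 'public:203.0.113.7'
for (let i = 0; i < 100; i++) decide(windowStore, clientKey, 100, 60000, i * 590)
const blockedDecision = decide(windowStore, clientKey, 100, 60000, 59000)
const freshDecision = decide(windowStore, clientKey, 100, 60000, 60000)
// blockedDecision: not allowed, retryAfter 1; freshDecision: allowed, remaining 99
```

Passing the clock in also keeps unit tests fast, since a window expiry can be simulated by advancing `now` instead of sleeping.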
Example Timeline
Time Request Count Remaining Status
----- ------- ----- --------- ------
00:00 #1 1 99 ✓ Allowed
00:01 #2 2 98 ✓ Allowed
00:30 #50 50 50 ✓ Allowed
00:59 #100 100 0 ✓ Allowed
00:59 #101 100 0 ✗ 429 (retry after 1s)
01:00 #102 1 99 ✓ Allowed (new window)
Enforcement Examples with Different Configurations
Scenario 1: Organization with No Override (Uses Global Max)
-- Organization Alpha has no custom limit
SELECT max_public_rate_limit FROM organizations WHERE name = 'Org Alpha';
-- Result: NULL
MAX_PUBLIC_RATE_LIMIT=1000
Result: Org Alpha gets 1000 requests/minute (global default).
Scenario 2: Organization with Lower Override
-- Organization Beta sets a conservative limit
UPDATE organizations SET max_public_rate_limit = 500 WHERE name = 'Org Beta';
MAX_PUBLIC_RATE_LIMIT=1000
Result: Org Beta gets 500 requests/minute (org override is lower than global max).
Scenario 3: Organization Tries to Exceed Global Max
-- Organization Gamma tries to set a very high limit
UPDATE organizations SET max_public_rate_limit = 5000 WHERE name = 'Org Gamma';
MAX_PUBLIC_RATE_LIMIT=1000
Result: Org Gamma gets 1000 requests/minute (capped at global max, not 5000).
Scenario 4: Different Limits for Different Tiers
-- Set up tiered limits
UPDATE organizations SET max_public_rate_limit = 100 WHERE subscription_tier = 'free';
UPDATE organizations SET max_public_rate_limit = 500 WHERE subscription_tier = 'standard';
UPDATE organizations SET max_public_rate_limit = 1000 WHERE subscription_tier = 'premium';
MAX_PUBLIC_RATE_LIMIT=1000
Result:
- Free tier orgs: 100 requests/minute
- Standard tier orgs: 500 requests/minute
- Premium tier orgs: 1000 requests/minute
Scenario 5: Platform Endpoint (Simpler Configuration)
PLATFORM_RATE_LIMIT=500
Result: All organizations get 500 requests/minute for the platform MCP endpoint, regardless of database settings.
Implementation Details
In-Memory Storage
Rate limiting uses a simple in-memory Map for tracking requests:
interface RateLimitEntry {
count: number // Number of requests in current window
resetAt: number // Timestamp when count resets
}
const rateLimitStore = new Map<string, RateLimitEntry>()
Key Format:
- Public: public:{ip} (e.g., public:192.168.1.100)
- Platform: platform:{orgId} (e.g., platform:11111111-1111-1111-1111-111111111111)
Core Logic
function checkRateLimit(
key: string,
limit: number,
windowMs: number = 60000
): RateLimitResult {
const now = Date.now()
const entry = rateLimitStore.get(key)
// No entry, or the window has ended (resetAt reached) - start a new window
if (!entry || entry.resetAt <= now) {
rateLimitStore.set(key, { count: 1, resetAt: now + windowMs })
return { allowed: true, remaining: limit - 1, resetAt: now + windowMs }
}
// Within limit - increment
if (entry.count < limit) {
entry.count++
return { allowed: true, remaining: limit - entry.count, resetAt: entry.resetAt }
}
// Limit exceeded
const retryAfter = Math.ceil((entry.resetAt - now) / 1000)
return { allowed: false, remaining: 0, resetAt: entry.resetAt, retryAfter }
}
Automatic Cleanup
To prevent memory leaks, expired entries are cleaned up every 5 minutes:
setInterval(() => {
const now = Date.now()
rateLimitStore.forEach((entry, key) => {
if (entry.resetAt < now) {
rateLimitStore.delete(key)
}
})
}, 5 * 60 * 1000)
Cleanup Strategy:
- Runs every 5 minutes in background
- Removes entries where resetAt < now
- Logs cleanup count at debug level
- Minimal performance impact
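The interval callback shown above deletes entries but does not compute the count mentioned in the strategy list. A sketch of a cleanup pass that returns that count, using a hypothetical function name `cleanupExpired` and taking `now` as a parameter so it can be run without timers:

```typescript
interface StoreEntry {
  count: number
  resetAt: number
}

// Hypothetical cleanup pass: removes expired entries and reports how many
// were removed, which is the value a debug-level cleanup log would carry.
function cleanupExpired(entries: Map<string, StoreEntry>, now: number): number {
  let cleaned = 0
  entries.forEach((entry, key) => {
    if (entry.resetAt < now) {
      entries.delete(key) // deleting the current key during Map.forEach is safe
      cleaned++
    }
  })
  return cleaned
}

// Two entries, one of which expired before t=2000.
const trackedEntries = new Map<string, StoreEntry>([
  ['public:198.51.100.1', { count: 3, resetAt: 1000 }],
  ['public:198.51.100.2', { count: 7, resetAt: 5000 }],
])
const cleanedCount = cleanupExpired(trackedEntries, 2000)
// cleanedCount is 1 and one entry remains in the map
```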
Environment Handling
Production Mode
When NODE_ENV=production, rate limiting is fully enforced:
export function checkPublicRateLimit(ip: string): RateLimitResult {
if (process.env.NODE_ENV !== 'production') {
return { allowed: true, remaining: 100, resetAt: Date.now() + 60000 }
}
const key = `public:${ip}`
return checkRateLimit(key, 100, 60000)
}
Production Behavior:
- Full rate limit enforcement
- 429 responses when exceeded
- Rate limit headers in all responses
- Warning logs when approaching limits
Development Mode
In development (NODE_ENV !== production), rate limiting is disabled:
Development Behavior:
- All requests allowed
- No rate limit tracking
- Debug logs show “Rate limiting disabled”
- Faster development iteration
Rate limiting is disabled in development to make testing easier. To test rate limits locally, start the server with NODE_ENV=production npm run dev.
Client IP Detection
For public endpoints, rate limiting requires accurate IP address detection:
export function getClientIp(request: Request): string {
// Check X-Forwarded-For (most proxies)
const forwardedFor = request.headers.get('x-forwarded-for')
if (forwardedFor) {
return forwardedFor.split(',')[0].trim()
}
// Check X-Real-IP (alternative)
const realIp = request.headers.get('x-real-ip')
if (realIp) {
return realIp.trim()
}
// Fallback
return 'unknown'
}
Header Priority:
- X-Forwarded-For - Standard proxy header (takes first IP)
- X-Real-IP - Alternative proxy header
- unknown - Fallback for development
Vercel and most cloud platforms automatically set X-Forwarded-For with the true client IP.
Response Format
Success Response
Requests within the rate limit receive the standard MCP response plus rate limit headers:
HTTP/1.1 200 OK
Content-Type: text/event-stream
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1678901234
data: {"jsonrpc":"2.0","id":1,"result":{...}}
Rate Limit Exceeded
When the limit is exceeded, the endpoint returns a 429 response with a JSON-RPC error:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 42
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1678901234
{
"jsonrpc": "2.0",
"id": 1,
"error": {
"code": -32000,
"message": "Rate limit exceeded. Please retry after 42 seconds."
}
}
Response Headers:
| Header | Type | Description | Example |
|---|---|---|---|
| Retry-After | integer | Seconds to wait before retrying | 42 |
| X-RateLimit-Remaining | integer | Requests remaining in window | 0 |
| X-RateLimit-Reset | integer | Unix timestamp when limit resets | 1678901234 |
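A sketch of mapping a limiter result onto these three headers. The function name `rateLimitHeaders` is hypothetical. The limiter in this document stores `resetAt` in milliseconds (`Date.now() + windowMs`), while the header is documented as a Unix timestamp in seconds, so the sketch assumes the conversion happens when the header is built:

```typescript
interface LimitOutcome {
  allowed: boolean
  remaining: number
  resetAt: number // epoch milliseconds
  retryAfter?: number // seconds, present only when blocked
}

// Hypothetical helper: builds the rate limit headers as plain strings.
function rateLimitHeaders(outcome: LimitOutcome): Record<string, string> {
  const built: Record<string, string> = {
    'X-RateLimit-Remaining': String(outcome.remaining),
    // Convert milliseconds to whole Unix seconds, rounding up.
    'X-RateLimit-Reset': String(Math.ceil(outcome.resetAt / 1000)),
  }
  // Retry-After only makes sense on a blocked (429) response.
  if (!outcome.allowed && outcome.retryAfter !== undefined) {
    built['Retry-After'] = String(outcome.retryAfter)
  }
  return built
}

const exceededHeaders = rateLimitHeaders({
  allowed: false,
  remaining: 0,
  resetAt: 1678901234000,
  retryAfter: 42,
})
// exceededHeaders matches the example values in the table above
```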
Logging and Monitoring
Rate limiting uses structured logging for observability:
Log Events
Rate Limit Exceeded:
{
"level": "warn",
"message": "Public rate limit exceeded",
"metadata": {
"ip": "192.168.1.100",
"resetAt": "2024-03-15T10:30:00.000Z",
"retryAfter": 42
}
}
Approaching Limit:
{
"level": "warn",
"message": "Public rate limit approaching",
"metadata": {
"ip": "192.168.1.100",
"remaining": 8
}
}
Cleanup Completed:
{
"level": "debug",
"message": "Rate limit store cleanup completed",
"metadata": {
"cleanedCount": 42
}
}
Monitoring Queries
To monitor rate limiting in production:
# Count rate limit violations (assumes logs.json covers the last hour)
# -i is needed because the logged message is "Public rate limit exceeded"
grep -i "rate limit exceeded" logs.json | jq -s 'length'
# Top IPs by rate limit hits
grep -i "rate limit exceeded" logs.json | jq -r '.metadata.ip' | sort | uniq -c | sort -rn
# Average retry times (prints nothing when there are no matches)
grep -i "rate limit exceeded" logs.json | jq -r '.metadata.retryAfter' | awk '{sum+=$1; count++} END {if (count) print sum/count}'
Why In-Memory?
A²D uses in-memory storage instead of external services like Redis.
Advantages
Simplicity:
- No external dependencies
- No configuration required
- Runs anywhere (Vercel, Docker, local)
Performance:
- Sub-millisecond lookups
- No network latency
- No connection overhead
Cost:
- Zero infrastructure cost
- No managed service fees
- Scales with compute
Tradeoffs
Multi-Instance Behavior:
- Each serverless instance has its own memory
- Actual limit = configured limit × number of instances
- Example: 100/min with 5 instances = effectively 500/min across all instances
Memory Usage:
- ~100 bytes per tracked IP/org
- 1000 tracked entities = ~100 KB
- Automatic cleanup prevents unbounded growth
No Persistence:
- Rate limit state lost on restart
- Acceptable for short windows (60s)
- Fresh start after deployment
For most use cases, the multi-instance tradeoff is acceptable. If you need strict limits, consider using Redis or similar distributed storage.
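The arithmetic behind these tradeoffs, as a short worked sketch. The function names are illustrative, and the 100-bytes-per-key figure is the estimate quoted above, not a measurement:

```typescript
// Worst-case aggregate limit when every instance keeps its own counters.
function effectiveLimit(configuredPerMinute: number, instances: number): number {
  return configuredPerMinute * instances
}

// Rough store size using the ~100 bytes per tracked key estimate.
function estimatedStoreBytes(trackedKeys: number, bytesPerKey: number = 100): number {
  return trackedKeys * bytesPerKey
}

const acrossFiveInstances = effectiveLimit(100, 5) // 500 requests per minute
const forThousandKeys = estimatedStoreBytes(1000) // 100000 bytes, roughly 100 KB
```

The multiplied figure is an upper bound: a client only reaches it if its requests are spread evenly across all instances.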
Production Considerations
Scaling Behavior
Single Region Deployment:
- Vercel deploys to single region by default
- In-memory storage works well (1-3 instances typical)
- Effective limit: ~100-300 req/min (public), ~500-1500 req/min (platform)
Multi-Region Deployment:
- Each region has independent instances
- Rate limits are per-region
- Consider Redis for global rate limiting
Memory Management
Memory Profile:
// Assuming 1000 concurrent tracked entities:
// - Key: ~50 bytes (UUID or IP)
// - Entry: ~50 bytes (count + resetAt)
// - Total: 1000 × 100 bytes = 100 KB
// With cleanup every 5 minutes:
// - Max entries: ~5000 (assuming 1000/min new IPs)
// - Max memory: ~500 KB
// - Negligible for 512 MB+ serverless instances
Cleanup Strategy:
- Runs every 5 minutes
- Removes expired entries
- Typical cleanup: 50-200 entries
- Deleted entries become eligible for garbage collection (V8 reclaims the memory on a later GC cycle)
Attack Mitigation
DDoS Protection:
- Rate limiting provides basic DDoS protection
- Consider Cloudflare or similar for advanced protection
- Monitor for sustained 429 responses
Distributed Attacks:
- IP-based limiting handles single-source attacks
- For distributed attacks, consider WAF rules
- Monitor organization-level platform limits
Code Examples
Implementing in Endpoint
import { checkPublicRateLimit, getClientIp } from '@/lib/middleware/rate-limit-memory'
import { logger } from '@/lib/logger'
export async function POST(request: Request) {
// Check rate limit
const clientIp = getClientIp(request)
const rateLimit = checkPublicRateLimit(clientIp)
// Add rate limit headers
const headers = new Headers({
'X-RateLimit-Remaining': rateLimit.remaining.toString(),
'X-RateLimit-Reset': Math.ceil(rateLimit.resetAt / 1000).toString(), // resetAt is in ms; the header is Unix seconds
})
// Handle rate limit exceeded
if (!rateLimit.allowed) {
headers.set('Retry-After', rateLimit.retryAfter!.toString())
logger.warn('Rate limit exceeded', { ip: clientIp })
return new Response(
JSON.stringify({
jsonrpc: '2.0',
id: null,
error: {
code: -32000,
message: `Rate limit exceeded. Please retry after ${rateLimit.retryAfter} seconds.`
}
}),
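// Spreading a Headers object copies nothing, so convert it to a plain object first
{ status: 429, headers: { ...Object.fromEntries(headers), 'Content-Type': 'application/json' } }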
)
}
// Process request normally
// ...
}
Client-Side Retry Logic
async function callMCPWithRetry(
endpoint: string,
request: any,
maxRetries = 3
): Promise<Response> {
for (let attempt = 0; attempt < maxRetries; attempt++) {
const response = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(request)
})
// Success
if (response.ok) {
return response
}
// Rate limited
if (response.status === 429) {
const retryAfter = parseInt(response.headers.get('Retry-After') || '60')
const remaining = response.headers.get('X-RateLimit-Remaining')
console.log(`Rate limited. Remaining: ${remaining}. Retrying in ${retryAfter}s...`)
if (attempt < maxRetries - 1) {
await new Promise(resolve => setTimeout(resolve, retryAfter * 1000))
continue
}
}
// Other error
throw new Error(`Request failed: ${response.status}`)
}
throw new Error('Max retries exceeded')
}
Monitoring Remaining Requests
const response = await fetch(endpoint, { /* ... */ })
const remaining = parseInt(response.headers.get('X-RateLimit-Remaining') || '0')
const reset = parseInt(response.headers.get('X-RateLimit-Reset') || '0')
if (remaining < 10) {
const resetDate = new Date(reset * 1000)
console.warn(`Only ${remaining} requests remaining until ${resetDate.toISOString()}`)
}
Managing Organization-Specific Limits
Use these SQL queries to manage per-organization rate limits in the database.
Viewing Current Limits
View all organizations with custom limits:
SELECT
id,
name,
max_public_rate_limit,
created_at,
updated_at
FROM organizations
WHERE max_public_rate_limit IS NOT NULL
ORDER BY max_public_rate_limit DESC;
View organizations using global default:
SELECT
id,
name,
created_at
FROM organizations
WHERE max_public_rate_limit IS NULL
LIMIT 10;
Check a specific organization’s limit:
SELECT
name,
max_public_rate_limit,
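-- Replace both 1000 values with your MAX_PUBLIC_RATE_LIMIT; LEAST applies the global cap
LEAST(COALESCE(max_public_rate_limit, 1000), 1000) as effective_limit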
FROM organizations
WHERE id = 'org-uuid-here';
Setting Organization Limits
Set limit for a single organization:
UPDATE organizations
SET max_public_rate_limit = 500
WHERE id = 'org-uuid-here';
Set limits for multiple organizations by name pattern:
-- Set premium tier limits
UPDATE organizations
SET max_public_rate_limit = 1000
WHERE name LIKE '%Premium%';
-- Set basic tier limits
UPDATE organizations
SET max_public_rate_limit = 200
WHERE name LIKE '%Basic%';
Set limits based on subscription tier (if you have a subscription_tier column):
UPDATE organizations
SET max_public_rate_limit = CASE subscription_tier
WHEN 'enterprise' THEN 2000
WHEN 'premium' THEN 1000
WHEN 'standard' THEN 500
WHEN 'free' THEN 100
ELSE 100
END
WHERE subscription_tier IS NOT NULL;
Removing Organization Limits
Remove limit for a single organization (use global default):
UPDATE organizations
SET max_public_rate_limit = NULL
WHERE id = 'org-uuid-here';
Remove all custom limits (reset to global default):
UPDATE organizations
SET max_public_rate_limit = NULL
WHERE max_public_rate_limit IS NOT NULL;
Bulk Operations
Find organizations with unusually high limits:
SELECT
id,
name,
max_public_rate_limit,
created_at
FROM organizations
WHERE max_public_rate_limit > 1000
ORDER BY max_public_rate_limit DESC;
Find organizations with very low limits (potential throttling):
SELECT
id,
name,
max_public_rate_limit,
created_at
FROM organizations
WHERE max_public_rate_limit < 100
ORDER BY max_public_rate_limit ASC;
Count organizations by rate limit tiers:
SELECT
CASE
WHEN max_public_rate_limit IS NULL THEN 'Global Default'
WHEN max_public_rate_limit >= 1000 THEN 'High (1000+)'
WHEN max_public_rate_limit >= 500 THEN 'Medium (500-999)'
WHEN max_public_rate_limit >= 100 THEN 'Low (100-499)'
ELSE 'Very Low (<100)'
END as tier,
COUNT(*) as org_count
FROM organizations
GROUP BY tier
ORDER BY MIN(COALESCE(max_public_rate_limit, 1000)) DESC;
Auditing Changes
Track when limits were last modified:
SELECT
id,
name,
max_public_rate_limit,
updated_at,
AGE(NOW(), updated_at) as time_since_update
FROM organizations
WHERE max_public_rate_limit IS NOT NULL
ORDER BY updated_at DESC
LIMIT 20;
Migration Script
If you’re adding rate limits to an existing deployment, run the migration first:
-- Migration: Add max_public_rate_limit column (if not already applied)
ALTER TABLE organizations
ADD COLUMN IF NOT EXISTS max_public_rate_limit INTEGER NULL;
-- Add explanatory comment
COMMENT ON COLUMN organizations.max_public_rate_limit IS
'Per-organization rate limit ceiling for public MCP endpoints (/api/platform/[id]/mcp). The actual rate limit applied is min(max_public_rate_limit, MAX_PUBLIC_RATE_LIMIT). If NULL, uses MAX_PUBLIC_RATE_LIMIT env var. Default: NULL (use global limit)';
Verification after migration:
-- Verify column exists
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'organizations'
AND column_name = 'max_public_rate_limit';
-- Expected result:
-- column_name: max_public_rate_limit
-- data_type: integer
-- is_nullable: YES
Remember: Organization rate limits are cached for up to 5 minutes. If you update a limit and don’t see the change take effect, wait for the cache to expire or restart the application to clear it.
Testing Rate Limits
Local Testing
Enable rate limiting in development:
NODE_ENV=production npm run dev
Then make rapid requests:
# Bash script to test rate limiting
for i in {1..105}; do
echo "Request $i"
curl -X POST http://localhost:3000/api/platform/test-server/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","clientInfo":{"name":"test","version":"1.0.0"},"capabilities":{}}}' \
-w "\nStatus: %{http_code}\n" \
-s | grep -E "(Status|error)" | head -2
# Short delay to see progression; keeps all 105 requests inside one 60s window
sleep 0.1
done
Expected Output:
Request 1-100: Status: 200
Request 101+: Status: 429, error: "Rate limit exceeded..."
Unit Testing
import { checkPublicRateLimit } from '@/lib/middleware/rate-limit-memory'
describe('Rate Limiting', () => {
beforeEach(() => {
process.env.NODE_ENV = 'production'
})
it('allows requests within limit', () => {
for (let i = 0; i < 100; i++) {
const result = checkPublicRateLimit('test-ip')
expect(result.allowed).toBe(true)
expect(result.remaining).toBe(99 - i)
}
})
it('blocks requests over limit', () => {
// Use up limit
for (let i = 0; i < 100; i++) {
checkPublicRateLimit('test-ip-2')
}
// Should block 101st
const result = checkPublicRateLimit('test-ip-2')
expect(result.allowed).toBe(false)
expect(result.remaining).toBe(0)
expect(result.retryAfter).toBeGreaterThan(0)
})
it('resets after window expires', async () => {
// Use up limit
for (let i = 0; i < 100; i++) {
checkPublicRateLimit('test-ip-3')
}
// Wait for window to expire
await new Promise(resolve => setTimeout(resolve, 61000))
// Should allow again
const result = checkPublicRateLimit('test-ip-3')
expect(result.allowed).toBe(true)
expect(result.remaining).toBe(99)
}, 70000) // the 61s wait needs a timeout above the test runner's default
})
Troubleshooting
“Rate limit exceeded” in Development
Cause: NODE_ENV=production is set in development environment.
Solution:
unset NODE_ENV
npm run dev
Rate Limits Too Strict in Production
Cause: Multiple users behind same NAT/proxy share same IP.
Solution:
- Increase public endpoint limit
- Encourage users to use platform endpoints (org-scoped)
- Consider IP whitelist for known corporate networks
Memory Growing Over Time
Cause: Cleanup interval not running (very rare).
Solution:
- Check logs for “Rate limit store cleanup completed”
- Restart application
- If persistent, report issue
Inconsistent Rate Limiting
Cause: Multi-instance deployment with in-memory storage.
Expected Behavior:
- Each instance has its own limits
- Effective limit = configured limit × instances
- This is normal for in-memory approach
If Strict Limits Required:
- Implement Redis-based rate limiting
- Use distributed rate limiter (e.g., Upstash)
Future Enhancements
Potential improvements to rate limiting:
- Redis Support: Optional Redis backend for strict distributed limits
- Dynamic Limits: Adjust limits based on server load or user tier
- Per-Method Limits: Different limits for initialize vs. tools/call
- Burst Allowance: Allow short bursts above limit
- Rate Limit Tiers: Premium orgs get higher limits
Related Documentation
- API Reference: Endpoints - Rate limit headers and responses
- Local Development - Rate limiting in development mode
- Multi-Tenancy - Organization-scoped rate limiting
- Structured Logging - LOG_LEVEL configuration
Questions? Rate limiting issues are typically environmental. Check NODE_ENV and review logs with LOG_LEVEL=debug.