Rate Limiting Architecture

A²D implements in-memory rate limiting for MCP endpoints to prevent abuse and ensure fair usage across all organizations.

Overview

Rate limiting protects the platform from excessive requests while maintaining high performance for normal usage patterns.

Key Features:

In-Memory Storage: Fast, no external dependencies
Sliding Window Algorithm: Fair distribution of requests over time
Environment-Aware: Only enforced in production
Automatic Cleanup: Prevents memory leaks in long-running processes
Detailed Logging: Track rate limit events with structured logging

Rate Limit Tiers

Rate limits are now fully configurable via environment variables and database settings:

Endpoint	Identifier	Default Limit	Window	Configuration Method
`/api/platform/[id]/mcp`	IP Address	100 req/min	60s	Two-tier: Global max + per-org override
`/api/platform-mcp/mcp`	Organization ID	500 req/min	60s	Environment variable only

Why Different Limits?

Public Endpoints (100/min default):

Accessed by external clients
Rate limited by IP to prevent individual abuse
Lower default protects against DDoS and scraping
Configurable per-organization for premium tiers

Platform Endpoints (500/min default):

Accessed by authenticated organizations
Higher limit for legitimate high-volume usage
Organization-scoped for fair multi-tenant usage
Simpler configuration (environment variable only)

Configuration

A²D uses a flexible two-tier configuration system for rate limiting that balances infrastructure protection with organizational flexibility.

Public MCP Endpoints

Public endpoints (/api/platform/[id]/mcp) use a two-tier system:

Tier 1: Global Maximum (Environment Variable)

Sets the absolute ceiling that no organization can exceed:

MAX_PUBLIC_RATE_LIMIT=1000  # Default: 100, Recommended: 1000

Add this to your .env file or configure in your deployment platform (Vercel, AWS, etc.):

# .env or Vercel environment variables
MAX_PUBLIC_RATE_LIMIT=1000

Purpose:

Infrastructure protection: Prevents any single organization from overwhelming the system
Global policy enforcement: Ensures fair resource allocation across all tenants
Graceful fallback: Used when organization-specific limit is not set
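
The following is a minimal sketch of how the global ceiling could be read defensively. The helper name `readRateLimit` and its `env` parameter are hypothetical (they are not part of A²D); the fallback of 100 mirrors the documented default. It ignores values that are missing, non-numeric, or not positive, so a typo in the deployment settings cannot silently remove the limit.

```typescript
// Hypothetical helper: parse a rate-limit environment variable safely.
// `env` is passed in (instead of reading process.env directly) so the
// function is easy to test; an application would pass process.env.
type Env = Record<string, string | undefined>

function readRateLimit(env: Env, name: string, fallback: number): number {
  const raw = env[name]
  if (raw === undefined || raw.trim() === '') return fallback

  const parsed = Number.parseInt(raw, 10)
  // Reject NaN, zero, and negative values rather than applying them.
  if (!Number.isFinite(parsed) || parsed <= 0) return fallback
  return parsed
}

const configured = readRateLimit({ MAX_PUBLIC_RATE_LIMIT: '1000' }, 'MAX_PUBLIC_RATE_LIMIT', 100) // 1000
const missing = readRateLimit({}, 'MAX_PUBLIC_RATE_LIMIT', 100) // 100 (documented default)
const invalid = readRateLimit({ MAX_PUBLIC_RATE_LIMIT: 'abc' }, 'MAX_PUBLIC_RATE_LIMIT', 100) // 100
```

The same helper would work for PLATFORM_RATE_LIMIT with a fallback of 500.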

Tier 2: Per-Organization Override (Database)

Organizations can set custom limits via the max_public_rate_limit column in the organizations table:

-- Set organization-specific limit
UPDATE organizations
SET max_public_rate_limit = 500
WHERE id = 'org-uuid-here';
 
-- Set different limits for different tiers
UPDATE organizations
SET max_public_rate_limit = 1000
WHERE name LIKE 'Premium%';
 
UPDATE organizations
SET max_public_rate_limit = 200
WHERE name LIKE 'Basic%';
 
-- View all organization-specific limits
SELECT
  name,
  max_public_rate_limit,
  created_at
FROM organizations
WHERE max_public_rate_limit IS NOT NULL
ORDER BY max_public_rate_limit DESC;
 
-- Remove override (use global MAX_PUBLIC_RATE_LIMIT)
UPDATE organizations
SET max_public_rate_limit = NULL
WHERE id = 'org-uuid-here';

Limit Calculation Logic:

The actual rate limit applied is the minimum of the two tiers:

actual_limit = min(org.max_public_rate_limit, MAX_PUBLIC_RATE_LIMIT)

If org.max_public_rate_limit is NULL, the global MAX_PUBLIC_RATE_LIMIT is used.

Example Scenarios:

Global Max	Org Override	Applied Limit	Explanation
1000	500	500	Org limit is lower, org gets 500/min
1000	1500	1000	Org limit exceeds global max, capped at 1000/min
1000	NULL	1000	No org override, uses global default of 1000/min
100	NULL	100	No org override, uses global default of 100/min
1000	250	250	Org requests lower limit, gets 250/min
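
The rule behind the table can be sketched as a small pure function. The name `resolvePublicLimit` is illustrative only; `orgOverride` stands for the value of organizations.max_public_rate_limit, with SQL NULL represented as `null`.

```typescript
// Resolve the effective public limit from the two tiers.
function resolvePublicLimit(orgOverride: number | null, globalMax: number): number {
  // No override: the global maximum doubles as the default.
  if (orgOverride === null) return globalMax
  // Otherwise the override applies, but never above the global ceiling.
  return Math.min(orgOverride, globalMax)
}

// The rows of the scenario table as [globalMax, override, expected].
const scenarios: Array<[number, number | null, number]> = [
  [1000, 500, 500],
  [1000, 1500, 1000],
  [1000, null, 1000],
  [100, null, 100],
  [1000, 250, 250],
]

const allMatch = scenarios.every(
  ([globalMax, override, expected]) => resolvePublicLimit(override, globalMax) === expected
) // true
```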

Use Cases:

Premium tiers: Set higher limits (up to global max) for paying customers
Basic tiers: Set lower limits for free or trial accounts
Throttling: Temporarily reduce limits for misbehaving organizations
Testing: Set higher limits for internal testing organizations

Platform MCP Endpoint

Platform endpoints (/api/platform-mcp/mcp) use a simple environment variable configuration:

PLATFORM_RATE_LIMIT=500  # Default: 500

Add to your .env file:

# .env or Vercel environment variables
PLATFORM_RATE_LIMIT=500

Why No Per-Organization Override?

Platform endpoints are already authenticated and organization-scoped
Simpler configuration reduces complexity
All authenticated organizations are trusted equally
Can be added in the future if needed

Caching Strategy

To optimize performance and reduce database load, organization-specific rate limits are cached for 5 minutes:

Cache Behavior:

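First request: Queries database for org.max_public_rate_limit
Subsequent requests: Uses cached value for 5 minutes
Cache expiry: Queries the database again on the next request
No override (NULL): Uses global MAX_PUBLIC_RATE_LIMIT
Database error: Gracefully falls back to global limit (the fallback is not cached, so the next request retries the query)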

Implementation:

interface OrgRateLimitCache {
  limit: number
  fetchedAt: number
}
 
const orgLimitCache = new Map<string, OrgRateLimitCache>()
const CACHE_TTL_MS = 5 * 60 * 1000 // 5 minutes
 
async function getOrgRateLimit(orgId: string): Promise<number> {
  const now = Date.now()
  const cached = orgLimitCache.get(orgId)
 
  // Return cached value if fresh
  if (cached && now - cached.fetchedAt < CACHE_TTL_MS) {
    return cached.limit
  }
 
  // Query database for org-specific limit
  const { data, error } = await supabase
    .from('organizations')
    .select('max_public_rate_limit')
    .eq('id', orgId)
    .single()
 
  // Fallback to global max on error
  if (error) {
    return parseInt(process.env.MAX_PUBLIC_RATE_LIMIT || '100', 10)
  }
 
  // Calculate actual limit (cap at global max)
  const globalMax = parseInt(process.env.MAX_PUBLIC_RATE_LIMIT || '100', 10)
  const orgLimit = data.max_public_rate_limit || globalMax
  const actualLimit = Math.min(orgLimit, globalMax)
 
  // Cache the result
  orgLimitCache.set(orgId, { limit: actualLimit, fetchedAt: now })
 
  return actualLimit
}

Cache Performance:

Database queries: Only on cache miss (once per 5 minutes per org)
Memory overhead: ~100 bytes per cached organization
Typical memory usage: ~100 KB for 1000 organizations
Automatic cleanup: Stale entries removed with rate limit store cleanup

Configuration Examples

Example 1: Default Setup (No Custom Limits)

MAX_PUBLIC_RATE_LIMIT=100
PLATFORM_RATE_LIMIT=500

-- All organizations have NULL max_public_rate_limit
-- Result: All orgs get 100 req/min for public endpoints

Example 2: Tiered Service Levels

MAX_PUBLIC_RATE_LIMIT=1000
PLATFORM_RATE_LIMIT=500

-- Premium tier: Higher limits
UPDATE organizations
SET max_public_rate_limit = 1000
WHERE subscription_tier = 'premium';
 
-- Standard tier: Medium limits
UPDATE organizations
SET max_public_rate_limit = 500
WHERE subscription_tier = 'standard';
 
-- Free tier: Lower limits
UPDATE organizations
SET max_public_rate_limit = 100
WHERE subscription_tier = 'free';

Example 3: Gradual Rollout

# Start with conservative global max
MAX_PUBLIC_RATE_LIMIT=200
PLATFORM_RATE_LIMIT=500

-- Enable higher limits for beta testers
UPDATE organizations
SET max_public_rate_limit = 1000
WHERE id IN (SELECT organization_id FROM beta_program);

Then increase global max once stable:

# Increase global max after validation
MAX_PUBLIC_RATE_LIMIT=1000
PLATFORM_RATE_LIMIT=500

Example 4: Throttling Misbehaving Organization

-- Temporarily reduce limit for problematic org
UPDATE organizations
SET max_public_rate_limit = 50
WHERE id = 'misbehaving-org-uuid';
 
-- Monitor behavior, then restore
UPDATE organizations
SET max_public_rate_limit = NULL  -- Uses global default
WHERE id = 'misbehaving-org-uuid';

Sliding Window Algorithm

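A²D describes its rate limiter as a sliding window. As implemented (see Core Logic), each key gets its own 60-second window that opens on that key's first request rather than on a shared clock boundary, and the counter resets when the window ends. Because the counter resets all at once, a client can send up to roughly twice the limit in a short span that straddles a reset.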

How It Works

Algorithm Steps:

First Request: Creates new entry with count=1, resetAt=now+60s
Subsequent Requests: Increments count if resetAt > now and count < limit
Window Expired: Resets count to 1, new resetAt=now+60s
Limit Exceeded: Returns 429 with retry information

Example Timeline

Time    Request  Count  Remaining  Status
-----   -------  -----  ---------  ------
00:00   #1       1      99         ✓ Allowed
00:01   #2       2      98         ✓ Allowed
00:30   #50      50     50         ✓ Allowed
00:59   #100     100    0          ✓ Allowed
00:59   #101     100    0          ✗ 429 (retry after 1s)
01:00   #102     1      99         ✓ Allowed (new window)
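
The timeline can be replayed with a small sketch of the same counting rules. This is not the A²D source: `createLimiter` is a hypothetical name, and the clock is passed in as a `now` argument (milliseconds) so the run is deterministic. It treats a request arriving exactly at resetAt as the start of a new window, which is what the 01:00 row implies.

```typescript
interface WindowEntry { count: number; resetAt: number }
interface Decision { allowed: boolean; remaining: number; retryAfter?: number }

// Per-key window counter with an explicit clock.
function createLimiter(limit: number, windowMs: number) {
  const store = new Map<string, WindowEntry>()

  return function check(key: string, now: number): Decision {
    const entry = store.get(key)

    // First request, or the previous window has ended: open a new window.
    if (!entry || entry.resetAt <= now) {
      store.set(key, { count: 1, resetAt: now + windowMs })
      return { allowed: true, remaining: limit - 1 }
    }

    // Inside the window and under the limit: count the request.
    if (entry.count < limit) {
      entry.count++
      return { allowed: true, remaining: limit - entry.count }
    }

    // Limit reached: report how many whole seconds remain in the window.
    return { allowed: false, remaining: 0, retryAfter: Math.ceil((entry.resetAt - now) / 1000) }
  }
}

// Replay: 100 requests by 00:59, a 101st at 00:59, then one at 01:00.
const check = createLimiter(100, 60000)
const key = 'public:203.0.113.7'
const first = check(key, 0)               // allowed, remaining 99
for (let i = 2; i <= 99; i++) check(key, 30000)
const hundredth = check(key, 59000)       // allowed, remaining 0
const rejected = check(key, 59000)        // blocked, retryAfter 1
const nextWindow = check(key, 60000)      // allowed, remaining 99
```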

Enforcement Examples with Different Configurations

Scenario 1: Organization with No Override (Uses Global Max)

-- Organization Alpha has no custom limit
SELECT max_public_rate_limit FROM organizations WHERE name = 'Org Alpha';
-- Result: NULL

MAX_PUBLIC_RATE_LIMIT=1000

Result: Org Alpha gets 1000 requests/minute (global default).

Scenario 2: Organization with Lower Override

-- Organization Beta sets a conservative limit
UPDATE organizations SET max_public_rate_limit = 500 WHERE name = 'Org Beta';

MAX_PUBLIC_RATE_LIMIT=1000

Result: Org Beta gets 500 requests/minute (org override is lower than global max).

Scenario 3: Organization Tries to Exceed Global Max

-- Organization Gamma tries to set a very high limit
UPDATE organizations SET max_public_rate_limit = 5000 WHERE name = 'Org Gamma';

MAX_PUBLIC_RATE_LIMIT=1000

Result: Org Gamma gets 1000 requests/minute (capped at global max, not 5000).

Scenario 4: Different Limits for Different Tiers

-- Set up tiered limits
UPDATE organizations SET max_public_rate_limit = 100 WHERE subscription_tier = 'free';
UPDATE organizations SET max_public_rate_limit = 500 WHERE subscription_tier = 'standard';
UPDATE organizations SET max_public_rate_limit = 1000 WHERE subscription_tier = 'premium';

MAX_PUBLIC_RATE_LIMIT=1000

Result:

Free tier orgs: 100 requests/minute
Standard tier orgs: 500 requests/minute
Premium tier orgs: 1000 requests/minute

Scenario 5: Platform Endpoint (Simpler Configuration)

PLATFORM_RATE_LIMIT=500

Result: All organizations get 500 requests/minute for platform MCP endpoint, regardless of database settings.

Implementation Details

In-Memory Storage

Rate limiting uses a simple in-memory Map for tracking requests:

interface RateLimitEntry {
  count: number      // Number of requests in current window
  resetAt: number    // Timestamp when count resets
}
 
const rateLimitStore = new Map<string, RateLimitEntry>()

Key Format:

Public: public:{ip} (e.g., public:192.168.1.100)
Platform: platform:{orgId} (e.g., platform:11111111-1111-1111-1111-111111111111)
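
As an illustration of the key format, the two helpers below (hypothetical names, not taken from the A²D source) build keys for both endpoint types. The prefixes keep the two namespaces apart, so a single Map can hold public and platform counters without collisions.

```typescript
// Hypothetical helpers that build store keys in the documented format.
const publicKey = (ip: string): string => `public:${ip}`
const platformKey = (orgId: string): string => `platform:${orgId}`

// One Map serves both endpoints because the prefix is part of the key.
const store = new Map<string, { count: number; resetAt: number }>()
store.set(publicKey('192.168.1.100'), { count: 1, resetAt: 60000 })
store.set(platformKey('11111111-1111-1111-1111-111111111111'), { count: 1, resetAt: 60000 })

// Even textually identical identifiers map to different entries.
const sameIdDifferentScope = publicKey('abc') !== platformKey('abc') // true
```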

Core Logic

interface RateLimitResult {
  allowed: boolean
  remaining: number
  resetAt: number       // Epoch ms when the current window ends
  retryAfter?: number   // Seconds until reset (only when blocked)
}
 
function checkRateLimit(
  key: string,
  limit: number,
  windowMs: number = 60000
): RateLimitResult {
  const now = Date.now()
  const entry = rateLimitStore.get(key)
 
  // No entry or window ended - start new window
  // (<= so a request arriving exactly at resetAt opens the next window)
  if (!entry || entry.resetAt <= now) {
    rateLimitStore.set(key, { count: 1, resetAt: now + windowMs })
    return { allowed: true, remaining: limit - 1, resetAt: now + windowMs }
  }
 
  // Within limit - increment
  if (entry.count < limit) {
    entry.count++
    return { allowed: true, remaining: limit - entry.count, resetAt: entry.resetAt }
  }
 
  // Limit exceeded - report whole seconds until the window ends
  const retryAfter = Math.ceil((entry.resetAt - now) / 1000)
  return { allowed: false, remaining: 0, resetAt: entry.resetAt, retryAfter }
}

Automatic Cleanup

To prevent memory leaks, expired entries are cleaned up every 5 minutes:

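import { logger } from '@/lib/logger'
 
setInterval(() => {
  const now = Date.now()
  let cleanedCount = 0
  // Deleting entries while iterating a Map with forEach is safe
  rateLimitStore.forEach((entry, key) => {
    if (entry.resetAt < now) {
      rateLimitStore.delete(key)
      cleanedCount++
    }
  })
  logger.debug('Rate limit store cleanup completed', { cleanedCount })
}, 5 * 60 * 1000)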

Cleanup Strategy:

Runs every 5 minutes in background
Removes entries where resetAt < now
Logs cleanup count at debug level
Minimal performance impact

Environment Handling

Production Mode

When NODE_ENV=production, rate limiting is fully enforced:

export function checkPublicRateLimit(ip: string): RateLimitResult {
  if (process.env.NODE_ENV !== 'production') {
    return { allowed: true, remaining: 100, resetAt: Date.now() + 60000 }
  }
 
  // Simplified: applies the default limit of 100 req/min
  const key = `public:${ip}`
  return checkRateLimit(key, 100, 60000)
}

Production Behavior:

Full rate limit enforcement
429 responses when exceeded
Rate limit headers in all responses
Warning logs when approaching limits

Development Mode

In development (NODE_ENV !== production), rate limiting is disabled:

Development Behavior:

All requests allowed
No rate limit tracking
Debug logs show “Rate limiting disabled”
Faster development iteration

Rate limiting is disabled in development to make testing easier. To test rate limits locally, start the dev server with NODE_ENV=production npm run dev.

Client IP Detection

For public endpoints, rate limiting requires accurate IP address detection:

export function getClientIp(request: Request): string {
  // Check X-Forwarded-For (most proxies)
  const forwardedFor = request.headers.get('x-forwarded-for')
  if (forwardedFor) {
    return forwardedFor.split(',')[0].trim()
  }
 
  // Check X-Real-IP (alternative)
  const realIp = request.headers.get('x-real-ip')
  if (realIp) {
    return realIp.trim()
  }
 
  // Fallback
  return 'unknown'
}

Header Priority:

X-Forwarded-For - Standard proxy header (takes first IP)
X-Real-IP - Alternative proxy header
unknown - Fallback for development

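Vercel and most cloud platforms automatically set X-Forwarded-For with the true client IP. Outside such platforms, X-Forwarded-For is an ordinary request header that clients can set themselves, so rely on its first entry only when a trusted proxy overwrites or sanitizes it. Requests that carry neither header all share the single key public:unknown.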
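
For deployments where proxies append to X-Forwarded-For instead of overwriting it, one common alternative is to count entries from the right. The sketch below is not part of A²D: `clientIpFromForwardedFor` and its `trustedProxies` parameter are hypothetical, and it assumes you know how many reverse proxies you control sit in front of the application.

```typescript
// Each proxy appends the address it received the request from, so the entry
// written by the outermost trusted proxy sits `trustedProxies` positions
// from the right end of the list. Entries further left are client-supplied.
function clientIpFromForwardedFor(header: string | null, trustedProxies: number): string {
  if (!header) return 'unknown'

  const hops = header
    .split(',')
    .map(part => part.trim())
    .filter(part => part !== '')
  if (hops.length === 0) return 'unknown'

  // At least one proxy must be trusted for the header to mean anything;
  // clamp so a short list falls back to its leftmost entry.
  const index = Math.max(0, hops.length - Math.max(1, trustedProxies))
  return hops[index] ?? 'unknown'
}

// The client forged "1.2.3.4"; the single trusted proxy appended the real peer.
const ip = clientIpFromForwardedFor('1.2.3.4, 203.0.113.9', 1) // '203.0.113.9'
```

Taking the first entry, as getClientIp does, is equivalent only when the platform replaces the header with a single trusted value.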

Response Format

Success Response

When a request is within the rate limit, the endpoint returns the standard MCP response with rate limit headers:

HTTP/1.1 200 OK
Content-Type: text/event-stream
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1678901234
 
data: {"jsonrpc":"2.0","id":1,"result":{...}}

Rate Limit Exceeded

When the limit is exceeded, the endpoint returns a 429 response with a JSON-RPC error:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 42
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1678901234
 
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32000,
    "message": "Rate limit exceeded. Please retry after 42 seconds."
  }
}

Response Headers:

Header	Type	Description	Example
`Retry-After`	integer	Seconds to wait before retrying	`42`
`X-RateLimit-Remaining`	integer	Requests remaining in window	`0`
`X-RateLimit-Reset`	integer	Unix timestamp when limit resets	`1678901234`
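
A sketch of how these headers could be derived from a rate limit result follows. The `RateLimitResult` shape is inferred from the values returned in Core Logic, and `rateLimitHeaders` is a hypothetical helper name. Since resetAt is computed from Date.now() it is in milliseconds, while X-RateLimit-Reset is documented as a Unix timestamp in seconds, so the sketch converts between the two.

```typescript
interface RateLimitResult {
  allowed: boolean
  remaining: number
  resetAt: number      // epoch milliseconds (Date.now() + windowMs)
  retryAfter?: number  // seconds; present only when the request is rejected
}

// Build the documented headers as a plain object.
function rateLimitHeaders(result: RateLimitResult): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Remaining': String(result.remaining),
    // Convert milliseconds to whole Unix seconds.
    'X-RateLimit-Reset': String(Math.ceil(result.resetAt / 1000)),
  }
  // Retry-After only makes sense on a rejected request.
  if (!result.allowed && result.retryAfter !== undefined) {
    headers['Retry-After'] = String(result.retryAfter)
  }
  return headers
}

const rejected = rateLimitHeaders({ allowed: false, remaining: 0, resetAt: 1678901234000, retryAfter: 42 })
// { 'X-RateLimit-Remaining': '0', 'X-RateLimit-Reset': '1678901234', 'Retry-After': '42' }
```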

Logging and Monitoring

Rate limiting uses structured logging for observability:

Log Events

Rate Limit Exceeded:

{
  "level": "warn",
  "message": "Public rate limit exceeded",
  "metadata": {
    "ip": "192.168.1.100",
    "resetAt": "2024-03-15T10:30:00.000Z",
    "retryAfter": 42
  }
}

Approaching Limit:

{
  "level": "warn",
  "message": "Public rate limit approaching",
  "metadata": {
    "ip": "192.168.1.100",
    "remaining": 8
  }
}

Cleanup Completed:

{
  "level": "debug",
  "message": "Rate limit store cleanup completed",
  "metadata": {
    "cleanedCount": 42
  }
}

Monitoring Queries

To monitor rate limiting in production:

# Count rate limit violations in logs.json (newline-delimited JSON; filter the file by time range first if needed)
grep -i "rate limit exceeded" logs.json | jq -s 'length'
 
# Top IPs by rate limit hits
grep -i "rate limit exceeded" logs.json | jq -r '.metadata.ip' | sort | uniq -c | sort -rn
 
# Average retry times (prints nothing if there are no matches)
grep -i "rate limit exceeded" logs.json | jq -r '.metadata.retryAfter' | awk '{sum+=$1; count++} END {if (count) print sum/count}'
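
The same analysis can be done in TypeScript when shell tools are not available. This is a sketch that assumes newline-delimited JSON with the entry shape shown under Log Events; `topRateLimitedIps` is a hypothetical name. Lines that are not valid JSON, or that lack metadata.ip, are skipped.

```typescript
// Count "rate limit exceeded" warnings per IP, most frequent first.
function topRateLimitedIps(lines: string[]): Array<[string, number]> {
  const counts = new Map<string, number>()

  for (const line of lines) {
    let parsed: unknown
    try {
      parsed = JSON.parse(line)
    } catch {
      continue // not a JSON log line
    }
    if (typeof parsed !== 'object' || parsed === null) continue

    const entry = parsed as { message?: unknown; metadata?: { ip?: unknown } }
    const message = typeof entry.message === 'string' ? entry.message.toLowerCase() : ''
    const ip = entry.metadata?.ip
    if (!message.includes('rate limit exceeded') || typeof ip !== 'string') continue

    counts.set(ip, (counts.get(ip) ?? 0) + 1)
  }

  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1])
}

const sample = [
  '{"level":"warn","message":"Public rate limit exceeded","metadata":{"ip":"192.168.1.100","retryAfter":42}}',
  '{"level":"warn","message":"Public rate limit exceeded","metadata":{"ip":"192.168.1.100","retryAfter":10}}',
  '{"level":"warn","message":"Public rate limit exceeded","metadata":{"ip":"10.0.0.5","retryAfter":5}}',
  '{"level":"debug","message":"Rate limit store cleanup completed","metadata":{"cleanedCount":42}}',
  'not json',
]
const ranked = topRateLimitedIps(sample) // [['192.168.1.100', 2], ['10.0.0.5', 1]]
```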

Why In-Memory?

A²D uses in-memory storage instead of external services like Redis.

Advantages

Simplicity:

No external dependencies
No configuration required
Runs anywhere (Vercel, Docker, local)

Performance:

Sub-millisecond lookups
No network latency
No connection overhead

Cost:

Zero infrastructure cost
No managed service fees
Scales with compute

Tradeoffs

Multi-Instance Behavior:

Each serverless instance has its own memory
Actual limit = configured limit × number of instances
Example: 100/min with 5 instances = effectively 500/min across all instances

Memory Usage:

~100 bytes per tracked IP/org
1000 tracked entities = ~100 KB
Automatic cleanup prevents unbounded growth

No Persistence:

Rate limit state lost on restart
Acceptable for short windows (60s)
Fresh start after deployment

For most use cases, the multi-instance tradeoff is acceptable. If you need strict limits, consider using Redis or similar distributed storage.
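
The multi-instance arithmetic above can be written out as a one-line estimate. `effectiveLimit` is an illustrative name; it gives the worst case, in which a client's requests are spread across every instance and each instance counts independently.

```typescript
// Worst-case requests per window for one client across independent instances.
function effectiveLimit(configuredLimit: number, instances: number): number {
  // Treat fractional or non-positive instance counts as a single instance.
  return configuredLimit * Math.max(1, Math.floor(instances))
}

const single = effectiveLimit(100, 1)    // 100
const five = effectiveLimit(100, 5)      // 500
const platform = effectiveLimit(500, 3)  // 1500
```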

Production Considerations

Scaling Behavior

Single Region Deployment:

Vercel deploys to a single region by default
In-memory storage works well (1-3 instances typical)
Effective limit: ~100-300 req/min (public), ~500-1500 req/min (platform)

Multi-Region Deployment:

Each region has independent instances
Rate limits are per-region
Consider Redis for global rate limiting

Memory Management

Memory Profile:

// Assuming 1000 concurrent tracked entities:
// - Key: ~50 bytes (UUID or IP)
// - Entry: ~50 bytes (count + resetAt)
// - Total: 1000 × 100 bytes = 100 KB
 
// With cleanup every 5 minutes:
// - Max entries: ~5000 (assuming 1000/min new IPs)
// - Max memory: ~500 KB
// - Negligible for 512 MB+ serverless instances

Cleanup Strategy:

Runs every 5 minutes
Removes expired entries
Typical cleanup: 50-200 entries
Memory for deleted entries becomes eligible for garbage collection (V8 reclaims it on a later GC cycle)
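
The memory profile above can be reproduced with a rough estimate. This sketch uses the document's own figures (about 100 bytes per entry, cleanup every 5 minutes); `estimateStoreBytes` is a hypothetical helper, and the result is an approximation rather than a measurement.

```typescript
// Rough upper bound on store size between two cleanup passes: expired
// entries stay in the Map until the next pass removes them.
function estimateStoreBytes(
  newKeysPerMinute: number,
  cleanupIntervalMinutes: number = 5,
  bytesPerEntry: number = 100
): { maxEntries: number; bytes: number } {
  const maxEntries = newKeysPerMinute * cleanupIntervalMinutes
  return { maxEntries, bytes: maxEntries * bytesPerEntry }
}

const estimate = estimateStoreBytes(1000) // { maxEntries: 5000, bytes: 500000 }, about 500 KB
```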

Attack Mitigation

DDoS Protection:

Rate limiting provides basic DDoS protection
Consider Cloudflare or similar for advanced protection
Monitor for sustained 429 responses

Distributed Attacks:

IP-based limiting handles single-source attacks
For distributed attacks, consider WAF rules
Monitor organization-level platform limits

Code Examples

Implementing in Endpoint

import { checkPublicRateLimit, getClientIp } from '@/lib/middleware/rate-limit-memory'
import { logger } from '@/lib/logger'
 
export async function POST(request: Request) {
  // Check rate limit
  const clientIp = getClientIp(request)
  const rateLimit = checkPublicRateLimit(clientIp)
 
  // Add rate limit headers (X-RateLimit-Reset is a Unix timestamp in seconds)
  const headers = new Headers({
    'X-RateLimit-Remaining': rateLimit.remaining.toString(),
    'X-RateLimit-Reset': Math.ceil(rateLimit.resetAt / 1000).toString(),
  })
 
  // Handle rate limit exceeded
  if (!rateLimit.allowed) {
    headers.set('Retry-After', rateLimit.retryAfter!.toString())
    // Set on the Headers object: spreading a Headers instance copies no entries
    headers.set('Content-Type', 'application/json')
 
    logger.warn('Rate limit exceeded', { ip: clientIp })
 
    return new Response(
      JSON.stringify({
        jsonrpc: '2.0',
        id: null,
        error: {
          code: -32000,
          message: `Rate limit exceeded. Please retry after ${rateLimit.retryAfter} seconds.`
        }
      }),
      { status: 429, headers }
    )
  }
  }
 
  // Process request normally
  // ...
}

Client-Side Retry Logic

async function callMCPWithRetry(
  endpoint: string,
  request: any,
  maxRetries = 3
): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(request)
    })
 
    // Success
    if (response.ok) {
      return response
    }
 
    // Rate limited
    if (response.status === 429) {
      const retryAfter = parseInt(response.headers.get('Retry-After') || '60')
      const remaining = response.headers.get('X-RateLimit-Remaining')
 
      console.log(`Rate limited. Remaining: ${remaining}. Retrying in ${retryAfter}s...`)
 
      if (attempt < maxRetries - 1) {
        await new Promise(resolve => setTimeout(resolve, retryAfter * 1000))
        continue
      }
    }
 
    // Other error
    throw new Error(`Request failed: ${response.status}`)
  }
 
  throw new Error('Max retries exceeded')
}

Monitoring Remaining Requests

const response = await fetch(endpoint, { /* ... */ })
 
const remaining = parseInt(response.headers.get('X-RateLimit-Remaining') || '0')
const reset = parseInt(response.headers.get('X-RateLimit-Reset') || '0')
 
if (remaining < 10) {
  const resetDate = new Date(reset * 1000)
  console.warn(`Only ${remaining} requests remaining until ${resetDate.toISOString()}`)
}

Managing Organization-Specific Limits

Use these SQL queries to manage per-organization rate limits in the database.

Viewing Current Limits

View all organizations with custom limits:

SELECT
  id,
  name,
  max_public_rate_limit,
  created_at,
  updated_at
FROM organizations
WHERE max_public_rate_limit IS NOT NULL
ORDER BY max_public_rate_limit DESC;

View organizations using global default:

SELECT
  id,
  name,
  created_at
FROM organizations
WHERE max_public_rate_limit IS NULL
LIMIT 10;

Check a specific organization’s limit:

-- Replace 1000 with your MAX_PUBLIC_RATE_LIMIT value
SELECT
  name,
  max_public_rate_limit,
  LEAST(COALESCE(max_public_rate_limit, 1000), 1000) as effective_limit
FROM organizations
WHERE id = 'org-uuid-here';

Setting Organization Limits

Set limit for a single organization:

UPDATE organizations
SET max_public_rate_limit = 500
WHERE id = 'org-uuid-here';

Set limits for multiple organizations by name pattern:

-- Set premium tier limits
UPDATE organizations
SET max_public_rate_limit = 1000
WHERE name LIKE '%Premium%';
 
-- Set basic tier limits
UPDATE organizations
SET max_public_rate_limit = 200
WHERE name LIKE '%Basic%';

Set limits based on subscription tier (if you have a subscription_tier column):

-- Values above MAX_PUBLIC_RATE_LIMIT are capped at the global maximum at runtime
UPDATE organizations
SET max_public_rate_limit = CASE subscription_tier
  WHEN 'enterprise' THEN 2000
  WHEN 'premium' THEN 1000
  WHEN 'standard' THEN 500
  WHEN 'free' THEN 100
  ELSE 100
END
WHERE subscription_tier IS NOT NULL;

Removing Organization Limits

Remove limit for a single organization (use global default):

UPDATE organizations
SET max_public_rate_limit = NULL
WHERE id = 'org-uuid-here';

Remove all custom limits (reset to global default):

UPDATE organizations
SET max_public_rate_limit = NULL
WHERE max_public_rate_limit IS NOT NULL;

Bulk Operations

Find organizations with unusually high limits:

SELECT
  id,
  name,
  max_public_rate_limit,
  created_at
FROM organizations
WHERE max_public_rate_limit > 1000
ORDER BY max_public_rate_limit DESC;

Find organizations with very low limits (potential throttling):

SELECT
  id,
  name,
  max_public_rate_limit,
  created_at
FROM organizations
WHERE max_public_rate_limit < 100
ORDER BY max_public_rate_limit ASC;

Count organizations by rate limit tiers:

SELECT
  CASE
    WHEN max_public_rate_limit IS NULL THEN 'Global Default'
    WHEN max_public_rate_limit >= 1000 THEN 'High (1000+)'
    WHEN max_public_rate_limit >= 500 THEN 'Medium (500-999)'
    WHEN max_public_rate_limit >= 100 THEN 'Low (100-499)'
    ELSE 'Very Low (<100)'
  END as tier,
  COUNT(*) as org_count
FROM organizations
GROUP BY tier
ORDER BY MIN(COALESCE(max_public_rate_limit, 1000)) DESC;

Auditing Changes

Track when limits were last modified:

SELECT
  id,
  name,
  max_public_rate_limit,
  updated_at,
  AGE(NOW(), updated_at) as time_since_update
FROM organizations
WHERE max_public_rate_limit IS NOT NULL
ORDER BY updated_at DESC
LIMIT 20;

Migration Script

If you’re adding rate limits to an existing deployment, run the migration first:

-- Migration: Add max_public_rate_limit column (if not already applied)
ALTER TABLE organizations
ADD COLUMN IF NOT EXISTS max_public_rate_limit INTEGER NULL;
 
-- Add explanatory comment
COMMENT ON COLUMN organizations.max_public_rate_limit IS
'Per-organization rate limit ceiling for public MCP endpoints (/api/platform/[id]/mcp). The actual rate limit applied is min(max_public_rate_limit, MAX_PUBLIC_RATE_LIMIT). If NULL, uses MAX_PUBLIC_RATE_LIMIT env var. Default: NULL (use global limit)';

Verification after migration:

-- Verify column exists
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'organizations'
  AND column_name = 'max_public_rate_limit';
 
-- Expected result:
-- column_name: max_public_rate_limit
-- data_type: integer
-- is_nullable: YES

Remember: Changes to organization rate limits are cached for 5 minutes. If you update a limit and don’t see immediate effects, wait up to 5 minutes for the cache to expire, or restart the application to clear the cache.

Testing Rate Limits

Local Testing

Enable rate limiting in development:

NODE_ENV=production npm run dev

Then make rapid requests:

# Bash script to test rate limiting
for i in {1..105}; do
  echo "Request $i"
  curl -X POST http://localhost:3000/api/platform/test-server/mcp \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","clientInfo":{"name":"test","version":"1.0.0"},"capabilities":{}}}' \
    -w "\nStatus: %{http_code}\n" \
    -s | grep -E "(Status|error)" | head -2
 
  # Short delay so all 105 requests land inside one 60s window
  sleep 0.1
done

Expected Output:

Request 1-100: Status: 200
Request 101+: Status: 429, error: "Rate limit exceeded..."

Unit Testing

import { checkPublicRateLimit } from '@/lib/middleware/rate-limit-memory'
 
describe('Rate Limiting', () => {
  beforeEach(() => {
    process.env.NODE_ENV = 'production'
  })
 
  it('allows requests within limit', () => {
    for (let i = 0; i < 100; i++) {
      const result = checkPublicRateLimit('test-ip')
      expect(result.allowed).toBe(true)
      expect(result.remaining).toBe(99 - i)
    }
  })
 
  it('blocks requests over limit', () => {
    // Use up limit
    for (let i = 0; i < 100; i++) {
      checkPublicRateLimit('test-ip-2')
    }
 
    // Should block 101st
    const result = checkPublicRateLimit('test-ip-2')
    expect(result.allowed).toBe(false)
    expect(result.remaining).toBe(0)
    expect(result.retryAfter).toBeGreaterThan(0)
  })
 
  // Waits in real time, so raise the per-test timeout above the 61s wait
  // (or use your test runner's fake timers to avoid the delay)
  it('resets after window expires', async () => {
    // Use up limit
    for (let i = 0; i < 100; i++) {
      checkPublicRateLimit('test-ip-3')
    }
 
    // Wait for window to expire
    await new Promise(resolve => setTimeout(resolve, 61000))
 
    // Should allow again
    const result = checkPublicRateLimit('test-ip-3')
    expect(result.allowed).toBe(true)
    expect(result.remaining).toBe(99)
  }, 70000)
})

Troubleshooting

“Rate limit exceeded” in Development

Cause: NODE_ENV=production is set in the development environment.

Solution:

unset NODE_ENV
npm run dev

Rate Limits Too Strict in Production

Cause: Multiple users behind same NAT/proxy share same IP.

Solution:

Increase public endpoint limit
Encourage users to use platform endpoints (org-scoped)
Consider IP whitelist for known corporate networks

Memory Growing Over Time

Cause: Cleanup interval not running (very rare).

Solution:

Check logs for “Rate limit store cleanup completed”
Restart application
If persistent, report issue

Inconsistent Rate Limiting

Cause: Multi-instance deployment with in-memory storage.

Expected Behavior:

Each instance has its own limits
Effective limit = configured limit × instances
This is normal for in-memory approach

If Strict Limits Required:

Implement Redis-based rate limiting
Use distributed rate limiter (e.g., Upstash)

Future Enhancements

Potential improvements to rate limiting:

Redis Support: Optional Redis backend for strict distributed limits
Dynamic Limits: Adjust limits based on server load or user tier
Per-Method Limits: Different limits for initialize vs. tools/call
Burst Allowance: Allow short bursts above limit
Rate Limit Tiers: Premium orgs get higher limits

Related Documentation

API Reference: Endpoints - Rate limit headers and responses
Local Development - Rate limiting in development mode
Multi-Tenancy - Organization-scoped rate limiting
Structured Logging - LOG_LEVEL configuration

Questions? Rate limiting issues are typically environmental. Check NODE_ENV and review logs with LOG_LEVEL=debug.
