Rate Limiting

Sliding window rate limiting with per-plan limits.

Overview

CueAPI rate limits API requests using a sliding window algorithm backed by Redis. Limits are per-plan and per-API-key.

Per-plan limits

Plan     Requests per minute
Free     60
Pro      200
Scale    500

How it works

CueAPI uses a Redis sorted set for each API key:

  1. Each request adds a timestamp to the sorted set
  2. Entries older than 60 seconds are removed
  3. If the count exceeds the limit, the request is rejected with a 429 Too Many Requests response
  4. Rejected requests do not count against the window (preventing feedback loops)
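
The steps above can be sketched in a few lines. This is a minimal illustration, not CueAPI's actual implementation: it uses an in-memory sorted list in place of the Redis sorted set, and the class name SlidingWindowLimiter and its allow method are hypothetical. It checks the count before recording the request, which is assumed to have the same observable effect as adding the entry and discarding it on rejection.

```python
import bisect
from collections import defaultdict

WINDOW_SECONDS = 60


class SlidingWindowLimiter:
    """In-memory stand-in for the per-key Redis sorted set (illustrative only)."""

    def __init__(self, limit):
        self.limit = limit
        self._hits = defaultdict(list)  # key -> sorted list of request timestamps

    def allow(self, key, now):
        hits = self._hits[key]
        # Drop entries strictly older than the window
        # (the role ZREMRANGEBYSCORE would play in Redis).
        cutoff = now - WINDOW_SECONDS
        del hits[:bisect.bisect_left(hits, cutoff)]
        # Reject once the window is full; rejected requests are not recorded,
        # so a client that keeps retrying does not extend its own lockout.
        if len(hits) >= self.limit:
            return False
        # Record the accepted request (the role ZADD would play in Redis).
        bisect.insort(hits, now)
        return True
```

With a limit of 2, two calls within the same minute are allowed and a third is rejected; once the oldest entry falls outside the 60-second window, the next call is allowed again.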

Response headers

Every response includes rate limit headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 58

When rate limited:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
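
On the client side, a 429 response can be handled by pausing for the number of seconds given in Retry-After. The helper below is a sketch under assumptions: the function name seconds_to_wait and the one-second fallback are hypothetical, and it treats Retry-After as a number of seconds, as in the example above.

```python
def seconds_to_wait(status_code, headers, default=1.0):
    """Return how long to pause before retrying, or 0.0 if not rate limited."""
    if status_code != 429:
        return 0.0
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0.0, float(normalized["retry-after"]))
    except (KeyError, ValueError):
        # Missing or non-numeric Retry-After: fall back to a fixed pause.
        return default
```

A caller would sleep for the returned value and then retry the request.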

Exempt endpoints

These endpoints are not rate limited:

  • GET /health
  • GET /status
  • POST /v1/billing/webhook (Stripe webhooks)
  • GET /auth/device (verification page)
  • GET /v1/auth/verify (magic link)

Unauthenticated requests

Requests without an API key are rate limited by IP address at 60 requests per minute.
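
One way to picture how the window and limit are chosen per request is shown below. The limits come from the tables in this page; the function name bucket_for and the key formats are hypothetical and only illustrate the idea of keying by API key when present and by IP address otherwise.

```python
PLAN_LIMITS = {"free": 60, "pro": 200, "scale": 500}  # requests per minute
UNAUTHENTICATED_LIMIT = 60  # requests per minute, per IP address


def bucket_for(api_key, plan, client_ip):
    """Return (bucket_key, limit) for a request.

    Key formats are made up for illustration; the real key layout is not
    documented here.
    """
    if api_key:
        return f"ratelimit:key:{api_key}", PLAN_LIMITS[plan]
    return f"ratelimit:ip:{client_ip}", UNAUTHENTICATED_LIMIT
```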

Graceful degradation

If Redis is unavailable, CueAPI skips the rate limit check and lets the request proceed. This prevents a Redis outage from blocking all API traffic.
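
This fail-open behavior can be sketched as a thin wrapper around the limit check. The names StoreUnavailable and check_fail_open are hypothetical; a real service would catch whatever connection error its Redis client raises.

```python
class StoreUnavailable(Exception):
    """Hypothetical error raised when the backing store cannot be reached."""


def check_fail_open(check, key):
    """Run a rate limit check, allowing the request if the store is down."""
    try:
        return check(key)
    except StoreUnavailable:
        # Fail open: an outage of the store must not block API traffic.
        return True
```

The trade-off is that limits are not enforced for the duration of the outage.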

Usage warnings

When you approach your monthly execution limit, CueAPI adds a warning header:

X-CueAPI-Usage-Warning: approaching_limit

This appears at 80% of your plan's monthly execution limit.
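
A client can watch for this header on each response and surface it in logs or alerts. The helper below is a small sketch; the function name usage_warning is hypothetical.

```python
def usage_warning(headers):
    """Return the X-CueAPI-Usage-Warning value, or None if the header is absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-cueapi-usage-warning":
            return value
    return None
```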

Best practices