Grand Central enforces rate limits to prevent abuse and ensure fair resource allocation across AI agents. Limits are tied to your subscription key and control how many requests can be made within a time window.

How rate limiting works

Subscription-based limits control how many requests your subscription key can make within a time window. For example, with a limit of 100 requests per minute, Grand Central increments a counter with each request. When the counter exceeds 100 within that minute, subsequent requests return HTTP 429 (Too Many Requests) until the window resets.

Rate limits are configured through APIM policies with the rate-limit element. The platform team sets these limits based on your subscription tier, use case requirements, and backend API capacity. Limits apply globally to all MCP endpoint calls for your subscription.
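As a rough sketch, the APIM policy for the 100-requests-per-minute example above would use the rate-limit element like this (values are illustrative - the platform team sets the actual limits):

```xml
<!-- Illustrative APIM inbound policy: 100 calls per 60-second window -->
<inbound>
    <base />
    <rate-limit calls="100" renewal-period="60" />
</inbound>
```

The calls attribute is the maximum number of requests, and renewal-period is the window length in seconds after which the counter resets.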

Request processing flow

When your agent makes an MCP request, Grand Central validates the subscription key, checks the rate limit counter, processes the request if it is within limits, and returns response headers showing the limit and time window. If you’ve exceeded your limit, you receive HTTP 429 immediately without touching backend APIs - protecting your systems from overload.

If legitimate usage patterns exceed your current limits, the platform team can adjust the APIM policy configuration to increase limits or extend time windows.

Response headers

Grand Central includes rate limit information in response headers when configured in APIM policies:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Window: 60
X-RateLimit-Limit shows the maximum requests allowed in the time window (100 requests in this example). X-RateLimit-Window indicates the time window in seconds (60 = per minute). These headers are set by custom APIM policies and help agents implement client-side throttling.
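When these headers are present, an agent can use them to pace its own requests. A minimal sketch, assuming the headers arrive as lowercase keys on a plain object (as Node.js exposes them):

```javascript
// Sketch: derive a client-side pacing interval from the rate limit headers.
// Header names match the custom APIM policy described above; adjust them if
// your deployment emits different names.
function pacingIntervalMs(headers) {
  const limit = parseInt(headers['x-ratelimit-limit'], 10);          // max requests per window
  const windowSeconds = parseInt(headers['x-ratelimit-window'], 10); // window length in seconds
  if (!limit || !windowSeconds) return 0; // headers absent - no client-side pacing
  // Spread requests evenly: one request every window/limit seconds.
  return Math.ceil((windowSeconds * 1000) / limit);
}
```

With the example headers above (limit 100, window 60), this yields 600 ms between requests, which keeps the agent just under the limit instead of bursting.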

Handling rate limit errors

When you exceed your rate limit, Grand Central returns HTTP 429:
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Window: 60
Your agent should detect 429 status codes, wait before retrying, and implement exponential backoff - progressively increasing wait times between retry attempts. Simple retry loops that hammer the endpoint every second will keep hitting the rate limit. Here’s a practical implementation with exponential backoff:
// `mcpClient` is your MCP client instance; `sleep` pauses for the given milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function invokeTool(name, args, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await mcpClient.call(name, args);
    } catch (error) {
      if (error.status === 429) {
        // Rate limit exceeded - back off exponentially
        const waitSeconds = Math.pow(2, attempt); // 1s, 2s, 4s
        console.log(`Rate limited. Retrying in ${waitSeconds}s`);
        await sleep(waitSeconds * 1000);
      } else {
        throw error; // Not a rate limit error, fail fast
      }
    }
  }
  throw new Error('Max retries exceeded - rate limit exhausted');
}
This approach waits 1 second after the first 429, 2 seconds after the second, 4 seconds after the third. If all retries fail, it gives up rather than looping forever.

Rate limit behavior

Automatic renewal resets the counter after the time window expires. If you have a 60-second window, the counter resets to zero after 60 seconds, allowing another full cycle of requests.

Burst allowances depend on the APIM policy configuration. The standard rate-limit policy enforces limits strictly per window, but policies can be configured with different renewal periods and call limits to match your use case.

429 responses are immediate and don’t consume backend resources. When you hit the rate limit, APIM returns the error before reaching your backend APIs, protecting them from overload.

Monitoring and optimization

Track rate limit headers when they’re included in responses. X-RateLimit-Limit and X-RateLimit-Window tell you the configured limits, and your agent should log these values to understand usage patterns.

Monitor 429 error rates to identify whether your usage patterns exceed configured limits. When you receive HTTP 429 responses, you’ve hit the rate limit and need to back off. If you’re frequently hitting rate limits during normal operations, the platform team can adjust the APIM policy to increase limits or extend time windows.

Optimize request patterns to stay within limits. Cache tool discovery results instead of calling tools/list repeatedly, batch operations when possible, and implement client-side throttling to spread requests evenly across time windows rather than bursting.
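Client-side throttling can be as simple as enforcing a minimum gap between requests. A minimal sketch, assuming the limit and window values come from your subscription’s configured policy:

```javascript
// Sketch: space requests evenly across the window instead of bursting.
// `limit` and `windowMs` are assumed to match your configured policy
// (e.g. 100 requests per 60,000 ms).
function createThrottle(limit, windowMs) {
  const minGapMs = windowMs / limit; // e.g. 60000 / 100 = 600ms between calls
  let nextSlot = 0; // earliest timestamp the next request may start
  return async function throttled(fn) {
    const now = Date.now();
    const wait = Math.max(0, nextSlot - now);
    nextSlot = Math.max(now, nextSlot) + minGapMs; // reserve the next slot
    if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
    return fn();
  };
}
```

Wrapping every MCP call in the returned function spreads 100 requests evenly across the 60-second window instead of sending them in a burst at the start.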

Best practices

Cache tool discovery results for 5 to 10 minutes instead of calling tools/list on every request. Tool definitions change infrequently, so aggressive caching reduces request volume.

Implement exponential backoff when hitting rate limits - wait progressively longer between retries (1s, 2s, 4s) rather than hammering the endpoint every second.

Monitor 429 errors to understand whether your usage patterns exceed configured limits. Frequent rate limit errors indicate you need either higher limits or better client-side throttling.

Handle errors gracefully with clear user feedback like “I’m processing many requests right now. Please wait 30 seconds” instead of cryptic error messages.
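The tool discovery caching described above can be sketched as a small TTL wrapper; listTools here is a hypothetical stand-in for your MCP client’s tools/list call:

```javascript
// Sketch: cache tool discovery results with a TTL so tools/list is not
// called on every request. `listTools` is a hypothetical async function
// standing in for your MCP client's tools/list call.
function createToolCache(listTools, ttlMs = 5 * 60 * 1000) {
  let cached = null;   // last tools/list result
  let fetchedAt = 0;   // timestamp of the last refresh
  return async function getTools() {
    const now = Date.now();
    if (cached && now - fetchedAt < ttlMs) return cached; // serve from cache
    cached = await listTools(); // refresh on first call or after expiry
    fetchedAt = now;
    return cached;
  };
}
```

With the default 5-minute TTL, repeated calls within the window hit the cache and only the first call per window reaches Grand Central.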

Next steps

  • Connecting Agents - Configure Claude Desktop, Copilot Studio, or custom MCP clients
  • Monitoring - Track usage trends, quota consumption, and performance metrics