How rate limiting works
Subscription-based limits control how many requests your subscription key can make within a time window. For example, with a limit of 100 requests per minute, Grand Central increments a counter with each request. Once the counter exceeds 100 within that minute, subsequent requests return HTTP 429 (Too Many Requests) until the window resets.
Rate limits are configured through APIM policies with the rate-limit element. The platform team sets these limits based on your subscription tier, use case requirements, and backend API capacity. Limits apply globally to all MCP endpoint calls for your subscription.
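The counting behavior described above can be sketched as a fixed-window counter. This is only an illustration of the concept, not Grand Central's or APIM's actual implementation; the class name `FixedWindowCounter`, its parameters, and the injectable `clock` are hypothetical.

```python
import time
from typing import Callable


class FixedWindowCounter:
    """Illustrative fixed-window counter (hypothetical, not the real gateway code)."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def allow(self) -> int:
        """Return the HTTP status a request would receive: 200 or 429."""
        now = self._clock()
        # Start a fresh window once the current one has expired.
        if now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._count = 0
        # Every request increments the counter; requests beyond the limit get 429.
        self._count += 1
        return 200 if self._count <= self.limit else 429
```

With the default values, the first 100 calls to `allow()` in a window return 200 and later calls return 429 until 60 seconds have passed since the window started.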
Request processing flow
When your agent makes an MCP request, Grand Central validates the subscription key, checks the rate limit counter, processes the request if it is within limits, and returns response headers showing the limit and time window. If you’ve exceeded your limit, you receive HTTP 429 immediately, before the request reaches backend APIs, which protects your systems from overload. If legitimate usage patterns exceed your current limits, the platform team can adjust the APIM policy configuration to increase limits or extend time windows.
Response headers
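Grand Central includes rate limit information in response headers when this is configured in APIM policies. The X-RateLimit-Limit and X-RateLimit-Window headers report the configured limit and time window.
Handling rate limit errors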
When you exceed your rate limit, Grand Central returns HTTP 429 (Too Many Requests).
Rate limit behavior
Automatic renewal resets the counter after the time window expires. With a 60-second window, the counter resets to zero after 60 seconds, allowing another full cycle of requests.
Burst allowances depend on the APIM policy configuration. The standard rate-limit policy enforces limits strictly per window, but policies can be configured with different renewal periods and call limits to match your use case.
429 responses are immediate and don’t consume backend resources. When you hit the rate limit, APIM returns the error before reaching your backend APIs, protecting them from overload.
Monitoring and optimization
Track rate limit headers when they’re included in responses. X-RateLimit-Limit and X-RateLimit-Window tell you the configured limits. Your agent should log these values to understand usage patterns.
When you receive HTTP 429 responses, you’ve hit the rate limit and need to back off. Monitor 429 error rates to identify whether your usage patterns exceed configured limits. If you’re frequently hitting rate limits during normal operations, the platform team can adjust the APIM policy to increase limits or extend time windows.
Optimize request patterns to stay within limits. Cache tool discovery results instead of calling tools/list repeatedly. Batch operations when possible. Implement client-side throttling to spread requests evenly across time windows rather than bursting.
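One way to spread requests evenly is to enforce a minimum interval between calls, derived from the limit and window. The sketch below assumes you know both values (for example, from the rate limit headers); `EvenPacer` and its parameters are hypothetical names, and the `clock` and `sleep` arguments exist only so the pacing can be tested without real waiting.

```python
import time
from typing import Callable


class EvenPacer:
    """Client-side throttle that spaces calls evenly across a window (illustrative)."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        # Minimum spacing between requests, e.g. 60 s / 100 requests = 0.6 s.
        self.interval = window_seconds / limit
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()

    def wait(self) -> float:
        """Block until the next request slot is free; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_slot - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the following slot relative to when this request is sent.
        self._next_slot = max(now, self._next_slot) + self.interval
        return delay
```

Calling `pacer.wait()` before each MCP request keeps the client at or below the configured rate. Because the client's clock is not aligned with the server's window, pacing slightly below the limit leaves some headroom.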
Best practices
Cache tool discovery results for 5 to 10 minutes instead of calling tools/list on every request. Tool definitions change infrequently, so aggressive caching reduces request volume.
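A time-based cache for tool discovery could look like the following minimal sketch. `ToolListCache` is a hypothetical helper, and `fetch_tools` stands in for whatever function in your client actually sends the tools/list request.

```python
import time
from typing import Any, Callable, Optional


class ToolListCache:
    """Caches the result of a tools/list call for a fixed time-to-live (illustrative)."""

    def __init__(self, fetch_tools: Callable[[], Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_tools      # assumed: performs the real tools/list request
        self._ttl = ttl_seconds        # 300 s = 5 minutes
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        """Return cached tools, refreshing only when the entry has expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

Within the TTL, repeated `get()` calls return the cached list and cost no requests against your rate limit.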
Implement exponential backoff when hitting rate limits - wait progressively longer between retries (1s, 2s, 4s) rather than hammering the endpoint every second.
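The retry schedule above (1s, 2s, 4s) can be sketched as follows. The `send` callable is an assumption: it represents your own function that performs one MCP request and returns a `(status, body)` tuple. `call_with_backoff` is likewise a hypothetical name.

```python
import time
from typing import Any, Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, Any]],
                      max_retries: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, Any]:
    """Retry on HTTP 429, doubling the wait each time (1s, 2s, 4s by default)."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Wait base_delay * 2^attempt seconds before the next try.
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after all retries: hand the 429 back to the caller.
    return status, body
```

If the final attempt still returns 429, the caller receives it and can surface a friendly message instead of retrying further. Adding a small random jitter to each delay is a common refinement when many clients share one subscription.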
Monitor 429 errors to understand if your usage patterns exceed configured limits. Frequent rate limit errors indicate you need either higher limits or better client-side throttling.
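As a rough illustration of tracking 429 error rates, the sketch below keeps the most recent response statuses in memory and reports the share that were rate limited. `RateLimitMonitor` and the sample size are hypothetical; in practice you would likely feed these values into your existing logging or metrics system.

```python
from collections import deque
from typing import Deque


class RateLimitMonitor:
    """Tracks the share of recent responses that were HTTP 429 (illustrative)."""

    def __init__(self, sample_size: int = 100) -> None:
        # Only the most recent `sample_size` statuses are kept.
        self._statuses: Deque[int] = deque(maxlen=sample_size)

    def record(self, status: int) -> None:
        self._statuses.append(status)

    def rate_429(self) -> float:
        """Fraction of recorded responses that were 429, from 0.0 to 1.0."""
        if not self._statuses:
            return 0.0
        hits = sum(1 for s in self._statuses if s == 429)
        return hits / len(self._statuses)
```

A consistently high value from `rate_429()` during normal operation suggests either requesting higher limits from the platform team or tightening client-side throttling.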
Handle errors gracefully with clear user feedback like “I’m processing many requests right now. Please wait 30 seconds” instead of cryptic error messages.
Next steps
- Connecting Agents - Configure Claude Desktop, Copilot Studio, or custom MCP clients
- Monitoring - Track usage trends, quota consumption, and performance metrics