Architecture

Grand Central’s MCP implementation sits between AI agents and your banking APIs, handling protocol translation, authentication, rate limiting, and audit logging.

System overview

The MCP server layer accepts JSON-RPC 2.0 requests from AI agents on the /mcp endpoint. It handles tool discovery (tools/list) and invocation (tools/call), translating between the MCP protocol and your backend REST APIs. Tool definitions are generated from the OpenAPI specifications you’ve uploaded to Grand Central’s API catalog.

Authentication validates every request using subscription keys (API keys) passed in the Ocp-Apim-Subscription-Key header. Optionally, JWT tokens in the Authorization header provide user context for operations that need to know which end user the agent is acting on behalf of. The authentication layer checks token signatures, expiration, and audience claims before allowing requests through.

Rate limiting enforces per-subscription quotas to prevent abuse and ensure fair resource allocation. If an agent exceeds its quota (typically 100 requests per minute), subsequent requests return HTTP 429 with retry-after guidance. The system tracks usage in memory with periodic persistence for quota reporting.

Audit logging captures every tool invocation with timestamps, subscription keys, tool names, parameters (optionally redacted for sensitive data), and response status codes. Logs flow to your compliance and security systems for regulatory requirements, incident investigation, and usage analytics.
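The request shape the MCP server layer accepts can be sketched in a few lines of Python. This is a minimal illustration, not Grand Central client code: the function name build_tools_call_request, the tool name getAccountBalance, and the accountId argument are hypothetical. Only the two header names and the tools/call method come from the text above; the params layout (name plus arguments) follows the MCP specification.

```python
import json
from typing import Any, Optional


def build_tools_call_request(
    tool_name: str,
    arguments: dict[str, Any],
    subscription_key: str,
    user_jwt: Optional[str] = None,
    request_id: int = 1,
) -> tuple[dict[str, str], str]:
    """Return (headers, body) for a JSON-RPC 2.0 tools/call request."""
    headers = {
        "Content-Type": "application/json",
        # Subscription key header named in the text; sent on every request.
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    if user_jwt is not None:
        # Optional user context for tools that act on behalf of an end user.
        headers["Authorization"] = f"Bearer {user_jwt}"

    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })
    return headers, body


# Hypothetical tool name and arguments, for illustration only.
headers, body = build_tools_call_request(
    "getAccountBalance", {"accountId": "12345"}, subscription_key="demo-key"
)
```

The resulting headers and body would be sent as an HTTP POST to the /mcp endpoint of your Grand Central host. The block deliberately stops short of the network call because the host name depends on your deployment.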

Authentication flow

When an AI agent calls Grand Central’s MCP endpoint, the system validates credentials, checks quotas, invokes backend APIs, and logs the complete interaction:

API key validation happens first. Grand Central checks that the subscription key in the Ocp-Apim-Subscription-Key header exists, hasn’t been revoked, and belongs to a subscription with access to the requested tool. If validation fails, the request returns HTTP 401 immediately without touching backend systems.

JWT validation runs when a token is provided. If the request includes an Authorization: Bearer <token> header, Grand Central verifies the token signature against your configured identity provider, checks expiration and audience claims, and extracts the user identity. This user context flows through to backend APIs for operations like getMyAccountBalance that need to know which user the agent represents. JWT validation is optional - most tools work with subscription keys alone.

Rate limiting checks quota before backend calls. Grand Central maintains request counters per subscription key and time window (typically per minute). If the agent has exceeded its quota, the system returns HTTP 429 with X-RateLimit-Limit and X-RateLimit-Window headers. The backend API never sees the request, protecting your systems from overload.

Backend API calls use standard REST. Grand Central translates MCP tool invocations into HTTP requests to your backend APIs - GET, POST, PUT, or DELETE with JSON payloads. It maps tool parameters to API parameters (path, query, body) based on your OpenAPI specification, calls the backend, and waits for a response. If the backend times out or returns errors, Grand Central translates those into JSON-RPC error responses the agent can handle.
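The ordering of these checks can be sketched as a single function that returns the HTTP status decided before any backend call. This is an illustrative model only: the Subscription record, the check_request function, and the verify_jwt and quota_exceeded callbacks are hypothetical stand-ins, and the 401 returned for an invalid JWT is an assumption, since the text names a status only for key failures and quota violations.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Subscription:
    """Hypothetical in-memory stand-in for a subscription record."""
    revoked: bool = False
    allowed_tools: set[str] = field(default_factory=set)


def check_request(
    subscriptions: dict[str, Subscription],
    key: Optional[str],
    tool: str,
    bearer_token: Optional[str],
    verify_jwt: Callable[[str], bool],
    quota_exceeded: Callable[[str], bool],
) -> int:
    """Return the HTTP status decided before the backend is called.

    200 means every check passed and the backend may be invoked.
    """
    # Step 1: the key must exist, be active, and grant access to the tool.
    sub = subscriptions.get(key) if key else None
    if sub is None or sub.revoked or tool not in sub.allowed_tools:
        return 401
    # Step 2: a JWT is optional, but if one is sent it must verify.
    if bearer_token is not None and not verify_jwt(bearer_token):
        return 401  # assumption: the text does not name this status
    # Step 3: quota is checked last, so throttled calls never reach the backend.
    if quota_exceeded(key):
        return 429
    return 200
```

Passing the JWT verifier and quota check in as callables keeps the ordering visible without committing to any particular identity provider or counter store.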

Request flow details

Tool discovery happens when agents start or need to refresh their understanding of available capabilities. The agent sends a tools/list request to Grand Central, which queries its tool registry - a database of operations accessible to your subscription based on access policies. The registry includes tool names, descriptions, parameter schemas, and authentication requirements generated from OpenAPI specifications. Grand Central caches these definitions aggressively since they change infrequently, and typically returns discovery responses in under 200ms.

Tool invocation executes actual backend operations. The agent provides the tool name and arguments in a tools/call request. Grand Central extracts the parameters, validates them against the tool’s input schema, checks rate limits, and constructs an HTTP request to your backend API. The backend processes the request and returns data, which Grand Central wraps in JSON-RPC format and sends back to the agent. Meanwhile, audit logging captures the complete interaction asynchronously to avoid blocking the response.
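The translation step in tool invocation, from tool arguments to an HTTP request and from backend output back to JSON-RPC, might look roughly like the sketch below. The function names, the param_locations mapping, and the example path are hypothetical. The mapping stands in for the parameter locations an OpenAPI specification declares, and the result shape (a content list of text items) follows the MCP specification rather than anything stated about Grand Central's internals.

```python
from typing import Any
from urllib.parse import quote, urlencode


def build_backend_request(
    method: str,
    path_template: str,
    param_locations: dict[str, str],
    arguments: dict[str, Any],
) -> tuple[str, str, dict[str, Any]]:
    """Split tool arguments into path, query, and body parts.

    param_locations maps each argument name to "path", "query", or "body",
    mirroring where an OpenAPI operation declares each parameter.
    """
    path = path_template
    query: dict[str, Any] = {}
    body: dict[str, Any] = {}
    for name, value in arguments.items():
        location = param_locations.get(name)
        if location == "path":
            # Percent-encode so a value cannot inject extra path segments.
            path = path.replace("{" + name + "}", quote(str(value), safe=""))
        elif location == "query":
            query[name] = value
        elif location == "body":
            body[name] = value
        else:
            raise ValueError(f"argument {name!r} is not declared for this tool")
    url = path + ("?" + urlencode(query) if query else "")
    return method.upper(), url, body


def wrap_result(request_id: int, payload_text: str) -> dict[str, Any]:
    """Wrap backend output as a JSON-RPC 2.0 result in MCP's tools/call shape."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": payload_text}]},
    }
```

Rejecting undeclared arguments up front is one way to model the schema validation the text describes; a real gateway would validate against the full JSON Schema for the tool.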

Security architecture

Grand Central implements defense in depth, with multiple security layers that both protect your backend APIs and support compliance with financial services regulations.

Layer 1: API key validation is mandatory for every request. Subscription keys identify which application is making requests, enable rate limiting per client, and determine which tools the client can access. Keys are validated against Grand Central’s subscription database before any other processing happens. Invalid or revoked keys receive HTTP 401 responses immediately.

Layer 2: Optional JWT authentication adds user context for operations that need to know who the agent represents. When a JWT is provided in the Authorization header, Grand Central validates its signature, checks expiration, and extracts user claims. This user identity propagates to backend APIs so they can enforce user-scoped authorization - ensuring agents can only access data the user has permission to see.

Layer 3: Rate limiting prevents abuse and ensures fair resource allocation across clients. Each subscription key has quotas (like 100 requests per minute) enforced before backend calls happen. When quotas are exceeded, agents receive HTTP 429 responses with retry-after guidance. This protects your backend systems from accidental or malicious overload.

Layer 4: Audit logging captures every tool invocation for compliance, security investigations, and usage analytics. Logs include timestamps, subscription keys, tool names, parameters (optionally redacted), response codes, and latency. They flow to your SIEM and compliance systems asynchronously, providing complete traceability without impacting request performance.
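To make the Layer 4 record concrete, here is a hedged sketch of an audit entry carrying the fields listed above, with optional parameter redaction. The field names, the build_audit_record function, and the "***" placeholder are assumptions for illustration; the actual log schema is not specified in this document.

```python
from datetime import datetime, timezone
from typing import Any

REDACTED = "***"  # hypothetical placeholder for redacted values


def build_audit_record(
    subscription_key: str,
    tool_name: str,
    parameters: dict[str, Any],
    status_code: int,
    latency_ms: float,
    sensitive_fields: frozenset[str] = frozenset(),
) -> dict[str, Any]:
    """Assemble one audit entry, masking any parameter named as sensitive."""
    safe_params = {
        name: (REDACTED if name in sensitive_fields else value)
        for name, value in parameters.items()
    }
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "subscription_key": subscription_key,
        "tool_name": tool_name,
        "parameters": safe_params,
        "status_code": status_code,
        "latency_ms": latency_ms,
    }


# Hypothetical invocation: the account number is masked, the amount is kept.
record = build_audit_record(
    "demo-key",
    "createPayment",
    {"accountNumber": "NL00BANK0123456789", "amount": 25.0},
    status_code=200,
    latency_ms=182.5,
    sensitive_fields=frozenset({"accountNumber"}),
)
```

Redacting by parameter name keeps the record's shape stable, so downstream SIEM queries still see every field even when its value is masked.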

Performance characteristics

Tool discovery latency averages under 200ms because Grand Central caches tool definitions aggressively. Since tools change infrequently, the system serves discovery requests from memory most of the time. Cache invalidation happens automatically when tool configurations change.

Tool invocation latency depends mostly on your backend API performance - Grand Central adds minimal overhead (authentication ~50ms, rate limiting ~10ms, audit logging ~100ms async). If your backend takes 1.5 seconds to query a database and return customer data, the total request time will be around 1.7 seconds. Backend timeouts typically occur at 30 seconds, after which Grand Central returns timeout errors to agents.

Authentication validation runs in under 50ms for API keys (database lookup) and under 100ms for JWT tokens (signature verification plus claims extraction). These operations are highly optimized since they run on every request.

Rate limit checks take under 10ms because quotas are tracked in memory with periodic persistence. The system uses sliding window counters to enforce limits accurately without database calls on every request.

Audit logging runs asynchronously to avoid blocking responses. Logs are written to message queues with ~100ms latency, then flow to your audit storage and SIEM systems. Even if audit infrastructure is temporarily unavailable, MCP requests continue processing - audit writes retry automatically.
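The in-memory sliding window counters mentioned above can take several forms. One common form estimates the request rate by weighting the previous fixed window's count by how much of it still overlaps the sliding window. The sketch below assumes that form; the class name SlidingWindowCounter and its method are hypothetical, and nothing here is confirmed as Grand Central's implementation.

```python
class SlidingWindowCounter:
    """Approximate sliding-window rate limiter kept entirely in memory."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        # key -> (window index, count in that window, count in the one before)
        self._state: dict[str, tuple[int, int, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        """Record a request at time `now` (seconds) if the key is under quota."""
        index = int(now // self.window)
        win, curr, prev = self._state.get(key, (index, 0, 0))
        if index == win + 1:
            prev, curr = curr, 0   # rolled into the next window
        elif index > win + 1:
            prev, curr = 0, 0      # idle for a full window or more
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window) / self.window
        # The previous window counts only for the part still inside the slide.
        estimated = prev * (1.0 - elapsed) + curr
        if estimated >= self.limit:
            self._state[key] = (index, curr, prev)
            return False
        self._state[key] = (index, curr + 1, prev)
        return True
```

Each check is a dictionary lookup and a little arithmetic, which fits the point that limits can be enforced without a database call per request. Rejected requests are not counted, so a throttled client recovers as the previous window ages out.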

Scalability and deployment

Grand Central’s MCP infrastructure runs in multi-region deployments with automatic failover, targeting a 99.5% uptime SLA. When one region experiences issues, traffic routes to healthy regions transparently. The system auto-scales based on request volume - adding capacity during peak hours and scaling down during quiet periods to optimize costs.

Deployments follow standard environment promotion: Development → Staging → Production. When you request new tools or configuration changes, automated pipelines test in development first, validate in staging with production-like load, then promote to production during maintenance windows. MCP server configuration (which tools are exposed, rate limits, authentication requirements) is version-controlled and deployed like infrastructure as code.

Observability covers tool invocations, success rates, latency percentiles, and quota consumption. Audit logs provide complete request/response trails for compliance teams. You can pull detailed reports on usage trends, cost allocation by subscription, and error patterns for troubleshooting.

Next steps

Tool Discovery - How agents discover available operations and their schemas
Tool Invocation - Request/response flow for executing backend operations
Authentication - Subscription keys, JWT tokens, and user context
Connecting Agents - Configure Claude Desktop, Copilot Studio, or custom clients
