Debugging AI agents in production is challenging - you can’t print() your way through distributed systems. When your agent makes unexpected decisions, you need to trace every LLM call, tool invocation, and reasoning step to understand what happened.

The Observability module gives you full visibility into your AI agents using OpenTelemetry. It exports traces to Langfuse (default) or your existing observability stack (Datadog, Grafana, custom OTLP), enabling you to:
Debug issues: See exactly what your agent did and why
Monitor performance: Track latency, token usage, and costs
Audit decisions: Maintain compliance with full trace history
Built on OpenTelemetry standards. Switch backends anytime without changing your code. Your traces are portable and vendor-independent.
The SDK supports multiple observability backends. Langfuse is the default and requires no backend specification. For other platforms, specify the backend when initializing.
Langfuse (default)
Datadog
Grafana
Custom OTLP
Langfuse is the default backend, optimized for AI observability with prompt management, cost tracking, and user feedback. Setup:
Set environment variables:
.env
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_SECRET_KEY=sk-lf-xxx
# Optional: use for self-hosted or custom Langfuse instance
LANGFUSE_HOST=https://cloud.langfuse.com
Initialize (no backend parameter needed):
from bb_ai_sdk.observability import init

init(agent_name="my-agent")  # Uses Langfuse by default
Langfuse observability features:
Detailed tracing: Captures spans for LLM calls, tool calls, and agent loop reasoning
Tool call tracking: Each tool invocation is captured as a span with function names
Cost tracking: Monitor token usage and costs per trace, session, or user
Performance metrics: Track latency, throughput, and error rates
Session grouping: Group related traces by session for end-to-end analysis
Real-time insights: View traces and metrics in real-time dashboards
Debugging tools: Inspect full request/response payloads and trace hierarchies
Export traces to Datadog APM for teams already using Datadog for infrastructure monitoring. Setup:
Set environment variable:
.env
DD_API_KEY=your-datadog-api-key
Initialize with Datadog backend:
from bb_ai_sdk.observability import init

init(agent_name="my-agent", backend="datadog")
Export to Grafana Cloud via OTLP endpoint. Setup:
Set environment variable:
.env
GRAFANA_BEARER_TOKEN=your-grafana-token
Initialize with Grafana backend:
from bb_ai_sdk.observability import init

init(
    agent_name="my-agent",
    backend="grafana",
    otlp_endpoint="https://your-grafana.com/otlp"
)
Use any OpenTelemetry-compatible collector or backend, such as a self-hosted or local Langfuse instance, Grafana Alloy, or another OTLP-compatible system. Setup:
from bb_ai_sdk.observability import init

init(
    agent_name="my-agent",
    backend="custom",
    otlp_endpoint="https://your-collector.com/v1/traces",
    otlp_headers={"Authorization": "Bearer your-token"}  # Pass headers if required
)
Load environment variables before importing the SDK:
import dotenv

dotenv.load_dotenv()
Initialize and instrument
Initialize observability and instrument the OpenAI library:
from bb_ai_sdk.observability import init, get_tracer_provider
from openinference.instrumentation.openai import OpenAIInstrumentor

# Initialize observability (call once per application)
init(agent_name="my-agent")

# Instrument the OpenAI library to capture LLM calls
provider = get_tracer_provider()
OpenAIInstrumentor().instrument(tracer_provider=provider)
⚠️ IMPORTANT:
Call init() only ONCE per application session
You must instrument the OpenAI library - init() alone doesn’t generate traces
Initialize before creating AI Gateway instances or making LLM calls
All LLM calls are now automatically traced and exported to your chosen backend!
💡 Enhance your traces with optional parameters. The init() function accepts several optional parameters that help you organize and filter your traces. For example:
environment: Tag traces by environment (development, staging, production)
organization_id: Enable multi-tenant cost attribution and filtering
organization_name: Human-readable organization name for dashboards
resource_attributes: Custom metadata for filtering and analysis
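A minimal sketch combining these parameters (the values are placeholders; see the best practices section below for a fuller example):

from bb_ai_sdk.observability import init

# Placeholder values - replace with your own environment and organization details
init(
    agent_name="my-agent",
    environment="staging",
    organization_id="org-123",
    organization_name="Example Org",
    resource_attributes={"team": "platform-team"}
)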
For applications using AIGateway directly (no agentic frameworks):
from bb_ai_sdk.observability import init, get_tracer_provider
from bb_ai_sdk.ai_gateway import AIGateway
from openinference.instrumentation.openai import OpenAIInstrumentor

# Step 1: Initialize observability
init(agent_name="my-agent")

# Step 2: Instrument the OpenAI library
# OpenAIInstrumentor monkey-patches the OpenAI library to capture all LLM calls
provider = get_tracer_provider()
OpenAIInstrumentor().instrument(tracer_provider=provider)

# Step 3: Create the AI Gateway - all calls are now traced
gateway = AIGateway.create(
    model_id="gpt-4o",
    agent_id="550e8400-e29b-41d4-a716-446655440000"
)

# Step 4: Make LLM calls - automatically traced!
response = gateway.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
Why OpenAIInstrumentor? When using AIGateway with vanilla code (no agentic frameworks), OpenAIInstrumentor is the standard way to instrument OpenAI-compatible clients. It monkey-patches the OpenAI library at the SDK level, capturing all LLM calls made through AIGateway’s OpenAI-compatible interface.
For agents built with the Agno framework, instrument Agno instead of the OpenAI library:

from bb_ai_sdk.observability import init, get_tracer_provider
from openinference.instrumentation.agno import AgnoInstrumentor

# Step 1: Initialize observability
init(agent_name="my-agent")

# Step 2: Instrument the Agno framework
# AgnoInstrumentor monkey-patches the Agno framework to capture all LLM calls
provider = get_tracer_provider()
AgnoInstrumentor().instrument(tracer_provider=provider)

# Your Agno agent code here - all calls are automatically traced
You can also trace your own functions with the @trace decorator:

from bb_ai_sdk.observability import init, trace

init(agent_name="my-agent")

@trace()
def process_user_request(user_input: str) -> str:
    """This function is automatically traced."""
    # Your logic here
    return "processed result"

@trace(name="custom-span-name")
def validate_input(data: dict) -> bool:
    """Trace with a custom name for clarity."""
    return True
Configure backend credentials via environment variables (recommended for security):
.env
# Langfuse (default backend)
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_SECRET_KEY=sk-lf-xxx
LANGFUSE_HOST=https://cloud.langfuse.com  # Optional, defaults to cloud

# Datadog (if using backend="datadog")
DD_API_KEY=your-datadog-api-key

# Grafana (if using backend="grafana")
GRAFANA_BEARER_TOKEN=your-grafana-token

# Custom OTLP endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-otlp-endpoint.com
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer xxx

# Control observability
OBSERVABILITY_ENABLED=true  # Set to false to disable
When working within the Backbase network, configure proxy settings correctly:
.env
# Backbase web proxy
HTTP_PROXY=http://webproxy.infra.backbase.cloud:8888
HTTPS_PROXY=http://webproxy.infra.backbase.cloud:8888

# Bypass proxy for observability endpoints
NO_PROXY=localhost,127.0.0.1,cloud.langfuse.com,*.langfuse.com,langfuse
NO_PROXY is required for both Langfuse Cloud and local Langfuse instances. The Backbase web proxy can interfere with OTLP trace exports. If your traces aren’t appearing, proxy misconfiguration is a common cause.
Traces are batched for efficient network usage. By default, traces may take up to 5 seconds to appear:
init(
    agent_name="my-agent",
    otlp_batch_size=512,         # Max spans per batch (default: 512)
    otlp_batch_timeout=5.0,      # Max wait time in seconds (default: 5.0)
    otlp_max_queue_size=10000    # Max buffered spans (default: 10000)
)
For faster feedback during development, reduce the batch timeout:
init(agent_name="dev-agent", otlp_batch_timeout=1.0)  # Export every 1 second
Don’t use low timeouts in production; they increase network overhead.
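One way to apply this is to pick the timeout from your deployment environment. A minimal sketch, assuming an application-defined APP_ENV variable (not an SDK setting):

import os

from bb_ai_sdk.observability import init

# APP_ENV is an application-defined convention, not part of the SDK
is_dev = os.getenv("APP_ENV", "production") == "development"

init(
    agent_name="my-agent",
    otlp_batch_timeout=1.0 if is_dev else 5.0  # Fast feedback in dev, efficient batching elsewhere
)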
2. Use optional parameters for better organization
Take advantage of init() optional parameters to improve trace organization and filtering:
# ✅ Good: use optional parameters for better organization
init(
    agent_name="customer-support-agent",
    environment="production",              # Filter traces by environment
    organization_id="org-123",             # Enable multi-tenant cost attribution
    organization_name="Acme Corp",         # Human-readable for dashboards
    resource_attributes={
        "service.version": "1.2.3",        # Track deployments
        "deployment.region": "us-east-1",  # Filter by region
        "team": "platform-team"            # Group by team
    }
)

# ❌ Bad: minimal configuration limits filtering options
init(agent_name="my-agent")  # Can't filter by environment or organization
Use environment to separate development, staging, and production traces. Use organization_id and organization_name for multi-tenant applications to track costs and filter traces per organization.
To wire up instrumentors, retrieve the tracer provider configured by the SDK:

from bb_ai_sdk.observability import get_tracer_provider

provider = get_tracer_provider()

# Use with instrumentors
from openinference.instrumentation.openai import OpenAIInstrumentor
OpenAIInstrumentor().instrument(tracer_provider=provider)
The SDK automatically redacts sensitive data from traces:
API keys (patterns like api_key=..., apikey=...)
Tokens (Bearer tokens, access tokens)
Passwords and secrets
Langfuse credentials
@trace(attributes={
    "api_key": "sk-secret-key-123",  # Automatically redacted to "[REDACTED]"
    "user.name": "John"              # Not redacted, safe to log
})
def my_function():
    pass
While automatic redaction helps, design your tracing to capture operational metrics, not secrets. Avoid passing sensitive data as span attributes in the first place.
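For example, keep span attributes limited to operational metadata. A sketch using the same @trace decorator; the attribute keys below are illustrative, not defined by the SDK:

from bb_ai_sdk.observability import trace

# ✅ Good: operational metadata only - nothing sensitive to rely on redaction for
@trace(attributes={
    "request.channel": "web",
    "request.locale": "en-US"
})
def handle_request(payload: dict) -> dict:
    # Keep tokens, credentials, and personal data out of span attributes entirely
    return {"status": "ok"}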
Cause: Span queue growing due to export failures. Solution: Check network connectivity to your OTLP endpoint. Reduce otlp_max_queue_size if memory is constrained:
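A sketch of reducing the queue size (the value 2000 is only an illustration; tune it to your memory budget):

from bb_ai_sdk.observability import init

init(
    agent_name="my-agent",
    otlp_max_queue_size=2000  # Lower than the default 10000 to cap buffered spans
)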
Custom OTLP endpoint URL. For Langfuse, this is typically https://cloud.langfuse.com/api/public/otel/v1/traces. If not specified and using Langfuse backend, the SDK constructs the endpoint from LANGFUSE_HOST environment variable.
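For example, to point the default Langfuse backend at an explicit endpoint instead of relying on LANGFUSE_HOST (a sketch using the typical Langfuse Cloud endpoint noted above):

from bb_ai_sdk.observability import init

init(
    agent_name="my-agent",
    otlp_endpoint="https://cloud.langfuse.com/api/public/otel/v1/traces"
)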
The OpenTelemetry TracerProvider configured by the SDK. Use this with instrumentors like OpenAIInstrumentor or AgnoInstrumentor to enable automatic tracing.
Example:
from bb_ai_sdk.observability import init, get_tracer_provider
from openinference.instrumentation.openai import OpenAIInstrumentor

init(agent_name="my-agent")
provider = get_tracer_provider()
OpenAIInstrumentor().instrument(tracer_provider=provider)