Debugging AI agents in production is challenging: you can’t print() your way through distributed systems. When your agent makes unexpected decisions, you need to trace every LLM call, tool invocation, and reasoning step to understand what happened.
The Observability module gives you full visibility into your AI agents using OpenTelemetry. It automatically exports traces to LangFuse (the default) or to your existing observability stack (Datadog, Grafana), enabling you to:
Debug issues: See exactly what your agent did and why
Monitor performance: Track latency, token usage, and costs
Audit decisions: Maintain compliance with full trace history
Built on OpenTelemetry standards—switch backends anytime without changing your code. Your traces are portable and vendor-agnostic.
The simplest way to add tracing—wrap any function:
from bb_ai_sdk.observability import init, trace

init(agent_name="my-agent")

@trace()
def process_user_request(user_input: str) -> str:
    """This function is automatically traced."""
    # Your logic here
    return "processed result"

@trace(name="custom-span-name")
def validate_input(data: dict) -> bool:
    """Trace with a custom name for clarity."""
    return True
If you’re using LangChain or LangGraph, use the built-in callback handlers to trace framework-specific operations like chain execution and graph transitions.
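A minimal sketch of what that wiring might look like with LangChain. The get_callback_handler helper is a hypothetical name used for illustration only; check the SDK reference for the actual export:

# Note: get_callback_handler is a hypothetical name for illustration;
# the SDK's actual export may differ.
from bb_ai_sdk.observability import init, get_callback_handler
from langchain_openai import ChatOpenAI

init(agent_name="my-agent")

# Pass the handler via callbacks so chain and graph steps show up as spans.
handler = get_callback_handler()
llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("Hello", config={"callbacks": [handler]})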
The SDK supports multiple observability backends through simple presets. Choose based on your existing infrastructure:
LangFuse (Default)
Datadog
Grafana
Custom OTLP
LangFuse is the default backend, optimized for AI observability with prompt management, cost tracking, and user feedback.
from bb_ai_sdk.observability import init

# Option 1: Use environment variables (recommended)
# Set LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY in .env
init(agent_name="my-agent", backend="langfuse")

# Option 2: Explicit parameters (for testing)
init(
    agent_name="my-agent",
    backend="langfuse",
    langfuse_public_key="pk-lf-xxx",
    langfuse_secret_key="sk-lf-xxx"
)
LangFuse features:
Prompt management and versioning
Cost tracking per trace
User feedback collection
Session grouping
Export traces to Datadog APM for teams already using Datadog for infrastructure monitoring.
from bb_ai_sdk.observability import init

# Use DD_API_KEY environment variable
init(agent_name="my-agent", backend="datadog")

# Or explicit parameter
init(
    agent_name="my-agent",
    backend="datadog",
    datadog_api_key="your-api-key"
)
Export to Grafana Cloud via OTLP endpoint.
from bb_ai_sdk.observability import init

init(
    agent_name="my-agent",
    backend="grafana",
    otlp_endpoint="https://your-grafana.com/otlp",
    grafana_bearer_token="your-token"
)
Use any OpenTelemetry-compatible collector or backend.
from bb_ai_sdk.observability import init

init(
    agent_name="my-agent",
    backend="custom",
    otlp_endpoint="https://your-collector.com/v1/traces",
    otlp_headers={"Authorization": "Bearer your-token"}
)
Configure your backend credentials via environment variables (recommended for security):
.env
# LangFuse Authentication (default backend)
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_SECRET_KEY=sk-lf-xxx
LANGFUSE_HOST=https://cloud.langfuse.com  # Optional, defaults to cloud

# Alternative: Custom OTLP Endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-otlp-endpoint.com
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer xxx

# Datadog (if using backend="datadog")
DD_API_KEY=your-datadog-api-key

# Grafana (if using backend="grafana")
GRAFANA_BEARER_TOKEN=your-grafana-token

# Control observability
OBSERVABILITY_ENABLED=true  # Set to false to disable
Never commit API keys to version control. Use environment variables or a secrets management solution.
Let the SDK handle tracing for AI Gateway calls automatically. Only add custom traces for your business logic.
# AI Gateway calls are traced automatically - no @trace needed
response = gateway.chat.completions.create(...)

# Add @trace only for your own functions
@trace(name="process-customer-request")
def handle_request(user_input):
    # Your business logic here
    pass
The SDK automatically redacts sensitive data from traces to prevent credential leakage:
API keys (patterns like api_key=..., apikey=...)
Tokens (Bearer tokens, access tokens)
Passwords and secrets
LangFuse credentials
from bb_ai_sdk.observability import trace

@trace(attributes={
    "api_key": "sk-secret-key-123",  # Automatically redacted to "[REDACTED]"
    "user.name": "John"              # Not redacted, safe to log
})
def my_function():
    pass
While automatic redaction helps, design your tracing to capture operational metrics, not secrets. Avoid passing sensitive data as span attributes in the first place.
Sometimes you need to access trace context outside of decorated functions—for example, to correlate your existing logging with OpenTelemetry traces, or to add attributes based on runtime decisions.
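Because the SDK emits standard OpenTelemetry spans, one way to do this is through the OpenTelemetry API directly. A minimal sketch, assuming a span is already active (for example, one started by the @trace decorator); the attribute name is illustrative:

from opentelemetry import trace

def log_with_trace_context(user_input: str) -> None:
    # Grab whatever span is currently active (e.g. one started by @trace).
    span = trace.get_current_span()

    # Correlate existing logging with OpenTelemetry by emitting the trace ID.
    trace_id = format(span.get_span_context().trace_id, "032x")
    print(f"trace_id={trace_id} processing request")

    # Add attributes based on runtime decisions.
    if len(user_input) > 1000:
        span.set_attribute("request.oversized", True)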
For advanced scenarios where you need to set a custom span as context:
from bb_ai_sdk.observability import set_context
from opentelemetry import trace

# Create a custom span and set it as context
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("custom-context") as span:
    set_context(span)  # Subsequent operations use this span as parent
For high-throughput agents, tune how spans are batched and exported:
from bb_ai_sdk.observability import init

init(
    agent_name="high-throughput-agent",
    otlp_batch_size=1024,        # Max spans per batch (default: 512)
    otlp_batch_timeout=10.0,     # Max wait time in seconds (default: 5.0)
    otlp_max_queue_size=20000    # Max buffered spans (default: 10000)
)
Increase otlp_batch_size and otlp_max_queue_size for high-throughput agents. This reduces export overhead but uses more memory.
Observability is designed to never break your agent:
from bb_ai_sdk.observability import init, trace

# If initialization fails, agent continues without observability
init(agent_name="my-agent")  # Fails gracefully if LangFuse unreachable

@trace()
def my_function():
    # If tracing fails, function executes normally
    return "result"  # Always returns, even if span creation fails
Traces not appearing in LangFuse
Cause: Invalid or missing LangFuse credentials.
Solution: Check that keys are valid in your LangFuse dashboard.
High memory usage
Cause: Span queue growing due to export failures.
Solution: Check network connectivity to your OTLP endpoint. Reduce otlp_max_queue_size if memory is constrained:
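For example, a minimal sketch reusing the batching parameter shown above (pick a value that fits your memory budget):

from bb_ai_sdk.observability import init

init(
    agent_name="my-agent",
    otlp_max_queue_size=5000  # Below the 10000 default to cap buffered spans
)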
Broken trace hierarchy
Cause: Spans not properly nested or context not propagated.
Solution: Ensure you’re using the @trace decorator or trace_context context manager consistently (see the sketch below). For async code, OpenTelemetry context propagates automatically via contextvars.
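As a quick sanity check, calling one traced function from another with the same decorator should produce properly parented spans. A minimal sketch using the @trace decorator shown earlier:

from bb_ai_sdk.observability import init, trace

init(agent_name="my-agent")

@trace(name="validate-input")
def validate(user_input: str) -> str:
    return user_input.strip()

@trace(name="handle-request")
def handle_request(user_input: str) -> str:
    # validate() runs while the "handle-request" span is active,
    # so its span is recorded as a child.
    return validate(user_input)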
OBSERVABILITY_ENABLED not working
Cause: Environment variable not loaded before initialization.
Solution: Load the .env file before importing the SDK:
from dotenv import load_dotenv
load_dotenv()  # Load before import

from bb_ai_sdk.observability import init
init(agent_name="my-agent")