Why Observability Matters

Debugging AI agents in production is challenging—you can’t print() your way through distributed systems. When your agent makes unexpected decisions, you need to trace every LLM call, tool invocation, and reasoning step to understand what happened. The Observability module gives you full visibility into your AI agents using OpenTelemetry. It automatically exports traces to LangFuse (default) or your existing observability stack (Datadog, Grafana), enabling you to:
  • Debug issues: See exactly what your agent did and why
  • Monitor performance: Track latency, token usage, and costs
  • Audit decisions: Maintain compliance with full trace history
Built on OpenTelemetry standards—switch backends anytime without changing your code. Your traces are portable and vendor-agnostic.

Quick Start

1. Set Up Environment Variables

Before initializing observability, configure your backend credentials. The default backend is LangFuse:
.env
# LangFuse credentials (default backend)
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_SECRET_KEY=sk-lf-xxx

# AI Gateway credentials (for auto-instrumented LLM calls)
AI_GATEWAY_API_KEY=your-api-key
AI_GATEWAY_ENDPOINT=your-ai-gateway-endpoint

# Optional: disable observability in tests
OBSERVABILITY_ENABLED=true
Never commit API keys to version control. Add .env to your .gitignore.
Don’t have LangFuse credentials? Refer to our Onboarding guide.

2. Basic Initialization

Get tracing working in two lines:
from bb_ai_sdk.observability import init

init(agent_name="my-agent")  # That's it. You're tracing.
All AI Gateway calls are now automatically traced and sent to LangFuse.
The most common pattern combines observability with the AI Gateway for automatic instrumentation:
from bb_ai_sdk.observability import init
from bb_ai_sdk.ai_gateway import AIGateway

# 1. Initialize observability first
init(
    agent_name="customer-support-agent",
    organization_id="org_123",
    environment="production"
)

# 2. Create gateway - calls are automatically traced
gateway = AIGateway.create(
    model_id="gpt-4o",
    agent_id="550e8400-e29b-41d4-a716-446655440000"
)

# 3. This call appears in LangFuse with full context
response = gateway.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
Initialize observability before creating AI Gateway instances. This ensures all LLM calls are captured from the start.

Custom Tracing

While AI Gateway calls are traced automatically, you’ll want to trace your own functions to see the complete picture of your agent’s behavior.

The @trace Decorator

The simplest way to add tracing—wrap any function:
from bb_ai_sdk.observability import init, trace

init(agent_name="my-agent")

@trace()
def process_user_request(user_input: str) -> str:
    """This function is automatically traced."""
    # Your logic here
    return "processed result"

@trace(name="custom-span-name")
def validate_input(data: dict) -> bool:
    """Trace with a custom name for clarity."""
    return True

Adding Custom Attributes

Attributes let you filter and analyze traces. Add any key-value pairs you need:
@trace(attributes={
    "prompt.version": "v1.2.3",
    "prompt.name": "customer-support-prompt",
    "user.id": "user-123",
    "session.id": "session-456",
    "experiment.variant": "A"
})
def run_experiment():
    """Traces include all custom attributes for analysis."""
    pass
Common uses for attributes: prompt versioning, A/B testing, user tracking, and cost attribution per customer.

Async Function Support

The decorator works seamlessly with async functions:
@trace()
async def async_operation():
    """Async functions are traced automatically."""
    await some_async_call()
    return "async result"

The trace_context Context Manager

For fine-grained control within a function, use the context manager:
from bb_ai_sdk.observability import init, trace_context

init(agent_name="my-agent")

def complex_operation():
    with trace_context("multi-step-operation") as span:
        # Add events during execution
        span.add_event("Step 1: Validating input")
        validate_input()
        
        span.add_event("Step 2: Processing data")
        result = process_data()
        
        # Add attributes dynamically based on results
        span.set_attribute("result.count", len(result))
        span.add_event("Step 3: Complete")
        
        return result
You can also pass initial attributes:
with trace_context("prompt-execution", attributes={
    "prompt.version": "v2.0.0",
    "prompt.name": "summarization",
    "input.length": len(user_input)
}) as span:
    result = run_prompt(user_input)
    span.set_attribute("output.length", len(result))

Framework Integration

If you’re using LangChain or LangGraph, use the built-in callback handlers to trace framework-specific operations like chain execution and graph transitions.

LangChain Callback Handler

Trace LangChain chains, LLM calls, tool invocations, and retriever operations:
from bb_ai_sdk.observability import init, LangChainOpenTelemetryCallbackHandler
from bb_ai_sdk.ai_gateway import AIGateway
from bb_ai_sdk.ai_gateway.adapters.langchain import to_langchain
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

# Initialize observability
init(agent_name="langchain-agent")

# Create callback handler
callback = LangChainOpenTelemetryCallbackHandler()

# Setup LangChain
gateway = AIGateway.create(model_id="gpt-4o", agent_id="...")
model = to_langchain(gateway)

prompt = ChatPromptTemplate.from_template("Tell me about {topic}")
chain = prompt | model | StrOutputParser()

# Use callback with chain invocation
result = chain.invoke(
    {"topic": "AI observability"},
    config={"callbacks": [callback]}
)
The handler automatically traces:
  • Chain execution (start, end, error)
  • LLM calls with token usage
  • Tool invocations
  • Retriever operations

LangGraph Callback Handler

Trace graph executions and node transitions:
from typing import TypedDict

from bb_ai_sdk.observability import init, LangGraphOpenTelemetryCallbackHandler
from langgraph.graph import StateGraph, END
from langchain_core.runnables import RunnableConfig

# Initialize observability
init(agent_name="langgraph-agent")

# Create callback handler with graph name for clarity
callback = LangGraphOpenTelemetryCallbackHandler(graph_name="my-workflow")

# Minimal state and node definitions so the example is self-contained
class AgentState(TypedDict):
    messages: list

def process_node(state: AgentState) -> dict:
    # Your processing logic here
    return {"messages": state["messages"]}

def decide_node(state: AgentState) -> dict:
    # Your decision logic here
    return {"messages": state["messages"]}

# Build your graph
graph = StateGraph(AgentState)
graph.add_node("process", process_node)
graph.add_node("decide", decide_node)
graph.add_edge("process", "decide")
graph.add_edge("decide", END)  # terminate the graph after "decide"
graph.set_entry_point("process")

app = graph.compile()

# Execute with callback
result = app.invoke(
    {"messages": [...]},
    config=RunnableConfig(callbacks=[callback])
)
The handler traces:
  • Graph execution lifecycle
  • Node execution with step ordering
  • State transitions between nodes
  • LLM and tool calls within nodes

Backend Configuration

The SDK supports multiple observability backends through simple presets. Choose based on your existing infrastructure:
LangFuse is the default backend, optimized for AI observability with prompt management, cost tracking, and user feedback.
from bb_ai_sdk.observability import init

# Option 1: Use environment variables (recommended)
# Set LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY in .env
init(agent_name="my-agent", backend="langfuse")

# Option 2: Explicit parameters (for testing)
init(
    agent_name="my-agent",
    backend="langfuse",
    langfuse_public_key="pk-lf-xxx",
    langfuse_secret_key="sk-lf-xxx"
)
LangFuse features:
  • Prompt management and versioning
  • Cost tracking per trace
  • User feedback collection
  • Session grouping
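The other presets follow the same pattern; call init() once with the backend you need. The sketch below assumes the datadog and grafana presets read DD_API_KEY and GRAFANA_BEARER_TOKEN from the environment, and that the custom preset uses the OTEL_EXPORTER_OTLP_* variables listed under Environment Variables below:
from bb_ai_sdk.observability import init

# Datadog preset (assumes DD_API_KEY is set in the environment)
init(agent_name="my-agent", backend="datadog")

# Grafana preset (assumes GRAFANA_BEARER_TOKEN is set in the environment)
init(agent_name="my-agent", backend="grafana")

# Custom OTLP endpoint (assumes OTEL_EXPORTER_OTLP_ENDPOINT / _HEADERS are set)
init(agent_name="my-agent", backend="custom")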

Environment Variables

Configure your backend credentials via environment variables (recommended for security):
.env
# LangFuse Authentication (default backend)
LANGFUSE_PUBLIC_KEY=pk-lf-xxx
LANGFUSE_SECRET_KEY=sk-lf-xxx
LANGFUSE_HOST=https://cloud.langfuse.com  # Optional, defaults to cloud

# Alternative: Custom OTLP Endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=https://your-otlp-endpoint.com
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Bearer xxx

# Datadog (if using backend="datadog")
DD_API_KEY=your-datadog-api-key

# Grafana (if using backend="grafana")
GRAFANA_BEARER_TOKEN=your-grafana-token

# Control observability
OBSERVABILITY_ENABLED=true  # Set to false to disable
Never commit API keys to version control. Use environment variables or a secrets management solution.

Best Practices

Follow these patterns for effective observability in production:

1. Initialize Once at Startup

Call init() once at application startup, before creating AI Gateway instances.
# app.py - at the top
from bb_ai_sdk.observability import init
init(agent_name="my-agent")

# Then import and use other modules

2. Use Meaningful Span Names

Name spans descriptively for easy trace navigation.
# ✅ Good: Descriptive names
@trace(name="validate-user-input")
@trace(name="fetch-customer-data")

# ❌ Bad: Generic names
@trace(name="process")
@trace(name="step1")

3. Add Business Context

Include attributes that help with business analysis and debugging.
@trace(attributes={
    "customer.tier": "premium",
    "request.type": "balance-inquiry",
    "channel": "mobile-app"
})

4. Use Environment Variables

Keep credentials out of code—always.
# ✅ Good: Use environment
init(agent_name="my-agent", backend="langfuse")

# ❌ Bad: Hardcode credentials
init(
    agent_name="my-agent",
    langfuse_public_key="pk-xxx"  # Don't do this
)

5. Leverage Auto-Instrumentation

Let the SDK handle tracing for AI Gateway calls automatically. Only add custom traces for your business logic.
# AI Gateway calls are traced automatically - no @trace needed
response = gateway.chat.completions.create(...)

# Add @trace only for your own functions
@trace(name="process-customer-request")
def handle_request(user_input):
    # Your business logic here
    pass

6. Use Consistent Naming Conventions

Adopt a naming pattern across your team for easier filtering and analysis.
Pattern              Example                       Use Case
verb-noun            validate-input, fetch-data    Function-level spans
service.operation    billing.calculate-total       Service boundaries
domain.action        customer.update-profile       Domain operations
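Applied with the @trace decorator, the patterns look like this (span names are illustrative):
@trace(name="validate-input")             # verb-noun
@trace(name="billing.calculate-total")    # service.operation
@trace(name="customer.update-profile")    # domain.action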

Security

Sensitive Data Redaction

The SDK automatically redacts sensitive data from traces to prevent credential leakage:
  • API keys (patterns like api_key=..., apikey=...)
  • Tokens (Bearer tokens, access tokens)
  • Passwords and secrets
  • LangFuse credentials
from bb_ai_sdk.observability import trace

@trace(attributes={
    "api_key": "sk-secret-key-123",  # Automatically redacted to "[REDACTED]"
    "user.name": "John"              # Not redacted, safe to log
})
def my_function():
    pass
While automatic redaction helps, design your tracing to capture operational metrics, not secrets. Avoid passing sensitive data as span attributes in the first place.

Configuration Reference

Initialization Parameters

agent_name (string, required)
Agent name identifier. This appears as service.name in traces.

agent_id (string)
Agent ID in UUID v4 format. Used for service.instance.id.

organization_id (string)
Organization ID for multi-tenant context tracking. Enables cost attribution per organization.

organization_name (string)
Human-readable organization name for filtering traces.

environment (string, default: "development")
Environment name: development, staging, or production.

backend (string)
Backend preset: langfuse, datadog, grafana, or custom. Defaults to LangFuse if not specified.

enabled (boolean, default: true)
Enable or disable observability. When disabled, all tracing operations become no-ops. Can also be set via the OBSERVABILITY_ENABLED environment variable.

resource_attributes (dict)
Custom OpenTelemetry resource attributes. Add any metadata you need for filtering and analysis.
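Taken together, a fuller initialization might look like the sketch below; the agent_id and organization values are placeholders, and "Acme Corp" is a hypothetical organization name:
init(
    agent_name="customer-support-agent",
    agent_id="550e8400-e29b-41d4-a716-446655440000",  # placeholder UUID v4
    organization_id="org_123",
    organization_name="Acme Corp",  # hypothetical organization name
    environment="production",
    backend="langfuse",
    enabled=True
)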

Custom Resource Attributes

Add metadata to all traces for filtering in your observability backend:
init(
    agent_name="my-agent",
    resource_attributes={
        "service.version": "1.2.3",
        "deployment.region": "us-east-1",
        "deployment.cluster": "prod-cluster-1",
        "custom.tenant.id": "tenant-123",
        "custom.team": "platform-team"
    }
)

Context Utilities

Sometimes you need to access trace context outside of decorated functions—for example, to correlate your existing logging with OpenTelemetry traces, or to add attributes based on runtime decisions.

Get Current Trace ID

Useful for log correlation:
import logging

from bb_ai_sdk.observability import get_current_trace_id

logger = logging.getLogger(__name__)

trace_id = get_current_trace_id()
if trace_id:
    logger.info("Processing request", extra={"trace_id": trace_id})

Get Current Span

Access the active span to add attributes or events dynamically:
from bb_ai_sdk.observability import get_current_span

span = get_current_span()
if span:
    span.set_attribute("user.action", "clicked_button")
    span.add_event("User interaction recorded")

Set Custom Context

For advanced scenarios where you need to set a custom span as context:
from bb_ai_sdk.observability import set_context
from opentelemetry import trace

# Create a custom span and set it as context
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("custom-context") as span:
    set_context(span)
    # Subsequent operations use this span as parent

Advanced Configuration

Batch Export Settings

For high-throughput agents, tune how spans are batched and exported:
init(
    agent_name="high-throughput-agent",
    otlp_batch_size=1024,      # Max spans per batch (default: 512)
    otlp_batch_timeout=10.0,   # Max wait time in seconds (default: 5.0)
    otlp_max_queue_size=20000  # Max buffered spans (default: 10000)
)
Increase otlp_batch_size and otlp_max_queue_size for high-throughput agents. This reduces export overhead but uses more memory.

Disabling Observability

Disable for testing or specific environments:
# Option 1: Via parameter
init(agent_name="my-agent", enabled=False)

# Option 2: Via environment variable
# OBSERVABILITY_ENABLED=false
When disabled, all tracing operations become no-ops with zero overhead. Your agent code runs unchanged.

Error Handling

Fail-Safe Behavior

Observability is designed to never break your agent:
from bb_ai_sdk.observability import init, trace

# If initialization fails, agent continues without observability
init(agent_name="my-agent")  # Fails gracefully if LangFuse unreachable

@trace()
def my_function():
    # If tracing fails, function executes normally
    return "result"  # Always returns, even if span creation fails

Configuration Errors

Handle configuration errors explicitly if needed:
from bb_ai_sdk.observability import init
from bb_ai_sdk.observability.errors import ObservabilityConfigurationError

try:
    init(
        agent_name="",  # Invalid: empty agent name
        backend="langfuse"
    )
except ObservabilityConfigurationError as e:
    print(f"Configuration error: {e}")
    # Handle gracefully or use defaults

Troubleshooting

Traces are not appearing in LangFuse
Cause: Missing or invalid LangFuse credentials.
Solution: Verify environment variables are set correctly:
export LANGFUSE_PUBLIC_KEY=pk-lf-xxx
export LANGFUSE_SECRET_KEY=sk-lf-xxx
Check that keys are valid in your LangFuse dashboard.
Memory usage keeps growing
Cause: Span queue growing due to export failures.
Solution: Check network connectivity to your OTLP endpoint. Reduce otlp_max_queue_size if memory is constrained:
init(agent_name="my-agent", otlp_max_queue_size=5000)
Traces appear flat or disconnected
Cause: Spans not properly nested or context not propagated.
Solution: Ensure you're using the @trace decorator or trace_context context manager consistently, as in the sketch below. For async code, OpenTelemetry context propagates automatically via contextvars.
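As a quick check, a decorated function that calls other traced code should produce child spans nested under it; a minimal sketch with illustrative names:
from bb_ai_sdk.observability import trace, trace_context

@trace(name="load-context")
def load_context(user_id: str) -> dict:
    # Becomes a child span when called from handle_request
    return {"user_id": user_id}

@trace(name="handle-request")
def handle_request(user_id: str) -> dict:
    # Parent span for this request
    context = load_context(user_id)
    with trace_context("post-process") as span:
        # Another child span within the same trace
        span.set_attribute("context.keys", len(context))
    return context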
Credentials from .env are not picked up
Cause: Environment variables not loaded before initialization.
Solution: Load the .env file before importing the SDK:
from dotenv import load_dotenv
load_dotenv()  # Load before import

from bb_ai_sdk.observability import init
init(agent_name="my-agent")

API Reference

init()

Function. Initializes OpenTelemetry observability with a TracerProvider and OTLP exporter.

trace()

Decorator. Creates OpenTelemetry spans for functions.

trace_context()

Context manager. Provides manual span control.

Callback Handlers

LangChainOpenTelemetryCallbackHandler (class): Callback handler for LangChain operations.
LangGraphOpenTelemetryCallbackHandler (class): Callback handler for LangGraph operations.
