Current Version: v0.1.0
The Knowledge Agent (Level 3) is a production-ready RAG (Retrieval-Augmented Generation) starter kit for building knowledge-powered AI agents. It provides complete document ingestion, intelligent chunking, hybrid search, and RAG-enabled query capabilities—everything needed to build agents that answer questions from your organization’s documents and data.
This repository can be used as a base template for creating your own application. Select starter-knowledge-base-agent as the repository_template when provisioning a new repository via Self Service.

GitHub Repository

View source code, releases, and issues

Why RAG for Banking?

Financial institutions manage vast repositories of policies, regulations, product documentation, and customer-facing content. RAG enables agents to provide accurate, source-cited answers from this knowledge—critical for compliance, trust, and customer experience. Common use cases:
  • Policy and compliance Q&A
  • Product information lookup
  • Internal knowledge base assistants
  • Document-grounded customer support

What You’ll Get

This starter provides a complete RAG pipeline with three core capabilities:

Document Ingestion

Load PDFs, text, markdown, and web pages with multiple chunking strategies (character, token, semantic, recursive).

Hybrid Search

Semantic vector search, keyword full-text search, and hybrid search with optional reranking for improved relevance.

RAG Agent

Knowledge-augmented responses with source citations. Answers grounded in your documents.

This documentation explains two implementation approaches:

Starter Kit Implementation

The native implementation included in this repository—custom pipelines using raw SQL, embeddings API, and search logic.

Agno Framework Alternative

Expandable examples throughout each section showing how to achieve similar functionality using Agno’s built-in knowledge base, readers, and search features.

Prerequisites

  • Python 3.11+: Managed via UV
  • UV Package Manager: Modern Python package manager (replaces pip/poetry)
  • PostgreSQL 14+: With pgvector extension for vector storage
  • Docker: For running PostgreSQL locally
Database Required: This starter requires PostgreSQL with the pgvector extension. Unlike other starters, you must set up a database before running the application.

Quick Start

1. Clone and Install UV

git clone https://github.com/bb-ecos-agbs/starter-knowledge-base-agent.git
cd starter-knowledge-base-agent

# Install UV (macOS)
brew install uv

# Install UV (Linux/WSL)
curl -LsSf https://astral.sh/uv/install.sh | sh

2. Setup Environment

# Create virtual environment
uv venv --python 3.11
source .venv/bin/activate  # macOS/Linux
# or .venv\Scripts\activate  # Windows

# Copy environment template
cp env.template .env

3. Configure Credentials

Edit .env with required values:
# Required - Artifactory credentials (for bb-ai-sdk)
export UV_INDEX_BACKBASE_USERNAME=your-email-address
export UV_INDEX_BACKBASE_PASSWORD=your-artifactory-token

# AI Gateway
AI_GATEWAY_ENDPOINT=https://ai-gateway.backbase.cloud
AI_GATEWAY_API_KEY=your-api-key

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=knowledge_base
DB_USER=postgres
DB_PASSWORD=postgres

# Observability (Langfuse)
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com

# Web Proxy (Required for local dev)
HTTP_PROXY=http://webproxy.infra.backbase.cloud:8888
HTTPS_PROXY=http://webproxy.infra.backbase.cloud:8888
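Before moving on, it can help to sanity-check that the required variables above are actually set. A minimal sketch (the `missing_vars` helper and the variable list are illustrative, not part of the starter):

```python
import os

# Variables the .env template above expects; adjust the list if your
# deployment differs.
REQUIRED_VARS = [
    "AI_GATEWAY_ENDPOINT",
    "AI_GATEWAY_API_KEY",
    "DB_HOST",
    "DB_PORT",
    "DB_NAME",
    "DB_USER",
    "DB_PASSWORD",
]

def missing_vars(required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]

print("Missing:", missing_vars() or "none")
```

Running this after `source .env` lists any variables you still need to fill in.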

4. Setup Database and Install Dependencies

Set up PostgreSQL with pgvector. See Database Setup for detailed instructions on provisioning via Azure or running locally with Docker.
# Export env vars and sync dependencies
source .env && uv sync

5. Run the Server

uv run python -m src.main
The server runs at http://localhost:8000. Access the API documentation at /docs.
VPN and Web Proxy: Required for local development. Configure Aviatrix VPN and web proxy settings. See Onboarding Guide for setup instructions.

Database Setup

The starter uses PostgreSQL with pgvector for vector storage and similarity search. You have two options for provisioning the database:
Use Self Service to provision a managed PostgreSQL instance on Azure. This is the recommended approach for production deployments and shared development environments.
See Self Service for instructions on requesting PostgreSQL with pgvector extension and managing infrastructure access.
Once provisioned, configure your .env with the provided credentials:
DB_HOST=your-postgres-instance-host
DB_PORT=5432
DB_NAME=knowledge_base
DB_USER=your-username
DB_PASSWORD=your-password
DB_SSLMODE=require

Initialize Database

After configuring your database connection, run the setup script to create the required tables:
# Initialize database tables
uv run python scripts/setup_db.py
This creates three normalized tables:
Table        Purpose
documents    Source documents with metadata
chunks       Text chunks with token counts
embeddings   Vector embeddings (1536 dimensions)
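As a rough mental model of what the setup script creates, here is a hypothetical DDL for the three tables. The actual schema lives in scripts/setup_db.py and may differ in column names, constraints, and indexes:

```python
# Hypothetical schema sketch; the authoritative DDL is in scripts/setup_db.py.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id          SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    source      TEXT,
    metadata    JSONB DEFAULT '{}'::jsonb,
    created_at  TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE IF NOT EXISTS chunks (
    id          SERIAL PRIMARY KEY,
    document_id INTEGER REFERENCES documents(id) ON DELETE CASCADE,
    content     TEXT NOT NULL,
    token_count INTEGER
);

CREATE TABLE IF NOT EXISTS embeddings (
    chunk_id    INTEGER PRIMARY KEY REFERENCES chunks(id) ON DELETE CASCADE,
    embedding   vector(1536)
);
"""
```

The `vector(1536)` column matches the embedding dimensions in the table above and requires the pgvector extension to be installed first.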

Database Management

# Test connection
uv run python scripts/setup_db.py --test

# Check database status
uv run python scripts/setup_db.py --status

# Reset database (drop and recreate)
uv run python scripts/setup_db.py --reset
Agno manages its own schema automatically via PgVector. To use the AI Gateway for embeddings:
import os
from agno.vectordb.pgvector import PgVector, SearchType
from agno.knowledge.embedder.openai import OpenAIEmbedder

# Configure embedder to use AI Gateway
gateway_url = os.getenv("AI_GATEWAY_ENDPOINT")
gateway_key = os.getenv("AI_GATEWAY_API_KEY")

embedder = OpenAIEmbedder(
    id="text-embedding-3-small",
    dimensions=1536,
    api_key=gateway_key,
    base_url=f"{gateway_url}/deployments/text-embedding-3-small",
)

# Create knowledge base with the embedder
kb = PgVector(
    table_name="my_kb",
    db_url="postgresql://user:pass@localhost:5432/db",
    embedder=embedder,
    search_type=SearchType.hybrid,
)
kb.create()  # Creates table automatically
See Agno PgVector Documentation for more details on PgVector and other supported vector stores.

Document Ingestion

Add documents to the knowledge base using CLI scripts or the REST API.
# Ingest a single file
uv run python scripts/ingest.py --file ./data/samples/document.pdf

# With specific chunking strategy
uv run python scripts/ingest.py --file ./document.pdf --chunking semantic

# Ingest a directory
uv run python scripts/ingest.py --dir ./docs/

# Ingest from URL
uv run python scripts/ingest.py --url https://example.com/article

# PDF with OCR (for scanned documents)
uv run python scripts/ingest.py --file ./scanned.pdf --ocr

Chunking Strategies

Strategy     Description                          Best For
character    Fixed character-based splits         Simple text
token        Token-aware splitting (tiktoken)     LLM-optimized chunks
semantic     Sentence boundary splitting          Preserving meaning
recursive    Hierarchical splitting (default)     Structured documents
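To illustrate the idea behind the default recursive strategy, here is a minimal pure-Python sketch (not the starter's actual implementation): it tries coarse separators first and recurses with finer ones until every chunk fits the size limit.

```python
def recursive_chunk(text, max_len=1000, separators=("\n\n", "\n", ". ", " ")):
    """Split text hierarchically: paragraphs first, then lines, sentences,
    and finally words, recursing until each piece fits max_len."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                piece = part + sep
                if current and len(current) + len(piece) > max_len:
                    # Recurse in case the accumulated run is still too large.
                    chunks.extend(recursive_chunk(current.rstrip(), max_len, separators))
                    current = ""
                current += piece
            if current:
                chunks.extend(recursive_chunk(current.rstrip(), max_len, separators))
            return chunks
    # No separator helped: fall back to a hard character split.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

chunks = recursive_chunk("para one.\n\npara two is a bit longer.", max_len=20)
```

Because paragraph and sentence boundaries are preferred over arbitrary cuts, this tends to keep related text together, which is why it works well for structured documents.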

Supported File Formats

Format     Extensions
PDF        .pdf
Text       .txt
Markdown   .md
Web URLs   http://, https://
Agno provides built-in Readers that transform raw content from various sources into structured Document objects. Readers handle parsing, text extraction, and automatic chunking.
from agno.knowledge.reader.pdf_reader import PDFReader

# Configure reader with chunking
reader = PDFReader(
    chunk=True,
    chunk_size=1000,
)

# Read and process document
documents = reader.read("document.pdf")

# Add to knowledge base
kb.load(documents)
Agno supports multiple reader types including PDF, CSV, Markdown, JSON, and web content. See Agno Readers Documentation for the full list of supported readers and configuration options.
For chunking strategies, Agno supports document chunking, fixed-size chunking, semantic chunking, and agentic chunking. See Agno Chunking Documentation for details.

Search & Retrieval

Query the knowledge base with multiple search strategies.
# Hybrid search (default - combines semantic and keyword)
uv run python scripts/search.py "What is attention?"

# Semantic search only (vector similarity)
uv run python scripts/search.py "transformer architecture" --strategy semantic

# Keyword search only (full-text)
uv run python scripts/search.py "neural network" --strategy keyword

# With reranking for better quality
uv run python scripts/search.py "attention mechanism" --rerank

Search Strategies

Strategy   Description                                     Best For
Semantic   Vector similarity using embeddings              Conceptual queries, finding related content
Keyword    Full-text search with PostgreSQL tsvector       Exact terms, specific phrases
Hybrid     Combined semantic + keyword with RRF fusion     General use, best overall accuracy
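The RRF (Reciprocal Rank Fusion) step can be sketched in a few lines; this toy `rrf_fuse` helper is illustrative, not the starter's code. Each ranked list contributes `1 / (k + rank)` per document, so items ranked well by either strategy rise to the top:

```python
def rrf_fuse(semantic_ids, keyword_ids, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) across both rankings,
    then sort documents by the fused score."""
    scores = {}
    for ranking in (semantic_ids, keyword_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks 2nd semantically and 1st by keyword, so it fuses to the top.
fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
```

The constant `k` (60 is the value commonly used in the RRF literature) dampens the advantage of a single top rank, which is what makes the fusion robust to one strategy being noisy.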

Reranking

Reranking improves search result quality by re-scoring retrieved documents. Two backends are supported:
Backend               Latency      Setup               Best For
Cohere                ~100-300ms   Requires API keys   Production, high throughput
Local Cross-Encoder   ~1-5s        No setup needed     Development, testing
For production deployments, configure Cohere reranking via COHERE_RERANK_ENDPOINT and COHERE_RERANK_API_KEY environment variables.
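Conceptually, reranking is just a re-sort of the retrieved candidates under a slower but more accurate scoring model. A minimal sketch, with a toy word-overlap scorer standing in for the Cohere or cross-encoder call (all names here are illustrative):

```python
def rerank(query, candidates, score_fn, top_k=5):
    """Re-score retrieved candidates with a more accurate model and
    return the top_k by the new score. score_fn stands in for a
    Cohere or cross-encoder backend."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

# Toy relevance score: word overlap between query and candidate text.
def overlap(query, text):
    return len(set(query.lower().split()) & set(text.lower().split()))

best = rerank(
    "attention mechanism",
    ["the attention mechanism", "feed-forward layer"],
    overlap,
    top_k=1,
)
```

Because the reranker only sees the handful of candidates that retrieval returned, its higher per-pair cost stays bounded, which is the trade-off behind the latency figures above.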

RAG Agent

The starter includes a RAG-enabled agent that searches the knowledge base to answer questions with source citations.
curl -X POST http://localhost:8000/agent/crude \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the attention mechanism?"}'
Response with sources:
{
  "query": "What is the attention mechanism?",
  "answer": "Based on the knowledge base, the attention mechanism is a component that allows models to focus on relevant parts of the input...",
  "sources": [
    {
      "title": "attention_is_all_you_need.pdf",
      "source": "/data/samples/attention_is_all_you_need.pdf",
      "snippet": "An attention function can be described as mapping a query and a set of key-value pairs to an output..."
    }
  ]
}
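Under the hood, a "crude" RAG agent boils down to stuffing the retrieved chunks into the prompt with citation markers before calling the LLM. A sketch of that assembly step (the field names are illustrative, not the starter's schema):

```python
def build_rag_prompt(query, chunks):
    """Assemble a grounded prompt: numbered context snippets followed by
    the user's question, so the model can cite [1], [2], ... as sources."""
    context = "\n".join(
        f"[{i}] ({c['title']}) {c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What is the attention mechanism?",
    [{"title": "attention_is_all_you_need.pdf",
      "text": "An attention function maps a query and key-value pairs to an output."}],
)
```

The citation markers in the prompt are what let the agent map the model's answer back to the `sources` array shown in the response above.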
Agno agents can automatically search the knowledge base when configured with knowledge:
from agno.agent import Agent

agent = Agent(
    model=model,
    knowledge=kb,
    search_knowledge=True,  # Automatic retrieval
)
response = agent.run("Explain transformers")
When search_knowledge=True, the agent automatically queries the knowledge base for relevant context before generating a response. You can also configure the number of results to retrieve and filtering options. See Agno Knowledge Getting Started for more details on integrating knowledge bases with agents.

API Reference

Method   Endpoint          Description
GET      /                 Service status
GET      /health           Health check
POST     /ingest           Ingest documents (file, URL, or text)
GET      /documents        List all documents
GET      /documents/{id}   Get document details
DELETE   /documents/{id}   Delete a document
POST     /search           Search the knowledge base
POST     /agent/crude      RAG agent query
Access interactive API documentation at:
  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc
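For scripting against the API, a small client along these lines works with only the Python standard library. The payload field names follow the examples on this page; check /docs for the authoritative request schemas:

```python
import json
from urllib import request

BASE = "http://localhost:8000"

def post_json(path, payload):
    """POST a JSON payload to the running starter and decode the reply."""
    req = request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example payloads (illustrative field names):
search_payload = {"query": "attention mechanism", "strategy": "hybrid"}
agent_payload = {"query": "What is the attention mechanism?"}
# post_json("/search", search_payload)
# post_json("/agent/crude", agent_payload)
```

The commented-out calls require the server from the Quick Start to be running locally.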

Project Structure

starter-knowledge-base-agent/
├── .github/                    # CI/CD workflows
├── scripts/
│   ├── setup_db.py             # Database setup
│   ├── ingest.py               # Ingestion CLI
│   └── search.py               # Search CLI
├── src/
│   ├── main.py                 # App entry point
│   ├── agents/                 # RAG agent implementation
│   │   ├── assistant.py        # Agent with knowledge base tool
│   │   └── tools/              # Knowledge base search tool
│   ├── api/                    # FastAPI routes and schemas
│   │   ├── app.py
│   │   ├── schemas.py
│   │   └── routers/            # Ingest, search, documents, agent
│   ├── config/                 # Configuration management
│   ├── ingestion/              # Document ingestion pipeline
│   │   ├── loaders/            # PDF, text, markdown, web loaders
│   │   ├── chunkers/           # Chunking strategies
│   │   ├── embeddings/         # Embedding generation
│   │   └── pipeline.py         # Ingestion orchestrator
│   ├── search/                 # Search pipeline
│   │   ├── semantic.py         # Vector similarity search
│   │   ├── keyword.py          # Full-text search
│   │   ├── hybrid.py           # RRF fusion
│   │   ├── rerankers/          # Cohere, cross-encoder
│   │   └── pipeline.py         # Search orchestrator
│   └── storage/                # Database repositories
│       ├── document.py
│       ├── chunk.py
│       └── embedding.py
├── tests/                      # Test suite
├── redteam.yaml                # Red teaming configuration
├── Dockerfile                  # Container definition
└── pyproject.toml              # Dependencies

Observability

The starter automatically integrates with Langfuse via the bb-ai-sdk.
  • Automatic Tracing: Captures full traces for agent runs, search queries, and LLM calls.
  • Embedding Tracking: Monitors embedding generation costs and latency.
  • Search Analytics: Tracks search queries, strategies, and result quality.
  • Configuration: Managed via LANGFUSE_* environment variables.
This starter is integrated with bb-ai-sdk to connect with observability tools (Langfuse) and AI Gateway. See BB AI SDK Observability for advanced configuration and custom tracing.

Development

Run Tests

source .env && uv sync --extra dev
uv run pytest tests/ -v

# With coverage
uv run pytest tests/ -v --cov=src

Build Docker Image

docker build -t starter-knowledge-base-agent:local .

CI/CD

Standard workflows are pre-configured in .github/workflows:
  • PR Checks: Linting, testing, and validation.
  • Build & Publish: Docker image creation on merge.
  • Release: Automated versioning and release notes.
See CI/CD Workflows for pipeline details.

Next Steps