Rylvo Knowledge Base: Enterprise RAG with 12 Retrieval Blueprints, Multi-Source Ingestion, and AI-Powered Document Grounding
Every AI bot is only as good as the knowledge it can access. A customer support bot that cannot read your product documentation will hallucinate features. A legal assistant that cannot query your contract database will give generic advice. A sales bot that does not know your pricing tiers will quote outdated numbers. The gap between what LLMs know from training data and what your organization actually needs them to know is the single biggest reason bots fail in production.
Retrieval-Augmented Generation, or RAG, closes that gap. Instead of relying on the LLM's memorized training data, RAG retrieves the most relevant snippets from your documents at query time and injects them into the bot's context. The result is answers that are factual, current, and grounded in your specific organizational knowledge.
But building RAG is not simple. You need to parse documents in dozens of formats, chunk them intelligently, embed them into vectors, store them in a vector database, retrieve the right chunks at query time, and format everything into a prompt the bot can use. And you need to do this for every data source your organization has: uploaded files, cloud storage, SaaS apps, databases, and websites.
Rylvo Knowledge Base is a production-grade RAG system built into the Rylvo platform. It supports 15 data source types across 5 categories, 12 retrieval blueprints ranging from simple similarity search to multi-hop agentic reasoning, encrypted credential storage, multi-tenant vector indexing, auto-tune suggestions, a query playground, and bot integration, all managed through a 6-tab dashboard at /dashboard/knowledge-base.
In this guide, we will explore the complete Knowledge Base system: its architecture, ingestion pipeline, 12 blueprints, 15 source types, 6-tab dashboard, security model, bot integration, and how it transforms raw documents into grounded AI answers.
Why RAG Matters for Production AI Bots
Large language models are trained on public internet data. They know Shakespeare, Python syntax, and general world knowledge. But they do not know your product documentation, your internal policies, your customer contracts, your pricing sheets, or your technical specifications. When asked about these, they either hallucinate plausible-sounding but incorrect answers, or they refuse to answer at all.
RAG solves this by giving the bot access to your documents at runtime. When a user asks "What is the refund policy for enterprise customers?" the system retrieves the relevant section from your policy document, injects it into the prompt, and the bot answers based on that exact text rather than its training data.
The benefits are immediate and dramatic: fewer hallucinations, more accurate answers, answers that reflect current information rather than training cutoff dates, and the ability to cite sources so users can verify claims.
Rylvo Knowledge Base takes RAG from a DIY engineering project to a fully managed platform feature.
Core Architecture: Source, Connection, Bot
Rylvo Knowledge Base uses a three-layer architecture that separates where data lives from how it is retrieved and which bots can access it.
Sources define where your data lives. A source might be an uploaded file, an S3 bucket, a Google Cloud Storage container, a website to crawl, or a SaaS app like Notion. Each source carries its authentication credentials, sync schedule, and configuration. You create sources once and can reuse them across multiple connections.
Connections combine a source, a blueprint, and configuration into an active knowledge base. A connection says: "Take the data from this S3 bucket and retrieve it using Hybrid Search with these chunk sizes and this top-K value." Connections also specify which bots are linked, so you can have different knowledge bases for different bots.
Bots load their linked connections at runtime. On every conversation turn, the bot retrieves the most relevant chunks from all active connections, merges them, and formats them into a CONTEXT block injected into the system prompt.
This separation means you can switch a connection's retrieval strategy without redefining its source, link the same source to multiple bots with different blueprints, and manage credentials independently from retrieval logic.
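One way to picture the three layers is as plain records. The sketch below is illustrative only: the class names, field names, and blueprint slugs are assumptions made for this example, not Rylvo's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:
    """Where the data lives. Holds a credential pointer, never the secret itself."""
    source_id: str
    kind: str                          # hypothetical labels, e.g. "upload", "s3"
    credential_id: Optional[str] = None
    sync_schedule: str = "manual"


@dataclass
class Connection:
    """A source plus a blueprint plus retrieval settings, linked to bots."""
    connection_id: str
    source_id: str
    blueprint: str                     # hypothetical slug, e.g. "hybrid-search"
    top_k: int = 5
    bot_ids: list = field(default_factory=list)


def connections_for_bot(connections, bot_id):
    """Return the connections a given bot would load at runtime."""
    return [c for c in connections if bot_id in c.bot_ids]


# One source reused by two connections with different blueprints.
docs = Source("src-1", "s3", credential_id="cred-1", sync_schedule="daily")
conns = [
    Connection("conn-a", docs.source_id, "hybrid-search", bot_ids=["support-bot"]),
    Connection("conn-b", docs.source_id, "parent-child", bot_ids=["research-bot"]),
]
```

Here both connections point at the same source, so credentials and sync settings are defined once while each bot gets its own retrieval strategy.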
The Ingestion Pipeline: Parse, Chunk, Embed, Index
Every document that enters the Knowledge Base flows through a four-stage pipeline.
Parse
Rylvo extracts raw text from documents in seven formats: PDF (via unpdf), DOCX (via mammoth), TXT, Markdown, HTML, JSON, and CSV. The parser preserves page numbers and structural metadata so downstream stages can use them.
Chunk
Documents are split into chunks using either semantic chunking (splitting at sentence and paragraph boundaries) or fixed-size chunking with configurable overlap. The chunking strategy is blueprint-specific: some blueprints prefer small child chunks for matching, while others need larger parent chunks for context.
Some blueprints add special preprocessing at this stage. Contextual Retrieval uses an LLM to generate a document-level summary that is prepended to every chunk before embedding. Parent-Child creates two parallel chunk streams: small child chunks for embedding and matching, and large parent chunks for returning as context.
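To make fixed-size chunking with overlap concrete, here is a minimal sketch. It treats whitespace-separated words as tokens, which is a simplification: a real pipeline would count model tokens, and the function name and parameters are assumptions for illustration.

```python
def chunk_fixed(tokens, size, overlap):
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


words = "a b c d e f g h i j".split()
windows = chunk_fixed(words, size=4, overlap=1)
```

With ten tokens, a size of 4, and an overlap of 1, each window starts three tokens after the previous one, so the last token of one chunk is repeated as the first token of the next.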
Embed
Each chunk is converted into a dense vector embedding using the customer's own API key. By default, Rylvo uses OpenAI's text-embedding-3-small (1536 dimensions). Cohere and Voyage embeddings are also supported through direct HTTP integration. This bring-your-own-key model ensures customers control their embedding costs and can switch models as better ones emerge.
Index
Embeddings are upserted into Qdrant, a high-performance vector database. Rylvo uses a single shared Qdrant collection with multi-tenant payload filtering: every chunk carries orgId, connectionId, and documentId fields, ensuring strict data isolation without needing per-organization collections.
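The payload filtering described above can be sketched as a request body in the shape Qdrant's REST search endpoint accepts. The helper names are hypothetical, and which fields Rylvo filters on for a given query is an assumption here; the example scopes a search to one organization and one connection.

```python
def tenant_filter(org_id, connection_id):
    """Build a Qdrant-style filter: every condition under "must" has to match."""
    return {
        "must": [
            {"key": "orgId", "match": {"value": org_id}},
            {"key": "connectionId", "match": {"value": connection_id}},
        ]
    }


def build_search_request(query_vector, org_id, connection_id, top_k=5):
    """Assemble a search body that can only return chunks belonging to one tenant."""
    return {
        "vector": query_vector,
        "filter": tenant_filter(org_id, connection_id),
        "limit": top_k,
        "with_payload": True,
    }


request = build_search_request([0.1, 0.2, 0.3], "org-1", "conn-1", top_k=3)
```

Because the filter is built inside the same function that builds the request, a caller cannot issue a search without tenant scoping, which is the property a shared collection depends on.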
The pipeline is idempotent. Re-ingesting the same document replaces its old chunks rather than creating duplicates. Incremental sync compares externalId plus modifiedAt or contentHash to skip unchanged objects, making large re-syncs efficient.
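The skip-unchanged check can be illustrated with a short sketch. It assumes SHA-256 as the content hash and a simple mapping from externalId to the hash recorded at the last sync; both are choices made for this example rather than documented Rylvo behavior.

```python
import hashlib


def content_hash(data):
    """Hash raw object bytes (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(indexed, listed):
    """Decide which objects need ingesting.

    indexed: externalId -> contentHash stored at the last sync
    listed:  externalId -> current bytes reported by the source
    """
    plan = {"ingest": [], "skip": []}
    for external_id, data in listed.items():
        if indexed.get(external_id) == content_hash(data):
            plan["skip"].append(external_id)    # unchanged since the last sync
        else:
            plan["ingest"].append(external_id)  # new or modified
    return plan


indexed = {"a.pdf": content_hash(b"v1"), "b.pdf": content_hash(b"v1")}
listed = {"a.pdf": b"v1", "b.pdf": b"v2", "c.pdf": b"new"}
plan = plan_sync(indexed, listed)
```

In this run only the modified and the new object are re-ingested, while the unchanged one is skipped.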
15 Source Types Across 5 Categories
Rylvo Knowledge Base supports connecting to 15 different data source types, organized into five categories.
Upload
File Upload is the simplest source. Drag and drop files in PDF, DOCX, TXT, Markdown, HTML, JSON, or CSV formats. Files are stored in Firebase Storage, parsed, chunked, embedded, and indexed immediately. Status updates live in the Documents Drawer so you can watch each file progress from queued to indexed.
Cloud Storage
Amazon S3 connects to any S3 bucket using access key and secret authentication. Supports custom endpoints for S3-compatible stores like Cloudflare R2, Backblaze B2, MinIO, and Wasabi. You can specify a prefix to limit ingestion to a folder.
Google Cloud Storage connects using a service account JSON key. Specify bucket, prefix, and sync schedule.
Azure Blob Storage is scaffolded in the type system and will be wired in an upcoming release.
SaaS Applications
Notion, Confluence, Google Drive, SharePoint, Zendesk, Intercom, and Slack connect via OAuth authentication. The OAuth flow is fully implemented: clicking Connect opens the provider's consent screen, and on callback the credential is stored encrypted and bound to the source. The listObjects and fetchObject adapters that complete the sync loop for these SaaS apps are in active development.
Databases
PostgreSQL, MySQL, and MongoDB source types exist in the type system with connection string authentication. The ingest adapters are planned for a future release.
APIs and Web
Web Crawl is fully operational. Configure seed URLs, maximum crawl depth, maximum pages per run, and an allowlist of domains. The crawler uses breadth-first search with cheerio HTML parsing and a RylvoKBCrawler user-agent string. This is ideal for indexing public documentation, help centers, and marketing sites.
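The breadth-first traversal with depth, page, and domain limits can be sketched as follows. To keep the example runnable offline, link extraction is passed in as a function and backed by a small made-up site map; the function name and parameters are assumptions, and this is not the crawler's actual code.

```python
from collections import deque
from urllib.parse import urlparse


def crawl_order(seeds, get_links, max_depth, max_pages, allowed_domains):
    """Return URLs in the order a breadth-first crawl would visit them."""
    queue = deque((url, 0) for url in seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if urlparse(url).hostname not in allowed_domains:
            continue  # outside the allowed domains
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Made-up site map standing in for fetched and parsed pages.
SITE = {
    "https://docs.example.com/": [
        "https://docs.example.com/a",
        "https://docs.example.com/b",
        "https://other.example.org/x",
    ],
    "https://docs.example.com/a": ["https://docs.example.com/c"],
    "https://docs.example.com/c": ["https://docs.example.com/d"],
}
order = crawl_order(
    ["https://docs.example.com/"],
    lambda url: SITE.get(url, []),
    max_depth=2,
    max_pages=10,
    allowed_domains={"docs.example.com"},
)
```

The queue guarantees that every page at depth 1 is visited before any page at depth 2, which is what distinguishes breadth-first from depth-first crawling.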
REST API source type exists in the type system for custom integrations and will be wired in a future release.
Sync Schedules
Every source can be configured with a sync cadence: real-time (webhook-driven for SaaS apps), hourly polling, daily full sync at midnight UTC, weekly full sync on Monday midnight UTC, or manual trigger-only. This ensures your knowledge base stays current without constant manual intervention.
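As a rough illustration of these cadences, the sketch below computes the next scheduled run in UTC. The schedule labels are hypothetical, and running hourly polls at the top of the hour is an assumption; only the daily and weekly midnight UTC times come from the description above.

```python
from datetime import datetime, timedelta, timezone


def next_sync(schedule, now):
    """Return the next scheduled sync time in UTC, or None when nothing is scheduled."""
    now = now.astimezone(timezone.utc)
    if schedule in ("realtime", "manual"):
        return None  # driven by webhooks or by an operator, not by the clock
    if schedule == "hourly":
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if schedule == "daily":
        return midnight + timedelta(days=1)
    if schedule == "weekly":
        # weekday() is 0 on Monday, so a Monday maps to the following Monday.
        return midnight + timedelta(days=7 - now.weekday())
    raise ValueError(f"unknown schedule: {schedule}")


now = datetime(2024, 5, 15, 13, 45, tzinfo=timezone.utc)  # a Wednesday
upcoming = {name: next_sync(name, now) for name in ("hourly", "daily", "weekly", "manual")}
```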
12 Retrieval Blueprints: From Simple to Agentic
Blueprints are research-backed retrieval strategy templates. Instead of forcing users to tweak low-level parameters like chunk size and similarity metrics, Rylvo offers blueprints that encapsulate proven approaches from the academic literature.
Eight blueprints are fully implemented and operational today. Four more are planned for post-v1 releases.
Hybrid Search (Default, Beginner)
Hybrid Search combines keyword precision with semantic fuzziness. It uses semantic chunking, OpenAI embeddings, and a dual-index approach: BM25 for keyword matching plus dense vectors for semantic matching. Results are fused using Reciprocal Rank Fusion and optionally reranked with Cohere. Best for general-purpose knowledge bases with mixed content types. Benchmark: 180ms latency, 87% relevance, $0.0003 per query.
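Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranking in which it appears. The sketch below uses k = 60, the constant commonly used in the literature; the value Rylvo uses is not stated, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["doc-refunds", "doc-pricing", "doc-sla"]
dense_ranking = ["doc-refunds", "doc-onboarding", "doc-pricing"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Because only ranks are used, the keyword and vector scores never need to be normalized onto a common scale, which is the main practical appeal of this fusion method.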
Classic RAG (Beginner)
The simplest blueprint. Fixed-size chunking, embeddings, similarity search, and optional reranking. Best for straightforward FAQ knowledge bases where queries closely match document content. Benchmark: 120ms latency, 78% relevance, $0.0002 per query.
Parent-Child Retrieval (Beginner)
Designed for long technical manuals and books. Large parent chunks (2048 tokens) retain full context, while small child chunks (256 tokens) are embedded for matching. When a child chunk matches, its parent chunk is returned for context. Best for documents where surrounding context matters. Benchmark: 130ms latency, 85% relevance, $0.0002 per query.
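The child-to-parent mapping can be sketched as below. Tokens are represented as list items, the vector search step is replaced by a list of already-matched child indexes, and the function names are assumptions for this example.

```python
def build_parent_child(tokens, parent_size=2048, child_size=256):
    """Split tokens into parents, then split each parent into children that remember it."""
    parents = [tokens[i:i + parent_size] for i in range(0, len(tokens), parent_size)]
    children = []  # list of (child_tokens, parent_index)
    for parent_index, parent in enumerate(parents):
        for j in range(0, len(parent), child_size):
            children.append((parent[j:j + child_size], parent_index))
    return parents, children


def parents_for_matches(matched_child_indexes, children, parents):
    """Map matched children to their parents, keeping match order and dropping repeats."""
    seen = set()
    result = []
    for child_index in matched_child_indexes:
        parent_index = children[child_index][1]
        if parent_index not in seen:
            seen.add(parent_index)
            result.append(parents[parent_index])
    return result


# Small sizes so the structure is easy to inspect.
parents, children = build_parent_child(list(range(20)), parent_size=8, child_size=4)
context = parents_for_matches([1, 0, 3], children, parents)
```

Two matched children from the same parent return that parent only once, so the context is not padded with duplicate text.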
HyDE — Hypothetical Document Embeddings (Intermediate)
For vague or exploratory queries where users do not know the exact terminology. The system uses an LLM to generate a hypothetical answer to the query, embeds that hypothetical answer, and searches for similar documents. Best for research and discovery use cases. Benchmark: 320ms latency, 88% relevance, $0.0008 per query.
Contextual Retrieval (Intermediate, Anthropic-style)
Prepends a document-level LLM-generated context summary to every chunk before embedding. This helps chunks that reference external concepts or cross-references within a document. Best for dense technical documentation. Benchmark: 350ms latency, 91% relevance, $0.001 per query.
Self-RAG (Advanced)
For domains where hallucination tolerance is zero, such as medical, legal, and financial applications. After retrieving chunks, an LLM grader judges each chunk as supportive, partial, or irrelevant. Only supportive chunks are injected into the bot's context. Irrelevant chunks are discarded. Benchmark: 480ms latency, 94% relevance, $0.0015 per query.
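The grading step can be sketched as a filter over retrieved chunks. The grader is passed in as a function; the keyword-overlap grader below is only a stand-in for the LLM grader so the example runs offline, and the sample chunks are invented.

```python
LABELS = {"supportive", "partial", "irrelevant"}


def keep_supportive(query, chunks, grade):
    """Keep only the chunks the grader labels as supportive."""
    kept = []
    for chunk in chunks:
        label = grade(query, chunk)
        if label not in LABELS:
            raise ValueError(f"unexpected grade: {label}")
        if label == "supportive":
            kept.append(chunk)
    return kept


def toy_grade(query, chunk):
    """Stand-in grader based on shared words (an LLM would make this judgment)."""
    overlap = len(set(query.lower().split()) & set(chunk.lower().split()))
    if overlap >= 2:
        return "supportive"
    return "partial" if overlap == 1 else "irrelevant"


chunks = [
    "Enterprise customers may request a refund within 30 days.",
    "Our refund form is online.",
    "Office hours are 9 to 5.",
]
grounded = keep_supportive("enterprise refund policy", chunks, toy_grade)
```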
Corrective RAG — CRAG (Advanced)
For knowledge bases with known gaps or rapidly changing information. After retrieval, a judge LLM evaluates whether the local knowledge base has sufficient information to answer the query. If the retrieval is correct, local chunks are used. If ambiguous, the system supplements with web search. If incorrect, it falls back entirely to web search. Best for domains where your KB may be incomplete. Benchmark: 520ms latency, 90% relevance, $0.002 per query.
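The three-way routing can be sketched as a small function. The verdict strings mirror the wording above, while the function name and the web search callable are placeholders for whatever judge and search provider are actually used.

```python
def crag_route(verdict, local_chunks, web_search):
    """Choose the context source based on the judge's verdict about local retrieval."""
    if verdict == "correct":
        return list(local_chunks)                 # local knowledge is sufficient
    if verdict == "ambiguous":
        return list(local_chunks) + web_search()  # supplement local chunks
    if verdict == "incorrect":
        return web_search()                       # fall back entirely to the web
    raise ValueError(f"unknown verdict: {verdict}")


def fake_web_search():
    """Placeholder for a real web search call."""
    return ["web result"]


local = ["local chunk"]
routed = {v: crag_route(v, local, fake_web_search) for v in ("correct", "ambiguous", "incorrect")}
```

Passing the search function in rather than calling it eagerly means the web is only queried when the verdict requires it.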
Agentic RAG (Advanced)
For complex research questions requiring synthesis across many documents. An LLM agent iteratively searches, filters, and synthesizes across up to three hops of reasoning. Each hop can refine the query based on findings from previous hops. Best for deep research and multi-document synthesis. Benchmark: 800ms latency, 92% relevance, $0.005 per query.
Coming Soon
RAPTOR uses tree-organized retrieval with hierarchical summarization. Graph RAG uses entity extraction and knowledge graph traversal. ColBERT uses token-level late interaction matching. Multi-Vector Retrieval uses dual embeddings per document. Beyond these four planned blueprints, a Custom option will allow users to define their own pipeline DAGs through a visual editor.
Runtime Retrieval: How Bots Ground Answers
When a user sends a message to a bot with Knowledge Base connections, the retrieval orchestrator runs on every turn.
First, it loads all active KB connections linked to that bot. For each connection, it runs the connection's blueprint runner with the user's query as input. Each blueprint performs its specific retrieval logic: embedding the query, searching the vector store, applying reranking or grading, and returning the most relevant chunks.
Results from multiple connections are merged using a round-robin strategy so that small connections are not drowned out by large ones. The total number of injected chunks is capped at 8 by default to preserve the LLM's context window.
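A round-robin merge with a cap can be sketched in a few lines. The function name is an assumption; the default cap of 8 follows the text above.

```python
from itertools import zip_longest


def round_robin_merge(per_connection, cap=8):
    """Interleave ranked chunk lists, taking one chunk per connection per round."""
    if cap <= 0:
        return []
    merged = []
    for round_of_chunks in zip_longest(*per_connection):
        for chunk in round_of_chunks:
            if chunk is None:
                continue  # this connection has run out of chunks
            merged.append(chunk)
            if len(merged) == cap:
                return merged
    return merged


big = [f"big-{i}" for i in range(10)]
small = ["small-0", "small-1"]
merged = round_robin_merge([big, small])
```

Even though the large connection returned five times as many chunks, both chunks from the small connection make it into the final eight.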
The retrieved chunks are formatted into a fenced CONTEXT block and injected into the system prompt at a fixed position: after the identity prompt, the primary prompt, and any other prompts, but before guardrail rules and connector awareness. This ensures the bot uses grounded knowledge as its primary reference while still applying safety rules and tool awareness.
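The ordering can be sketched as a simple prompt assembler. The delimiter strings, chunk fields, and function names below are assumptions for illustration; the actual fence format is not specified here.

```python
def format_context_block(chunks):
    """Render retrieved chunks inside clearly delimited markers."""
    body = "\n\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return f"=== CONTEXT ===\n{body}\n=== END CONTEXT ==="


def assemble_system_prompt(identity, primary, other, chunks, guardrails, connector_awareness):
    """Join sections in order: prompts first, then context, then guardrails and connectors."""
    sections = [identity, primary, *other]
    if chunks:
        sections.append(format_context_block(chunks))
    sections += [guardrails, connector_awareness]
    return "\n\n".join(section for section in sections if section)


prompt = assemble_system_prompt(
    identity="You are a support assistant.",
    primary="Answer customer questions.",
    other=["Tone: friendly."],
    chunks=[{"source": "policy.pdf", "text": "Example policy text."}],
    guardrails="Guardrails: never share secrets.",
    connector_awareness="Connectors: none.",
)
```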
The entire retrieval process is observable in the chat trace, which records the connection names, chunk counts, retrieval latency, and threshold status for every conversation turn.
The 6-Tab Dashboard: Complete KB Management
The Knowledge Base dashboard at /dashboard/knowledge-base provides six tabs for managing every aspect of your knowledge infrastructure.
Connections Tab: Active Knowledge Bases
The Connections tab is your operational view of active knowledge bases. It shows connection cards with document count, chunk count, query count, and average latency. Status filter pills show active, provisioning, paused, and error states. Click any connection to expand a detail panel with full pipeline configuration, performance stats, linked bots picker, auto-tune suggestions, and action buttons for pause, resume, reindex, and delete. Each connection has a copyable connection string in the format rylvo://{orgId}/{connectionId}?blueprint={slug}.
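The connection string follows ordinary URL syntax, so it can be built and parsed with standard tools. The helper names below are hypothetical, and the example IDs and slug are made up.

```python
from urllib.parse import parse_qs, urlparse


def build_connection_string(org_id, connection_id, blueprint):
    """Render rylvo://{orgId}/{connectionId}?blueprint={slug}."""
    return f"rylvo://{org_id}/{connection_id}?blueprint={blueprint}"


def parse_connection_string(value):
    """Split a connection string back into its three parts."""
    parts = urlparse(value)
    if parts.scheme != "rylvo":
        raise ValueError("expected a rylvo:// connection string")
    connection_id = parts.path.lstrip("/")
    blueprint = parse_qs(parts.query).get("blueprint", [None])[0]
    if not parts.netloc or not connection_id or blueprint is None:
        raise ValueError("connection string is missing a component")
    return {"orgId": parts.netloc, "connectionId": connection_id, "blueprint": blueprint}


parsed = parse_connection_string("rylvo://org_123/conn_456?blueprint=hybrid-search")
```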
Sources Tab: Data Source Management
The Sources tab manages your data source definitions and credentials. Create new sources by category: Upload, Cloud Storage, SaaS, Database, or APIs and Web. Per-source actions include Upload (for upload types), Connect or Reconnect (for OAuth), Test (to verify credentials), Sync or Refresh Stats, Browse Documents, and Delete. Live sync status, last sync error, and next sync time are displayed for each source.
Blueprints Tab: Retrieval Strategy Gallery
The Blueprints tab is a gallery of all 12 blueprints with complexity badges (Beginner, Intermediate, Advanced). Search and filter by complexity. Click any blueprint to see its description, research paper link, tags, recommended use cases, benchmark estimates, and a full pipeline DAG visualization showing how data flows from chunker to embedder to indexer to retriever to reranker. Coming-soon blueprints are visible but clearly marked.
Playground Tab: Test Before You Deploy
The Playground tab lets you test queries against live connections before linking them to production bots. Enter a query, select a connection, and see the retrieved chunks with relevance scores, source attribution, and page numbers. Latency is displayed for each query. Save queries to history for comparison and documentation. This is essential for validating that a connection actually retrieves relevant information before customers see it.
Performance Tab: Metrics and Insights
The Performance tab shows aggregate statistics across all connections, per-connection latency trends, and relevance score trends. It helps you identify which connections are fast, which are slow, and which are retrieving high-quality results.
Settings Tab: Global Configuration
The Settings tab manages global KB configuration: embedding key status (with a warning banner if the OpenAI key is missing), connection-wide toggles for auto-tune and default chunk sizes, and wipe or recompute options for recovering from indexing issues.
Security and Multi-Tenancy
Rylvo Knowledge Base is designed for multi-tenant, enterprise-grade deployments.
Credential Security — All source credentials are stored encrypted using AES-256-GCM via MCP Vault. Source configuration documents only store a credentialId pointer, never the actual secret. Even if Firestore were compromised, credentials remain encrypted.
Multi-Tenancy — All vectors are stored in a single shared Qdrant collection with strict payload filtering. Every chunk carries orgId, connectionId, and documentId fields. Retrieval queries always include these filters, ensuring organizations can never see each other's data.
Firestore Rules — Org-level isolation is enforced through Firestore security rules for all KB collections: kbSources, kbBlueprints, kbConnections, kbPlaygroundQueries, kbDocuments, and kbIngestJobs.
File Storage — Uploaded files are stored in Firebase Storage at organization-scoped paths: gs://<bucket>/orgs/{orgId}/sources/{sourceId}/{docId}/{filename}.
Auto-Tune: AI-Powered Blueprint Recommendations
Each connection has an auto-tune block that tracks query performance and can suggest a better blueprint. When auto-tune is enabled, the system analyzes retrieval latency, relevance scores, and query patterns over time. If it detects that a different blueprint would significantly improve results, it suggests the alternative with an estimated improvement percentage and a reason.
For example, if a Classic RAG connection consistently shows low relevance scores on vague queries, auto-tune might suggest switching to HyDE with an estimated 10% improvement. The operator can review the suggestion and apply it with one click.
Auto-tune turns blueprint selection from a one-time guess into an ongoing optimization process.
Bot Integration: Link, Patch, Ground
Connecting a knowledge base to a bot is seamless and atomic.
In the Connections tab detail panel, operators select which bots should have access to the connection. Checking or unchecking a bot persists via an update to the connection document, then triggers a re-patch of every affected bot's response_composer prompt with a fresh KB citation block. The patcher is idempotent and creates a new prompt version with a full audit trail.
Deleting a connection also triggers re-patching on formerly linked bots, stripping the KB block if no other connections remain. This ensures bots never reference knowledge bases that no longer exist.
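An idempotent patcher can be sketched by delimiting the KB block with markers, removing any existing block, and then writing a fresh one. The marker strings, the block wording, and the function name are assumptions for this example; versioning and the audit trail are left out.

```python
import re

KB_START = "<!-- KB:START -->"
KB_END = "<!-- KB:END -->"
_KB_BLOCK = re.compile(
    r"\n*" + re.escape(KB_START) + r".*?" + re.escape(KB_END), re.DOTALL
)


def patch_prompt(prompt, connection_names):
    """Replace the KB citation block so it lists exactly the given connections."""
    base = _KB_BLOCK.sub("", prompt).rstrip()  # drop any previous block first
    if not connection_names:
        return base  # no connections left, so the block is stripped
    listing = "\n".join(f"- {name}" for name in sorted(connection_names))
    return (
        f"{base}\n\n{KB_START}\n"
        f"Cite these knowledge bases when you use them:\n{listing}\n{KB_END}"
    )


original = "You are a support bot."
patched_once = patch_prompt(original, ["Product Documentation"])
patched_twice = patch_prompt(patched_once, ["Product Documentation"])
stripped = patch_prompt(patched_once, [])
```

Applying the patch twice yields the same prompt as applying it once, and patching with an empty list restores the original prompt, which mirrors the link and delete behavior described above.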
At runtime, the bot chat page loads the bot's active KB connections, retrieves context on every turn, and includes the retrieved chunks in the chat trace so operators can see exactly what knowledge grounded each answer.
Comparison: Rylvo KB vs. Basic File Upload
| Capability | Basic File Upload | Rylvo Knowledge Base |
|---|---|---|
| Source types | Single file upload | 15 types: upload, S3, GCS, web crawl, SaaS, database, API |
| Retrieval strategy | Simple similarity search | 12 blueprints from Hybrid Search to Agentic RAG |
| Chunking | Fixed size only | Semantic or fixed-size, with blueprint-specific hooks |
| Embeddings | Hardcoded model | BYOK: OpenAI, Cohere, Voyage |
| Vector store | None or single-tenant | Multi-tenant Qdrant with payload filtering |
| Credential security | Plain text or env vars | AES-256-GCM encrypted via MCP Vault |
| Sync scheduling | Manual only | Real-time, hourly, daily, weekly, or manual |
| Incremental sync | Full re-ingest | Skip unchanged objects by hash and timestamp |
| Query playground | None | Live testing with relevance scores and source attribution |
| Auto-tune | None | AI-powered blueprint recommendations |
| Performance metrics | None | Per-connection latency, relevance, query count |
| Bot linking | Manual prompt editing | Atomic diff + prompt re-patching with audit trail |
| Multi-source merge | Not supported | Round-robin merge across connections |
| Document status tracking | None | Queued → parsing → chunking → embedding → indexed, or failed |
Getting Started
Step 1: Open the Knowledge Base Dashboard
Navigate to /dashboard/knowledge-base. The Connections tab shows any existing knowledge bases.
Step 2: Create Your First Source
Go to the Sources tab and click Connect Source. For your first knowledge base, choose Upload. Name it (e.g., "Product Documentation") and save.
Step 3: Upload Documents
Click Upload on your source card and drop files: PDFs, DOCX files, TXT files, or Markdown. Watch the Documents Drawer for live status updates as each file progresses through parsing, chunking, embedding, and indexing.
Step 4: Create a Connection
Go to the Connections tab and click New Knowledge Base. The 3-step wizard guides you through: Step 1 selects your source, Step 2 picks a blueprint (start with Hybrid Search for general use), Step 3 names the connection, configures chunk size and top-K, and links it to your bot.
Step 5: Test in the Playground
Go to the Playground tab. Select your connection and ask a question related to your uploaded documents. Review the retrieved chunks, relevance scores, and latency. If results look good, your knowledge base is ready.
Step 6: Verify Bot Integration
Open your bot's chat page and ask a question that requires document knowledge. The bot's answer should reference your uploaded content. Check the chat trace to confirm KB chunks were retrieved.
FAQ
What is Rylvo Knowledge Base? A production-grade retrieval-augmented generation (RAG) platform that ingests organizational documents, embeds them into vectors, and retrieves the most relevant snippets at bot query time to ground answers in factual data.
What source types are supported? 15 types across 5 categories: Upload, Cloud Storage (S3, GCS, Azure Blob), SaaS (Notion, Confluence, Google Drive, SharePoint, Zendesk, Intercom, Slack), Databases (PostgreSQL, MySQL, MongoDB), and APIs and Web (REST API, Web Crawl).
Which source types are available today? Upload, Amazon S3, Google Cloud Storage, and Web Crawl are fully operational. SaaS OAuth flows are implemented, with ingest adapters in progress. Database and REST API adapters are planned.
What are blueprints? Research-backed retrieval strategy templates that encapsulate proven approaches from academic literature. Each blueprint defines a specific pipeline for chunking, embedding, indexing, retrieving, and reranking.
Which blueprints are available? Eight: Hybrid Search, Classic RAG, Parent-Child Retrieval, HyDE, Contextual Retrieval, Self-RAG, CRAG, and Agentic RAG. Four more (RAPTOR, Graph RAG, ColBERT, Multi-Vector) are planned.
How does the ingestion pipeline work? Parse (extract text) → Chunk (split intelligently) → Embed (convert to vectors) → Index (store in Qdrant). The pipeline is idempotent and supports incremental sync.
How are credentials secured? All source credentials are encrypted with AES-256-GCM via MCP Vault. Source configs only store credentialId pointers, never actual secrets.
How is data isolated between organizations? Multi-tenant payload filtering in Qdrant ensures strict org-level isolation. Firestore security rules enforce isolation for all KB collections.
How do I link a knowledge base to a bot? In the Connections tab detail panel, check the bots you want to link. The system automatically patches each bot's prompt with a KB citation block and creates a new prompt version.
What is auto-tune? An AI-powered recommendation engine that tracks query performance and suggests better blueprints with estimated improvement percentages.
Can I test queries before deploying? Yes. The Playground tab lets you test queries against live connections and review retrieved chunks with relevance scores.
How do I keep my knowledge base current? Configure sync schedules per source: real-time (webhook), hourly, daily, weekly, or manual trigger-only.
Ready to Ground Your Bots in Real Knowledge?
Rylvo Knowledge Base transforms your AI bots from generic language models into domain experts that answer from your actual documents. Twelve retrieval blueprints give you the right strategy for every use case, from simple FAQ to complex multi-hop research. Fifteen source types connect to every corner of your data ecosystem. The 6-tab dashboard makes management intuitive. Encrypted credentials and multi-tenant isolation keep everything secure. Auto-tune optimizes performance over time. And seamless bot integration means your knowledge goes live with a single checkbox.
Open Knowledge Base and create your first connection today.
Ground every answer. Cite every source. Know everything.
