Rylvo, Inc.

Rylvo Traces: Complete AI Observability for Every Bot Conversation

When your AI bot handles thousands of conversations across API endpoints, embedded widgets, WhatsApp, Telegram, and the dashboard playground, how do you know what is actually happening? How do you find the conversation where a guardrail fired unexpectedly? How do you trace the exact tool call that failed? How do you know which customer had a bad experience, which model consumed the most tokens, or which prompt version produced the slowest responses?

Traditional logging gives you text dumps. Traditional analytics gives you aggregate counts. Neither tells you the full story of a single conversation turn — the LLM call, the tool execution, the guardrail check, the cost, the latency, and the customer identity — all in one place.

Rylvo Traces is a Langfuse-class observability layer built natively into the Rylvo platform. It captures, ingests, queries, replays, and analyzes every AI interaction across every channel your bots touch. From a single dashboard at /dashboard/traces, you can inspect any conversation in surgical detail, replay its full observation tree, filter by customer or channel, track costs and tokens, export to datasets, and even ingest external traces from your own systems.

In this guide, we will explore the complete Traces system: its canonical data model, the ten types of observations it captures, its multi-channel architecture, customer identity resolution, the dashboard UI, cost and token analytics, OpenTelemetry compatibility, and how it transforms raw conversation data into actionable operational intelligence.

Why AI Observability Matters More Than Ever

Deploying an AI bot is no longer a one-time setup. Modern bots use multiple LLM models, call external tools through MCP servers, retrieve knowledge from vector databases, apply guardrails for safety, and serve users across a dozen different channels. Each conversation turn can involve a chain of operations: input validation, prompt assembly, LLM generation, tool selection, tool execution, output filtering, guardrail evaluation, response formatting, and delivery.

When something goes wrong — a hallucination, a slow response, a tool failure, a guardrail block, or a frustrated user — you need to reconstruct the entire chain of events to understand why. Without observability, you are debugging in the dark. You might see that response times increased, but you cannot tell whether it was the model, the tool, the retrieval step, or the guardrail causing the delay. You might see that customer satisfaction dropped, but you cannot find the specific conversations where the bot failed.

Rylvo Traces solves this by treating every conversation turn as a first-class trace with a full nested observation tree. Every LLM call is a generation observation. Every tool execution is a tool observation. Every guardrail check is a guardrail observation. Every knowledge retrieval is a retriever observation. All are linked to the parent trace, timestamped, cost-attributed, and queryable.

The Three-Layer Canonical Model: Trace, Observation, Score

Rylvo Traces uses a canonical observability model inspired by Langfuse but built for Rylvo's multi-channel, multi-bot architecture.

Traces: The Conversation Turn

A trace represents one end-user interaction — a single API request, a dashboard playground chat turn, a WhatsApp message response, an embedded widget interaction, or a Workspace Architect turn. Every trace carries:

A unique trace ID and session/conversation linkage
Bot identity and model used
Customer identity (internal ID, external ID, display name, email, phone)
Channel and surface (API, web widget, WhatsApp, Telegram, dashboard, etc.)
Environment (production, staging, test, development)
Input and output snapshots
Start time, end time, and total latency
Total tokens and total cost in USD
Error flags and error count
Tags and metadata for custom filtering
OpenTelemetry trace ID for cross-system correlation
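
As a rough illustration, the fields listed above could be modeled as a record like the one below. This is a sketch only: the class names, field names, and types are assumptions made for this example, not Rylvo's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CustomerIdentity:
    # Hypothetical field names mirroring the identity fields listed above.
    customer_id: Optional[str] = None
    external_id: Optional[str] = None
    display_name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None


@dataclass
class Trace:
    # One conversation turn. All names here are illustrative assumptions.
    trace_id: str
    session_id: str
    bot_id: str
    model: str
    channel: str
    environment: str
    input: str
    output: str
    start_time: datetime
    end_time: datetime
    customer: CustomerIdentity = field(default_factory=CustomerIdentity)
    total_tokens: int = 0
    total_cost_usd: float = 0.0
    error_count: int = 0
    tags: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    otel_trace_id: Optional[str] = None

    @property
    def latency_ms(self) -> float:
        # Total latency is derived from the start and end timestamps.
        return (self.end_time - self.start_time).total_seconds() * 1000.0

    @property
    def has_error(self) -> bool:
        return self.error_count > 0


start = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
example = Trace(
    trace_id="tr_001", session_id="sess_001", bot_id="support-bot",
    model="example-model", channel="whatsapp", environment="production",
    input="Where is my order?", output="It ships tomorrow.",
    start_time=start, end_time=start + timedelta(milliseconds=850),
    customer=CustomerIdentity(external_id="whatsapp:+919876543210"),
    total_tokens=412, total_cost_usd=0.0031,
)
print(round(example.latency_ms), example.has_error)
```

Deriving latency from the timestamps, rather than storing it separately, keeps the two values from drifting apart in this sketch.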

Observations: The Nested Steps

Inside every trace, observations capture the individual runtime steps. Each observation has a type, a name, input/output snapshots, latency, status, level, and a parent observation ID for nesting. Rylvo supports ten observation types:

Generation — Every LLM call. Captures the provider, model, parameters, input messages, output message, token usage breakdown (input, output, cached, reasoning, audio, image), cost breakdown, finish reason, streaming flag, prompt version link, and raw request/response snapshots. This is where you see exactly what the model received and produced.

Tool — Every tool execution including MCP server calls, webhook connectors, HTTP actions, and internal functions. Captures tool name, provider, server ID, arguments, output, success/failure status, error details, approval requirements, and cost. The full tool lifecycle is tracked from advertisement to result delivery.

Agent — Multi-agent orchestration steps. Captures agent ID, name, role (orchestrator, specialist, architect, verifier, planner), iteration index, handoff decisions, and stop reasons. Used by bot groups and Workspace Architect to trace which agent handled what.

Guardrail — Every guardrail evaluation. Captures guardrail ID, name, phase (input, output, tool input, tool output, retrieval), whether it triggered, severity, action taken (allow, warn, block, rewrite, escalate), and the matched condition. This is essential for debugging why a response was blocked or rewritten.

Retriever — Every knowledge base retrieval. Captures the query, rewritten query, index name, top-K, filters, returned chunks, selected chunks, scores, and document references. You can see exactly which KB sources fed a given answer.

Span — Generic timed operations. Used for phases like prompt compilation, context assembly, or proposal persistence that do not fit the specialized types.

Event — Point-in-time markers. Used for lifecycle events like "tool selected by LLM" or "approval requested" that have no duration.

Chain — Sequential workflow steps. Used for high-level pipeline phases like the main bot runtime turn.

Embedding — Vector embedding operations. Captures model, input text, and output vector metadata.

Evaluator — Evaluation and scoring operations. Captures the evaluator type, criteria, and results.
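
To make the ten observation types concrete, here is a minimal sketch of how they might be represented and validated. The enum values, the Observation fields, and the parse_observation helper are hypothetical names chosen for this example; only the ten type names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ObservationType(str, Enum):
    GENERATION = "generation"
    TOOL = "tool"
    AGENT = "agent"
    GUARDRAIL = "guardrail"
    RETRIEVER = "retriever"
    SPAN = "span"
    EVENT = "event"
    CHAIN = "chain"
    EMBEDDING = "embedding"
    EVALUATOR = "evaluator"


@dataclass
class Observation:
    observation_id: str
    trace_id: str
    type: ObservationType
    name: str
    parent_observation_id: Optional[str] = None  # enables nesting
    latency_ms: Optional[float] = None
    status: str = "ok"
    level: str = "default"


def parse_observation(raw: dict) -> Observation:
    """Validate a raw dict and convert it into an Observation."""
    obs_type = ObservationType(raw["type"])  # raises ValueError for unknown types
    latency = raw.get("latency_ms")
    if obs_type is ObservationType.EVENT:
        latency = None  # events are point-in-time markers with no duration
    return Observation(
        observation_id=raw["id"],
        trace_id=raw["trace_id"],
        type=obs_type,
        name=raw["name"],
        parent_observation_id=raw.get("parent_observation_id"),
        latency_ms=latency,
        status=raw.get("status", "ok"),
        level=raw.get("level", "default"),
    )


obs = parse_observation({
    "id": "obs_1", "trace_id": "tr_001", "type": "guardrail",
    "name": "pii-filter", "latency_ms": 12.5,
})
print(obs.type.value, obs.latency_ms)
```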

Scores: Feedback and Evaluation

Scores attach qualitative and quantitative assessments to traces, observations, sessions, or customers. A score has a name, a value (numeric, categorical, boolean, or text), a data type, and a source. Sources include user feedback, customer ratings, operator reviews, LLM-as-a-judge, guardrail results, code evaluations, and automated evals.

Examples include thumbs up/down on a response, a CSAT rating, a hallucination score, a helpfulness score, a guardrail pass/fail result, and an LLM judge correctness rating. Scores are queryable and aggregatable, so you can track average helpfulness per bot or guardrail trigger rate per channel.
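
The two aggregations mentioned above, average helpfulness per bot and guardrail trigger rate per channel, can be sketched as follows. The Score fields, score names, and source labels are assumptions for illustration and may not match Rylvo's stored representation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Union

ScoreValue = Union[float, str, bool]


@dataclass
class Score:
    name: str
    value: ScoreValue
    data_type: str  # "numeric", "categorical", "boolean", or "text"
    source: str     # hypothetical labels such as "llm_judge" or "user_feedback"
    bot_id: str
    channel: str


def average_numeric(scores, name, group_by):
    """Average a numeric score per group, for example per bot_id."""
    totals = defaultdict(lambda: [0.0, 0])
    for s in scores:
        if s.name == name and s.data_type == "numeric":
            bucket = totals[getattr(s, group_by)]
            bucket[0] += float(s.value)
            bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def true_rate(scores, name, group_by):
    """Fraction of boolean scores that are True per group."""
    counts = defaultdict(lambda: [0, 0])
    for s in scores:
        if s.name == name and s.data_type == "boolean":
            bucket = counts[getattr(s, group_by)]
            bucket[0] += 1 if s.value else 0
            bucket[1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


scores = [
    Score("helpfulness", 0.9, "numeric", "llm_judge", "support-bot", "whatsapp"),
    Score("helpfulness", 0.5, "numeric", "user_feedback", "support-bot", "api"),
    Score("helpfulness", 0.8, "numeric", "llm_judge", "sales-bot", "api"),
    Score("guardrail_triggered", True, "boolean", "guardrail", "support-bot", "whatsapp"),
    Score("guardrail_triggered", False, "boolean", "guardrail", "support-bot", "whatsapp"),
    Score("guardrail_triggered", False, "boolean", "guardrail", "sales-bot", "api"),
]
helpfulness_by_bot = average_numeric(scores, "helpfulness", "bot_id")
trigger_rate_by_channel = true_rate(scores, "guardrail_triggered", "channel")
print(helpfulness_by_bot, trigger_rate_by_channel)
```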

Multi-Channel Capture: Every Surface, One Trace Stream

Rylvo Traces does not discriminate by channel. It captures every conversation from every surface your bots serve:

Production API — Every API call to your deployed bots is traced with the full request/response, tool calls, and guardrail checks.

Dashboard Playground — Every test conversation in the bot playground is traced and tagged as environment "test" so you can separate testing traffic from production.

Embedded Web Widget — Every visitor interaction through your embedded chat widget is traced with anonymous visitor identity and web session linkage.

WhatsApp — Every message and response through WhatsApp Business API is traced with phone number identity and WhatsApp thread linkage.

Telegram — Every Telegram bot interaction is traced with Telegram user ID and chat thread linkage.

Slack and Discord — Every workspace message handled by your bot is traced with workspace user identity and channel/thread linkage.

Sample Site — Every interaction on your hosted sample/demo site is traced with cookie-based visitor identity.

Workspace Architect — Every AI-assisted workspace setup turn is traced as a special surface with architect agent observations, tool calls for resource creation, and generation steps.

Background Jobs — Every scheduled or event-driven background AI operation is traced independently.

All traces flow into the same observability system regardless of origin. You can filter by channel, surface, environment, bot, model, or customer in a single unified view.

Customer Identity Resolution: Know Who You Are Talking To

A major differentiator of Rylvo Traces is its built-in customer identity model. Every trace carries identity fields that let you answer "Who was this conversation with?" across all channels:

Customer ID — The internal Rylvo customer record when known
External ID — The caller's own user identifier (e.g., whatsapp:+919876543210, slack:U123456, web_widget:visitor_cookie_id)
Display Name, Email, Phone, Handle — Human-readable contact information
Anonymous Visitor ID — Stable identifier for unknown visitors via cookies or local storage
Channel — The delivery channel (web_widget, whatsapp, telegram, api, etc.)
Platform Metadata — Provider account ID, thread ID, message ID, platform user ID

The identity resolver merges channel identities when strong signals exist (matching email across web widget and API, matching phone across WhatsApp and SMS) while keeping anonymous identities separate until a verified link is established. This means a customer who first chats anonymously on your widget and later messages via WhatsApp can be recognized as the same person.
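
One common way to implement this kind of merging is a union-find structure keyed on verified signals. The sketch below assumes that approach; the IdentityResolver class and its methods are hypothetical and are not a description of Rylvo's internal resolver.

```python
class IdentityResolver:
    """Union-find over channel identities, merged only on strong signals."""

    def __init__(self):
        self._parent = {}

    def _find(self, key):
        self._parent.setdefault(key, key)
        while self._parent[key] != key:
            # Path halving keeps lookups fast as chains grow.
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def _union(self, a, b):
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def observe(self, external_id, verified_email=None, verified_phone=None):
        """Register an identity; link it to any verified email or phone."""
        self._find(external_id)
        if verified_email:
            self._union("email:" + verified_email.strip().lower(), external_id)
        if verified_phone:
            self._union("phone:" + verified_phone.strip(), external_id)

    def same_customer(self, id_a, id_b):
        return self._find(id_a) == self._find(id_b)


resolver = IdentityResolver()
resolver.observe("web_widget:visitor_abc")  # anonymous, no strong signal yet
resolver.observe("whatsapp:+919876543210", verified_phone="+919876543210")
before = resolver.same_customer("web_widget:visitor_abc", "whatsapp:+919876543210")

# The widget visitor later verifies the same phone number.
resolver.observe("web_widget:visitor_abc", verified_phone="+919876543210")
after = resolver.same_customer("web_widget:visitor_abc", "whatsapp:+919876543210")
print(before, after)
```

Anonymous identities that never present a verified signal stay in their own group, which matches the "separate until a verified link is established" behavior described above.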

The Traces dashboard includes a dedicated Users view that groups traces by customer rather than by time. You can see all sessions for a specific user, their worst status across sessions, their last seen time, their preferred channel, and drill into individual conversations.

The Traces Dashboard: Three Views, Infinite Detail

The /dashboard/traces page provides three complementary views for exploring your trace data.

Traces View: The Chronological Feed

A filterable list of every trace with rich metadata at a glance. Each row shows:

Trace ID with one-click copy
Bot name chip
Verification status (pass/warn/fail) with color-coded pills
Execution status (executed/blocked) with color-coded pills
Predicted and final stage
Guardrail evaluation count
Active resources count (prompts, connectors)
Skills injected count
Cost in USD
Latency in milliseconds
Escalated, blocked, error, and expiring-soon badges

Filters include search by trace ID or content, status filters, bot filter, environment mode, date range, and retention window. You can select individual traces or bulk-select all visible traces for batch operations like export or deletion.
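
Conceptually, the filters combine as a logical AND over each row. A minimal sketch, assuming a simplified row shape (the TraceRow fields and the filter_traces function are illustrative names, not part of the dashboard's API):

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TraceRow:
    trace_id: str
    bot: str
    status: str        # "pass", "warn", or "fail"
    environment: str
    started_at: datetime
    content: str


def filter_traces(rows, search=None, status=None, bot=None,
                  environment=None, start=None, end=None):
    """Return rows matching every filter that is set (None means 'any')."""
    needle = search.lower() if search else None
    matched = []
    for row in rows:
        if needle and needle not in row.trace_id.lower() \
                and needle not in row.content.lower():
            continue
        if status and row.status != status:
            continue
        if bot and row.bot != bot:
            continue
        if environment and row.environment != environment:
            continue
        if start and row.started_at < start:
            continue
        if end and row.started_at > end:
            continue
        matched.append(row)
    return matched


def utc(day, hour):
    return datetime(2025, 1, day, hour, tzinfo=timezone.utc)


rows = [
    TraceRow("tr_001", "support-bot", "pass", "production", utc(1, 9), "Where is my order?"),
    TraceRow("tr_002", "support-bot", "fail", "production", utc(2, 10), "Refund request"),
    TraceRow("tr_003", "sales-bot", "fail", "test", utc(3, 11), "Pricing question"),
]
failed_in_prod = filter_traces(rows, status="fail", environment="production")
refund_hits = filter_traces(rows, search="refund")
print([r.trace_id for r in failed_in_prod], [r.trace_id for r in refund_hits])
```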

Users View: Customer-Centric Exploration

Groups all traces by resolved customer identity. Each user bucket shows:

Display name with initials avatar
Channel indicator icon
Error count badge
Session count and total message count
Last seen relative time

Clicking a user reveals their sessions, ordered by most recent. Each session shows the worst status across its messages, a preview of the last message, the bots involved, duration, and message count. Clicking a session opens a chat-style conversation view with user and assistant message bubbles, timestamps, and status indicators. Clicking any message bubble opens the full trace detail.
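
The grouping behind this view can be approximated as: bucket traces by customer, then by session, and roll up the worst status and last seen time. The sketch below assumes traces arrive as plain dictionaries with the hypothetical keys customer_key, session_id, status, and timestamp (an ISO 8601 string), and a pass/warn/fail severity order.

```python
from collections import defaultdict

SEVERITY = {"pass": 0, "warn": 1, "fail": 2}  # assumed ordering


def worst(statuses):
    return max(statuses, key=SEVERITY.__getitem__)


def group_by_user(traces):
    """Build a customer -> sessions summary, newest session first."""
    users = defaultdict(lambda: defaultdict(list))
    for t in traces:
        users[t["customer_key"]][t["session_id"]].append(t)

    view = {}
    for customer, sessions in users.items():
        summaries = []
        for session_id, items in sessions.items():
            items.sort(key=lambda t: t["timestamp"])
            summaries.append({
                "session_id": session_id,
                "worst_status": worst(t["status"] for t in items),
                "message_count": len(items),
                "last_seen": items[-1]["timestamp"],
            })
        summaries.sort(key=lambda s: s["last_seen"], reverse=True)
        view[customer] = {
            "session_count": len(summaries),
            "message_count": sum(s["message_count"] for s in summaries),
            "worst_status": worst(s["worst_status"] for s in summaries),
            "last_seen": summaries[0]["last_seen"],
            "sessions": summaries,
        }
    return view


wa = "whatsapp:+919876543210"
traces = [
    {"customer_key": wa, "session_id": "s1", "status": "pass", "timestamp": "2025-01-01T10:00:00Z"},
    {"customer_key": wa, "session_id": "s1", "status": "fail", "timestamp": "2025-01-01T10:01:00Z"},
    {"customer_key": wa, "session_id": "s2", "status": "warn", "timestamp": "2025-01-03T08:00:00Z"},
    {"customer_key": "web_widget:visitor_abc", "session_id": "s3", "status": "pass",
     "timestamp": "2025-01-02T12:00:00Z"},
]
view = group_by_user(traces)
print(view[wa]["session_count"], view[wa]["worst_status"], view[wa]["last_seen"])
```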

Trace Detail and Replay

Click any trace to open the detail drawer. It displays:

Summary cards: status, latency, tokens, cost, model, bot, channel
Full input and output
Active prompts, guardrails, connectors, evolution rules, and skills
MCP servers available and MCP calls executed with disposition details
Guardrail results with trigger status and matched conditions
Knowledge Base retrieval details with connection names, chunk counts, latency, and threshold status
Timeline of events with durations
Operator whispers and Mission Control augmentation flags
Trace comments for team discussion
Dataset export (save this trace to a dataset for testing)
Full canonical replay with the nested observation tree

The replay view reconstructs the full execution flow: the generation observation showing the LLM call details, tool observations showing each tool execution, guardrail observations showing each safety check, and retriever observations showing each knowledge lookup. You can see exactly what happened, in what order, with what latency, at what cost.
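
Reconstructing that flow from flat records comes down to grouping observations by parent ID and walking the tree in start-time order. A minimal sketch, assuming each observation is a dictionary with the hypothetical keys id, parent_id, type, name, start_ms, latency_ms, and cost_usd:

```python
from collections import defaultdict


def build_tree(observations):
    """Group observations by parent ID; the key None holds the roots."""
    children = defaultdict(list)
    for obs in observations:
        children[obs.get("parent_id")].append(obs)
    for siblings in children.values():
        siblings.sort(key=lambda o: o["start_ms"])
    return children


def replay(observations):
    """Return indented lines in depth-first, start-time order."""
    children = build_tree(observations)
    lines = []

    def walk(parent_id, depth):
        for obs in children.get(parent_id, []):
            lines.append(
                f"{'  ' * depth}{obs['type']}:{obs['name']} "
                f"({obs['latency_ms']} ms, ${obs['cost_usd']:.4f})"
            )
            walk(obs["id"], depth + 1)

    walk(None, 0)
    return lines


observations = [
    {"id": "o1", "parent_id": None, "type": "chain", "name": "bot-turn",
     "start_ms": 0, "latency_ms": 900, "cost_usd": 0.0},
    {"id": "o4", "parent_id": "o1", "type": "guardrail", "name": "output-check",
     "start_ms": 850, "latency_ms": 20, "cost_usd": 0.0},
    {"id": "o3", "parent_id": "o1", "type": "generation", "name": "answer",
     "start_ms": 130, "latency_ms": 700, "cost_usd": 0.0031},
    {"id": "o2", "parent_id": "o1", "type": "retriever", "name": "kb-lookup",
     "start_ms": 10, "latency_ms": 110, "cost_usd": 0.0},
    {"id": "o5", "parent_id": "o3", "type": "tool", "name": "order-status",
     "start_ms": 400, "latency_ms": 150, "cost_usd": 0.0},
]
lines = replay(observations)
print("\n".join(lines))
```

The input order does not matter here: the tool call nested under the generation is printed beneath it, and siblings appear in the order they started.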

Cost and Token Analytics: Know What AI Costs You

Every trace captures detailed cost attribution:

LLM cost — Input and output token costs computed from the model pricing catalog
Infrastructure cost — Platform overhead for the turn
Total cost — Combined LLM + infrastructure cost in USD
Token breakdown — Prompt tokens, completion tokens, total tokens, with per-model pricing

In the traces list, each row shows the computed cost. In the detail view, summary cards break down total cost. Over time, this data feeds into analytics dashboards showing cost per bot, cost per channel, cost per model, and cost trends.

This granularity means you can identify expensive conversations (a bot that repeatedly calls tools, a model with high token usage), optimize prompt length to reduce input tokens, or switch to a cheaper model for specific channels.
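
The arithmetic behind per-trace cost is straightforward: token counts multiplied by per-token prices, plus infrastructure overhead. The sketch below uses a made-up pricing table; the model names, prices, and the trace_cost function are assumptions for illustration, not values from Rylvo's pricing catalog.

```python
# Hypothetical prices in USD per one million tokens, for illustration only.
PRICING_PER_MILLION_TOKENS = {
    "example-large": {"input": 3.00, "output": 15.00},
    "example-small": {"input": 0.25, "output": 1.25},
}


def trace_cost(model, prompt_tokens, completion_tokens, infrastructure_usd=0.0):
    """Compute LLM, infrastructure, and total cost for one trace."""
    price = PRICING_PER_MILLION_TOKENS[model]
    llm_cost = (
        prompt_tokens * price["input"] + completion_tokens * price["output"]
    ) / 1_000_000
    return {
        "llm_cost_usd": llm_cost,
        "infrastructure_cost_usd": infrastructure_usd,
        "total_cost_usd": llm_cost + infrastructure_usd,
        "total_tokens": prompt_tokens + completion_tokens,
    }


large = trace_cost("example-large", prompt_tokens=2000, completion_tokens=500,
                   infrastructure_usd=0.0005)
small = trace_cost("example-small", prompt_tokens=2000, completion_tokens=500,
                   infrastructure_usd=0.0005)
print(f"large: ${large['total_cost_usd']:.6f}  small: ${small['total_cost_usd']:.6f}")
```

With these made-up prices, the same 2,000-token prompt and 500-token completion costs $0.0135 in LLM spend on the larger model and $0.001125 on the smaller one, which is the kind of gap that makes per-channel model selection worth examining.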

Dual-Write Architecture: Legacy and Canonical

Rylvo Traces operates on a pragmatic dual-write architecture during its migration phases:

Legacy Firestore path — Dashboard playground and architect turns write to organizations/{orgId}/chatTraces as ChatTraceDoc records. This preserves backward compatibility for existing widgets and pages.

Canonical observability path — The same turns also produce canonical observability batches via adapters. These batches contain the trace event, generation observation, tool observations, guardrail observations, and score events in the unified model. They are ingested through the Next.js proxy or directly to the FastAPI backend.

Over time, the canonical SQL-backed observability system becomes the primary storage while Firestore serves as read-only legacy data. This migration strategy ensures zero downtime and zero data loss while the system evolves.
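
The adapter step in a dual-write setup can be pictured as a pure function from the legacy document to a canonical batch, with both writes issued for the same turn. Everything below is a hypothetical sketch: the legacy field names (traceId, userMessage, toolCalls, and so on), the event shapes, and the writer callbacks are assumptions, not the actual ChatTraceDoc schema or Rylvo adapter code.

```python
def legacy_to_canonical_batch(org_id, doc):
    """Convert a legacy chat-trace dict into a canonical event batch."""
    trace_id = doc["traceId"]
    gen_id = f"{trace_id}:gen"
    events = [
        {"type": "trace", "id": trace_id, "org_id": org_id,
         "input": doc.get("userMessage"), "output": doc.get("assistantMessage"),
         "environment": doc.get("mode", "test")},
        {"type": "observation", "observation_type": "generation",
         "id": gen_id, "trace_id": trace_id, "model": doc.get("model")},
    ]
    for i, call in enumerate(doc.get("toolCalls", [])):
        events.append({
            "type": "observation", "observation_type": "tool",
            "id": f"{trace_id}:tool:{i}", "trace_id": trace_id,
            "parent_id": gen_id, "name": call["name"],
            "status": "ok" if call.get("success", True) else "error",
        })
    for i, result in enumerate(doc.get("guardrailResults", [])):
        events.append({
            "type": "observation", "observation_type": "guardrail",
            "id": f"{trace_id}:guardrail:{i}", "trace_id": trace_id,
            "name": result["name"], "triggered": bool(result.get("triggered")),
        })
    # Deterministic IDs and key mean a retried write produces the same batch.
    return {"idempotency_key": f"legacy:{org_id}:{trace_id}", "events": events}


def dual_write(org_id, doc, write_legacy, write_canonical):
    """Write the legacy record first, then the derived canonical batch."""
    write_legacy(org_id, doc)
    write_canonical(legacy_to_canonical_batch(org_id, doc))


legacy_sink, canonical_sink = [], []
doc = {
    "traceId": "tr_001", "userMessage": "Where is my order?",
    "assistantMessage": "It ships tomorrow.", "model": "example-model",
    "toolCalls": [{"name": "order-status", "success": True}],
    "guardrailResults": [{"name": "pii-filter", "triggered": False}],
}
dual_write("org_1", doc,
           write_legacy=lambda org, d: legacy_sink.append((org, d)),
           write_canonical=canonical_sink.append)
print(len(legacy_sink), [e["type"] for e in canonical_sink[0]["events"]])
```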

Ingestion: Bring Your Own Traces

Rylvo Traces is not limited to internally generated data. It supports two ingestion modes for external telemetry:

Rylvo-native JSON batch API — POST batches of trace, observation, and score events to /v1/observability/ingest. Each batch carries an idempotency key, supports out-of-order arrival (scores before traces are held and linked later), and returns per-event validation results. This is ideal for customer backends that want to push their own AI interaction data into Rylvo for unified analysis.

OpenTelemetry compatible ingestion — POST OTLP/HTTP traces to /v1/observability/otel/v1/traces. Rylvo maps OTEL trace IDs, span IDs, GenAI attributes, user/session metadata, and model usage into the canonical model. This makes migration easy for teams already using OpenTelemetry collectors or Langfuse-compatible endpoints.
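
To show how idempotency keys and out-of-order arrival can work together, here is a small in-memory sketch of the receiving side. It is an assumption-laden model of the behavior described above, not Rylvo's ingestion code: the IngestStore class, the event shapes, and the result format are all hypothetical.

```python
from collections import defaultdict


class IngestStore:
    """In-memory model of batch ingestion with held, later-linked scores."""

    def __init__(self):
        self.seen_batches = {}                   # idempotency key -> results
        self.traces = {}                         # trace ID -> trace event
        self.scores = defaultdict(list)          # trace ID -> linked scores
        self.pending_scores = defaultdict(list)  # scores waiting for a trace

    def ingest(self, batch):
        key = batch["idempotency_key"]
        if key in self.seen_batches:
            # A replayed batch returns the original results and changes nothing.
            return self.seen_batches[key]
        results = [self._ingest_event(event) for event in batch["events"]]
        self.seen_batches[key] = results
        return results

    def _ingest_event(self, event):
        kind = event.get("type")
        if kind == "trace":
            if "id" not in event:
                return {"ok": False, "error": "trace requires an id"}
            self.traces[event["id"]] = event
            # Link any scores that arrived before this trace.
            self.scores[event["id"]].extend(self.pending_scores.pop(event["id"], []))
            return {"ok": True, "status": "stored"}
        if kind == "score":
            trace_id = event.get("trace_id")
            if trace_id is None:
                return {"ok": False, "error": "score requires a trace_id"}
            if trace_id in self.traces:
                self.scores[trace_id].append(event)
                return {"ok": True, "status": "linked"}
            self.pending_scores[trace_id].append(event)
            return {"ok": True, "status": "held"}
        return {"ok": False, "error": f"unknown event type: {kind!r}"}


store = IngestStore()
first = store.ingest({"idempotency_key": "batch-1", "events": [
    {"type": "score", "trace_id": "tr_001", "name": "helpfulness", "value": 0.9},
]})
second = store.ingest({"idempotency_key": "batch-2", "events": [
    {"type": "trace", "id": "tr_001", "input": "Hi", "output": "Hello"},
    {"type": "widget"},
]})
replayed = store.ingest({"idempotency_key": "batch-1", "events": [
    {"type": "score", "trace_id": "tr_001", "name": "helpfulness", "value": 0.9},
]})
print(first, second, len(store.scores["tr_001"]))
```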

Dataset Export, Comments, and Collaboration

Traces are not just for debugging — they are a source of truth for your team:

Save to Dataset — Any trace can be exported to a dataset for test suite creation, regression testing, or evaluation benchmarking. The user message becomes the test input, and the assistant message becomes the expected output.

Trace Comments — Team members can leave comments on individual traces to discuss failures, share findings, or document investigations. Comments appear in the trace detail drawer.

Bulk Operations — Select multiple traces to delete, export, or analyze as a group.

Retention Management — Traces respect your organization's retention policy. Expiring traces are flagged in the UI so you know which data will be purged soon.
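
The Save to Dataset mapping described above (user message as test input, assistant message as expected output) can be sketched in a few lines. The dataset item keys and the JSON Lines output format are assumptions for this example; the actual dataset format may differ.

```python
import json


def trace_to_dataset_item(trace):
    """Map one trace to a test case: input, expected output, and provenance."""
    return {
        "input": trace["input"],
        "expected_output": trace["output"],
        "source_trace_id": trace["trace_id"],
    }


def export_to_jsonl(traces):
    """Serialize traces as JSON Lines, skipping turns with no assistant output."""
    items = [trace_to_dataset_item(t) for t in traces if t.get("output")]
    return "\n".join(json.dumps(item, sort_keys=True) for item in items)


selected = [
    {"trace_id": "tr_001", "input": "Where is my order?", "output": "It ships tomorrow."},
    {"trace_id": "tr_002", "input": "Cancel my plan", "output": ""},  # blocked turn
]
jsonl = export_to_jsonl(selected)
print(jsonl)
```

Keeping the source trace ID on each item makes it possible to trace a failing test case back to the conversation it came from.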

Comparison: Rylvo Traces vs. Basic Logging vs. Langfuse

Capability	Basic Logging	Langfuse	Rylvo Traces
Nested observation tree	Flat text only	Full tree	Full tree with 10 observation types
Multi-channel capture	Per-channel silos	Manual SDK integration	Automatic across API, widget, WhatsApp, Telegram, Slack, dashboard
Customer identity	Session IDs only	User ID field	Full identity resolver with cross-channel merging
Cost tracking	None	Token + cost	LLM cost + infrastructure cost with per-trace attribution
Guardrail observations	None	Custom events	First-class guardrail observation type
KB retrieval tracing	None	Custom spans	First-class retriever observation with chunk details
MCP tool tracing	None	Limited	Full tool lifecycle with approval status
Score/feedback system	None	Scores	Scores with 8 source types and 4 data types
OTEL ingestion	None	Yes	Yes with GenAI attribute normalization
External batch ingestion	None	Limited	Full JSON batch API with idempotency
Dataset export	Manual	Limited	One-click save to dataset
Team comments	None	None	Built-in trace comments
Users view	None	None	Customer-centric session grouping
Retention management	Manual	None	Policy-driven with expiry warnings

Getting Started

Step 1: Open the Traces Dashboard

Navigate to /dashboard/traces. The page loads recent traces from both production API traffic and dashboard playground conversations.

Step 2: Explore the Traces List

Browse the chronological feed. Filter by bot, status, or environment. Click any trace to open its detail drawer.

Step 3: Switch to Users View

Click the Users tab to see traces grouped by customer. Search for a specific user, click their name, then explore their sessions and conversation history.

Step 4: Inspect a Failure

Filter by status "fail" or look for red error badges. Click a failed trace, scroll through the observation tree, and identify which step failed — the LLM call, the tool execution, or the guardrail.

Step 5: Track Costs

Sort by the cost column to find the most expensive traces, then open the detail view to check the cost summary cards. Identify expensive patterns and optimize.

Step 6: Export to Dataset

Find a trace that represents a good test case. Click "Save to Dataset" to add it to your bot test suite.

Step 7: Ingest External Traces (Optional)

If you have AI interactions from other systems, use the JSON batch API or OTEL endpoint to push them into Rylvo for unified analysis.
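
As a starting point, a client for the JSON batch API might look like the sketch below. Only the /v1/observability/ingest path comes from this guide; the payload field names, the Bearer authorization scheme, the base URL, and the helper functions are assumptions, so check them against your Rylvo API reference before relying on them. The example builds the request without sending it.

```python
import json
import uuid
import urllib.request

INGEST_PATH = "/v1/observability/ingest"  # path as given in this guide


def build_batch(trace, observations=(), scores=(), idempotency_key=None):
    """Assemble one batch; pass the same key again when retrying a send."""
    events = [{**trace, "type": "trace"}]
    events += [{**o, "type": "observation"} for o in observations]
    events += [{**s, "type": "score"} for s in scores]
    return {
        "idempotency_key": idempotency_key or str(uuid.uuid4()),
        "events": events,
    }


def build_request(base_url, api_key, batch):
    """Create the POST request; the auth header format is an assumption."""
    return urllib.request.Request(
        base_url.rstrip("/") + INGEST_PATH,
        data=json.dumps(batch).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


batch = build_batch(
    trace={"id": "tr_ext_001", "input": "Hi", "output": "Hello", "channel": "api"},
    observations=[{"id": "obs_1", "trace_id": "tr_ext_001",
                   "observation_type": "generation", "model": "example-model"}],
    scores=[{"trace_id": "tr_ext_001", "name": "helpfulness", "value": 0.9}],
)
request = build_request("https://rylvo.example/", "YOUR_API_KEY", batch)
print(request.get_method(), request.full_url)
# To send: urllib.request.urlopen(request), then inspect the per-event results.
```

Generating the idempotency key once per batch and reusing it on retries is what lets the server discard duplicates safely.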

FAQ

What is Rylvo Traces? A Langfuse-class observability system that captures, replays, and analyzes every AI interaction across all channels your bots serve. It uses a canonical model of traces, nested observations, and scores.

What observation types are supported? Generation (LLM calls), Tool (MCP/connector/internal), Agent (multi-agent orchestration), Guardrail (safety checks), Retriever (KB lookups), Span, Event, Chain, Embedding, and Evaluator.

Which channels are traced? Production API, dashboard playground, embedded web widgets, WhatsApp, Telegram, Slack, Discord, SMS, email, voice, sample sites, and Workspace Architect turns.

How does customer identity work? Traces carry internal customer ID, external ID (namespaced by channel), display name, email, phone, and anonymous visitor ID. The identity resolver merges identities across channels when strong signals match.

Can I see costs per conversation? Yes. Every trace shows LLM cost, infrastructure cost, and total cost in USD. Token usage is broken down by input and output.

Can I ingest traces from my own system? Yes. Use the Rylvo-native JSON batch API or the OpenTelemetry-compatible endpoint.

What is the difference between Traces view and Users view? Traces view shows a chronological list of all conversation turns. Users view groups traces by customer identity, showing sessions and conversation history per person.

Can I export traces to datasets? Yes. Any trace can be saved to a dataset with one click for use in test suites or evaluation pipelines.

Are trace comments supported? Yes. Team members can leave comments on individual traces for discussion and investigation.

How is retention handled? Traces respect your organization's retention policy. Expiring traces are flagged in the UI, and you can configure automatic cleanup.

Is the system OpenTelemetry compatible? Yes. Rylvo accepts OTLP/HTTP traces and maps them into the canonical model, including GenAI attributes for LLM calls.

Ready to See Every Conversation in Full Detail?

Rylvo Traces gives you the observability that AI operations demand. Nested observation trees reveal exactly what happened inside every conversation turn. Multi-channel capture ensures nothing slips through the cracks. Customer identity resolution turns anonymous traffic into understandable user journeys. Cost and token tracking keeps your AI spend transparent. Dataset export, trace comments, and OpenTelemetry ingestion make Traces not just a debugging tool but the operational center of your AI platform.

Open Traces and inspect your first conversation today.

See everything. Know everything. Fix anything.