Traces + Analytics
See everything your bots do.
Traces are on by default - every run is captured the moment you send it. Analyze patterns, spot regressions, and prove compliance without building dashboards.
Nothing sampled at capture
Every turn stores the full replay payload - prompts, tool traces, guardrail evaluations, timeline events. The analytics screen samples for speed; the archive keeps the truth.
One pane for health + debug
Traces answer “what happened in that one run?”; Analytics answers “what’s happening across all of them?”. Same filters, same environment split, same bot scope - zero context switching.
Replay + export built in
Open any trace to a full timeline. Copy it to a plain-text report your on-call engineer or a language model can triage in seconds.
One pipeline, two surfaces
Run. Capture. Store. Inspect. Aggregate. Replay.
The same data model powers the Trace Inspector and the Analytics dashboard. You never have to wonder whether a chart and a trace disagree - they don’t.
Trace waterfall · one turn, every span
run · capture · inspect · replay
Capture
every field, every call
Store
trace + replay payload
Inspect
search · filter · split
Replay
export to any LLM
Traces
Full-fidelity record for every turn.
Routing, verification, tools, evidence, and timing - captured with the same schema for dashboard test runs and production API traffic.
What every trace captures
full-fidelity · no sampling
Routing decisions
- predicted_stage
- final_stage
- selected_action_id
- selected_action_class
- router_mode
- router_version
Verification
- verification_outcome (pass / warn / fail)
- verification_reason_codes
- verification_checks_run
- verification_failures
- verification_warnings
Escalation
- should_escalate
- escalation_target
- escalation_reason_codes
- safety_override_applied
Tool calls
- tool_call_count
- tool_name per call
- success / error
- latency_ms per tool
- full output payload
Evidence & retrieval
- selected_evidence_ids
- evidence_count
- retrieval hits
- KB connection used
Performance
- latency_ms
- started_at / completed_at
- execution_status
- has_errors
- errors[]
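To make the field list above concrete, here is a minimal sketch of how a trace record could be typed in client code. The field names come from the list; the Python types, the nested `ToolCall` shape, and the key names `tool_calls`, `retrieval_hits`, and `kb_connection` are assumptions for illustration, not a documented schema.

```python
from typing import Any, Dict, List, TypedDict


class ToolCall(TypedDict, total=False):
    tool_name: str
    success: bool
    latency_ms: float
    output: Dict[str, Any]  # full output payload


class TraceRecord(TypedDict, total=False):
    # Routing decisions
    predicted_stage: str
    final_stage: str
    selected_action_id: str
    selected_action_class: str
    router_mode: str
    router_version: str
    # Verification
    verification_outcome: str  # "pass" | "warn" | "fail"
    verification_reason_codes: List[str]
    verification_checks_run: List[str]
    verification_failures: List[str]
    verification_warnings: List[str]
    # Escalation
    should_escalate: bool
    escalation_target: str
    escalation_reason_codes: List[str]
    safety_override_applied: bool
    # Tool calls ("tool_calls" is a hypothetical key name)
    tool_call_count: int
    tool_calls: List[ToolCall]
    # Evidence & retrieval (last two key names are hypothetical)
    selected_evidence_ids: List[str]
    evidence_count: int
    retrieval_hits: List[Dict[str, Any]]
    kb_connection: str
    # Performance (timestamps assumed to be ISO 8601 strings)
    latency_ms: float
    started_at: str
    completed_at: str
    execution_status: str
    has_errors: bool
    errors: List[str]


# Values taken from the inspector example on this page.
example: TraceRecord = {
    "final_stage": "investigation",
    "selected_action_id": "refund.issue",
    "verification_outcome": "pass",
    "verification_reason_codes": [],
    "latency_ms": 1248,
    "has_errors": False,
}
```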
Trace Inspector - timeline replay
pass
Trace ID
trc_9f82…b14
Bot
support-bot
Stage → Action
investigation → refund.issue
Verification
PASS · no reason codes
Latency
1,248 ms
Tokens
prompt 412 · completion 196 · total 608
Timeline (8 events)
stage.classified
Stage predicted with component + version - inspect reason codes.
action.selected
Selected action + class; confidence and verifier inputs saved.
tool.invoked
Tool name, inputs, success, latency, and full JSON output.
guardrail.evaluated
Guardrail name, type, action, and whether the condition matched.
guardrail.triggered
Which guardrail blocked or rewrote - full context preserved.
mcp.called
MCP server, tool, disposition (approved / denied / auto), cost, latency.
missionControl.augmented
Operator whispers, pauses, takeovers - recorded in the turn.
response.composed
Final response text + tokens used (prompt, completion, total).
The Traces workspace
Search across everything
Instant substring search over trace_id, session_id, final_stage, action ID, and bot name - no query language to learn.
Verification + bot + environment filters
Narrow to pass / warn / fail, filter to a single bot, and flip between Test and Prod traffic on one page.
Date-range scoping
From/To pickers scope the list to any window - debug a specific incident or review yesterday’s runs.
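If you export traces and want the same narrowing in your own scripts, the search, filter, and date-range behavior described above can be approximated as follows. This is a sketch under assumptions: traces are plain dictionaries, `bot_name` and `environment` are the key names, matching is case-insensitive, and `started_at` is an ISO 8601 string without a timezone suffix.

```python
from datetime import datetime
from typing import Optional

# Fields covered by substring search, per the description above.
SEARCH_FIELDS = ("trace_id", "session_id", "final_stage",
                 "selected_action_id", "bot_name")


def matches(trace: dict,
            query: str = "",
            outcome: Optional[str] = None,
            bot: Optional[str] = None,
            environment: Optional[str] = None,
            start: Optional[datetime] = None,
            end: Optional[datetime] = None) -> bool:
    """Return True if the trace passes every filter that was supplied."""
    if query:
        q = query.lower()
        if not any(q in str(trace.get(f, "")).lower() for f in SEARCH_FIELDS):
            return False
    if outcome and trace.get("verification_outcome") != outcome:
        return False
    if bot and trace.get("bot_name") != bot:
        return False
    if environment and trace.get("environment") != environment:
        return False
    if start or end:
        started = datetime.fromisoformat(trace["started_at"])
        if start and started < start:
            return False
        if end and started > end:
            return False
    return True


traces = [
    {"trace_id": "trc_001", "session_id": "s1", "final_stage": "investigation",
     "selected_action_id": "refund.issue", "bot_name": "support-bot",
     "verification_outcome": "pass", "environment": "prod",
     "started_at": "2024-05-01T10:00:00"},
    {"trace_id": "trc_002", "session_id": "s2", "final_stage": "triage",
     "selected_action_id": "ticket.create", "bot_name": "support-bot",
     "verification_outcome": "fail", "environment": "test",
     "started_at": "2024-05-02T09:30:00"},
]

hits = [t["trace_id"] for t in traces
        if matches(t, query="refund", environment="prod")]
```

The sample trace IDs, stages, and actions other than `refund.issue` are made up for the example.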
Split-pane inspector
Pick a row on the left, see the full replay on the right - summary, verification, escalation, tools, timeline.
Full decision replay
Every trace opens to the exact stage classifier outputs, action selection, verification checks, tool results, and final response.
Copy trace for LLM review
One-click export builds a BOT TRACE REPORT (performance, guardrails, resources, Mission Control augmentations) - paste it into any model for root-cause analysis.
Analytics
Workflow health at a glance.
Verification trends, latency distribution, top stages, action mix, and reason-code buckets - with a bot filter and environment split on the same page.
Ten KPIs on one screen
workflow health
Total Runs
with sampled window size
Pass
verification passed
Warn
verification warnings
Fail
verification failed
Blocked
policy or guardrail block
Avg Latency
mean over runs
Escalated
handed to human
Executed Actions
action was taken
Safety Overrides
policy override fired
Avg Events / Run
pipeline density
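For readers who want to reproduce some of these tiles from exported runs, here is a rough sketch. It covers only the KPIs that map onto trace fields named earlier on this page; Blocked, Executed Actions, and Avg Events / Run are left out because the page does not name the fields behind them. Treating each run as a plain dictionary is an assumption.

```python
from collections import Counter
from typing import Dict, List


def summarize(runs: List[dict]) -> Dict[str, float]:
    """Aggregate a window of runs into a subset of the KPI tiles."""
    outcomes = Counter(r.get("verification_outcome") for r in runs)
    total = len(runs)
    avg_latency = sum(r["latency_ms"] for r in runs) / total if total else 0.0
    return {
        "total_runs": total,
        "pass": outcomes["pass"],
        "warn": outcomes["warn"],
        "fail": outcomes["fail"],
        "avg_latency_ms": avg_latency,
        "escalated": sum(1 for r in runs if r.get("should_escalate")),
        "safety_overrides": sum(1 for r in runs
                                if r.get("safety_override_applied")),
    }


sample_runs = [
    {"verification_outcome": "pass", "latency_ms": 100},
    {"verification_outcome": "warn", "latency_ms": 200,
     "should_escalate": True},
    {"verification_outcome": "fail", "latency_ms": 300,
     "should_escalate": True, "safety_override_applied": True},
]
kpis = summarize(sample_runs)
```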
Six built-in charts
trends · distributions · reason codes
Verification Trend
Last 40 runs as vertical bars, colored by outcome. Spot regressions the moment they appear.
Strip chart · pass / warn / fail
Verification Distribution
Donut chart of pass vs warn vs fail across the sampled window.
Donut · three segments
Latency Distribution
Bars for <100ms, 100–250ms, 250–500ms, 500ms–1s, and >1s so you can see the shape, not just the mean.
Histogram · 5 buckets
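The five latency buckets can be reproduced on exported data with a few lines. The bucket labels come from the chart description; which bucket a boundary value falls into is not stated, so this sketch assumes each boundary belongs to the higher bucket (exactly 1000 ms is counted under ">1s").

```python
from collections import Counter
from typing import Iterable

# (label, exclusive upper bound in milliseconds)
BUCKETS = [
    ("<100ms", 100),
    ("100–250ms", 250),
    ("250–500ms", 500),
    ("500ms–1s", 1000),
]


def latency_bucket(latency_ms: float) -> str:
    """Map one latency value to its histogram bucket label."""
    for label, upper in BUCKETS:
        if latency_ms < upper:
            return label
    return ">1s"


def latency_histogram(latencies: Iterable[float]) -> Counter:
    """Count how many runs fall into each bucket."""
    return Counter(latency_bucket(ms) for ms in latencies)


histogram = latency_histogram([42, 180, 240, 610, 1248])
```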
Top Stages
Which workflow stages dominate traffic - useful for capacity planning and prompt tuning.
Horizontal bars · top N
Top Actions
The actions your bot actually takes most often, ranked.
Horizontal bars · top N
Reason Codes
Verifier and escalation reason-code buckets - go from “what failed” to “why it failed” in one click.
Bucket list · two rails
Dashboard widgets - glance-level signals
Quick Stats
Total traces, plan usage %, avg latency, pass rate, guardrails triggered, inputs blocked.
Request Volume (7d)
Daily bar chart of traffic across the last seven days, by weekday.
Latency gauge
Average, P95, and P99 latency - so a spike can’t hide behind the mean.
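A small worked example shows why the gauge reports tail percentiles next to the average. The page does not say how the product computes them; this sketch uses the nearest-rank method, which is an assumption, and the latency values are invented.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 18 fast runs, one slow run, one very slow run.
latencies = [120] * 18 + [400, 5000]

mean_ms = sum(latencies) / len(latencies)
p95_ms = percentile(latencies, 95)
p99_ms = percentile(latencies, 99)
```

Here the mean stays at 378 ms while P99 is 5000 ms, so the single very slow run is visible only in the tail figure.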
Model Usage
Top 6 models by call count, with total tokens per model.
Verification Health
Stacked bar + tile grid for pass / warn / fail at a glance.
Recent Traces
Eight most recent traces, clickable straight into the inspector.
Test traffic never pollutes production metrics.
Every dashboard bot-chat run is tagged with environment: "test" and source: "dashboard". Real API calls are tagged environment: "prod". Flip the environment pill on the Traces page or the Analytics page to split them, combine them, or isolate either one - stats recompute locally so there’s no waiting.
What you gain
- Ship test iterations without distorting production dashboards
- Compare test vs prod pass rates before you go live
- Keep a clean audit trail per environment, per bot
- Deep-link to any trace with ?traceId= - shareable with engineers or LLMs
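As an illustration of the deep-link bullet, a shareable link can be assembled with the standard library. Only the `traceId` query parameter comes from this page; the dashboard base URL and the trace ID below are placeholders.

```python
from urllib.parse import urlencode


def trace_link(base_url: str, trace_id: str) -> str:
    """Build a shareable deep link that opens one trace."""
    return f"{base_url}?{urlencode({'traceId': trace_id})}"


# Hypothetical dashboard URL; substitute your own.
link = trace_link("https://dashboard.example.com/traces", "trc_123")
```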
Answers
What teams usually ask
Is every run recorded, or just a sample?
Every run is captured end-to-end - messages, tool calls, guardrail evaluations, verification outcomes, tokens, and latency. The dashboard’s analytics view samples the most recent runs for speed, but the underlying trace store keeps the full record for replay and audit.
Can I separate test traffic from production?
Yes. Dashboard bot-chat runs are tagged with environment: "test" and source: "dashboard"; real API calls are environment: "prod". Both the Traces page and the Analytics page let you flip between All, Test, and Prod so test noise never pollutes your production metrics.
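The All / Test / Prod split amounts to a filter on the environment tag. Here is a minimal sketch for comparing pass rates on exported runs, assuming each run is a dictionary carrying the `environment` and `verification_outcome` values described on this page.

```python
from typing import List


def select_environment(runs: List[dict], environment: str = "all") -> List[dict]:
    """Return all runs, or only those tagged with the given environment."""
    if environment == "all":
        return list(runs)
    return [r for r in runs if r.get("environment") == environment]


def pass_rate(runs: List[dict]) -> float:
    """Share of runs whose verification outcome is "pass" (0.0 if empty)."""
    if not runs:
        return 0.0
    passed = sum(1 for r in runs if r.get("verification_outcome") == "pass")
    return passed / len(runs)


runs = [
    {"environment": "prod", "verification_outcome": "pass"},
    {"environment": "prod", "verification_outcome": "fail"},
    {"environment": "test", "source": "dashboard",
     "verification_outcome": "pass"},
]

prod_rate = pass_rate(select_environment(runs, "prod"))
test_rate = pass_rate(select_environment(runs, "test"))
```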
What’s the difference between a trace and an analytics row?
A trace is one complete decision - every input, every tool call, every guardrail, and the final response. Analytics aggregates many traces: distributions, reason codes, stage counts, latency buckets, model usage. You move between them with one click; every analytics bar drills into the traces behind it.
Can I hand a trace to an LLM for root-cause analysis?
Yes. The Copy Trace button produces a plain-text BOT TRACE REPORT with performance, flags, guardrails, resources, evolution rules, and any Mission Control activity. Paste it into any model and ask “why did this fail?” - no schema knowledge required.
How does this tie into Mission Control and Agent Evolution?
Every trace carries the Mission Control whispers and takeovers that shaped it, plus any Agent Evolution rules that were applied. When a reviewer promotes a rule or an operator intervenes live, those signals become searchable fields on the next trace and inputs to the next analytics window.
Can I pipe traces into my own tooling?
Yes. Runs are available from the company runs API (GET /v1/analytics/companies/{id}/runs), and each trace has a detail endpoint (GET /v1/traces/{id}/replay). Wire those into your SIEM, warehouse, or on-call tools directly.
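A sketch of preparing requests for those two endpoints with the standard library is shown below. The paths come from the answer above; the base URL, the bearer-token header, and the sample IDs are assumptions, since the page does not document the host, authentication, or response format.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.example.com"  # hypothetical host


def runs_request(company_id: str, token: str) -> Request:
    """Prepare GET /v1/analytics/companies/{id}/runs."""
    url = f"{BASE_URL}/v1/analytics/companies/{quote(company_id, safe='')}/runs"
    # Bearer authentication is an assumption, not documented on this page.
    return Request(url, headers={"Authorization": f"Bearer {token}"},
                   method="GET")


def replay_request(trace_id: str, token: str) -> Request:
    """Prepare GET /v1/traces/{id}/replay."""
    url = f"{BASE_URL}/v1/traces/{quote(trace_id, safe='')}/replay"
    return Request(url, headers={"Authorization": f"Bearer {token}"},
                   method="GET")


runs_req = runs_request("acme", "YOUR_TOKEN")
replay_req = replay_request("trc_123", "YOUR_TOKEN")
```

Passing either request to `urllib.request.urlopen` would send it; how to parse the response depends on the payload format, which is not described here.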
Deep dive
What “real” LLM observability requires
Full-fidelity capture, not sampled traces
The trace records every field on every turn: prompt, retrieved chunks, tool arguments and returns, verifier output, token counts, latency per span. No 10% sampling, no “we couldn’t reproduce that incident.” If it happened in production, it’s in the trace.
Search by what actually happened
Filter on bot, environment, verdict, latency band, token cost, tool called, KB chunk cited, or any structured field on the trace. Find the 17 sessions where the verifier flagged a hallucination last Tuesday - without writing a single query against your data warehouse.
Replay against any model, with the same context
Every trace is a runnable replay payload. Pin a model, swap a prompt block, change a tool - and re-run the exact same scenario to see the new behavior side-by-side. This is how teams ship model upgrades with evidence instead of crossed fingers.
Inspector and analytics share one pipeline
The same trace that powers the inspector also powers cohorts, funnels, SLA dashboards, and cost trends. You never have to wonder whether your aggregate chart and your debugging view agree - they read the same source of truth, so the numbers never lie to you in a meeting.
Ready to inspect?
Ship an agent you can actually see into.
Traces are on by default - every run is captured the moment you send it. Analytics starts aggregating from the first call, and the Inspector opens to a full replay with one click.
