A closed loop that records every run, surfaces failure clusters, lets four specialized reviewers analyze them, generates versioned rules, and deploys them rollback-safe into the very next conversation.
No silent regressions
Every rule carries applied / success / failure counters. If a rule hurts, it rolls back — automatically.
Humans decide what ships
Auto-promotion is a setting, not a default. Reviewers and operators stay in the loop for regulated bots.
Audit-grade history
Every rule change is versioned with diffs, author, and timestamp. Rollback to any point in seconds.
How it works
The same loop runs continuously in the background of every bot. Nothing is hidden, nothing is hand-coded, and nothing ships without a trail.
The Evolution Loop
6 stages · continuous
Evolution Core
Run
Every conversation is a learning signal
Log
Full-fidelity trace, not a sampled metric
Detect
Failure clusters surface without a human watching
Review
Four specialized reviewers — not one opinion
Learn
Insights become versioned rules
Improve
Measured, rollback-safe deployment
Every conversation is a learning signal
Each bot run executes with the current set of active rules injected into its system prompt, tool filters, and output checks. Nothing is hidden — every decision is observable from the moment the bot replies.
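Injection can be as simple as appending the active rule set to the base prompt. A minimal sketch, with hypothetical rule IDs and prompt text (not the product's actual format):

```python
# Sketch (illustrative names): injecting active rules into a system prompt
# so every constraint the bot follows is traceable to a versioned rule.
BASE_PROMPT = "You are a support bot for Acme."

active_rules = [
    {"id": "r-12", "text": "Never promise refunds; hand off to billing."},
    {"id": "r-31", "text": "Keep answers under 120 words."},
]

def build_system_prompt(base, rules):
    """Append each active rule as a numbered, ID-tagged constraint."""
    lines = [base, "", "Active rules:"]
    for i, rule in enumerate(rules, 1):
        lines.append(f"{i}. [{rule['id']}] {rule['text']}")
    return "\n".join(lines)

prompt = build_system_prompt(BASE_PROMPT, active_rules)
```

Because each rule carries its ID into the prompt, a later trace can attribute behavior back to the exact rule version that shaped it.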
Full-fidelity trace, not a sampled metric
Every run is recorded end-to-end: messages, tool calls, guardrail results, latency, cost, and outcome. The same trace powers debugging, compliance, and the reviewer pipeline.
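The trace described above can be pictured as one record per run; field names here are illustrative, not the actual schema:

```python
# Sketch of a full-fidelity run trace (field names are assumptions).
from dataclasses import dataclass

@dataclass
class RunTrace:
    run_id: str
    messages: list            # every user/bot message, in order
    tool_calls: list          # name, args, result, duration per call
    guardrail_results: list   # which guardrails fired or passed
    latency_ms: int
    cost_usd: float
    outcome: str              # e.g. "resolved", "escalated", "timeout"

trace = RunTrace(
    run_id="run-8841",
    messages=[{"role": "user", "content": "Where is my order?"}],
    tool_calls=[{"name": "lookup_order", "duration_ms": 210}],
    guardrail_results=[{"guardrail": "pii", "result": "pass"}],
    latency_ms=1340,
    cost_usd=0.004,
    outcome="resolved",
)
```

One record serving debugging, compliance, and review is what keeps the three views consistent with each other.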
Failure clusters surface without a human watching
A detection pass continually scans recent runs for failure signals — timeouts, guardrail blocks, repeated escalations, sentiment drops, and behavioral loops. Matches are clustered into failure patterns with affected-run counts.
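A minimal version of that clustering step, assuming a fixed signal vocabulary (the real detector's signal list and grouping logic may differ):

```python
# Sketch: grouping recent runs into failure clusters by signal type,
# producing the pattern -> affected-run counts shown in the dashboard.
from collections import defaultdict

FAILURE_SIGNALS = ("timeout", "guardrail_block", "escalation_loop", "sentiment_drop")

recent_runs = [
    {"run_id": "r1", "signals": ["timeout"]},
    {"run_id": "r2", "signals": ["timeout", "sentiment_drop"]},
    {"run_id": "r3", "signals": ["guardrail_block"]},
    {"run_id": "r4", "signals": []},  # healthy run, joins no cluster
]

def cluster_failures(runs):
    clusters = defaultdict(list)
    for run in runs:
        for signal in run["signals"]:
            if signal in FAILURE_SIGNALS:
                clusters[signal].append(run["run_id"])
    return {pattern: len(ids) for pattern, ids in clusters.items()}

counts = cluster_failures(recent_runs)
```

A cluster's affected-run count is what gets compared against the trigger threshold in the next stage.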
Four specialized reviewers — not one opinion
When a failure cluster crosses its trigger threshold, four LLM reviewers independently analyze it. Each focuses on a single axis — trust, outcome, safety, efficiency — producing insights with confidence scores.
Insights become versioned rules
The rule generator deduplicates insights across reviewers (Jaccard similarity), scores confidence on a weighted formula, and proposes candidate rules. High-confidence candidates can auto-promote; everything else waits for a human reviewer.
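Jaccard deduplication on token sets can be sketched in a few lines; the 0.6 threshold and the confidence-averaging merge are illustrative assumptions, not the product's weighted formula:

```python
# Sketch: deduplicating reviewer insights via Jaccard similarity on
# word sets. Threshold and merge rule are assumptions for illustration.
def jaccard(a, b):
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

insights = [
    {"text": "bot promises refunds outside policy", "confidence": 0.9},
    {"text": "bot promises refunds outside the policy", "confidence": 0.7},
    {"text": "tool retries waste latency budget", "confidence": 0.8},
]

def dedupe(items, threshold=0.6):
    kept = []
    for item in items:
        match = next((k for k in kept if jaccard(k["text"], item["text"]) >= threshold), None)
        if match:
            # Merge near-duplicates, averaging confidence as a simple
            # stand-in for the real weighted scoring formula.
            match["confidence"] = (match["confidence"] + item["confidence"]) / 2
        else:
            kept.append(dict(item))
    return kept

candidates = dedupe(insights)
```

The first two insights share five of six tokens (Jaccard ≈ 0.83), so they merge into one candidate; the efficiency insight survives on its own.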
Measured, rollback-safe deployment
Active rules ship into the next run automatically. Every application is tracked for success and failure, effectiveness is recomputed continually, and underperforming rules are rolled back automatically with full version history preserved.
The reviewer panel
Four specialists, one clustered failure.
Trust Reviewer
Asks: Did the bot respect policy, tone, and scope constraints for this org and workflow?
Flags: Policy violations, tone drift, out-of-scope commitments.
Outcome Reviewer
Asks: Did the user actually get resolution, or did the run stall, escalate, or confuse?
Flags: Unresolved intents, missed handoffs, incomplete actions.
Safety Reviewer
Asks: Were any guardrails near-missed or sensitive topics mishandled?
Flags: Guardrail near-misses, PII leakage, crisis-language gaps.
Efficiency Reviewer
Asks: Were tool calls, context, and latency used economically for this outcome?
Flags: Redundant tool calls, context bloat, unnecessary retries.
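The panel above can be modeled as one job per axis, each handed the same cluster; the question strings here paraphrase the cards and are not the actual reviewer prompts:

```python
# Sketch: the four-reviewer panel as data. Each reviewer sees the same
# failure cluster but judges a single axis (questions are paraphrased).
REVIEWERS = {
    "trust":      "Did the bot respect policy, tone, and scope constraints?",
    "outcome":    "Did the user actually get resolution?",
    "safety":     "Were guardrails near-missed or sensitive topics mishandled?",
    "efficiency": "Were tool calls, context, and latency used economically?",
}

def build_review_jobs(cluster_id, reviewers=REVIEWERS):
    """One independent job per axis: four opinions, not one."""
    return [
        {"cluster": cluster_id, "axis": axis, "question": question}
        for axis, question in reviewers.items()
    ]

jobs = build_review_jobs("cluster-17")
```

Keeping the jobs independent is the point: no reviewer sees another's verdict before producing its own insight.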
Rule lifecycle
Every rule has state, history, and a rollback path.
Proposed
Produced by reviewers, not yet scored.
Candidate
Deduplicated, scored, awaiting promotion.
Active
Injected into runs, effectiveness tracked.
Underperforming
Effectiveness below threshold — rollback queued.
Disabled
Rolled back automatically or manually. Full history retained.
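The lifecycle above reads naturally as an explicit state machine. The allowed transitions below are an assumption extrapolated from the five states, not a documented transition table:

```python
# Sketch: the rule lifecycle as a checkable state machine, so every
# transition (including rollback) is explicit and auditable.
# The transition edges are illustrative assumptions.
TRANSITIONS = {
    "proposed":        {"candidate"},
    "candidate":       {"active", "disabled"},         # promote or reject
    "active":          {"underperforming", "disabled"},
    "underperforming": {"disabled", "active"},         # rollback or recover
    "disabled":        {"candidate"},                  # re-propose after edit
}

def advance(state, target):
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

state = "proposed"
for step in ("candidate", "active", "underperforming", "disabled"):
    state = advance(state, step)
```

Rejecting illegal jumps (e.g. straight from proposed to active) is what makes "nothing ships without a trail" enforceable rather than aspirational.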
Answers
Can I turn off auto-promotion?
Yes — every bot has its own Self-Evolving config. You can require manual approval before a candidate rule goes live, and even disable rule generation entirely for regulated workloads.
What happens if a rule makes the bot worse?
We track applied, success, and failure counters on every rule. Once a rule has been applied five or more times and its failure rate crosses the org threshold, it's automatically disabled. The full version history stays intact so you can inspect, edit, or revert.
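The rollback check described above fits in a few lines; the 0.3 failure-rate threshold below is an illustrative value standing in for the per-org setting:

```python
# Sketch of the rollback check: disable a rule once it has been applied
# at least 5 times and its failure rate crosses the org threshold
# (0.3 here is illustrative, not a product default).
MIN_APPLICATIONS = 5

def should_rollback(rule, org_threshold=0.3):
    if rule["applied"] < MIN_APPLICATIONS:
        return False  # not enough evidence yet
    return rule["failure"] / rule["applied"] > org_threshold

healthy  = {"applied": 20, "success": 18, "failure": 2}
harmful  = {"applied": 10, "success": 4,  "failure": 6}
new_rule = {"applied": 3,  "success": 0,  "failure": 3}  # too few applications
```

The minimum-application floor matters: without it, a single unlucky run would disable a brand-new rule before it had a fair sample.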
Is the learning tenant-isolated?
Yes. Rules are scoped to the bot and the organization. One tenant's failure patterns never leak into another tenant's prompt.
How does this work with Mission Control?
Mission Control supervises live conversations in real time. Agent Evolution turns operators' after-the-fact decisions — approvals, interventions, overrides — into durable rules your bots apply on the next run.
Ready when you are
Turn on Agent Evolution for any bot during setup, or flip it on for an existing bot from the Evolution dashboard. Reviewers, versioning, and rollback are on the house.