Agent Evolution

Your agents get better every day — automatically.

A closed loop that records every run, surfaces failure clusters, has four specialized reviewers analyze them, generates versioned rules, and deploys those rules into the very next conversation with automatic rollback.

No silent regressions

Every rule carries applied, success, and failure counters. If a rule's failure rate crosses the threshold after five or more applications, it is rolled back automatically.

Humans decide what ships

Auto-promotion is a setting, not a default. Reviewers and operators stay in the loop for regulated bots.

Audit-grade history

Every rule change is versioned with diffs, author, and timestamp. Roll back to any point in seconds.

How it works

Run. Log. Detect. Review. Learn. Improve.

The same loop runs continuously in the background of every bot. Nothing is hidden, nothing is hand-coded, and nothing ships without a trail.

The Evolution Loop

6 stages · continuous

  • 01 Run: Every conversation is a learning signal
  • 02 Log: Full-fidelity trace, not a sampled metric
  • 03 Detect: Failure clusters surface without a human watching
  • 04 Review: Four specialized reviewers, not one opinion
  • 05 Learn: Insights become versioned rules
  • 06 Improve: Measured, rollback-safe deployment

01 · stage

Run

Every conversation is a learning signal

Each bot run executes with the current set of active rules injected into its system prompt, tool filters, and output checks. Nothing is hidden — every decision is observable from the moment the bot replies.

How it works

  • Active rules auto-injected via agentHooks runtime
  • Per-bot rule catalog keeps tenants isolated
  • No manual prompt engineering to ship a fix
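
To make the injection step concrete, here is a minimal Python sketch. The `Rule` shape, the `build_system_prompt` helper, and the prompt layout are assumptions made for illustration; they are not the agentHooks API, and tool filters and output checks are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule record; field names are assumptions.
    rule_id: str
    bot_id: str
    text: str
    active: bool = True


def build_system_prompt(base_prompt: str, catalog: list[Rule], bot_id: str) -> str:
    """Append the active rules for one bot to its base system prompt."""
    # Filtering by bot_id keeps one bot's rules out of another bot's prompt.
    rules = [r for r in catalog if r.bot_id == bot_id and r.active]
    if not rules:
        return base_prompt
    lines = [f"- [{r.rule_id}] {r.text}" for r in rules]
    return base_prompt + "\n\nActive rules:\n" + "\n".join(lines)


catalog = [
    Rule("r1", "support-bot", "Confirm the order ID before issuing a refund."),
    Rule("r2", "sales-bot", "Never quote prices that are not in the catalog."),
    Rule("r3", "support-bot", "Offer a human handoff after two failed attempts.", active=False),
]
prompt = build_system_prompt("You are a support assistant.", catalog, "support-bot")
print(prompt)
```

Only `r1` reaches the prompt here: `r2` belongs to a different bot and `r3` is inactive.
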
02 · stage

Log

Full-fidelity trace, not a sampled metric

Every run is recorded end-to-end: messages, tool calls, guardrail results, latency, cost, and outcome. The same trace powers debugging, compliance, and the reviewer pipeline.

How it works

  • Stored per-bot with bot_id, trace_id, and org scope
  • Messages + tool_calls + guardrail checks preserved
  • Success, failure, and timeout classification
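
A trace record along these lines might look like the following Python sketch. Only `bot_id`, `trace_id`, `tool_calls`, and the three outcome classes come from the description above; `org_id`, `latency_ms`, `cost_usd`, and the `traces_for_bot` helper are assumed names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


@dataclass
class RunTrace:
    bot_id: str
    trace_id: str
    org_id: str  # assumed name for the org scope
    outcome: Outcome
    messages: list[dict] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)
    guardrail_checks: list[dict] = field(default_factory=list)
    latency_ms: float = 0.0  # assumed unit
    cost_usd: float = 0.0    # assumed unit


def traces_for_bot(traces: list[RunTrace], org_id: str, bot_id: str) -> list[RunTrace]:
    """Return only the traces inside one org and bot scope."""
    return [t for t in traces if t.org_id == org_id and t.bot_id == bot_id]


traces = [
    RunTrace("bot-a", "t1", "org-1", Outcome.SUCCESS, latency_ms=820.0),
    RunTrace("bot-a", "t2", "org-1", Outcome.TIMEOUT, latency_ms=30000.0),
    RunTrace("bot-b", "t3", "org-2", Outcome.FAILURE),
]
scoped = traces_for_bot(traces, "org-1", "bot-a")
print([t.trace_id for t in scoped])
```
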
03 · stage

Detect

Failure clusters surface without a human watching

A detection pass continually scans recent runs for failure signals — timeouts, guardrail blocks, repeated escalations, sentiment drops, and behavioral loops. Matches are clustered into failure patterns with affected-run counts.

How it works

  • Timeouts, guardrail triggers, repeat escalations
  • Frustration / urgency sentiment keywords
  • Loop detection and repetition patterns
  • Automatically grouped into failure clusters
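
The detection pass can be pictured with a small Python sketch. Everything here is a simplified assumption: the run dictionary keys, the keyword list, the "two or more escalations" cutoff, and the loop heuristic are placeholders, not the product's detectors.

```python
from collections import defaultdict

# Hypothetical keyword list; a real detector would likely use a richer signal.
FRUSTRATION_KEYWORDS = ("frustrated", "useless", "urgent", "asap")


def detect_signals(run: dict) -> set[str]:
    """Return the failure signals found in one run record."""
    signals = set()
    if run.get("outcome") == "timeout":
        signals.add("timeout")
    if run.get("guardrail_blocked"):
        signals.add("guardrail_block")
    if run.get("escalations", 0) >= 2:
        signals.add("repeat_escalation")
    user_text = " ".join(run.get("user_messages", [])).lower()
    if any(word in user_text for word in FRUSTRATION_KEYWORDS):
        signals.add("negative_sentiment")
    replies = run.get("bot_messages", [])
    # Treat the same reply appearing twice in a row as a behavioral loop.
    if any(a == b for a, b in zip(replies, replies[1:])):
        signals.add("loop")
    return signals


def cluster_failures(runs: list[dict]) -> dict[str, list[str]]:
    """Group run IDs by signal so each cluster carries its affected runs."""
    clusters = defaultdict(list)
    for run in runs:
        for signal in sorted(detect_signals(run)):
            clusters[signal].append(run["run_id"])
    return dict(clusters)


runs = [
    {"run_id": "a", "outcome": "timeout"},
    {"run_id": "b", "outcome": "failure", "user_messages": ["This is useless"]},
    {"run_id": "c", "outcome": "timeout", "bot_messages": ["Please retry.", "Please retry."]},
    {"run_id": "d", "outcome": "success"},
]
clusters = cluster_failures(runs)
print({name: len(ids) for name, ids in clusters.items()})
```

The length of each list is the affected-run count for that cluster. A run can land in more than one cluster, as run `c` does.
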
04 · stage

Review

Four specialized reviewers — not one opinion

When a failure cluster crosses its trigger threshold, four LLM reviewers independently analyze it. Each focuses on a single axis — trust, outcome, safety, efficiency — producing insights with confidence scores.

How it works

  • Trust — did the bot follow policy and stay in scope?
  • Outcome — did the user actually get resolution?
  • Safety — any guardrail near-miss or risk exposure?
  • Efficiency — wasted tool calls or bloated context?
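
The fan-out to four reviewers could be sketched as below. The `Insight` shape, the `review_cluster` function, and the stub reviewers are assumptions; plain functions stand in for the LLM calls, and "crosses its threshold" is read here as "affected runs at or above the threshold".

```python
from dataclasses import dataclass
from typing import Callable

AXES = ("trust", "outcome", "safety", "efficiency")


@dataclass
class Insight:
    axis: str
    text: str
    confidence: float  # expected range 0.0 to 1.0


# A reviewer takes a cluster and returns one insight.
Reviewer = Callable[[dict], Insight]


def review_cluster(cluster: dict, reviewers: dict[str, Reviewer],
                   trigger_threshold: int) -> list[Insight]:
    """Run every reviewer once the cluster has enough affected runs."""
    if cluster["affected_runs"] < trigger_threshold:
        return []
    # Each reviewer sees only the cluster, never another reviewer's output.
    return [reviewers[axis](cluster) for axis in AXES]


def make_stub(axis: str, confidence: float) -> Reviewer:
    """Build a placeholder reviewer that returns a fixed-confidence insight."""
    def reviewer(cluster: dict) -> Insight:
        return Insight(axis, f"{axis} finding for {cluster['pattern']}", confidence)
    return reviewer


reviewers = {axis: make_stub(axis, 0.6 + 0.1 * i) for i, axis in enumerate(AXES)}
insights = review_cluster({"pattern": "timeout", "affected_runs": 12},
                          reviewers, trigger_threshold=10)
print([i.axis for i in insights])
```
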
05 · stage

Learn

Insights become versioned rules

The rule generator deduplicates insights across reviewers using Jaccard similarity, scores confidence with a weighted formula, and proposes candidate rules. High-confidence candidates can auto-promote; everything else waits for a human reviewer.

How it works

  • Jaccard de-duplication across reviewer insights
  • Confidence = weighted combination of insight confidence, reviewer agreement, and pattern frequency
  • Auto-promote only when above org threshold
  • Max-rule limit evicts lowest-effectiveness active rule
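
A rough Python sketch of the deduplication and scoring steps follows. The word-level tokenization, the 0.6 similarity cutoff, and the `(0.5, 0.3, 0.2)` weights are illustrative assumptions; the actual weights and thresholds are not stated here.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def deduplicate(insights: list[str], threshold: float = 0.6) -> list[str]:
    """Keep an insight only if it is not too similar to one already kept."""
    kept: list[str] = []
    for text in insights:
        if all(jaccard(text, k) < threshold for k in kept):
            kept.append(text)
    return kept


def confidence(insight: float, agreement: float, frequency: float,
               weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of three inputs, each expected in the range 0..1."""
    w_i, w_a, w_f = weights
    return w_i * insight + w_a * agreement + w_f * frequency


def should_auto_promote(score: float, org_threshold: float, auto_promote: bool) -> bool:
    # Auto-promotion is opt-in, so the flag gates the threshold check.
    return auto_promote and score > org_threshold


insights = [
    "confirm order id before refund",
    "confirm the order id before refund",  # near-duplicate of the first
    "avoid repeated tool calls",
]
unique = deduplicate(insights)
score = confidence(insight=0.9, agreement=0.75, frequency=0.5)
print(unique, round(score, 3), should_auto_promote(score, 0.7, auto_promote=True))
```

The first two insights share five of six distinct words, so their similarity is about 0.83 and the second one is dropped.
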
06 · stage

Improve

Measured, rollback-safe deployment

Active rules ship into the next run automatically. Every application is tracked for success and failure, effectiveness is recomputed continually, and underperforming rules are rolled back automatically with full version history preserved.

How it works

  • Applied / success / failure counters per rule
  • Auto-rollback when failureRate > threshold (after ≥5 applications)
  • Version history keeps every rule edit + diff
  • Roll back to any prior version in one click
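
The rollback check can be sketched in a few lines of Python. The `RuleStats` class and `should_roll_back` function are hypothetical names; the five-application minimum and the "failure rate above threshold" condition come from the description above, while the 0.5 threshold is an example value.

```python
from dataclasses import dataclass

MIN_APPLICATIONS = 5  # from the text: rollback is considered only after 5+ runs


@dataclass
class RuleStats:
    applied: int = 0
    success: int = 0
    failure: int = 0

    def record(self, succeeded: bool) -> None:
        """Count one application of the rule and its result."""
        self.applied += 1
        if succeeded:
            self.success += 1
        else:
            self.failure += 1

    @property
    def failure_rate(self) -> float:
        return self.failure / self.applied if self.applied else 0.0


def should_roll_back(stats: RuleStats, threshold: float) -> bool:
    """True once the rule has enough applications and fails too often."""
    return stats.applied >= MIN_APPLICATIONS and stats.failure_rate > threshold


stats = RuleStats()
for succeeded in (True, False, False, False):
    stats.record(succeeded)
early = should_roll_back(stats, threshold=0.5)  # only 4 applications so far
stats.record(False)
late = should_roll_back(stats, threshold=0.5)   # 5 applications, 4 failures
print(early, late)
```

The minimum-application guard keeps a rule from being disabled on the strength of one or two unlucky runs.
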

The reviewer panel

Four specialists, one clustered failure.

Trust Reviewer

Asks: Did the bot respect policy, tone, and scope constraints for this org and workflow?

Flags: Policy violations, tone drift, out-of-scope commitments.

Outcome Reviewer

Asks: Did the user actually get resolution, or did the run stall, escalate, or confuse?

Flags: Unresolved intents, missed handoffs, incomplete actions.

Safety Reviewer

Asks: Were there any guardrail near-misses, or were sensitive topics mishandled?

Flags: Guardrail near-misses, PII leakage, crisis-language gaps.

Efficiency Reviewer

Asks: Were tool calls, context, and latency used economically for this outcome?

Flags: Redundant tool calls, context bloat, unnecessary retries.

Rule lifecycle

Every rule has state, history, and a rollback path.

01

Proposed

Produced by reviewers, not yet scored.

02

Candidate

Deduplicated, scored, awaiting promotion.

03

Active

Injected into runs, effectiveness tracked.

04

Underperforming

Effectiveness below threshold — rollback queued.

05

Disabled

Rolled back automatically or manually. Full history retained.
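
One way to picture the lifecycle is as a small state machine. In the Python sketch below, the five states come from the list above, but the `ALLOWED` transition table is an assumption inferred from it (including manual rollback from Active and re-enabling a Disabled rule), as is the `transition` helper.

```python
from enum import Enum


class RuleState(Enum):
    PROPOSED = "proposed"
    CANDIDATE = "candidate"
    ACTIVE = "active"
    UNDERPERFORMING = "underperforming"
    DISABLED = "disabled"


# Assumed transitions; the source lists the states but not the exact edges.
ALLOWED = {
    RuleState.PROPOSED: {RuleState.CANDIDATE},
    RuleState.CANDIDATE: {RuleState.ACTIVE},
    RuleState.ACTIVE: {RuleState.UNDERPERFORMING, RuleState.DISABLED},
    RuleState.UNDERPERFORMING: {RuleState.DISABLED},
    RuleState.DISABLED: {RuleState.ACTIVE},
}


def transition(history: list[RuleState], target: RuleState) -> list[RuleState]:
    """Return a new history with target appended, or raise if not allowed."""
    current = history[-1]
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    # The old history is never mutated, so every prior state stays inspectable.
    return history + [target]


history = [RuleState.PROPOSED]
for step in (RuleState.CANDIDATE, RuleState.ACTIVE,
             RuleState.UNDERPERFORMING, RuleState.DISABLED):
    history = transition(history, step)
print([s.value for s in history])
```
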

Answers

What teams usually ask

Can I turn off auto-promotion?

Yes — every bot has its own Self-Evolving config. You can require manual approval before a candidate rule goes live, and even disable rule generation entirely for regulated workloads.

What happens if a rule makes the bot worse?

We track applied, success, and failure counters on every rule. Once a rule has been applied five or more times and its failure rate crosses the org threshold, it's automatically disabled. The full version history stays intact so you can inspect, edit, or revert.

Is the learning tenant-isolated?

Yes. Rules are scoped to the bot and the organization. One tenant's failure patterns never leak into another tenant's prompt.

How does this work with Mission Control?

Mission Control lets operators supervise live conversations in real time. Agent Evolution then turns those operators' decisions (approvals, interventions, overrides) into durable rules after the fact, so your bots apply them on the next run.

Ready when you are

Ship an agent that keeps getting better.

Turn on Agent Evolution for any bot during setup, or flip it on for an existing bot from the Evolution dashboard. Reviewers, versioning, and rollback are on the house.