Skip to main content

Harness Evolution Trace Learning

Self-Improving Harness Evolution: Learn from past runs, generate evidence-backed playbooks, and accelerate future evolutions.

Overview

Harness Evolution's Trace Learning feature enables the system to learn from its own execution history, automatically detecting patterns in successful runs and distilling them into reusable playbooks. This creates a self-improvement loop where each harness evolution run makes the next one smarter.

Key Concepts

Evolution History

Every routa harness evolve --apply run records rich execution context to docs/fitness/evolution/history.jsonl:

{
"timestamp": "2026-04-06T01:29:43Z",
"sessionId": "abc-123", // Links to agent traces
"taskType": "harness_evolution",
"workflow": "bootstrap", // Auto-inferred
"trigger": "manual",
"gapsDetected": 2,
"gapCategories": ["missing_governance_gate", "missing_execution_surface"],
"changedPaths": [".github/CODEOWNERS", "docs/harness/build.yml"],
"patchesApplied": ["patch.create_codeowners", "bootstrap.synthesize_build_yml"],
"patchesFailed": [],
"successRate": 1.0
}

Pattern Detection

The learning algorithm analyzes historical runs to find recurring patterns:

  1. Group by gap patterns - Which gap categories appear together?
  2. Filter successful runs - success_rate ≥ 80%
  3. Find consensus - Patterns appearing 3+ times
  4. Extract strategies - Preferred patch order, common file changes

Playbooks

Generated playbooks capture proven strategies with full provenance:

{
"id": "harness-evolution-missing-governance",
"taskType": "harness_evolution",
"confidence": 0.95,
"strategy": {
"preferredPatchOrder": [
"patch.create_codeowners",
"patch.create_dependabot"
],
"gapPatterns": ["missing_governance_gate"],
"antiPatterns": [
{
"doNot": "skip ratchet enforcement",
"reason": "Caused fitness regression in 2/5 runs"
}
]
},
"provenance": {
"sourceRuns": [
"2026-04-06T01:29:43Z",
"2026-04-06T02:15:22Z",
"2026-04-07T10:30:15Z"
],
"successRate": 0.95,
"evidenceCount": 3
}
}

Usage

Phase 1: Generate Evolution History

Run harness evolution multiple times to build a learning dataset:

# Bootstrap multiple repos
for repo in repo1 repo2 repo3; do
cd $repo
routa harness evolve --bootstrap --apply
done

# Or run on the same repo after making changes
routa harness evolve --apply

Each run appends to docs/fitness/evolution/history.jsonl.

Phase 2: Learn from History

After 3+ successful runs with similar gap patterns:

routa harness evolve --learn

Output:

📊 Harness Evolution - Learning Mode
Loading evolution history...
Found 5 evolution runs
Detected 2 common patterns:
- Gap pattern: ["missing_governance_gate"] (seen 3 times, avg success: 100.0%)
- Gap pattern: ["missing_execution_surface"] (seen 4 times, avg success: 95.0%)
Generated 2 playbook candidates:
✓ harness-evolution-missing-governance.json (confidence: 100.0%, evidence: 3 runs)
✓ harness-evolution-missing-execution-surface.json (confidence: 95.0%, evidence: 4 runs)

✅ Playbooks saved to docs/fitness/playbooks

Phase 3: Review Playbooks

Inspect generated playbooks:

# List all playbooks
ls docs/fitness/playbooks/

# View a playbook
cat docs/fitness/playbooks/harness-evolution-missing-governance.json | jq

# Check patch order
jq '.strategy.preferredPatchOrder' docs/fitness/playbooks/*.json

Phase 4: Runtime Integration (Coming in Phase 2)

Future versions will automatically load playbooks at runtime:

routa harness evolve --apply

# 🧠 Loaded 1 learned playbook (confidence: 95%)
# Recommended patch order: ["patch.A", "patch.B"]
# Evidence: 3 successful runs over 2 weeks

Benefits

1. Self-Improvement Loop

Run → Evidence → Playbook → Runtime → Guardrail

Each evolution run makes the system smarter for the next one.

2. Evidence-Backed Strategies

Every playbook links back to concrete runs with timestamps, ensuring strategies are validated by real execution, not hunches.

3. Cross-Project Knowledge Transfer

Playbooks generated from one repo can inform evolutions on similar repos, accelerating bootstrapping.

4. Continuous Refinement

As more runs accumulate, confidence scores increase and anti-patterns emerge, making playbooks more reliable over time.

Storage

Evolution History

  • Path: docs/fitness/evolution/history.jsonl
  • Format: JSONL (append-only)
  • Committed: Yes (part of repo history)

Playbooks

  • Path: docs/fitness/playbooks/*.json
  • Format: JSON
  • Committed: Recommended (shareable knowledge)

Integration with Agent Traces

Evolution history entries include sessionId to link with full agent execution traces in .routa/traces/, enabling deep analysis:

  • Which files were read during gap detection?
  • What was the exact tool call sequence?
  • What was the Git state before/after?

See Trace Learning Analysis for details.

Roadmap

  • Phase 0 (✅ Completed): Schema extension for trace learning
  • Phase 1 (✅ Completed): Pattern detection + playbook generation
  • Phase 2 (⏭️ Next): Runtime playbook loading + preflight guidance
  • Phase 3 (Future): Guardrail promotion + cross-repo sharing