Harness Evolution Trace Learning

Self-Improving Harness Evolution: Learn from past runs, generate evidence-backed playbooks, and accelerate future evolutions.

Overview

Harness Evolution's Trace Learning feature enables the system to learn from its own execution history, automatically detecting patterns in successful runs and distilling them into reusable playbooks. This creates a self-improvement loop where each harness evolution run makes the next one smarter.

Key Concepts

Evolution History

Every routa harness evolve --apply run records rich execution context to docs/fitness/evolution/history.jsonl:

{
  "timestamp": "2026-04-06T01:29:43Z",
  "sessionId": "abc-123",                    // Links to agent traces
  "taskType": "harness_evolution",
  "workflow": "bootstrap",                   // Auto-inferred
  "trigger": "manual",
  "gapsDetected": 2,
  "gapCategories": ["missing_governance_gate", "missing_execution_surface"],
  "changedPaths": [".github/CODEOWNERS", "docs/harness/build.yml"],
  "patchesApplied": ["patch.create_codeowners", "bootstrap.synthesize_build_yml"],
  "patchesFailed": [],
  "successRate": 1.0
}
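A history file in this shape can be read with a few lines of code. The sketch below is a minimal illustration, not the routa implementation: `parse_history` is a hypothetical helper name, and it assumes each record sits on a single line as plain JSON, since JSON itself does not allow comments (the `//` comments in the example above are annotations for the reader).

```python
import json
from typing import Any, Iterable


def parse_history(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse JSONL text: one JSON object per non-empty line."""
    runs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        runs.append(json.loads(line))
    return runs


# In-memory sample standing in for docs/fitness/evolution/history.jsonl.
sample = [
    '{"timestamp": "2026-04-06T01:29:43Z", '
    '"gapCategories": ["missing_governance_gate"], '
    '"patchesApplied": ["patch.create_codeowners"], '
    '"patchesFailed": [], "successRate": 1.0}',
    "",
]

runs = parse_history(sample)
print(len(runs), runs[0]["gapCategories"])
```

To read the real file, pass an open file handle instead of the sample list, for example `parse_history(open("docs/fitness/evolution/history.jsonl"))`.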

Pattern Detection

The learning algorithm analyzes historical runs to find recurring patterns:

Group by gap patterns - Which gap categories appear together?
Filter successful runs - successRate ≥ 0.8 (80%)
Find consensus - Patterns appearing 3+ times
Extract strategies - Preferred patch order, common file changes
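The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than the actual learning code: `detect_patterns` and `make_run` are hypothetical names, the success threshold is applied per run, the preferred patch order is taken to be the most frequent `patchesApplied` sequence, and "common file changes" is read as paths changed in every run of a group.

```python
from collections import Counter, defaultdict
from typing import Any

MIN_SUCCESS_RATE = 0.8  # "successful run" threshold from the list above
MIN_OCCURRENCES = 3     # consensus threshold from the list above


def detect_patterns(runs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Steps 1 and 2: keep successful runs and group them by gap-category set.
    groups = defaultdict(list)
    for run in runs:
        if run["successRate"] >= MIN_SUCCESS_RATE:
            groups[tuple(sorted(run["gapCategories"]))].append(run)

    patterns = []
    for gap_pattern, members in groups.items():
        # Step 3: require consensus across at least MIN_OCCURRENCES runs.
        if len(members) < MIN_OCCURRENCES:
            continue
        # Step 4: most frequent patch sequence (assumed meaning of "preferred").
        order_counts = Counter(tuple(m["patchesApplied"]) for m in members)
        preferred_order = list(order_counts.most_common(1)[0][0])
        # Paths changed in every run of the group (assumed meaning of "common").
        common_paths = set(members[0]["changedPaths"])
        for m in members[1:]:
            common_paths &= set(m["changedPaths"])
        patterns.append({
            "gapPattern": list(gap_pattern),
            "occurrences": len(members),
            "avgSuccessRate": sum(m["successRate"] for m in members) / len(members),
            "preferredPatchOrder": preferred_order,
            "commonPaths": sorted(common_paths),
        })
    return patterns


def make_run(gaps, patches, paths, rate):
    """Build a minimal history record for the example."""
    return {"gapCategories": gaps, "patchesApplied": patches,
            "changedPaths": paths, "successRate": rate}


runs = [
    make_run(["missing_governance_gate"], ["patch.create_codeowners"], [".github/CODEOWNERS"], 1.0),
    make_run(["missing_governance_gate"], ["patch.create_codeowners"], [".github/CODEOWNERS"], 1.0),
    make_run(["missing_governance_gate"], ["patch.create_codeowners"], [".github/CODEOWNERS"], 1.0),
    make_run(["missing_governance_gate"], [], [], 0.5),  # dropped: below threshold
    make_run(["missing_execution_surface"], ["bootstrap.synthesize_build_yml"],
             ["docs/harness/build.yml"], 1.0),           # dropped: seen only once
]

patterns = detect_patterns(runs)
for p in patterns:
    print(p["gapPattern"], p["occurrences"], p["preferredPatchOrder"])
```

Sorting the gap categories before grouping makes the key order-independent, so runs that report the same gaps in a different order still land in one group.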

Playbooks

Generated playbooks capture proven strategies with full provenance:

{
  "id": "harness-evolution-missing-governance",
  "taskType": "harness_evolution",
  "confidence": 0.95,
  "strategy": {
    "preferredPatchOrder": [
      "patch.create_codeowners",
      "patch.create_dependabot"
    ],
    "gapPatterns": ["missing_governance_gate"],
    "antiPatterns": [
      {
        "doNot": "skip ratchet enforcement",
        "reason": "Caused fitness regression in 2/5 runs"
      }
    ]
  },
  "provenance": {
    "sourceRuns": [
      "2026-04-06T01:29:43Z",
      "2026-04-06T02:15:22Z",
      "2026-04-07T10:30:15Z"
    ],
    "successRate": 0.95,
    "evidenceCount": 3
  }
}
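To make the provenance fields concrete, here is a hedged sketch of how a playbook in this shape could be assembled from a group of source runs. `build_playbook` is a hypothetical helper. Setting `confidence` equal to the mean success rate is an assumption that merely matches the example above; the real scoring may differ. The `id` is passed in because the naming rule is not specified here.

```python
from statistics import mean
from typing import Any


def build_playbook(playbook_id: str,
                   runs: list[dict[str, Any]],
                   preferred_order: list[str]) -> dict[str, Any]:
    """Assemble a playbook dict whose provenance points at its source runs."""
    success = round(mean(r["successRate"] for r in runs), 2)
    gap_patterns = sorted({c for r in runs for c in r["gapCategories"]})
    return {
        "id": playbook_id,
        "taskType": "harness_evolution",
        "confidence": success,  # assumption: confidence = mean success rate
        "strategy": {
            "preferredPatchOrder": list(preferred_order),
            "gapPatterns": gap_patterns,
            "antiPatterns": [],  # anti-pattern extraction is not sketched here
        },
        "provenance": {
            "sourceRuns": [r["timestamp"] for r in runs],
            "successRate": success,
            "evidenceCount": len(runs),
        },
    }


source_runs = [
    {"timestamp": "2026-04-06T01:29:43Z",
     "gapCategories": ["missing_governance_gate"], "successRate": 1.0},
    {"timestamp": "2026-04-06T02:15:22Z",
     "gapCategories": ["missing_governance_gate"], "successRate": 0.9},
    {"timestamp": "2026-04-07T10:30:15Z",
     "gapCategories": ["missing_governance_gate"], "successRate": 0.95},
]

playbook = build_playbook(
    "harness-evolution-missing-governance",
    source_runs,
    ["patch.create_codeowners", "patch.create_dependabot"],
)
print(playbook["confidence"], playbook["provenance"]["evidenceCount"])
```

Keeping the run timestamps in `sourceRuns` is what lets a reviewer trace a recommendation back to the history entries that produced it.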

Usage

Phase 1: Generate Evolution History

Run harness evolution multiple times to build a learning dataset:

# Bootstrap multiple repos
for repo in repo1 repo2 repo3; do
  # Run in a subshell so each cd starts from the original directory
  (cd "$repo" && routa harness evolve --bootstrap --apply)
done

# Or run on the same repo after making changes
routa harness evolve --apply

Each run appends to docs/fitness/evolution/history.jsonl.

Phase 2: Learn from History

After 3+ successful runs with similar gap patterns:

routa harness evolve --learn

Output:

📊 Harness Evolution - Learning Mode
  Loading evolution history...
  Found 5 evolution runs
  Detected 2 common patterns:
    - Gap pattern: ["missing_governance_gate"] (seen 3 times, avg success: 100.0%)
    - Gap pattern: ["missing_execution_surface"] (seen 4 times, avg success: 95.0%)
  Generated 2 playbook candidates:
    ✓ harness-evolution-missing-governance.json (confidence: 100.0%, evidence: 3 runs)
    ✓ harness-evolution-missing-execution-surface.json (confidence: 95.0%, evidence: 4 runs)

✅ Playbooks saved to docs/fitness/playbooks

Phase 3: Review Playbooks

Inspect generated playbooks:

# List all playbooks
ls docs/fitness/playbooks/

# View a playbook
jq . docs/fitness/playbooks/harness-evolution-missing-governance.json

# Check patch order
jq '.strategy.preferredPatchOrder' docs/fitness/playbooks/*.json

Phase 4: Runtime Integration (Coming in Roadmap Phase 2)

Future versions will automatically load playbooks at runtime:

routa harness evolve --apply

# 🧠 Loaded 1 learned playbook (confidence: 95%)
#   Recommended patch order: ["patch.A", "patch.B"]
#   Evidence: 3 successful runs over 2 weeks
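Runtime loading is not available yet, so the following is only a speculative sketch of how a matching playbook might be chosen for a run. `select_playbook` and the `min_confidence` parameter are hypothetical. The matching rule is an assumption: a playbook applies when all of its `gapPatterns` are among the gaps detected in the current run.

```python
from typing import Any, Optional


def select_playbook(playbooks: list[dict[str, Any]],
                    detected_gaps: list[str],
                    min_confidence: float = 0.8) -> Optional[dict[str, Any]]:
    """Return the highest-confidence playbook that matches the detected gaps."""
    detected = set(detected_gaps)
    candidates = []
    for playbook in playbooks:
        gaps = set(playbook["strategy"]["gapPatterns"])
        # A playbook matches when it names at least one gap and every
        # gap it names was detected in the current run.
        if gaps and gaps <= detected and playbook["confidence"] >= min_confidence:
            candidates.append(playbook)
    return max(candidates, key=lambda p: p["confidence"], default=None)


playbooks = [
    {"id": "harness-evolution-missing-governance", "confidence": 0.95,
     "strategy": {"gapPatterns": ["missing_governance_gate"],
                  "preferredPatchOrder": ["patch.create_codeowners",
                                          "patch.create_dependabot"]}},
    {"id": "harness-evolution-missing-execution-surface", "confidence": 0.6,
     "strategy": {"gapPatterns": ["missing_execution_surface"],
                  "preferredPatchOrder": ["bootstrap.synthesize_build_yml"]}},
]

chosen = select_playbook(
    playbooks, ["missing_governance_gate", "missing_execution_surface"])
if chosen is not None:
    print(chosen["id"], chosen["strategy"]["preferredPatchOrder"])
```

Returning `None` when nothing matches keeps the fallback explicit: the run would proceed without learned guidance rather than apply a low-confidence playbook.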

Benefits

1. Self-Improvement Loop

Run → Evidence → Playbook → Runtime → Guardrail

Each evolution run makes the system smarter for the next one.

2. Evidence-Backed Strategies

Every playbook links back to concrete runs with timestamps, ensuring strategies are validated by real execution, not hunches.

3. Cross-Project Knowledge Transfer

Playbooks generated from one repo can inform evolutions on similar repos, accelerating bootstrapping.

4. Continuous Refinement

As more runs accumulate, confidence scores increase and anti-patterns emerge, making playbooks more reliable over time.

Storage

Evolution History

Path: docs/fitness/evolution/history.jsonl
Format: JSONL (append-only)
Committed: Yes (part of repo history)

Playbooks

Path: docs/fitness/playbooks/*.json
Format: JSON
Committed: Recommended (shareable knowledge)

Integration with Agent Traces

Evolution history entries include sessionId to link with full agent execution traces in .routa/traces/, enabling deep analysis:

Which files were read during gap detection?
What was the exact tool call sequence?
What was the Git state before/after?

See Harness Trace Learning - Phase 2 Design for the follow-up design and operational direction.

Roadmap

Phase 0 (✅ Completed): Schema extension for trace learning
Phase 1 (✅ Completed): Pattern detection + playbook generation
Phase 2 (⏭️ Next): Runtime playbook loading + preflight guidance
Phase 3 (Future): Guardrail promotion + cross-repo sharing

Related

Fitness Function Rulebook
Harness Fitness Blog
Architecture
Issue #294 - Trace Learning
PR #342 - Design RFC
PR #343 - Phase 0
PR #345 - Phase 1
