Harness Trace Learning - Phase 2: Runtime Integration
Status: In Development
Created: 2026-04-06
Related: #294, #342, #343, #345
Overview
Phase 2 implements runtime playbook loading and preflight guidance, allowing harness evolution to use learned strategies during execution.
Goals
- Load relevant playbooks at harness evolve startup
- Display preflight guidance to users
- Reorder patches based on learned strategies
- Show provenance for transparency
Architecture
1. Playbook Loading
Function: load_playbooks_for_task(repo_root, task_type)
pub fn load_playbooks_for_task(
repo_root: &Path,
task_type: &str,
) -> Result<Vec<PlaybookCandidate>, String> {
let playbook_dir = repo_root.join("docs/fitness/playbooks");
if !playbook_dir.exists() {
return Ok(Vec::new());
}
// Read all .json files
// Filter by task_type
// Sort by confidence (descending)
// Return top N (e.g., 3)
}
2. Gap Pattern Matching
Function: find_matching_playbook(playbooks, gaps)
pub fn find_matching_playbook<'a>(
playbooks: &'a [PlaybookCandidate],
gaps: &[HarnessEngineeringGap],
) -> Option<&'a PlaybookCandidate> {
// Extract gap categories from current run
let current_categories: Vec<String> = gaps
.iter()
.map(|g| g.category.clone())
.collect();
// Find exact match
playbooks.iter().find(|pb| {
pb.strategy.gap_patterns == current_categories
})
// Or find best partial match
}
3. Preflight Guidance Display
fn display_preflight_guidance(
playbook: &PlaybookCandidate,
options: &HarnessEngineeringOptions,
) {
if options.json_output {
return; // Skip in JSON mode
}
println!("🧠 Loaded learned playbook (confidence: {:.0}%)", playbook.confidence * 100.0);
println!(" ID: {}", playbook.id);
println!(" Evidence: {} successful runs", playbook.provenance.evidence_count);
println!();
println!("💡 Recommended patch order:");
for (idx, patch_id) in playbook.strategy.preferred_patch_order.iter().enumerate() {
println!(" {}. {}", idx + 1, patch_id);
}
if !playbook.strategy.anti_patterns.is_empty() {
println!();
println!("⚠️ Known issues:");
for anti in &playbook.strategy.anti_patterns {
println!(" - {}: {}", anti.do_not, anti.reason);
}
}
println!();
}
4. Patch Reordering
Function: reorder_patches_by_playbook(patches, playbook)
pub fn reorder_patches_by_playbook(
patches: &mut Vec<HarnessEngineeringPatchCandidate>,
playbook: &PlaybookCandidate,
) {
// Create priority map from playbook order
let priority_map: HashMap<String, usize> = playbook
.strategy
.preferred_patch_order
.iter()
.enumerate()
.map(|(idx, id)| (id.clone(), idx))
.collect();
// Sort patches by:
// 1. Patches in playbook order (by priority)
// 2. Patches not in playbook (original order)
patches.sort_by_key(|patch| {
priority_map.get(&patch.id).copied().unwrap_or(usize::MAX)
});
}
Integration Points
In evaluate_harness_engineering()
pub async fn evaluate_harness_engineering(
repo_root: &Path,
options: &HarnessEngineeringOptions,
state: Option<&AppState>,
) -> Result<HarnessEngineeringReport, String> {
// ... existing gap detection ...
// NEW: Load playbooks
let playbooks = learning::load_playbooks_for_task(repo_root, "harness_evolution")?;
// NEW: Find matching playbook
let matching_playbook = playbooks
.iter()
.find(|pb| {
let mut sorted_pattern = pb.strategy.gap_patterns.clone();
sorted_pattern.sort();
let mut current_categories: Vec<String> = gaps
.iter()
.map(|g| g.category.clone())
.collect();
current_categories.sort();
sorted_pattern == current_categories
});
// ... existing patch generation ...
// NEW: Reorder patches if playbook found
if let Some(playbook) = matching_playbook {
learning::display_preflight_guidance(playbook, options);
learning::reorder_patches_by_playbook(&mut patch_candidates, playbook);
}
// ... rest of execution ...
}
User Experience
Before (Phase 1)
$ routa harness evolve --apply
📊 Harness Evolution - Evaluation
Found 2 gaps...
Generated 2 patches...
✅ Applied 2 patches
After (Phase 2)
$ routa harness evolve --apply
🧠 Loaded learned playbook (confidence: 95%)
ID: harness-evolution-missing-governance
Evidence: 3 successful runs
💡 Recommended patch order:
1. patch.create_codeowners
2. patch.create_dependabot
📊 Harness Evolution - Evaluation
Found 2 gaps...
Generated 2 patches (reordered by playbook)...
✅ Applied 2 patches
Opt-Out Mechanism
Users can disable playbook loading:
# Disable playbook loading
routa harness evolve --apply --no-playbooks
# Or via environment variable
NO_PLAYBOOKS=1 routa harness evolve --apply
Testing Strategy
Unit Tests
test_load_playbooks_for_task- Load from directorytest_find_matching_playbook- Gap pattern matchingtest_reorder_patches_by_playbook- Patch sorting
Integration Tests
- Create playbook + run evolve → verify patch order
- Multiple playbooks → verify best match selected
- No matching playbook → verify no reordering
Implementation Plan
Step 1: Playbook Loading (1 hour)
- Implement
load_playbooks_for_task() - Add unit tests
- Handle missing playbook directory gracefully
Step 2: Pattern Matching (1 hour)
- Implement
find_matching_playbook() - Add fuzzy matching (partial overlap)
- Add unit tests
Step 3: Preflight Guidance (1 hour)
- Implement
display_preflight_guidance() - Format output nicely
- Skip in JSON mode
Step 4: Patch Reordering (1 hour)
- Implement
reorder_patches_by_playbook() - Preserve patches not in playbook
- Add unit tests
Step 5: Integration (2 hours)
- Wire up in
evaluate_harness_engineering() - Add integration tests
- Test with real playbooks
Step 6: Polish (1 hour)
- Add
--no-playbooksflag - Update help text
- Update documentation
Total Estimate: 7 hours (1 day)
Success Criteria
- ✅ Playbooks loaded automatically when present
- ✅ Matching playbook selected based on gap patterns
- ✅ Preflight guidance displayed to user
- ✅ Patches reordered according to learned strategy
- ✅ Users can opt-out if desired
- ✅ All tests passing
- ✅ Documentation updated
Next Steps (Phase 3)
- Playbook staleness detection (expire after 30 days)
- Cross-repo playbook sharing
- Playbook approval workflow
- Guardrail promotion (playbook → fitness rule)