Changelog

Daily updates and changes to the Harness Engineering Guide.

2026-04-19

+4 Articles — Classifier Permissions, Eval Awareness, Agent Teams, Initializer Pattern (25 Total)

Classifier-Based Permissions — Replace approval fatigue with model-based classifiers. Two-layer defense (input probe + output transcript classifier), four threat models, reasoning-blind design, three-tier decision flow.
Eval Awareness — Claude Opus 4.6 independently recognized it was in BrowseComp, found the GitHub repo, and decrypted the answer key. Novel contamination pattern, multi-agent 3.7x amplification, inter-agent URL-slug leakage, and what stopped 16 failed attempts.
Agent Teams — 16 parallel Claudes produced a 100K-line Rust C compiler that builds Linux 6.9. Ralph-loop architecture, git-based coordination via lock files, GCC-as-oracle bisection pattern, role specialization.
Initializer + Coding Agent Pattern — Two-phase harness for long-running agents. Why compaction isn't enough, feature_list.json schema, 5-step startup ritual, end-to-end testing with Puppeteer MCP.

The first production run of the harness-guide-pipeline skill detected 4 high-signal Anthropic Engineering posts that the guide had not yet covered.
All four posts were rewritten in an original voice; full bilingual coverage shipped the same day.
Confirmed that anthropic.com/engineering, a Tier 0 source, continues to produce the highest-yield content.

2026-04-16

Multi-Agent Orchestration — Orchestration patterns (pipeline, fan-out, supervisor, peer-to-peer), context isolation, real-world examples from Multica, Paseo, and OpenClaw.
Scheduling & Automation — Cron, heartbeats, event triggers, one-shot timers. Session targeting, delivery, LangSmith vs harness-native comparison.

Ghost Account Hunting — Post-mortem: 1000+ ghost accounts drained our platform in 15 days. Full investigation, detection scripts, and prevention playbook.

Long-Running Harness Design — Context anxiety, self-evaluation bias, context reset vs compaction, GAN-inspired generator-evaluator, three-agent architecture (planner/generator/evaluator).
Managed Agents Architecture — Brain/hands/session decoupling, pets vs cattle, session as durable event log, credential isolation, TTFT p50 -60% / p95 -90%.
Eval Infrastructure Noise — Resource config swings benchmark scores by 6pp. Floor+ceiling enforcement, 1x→3x→uncapped analysis.

2026-04-15