Changelog

Daily updates and changes to the Harness Engineering Guide.

+4 Articles: Classifier Permissions, Eval Awareness, Agent Teams, Initializer Pattern (25 Total)

2026-04-19: +4 New · 25 Total

Practice (+4 from Anthropic Engineering, 14 total)

  • Classifier-Based Permissions: Replace approval fatigue with model-based classifiers. Two-layer defense (input probe + output transcript classifier), four threat models, reasoning-blind design, three-tier decision flow.
  • Eval Awareness: Claude Opus 4.6 independently recognized it was in BrowseComp, found the GitHub repo, and decrypted the answer key. Novel contamination pattern, multi-agent 3.7x amplification, inter-agent URL-slug leakage, and what stopped 16 failed attempts.
  • Agent Teams: 16 parallel Claudes produced a 100K-line Rust C compiler that builds Linux 6.9. Ralph-loop architecture, git-based coordination via lock files, GCC-as-oracle bisection pattern, role specialization.
  • Initializer + Coding Agent Pattern: Two-phase harness for long-running agents. Why compaction isn't enough, feature_list.json schema, 5-step startup ritual, end-to-end testing with Puppeteer MCP.

Pipeline

  • First production run of the harness-guide-pipeline skill detected 4 uncovered high-signal Anthropic Engineering posts.
  • All four posts rewritten in original voice; full bilingual coverage shipped the same day.
  • Confirmed that anthropic.com/engineering, the Tier 0 source, continues to produce the highest-yield content.

+6 Articles, Abuse-Hunter Skill, New Banner (21 Total)

2026-04-16: +6 New · 21 Total

Practice (+2 new, 7 total)

  • Multi-Agent Orchestration: Orchestration patterns (pipeline, fan-out, supervisor, peer-to-peer), context isolation, real-world examples from Multica, Paseo, and OpenClaw.
  • Scheduling & Automation: Cron, heartbeats, event triggers, one-shot timers. Session targeting, delivery, LangSmith vs harness-native comparison.

Showcase (+1 new, 2 total)

  • Ghost Account Hunting: Post-mortem: 1000+ ghost accounts drained our platform in 15 days. Full investigation, detection scripts, and prevention playbook.

Skills

  • Added abuse-hunter Skill: SaaS batch-registration abuse detection toolkit.

Practice (+3 from Anthropic Engineering, 10 total)

  • Long-Running Harness Design: Context anxiety, self-evaluation bias, context reset vs compaction, GAN-inspired generator-evaluator, three-agent architecture (planner/generator/evaluator).
  • Managed Agents Architecture: Brain/hands/session decoupling, pets vs cattle, session as durable event log, credential isolation, TTFT p50 -60% / p95 -90%.
  • Eval Infrastructure Noise: Resource config swings benchmark scores by 6 pp. Floor+ceiling enforcement, 1x → 3x → uncapped analysis.

Site

  • Replaced AI-generated banner with pixel-perfect SVG-rendered version.
  • Added anthropic.com/engineering as Tier 0 content source to pipeline.
  • Synced README (EN + ZH) with all 21 articles across 5 sections.