Module M4: Autonomous Development Loop

One-liner: Replace the founder-as-courier between strategist, developer, and QA with an automated agent loop.

Problem Statement

Current development workflow requires the founder to manually shuttle between three roles:

  1. Strategist (Claude chat / co-founder role) — formulates tasks, writes prompts, reviews results
  2. Developer (Claude Code on VPS) — implements code, runs tests, commits
  3. QA / Testers (Perplexity, Cowork, other LLM agents) — test the live product in browser, find bugs, produce reports

The founder acts as a manual courier between these three:

  • Receives prompt from strategist → copies to Claude Code
  • Receives build result from Claude Code → sends to QA agents
  • Receives bug report from QA → brings back to strategist
  • Receives fix prompt from strategist → copies to Claude Code
  • Loop repeats until quality target met

This is the same “courier work” problem that M3 Deliberation Chamber solved for strategic questions — but applied to the entire development cycle.

Pain Points

  • Founder time wasted on mechanical copy-paste between agents (~60-70% of session time)
  • Context loss between handoffs (each agent sees only its slice)
  • Inconsistent testing (founder may skip QA steps when tired)
  • No persistent quality monitoring between sessions
  • Founder becomes bottleneck — development speed limited by founder’s availability

Vision

An autonomous loop where:

  Strategist Agent → formulates task with full context
    ↓ automatic
  Developer Agent (Claude Code) → implements + unit tests
    ↓ automatic
  QA Agents (multi-LLM, browser-based) → test live product, find bugs, produce structured report
    ↓ automatic
  Developer Agent → fixes bugs based on QA report
    ↓ automatic
  QA Agents → re-test
    ↓ automatic (if all checks pass)
  Strategist Agent → reviews final result, updates roadmap
    ↓ ONLY if critical decision needed
  Founder → intervenes with personal judgment

Founder’s Role After M4

  • NOT a courier — agents communicate directly
  • Decision-maker on critical questions — per CriticalityPolicy Level 3
  • Vision-setter — defines what to build, not how to shuttle between builders
  • Quality arbiter of last resort — when agents disagree on what “done” means

Agent Types

Health Agents

Continuously monitor quality of each module (M1, M2, M3):

  • Run periodic automated tests against live product
  • Track quality metrics over time (score trends, regression detection)
  • Alert founder when quality degrades below threshold
  • Example: “M1 Health Agent detects that BADs Russia score dropped from 8/10 to 6/10 after latest deploy”
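A minimal sketch of how a Health Agent could combine threshold alerting with regression detection. All names here are assumptions for illustration: `run_quality_suite` stands in for whatever automated test harness scores a module, and `notify_founder` for whatever alerting channel M4 adopts.

```python
from dataclasses import dataclass, field

@dataclass
class HealthAgent:
    """Periodically scores one module and alerts on degradation (illustrative sketch)."""
    module: str                      # e.g. "M1"
    threshold: float = 7.0           # alert when score (out of 10) drops below this
    history: list = field(default_factory=list)

    def check(self, run_quality_suite, notify_founder) -> float:
        score = run_quality_suite(self.module)   # run automated tests against live product
        self.history.append(score)
        # Regression detection: compare against the previous run, not just the threshold
        if len(self.history) >= 2 and score < self.history[-2]:
            trend = f"dropped {self.history[-2]}/10 -> {score}/10"
        else:
            trend = "stable"
        if score < self.threshold:
            notify_founder(f"{self.module} Health Agent: score {score}/10 ({trend})")
        return score
```

Keeping the score history per module is what enables the "score trends" bullet above: a drop from 8/10 to 6/10 produces an alert even though both measurements are objective snapshots.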

SOS Agents

Emergency response when something breaks:

  • Triggered by Health Agent alerts or user-reported issues
  • Attempt automated diagnosis (read logs, check recent commits, identify regression)
  • If auto-fixable (known pattern) → fix + notify founder
  • If not auto-fixable → escalate to founder with full diagnostic context
  • Per Constitution Law 9: default to safe mode, never auto-deploy risky fixes
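The "known pattern" registry from the open questions could start as simply as a signature-to-action map. This is a hedged sketch: the pattern strings and action names are hypothetical placeholders, and the key property is the safe-mode default required by Constitution Law 9 (anything not in the registry escalates, never auto-fixes).

```python
# Hypothetical known-pattern registry (signatures and actions are illustrative).
KNOWN_PATTERNS = {
    "ModuleNotFoundError": "reinstall_requirements",
    "address already in use": "restart_service",
}

def triage(log_excerpt: str) -> tuple[str, str]:
    """Return ("auto_fix", action) for a known pattern, else ("escalate", context)."""
    for signature, action in KNOWN_PATTERNS.items():
        if signature in log_excerpt:
            return ("auto_fix", action)
    # Safe mode per Law 9: unknown failures go to the founder with full context
    return ("escalate", log_excerpt)
```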

QA Agents (Multi-LLM Browser Testing)

Replace the current manual “ask Perplexity to test in browser” workflow:

  • Multiple LLM agents (Perplexity, Cowork, others) independently test the live product
  • Browser-based testing: navigate UI, fill forms, check outputs, verify rendering
  • Produce structured bug reports (not free-form text)
  • Cross-reference findings (if 2/3 QA agents find the same bug → high confidence)
  • Integration with Chamber: QA disagreements can trigger Level 2 auto-quorum
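One way the structured-report and cross-referencing bullets could fit together, sketched under the assumption that reports reduce to a small schema (the `BugReport` fields here are a working assumption, not a settled format):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class BugReport:
    """One structured finding from a QA agent (fields are illustrative)."""
    page: str        # where in the UI the bug was observed
    symptom: str     # what went wrong
    severity: str    # "low" | "medium" | "high"

def high_confidence_bugs(reports_by_agent: dict[str, list[BugReport]], quorum: int = 2):
    """Bugs independently found by at least `quorum` agents (e.g. 2 of 3)."""
    seen = Counter()
    for findings in reports_by_agent.values():
        # Each agent votes at most once per (page, symptom), regardless of severity
        for key in {(b.page, b.symptom) for b in findings}:
            seen[key] += 1
    return [key for key, votes in seen.items() if votes >= quorum]
```

Findings below the quorum are not discarded: a split (one agent insists, others disagree) is exactly the kind of disagreement that could trigger a Level 2 auto-quorum in the Chamber.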

Courier Agent (Orchestrator)

Replaces the founder’s courier role:

  • Receives task from Strategist
  • Routes to Developer Agent with full context (manifest, ADRs, code references)
  • Collects build result, routes to QA Agents
  • Collects QA reports, routes back to Developer for fixes
  • Manages the loop until Definition of Done is met
  • Escalates to founder only per CriticalityPolicy
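The Courier loop above can be sketched as a bounded iterate-until-clean cycle. This is an assumption-laden outline, not a design: `developer` and the QA agent callables stand in for real agent invocations, and `max_rounds` is the infinite-loop safeguard raised in the open questions below.

```python
def run_dev_loop(task, developer, qa_agents, max_rounds: int = 3):
    """Courier Agent core loop (sketch; the agent callables are placeholders).

    `developer(task, bugs)` returns a build; each QA agent maps a build to a
    list of bugs. Escalates to the founder after `max_rounds` failed QA rounds.
    """
    bugs: list = []
    for round_no in range(1, max_rounds + 1):
        build = developer(task, bugs)                     # implement, or fix reported bugs
        reports = [agent(build) for agent in qa_agents]   # independent browser-based tests
        bugs = [b for report in reports for b in report]
        if not bugs:                                      # Definition of Done met
            return {"status": "done", "rounds": round_no, "build": build}
    # Escalation per CriticalityPolicy: founder sees the surviving bug list
    return {"status": "escalate_to_founder", "rounds": max_rounds, "open_bugs": bugs}
```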

Relationship to Existing Modules

  • M1 (Investment Navigator): Health Agent monitors pipeline quality, QA Agents test UI
  • M2 (Team Navigator): Same pattern — Health + QA monitoring
  • M3 (Deliberation Chamber): QA disagreements trigger Chamber sessions; Chamber itself gets Health monitoring
  • M4 is infrastructure — it serves M1, M2, M3 and all future modules

Implementation Prerequisites

Before M4 can be built:

  • M1 functional and deployed (done)
  • M2 functional and deployed (planned)
  • M3 Chamber with UI (done — today)
  • CriticalityPolicy in place (done)
  • Clear Definition of Done per module (exists in manifest)

Open Questions (to be resolved before implementation)

  1. How do QA agents access the browser? Cowork has browser automation. Perplexity can browse. What’s the API/integration model?
  2. How does Developer Agent receive structured bug reports? Format? Direct Claude Code invocation or via queue?
  3. How much autonomy for auto-fix? SOS Agent can fix known patterns — but what counts as “known pattern”? Need a registry.
  4. Cost model: Multi-LLM QA + multi-loop iterations could be expensive. Budget caps per loop iteration?
  5. How to prevent infinite loops? Max iterations per task? Escalation after N failed QA rounds?
  6. Security: Courier Agent needs access to VPS, git, Streamlit — what’s the permission model?

Timeline

Not scheduled. This is a post-M2 concept. Captured here so the vision is preserved and future sessions have full context.

Estimated effort when sprint begins: 3-4 weeks (significant infrastructure).

Cross-References