Module M4: Autonomous Development Loop

One-liner: Replace the founder-as-courier between strategist, developer, and QA with an automated agent loop.

Problem Statement

Current development workflow requires the founder to manually shuttle between three roles:

  1. Strategist (Claude chat / co-founder role) — formulates tasks, writes prompts, reviews results
  2. Developer (Claude Code on VPS) — implements code, runs tests, commits
  3. QA / Testers (Perplexity, Cowork, other LLM agents) — test the live product in browser, find bugs, produce reports

The founder acts as a manual courier between these three:

  • Receives prompt from strategist → copies to Claude Code
  • Receives build result from Claude Code → sends to QA agents
  • Receives bug report from QA → brings back to strategist
  • Receives fix prompt from strategist → copies to Claude Code
  • Loop repeats until quality target met

This is the same “courier work” problem that M3 Deliberation Chamber solved for strategic questions — but applied to the entire development cycle.

Pain Points

  • Founder time wasted on mechanical copy-paste between agents (~60-70% of session time)
  • Context loss between handoffs (each agent sees only its slice)
  • Inconsistent testing (founder may skip QA steps when tired)
  • No persistent quality monitoring between sessions
  • Founder becomes bottleneck — development speed limited by founder’s availability

Vision

An autonomous loop where:

  Strategist Agent → formulates task with full context
    ↓ automatic
  Developer Agent (Claude Code) → implements + unit tests
    ↓ automatic
  QA Agents (multi-LLM, browser-based) → test live product, find bugs, produce structured report
    ↓ automatic
  Developer Agent → fixes bugs based on QA report
    ↓ automatic
  QA Agents → re-test
    ↓ automatic (if all checks pass)
  Strategist Agent → reviews final result, updates roadmap
    ↓ ONLY if critical decision needed
  Founder → intervenes with personal judgment

Founder’s Role After M4

  • NOT a courier — agents communicate directly
  • Decision-maker on critical questions — per CriticalityPolicy Level 3
  • Vision-setter — defines what to build, not how to shuttle between builders
  • Quality arbiter of last resort — when agents disagree on what “done” means

Agent Types

Health Agents

Continuously monitor quality of each module (M1, M2, M3):

  • Run periodic automated tests against live product
  • Track quality metrics over time (score trends, regression detection)
  • Alert founder when quality degrades below threshold
  • Example: “M1 Health Agent detects that BADs Russia score dropped from 8/10 to 6/10 after latest deploy”
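A minimal sketch of how a Health Agent could combine threshold alerting with regression detection. All names here are assumptions for illustration: `run_quality_suite` stands in for whatever automated test harness scores a module, and `notify_founder` for whatever alerting channel M4 adopts.

```python
from dataclasses import dataclass, field

@dataclass
class HealthAgent:
    """Periodically scores one module and alerts on degradation (illustrative sketch)."""
    module: str                      # e.g. "M1"
    threshold: float = 7.0           # alert when score (out of 10) drops below this
    history: list = field(default_factory=list)

    def check(self, run_quality_suite, notify_founder) -> float:
        score = run_quality_suite(self.module)   # run automated tests against live product
        self.history.append(score)
        # Regression detection: compare against the previous run, not just the threshold
        if len(self.history) >= 2 and score < self.history[-2]:
            trend = f"dropped {self.history[-2]}/10 -> {score}/10"
        else:
            trend = "stable"
        if score < self.threshold:
            notify_founder(f"{self.module} Health Agent: score {score}/10 ({trend})")
        return score
```

Keeping the score history per module is what enables the "score trends" bullet above: a drop from 8/10 to 6/10 produces an alert even though both measurements are objective snapshots.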

SOS Agents

Emergency response when something breaks:

  • Triggered by Health Agent alerts or user-reported issues
  • Attempt automated diagnosis (read logs, check recent commits, identify regression)
  • If auto-fixable (known pattern) → fix + notify founder
  • If not auto-fixable → escalate to founder with full diagnostic context
  • Per Constitution Law 9: default to safe mode, never auto-deploy risky fixes
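The "known pattern" registry from the open questions could start as simply as a signature-to-action map. This is a hedged sketch: the pattern strings and action names are hypothetical placeholders, and the key property is the safe-mode default required by Constitution Law 9 (anything not in the registry escalates, never auto-fixes).

```python
# Hypothetical known-pattern registry (signatures and actions are illustrative).
KNOWN_PATTERNS = {
    "ModuleNotFoundError": "reinstall_requirements",
    "address already in use": "restart_service",
}

def triage(log_excerpt: str) -> tuple[str, str]:
    """Return ("auto_fix", action) for a known pattern, else ("escalate", context)."""
    for signature, action in KNOWN_PATTERNS.items():
        if signature in log_excerpt:
            return ("auto_fix", action)
    # Safe mode per Law 9: unknown failures go to the founder with full context
    return ("escalate", log_excerpt)
```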

QA Agents (Multi-LLM Browser Testing)

Replace the current manual “ask Perplexity to test in browser” workflow:

  • Multiple LLM agents (Perplexity, Cowork, others) independently test the live product
  • Browser-based testing: navigate UI, fill forms, check outputs, verify rendering
  • Produce structured bug reports (not free-form text)
  • Cross-reference findings (if 2/3 QA agents find the same bug → high confidence)
  • Integration with Chamber: QA disagreements can trigger Level 2 auto-quorum
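One way the structured-report and cross-referencing bullets could fit together, sketched under the assumption that reports reduce to a small schema (the `BugReport` fields here are a working assumption, not a settled format):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class BugReport:
    """One structured finding from a QA agent (fields are illustrative)."""
    page: str        # where in the UI the bug was observed
    symptom: str     # what went wrong
    severity: str    # "low" | "medium" | "high"

def high_confidence_bugs(reports_by_agent: dict[str, list[BugReport]], quorum: int = 2):
    """Bugs independently found by at least `quorum` agents (e.g. 2 of 3)."""
    seen = Counter()
    for findings in reports_by_agent.values():
        # Each agent votes at most once per (page, symptom), regardless of severity
        for key in {(b.page, b.symptom) for b in findings}:
            seen[key] += 1
    return [key for key, votes in seen.items() if votes >= quorum]
```

Findings below the quorum are not discarded: a split (one agent insists, others disagree) is exactly the kind of disagreement that could trigger a Level 2 auto-quorum in the Chamber.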

Courier Agent (Orchestrator)

Replaces the founder’s courier role:

  • Receives task from Strategist
  • Routes to Developer Agent with full context (manifest, ADRs, code references)
  • Collects build result, routes to QA Agents
  • Collects QA reports, routes back to Developer for fixes
  • Manages the loop until Definition of Done is met
  • Escalates to founder only per CriticalityPolicy
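The Courier loop above can be sketched as a bounded iterate-until-clean cycle. This is an assumption-laden outline, not a design: `developer` and the QA agent callables stand in for real agent invocations, and `max_rounds` is the infinite-loop safeguard raised in the open questions below.

```python
def run_dev_loop(task, developer, qa_agents, max_rounds: int = 3):
    """Courier Agent core loop (sketch; the agent callables are placeholders).

    `developer(task, bugs)` returns a build; each QA agent maps a build to a
    list of bugs. Escalates to the founder after `max_rounds` failed QA rounds.
    """
    bugs: list = []
    for round_no in range(1, max_rounds + 1):
        build = developer(task, bugs)                     # implement, or fix reported bugs
        reports = [agent(build) for agent in qa_agents]   # independent browser-based tests
        bugs = [b for report in reports for b in report]
        if not bugs:                                      # Definition of Done met
            return {"status": "done", "rounds": round_no, "build": build}
    # Escalation per CriticalityPolicy: founder sees the surviving bug list
    return {"status": "escalate_to_founder", "rounds": max_rounds, "open_bugs": bugs}
```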

Relationship to Existing Modules

  • M1 (Investment Navigator): Health Agent monitors pipeline quality, QA Agents test UI
  • M2 (Team Navigator): Same pattern — Health + QA monitoring
  • M3 (Deliberation Chamber): QA disagreements trigger Chamber sessions; Chamber itself gets Health monitoring
  • M4 is infrastructure — it serves M1, M2, M3 and all future modules

Implementation Prerequisites

Before M4 can be built:

  • M1 functional and deployed (done)
  • M2 functional and deployed (planned)
  • M3 Chamber with UI (done — today)
  • CriticalityPolicy in place (done)
  • Clear Definition of Done per module (exists in manifest)

Open Questions (to be resolved before implementation)

  1. How do QA agents access the browser? Cowork has browser automation. Perplexity can browse. What’s the API/integration model?
  2. How does Developer Agent receive structured bug reports? Format? Direct Claude Code invocation or via queue?
  3. How much autonomy for auto-fix? SOS Agent can fix known patterns — but what counts as “known pattern”? Need a registry.
  4. Cost model: Multi-LLM QA + multi-loop iterations could be expensive. Budget caps per loop iteration?
  5. How to prevent infinite loops? Max iterations per task? Escalation after N failed QA rounds?
  6. Security: Courier Agent needs access to VPS, git, Streamlit — what’s the permission model?

Timeline

Not scheduled. This is a post-M2 concept. Captured here so the vision is preserved and future sessions have full context.

Estimated effort when sprint begins: 3-4 weeks (significant infrastructure).

Cross-References