Module M4: Autonomous Development Loop
One-liner: Replace the founder-as-courier between strategist, developer, and QA with an automated agent loop.
Problem Statement
The current development workflow requires the founder to manually shuttle between three roles:
- Strategist (Claude chat / co-founder role) — formulates tasks, writes prompts, reviews results
- Developer (Claude Code on VPS) — implements code, runs tests, commits
- QA / Testers (Perplexity, Cowork, other LLM agents) — test the live product in browser, find bugs, produce reports
The founder acts as a manual courier between these three:
- Receives prompt from strategist → copies to Claude Code
- Receives build result from Claude Code → sends to QA agents
- Receives bug report from QA → brings back to strategist
- Receives fix prompt from strategist → copies to Claude Code
- The loop repeats until the quality target is met
This is the same “courier work” problem that M3 Deliberation Chamber solved for strategic questions — but applied to the entire development cycle.
Pain Points
- Founder time wasted on mechanical copy-paste between agents (~60-70% of session time)
- Context loss between handoffs (each agent sees only its slice)
- Inconsistent testing (founder may skip QA steps when tired)
- No persistent quality monitoring between sessions
- Founder becomes bottleneck — development speed limited by founder’s availability
Vision
An autonomous loop where:
Strategist Agent → formulates task with full context
↓ automatic
Developer Agent (Claude Code) → implements + unit tests
↓ automatic
QA Agents (multi-LLM, browser-based) → test live product, find bugs, produce structured report
↓ automatic
Developer Agent → fixes bugs based on QA report
↓ automatic
QA Agents → re-test
↓ automatic (if all checks pass)
Strategist Agent → reviews final result, updates roadmap
↓ ONLY if critical decision needed
Founder → intervenes with personal judgment
Founder’s Role After M4
- NOT a courier — agents communicate directly
- Decision-maker on critical questions — per CriticalityPolicy Level 3
- Vision-setter — defines what to build, not how to shuttle between builders
- Quality arbiter of last resort — when agents disagree on what “done” means
Agent Types
Health Agents
Continuously monitor quality of each module (M1, M2, M3):
- Run periodic automated tests against live product
- Track quality metrics over time (score trends, regression detection)
- Alert founder when quality degrades below threshold
- Example: “M1 Health Agent detects that BADs Russia score dropped from 8/10 to 6/10 after latest deploy”
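The threshold and regression checks above could be sketched roughly as follows. This is a minimal illustration, not a settled design: the names ScoreSample and detect_regression, the 0-10 scale, and the default thresholds (min_score, max_drop) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreSample:
    module: str   # e.g. "M1"
    check: str    # name of the quality check being tracked
    score: float  # assumed 0-10 scale, as in the example above


def detect_regression(previous: ScoreSample, current: ScoreSample,
                      min_score: float = 7.0,
                      max_drop: float = 1.0) -> Optional[str]:
    """Return an alert message if quality degraded, otherwise None.

    An alert fires when the current score falls below min_score, or when
    it dropped by more than max_drop since the previous sample.
    Both thresholds are hypothetical defaults.
    """
    drop = previous.score - current.score
    if current.score < min_score or drop > max_drop:
        return (f"{current.module} Health Agent: {current.check} score is "
                f"{current.score:g}/10 (was {previous.score:g}/10)")
    return None
```

With the example above (8/10 before the deploy, 6/10 after), both conditions hold and an alert string is returned. A drop from 8 to 7.5 produces no alert under these defaults.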
SOS Agents
Emergency response when something breaks:
- Triggered by Health Agent alerts or user-reported issues
- Attempt automated diagnosis (read logs, check recent commits, identify regression)
- If auto-fixable (known pattern) → fix + notify founder
- If not auto-fixable → escalate to founder with full diagnostic context
- Per Constitution Law 9: default to safe mode, never auto-deploy risky fixes
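One way to picture the triage rules above is a small registry lookup. The sketch below is hypothetical throughout: the KnownPattern fields, the two sample patterns, and the fix identifiers are invented for illustration, and what actually counts as a "known pattern" is still an open question.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KnownPattern:
    name: str       # hypothetical pattern label
    log_regex: str  # regex matched against the log text
    fix_id: str     # identifier of a scripted fix (assumed to exist elsewhere)
    risky: bool     # risky fixes are never applied automatically


# Illustrative entries only; a real registry would be curated by the founder.
REGISTRY: List[KnownPattern] = [
    KnownPattern("missing-env-var", r"KeyError: '[A-Z_]+'",
                 "restore_env_defaults", risky=False),
    KnownPattern("schema-drift", r"no such column",
                 "run_migrations", risky=True),
]


def triage(log_text: str, registry: List[KnownPattern] = REGISTRY) -> dict:
    """Decide between auto-fix and escalation, defaulting to safe mode."""
    for pattern in registry:
        if re.search(pattern.log_regex, log_text):
            if pattern.risky:
                # Known pattern, but the fix is risky: escalate with context.
                return {"action": "escalate", "fix_id": pattern.fix_id,
                        "reason": f"known pattern '{pattern.name}' has a risky fix"}
            return {"action": "auto_fix", "fix_id": pattern.fix_id,
                    "notify_founder": True}
    # Unknown failure: escalate rather than guess.
    return {"action": "escalate", "reason": "no known pattern matched"}
```

The safe default is the final return: anything the registry does not recognise goes to the founder rather than to an automated fix.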
QA Agents (Multi-LLM Browser Testing)
Replace the current manual “ask Perplexity to test in browser” workflow:
- Multiple LLM agents (Perplexity, Cowork, others) independently test the live product
- Browser-based testing: navigate UI, fill forms, check outputs, verify rendering
- Produce structured bug reports (not free-form text)
- Cross-reference findings (a bug reported by 2 of 3 QA agents is treated as high confidence)
- Integration with Chamber: QA disagreements can trigger Level 2 auto-quorum
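A structured report and the cross-referencing rule could look roughly like this. The BugReport fields, the fingerprint function, and the "needs_review" label are assumptions made for the sketch. Matching bugs by normalised page and summary is deliberately naive, and real reports from different LLMs would likely need fuzzier matching.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BugReport:
    agent: str     # which QA agent filed the report
    page: str      # page or screen where the bug was seen
    summary: str   # short description of the bug
    severity: str  # e.g. "low", "medium", "high" (hypothetical scale)


def fingerprint(report: BugReport) -> Tuple[str, str]:
    """Naive identity for a bug: normalised page plus summary."""
    return (report.page.strip().lower(), report.summary.strip().lower())


def cross_reference(reports: Iterable[BugReport],
                    min_agents: int = 2) -> Dict[Tuple[str, str], str]:
    """Label each distinct bug by how many independent agents reported it."""
    agents_by_bug = defaultdict(set)
    for report in reports:
        agents_by_bug[fingerprint(report)].add(report.agent)
    return {
        bug: "high" if len(agents) >= min_agents else "needs_review"
        for bug, agents in agents_by_bug.items()
    }
```

Bugs labelled "needs_review" are the natural candidates for a Chamber session, since only one agent saw them.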
Courier Agent (Orchestrator)
Replaces the founder’s courier role:
- Receives task from Strategist
- Routes to Developer Agent with full context (manifest, ADRs, code references)
- Collects build result, routes to QA Agents
- Collects QA reports, routes back to Developer for fixes
- Manages the loop until Definition of Done is met
- Escalates to founder only per CriticalityPolicy
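The routing steps above amount to a bounded loop. The sketch below assumes that the Developer and QA agents can be wrapped as plain callables (develop, run_qa) and that "Definition of Done" reduces to an empty bug list. Both are simplifications, and max_iterations is a hypothetical guard rather than a decided policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    status: str       # "done" or "escalated"
    iterations: int   # number of develop + QA rounds that ran
    open_bugs: List[str] = field(default_factory=list)


def run_loop(task: str,
             develop: Callable[[str, List[str]], str],
             run_qa: Callable[[str], List[str]],
             max_iterations: int = 3) -> LoopResult:
    """Route task -> developer -> QA -> developer until QA reports no bugs."""
    bugs: List[str] = []
    for iteration in range(1, max_iterations + 1):
        build = develop(task, bugs)  # first round: empty bug list
        bugs = run_qa(build)         # QA returns the bugs it found
        if not bugs:
            return LoopResult("done", iteration)
    # Quality target not reached within the cap: hand over to the founder.
    return LoopResult("escalated", max_iterations, bugs)
```

Keeping the agents behind callables means the loop can be exercised with fakes before any real integration with Claude Code or the QA agents exists.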
Relationship to Existing Modules
- M1 (Investment Navigator): Health Agent monitors pipeline quality, QA Agents test UI
- M2 (Team Navigator): Same pattern — Health + QA monitoring
- M3 (Deliberation Chamber): QA disagreements trigger Chamber sessions; Chamber itself gets Health monitoring
- M4 is infrastructure — it serves M1, M2, M3 and all future modules
Implementation Prerequisites
Before M4 can be built:
- M1 functional and deployed (done)
- M2 functional and deployed (planned)
- M3 Chamber with UI (done — today)
- CriticalityPolicy in place (done)
- Clear Definition of Done per module (exists in manifest)
Open Questions (to be resolved before implementation)
- How do QA agents access the browser? Cowork has browser automation. Perplexity can browse. What’s the API/integration model?
- How does Developer Agent receive structured bug reports? Format? Direct Claude Code invocation or via queue?
- How much autonomy for auto-fix? SOS Agent can fix known patterns — but what counts as “known pattern”? Need a registry.
- Cost model: Multi-LLM QA + multi-loop iterations could be expensive. Budget caps per loop iteration?
- How to prevent infinite loops? Max iterations per task? Escalation after N failed QA rounds?
- Security: Courier Agent needs access to VPS, git, Streamlit — what’s the permission model?
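To make the cost question concrete, a per-task budget cap could be as small as the sketch below. It does not answer the open question: LoopBudget, the USD unit, and the idea of charging each agent call against a single cap are assumptions for discussion.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LoopBudget:
    cap_usd: float  # hypothetical spending cap for one task
    spent_by_agent: Dict[str, float] = field(
        default_factory=lambda: defaultdict(float))

    @property
    def spent_usd(self) -> float:
        return sum(self.spent_by_agent.values())

    def charge(self, agent: str, cost_usd: float) -> bool:
        """Record a cost; return True while the task is still within budget."""
        self.spent_by_agent[agent] += cost_usd
        return self.spent_usd <= self.cap_usd
```

A Courier Agent could call charge after every agent invocation and escalate to the founder as soon as it returns False.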
Timeline
Not scheduled. This is a post-M2 concept. Captured here so the vision is preserved and future sessions have full context.
Estimated effort when sprint begins: 3-4 weeks (significant infrastructure).
Cross-References
- Constitution — Law 1 (Founder Interests), Law 5 (Human Veto on critical), Law 9 (Safe defaults)
- CriticalityPolicy — determines when founder intervenes vs agents auto-resolve
- Deliberation-Chamber-Module — M3 Chamber used by QA agents for disagreement resolution
- Niche-Evaluation-Module — M1, first module to receive Health/QA agents
- Team-Implementation-Module — M2, second module
- EscalationPolicy — SOS Agent escalation triggers
- NorthStar — 80% autonomous closure target aligns directly with M4 vision