Sprint Plan: Weeks 5-7 (April 17 — May 8, 2026)

Quality Sprint Baseline

Date: 2026-04-17
Baseline score: 5.75/10 (BADs Russia, run 20260416_050822)
Target: 8.0/10
Judge verdict: FAIL
P0 blocker: Scout returned 0 actors, Director did not validate/retry
Structural issues identified: ADR-0018 (RAG Memory), ADR-0019 (Cross-Model Validation)
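
The P0 blocker above is a missing validate-and-retry step between Scout and Director. A minimal sketch of that step, assuming a hypothetical ScoutResult shape and a hypothetical run_scout callable (neither is the actual pipeline API, which this plan does not show):

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoutResult:
    # Hypothetical shape: the real Scout output schema is not shown in this plan.
    actors: List[str]


class EmptyScoutResultError(RuntimeError):
    """Raised when Scout still returns no actors after all attempts."""


def run_scout_with_validation(
    run_scout: Callable[[], ScoutResult],
    max_attempts: int = 3,  # assumed retry budget, not taken from the plan
) -> ScoutResult:
    """Call Scout, reject empty results, and retry up to max_attempts times."""
    for _ in range(max_attempts):
        result = run_scout()
        if result.actors:  # validation: at least one actor is required
            return result
    # Fail loudly instead of letting downstream agents work on empty input.
    raise EmptyScoutResultError(
        f"Scout returned 0 actors after {max_attempts} attempts"
    )
```

Raising on exhaustion rather than returning an empty result keeps a "0 actors" run from silently reaching the Judge.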

Strategic basis: Chamber deliberation (3x unanimous, 9/10 confidence) + ADR-0013 formal blocker.

Approach: B+C hybrid — M1 quality sprint as primary track, M3 quick wins as parallel non-conflicting track.

Week 5 (Apr 17-23): M1 Quality Sprint + M3 Quick Wins

Primary: M1 Quality (estimated 6.25/10, measured 5.75/10 → 7.5+/10)

Day 1: Aida triages the failing BADs Russia tests (the 3.75-point gap at the estimated 6.25/10 baseline) by severity:

P0: crashes, wrong data, misleading conclusions
P1: significant quality gaps, missing sections, weak analysis
P2: cosmetic, formatting, minor inaccuracies

Days 2-5: Fix P0/P1 failures. Target: 7.5+/10 by end of week.

Success criteria: re-run BADs Russia test suite, score ≥ 7.5/10.

Parallel: M3 Chamber Quick Wins (non-conflicting)

These are isolated modules — no overlap with M1 pipeline code:

Grok + DeepSeek providers (~1 day)
- Add two new provider adapters to src/synth_brain/llm/providers/
- Panel expands from 3 to 5 providers
- Update divergence logic for 5-provider panel
- Amendment to ADR-0014 per existing plan
“Add my input” button (~1 day)
- Currently stubbed as “Coming soon” in Chamber UI
- Implement: founder types insight → arbiter re-runs with founder input as additional context
- This is the Contributor role from CriticalityPolicy Level 3
Minor UX polish (~0.5 day)
- Gemini tab name truncation fix
- Any remaining items from tester feedback

Week 6 (Apr 24-30): M1 → 8/10 + External Testers + Backlog

Primary: M1 Quality (7.5 → 8/10)

Fix remaining P1/P2 failures
Target: ≥ 8/10 on BADs Russia
Success criteria: Judge score ≥ 8/10 on at least 2 consecutive runs
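
The "2 consecutive runs" criterion can be checked mechanically. A small sketch, assuming Judge scores are available as a chronological list of floats and that a qualifying streak anywhere in that list counts (both are assumptions; the plan does not say how scores are stored):

```python
from typing import Sequence


def meets_consecutive_target(
    scores: Sequence[float],
    target: float = 8.0,
    runs_required: int = 2,
) -> bool:
    """Return True if `runs_required` runs in a row scored at or above `target`."""
    streak = 0
    for score in scores:
        # A run below target resets the streak; a passing run extends it.
        streak = streak + 1 if score >= target else 0
        if streak >= runs_required:
            return True
    return False
```

For example, meets_consecutive_target([7.75, 8.0, 7.5, 8.0]) is False because the two passing runs are not adjacent, while meets_consecutive_target([7.5, 8.0, 8.25]) is True.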

External Testing (ADR-0013 unblock)

Recruit 2+ external testers (beyond Aida) to test M1 via app.synth-nova.com
Collect structured feedback per FEEDBACK_LOG format
This unblocks M2 per ADR-0013 requirement: “M1 tested by 3+ external users”

Parallel: Backlog Items

google.genai migration (tech debt from M3 v1, ~1 hour)
Pipeline speed optimization (currently 16-19 min ceiling)
Role files in 03-Roles/: add “Must comply with Constitution” (follow-up to ADR-0012)

Weeks 7-8 (May 1-14): M2 Team Implementation Navigator

Prerequisites (must be met before starting)

M1 score ≥ 8/10 on BADs Russia
M1 tested by 3+ external users with positive feedback
Aida available for M2 QA (freed from M1 regression work)

Scope (from ADR-0013)

4 new agents: ProfileHarvester, ClaimVerifier, DigitalPresenceAnalyzer, TrackRecordResearcher
Consent model (explicit consent only, v1)
Verification pipeline with delta-based discrepancy detection
10-section report structure
Integration with M1 for Combined mode (M1+M2)
Streamlit UI page (same pattern as Chamber)

Estimated effort: 2-3 weeks

Risk Register

Risk	Mitigation
M1 failures deeper than expected (need >1 week)	Week 6 is buffer; M3 wins fill time
Can’t find 3+ external testers	Use Chamber to deliberate alternative unblock criteria
Grok/DeepSeek API access issues	These are nice-to-have; don’t block M1 work
M2 scope creep	ADR-0013 defines scope; stick to v1

Decision Log

2026-04-17: Chamber recommended M1-first (3x unanimous, 9/10). Founder accepted with parallel M3 cherry-pick.
2026-04-17: ADR-0013 blocker confirmed — M2 requires 3+ external M1 users.
2026-04-17: M1 baseline measured at 5.75/10 (worse than estimated 6.25). P0: Scout 0 actors + Director no validation. Two architectural ADRs proposed: RAG Memory (ADR-0018), Cross-Model Validation (ADR-0019).

Cross-References

Niche-Evaluation-Module — M1 spec
Team-Implementation-Module — M2 spec (ADR-0013)
Deliberation-Chamber-Module — M3 v2 vision
Autonomous-Dev-Loop — M4 concept (post-M2)
CriticalityPolicy — governs M3 “Add my input” implementation
Constitution — Law 1 (founder decision), Law 3 (quality over speed)

Progress Log

Day 1 (2026-04-17)

M1 Quality Sprint: 5.75/10 → 7.0/10 PASS (6 rounds, 6 pipeline fixes)
Pipeline fixes: Director meta patch, Researcher schema validation + retry without web_search, Scout max_tokens 8K→16K + truncation detection + segment preservation, global timeout 1200→2400, per-agent timeout enforcement
Cost reduction: $2.50 → $1.65/run (-34%)
Reliability: timeout crashes eliminated, json_parse_failed fixed
Remaining for 8/10: Researcher TAM/SAM methodology, Aggregate executive summary, Scout source quality
Chamber UI: shipped with shareable URLs, progress phases, 6 bugfixes
Manifest: ADR-0017 (M4), ADR-0018 (RAG), ADR-0019 (Cross-Model), Sprint Plan, CLAUDE.md
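
The per-agent timeout enforcement and global timeout listed under the pipeline fixes can be illustrated with asyncio. This is a sketch only: the agent signature, the sequential execution order, and the "timeout" marker value are assumptions, and only the 2400-second global value comes from the log above.

```python
import asyncio
from typing import Awaitable, Callable, Dict

GLOBAL_TIMEOUT_S = 2400.0  # global timeout from the Day 1 log (raised from 1200)

# Hypothetical agent signature: an async callable returning its output as text.
Agent = Callable[[], Awaitable[str]]


async def run_pipeline(
    agents: Dict[str, Agent],
    per_agent_timeout_s: float,
    global_timeout_s: float = GLOBAL_TIMEOUT_S,
) -> Dict[str, str]:
    """Run agents in order, bounding each agent and the whole pipeline in time."""
    results: Dict[str, str] = {}

    async def run_all() -> None:
        for name, agent in agents.items():
            try:
                # Per-agent limit: one slow agent cannot consume the whole budget.
                results[name] = await asyncio.wait_for(
                    agent(), timeout=per_agent_timeout_s
                )
            except asyncio.TimeoutError:
                results[name] = "timeout"  # record the failure, keep going

    try:
        # Global limit: the pipeline as a whole stops after global_timeout_s.
        await asyncio.wait_for(run_all(), timeout=global_timeout_s)
    except asyncio.TimeoutError:
        pass  # return whatever partial results were collected
    return results
```

Recording a timeout as a result instead of raising lets later agents still run, which matches the intent of eliminating timeout crashes.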

Day 3 (2026-04-18)

M1 Quality Sprint: 5.75/10 → 7.0-7.75 zone (20 rounds, prompt-level ceiling reached)
Best run: 7.75/10 (Round 11) — Researcher 8.0, Aggregate 9.0
Pipeline fixes: Director streaming, parsed_output sanitization, type-safe rendering, partial save, finalization watchdog, orphan cleanup, confidence calibration, nested field propagation, Director truncation retry
Structural findings: stale worker root cause (+1.0 to score), systemd worker disabled (race condition), prompt ceiling ~7.5±0.5
Cost reduction: $2.50 → $2.04/run (-18%)
Next: Track C (Grok/DeepSeek Chamber providers), then ADR-0018 RAG + ADR-0019 Cross-Model for 8.0+
Manifest expansion: 15 files, 3 ADRs (ADR-0020 to ADR-0022), wiki rebuilt (agent-hub, 404, ignorePatterns)
RAG: manifest-search shipped (Qdrant + OpenAI embeddings, 538 chunks, HTTP API on port 8080)
Wiki: wiki.synth-nova.com fully rebuilt — agent-hub main page, custom 404, Glossary quick-ref, onboarding system
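
The cost-reduction percentages in the progress log follow from the per-run dollar figures. A quick check, using a hypothetical helper name and the values quoted in the Day 1 and Day 3 entries:

```python
def percent_change(before: float, after: float) -> int:
    """Percentage change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)


day1_change = percent_change(2.50, 1.65)  # Day 1 entry: $2.50 to $1.65 per run
day3_change = percent_change(2.50, 2.04)  # Day 3 entry: $2.50 to $2.04 per run
print(day1_change, day3_change)  # prints: -34 -18
```

Both figures are measured against the same $2.50 baseline, so the Day 3 run cost is higher than the Day 1 run cost even though it is still below baseline.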
