Sprint Plan: Weeks 5-7 (April 17 — May 8, 2026)
Quality Sprint Baseline
- Date: 2026-04-17
- Baseline score: 5.75/10 (BADs Russia, run 20260416_050822)
- Target: 8.0/10
- Judge verdict: FAIL
- P0 blocker: Scout returned 0 actors, Director did not validate/retry
- Structural issues identified: ADR-0018 (RAG Memory), ADR-0019 (Cross-Model Validation)
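The P0 blocker above (Scout returning 0 actors while the Director neither validated nor retried) could be guarded against along the following lines. This is a minimal sketch, not the pipeline's actual code: `ScoutResult`, `ScoutValidationError`, `run_scout_with_validation`, and the `max_attempts` default are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ScoutResult:
    # Hypothetical shape: the real Scout output schema is not shown in this plan.
    actors: list[str] = field(default_factory=list)


class ScoutValidationError(RuntimeError):
    """Raised when Scout keeps returning an empty actor list."""


def run_scout_with_validation(
    run_scout: Callable[[], ScoutResult],
    max_attempts: int = 3,  # assumed retry budget
) -> ScoutResult:
    """Call Scout, reject empty results, and retry up to max_attempts times."""
    for _ in range(max_attempts):
        result = run_scout()
        if result.actors:  # validation: at least one actor was found
            return result
    # Fail loudly instead of passing an empty result downstream.
    raise ScoutValidationError(
        f"Scout returned 0 actors after {max_attempts} attempts"
    )
```

The point of the sketch is that an empty result becomes an explicit error at the Director level rather than silently flowing into later agents.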
Strategic basis: Chamber deliberation (3x unanimous, 9/10 confidence) + ADR-0013 formal blocker.
Approach: B+C hybrid — M1 quality sprint as primary track, M3 quick wins as parallel non-conflicting track.
Week 5 (Apr 17-23): M1 Quality Sprint + M3 Quick Wins
Primary: M1 Quality (estimated 6.25/10, measured 5.75/10 → 7.5+/10)
Day 1: Aida triages 3.75 failing BADs Russia tests by severity:
- P0: crashes, wrong data, misleading conclusions
- P1: significant quality gaps, missing sections, weak analysis
- P2: cosmetic, formatting, minor inaccuracies
Days 2-5: Fix P0/P1 failures. Target: 7.5+/10 by end of week.
Success criteria: re-run BADs Russia test suite, score ≥ 7.5/10.
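The Day 1 triage and the Days 2-5 work order can be sketched as a small data model. The severity definitions come from the list above; the `Failure` class and the function names are hypothetical, and the example only illustrates the bucketing, not how failures are actually recorded.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    P0 = 0  # crashes, wrong data, misleading conclusions
    P1 = 1  # significant quality gaps, missing sections, weak analysis
    P2 = 2  # cosmetic, formatting, minor inaccuracies


@dataclass
class Failure:
    description: str
    severity: Severity


def triage(failures: list[Failure]) -> dict[Severity, list[Failure]]:
    """Group failures into one bucket per severity level."""
    buckets: dict[Severity, list[Failure]] = {level: [] for level in Severity}
    for failure in failures:
        buckets[failure.severity].append(failure)
    return buckets


def week5_worklist(failures: list[Failure]) -> list[Failure]:
    """Return P0 failures first, then P1; P2 items are left for Week 6."""
    buckets = triage(failures)
    return buckets[Severity.P0] + buckets[Severity.P1]
```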
Parallel: M3 Chamber Quick Wins (non-conflicting)
These are isolated modules — no overlap with M1 pipeline code:
- Grok + DeepSeek providers (~1 day)
  - Add two new provider adapters to src/synth_brain/llm/providers/
  - Panel expands from 3 to 5 providers
  - Update divergence logic for 5-provider panel
  - Amendment to ADR-0014 per existing plan
- “Add my input” button (~1 day)
  - Currently stubbed as “Coming soon” in Chamber UI
  - Implement: founder types insight → arbiter re-runs with founder input as additional context
  - This is the Contributor role from CriticalityPolicy Level 3
- Minor UX polish (~0.5 day)
  - Gemini tab name truncation fix
  - Any remaining items from tester feedback
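The plan does not say how Chamber divergence is computed, so the following is only one possible reading of "update divergence logic for 5-provider panel": divergence as the share of providers that disagree with the most common verdict, written so that it does not depend on panel size. The function name, the provider keys, and the verdict labels are all placeholders.

```python
from collections import Counter


def divergence(verdicts: dict[str, str]) -> float:
    """Fraction of providers that disagree with the most common verdict."""
    if not verdicts:
        raise ValueError("panel is empty")
    counts = Counter(verdicts.values())
    majority_size = counts.most_common(1)[0][1]
    return (len(verdicts) - majority_size) / len(verdicts)


# Hypothetical 5-provider panel; names and verdict labels are placeholders.
panel = {
    "provider_a": "go",
    "provider_b": "go",
    "provider_c": "go",
    "grok": "go",
    "deepseek": "no-go",
}
print(divergence(panel))  # 0.2
```

Under this definition a 3-provider panel can only produce 0, 1/3, or 2/3, while a 5-provider panel produces steps of 0.2 up to 0.8, so any threshold tuned for three providers would need to be revisited.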
Week 6 (Apr 24-30): M1 → 8/10 + External Testers + Backlog
Primary: M1 Quality (7.5 → 8/10)
- Fix remaining P1/P2 failures
- Target: ≥ 8/10 on BADs Russia
- Success criteria: Judge score ≥ 8/10 on at least 2 consecutive runs
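The "at least 2 consecutive runs" criterion is easy to misread as "any 2 runs", so a short check may help. This is an illustrative sketch; the function name and the assumption that scores arrive as a chronologically ordered list are mine.

```python
def meets_exit_criterion(
    scores: list[float],
    threshold: float = 8.0,
    required_streak: int = 2,
) -> bool:
    """True if `required_streak` back-to-back runs scored at or above threshold.

    `scores` must be in chronological order.
    """
    streak = 0
    for score in scores:
        # A run below threshold resets the streak.
        streak = streak + 1 if score >= threshold else 0
        if streak >= required_streak:
            return True
    return False
```

For example, `[8.0, 7.5, 8.0]` contains two passing runs but does not satisfy the criterion, because they are not consecutive.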
External Testing (ADR-0013 unblock)
- Recruit 2+ external testers (beyond Aida) to test M1 via app.synth-nova.com
- Collect structured feedback per FEEDBACK_LOG format
- This unblocks M2 per ADR-0013 requirement: “M1 tested by 3+ external users”
Parallel: Backlog Items
- google.genai migration (tech debt from M3 v1, ~1 hour)
- Pipeline speed optimization (currently 16-19 min ceiling)
- Role files in 03-Roles/ — add “Must comply with Constitution” (follow-up ADR-0012)
Weeks 7-8 (May 1-14): M2 Team Implementation Navigator
Prerequisites (must be met before starting)
- M1 score ≥ 8/10 on BADs Russia
- M1 tested by 3+ external users with positive feedback
- Aida available for M2 QA (freed from M1 regression work)
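The three prerequisites can be expressed as an explicit gate that reports which ones are still open. The class and field names below are hypothetical, and "positive feedback" is reduced to a simple count of testers for the sake of the sketch.

```python
from dataclasses import dataclass


@dataclass
class M2Readiness:
    m1_score: float                 # latest Judge score on BADs Russia
    positive_external_testers: int  # external users who tested M1 with positive feedback
    aida_available: bool            # Aida freed from M1 regression work


def unmet_prerequisites(state: M2Readiness) -> list[str]:
    """Return the prerequisites that still block the start of M2."""
    unmet = []
    if state.m1_score < 8.0:
        unmet.append("M1 score >= 8/10 on BADs Russia")
    if state.positive_external_testers < 3:
        unmet.append("M1 tested by 3+ external users with positive feedback")
    if not state.aida_available:
        unmet.append("Aida available for M2 QA")
    return unmet
```

An empty list means M2 work can start; anything else names the blocker directly.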
Scope (from ADR-0013)
- 4 new agents: ProfileHarvester, ClaimVerifier, DigitalPresenceAnalyzer, TrackRecordResearcher
- Consent model (explicit consent only, v1)
- Verification pipeline with delta-based discrepancy detection
- 10-section report structure
- Integration with M1 for Combined mode (M1+M2)
- Streamlit UI page (same pattern as Chamber)
Estimated effort: 2-3 weeks
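ADR-0013 is not reproduced here, so the exact meaning of "delta-based discrepancy detection" is not specified in this plan. One plausible reading, sketched below under that assumption, is to compare each claimed numeric value with its verified counterpart and flag fields whose relative difference exceeds a tolerance. All names, the tolerance value, and the example field keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Discrepancy:
    name: str
    claimed: float
    verified: float
    delta: float  # verified minus claimed


def find_discrepancies(
    claimed: dict[str, float],
    verified: dict[str, float],
    tolerance: float = 0.1,  # assumed: flag relative deltas above 10%
) -> list[Discrepancy]:
    """Flag fields where the verified value differs materially from the claim."""
    found = []
    for name, claimed_value in claimed.items():
        if name not in verified:
            continue  # nothing to compare against; not treated as a discrepancy
        delta = verified[name] - claimed_value
        baseline = max(abs(claimed_value), 1e-9)  # avoid division by zero
        if abs(delta) / baseline > tolerance:
            found.append(Discrepancy(name, claimed_value, verified[name], delta))
    return found
```

A claim of 10 years of experience against a verified 6 would be flagged with a delta of -4, while matching values pass silently.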
Risk Register
| Risk | Mitigation |
|---|---|
| M1 failures deeper than expected (need >1 week) | Week 6 is buffer; M3 wins fill time |
| Can’t find 3+ external testers | Use Chamber to deliberate alternative unblock criteria |
| Grok/DeepSeek API access issues | These are nice-to-have; don’t block M1 work |
| M2 scope creep | ADR-0013 defines scope; stick to v1 |
Decision Log
- 2026-04-17: Chamber recommended M1-first (3x unanimous, 9/10). Founder accepted with parallel M3 cherry-pick.
- 2026-04-17: ADR-0013 blocker confirmed — M2 requires 3+ external M1 users.
- 2026-04-17: M1 baseline measured at 5.75/10 (worse than estimated 6.25). P0: Scout 0 actors + Director no validation. Two architectural ADRs proposed: RAG Memory (ADR-0018), Cross-Model Validation (ADR-0019).
Cross-References
- Niche-Evaluation-Module — M1 spec
- Team-Implementation-Module — M2 spec (ADR-0013)
- Deliberation-Chamber-Module — M3 v2 vision
- Autonomous-Dev-Loop — M4 concept (post-M2)
- CriticalityPolicy — governs M3 “Add my input” implementation
- Constitution — Law 1 (founder decision), Law 3 (quality over speed)
Progress Log
Day 1 (2026-04-17)
- M1 Quality Sprint: 5.75/10 → 7.0/10 PASS (6 rounds, 6 pipeline fixes)
- Pipeline fixes: Director meta patch, Researcher schema validation + retry without web_search, Scout max_tokens 8K→16K + truncation detection + segment preservation, global timeout 1200→2400, per-agent timeout enforcement
- Cost reduction: 1.65/run (-34%)
- Reliability: timeout crashes eliminated, json_parse_failed fixed
- Remaining for 8/10: Researcher TAM/SAM methodology, Aggregate executive summary, Scout source quality
- Chamber UI: shipped with shareable URLs, progress phases, 6 bugfixes
- Manifest: ADR-0017 (M4), ADR-0018 (RAG), ADR-0019 (Cross-Model), Sprint Plan, CLAUDE.md
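The combination of a raised global timeout and per-agent timeout enforcement from the Day 1 fixes can be sketched with `asyncio.wait_for`. This is an assumption-laden illustration, not the pipeline's code: it assumes the 1200 and 2400 figures are seconds, that agents are async callables, and `run_agent` is a hypothetical helper.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

GLOBAL_TIMEOUT_S = 2400.0  # assumed to be seconds; raised from 1200 on Day 1


async def run_agent(
    name: str,
    start: Callable[[], Awaitable[T]],
    per_agent_timeout_s: float,
    deadline: float,
) -> T:
    """Run one agent under both its own timeout and the global deadline.

    `deadline` is a time.monotonic() value computed once per pipeline run.
    """
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        raise asyncio.TimeoutError(f"global budget exhausted before {name} started")
    # The effective limit is whichever of the two budgets is tighter.
    return await asyncio.wait_for(start(), timeout=min(per_agent_timeout_s, remaining))
```

A pipeline run would compute `deadline = time.monotonic() + GLOBAL_TIMEOUT_S` once and pass it to every agent call, so a slow agent fails on its own limit instead of consuming the whole run.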
Day 3 (2026-04-18)
- M1 Quality Sprint: 5.75/10 → 7.0-7.75 zone (20 rounds, prompt-level ceiling reached)
- Best run: 7.75/10 (Round 11) — Researcher 8.0, Aggregate 9.0
- Pipeline fixes: Director streaming, parsed_output sanitization, type-safe rendering, partial save, finalization watchdog, orphan cleanup, confidence calibration, nested field propagation, Director truncation retry
- Structural findings: stale worker root cause (+1.0 to score), systemd worker disabled (race condition), prompt ceiling ~7.5±0.5
- Cost reduction: 2.04/run (-18%)
- Next: Track C (Grok/DeepSeek Chamber providers), then ADR-0018 RAG + ADR-0019 Cross-Model for 8.0+
- Manifest expansion: 15 files, 3 ADRs (0020-0022), wiki rebuilt (agent-hub, 404, ignorePatterns)
- RAG: manifest-search shipped (Qdrant + OpenAI embeddings, 538 chunks, HTTP API on port 8080)
- Wiki: wiki.synth-nova.com fully rebuilt — agent-hub main page, custom 404, Glossary quick-ref, onboarding system