Sprint Plan: Weeks 5-8 (April 17 – May 14, 2026)

Quality Sprint Baseline

  • Date: 2026-04-17
  • Baseline score: 5.75/10 (BADs Russia, run 20260416_050822)
  • Target: 8.0/10
  • Judge verdict: FAIL
  • P0 blocker: Scout returned 0 actors, Director did not validate/retry
  • Structural issues identified: ADR-0018 (RAG Memory), ADR-0019 (Cross-Model Validation)
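The P0 blocker above is the missing guard: an empty Scout result flowed downstream without the Director validating or retrying. A minimal sketch of such a guard, assuming a retryable Scout call (all names here, `run_scout`, `ScoutResult`, are illustrative, not the project's actual interfaces):

```python
# Hypothetical sketch: Director-side guard that rejects an empty Scout
# result and retries instead of silently passing it downstream.
from dataclasses import dataclass, field


@dataclass
class ScoutResult:
    actors: list = field(default_factory=list)


def run_scout(query: str) -> ScoutResult:
    # Stand-in for the real Scout agent call.
    return ScoutResult(actors=["actor-1", "actor-2"])


def scout_with_validation(query: str, max_retries: int = 2) -> ScoutResult:
    """Retry Scout when it returns zero actors; fail loudly otherwise."""
    for _attempt in range(1 + max_retries):
        result = run_scout(query)
        if result.actors:  # non-empty output passes validation
            return result
    raise RuntimeError(f"Scout returned 0 actors after {1 + max_retries} attempts")
```

The point of the sketch is the failure mode: better a loud `RuntimeError` at the Scout step than a 0-actor report scoring 5.75/10 at the Judge.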

Strategic basis: Chamber deliberation (3x unanimous, 9/10 confidence) + ADR-0013 formal blocker.

Approach: B+C hybrid — M1 quality sprint as primary track, M3 quick wins as parallel non-conflicting track.

Week 5 (Apr 17-23): M1 Quality Sprint + M3 Quick Wins

Primary: M1 Quality (5.75/10 → 7.5+/10)

Day 1: Aida triages the failing BADs Russia tests by severity:

  • P0: crashes, wrong data, misleading conclusions
  • P1: significant quality gaps, missing sections, weak analysis
  • P2: cosmetic, formatting, minor inaccuracies
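The triage buckets above could be captured in a small record type so fixes are scheduled strictly P0-first (a sketch only; field names are assumptions, not the project's actual schema):

```python
# Illustrative triage record matching the P0/P1/P2 buckets above.
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    P0 = "crash / wrong data / misleading conclusion"
    P1 = "significant quality gap"
    P2 = "cosmetic / minor inaccuracy"


@dataclass
class TriageItem:
    test_name: str
    severity: Severity
    notes: str = ""


def fix_order(items: list[TriageItem]) -> list[TriageItem]:
    """Sort so P0 items are fixed first, then P1, then P2."""
    rank = {Severity.P0: 0, Severity.P1: 1, Severity.P2: 2}
    return sorted(items, key=lambda i: rank[i.severity])
```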

Days 2-5: Fix P0/P1 failures. Target: 7.5+/10 by end of week.

Success criteria: re-run BADs Russia test suite, score ≥ 7.5/10.

Parallel: M3 Chamber Quick Wins (non-conflicting)

These are isolated modules — no overlap with M1 pipeline code:

  1. Grok + DeepSeek providers (~1 day)

    • Add two new provider adapters to src/synth_brain/llm/providers/
    • Panel expands from 3 to 5 providers
    • Update divergence logic for 5-provider panel
    • Amendment to ADR-0014 per existing plan
  2. “Add my input” button (~1 day)

    • Currently stubbed as “Coming soon” in Chamber UI
    • Implement: founder types insight → arbiter re-runs with founder input as additional context
    • This is the Contributor role from CriticalityPolicy Level 3
  3. Minor UX polish (~0.5 day)

    • Gemini tab name truncation fix
    • Any remaining items from tester feedback
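For item 1, the adapter pattern plus the 5-provider divergence update can be sketched as follows. The real adapters live in src/synth_brain/llm/providers/; the class names, method signature, and divergence metric here are assumptions for illustration:

```python
# Hedged sketch of a provider adapter interface plus a simple
# divergence check for a 5-member panel.
from abc import ABC, abstractmethod
from collections import Counter


class ProviderAdapter(ABC):
    name: str

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class EchoProvider(ProviderAdapter):
    """Toy stand-in so the panel logic below is runnable."""

    def __init__(self, name: str, canned: str):
        self.name, self.canned = name, canned

    def complete(self, prompt: str) -> str:
        return self.canned


def panel_divergence(verdicts: list[str]) -> float:
    """Fraction of the panel disagreeing with the plurality verdict."""
    top = Counter(verdicts).most_common(1)[0][1]
    return 1 - top / len(verdicts)
```

With 5 providers, a 4-vs-1 split yields divergence 0.2; whatever threshold the current 3-provider logic uses would need re-tuning for the larger panel.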

Week 6 (Apr 24-30): M1 → 8/10 + External Testers + Backlog

Primary: M1 Quality (7.5 → 8/10)

  • Fix remaining P1/P2 failures
  • Target: ≥ 8/10 on BADs Russia
  • Success criteria: Judge score ≥ 8/10 on at least 2 consecutive runs

External Testing (ADR-0013 unblock)

  • Recruit 2+ external testers (beyond Aida) to test M1 via app.synth-nova.com
  • Collect structured feedback per FEEDBACK_LOG format
  • This unblocks M2 per ADR-0013 requirement: “M1 tested by 3+ external users”

Parallel: Backlog Items

  • google.genai migration (tech debt from M3 v1, ~1 hour)
  • Pipeline speed optimization (currently 16-19 min ceiling)
  • Role files in 03-Roles/ — add “Must comply with Constitution” (follow-up ADR-0012)

Weeks 7-8 (May 1-14): M2 Team Implementation Navigator

Prerequisites (must be met before starting)

  • M1 score ≥ 8/10 on BADs Russia
  • M1 tested by 3+ external users with positive feedback
  • Aida available for M2 QA (freed from M1 regression work)

Scope (from ADR-0013)

  • 4 new agents: ProfileHarvester, ClaimVerifier, DigitalPresenceAnalyzer, TrackRecordResearcher
  • Consent model (explicit consent only, v1)
  • Verification pipeline with delta-based discrepancy detection
  • 10-section report structure
  • Integration with M1 for Combined mode (M1+M2)
  • Streamlit UI page (same pattern as Chamber)
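The delta-based discrepancy detection in the verification pipeline amounts to diffing claimed profile fields against independently verified ones. A minimal sketch, with field names and delta structure as illustrative assumptions:

```python
# Minimal sketch of delta-based discrepancy detection for the M2
# verification pipeline: compare claimed fields against independently
# gathered ones and report the deltas.
def find_discrepancies(claimed: dict, verified: dict) -> list[dict]:
    deltas = []
    for key, claim in claimed.items():
        fact = verified.get(key)
        if fact is None:
            # Nothing found to corroborate the claim.
            deltas.append({"field": key, "issue": "unverifiable", "claimed": claim})
        elif fact != claim:
            # Claim contradicted by what was found.
            deltas.append({"field": key, "issue": "mismatch",
                           "claimed": claim, "found": fact})
    return deltas
```

Each delta then feeds a section of the 10-section report; under the v1 consent model, `verified` is only ever populated with explicit consent.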

Estimated effort: 2-3 weeks

Risk Register

  • Risk: M1 failures deeper than expected (need >1 week). Mitigation: Week 6 is buffer; M3 wins fill time.
  • Risk: Can't find 3+ external testers. Mitigation: use Chamber to deliberate alternative unblock criteria.
  • Risk: Grok/DeepSeek API access issues. Mitigation: these are nice-to-have; don't block M1 work.
  • Risk: M2 scope creep. Mitigation: ADR-0013 defines scope; stick to v1.

Decision Log

  • 2026-04-17: Chamber recommended M1-first (3x unanimous, 9/10). Founder accepted with parallel M3 cherry-pick.
  • 2026-04-17: ADR-0013 blocker confirmed — M2 requires 3+ external M1 users.
  • 2026-04-17: M1 baseline measured at 5.75/10 (worse than estimated 6.25). P0: Scout 0 actors + Director no validation. Two architectural ADRs proposed: RAG Memory (ADR-0018), Cross-Model Validation (ADR-0019).

Progress Log

Day 1 (2026-04-17)

  • M1 Quality Sprint: 5.75/10 → 7.0/10 PASS (6 rounds, 6 pipeline fixes)
  • Pipeline fixes:
    • Director meta patch
    • Researcher schema validation + retry without web_search
    • Scout max_tokens 8K→16K + truncation detection + segment preservation
    • Global timeout 1200→2400
    • Per-agent timeout enforcement
  • Cost reduction: 1.65/run (-34%)
  • Reliability: timeout crashes eliminated, json_parse_failed fixed
  • Remaining for 8/10: Researcher TAM/SAM methodology, Aggregate executive summary, Scout source quality
  • Chamber UI: shipped with shareable URLs, progress phases, 6 bugfixes
  • Manifest: ADR-0017 (M4), ADR-0018 (RAG), ADR-0019 (Cross-Model), Sprint Plan, CLAUDE.md
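The per-agent timeout enforcement among the Day 1 fixes could look like the following, assuming async agent calls (names and budgets here are illustrative, not the pipeline's actual code):

```python
# Sketch of per-agent timeout enforcement: cap each agent step
# individually instead of relying only on the global pipeline timeout.
import asyncio


async def run_agent(name: str, coro, timeout_s: float):
    """Run one agent step; raise a named error instead of hanging."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        raise TimeoutError(f"{name} exceeded its {timeout_s}s budget")


async def demo():
    async def fast_agent():
        await asyncio.sleep(0.01)
        return "ok"

    return await run_agent("Scout", fast_agent(), timeout_s=1.0)
```

Per-agent budgets localize the failure: a hung Researcher is reported as a Researcher timeout rather than an anonymous global-timeout crash 40 minutes in.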

Day 3 (2026-04-18)

  • M1 Quality Sprint: 5.75/10 → 7.0-7.75 zone (20 rounds, prompt-level ceiling reached)
  • Best run: 7.75/10 (Round 11) — Researcher 8.0, Aggregate 9.0
  • Pipeline fixes:
    • Director streaming
    • parsed_output sanitization
    • Type-safe rendering
    • Partial save
    • Finalization watchdog
    • Orphan cleanup
    • Confidence calibration
    • Nested field propagation
    • Director truncation retry
  • Structural findings: stale worker root cause (+1.0 to score), systemd worker disabled (race condition), prompt ceiling ~7.5±0.5
  • Cost reduction: 2.04/run (-18%)
  • Next: Track C (Grok/DeepSeek Chamber providers), then ADR-0018 RAG + ADR-0019 Cross-Model for 8.0+
  • Manifest expansion: 15 files, 3 ADRs (0020-0022), wiki rebuilt (agent-hub, 404, ignorePatterns)
  • RAG: manifest-search shipped (Qdrant + OpenAI embeddings, 538 chunks, HTTP API on port 8080)
  • Wiki: wiki.synth-nova.com fully rebuilt — agent-hub main page, custom 404, Glossary quick-ref, onboarding system
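The production manifest-search uses Qdrant with OpenAI embeddings over 538 chunks; its retrieval core reduces to "embed chunks, rank by cosine similarity". A dependency-free sketch of just that core, using a toy bag-of-words embedding as a stand-in for the real model:

```python
# Retrieval-core sketch only: the real manifest-search service uses
# Qdrant + OpenAI embeddings; embed() here is a bag-of-words toy.
import math
from collections import Counter


def embed(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Rank chunks by similarity to the query; return the top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

Swapping `embed()` for real embedding vectors and the sort for a Qdrant query gives the shape of the shipped HTTP API on port 8080.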