ADR-0014: M3 Deliberation Chamber

Status

Accepted

Context

The founder currently performs manual courier work between LLMs for strategic questions: ask Claude, then Perplexity for a second opinion, then ChatGPT, synthesize the answers by hand, and return to Claude for implementation. This workflow is slow, error-prone, inconsistent (the founder may skip some LLMs), and captures no audit trail. At the same time, single-LLM answers risk model-specific biases that can propagate into Synth Nova decisions. A structured multi-LLM deliberation module would automate the courier work while preserving the epistemic value of diverse perspectives.

M3 is the literal implementation of Constitution Law 7 (Verify — trust no single source) for strategic questions, and must operate within Law 5 (Human Veto) and Law 8 (Tokens are Capital), because multi-LLM sessions cost 3-5x as much as a single query.

Decision

Add Module M3 — Deliberation Chamber to the Synth Nova roadmap. M3 provides structured multi-LLM deliberation with arbitrated synthesis.

Core commitments:

  1. v1 panel: Claude Sonnet 4, GPT-4, Gemini Pro. Three independent providers for minimum viable quorum.
  2. Arbiter: Claude Sonnet 4 (shared with existing Judge agent in synth-brain). Conflict-of-interest mitigations defined in MultiLLMDeliberationPolicy; arbiter evaluates methodology and evidence, not content authorship.
  3. Trigger model: pipeline agents or founder may propose; founder must explicitly approve each session (Constitution Law 5). System never auto-delegates.
  4. Two-phase deliberation: independent Phase 1 responses, then an optional one-round cross-examination if divergence is detected. No infinite loops.
  5. Honest uncertainty: arbiter may return “no consensus — founder review recommended” as a valid outcome. Forced synthesis when panelists genuinely disagree is prohibited.
  6. Strategic document and operational policy are ratified now; the implementation sprint runs in parallel with M2 or after M2, depending on capacity.
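The flow described in commitments 3-5 can be sketched as follows. This is a minimal illustration, not the actual Synth Nova implementation: `run_session`, `ask`, `diverges`, and `arbitrate` are hypothetical names, and the callers would supply real provider adapters in their place.

```python
from dataclasses import dataclass

@dataclass
class Response:
    model: str   # panelist identifier
    answer: str  # verbatim response, preserved for the audit trail

def run_session(question, panel, ask, diverges, arbitrate, approved):
    """One deliberation session: founder gate, Phase 1, at most one Phase 2, arbitration."""
    # Law 5: the founder must explicitly approve each session; never auto-delegate.
    if not approved:
        return {"status": "not_approved", "verdict": None, "transcript": []}

    # Phase 1: independent answers; no panelist sees another's response.
    round1 = [Response(m, ask(m, question)) for m in panel]
    transcript = [round1]

    # Phase 2: at most ONE cross-examination round, only on divergence. No loops.
    if diverges(round1):
        round2 = []
        for r in round1:
            peers = "\n".join(o.answer for o in round1 if o.model != r.model)
            prompt = f"{question}\n\nPeer answers:\n{peers}\n\nRevise or defend your answer."
            round2.append(Response(r.model, ask(r.model, prompt)))
        transcript.append(round2)

    # The arbiter may decline to synthesize: None means
    # "no consensus — founder review recommended", a valid outcome.
    verdict = arbitrate(question, transcript)
    status = "synthesized" if verdict is not None else "no_consensus"
    return {"status": status, "verdict": verdict, "transcript": transcript}
```

Note that the verbatim `transcript` is returned alongside the verdict, so the audit-trail requirement (Law 6) falls out of the control flow rather than being bolted on.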

Alternatives Considered

Option A: Continue manual courier workflow

  • Pros: zero infrastructure cost; fully under founder control.
  • Cons: slow, error-prone, no audit trail, inconsistent (founder may skip LLMs). Doesn’t scale as pipeline matures.

Option B: Rely on single LLM (Claude) for all strategic questions

  • Pros: simplest; cheapest; no new integrations.
  • Cons: single-model bias risk; reduces validation; violates spirit of Constitution Law 7 (Verify). A single model’s blind spot becomes a systemic blind spot.

Option C: Ensemble voting (majority wins)

  • Pros: simple aggregation; no arbiter cost.
  • Cons: epistemically weak. Averaging or majority-voting LLM outputs loses the signal — minority view may be correct, and reasoning quality matters more than vote count.

Option D: Use external service (e.g., Consensus, LMSYS)

  • Pros: no build cost.
  • Cons: external dependency; per-call cost; lack of customization for Synth Nova integration points (CEO/Director/Judge pipeline hooks); audit trail lives outside our observability stack.

Option E: Structured deliberation with arbitrated synthesis ← chosen

  • Pros: preserves verbatim panelist responses (audit trail); arbiter evaluates evidence quality, not vote count; honest-uncertainty outcomes allowed; integrates with M1/M2 pipelines; human-gated per Law 5.
  • Cons: 3-5x cost vs single LLM; arbiter conflict-of-interest (Claude-as-both-panelist-and-arbiter) requires explicit mitigation; additional API account setup (OpenAI, Google).
  • Why chosen: directly replaces manual courier work, operationalizes Law 7 for strategic decisions, and establishes an auditable multi-model consultation primitive that M1 and M2 can invoke.

Consequences

Positive:

  • New strategic asset in manifest: 07-Roadmap/Deliberation-Chamber-Module.md.
  • New operational policy: 05-Rules/MultiLLMDeliberationPolicy.md.
  • Multi-source validation primitive available to M1 (Research confidence < threshold, Financial Modeler contradictions, Judge FAIL) and M2 (ambiguous team-fit signals).
  • Audit trail per Law 6: verbatim panelist responses, arbiter reasoning, founder’s subsequent action all preserved.
  • Potential evolution path: M3 can become external product feature (v2+) if internal use demonstrates value.
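The M1/M2 trigger conditions listed above could be expressed as a single proposal hook. The function name, the signal keys, and the 0.7 confidence threshold below are illustrative assumptions, not values from the policy; a positive result only proposes a session, which the founder must still approve (Law 5).

```python
def should_propose_deliberation(signal: dict) -> bool:
    """Return True if a pipeline signal warrants PROPOSING an M3 session."""
    CONF_THRESHOLD = 0.7  # illustrative; the real threshold lives in the policy
    return (
        signal.get("research_confidence", 1.0) < CONF_THRESHOLD  # M1 Research
        or signal.get("financial_contradiction", False)          # M1 Financial Modeler
        or signal.get("judge_verdict") == "FAIL"                 # M1 Judge
        or signal.get("team_fit") == "ambiguous"                 # M2 Team Assessment
    )
```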

Negative / Trade-offs:

  • New infrastructure: adapters for GPT-4 (OpenAI API) and Gemini (Google API) — new API keys, new cost tracking, new failure modes.
  • Per-session cost ~100 — non-zero recurring expense.
  • Arbiter conflict-of-interest requires active monitoring (if Claude-panelist view wins >50% of arbitrated cases, re-evaluate arbiter choice).
  • Founder approval friction per session (intentional per Law 5, but real cost in flow).

Mitigations:

  • Rate limits and cost caps defined in MultiLLMDeliberationPolicy (10 sessions/day, 100/month budget).
  • Conflict-of-interest mitigations (explicit arbiter system prompt, self-check, periodic founder review).
  • Integration Triage Policy (IntegrationTriagePolicy) governs any v2+ panelist additions (Grok, DeepSeek, Perplexity, Mistral, Llama).
  • Honest-uncertainty mandate prevents false-consensus outputs.
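The rate and budget caps could be enforced with bookkeeping as simple as the sketch below, using the figures stated in this ADR (10 sessions/day, 100/month). The class is illustrative only; counter-reset logic at day and month boundaries is omitted for brevity.

```python
class DeliberationBudget:
    """Sketch of the MultiLLMDeliberationPolicy caps; not the real policy code."""
    DAILY_SESSION_CAP = 10
    MONTHLY_BUDGET = 100  # same currency unit as the policy document

    def __init__(self):
        self.sessions_today = 0
        self.spent_this_month = 0.0

    def can_start(self, estimated_cost: float) -> bool:
        # Refuse any session that would break either cap.
        return (self.sessions_today < self.DAILY_SESSION_CAP
                and self.spent_this_month + estimated_cost <= self.MONTHLY_BUDGET)

    def record(self, actual_cost: float) -> None:
        # Called after a session completes; feeds the next can_start() check.
        self.sessions_today += 1
        self.spent_this_month += actual_cost
```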

Follow-ups

  • OpenAI API account setup (billing separate from Anthropic).
  • Google Gemini API account setup.
  • Detailed agent specs during M3 implementation sprint (panelist adapters, divergence detector, arbiter orchestrator).
  • Arbiter calibration process (founder reviews first 10-20 sessions for arbitration quality).
  • CLI-only v1 vs Streamlit integration decision.
  • Per-agent ADR for each panelist adapter during implementation.
  • Review: if Claude-panelist view wins >50% of arbitrated cases, re-evaluate arbiter choice (avoid Claude echo chamber).
  • Integration hooks spec for M1 (Research, Financial Modeler, Judge) and M2 (Team Assessment) triggers.

References