Module M3: Deliberation Chamber

One-liner: Multi-LLM deliberation with arbitrated quorum answer.

Governance: Operates under the Constitution. Particularly relevant: Law 5 (Human Veto — delegation requires consent), Law 7 (Verify — multi-source verification is a literal implementation of this Law), Law 8 (Tokens Are Capital — multi-LLM queries cost 3-5x a single query).

Product Vision

When a high-uncertainty strategic question appears, asking a single LLM risks single-model bias and missing perspectives. M3 provides a structured way to query multiple independent LLMs, let them exchange views, and arbitrate the most representative answer.

Problem this solves

Founder’s current manual workflow:

  1. Asks Claude → gets answer A
  2. Copies question to Perplexity → gets answer B
  3. Copies to ChatGPT → gets answer C
  4. Manually compares, synthesizes
  5. Returns to Claude with synthesized view for implementation

This is “courier work” — mechanical, error-prone (founder can miss nuances), slow (context-switching costs). Also inconsistent (founder may not query all three every time, introducing bias).

M3 automates the courier work while preserving the epistemic value of multi-model consultation.

Target users

  • Primary: Synth Nova pipeline — agents (CEO, Director, Judge) can propose Chamber session when they face high-uncertainty decisions. Pipeline never auto-delegates — founder approval required.
  • Secondary: Founder direct use — founder can manually open Chamber with a question (“should we prioritize M2 or M3 next quarter?”). Same arbitrated quorum result.
  • Future (v2+): External users — premium feature for product users who want “GPT+Claude+Gemini consensus” on their strategic questions.

Differentiation

  • Consensus vs single-LLM: reduces risk of single-model biases and blind spots
  • Structured arbitration: not just voting or averaging — arbiter explicitly evaluates methodology and evidence quality
  • Auditable: all participating LLM responses preserved, arbiter reasoning documented (Constitution Law 6: No Important Decision Without Trace)
  • Human-gated: founder approves every delegation (Constitution Law 5)

Trigger Model

Delegation requires explicit founder approval. The system never auto-delegates to Chamber.

Two trigger paths:

Path A — Pipeline proposal:

  1. Agent encounters high-uncertainty question (low confidence, conflicting evidence, novel domain)
  2. Agent proposes “Consider Chamber session for question X” to founder
  3. Founder approves or declines
  4. On approval, Chamber session runs

Path B — Founder initiation:

  1. Founder manually opens Chamber UI/CLI
  2. Enters question + context
  3. Chamber session runs
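Both paths converge on the same gate: no session runs without founder approval, with Path B carrying approval implicitly because the founder initiates it. A minimal sketch of that gate — names like `ChamberProposal` and `may_run` are illustrative, not the actual M3 API:

```python
from dataclasses import dataclass

@dataclass
class ChamberProposal:
    question: str
    proposed_by: str           # agent name for Path A, "founder" for Path B
    founder_approved: bool = False

def may_run(proposal: ChamberProposal) -> bool:
    """Law 5 gate: the system never auto-delegates to Chamber.
    Founder-initiated sessions (Path B) carry approval implicitly."""
    if proposal.proposed_by == "founder":
        return True
    return proposal.founder_approved
```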

LLM Panel (v1)

Three independent providers for minimum viable quorum:

| LLM             | Provider  | Role                                             |
|-----------------|-----------|--------------------------------------------------|
| Claude Sonnet 4 | Anthropic | Panelist + Arbiter (dual role — see Arbitration) |
| GPT-4           | OpenAI    | Panelist                                         |
| Gemini Pro      | Google    | Panelist                                         |

Rationale for v1 panel:

  • Three providers → true diversity of training data, alignment approaches, biases
  • All three have stable APIs with reasonable pricing
  • Minimum viable quorum: three panelists allow 2-vs-1 disagreement resolution, while two would simply tie

v2+ candidates (deferred, see Integration Triage Policy ADR-0011):

  • Grok (xAI) — different worldview, but API access limited
  • DeepSeek — cheap but geopolitical considerations
  • Perplexity — strong on web-research questions, but expensive API
  • Mistral, Llama — open-source diversity for specific technical questions

Deliberation Structure

Phase 1: Independent Responses

Each LLM receives the same question + context, independently produces initial answer.

  • No cross-talk between models at this phase
  • Each answer includes: stance, reasoning, confidence, key evidence cited
  • Responses preserved verbatim for audit trail
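Phase 1 maps naturally onto parallel, independent API calls. A sketch under assumed names — `PanelistAnswer` mirrors the fields listed above (stance, reasoning, confidence, evidence), and the adapter callables stand in for real Claude/GPT-4/Gemini clients:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class PanelistAnswer:
    model: str
    stance: str
    reasoning: str
    confidence: float          # 0.0-1.0
    evidence: list

async def phase1(question: str, context: str, adapters: dict) -> list:
    """Query every panelist with the same prompt, in parallel, with no
    cross-talk; answers are returned verbatim for the audit trail."""
    tasks = [adapter(question, context) for adapter in adapters.values()]
    return await asyncio.gather(*tasks)
```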

Phase 2: Cross-Examination (optional)

If initial responses diverge significantly (see Divergence Detection below):

  • Each model sees other models’ responses
  • Each invited to address specific disagreements
  • Responses explicitly labeled: “confirming”, “revising”, “standing by with counter-argument”
  • One round maximum — no infinite loops

Phase 3: Arbitration

Claude Sonnet 4 (arbiter) synthesizes:

  • Areas of consensus (where panelists agree)
  • Areas of disagreement (where they differ, why)
  • Weighted assessment of evidence quality per position
  • Final synthesized answer with explicit acknowledgment of uncertainty
  • Arbiter’s own reasoning for the synthesis

Divergence Detection

Cross-examination phase triggered when:

  • Panelists disagree on stance (e.g., one says “yes”, others “no”)
  • Confidence scores vary by >30%
  • Different key facts cited (indicating different underlying data)
  • Arbiter’s initial read suggests meaningful difference

If panelists converge (all say similar things with similar confidence) — skip Phase 2, arbiter synthesizes directly.
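The first three triggers are mechanical and can be checked directly; the fourth (arbiter's initial read) is a judgment call and is left out of this sketch. Thresholds follow the spec (>30% confidence spread); the `Answer` shape is illustrative:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    stance: str
    confidence: float          # 0.0-1.0
    evidence: frozenset        # key facts cited

def needs_cross_examination(answers: list) -> bool:
    """Trigger Phase 2 when panelists mechanically diverge."""
    if len({a.stance for a in answers}) > 1:
        return True                                # stance disagreement
    confs = [a.confidence for a in answers]
    if max(confs) - min(confs) > 0.30:
        return True                                # confidence varies by >30%
    if len({a.evidence for a in answers}) > 1:
        return True                                # different key facts cited
    return False                                   # converged: skip Phase 2
```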

Arbitration Details

Arbiter = Claude Sonnet 4. Same model as synth-brain Judge agent (consistency, familiar calibration).

Why not a separate fourth LLM:

  • Adds cost (4th API call per session)
  • Adds complexity (fallback chains if one arbiter fails)
  • Claude as arbiter evaluates methodology and evidence, not content authorship — conflict of interest low
  • v2 can introduce “second opinion arbiter” if methodology review shows Claude bias in arbitration

Arbiter’s task, explicitly:

  • Does NOT vote on “which LLM is right”
  • DOES identify where evidence is strongest
  • DOES distinguish genuine disagreement from surface-level wording differences
  • DOES return uncertainty honestly — “all three disagree with low confidence” is a valid answer
  • DOES preserve minority viewpoints in output (not just majority wins)
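One plausible way to enforce these rules is to encode them directly in the arbiter's system prompt. A hypothetical sketch — the template text and `build_arbiter_prompt` helper are assumptions, not the actual prompt M3 ships:

```python
ARBITER_PROMPT = """\
You are the arbiter of a multi-LLM deliberation. Rules:
- Do NOT vote on which model is right.
- Identify where evidence is strongest.
- Distinguish genuine disagreement from surface-level wording differences.
- Report uncertainty honestly; "all panelists disagree with low confidence"
  is a valid answer.
- Preserve minority viewpoints in your synthesis.

Question: {question}

Panelist responses (verbatim):
{responses}
"""

def build_arbiter_prompt(question: str, responses: dict) -> str:
    """responses: {model_name: verbatim_response_text}"""
    body = "\n\n".join(f"### {model}\n{text}" for model, text in responses.items())
    return ARBITER_PROMPT.format(question=question, responses=body)
```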

Report Structure

Each Chamber session produces:

Question: [original question]
Context provided: [context fed to panelists]

## Panelist Responses (verbatim)
### Claude Sonnet 4
[response]

### GPT-4
[response]

### Gemini Pro
[response]

## Divergence Analysis
- Consensus areas: ...
- Disagreement areas: ...

## Cross-Examination (if triggered)
[second-round responses]

## Arbiter Synthesis
[Claude Sonnet 4 arbiter's synthesized answer]

## Confidence Assessment
- Synthesis confidence: X/10
- Dissent level: low/medium/high
- Recommended action level: proceed / proceed with caveats / require further investigation

Integration with M1 / M2

M3 can be invoked from:

  • M1 pipeline — when Research confidence < threshold, or Financial Modeler sees contradictions, or Judge gives FAIL on architectural decision
  • M2 pipeline — when Team Assessment has ambiguous domain-fit signals
  • Direct founder request — manual Chamber session, standalone

In all cases, trigger is explicit founder approval. Pipeline agents can propose but not commit.

Cost Model

Per session (v1 panel):

  • Claude Sonnet 4 (panelist): ~$0.05-0.10
  • GPT-4 (panelist): ~$0.05-0.15
  • Gemini Pro (panelist): ~$0.02-0.05
  • Claude Sonnet 4 (arbiter): ~$0.10-0.20 (larger context — sees all panelist responses)
  • Cross-examination phase (if triggered): +50% to panelist costs

Typical session: $0.50-1.20

Budget thresholds per Constitution Law 8 / DecisionRights:

  • Session < $1: low approval (founder notified, session auto-proceeds)
  • Session $1-3: standard approval required
  • Session > $3: unusual; explicit approval required and necessity reviewed
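The threshold logic above is simple enough to pin down in code. A minimal sketch — boundary handling (exactly $1 and $3 fall into the stricter tier) is an assumption, as is the function name:

```python
def approval_level(estimated_cost_usd: float) -> str:
    """Map an estimated session cost to the Law 8 approval tier."""
    if estimated_cost_usd < 1.0:
        return "low"         # founder notified, session auto-proceeds
    if estimated_cost_usd <= 3.0:
        return "standard"    # explicit founder approval required
    return "review"          # unusual: approve and review necessity
```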

v1 Scope

Included:

  • Three-LLM panel (Claude, GPT-4, Gemini)
  • Two-phase deliberation (independent + optional cross-examination)
  • Claude Sonnet 4 arbitration
  • Text output report
  • CLI interface for direct founder use
  • Programmatic API for pipeline agents to propose sessions

Excluded (v2+):

  • Web UI for Chamber sessions (use CLI/Streamlit basic form for v1)
  • Additional LLMs (Grok, DeepSeek, Perplexity, open-source models)
  • Second-opinion arbiter
  • Multi-round cross-examination (beyond one round)
  • Persistent Chamber “memory” across sessions
  • Real-time streaming of panelist thinking

v2 Vision (post-v1)

v1 ships Chamber as a standalone CLI (and programmatic API) for founder-initiated or pipeline-proposed sessions, always gated by explicit founder approval per session. v2 evolves Chamber from a tool invoked beside the Investment Navigator into a decision mechanism embedded within the Navigator’s UX and pipeline. This section captures the target state, as ratified in ADR-0016-chamber-v2-vision.

Chamber as embedded decision mechanism

In v2, Chamber is not a separate CLI the founder remembers to open. It is the resolution path the Investment Navigator falls into whenever a pipeline question meets criticality thresholds. Most pipeline questions never reach Chamber (Level 1 auto-resolve). The ones that do are surfaced in-context with all upstream Researcher / Financial Modeler / Judge evidence already attached — the founder does not re-formulate the question.

Two trigger modes

  • Automatic trigger — the pipeline detects uncertainty per CriticalityPolicy (Level 2 or Level 3 criteria fire) and invokes Chamber without needing a manual step. Level 2 runs without founder. Level 3 blocks the pipeline and requests founder participation.
  • Manual trigger — the founder opens Chamber from the Navigator UI with a specific question or a pipeline-stage result the founder wants to stress-test. This path behaves as a Level 3 session (founder-initiated → founder participates).

Founder-as-participant model

In v1, the founder’s role is binary: approve the session, read the output, take action. In v2, the founder can participate inside the session itself in one of four modes per CriticalityPolicy §Level 3:

  • Approver — accept or reject a clear quorum answer
  • Challenger — send agents back to re-check specific aspects with new data
  • Contributor — inject insider knowledge as a fourth panelist voice
  • Verifier — state a claim for agents to fact-check before it influences the quorum

Operational rules for founder input structure and arbiter behavior are in MultiLLMDeliberationPolicy §Founder Participation Rules.

Three-level criticality system

Every pipeline question is classified into one of three levels at decision time. See CriticalityPolicy for full definitions. Summary:

  • Level 1 — agent resolves alone (no Chamber)
  • Level 2 — Chamber auto-quorum runs without founder; auto-accept if arbiter confidence ≥ 8/10, auto-escalate to Level 3 if < 8/10
  • Level 3 — full Chamber with founder participation; always required for Go/No-Go
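The routing above can be sketched as a single dispatch function. Names and return labels are illustrative (the real logic lives in CriticalityPolicy); the 8/10 auto-accept threshold is from the spec:

```python
def route(level: int, arbiter_confidence=None) -> str:
    """Route a pipeline question per the three-level criticality summary.
    arbiter_confidence is on the spec's 0-10 scale, known only after a
    Level 2 auto-quorum has run."""
    if level == 1:
        return "agent_resolves"                    # no Chamber
    if level == 2:
        if arbiter_confidence is None:
            return "run_auto_quorum"               # Chamber without founder
        return ("auto_accept" if arbiter_confidence >= 8
                else "escalate_to_level_3")        # < 8/10 escalates
    return "full_chamber_with_founder"             # Level 3; required for Go/No-Go
```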

Go/No-Go as the ultimate output

When Chamber is invoked for a niche decision, there are exactly three valid outcomes:

  • GO — enter the niche; pipeline continues to execution planning
  • NO-GO — reject the niche; pipeline halts; rejection reasoning preserved
  • NEED MORE DATA — insufficient evidence; agents receive a specific data-gathering assignment; quorum re-runs with new data

NEED MORE DATA is a first-class outcome, not a deferral — it prevents both false-GO (acting on bad inputs) and false-NO-GO (rejecting on insufficient inputs). See CriticalityPolicy §Go/No-Go for the BADs Russia example.
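Treating NEED MORE DATA as a first-class outcome is easiest to see as a closed enum with an explicit next step for each value. A hypothetical sketch (names assumed, not the actual pipeline API):

```python
from enum import Enum

class NicheDecision(Enum):
    GO = "go"                           # enter niche; continue to execution planning
    NO_GO = "no_go"                     # reject; halt; preserve rejection reasoning
    NEED_MORE_DATA = "need_more_data"   # first-class outcome, not a deferral

def next_step(decision: NicheDecision) -> str:
    """Exactly three valid outcomes; anything else is a bug, not a fourth state."""
    return {
        NicheDecision.GO: "continue_pipeline",
        NicheDecision.NO_GO: "halt_and_record",
        NicheDecision.NEED_MORE_DATA: "assign_data_gathering_then_rerun_quorum",
    }[decision]
```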

Streamlit UI integration plan

v2 surfaces Chamber as a tab in app.synth-nova.com (the Streamlit app that hosts the Investment Navigator). The tab renders:

  • The question and auto-populated context (upstream pipeline artifacts)
  • Each panelist’s response as it streams in (stance, reasoning, confidence, evidence)
  • The divergence analysis
  • A founder-input field for Level 3 sessions (role selector: Approver / Challenger / Contributor / Verifier; structured entry for Contributor/Verifier modes)
  • The arbiter synthesis with explicit confidence and dissent level
  • The rendered report inline, with links to the persisted session artifact

Level 2 sessions render as read-only post-factum review (founder can open any completed Level 2 session and see the full trace).

Pipeline integration

Pipeline agents (Scout, Researcher, Financial Modeler, Judge) evaluate criticality at every decision point per CriticalityPolicy §Criticality Assessment Procedure. When Level 2 or Level 3 criteria are met, the agent invokes Chamber via EscalationPolicy Trigger 7. Agents do not independently decide to “skip” Chamber when criteria are met — criticality assessment is enforceable, not advisory.

Context auto-population

When Chamber is triggered from the pipeline, the session context is assembled automatically from:

  • Researcher findings (sources, claims, confidence per claim)
  • Financial Model data (inputs, assumptions, output ranges, ROI calculations)
  • Judge assessment (score, dissent points, failure modes flagged)
  • Prior Chamber sessions on the same niche (if any) for continuity

The founder does not manually formulate the context. This removes “courier work” end-to-end: v1 removed the courier work between LLMs; v2 removes the courier work between pipeline stages and Chamber.
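The auto-population step amounts to concatenating persisted upstream artifacts into one context document. A sketch under assumed input shapes (the real artifacts are structured pipeline outputs, not plain strings):

```python
def assemble_context(researcher: str, financial: str, judge: str,
                     prior_sessions=()) -> str:
    """v2 context auto-population: the founder never re-formulates context;
    it is assembled from upstream pipeline artifacts."""
    parts = [
        "## Researcher findings\n" + researcher,
        "## Financial model\n" + financial,
        "## Judge assessment\n" + judge,
    ]
    # Prior Chamber sessions on the same niche (if any), for continuity.
    for i, session in enumerate(prior_sessions, 1):
        parts.append(f"## Prior Chamber session {i}\n{session}")
    return "\n\n".join(parts)
```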

Inputs

Required:

  • Question (text)
  • Context (text or markdown, up to ~10K tokens)
  • Approval (founder confirmation for delegation)

Optional:

  • Specific LLMs to include/exclude (default: all v1 panel)
  • Max rounds of cross-examination (default: 1)
  • Preferred arbiter (default: Claude Sonnet 4)
  • Output format preference (default: structured markdown)
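The required and optional inputs above map cleanly onto a request object with defaults. Field names and default identifiers are illustrative, not the actual M3 API:

```python
from dataclasses import dataclass

@dataclass
class ChamberRequest:
    question: str                      # required
    context: str                       # required; text/markdown, up to ~10K tokens
    founder_approved: bool             # required (Law 5: explicit consent)
    panel: tuple = ("claude-sonnet-4", "gpt-4", "gemini-pro")  # default: full v1 panel
    max_cross_exam_rounds: int = 1     # one round maximum in v1
    arbiter: str = "claude-sonnet-4"
    output_format: str = "structured-markdown"
```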

Timeline Placeholder

Implementation sprint: parallel to M2 implementation OR after M2. M3 has lower dependency on M1 internals than M2 does — Chamber doesn’t need to integrate with Scout/Researcher/FinancialModeler. Chamber is more standalone.

Estimated effort:

  • Panelist adapters (Claude, GPT-4, Gemini): ~2-3 days (API integrations)
  • Divergence detection: ~1 day
  • Cross-examination orchestration: ~1 day
  • Arbiter integration: ~1 day (Claude API already in place)
  • Report generation: ~1 day
  • CLI interface: ~0.5 day
  • Integration hooks for M1/M2 pipelines: ~1 day
  • Testing + Judge calibration (arbiter quality): ~2 days

Total: ~1.5-2 weeks of work once the sprint begins.

Success Metrics (for M3 rollout)

  • Arbiter quality: founder agrees with arbiter synthesis in ≥80% of sessions (human calibration)
  • Divergence capture: when panelists actually disagree, arbiter reports it (not flattening to false consensus)
  • Cost: ≤ $1.00 per typical session
  • Duration: ≤ 3 minutes per session (parallel API calls)
  • Useful disagreement rate: sessions where minority view was valuable (subjective but tracked)

Cross-References