Module M3: Deliberation Chamber

One-liner: Multi-LLM deliberation with arbitrated quorum answer.

Governance: Operates under the Constitution. Particularly relevant: Law 5 (Human Veto — delegation requires consent), Law 7 (Verify — multi-source verification is a literal implementation of this Law), Law 8 (Tokens Are Capital — multi-LLM queries cost 3-5x a single query).

Product Vision

When a high-uncertainty strategic question appears, asking a single LLM risks single-model bias and missing perspectives. M3 provides a structured way to query multiple independent LLMs, let them exchange views, and arbitrate the most representative answer.

Problem this solves

Founder’s current manual workflow:

  1. Asks Claude → gets answer A
  2. Copies question to Perplexity → gets answer B
  3. Copies to ChatGPT → gets answer C
  4. Manually compares, synthesizes
  5. Returns to Claude with synthesized view for implementation

This is “courier work” — mechanical, error-prone (founder can miss nuances), slow (context-switching costs). Also inconsistent (founder may not query all three every time, introducing bias).

M3 automates the courier work while preserving the epistemic value of multi-model consultation.

Target users

  • Primary: Synth Nova pipeline — agents (CEO, Director, Judge) can propose Chamber session when they face high-uncertainty decisions. Pipeline never auto-delegates — founder approval required.
  • Secondary: Founder direct use — founder can manually open Chamber with a question (“should we prioritize M2 or M3 next quarter?”). Same arbitrated quorum result.
  • Future (v2+): External users — premium feature for product users who want “GPT+Claude+Gemini consensus” on their strategic questions.

Differentiation

  • Consensus vs single-LLM: reduces risk of single-model biases and blind spots
  • Structured arbitration: not just voting or averaging — arbiter explicitly evaluates methodology and evidence quality
  • Auditable: all participating LLM responses preserved, arbiter reasoning documented (Constitution Law 6: No Important Decision Without Trace)
  • Human-gated: founder approves every delegation (Constitution Law 5)

Trigger Model

Delegation requires explicit founder approval. The system never auto-delegates to Chamber.

Two trigger paths:

Path A — Pipeline proposal:

  1. Agent encounters high-uncertainty question (low confidence, conflicting evidence, novel domain)
  2. Agent proposes “Consider Chamber session for question X” to founder
  3. Founder approves or declines
  4. On approval, Chamber session runs

Path B — Founder initiation:

  1. Founder manually opens Chamber UI/CLI
  2. Enters question + context
  3. Chamber session runs
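Both paths converge on the same gate: no session runs without founder approval, with Path B carrying approval implicitly because the founder initiates it. A minimal sketch of that gate — names like `ChamberProposal` and `may_run` are illustrative, not the actual M3 API:

```python
from dataclasses import dataclass

@dataclass
class ChamberProposal:
    question: str
    proposed_by: str           # agent name for Path A, "founder" for Path B
    founder_approved: bool = False

def may_run(proposal: ChamberProposal) -> bool:
    """Law 5 gate: the system never auto-delegates to Chamber.
    Founder-initiated sessions (Path B) carry approval implicitly."""
    if proposal.proposed_by == "founder":
        return True
    return proposal.founder_approved
```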

LLM Panel (v1)

Three independent providers for minimum viable quorum:

| LLM             | Provider  | Role                                             |
|-----------------|-----------|--------------------------------------------------|
| Claude Sonnet 4 | Anthropic | Panelist + Arbiter (dual role — see Arbitration) |
| GPT-4           | OpenAI    | Panelist                                         |
| Gemini Pro      | Google    | Panelist                                         |

Rationale for v1 panel:

  • Three providers → true diversity of training data, alignment approaches, biases
  • All three have stable APIs with reasonable pricing
  • Minimum viable quorum: three panelists allow 2-vs-1 disagreement resolution, while two would simply tie

v2+ candidates (deferred, see Integration Triage Policy ADR-0011):

  • Grok (xAI) — different worldview, but API access limited
  • DeepSeek — cheap but geopolitical considerations
  • Perplexity — strong on web-research questions, but expensive API
  • Mistral, Llama — open-source diversity for specific technical questions

Deliberation Structure

Phase 1: Independent Responses

Each LLM receives the same question + context, independently produces initial answer.

  • No cross-talk between models at this phase
  • Each answer includes: stance, reasoning, confidence, key evidence cited
  • Responses preserved verbatim for audit trail
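Phase 1 maps naturally onto parallel, independent API calls. A sketch under assumed names — `PanelistAnswer` mirrors the fields listed above (stance, reasoning, confidence, evidence), and the adapter callables stand in for real Claude/GPT-4/Gemini clients:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class PanelistAnswer:
    model: str
    stance: str
    reasoning: str
    confidence: float          # 0.0-1.0
    evidence: list

async def phase1(question: str, context: str, adapters: dict) -> list:
    """Query every panelist with the same prompt, in parallel, with no
    cross-talk; answers are returned verbatim for the audit trail."""
    tasks = [adapter(question, context) for adapter in adapters.values()]
    return await asyncio.gather(*tasks)
```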

Phase 2: Cross-Examination (optional)

If initial responses diverge significantly (see Divergence Detection below):

  • Each model sees other models’ responses
  • Each invited to address specific disagreements
  • Responses explicitly labeled: “confirming”, “revising”, “standing by with counter-argument”
  • One round maximum — no infinite loops

Phase 3: Arbitration

Claude Sonnet 4 (arbiter) synthesizes:

  • Areas of consensus (where panelists agree)
  • Areas of disagreement (where they differ, why)
  • Weighted assessment of evidence quality per position
  • Final synthesized answer with explicit acknowledgment of uncertainty
  • Arbiter’s own reasoning for the synthesis

Divergence Detection

Cross-examination phase triggered when:

  • Panelists disagree on stance (e.g., one says “yes”, others “no”)
  • Confidence scores vary by >30%
  • Different key facts cited (indicating different underlying data)
  • Arbiter’s initial read suggests meaningful difference

If panelists converge (all say similar things with similar confidence) — skip Phase 2, arbiter synthesizes directly.
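The first three triggers are mechanical and can be checked directly; the fourth (arbiter's initial read) is a judgment call and is left out of this sketch. Thresholds follow the spec (>30% confidence spread); the `Answer` shape is illustrative:

```python
from dataclasses import dataclass

@dataclass
class Answer:
    stance: str
    confidence: float          # 0.0-1.0
    evidence: frozenset        # key facts cited

def needs_cross_examination(answers: list) -> bool:
    """Trigger Phase 2 when panelists mechanically diverge."""
    if len({a.stance for a in answers}) > 1:
        return True                                # stance disagreement
    confs = [a.confidence for a in answers]
    if max(confs) - min(confs) > 0.30:
        return True                                # confidence varies by >30%
    if len({a.evidence for a in answers}) > 1:
        return True                                # different key facts cited
    return False                                   # converged: skip Phase 2
```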

Arbitration Details

Arbiter = Claude Sonnet 4. Same model as synth-brain Judge agent (consistency, familiar calibration).

Why not a separate fourth LLM:

  • Adds cost (4th API call per session)
  • Adds complexity (fallback chains if one arbiter fails)
  • Claude as arbiter evaluates methodology and evidence, not content authorship — conflict of interest low
  • v2 can introduce “second opinion arbiter” if methodology review shows Claude bias in arbitration

Arbiter’s task, explicitly:

  • Does NOT vote on “which LLM is right”
  • DOES identify where evidence is strongest
  • DOES distinguish genuine disagreement from surface-level wording differences
  • DOES return uncertainty honestly — “all three disagree with low confidence” is a valid answer
  • DOES preserve minority viewpoints in output (not just majority wins)
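One plausible way to enforce these rules is to encode them directly in the arbiter's system prompt. A hypothetical sketch — the template text and `build_arbiter_prompt` helper are assumptions, not the actual prompt M3 ships:

```python
ARBITER_PROMPT = """\
You are the arbiter of a multi-LLM deliberation. Rules:
- Do NOT vote on which model is right.
- Identify where evidence is strongest.
- Distinguish genuine disagreement from surface-level wording differences.
- Report uncertainty honestly; "all panelists disagree with low confidence"
  is a valid answer.
- Preserve minority viewpoints in your synthesis.

Question: {question}

Panelist responses (verbatim):
{responses}
"""

def build_arbiter_prompt(question: str, responses: dict) -> str:
    """responses: {model_name: verbatim_response_text}"""
    body = "\n\n".join(f"### {model}\n{text}" for model, text in responses.items())
    return ARBITER_PROMPT.format(question=question, responses=body)
```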

Report Structure

Each Chamber session produces:

Question: [original question]
Context provided: [context fed to panelists]

## Panelist Responses (verbatim)
### Claude Sonnet 4
[response]

### GPT-4
[response]

### Gemini Pro
[response]

## Divergence Analysis
- Consensus areas: ...
- Disagreement areas: ...

## Cross-Examination (if triggered)
[second-round responses]

## Arbiter Synthesis
[Claude Sonnet 4 arbiter's synthesized answer]

## Confidence Assessment
- Synthesis confidence: X/10
- Dissent level: low/medium/high
- Recommended action level: proceed / proceed with caveats / require further investigation

Integration with M1 / M2

M3 can be invoked from:

  • M1 pipeline — when Research confidence < threshold, or Financial Modeler sees contradictions, or Judge gives FAIL on architectural decision
  • M2 pipeline — when Team Assessment has ambiguous domain-fit signals
  • Direct founder request — manual Chamber session, standalone

In all cases, trigger is explicit founder approval. Pipeline agents can propose but not commit.

Cost Model

Per session (v1 panel):

  • Claude Sonnet 4 (panelist): ~$0.05-0.10
  • GPT-4 (panelist): ~$0.05-0.15
  • Gemini Pro (panelist): ~$0.02-0.05
  • Claude Sonnet 4 (arbiter): ~$0.10-0.20 (larger context — sees all panelist responses)
  • Cross-examination phase (if triggered): +50% to panelist costs

Typical session: $0.50-1.20

Budget thresholds per Constitution Law 8 / DecisionRights:

  • Session < $1: low approval (founder notified, session auto-proceeds)
  • Session $1-3: standard approval required
  • Session > $3: unusual; explicit approval required and necessity reviewed
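The threshold logic above is simple enough to pin down in code. A minimal sketch — boundary handling (exactly $1 and $3 fall into the stricter tier) is an assumption, as is the function name:

```python
def approval_level(estimated_cost_usd: float) -> str:
    """Map an estimated session cost to the Law 8 approval tier."""
    if estimated_cost_usd < 1.0:
        return "low"         # founder notified, session auto-proceeds
    if estimated_cost_usd <= 3.0:
        return "standard"    # explicit founder approval required
    return "review"          # unusual: approve and review necessity
```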

v1 Scope

Included:

  • Three-LLM panel (Claude, GPT-4, Gemini)
  • Two-phase deliberation (independent + optional cross-examination)
  • Claude Sonnet 4 arbitration
  • Text output report
  • CLI interface for direct founder use
  • Programmatic API for pipeline agents to propose sessions

Excluded (v2+):

  • Web UI for Chamber sessions (use CLI/Streamlit basic form for v1)
  • Additional LLMs (Grok, DeepSeek, Perplexity, open-source models)
  • Second-opinion arbiter
  • Multi-round cross-examination (beyond one round)
  • Persistent Chamber “memory” across sessions
  • Real-time streaming of panelist thinking

v2 Vision (post-v1)

v1 ships Chamber as a standalone CLI (and programmatic API) for founder-initiated or pipeline-proposed sessions, always gated by explicit founder approval per session. v2 evolves Chamber from a tool invoked beside the Investment Navigator into a decision mechanism embedded within the Navigator’s UX and pipeline. This section captures the target state, as ratified in ADR-0016-chamber-v2-vision.

Chamber as embedded decision mechanism

In v2, Chamber is not a separate CLI the founder remembers to open. It is the resolution path the Investment Navigator falls into whenever a pipeline question meets criticality thresholds. Most pipeline questions never reach Chamber (Level 1 auto-resolve). The ones that do are surfaced in-context with all upstream Researcher / Financial Modeler / Judge evidence already attached — the founder does not re-formulate the question.

Two trigger modes

  • Automatic trigger — the pipeline detects uncertainty per CriticalityPolicy (Level 2 or Level 3 criteria fire) and invokes Chamber without needing a manual step. Level 2 runs without founder. Level 3 blocks the pipeline and requests founder participation.
  • Manual trigger — the founder opens Chamber from the Navigator UI with a specific question or a pipeline-stage result the founder wants to stress-test. This path behaves as a Level 3 session (founder-initiated → founder participates).

Founder-as-participant model

In v1, the founder’s role is binary: approve the session, read the output, take action. In v2, the founder can participate inside the session itself in one of four modes per CriticalityPolicy §Level 3:

  • Approver — accept or reject a clear quorum answer
  • Challenger — send agents back to re-check specific aspects with new data
  • Contributor — inject insider knowledge as a fourth panelist voice
  • Verifier — state a claim for agents to fact-check before it influences the quorum

Operational rules for founder input structure and arbiter behavior are in MultiLLMDeliberationPolicy §Founder Participation Rules.

Three-level criticality system

Every pipeline question is classified into one of three levels at decision time. See CriticalityPolicy for full definitions. Summary:

  • Level 1 — agent resolves alone (no Chamber)
  • Level 2 — Chamber auto-quorum runs without founder; auto-accept if arbiter confidence ≥ 8/10, auto-escalate to Level 3 if < 8/10
  • Level 3 — full Chamber with founder participation; always required for Go/No-Go
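The routing above can be sketched as a single dispatch function. Names and return labels are illustrative (the real logic lives in CriticalityPolicy); the 8/10 auto-accept threshold is from the spec:

```python
def route(level: int, arbiter_confidence=None) -> str:
    """Route a pipeline question per the three-level criticality summary.
    arbiter_confidence is on the spec's 0-10 scale, known only after a
    Level 2 auto-quorum has run."""
    if level == 1:
        return "agent_resolves"                    # no Chamber
    if level == 2:
        if arbiter_confidence is None:
            return "run_auto_quorum"               # Chamber without founder
        return ("auto_accept" if arbiter_confidence >= 8
                else "escalate_to_level_3")        # < 8/10 escalates
    return "full_chamber_with_founder"             # Level 3; required for Go/No-Go
```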

Go/No-Go as the ultimate output

When Chamber is invoked for a niche decision, there are exactly three valid outcomes:

  • GO — enter the niche; pipeline continues to execution planning
  • NO-GO — reject the niche; pipeline halts; rejection reasoning preserved
  • NEED MORE DATA — insufficient evidence; agents receive a specific data-gathering assignment; quorum re-runs with new data

NEED MORE DATA is a first-class outcome, not a deferral — it prevents both false-GO (acting on bad inputs) and false-NO-GO (rejecting on insufficient inputs). See CriticalityPolicy §Go/No-Go for the BADs Russia example.
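Treating NEED MORE DATA as a first-class outcome is easiest to see as a closed enum with an explicit next step for each value. A hypothetical sketch (names assumed, not the actual pipeline API):

```python
from enum import Enum

class NicheDecision(Enum):
    GO = "go"                           # enter niche; continue to execution planning
    NO_GO = "no_go"                     # reject; halt; preserve rejection reasoning
    NEED_MORE_DATA = "need_more_data"   # first-class outcome, not a deferral

def next_step(decision: NicheDecision) -> str:
    """Exactly three valid outcomes; anything else is a bug, not a fourth state."""
    return {
        NicheDecision.GO: "continue_pipeline",
        NicheDecision.NO_GO: "halt_and_record",
        NicheDecision.NEED_MORE_DATA: "assign_data_gathering_then_rerun_quorum",
    }[decision]
```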

Streamlit UI integration plan

v2 surfaces Chamber as a tab in app.synth-nova.com (the Streamlit app that hosts the Investment Navigator). The tab renders:

  • The question and auto-populated context (upstream pipeline artifacts)
  • Each panelist’s response as it streams in (stance, reasoning, confidence, evidence)
  • The divergence analysis
  • A founder-input field for Level 3 sessions (role selector: Approver / Challenger / Contributor / Verifier; structured entry for Contributor/Verifier modes)
  • The arbiter synthesis with explicit confidence and dissent level
  • The rendered report inline, with links to the persisted session artifact

Level 2 sessions render as read-only post-factum review (founder can open any completed Level 2 session and see the full trace).

Pipeline integration

Pipeline agents (Scout, Researcher, Financial Modeler, Judge) evaluate criticality at every decision point per CriticalityPolicy §Criticality Assessment Procedure. When Level 2 or Level 3 criteria are met, the agent invokes Chamber via EscalationPolicy Trigger 7. Agents do not independently decide to “skip” Chamber when criteria are met — criticality assessment is enforceable, not advisory.

Context auto-population

When Chamber is triggered from the pipeline, the session context is assembled automatically from:

  • Researcher findings (sources, claims, confidence per claim)
  • Financial Model data (inputs, assumptions, output ranges, ROI calculations)
  • Judge assessment (score, dissent points, failure modes flagged)
  • Prior Chamber sessions on the same niche (if any) for continuity

The founder does not manually formulate the context. This removes “courier work” end-to-end: v1 removed the courier work between LLMs; v2 removes the courier work between pipeline stages and Chamber.
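The auto-population step amounts to concatenating persisted upstream artifacts into one context document. A sketch under assumed input shapes (the real artifacts are structured pipeline outputs, not plain strings):

```python
def assemble_context(researcher: str, financial: str, judge: str,
                     prior_sessions=()) -> str:
    """v2 context auto-population: the founder never re-formulates context;
    it is assembled from upstream pipeline artifacts."""
    parts = [
        "## Researcher findings\n" + researcher,
        "## Financial model\n" + financial,
        "## Judge assessment\n" + judge,
    ]
    # Prior Chamber sessions on the same niche (if any), for continuity.
    for i, session in enumerate(prior_sessions, 1):
        parts.append(f"## Prior Chamber session {i}\n{session}")
    return "\n\n".join(parts)
```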

Inputs

Required:

  • Question (text)
  • Context (text or markdown, up to ~10K tokens)
  • Approval (founder confirmation for delegation)

Optional:

  • Specific LLMs to include/exclude (default: all v1 panel)
  • Max rounds of cross-examination (default: 1)
  • Preferred arbiter (default: Claude Sonnet 4)
  • Output format preference (default: structured markdown)
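The required and optional inputs above map cleanly onto a request object with defaults. Field names and default identifiers are illustrative, not the actual M3 API:

```python
from dataclasses import dataclass

@dataclass
class ChamberRequest:
    question: str                      # required
    context: str                       # required; text/markdown, up to ~10K tokens
    founder_approved: bool             # required (Law 5: explicit consent)
    panel: tuple = ("claude-sonnet-4", "gpt-4", "gemini-pro")  # default: full v1 panel
    max_cross_exam_rounds: int = 1     # one round maximum in v1
    arbiter: str = "claude-sonnet-4"
    output_format: str = "structured-markdown"
```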

Timeline Placeholder

Implementation sprint: parallel to M2 implementation OR after M2. M3 has lower dependency on M1 internals than M2 does — Chamber doesn’t need to integrate with Scout/Researcher/FinancialModeler. Chamber is more standalone.

Estimated effort:

  • Panelist adapters (Claude, GPT-4, Gemini): ~2-3 days (API integrations)
  • Divergence detection: ~1 day
  • Cross-examination orchestration: ~1 day
  • Arbiter integration: ~1 day (Claude API already in place)
  • Report generation: ~1 day
  • CLI interface: ~0.5 day
  • Integration hooks for M1/M2 pipelines: ~1 day
  • Testing + Judge calibration (arbiter quality): ~2 days

Total: ~1.5-2 weeks of work once the sprint begins.

Success Metrics (for M3 rollout)

  • Arbiter quality: founder agrees with arbiter synthesis in ≥80% of sessions (human calibration)
  • Divergence capture: when panelists actually disagree, arbiter reports it (not flattening to false consensus)
  • Cost: ≤ $1.00 per typical session
  • Duration: ≤ 3 minutes per session (parallel API calls)
  • Useful disagreement rate: sessions where minority view was valuable (subjective but tracked)

Cross-References