Module M3: Deliberation Chamber
One-liner: Multi-LLM deliberation with arbitrated quorum answer.
Governance: Operates under the Constitution. Particularly relevant: Law 5 (Human Veto — delegation requires consent), Law 7 (Verify — multi-source verification is a literal implementation of this Law), Law 8 (Tokens are capital — multi-LLM queries cost 3-5x a single query).
Product Vision
When a high-uncertainty strategic question appears, asking a single LLM risks single-model bias and missing perspectives. M3 provides a structured way to query multiple independent LLMs, let them exchange views, and arbitrate the most representative answer.
Problem this solves
Founder’s current manual workflow:
- Asks Claude → gets answer A
- Copies question to Perplexity → gets answer B
- Copies to ChatGPT → gets answer C
- Manually compares, synthesizes
- Returns to Claude with synthesized view for implementation
This is “courier work” — mechanical, error-prone (founder can miss nuances), slow (context-switching costs). Also inconsistent (founder may not query all three every time, introducing bias).
M3 automates the courier work while preserving the epistemic value of multi-model consultation.
Target users
- Primary: Synth Nova pipeline — agents (CEO, Director, Judge) can propose Chamber session when they face high-uncertainty decisions. Pipeline never auto-delegates — founder approval required.
- Secondary: Founder direct use — founder can manually open Chamber with a question (“should we prioritize M2 or M3 next quarter?”). Same arbitrated quorum result.
- Future (v2+): External users — premium feature for product users who want “GPT+Claude+Gemini consensus” on their strategic questions.
Differentiation
- Consensus vs single-LLM: reduces risk of single-model biases and blind spots
- Structured arbitration: not just voting or averaging — arbiter explicitly evaluates methodology and evidence quality
- Auditable: all participating LLM responses preserved, arbiter reasoning documented (Constitution Law 6: No Important Decision Without Trace)
- Human-gated: founder approves every delegation (Constitution Law 5)
Trigger Model
Delegation requires explicit founder approval. The system never auto-delegates to Chamber.
Two trigger paths:
Path A — Pipeline proposal:
- Agent encounters high-uncertainty question (low confidence, conflicting evidence, novel domain)
- Agent proposes “Consider Chamber session for question X” to founder
- Founder approves or declines
- On approval, Chamber session runs
Path B — Founder initiation:
- Founder manually opens Chamber UI/CLI
- Enters question + context
- Chamber session runs
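Both trigger paths converge on the same gate: a session runs only after explicit founder approval. A minimal sketch of that invariant, with hypothetical type and function names (not a committed API):

```python
from dataclasses import dataclass
from enum import Enum

class TriggerPath(Enum):
    PIPELINE_PROPOSAL = "pipeline"   # Path A: agent proposes, founder decides
    FOUNDER_INITIATED = "founder"    # Path B: founder opens Chamber directly

@dataclass
class SessionRequest:
    question: str
    context: str
    path: TriggerPath
    founder_approved: bool = False   # never defaults to approved

def may_run(req: SessionRequest) -> bool:
    """A Chamber session runs only with explicit founder approval,
    regardless of which trigger path produced the request."""
    return req.founder_approved
```

The point of the default `founder_approved=False` is that neither path can construct a runnable session by accident — approval is always an explicit, separate step.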
LLM Panel (v1)
Three independent providers for minimum viable quorum:
| LLM | Provider | Role |
|---|---|---|
| Claude Sonnet 4 | Anthropic | Panelist + Arbiter (dual role — see Arbitration) |
| GPT-4 | OpenAI | Panelist |
| Gemini Pro | Google | Panelist |
Rationale for v1 panel:
- Three providers → true diversity of training data, alignment approaches, biases
- All three have stable APIs with reasonable pricing
- Minimum viable quorum: three panelists allow disagreement resolution (2 vs 1), whereas two would simply tie
v2+ candidates (deferred, see Integration Triage Policy ADR-0011):
- Grok (xAI) — different worldview, but API access limited
- DeepSeek — cheap but geopolitical considerations
- Perplexity — strong on web-research questions, but expensive API
- Mistral, Llama — open-source diversity for specific technical questions
Deliberation Structure
Phase 1: Independent Responses
Each LLM receives the same question + context, independently produces initial answer.
- No cross-talk between models at this phase
- Each answer includes: stance, reasoning, confidence, key evidence cited
- Responses preserved verbatim for audit trail
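Phase 1 is embarrassingly parallel, since the no-cross-talk rule means each call is independent. A sketch of the orchestration, assuming each panelist is wrapped in a callable that hits its provider's API (the `PanelResponse` fields mirror the stance/reasoning/confidence/evidence structure above; names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class PanelResponse:
    model: str
    stance: str        # e.g. "yes" / "no" / "mixed"
    reasoning: str
    confidence: float  # 0.0-1.0
    evidence: list     # key facts cited

def run_phase1(panel: dict, question: str, context: str) -> list:
    """Query every panelist in parallel. No panelist sees another's answer,
    and results come back in fixed panel order for the audit trail.
    `panel` maps model name -> a callable hitting that provider's API."""
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(fn, question, context)
                   for name, fn in panel.items()}
        return [futures[name].result() for name in panel]
```

Parallel calls are also what makes the "≤ 3 minutes per session" target in Success Metrics plausible: wall-clock time is bounded by the slowest panelist, not the sum of all three.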
Phase 2: Cross-Examination (optional)
If initial responses diverge significantly (see Divergence Detection below):
- Each model sees other models’ responses
- Each invited to address specific disagreements
- Responses explicitly labeled: “confirming”, “revising”, “standing by with counter-argument”
- One round maximum — no infinite loops
Phase 3: Arbitration
Claude Sonnet 4 (arbiter) synthesizes:
- Areas of consensus (where panelists agree)
- Areas of disagreement (where they differ, why)
- Weighted assessment of evidence quality per position
- Final synthesized answer with explicit acknowledgment of uncertainty
- Arbiter’s own reasoning for the synthesis
Divergence Detection
Cross-examination phase triggered when:
- Panelists disagree on stance (e.g., one says “yes”, others “no”)
- Confidence scores vary by >30%
- Different key facts cited (indicating different underlying data)
- Arbiter’s initial read suggests meaningful difference
If panelists converge (all say similar things with similar confidence) — skip Phase 2, arbiter synthesizes directly.
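The divergence criteria above can be expressed as simple heuristics over the Phase 1 responses. A sketch, assuming responses expose `.stance`, `.confidence` (here read as a 0-100 score, interpreting the ">30%" spread as percentage points — an assumption), and `.evidence` (the key facts cited); the arbiter's judgment-based trigger is not modeled:

```python
def needs_cross_examination(responses) -> bool:
    """Return True when Phase 2 (cross-examination) should run:
    stance disagreement, a confidence spread above 30 points, or
    differing key evidence across panelists."""
    stances = {r.stance for r in responses}
    if len(stances) > 1:                      # e.g. one "yes", others "no"
        return True
    confidences = [r.confidence for r in responses]
    if max(confidences) - min(confidences) > 30:
        return True
    evidence_sets = {frozenset(r.evidence) for r in responses}
    if len(evidence_sets) > 1:                # different underlying data
        return True
    return False                              # converged: skip Phase 2
```

If this returns `False`, the arbiter synthesizes directly from the Phase 1 responses.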
Arbitration Details
Arbiter = Claude Sonnet 4 — the same model as the synth-brain Judge agent (consistency, familiar calibration).
Why not a separate fourth LLM:
- Adds cost (4th API call per session)
- Adds complexity (fallback chains if one arbiter fails)
- Claude as arbiter evaluates methodology and evidence, not content authorship — conflict of interest low
- v2 can introduce “second opinion arbiter” if methodology review shows Claude bias in arbitration
Arbiter’s task, explicitly:
- Does NOT vote on “which LLM is right”
- DOES identify where evidence is strongest
- DOES distinguish genuine disagreement from surface-level wording differences
- DOES return uncertainty honestly — “all three disagree with low confidence” is a valid answer
- DOES preserve minority viewpoints in output (not just majority wins)
Report Structure
Each Chamber session produces:
Question: [original question]
Context provided: [context fed to panelists]
## Panelist Responses (verbatim)
### Claude Sonnet 4
[response]
### GPT-4
[response]
### Gemini Pro
[response]
## Divergence Analysis
- Consensus areas: ...
- Disagreement areas: ...
## Cross-Examination (if triggered)
[second-round responses]
## Arbiter Synthesis
[Claude Sonnet 4 arbiter's synthesized answer]
## Confidence Assessment
- Synthesis confidence: X/10
- Dissent level: low/medium/high
- Recommended action level: proceed / proceed with caveats / require further investigation
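A session report in the structure above can be assembled mechanically from the session record. A minimal sketch; the `session` dict keys are illustrative, not a committed schema:

```python
def render_report(session: dict) -> str:
    """Render the Chamber session report in the documented structure.
    Panelist responses are included verbatim for the audit trail."""
    lines = [
        f"Question: {session['question']}",
        f"Context provided: {session['context']}",
        "## Panelist Responses (verbatim)",
    ]
    for name, text in session["responses"].items():
        lines += [f"### {name}", text]
    lines += [
        "## Divergence Analysis",
        f"- Consensus areas: {session['consensus']}",
        f"- Disagreement areas: {session['disagreement']}",
    ]
    if session.get("cross_examination"):   # only present if Phase 2 ran
        lines += ["## Cross-Examination (if triggered)",
                  session["cross_examination"]]
    lines += [
        "## Arbiter Synthesis",
        session["synthesis"],
        "## Confidence Assessment",
        f"- Synthesis confidence: {session['confidence']}/10",
        f"- Dissent level: {session['dissent']}",
        f"- Recommended action level: {session['action']}",
    ]
    return "\n".join(lines)
```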
Integration with M1 / M2
M3 can be invoked from:
- M1 pipeline — when Research confidence < threshold, or Financial Modeler sees contradictions, or Judge gives FAIL on architectural decision
- M2 pipeline — when Team Assessment has ambiguous domain-fit signals
- Direct founder request — manual Chamber session, standalone
In all cases, trigger is explicit founder approval. Pipeline agents can propose but not commit.
Cost Model
Per session (v1 panel):
- Claude Sonnet 4 (panelist): ~$0.05-0.10
- GPT-4 (panelist): ~$0.05-0.15
- Gemini Pro (panelist): ~$0.02-0.05
- Claude Sonnet 4 (arbiter): ~$0.10-0.20 (larger context — sees all panelist responses)
- Cross-examination phase (if triggered): +50% to panelist costs
Typical session: $0.50-1.20
Budget thresholds per Constitution Law 8 / DecisionRights:
- Session < $1: low approval (founder notified, auto-proceed)
- Session $1-3: standard approval required
- Session > $3: unusual; approval required; review whether the session is necessary
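The budget thresholds map cleanly to a tiering function. A sketch with hypothetical tier names:

```python
def approval_level(estimated_cost_usd: float) -> str:
    """Map an estimated session cost to the approval tier from the
    budget thresholds above (Constitution Law 8 / DecisionRights)."""
    if estimated_cost_usd < 1.0:
        return "low"        # founder notified, auto-proceed
    if estimated_cost_usd <= 3.0:
        return "standard"   # explicit approval required
    return "review"         # unusual: approve and question necessity
```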
v1 Scope
Included:
- Three-LLM panel (Claude, GPT-4, Gemini)
- Two-phase deliberation (independent + optional cross-examination)
- Claude Sonnet 4 arbitration
- Text output report
- CLI interface for direct founder use
- Programmatic API for pipeline agents to propose sessions
Excluded (v2+):
- Web UI for Chamber sessions (use CLI/Streamlit basic form for v1)
- Additional LLMs (Grok, DeepSeek, Perplexity, open-source models)
- Second-opinion arbiter
- Multi-round cross-examination (beyond one round)
- Persistent Chamber “memory” across sessions
- Real-time streaming of panelist thinking
v2 Vision (post-v1)
v1 ships Chamber as a standalone CLI (and programmatic API) for founder-initiated or pipeline-proposed sessions, always gated by explicit founder approval per session. v2 evolves Chamber from a tool invoked beside the Investment Navigator to a decision mechanism embedded within the Navigator’s UX and pipeline. This section captures the target state; ratified in ADR-0016-chamber-v2-vision.
Chamber as embedded decision mechanism
In v2, Chamber is not a separate CLI the founder remembers to open. It is the resolution path the Investment Navigator falls into whenever a pipeline question meets criticality thresholds. Most pipeline questions never reach Chamber (Level 1 auto-resolves). The ones that do are surfaced in-context with all upstream Researcher / Financial Modeler / Judge evidence already attached — the founder does not re-formulate the question.
Two trigger modes
- Automatic trigger — the pipeline detects uncertainty per CriticalityPolicy (Level 2 or Level 3 criteria fire) and invokes Chamber without needing a manual step. Level 2 runs without founder. Level 3 blocks the pipeline and requests founder participation.
- Manual trigger — the founder opens Chamber from the Navigator UI with a specific question or a pipeline-stage result the founder wants to stress-test. This path behaves as a Level 3 session (founder-initiated → founder participates).
Founder-as-participant model
In v1, the founder’s role is binary: approve the session, read the output, take action. In v2, the founder can participate inside the session itself in one of four modes per CriticalityPolicy §Level 3:
- Approver — accept or reject a clear quorum answer
- Challenger — send agents back to re-check specific aspects with new data
- Contributor — inject insider knowledge as a fourth panelist voice
- Verifier — state a claim for agents to fact-check before it influences the quorum
Operational rules for founder input structure and arbiter behavior are in MultiLLMDeliberationPolicy §Founder Participation Rules.
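The four participation modes are a closed set, which suggests modeling them as an enum so the Level 3 UI role selector and the session log share one definition. A sketch (enum name and values are illustrative; the mode definitions come from CriticalityPolicy §Level 3):

```python
from enum import Enum

class FounderRole(Enum):
    """Founder participation modes inside a Level 3 Chamber session."""
    APPROVER = "approver"        # accept or reject a clear quorum answer
    CHALLENGER = "challenger"    # send agents back to re-check aspects
    CONTRIBUTOR = "contributor"  # inject insider knowledge as a 4th voice
    VERIFIER = "verifier"        # state a claim for agents to fact-check
```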
Three-level criticality system
Every pipeline question is classified into one of three levels at decision time. See CriticalityPolicy for full definitions. Summary:
- Level 1 — agent resolves alone (no Chamber)
- Level 2 — Chamber auto-quorum runs without founder; auto-accept if arbiter confidence ≥ 8/10, auto-escalate to Level 3 if < 8/10
- Level 3 — full Chamber with founder participation; always required for Go/No-Go
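The Level 2 routing rule is fully mechanical and worth pinning down in code: auto-accept at arbiter confidence ≥ 8/10, escalate below that. A sketch with a hypothetical function name:

```python
def route_level2(arbiter_confidence: int) -> str:
    """Level 2 routing: auto-accept the quorum answer at arbiter
    confidence >= 8/10, otherwise escalate to a Level 3 session
    with founder participation."""
    return "auto-accept" if arbiter_confidence >= 8 else "escalate-to-level-3"
```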
Go/No-Go as the ultimate output
When Chamber is invoked for a niche decision, there are exactly three valid outcomes:
- GO — enter the niche; pipeline continues to execution planning
- NO-GO — reject the niche; pipeline halts; rejection reasoning preserved
- NEED MORE DATA — insufficient evidence; agents receive a specific data-gathering assignment; quorum re-runs with new data
NEED MORE DATA is a first-class outcome, not a deferral — it prevents both false-GO (acting on bad inputs) and false-NO-GO (rejecting on insufficient inputs). See CriticalityPolicy §Go/No-Go for the BADs Russia example.
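Since there are exactly three valid outcomes, the verdict and its pipeline consequence can be encoded as a closed enum plus a dispatch table. A sketch (names and wording are illustrative, mirroring the list above):

```python
from enum import Enum

class Verdict(Enum):
    GO = "go"
    NO_GO = "no-go"
    NEED_MORE_DATA = "need-more-data"   # first-class outcome, not a deferral

def next_step(verdict: Verdict) -> str:
    """Pipeline consequence of each Chamber verdict for a niche decision."""
    return {
        Verdict.GO: "continue to execution planning",
        Verdict.NO_GO: "halt pipeline; preserve rejection reasoning",
        Verdict.NEED_MORE_DATA: "assign data-gathering task; re-run quorum",
    }[verdict]
```

Using an exhaustive mapping (rather than if/else with a fallthrough) means an unrecognized verdict fails loudly instead of silently proceeding — consistent with treating NEED MORE DATA as a real outcome rather than a default.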
Streamlit UI integration plan
v2 surfaces Chamber as a tab in app.synth-nova.com (the Streamlit app that hosts the Investment Navigator). The tab renders:
- The question and auto-populated context (upstream pipeline artifacts)
- Each panelist’s response as it streams in (stance, reasoning, confidence, evidence)
- The divergence analysis
- A founder-input field for Level 3 sessions (role selector: Approver / Challenger / Contributor / Verifier; structured entry for Contributor/Verifier modes)
- The arbiter synthesis with explicit confidence and dissent level
- The rendered report inline, with links to the persisted session artifact
Level 2 sessions render as read-only post-factum review (founder can open any completed Level 2 session and see the full trace).
Pipeline integration
Pipeline agents (Scout, Researcher, Financial Modeler, Judge) evaluate criticality at every decision point per CriticalityPolicy §Criticality Assessment Procedure. When Level 2 or Level 3 criteria are met, the agent invokes Chamber via EscalationPolicy Trigger 7. Agents do not independently decide to “skip” Chamber when criteria are met — criticality assessment is enforceable, not advisory.
Context auto-population
When Chamber is triggered from the pipeline, the session context is assembled automatically from:
- Researcher findings (sources, claims, confidence per claim)
- Financial Model data (inputs, assumptions, output ranges, ROI calculations)
- Judge assessment (score, dissent points, failure modes flagged)
- Prior Chamber sessions on the same niche (if any) for continuity
The founder does not manually formulate the context. This removes “courier work” end-to-end: v1 removed the courier work between LLMs; v2 removes the courier work between pipeline stages and Chamber.
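Context auto-population amounts to concatenating the upstream artifacts into one markdown block. A minimal sketch; all dict keys and section titles here are illustrative assumptions, not the pipeline's real schema:

```python
def assemble_context(researcher: dict, financial: dict,
                     judge: dict, prior_sessions: list) -> str:
    """Build the Chamber session context from upstream pipeline artifacts
    so the founder never formulates it manually."""
    parts = [
        "## Researcher findings\n"
        + "\n".join(f"- {claim}" for claim in researcher.get("claims", [])),
        "## Financial model\n"
        + "\n".join(f"- {k}: {v}" for k, v in financial.items()),
        "## Judge assessment\n"
        + f"- score: {judge.get('score')}\n- dissent: {judge.get('dissent')}",
    ]
    if prior_sessions:   # continuity with earlier sessions on the same niche
        parts.append("## Prior Chamber sessions\n"
                     + "\n".join(f"- {s}" for s in prior_sessions))
    return "\n\n".join(parts)
```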
Inputs
Required:
- Question (text)
- Context (text or markdown, up to ~10K tokens)
- Approval (founder confirmation for delegation)
Optional:
- Specific LLMs to include/exclude (default: all v1 panel)
- Max rounds of cross-examination (default: 1)
- Preferred arbiter (default: Claude Sonnet 4)
- Output format preference (default: structured markdown)
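The required/optional split above maps naturally onto a dataclass whose defaults encode the v1 choices. A sketch; field names are illustrative, and notably `founder_approved` has no default because approval must always be explicit:

```python
from dataclasses import dataclass

@dataclass
class ChamberInputs:
    """Chamber session inputs with the v1 defaults listed above."""
    question: str
    context: str                       # text/markdown, up to ~10K tokens
    founder_approved: bool             # explicit consent, no default
    panel: tuple = ("Claude Sonnet 4", "GPT-4", "Gemini Pro")
    max_cross_exam_rounds: int = 1
    arbiter: str = "Claude Sonnet 4"
    output_format: str = "structured markdown"
```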
Timeline Placeholder
Implementation sprint: parallel to M2 implementation OR after M2. M3 has lower dependency on M1 internals than M2 does — Chamber doesn’t need to integrate with Scout/Researcher/FinancialModeler. Chamber is more standalone.
Estimated effort:
- Panelist adapters (Claude, GPT-4, Gemini): ~2-3 days (API integrations)
- Divergence detection: ~1 day
- Cross-examination orchestration: ~1 day
- Arbiter integration: ~1 day (Claude API already in place)
- Report generation: ~1 day
- CLI interface: ~0.5 day
- Integration hooks for M1/M2 pipelines: ~1 day
- Testing + Judge calibration (arbiter quality): ~2 days
Total: ~1.5-2 weeks of work once the sprint begins.
Success Metrics (for M3 rollout)
- Arbiter quality: founder agrees with arbiter synthesis in ≥80% of sessions (human calibration)
- Divergence capture: when panelists actually disagree, arbiter reports it (not flattening to false consensus)
- Cost: ≤ $1.00 per typical session
- Duration: ≤ 3 minutes per session (parallel API calls)
- Useful disagreement rate: sessions where minority view was valuable (subjective but tracked)
Cross-References
- Constitution: Constitution — all Laws, especially 5/7/8
- M1: Niche-Evaluation-Module — may invoke M3 on uncertainty
- M2: Team-Implementation-Module — may invoke M3 on team-fit ambiguity
- Multi-LLM Deliberation Policy: MultiLLMDeliberationPolicy — operational rules
- Integration Triage Policy: IntegrationTriagePolicy — applied to LLM provider choices
- Conflict Resolution: ConflictResolution — Judge agent in synth-brain handles intra-system conflicts; M3 handles external-knowledge conflicts
- ADR-0014: this module’s decision record