ADR-0015: M3 Deliberation Chamber — Implementation Record
Status
accepted
Context
ADR-0014-m3-deliberation-chamber ratified the strategic decision and operational policy for M3. This ADR records the concrete implementation choices made during the v1 build sprint (Week 5), within the envelope ADR-0014 defined. It exists so that future readers can reconstruct what was actually shipped — the generic model classes in ADR-0014 (e.g. “Claude Sonnet 4”) were resolved to specific dated model IDs at build time, and those choices have cost, behavior, and governance implications worth preserving.
Decision
Concrete model resolution
| Role | Generic class (ADR-0014) | Concrete model (v1) | Rationale |
|---|---|---|---|
| Panelist 1 | Claude Sonnet 4 | claude-sonnet-4-5-20250929 | Latest Sonnet at build date (Claude 4.5 family). |
| Panelist 2 | GPT-4 | gpt-4o | OpenAI’s production flagship at build date. |
| Panelist 3 | Gemini Pro | gemini-2.5-pro | Google’s production flagship at build date. |
| Arbiter | Claude Sonnet 4 | claude-sonnet-4-5-20250929 | Shared with Judge (ADR-0014 §2); COI mitigations per Policy. |
| Divergence detector | (unspecified in ADR-0014) | claude-haiku-4-5-20251001 | Cheapest capable JSON classifier; Law 2 (minimum cost). |
Provider pricing at build date (2026-04-16)
| Model | Input $/1M | Output $/1M | Source |
|---|---|---|---|
claude-sonnet-4-5-20250929 | 3.00 | 15.00 | anthropic.com/pricing |
claude-haiku-4-5-20251001 | 1.00 | 5.00 | anthropic.com/pricing |
gpt-4o | 2.50 | 10.00 | openai.com/api/pricing |
gemini-2.5-pro | 1.25 | 5.00 | ai.google.dev/pricing |
Pricing is stored in src/synth_brain/llm/pricing.py and verified manually per Law 7 via scripts/verify_provider_pricing.py (quarterly cadence).
Architecture (within ADR-0014 envelope)
- Provider abstraction:
PanelProviderProtocol + three adapter classes (AnthropicPanelProvider,OpenAIPanelProvider,GeminiPanelProvider). Error taxonomy: retryable (rate-limit, 5xx) vs terminal (auth, unrecoverable). Anthropic adapter wraps existingAnthropicClientretry logic; OpenAI and Gemini adapters implement equivalent manual retry loops (max 3, exponential backoff). - Parallel fan-out:
ThreadPoolExecutorwith abort-on-any-fail semantics. On any panelist terminal error, remaining futures are cancelled and the session transitions toABORTED_API_ERROR. No “2-of-3” fallback (Law 9 — safe default is to abort, not partial-complete). - Divergence detection: Haiku 4.5 returns strict JSON (
{diverge, criteria_triggered, reasoning}); one retry on malformed output; fallback todiverge=True, criteria=["substantive"]on second failure (Law 9 — safe default is more deliberation, not less). - Cross-examination: one round only, only if
diverge=True. No infinite loops (ADR-0014 §4). - Arbiter: Sonnet 4.5 receives verbatim panelist responses + divergence result. System prompt contains verbatim COI text from MultiLLMDeliberationPolicy, honest-uncertainty clause, and model-substitution disclosure requirement.
- Audit trail:
session.jsonwritten at every major state transition;report.mdwith all ADR-0014 headers; per-panelist raw dumps (panelist_<provider>_raw.json). Stored underreports/chamber/YYYY-MM-DD/session_<slug>/. - Approval gate:
is_approval(s)returnsTrueiffs.strip().casefold() == "yes". Module docstring prohibits bypass addition (Law 5). - Budget enforcement: three gates — per-session cap 100.00 (filesystem scan of this month’s session.json cost_usd values). All three run before the approval prompt (Law 8).
Founder-added implementation requirements (beyond ADR-0014 text)
Two clauses surfaced during plan review and are captured here so they don’t evaporate into code-only knowledge:
-
Model-substitution honesty (Policy Honesty clause extension). Every provider adapter compares the model string returned by the API to the requested model. If they differ: log WARN + set
model_substituted: bool = TrueonProviderResponse. The flag propagates intosession.json,report.md(badge), and the arbiter’s user message (inline[MODEL SUBSTITUTED]tag + a dedicated warning section) so the arbiter cannot silently rely on a downgraded panelist. -
Ctrl+C-safe founder-action capture (Law 6 edge case). The founder-action prompt (accept / reject / expand / skip) uses
try/finallyaround the stdin read. If interrupted (KeyboardInterrupt) or EOF-before-input,founder_actionis set to"interrupted"andsession.jsonis still written. A session that ran but had its founder-action recording interrupted must still produce a complete audit artifact.
Alternatives Considered (implementation-level)
Haiku vs Sonnet for divergence detection
- Haiku 4.5 (chosen): 5 per 1M — classification task doesn’t need Sonnet capability. Law 2 (minimum cost).
- Sonnet 4.5: redundant capability at 3x cost for a binary + enum output.
Plugin framework for providers
- Protocol + 3 concrete adapters (chosen): minimum viable abstraction. Law 4 (no over-abstraction).
- Entry-point based plugin system: premature; we have three providers, not thirty.
2-of-3 panel fallback on single-panelist failure
- Abort the session (chosen): Law 9 — safe defaults. A failed panelist is a signal, not a quorum-math problem. Silent degradation would break audit-trail expectations (report claims 3 panelists; only 2 responded).
- Continue with two panelists + disclose in report: rejected. Degrades honest uncertainty — arbiter’s job is to weigh three perspectives, not improvise with missing data.
Consequences
Positive:
- Implementation traceable to specific model versions and prices at build date.
- Model-substitution and Ctrl+C clauses prevent two silent-failure classes that could have caused audit-trail drift.
- Haiku-as-classifier saves ~$0.01-0.02 per session vs Sonnet-as-classifier; compounds at monthly budget scale.
- 54-test suite including 6 constitution-compliance tests anchors governance contract at code level.
Negative / Trade-offs:
- Model-substitution check introduces a WARN log + report badge for every substitution event — if providers route silently to a different model family, reports become noisy. Mitigation: trust provider routing decisions, track rate via future observability if needed.
google.generativeaipackage is deprecated; build uses it anyway (v1 prompt specified). Migration togoogle.genailogged inFEEDBACK_LOG.mdWeek 5 as tech debt.- Dated model IDs in pricing lookup create a drift risk: when Anthropic releases
claude-sonnet-4-5-20260315, pricing must be updated or the fallback entry will underbill. Mitigation:verify_provider_pricing.pyis a quarterly manual check, not automated.
Open Questions
- How frequently does OpenAI route
gpt-4oto a different snapshot? Observe viamodel_substitutedrate over first month of production use. - Does COI in arbiter (Claude-as-both-panelist-and-arbiter) manifest as measurable preference for Claude-panelist stance? Needs monthly audit per MultiLLMDeliberationPolicy §COI-Monitoring once ≥10 sessions exist.
Links
- Strategic ADR: ADR-0014-m3-deliberation-chamber
- Module spec: Deliberation-Chamber-Module
- Operational policy: MultiLLMDeliberationPolicy
- Constitution: ADR-0012-constitution
- Code:
synth-brain/src/synth_brain/chamber/andsynth-brain/src/synth_brain/llm/providers/