R28 Timeout Incident

Status: Open, remediation path not chosen yet
Severity: Medium (M1 pipeline does complete on simpler niches per R27; this affects complex briefs)
Fundraise-blocker: Yes if left unfixed; pipeline scalability is a demo risk

Summary

The R28 run hit the 30-minute hard timeout at 1802s. All 11 specialized agents completed successfully, but the final intel_director aggregation stage hung for 596s without completing. The pipeline exited with status completed_partial, having produced a valid full.json (62KB) and report.md (31KB) but no meta.json or judgement.json.
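The partial-completion condition described above can be checked mechanically. The following is a minimal sketch, not the pipeline's actual logic: the four file names come from this report, while the flat run-directory layout, the helper names, and the "completed" status string are assumptions made for illustration.

```python
from pathlib import Path

# File names are taken from this report; the directory layout is an assumption.
REQUIRED_ARTIFACTS = ("full.json", "report.md", "meta.json", "judgement.json")


def missing_artifacts(run_dir):
    """Return the required artifacts that are absent or empty in run_dir."""
    root = Path(run_dir)
    return [
        name
        for name in REQUIRED_ARTIFACTS
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


def run_status(run_dir):
    """Hypothetical rule: any missing artifact means completed_partial."""
    return "completed_partial" if missing_artifacts(run_dir) else "completed"
```

For a directory shaped like the R28 output, `missing_artifacts` would list meta.json and judgement.json, which matches the status reported here.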

This is the first observed timeout incident since R25 (2026-04-19). R26 and R27 completed in 1515s and 1434s respectively on the same AI-esoterics brief.

Context — what R28 was supposed to validate

R28 was launched on Day 3 to validate that R27’s Judge 8.0 was not specific to the AI-esoterics niche. The second niche chosen was an e-commerce SMB AI copilot for Shopify/WooCommerce merchants. The brief was intentionally more complex than R27’s: longer (~800 vs ~500 words), more explicit deliverables (top-15 competitors vs top-10, two acquisition channels vs one, three pricing tiers), and a harder B2B context.

Timeline (from `activity.jsonl`)

Time (s)	Event
51	Scout + Researcher start in parallel
220	Scout done ($0.59)
466	Researcher done ($1.08)
466-663	Phase 1 expand: FinancialModeler + RatingAgent + ProductDesigner + AudienceSegmenter complete in parallel ($0.57 combined)
661	Phase 2 expand starts: FunnelArchitect + ContentStrategist + Copywriter
866	FunnelArchitect done ($0.15)
869	ContentStrategist done ($0.16)
1207	Copywriter done ($0.34); took 545s alone (about 30% of the 1804s pipeline wall-clock time)
1208	`intel_director` final aggregation starts
1804	30-min hard timeout hits; pipeline aborts during director aggregation
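The durations implied by the timeline can be recomputed directly. This is a rough sketch with the timestamps and costs copied by hand from the table above. It assumes each Phase 2 agent started at the 661s phase start, and it is not a parser for the real `activity.jsonl` schema, which this report does not show.

```python
# (start_s, end_s, cost_usd) copied from the timeline; start times for the
# Phase 2 agents are assumed to equal the phase start (661s).
STAGES = {
    "Scout": (51, 220, 0.59),
    "Researcher": (51, 466, 1.08),
    "Phase 1 expand": (466, 663, 0.57),
    "FunnelArchitect": (661, 866, 0.15),
    "ContentStrategist": (661, 869, 0.16),
    "Copywriter": (661, 1207, 0.34),
    "intel_director": (1208, 1804, None),  # aborted, no cost recorded
}
WALL_CLOCK_S = 1804


def stage_report(stages, wall_clock_s):
    """Return (name, duration_s, share_pct, cost) rows, longest stage first."""
    rows = []
    for name, (start, end, cost) in stages.items():
        duration = end - start
        share = round(100 * duration / wall_clock_s, 1)
        rows.append((name, duration, share, cost))
    return sorted(rows, key=lambda row: row[1], reverse=True)


report = stage_report(STAGES, WALL_CLOCK_S)
total_cost = round(sum(c for *_, c in report if c is not None), 2)

for name, duration, share, _cost in report:
    print(f"{name:18s} {duration:5d}s {share:5.1f}%")
print(f"total cost: ${total_cost:.2f}")
```

On these numbers the aborted intel_director call (596s) and Copywriter (546s from timestamps, within a second of the 545s quoted) together account for roughly 63% of the wall-clock time. The listed stage costs sum to about $2.89, consistent with the $2.90 cited in this report.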

Root cause analysis

Two concurrent issues:

Issue 1 — Copywriter regression

In R27 (AI-esoterics), Copywriter completed in ~170 seconds. In R28, Copywriter took 545 seconds — 3.2x longer.

Hypothesis: Copywriter prompt size scales with prior agent outputs. R28 had larger outputs from Researcher (1.08 vs 0.37 Scout cost indicates much more source material), which inflates Copywriter’s input context, which in turn slows generation.

No investigation done yet on Copywriter prompt internals for this run.
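A cheap first check of the prompt-size hypothesis would be to compare how large each agent's output was in R27 versus R28. The sketch below assumes that full.json is a mapping from agent name to that agent's structured output. That layout is a guess, and the function names are hypothetical.

```python
import json


def output_sizes(full):
    """Serialized size in characters of each agent's output, largest first."""
    sizes = {
        agent: len(json.dumps(output, ensure_ascii=False))
        for agent, output in full.items()
    }
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


def size_ratios(baseline, candidate):
    """Per-agent size ratio candidate/baseline for agents present in both runs."""
    base = output_sizes(baseline)
    cand = output_sizes(candidate)
    return {
        agent: round(cand[agent] / base[agent], 2)
        for agent in cand
        if agent in base and base[agent] > 0
    }
```

If the Researcher ratio between the two runs turned out to be close to the 3.2x Copywriter slowdown, that would support the hypothesis. Character counts are only a proxy for token counts, so any result should be read as indicative.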

Issue 2 — Intel Director aggregation hang

After all 11 agents complete, intel_director runs a final aggregation LLM call to synthesize the executive summary. In R27 this completed in under 60s. In R28 it hung for the full 596s until the hard timeout cut it off.

Hypotheses: the max_tokens ceiling was reached, a retry loop, or rate-limit backoff. The pipeline had consumed $2.90 by this point, so it had not hit the per-run budget cap. Total token usage across agents was high.

Commit 67af174 from Day 2 raised max_tokens to 32000 to address a similar issue, but that ceiling may be insufficient for R28’s larger context.

Impact

What we lost

No meta.json with verdict + final scores
No judgement.json with Judge overall and per-agent scores
No validation of R27’s Judge 8.0 against different niche

What we kept

Full activity.jsonl with complete per-stage timing and costs
full.json with all 11 agents’ structured outputs
report.md — pipeline wrote 31KB of narrative report before timeout
Enough data to diagnose (this report)

Financial impact

$2.90 spent on an incomplete run. Acceptable cost of investigation. Not a pattern that would bankrupt the project at current volumes.

Brand / positioning impact

Two niches attempted, one completed cleanly. Cannot yet claim “M1 produces Judge 8.0 consistently across niches.” The M2 implementation gate (“≥ 3 clean runs on distinct niches”) is now pushed further out.
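The gate quoted above is simple enough to express as a check. This is an illustrative sketch only: the record shape, the niche labels, and the assumption that a clean run is one with status "completed" are not taken from the pipeline code.

```python
def clean_niches(runs):
    """Distinct niches with at least one clean (fully completed) run."""
    return {run["niche"] for run in runs if run["status"] == "completed"}


def m2_gate_open(runs, required=3):
    """True once clean runs exist on at least `required` distinct niches."""
    return len(clean_niches(runs)) >= required


# Run history as described in this report (niche labels are shorthand).
RUNS = [
    {"run": "R26", "niche": "ai-esoterics", "status": "completed"},
    {"run": "R27", "niche": "ai-esoterics", "status": "completed"},
    {"run": "R28", "niche": "ecommerce-smb-copilot", "status": "completed_partial"},
]

print(sorted(clean_niches(RUNS)), m2_gate_open(RUNS))
```

Under these assumptions only one distinct niche counts as clean, so two more clean niches are needed before the gate opens.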

Remediation options

Three remediation paths, non-mutually-exclusive.

Option A — Raise hard timeout

Simplest. Change 30-min limit to 45 or 60 min. Let complex pipelines complete.

Pros:

5-minute code change
Immediately unblocks complex niche validation
R28-class complexity completes

Cons:

Doesn’t address root cause: neither the Copywriter regression nor the Director aggregation hang is fixed
Cost ceiling per-run increases proportionally
User experience degrades (45-min wait)
Next time we hit a complex niche, we may still hit the 60-min limit

Cost: ~15 min code change, 1 test run to validate.

Option B — Fix Director aggregation

Investigate why intel_director hung for 596s. Likely candidates:

Retry loop on LLM rate-limit errors (cascading backoff)
max_tokens insufficient — LLM generates partial output, triggers internal retry
Sync API call with no outer timeout

Fix: add explicit per-call timeout on director aggregation (e.g., 120s), retry at most once, then gracefully degrade to partial synthesis.
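The shape of that fix can be sketched as a deadline wrapper around the aggregation call. This is a minimal illustration under stated assumptions, not the pipeline's code: `primary` stands in for the real aggregation LLM call and `fallback` for whatever partial synthesis is available, and both names are hypothetical. `Future.result(timeout=...)` only stops waiting and does not kill the worker thread, so a real fix would also want a timeout on the underlying HTTP/LLM client.

```python
import concurrent.futures as cf
import time


def call_with_deadline(primary, fallback, timeout_s=120.0, retries=1):
    """Run primary() under a deadline; retry up to `retries` times, then degrade.

    Returns (result, mode), where mode is "full" or "partial".
    """
    for _attempt in range(retries + 1):
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(primary)
        try:
            return future.result(timeout=timeout_s), "full"
        except cf.TimeoutError:
            # The worker thread is abandoned here, not terminated.
            continue
        finally:
            # Do not block on a hung worker when leaving this attempt.
            pool.shutdown(wait=False, cancel_futures=True)
    return fallback(), "partial"


def hung_call():
    time.sleep(0.3)  # stands in for a hung aggregation call
    return "full synthesis"


# Both attempts time out, so the fallback is used:
# prints ('partial synthesis', 'partial')
print(call_with_deadline(hung_call, lambda: "partial synthesis", timeout_s=0.05))
```

With the suggested 120s timeout and one retry, the worst case for the aggregation stage would be about 240s plus the fallback, instead of the 596s observed in R28.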

Pros:

Addresses actual root cause
R27 proved the code path works when not overloaded — something specific broke in R28
Preserves 30-min UX contract
Deterministic fix (aggregation call either completes or doesn’t)

Cons:

Requires investigation time (~2-4 hours Claude Code)
May reveal Copywriter regression as upstream problem (needs further work)

Cost: ~3-4 hours Claude Code work.

Option C — Split aggregation into smaller chunks

Instead of one large synthesis call, do multiple focused passes:

Executive summary synthesis (small scope)
Scorecards synthesis (small scope)
Next steps synthesis (small scope)

Each < 2000 tokens output. Then concatenate.
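A rough sketch of this split is shown below. The `llm` callable, its `(prompt, max_tokens)` signature, and the instruction strings are placeholders rather than the pipeline's real interface. The three section names and the 2000-token output cap come from the description above.

```python
from concurrent.futures import ThreadPoolExecutor

# Section titles follow the list above; the instructions are placeholders.
SECTIONS = {
    "Executive summary": "Summarize the key findings.",
    "Scorecards": "Produce per-agent scorecards.",
    "Next steps": "List recommended next steps.",
}
MAX_OUTPUT_TOKENS = 2000


def synthesize(llm, context, sections=SECTIONS):
    """Run one focused pass per section in parallel, then concatenate in order."""

    def one_pass(item):
        title, instruction = item
        try:
            body = llm(f"{instruction}\n\n{context}", max_tokens=MAX_OUTPUT_TOKENS)
        except Exception as exc:
            # Error isolation: one failed section does not sink the others.
            body = f"[section unavailable: {exc}]"
        return f"{title}\n\n{body}"

    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        # pool.map preserves input order, so the report layout stays stable.
        parts = list(pool.map(one_pass, sections.items()))
    return "\n\n".join(parts)
```

Because each pass sees the same context independently, any cross-section reasoning has to be supplied through that shared context. This is the quality trade-off listed under the cons of this option.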

Pros:

Addresses scaling issue structurally
Each sub-call finishes faster and needs a smaller max_tokens budget
Parallelizable if needed
Better error isolation

Cons:

Larger refactor (~1 day work)
Loses benefit of LLM’s cross-section reasoning (may reduce quality)
Risk of regressions in R27-class runs that currently work

Cost: ~1 day Claude Code work.

Recommendation

Default path: Option B (fix Director aggregation) first, then consider Option C if B reveals deeper issues.

Reasoning:

R27 proves the code works — something specific broke in R28
Debugging before restructuring is cheaper
Preserves existing quality on simple niches while fixing complex ones
3-4 hour investigation gives evidence whether C is needed

Option A (raise timeout) only as temporary workaround if urgent demo pending.

Next steps

Decide remediation path (A / B / C) — Denis decision
If Option B: create Claude Code task to investigate director aggregation
Re-run R28 after fix to validate
If clean, run R29 on third niche to close M1 stability gate
Update ADR-0025-Combined-M1-M2-Primary-Product implementation gates if M1 stability needs more runs

Related

R27-Results — R27 completed successfully, baseline for comparison
ADR-0016-chamber-v2-vision — CriticalityPolicy defines pipeline timeout classes
Commit 67af174 — prior max_tokens fix (raised to 32000)
Commit ce144a6 — prior director aggregation change (removed LLM call from extended stage, but aggregation still uses LLM for core)
Commit abee295 — prior director aggregation change (removed dead ThreadPoolExecutor timeout)

R28 Timeout Incident — 2026-04-20
