Process — Failure Patterns

Recurring failure patterns in multi-agent flows and their mitigation strategies. This document does not describe a specific flow; it is a reference for authors of scenarios and agent manifests.

Based on Reference-Org-Blueprint §9 Cross-Scenario Failure Patterns.

Why document these

Multi-agent systems fail in different places than monolithic software:

Failures are often emergent (arising from interaction between agents), not local
Detection is harder: each agent sees only its own scope
Recovery requires coordination
Postmortems are needed to teach the system, not just to fix a bug

Every scenario author should review these patterns and check that the scenario handles them. Not all patterns are relevant to every scenario; pick the applicable ones.

Pattern 1: Handoff gaps

Description: State is lost between agents. Agent A finished a task, but Agent B did not pick it up in time, or picked it up with the wrong context or incomplete data.

Examples:

Research result is in memory, but Director is not subscribed to research.completed
Agent-CEO created a task, but Director did not see it in the queue
Child task is done, but the parent scenario did not advance

Detection:

Scenario timeout (waiting for a transition)
Event emitted, but no subscriber reacted within the SLA
Orphan tasks in the queue

Mitigation (design-time):

Explicit handoff events (do not rely on implicit state)
Confirmation pattern: B emits handoff.received to acknowledge receipt
Orchestrator tracks expected transitions; timeouts trigger escalation
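
The confirmation pattern and timeout tracking above can be sketched roughly as follows. This is a minimal illustration, not the Orchestrator's actual implementation: the HandoffTracker class, its method names, and the event ids are all hypothetical, and time is passed in explicitly so the logic stays easy to test.

```python
from dataclasses import dataclass, field


@dataclass
class PendingHandoff:
    event_id: str
    receiver: str
    deadline: float  # seconds on the caller's clock (assumed unit)


@dataclass
class HandoffTracker:
    """Hypothetical tracker for handoffs awaiting a handoff.received confirmation."""
    pending: dict[str, PendingHandoff] = field(default_factory=dict)

    def expect(self, event_id: str, receiver: str, now: float, sla_seconds: float) -> None:
        # Register a handoff that must be confirmed within the SLA.
        self.pending[event_id] = PendingHandoff(event_id, receiver, now + sla_seconds)

    def confirm(self, event_id: str) -> bool:
        # Called when the receiver emits handoff.received; False for unknown ids.
        return self.pending.pop(event_id, None) is not None

    def overdue(self, now: float) -> list[PendingHandoff]:
        # Handoffs past their deadline are candidates for replay or escalation.
        return [h for h in self.pending.values() if now > h.deadline]


tracker = HandoffTracker()
tracker.expect("evt-1", receiver="Director", now=0.0, sla_seconds=60.0)
tracker.expect("evt-2", receiver="Director", now=0.0, sla_seconds=60.0)
tracker.confirm("evt-1")          # Director acknowledged evt-1
late = tracker.overdue(now=120.0)  # only evt-2 is still unconfirmed
```

Keeping the deadline next to the expected receiver means an overdue entry already carries what a replay or escalation step would need.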

Recovery (runtime):

Orchestrator replays the last event to the missed subscriber
Escalate to Agent-CEO if the replay does not work
Manual re-trigger through HITL-Gateway

Pattern 2: HITL bottlenecks

Description: The agent escalated correctly, but the human is overloaded, so the approval stays pending and the scenario is stuck.

Examples:

20 agents waiting on a single Founder approval
Stage 9 Hypothesis-to-Validation (see Template-Scenario) pending > SLA
Weekend or offline period: no approvals moving

Detection:

HITL-Gateway SLA tracker — pending > threshold
Queue depth monitoring per approver
Scenario completion latency p95 growing
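
As a rough illustration of the first two detection signals, the sketch below computes queue depth per approver and lists approvals pending past an SLA. The PendingApproval record, the function names, and the use of hours as the time unit are assumptions made for the example, not the HITL-Gateway data model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingApproval:
    request_id: str
    approver: str
    requested_at: float  # hours since an arbitrary epoch (assumed unit)


def queue_depth(pending: list[PendingApproval]) -> Counter:
    # Number of pending approvals per approver.
    return Counter(p.approver for p in pending)


def sla_breaches(pending: list[PendingApproval], now: float, sla_hours: float) -> list[PendingApproval]:
    # Approvals that have been pending longer than the SLA.
    return [p for p in pending if now - p.requested_at > sla_hours]


pending = [
    PendingApproval("r1", "Founder", requested_at=0.0),
    PendingApproval("r2", "Founder", requested_at=20.0),
    PendingApproval("r3", "Director", requested_at=1.0),
]
depth = queue_depth(pending)
breached = sla_breaches(pending, now=30.0, sla_hours=24.0)
```

With these inputs, r1 and r3 have waited longer than 24 hours, while r2 is still within the SLA.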

Mitigation (design-time):

Batched approvals in the Gateway
Delegation rules (Rules-AgentDecisionBoundaries)
Pre-approval templates for predictable patterns
Lower the criticality where possible (review whether L3 is really needed vs L2)

Recovery (runtime):

Auto-escalate up the hierarchy after an SLA breach
Digest notifications, e.g. “5 approvals pending >24h”
Founder “overload mode”: batch-approve safe categories with audit

Pattern 3: Over-confident action

Description: The agent executes in an edge case where it should have escalated. Confidence calibration is off, or a novel situation was missed.

Examples:

Agent is confident in a research result on a new topic (no prior data), but is actually hallucinating
Agent-MarketResearcher applies the Dubai playbook to another market without adjustment
L2 action executed autonomously, but the real impact was L3

Detection:

Post-hoc Process-OutcomeLabeling: outcomes are worse than the confidence predicted
Agent-Judge flags overconfident claims
Founder feedback on executed actions

Mitigation (design-time):

Confidence thresholds in Rules-Criticality
Novel-situation detection: if the pattern is not in memory, escalate
Default-down rule (Rules-Criticality: ambiguity → lower level)
Required “uncertainty acknowledgment” in agent outputs
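
A minimal sketch of how a confidence threshold and novel-situation detection might combine into one gate. The names here (ProposedAction, gate, the pattern keys, the 0.8 threshold) are hypothetical; the real thresholds belong in Rules-Criticality and are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class ProposedAction:
    pattern: str       # hypothetical key identifying the kind of situation
    confidence: float  # agent's self-reported confidence in [0, 1]


def gate(action: ProposedAction, known_patterns: set[str], min_confidence: float) -> Decision:
    # Novel situation: nothing comparable in memory, so self-reported
    # confidence is not trusted and the action is escalated.
    if action.pattern not in known_patterns:
        return Decision.ESCALATE
    # Known situation: act only at or above the configured threshold.
    if action.confidence < min_confidence:
        return Decision.ESCALATE
    return Decision.EXECUTE


known = {"market-research:dubai"}
familiar = gate(ProposedAction("market-research:dubai", 0.9), known, min_confidence=0.8)
novel = gate(ProposedAction("market-research:other-market", 0.95), known, min_confidence=0.8)
```

The novelty check runs first on purpose: a high confidence score on an unseen pattern is exactly the case this pattern warns about.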

Recovery (runtime):

Retrospective escalation: “this was L3, not L2, review outcome”
Action rollback if possible
Learning loop: pattern goes into guardrails

Pattern 4: Under-confident escalation flood

Description: The agent escalates everything and the human drowns. The opposite of Pattern 3.

Examples:

New agent tuned aggressively, escalates routine actions
Threshold too conservative → every moderate confidence triggers HITL
Agent is overly cautious after a recent rollback and over-escalates

Detection:

Escalation rate per agent abnormally high
HITL-Gateway queue depth dominated by a single agent
Human feedback “too many trivial approvals”
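
The first detection signal can be approximated with a per-agent escalation rate over some window. The sketch below is illustrative only: AgentStats, the agent names, and the 0.5 ceiling are assumptions, and what counts as "abnormally high" would need to be calibrated per agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentStats:
    agent: str
    actions: int      # total actions in the observation window
    escalations: int  # actions that were escalated to a human


def escalation_rate(s: AgentStats) -> float:
    # Guard against agents with no activity in the window.
    return s.escalations / s.actions if s.actions else 0.0


def flood_suspects(stats: list[AgentStats], max_rate: float) -> list[str]:
    # Agents whose escalation rate exceeds the allowed ceiling.
    return [s.agent for s in stats if escalation_rate(s) > max_rate]


stats = [
    AgentStats("agent-a", actions=100, escalations=5),
    AgentStats("agent-b", actions=40, escalations=36),
    AgentStats("agent-c", actions=0, escalations=0),
]
suspects = flood_suspects(stats, max_rate=0.5)
```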

Mitigation (design-time):

Gradual threshold tuning
Training period for new agents (shadow mode)
Explicit “low-risk auto-approve” list in the manifest

Recovery (runtime):

Threshold adjustment in the agent manifest
Temporarily auto-approve the low-risk category
Manifest review in the quarterly tuning cycle

Pattern 5: Trace gaps

Description: The decision path is lost between agents. When an audit is needed, the full path cannot be reconstructed. This violates Law 6 (trace).

Examples:

Agent emits an event without trace_id
Subscriber does not propagate parent_span_id
External tool call without trace metadata

Detection:

Trace completeness metric: % of scenarios with a full trace
Audit queries fail to find decision rationale
Events with missing trace_id / span_id

Mitigation (design-time):

Event-Bus-Pattern enforces trace_id / span_id as mandatory fields in the envelope
Agent templates include trace propagation boilerplate
External tool wrappers inject trace metadata
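
One way to picture mandatory trace fields and propagation boilerplate is an envelope type that refuses to exist without trace_id and span_id, plus a helper that derives a child envelope. The Envelope class and child_of helper are hypothetical; only the field names trace_id, span_id, and parent_span_id come from this document, and the actual Event-Bus-Pattern envelope may differ.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    event_type: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject events that would create a trace gap.
        if not self.trace_id or not self.span_id:
            raise ValueError("trace_id and span_id are mandatory")


def child_of(parent: Envelope, event_type: str) -> Envelope:
    # Keep the trace_id, mint a new span_id, and link back to the parent span.
    return Envelope(event_type, parent.trace_id, uuid.uuid4().hex, parent.span_id)


root = Envelope("research.completed", uuid.uuid4().hex, uuid.uuid4().hex)
follow_up = child_of(root, "handoff.received")
```

Putting the check in the constructor means a subscriber cannot forget propagation silently: an event without trace fields fails at creation time rather than at audit time.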

Recovery (runtime):

Partial trace accepted, gap explicitly noted
Escalate if a material decision lacks a proper trace
Scenario marked “trace incomplete” in memory

Pattern 6: Cross-agent contradiction

Description: Two or more agents give the human contradictory recommendations. The human is confused and nobody resolves the conflict.

Examples:

Agent-IntelDirector recommends entry into a niche, Agent-NicheEvaluationDirector recommends skip
Market research says X, competitor research says the opposite
Two executors on the same task produce divergent results

Detection:

Explicit contradictions in agent outputs
Agent-Judge flags conflict
Human feedback “agents telling me different things”

Mitigation (design-time):

Orchestrator conflict resolution rules
Chamber-for-Strategic arbitration for strategic contradictions
Single source of truth per decision type
Explicit “primary” vs “advisory” role distinction
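
The "primary" vs "advisory" distinction could feed a simple resolution rule like the one sketched here. Everything in it is an assumption for illustration: the Recommendation record, the resolve function, the verdict strings, and in particular the roles assigned to the two agents in the usage lines, which this document does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recommendation:
    agent: str
    decision_type: str
    verdict: str
    role: str  # "primary" or "advisory"


def resolve(recs: list[Recommendation]) -> tuple[Optional[str], bool]:
    """Return (verdict, needs_arbitration) for recommendations on one decision type."""
    verdicts = {r.verdict for r in recs}
    if len(verdicts) <= 1:
        # No contradiction (or no input at all).
        return (next(iter(verdicts), None), False)
    primary = {r.verdict for r in recs if r.role == "primary"}
    if len(primary) == 1:
        # Exactly one primary position: advisory dissent does not block.
        return (primary.pop(), False)
    # Primaries disagree or none is designated: hand over to arbitration.
    return (None, True)


clash = resolve([
    Recommendation("Agent-IntelDirector", "niche-entry", "enter", "primary"),
    Recommendation("Agent-NicheEvaluationDirector", "niche-entry", "skip", "primary"),
])
settled = resolve([
    Recommendation("Agent-IntelDirector", "niche-entry", "enter", "advisory"),
    Recommendation("Agent-NicheEvaluationDirector", "niche-entry", "skip", "primary"),
])
```

In the first call both agents are treated as primary, so the rule asks for arbitration; in the second, the single primary verdict wins.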

Recovery (runtime):

Orchestrator triggers Chamber review
Both agents explain their reasoning to the human
Escalate to Agent-CEO for routing

Pattern 7: Policy / rules blindspots

Description: An action by one agent triggers obligations for another agent (policy, security, compliance), but the second agent never finds out.

Examples:

Agent publishes research publicly, but no one flagged the IP review obligation
External communication was sent, but privacy-agent was not notified for audit
Data access expanded, but security-agent did not audit it

Detection:

Retro review uncovers missed obligations
External feedback (“you should have done X”)
Audit gaps

Mitigation (design-time):

Policy-sensitive agents [future] subscribed to cross-cutting events
Policy-Layer inspects all actions, not just emitter-side
Event naming convention surfaces sensitive actions (e.g. *.external, *.published)
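
The naming convention makes sensitive actions detectable by a plain glob match on the event name. The sketch below uses the two suffixes given as examples above; the function names, the event names in the usage lines, and the idea of fanning out to a subscriber list are assumptions rather than Policy-Layer behavior.

```python
from fnmatch import fnmatchcase

# The two example suffixes from the naming convention above.
SENSITIVE_PATTERNS = ["*.external", "*.published"]


def is_sensitive(event_name: str) -> bool:
    # Case-sensitive glob match against every sensitive pattern.
    return any(fnmatchcase(event_name, p) for p in SENSITIVE_PATTERNS)


def route(event_name: str, policy_subscribers: list[str]) -> list[str]:
    # Sensitive events fan out to every policy subscriber; others to none.
    return list(policy_subscribers) if is_sensitive(event_name) else []


subscribers = ["privacy-agent", "security-agent"]
notified = route("research.published", subscribers)
ignored = route("research.completed", subscribers)
```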

Recovery (runtime):

Post-hoc audit + corrective action
HITL-Gateway escalation
Update policy-agent subscriptions to catch similar cases in the future

Usage guidance

For scenario authors

Review this doc when creating a new scenario. For each pattern:

Is it relevant to the scenario? (yes/no + reasoning)
If yes, is a mitigation applied? (link or description)
Is a recovery path defined?

Scenarios that do not consider these patterns are incomplete. Agent-Judge may flag them.
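
The per-pattern checklist lends itself to a mechanical completeness check. The sketch below is one hypothetical shape for it; the PatternReview record and review_gaps function are invented for illustration and are not part of Template-Scenario or Agent-Judge.

```python
from dataclasses import dataclass

PATTERN_IDS = range(1, 8)  # the seven patterns in this document


@dataclass(frozen=True)
class PatternReview:
    pattern: int
    relevant: bool
    reasoning: str
    mitigation: str = ""  # link or description, expected when relevant
    recovery: str = ""    # recovery path, expected when relevant


def review_gaps(reviews: list[PatternReview]) -> list[str]:
    # Report patterns that were skipped or left without mitigation / recovery.
    by_id = {r.pattern: r for r in reviews}
    gaps = []
    for pid in PATTERN_IDS:
        r = by_id.get(pid)
        if r is None:
            gaps.append(f"Pattern {pid}: not reviewed")
        elif r.relevant and not r.mitigation:
            gaps.append(f"Pattern {pid}: no mitigation")
        elif r.relevant and not r.recovery:
            gaps.append(f"Pattern {pid}: no recovery path")
    return gaps


reviews = [PatternReview(1, True, "scenario has two handoffs")]
reviews += [PatternReview(p, False, "not applicable") for p in range(2, 8)]
gaps = review_gaps(reviews)
```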

For agent manifest authors

Review the pattern list when designing an agent:

Confidence thresholds → Pattern 3/4
Escalation rules → Pattern 2
Event subscriptions → Pattern 1/5
Trace propagation → Pattern 5
Contradiction handling → Pattern 6

For retrospectives

In a post-incident review, ask:

Which of the 7 patterns showed up?
Was detection adequate?
Was a mitigation in place but failed, or was it missing?
What should be added or changed to prevent recurrence?

The outcome feeds Process-OutcomeLabeling and manifest / threshold tuning.

New pattern addition

If a pattern is discovered that is not on the list:

Document it in shadow notes (temporarily)
After 2+ occurrences, propose a formal addition through an ADR
Update this doc and the relevant mitigations in the system

Patterns are not frozen; the list will grow with experience.

Related documents

Reference-Org-Blueprint: source, section §9
Template-Scenario: mandatory failure modes section
Agent-Judge: runtime detection
HITL-Gateway: bottleneck / escalation handling
Chamber-for-Strategic: contradiction arbitration
Event-Bus-Pattern: trace propagation
Policy-Layer: blindspot prevention
Observability: metrics for detection
Process-OutcomeLabeling: feeds learning loops
Process-Escalation
Process-Rollback
Rules-Criticality
Rules-AgentDecisionBoundaries
Manifesto: principles of observability, async, separate judge
