Integration Triage Policy
Purpose
Decide when to add external integrations (APIs, MCP servers, new tools) vs when to improve what exists.
Default stance
No new integration without an observed pain point. “This sounds useful” is not a pain point.
Triage rules
Rule 1: Pain point type → integration category
| Observed pain point | Consider integrations in category | Do NOT consider |
|---|---|---|
| Garbage sources / weak recall in research | Search providers (Tavily, Exa, Firecrawl) | Financial data, orchestration |
| Absurd financial conclusions / hallucinated benchmarks | Financial data (FMP, public APIs) | Search, orchestration |
| Slow pipeline | First: profiling, parallelization, token budget. Only then: provider swap | Any provider swap without profiling first |
| Unstable UX / crashes / broken rendering | None (fix existing code) | All integrations |
| Missing capability (e.g., no email/calendar) | Orchestration (Composio, a specific MCP) | Search, data |
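The Rule 1 table can be read as a lookup from pain point to allowed and forbidden integration categories. A minimal sketch, assuming hypothetical key names and a `Category` enum that do not exist in the actual codebase:

```python
from enum import Enum

class Category(Enum):
    SEARCH = "search providers"          # Tavily, Exa, Firecrawl
    FINANCIAL_DATA = "financial data"    # FMP, public APIs
    ORCHESTRATION = "orchestration"      # Composio, specific MCP
    NONE = "fix existing code first"

# Illustrative encoding of the Rule 1 table:
# pain point -> (categories to consider, categories NOT to consider)
TRIAGE = {
    "weak_recall": ([Category.SEARCH],
                    [Category.FINANCIAL_DATA, Category.ORCHESTRATION]),
    "hallucinated_benchmarks": ([Category.FINANCIAL_DATA],
                                [Category.SEARCH, Category.ORCHESTRATION]),
    # Slow pipeline: profile/parallelize/budget first; no provider swap yet.
    "slow_pipeline": ([Category.NONE], []),
    "unstable_ux": ([Category.NONE], list(Category)),
    "missing_capability": ([Category.ORCHESTRATION],
                           [Category.SEARCH, Category.FINANCIAL_DATA]),
}

def candidates_for(pain_point: str) -> list[Category]:
    """Return the integration categories worth considering for a pain point."""
    consider, _avoid = TRIAGE[pain_point]
    return consider
```

The point of the table form is that "do NOT consider" is explicit, so a weak-recall complaint can never justify, say, a financial-data integration.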
Rule 2: Risk assessment before adding
Before merging an integration into the current sprint, classify it:
LOW RISK (can be in bugfix sprint):
- Drop-in replacement of existing interface
- No new code paths
- No changes to orchestration, schemas, cost tracking
- No new error types
- Feature-flagged with easy rollback
MEDIUM RISK (separate sprint):
- New client-side tool or handler
- Changes to prompts, role configs
- New cost/observability fields
- Requires A/B evaluation vs existing
- Touches 3-5 components
HIGH RISK (plan carefully, possibly ADR):
- New orchestration path or fallback chain
- Schema changes that Judge/downstream consumers depend on
- Multiple integration points
- Requires architectural discussion
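The three risk tiers above can be sketched as a classifier over a proposal's attributes. All field and function names here are illustrative assumptions, not real project code:

```python
from dataclasses import dataclass

@dataclass
class IntegrationProposal:
    # Booleans mirroring the Rule 2 checklists (names are hypothetical).
    drop_in_replacement: bool = False   # same interface as existing provider
    new_code_paths: bool = False
    feature_flagged: bool = False       # easy rollback available
    touches_orchestration: bool = False # new path or fallback chain
    schema_changes_downstream: bool = False  # Judge/consumers depend on it
    components_touched: int = 1

def classify_risk(p: IntegrationProposal) -> str:
    # HIGH: orchestration changes, downstream schema impact, or broad blast radius.
    if (p.touches_orchestration or p.schema_changes_downstream
            or p.components_touched > 5):
        return "HIGH"
    # LOW: drop-in, flagged for rollback, no new code paths.
    if p.drop_in_replacement and p.feature_flagged and not p.new_code_paths:
        return "LOW"
    # Everything else lands in MEDIUM and gets its own sprint.
    return "MEDIUM"
```

Note the ordering: HIGH conditions are checked first, so a feature-flagged drop-in that also changes a downstream schema is still HIGH.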
Rule 3: Evidence before experimentation
Before spending effort on integration evaluation:
- Collect 3-5 real observations of the pain point (FEEDBACK_LOG.md, failed runs, user comments)
- Formulate a hypothesis: “We expect integration X to improve Y by Z%”
- Set a success metric: how we will know it worked
- Budget the A/B test: time and cost
Without this, integrations end in “tried it, kinda works, kinda doesn’t, moved on”: wasted effort.
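The four prerequisites above form a gate that an evaluation should not pass without. A minimal sketch, with hypothetical field names (the real evidence lives in FEEDBACK_LOG.md, not in code):

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceDossier:
    observations: list[str] = field(default_factory=list)  # e.g. FEEDBACK_LOG.md entries, failed run IDs
    hypothesis: str = ""       # "We expect integration X to improve Y by Z%"
    success_metric: str = ""   # how we will know it worked
    ab_budget_hours: float = 0.0  # time budget for the A/B test

def ready_for_evaluation(d: EvidenceDossier) -> bool:
    """Rule 3 gate: all four prerequisites, including 3+ real observations."""
    return (
        len(d.observations) >= 3
        and bool(d.hypothesis)
        and bool(d.success_metric)
        and d.ab_budget_hours > 0
    )
```

A dossier with only two observations, or a hypothesis but no metric, fails the gate and the evaluation is not scheduled.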
Rule 4: Bugfix sprint rule
During any sprint whose primary goal is stabilization (bugfix, reliability, UX fix):
- Only LOW RISK integrations permitted
- All MEDIUM/HIGH RISK deferred to next capability sprint
- Typical violation: adding a MEDIUM RISK integration “because it seems small” is how bugfix sprints derail
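Rule 4 reduces to a one-line admission check. The sprint-goal labels below are illustrative assumptions:

```python
# Hypothetical sprint-goal labels; the real sprint taxonomy may differ.
STABILIZATION_GOALS = {"bugfix", "reliability", "ux_fix"}

def allowed_in_sprint(risk: str, sprint_goal: str) -> bool:
    """Rule 4: stabilization sprints admit LOW-risk integrations only."""
    if sprint_goal in STABILIZATION_GOALS:
        return risk == "LOW"
    # Capability sprints may take MEDIUM/HIGH, with planning (and an ADR for HIGH).
    return risk in {"LOW", "MEDIUM", "HIGH"}
```

The check is deliberately binary: there is no "seems small" branch for MEDIUM risk during stabilization.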
Known candidates (as of 2026-04-16)
| Candidate | Triggered by | Risk | Status |
|---|---|---|---|
| Tavily | “weak sources” feedback | MEDIUM (client-side tool + new path) | Deferred, awaits Rule 3 evidence |
| Exa | Semantic search for complex queries | MEDIUM | Deferred |
| FMP | “absurd ROI” feedback | MEDIUM-HIGH (FM schema impact) | Deferred, high value if the finance pain point is confirmed |
| Firecrawl | Deep scraping specific sites | LOW if used narrowly | Deferred, probably overkill |
| SimilarWeb | Real traffic data | Cost-blocked ($149+/mo) | Not now |
| SEMrush | Keyword/ads data | Cost-blocked ($130+/mo) | Not now |
| Composio | 250+ integrations unified | HIGH (major orchestration change) | Not now — premature for current scale |
Decision log for integration candidates
Each candidate consideration must end in exactly one of:
- Approved → goes straight into an implementation sprint
- Deferred → added to the candidates table above with a trigger condition
- Rejected → recorded in the Decision-Log with a reason
Never “maybe later without specifics”.
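The three terminal outcomes can be sketched as an enum, which makes "maybe later without specifics" unrepresentable. Names below are illustrative:

```python
from enum import Enum

class Decision(Enum):
    # The only three legal ways to close out a candidate (per this policy).
    APPROVED = "schedule into an implementation sprint"
    DEFERRED = "add to candidates table with a trigger condition"
    REJECTED = "record in Decision-Log with a reason"

def close_out(candidate: str, decision: Decision) -> str:
    """Produce a one-line decision-log entry for a candidate."""
    return f"{candidate}: {decision.name} -> {decision.value}"
```

Because `close_out` only accepts a `Decision`, every consideration is forced into one of the three outcomes.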
Review cycle
This policy is reviewed:
- After every 5th capability sprint
- If the candidates table grows longer than 10 entries
- If new categories of pain points appear
Cross-references:
- Pain point evidence: reports/streamlit_runs/FEEDBACK_LOG.md (in synth-brain repo)
- Risk taxonomy examples: Week 4.7 pre-check on Tavily (showed MEDIUM RISK classification)
- Related: EscalationPolicy, DecisionRights