M2 Navigator Spec v2 — Unfair Advantage Audit
Status: Proposed, awaiting ADR-0025 approval
Target start: After R27 validation stabilizes + 2 more clean M1 runs on distinct niches
Estimated duration: 6 weeks across 5 sprints
This spec replaces the M2 scope described in ADR-0013. Team-competency assessment, profile harvesting, and track-record research are removed. UA audit becomes the complete M2.
Purpose
M2 answers the question: “Given the niche from M1, can the user’s proposed entry succeed?”
It answers this by:
- Extracting what Unfair Advantages this niche historically rewards
- Verifying each Unfair Advantage the user claims to have
- Computing alignment score between claimed+verified UAs and niche requirements
- Producing Swiss-neutral verdict with gap analysis if below threshold
M2 is NOT:
- A team competency assessment
- A budget/timeline validator
- An empathetic support tool
- A standalone product (requires M1 context)
Inputs to M2
Required from user
Solution description:
Free-text or structured answer to “What product/service do you plan to build for this niche?” (100-1000 words typical).
Used for: context in UA scoring (does the solution match how claimed UAs would be deployed?)
Claimed Unfair Advantages:
List, 1-8 items typical. Each item:
- category: patent | license | partnership | data | distribution | regulatory | brand | network | process
  description: "What specifically is the claim?"
  evidence:
    - type: url | file | reference | text
      content: "The actual evidence or pointer"
Inherited from M1
- Full niche analysis including competitor landscape, moat patterns observed, financial structure
- M1 confidence scores
- Recommended next steps (informs scoring context)
Explicitly NOT collected
- Team member profiles, CVs, LinkedIn links
- Budget amount
- Timeline estimates
- Self-reported experience/expertise levels
- Past company names (too easily fabricated)
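The intake contract above (required inputs plus the Sprint 1 validation rule of "solution description + at least 1 UA") can be sketched as a minimal validator. Field names and the `validate_m2_input` helper are hypothetical, mirroring the schema; the shipped intake code may differ:

```python
from dataclasses import dataclass

# Canonical UA categories from the claimed-UA schema above.
CANONICAL_CATEGORIES = {
    "patent", "license", "partnership", "data", "distribution",
    "regulatory", "brand", "network", "process",
}

@dataclass
class UAClaim:
    category: str
    description: str
    evidence: list  # list of {"type": ..., "content": ...} dicts

@dataclass
class M2Input:
    solution_description: str
    claimed_uas: list  # list[UAClaim]

def validate_m2_input(m2: M2Input) -> list:
    """Return a list of validation errors; an empty list means the input may proceed."""
    errors = []
    if not m2.solution_description.strip():
        errors.append("solution description is required")
    if len(m2.claimed_uas) < 1:
        errors.append("at least one claimed UA is required")
    for i, ua in enumerate(m2.claimed_uas):
        if ua.category not in CANONICAL_CATEGORIES:
            errors.append(f"UA {i}: non-canonical category '{ua.category}' (Agent 2.1 will normalize)")
    return errors
```

Non-canonical categories are flagged rather than rejected, since Agent 2.1 produces `category_normalized` downstream.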
Agent architecture
M2 Director (orchestrator)
Receives M1 output + user’s M2 input. Dispatches to three specialized agents. Synthesizes final report section.
Position: peer to Intel Director in 3-tier hierarchy (ADR-0003).
Model: Sonnet 4.5.
Agent 2.1 — Unfair Advantage Verifier
For each claimed UA, attempts verification via category-specific strategies.
Verification strategies reference table:
| Category | Strategy | Confidence ceiling |
|---|---|---|
| Patent | USPTO / EPO / Rospatent / WIPO public search by number or inventor | High (if claim includes specific number) |
| License / certification | Search relevant public registry (FDA, CE, industry body) by entity name | Medium (registries vary in completeness) |
| Partnership / distribution | Web search target company announcements, press releases, joint case studies; LinkedIn company page check | Medium (partnerships often under-announced) |
| Proprietary data | Evaluate uniqueness — search for equivalent datasets, assess coverage claims vs alternatives | Medium-Low (hard to prove negatives) |
| Owned distribution channel | Verify account existence, metrics (audience size via public profile), activity level | Medium-High |
| Brand / audience recognition | Search mentions, press, review sites; audience size verification | Low-Medium |
| Regulatory approval | Check public registries per jurisdiction | Medium-High (if registry exists) |
| Network / relationships | Generally unverifiable via automated checks | Low (mark UNVERIFIABLE) |
| Process / know-how | Generally unverifiable | Low (mark UNVERIFIABLE) |
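The routing in the reference table above can be sketched as a dispatch map. The strategy labels and the `route_verification` helper are illustrative placeholders, not the shipped agent logic:

```python
# Category -> (strategy summary, confidence ceiling); None strategy = UNVERIFIABLE.
VERIFICATION_STRATEGIES = {
    "patent":       ("public patent search (USPTO/EPO/WIPO)", "high"),
    "license":      ("public registry lookup",                "medium"),
    "partnership":  ("web search for announcements",          "medium"),
    "data":         ("uniqueness assessment vs alternatives", "medium-low"),
    "distribution": ("account/metrics verification",          "medium-high"),
    "brand":        ("mentions and press search",             "low-medium"),
    "regulatory":   ("jurisdiction registry check",           "medium-high"),
    "network":      (None,                                    "low"),
    "process":      (None,                                    "low"),
}

def route_verification(category: str) -> dict:
    """Route a claimed UA to its verification strategy, or mark it UNVERIFIABLE."""
    strategy, ceiling = VERIFICATION_STRATEGIES[category]
    if strategy is None:
        return {"verdict": "UNVERIFIABLE", "confidence_ceiling": ceiling}
    return {"strategy": strategy, "confidence_ceiling": ceiling}
```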
Output per UA (structured):
- claim: original text
  category: user-provided category
  category_normalized: canonical category if user chose non-canonical
  verification_verdict: VERIFIED | PARTIALLY_VERIFIED | UNVERIFIED | UNVERIFIABLE | FABRICATED
  verification_method: what was actually done
  verification_evidence:
    - type: url | excerpt | registry_entry | api_response
      content: "what was found"
      matches_claim: boolean
  verification_confidence: 0.0-1.0
  scoring_weight: computed from verdict (VERIFIED=1.0, PARTIAL=0.5, UNVERIFIED=0, UNVERIFIABLE=0, FABRICATED=-0.3 penalty)
  notes: "caveat, limitation, context"
Model: Sonnet 4.5 with web search tool access. May require patent API integration per ADR-0011.
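The verdict-to-weight mapping in the schema above can be pinned down as a lookup (a sketch; the agent's actual implementation may differ):

```python
# Scoring weights per verification verdict, as specified in the output schema.
SCORING_WEIGHTS = {
    "VERIFIED": 1.0,
    "PARTIALLY_VERIFIED": 0.5,
    "UNVERIFIED": 0.0,
    "UNVERIFIABLE": 0.0,
    "FABRICATED": -0.3,  # penalty weight within the category sum
}

def scoring_weight(verdict: str) -> float:
    """Map a verification verdict to its scoring weight."""
    return SCORING_WEIGHTS[verdict]
```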
Agent 2.2 — Niche Moat Requirements
Analyzes M1 output to determine what categories of UAs matter for this specific niche.
Input: M1 output (especially competitor analysis, market analysis, financial model structure)
Process:
- Examine existing niche winners (from M1 competitor landscape) — what moats do they demonstrate?
- Apply heuristic library for niche-type patterns:
- Consumer apps with network effects: audience/brand weight 40-50%, distribution 20-30%
- Regulated industries (health/finance/education): license/regulatory 30-40%, data 15-25%
- B2B SaaS with enterprise sales: distribution 25-35%, tech differentiation 20-30%
- Commodity markets: cost/scale advantage 40-50%, brand secondary
- Content platforms: proprietary content library 25-35%, audience 30-40%
- Marketplaces: liquidity/network effects 40-50%, distribution 20-30%
- Developer tools: tech differentiation 30-40%, community/network 20-30%
- Leverage Pipeline Memory (ADR-0018) for past-niche learnings — if similar niches have been analyzed before, their moat patterns inform this output.
Output: Weighted UA category requirements, summing to 1.0, with rationale per category.
niche: "AI-esoterics (tarot, astrology, predictions)"
total_categories: 4
categories:
  - name: audience_trust_and_brand
    weight: 0.40
    rationale: "Top winners (Co-Star, Sanctuary) built via audience acquisition and brand trust. Repeat subscription requires trust at high rates."
  - name: proprietary_content_library
    weight: 0.25
    rationale: "Differentiation vs ChatGPT-wrapper competitors requires unique content. Pattern shown by Nebula's curated astrologer library."
  - name: platform_distribution
    weight: 0.20
    rationale: "Top-of-funnel dominated by App Store ASO + social media virality. Access to these channels is moat."
  - name: tech_differentiation
    weight: 0.15
    rationale: "Only moderate weight — most competitors use similar AI layer. Truly differentiated tech could matter but historically hasn't determined winners."
Model: Sonnet 4.5.
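The spec's invariant that category weights sum to 1.0 can be checked with a one-liner (the helper name is illustrative):

```python
def check_requirement_weights(categories: list, tol: float = 1e-6) -> bool:
    """Verify the Agent 2.2 invariant: niche requirement weights must sum to 1.0."""
    total = sum(c["weight"] for c in categories)
    return abs(total - 1.0) <= tol
```

The AI-esoterics example above passes: 0.40 + 0.25 + 0.20 + 0.15 = 1.0.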
Agent 2.3 — UA-Market Fit Scorer
Compares verified UAs (from 2.1) against niche requirements (from 2.2). Computes alignment.
Scoring algorithm:
```python
alignment_score = 0.0
gaps = []

for req_category, req_weight in niche_requirements.items():
    matching_uas = [ua for ua in verified_uas if ua.maps_to(req_category)]
    if not matching_uas:
        gaps.append({
            "category": req_category,
            "weight": req_weight,
            "status": "UNSATISFIED",
            "impact": req_weight,
        })
        continue
    # Sum scoring_weight of matching UAs, capped at 1.0 per category
    category_score = min(1.0, sum(ua.scoring_weight for ua in matching_uas))
    alignment_score += req_weight * category_score
    if category_score < 0.5:
        gaps.append({
            "category": req_category,
            "weight": req_weight,
            "status": "WEAK",
            "matching_uas": [ua.claim for ua in matching_uas],
            "impact": req_weight * (1.0 - category_score),
        })

# Apply fabrication penalties if any. Note: a FABRICATED UA already carries
# scoring_weight -0.3 inside its category sum; this is an additional flat
# penalty on the overall score per fabricated claim.
for ua in verified_uas:
    if ua.verification_verdict == "FABRICATED":
        alignment_score -= 0.1  # hard penalty for lying
```

Verdict thresholds:
- alignment_score ≥ 0.80 → GO — alignment is strong, proceed
- 0.60 ≤ alignment_score < 0.80 → CONDITIONAL — enterable with specific gap closures
- alignment_score < 0.60 → NO-GO — alignment too weak for this niche
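The threshold bands above map directly to a verdict function (a sketch; boundary handling follows the inequalities as written):

```python
def verdict(alignment_score: float) -> str:
    """Map an alignment score to the M2 verdict per the spec thresholds."""
    if alignment_score >= 0.80:
        return "GO"
    if alignment_score >= 0.60:
        return "CONDITIONAL"
    return "NO-GO"
```

Note that both boundaries are inclusive on the upper band: 0.80 is GO and 0.60 is CONDITIONAL.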
Output:
alignment_score: 0.38
entry_threshold: 0.60
verdict: NO-GO
per_category_breakdown:
  - category: audience_trust_and_brand
    weight: 0.40
    matching_uas: []
    status: UNSATISFIED
    contribution: 0.0
  # ... etc
fabrication_detected: false
gaps_ordered_by_impact:
  - {category: audience_trust_and_brand, impact: 0.40, status: UNSATISFIED}
  - {category: proprietary_content_library, impact: 0.25, status: UNSATISFIED}
  - {category: tech_differentiation, impact: 0.15, status: UNSATISFIED}
Model: Sonnet 4.5.
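The `gaps_ordered_by_impact` field above is a straightforward sort of the gap records produced by the scoring loop (helper name illustrative):

```python
def order_gaps(gaps: list) -> list:
    """Order gap records by impact, highest first, for gaps_ordered_by_impact."""
    return sorted(gaps, key=lambda g: g["impact"], reverse=True)
```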
Agent 2.4 — M2 Director synthesis
Integrates outputs of 2.1, 2.2, 2.3 into M2 report section with Swiss tone.
Prompt emphasis:
- No softening language (“but you’ve done great work” — REMOVE)
- No consolation (“this is still valuable progress” — REMOVE)
- No implicit emotional support
- Present facts + scores + gaps + recommendations
- Where negative verdict: include concrete gap-closing steps (not vague suggestions)
- Where positive verdict: confirm strengths with verification evidence
- Tone model: Swiss banking compliance report, not startup advisor
Example tone comparison:
❌ Softened (wrong): “Your Unfair Advantages show real promise! While we identified some gaps around proprietary content, your distribution partnership with Synergia is a strong foundation to build from. With some focused work on the gaps, you could be well positioned.”
✓ Swiss (correct): “UA-Niche alignment: 0.38 / 1.00. Verdict: NO-GO. Three of four niche-requirement categories have zero satisfying UAs. Your Synergia distribution UA is VERIFIED and contributes 0.20 to alignment — alone insufficient for 0.60 entry threshold. Gap closures required: (1) validate proprietary content library claim with concrete samples, (2) articulate tech differentiation versus named competitors, (3) develop audience/brand assets (current: none verifiable).”
Model: Sonnet 4.5.
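The Risks section proposes a post-synthesis review agent that checks for softening phrases. A crude lexical first pass could look like the sketch below; the phrase list is illustrative, not the shipped lexicon, and the real check would likely be LLM-based:

```python
import re

# Illustrative softening phrases drawn from the "wrong" tone example above.
SOFTENING_PATTERNS = [
    r"\bgreat work\b",
    r"\breal promise\b",
    r"\bstrong foundation\b",
    r"\bwell positioned\b",
    r"\bstill valuable\b",
]

def softening_hits(report_text: str) -> list:
    """Return the softening patterns found in a report; empty list = Swiss-compliant (lexically)."""
    low = report_text.lower()
    return [p for p in SOFTENING_PATTERNS if re.search(p, low)]
```

A non-empty result would trigger the retry loop described in the risk table.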
M2 Judge
Reuses M1 Judge infrastructure (GPT-4o via ADR-0019). Evaluates:
- Quality of UA verification (were appropriate methods used? Are evidence conclusions justified?)
- Quality of niche requirement assignment (does the heuristic fit this niche?)
- Quality of scoring (is the alignment calculation correct? Are gaps accurate?)
- Quality of synthesis (does tone match Swiss standard? Are recommendations actionable?)
No new Judge code needed — plug into existing JudgeLLMClient.
Integration with M1 pipeline
Flow orchestration
Existing M1 pipeline continues as-is through:
- Agent-Intake (if deployed)
- CEO delegation
- Intel Director + 11 executors
- Aggregate + Judge
- Report generation
New: After M1 completion, if M1 verdict ∈ {GO, CONDITIONAL GO}:
- Streamlit UI shows “M2 Implementation Audit available. Run now? [Yes] [Skip]”
- If Yes: M2 Intake collects solution description + UA claims with evidence
- M2 Director kicks off 2.1, 2.2, 2.3 in parallel
- M2 Director + Judge synthesize final M2 report section
- Combined report (M1 + M2) rendered as unified deliverable
Storage and artifacts
Per-run directory structure (extends existing):
reports/streamlit_runs/<run_id>/
├── meta.json (existing)
├── status.json (existing)
├── brief.json (Agent-Intake output if shipped)
├── full.json (M1 director report — existing)
├── judgement.json (M1 Judge — existing)
├── report.md (M1 narrative — existing)
├── m2_input.json (NEW — user's M2 intake)
├── m2_ua_verification.json (NEW — Agent 2.1 output)
├── m2_niche_moat.json (NEW — Agent 2.2 output)
├── m2_fit_scoring.json (NEW — Agent 2.3 output)
├── m2_judgement.json (NEW — M2 Judge)
├── m2_report.md (NEW — M2 narrative)
└── combined_report.md (NEW — unified M1+M2 deliverable)
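A completeness check over the new M2 artifacts in the tree above could be as simple as (function name hypothetical; useful as an end-of-run assertion or test helper):

```python
from pathlib import Path

# Expected M2 artifacts per the run-directory layout above.
M2_ARTIFACTS = [
    "m2_input.json",
    "m2_ua_verification.json",
    "m2_niche_moat.json",
    "m2_fit_scoring.json",
    "m2_judgement.json",
    "m2_report.md",
    "combined_report.md",
]

def missing_m2_artifacts(run_dir: str) -> list:
    """Return the M2 artifact filenames not yet present in a run directory."""
    root = Path(run_dir)
    return [name for name in M2_ARTIFACTS if not (root / name).exists()]
```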
Auto-ingest extension
Pipeline Memory (ADR-0018) auto-ingest hook extends to M2 artifacts. M2 outputs become part of memory for future niche moat requirement heuristics.
External integrations required
Per ADR-0011 (Integration Triage Policy), each new integration needs evidence of need × 3-5 instances before approval.
Planned integrations with triage notes:
| Integration | Purpose | ADR-0011 status |
|---|---|---|
| USPTO public API | Patent verification | Triage required — no observed pain yet, but critical for core M2 use case |
| LinkedIn scraping (partnership verification) | Verify claimed partnerships | Triage required — also ToS concern |
| Crunchbase API | Company / partnership data | Triage required — may be covered by web search |
| EPO / WIPO APIs | International patent verification | Triage required — only if USPTO insufficient for first pilots |
| Industry registry scrapers | License verification per jurisdiction | Multiple — triage each separately |
Default approach: Start with web_search for everything (already available). Only integrate specialized APIs when web_search proves insufficient across 3-5 verification attempts.
Cost and latency
Estimated per M2 run:
| Stage | Estimated tokens | Estimated cost | Latency |
|---|---|---|---|
| UA Verifier (5 UAs, web searches each) | 15000-30000 in/out | $0.80-1.50 | 120-240s |
| Niche Moat Requirements | 3000-6000 in/out | $0.15-0.30 | 15-30s |
| UA-Market Fit Scorer | 2000-4000 in/out | $0.10-0.20 | 10-20s |
| M2 Director synthesis | 4000-8000 in/out | $0.20-0.40 | 20-40s |
| M2 Judge | 3000-6000 in/out | $0.15-0.30 (GPT-4o) | 10-20s |
| Total M2 | 27000-54000 | $1.40-2.70 | 175-350s (~3-6 min) |
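The totals row can be cross-checked against the per-stage rows (stage keys are shorthand, not identifiers from the codebase):

```python
# Per-stage (low, high) estimates transcribed from the table above.
stages = {
    "ua_verifier": {"tokens": (15000, 30000), "cost": (0.80, 1.50), "latency_s": (120, 240)},
    "niche_moat":  {"tokens": (3000, 6000),   "cost": (0.15, 0.30), "latency_s": (15, 30)},
    "fit_scorer":  {"tokens": (2000, 4000),   "cost": (0.10, 0.20), "latency_s": (10, 20)},
    "synthesis":   {"tokens": (4000, 8000),   "cost": (0.20, 0.40), "latency_s": (20, 40)},
    "judge":       {"tokens": (3000, 6000),   "cost": (0.15, 0.30), "latency_s": (10, 20)},
}

def total(metric: str) -> tuple:
    """Sum the low and high bounds of a metric across all stages."""
    lo = sum(s[metric][0] for s in stages.values())
    hi = sum(s[metric][1] for s in stages.values())
    return lo, hi
```

All three totals agree with the table: (27000, 54000) tokens, ($1.40, $2.70), and (175s, 350s).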
Combined M1 + M2:
- Cost: ~$4.08-5.38 total (M1 ~$2.68 + M2 $1.40-2.70)
- Duration: ~27-30 minutes (M1 24min + M2 3-6min)
- Sold at $5-20K = 1000x-5000x compute cost (standard DD margin)
Implementation sprint plan
Sprint 1 (1 week) — M2 Intake infrastructure
Deliverables:
- Streamlit UI for M2 input (triggered post-M1 verdict)
- User input forms: solution description (free text) + UA list (structured with category, description, evidence)
- Evidence upload handling (files + URLs)
- Storage: m2_input.json in run directory
- Validation: at least solution description + 1 UA required to proceed
Tests:
- M2 input validation (reject empty)
- Evidence URL format sanity check
- M2 input persists correctly
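The "evidence URL format sanity check" test target could start from a minimal stdlib check (a sketch, assuming the shipped validator may add allowlists or reachability probes):

```python
from urllib.parse import urlparse

def evidence_url_ok(url: str) -> bool:
    """Minimal sanity check for evidence URLs: http(s) scheme plus a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```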
Sprint 2 (2 weeks) — Unfair Advantage Verifier
Deliverables:
- Agent 2.1 module with category-specific verification strategies
- Web search integration for all categories (baseline)
- USPTO API integration for patents (specific, gated on ADR-0011 triage)
- Verification verdict classification logic
- Per-UA structured output matching schema in spec
- Tests (unit + integration with mock web search)
Open decision: which categories use fallback “best-effort LLM evaluation” vs “automated verification.” Default: automated where possible (patent, registry), LLM-only where not (process, know-how).
Sprint 3 (1 week) — Niche Moat Requirements
Deliverables:
- Agent 2.2 module with heuristic library
- Pipeline Memory integration for past-niche patterns
- Weighted requirement output schema
- Tests
Sprint 4 (1 week) — UA-Market Fit Scorer + M2 Director
Deliverables:
- Agent 2.3 scoring algorithm implementation
- Alignment score computation
- Gap analysis logic
- Agent 2.4 (M2 Director) synthesis with Swiss-tone prompt
- Combined report rendering (M1 + M2 unified markdown)
- Tests
Sprint 5 (1 week) — Integration + end-to-end testing
Deliverables:
- M1 → M2 orchestration in run_m1_query.py (or equivalent)
- Streamlit UI for M2 output display
- End-to-end tests on 3+ distinct niches (replay R26, R27 + 1 new niche)
- M2 Judge integration (already exists, just plug in)
- Updated auto-ingest to include M2 artifacts in Pipeline Memory
Acceptance criteria:
- 3 distinct niches each complete M1 + M2 successfully
- M2 verdicts are correctly calibrated (negative when UAs don’t fit, positive when they do)
- M2 Judge score ≥ 7.0 on each end-to-end test
- No regression in M1 performance (still Judge ≥ 7.5)
Success metrics
Technical:
- M2 completes successfully ≥ 95% of the time
- M2 cost stays ≤ $3.00 per run
- M2 duration stays ≤ 8 minutes
- M2 Judge score ≥ 7.0 on 5 consecutive distinct-niche tests
Product:
- UA verification accuracy ≥ 80% on labeled ground truth (requires manual audit set)
- Niche moat requirements match expert assessment ≥ 75% on labeled set
- Swiss-tone compliance ≥ 90% (no softening language detected by review agent)
Business:
- First external pilot willing to pay $5K+ for Combined audit
- 3 external audits completed within 3 months of Sprint 5 completion
- First customer NPS ≥ 7/10 (even on negative verdicts — they value accuracy)
Risks and mitigations
| Risk | Likelihood | Mitigation |
|---|---|---|
| UA verification hits third-party API rate limits | Medium | Cache results per UA claim, fallback to web_search |
| Niche moat heuristic library has gaps | Expected | LLM fallback with “low confidence” flagging; iterate library per encountered niche |
| Users claim fabricated UAs and we don’t catch | Low-Medium | FABRICATED verdict with penalty; manual review for first 10 audits |
| Swiss tone fails — LLM softens verdicts anyway | Medium | Post-synthesis review agent checks for softening phrases; retry if detected |
| Combined pricing ($5-20K) is rejected by market | Medium | First 3 customers priced at $1-3K as beta; adjust upward based on feedback |
| Negative verdicts damage brand (“they told me NO after I paid”) | Medium-High | Transparent pricing communication: “we audit, we don’t endorse”; testimonials from accurate-NO cases |
| Gaps analysis becomes generic boilerplate | Medium | Prompt emphasis on specificity; include M1 context (named competitors, specific pricing tiers) |
| M2 run on “wrong” M1 output (PASS verdict) runs anyway | Low | UI gates M2 offer behind M1 verdict check |
Open questions for resolution
From ADR-0025 parent:
- Q1: Supersedes ADR-0013 fully or coexist? Default: full supersede.
- Q2: Pricing directionally correct at $5-20K? Default: yes, validate with first 3 customers.
- Q3: API cost budgeting? Needs separate business decision.
- Q4: M2 Final Verdict triggers Chamber? Default: yes (L3 criticality).
- Q5: Niches with no matching heuristic — fallback strategy? Default: LLM best-guess with low-confidence flag.
- Q6: First external test customer? Needs identification.
- Q7: Legal disclaimer required? Default: yes, standard for DD.
Additional spec-level open questions:
- Q8: Evidence upload file size limit? Format restrictions? Propose: 10MB per file, PDF/DOC/MD/TXT + URL-only for all else.
- Q9: M2 input form layout — structured form vs conversational agent (like Agent-Intake)? Default: structured form for Sprint 1 MVP, upgrade to conversational in later iteration.
- Q10: Can M2 be re-run on same M1 with updated UA claims? Default: yes, each re-run is new artifact, old artifacts retained.
Next actions
- Denis approves ADR-0025 direction
- Denis resolves open questions (at least Q1, Q2, Q4 critical)
- Deploy this spec to manifest as 07-Roadmap/M2-Navigator-Spec-v2.md
- Wait for R28/R29 validation on other niches (stabilize M1 before M2 build)
- When M1 stable + 2 external testers done: begin Sprint 1
Estimated earliest Sprint 1 start: May-June 2026.