M2 Navigator Spec v2 — Unfair Advantage Audit

Status: Proposed, awaiting ADR-0025 approval
Target start: After R27 validation stabilizes + 2 more clean M1 runs on distinct niches
Estimated duration: 6 weeks across 5 sprints

This spec replaces the M2 scope described in ADR-0013. Team-competency assessment, profile harvesting, and track-record research are removed. UA audit becomes the complete M2.


Purpose

M2 answers the question: “Given the niche from M1, can the user’s proposed entry succeed?”

It answers this by:

  1. Extracting what Unfair Advantages this niche historically rewards
  2. Verifying each Unfair Advantage the user claims to have
  3. Computing alignment score between claimed+verified UAs and niche requirements
  4. Producing Swiss-neutral verdict with gap analysis if below threshold

M2 is NOT:

  • A team competency assessment
  • A budget/timeline validator
  • An empathetic support tool
  • A standalone product (requires M1 context)

Inputs to M2

Required from user

Solution description:

Free-text or structured answer to “What product/service do you plan to build for this niche?” (100-1000 words typical).

Used for: context in UA scoring (does the solution match how claimed UAs would be deployed?)

Claimed Unfair Advantages:

List, 1-8 items typical. Each item:

- category: patent | license | partnership | data | distribution | regulatory | brand | network | process
  description: "What specifically is the claim?"
  evidence:
    - type: url | file | reference | text
      content: "The actual evidence or pointer"
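The claimed-UA structure above can be sketched as typed records. This is a minimal illustration, not the actual intake code; the class and method names are assumptions.

```python
from dataclasses import dataclass

# Canonical categories from the schema above
CANONICAL_CATEGORIES = {
    "patent", "license", "partnership", "data", "distribution",
    "regulatory", "brand", "network", "process",
}

@dataclass
class Evidence:
    type: str      # url | file | reference | text
    content: str   # the actual evidence or pointer

@dataclass
class ClaimedUA:
    category: str
    description: str
    evidence: list  # list of Evidence items

    def is_canonical(self) -> bool:
        # Agent 2.1 later normalizes non-canonical categories
        return self.category in CANONICAL_CATEGORIES
```

A claim with an off-list category is still accepted at intake; normalization happens during verification.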

Inherited from M1

  • Full niche analysis including competitor landscape, moat patterns observed, financial structure
  • M1 confidence scores
  • Recommended next steps (informs scoring context)

Explicitly NOT collected

  • Team member profiles, CVs, LinkedIn links
  • Budget amount
  • Timeline estimates
  • Self-reported experience/expertise levels
  • Past company names (too easily fabricated)

Agent architecture

M2 Director (orchestrator)

Receives M1 output + user’s M2 input. Dispatches to three specialized agents. Synthesizes final report section.

Position: peer to Intel Director in 3-tier hierarchy (ADR-0003).

Model: Sonnet 4.5.

Agent 2.1 — Unfair Advantage Verifier

For each claimed UA, attempts verification via category-specific strategies.

Verification strategies reference table:

| Category | Strategy | Confidence ceiling |
|---|---|---|
| Patent | USPTO / EPO / Rospatent / WIPO public search by number or inventor | High (if claim includes specific number) |
| License / certification | Search relevant public registry (FDA, CE, industry body) by entity name | Medium (registries vary in completeness) |
| Partnership / distribution | Web search target company announcements, press releases, joint case studies; LinkedIn company page check | Medium (partnerships often under-announced) |
| Proprietary data | Evaluate uniqueness: search for equivalent datasets, assess coverage claims vs alternatives | Medium-Low (hard to prove negatives) |
| Owned distribution channel | Verify account existence, metrics (audience size via public profile), activity level | Medium-High |
| Brand / audience recognition | Search mentions, press, review sites; audience size verification | Low-Medium |
| Regulatory approval | Check public registries per jurisdiction | Medium-High (if registry exists) |
| Network / relationships | Generally unverifiable via automated checks | Low (mark UNVERIFIABLE) |
| Process / know-how | Generally unverifiable | Low (mark UNVERIFIABLE) |

Output per UA (structured):

- claim: original text
  category: user-provided category
  category_normalized: canonical category if user chose non-canonical
  verification_verdict: VERIFIED | PARTIALLY_VERIFIED | UNVERIFIED | UNVERIFIABLE | FABRICATED
  verification_method: what was actually done
  verification_evidence:
    - type: url | excerpt | registry_entry | api_response
      content: "what was found"
      matches_claim: boolean
  verification_confidence: 0.0-1.0
  scoring_weight: computed from verdict (VERIFIED=1.0, PARTIAL=0.5, UNVERIFIED=0, UNVERIFIABLE=0, FABRICATED=-0.3 penalty)
  notes: "caveat, limitation, context"

Model: Sonnet 4.5 with web search tool access. May require patent API integration per ADR-0011.
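The verdict-to-weight mapping in the schema above is mechanical, so it can be pinned down as a lookup. A sketch, mirroring the values listed in the schema (the function name is an assumption):

```python
# Scoring weights per verification verdict, as specified in the
# per-UA output schema above
SCORING_WEIGHTS = {
    "VERIFIED": 1.0,
    "PARTIALLY_VERIFIED": 0.5,
    "UNVERIFIED": 0.0,
    "UNVERIFIABLE": 0.0,
    "FABRICATED": -0.3,  # penalty weight
}

def scoring_weight(verdict: str) -> float:
    """Map a verification verdict to its scoring weight; fail loudly
    on unknown verdicts rather than defaulting silently."""
    try:
        return SCORING_WEIGHTS[verdict]
    except KeyError:
        raise ValueError(f"unknown verification verdict: {verdict}")
```

Failing on unknown verdicts keeps schema drift between Agent 2.1 and Agent 2.3 from passing silently.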

Agent 2.2 — Niche Moat Requirements

Analyzes M1 output to determine what categories of UAs matter for this specific niche.

Input: M1 output (especially competitor analysis, market analysis, financial model structure)

Process:

  1. Examine existing niche winners (from M1 competitor landscape) — what moats do they demonstrate?
  2. Apply heuristic library for niche-type patterns:
    • Consumer apps with network effects: audience/brand weight 40-50%, distribution 20-30%
    • Regulated industries (health/finance/education): license/regulatory 30-40%, data 15-25%
    • B2B SaaS with enterprise sales: distribution 25-35%, tech differentiation 20-30%
    • Commodity markets: cost/scale advantage 40-50%, brand secondary
    • Content platforms: proprietary content library 25-35%, audience 30-40%
    • Marketplaces: liquidity/network effects 40-50%, distribution 20-30%
    • Developer tools: tech differentiation 30-40%, community/network 20-30%
  3. Leverage Pipeline Memory (ADR-0018) for past-niche learnings — if similar niches have been analyzed before, their moat patterns inform this output.
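The heuristic library in step 2 could start as a plain lookup table keyed by niche archetype. A sketch under assumptions: the archetype keys and dict shape are illustrative, and midpoints of the ranges above are used; remaining weight mass would be assigned by the agent per niche.

```python
# Illustrative heuristic library: top category priors per niche archetype.
# Only the dominant categories are listed; weights intentionally sum to
# less than 1.0, with the remainder distributed by Agent 2.2.
HEURISTIC_LIBRARY = {
    "consumer_network_effects": {"audience_brand": 0.45, "distribution": 0.25},
    "regulated_industry":       {"license_regulatory": 0.35, "data": 0.20},
    "b2b_saas_enterprise":      {"distribution": 0.30, "tech_differentiation": 0.25},
    "marketplace":              {"liquidity_network_effects": 0.45, "distribution": 0.25},
}

def moat_priors(archetype: str) -> dict:
    """Return prior category weights for a niche archetype, or {} when no
    heuristic matches (LLM best-guess with low-confidence flag, per Q5)."""
    return HEURISTIC_LIBRARY.get(archetype, {})
```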

Output: Weighted UA category requirements, summing to 1.0, with rationale per category.

niche: "AI-esoterics (tarot, astrology, predictions)"
total_categories: 4
categories:
  - name: audience_trust_and_brand
    weight: 0.40
    rationale: "Top winners (Co-Star, Sanctuary) built via audience acquisition and brand trust. Repeat subscription requires trust at high rates."
  - name: proprietary_content_library
    weight: 0.25
    rationale: "Differentiation vs ChatGPT-wrapper competitors requires unique content. Pattern shown by Nebula's curated astrologer library."
  - name: platform_distribution
    weight: 0.20
    rationale: "Top-of-funnel dominated by App Store ASO + social media virality. Access to these channels is moat."
  - name: tech_differentiation
    weight: 0.15
    rationale: "Only moderate weight — most competitors use similar AI layer. Truly differentiated tech could matter but historically hasn't determined winners."

Model: Sonnet 4.5.

Agent 2.3 — UA-Market Fit Scorer

Compares verified UAs (from 2.1) against niche requirements (from 2.2). Computes alignment.

Scoring algorithm:

alignment_score = 0.0
gaps = []
 
for req_category, req_weight in niche_requirements.items():
    matching_uas = [ua for ua in verified_uas if ua.maps_to(req_category)]
    
    if not matching_uas:
        gaps.append({
            "category": req_category,
            "weight": req_weight,
            "status": "UNSATISFIED",
            "impact": req_weight,
        })
        continue
    
    # Sum scoring_weight of matching UAs, capped at 1.0 per category
    category_score = min(1.0, sum(ua.scoring_weight for ua in matching_uas))
    alignment_score += req_weight * category_score
    
    if category_score < 0.5:
        gaps.append({
            "category": req_category,
            "weight": req_weight,
            "status": "WEAK",
            "matching_uas": [ua.claim for ua in matching_uas],
            "impact": req_weight * (1.0 - category_score),
        })
 
# Apply a global fabrication penalty, in addition to the -0.3
# scoring_weight each FABRICATED UA already carries in the category sums
for ua in verified_uas:
    if ua.verification_verdict == "FABRICATED":
        alignment_score -= 0.1  # hard penalty for lying

Verdict thresholds:

  • alignment_score ≥ 0.80: GO — alignment is strong, proceed
  • 0.60 ≤ alignment_score < 0.80: CONDITIONAL — enterable with specific gap closures
  • alignment_score < 0.60: NO-GO — alignment too weak for this niche
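The thresholds above map directly to a small function. A sketch (the name and default parameters are assumptions):

```python
def verdict(alignment_score: float,
            go: float = 0.80, conditional: float = 0.60) -> str:
    """Map an alignment score to a verdict using the spec thresholds;
    boundary handling follows the >= / < ranges given above."""
    if alignment_score >= go:
        return "GO"
    if alignment_score >= conditional:
        return "CONDITIONAL"
    return "NO-GO"
```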

Output:

alignment_score: 0.20
entry_threshold: 0.60
verdict: NO-GO
per_category_breakdown:
  - category: audience_trust_and_brand
    weight: 0.40
    matching_uas: []
    status: UNSATISFIED
    contribution: 0.0
  # ... etc
fabrication_detected: false
gaps_ordered_by_impact:
  - {category: audience_trust_and_brand, impact: 0.40, status: UNSATISFIED}
  - {category: proprietary_content_library, impact: 0.25, status: UNSATISFIED}
  - {category: tech_differentiation, impact: 0.15, status: UNSATISFIED}

Model: Sonnet 4.5.

Agent 2.4 — M2 Director synthesis

Integrates outputs of 2.1, 2.2, 2.3 into M2 report section with Swiss tone.

Prompt emphasis:

  • No softening language (“but you’ve done great work” — REMOVE)
  • No consolation (“this is still valuable progress” — REMOVE)
  • No implicit emotional support
  • Present facts + scores + gaps + recommendations
  • Where negative verdict: include concrete gap-closing steps (not vague suggestions)
  • Where positive verdict: confirm strengths with verification evidence
  • Tone model: Swiss banking compliance report, not startup advisor

Example tone comparison:

Softened (wrong): “Your Unfair Advantages show real promise! While we identified some gaps around proprietary content, your distribution partnership with Synergia is a strong foundation to build from. With some focused work on the gaps, you could be well positioned.”

Swiss (correct): “UA-Niche alignment: 0.20 / 1.00. Verdict: NO-GO. Three of four niche-requirement categories have zero satisfying UAs. Your Synergia distribution UA is VERIFIED and contributes 0.20 to alignment — alone insufficient for 0.60 entry threshold. Gap closures required: (1) validate proprietary content library claim with concrete samples, (2) articulate tech differentiation versus named competitors, (3) develop audience/brand assets (current: none verifiable).”

Model: Sonnet 4.5.

M2 Judge

Reuses M1 Judge infrastructure (GPT-4o via ADR-0019). Evaluates:

  • Quality of UA verification (were appropriate methods used? Are evidence conclusions justified?)
  • Quality of niche requirement assignment (does the heuristic fit this niche?)
  • Quality of scoring (is the alignment calculation correct? Are gaps accurate?)
  • Quality of synthesis (does tone match Swiss standard? Are recommendations actionable?)

No new Judge code needed — plug into existing JudgeLLMClient.


Integration with M1 pipeline

Flow orchestration

Existing M1 pipeline continues as-is through:

  1. Agent-Intake (if deployed)
  2. CEO delegation
  3. Intel Director + 11 executors
  4. Aggregate + Judge
  5. Report generation

New: After M1 completion, if M1 verdict ∈ {GO, CONDITIONAL GO}:

  1. Streamlit UI shows “M2 Implementation Audit available. Run now? [Yes] [Skip]”
  2. If Yes: M2 Intake collects solution description + UA claims with evidence
  3. M2 Director kicks off 2.1, 2.2, 2.3 in parallel
  4. M2 Director + Judge synthesize final M2 report section
  5. Combined report (M1 + M2) rendered as unified deliverable
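The gate in step 1 is the only entry point into M2, so it is worth making explicit. A minimal sketch, assuming the M1 verdict arrives as a string (the function and constant names are hypothetical):

```python
# M1 verdicts that unlock the M2 offer, per the flow above
M2_ELIGIBLE_M1_VERDICTS = {"GO", "CONDITIONAL GO"}

def m2_available(m1_verdict: str) -> bool:
    """UI gate: offer the M2 audit only after an eligible M1 verdict."""
    return m1_verdict in M2_ELIGIBLE_M1_VERDICTS
```

Centralizing the check in one function keeps the Streamlit UI and any CLI orchestration from drifting apart on eligibility.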

Storage and artifacts

Per-run directory structure (extends existing):

reports/streamlit_runs/<run_id>/
    ├── meta.json          (existing)
    ├── status.json        (existing)
    ├── brief.json         (Agent-Intake output if shipped)
    ├── full.json          (M1 director report — existing)
    ├── judgement.json     (M1 Judge — existing)
    ├── report.md          (M1 narrative — existing)
    ├── m2_input.json      (NEW — user's M2 intake)
    ├── m2_ua_verification.json  (NEW — Agent 2.1 output)
    ├── m2_niche_moat.json       (NEW — Agent 2.2 output)
    ├── m2_fit_scoring.json      (NEW — Agent 2.3 output)
    ├── m2_judgement.json        (NEW — M2 Judge)
    ├── m2_report.md             (NEW — M2 narrative)
    └── combined_report.md       (NEW — unified M1+M2 deliverable)

Auto-ingest extension

Pipeline Memory (ADR-0018) auto-ingest hook extends to M2 artifacts. M2 outputs become part of memory for future niche moat requirement heuristics.


External integrations required

Per ADR-0011 (Integration Triage Policy), each new integration needs evidence of need × 3-5 instances before approval.

Planned integrations with triage notes:

| Integration | Purpose | ADR-0011 status |
|---|---|---|
| USPTO public API | Patent verification | Triage required — no observed pain yet, but critical for core M2 use case |
| LinkedIn scraping (partnership verification) | Verify claimed partnerships | Triage required — also ToS concern |
| Crunchbase API | Company / partnership data | Triage required — may be covered by web search |
| EPO / WIPO APIs | International patent verification | Triage required — only if USPTO insufficient for first pilots |
| Industry registry scrapers | License verification per jurisdiction | Multiple — triage each separately |

Default approach: Start with web_search for everything (already available). Only integrate specialized APIs when web_search proves insufficient across 3-5 verification attempts.
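The 3-5 instance rule implies tracking verification failures per candidate integration. A sketch of what that bookkeeping might look like; the class shape and threshold constant are assumptions, and persistence is omitted.

```python
from collections import Counter

TRIAGE_THRESHOLD = 3  # lower bound of the ADR-0011 3-5 instance rule

class IntegrationTriage:
    """Counts web_search verification failures per candidate integration
    and flags when enough observed pain justifies a triage review."""

    def __init__(self) -> None:
        self.failures: Counter = Counter()

    def record_failure(self, integration: str) -> None:
        self.failures[integration] += 1

    def ready_for_triage(self, integration: str) -> bool:
        return self.failures[integration] >= TRIAGE_THRESHOLD
```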


Cost and latency

Estimated per M2 run:

| Stage | Estimated tokens | Estimated cost | Latency |
|---|---|---|---|
| UA Verifier (5 UAs, web searches each) | 15000-30000 in/out | $0.80-1.50 | 120-240s |
| Niche Moat Requirements | 3000-6000 in/out | $0.15-0.30 | 15-30s |
| UA-Market Fit Scorer | 2000-4000 in/out | $0.10-0.20 | 10-20s |
| M2 Director synthesis | 4000-8000 in/out | $0.20-0.40 | 20-40s |
| M2 Judge | 3000-6000 in/out | $0.15-0.30 (GPT-4o) | 10-20s |
| Total M2 | 27000-54000 | $1.40-2.70 | 175-350s (~3-6 min) |

Combined M1 + M2:

  • Cost: ~$4.08-5.38 (M1 ~$2.68 + M2 $1.40-2.70)
  • Duration: ~27-30 minutes (M1 24min + M2 3-6min)
  • Sold at $5-20K = 1000x-5000x compute cost (standard DD margin)

Implementation sprint plan

Sprint 1 (1 week) — M2 Intake infrastructure

Deliverables:

  • Streamlit UI for M2 input (triggered post-M1 verdict)
  • User input forms: solution description (free text) + UA list (structured with category, description, evidence)
  • Evidence upload handling (files + URLs)
  • Storage: m2_input.json in run directory
  • Validation: at least solution description + 1 UA required to proceed

Tests:

  • M2 input validation (reject empty)
  • Evidence URL format sanity check
  • M2 input persists correctly
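The Sprint 1 validation rule (solution description plus at least one UA) can be sketched as a single function returning human-readable errors. The field names are assumptions about how `m2_input.json` might be keyed, not the final schema.

```python
def validate_m2_input(m2_input: dict) -> list:
    """Sprint 1 intake validation: an empty error list means the
    input may proceed to the M2 Director."""
    errors = []
    if not m2_input.get("solution_description", "").strip():
        errors.append("solution description is required")
    if not m2_input.get("claimed_uas"):
        errors.append("at least one claimed Unfair Advantage is required")
    return errors
```

Returning a list of errors, rather than raising on the first one, lets the Streamlit form surface every problem at once.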

Sprint 2 (2 weeks) — Unfair Advantage Verifier

Deliverables:

  • Agent 2.1 module with category-specific verification strategies
  • Web search integration for all categories (baseline)
  • USPTO API integration for patents (specific, gated on ADR-0011 triage)
  • Verification verdict classification logic
  • Per-UA structured output matching schema in spec
  • Tests (unit + integration with mock web search)

Open decision: which categories use fallback “best-effort LLM evaluation” vs “automated verification.” Default: automated where possible (patent, registry), LLM-only where not (process, know-how).

Sprint 3 (1 week) — Niche Moat Requirements

Deliverables:

  • Agent 2.2 module with heuristic library
  • Pipeline Memory integration for past-niche patterns
  • Weighted requirement output schema
  • Tests

Sprint 4 (1 week) — UA-Market Fit Scorer + M2 Director

Deliverables:

  • Agent 2.3 scoring algorithm implementation
  • Alignment score computation
  • Gap analysis logic
  • Agent 2.4 (M2 Director) synthesis with Swiss-tone prompt
  • Combined report rendering (M1 + M2 unified markdown)
  • Tests

Sprint 5 (1 week) — Integration + end-to-end testing

Deliverables:

  • M1 → M2 orchestration in run_m1_query.py (or equivalent)
  • Streamlit UI for M2 output display
  • End-to-end tests on 3+ distinct niches (replay R26, R27 + 1 new niche)
  • M2 Judge integration (already exists, just plug in)
  • Updated auto-ingest to include M2 artifacts in Pipeline Memory

Acceptance criteria:

  • 3 distinct niches each complete M1 + M2 successfully
  • M2 verdicts are correctly calibrated (negative when UAs don’t fit, positive when they do)
  • M2 Judge score ≥ 7.0 on each end-to-end test
  • No regression in M1 performance (still Judge ≥ 7.5)

Success metrics

Technical:

  • M2 completes successfully ≥ 95% of the time
  • M2 cost stays ≤ $3.00 per run
  • M2 duration stays ≤ 8 minutes
  • M2 Judge score ≥ 7.0 on 5 consecutive distinct-niche tests

Product:

  • UA verification accuracy ≥ 80% on labeled ground truth (requires manual audit set)
  • Niche moat requirements match expert assessment ≥ 75% on labeled set
  • Swiss-tone compliance ≥ 90% (no softening language detected by review agent)

Business:

  • First external pilot willing to pay $5K+ for Combined audit
  • 3 external audits completed within 3 months of Sprint 5 completion
  • First customer NPS ≥ 7/10 (even on negative verdicts — they value accuracy)

Risks and mitigations

| Risk | Likelihood | Mitigation |
|---|---|---|
| UA verification hits third-party API rate limits | Medium | Cache results per UA claim, fall back to web_search |
| Niche moat heuristic library has gaps | Expected | LLM fallback with “low confidence” flagging; iterate library per encountered niche |
| Users claim fabricated UAs and we don't catch them | Low-Medium | FABRICATED verdict with penalty; manual review for first 10 audits |
| Swiss tone fails — LLM softens verdicts anyway | Medium | Post-synthesis review agent checks for softening phrases; retry if detected |
| Combined pricing ($5-20K) is rejected by market | Medium | First 3 customers priced at $1-3K as beta; adjust upward based on feedback |
| Negative verdicts damage brand (“they told me NO after I paid”) | Medium-High | Transparent pricing communication: “we audit, we don't endorse”; testimonials from accurate-NO cases |
| Gap analysis becomes generic boilerplate | Medium | Prompt emphasis on specificity; include M1 context (named competitors, specific pricing tiers) |
| M2 runs on an ineligible M1 output (non-qualifying verdict) | Low | UI gates M2 offer behind M1 verdict check |
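The Swiss-tone mitigation mentions a review agent that checks for softening phrases. A crude but testable first pass could be a pattern scan; the phrase list here is illustrative only, and the real review step would likely be LLM-based with a curated pattern set.

```python
import re

# Illustrative softening patterns, drawn from the tone comparison examples
SOFTENING_PATTERNS = [
    r"\bgreat work\b",
    r"\bshows? real promise\b",
    r"\bstill valuable\b",
    r"\bwell positioned\b",
]

def detect_softening(report_text: str) -> list:
    """Return the softening patterns found in a synthesis draft;
    any hit triggers a retry of Agent 2.4."""
    return [p for p in SOFTENING_PATTERNS
            if re.search(p, report_text, flags=re.IGNORECASE)]
```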

Open questions for resolution

From ADR-0025 parent:

  • Q1: Supersedes ADR-0013 fully or coexist? Default: full supersede.
  • Q2: Pricing directionally correct at $5-20K? Default: yes, validate with first 3 customers.
  • Q3: API cost budgeting? Needs separate business decision.
  • Q4: M2 Final Verdict triggers Chamber? Default: yes (L3 criticality).
  • Q5: Niches with no matching heuristic — fallback strategy? Default: LLM best-guess with low-confidence flag.
  • Q6: First external test customer? Needs identification.
  • Q7: Legal disclaimer required? Default: yes, standard for DD.

Additional spec-level open questions:

  • Q8: Evidence upload file size limit? Format restrictions? Propose: 10MB per file, PDF/DOC/MD/TXT + URL-only for all else.
  • Q9: M2 input form layout — structured form vs conversational agent (like Agent-Intake)? Default: structured form for Sprint 1 MVP, upgrade to conversational in later iteration.
  • Q10: Can M2 be re-run on same M1 with updated UA claims? Default: yes, each re-run is new artifact, old artifacts retained.

Next actions

  1. Denis approves ADR-0025 direction
  2. Denis resolves open questions (at least Q1, Q2, Q4 critical)
  3. Deploy this spec to manifest as 07-Roadmap/M2-Navigator-Spec-v2.md
  4. Wait for R28/R29 validation on other niches (stabilize M1 before M2 build)
  5. When M1 stable + 2 external testers done: begin Sprint 1

Estimated earliest Sprint 1 start: May-June 2026.