M2 Navigator Spec v2 — Unfair Advantage Audit
Status: Proposed, awaiting ADR-0025 approval
Target start: After R27 validation stabilizes + 2 more clean M1 runs on distinct niches
Estimated duration: 6 weeks across 5 sprints
This spec replaces the M2 scope described in ADR-0013. Team-competency assessment, profile harvesting, and track-record research are removed. UA audit becomes the complete M2.
Purpose
M2 answers the question: “Given the niche from M1, can the user’s proposed entry succeed?”
It answers this by:
- Extracting what Unfair Advantages this niche historically rewards
- Verifying each Unfair Advantage the user claims to have
- Computing alignment score between claimed+verified UAs and niche requirements
- Producing Swiss-neutral verdict with gap analysis if below threshold
M2 is NOT:
- A team competency assessment
- A budget/timeline validator
- An empathetic support tool
- A standalone product (requires M1 context)
Inputs to M2
Required from user
Solution description:
Free-text or structured answer to “What product/service do you plan to build for this niche?” (100-1000 words typical).
Used for: context in UA scoring (does the solution match how claimed UAs would be deployed?)
Claimed Unfair Advantages:
List, 1-8 items typical. Each item:
- category: patent | license | partnership | data | distribution | regulatory | brand | network | process
  description: "What specifically is the claim?"
  evidence:
    - type: url | file | reference | text
      content: "The actual evidence or pointer"
Inherited from M1
- Full niche analysis including competitor landscape, moat patterns observed, financial structure
- M1 confidence scores
- Recommended next steps (informs scoring context)
Explicitly NOT collected
- Team member profiles, CVs, LinkedIn links
- Budget amount
- Timeline estimates
- Self-reported experience/expertise levels
- Past company names (too easily fabricated)
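The intake contract above (required inputs plus the Sprint 1 validation rule of "solution description + at least 1 UA") can be sketched as a minimal validator. Field names and the `validate_m2_input` helper are hypothetical, mirroring the schema; the shipped intake code may differ:

```python
from dataclasses import dataclass

# Canonical UA categories from the claimed-UA schema above.
CANONICAL_CATEGORIES = {
    "patent", "license", "partnership", "data", "distribution",
    "regulatory", "brand", "network", "process",
}

@dataclass
class UAClaim:
    category: str
    description: str
    evidence: list  # list of {"type": ..., "content": ...} dicts

@dataclass
class M2Input:
    solution_description: str
    claimed_uas: list  # list[UAClaim]

def validate_m2_input(m2: M2Input) -> list:
    """Return a list of validation errors; an empty list means the input may proceed."""
    errors = []
    if not m2.solution_description.strip():
        errors.append("solution description is required")
    if len(m2.claimed_uas) < 1:
        errors.append("at least one claimed UA is required")
    for i, ua in enumerate(m2.claimed_uas):
        if ua.category not in CANONICAL_CATEGORIES:
            errors.append(f"UA {i}: non-canonical category '{ua.category}' (Agent 2.1 will normalize)")
    return errors
```

Non-canonical categories are flagged rather than rejected, since Agent 2.1 produces `category_normalized` downstream.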
Agent architecture
M2 Director (orchestrator)
Receives M1 output + user’s M2 input. Dispatches to three specialized agents. Synthesizes final report section.
Position: peer to Intel Director in 3-tier hierarchy (ADR-0003).
Model: Sonnet 4.5.
Agent 2.1 — Unfair Advantage Verifier
For each claimed UA, attempts verification via category-specific strategies.
Verification strategies reference table:
| Category | Strategy | Confidence ceiling |
|---|---|---|
| Patent | USPTO / EPO / Rospatent / WIPO public search by number or inventor | High (if claim includes specific number) |
| License / certification | Search relevant public registry (FDA, CE, industry body) by entity name | Medium (registries vary in completeness) |
| Partnership / distribution | Web search target company announcements, press releases, joint case studies; LinkedIn company page check | Medium (partnerships often under-announced) |
| Proprietary data | Evaluate uniqueness — search for equivalent datasets, assess coverage claims vs alternatives | Medium-Low (hard to prove negatives) |
| Owned distribution channel | Verify account existence, metrics (audience size via public profile), activity level | Medium-High |
| Brand / audience recognition | Search mentions, press, review sites; audience size verification | Low-Medium |
| Regulatory approval | Check public registries per jurisdiction | Medium-High (if registry exists) |
| Network / relationships | Generally unverifiable via automated checks | Low (mark UNVERIFIABLE) |
| Process / know-how | Generally unverifiable | Low (mark UNVERIFIABLE) |
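The routing in the reference table above can be sketched as a dispatch map. The strategy labels and the `route_verification` helper are illustrative placeholders, not the shipped agent logic:

```python
# Category -> (strategy summary, confidence ceiling); None strategy = UNVERIFIABLE.
VERIFICATION_STRATEGIES = {
    "patent":       ("public patent search (USPTO/EPO/WIPO)", "high"),
    "license":      ("public registry lookup",                "medium"),
    "partnership":  ("web search for announcements",          "medium"),
    "data":         ("uniqueness assessment vs alternatives", "medium-low"),
    "distribution": ("account/metrics verification",          "medium-high"),
    "brand":        ("mentions and press search",             "low-medium"),
    "regulatory":   ("jurisdiction registry check",           "medium-high"),
    "network":      (None,                                    "low"),
    "process":      (None,                                    "low"),
}

def route_verification(category: str) -> dict:
    """Route a claimed UA to its verification strategy, or mark it UNVERIFIABLE."""
    strategy, ceiling = VERIFICATION_STRATEGIES[category]
    if strategy is None:
        return {"verdict": "UNVERIFIABLE", "confidence_ceiling": ceiling}
    return {"strategy": strategy, "confidence_ceiling": ceiling}
```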
Output per UA (structured):
- claim: original text
  category: user-provided category
  category_normalized: canonical category if user chose non-canonical
  verification_verdict: VERIFIED | PARTIALLY_VERIFIED | UNVERIFIED | UNVERIFIABLE | FABRICATED
  verification_method: what was actually done
  verification_evidence:
    - type: url | excerpt | registry_entry | api_response
      content: "what was found"
      matches_claim: boolean
  verification_confidence: 0.0-1.0
  scoring_weight: computed from verdict (VERIFIED=1.0, PARTIAL=0.5, UNVERIFIED=0, UNVERIFIABLE=0, FABRICATED=-0.3 penalty)
  notes: "caveat, limitation, context"
Model: Sonnet 4.5 with web search tool access. May require patent API integration per ADR-0011.
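The verdict-to-weight mapping in the schema above can be pinned down as a lookup (a sketch; the agent's actual implementation may differ):

```python
# Scoring weights per verification verdict, as specified in the output schema.
SCORING_WEIGHTS = {
    "VERIFIED": 1.0,
    "PARTIALLY_VERIFIED": 0.5,
    "UNVERIFIED": 0.0,
    "UNVERIFIABLE": 0.0,
    "FABRICATED": -0.3,  # penalty weight within the category sum
}

def scoring_weight(verdict: str) -> float:
    """Map a verification verdict to its scoring weight."""
    return SCORING_WEIGHTS[verdict]
```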
Agent 2.2 — Niche Moat Requirements
Analyzes M1 output to determine what categories of UAs matter for this specific niche.
Input: M1 output (especially competitor analysis, market analysis, financial model structure)
Process:
- Examine existing niche winners (from M1 competitor landscape) — what moats do they demonstrate?
- Apply heuristic library for niche-type patterns:
- Consumer apps with network effects: audience/brand weight 40-50%, distribution 20-30%
- Regulated industries (health/finance/education): license/regulatory 30-40%, data 15-25%
- B2B SaaS with enterprise sales: distribution 25-35%, tech differentiation 20-30%
- Commodity markets: cost/scale advantage 40-50%, brand secondary
- Content platforms: proprietary content library 25-35%, audience 30-40%
- Marketplaces: liquidity/network effects 40-50%, distribution 20-30%
- Developer tools: tech differentiation 30-40%, community/network 20-30%
- Leverage Pipeline Memory (ADR-0018) for past-niche learnings — if similar niches have been analyzed before, their moat patterns inform this output.
Output: Weighted UA category requirements, summing to 1.0, with rationale per category.
niche: "AI-esoterics (tarot, astrology, predictions)"
total_categories: 4
categories:
  - name: audience_trust_and_brand
    weight: 0.40
    rationale: "Top winners (Co-Star, Sanctuary) built via audience acquisition and brand trust. Repeat subscription requires trust at high rates."
  - name: proprietary_content_library
    weight: 0.25
    rationale: "Differentiation vs ChatGPT-wrapper competitors requires unique content. Pattern shown by Nebula's curated astrologer library."
  - name: platform_distribution
    weight: 0.20
    rationale: "Top-of-funnel dominated by App Store ASO + social media virality. Access to these channels is moat."
  - name: tech_differentiation
    weight: 0.15
    rationale: "Only moderate weight — most competitors use similar AI layer. Truly differentiated tech could matter but historically hasn't determined winners."
Model: Sonnet 4.5.
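The spec's invariant that category weights sum to 1.0 can be checked with a one-liner (the helper name is illustrative):

```python
def check_requirement_weights(categories: list, tol: float = 1e-6) -> bool:
    """Verify the Agent 2.2 invariant: niche requirement weights must sum to 1.0."""
    total = sum(c["weight"] for c in categories)
    return abs(total - 1.0) <= tol
```

The AI-esoterics example above passes: 0.40 + 0.25 + 0.20 + 0.15 = 1.0.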
Agent 2.3 — UA-Market Fit Scorer
Compares verified UAs (from 2.1) against niche requirements (from 2.2). Computes alignment.
Scoring algorithm:
```python
alignment_score = 0.0
gaps = []

for req_category, req_weight in niche_requirements.items():
    matching_uas = [ua for ua in verified_uas if ua.maps_to(req_category)]
    if not matching_uas:
        gaps.append({
            "category": req_category,
            "weight": req_weight,
            "status": "UNSATISFIED",
            "impact": req_weight,
        })
        continue
    # Sum scoring_weight of matching UAs, capped at 1.0 per category
    category_score = min(1.0, sum(ua.scoring_weight for ua in matching_uas))
    alignment_score += req_weight * category_score
    if category_score < 0.5:
        gaps.append({
            "category": req_category,
            "weight": req_weight,
            "status": "WEAK",
            "matching_uas": [ua.claim for ua in matching_uas],
            "impact": req_weight * (1.0 - category_score),
        })

# Apply fabrication penalties if any. Note: a FABRICATED UA already carries
# scoring_weight -0.3 inside its category sum; this is an additional flat
# penalty on the overall score per fabricated claim.
for ua in verified_uas:
    if ua.verification_verdict == "FABRICATED":
        alignment_score -= 0.1  # hard penalty for lying
```

Verdict thresholds:
- alignment_score ≥ 0.80 → GO — alignment is strong, proceed
- 0.60 ≤ alignment_score < 0.80 → CONDITIONAL — enterable with specific gap closures
- alignment_score < 0.60 → NO-GO — alignment too weak for this niche
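The threshold bands above map directly to a verdict function (a sketch; boundary handling follows the inequalities as written):

```python
def verdict(alignment_score: float) -> str:
    """Map an alignment score to the M2 verdict per the spec thresholds."""
    if alignment_score >= 0.80:
        return "GO"
    if alignment_score >= 0.60:
        return "CONDITIONAL"
    return "NO-GO"
```

Note that both boundaries are inclusive on the upper band: 0.80 is GO and 0.60 is CONDITIONAL.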
Output:
alignment_score: 0.38
entry_threshold: 0.60
verdict: NO-GO
per_category_breakdown:
  - category: audience_trust_and_brand
    weight: 0.40
    matching_uas: []
    status: UNSATISFIED
    contribution: 0.0
  # ... etc
fabrication_detected: false
gaps_ordered_by_impact:
  - {category: audience_trust_and_brand, impact: 0.40, status: UNSATISFIED}
  - {category: proprietary_content_library, impact: 0.25, status: UNSATISFIED}
  - {category: tech_differentiation, impact: 0.15, status: UNSATISFIED}
Model: Sonnet 4.5.
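The `gaps_ordered_by_impact` field above is a straightforward sort of the gap records produced by the scoring loop (helper name illustrative):

```python
def order_gaps(gaps: list) -> list:
    """Order gap records by impact, highest first, for gaps_ordered_by_impact."""
    return sorted(gaps, key=lambda g: g["impact"], reverse=True)
```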
Agent 2.4 — M2 Director synthesis
Integrates outputs of 2.1, 2.2, 2.3 into M2 report section with Swiss tone.
Prompt emphasis:
- No softening language (“but you’ve done great work” — REMOVE)
- No consolation (“this is still valuable progress” — REMOVE)
- No implicit emotional support
- Present facts + scores + gaps + recommendations
- Where negative verdict: include concrete gap-closing steps (not vague suggestions)
- Where positive verdict: confirm strengths with verification evidence
- Tone model: Swiss banking compliance report, not startup advisor
Example tone comparison:
❌ Softened (wrong): “Your Unfair Advantages show real promise! While we identified some gaps around proprietary content, your distribution partnership with Synergia is a strong foundation to build from. With some focused work on the gaps, you could be well positioned.”
✓ Swiss (correct): “UA-Niche alignment: 0.38 / 1.00. Verdict: NO-GO. Three of four niche-requirement categories have zero satisfying UAs. Your Synergia distribution UA is VERIFIED and contributes 0.20 to alignment — alone insufficient for 0.60 entry threshold. Gap closures required: (1) validate proprietary content library claim with concrete samples, (2) articulate tech differentiation versus named competitors, (3) develop audience/brand assets (current: none verifiable).”
Model: Sonnet 4.5.
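The Risks section proposes a post-synthesis review agent that checks for softening phrases. A crude lexical first pass could look like the sketch below; the phrase list is illustrative, not the shipped lexicon, and the real check would likely be LLM-based:

```python
import re

# Illustrative softening phrases drawn from the "wrong" tone example above.
SOFTENING_PATTERNS = [
    r"\bgreat work\b",
    r"\breal promise\b",
    r"\bstrong foundation\b",
    r"\bwell positioned\b",
    r"\bstill valuable\b",
]

def softening_hits(report_text: str) -> list:
    """Return the softening patterns found in a report; empty list = Swiss-compliant (lexically)."""
    low = report_text.lower()
    return [p for p in SOFTENING_PATTERNS if re.search(p, low)]
```

A non-empty result would trigger the retry loop described in the risk table.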
M2 Judge
Reuses M1 Judge infrastructure (GPT-4o via ADR-0019). Evaluates:
- Quality of UA verification (were appropriate methods used? Are evidence conclusions justified?)
- Quality of niche requirement assignment (does the heuristic fit this niche?)
- Quality of scoring (is the alignment calculation correct? Are gaps accurate?)
- Quality of synthesis (does tone match Swiss standard? Are recommendations actionable?)
No new Judge code needed — plug into existing JudgeLLMClient.
Integration with M1 pipeline
Flow orchestration
Existing M1 pipeline continues as-is through:
- Agent-Intake (if deployed)
- CEO delegation
- Intel Director + 11 executors
- Aggregate + Judge
- Report generation
New: After M1 completion, if M1 verdict ∈ {GO, CONDITIONAL GO}:
- Streamlit UI shows “M2 Implementation Audit available. Run now? [Yes] [Skip]”
- If Yes: M2 Intake collects solution description + UA claims with evidence
- M2 Director kicks off 2.1, 2.2, 2.3 in parallel
- M2 Director + Judge synthesize final M2 report section
- Combined report (M1 + M2) rendered as unified deliverable
Storage and artifacts
Per-run directory structure (extends existing):
reports/streamlit_runs/<run_id>/
├── meta.json (existing)
├── status.json (existing)
├── brief.json (Agent-Intake output if shipped)
├── full.json (M1 director report — existing)
├── judgement.json (M1 Judge — existing)
├── report.md (M1 narrative — existing)
├── m2_input.json (NEW — user's M2 intake)
├── m2_ua_verification.json (NEW — Agent 2.1 output)
├── m2_niche_moat.json (NEW — Agent 2.2 output)
├── m2_fit_scoring.json (NEW — Agent 2.3 output)
├── m2_judgement.json (NEW — M2 Judge)
├── m2_report.md (NEW — M2 narrative)
└── combined_report.md (NEW — unified M1+M2 deliverable)
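A completeness check over the new M2 artifacts in the tree above could be as simple as (function name hypothetical; useful as an end-of-run assertion or test helper):

```python
from pathlib import Path

# Expected M2 artifacts per the run-directory layout above.
M2_ARTIFACTS = [
    "m2_input.json",
    "m2_ua_verification.json",
    "m2_niche_moat.json",
    "m2_fit_scoring.json",
    "m2_judgement.json",
    "m2_report.md",
    "combined_report.md",
]

def missing_m2_artifacts(run_dir: str) -> list:
    """Return the M2 artifact filenames not yet present in a run directory."""
    root = Path(run_dir)
    return [name for name in M2_ARTIFACTS if not (root / name).exists()]
```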
Auto-ingest extension
Pipeline Memory (ADR-0018) auto-ingest hook extends to M2 artifacts. M2 outputs become part of memory for future niche moat requirement heuristics.
External integrations required
Per ADR-0011 (Integration Triage Policy), each new integration needs evidence of need × 3-5 instances before approval.
Planned integrations with triage notes:
| Integration | Purpose | ADR-0011 status |
|---|---|---|
| USPTO public API | Patent verification | Triage required — no observed pain yet, but critical for core M2 use case |
| LinkedIn scraping (partnership verification) | Verify claimed partnerships | Triage required — also ToS concern |
| Crunchbase API | Company / partnership data | Triage required — may be covered by web search |
| EPO / WIPO APIs | International patent verification | Triage required — only if USPTO insufficient for first pilots |
| Industry registry scrapers | License verification per jurisdiction | Multiple — triage each separately |
Default approach: Start with web_search for everything (already available). Only integrate specialized APIs when web_search proves insufficient across 3-5 verification attempts.
Cost and latency
Estimated per M2 run:
| Stage | Estimated tokens | Estimated cost | Latency |
|---|---|---|---|
| UA Verifier (5 UAs, web searches each) | 15000-30000 in/out | $0.80-1.50 | 120-240s |
| Niche Moat Requirements | 3000-6000 in/out | $0.15-0.30 | 15-30s |
| UA-Market Fit Scorer | 2000-4000 in/out | $0.10-0.20 | 10-20s |
| M2 Director synthesis | 4000-8000 in/out | $0.20-0.40 | 20-40s |
| M2 Judge | 3000-6000 in/out | $0.15-0.30 (GPT-4o) | 10-20s |
| Total M2 | 27000-54000 | $1.40-2.70 | 175-350s (~3-6 min) |
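The totals row can be cross-checked against the per-stage rows (stage keys are shorthand, not identifiers from the codebase):

```python
# Per-stage (low, high) estimates transcribed from the table above.
stages = {
    "ua_verifier": {"tokens": (15000, 30000), "cost": (0.80, 1.50), "latency_s": (120, 240)},
    "niche_moat":  {"tokens": (3000, 6000),   "cost": (0.15, 0.30), "latency_s": (15, 30)},
    "fit_scorer":  {"tokens": (2000, 4000),   "cost": (0.10, 0.20), "latency_s": (10, 20)},
    "synthesis":   {"tokens": (4000, 8000),   "cost": (0.20, 0.40), "latency_s": (20, 40)},
    "judge":       {"tokens": (3000, 6000),   "cost": (0.15, 0.30), "latency_s": (10, 20)},
}

def total(metric: str) -> tuple:
    """Sum the low and high bounds of a metric across all stages."""
    lo = sum(s[metric][0] for s in stages.values())
    hi = sum(s[metric][1] for s in stages.values())
    return lo, hi
```

All three totals agree with the table: (27000, 54000) tokens, ($1.40, $2.70), and (175s, 350s).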
Combined M1 + M2:
- Cost: ~$4.08-5.38 total (M1 ~$2.68 + M2 $1.40-2.70)
- Duration: ~27-30 minutes (M1 24min + M2 3-6min)
- Sold at $5-20K = 1000x-5000x compute cost (standard DD margin)
Implementation sprint plan
Sprint 1 (1 week) — M2 Intake infrastructure
Deliverables:
- Streamlit UI for M2 input (triggered post-M1 verdict)
- User input forms: solution description (free text) + UA list (structured with category, description, evidence)
- Evidence upload handling (files + URLs)
- Storage: m2_input.json in run directory
- Validation: at least solution description + 1 UA required to proceed
Tests:
- M2 input validation (reject empty)
- Evidence URL format sanity check
- M2 input persists correctly
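The "evidence URL format sanity check" test target could start from a minimal stdlib check (a sketch, assuming the shipped validator may add allowlists or reachability probes):

```python
from urllib.parse import urlparse

def evidence_url_ok(url: str) -> bool:
    """Minimal sanity check for evidence URLs: http(s) scheme plus a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```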
Sprint 2 (2 weeks) — Unfair Advantage Verifier
Deliverables:
- Agent 2.1 module with category-specific verification strategies
- Web search integration for all categories (baseline)
- USPTO API integration for patents (specific, gated on ADR-0011 triage)
- Verification verdict classification logic
- Per-UA structured output matching schema in spec
- Tests (unit + integration with mock web search)
Open decision: which categories use fallback “best-effort LLM evaluation” vs “automated verification.” Default: automated where possible (patent, registry), LLM-only where not (process, know-how).
Sprint 3 (1 week) — Niche Moat Requirements
Deliverables:
- Agent 2.2 module with heuristic library
- Pipeline Memory integration for past-niche patterns
- Weighted requirement output schema
- Tests
Sprint 4 (1 week) — UA-Market Fit Scorer + M2 Director
Deliverables:
- Agent 2.3 scoring algorithm implementation
- Alignment score computation
- Gap analysis logic
- Agent 2.4 (M2 Director) synthesis with Swiss-tone prompt
- Combined report rendering (M1 + M2 unified markdown)
- Tests
Sprint 5 (1 week) — Integration + end-to-end testing
Deliverables:
- M1 → M2 orchestration in run_m1_query.py (or equivalent)
- Streamlit UI for M2 output display
- End-to-end tests on 3+ distinct niches (replay R26, R27 + 1 new niche)
- M2 Judge integration (already exists, just plug in)
- Updated auto-ingest to include M2 artifacts in Pipeline Memory
Acceptance criteria:
- 3 distinct niches each complete M1 + M2 successfully
- M2 verdicts are correctly calibrated (negative when UAs don’t fit, positive when they do)
- M2 Judge score ≥ 7.0 on each end-to-end test
- No regression in M1 performance (still Judge ≥ 7.5)
Success metrics
Technical:
- M2 completes successfully ≥ 95% of the time
- M2 cost stays ≤ $3.00 per run
- M2 duration stays ≤ 8 minutes
- M2 Judge score ≥ 7.0 on 5 consecutive distinct-niche tests
Product:
- UA verification accuracy ≥ 80% on labeled ground truth (requires manual audit set)
- Niche moat requirements match expert assessment ≥ 75% on labeled set
- Swiss-tone compliance ≥ 90% (no softening language detected by review agent)
Business:
- First external pilot willing to pay $5K+ for Combined audit
- 3 external audits completed within 3 months of Sprint 5 completion
- First customer NPS ≥ 7/10 (even on negative verdicts — they value accuracy)
Risks and mitigations
| Risk | Likelihood | Mitigation |
|---|---|---|
| UA verification hits third-party API rate limits | Medium | Cache results per UA claim, fallback to web_search |
| Niche moat heuristic library has gaps | Expected | LLM fallback with “low confidence” flagging; iterate library per encountered niche |
| Users claim fabricated UAs and we don’t catch | Low-Medium | FABRICATED verdict with penalty; manual review for first 10 audits |
| Swiss tone fails — LLM softens verdicts anyway | Medium | Post-synthesis review agent checks for softening phrases; retry if detected |
| Combined pricing ($5-20K) is rejected by market | Medium | First 3 customers priced at $1-3K as beta; adjust upward based on feedback |
| Negative verdicts damage brand (“they told me NO after I paid”) | Medium-High | Transparent pricing communication: “we audit, we don’t endorse”; testimonials from accurate-NO cases |
| Gaps analysis becomes generic boilerplate | Medium | Prompt emphasis on specificity; include M1 context (named competitors, specific pricing tiers) |
| M2 run on “wrong” M1 output (PASS verdict) runs anyway | Low | UI gates M2 offer behind M1 verdict check |
Open questions for resolution
From ADR-0025 parent:
- Q1: Supersedes ADR-0013 fully or coexist? Default: full supersede.
- Q2: Pricing directionally correct at $5-20K? Default: yes, validate with first 3 customers.
- Q3: API cost budgeting? Needs separate business decision.
- Q4: M2 Final Verdict triggers Chamber? Default: yes (L3 criticality).
- Q5: Niches with no matching heuristic — fallback strategy? Default: LLM best-guess with low-confidence flag.
- Q6: First external test customer? Needs identification.
- Q7: Legal disclaimer required? Default: yes, standard for DD.
Additional spec-level open questions:
- Q8: Evidence upload file size limit? Format restrictions? Propose: 10MB per file, PDF/DOC/MD/TXT + URL-only for all else.
- Q9: M2 input form layout — structured form vs conversational agent (like Agent-Intake)? Default: structured form for Sprint 1 MVP, upgrade to conversational in later iteration.
- Q10: Can M2 be re-run on same M1 with updated UA claims? Default: yes, each re-run is new artifact, old artifacts retained.
Next actions
- Denis approves ADR-0025 direction
- Denis resolves open questions (at least Q1, Q2, Q4 critical)
- Deploy this spec to manifest as 07-Roadmap/M2-Navigator-Spec-v2.md
- Wait for R28/R29 validation on other niches (stabilize M1 before M2 build)
- When M1 stable + 2 external testers done: begin Sprint 1
Estimated earliest Sprint 1 start: May-June 2026.