Phase 2 Report Modules — Field Mapping Reference

Purpose: Precise data paths for each of the 6 Phase 2 slide modules based on validated R27 full.json structure. Use this document alongside Report-Generator-Spec when implementing modules.

Context: Validated via REPL against R27 artifacts at reports/streamlit_runs/20260420_154017_техническое_задание_проект_ai_эзотерика_провести_анализ_ниши/ on 2026-04-21. Structure confirmed by direct filesystem inspection.


Root structure of run directory

reports/streamlit_runs/<run_id>/
├── meta.json              # Run metadata (query, timings, costs)
├── status.json            # Completion status
├── judgement.json         # Judge scores + verdict
├── full.json              # PRIMARY — all agent outputs combined
├── report.md              # Narrative report (not used by Report Generator)
├── activity.jsonl         # Event log (not used by Report Generator)
├── brief.json             # Agent-Intake output (if Agent-Intake deployed — future)
├── stage_*.json           # Individual stage outputs (redundant with full.json)
└── pipeline.log           # Runtime log (not used by Report Generator)
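The three files the Report Generator consumes can be loaded with a small helper. This is an illustrative sketch (the helper name `load_run_files` is an assumption; the real pipeline goes through `_load_module_context` in generator.py):

```python
import json
from pathlib import Path

def load_run_files(run_dir: Path) -> dict:
    """Read meta.json, judgement.json and full.json from a run directory.

    Missing files become empty dicts so that downstream is_available()
    checks can degrade gracefully instead of raising.
    """
    out = {}
    for name in ("meta", "judgement", "full"):
        path = run_dir / f"{name}.json"
        out[name] = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return out
```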

full.json top-level structure

{
  "director_report": { ... },   // Intel Director's synthesized output
  "scout_result": { ... },      // Scout's raw competitor research
  "researcher_result": { ... }  // Researcher's raw market data
}

Important: Some data appears in both director_report and in scout_result or researcher_result. Director output is pre-processed and cleaner for rendering. Prefer director_report paths except where noted.
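The preference rule can be sketched as a tiny accessor, shown here for competitors (hypothetical helper, not part of the Phase 1 scaffold):

```python
def preferred_competitors(full_json: dict) -> list:
    """Prefer the Director's pre-processed competitor summaries;
    fall back to Scout's raw list only when the Director has none."""
    director = (full_json.get("director_report") or {}).get("competitors")
    if director:
        return director
    return (full_json.get("scout_result") or {}).get("competitors") or []
```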


director_report keys (validated R27)

| Key | Used by module | Notes |
|-----|----------------|-------|
| _meta | | Internal metadata, not rendered |
| stage | | Processing stage indicator |
| executive_summary | M10_ExecutiveSummary | Main exec summary text |
| scorecards | M10_ExecutiveSummary | Weighted scores (Risk, Growth, Entry, Attractiveness) |
| confidence_overall | M20_ResearchQuality, M10 (badge) | 0.00-1.00 float |
| recommended_next_steps | M99_NextSteps | List of action items |
| budget_allocation_note | M99_NextSteps | Budget distribution across next steps |
| gaps_opportunities | M20_ResearchQuality | P0/P1/P2 knowledge gaps |
| what_we_dont_know | M20_ResearchQuality, M10 (risks) | Research limitations |
| key_findings | M10_ExecutiveSummary (opportunities) | Top 3-5 findings |
| competitors | M40, M41 | Pre-processed competitor summaries |
| market_analysis | M30 (cross-ref) | Director's market summary |
| audience_segments | M50_AudienceSegments | Segments with demographics |
| financial_model | M80_UnitEconomics, M81_Scenarios | Unit economics + scenarios |
| funnels | M60_FunnelAnalysis | CJM + drop-off analysis |
| product_matrix | M70_PricingTiers | Tier structure with features |
| content_strategy | (not in MVP) | Future module M110 |
| sales_scripts | (not in MVP) | Future module M111 |
| radar_chart_data | M10_ExecutiveSummary (optional) | Scorecard radar visualization |

scout_result keys (validated R27)

| Key | Used by module | Notes |
|-----|----------------|-------|
| _meta | | |
| _output_method | | |
| competitors | (fallback) | Raw competitor list — use director_report.competitors first |
| total_competitors_found | M40 header stat | Integer count |
| search_intent | M20 | What Scout was looking for |
| search_queries_used | M20 (detail) | Queries executed |
| _source_quality | M20_ResearchQuality | Tier breakdown (Tier 1/2/3) |
| gaps | M20_ResearchQuality | Gaps Scout identified |
| confidence | M20 | Scout-specific confidence |
| notes | (optional) | Free text annotations |

researcher_result keys (validated R27)

| Key | Used by module | Notes |
|-----|----------------|-------|
| _meta | | |
| _output_method | | |
| _validation_warnings | | Internal, not rendered |
| market_size | M30_MarketSizing, M31_RegionalDistribution | PRIMARY SOURCE for TAM/SAM/SOM |
| dynamics | M32_GrowthDrivers | CAGR + growth drivers |
| segments | (cross-ref) | Can complement director_report.audience_segments |
| pricing_analysis | M70_PricingTiers, M71_RegionalWTP | Regional WTP data |
| sources | M20_ResearchQuality | Source list with tier classification |
| confidence | M20 | Researcher-specific confidence |
| key_findings | (cross-ref) | Complements director_report.key_findings |
| what_we_dont_know | M20_ResearchQuality | Researcher-specific gaps |

judgement.json structure (validated R27)

{
  "run_id": "20260420_154017",
  "overall_score": 8.0,                    // float 0-10
  "verdict": "PASS",                        // "PASS" | "CONDITIONAL GO" | "FAIL"
  "stages": {
    "director": { "score": 8.0, ... },
    "scout": { "score": 7.0, ... },
    "researcher": { "score": 8.0, ... },
    "aggregate": { "score": 8.5, ... }
  },
  "blockers": [ ... ]                       // List of issues found
}

Used by:

  • M01_Cover — overall_score, verdict
  • M10_ExecutiveSummary — overall_score (big number badge)
  • M20_ResearchQuality — per-stage scores from stages dict

meta.json structure (validated R27)

{
  "run_id": "20260420_154017",
  "query": "Техническое задание: проект AI-эзотерика...",
  "started_at": "2026-04-20T15:40:19.676472+00:00",
  "finished_at": "2026-04-20T16:04:14.664187+00:00",
  "total_cost_usd": 2.681989,
  "total_duration_seconds": 1434,
  "total_tokens": 492735,
  "compression_meta": { ... }
}

Used by:

  • M01_Cover — topic extracted from query, run_id, finished_at
  • M20_ResearchQuality — cost and duration breakdown
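For the M20 cost/duration panel, total_duration_seconds needs human-readable formatting; a minimal sketch (the helper name is an assumption):

```python
def format_duration(seconds: int) -> str:
    """Render meta.json's total_duration_seconds as 'Xm Ys'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s"
```

For the R27 run above, `format_duration(1434)` gives "23m 54s".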

Module-by-module field mapping

For each Phase 2 module, exact data paths with extraction notes.


M01_Cover

File: src/synth_brain/reporting/modules/m01_cover.py · Section: cover · Priority: 100

is_available()

Always available if meta.json and judgement.json exist with minimum fields.

def is_available(self, ctx):
    # use `is not None` rather than bool(): a legitimate 0.0 judge score must not hide the cover
    return bool(ctx.meta.get("query")) and ctx.judgement.get("overall_score") is not None

extract()

{
    "topic": extract_topic_from_query(ctx.meta["query"]),
    "judge_score": ctx.judgement["overall_score"],
    "verdict": ctx.judgement.get("verdict", "N/A"),
    "run_date": ctx.meta.get("finished_at", "")[:10],  # YYYY-MM-DD
    "sam": safe_get(ctx.full_json, "researcher_result.market_size", alias_groups={
        "sam": ["sam_usd", "sam", "sam_value", "serviceable_addressable_market"]
    }),
    "cagr": safe_get(ctx.full_json, "researcher_result.dynamics", alias_groups={
        "cagr": ["cagr", "cagr_pct", "growth_rate_cagr", "annual_growth_rate"]
    }),
    "target_audience_brief": extract_audience_brief(ctx.meta["query"]),
    "ltv": safe_get(ctx.full_json, "director_report.financial_model", alias_groups={
        "ltv": ["ltv_usd", "ltv", "lifetime_value", "ltv_premium_tier"]
    }),
    "cac": safe_get(ctx.full_json, "director_report.financial_model", alias_groups={
        "cac": ["cac_usd", "cac", "customer_acquisition_cost", "cac_blended"]
    }),
}

Helper: extract_topic_from_query

The user query is free text like “Техническое задание: проект AI-эзотерика…”. Extract a 3-5 word topic phrase.

def extract_topic_from_query(query: str) -> str:
    # Common patterns: "проект X", "ниша Y", "анализ ниши Z", "AI-..."
    lowered = query.lower()
    # Markers are lowercase because we search the lowered query
    # (the original "AI-" marker could never match lowered text);
    # longer markers come first so "анализ ниши " wins over "ниша "
    for marker in ["анализ ниши ", "проект ", "ниша ", "ai-"]:
        idx = lowered.find(marker)
        if idx != -1:
            candidate = query[idx + len(marker):].split(".")[0].split(",")[0]
            words = candidate.split()[:5]
            return " ".join(words).strip()
    # Fallback: first 40 chars
    return query[:40].strip() + "..."

render()

Title slide layout per Genspark reference (verified in PPTX inspection):

  • Center-top: big topic title (48pt)
  • Below: “Инвестиционный отчёт” subtitle (20pt muted)
  • 6 KPI tiles in 3×2 grid below subtitle
  • Footer: run_date, run_id small (10pt)

KPI tiles: Judge Score + /10, SAM, CAGR, Target Audience, LTV, CAC.


M10_ExecutiveSummary

File: src/synth_brain/reporting/modules/m10_executive_summary.py · Section: executive · Priority: 100

is_available()

def is_available(self, ctx):
    return bool(ctx.full_json.get("director_report", {}).get("executive_summary"))

extract()

dr = ctx.full_json["director_report"]
{
    "title": "Executive Summary",
    "judge_score": ctx.judgement.get("overall_score", 0),
    "verdict": ctx.judgement.get("verdict", ""),
    "weighted_score": safe_get(dr, "scorecards", alias_groups={
        "weighted": ["weighted_score", "overall_weighted", "composite_score"]
    }, default=0),
    "confidence": dr.get("confidence_overall", 0),
    "key_findings": dr.get("key_findings", [])[:5],  # top 5
    "risks": dr.get("what_we_dont_know", [])[:3],
    "scorecards": safe_get(dr, "scorecards", alias_groups={
        "risk": ["risk_score", "risk", "risk_assessment"],
        "growth": ["growth_potential_score", "growth_potential", "growth_score"],
        "entry": ["entry_difficulty_score", "entry_difficulty", "entry_score"],
        "attractiveness": ["market_attractiveness_score", "market_attractiveness"],
    }),
}

render()

Two-column layout:

  • Left: “Opportunities” (green accent) — weighted score + key_findings as bullets
  • Right: “Risks” (amber accent) — confidence + risks as bullets
  • Bottom center: verdict badge (PASS green / CONDITIONAL GO amber / FAIL rose — matching judgement.json verdict values)

M99_NextSteps

File: src/synth_brain/reporting/modules/m99_next_steps.py · Section: verdict · Priority: 100

is_available()

def is_available(self, ctx):
    return bool(ctx.full_json.get("director_report", {}).get("recommended_next_steps"))

extract()

dr = ctx.full_json["director_report"]
{
    "title": "Next Steps",
    "verdict": ctx.judgement.get("verdict", ""),
    "steps": dr.get("recommended_next_steps", [])[:5],  # top 5
    "budget_note": dr.get("budget_allocation_note", ""),
}

Step structure (per R27 inspection)

Each step in recommended_next_steps has variable structure. Use alias groups:

step_alias_groups = {
    "action": ["action", "step", "description", "title"],
    "timeline": ["timeline", "duration", "weeks", "months", "time_estimate"],
    "budget": ["budget", "budget_usd", "cost_usd", "estimated_cost"],
    "kpi": ["kpi", "success_metric", "target", "outcome"],
    "priority": ["priority", "p", "rank"],
}
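Applied to a single raw step, alias groups collapse the variable structure into canonical keys. A standalone sketch of that normalization (the production path would go through safe_get instead; the helper name is an assumption):

```python
def normalize_step(step: dict, alias_groups: dict) -> dict:
    """Map a variably-named step dict onto canonical field names,
    taking the first non-empty alias per canonical key."""
    out = {}
    for canonical, aliases in alias_groups.items():
        for alias in aliases:
            if step.get(alias):
                out[canonical] = step[alias]
                break
    return out
```

With the step_alias_groups above, a step like `{"step": "Run MVP", "weeks": 6}` normalizes to `{"action": "Run MVP", "timeline": 6}`.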

render()

  • Title + verdict badge (center-top)
  • 3-5 action cards in single column (stacked vertically)
  • Each card: number + action + timeline + budget + KPI
  • Card background color coded by priority (P0 blue / P1 muted)

M30_MarketSizing

File: src/synth_brain/reporting/modules/m30_market_sizing.py · Section: market · Priority: 100

is_available()

def is_available(self, ctx):
    return bool(ctx.full_json.get("researcher_result", {}).get("market_size"))

extract()

rr = ctx.full_json["researcher_result"]["market_size"]
{
    "title": "Анализ рынка",
    "subtitle": "TAM / SAM / SOM с источниками",
    "tam": safe_get(rr, "", alias_groups={
        "value": ["tam_usd", "tam", "tam_value", "total_addressable_market"]
    }),
    "sam": safe_get(rr, "", alias_groups={
        "value": ["sam_usd", "sam", "sam_value", "serviceable_addressable_market"]
    }),
    "som": safe_get(rr, "", alias_groups={
        "value": ["som_usd", "som", "som_value", "serviceable_obtainable_market"]
    }),
    "cagr": safe_get(ctx.full_json, "researcher_result.dynamics", alias_groups={
        "cagr": ["cagr", "cagr_pct", "annual_growth_rate"]
    }),
    "regional": safe_get(rr, "regional_distribution", default=[]),
    "sources": rr.get("sources", []) or ctx.full_json.get("researcher_result", {}).get("sources", []),
}

render()

  • Title + subtitle (top)
  • Funnel visualization: TAM box → SAM box → SOM box (top to bottom, narrowing)
  • CAGR badge (top-right corner)
  • Regional distribution mini-chart (bottom)
  • Sources count badge (bottom-right)
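The funnel boxes need compact USD labels for the TAM/SAM/SOM values; a minimal sketch, assuming values arrive as plain numbers (the helper name is an assumption):

```python
def humanize_usd(value: float) -> str:
    """Compact USD label for the TAM → SAM → SOM funnel boxes."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            return f"${value / threshold:.1f}{suffix}"
    return f"${value:.0f}"
```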

M50_AudienceSegments

File: src/synth_brain/reporting/modules/m50_audience_segments.py · Section: audience · Priority: 100

is_available()

def is_available(self, ctx):
    segments = ctx.full_json.get("director_report", {}).get("audience_segments", [])
    return len(segments) > 0

extract()

segments_raw = ctx.full_json["director_report"].get("audience_segments", [])
segments = []
for seg in segments_raw[:3]:  # Max 3 segments in one slide
    segments.append({
        "name": safe_get(seg, "", alias_groups={
            "name": ["name", "segment_name", "title", "label"]
        }),
        "sam_share": safe_get(seg, "", alias_groups={
            "share": ["sam_share_pct", "sam_share", "share_of_sam", "percentage"]
        }),
        "size_usd": safe_get(seg, "", alias_groups={
            "size": ["size_usd", "size", "market_size_usd", "segment_size"]
        }),
        "age_range": safe_get(seg, "demographics", alias_groups={
            "age": ["age", "age_range", "age_group"]
        }),
        "income": safe_get(seg, "demographics", alias_groups={
            "income": ["income", "income_usd", "income_range"]
        }),
        "arpu": safe_get(seg, "", alias_groups={
            "arpu": ["arpu_monthly", "arpu", "revenue_per_user"]
        }),
        "jtbd": safe_get(seg, "", alias_groups={
            "jtbd": ["jtbd", "job_to_be_done", "primary_need"]
        }),
        "pain_points": safe_get(seg, "", alias_groups={
            "pain": ["pain_points", "pains", "frustrations"]
        }, default=[]),
    })
 
{
    "title": "Сегменты аудитории",
    "subtitle": "Демография, JTBD, боли",
    "segments": segments,
}

render()

3-column card layout:

  • Each card 4” wide × 5.5” tall
  • Top: segment name + SAM share %
  • Middle: demographics (age, income)
  • Bottom: JTBD in italics, 2-3 pain points as bullets
  • ARPU badge in top-right corner of each card

M80_UnitEconomics

File: src/synth_brain/reporting/modules/m80_unit_economics.py · Section: financial · Priority: 100

is_available()

def is_available(self, ctx):
    fm = ctx.full_json.get("director_report", {}).get("financial_model", {})
    return bool(fm)

extract() — CRITICAL: use alias groups extensively

This is where the R27→R28 rendering bug happened: the LLM produces 20+ different field-name variants for the same 6 metrics. Study section_renderers.py:165-172 in commit f5f8644 for the reference implementation.

fm = ctx.full_json["director_report"]["financial_model"]
{
    "title": "Юнит-экономика",
    "subtitle": "CAC / LTV / ratio по каналам и тирам",
    "cac_blended": safe_get(fm, "", alias_groups={
        "cac": ["cac_usd", "cac", "customer_acquisition_cost", "cac_blended", "cac_avg"]
    }),
    "cac_by_channel": safe_get(fm, "", alias_groups={
        "by_channel": ["cac_by_channel", "cac_channel_breakdown", "acquisition_costs"]
    }, default={}),
    "ltv_by_tier": safe_get(fm, "", alias_groups={
        "by_tier": ["ltv_by_tier", "ltv_tier_breakdown", "lifetime_values"]
    }, default={}),
    "ltv_blended": safe_get(fm, "", alias_groups={
        "ltv": ["ltv_usd", "ltv", "lifetime_value", "ltv_blended", "ltv_avg"]
    }),
    "ratio": safe_get(fm, "", alias_groups={
        "ratio": ["ltv_cac_ratio", "ltv_to_cac", "cac_ltv_ratio"]
    }),
    "gross_margin": safe_get(fm, "", alias_groups={
        "margin": ["gross_margin_pct", "gross_margin", "margin_pct"]
    }),
    "payback_months": safe_get(fm, "", alias_groups={
        "payback": ["payback_months", "cac_payback", "payback_period"]
    }),
    # breakeven_month is referenced by render(); the alias names below are assumptions
    "breakeven_month": safe_get(fm, "", alias_groups={
        "breakeven": ["breakeven_month", "break_even_month", "breakeven"]
    }),
}

render()

  • Title + subtitle
  • Big LTV/CAC ratio display (top center, 48pt)
  • Left column: CAC per channel (table)
  • Right column: LTV per tier (table)
  • Bottom row: 3 stat tiles (gross_margin, payback_months, breakeven_month)
  • Color code ratio badge: green if ≥ 3.0, amber if 1.5-3.0, rose if < 1.5
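The color thresholds in the last bullet, as a direct sketch (the function name is an assumption):

```python
def ratio_badge_color(ratio: float) -> str:
    """LTV/CAC badge color per the thresholds above."""
    if ratio >= 3.0:
        return "green"
    if ratio >= 1.5:
        return "amber"
    return "rose"
```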

Shared helper — safe_get with alias_groups

Required utility in src/synth_brain/reporting/modules/base.py or a new utils.py:

def safe_get(obj, path, alias_groups=None, default=None):
    """
    Safe nested field access with alias fallback.
    
    Args:
        obj: dict or nested dict
        path: dot-separated path, e.g., "researcher_result.market_size"
              Empty string means obj itself.
        alias_groups: dict of field_name -> list of alias names
                      e.g., {"sam": ["sam_usd", "sam", "serviceable_addressable_market"]}
                      Returns value from first alias that matches.
        default: value if no match found
    
    Returns:
        Found value or default
    """
    if not obj:
        return default
    
    # Navigate path
    current = obj
    if path:
        for part in path.split("."):
            if isinstance(current, dict) and part in current:
                current = current[part]
            else:
                return default
    
    # If no alias_groups, return current as-is
    # (note: falsy values such as 0, "" or [] are treated as missing)
    if not alias_groups:
        return current if current else default
    
    # Try each alias group
    if not isinstance(current, dict):
        return default
    
    # For single-field alias groups, return the first match
    if len(alias_groups) == 1:
        key, aliases = next(iter(alias_groups.items()))
        for alias in aliases:
            if alias in current and current[alias]:
                return current[alias]
        return default
    
    # For multi-field alias groups, return dict of {canonical_name: value}
    result = {}
    for canonical, aliases in alias_groups.items():
        for alias in aliases:
            if alias in current and current[alias]:
                result[canonical] = current[alias]
                break
    return result or default

Testing per module

Each module needs 3 tests (per Report-Generator-Spec):

# tests/test_reporting_modules.py
 
import pytest
from pathlib import Path
from synth_brain.reporting.modules.m01_cover import M01_Cover
from synth_brain.reporting.generator import _load_module_context
 
R27_DIR = Path("reports/streamlit_runs/20260420_154017_техническое_задание_проект_ai_эзотерика_провести_анализ_ниши")
 
def test_m01_cover_available_on_r27():
    ctx = _load_module_context(R27_DIR)
    module = M01_Cover()
    assert module.is_available(ctx) is True
 
def test_m01_cover_unavailable_on_empty_context():
    from synth_brain.reporting.modules.base import ModuleContext
    ctx = ModuleContext(full_json={}, m2_verification=None, m2_scoring=None,
                        chamber_transcripts=None, outcome_history=None, meta={}, judgement={})
    module = M01_Cover()
    assert module.is_available(ctx) is False
 
def test_m01_cover_renders_without_error():
    from pptx import Presentation
    ctx = _load_module_context(R27_DIR)
    module = M01_Cover()
    data = module.extract(ctx)
    assert "topic" in data
    
    prs = Presentation()
    slide = prs.slides.add_slide(prs.slide_layouts[6])
    module.render(slide, data)  # should not raise

18 tests total (6 modules × 3 tests each).


Implementation ordering

Implement modules in this order (simplest → most complex data):

  1. M01_Cover (simple fields, establishes pattern)
  2. M99_NextSteps (iterates array of steps, uses alias groups)
  3. M10_ExecutiveSummary (most scorecards, most complex rendering)
  4. M30_MarketSizing (researcher_result path, regional distribution)
  5. M50_AudienceSegments (3-column layout, nested segment data)
  6. M80_UnitEconomics (MUST use alias groups extensively — R27→R28 bug source)

After each module:

  • Run its 3 tests
  • Generate full deck on R27 to verify accumulation works
  • Visual spot-check (optional: convert PPTX to PNG via LibreOffice headless)

Known data variations R27 vs R28

R28 has a completed_partial status, with director_report.executive_summary missing. When testing Phase 2 modules, verify graceful behavior:

  • M10_ExecutiveSummary should return False from is_available() → slide not generated
  • M01_Cover should still work (meta and judgement exist on R28? — need verification)

Pattern: every module checks data presence explicitly, never assumes.
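That presence-check pattern, sketched as the generator's accumulation loop (a hypothetical shape; the real loop lives in generator.py):

```python
def build_slide_data(modules: list, ctx) -> list:
    """Run extract() only for modules whose data is present;
    a missing section drops its slide instead of crashing the deck."""
    collected = []
    for module in modules:
        if not module.is_available(ctx):
            continue  # e.g. R28's missing executive_summary skips M10
        collected.append((module, module.extract(ctx)))
    return collected
```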


Links

  • Report-Generator-Spec — parent spec
  • R28-Timeout-Incident — context for partial-run handling
  • Commit f5f8644 in synth-brain — reference implementation of alias groups in section_renderers.py:165-172
  • Commit 52a76cf in synth-brain — Phase 1 scaffold (base class, generator, design system)