Phase 2 Report Modules — Field Mapping Reference

Purpose: Precise data paths for each of the 6 Phase 2 slide modules based on validated R27 full.json structure. Use this document alongside Report-Generator-Spec when implementing modules.

Context: Validated via REPL against R27 artifacts at reports/streamlit_runs/20260420_154017_техническое_задание_проект_ai_эзотерика_провести_анализ_ниши/ on 2026-04-21. Structure confirmed by direct filesystem inspection.


Root structure of run directory

reports/streamlit_runs/<run_id>/
├── meta.json              # Run metadata (query, timings, costs)
├── status.json            # Completion status
├── judgement.json         # Judge scores + verdict
├── full.json              # PRIMARY — all agent outputs combined
├── report.md              # Narrative report (not used by Report Generator)
├── activity.jsonl         # Event log (not used by Report Generator)
├── brief.json             # Agent-Intake output (if Agent-Intake deployed — future)
├── stage_*.json           # Individual stage outputs (redundant with full.json)
└── pipeline.log           # Runtime log (not used by Report Generator)
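The three files the Report Generator consumes can be loaded with a small helper. This is an illustrative sketch (the helper name `load_run_files` is an assumption; the real pipeline goes through `_load_module_context` in generator.py):

```python
import json
from pathlib import Path

def load_run_files(run_dir: Path) -> dict:
    """Read meta.json, judgement.json and full.json from a run directory.

    Missing files become empty dicts so that downstream is_available()
    checks can degrade gracefully instead of raising.
    """
    out = {}
    for name in ("meta", "judgement", "full"):
        path = run_dir / f"{name}.json"
        out[name] = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return out
```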

full.json top-level structure

{
  "director_report": { ... },   // Intel Director's synthesized output
  "scout_result": { ... },      // Scout's raw competitor research
  "researcher_result": { ... }  // Researcher's raw market data
}

Important: Some data appears in both director_report and in scout_result or researcher_result. Director output is pre-processed and cleaner for rendering. Prefer director_report paths except where noted.
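The preference rule can be sketched as a tiny accessor, shown here for competitors (hypothetical helper, not part of the Phase 1 scaffold):

```python
def preferred_competitors(full_json: dict) -> list:
    """Prefer the Director's pre-processed competitor summaries;
    fall back to Scout's raw list only when the Director has none."""
    director = (full_json.get("director_report") or {}).get("competitors")
    if director:
        return director
    return (full_json.get("scout_result") or {}).get("competitors") or []
```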


director_report keys (validated R27)

| Key | Used by module | Notes |
|-----|----------------|-------|
| _meta | | Internal metadata, not rendered |
| stage | | Processing stage indicator |
| executive_summary | M10_ExecutiveSummary | Main exec summary text |
| scorecards | M10_ExecutiveSummary | Weighted scores (Risk, Growth, Entry, Attractiveness) |
| confidence_overall | M20_ResearchQuality, M10 (badge) | 0.00-1.00 float |
| recommended_next_steps | M99_NextSteps | List of action items |
| budget_allocation_note | M99_NextSteps | Budget distribution across next steps |
| gaps_opportunities | M20_ResearchQuality | P0/P1/P2 knowledge gaps |
| what_we_dont_know | M20_ResearchQuality, M10 (risks) | Research limitations |
| key_findings | M10_ExecutiveSummary (opportunities) | Top 3-5 findings |
| competitors | M40, M41 | Pre-processed competitor summaries |
| market_analysis | M30 (cross-ref) | Director's market summary |
| audience_segments | M50_AudienceSegments | Segments with demographics |
| financial_model | M80_UnitEconomics, M81_Scenarios | Unit economics + scenarios |
| funnels | M60_FunnelAnalysis | CJM + drop-off analysis |
| product_matrix | M70_PricingTiers | Tier structure with features |
| content_strategy | (not in MVP) | Future module M110 |
| sales_scripts | (not in MVP) | Future module M111 |
| radar_chart_data | M10_ExecutiveSummary (optional) | Scorecard radar visualization |

scout_result keys (validated R27)

| Key | Used by module | Notes |
|-----|----------------|-------|
| _meta | | |
| _output_method | | |
| competitors | (fallback) | Raw competitor list — use director_report.competitors first |
| total_competitors_found | M40 header stat | Integer count |
| search_intent | M20 | What Scout was looking for |
| search_queries_used | M20 (detail) | Queries executed |
| _source_quality | M20_ResearchQuality | Tier breakdown (Tier 1/2/3) |
| gaps | M20_ResearchQuality | Gaps Scout identified |
| confidence | M20 | Scout-specific confidence |
| notes | (optional) | Free text annotations |

researcher_result keys (validated R27)

| Key | Used by module | Notes |
|-----|----------------|-------|
| _meta | | |
| _output_method | | |
| _validation_warnings | | Internal, not rendered |
| market_size | M30_MarketSizing, M31_RegionalDistribution | PRIMARY SOURCE for TAM/SAM/SOM |
| dynamics | M32_GrowthDrivers | CAGR + growth drivers |
| segments | (cross-ref) | Can complement director_report.audience_segments |
| pricing_analysis | M70_PricingTiers, M71_RegionalWTP | Regional WTP data |
| sources | M20_ResearchQuality | Source list with tier classification |
| confidence | M20 | Researcher-specific confidence |
| key_findings | (cross-ref) | Complements director_report.key_findings |
| what_we_dont_know | M20_ResearchQuality | Researcher-specific gaps |

judgement.json structure (validated R27)

{
  "run_id": "20260420_154017",
  "overall_score": 8.0,                    // float 0-10
  "verdict": "PASS",                        // "PASS" | "CONDITIONAL GO" | "FAIL"
  "stages": {
    "director": { "score": 8.0, ... },
    "scout": { "score": 7.0, ... },
    "researcher": { "score": 8.0, ... },
    "aggregate": { "score": 8.5, ... }
  },
  "blockers": [ ... ]                       // List of issues found
}

Used by:

  • M01_Cover — overall_score, verdict
  • M10_ExecutiveSummary — overall_score (big number badge)
  • M20_ResearchQuality — per-stage scores from stages dict

meta.json structure (validated R27)

{
  "run_id": "20260420_154017",
  "query": "Техническое задание: проект AI-эзотерика...",
  "started_at": "2026-04-20T15:40:19.676472+00:00",
  "finished_at": "2026-04-20T16:04:14.664187+00:00",
  "total_cost_usd": 2.681989,
  "total_duration_seconds": 1434,
  "total_tokens": 492735,
  "compression_meta": { ... }
}

Used by:

  • M01_Cover — topic extracted from query, run_id, finished_at
  • M20_ResearchQuality — cost and duration breakdown
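For the M20 cost/duration panel, total_duration_seconds needs human-readable formatting; a minimal sketch (the helper name is an assumption):

```python
def format_duration(seconds: int) -> str:
    """Render meta.json's total_duration_seconds as 'Xm Ys'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s"
```

For the R27 run above, `format_duration(1434)` gives "23m 54s".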

Module-by-module field mapping

For each Phase 2 module, exact data paths with extraction notes.


M01_Cover

File: src/synth_brain/reporting/modules/m01_cover.py · Section: cover · Priority: 100

is_available()

Always available if meta.json and judgement.json exist with minimum fields.

def is_available(self, ctx):
    # use `is not None` rather than bool(): a legitimate 0.0 judge score must not hide the cover
    return bool(ctx.meta.get("query")) and ctx.judgement.get("overall_score") is not None

extract()

{
    "topic": extract_topic_from_query(ctx.meta["query"]),
    "judge_score": ctx.judgement["overall_score"],
    "verdict": ctx.judgement.get("verdict", "N/A"),
    "run_date": ctx.meta.get("finished_at", "")[:10],  # YYYY-MM-DD
    "sam": safe_get(ctx.full_json, "researcher_result.market_size", alias_groups={
        "sam": ["sam_usd", "sam", "sam_value", "serviceable_addressable_market"]
    }),
    "cagr": safe_get(ctx.full_json, "researcher_result.dynamics", alias_groups={
        "cagr": ["cagr", "cagr_pct", "growth_rate_cagr", "annual_growth_rate"]
    }),
    "target_audience_brief": extract_audience_brief(ctx.meta["query"]),
    "ltv": safe_get(ctx.full_json, "director_report.financial_model", alias_groups={
        "ltv": ["ltv_usd", "ltv", "lifetime_value", "ltv_premium_tier"]
    }),
    "cac": safe_get(ctx.full_json, "director_report.financial_model", alias_groups={
        "cac": ["cac_usd", "cac", "customer_acquisition_cost", "cac_blended"]
    }),
}

Helper: extract_topic_from_query

The user query is free text like “Техническое задание: проект AI-эзотерика…”. Extract a 3-5 word topic phrase.

def extract_topic_from_query(query: str) -> str:
    # Common patterns: "проект X", "ниша Y", "анализ ниши Z", "AI-..."
    lowered = query.lower()
    # Markers are lowercase because we search the lowered query
    # (the original "AI-" marker could never match lowered text);
    # longer markers come first so "анализ ниши " wins over "ниша "
    for marker in ["анализ ниши ", "проект ", "ниша ", "ai-"]:
        idx = lowered.find(marker)
        if idx != -1:
            candidate = query[idx + len(marker):].split(".")[0].split(",")[0]
            words = candidate.split()[:5]
            return " ".join(words).strip()
    # Fallback: first 40 chars
    return query[:40].strip() + "..."

render()

Title slide layout per Genspark reference (verified in PPTX inspection):

  • Center-top: big topic title (48pt)
  • Below: “Инвестиционный отчёт” subtitle (20pt muted)
  • 6 KPI tiles in 3×2 grid below subtitle
  • Footer: run_date, run_id small (10pt)

KPI tiles: Judge Score + /10, SAM, CAGR, Target Audience, LTV, CAC.


M10_ExecutiveSummary

File: src/synth_brain/reporting/modules/m10_executive_summary.py · Section: executive · Priority: 100

is_available()

def is_available(self, ctx):
    return bool(ctx.full_json.get("director_report", {}).get("executive_summary"))

extract()

dr = ctx.full_json["director_report"]
{
    "title": "Executive Summary",
    "judge_score": ctx.judgement.get("overall_score", 0),
    "verdict": ctx.judgement.get("verdict", ""),
    "weighted_score": safe_get(dr, "scorecards", alias_groups={
        "weighted": ["weighted_score", "overall_weighted", "composite_score"]
    }, default=0),
    "confidence": dr.get("confidence_overall", 0),
    "key_findings": dr.get("key_findings", [])[:5],  # top 5
    "risks": dr.get("what_we_dont_know", [])[:3],
    "scorecards": safe_get(dr, "scorecards", alias_groups={
        "risk": ["risk_score", "risk", "risk_assessment"],
        "growth": ["growth_potential_score", "growth_potential", "growth_score"],
        "entry": ["entry_difficulty_score", "entry_difficulty", "entry_score"],
        "attractiveness": ["market_attractiveness_score", "market_attractiveness"],
    }),
}

render()

Two-column layout:

  • Left: “Opportunities” (green accent) — weighted score + key_findings as bullets
  • Right: “Risks” (amber accent) — confidence + risks as bullets
  • Bottom center: verdict badge (PASS green / CONDITIONAL GO amber / FAIL rose — matching judgement.json verdict values)

M99_NextSteps

File: src/synth_brain/reporting/modules/m99_next_steps.py · Section: verdict · Priority: 100

is_available()

def is_available(self, ctx):
    return bool(ctx.full_json.get("director_report", {}).get("recommended_next_steps"))

extract()

dr = ctx.full_json["director_report"]
{
    "title": "Next Steps",
    "verdict": ctx.judgement.get("verdict", ""),
    "steps": dr.get("recommended_next_steps", [])[:5],  # top 5
    "budget_note": dr.get("budget_allocation_note", ""),
}

Step structure (per R27 inspection)

Each step in recommended_next_steps has variable structure. Use alias groups:

step_alias_groups = {
    "action": ["action", "step", "description", "title"],
    "timeline": ["timeline", "duration", "weeks", "months", "time_estimate"],
    "budget": ["budget", "budget_usd", "cost_usd", "estimated_cost"],
    "kpi": ["kpi", "success_metric", "target", "outcome"],
    "priority": ["priority", "p", "rank"],
}
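Applied to a single raw step, alias groups collapse the variable structure into canonical keys. A standalone sketch of that normalization (the production path would go through safe_get instead; the helper name is an assumption):

```python
def normalize_step(step: dict, alias_groups: dict) -> dict:
    """Map a variably-named step dict onto canonical field names,
    taking the first non-empty alias per canonical key."""
    out = {}
    for canonical, aliases in alias_groups.items():
        for alias in aliases:
            if step.get(alias):
                out[canonical] = step[alias]
                break
    return out
```

With the step_alias_groups above, a step like `{"step": "Run MVP", "weeks": 6}` normalizes to `{"action": "Run MVP", "timeline": 6}`.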

render()

  • Title + verdict badge (center-top)
  • 3-5 action cards in single column (stacked vertically)
  • Each card: number + action + timeline + budget + KPI
  • Card background color coded by priority (P0 blue / P1 muted)

M30_MarketSizing

File: src/synth_brain/reporting/modules/m30_market_sizing.py · Section: market · Priority: 100

is_available()

def is_available(self, ctx):
    return bool(ctx.full_json.get("researcher_result", {}).get("market_size"))

extract()

rr = ctx.full_json["researcher_result"]["market_size"]
{
    "title": "Анализ рынка",
    "subtitle": "TAM / SAM / SOM с источниками",
    "tam": safe_get(rr, "", alias_groups={
        "value": ["tam_usd", "tam", "tam_value", "total_addressable_market"]
    }),
    "sam": safe_get(rr, "", alias_groups={
        "value": ["sam_usd", "sam", "sam_value", "serviceable_addressable_market"]
    }),
    "som": safe_get(rr, "", alias_groups={
        "value": ["som_usd", "som", "som_value", "serviceable_obtainable_market"]
    }),
    "cagr": safe_get(ctx.full_json, "researcher_result.dynamics", alias_groups={
        "cagr": ["cagr", "cagr_pct", "annual_growth_rate"]
    }),
    "regional": safe_get(rr, "regional_distribution", default=[]),
    "sources": rr.get("sources", []) or ctx.full_json.get("researcher_result", {}).get("sources", []),
}

render()

  • Title + subtitle (top)
  • Funnel visualization: TAM box → SAM box → SOM box (top to bottom, narrowing)
  • CAGR badge (top-right corner)
  • Regional distribution mini-chart (bottom)
  • Sources count badge (bottom-right)
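The funnel boxes need compact USD labels for the TAM/SAM/SOM values; a minimal sketch, assuming values arrive as plain numbers (the helper name is an assumption):

```python
def humanize_usd(value: float) -> str:
    """Compact USD label for the TAM → SAM → SOM funnel boxes."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            return f"${value / threshold:.1f}{suffix}"
    return f"${value:.0f}"
```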

M50_AudienceSegments

File: src/synth_brain/reporting/modules/m50_audience_segments.py · Section: audience · Priority: 100

is_available()

def is_available(self, ctx):
    segments = ctx.full_json.get("director_report", {}).get("audience_segments", [])
    return len(segments) > 0

extract()

segments_raw = ctx.full_json["director_report"].get("audience_segments", [])
segments = []
for seg in segments_raw[:3]:  # Max 3 segments in one slide
    segments.append({
        "name": safe_get(seg, "", alias_groups={
            "name": ["name", "segment_name", "title", "label"]
        }),
        "sam_share": safe_get(seg, "", alias_groups={
            "share": ["sam_share_pct", "sam_share", "share_of_sam", "percentage"]
        }),
        "size_usd": safe_get(seg, "", alias_groups={
            "size": ["size_usd", "size", "market_size_usd", "segment_size"]
        }),
        "age_range": safe_get(seg, "demographics", alias_groups={
            "age": ["age", "age_range", "age_group"]
        }),
        "income": safe_get(seg, "demographics", alias_groups={
            "income": ["income", "income_usd", "income_range"]
        }),
        "arpu": safe_get(seg, "", alias_groups={
            "arpu": ["arpu_monthly", "arpu", "revenue_per_user"]
        }),
        "jtbd": safe_get(seg, "", alias_groups={
            "jtbd": ["jtbd", "job_to_be_done", "primary_need"]
        }),
        "pain_points": safe_get(seg, "", alias_groups={
            "pain": ["pain_points", "pains", "frustrations"]
        }, default=[]),
    })
 
{
    "title": "Сегменты аудитории",
    "subtitle": "Демография, JTBD, боли",
    "segments": segments,
}

render()

3-column card layout:

  • Each card 4” wide × 5.5” tall
  • Top: segment name + SAM share %
  • Middle: demographics (age, income)
  • Bottom: JTBD in italics, 2-3 pain points as bullets
  • ARPU badge in top-right corner of each card

M80_UnitEconomics

File: src/synth_brain/reporting/modules/m80_unit_economics.py · Section: financial · Priority: 100

is_available()

def is_available(self, ctx):
    fm = ctx.full_json.get("director_report", {}).get("financial_model", {})
    return bool(fm)

extract() — CRITICAL: use alias groups extensively

This is where the R27→R28 rendering bug happened: the LLM produces 20+ different field-name variants for the same 6 metrics. Study section_renderers.py:165-172 in commit f5f8644 for the reference implementation.

fm = ctx.full_json["director_report"]["financial_model"]
{
    "title": "Юнит-экономика",
    "subtitle": "CAC / LTV / ratio по каналам и тирам",
    "cac_blended": safe_get(fm, "", alias_groups={
        "cac": ["cac_usd", "cac", "customer_acquisition_cost", "cac_blended", "cac_avg"]
    }),
    "cac_by_channel": safe_get(fm, "", alias_groups={
        "by_channel": ["cac_by_channel", "cac_channel_breakdown", "acquisition_costs"]
    }, default={}),
    "ltv_by_tier": safe_get(fm, "", alias_groups={
        "by_tier": ["ltv_by_tier", "ltv_tier_breakdown", "lifetime_values"]
    }, default={}),
    "ltv_blended": safe_get(fm, "", alias_groups={
        "ltv": ["ltv_usd", "ltv", "lifetime_value", "ltv_blended", "ltv_avg"]
    }),
    "ratio": safe_get(fm, "", alias_groups={
        "ratio": ["ltv_cac_ratio", "ltv_to_cac", "cac_ltv_ratio"]
    }),
    "gross_margin": safe_get(fm, "", alias_groups={
        "margin": ["gross_margin_pct", "gross_margin", "margin_pct"]
    }),
    "payback_months": safe_get(fm, "", alias_groups={
        "payback": ["payback_months", "cac_payback", "payback_period"]
    }),
    # breakeven_month is referenced by render(); the alias names below are assumptions
    "breakeven_month": safe_get(fm, "", alias_groups={
        "breakeven": ["breakeven_month", "break_even_month", "breakeven"]
    }),
}

render()

  • Title + subtitle
  • Big LTV/CAC ratio display (top center, 48pt)
  • Left column: CAC per channel (table)
  • Right column: LTV per tier (table)
  • Bottom row: 3 stat tiles (gross_margin, payback_months, breakeven_month)
  • Color code ratio badge: green if ≥ 3.0, amber if 1.5-3.0, rose if < 1.5
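The color thresholds in the last bullet, as a direct sketch (the function name is an assumption):

```python
def ratio_badge_color(ratio: float) -> str:
    """LTV/CAC badge color per the thresholds above."""
    if ratio >= 3.0:
        return "green"
    if ratio >= 1.5:
        return "amber"
    return "rose"
```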

Shared helper — safe_get with alias_groups

Required utility in src/synth_brain/reporting/modules/base.py or a new utils.py:

def safe_get(obj, path, alias_groups=None, default=None):
    """
    Safe nested field access with alias fallback.
    
    Args:
        obj: dict or nested dict
        path: dot-separated path, e.g., "researcher_result.market_size"
              Empty string means obj itself.
        alias_groups: dict of field_name -> list of alias names
                      e.g., {"sam": ["sam_usd", "sam", "serviceable_addressable_market"]}
                      Returns value from first alias that matches.
        default: value if no match found
    
    Returns:
        Found value or default
    """
    if not obj:
        return default
    
    # Navigate path
    current = obj
    if path:
        for part in path.split("."):
            if isinstance(current, dict) and part in current:
                current = current[part]
            else:
                return default
    
    # If no alias_groups, return current as-is
    # (note: falsy values such as 0, "" or [] are treated as missing)
    if not alias_groups:
        return current if current else default
    
    # Try each alias group
    if not isinstance(current, dict):
        return default
    
    # For single-field alias groups, return the first match
    if len(alias_groups) == 1:
        key, aliases = next(iter(alias_groups.items()))
        for alias in aliases:
            if alias in current and current[alias]:
                return current[alias]
        return default
    
    # For multi-field alias groups, return dict of {canonical_name: value}
    result = {}
    for canonical, aliases in alias_groups.items():
        for alias in aliases:
            if alias in current and current[alias]:
                result[canonical] = current[alias]
                break
    return result or default

Testing per module

Each module needs 3 tests (per Report-Generator-Spec):

# tests/test_reporting_modules.py
 
import pytest
from pathlib import Path
from synth_brain.reporting.modules.m01_cover import M01_Cover
from synth_brain.reporting.generator import _load_module_context
 
R27_DIR = Path("reports/streamlit_runs/20260420_154017_техническое_задание_проект_ai_эзотерика_провести_анализ_ниши")
 
def test_m01_cover_available_on_r27():
    ctx = _load_module_context(R27_DIR)
    module = M01_Cover()
    assert module.is_available(ctx) is True
 
def test_m01_cover_unavailable_on_empty_context():
    from synth_brain.reporting.modules.base import ModuleContext
    ctx = ModuleContext(full_json={}, m2_verification=None, m2_scoring=None,
                        chamber_transcripts=None, outcome_history=None, meta={}, judgement={})
    module = M01_Cover()
    assert module.is_available(ctx) is False
 
def test_m01_cover_renders_without_error():
    from pptx import Presentation
    ctx = _load_module_context(R27_DIR)
    module = M01_Cover()
    data = module.extract(ctx)
    assert "topic" in data
    
    prs = Presentation()
    slide = prs.slides.add_slide(prs.slide_layouts[6])
    module.render(slide, data)  # should not raise

18 tests total (6 modules × 3 tests each).


Implementation ordering

Implement modules in this order (simplest → most complex data):

  1. M01_Cover (simple fields, establishes pattern)
  2. M99_NextSteps (iterates array of steps, uses alias groups)
  3. M10_ExecutiveSummary (most scorecards, most complex rendering)
  4. M30_MarketSizing (researcher_result path, regional distribution)
  5. M50_AudienceSegments (3-column layout, nested segment data)
  6. M80_UnitEconomics (MUST use alias groups extensively — R27→R28 bug source)

After each module:

  • Run its 3 tests
  • Generate full deck on R27 to verify accumulation works
  • Visual spot-check (optional: convert PPTX to PNG via LibreOffice headless)

Known data variations R27 vs R28

R28 has a completed_partial status, with director_report.executive_summary missing. When testing Phase 2 modules, verify graceful behavior:

  • M10_ExecutiveSummary should return False from is_available() → slide not generated
  • M01_Cover should still work (meta and judgement exist on R28? — need verification)

Pattern: every module checks data presence explicitly, never assumes.
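That presence-check pattern, sketched as the generator's accumulation loop (a hypothetical shape; the real loop lives in generator.py):

```python
def build_slide_data(modules: list, ctx) -> list:
    """Run extract() only for modules whose data is present;
    a missing section drops its slide instead of crashing the deck."""
    collected = []
    for module in modules:
        if not module.is_available(ctx):
            continue  # e.g. R28's missing executive_summary skips M10
        collected.append((module, module.extract(ctx)))
    return collected
```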


Links

  • Report-Generator-Spec — parent spec
  • R28-Timeout-Incident — context for partial-run handling
  • Commit f5f8644 in synth-brain — reference implementation of alias groups in section_renderers.py:165-172
  • Commit 52a76cf in synth-brain — Phase 1 scaffold (base class, generator, design system)