REI · Apollo — the journey, the comparison, and the state
web/ tree → 8020rei-new-model.web.app) was decommissioned: web/ was removed from the repo and the 8020rei-new-model Firebase hosting site was permanently deleted (the URL now returns HTTP 404). The live model surface is now the 8020IQ Models Wiki at models-8020iq.web.app, served from platform/ as plain static HTML. This is a historical record: the Apollo model itself is unaffected; only the web hub is gone. All references below to 8020rei-new-model.web.app and the web/app/** source paths are preserved as a record.

TL;DR. Apollo is a per-county supervised classifier (HistGradientBoosting + isotonic calibration, 117 features over 4.05M parcels across 5 counties) that replaces Alpha, 8020REI's 25-signal hand-weighted heuristic, at step 4 of the Gaia ETL. As of 2026-05-08 it beats Alpha by a 3.03× geomean Lift@top-1% across all five counties and 5.72× across the three where the lift is statistically distinguishable from noise (Jackson 7.87×, Harris 6.92×, Maricopa 3.43×); Miami (1.22× ± 0.27) and Philadelphia (1.12× ± 0.21) sit inside the 95% CI of 1.0×. Nine of ten audit ship-blockers are closed; Scenario A (recency-feature leakage on embargoed Fold 5) is FLAG-band pending V2 ablation, and the locked March 2025 head-to-head test is gated on written sign-off from Eduardo and Camilo. State as of distillation: 2026-05-25 (REI bucket created; CallZeke moved to Roofing).
1 · The macro project
Problem being solved
8020REI is a deal-sourcing engine for small investors operating in 14 states with active county-level campaigns in five. The business runs on ranked lists of properties delivered to acquisition teams who work outreach off-market. Speed and precision matter equally: lists too broad waste acquisition bandwidth; lists that miss real opportunities cost deal flow.
Source: web/app/context/page.tsx:50-66.
The 8-week rock
Competitive build with Camilo, coached by Eduardo, weekly Thursday check-ins. Apollo is the supervised replacement for Alpha at step 4 of the Gaia 7-step ETL — the first training loop inside Gaia.
Source: web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:70-73 (PullQuote).
Win condition (locked, three bars)
All three must clear:
| Bar | Threshold | Status |
|---|---|---|
| Top-decile recall | ≥ Alpha AND ≥ Camilo on locked March 2025 cohort | Not yet scored (gated on sign-off) |
| Calibration | Within ±15% on 30/60/90-day deal-rate buckets | Achieved on 4 of 5 counties; Jackson at honest floor |
| Transferability | Per-county model trained on county-X data explains county-X outcomes | Per-county architecture validated; pooled costs 15–27% AUC-PR |
Source: web/app/decks/archive/19-current-state-2026-04-22/page.tsx:67-86; web/app/context/page.tsx:227-246.
Players
| Role | Name | Lane |
|---|---|---|
| Builder | Ignacio Araya | Apollo (DS, model, features, pipeline) |
| Competitor | Camilo | Parallel model, baseline artifact pending |
| Coach | Eduardo | Sign-off authority, P/R/F1 evaluation against client deals |
| Cadence | — | Weekly Thursday check-ins |
Source: web/app/decks/archive/19-current-state-2026-04-22/page.tsx:67-86; web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:194-209.
Sandbox & coverage
| Property | Value | Source |
|---|---|---|
| Active states | 14 | context/page.tsx:67-75 |
| Active counties (pilot) | 5 | context/page.tsx:67-75 |
| Sandbox time span | 2021-01 → 2025-09 (57 months) | data/page.tsx:235-239 |
| Sandbox storage | 680 GB | context/page.tsx:67-75 |
| Total parcels scored at T0=2025-09 | 4,052,593 (sometimes given as 4.05M / 5.17M including non-residential strata) | data/page.tsx:28-34, 244-251; brief/page.tsx:62-67 |
Per-county parcels at T0=2025-09 (data/page.tsx:28-34):
| FIPS | County | State | Parcels |
|---|---|---|---|
| 04013 | Maricopa | AZ | 1,384,985 |
| 48201 | Harris | TX | 1,226,790 |
| 12086 | Miami-Dade | FL | 782,077 |
| 42101 | Philadelphia | PA | 428,931 |
| 29095 | Jackson | MO | 229,810 |
| | Total | | 4,052,593 |
2 · Alpha — the incumbent
What Alpha is
A weighted sum of 25 distress indicators. Weights set by hand, tuned on Miami, unchanged since launch. PreforeclosureDistress carries weight 6.0; 16 other signals trail between 0.25 and 1.0.
Source: web/app/decks/01-why-apollo/page.tsx:57-71; web/app/context/page.tsx:138-156.
How it scores
- No training loop
- No outcome feedback
- No re-weighting as markets shift
- No mechanism to explain which signal fired on a given property — score 72 is an opaque sum, not a ranked list of reasons
Source: web/app/decks/01-why-apollo/page.tsx:57-89; web/app/context/page.tsx:138-156.
Where Alpha falls short (deck claims)
| Failure mode | Mechanism | Evidence |
|---|---|---|
| Frozen calibration | Static weights, last tuned 2021 | context/page.tsx:138-146 |
| Miami-tuned only | Weights don't transfer to TX/MO/AZ | context/page.tsx:148-156 |
| Cannot explain | Sum gives no per-feature attribution | context/page.tsx:148-156 |
| Wrong feature ordering | The 25 it weights are not the 25 that matter most empirically | context/page.tsx:148-156 |
| Distress signals don't clear bar | Distress forensics: only 3 of Alpha's signals clear 5× lift (Preforeclosure 5.44×, Probate 3.46×, Affidavit 2.32×) | decks/01-why-apollo/page.tsx:115-124 |
Why Alpha is still the baseline
- It is the production scorer (step 4 of Gaia)
- Apollo's win condition is defined against it ("recall ≥ Alpha")
- The head-to-head is the gate to Phase 4
Source: web/app/decks/archive/01-macro-project/page.tsx:54-65.
3 · Apollo — the contender
What Apollo is
A supervised gradient-boosted classifier replacing Alpha (step 4 of Gaia) with: per-county HistGradientBoosting, isotonic calibration on a held-out non-downsampled slice, walk-forward folds, and a CRM-leak guard. Output contract identical to Alpha: 0–100 score per property within county.
Source: web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:54-69.
Architecture overview
INPUT : T0 month-end silver snapshot · 481 columns
TRIAGE : 481 → 117 curated features (sparse / constant / leaky dropped)
TRAIN : per-county HistGradientBoosting · seed=42 · early_stopping=False
training T0 ≤ 2025-03 · CRM-leak guard drops is_crm_matched_anywindow=1
CALIB : Isotonic regression on held-out non-downsampled slice (~60K rows/county)
RANK : Within-county percentile of calibrated_probability_isotonic → score_0_100
AUDIT : 69 deterministic sanity checks (monotonicity · prevalence · ECE · CRM · numeric)
Sources: web/app/decks/02-how-apollo-trains/page.tsx:56-100, 230-285; web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:115-135.
Why HistGB beat the field
Architecture chosen via the 5×4 ablation matrix (5 counties × {HistGB, LightGBM, logistic, random forest}) on Fold 1. HistGB never loses by a meaningful margin and wins three counties outright.
| County | HistGB AUC-PR | LightGBM AUC-PR | Winner |
|---|---|---|---|
| Maricopa | 0.274 | 0.271 | HistGB |
| Harris | 0.192 | 0.186 | HistGB |
| Jackson | 0.166 | 0.151 | HistGB (+10.3%) |
| Miami | tie (Δ<0.002) | tie (Δ<0.002) | tie |
| Philadelphia | tie (Δ<0.002) | tie (Δ<0.002) | tie |
Logistic regression collapses 50–72% vs LightGBM. Philly: 0.194→0.055. Harris: 0.186→0.089. The problem is non-linear; tree splits on tenure curves, leverage×valuation interactions, and distress trajectory families earn their keep. Random forest trails both GBMs everywhere.
Source: web/app/decks/04-where-it-wins/page.tsx:206-249.
Pooled rejected — per-county wins
Cross-county transfer (Harris→Maricopa): AUC-PR 0.201 vs native 0.274, a 27% drop. Five separate HistGB models, each with its own isotonic calibration, is the production configuration. Finding 11 measured 15–27% AUC-PR cost on cross-county transfer.
Source: web/app/decks/04-where-it-wins/page.tsx:235-244; web/app/brief/page.tsx:109-116.
Feature tiers (per CLAUDE.md §Data conventions, mirrored in data/page.tsx:47-82)
| Tier | Name | Examples | Note |
|---|---|---|---|
| A | Property physical | parcel size, building area, living area, year built, use type | Most stable; in Miami, property_age alone = 92.7% of importance |
| B | Owner + distress | 23 distress trajectories, absentee level, leverage ratio, days-ownership | Information-dense; 3 signals under leakage audit |
| C | Valuation + activity | AVM, assessed value, market value, appreciation rate, valuation gap | Valuation-gap feature broken at data layer (V2 repair queued) |
| D | Date-derived | mortgage_age_months, listing_duration_months, months_since_prev_sale | Under active leakage audit — AUC-PR contribution not validated until ablation completes |
| E | National macro · FRED | mortgage rate 30yr, Fed funds, HPI, CPI, unemployment | Zero within-T0 variance; V2.1 interaction features unlock cohort signal |
| F | Local market context | BLS county unemployment, ACS county median income, FHFA state HPI | Currently being wired in |
Counts of source columns
- Silver carries 481 columns per row (First American provider + 8020REI distress trajectories + ETL metadata)
- Two-reviewer triage: 117 included, 359 excluded (sparse >70% null, constant, leaky, redundant)
- 77% of included columns have meanings sourced directly from the First American data dictionary (8 dictionaries, 984 provider-authoritative defs)
- 25 hand-engineered synthetic features; eight of the top 15 importance slots are occupied by synthetics
Sources: web/app/decks/02-how-apollo-trains/page.tsx:112-129, 200-217; web/app/data/page.tsx:316-322.
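The mechanical part of the triage (sparse and constant columns) can be sketched as below. This is a minimal illustration, not the project's triage code: `triage_columns` and the sample data are hypothetical, and the leaky and redundant exclusions are reviewer judgments that this sketch does not attempt to automate.

```python
from typing import Dict, List, Optional


def triage_columns(columns: Dict[str, List[Optional[float]]],
                   max_null_frac: float = 0.70) -> Dict[str, str]:
    """Return {column: reason} for columns failing the sparse or constant test."""
    dropped = {}
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        null_frac = 1.0 - len(non_null) / len(values) if values else 1.0
        if null_frac > max_null_frac:
            dropped[name] = "sparse"      # more than 70% null
        elif len(set(non_null)) <= 1:
            dropped[name] = "constant"    # at most one distinct value
    return dropped


# Toy columns (values are illustrative, not taken from silver).
sample = {
    "days_ownership": [120.0, 3650.0, 980.0, None],
    "vacant_flag": [None, None, None, 1.0],      # 75% null
    "stories": [100.0, 100.0, 100.0, 100.0],     # a single sentinel value
}
dropped = triage_columns(sample)
kept = [c for c in sample if c not in dropped]
```

Returning a reason per column keeps the exclusion list auditable, which matters when two reviewers need to reconcile their decisions.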
Training method
- Six expanding walk-forward folds. Fold 1 trains 15 months. Fold 6 trains 45 months. Each subsequent fold absorbs the previous eval window.
- Horizon = 6 months. Train on history up to T0; predict on properties observed at T0; score on outcomes at T0+6.
- Embargo = 1 month. Eval window shifted past prediction horizon; properties sold inside the embargo dropped from both train and eval. Closes the gap where a property listed at T0 and sold at T0+1 could carry signal into training while its outcome is visible.
- T0 anchor. The feature builder reads only as-of-T0 columns via base_globs = _globs([t0], fips). No future data crosses the boundary.
- Training T0 cap = 2025-03. The six-month horizon ends 2025-09, one month before the first inference window opens in 2025-10, so there is zero overlap.
Sources: web/app/decks/02-how-apollo-trains/page.tsx:56-100; web/app/data/page.tsx:444-499.
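The expanding-window schedule described above can be sketched with plain month arithmetic. This is an illustration under stated assumptions: `add_months`, `expanding_folds`, and the `gap_months` parameter are hypothetical names, the 6-month step between folds is inferred from the 15-month and 45-month endpoints, and the rule that drops properties sold inside the embargo is not modeled here.

```python
from typing import List, NamedTuple


class Fold(NamedTuple):
    fold: int
    train_start: str
    train_end: str
    eval_start: str
    eval_end: str


def add_months(ym: str, n: int) -> str:
    """Shift a 'YYYY-MM' string by n months."""
    year, month = map(int, ym.split("-"))
    idx = year * 12 + (month - 1) + n
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


def expanding_folds(start: str = "2021-01", first_train_months: int = 15,
                    step: int = 6, n_folds: int = 6, horizon: int = 6,
                    gap_months: int = 0) -> List[Fold]:
    """Each fold keeps the same start and extends the train window by `step` months."""
    folds = []
    for k in range(n_folds):
        train_end = add_months(start, first_train_months + k * step - 1)
        # gap_months pushes the eval window past the training label horizon.
        eval_start = add_months(train_end, 1 + gap_months)
        eval_end = add_months(eval_start, horizon - 1)
        folds.append(Fold(k + 1, start, train_end, eval_start, eval_end))
    return folds


adjacent = expanding_folds()              # eval starts right after training ends
shifted = expanding_folds(gap_months=6)   # eval shifted one full horizon later
```

With `gap_months=6`, fold 5 trains on 2021-01..2024-03 and evaluates on 2024-10..2025-03, so no training label window overlaps the evaluation months.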
CRM-leak guard
Properties that 8020REI had already worked through its CRM are dropped via is_crm_matched_anywindow = 1. Not down-weighted, not isolated — dropped.
- 4,431 CRM deals → 2,463 silver-matched after address join
- All 2,463 carry the flag and never enter training
- Verified as one of 69 deterministic sanity checks every run
Sources: web/app/data/page.tsx:444-499; web/app/decks/02-how-apollo-trains/page.tsx:241-249.
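A hard drop plus a fail-loud check might look like the sketch below. Only the `is_crm_matched_anywindow` column name comes from the text; the function names and the row layout (a list of dicts) are assumptions made for illustration.

```python
from typing import Dict, List


def drop_crm_matched(rows: List[Dict]) -> List[Dict]:
    """Hard-drop every row already worked through the CRM (no down-weighting)."""
    return [r for r in rows if r.get("is_crm_matched_anywindow", 0) != 1]


def assert_no_crm_leak(rows: List[Dict]) -> None:
    """Deterministic check: zero flagged rows may reach training."""
    leaked = sum(1 for r in rows if r.get("is_crm_matched_anywindow", 0) == 1)
    assert leaked == 0, f"{leaked} CRM-matched rows reached training"


# Hypothetical rows; property_id is an illustrative key, not a confirmed column.
frame = [
    {"property_id": "a", "is_crm_matched_anywindow": 0},
    {"property_id": "b", "is_crm_matched_anywindow": 1},
    {"property_id": "c", "is_crm_matched_anywindow": 0},
]
train_rows = drop_crm_matched(frame)
assert_no_crm_leak(train_rows)
```

Running the assertion after the filter, rather than trusting the filter, is what makes the guard a sanity check instead of a convention.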
Other safety nets
- 1,500× feature cache makes iterative training practical (30s vs 0.02s per period)
- 69/69 deterministic sanity checks pass on every run before any ZIP ships (monotonicity, prevalence stability, ECE, CRM, numeric integrity)
- Test suite: 0.41s, 5 categories (ZIP validator, cohort map, score formula, prefix collision, filter behavior)
- 186.7M-row overnight audit retired 28 dead columns: stories (sentinel code 100), is_listed (binarizer bug, always 0 despite 1.66M "Y" rows), vacant_flag (99.2% null), 7 distress trajectories with max=0.0 across all 186M rows
Sources: web/app/decks/02-how-apollo-trains/page.tsx:252-275; web/app/brief/page.tsx:232-249.
4 · Apollo vs Alpha — head-to-head numbers
Locked evaluation window
Fold 5 embargoed: train 2021-01..2024-03, eval 2024-10..2025-03, residential-wide (SFH + Condo + Townhouse + 2-9 units). Same window for Alpha and Apollo — apples-to-apples.
Source: web/app/decks/archive/20-executive-submission/page.tsx:93-99.
Headline metrics — per county (Fold 5 embargoed)
| County | FIPS | Apollo Lift@1% | Alpha Lift@1% | Lift ratio | 95% CI half-width | Stat-sig vs 1.0× | AUC-ROC | Deck source |
|---|---|---|---|---|---|---|---|---|
| Jackson | 29095 | 15.36× | 1.95× | 7.87× | ±0.23 | YES | 0.76 | brief/page.tsx:138-143 |
| Harris | 48201 | 13.54× | 1.96× | 6.92× | ±0.11 | YES | 0.82 | brief/page.tsx:138-143 |
| Maricopa | 04013 | 16.76× | 4.88× | 3.43× | ±0.21 | YES | 0.83 | brief/page.tsx:138-143 |
| Miami | 12086 | 10.09× | 8.30× | 1.22× | ±0.27 | NO | 0.69 | brief/page.tsx:138-143 |
| Philadelphia | 42101 | 2.42× | 2.16× | 1.12× | ±0.21 | NO | 0.66 | brief/page.tsx:138-143 |
Citations: web/app/decks/04-where-it-wins/page.tsx:67-94, 152-160; web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:83-105.
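For readers new to the metric, Lift@1% is read here in the conventional way: the deal rate among the top 1% of rows by score, divided by the overall deal rate. The sketch below uses that definition on synthetic data; `lift_at_top` and `scores_from_ranking` are hypothetical helpers, and the project's exact tie-breaking and rounding are not documented in this section.

```python
import math
from typing import List, Sequence


def lift_at_top(scores: Sequence[float], labels: Sequence[int],
                frac: float = 0.01) -> float:
    """Deal rate in the top `frac` of rows by score, divided by the base rate."""
    n = len(scores)
    k = max(1, math.ceil(n * frac))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_rate = sum(labels[i] for i in order[:k]) / k
    base_rate = sum(labels) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    return top_rate / base_rate


def scores_from_ranking(top_idx: List[int], n: int) -> List[float]:
    """Give the listed rows the highest, distinct scores; everything else scores 0."""
    scores = [0.0] * n
    for rank, i in enumerate(top_idx):
        scores[i] = float(len(top_idx) - rank)
    return scores


# Synthetic county: 1,000 parcels, 20 deals (rows 0-19), so the top 1% is 10 rows.
labels = [1] * 20 + [0] * 980
model_a = scores_from_ranking([0, 1, 2, 3, 4, 20, 21, 22, 23, 24], 1000)   # 5 deals in top 10
model_b = scores_from_ranking([0, 20, 21, 22, 23, 24, 25, 26, 27, 28], 1000)  # 1 deal in top 10

lift_a = lift_at_top(model_a, labels)
lift_b = lift_at_top(model_b, labels)
lift_ratio = lift_a / lift_b
```

The "Lift ratio" column in the table is the quotient of the two lifts on the same evaluation rows, which is why both models must be scored on an identical window.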
Dual-geomean framing (non-negotiable comms rule)
| Geomean | Value | Use case |
|---|---|---|
| All five counties | 3.03× | The honest all-markets headline; ships with the deliverable |
| Signal-three (Jackson, Harris, Maricopa) | 5.72× | Where Apollo clearly separates from Alpha |
"Both numbers travel together, or neither does." —findings/41_alpha_head_to_head.md, quoted inbrief/page.tsx:69-72,decks/05-the-submission/page.tsx:132-136.
Computed: (3.43 × 6.92 × 7.87)^(1/3) = 5.72× (Deck 24 mathematical re-audit).
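Both geomeans can be reproduced from the per-county lift ratios in the headline table. The sketch below also derives the signal-three set from the confidence intervals, under the assumption that the quoted half-width applies directly to the ratio (so a county is significant when ratio minus half-width exceeds 1.0); the dictionary and function names are illustrative.

```python
import math

# (lift ratio, 95% CI half-width) per county, copied from the headline table.
lift_ratio = {
    "Jackson": (7.87, 0.23),
    "Harris": (6.92, 0.11),
    "Maricopa": (3.43, 0.21),
    "Miami": (1.22, 0.27),
    "Philadelphia": (1.12, 0.21),
}


def geomean(values):
    """Geometric mean via logs, which avoids overflow on long products."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# A county counts as signal when the whole interval sits above 1.0x.
signal = [c for c, (ratio, hw) in lift_ratio.items() if ratio - hw > 1.0]

all_five = geomean(ratio for ratio, _ in lift_ratio.values())
signal_three = geomean(lift_ratio[c][0] for c in signal)

print(f"all five: {all_five:.2f}x, signal-three: {signal_three:.2f}x")
```

The geometric mean is the natural average for ratios: it treats a 2× win and a 0.5× loss as cancelling, which an arithmetic mean would not.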
Calibration (Fold 5 embargoed, T0=2025-09 inference)
| County | Raw BSS | Isotonic BSS | ECE top-10% reduction | Verdict |
|---|---|---|---|---|
| Miami | −0.0008 | +0.0011 | 87% | First positive BSS in project |
| Maricopa | not in source | + | not in source | Positive BSS post-isotonic |
| Harris | not in source | + | 95% | Positive BSS post-isotonic |
| Philadelphia | not in source | + | in 69–95% band | Positive BSS post-isotonic |
| Jackson | −0.0041 | −0.0003 | in 69–95% band | Honest floor (not a pass) |
Sources: web/app/decks/05-the-submission/page.tsx:160-175; web/app/decks/archive/20-executive-submission/page.tsx:143-157.
Top-decile ECE improvement: 69–95% across all five counties (web/app/context/page.tsx:251-258).
Fold 1 Miami baseline (the deck that opened the project)
| Metric | Apollo | Alpha | Notes |
|---|---|---|---|
| AUC-PR | 0.259 | 0.030 | 8.7× ratio |
| Precision@top-1% | 48.6% | 6.4% | 7.6× ratio |
| Recall@top-10% | 53.8% | 16.7% | 3.2× ratio |
| Brier score | 0.023 | — | Inside 0.025 calibration target |
Source: web/app/decks/archive/05-fold1-vs-alpha/page.tsx:58-83.
Caveat on Fold 1 Miami: measured pre-embargo; the legacy "33×" claim that appeared in early decks came from a window with 5-month label overlap, since closed. The Fold 5 embargoed Miami ratio is 1.22× — much narrower. See web/app/decks/archive/20-executive-submission/page.tsx:101-105.
Top-5 SHAP gain features on Fold 1 Miami
| Rank | Feature | SHAP gain | Origin |
|---|---|---|---|
| 1 | days_ownership | 3,146 | engineered |
| 2 | lot_size_sqft | 2,581 | raw → synthetic coalesce |
| 3 | property_age_years | 2,546 | engineered (synthetic from YearBuilt) |
| 4 | assd_total_value | 2,100 | raw provider |
| 5 | market_total_value | 1,900 | raw provider |
13 of top 25 by SHAP gain are engineered, not raw.
Source: web/app/decks/archive/05-fold1-vs-alpha/page.tsx:148-159.
Property age dominance — Miami vs others (finding 54 stratified ablation)
| Age band | Eval rows | Deal rate | Within-band AUC | Δ vs full 0.6942 | Lift@1% |
|---|---|---|---|---|---|
| < 20 yr (post-2005) | 756,783 | 0.0003 | 0.6068 | −0.0874 | 2.87× |
| 20–50 yr (1976–2005) | 1,855,499 | 0.0004 | 0.6359 | −0.0583 | 10.80× |
| ≥ 50 yr (pre-1976) | 2,023,471 | 0.0014 | 0.6767 | −0.0175 | 7.29× |
Verdict: BUY-BOX. Within-band AUC collapses 0.0175–0.0874 when age is removed. 4.7× deal-rate spread (0.0003 → 0.0014) is structural population separation, not within-band motivation.
Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:213-243.
property_age_years alone explains 92.71% of feature importance in Miami; Gini coefficient 0.94; 69× dominance gap over the second feature. In Maricopa / Philly / Harris / Jackson the top-feature ratio is only 1.02×–1.11× — Miami is structurally a different model.
Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:203-207, 272-277.
5 · The data backbone
The sandbox
- 14 states, 680 GB of monthly snapshots
- 57 month-end snapshots covering 2021-01 → 2025-09
- 4.05M total scored parcels at T0=2025-09
- 5 active counties (pilot)
Source: web/app/data/page.tsx:235-251.
T0 conventions
- T0 = month-end timestamp, stored as a YYYY-MM string
- Features computed as-of T0 month-end (_month_end in src/new_model/features.py)
- Horizon: 6 months; y_sold = 1 iff any sale recorded in T0+1..T0+6
- FIPS always 5-digit zero-padded string: f"{fips:05d}" in Python; string-type in CSV/JSON
Source: web/app/data/page.tsx:293-298.
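These conventions are small enough to pin down in code. The sketch below is an illustration only: `fips5`, `label_window`, and `y_sold` are hypothetical helpers (the real `_month_end` logic lives in src/new_model/features.py and is not reproduced), and sale dates are assumed to be available as YYYY-MM strings.

```python
from typing import Iterable, List


def fips5(fips) -> str:
    """Normalise a FIPS code to a 5-digit zero-padded string."""
    return f"{int(fips):05d}"


def add_months(ym: str, n: int) -> str:
    """Shift a 'YYYY-MM' string by n months."""
    year, month = map(int, ym.split("-"))
    idx = year * 12 + (month - 1) + n
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


def label_window(t0: str, horizon: int = 6) -> List[str]:
    """Months T0+1 .. T0+horizon; T0 itself is excluded."""
    return [add_months(t0, k) for k in range(1, horizon + 1)]


def y_sold(t0: str, sale_months: Iterable[str], horizon: int = 6) -> int:
    """1 iff any sale falls inside the label window, else 0."""
    window = set(label_window(t0, horizon))
    return int(any(m in window for m in sale_months))


maricopa = fips5(4013)                   # integer input loses the leading zero
window = label_window("2025-03")
sold = y_sold("2025-03", ["2025-09"])
not_sold = y_sold("2025-03", ["2025-03", "2025-10"])
```

Keeping FIPS as a string end to end avoids the silent loss of the leading zero for Arizona (04013) when a CSV reader infers an integer column.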
BuildZoom permit refresh
| Snapshot date | Cohort permits | S3 bytes | Verdict |
|---|---|---|---|
| 2026-04-28 | 64,513 | 32 MB | "Structural data ceiling" (finding 52) |
| 2026-05-07 | 15,645,153 (242× growth) | 15 GB / 2,851 part-files | Finding 52 obsolete |
Per-county coverage at the 2026-05-07 refresh (data/page.tsx:394-425):
| FIPS | County | Silver props | Lifetime permits | Props w/ permit | Coverage | Recent 24m |
|---|---|---|---|---|---|---|
| 29095 | Jackson MO | 304,044 | 1,392,278 | 270,759 | 89.1% | 137,470 |
| 12086 | Miami-Dade FL | 924,426 | 4,723,912 | 597,111 | 64.6% | 319,663 |
| 48201 | Harris TX | 1,592,524 | 6,004,159 | 861,039 | 54.1% | 423,912 |
| 04013 | Maricopa AZ | 1,701,793 | 2,550,776 | 764,265 | 44.9% | 312,165 |
| 42101 | Philadelphia PA | 588,987 | 974,028 | 232,378 | 39.5% | 112,816 |
| | 5-county cohort | 5,111,774 | 15,645,153 | 2,725,552 | 53.3% | 1,306,026 |
S3 prefix: s3://8020rei-sandbox/ignacio_sandbox_roofing/. Jackson's coverage lead is not permit density — it's the smallest silver universe, so a moderate permit count saturates it.
Sources: web/app/data/page.tsx:362-434; web/app/decks/06-the-audits/page.tsx:248-265.
FIPS-86052 bug (fixed)
ZIP 86052 (Page, AZ — 270 miles from Maricopa core) was classified under FIPS 04013. 1,754 Maricopa + 1 Miami + 7 Harris = 1,762 mis-FIPS'd rows. Consumer-side filter shipped in src/new_model/feature_cache.py plus new module src/new_model/ref/zip_fips_validation.py. Post-filter Maricopa frame at T0=2025-03 has zero rows with ZIP 86052.
Source: web/app/decks/06-the-audits/page.tsx:255-265.
Deliverable schema (Ranked CSV + Sidecar)
Deliverable ZIP: scored_properties_2026-05-07.zip — 63 MB compressed, 577 MB uncompressed, 4,052,593 rows across 5 county-scoped CSVs plus cross-county calibration sidecar (14 columns × 5 rows: AUC-ROC per county, Lift@1%, lift ratio with 95% CI, stat_significant_lift flag, empirical deal rates at top 1%/5%/10%).
score_0_100 is within-county percentile of raw_probability (or calibrated_probability_isotonic); cross-county percentile comparison is NOT meaningful — use sidecar for cross-county base rates.
Sources: web/app/data/page.tsx:509-557; web/app/brief/page.tsx:62-67, 322-344.
Cross-county comparability gap (sidecar fix)
A "score 99" Jackson property has 6.29% expected deal rate; a "score 99" Philly property has 0.62%. 10.10× gap. This is why the sidecar exists.
Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:159-171.
Deal labels (Oracle v1.2)
| Property | Value |
|---|---|
| Transactions screened | 419,669 |
| Labeled deals | 36,648 (8.73%) |
| Aggregate deal rate | 8.78% |
| Maricopa deal rate | 1.94% |
| Jackson deal rate | 14.84% |
| CRM deals in scope | 4,431 |
| Silver-matched CRM deals | 2,463 |
Five-criterion AND-rule: DOCTYPE_CLEAN ∧ FLAG_CLEAN ∧ SELLER_CLEAN ∧ BUYER_INVESTOR ∧ PRICE_GATE. The C5c NDS fallback (TX/MO non-disclosure states) accounts for 62% of deals with zero price verification — the largest acknowledged structural gap, flagged in every downstream deck.
Sources: web/app/context/page.tsx:111-119; web/app/decks/archive/19-current-state-2026-04-22/page.tsx:132-140.
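The AND-rule itself is a plain conjunction, sketched below. The five criterion names come from the text; how each flag is computed, and how the C5c NDS fallback substitutes for PRICE_GATE in non-disclosure states, is not described in this section, so the sketch takes the flags as precomputed booleans.

```python
CRITERIA = ("DOCTYPE_CLEAN", "FLAG_CLEAN", "SELLER_CLEAN", "BUYER_INVESTOR", "PRICE_GATE")


def is_deal(txn: dict) -> bool:
    """A transaction is labeled a deal only if all five criteria hold."""
    return all(bool(txn.get(name, False)) for name in CRITERIA)


# Hypothetical transactions with precomputed criterion flags.
clean = {name: True for name in CRITERIA}
owner_occupant_buyer = {**clean, "BUYER_INVESTOR": False}

labels = [is_deal(clean), is_deal(owner_occupant_buyer)]

# Share of screened transactions that were labeled deals, from the table above.
labeled_share = 36_648 / 419_669
```

Because the rule is a conjunction, a missing flag is treated as a failure here; that default is an assumption and errs toward fewer positive labels.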
6 · How Apollo trains
End-to-end pipeline: data → features → folds → train → calibrate → audit.
Stage 1 — Silver materialisation
S3 silver parquet, monthly snapshots 2021-01..2025-09, 481 columns per row. FIPS-86052 consumer-side filter at read time (post-2026-05-07).
Stage 2 — Feature builder (T0 month-end)
src/new_model/features.py. Reads base_globs = _globs([t0], fips) — only as-of-T0 columns. 25 hand-engineered synthetics layered on top:
- Coalesce with provenance: 3 leverage cols (CLBTV, CLTV, LTV) → canonical leverage_ratio + companion audit tag
- Semantic derivation: 3-tier absentee level vs binary flag
- Temporal construction: YearBuilt → property_age_years recomputed per snapshot, clipped to [0, 200]
Sources: web/app/decks/02-how-apollo-trains/page.tsx:200-223.
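Two of these synthetics can be sketched in a few lines. This is not the code in src/new_model/features.py: the coalesce priority order, the function names, and the use of the snapshot's calendar year for age are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Priority order is an assumption; the text only says the three columns are coalesced.
LEVERAGE_PRIORITY = ("CLBTV", "CLTV", "LTV")


def coalesce_leverage(row: dict) -> Tuple[Optional[float], Optional[str]]:
    """Return (leverage_ratio, audit tag naming the source column that supplied it)."""
    for col in LEVERAGE_PRIORITY:
        value = row.get(col)
        if value is not None:
            return float(value), col
    return None, None


def property_age_years(year_built: Optional[int], t0: str) -> Optional[int]:
    """Age as of the T0 snapshot year, clipped to [0, 200]."""
    if year_built is None:
        return None
    snapshot_year = int(t0[:4])
    return min(max(snapshot_year - year_built, 0), 200)


ratio, source = coalesce_leverage({"CLBTV": None, "CLTV": 0.62, "LTV": 0.55})
age = property_age_years(1958, "2025-09")
future_build = property_age_years(2027, "2025-09")   # bad YearBuilt clips to 0
ancient = property_age_years(1700, "2025-09")        # implausible age clips to 200
```

Recomputing age per snapshot matters for walk-forward training: the same parcel is four years older in the last fold than in the first, and a frozen age would leak the extraction date.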
Stage 3 — Cache + ZIP/FIPS filter
1,500× speedup (30s → 0.02s per period). Never rsync --delete over data/cache/. Mini compute syncs via ./scripts/mini.sh sync-cache (one-way merge).
Stage 4 — Oracle label join (v1.2)
Y_deal label joined on (normalized_address, zip5, fips5). 99.6% address match rate on Maricopa validation sample. CRM-leak guard drops is_crm_matched_anywindow=1 rows entirely.
Stage 5 — Walk-forward training (6 folds)
| Fold | Train range | Eval range | Notes |
|---|---|---|---|
| 1 | 2021-01..2022-03 (15 mo) | 2022-04..2022-09 + embargo | Macro regime: rate-hiking onset |
| 2 | + Fold 1 eval | 2022-10..2023-03 + embargo | |
| 3 | + Fold 2 eval | 2023-04..2023-09 + embargo | |
| 4 | + Fold 3 eval | 2023-10..2024-03 + embargo | |
| 5 | 2021-01..2024-03 (39 mo) | 2024-10..2025-03 + embargo | The v8 fix shifted Fold 5 eval from 2024-04..09 (v7 had 5-mo label-window overlap inflating AUC to 0.843; honest AUC 0.694). Embargo permanently sealed by default. |
| 6 | 2021-01..2024-09 (45 mo) | 2025-04..2025-09 + embargo | Most recent pre-test |
Sources: web/app/decks/02-how-apollo-trains/page.tsx:62-100; web/app/decks/archive/19-current-state-2026-04-22/page.tsx:152-179.
Stage 6 — Per-county HistGB
5 separate models. Deterministic: early_stopping=False, seed=42. scripts/train_model.py writes serialized artifacts to models/<FIPS>/v8/; manifest.feature_cache_version asserted at score-time. Old generate_final_ranked_list.py deleted (2026-05-07 audit fix).
Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:289-304.
Stage 7 — Isotonic calibration
Training universe split: 49 fit months (downsampled 10:1) + 2 held-out calibration months (non-downsampled, ~60K rows/county). Fit HistGB; predict on held-out slice; fit IsotonicRegression(p → y); apply at T0=2025-09 inference. Output carries 4–11 distinct probability tiers per county — use for threshold bands, not as continuous discriminator.
Sources: web/app/decks/archive/20-executive-submission/page.tsx:135-158; web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:115-135.
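To show why the calibrated output lands on a handful of tiers, here is a dependency-free pool-adjacent-violators sketch. It is a stand-in for a library isotonic fit, not the project's calibration code: it produces a step function (library implementations typically interpolate between knots), it assumes distinct raw scores, and `fit_isotonic` and `apply_isotonic` are hypothetical names.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def fit_isotonic(raw: Sequence[float],
                 outcomes: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators fit: returns (tier upper bounds, tier deal rates)."""
    blocks = []  # each block: [sum of outcomes, row count, highest raw score]
    for score, outcome in sorted(zip(raw, outcomes)):
        blocks.append([float(outcome), 1, score])
        # Merge backwards while the previous tier's rate is not below the newest one.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   >= blocks[-1][0] / blocks[-1][1]):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    thresholds = [b[2] for b in blocks]
    rates = [b[0] / b[1] for b in blocks]
    return thresholds, rates


def apply_isotonic(thresholds: List[float], rates: List[float], score: float) -> float:
    """Map a raw score to the rate of the first tier whose upper bound covers it."""
    i = bisect_left(thresholds, score)
    return rates[min(i, len(rates) - 1)]


# Toy held-out slice: raw model scores and observed outcomes.
raw = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]
outcomes = [0, 0, 1, 0, 0, 1, 1, 1]
thresholds, rates = fit_isotonic(raw, outcomes)
calibrated = [apply_isotonic(thresholds, rates, s) for s in (0.07, 0.25, 0.95)]
```

Eight raw scores collapse into three tiers here, which mirrors why the calibrated column is better used for threshold bands than for fine-grained ordering.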
Stage 8 — Score + rank → CSV + ZIP
Within-county percentile rank of calibrated_probability_isotonic → score_0_100. Monotonicity invariant: score_0_100 strictly monotone with raw_probability within county (asserted on output).
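A within-county percentile can be sketched as below. The exact score formula is not given in this section, so the "share of county rows at or below this probability" definition is an assumption, as are the function name and the row layout; with tied probabilities the result is monotone but not strictly so.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List


def add_score_0_100(rows: List[Dict]) -> List[Dict]:
    """Attach a within-county percentile of the calibrated probability."""
    by_county = defaultdict(list)
    for r in rows:
        by_county[r["fips"]].append(r["calibrated_probability_isotonic"])
    sorted_probs = {fips: sorted(p) for fips, p in by_county.items()}

    scored = []
    for r in rows:
        probs = sorted_probs[r["fips"]]
        # Number of rows in the same county with probability <= this row's.
        rank = bisect_right(probs, r["calibrated_probability_isotonic"])
        scored.append({**r, "score_0_100": 100.0 * rank / len(probs)})
    return scored


# Toy rows for two counties with very different base rates.
rows = [
    {"fips": "29095", "calibrated_probability_isotonic": 0.010},
    {"fips": "29095", "calibrated_probability_isotonic": 0.020},
    {"fips": "29095", "calibrated_probability_isotonic": 0.060},
    {"fips": "42101", "calibrated_probability_isotonic": 0.001},
    {"fips": "42101", "calibrated_probability_isotonic": 0.006},
]
scored = add_score_0_100(rows)
top_by_county = {r["fips"]: r for r in scored if r["score_0_100"] == 100.0}
```

Both counties' top rows score 100 even though their calibrated probabilities differ tenfold, which is the reason scores should never be compared across counties without the sidecar.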
69/69 sanity checks
| Category | Examples |
|---|---|
| Monotonicity | Older properties trend toward higher sale rates up to a structural ceiling; reversals flag leakage candidates |
| Per-county prevalence | Eval-window sale rate within ±10% of training prevalence across folds |
| Calibration error | Within ±15% on 30/60/90-day deal-rate buckets after isotonic |
| CRM-leak | Zero rows where is_crm_matched_anywindow=1 reach training |
| Numerical | No NaN in score_0_100; no within-county duplicates; FIPS always 5-digit zero-padded |
Source: web/app/brief/page.tsx:232-249.
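Two of these categories are simple enough to sketch as deterministic checks. The code below is illustrative rather than the project's check suite: `property_id` is a hypothetical key for detecting duplicates, and the ±10% prevalence band is interpreted as relative to the training rate, which is an assumption.

```python
import math
import re
from typing import Dict, List

FIPS_RE = re.compile(r"\d{5}")


def numeric_failures(rows: List[Dict]) -> List[str]:
    """Collect violations: missing scores, malformed FIPS, within-county duplicates."""
    failures, seen = [], set()
    for i, r in enumerate(rows):
        score = r.get("score_0_100")
        if score is None or math.isnan(score):
            failures.append(f"row {i}: missing score_0_100")
        fips = r.get("fips")
        if not (isinstance(fips, str) and FIPS_RE.fullmatch(fips)):
            failures.append(f"row {i}: FIPS is not a 5-digit string")
        key = (fips, r.get("property_id"))
        if key in seen:
            failures.append(f"row {i}: duplicate within county")
        seen.add(key)
    return failures


def prevalence_stable(train_rate: float, eval_rate: float, tol: float = 0.10) -> bool:
    """True when the eval-window rate is within +/- tol (relative) of the training rate."""
    return abs(eval_rate - train_rate) <= tol * train_rate


good = [
    {"fips": "04013", "property_id": "a", "score_0_100": 99.0},
    {"fips": "04013", "property_id": "b", "score_0_100": 12.5},
]
bad = [
    {"fips": 4013, "property_id": "a", "score_0_100": float("nan")},
    {"fips": 4013, "property_id": "a", "score_0_100": 1.0},
]
good_failures = numeric_failures(good)
bad_failures = numeric_failures(bad)
```

Returning a list of failures instead of raising on the first one lets a single run report every violation before a ZIP is blocked.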
7 · The buy-box
Apollo identifies who fits the buy-box. It says nothing about motivation. Pairing Apollo (buy-box) with V2 motivation signals (probate fix, foreclosure oracle, valuation gap) closes the loop.
Source: web/app/decks/03-the-buy-box/page.tsx:31-37.
Three families define the box
Physical (web/app/decks/03-the-buy-box/page.tsx:54-72):
- property_age_years: 92.7% of importance in Miami; structural age, deferred maintenance, equity gaps
- year_built: raw construction year (used directly in non-Miami counties where weight distributes more evenly)
- building_area_sqft / living_area_sqft / lot_size_sqft: size thresholds define sub-market (small-footprint rowhouses in Philly; condo towers in Miami; sprawling lots in Maricopa)
Location (web/app/decks/03-the-buy-box/page.tsx:75-92):
- situs_zip5: top-5 ZIPs capture 39% of Philly deals, 41% of Jackson, 13% of Maricopa. Geographic micro-concentration is the signal
- County prevalence: Maricopa 1.94% vs Jackson 14.84% (roughly 7.6× spread). Pooled model washes out market-specific signal
- BuildZoom permit density — renovation activity in ZIP predicts demand + pricing
Ownership (web/app/decks/03-the-buy-box/page.tsx:95-110):
- days_ownership (rank 1 globally): owners 7–12 years in are statistically the most likely band to sell; recent buyers sit near zero
- owner_occupancy: absentee owners exit at higher rates with less friction; top-8 in Harris and Jackson
- mortgage_age_months: refi-or-sell decisions when rates shift. Under active leakage audit (finding 09)
Top-15 feature breakdown by category
| Category | Count | Source |
|---|---|---|
| Buy-Box (physical / location) | 6 | decks/archive/22-ceo-summary-2026-04-27/page.tsx:169-172 |
| Deal-Motivation (distress / activity) | 5 | same |
| Hybrid | 3 | same |
| Ambiguous | 1 | same |
Camilo's critique "Buy Box matters more than Likely Deal Score" is quantitatively validated by this breakdown.
Investor identity bound
Target buyer: small operator (portfolio < 10 properties, holding periods < 2 yr, acquiring at ratio below market-value estimate). Large institutional buyers (iBuyers, SFR REITs) explicitly out of scope. In 2024 small investors = 60–90% of investor-purchase flow nationally, growing as institutions become net sellers.
Target volume: ~0.46% of housing units/yr ≈ 670K client-like investor purchases nationally, ~180K recoverable across 5-county × 8-yr training window.
Source: web/app/context/page.tsx:99-119.
Three broken motivation signals (V2 territory)
| Signal | Defect | Fix |
|---|---|---|
| ProbateDistress_active | Probate dates NULL in 4 of 5 counties (data layer). Fires #3 in Miami only. Upstream ETL over-fires on partial string match | Tighten flag predicate to court-record document types only. Bronze-side, 1-day fix |
| PreforeclosureDistress_active / foreclosure trajectories | Not in top-30 anywhere. Oracle rule C2 excludes REO acquisitions at discount ratios < 0.85, exactly the transactions wholesalers target. Rule is backwards | Correct rule C2 to include the 3,261 entity-buyer REO acquisitions at ratio < 0.85 |
| valuation_gap | Constant 1.0 in PA and TX (normalization defect); ~20× in AZ (Save-Our-Homes equivalent). Non-discriminating in 2 of 5 markets | HPI-adjusted replacement; rebuild from raw assessment rolls with county-specific refresh calendars |
Sources: web/app/decks/03-the-buy-box/page.tsx:208-239; web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:246-258.
Three "surprises" surfaced by business-sense audit
| Feature | Rank | Why surprising |
|---|---|---|
| bathrooms | #3 in Jackson at 0.0342 | Higher than any distress feature anywhere. KC metro is 1-bath bungalows (investor rental) vs 2+ bath (owner-occupied). Buy-box proxy disguised as physical feature |
| TaxDelinquentDistress_months_active | #18–25 globally | Top wholesale signal in practice but ranks low. Annual assessment cycle creates near-degenerate distribution (Miami p10=p50=p90=27 months); tree-split utility collapses on constant data |
| property_age_years Miami | #1 at 0.0844, 70× over #2 | In Maricopa/Philly/Harris/Jackson top ratio is only 1.02×–1.11×. Miami is structurally a different model |
Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:260-277.
8 · Where Apollo wins
Quantitative summary (lifted from Section 4):
| Tier | Counties | Geomean Lift ratio |
|---|---|---|
| Signal-three | Jackson, Harris, Maricopa | 5.72× |
| All five | + Miami + Philadelphia | 3.03× |
Why Miami flat
Alpha's Miami baseline lift = 8.30× (Alpha was originally tuned for Miami). Apollo's Miami lift = 10.09×, ratio 1.22× ± 0.27. The CI is wide because the base rate is already high. The Fold 1 Miami "8.7× AUC-PR" result that opened the project was measured on a different metric (AUC-PR not Lift@1% ratio) and on a single pre-embargo fold; the multi-fold embargoed evaluation showed the narrower gap.
Source: web/app/decks/04-where-it-wins/page.tsx:130-141.
Why Philly flat
AUC-ROC 0.66 is the lowest in the portfolio. Apollo model lift 2.42×, Alpha baseline 2.16×, ratio 1.12× ± 0.21. Context: high sale prevalence (2.56%), Northeast row-house ownership structure, and a judicial foreclosure cycle that differs from Sunbelt markets. The feature stack transfers, but the signal-to-noise environment is tighter. The 482 positives available before embargo expansion fell below the 1K threshold. Apollo's 62.6% Townhouse composition (only 5% SFH) was previously masked when the model was SFH-only.
Sources: web/app/decks/04-where-it-wins/page.tsx:143-150; web/app/decks/archive/20-executive-submission/page.tsx:107-119.
The honest framing
Apollo is a buy-box model that has proven itself in 3 of 5 markets. In the remaining 2, Alpha is competitive enough that Apollo does not statistically dominate at the top of the list. That does not prevent Apollo from being useful — AUC-ROC scores (Miami 0.69, Philly 0.66) indicate meaningful ranking discrimination across the full distribution. It does mean the Lift@1% ratio headline should not be cited without the noise-band disclosure.
Source: web/app/decks/04-where-it-wins/page.tsx:167-175.
9 · The submission
Deliverable artifact
| Property | Value |
|---|---|
| File | scored_properties_2026-05-07.zip |
| Compressed | 63 MB |
| Uncompressed | 577 MB |
| Rows | 4,052,593 (5 ranked CSVs + 1 sidecar) |
| Inference T0 | 2025-09 |
| Prediction window | Oct 2025 – Mar 2026 |
| meta.json | Embeds oracle sha256, feature cache version, train/calibration windows |
Dual-size cut-off (judging flexibility)
| Pack | Rows | Bytes | Optimised for |
|---|---|---|---|
| Top-1,000 per county | 5,000 | 660 KB | Precision@K, Lift@K, operational wholesale lists |
| Top-50K per county | 250,000 | 31 MB | F1@K, Recall@K |
| TOP_1000_PER_COUNTY/ folder | 5,000 across 5 files | n/a | Split-by-county convenience for judges |
| head_to_head_by_county.csv | 5 rows × 26 cols | n/a | Per-county metrics (AUC, BSS, ECE, Lift, Recall) |
Source: web/app/decks/archive/20-executive-submission/page.tsx:172-202.
How to use the output
- score_0_100: within-county percentile of calibrated probability. Use for intra-top-100 ordering. NOT comparable across counties.
- calibrated_probability_isotonic: empirical deal rate (4–11 distinct tiers per county). Use for threshold bands.
- cross_county_calibration_2026-05-07.csv: per-county prevalence, lift ratios w/ 95% CI, stat_significant_lift flag, expected deal rate at top 1%/5%/10%.
Sources: web/app/decks/05-the-submission/page.tsx:175-179; web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:161-171.
Top calibrated probability examples
- Miami #1 row: calibrated_probability_isotonic = 0.579 (58% deal probability)
- Global top-10 dominated by Philly rows at probability 0.50–1.00 (isotonic calibration ceiling; an optics note, not a bug)
Sources: web/app/decks/archive/20-executive-submission/page.tsx:153-157, 114-119.
10 · The audits
Three audits ran between 2026-04-23 and 2026-05-07. Together they closed 9 of 10 ship-blockers. Item #10 (Scenario A) is FLAG, not failed.
Audit comparison
| Audit | Date | Lens | Checks | Verdict | Key finding |
|---|---|---|---|---|---|
| Triple-Critic | 2026-04-23 | CRM leak · oracle proxy · prediction window · use_type filter · placebo · data integrity | 10 questions | 4 FAIL→FIXED · 2 CAVEATED · 2 PASS | CRM rows in training, proxy features in model, window off by one month — all three fixed before submission |
| Permit Density | 2026-05-07 | BuildZoom S3 coverage · per-county permit density · ZIP/FIPS integrity | 5 counties · 999 ZIPs | FINDING-52 OBSOLETE · FIPS BUG FIXED | 64K → 15.6M permits (242×) · finding-52 data ceiling closed · 1,762 mis-FIPS rows fixed |
| Scientific Re-Audit | 2026-05-07 | Mathematical · Business-sense · Pipeline structural (3 parallel specialist agents, no cross-coordination) | 36 checks across 3 agents | 24 PASS · 8 FLAG · 4 FAIL · 9 of 10 closed | Pipeline holds · geomean 3.03× · Miami/Philly within noise · model is a Buy-Box classifier |
Source: web/app/decks/06-the-audits/page.tsx:48-72.
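
The "geomean 3.03×" and "Miami/Philly within noise" entries in the table can be checked from the per-county lift ratios quoted in the TL;DR. The sketch below is a minimal recomputation, not the audit's own code. It treats the quoted "± 0.27" and "± 0.21" as symmetric 95% CI half-widths, which is an assumption, and it has no interval for the other three counties because the source does not quote one here.

```python
import math

# Apollo Lift@1% / Alpha Lift@1%, per county, as quoted in the TL;DR.
lift_ratio = {
    "Jackson": 7.87,
    "Harris": 6.92,
    "Maricopa": 3.43,
    "Miami": 1.22,
    "Philadelphia": 1.12,
}
# Assumed symmetric 95% CI half-widths (only these two are quoted).
ci_half_width = {"Miami": 0.27, "Philadelphia": 0.21}

def geomean(values):
    """Geometric mean via the mean of logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def ci_excludes_one(ratio, half_width):
    """True when the interval ratio ± half_width does not contain 1.0."""
    return ratio - half_width > 1.0 or ratio + half_width < 1.0

all_five = geomean(lift_ratio.values())
signal_three = geomean(lift_ratio[c] for c in ("Jackson", "Harris", "Maricopa"))
print(round(all_five, 2), round(signal_three, 2))

for county, half_width in ci_half_width.items():
    print(county, ci_excludes_one(lift_ratio[county], half_width))
```

Both Miami and Philadelphia have intervals that straddle 1.0, which is why the two geomeans are reported together rather than either one alone.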
Scientific re-audit scorecard (decks/archive/24-scientific-audit-2026-05-07/page.tsx:91-127)
| Lens | Checks | PASS | FLAG | FAIL | Key finding |
|---|---|---|---|---|---|
| Mathematical | 14 | 10 | 1 | 3 | 2 stat-sig fails (Miami · Philly) · 1 ECE undocumented · 1 cross-county comparability structural |
| Business-sense | 14 | 8 | 5 | 1 | property_age PASS→FLAG · 3 broken motivation signals · model is Buy-Box |
| Pipeline structural | 8 stages | 6 | 2 | 0 | 2 P0 fragility risks · 0 automated tests before audit · 2 canonical scoring scripts coexisted |
| Combined | 36 | 24 | 8 | 4 | 67% PASS · research-quality · ship with caveats |
Ten ship-blockers — action list
| # | Item | Status | Note |
|---|---|---|---|
| 1 | Exclude CRM-matched rows from training | CLOSED | 4,442 rows dropped · attach_y_deal(exclude_crm=True) |
| 2 | Remove 5 oracle-proxy features from PROXY_DROPS (cash_buyer_flag, is_distress_deed, +3) | CLOSED | Re-run: 103 clean columns · no oracle-input detected |
| 3 | Fix prediction window label — Oct 2025–Mar 2026 (was off-by-one month) | CLOSED | All artifacts corrected |
| 4 | Document --use-types default = expanded residential set (SFH + Condo + Townhouse + Duplex + Triplex + Quadruplex + 5-9 units) | CLOSED | |
| 5 | Add stat-sig caveat · dual geomean (5/5=3.03× · 3/5=5.72×) | CLOSED | Miami + Philly within noise of 1.0× — disclosed in deck 22 + sidecar CSV |
| 6 | Serialize model · train_model.py + score_model.py · seed=42 · early_stopping=False | CLOSED | Deterministic · manifest.feature_cache_version asserted at score-time |
| 7 | Eliminate hardcoded paths in features.py:688 + macro.py:48 | CLOSED | Path(__file__).resolve().parents[2] |
| 8 | Add monotonicity invariant to sanity_check | CLOSED | score_0_100 strictly monotone with raw_probability within county |
| 9 | Ship test baseline · tests/test_features.py · 5 checks | CLOSED | 0.41s · ZIP validator · cohort map · score formula · prefix collision · filter |
| 10 ⚠ | Scenario A leakage ablation on embargoed Fold 5 (Miami) | FLAG | ΔAUC-ROC −0.0270 vs −0.0033 pre-embargo · 8× larger drop · not broken but FLAG-band · not a pass |
Source: web/app/decks/06-the-audits/page.tsx:84-95.
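
Item 8 adds a monotonicity invariant to the sanity checks. The source does not show the check itself, so the function below is a hypothetical sketch of what such an invariant could look like: within each county, a strictly higher `raw_probability` must map to a strictly higher `score_0_100`. The function name, the row layout, and the sample values are assumptions.

```python
from collections import defaultdict

def monotonicity_violations(rows):
    """rows: iterable of (county, raw_probability, score_0_100).

    Returns the adjacent pairs, per county, where a strictly higher raw
    probability does not receive a strictly higher score.
    """
    by_county = defaultdict(list)
    for county, raw, score in rows:
        by_county[county].append((raw, score))

    violations = []
    for county, pairs in by_county.items():
        pairs.sort()  # by raw probability, then by score
        for (raw_a, score_a), (raw_b, score_b) in zip(pairs, pairs[1:]):
            if raw_b > raw_a and score_b <= score_a:
                violations.append((county, (raw_a, score_a), (raw_b, score_b)))
    return violations

# Made-up rows keyed by a zero-padded FIPS string.
clean = [("04013", 0.10, 20.0), ("04013", 0.20, 55.0), ("04013", 0.35, 90.0)]
broken = clean + [("04013", 0.50, 90.0)]  # higher probability, same score

print(monotonicity_violations(clean))
print(len(monotonicity_violations(broken)))
```

Because the rows are sorted by probability and then by score, checking adjacent pairs is enough to catch any violation between non-adjacent rows as well.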
Scenario A — the one open flag
- Test: drop `listing_duration_months`, `months_since_prev_sale`, `mortgage_age_months` on embargoed Fold 5 Miami
- Pre-embargo (finding 31): ΔAUC-ROC −0.0033 (within noise)
- Embargoed: ΔAUC-ROC −0.0270 (8× larger)
- Verdict: the model is not "broken" with the features in, but with them ablated it may perform worse than finding 31 suggested. The Eduardo + Camilo head-to-head uses the full model output, not the ablated one, so the flag sits on the research trail, not the submission artifact.
- Resolution: targeted bronze-ingest probate-date fix + clean Scenario A re-run on all five counties with embargoed window. ~1 day of compute. Scoped for V2.
Sources: web/app/decks/06-the-audits/page.tsx:289-328; web/app/brief/page.tsx:250-258.
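
The "8×" figure is the ratio of the two ablation deltas quoted above. A short arithmetic check, using only those two numbers:

```python
delta_pre_embargo = -0.0033  # finding 31, ΔAUC-ROC with the three features dropped
delta_embargoed = -0.0270    # embargoed Fold 5 Miami, same ablation

ratio = delta_embargoed / delta_pre_embargo
print(f"{ratio:.1f}x larger drop under the embargoed window")
```

The exact ratio is a little above 8, which the decks round to "8×".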
11 · Methodology evolution (timeline)
Chronological milestones from archive decks 01 → 24:
| Date | Milestone | Source |
|---|---|---|
| Project kickoff | Macro project brief: replace Alpha with calibrated, transferable, explainable ranker. 8-week rock vs Camilo, coached by Eduardo, weekly Thursday | decks/archive/01-macro-project |
| Phase 1 | Column inventory, foreclosure law validation, distress trajectory audit, 25 synthetic features, external data caching | decks/archive/01-macro-project/page.tsx:181-186 |
| Phase 2 | Six walk-forward folds across five counties | same |
| Phase 3 | Architecture sweep: HistGB vs LightGBM vs logistic vs random forest (5×4 = 20 cells) — HistGB wins | decks/archive/07-arch-ablation |
| ~2026-04 | Fold 1 Miami head-to-head: Apollo 8.7× AUC-PR over Alpha (0.259 vs 0.030), Brier 0.023 inside target | decks/archive/05-fold1-vs-alpha |
| ~2026-04 | Spatial expansion: Fold 1 across all five counties | decks/archive/06-spatial-expansion |
| 2026-04-20 | Architecture ablation matrix verdict: HistGB ships as default | decks/archive/07-arch-ablation |
| 2026-04-20 | Fold-by-fold results: 25-cell matrix, 5 county trajectories | decks/archive/08-fold-results |
| 2026-04-21 | Investor criteria · 6-box specification · deal oracle v1.1 (decks/archive/11-investor-criteria). 4,431 CRM deals as ground truth; 5-step deal-discovery pipeline | decks/archive/12-deal-discovery |
| 2026-04-21 | Identification criteria V2 · EXCLUDE + VALIDATE rule library · LIFT methodology | decks/archive/13-identification-criteria |
| 2026-04-22 | v8 fix shipped: Fold 5 eval shifted from 2024-04..09 to 2024-10..2025-03. The v7 AUC inflation (0.843 → 0.694 honest) was caused by 5-month label-window overlap — now sealed by embargo default. 17 unit-test assertions PASS | decks/archive/19-current-state-2026-04-22/page.tsx:152-179 |
| 2026-04-22 | Current State deck 19 prepared for Thursday Eduardo+Camilo check-in: research-ready with caveats | decks/archive/19-current-state-2026-04-22 |
| 2026-04-23 | Triple-Critic audit: 4 FAIL→FIXED · 2 CAVEATED · 2 PASS. CRM leak, oracle proxies, prediction window all fixed same session | decks/archive/21-triple-audit-2026-04-23 |
| 2026-04-23 | Executive submission deck 20: 3.03× geomean, 5K + 250K cut-off variants, calibration P1 solved | decks/archive/20-executive-submission |
| 2026-04-23 | V2 overnight report · oracle v1.1 · five-stream brief | decks/archive/18-v2-overnight-report |
| 2026-04-27 | CEO summary deck 22 (one-page brief, 5 questions/5 answers) | decks/archive/22-ceo-summary-2026-04-27 |
| 2026-05-07 | BuildZoom refresh: 64,513 → 15,645,153 permits (242×, 15 GB, 2,851 part-files). Finding 52's "structural data ceiling" verdict obsolete | decks/archive/23-permit-data-density-2026-05-07 |
| 2026-05-07 | FIPS-86052 fix: 1,762 mis-FIPS'd rows filtered out (ZIP 86052 = Page, AZ, 270 mi from Maricopa core) | decks/06-the-audits/page.tsx:255-265 |
| 2026-05-07 | Scientific Re-Audit (3 parallel agents: math, business-sense, pipeline). 36 checks · 24 PASS · 8 FLAG · 4 FAIL. 9 of 10 ship-blockers closed same day. Verdict: SHIP-WITH-SHARPER-CAVEATS | decks/archive/24-scientific-audit-2026-05-07 |
| 2026-05-07 | P0 risks fixed: model serialization (train_model.py + score_model.py, seed=42, deterministic), hardcoded paths replaced (Path(__file__).resolve().parents[2]), monotonicity invariant + 5-check pytest baseline shipped | same |
| 2026-05-07 | Finding 54: stratified ablation confirms property_age = cross-band separator, not within-band motivation. Apollo is a Buy-Box classifier | same |
| 2026-05-07 | Finding 55: Scenario A re-run on embargoed Fold 5 Miami returns FLAG (ΔAUC-ROC −0.0270 vs −0.0033 pre-embargo, 8× larger) | same |
| 2026-05-07 | Deliverable scored_properties_2026-05-07.zip shipped: 4.05M rows, 63 MB | decks/05-the-submission, brief |
| 2026-05-08 | Brief / Context / Data / Decks 01–06 published as the live hub at 8020rei-new-model.web.app (this is the source distilled here) | brief/page.tsx:30, data/page.tsx:198, decks/04-where-it-wins/page.tsx:28 |
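
The v8 row above moves the Fold 5 evaluation window to 2024-10..2025-03, and the glossary defines the outcome window as T0+1..T0+6. The sketch below shows that month arithmetic and a simple overlap check between two label windows. The helper names are hypothetical, and the T0 of 2024-09 is inferred from the quoted window, not stated in the source. The second pair of snapshots is purely illustrative and does not claim to reproduce the v7 overlap.

```python
def add_months(year, month, offset):
    """Shift a (year, month) pair by a number of months."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1

def outcome_window(t0_year, t0_month, horizon=6):
    """Months T0+1 .. T0+horizon as (year, month) tuples."""
    return [add_months(t0_year, t0_month, k) for k in range(1, horizon + 1)]

def shared_months(window_a, window_b):
    """Months that appear in both label windows, in order."""
    return sorted(set(window_a) & set(window_b))

# Assumed T0 of 2024-09 yields the v8 evaluation window.
fold5_window = outcome_window(2024, 9)
print(fold5_window[0], fold5_window[-1])

# Illustrative only: two snapshots one month apart share 5 of 6 label months.
overlap = shared_months(outcome_window(2024, 3), outcome_window(2024, 4))
print(len(overlap))
```

Any two six-month label windows whose T0s are closer than six months apart share at least one month, which is the kind of overlap an embargo between training and evaluation is meant to rule out.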
12 · Current state
What's shipped
- `scored_properties_2026-05-07.zip`: 4,052,593 properties across 5 counties (Miami-Dade, Maricopa, Philadelphia, Harris, Jackson), 63 MB compressed, 577 MB uncompressed
- Five ranked CSVs + cross-county calibration sidecar
- Per-county HistGB + isotonic calibration models serialized at `models/<FIPS>/v8/` (deterministic, seed=42)
- 117 features (from 481 column universe), 25 synthetics, 6 tiers
- 69/69 deterministic sanity checks pass; 0.41s pytest baseline
- 9 of 10 audit ship-blockers closed
- Hub deployed at `8020rei-new-model.web.app` (Next.js 15 + Tailwind v4, brand-token-synced from BigQuery, paths in `web/app/`)
- Cross-county comparability addressed via sidecar CSV
What's open
One audit flag:
- Scenario A leakage ablation on embargoed Fold 5 (Miami): ΔAUC-ROC −0.0270 vs −0.0033 pre-embargo. Not a ship-blocker (full model output unchanged), but blocks "10 of 10" sign-off. Scoped for V2.
Three features under leakage audit (per CLAUDE.md hard rule #7):
- `listing_duration_months`
- `months_since_prev_sale`
- `mortgage_age_months`
Until ablation completes, these features' AUC-PR contribution is NOT cited as validated.
External gates (Eduardo + Camilo):
- Locked March 2025 head-to-head test: written sign-off required on universe, cut-off K, scoring metric
- Camilo's baseline artifact: needs his top-N list on the same eval cohort (currently only Alpha measured)
- Eduardo's P/R/F1 evaluation: against client deals, market deals (sold), market deals at discount. Eduardo has access to post-Oct silver; Apollo's role is shipping the list (done)
- Alpha sunset timeline depends on the locked March 2025 test gate opening
Known blockers / structural gaps:
- C5c NDS fallback — 62% of oracle deals (TX/MO non-disclosure states) have zero price verification. Acknowledged P1 gap, flagged in every downstream deck.
- Probate dates NULL in 4 of 5 counties — bronze-ingest fix, ~1 day. Highest-leverage single improvement to motivation signal.
- Foreclosure oracle rule C2 backwards — currently excludes REO acquisitions at discount ratios < 0.85, which is exactly what wholesalers target. 3,261 entity-buyer rows to recover.
- Valuation gap constant 1.0 in PA and TX — non-discriminating in 2 of 5 markets. HPI-adjusted replacement is V3 backlog.
- Silver universe saturated: 8 feature-addition experiments produced zero AUC gains. Next tier of gain requires fresh data sources — MLS DOM, permits (now refreshed), skip-trace, rent rolls.
- Arms-length filter intentionally OFF: foreclosures, quit-claims, probate transfers, divorce sales all count as `y_sold=1`. V2 second pass once Eduardo signs off on policy definition. Impact on thin-positive counties like Philly (2.56% prevalence) unknown.
Sources: web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:190-227; web/app/decks/05-the-submission/page.tsx:255-303; web/app/decks/archive/20-executive-submission/page.tsx:212-241.
Calibration / leakage audit status
| Item | Status |
|---|---|
| Isotonic calibration | LIVE; 4 of 5 counties Brier-positive; Jackson at honest floor (−0.0003) |
| ECE top-decile reduction | 69–95% across all 5 counties |
| Calibration target ±15% on 30/60/90-day | Met |
| 3-feature leakage audit | Pending Scenario A V2 ablation |
| CRM-leak guard | ENFORCED on every run, verified as 1 of 69 sanity checks |
| Walk-forward embargo | ENFORCED structurally; v8 default since 2026-04-22 |
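
For readers unfamiliar with the "Brier-positive" wording, the sketch below computes a Brier score and a Brier Skill Score on made-up data. It assumes the reference forecast is the constant base rate (prevalence); the source defines BSS as 1 − (model Brier ÷ reference Brier) but does not say which reference the project uses.

```python
def brier(probs, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def brier_skill_score(probs, outcomes):
    """1 - model Brier / reference Brier, with the base rate as reference (assumed)."""
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier([base_rate] * len(outcomes), outcomes)
    return 1.0 - brier(probs, outcomes) / reference

# Made-up cohort: 1 deal in 10 properties.
outcomes = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
informative = [0.6] + [0.05] * 9   # high on the deal, low elsewhere
base_only = [0.1] * 10             # always predicts the prevalence
misleading = [0.0] + [0.5] * 9     # confident in the wrong direction

for name, probs in [("informative", informative),
                    ("base_only", base_only),
                    ("misleading", misleading)]:
    print(name, round(brier(probs, outcomes), 4), round(brier_skill_score(probs, outcomes), 3))
```

A BSS of zero means the model is no better calibrated than quoting the prevalence for every row, so a value such as Jackson's −0.0003 sits essentially at that floor.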
V2 roadmap (the contender's next phase)
- Sharpen the dependent variable: from "any sale" → "client-like investor purchase" (4,431 CRM deals as tightened ground truth)
- Three motivation signal repairs: probate ingest (bronze), foreclosure oracle C2 correction, valuation gap HPI-adjusted normalization
- Scenario A clean ablation on all five counties with embargoed window
- Arms-length filter second pass once policy signed off
V3 horizon (paradigm shifts, documented but out of V1 scope)
- Survival analysis
- Uplift modelling
- Computer vision on property imagery
- Open public data at national scale
Source: web/app/decks/01-why-apollo/page.tsx:170-178.
13 · Glossary
| Term | Definition |
|---|---|
| Apollo | Per-county supervised classifier (HistGB + isotonic) replacing Alpha at step 4 of Gaia. 117 features, 4.05M parcels, 5 counties. The contender |
| Alpha | 8020REI's incumbent scorer. Weighted sum of 25 hand-tuned distress indicators, Miami-tuned, no training loop, no calibration. The baseline |
| Gaia | Upstream 7-step ETL (ingest → dedup → join → label → enrich → BuyBox → export). Apollo replaces step 4 (scoring) only |
| Camilo | Competing modeller on the 8-week rock; baseline artifact pending. Apollo must clear top-decile recall ≥ Alpha AND ≥ Camilo |
| Eduardo | Coach; sign-off authority on locked March 2025 test; owns P/R/F1 evaluation against client deals |
| T0 | The month-end "as-of" timestamp for a snapshot. Features computed at T0 month-end; outcome window T0+1..T0+6 |
| fold | A walk-forward train/eval split. 6 expanding folds (Fold 1: 15-mo train; Fold 6: 45-mo train). Each absorbs prior eval into training |
| embargo | 1-month buffer between training T0 and evaluation window start. Closes the leak where a property listed at T0 and sold at T0+1 carries signal into training while its outcome is visible |
| HistGB | scikit-learn HistGradientBoostingClassifier. Handles tabular mixed types; interpretable feature importance. Beat LightGBM, logistic, random forest in 5×4 ablation. Deterministic (early_stopping=False, seed=42) |
| isotonic | Monotone non-parametric calibration. Maps raw model probability to empirical deal rate via IsotonicRegression(p → y) fit on held-out non-downsampled slice |
| lift / Lift@K | (positives in top-K of model list) ÷ (positives in random top-K). Lift@1% = how many more deals the top 1% of Apollo's list captures vs a random 1% of properties |
| lift ratio | Apollo Lift@1% ÷ Alpha Lift@1%. The headline head-to-head metric |
| AUC-PR | Area under precision-recall curve. Robust to class imbalance (deal rates < 9%) |
| AUC-ROC | Area under receiver-operating-characteristic curve. Used as the secondary discriminator |
| Brier score | Mean squared error between predicted probability and outcome. Lower = better calibrated. Target ≤ 0.025 |
| BSS (Brier Skill Score) | 1 − (model Brier ÷ reference Brier). Positive = better than reference. Miami went from −0.0008 → +0.0011 post-isotonic (first positive BSS in the project) |
| ECE (Expected Calibration Error) | Weighted mean of bin-level miscalibration. Top-decile ECE improved 69–95% across all 5 counties post-isotonic |
| CRM-leak guard | is_crm_matched_anywindow = 1 rows (properties 8020REI already worked through CRM) dropped entirely from training. Prevents fake head-to-head wins from prior business actions |
| Oracle v1.2 | 5-criterion AND-rule deal definition: DOCTYPE_CLEAN ∧ FLAG_CLEAN ∧ SELLER_CLEAN ∧ BUYER_INVESTOR ∧ PRICE_GATE. 8.73% prevalence; 36.6K labels across 419.7K transactions |
| C5c NDS fallback | Non-disclosure-state branch of PRICE_GATE for TX/MO. Acknowledged structural gap: 62% of deals carry zero price verification |
| arms-length filter | Filter excluding non-arms-length transactions (foreclosures, quit-claims, probate transfers, divorce sales). Intentionally OFF in current phase per CLAUDE.md hard rule #3. V2 second pass planned once Eduardo signs off on the policy |
| score_0_100 | Display score: within-county percentile rank of calibrated_probability_isotonic. Intra-county only. Cross-county comparison NOT meaningful |
| calibrated_probability_isotonic | The actual empirical deal probability per property (4–11 distinct tiers per county) |
| stat_significant_lift | Sidecar boolean flag: TRUE iff 95% CI on lift ratio excludes 1.0×. TRUE for Jackson/Harris/Maricopa; FALSE for Miami/Philly |
| signal-three | Jackson + Harris + Maricopa — the 3 counties where Apollo separates from Alpha at statistical significance. Geomean lift ratio 5.72× |
| buy-box | The structural property/location/ownership fingerprint that defines a target deal. Apollo finds who fits the box; it does NOT predict motivation |
| dual-geomean framing | Comms rule: report 3.03× (all 5) and 5.72× (signal-3) together. Citing either alone misrepresents the evidence |
| Scenario A | Recency-features leakage ablation. Drop listing_duration_months + months_since_prev_sale + mortgage_age_months. Pre-embargo: −0.0033 ΔAUC-ROC. Embargoed: −0.0270. The 8× delta is the one open audit flag |
| FIPS | Federal Information Processing Standards county code. Always 5-digit zero-padded string: 04013 not 4013 (CLAUDE.md hard rule #1) |
| finding NN | Dated, evidence-first entry in notes/findings/NN_<topic>.md. Append-only; older facts may be stale (date wins) |
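
The Lift@K and lift ratio entries in the glossary can be made concrete with a small sketch. It takes "positives in random top-K" to mean the expected count, K × prevalence. The function name, the two toy scorers, and the 1,000-row cohort are assumptions for illustration and are not project data.

```python
def lift_at_k(scores, labels, fraction=0.01):
    """(positives in the model's top-K) / (expected positives in a random K)."""
    n = len(scores)
    k = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top_k = sum(label for _, label in ranked[:k])
    expected_random = k * sum(labels) / n  # assumes at least one positive
    return hits_top_k / expected_random

# Toy cohort: 1,000 properties, 50 deals (5% prevalence), so K = 10 at top-1%.
labels = [1] * 50 + [0] * 950

def toy_scores(n_positives_on_top):
    """Put the given number of deals, then non-deals, into the top 10 slots."""
    scores = [0.1] * 1000
    for i in range(n_positives_on_top):
        scores[i] = 1.0            # deals ranked first
    for i in range(50, 50 + 10 - n_positives_on_top):
        scores[i] = 0.9            # non-deals filling the rest of the top 10
    return scores

lift_a = lift_at_k(toy_scores(8), labels)  # scorer A: 8 deals in its top 10
lift_b = lift_at_k(toy_scores(2), labels)  # scorer B: 2 deals in its top 10
print(lift_a, lift_b, lift_a / lift_b)
```

The final number is the lift ratio between the two toy scorers, the same construction the document uses for Apollo Lift@1% ÷ Alpha Lift@1%.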
14 · Cross-bucket notes
- CallZeke deliverable was moved from REI hub → Roofing bucket on 2026-05-25. See `notes/Roofing/callzeke/`. The REI hub at `8020rei-new-model.web.app` no longer hosts CallZeke content.
- The REI bucket is `notes/REI/` (this file `JOURNEY.md`). The Roofing bucket is `notes/Roofing/` (see `notes/Roofing/PROGRESS_NOTEBOOK.html` for live state).
- Coverage platform at `coverage.8020roof.com` is Roofing-side, NOT REI-side. Separate Firebase site (`hosting:8020roof-coverage`); never `firebase deploy` bare without `--only hosting:8020roof-coverage`.
- Brand tokens are BigQuery-synced and live at `presentations/assets/mck-ds/{colors_and_type,tokens.bigquery}.css`. The `web/` Next.js app symlinks them in via `web/styles/`. Single source of truth for color/type/spacing/motion across HTML and React.
- Memory of arms-length scope lives at `~/.claude/projects/-Users-ignacioaraya-Projects-new-model/memory/project_arms_length_phase.md`.
15 · Source map
| JOURNEY.md section | Primary source file | Secondary sources |
|---|---|---|
| 1 · Macro project | web/app/context/page.tsx:45-130 | web/app/decks/archive/01-macro-project/page.tsx, CLAUDE.md |
| 2 · Alpha | web/app/context/page.tsx:132-175 | web/app/decks/01-why-apollo/page.tsx:45-101, web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:54-69 |
| 3 · Apollo | web/app/context/page.tsx:177-218, web/app/decks/02-how-apollo-trains/page.tsx | web/app/decks/01-why-apollo/page.tsx:103-145, web/app/data/page.tsx:38-83 |
| 4 · Head-to-head | web/app/brief/page.tsx:119-161, web/app/decks/04-where-it-wins/page.tsx, web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:130-187 | web/app/decks/archive/05-fold1-vs-alpha/page.tsx, web/app/decks/archive/19-current-state-2026-04-22/page.tsx:194-239, web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:76-105 |
| 5 · Data backbone | web/app/data/page.tsx | web/app/decks/archive/19-current-state-2026-04-22/page.tsx:96-146 |
| 6 · How Apollo trains | web/app/decks/02-how-apollo-trains/page.tsx | web/app/decks/archive/19-current-state-2026-04-22/page.tsx:148-192 |
| 7 · Buy-box | web/app/decks/03-the-buy-box/page.tsx | web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:138-180, web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:189-277 |
| 8 · Where Apollo wins | web/app/decks/04-where-it-wins/page.tsx | web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:130-187 |
| 9 · The submission | web/app/decks/05-the-submission/page.tsx | web/app/decks/archive/20-executive-submission/page.tsx:159-202, web/app/data/page.tsx:501-569 |
| 10 · Audits | web/app/decks/06-the-audits/page.tsx, web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx | web/app/decks/archive/21-triple-audit-2026-04-23/page.tsx (not read; referenced via deck 06 + 24) |
| 11 · Timeline | All archive decks 01 → 24 | web/app/decks/archive/19-current-state-2026-04-22/page.tsx, web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx, web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx |
| 12 · Current state | web/app/brief/page.tsx:260-318, web/app/decks/05-the-submission/page.tsx:249-304, web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:182-237 | CLAUDE.md (hard rules), notes/PROJECT_STATUS.md, notes/findings/00_index.md |
| 13 · Glossary | Distilled across all 15 files | CLAUDE.md |
| 14 · Cross-bucket | Project memory (~/.claude/projects/.../memory/), CLAUDE.md | — |
Source-file inventory used
Live hub (6 decks + 3 pages):
- `web/app/brief/page.tsx` (370 lines · executive brief)
- `web/app/context/page.tsx` (289 · background)
- `web/app/data/page.tsx` (593 · datasets feeding model)
- `web/app/decks/01-why-apollo/page.tsx` (238)
- `web/app/decks/02-how-apollo-trains/page.tsx` (353)
- `web/app/decks/03-the-buy-box/page.tsx` (289)
- `web/app/decks/04-where-it-wins/page.tsx` (333)
- `web/app/decks/05-the-submission/page.tsx` (332)
- `web/app/decks/06-the-audits/page.tsx` (434)
Archive milestones (6):
- `web/app/decks/archive/01-macro-project/page.tsx` (225 · the original framing)
- `web/app/decks/archive/05-fold1-vs-alpha/page.tsx` (216 · the head-to-head opener)
- `web/app/decks/archive/19-current-state-2026-04-22/page.tsx` (304)
- `web/app/decks/archive/20-executive-submission/page.tsx` (263)
- `web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx` (272)
- `web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx` (421)
*Document status: distilled 2026-05-25 from live hub (`8020rei-new-model.web.app`) source. At distillation the hub was the live source of truth, with `web/app/**/page.tsx` holding the latest framing. (Hub decommissioned 2026-06-01: `web/` removed and the `8020rei-new-model` Firebase site deleted; the `web/app/**` source no longer exists. Live surface is now the `platform/` Models Wiki at models-8020iq.web.app. This document is preserved as a historical record.) Confidential.*