
Methodology

The rules — how every 8020IQ model is built, validated, and judged. For the step-by-step map of the pipeline, see the Framework.

One method, many models: a shared platform, then a per-model spine. Roofing, Apollo, Olivia, Garage and Windows share the same recipe: a point-in-time snapshot, a six-month label, a tiered feature library, walk-forward validation, a modeling ladder, and calibration to the true base rate. The shared platform is the prediction frame, the feature library, and the audit method. What each model owns is its label (§2) and its modeling + delivery. This page documents the method; each model page shows its own numbers.

1 · The prediction frame (shared)

Every model answers one question, framed identically across surfaces: given what we know about a property at a fixed point in time, will the target event happen in the next six months? The frame is shared; only the label changes per model.

The model stands at T0 and looks forward. Features may only look left; the label only looks right.

| Element | Definition (from notes/MASTER_PLAN.md §3 + CLAUDE.md) |
| --- | --- |
| T0 (the snapshot) | A month-end boundary, stored as a YYYY-MM string. Every feature is computed as-of T0 month-end. The model only ever sees the world as it stood on that date. |
| Horizon | Six months. The window is T0+1 .. T0+6. |
| The label (y) | y = 1 if the model's target event is recorded anywhere in T0+1 .. T0+6, else 0. Formally y(T0)=1 if event ∈ (T0, T0+6 months]. The target event differs by model; see Labeling. |
| Never train on future data | Features must be computable from the T0 snapshot alone. Training stops at T0 = T_today − 6 months because the label for any later T0 only becomes observable after its horizon has elapsed. Walk-forward folds enforce this via t0 boundaries. |
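
The frame above can be sketched in a few lines. This is a minimal illustration, not the pipeline's code: the helper names (`month_index`, `label_window`, `make_label`, `latest_trainable_t0`) are hypothetical, and events are assumed to arrive as YYYY-MM strings.

```python
from __future__ import annotations

HORIZON_MONTHS = 6  # six-month horizon, per the frame above


def month_index(period: str) -> int:
    """Map a 'YYYY-MM' string to a running month count."""
    year, month = period.split("-")
    return int(year) * 12 + int(month) - 1


def to_period(index: int) -> str:
    """Inverse of month_index: running month count back to 'YYYY-MM'."""
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def label_window(t0: str, horizon: int = HORIZON_MONTHS) -> tuple[str, str]:
    """First and last month of the label window T0+1 .. T0+horizon."""
    start = month_index(t0) + 1
    return to_period(start), to_period(start + horizon - 1)


def make_label(t0: str, event_periods: list[str], horizon: int = HORIZON_MONTHS) -> int:
    """y = 1 if any event month falls in (T0, T0 + horizon], else 0."""
    t0_idx = month_index(t0)
    return int(any(t0_idx < month_index(p) <= t0_idx + horizon for p in event_periods))


def latest_trainable_t0(today: str, horizon: int = HORIZON_MONTHS) -> str:
    """Latest T0 whose full label window has already elapsed."""
    return to_period(month_index(today) - horizon)
```

For example, `label_window("2024-09")` returns `("2024-10", "2025-03")`, and an event recorded in the T0 month itself does not count as a positive because the window is open on the left.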

2 · Labeling by label family (shared method · per-model label)

The frame is shared; the label is where models differ. Two families: a permit event actually happened, or a transaction happened. The methodology for defining a leakage-safe, business-valid label is shared; the specific positive is the model's own.

2.1 · Permit-event labels — Roofing · Garage · Windows

y = 1 iff a qualifying permit event is recorded in the horizon.

A positive is a real permit of the right kind in T0+1 .. T0+6. The work is (a) classifying which permits count (type × action), and (b) bounding where a 0 is trustworthy — a "no permit" only means "no event" where the vendor actually covers that jurisdiction (coverage).

  • Roofing (Hestia): a qualifying roof-replacement permit, restricted by the owner-occupied-at-permit rule (a permit only counts as a positive if the owner occupied the property at permit time) and single-family.
  • Garage / Windows: the same permit-event frame on a different permit type; reuse the roofing classification + coverage template.
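
A hedged sketch of how a coverage-bounded permit label might be assigned for roofing. The field names (`type`, `action`, `owner_occupied`), the `QUALIFYING` set, and the choice to return `None` for untrustworthy or out-of-population rows are all assumptions made for illustration; the real classification rules live with each model.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical type x action pairs that count as a qualifying roofing permit.
QUALIFYING = {("roof", "replacement")}


def is_qualifying(permit: dict) -> bool:
    """Step (a): classify a permit by its (type, action) pair."""
    return (permit["type"], permit["action"]) in QUALIFYING


def roofing_label(
    permits_in_horizon: list[dict],
    single_family: bool,
    jurisdiction_covered: bool,
) -> Optional[int]:
    """Return 1, 0, or None (no trustworthy label) for one property at one T0."""
    if not single_family:
        return None  # assumed: non-single-family rows are excluded, not negatives
    positive = any(
        is_qualifying(p) and p["owner_occupied"]  # owner-occupied-at-permit rule
        for p in permits_in_horizon
    )
    if positive:
        return 1
    # Step (b): "no permit" is only a real 0 where the vendor covers the jurisdiction.
    return 0 if jurisdiction_covered else None
```

Returning `None` rather than 0 keeps uncovered jurisdictions out of the negative class instead of teaching the model that missing data means no event.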

2.2 · Transaction labels — Apollo · Olivia

y = 1 iff a qualifying transaction happens in the horizon.

Here the positive is a deed / sale event, and the two models cut it differently:

  • Apollo (REI): any property sale in the horizon (y_sold). The arms-length filter is intentionally OFF in this phase — every sale (including foreclosure, quit-claim, probate, divorce) counts as a positive, by design. Apollo is the generic "will it transact" signal.
  • Olivia: a dealable transaction for the client (wholesale / fix-flip) — narrower than any sale. Built by funnel decomposition: P(dealable) = P(transacts) × P(dealable | transacts). The client's closed deals are the known positives; every other transaction is unlabeled (a mix of dealable + not), so this is a positive-unlabeled problem, corrected with a true prior. Apollo's signal is Olivia's Stage A.

| Model | Family | y = 1 (positive) | Key rule |
| --- | --- | --- | --- |
| Roofing · Hestia (roofing) | Permit | qualifying roof-replacement permit in T0+1..T0+6 | owner-occupied-at-permit · single-family · coverage-bounded |
| Garage (garage) | Permit | garage-addition permit in horizon | permit classification · coverage-bounded |
| Windows (windows) | Permit | window-replacement permit in horizon | permit classification · coverage-bounded |
| Apollo · REI (rei) | Transaction | any property sale (y_sold) in horizon | arms-length filter OFF (every sale counts), by design |
| Olivia (olivia) | Transaction · dealable | client closes a dealable deal (wholesale / fix-flip) | funnel decomposition · positive-unlabeled + true-prior |
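
Olivia's funnel decomposition, P(dealable) = P(transacts) × P(dealable | transacts), reduces to a product of two stage scores. The sketch below assumes both stages already emit calibrated probabilities; the function name and the example numbers are illustrative only.

```python
def p_dealable(p_transacts: float, p_dealable_given_transacts: float) -> float:
    """Combine Stage A (Apollo's transact signal) with the conditional stage."""
    for p in (p_transacts, p_dealable_given_transacts):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return p_transacts * p_dealable_given_transacts


# Illustrative numbers only: an 8% chance of transacting and a 25% chance
# that such a transaction is dealable give a 2% chance of a dealable deal.
example = p_dealable(0.08, 0.25)
```

Because the product can never exceed either factor, a property cannot score higher on "dealable" than it does on "transacts".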

3 · Feature library: tiers A through F (shared)

The feature library is organized into six tiers and shared across models — each model picks its own subset and driver clusters. Tiers A–E are the core (sufficient to beat the incumbent heuristic); Tier F (local market context) is the geographic-correction layer. Each tier carries a null-policy and a provenance tag. See the full feature taxonomy.

| Tier | What | Examples |
| --- | --- | --- |
| A | Property physical | parcels, lot size, use type, year built, building / living area (YearBuilt, LotSizeSqFt, UseType, property_age_years) |
| B | Owner + distress | 23 distress trajectories (active flag, months-active, months-since-resolved, was-ever-active), absentee level, leverage. The behavioral spine of the model. |
| C | Valuation + activity | AVM, assessed value, valuation_gap, equity / leverage ratio, appreciation, days-ownership (current_avm_value, leverage_ratio, price_appreciation) |
| D | Date-diffs (under leakage audit) | mortgage_age_months, listing_duration_months, months_since_prev_sale. Three features under active leakage audit; not cited as validated until the ablation runner completes. |
| E | National macro | FRED 30-yr mortgage rate, Fed Funds, Case-Shiller HPI (CSUSHPINSA), national unemployment. Same value for every property within a T0 month; joined on Period. |
| F | Local market context | county unemployment (BLS LAUS), ACS income, FHFA state HPI, foreclosure-law speed class. The geographic-correction layer; publication-lag-shifted to avoid future leakage. |
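
Publication-lag shifting for Tier F can be pictured as an as-of lookup: at T0, use only the latest observation that had already been published. The sketch below is an assumption about how that might look; the series values and the two-month lag are made up, and the helper names are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def month_index(period: str) -> int:
    """Map a 'YYYY-MM' string to a running month count."""
    year, month = period.split("-")
    return int(year) * 12 + int(month) - 1


def as_of_value(series: dict[str, float], t0: str, lag_months: int) -> Optional[float]:
    """Latest observation whose reference month is at least lag_months before T0.

    A reference month r is treated as published by r + lag_months, so it is
    visible at T0 only when r <= T0 - lag_months.
    """
    cutoff = month_index(t0) - lag_months
    eligible = [p for p in series if month_index(p) <= cutoff]
    if not eligible:
        return None  # nothing published yet; the tier's null-policy applies
    return series[max(eligible, key=month_index)]


# Made-up county unemployment figures keyed by reference month.
county_unemployment = {"2024-01": 4.1, "2024-02": 4.3, "2024-03": 4.0}
```

With a two-month lag, `as_of_value(county_unemployment, "2024-03", 2)` returns the January figure rather than March's, which would not yet have been published at that T0.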

Tier D caveat (active audit). Three Tier-D features — listing_duration_months, months_since_prev_sale, mortgage_age_months — are under leakage audit per notes/findings/09_leakage_audit.md. In the REI model listing_duration_months alone has contributed roughly half of one county's ranking signal in some windows, so single-feature dependency is tracked explicitly. Their contribution is not cited as validated until the ablation completes.

4 · Modeling: folding logic & calibration (per-model spine)

The modeling spine is the same for every model: walk-forward folds with an embargo, case-control sampling, a modeling ladder, and calibration to the true base rate.

4.1 · Walk-forward folds + embargo

Validation is walk-forward, never random K-fold. Train on everything available up to a fold, evaluate on the next fold, advance six months, repeat — mimicking production, where the model retrains every six months. Random K-fold on time-series data leaks future information into training and produces AUC figures that evaporate in production.

An embargo equal to the horizon separates train and eval: each dev fold's first eval T0 equals the last train T0 plus the horizon plus one month, so the last training label-window ends exactly one month before the first eval label-window starts — zero overlap. (The earlier layout had up to a 5-month label-window overlap; the embargo default eliminates it.)

| Fold | Training T0 range | Evaluation T0 | Label observable by |
| --- | --- | --- | --- |
| 1 | 2021-01 .. 2021-09 | 2022-03 | 2022-09 |
| 2 | 2021-01 .. 2022-03 | 2022-09 | 2023-03 |
| 3 | 2021-01 .. 2022-09 | 2023-03 | 2023-09 |
| 4 | 2021-01 .. 2023-03 | 2023-09 | 2024-03 |
| 5 | 2021-01 .. 2023-09 | 2024-03 | 2024-09 |
| 6 | 2021-01 .. 2024-03 | 2024-09 | 2025-03 |
| TEST (locked) | 2021-01 .. 2024-09 | 2025-03 | 2025-09 |
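
A minimal sketch that regenerates the six dev folds in the table and checks label-window overlap. It assumes an expanding training window, a six-month step, and a last train T0 that sits `gap_months` before the evaluation T0 (six here, to match the table rows). All function and parameter names are hypothetical.

```python
from __future__ import annotations


def month_index(period: str) -> int:
    """Map a 'YYYY-MM' string to a running month count."""
    year, month = period.split("-")
    return int(year) * 12 + int(month) - 1


def to_period(index: int) -> str:
    """Inverse of month_index."""
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def walk_forward_folds(
    first_train_t0: str,
    first_eval_t0: str,
    n_folds: int,
    step_months: int = 6,
    gap_months: int = 6,
    horizon: int = 6,
) -> list[dict]:
    """Expanding-window folds: the train range grows, the eval T0 advances by step."""
    folds = []
    for k in range(n_folds):
        eval_idx = month_index(first_eval_t0) + k * step_months
        folds.append({
            "fold": k + 1,
            "train": (first_train_t0, to_period(eval_idx - gap_months)),
            "eval_t0": to_period(eval_idx),
            "label_observable_by": to_period(eval_idx + horizon),
        })
    return folds


def label_overlap_months(last_train_t0: str, eval_t0: str, horizon: int = 6) -> int:
    """Months shared by the last train label window and the eval label window."""
    train_end = month_index(last_train_t0) + horizon  # last month of train labels
    eval_start = month_index(eval_t0) + 1             # first month of eval labels
    return max(0, train_end - eval_start + 1)


folds = walk_forward_folds("2021-01", "2022-03", n_folds=6)
```

With these defaults, `label_overlap_months("2021-09", "2022-03")` is 0, while an evaluation T0 only one month after the last train T0 (`"2021-10"`) would share five label months.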

4.2 · Case-control negative sampling + true-prior correction

Sale and roof-permit events are rare (single-digit-percent base rates), so the negative class is downsampled per cohort — keep all positives plus a fixed ratio of negatives (NEG_TO_POS_RATIO × positives). This unblocks the largest counties (Maricopa, 04013, ~66M rows; Harris, 48201, ~59M rows) on a single 32 GB machine. Downsampling distorts the output probabilities, so the model's scores are corrected back to the true base rate at calibration time. For Olivia the same machinery doubles as the positive-unlabeled correction — known positives (client deals) against sampled unlabeled rows, rescaled to the estimated true prior.
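
The downsample-then-correct idea can be sketched as follows. The source does not state the exact correction formula, so this uses the standard prior-shift adjustment for negative undersampling as an assumption: if negatives are kept with probability β, a score p_s learned on the sampled data maps back to β·p_s / (β·p_s − p_s + 1). The ratio of 5 and the row layout are illustrative.

```python
from __future__ import annotations

import random

NEG_TO_POS_RATIO = 5  # illustrative value; the real ratio is a pipeline setting


def downsample_negatives(
    rows: list[dict], ratio: int = NEG_TO_POS_RATIO, seed: int = 0
) -> tuple[list[dict], float]:
    """Keep all positives plus ratio x positives negatives for one cohort.

    Returns the sampled rows and beta, the fraction of negatives kept.
    """
    positives = [r for r in rows if r["y"] == 1]
    negatives = [r for r in rows if r["y"] == 0]
    keep = min(len(negatives), ratio * len(positives))
    sampled = random.Random(seed).sample(negatives, keep)
    beta = keep / len(negatives) if negatives else 1.0
    return positives + sampled, beta


def correct_to_true_prior(p_sampled: float, beta: float) -> float:
    """Undo negative undersampling: sampled odds are true odds divided by beta."""
    return beta * p_sampled / (beta * p_sampled - p_sampled + 1.0)


# One toy cohort: 10 positives, 990 negatives (1% base rate).
cohort = [{"y": 1}] * 10 + [{"y": 0}] * 990
sampled_rows, beta = downsample_negatives(cohort)
```

A score of 0.5 on data where only one negative in ten was kept corresponds to true odds of 0.1, so `correct_to_true_prior(0.5, 0.1)` is about 0.091. With β = 1 (no downsampling) the score is returned unchanged.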

4.3 · Modeling ladder

Three rungs, in order. No rung is built until the previous rung has beaten the incumbent heuristic on the current fold. Each rung has a go/no-go gate.

| Rung | Model | Role | Gate to proceed |
| --- | --- | --- | --- |
| 1 | Logistic Regression (Tier A + active distress) | Interpretable sanity-check floor; falsifies "the signal is non-linear" | lift@top-10% > Alpha |
| 2 | Gradient-boosted trees, calibrated (Tier A+B+C+D+E) | Production candidate | recall@top-10% > Alpha |
| 3 | State-stratified / hierarchical boosting (+ Tier F) | Answers "does per-state training help?" | Decided on Fold 6 only, never on the locked test |

Rung 3 is a tie-breaker question, not a default. If per-state training is a dead heat against one national model, the simpler national model ships. Simplicity wins ties.
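
The gate metrics can be computed as below. This is a sketch under the usual definitions, which the source does not spell out: recall@top-10% is the share of all positives captured in the top-scored 10% of rows, and lift@top-10% is the precision in that slice divided by the overall base rate. The function names and the `passes_gate` helper are hypothetical.

```python
from __future__ import annotations

import math


def top_fraction_indices(scores: list[float], fraction: float = 0.10) -> list[int]:
    """Indices of the highest-scored rows, at least one row."""
    k = max(1, math.ceil(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def recall_at_top(scores: list[float], labels: list[int], fraction: float = 0.10) -> float:
    """Share of all positives that land in the top slice."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("recall is undefined without positives")
    top = top_fraction_indices(scores, fraction)
    return sum(labels[i] for i in top) / total_positives


def lift_at_top(scores: list[float], labels: list[int], fraction: float = 0.10) -> float:
    """Precision in the top slice relative to the overall base rate."""
    top = top_fraction_indices(scores, fraction)
    precision = sum(labels[i] for i in top) / len(top)
    base_rate = sum(labels) / len(labels)
    return precision / base_rate


def passes_gate(model_metric: float, alpha_metric: float) -> bool:
    """Go/no-go: the rung must strictly beat the incumbent heuristic (Alpha)."""
    return model_metric > alpha_metric
```

Both metrics should be computed on the same evaluation fold for the model and for Alpha, so that the comparison in `passes_gate` is like for like.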

4.4 · Calibration — rank, then forecast on the true base rate

The boosted model is first a ranker (who is most likely to transact); calibration turns that rank into a trustworthy forecast (a 0.8 score means roughly 80% of such properties actually transact). Calibration is a post-processing step fit against the true base rate — undoing the case-control downsampling — so the predicted probabilities are neither inflated nor deflated.

5 · The win condition (target, not a result)

Each model fixes its bar in advance and is judged against it. The bar is model-specific, but the discipline — set it before you look, judge on held-out data — is shared.

Apollo (REI): beat Alpha + Camilo on the locked March-2025 test

  • recall@top-10%: must exceed BOTH the incumbent heuristic (Alpha) AND Camilo's model on the locked test cohort
  • ±15%: 30 / 60 / 90-day calibration must hold within ±15% across deciles
  • 2025-03: single locked test T0; train ends 2025-02, separated by the full horizon
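
One way to read the ±15% calibration criterion is as a per-decile comparison of mean predicted probability against the observed event rate. The sketch below assumes the tolerance is relative to the observed rate; the source does not say whether it is relative or absolute, so treat that choice, and the function names, as assumptions.

```python
from __future__ import annotations

from statistics import fmean


def decile_calibration(
    probs: list[float], labels: list[int], n_bins: int = 10
) -> list[tuple[float, float]]:
    """(mean predicted, observed rate) for equal-count bins ordered by score."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    bins = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            continue  # fewer rows than bins
        predicted = fmean(probs[i] for i in members)
        observed = fmean(labels[i] for i in members)
        bins.append((predicted, observed))
    return bins


def within_tolerance(bins: list[tuple[float, float]], tol: float = 0.15) -> bool:
    """True if every bin's prediction is within tol (relative) of its observed rate.

    A bin with an observed rate of 0 passes only when its prediction is also 0.
    """
    return all(abs(pred - obs) <= tol * obs for pred, obs in bins)
```

For instance, two bins that predict 0.1 and 0.5 and observe 10% and 50% pass, while a bin that predicts 0.2 against an observed 10% fails the relative check.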

The locked test is untouched. The March-2025 evaluation cohort is frozen and not run early — no feature engineering, no hyperparameter tuning, no inspection — until Eduardo and Camilo co-sign in writing. Everything reported on the model pages is walk-forward validation ending earlier; the locked test is the final gate, not a result already achieved. Two of the three criteria met is a draw; three of three is a win.

All models draw on the same feature library and the data layer. Where each model stands — status, live validation numbers, changelogs — lives on the Models page and each model's hub; the running research log is in the Log.

Rendered from notes/MASTER_PLAN.md + notes/METHODOLOGY.md.