
REI rules

Apollo replacement model · the hard boundaries & policy — what is fixed, what is reference, what is gated.
This is the decision/policy register for the REI (Apollo) model — the rules that govern labels, leakage, calibration and the evaluation gate. It is not the process walk-through (that lives on the pipeline page). Status of the model itself is building: the win condition is a target, not a result; the locked March-2025 test is untouched; and three features are still under active leakage audit. Each rule below is tagged HARD (a boundary we do not cross), REFERENCE (a definition/convention), or GATED (blocked on a sign-off or a future event). Every statement is quoted from its source — see the src line on each block.

Decision & policy rules

Rule 1 — Label definition [HARD]
What counts as a positive.

y_sold = 1 iff any sale is recorded in T0+1 .. T0+6 (6-month horizon). During this phase every sale event counts as a positive — foreclosures, quit-claims, probate, and divorce sales are all included. The arms-length filter is intentionally OFF. This is scope, not an oversight: re-running with the filter is a planned second pass once Eduardo signs off on the policy. Do not treat the missing filter as a bug or a blocker.

src: CLAUDE.md hard rule #3 + Data conventions (horizon = 6 months)
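As an illustration of the Rule 1 label, here is a minimal sketch. It assumes sale events are already bucketed into `YYYY-MM` month strings; the helper names `month_index` and `y_sold` are hypothetical and are not taken from the codebase.

```python
HORIZON_MONTHS = 6  # from Rule 1: 6-month horizon


def month_index(ym: str) -> int:
    """Convert a 'YYYY-MM' string to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def y_sold(t0: str, sale_months: list[str]) -> int:
    """Return 1 if any sale falls in T0+1 .. T0+6 (inclusive), else 0.

    Every sale type counts as a positive; no arms-length filter is applied,
    matching the current phase described in Rule 1.
    """
    start = month_index(t0) + 1
    end = month_index(t0) + HORIZON_MONTHS
    return int(any(start <= month_index(m) <= end for m in sale_months))
```

Under these assumptions a sale in the T0 month itself does not count, while a sale in month T0+6 does.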
Rule 2 — No future data / T0 boundary [HARD]
Features may only see the past.

Never train on future data. Features must be computable from the T0 month-end snapshot alone. Walk-forward folds enforce this via t0 boundaries; the feature builder (src/new_model/features.py) reads only the as-of-T0 columns (base_globs = _globs([t0], fips)). T0 is a month-end boundary stored as a YYYY-MM string.

src: CLAUDE.md hard rule #2 + Data conventions (T0 = month-end)
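The T0 boundary in Rule 2 can be pictured as an as-of filter plus a guard. This is a sketch only: it does not reproduce `src/new_model/features.py`, and the `recorded_month` field and both function names are hypothetical.

```python
def month_index(ym: str) -> int:
    """Convert a 'YYYY-MM' string to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def as_of(records: list[dict], t0: str, month_key: str = "recorded_month") -> list[dict]:
    """Keep only records that are observable at the T0 month-end snapshot."""
    cutoff = month_index(t0)
    return [r for r in records if month_index(r[month_key]) <= cutoff]


def assert_no_future(records: list[dict], t0: str, month_key: str = "recorded_month") -> None:
    """Raise ValueError if any record is dated after T0."""
    cutoff = month_index(t0)
    late = [r for r in records if month_index(r[month_key]) > cutoff]
    if late:
        raise ValueError(f"{len(late)} record(s) dated after T0={t0}")
```

A guard like `assert_no_future` could run on the feature inputs of every fold, so that a post-T0 record fails loudly instead of leaking silently.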
Rule 3 — Three features under leakage audit [HARD, under audit]
Do not cite their contribution as validated yet.

Three features are under active leakage audit: listing_duration_months, months_since_prev_sale, and mortgage_age_months. Do not cite their AUC-PR contribution as validated until the ablation runner completes. (Finding 9 records the as-of-T0 date probe — no post-T0 records found — but the cross-county ablation is still in progress; treat the dependence figures there as preliminary.)

src: CLAUDE.md hard rule #7 → notes/findings/09_leakage_audit.md
Rule 4 — Feature tiers A–F [REFERENCE]
The six-tier feature partition.
| Tier | Family | Note |
|---|---|---|
| A | Property physical: parcels, size, use, year built | |
| B | Owner + distress: 23 distress trajectories, absentee, leverage | |
| C | Valuation + activity: AVM, appreciation, days_ownership | |
| D | Date-diffs: mortgage age, listing duration, prev-sale recency | under leakage audit (see Rule 3) |
| E | National macro: FRED mortgage rate, Fed funds, HPI, CPI, unemployment (same value per T0 month) | |
| F | Local market context: BLS county unemployment, ACS income, FHFA state HPI | currently being wired in |

MASTER_PLAN §4 builds these in tiers (A–E for the MVP, F deferred). Feature counts vary across findings — this register links the findings rather than asserting one number.

src: CLAUDE.md Data conventions (feature tiers) + notes/MASTER_PLAN.md §4
Rule 5 — Walk-forward embargo [HARD]
Embargo must equal the horizon.

Evaluation is walk-forward: train on everything up to a fold, evaluate the next fold, advance six months, repeat (MASTER_PLAN §3). The embargo between the last train T0 and the first eval T0 must equal the horizon — 6 months. With a shorter gap the last train T0s carry label windows that extend into the eval window, contaminating the outcome (finding 32 quantified a 5-month overlap in the pre-fix folds). The locked test fold is structurally clean (0 overlap T0s).

src: notes/MASTER_PLAN.md §3 → notes/findings/32_fold_embargo_analysis.md
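To make the Rule 5 arithmetic concrete, the sketch below counts how many label-window months the last train T0 shares with the first eval T0. It illustrates the overlap definition only and is not a reproduction of the finding 32 analysis; the function names are hypothetical.

```python
HORIZON_MONTHS = 6  # from Rule 5: embargo must equal the horizon


def month_index(ym: str) -> int:
    """Convert a 'YYYY-MM' string to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def label_window(t0: str, horizon: int = HORIZON_MONTHS) -> set[int]:
    """Month indices T0+1 .. T0+horizon over which the label is observed."""
    base = month_index(t0)
    return set(range(base + 1, base + horizon + 1))


def overlap_months(last_train_t0: str, first_eval_t0: str,
                   horizon: int = HORIZON_MONTHS) -> int:
    """Number of months shared by the train and eval label windows."""
    shared = label_window(last_train_t0, horizon) & label_window(first_eval_t0, horizon)
    return len(shared)


def embargo_months(last_train_t0: str, first_eval_t0: str) -> int:
    """Gap in months between the last train T0 and the first eval T0."""
    return month_index(first_eval_t0) - month_index(last_train_t0)
```

Under this definition a one-month gap leaves five shared months, and the overlap reaches zero once the gap equals the 6-month horizon.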
Rule 6 — Calibration on the true base rate [HARD]
Calibrate at the population prior, not the downsampled one.

Training uses a 10:1 negative-to-positive downsample to fit in memory, which inflates the training positive rate (~9.1%) above the true eval rate (2–4%). Raw gradient-boosted probabilities are therefore globally too high on eval. Calibration must be performed at the true population base rate on a held-out, non-downsampled slice — not on the downsampled training pool (which inherits the wrong prior and cannot correct the shift). Isotonic regression is the non-negotiable post-processing step (MASTER_PLAN §6); the prior-ratio rescaling in finding 10 moved 4 of 5 counties inside the Brier target without changing AUC-PR.

src: notes/MASTER_PLAN.md §6 → findings/10_prior_ratio_calibration.md, 46_isotonic_held_out.md
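The prior shift described in Rule 6 can be illustrated with the standard prior-ratio correction for a classifier trained at one base rate and scored at another. Whether finding 10 uses exactly this form is an assumption, and the 3% true prior below is only an illustrative value inside the stated 2–4% range. The isotonic step is separate and is not shown.

```python
TRAIN_PRIOR = 1 / 11  # 10:1 negative-to-positive downsample, about 9.1%


def prior_shift_correct(p: float, train_prior: float, true_prior: float) -> float:
    """Rescale a probability fitted at train_prior to the true_prior base rate.

    The positive odds are reweighted by true_prior / train_prior and the
    negative odds by (1 - true_prior) / (1 - train_prior), then renormalised.
    """
    pos = p * (true_prior / train_prior)
    neg = (1.0 - p) * ((1.0 - true_prior) / (1.0 - train_prior))
    return pos / (pos + neg)


# Example: a raw score equal to the training base rate maps to the true base rate.
corrected = prior_shift_correct(TRAIN_PRIOR, TRAIN_PRIOR, 0.03)
```

The mapping is strictly increasing in `p`, so it changes the probability scale without reordering properties. That is consistent with the note that AUC-PR was unchanged.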
Rule 7 — Cross-county score caveat [REFERENCE]
Scores rank within a county, not across them.

Model scores rank properties within a county. A top score in one county does not imply the same deal rate as a top score in another — base rates and Alpha's separation differ markedly across markets (finding 41: Alpha AUC 0.53–0.59 outside Miami; eval positive prevalence varies county to county). Do not compare a raw score in Jackson to a raw score in Miami as if the scale were shared.

src: notes/findings/11_h4_transfer.md (cross-state transfer)
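One way to respect the Rule 7 caveat when presenting scores is to convert them to within-county percentile ranks before showing them side by side. The sketch below is an illustration, not a method taken from the findings; the `(county, property_id, score)` row shape and the function name are hypothetical, and ties are broken arbitrarily by property id.

```python
from collections import defaultdict


def within_county_percentile(rows: list[tuple[str, str, float]]) -> dict[str, float]:
    """Map each property_id to its score percentile (0, 1] inside its own county."""
    by_county: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for county, property_id, score in rows:
        by_county[county].append((score, property_id))

    percentiles: dict[str, float] = {}
    for items in by_county.values():
        items.sort()  # ascending by score, then by property_id for ties
        n = len(items)
        for rank, (_score, property_id) in enumerate(items, start=1):
            percentiles[property_id] = rank / n
    return percentiles
```

A percentile still only states rank within a county. It does not make deal rates comparable across markets.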
Rule 8 — Locked March-2025 test untouchable [GATED]
The Phase-4 evaluation gate.

The locked March-2025 test cohort (T0 = 2025-03 → 2025-09) is untouchable until Eduardo + Camilo sign off in writing. No feature engineering, no hyperparameter tuning, no inspection — not even a sanity check — touches that fold before the gate. Eduardo and Camilo co-sign the lock (MASTER_PLAN §3). As of this page the test fold is untouched.

src: CLAUDE.md hard rule #4 + notes/MASTER_PLAN.md §3
Rule 9 — Win condition [GATED: target, not a result]
The bar the model must clear on the locked test.

Win condition: top-decile recall ≥ Alpha AND ≥ Camilo, with 30/60/90-day calibration within ±15%, on the locked March-2025 head-to-head evaluation. This is the target, NOT a current result — it can only be adjudicated once the locked test in Rule 8 is opened. Finding 41 is the honest Alpha-vs-model comparison on the embargoed dev fold (Fold 5), not the locked gate; do not read it as the win condition being met.

src: CLAUDE.md (top of file — win condition) + notes/MASTER_PLAN.md §12
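For reference, the recall half of the Rule 9 win condition could be computed along these lines. This is a sketch with hypothetical function names and must only ever be run on dev folds until the Rule 8 gate opens. The 30/60/90-day calibration check is not covered because its exact definition is not given on this page.

```python
def top_decile_recall(scores: list[float], labels: list[int]) -> float:
    """Share of all positives captured in the top 10% of scores."""
    n = len(scores)
    if n == 0 or n != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("recall is undefined without positives")

    k = max(1, (n + 9) // 10)  # ceil(n / 10), at least one row
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    captured = sum(labels[i] for i in order[:k])
    return captured / total_pos


def meets_recall_bar(model: float, alpha: float, camilo: float) -> bool:
    """Recall part of the win condition: model >= Alpha AND model >= Camilo."""
    return model >= alpha and model >= camilo
```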

Related pages

REI overview · REI pipeline · how it works · model card · changelog

Rules from CLAUDE.md hard rules + notes/MASTER_PLAN.md · REI findings · status: building (win condition gated, not yet met).