REI rules

Apollo replacement model · the hard boundaries & policy — what is fixed, what is reference, what is gated.

This is the decision and policy register for the REI (Apollo) model: the rules that govern labels, leakage, calibration and the evaluation gate. It is not the process walk-through (that lives on the pipeline page). The model's status is "building": the win condition is a target, not a result; the locked March-2025 test is untouched; and three features are still under active leakage audit. Each rule below is tagged HARD (a boundary we do not cross), REFERENCE (a definition or convention), or GATED (blocked on a sign-off or a future event). Every statement is quoted from its source; see the src line on each block.

1 · Label 2 · No future data 3 · Leakage audit 4 · Feature tiers 5 · Embargo 6 · Calibration 7 · Cross-county 8 · Locked test 9 · Win condition

Decision & policy rules

Rule 1 — Label definition Hard

What counts as a positive.

y_sold = 1 iff any sale is recorded in T0+1 .. T0+6 (6-month horizon). During this phase every sale event counts as a positive — foreclosures, quit-claims, probate, and divorce sales are all included. The arms-length filter is intentionally OFF. This is scope, not an oversight: re-running with the filter is a planned second pass once Eduardo signs off on the policy. Do not treat the missing filter as a bug or a blocker.
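The label rule above can be sketched in a few lines. This is an illustrative sketch only, not the pipeline's code: the helper names (month_index, y_sold) and the representation of sales as a list of YYYY-MM strings are assumptions made for the example.

```python
def month_index(ym: str) -> int:
    """Map a 'YYYY-MM' string to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def y_sold(t0: str, sale_months: list[str], horizon: int = 6) -> int:
    """Return 1 if any sale falls in T0+1 .. T0+horizon, else 0.

    Every sale type counts (the arms-length filter is off), so the
    function never looks at the kind of sale, only at its month.
    """
    t = month_index(t0)
    return int(any(1 <= month_index(s) - t <= horizon for s in sale_months))


# A sale in the T0 month itself is outside the window; T0+6 is inside it.
print(y_sold("2025-03", ["2025-03"]))  # 0
print(y_sold("2025-03", ["2025-09"]))  # 1
```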

src: CLAUDE.md hard rule #3 + Data conventions (horizon = 6 months)

F12 · investor deal criteria F18 · deal oracle v1

Rule 2 — No future data / T0 boundary Hard

Features may only see the past.

Never train on future data. Features must be computable from the T0 month-end snapshot alone. Walk-forward folds enforce this via t0 boundaries; the feature builder (src/new_model/features.py) reads only the as-of-T0 columns (base_globs = _globs([t0], fips)). T0 is a month-end boundary stored as a YYYY-MM string.
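A minimal sketch of the T0 boundary as a date check. It does not reproduce src/new_model/features.py, which selects as-of-T0 snapshot files; the record layout and the event_date field are hypothetical, used only to show what "computable from the T0 month-end snapshot" excludes.

```python
import calendar
from datetime import date


def t0_month_end(t0: str) -> date:
    """Convert a 'YYYY-MM' T0 string to the last calendar day of that month."""
    year, month = (int(part) for part in t0.split("-"))
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def visible_at_t0(records: list[dict], t0: str) -> list[dict]:
    """Keep only records dated on or before the T0 month-end.

    'event_date' is a hypothetical field name for this example.
    """
    cutoff = t0_month_end(t0)
    return [r for r in records if r["event_date"] <= cutoff]


records = [
    {"id": 1, "event_date": date(2024, 2, 29)},  # on the boundary: visible
    {"id": 2, "event_date": date(2024, 3, 1)},   # one day after T0: hidden
]
print([r["id"] for r in visible_at_t0(records, "2024-02")])  # [1]
```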

src: CLAUDE.md hard rule #2 + Data conventions (T0 = month-end)

Rule 3 — Three features under leakage audit Hard (under audit)

Do not cite their contribution as validated yet.

Three features are under active leakage audit: listing_duration_months, months_since_prev_sale, and mortgage_age_months. Do not cite their AUC-PR contribution as validated until the ablation runner completes. (Finding 9 records the as-of-T0 date probe — no post-T0 records found — but the cross-county ablation is still in progress; treat the dependence figures there as preliminary.)

src: CLAUDE.md hard rule #7 → notes/findings/09_leakage_audit.md

F9 · leakage audit F45 · feature subtraction

Rule 4 — Feature tiers A–F Reference

The six-tier feature partition.

Tier	Family	Note
A	Property physical — parcels, size, use, year built
B	Owner + distress — 23 distress trajectories, absentee, leverage
C	Valuation + activity — AVM, appreciation, days_ownership
D	Date-diffs — mortgage age, listing duration, prev-sale recency	under leakage audit (see Rule 3)
E	National macro — FRED mortgage rate, Fed funds, HPI, CPI, unemployment (same value per T0 month)
F	Local market context — BLS county unemployment, ACS income, FHFA state HPI	currently being wired in

MASTER_PLAN §4 builds these in tiers (A–E for the MVP, F deferred). Feature counts vary across findings — this register links the findings rather than asserting one number.

src: CLAUDE.md Data conventions (feature tiers) + notes/MASTER_PLAN.md §4

F2 · feature importance F7 · external variables F19 · distress forensics

Rule 5 — Walk-forward embargo Hard

Embargo must equal the horizon.

Evaluation is walk-forward: train on everything up to a fold, evaluate the next fold, advance six months, repeat (MASTER_PLAN §3). The embargo between the last train T0 and the first eval T0 must equal the horizon — 6 months. With a shorter gap the last train T0s carry label windows that extend into the eval window, contaminating the outcome (finding 32 quantified a 5-month overlap in the pre-fix folds). The locked test fold is structurally clean (0 overlap T0s).
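The embargo arithmetic can be checked mechanically. The sketch below is an assumption about how the overlap might be counted (months of a train label window that fall after the first eval T0); it is not the method used in finding 32, and the function names are hypothetical.

```python
def month_index(ym: str) -> int:
    """Map a 'YYYY-MM' string to a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def label_overlap_months(train_t0: str, first_eval_t0: str, horizon: int = 6) -> int:
    """Months of the train label window (T0+1 .. T0+horizon) that fall
    after the first eval T0, i.e. inside the eval label window."""
    gap = month_index(first_eval_t0) - month_index(train_t0)
    return max(0, min(horizon, horizon - gap))


def contaminated_train_t0s(train_t0s: list[str], first_eval_t0: str,
                           horizon: int = 6) -> list[str]:
    """Train T0s whose label windows reach into the eval window."""
    return [t for t in train_t0s
            if label_overlap_months(t, first_eval_t0, horizon) > 0]


# A 1-month gap leaves 5 overlapping months; a 6-month embargo leaves none.
print(label_overlap_months("2024-08", "2024-09"))  # 5
print(label_overlap_months("2024-03", "2024-09"))  # 0
```

A fold is clean under this reading when contaminated_train_t0s returns an empty list for its first eval T0.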

src: notes/MASTER_PLAN.md §3 → notes/findings/32_fold_embargo_analysis.md

F32 · fold embargo analysis

Rule 6 — Calibration on the true base rate Hard

Calibrate at the population prior, not the downsampled one.

Training uses a 10:1 negative-to-positive downsample to fit in memory, which inflates the training positive rate (~9.1%) above the true eval rate (2–4%). Raw gradient-boosted probabilities are therefore globally too high on eval. Calibration must be performed at the true population base rate on a held-out, non-downsampled slice — not on the downsampled training pool (which inherits the wrong prior and cannot correct the shift). Isotonic regression is the non-negotiable post-processing step (MASTER_PLAN §6); the prior-ratio rescaling in finding 10 moved 4 of 5 counties inside the Brier target without changing AUC-PR.
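To make the prior shift concrete, here is a sketch of the standard prior-shift (odds-ratio) correction for a model trained on a downsampled pool. It is an assumption that this matches the spirit of the prior-ratio rescaling in finding 10; the exact formula used there is not quoted on this page. The 3% true prior is an illustrative value inside the 2–4% range above, and the correction does not replace isotonic calibration on a held-out, non-downsampled slice.

```python
def rescale_to_true_prior(p: float, train_prior: float, true_prior: float) -> float:
    """Re-weight a probability fitted at train_prior to true_prior.

    Positives are re-weighted by true_prior / train_prior and negatives by
    (1 - true_prior) / (1 - train_prior), then the result is renormalised.
    The map is strictly increasing in p, so rankings are unchanged.
    """
    if not (0.0 < train_prior < 1.0 and 0.0 < true_prior < 1.0):
        raise ValueError("priors must lie strictly between 0 and 1")
    pos = p * true_prior / train_prior
    neg = (1.0 - p) * (1.0 - true_prior) / (1.0 - train_prior)
    return pos / (pos + neg)


train_prior = 1.0 / 11.0   # 10:1 negative-to-positive downsample, about 9.1%
true_prior = 0.03          # illustrative value, not a measured rate

# A raw score equal to the training prior maps back to the true prior.
print(round(rescale_to_true_prior(train_prior, train_prior, true_prior), 4))  # 0.03
```

Because the transform is monotone, it can move Brier score without moving any ranking metric, which is consistent with the observation above that AUC-PR did not change.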

src: notes/MASTER_PLAN.md §6 → findings/10_prior_ratio_calibration.md, 46_isotonic_held_out.md

F10 · prior-ratio calibration F46 · isotonic held-out F6 · calibration / Brier

Rule 7 — Cross-county score caveat Reference

Scores rank within a county, not across them.

Model scores rank properties within a county. A top score in one county does not imply the same deal rate as a top score in another — base rates and Alpha's separation differ markedly across markets (finding 41: Alpha AUC 0.53–0.59 outside Miami; eval positive prevalence varies county to county). Do not compare a raw score in Jackson to a raw score in Miami as if the scale were shared.
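One way to respect this caveat when handling scores is to convert them to within-county percentile ranks before displaying them side by side. The sketch below is illustrative: the row layout (county, property id, score) and the function name are assumptions, and an equal percentile in two counties still does not imply an equal deal rate.

```python
from collections import defaultdict


def within_county_percentile(rows: list[tuple[str, str, float]]) -> dict[str, float]:
    """Map each property id to its percentile rank inside its own county.

    rows holds (county, property_id, score) tuples. The highest score in a
    county gets 1.0. Tied scores are ordered by property id, which is an
    arbitrary choice made for this example.
    """
    by_county = defaultdict(list)
    for county, prop_id, score in rows:
        by_county[county].append((score, prop_id))

    percentiles = {}
    for items in by_county.values():
        items.sort()
        n = len(items)
        for rank, (_, prop_id) in enumerate(items, start=1):
            percentiles[prop_id] = rank / n
    return percentiles


rows = [
    ("Miami", "m1", 0.90), ("Miami", "m2", 0.50),
    ("Jackson", "j1", 0.20), ("Jackson", "j2", 0.10),
]
pct = within_county_percentile(rows)
# j1 has a lower raw score than m2 but is the top-ranked property in Jackson.
print(pct["j1"], pct["m2"])  # 1.0 0.5
```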

src: notes/findings/11_h4_transfer.md (cross-state transfer)

F11 · H4 cross-state transfer F4 · Jackson deep dive

Rule 8 — Locked March-2025 test untouchable Gated

The Phase-4 evaluation gate.

The locked March-2025 test cohort (T0 = 2025-03, label window through 2025-09) is untouchable until Eduardo and Camilo, who co-sign the lock (MASTER_PLAN §3), both sign off in writing. No feature engineering, no hyperparameter tuning, and no inspection, not even a sanity check, touches that fold before the gate. As of this page the test fold is untouched.

src: CLAUDE.md hard rule #4 + notes/MASTER_PLAN.md §3

Rule 9 — Win condition Gated (target, not a result)

The bar the model must clear on the locked test.

Win condition: top-decile recall ≥ Alpha AND ≥ Camilo, with 30/60/90-day calibration within ±15%, on the locked March-2025 head-to-head evaluation. This is the target, NOT a current result — it can only be adjudicated once the locked test in Rule 8 is opened. Finding 41 is the honest Alpha-vs-model comparison on the embargoed dev fold (Fold 5), not the locked gate; do not read it as the win condition being met.
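For reference, the two quantities in the win condition can be sketched as follows. This is an illustration under stated assumptions, not the evaluation harness: top-decile is taken as the top ceil(n/10) rows by score, and "within ±15%" is read as the relative gap between mean predicted probability and observed rate. Neither reading is confirmed by the source, and all function names are hypothetical. Nothing here may be run on the locked fold before the Rule 8 gate.

```python
import math


def top_decile_recall(scores: list[float], labels: list[int]) -> float:
    """Share of all positives captured in the top 10% of rows by score."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("recall is undefined with no positives")
    n = len(scores)
    k = max(1, math.ceil(n / 10))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:k])
    return hits / total_pos


def calibration_within(mean_predicted: float, observed_rate: float,
                       tol: float = 0.15) -> bool:
    """True if mean predicted probability is within +/- tol (relative)
    of the observed rate. The relative reading is an assumption."""
    return abs(mean_predicted / observed_rate - 1.0) <= tol


def meets_win_condition(model_recall: float, alpha_recall: float,
                        camilo_recall: float, calibrated: list[bool]) -> bool:
    """Recall must match or beat both baselines, and every calibration
    check (one per horizon) must pass."""
    return (model_recall >= alpha_recall
            and model_recall >= camilo_recall
            and all(calibrated))


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(top_decile_recall(scores, labels))  # 0.5
```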

src: CLAUDE.md (top of file — win condition) + notes/MASTER_PLAN.md §12

F41 · Alpha head-to-head

REI overview · REI pipeline · how it works · model card · changelog

Rules from CLAUDE.md hard rules + notes/MASTER_PLAN.md · REI findings · status: building (win condition gated, not yet met).
