Step 2 · Coverage
Where does a missing roof permit really mean "no roof" — and where does it just mean the vendor is blind? Coverage decides, per jurisdiction, whether we can trust permit absence as a negative label.
The roofing model learns from roof permits: a permit on a home is a positive, the absence of one is a negative. That logic only holds where the permit feed actually sees the jurisdiction. In a place BuildZoom never collected from, every home looks permit-free — and treating all of them as negatives would teach the model exactly the wrong thing, then recommend properties in zones we have no business recommending (the 25K-list incident, finding 67).
Step 2 combines the vendor's own coverage claim with our independent reality-check, then issues one verdict per (jurisdiction × year) tuple: INCLUDE (trust it — train here, ship here), FLAG (review before deciding), or EXCLUDE (train and recommend nothing here). Only INCLUDED tuples flow downstream into labeling, features, and the delivery list. The previous step, Step 1 · Labeling, defines what counts as a valid roof permit; Step 2 decides where that count is trustworthy.
2.0 · First American Municipality standardization Done
The match spine starts from First American's Municipality field — which FA defines as the legal jurisdiction, "not necessarily the property city". Each value is sorted into one of four status buckets. Only city_named carries a city; the other three set city = NULL. NULL is never assumed to mean unincorporated — FA carries a separate explicit UNINCORPORATED value.
| Bucket | Meaning | Carries a city? | Share of non-NULL strings |
|---|---|---|---|
| city_named | Resolves to an incorporated place | Yes | 89.3% |
| unincorporated | FA confirms no incorporated place → county AHJ (confident) | No | 7.7% |
| district_code | School / fire / tax district — no AHJ signal | No | 2.0% |
| unknown | NULL / junk | No | 1.0% |
The classifier (classify_fa_municipality.py) was built and audited on the local layer — 72.76M parcels across 1,420 FIPS, no AWS — over four cycles of four AI agents, climbing from 85% to roughly 96% accuracy (audited 2026-05-21). Residual error is concentrated in Los Angeles County (06037) tax-area garble, which carries no decision impact (it all falls to city_under_county).
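To make the four buckets concrete, here is a minimal sketch of the bucket assignment. It is not the audited `classify_fa_municipality.py`: the function name, the keyword set, and the junk test are assumptions made for illustration. Only the bucket names and the "only city_named carries a city" rule come from the table above.

```python
from typing import Optional, Tuple

# Hypothetical keyword set; the real classifier's rules are not shown in this document.
DISTRICT_HINTS = {"SCHOOL", "FIRE", "TAX", "DISTRICT"}


def classify_municipality(raw: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (bucket, city). Only city_named carries a city; the rest set city to None."""
    if raw is None or not raw.strip():
        return ("unknown", None)  # NULL is unknown, never assumed unincorporated
    value = " ".join(raw.upper().split())
    if value == "UNINCORPORATED":
        return ("unincorporated", None)  # FA's explicit value
    # Match whole tokens so a city such as FIREBAUGH is not mistaken for a fire district.
    if DISTRICT_HINTS & set(value.split()):
        return ("district_code", None)
    if not any(ch.isalpha() for ch in value):
        return ("unknown", None)  # junk such as "000" or "--"
    return ("city_named", value.title())
```

Whole-token matching is a deliberate choice in this sketch: substring matching would misfile real city names that happen to contain a district keyword.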
2.0b · BuildZoom jurisdiction → canonical key Done
The vendor side of the same standardization. BuildZoom names a jurisdiction as STATE_County_City (or STATE_County); the provider coverage labels and the permit feed share this vocabulary, so a single normalizer serves both. Each of the 2,497 distinct BuildZoom strings is parsed into a canonical key — (state, county_fips, canonical_place, place_type) — the same shape the FA side emits, so the two can be matched directly.
County → FIPS resolution uses the complete Census 2024 counties gazetteer (3,222 counties) rather than only our own 1,419-county set. Using the full gazetteer handles Connecticut's 2022 switch from counties to planning regions, diacritic folding, and NYC / GA / MD / DC aliases. Result: county → FIPS resolved for 99.7% of strings (0 unmatched; the remaining 0.3% are malformed), with status resolved 80.0% / county-level 19.7% / malformed 0.3%. Audited across four cycles of four AI agents (404 row-checks, 100% each cycle).
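The parsing step can be sketched as follows. This is an illustration under stated assumptions, not the production normalizer: it assumes underscores only separate the two or three parts, it reuses the three status names from the result line above, and it stops before the gazetteer lookup that turns a county name into a FIPS code. `ParsedJurisdiction`, `fold`, and `parse_bz` are hypothetical names.

```python
import unicodedata
from typing import NamedTuple, Optional


class ParsedJurisdiction(NamedTuple):
    state: str
    county: str
    place: Optional[str]
    status: str  # "resolved", "county-level", or "malformed"


def fold(text: str) -> str:
    """Strip diacritics and normalize case and whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.upper().split())


def parse_bz(raw: str) -> ParsedJurisdiction:
    """Parse STATE_County_City or STATE_County (assumed underscore-delimited)."""
    parts = [fold(p) for p in raw.split("_")]
    well_formed = all(parts) and len(parts[0]) == 2
    if well_formed and len(parts) == 3:
        return ParsedJurisdiction(parts[0], parts[1], parts[2], "resolved")
    if well_formed and len(parts) == 2:
        return ParsedJurisdiction(parts[0], parts[1], None, "county-level")
    return ParsedJurisdiction("", "", None, "malformed")
```

Folding before lookup means `Doña Ana` and `Dona Ana` resolve to the same county name, which is the kind of mismatch the diacritic handling above is meant to absorb.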
2.0c · FA ↔ BuildZoom canonical match Done
The two standardized sides are joined on the shared canonical key (county_fips, canonical_place) over 32,179 FA (fips, Municipality) rows. Each FA municipality lands in one of four match outcomes:
| Outcome | What it means | Share of SFH |
|---|---|---|
| city_matched | FA city ↔ BuildZoom city jurisdiction | 31.1% |
| city_under_county | FA city, but BuildZoom covers the county, not the city | 22.6% |
| county_matched | FA unincorporated / district / unknown ↔ BuildZoom county jurisdiction | 12.9% |
| no_bz | No BuildZoom jurisdiction for the county at all | 33.3% |
66.7% of FA single-family homes fall under a BuildZoom jurisdiction. This match table supersedes the earlier one-sided match_table_v2 as the coverage spine. Independently graded: 0% error across 600 cases for this match step (audited 2026-05-21). Known gap (as of 2026-06-05): FIPS 48113 (Dallas) silver is a dangling symlink and is excluded until the silver layer is repaired. For how a single permit address resolves to a single FA parcel, see How we match permits to properties.
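The four outcomes in the table can be read as a small decision function. The sketch below is one plausible reading, not the audited match code: it assumes `no_bz` means the county has no BuildZoom jurisdiction of any kind, and the function and parameter names are hypothetical.

```python
from typing import Optional, Set, Tuple


def match_outcome(
    county_fips: str,
    fa_city: Optional[str],                # None for unincorporated / district / unknown
    bz_city_keys: Set[Tuple[str, str]],    # (county_fips, canonical_place) pairs from BuildZoom
    bz_counties: Set[str],                 # counties with any BuildZoom jurisdiction (assumption)
) -> str:
    """Assign one of the four match outcomes to an FA (fips, Municipality) row."""
    if fa_city is not None and (county_fips, fa_city) in bz_city_keys:
        return "city_matched"
    if county_fips not in bz_counties:
        return "no_bz"
    # BuildZoom is present in the county, but not for this specific city.
    return "city_under_county" if fa_city is not None else "county_matched"
```

For example, an FA city with no BuildZoom city key of its own, in a county BuildZoom does cover, comes out as `city_under_county`.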
2.1 · Vendor coverage labels Ingested
BuildZoom publishes its own coverage label per (jurisdiction × year), drawn from permit_coverage_labels_by_year_2000_2025.csv. This is what the provider claims to cover — not what we measure.
| Label | Provider's claim for that jurisdiction-year |
|---|---|
| Yes | Full coverage |
| Some | Partial coverage (definition pending the BuildZoom double-click call) |
| None | No coverage |
| empty | No label published (e.g. year not yet ingested) |
The join key into our spine is COLLECTION_POINT_3PART — not the human-readable jurisdiction string. (Joining on the wrong key silently sets every in_provider to false site-wide, so this is load-bearing.) For the full vendor walkthrough, see the BuildZoom ETL walkthrough.
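Because a wrong join key fails silently, a guard after the join is cheap insurance. The sketch below assumes a long layout (one label row per collection point and year) and assumes `in_provider` means "a label row exists for this key"; neither is confirmed by this document. `attach_provider_labels` is a hypothetical name.

```python
from typing import Dict, List, Mapping, Tuple


def attach_provider_labels(
    spine_rows: List[Mapping], label_rows: List[Mapping], year: int
) -> List[dict]:
    """Join provider labels onto the spine by COLLECTION_POINT_3PART and year."""
    labels: Dict[Tuple[str, int], str] = {
        (r["COLLECTION_POINT_3PART"], r["year"]): r["label"] for r in label_rows
    }
    joined = []
    for row in spine_rows:
        label = labels.get((row["COLLECTION_POINT_3PART"], year))
        joined.append({**row, "provider_label": label, "in_provider": label is not None})
    # Guard against the silent failure mode: a wrong key matches nothing at all.
    if joined and not any(r["in_provider"] for r in joined):
        raise ValueError("no spine row matched a provider label; check the join key")
    return joined
```

Raising on a zero-match join turns the site-wide `in_provider = false` failure into a loud error at build time.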
2.2 · Our match-rate diagnostic Shipped nationally
The reality-check. For every FIPS we compute a pure match rate — distinct SFH-with-a-matched-roof-permit ÷ total SFH (excluding unit-numbered SFH from the denominator). The national average is 14.98% (~15%), with sharp regional spread: Florida averages 62.2% (54 FIPS) versus 10.7% for the rest of the US. This diagnostic shipped 2026-05-18 to the live coverage app at coverage.8020roof.com.
Caveat: the match rate is computed on pre-v5 labels (labels_spec_v = coverage_pipeline_pre_v5), with no date cap, so decades-old permits and even future-dated Pinellas permits (last_permit 2055 / 2060 / 2066) all count equally. It is therefore an optimistic, all-time UPPER BOUND, not a current-state visibility figure, and must not be presented as "% of homes covered" or as a quality score. The model-relevant coverage (a clean v5.3.3 roof event inside a usable recent window) is lower, and awaits the v5.3.3 + event-date-capped national rerun (currently blocked on AWS).
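The pure match rate for one FIPS can be sketched as below. The field names are hypothetical, and dropping unit-numbered SFH from the numerator as well as the denominator is an assumption made here so the ratio stays within [0, 1]; the text above only specifies the denominator. Note that the sketch applies no date cap, mirroring the all-time upper bound described above.

```python
from typing import Dict, Iterable, Mapping, Optional


def pure_match_rate(parcels: Iterable[Mapping]) -> Optional[float]:
    """Distinct SFH with a matched roof permit divided by total eligible SFH.

    Each parcel mapping is assumed to carry parcel_id, has_unit_number,
    and has_matched_roof_permit (hypothetical field names).
    """
    matched_by_parcel: Dict[str, bool] = {}
    for p in parcels:
        if p["has_unit_number"]:
            continue  # unit-numbered SFH are excluded
        pid = p["parcel_id"]
        # Collapse duplicate rows so each parcel counts once.
        matched_by_parcel[pid] = matched_by_parcel.get(pid, False) or bool(
            p["has_matched_roof_permit"]
        )
    if not matched_by_parcel:
        return None  # no eligible SFH, so the rate is undefined
    return sum(matched_by_parcel.values()) / len(matched_by_parcel)
```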
2.3 · Per-tuple inclusion decision tree Done
Four gates, first-fail-decides, applied to all 32,179 (fips, jurisdiction, fa_muni) tuples. The verdict is the outcome of the first gate that does not pass.
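The first-fail-decides pattern can be sketched as an ordered list of gate functions. Only the provider-label gate is implemented here, because it is the only gate whose rule is stated in this section; the other three are deliberately not reconstructed. The names `Gate`, `provider_label_gate`, and `decide`, and the reason strings, are hypothetical.

```python
from typing import Callable, List, Mapping, Optional, Tuple

# A gate returns (verdict, reason) when it fails, or None when it passes.
Gate = Callable[[Mapping], Optional[Tuple[str, str]]]


def provider_label_gate(row: Mapping) -> Optional[Tuple[str, str]]:
    """Fail with EXCLUDE when the vendor label is None or empty."""
    label = row.get("provider_label")
    if label in (None, "", "None"):
        return ("EXCLUDE", "provider None")
    return None


def decide(row: Mapping, gates: List[Gate]) -> Tuple[str, str]:
    """Run gates in order; the first failing gate decides the verdict."""
    for gate in gates:
        failure = gate(row)
        if failure is not None:
            return failure
    return ("INCLUDE", "all gates passed")
```

Keeping gates as an ordered list makes the precedence explicit: reordering the list changes which reason a tuple reports, so the order is part of the contract.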
Coverage decision — first failing gate decides
None or empty → EXCLUDE. The vendor itself says it doesn't cover this jurisdiction-year.
2.4 · Municipality-level training-inclusion rule Spec
"Only those that pass the test are used for training." A per-FA-municipality threshold over a trailing five-year window, computed only on permits with roof_sub_class ∈ {REPLACEMENT, AMBIGUOUS}. The default threshold is ≥ 25% until the backtest sweep argues otherwise: the cutoff should sit at the elbow of held-out recall, not at a number picked on a call.
coverage_recovery_queue.csv (275 jurisdictions / 9.20M SFH) is the non-circular input list for that sweep.
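A minimal sketch of the municipality-level test follows. The spec above does not say which quantity is compared with the threshold, so this assumes it is the share of a municipality's SFH with at least one qualifying permit inside the trailing window. All function and parameter names are hypothetical.

```python
from datetime import date
from typing import Iterable, Tuple

QUALIFYING = {"REPLACEMENT", "AMBIGUOUS"}  # roof_sub_class values named in the spec


def years_before(d: date, years: int) -> date:
    """Shift a date back by whole years, clamping Feb 29 to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def passes_inclusion(
    permits: Iterable[Tuple[str, date, str]],  # (parcel_id, permit_date, roof_sub_class)
    total_sfh: int,
    as_of: date,
    threshold: float = 0.25,
    window_years: int = 5,
) -> bool:
    """True when the assumed trailing-window share meets the threshold."""
    if total_sfh <= 0:
        return False
    window_start = years_before(as_of, window_years)
    covered = {
        parcel_id
        for parcel_id, permit_date, sub_class in permits
        if sub_class in QUALIFYING and window_start < permit_date <= as_of
    }
    return len(covered) / total_sfh >= threshold
```

Exposing `threshold` as a parameter keeps the default at 25% while letting the backtest sweep vary it without code changes.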
2.5 · Output contract — coverage_decisions.parquet Done
The materialized output is one row per (fips, canonical_jurisdiction, fa_muni) tuple, written to evidence/sources/coverage/coverage_decisions.parquet (32,179 rows). Each row carries the verdict plus a human-readable reason (e.g. "rule 2: provider None"), the measured match_rate, the provider labels, provider_first_year, reproducibility back-pointers (labels_spec_v + gold_vintage), and last_evaluated_at.
| Decision | Tuples | Share of SFH | Downstream effect |
|---|---|---|---|
| INCLUDE | 1,006 | 9.6% | Train on these rows; recommend properties here |
| FLAG | 9,445 | 46.4% | Manual review before deciding |
| EXCLUDE | 21,728 | 44.0% | No training, no recommendations |
The decision set covers 1,420 of 1,421 FIPS — only FIPS 48113 (Dallas) is missing, pending an AWS re-pull of its silver layer. The downstream consumers — the labeling universe builder and the delivery-list builder — filter to INCLUDE rows before they run, and the coverage_decision_reason is surfaced on the live app so clients can self-serve the "why aren't you predicting here?" question.
coverage_decision is a single static label computed over the full 1900→2026-03 permit history, which is not walk-forward-safe — a fold standing at an early T0 would inherit an INCLUDE set decided by later permits. The fix (making the decision a function of fold T0) is specified and pending. The match-rate metric is itself pre-v5 (caveat in 2.2). See the next stage in the full pipeline spine and the governing constraints in the model rules.
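To illustrate what "a function of fold T0" could look like, the sketch below filters the permit history to what was visible at T0 before any decision logic runs. This is not the specified fix, which this page does not reproduce; `permits_visible_at`, `decision_at`, and the placeholder rule `toy_decide` are hypothetical.

```python
from datetime import date
from typing import Callable, List, Mapping


def permits_visible_at(permits: List[Mapping], t0: date) -> List[Mapping]:
    """Keep only permits dated on or before the fold's T0."""
    return [p for p in permits if p["permit_date"] <= t0]


def decision_at(
    permits: List[Mapping], t0: date, decide: Callable[[List[Mapping]], str]
) -> str:
    """Compute a coverage decision from the history available at T0 only."""
    return decide(permits_visible_at(permits, t0))


def toy_decide(visible: List[Mapping]) -> str:
    """Placeholder rule for the example; not the real decision tree."""
    return "INCLUDE" if len(visible) >= 2 else "EXCLUDE"
```

With permits dated 2012, 2019, and 2055, a fold at 2015 sees one permit while a fold at 2020 sees two, so the same jurisdiction can legitimately receive different verdicts per fold. The future-dated permit is never visible to either.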
Rendered from notes/Roofing/steps/03_geographic_coverage.md and PROGRESS_NOTEBOOK.html §Step 2 (coverage decision tree + per-muni inclusion rule). Materialized run 2026-05-21, post DS-audit + 3-layer-audit patches.