23 pass/fail checks. An audit that fails in any one county fails.
| # | ID | Audit | Pasco12101 | Hernando12053 | Pinellas12103 | Los Angeles06037 | Cook17031 | Harris48201 | Maricopa04013 | San Diego06073 | Boulder08013 | St. Clair17163 | Cumberland23005 | Multnomah41051 | Davidson47037 | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ▸A1 | Permit fan-outexpect ratio = 1.0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ||||||||||||||||||||||||||||
Each permit must occupy exactly ONE row. If permit #5 shows twice, you double-count it. ① Count query (must hit the expected value) · paste into Athena SELECT count(*)*1.0 / count(DISTINCT building_permit_id) AS rows_per_permit FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, count(*) AS n_rows FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 GROUP BY building_permit_id HAVING count(*) > 1 LIMIT 50 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 2 | ▸A2 | Duplicate permitexpect 0 groups | 293,378 | 95,456 | 824,318 | 2,057,590 | 689,718 | 1,293,576 | 466,627 | 459,055 | 135,218 | 3 | 1,896 | 81,963 | 160,391 | ||||||||||||||||||||||||||||
The same municipal permit number should not appear under two records. If #BCP-123 (Clearwater) shows twice, it is double-counted. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM ( SELECT permit_number, permit_jurisdiction FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND permit_number IS NOT NULL GROUP BY 1,2 HAVING count(*) > 1) ② Inspect the rows (drill-down) · paste into Athena SELECT permit_number, permit_jurisdiction, count(*) AS n_rows,
array_join(array_agg(building_permit_id), ', ') AS building_permit_ids
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND permit_number IS NOT NULL
GROUP BY 1,2 HAVING count(*) > 1
ORDER BY n_rows DESC LIMIT 20Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 3 | ▸B1 | FA_PROPERTYID -> Addressexpect 0 ids | 2.16%5,160 · worst 33 | 0.15%138 · worst 2 | 1.69%5,152 · worst 252 | 1.39%18,833 · worst 82 | 0.32%1,275 · worst 18 | 0.69%6,504 · worst 15 | 0.69%6,466 · worst 228 | 2.50%13,960 · worst 119 | 0.36%328 · worst 7 | 0.00%0 | 1.26%136 · worst 2 | 1.18%2,272 · worst 4 | 0.38%715 · worst 5 | ||||||||||||||||||||||||||||
One property (FA id) is ONE house. If the same id appears at 2+ addresses, it is mis-linked. ① Count query (must hit the expected value) · paste into Athena WITH ex AS ( SELECT pid, coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr FROM "8020rei-gold-data-lake".roofing_master_gold_v1 CROSS JOIN UNNEST(fa_propertyid) AS t(pid) WHERE fips = 12103) SELECT count(*) FROM ( SELECT pid FROM ex GROUP BY pid HAVING count(DISTINCT addr) > 1) ② Inspect the rows (drill-down) · paste into Athena WITH ex AS (
SELECT pid, coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
CROSS JOIN UNNEST(fa_propertyid) AS t(pid)
WHERE fips = 12103)
SELECT pid, count(DISTINCT addr) AS n_addresses,
array_join(array_agg(DISTINCT addr), ' | ') AS addresses
FROM ex GROUP BY pid HAVING count(DISTINCT addr) > 1
ORDER BY n_addresses DESC LIMIT 20Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 4 | ▸B2 | Address -> FA_PROPERTYIDexpect 0 addresses | 0.49%1,201 · worst 18 | 2.95%2,648 · worst 40 | 0.64%1,953 · worst 304 | 0.70%9,601 · worst 206 | 4.47%16,516 · worst 269 | 2.94%26,368 · worst 373 | 1.31%12,074 · worst 274 | 1.23%6,331 · worst 25,278 | 1.16%1,024 · worst 79 | 3.96%102 · worst 9 | 1.87%198 · worst 38 | 2.08%3,904 · worst 215 | 1.76%3,221 · worst 70 | ||||||||||||||||||||||||||||
One house must have ONE property id. If an address carries several FA ids, we cannot tell which is real. ① Count query (must hit the expected value) · paste into Athena WITH ex AS ( SELECT pid, coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr FROM "8020rei-gold-data-lake".roofing_master_gold_v1 CROSS JOIN UNNEST(fa_propertyid) AS t(pid) WHERE fips = 12103) SELECT count(*) FROM ( SELECT addr FROM ex GROUP BY addr HAVING count(DISTINCT pid) > 1) ② Inspect the rows (drill-down) · paste into Athena WITH ex AS (
SELECT pid, coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
CROSS JOIN UNNEST(fa_propertyid) AS t(pid)
WHERE fips = 12103)
SELECT addr, count(DISTINCT pid) AS n_fa_ids,
array_join(array_agg(DISTINCT cast(pid AS varchar)), ', ') AS fa_ids
FROM ex GROUP BY addr HAVING count(DISTINCT pid) > 1
ORDER BY n_fa_ids DESC LIMIT 20Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 5 | ▸B3 | Address -> PropertyID (gold)expect 0 addresses | 0.21%652 · worst 19 | 0.51%568 · worst 11 | 0.64%2,364 · worst 158 | 1.00%19,467 · worst 217 | 0.09%404 · worst 13 | 0.37%4,491 · worst 79 | 0.20%2,175 · worst 174 | 1.01%6,368 · worst 27 | 0.32%341 · worst 14 | 0.27%9 · worst 3 | 1.00%157 · worst 12 | 0.06%128 · worst 16 | 1.61%3,715 · worst 186 | ||||||||||||||||||||||||||||
Same as B2 but with the Gold internal PropertyID. This is the trustworthy key. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM ( SELECT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 GROUP BY coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') HAVING count(DISTINCT propertyid) > 1) ② Inspect the rows (drill-down) · paste into Athena SELECT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr, count(DISTINCT propertyid) AS n_propertyid,
array_join(array_agg(DISTINCT propertyid), ', ') AS propertyids
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103
GROUP BY coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') HAVING count(DISTINCT propertyid) > 1
ORDER BY n_propertyid DESC LIMIT 20Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 6 | ▸B4 | PropertyID (gold) -> Addressexpect 0 PropertyID | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | ||||||||||||||||||||||||||||
One internal PropertyID must be one house. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM ( SELECT propertyid FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 GROUP BY propertyid HAVING count(DISTINCT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'')) > 1) ② Inspect the rows (drill-down) · paste into Athena SELECT propertyid, count(DISTINCT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'')) AS n_addresses FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 GROUP BY propertyid HAVING count(DISTINCT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'')) > 1 ORDER BY n_addresses DESC LIMIT 20 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 7 | ▸B5 | Address -> APNexpect 0 addresses | 16.08%44,744 · worst 1,183 | n/a | 11.03%30,558 · worst 448 | 4.92%11,426 · worst 1,294 | 4.93%2,555 · worst 105 | 4.02%591 · worst 39 | 5.38%25,672 · worst 1,495 | 12.03%42,447 · worst 7,301 | 13.39%9,836 · worst 167 | n/a | n/a | 1.44%133 · worst 19 | 5.69%9,167 · worst 20,114 | ||||||||||||||||||||||||||||
One house must have ONE parcel number (APN). Several means the catalog or the join is wrong. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM ( SELECT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND parcel_number IS NOT NULL AND trim(parcel_number) <> '' GROUP BY coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') HAVING count(DISTINCT parcel_number) > 1) ② Inspect the rows (drill-down) · paste into Athena SELECT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr, count(DISTINCT parcel_number) AS n_apn,
array_join(array_agg(DISTINCT parcel_number), ', ') AS apns
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND parcel_number IS NOT NULL AND trim(parcel_number) <> ''
GROUP BY coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') HAVING count(DISTINCT parcel_number) > 1
ORDER BY n_apn DESC LIMIT 20Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 8 | ▸C1 | FA_PROPERTYID emptyexpect 0% | 17.43%370,840 | 20.75%184,437 | 13.27%717,056 | 28.08%3,604,321 | 26.17%854,436 | 31.17%3,063,594 | 19.63%828,762 | 26.90%847,731 | 16.38%151,747 | 34.79%2,417 | 30.73%12,610 | 13.09%181,671 | 20.62%359,038 | ||||||||||||||||||||||||||||
The permit ended up linked to no property (empty list []). A letter with no recipient. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND cardinality(fa_propertyid) = 0 ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, permit_number, street, city, zip_code, subtype, issued_date FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND cardinality(fa_propertyid) = 0 AND lower(subtype) LIKE '%roof%' LIMIT 20 Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 9 | ▸C2 | ISSUED_DATE nullexpect 0% | 66.51%1,415,295 | 7.57%67,251 | 51.98%2,808,681 | 23.44%3,008,991 | 6.41%209,197 | 15.39%1,512,802 | 24.43%1,031,063 | 25.25%795,605 | 28.82%266,970 | 19.96%1,387 | 7.26%2,981 | 17.35%240,699 | 27.33%475,929 | ||||||||||||||||||||||||||||
Permit with no issue date. Without it we cannot tell if the roof is from 2024 or 2005. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND issued_date IS NULL ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, subtype, latest_status, issued_date, partition_year FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND issued_date IS NULL AND lower(subtype) LIKE '%roof%' LIMIT 20 Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 10 | ▸C3 | APN missingexpect 0% | 29.94%637,141 | 99.99%888,634 | 46.06%2,488,473 | 91.51%11,744,351 | 83.97%2,741,615 | 99.54%9,783,877 | 70.68%2,983,447 | 60.10%1,893,947 | 67.43%624,550 | 100.00%6,948 | 99.79%40,948 | 98.78%1,370,594 | 66.70%1,161,511 | ||||||||||||||||||||||||||||
Permit with no parcel number (PARCEL_NUMBER). In Hernando 99.99% is missing. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND (parcel_number IS NULL OR trim(parcel_number) = '') ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, city, parcel_number, subtype FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND (parcel_number IS NULL OR trim(parcel_number) = '') LIMIT 20 Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 11 | ▸C4 | No house numberexpect 0% | 5.39%114,741 | 3.37%29,955 | 0.49%26,296 | 0.54%69,106 | 0.42%13,612 | 0.38%37,737 | 3.03%127,747 | 1.29%40,637 | 0.35%3,242 | 0.43%30 | 1.08%442 | 2.14%29,739 | 3.72%64,723 | ||||||||||||||||||||||||||||
The address does not start with a number ('123 street'), so it cannot be reconstructed. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND NOT regexp_like(coalesce(street,''), '^[0-9]') ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, city, zip_code FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND NOT regexp_like(coalesce(street,''), '^[0-9]') AND street <> '' LIMIT 20 Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 12 | ▸C5 | ZIP nullexpect 0% | 5.63%119,855 | 8.77%77,967 | 0.99%53,666 | 1.50%192,764 | 0.63%20,522 | 1.31%128,312 | 3.65%153,910 | 1.81%57,169 | 1.85%17,106 | 3.51%244 | 63.80%26,181 | 1.20%16,646 | 3.55%61,805 | ||||||||||||||||||||||||||||
Permit with no ZIP code. Weakens the address match and every geo/demographic join. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND zip_code IS NULL ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, city, zip_code, subtype FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND zip_code IS NULL LIMIT 20 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 13 | ▸C6 | No description or subtypeexpect 0% | 9.69%206,245 | 0.01%51 | 16.46%889,178 | 1.46%187,539 | 2.46%80,475 | 20.85%2,049,107 | 11.43%482,546 | 4.41%138,826 | 3.53%32,690 | 0.00%0 | 26.86%11,020 | 0.23%3,259 | 9.83%171,233 | ||||||||||||||||||||||||||||
Both DESCRIPTION and SUBTYPE are empty, so the work cannot be classified — no roofing label is possible for this row. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND (description IS NULL OR trim(description) = '') AND (subtype IS NULL OR trim(subtype) = '') ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, type, subtype, description FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND (description IS NULL OR trim(description) = '') AND (subtype IS NULL OR trim(subtype) = '') LIMIT 20 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 14 | ▸D1 | ISSUED > COMPLETEDexpect 0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | ||||||||||||||||||||||||||||
Issue date is AFTER completion date. Impossible: you cannot finish before the permit is issued. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND issued_date > completed_date ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, issued_date, completed_date FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND issued_date > completed_date ORDER BY issued_date DESC LIMIT 20 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 15 | ▸D2 | APPLIED > ISSUEDexpect 0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | ||||||||||||||||||||||||||||
You applied AFTER it was issued. Impossible. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND applied_date > issued_date ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, applied_date, issued_date FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND applied_date > issued_date ORDER BY applied_date DESC LIMIT 20 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 16 | ▸D3 | Future dateexpect 0 | 0.00%0 | 0.00%7 | 0.00%115 | 0.00%153 | 0.00%22 | 0.00%0 | 0.00%9 | 0.00%0 | 0.00%0 | 0.00%0 | 0.02%7 | 0.00%3 | 0.00%0 | ||||||||||||||||||||||||||||
Some date is in the future (after today). Capture error. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND greatest( coalesce(issued_date, DATE '1900-01-01'), coalesce(completed_date, DATE '1900-01-01'), coalesce(applied_date, DATE '1900-01-01')) > current_date ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, issued_date, completed_date, applied_date FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND greatest( coalesce(issued_date, DATE '1900-01-01'), coalesce(completed_date, DATE '1900-01-01'), coalesce(applied_date, DATE '1900-01-01')) > current_date LIMIT 20 Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 17 | ▸D4 | Final without issueexpect 0 | 56.13%1,194,386 | 0.40%3,547 | 42.83%2,314,415 | 13.68%1,755,493 | 2.03%66,235 | 1.66%163,277 | 14.20%599,241 | 11.84%373,158 | 21.26%196,962 | 0.00%0 | 4.11%1,688 | 1.39%19,348 | 10.81%188,262 | ||||||||||||||||||||||||||||
Permit says 'final/complete' but has no issue date. Contradiction. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND lower(latest_status) IN ('final','complete','completed')
AND issued_date IS NULL② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, subtype, latest_status, issued_date
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND lower(latest_status) IN ('final','complete','completed')
AND issued_date IS NULL AND lower(subtype) LIKE '%roof%'
LIMIT 20Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 18 | ▸D5 | Pre-1980 sentinel dateexpect 0 | 0.00%0 | 0.00%12 | 0.00%40 | 0.01%1,466 | 0.00%7 | 0.01%1,289 | 0.00%24 | 0.00%93 | 0.01%54 | 0.00%0 | 0.01%4 | 0.00%0 | 0.00%0 | ||||||||||||||||||||||||||||
ISSUED_DATE before 1980 — almost always a sentinel/placeholder, not a real permit date. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND issued_date < DATE '1980-01-01' ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, street, issued_date, subtype, latest_status FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND issued_date < DATE '1980-01-01' ORDER BY issued_date LIMIT 20 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 19 | ▸E1 | FIPS = COUNTY_FIPSexpect 0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | ||||||||||||||||||||||||||||
The row's county (FIPS) must equal COUNTY_FIPS. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND fips <> county_fips ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, fips, county_fips, street FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND fips <> county_fips LIMIT 20 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 20 | ▸E2 | Partition = FIPSexpect 0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | ||||||||||||||||||||||||||||
The file is partitioned by county; that number must equal FIPS. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND split_part(partition_fips_salting, '_', 1) <> cast(fips AS varchar) ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, fips, partition_fips_salting FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND split_part(partition_fips_salting, '_', 1) <> cast(fips AS varchar) LIMIT 20 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 21 | ▸E3 | ZIP outside its stateexpect 0 | 0.00%70 | 0.00%41 | 0.03%1,415 | 0.19%24,007 | 0.01%278 | 0.00%460 | 0.02%1,052 | 0.02%637 | 0.01%61 | 0.00%0 | 0.64%263 | 0.02%255 | 0.00%66 | ||||||||||||||||||||||||||||
ZIP code falls outside the band for the row's own state (e.g. a 90210 in a Florida county). State-aware — meaningful for every county, not just FL. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND zip_code IS NOT NULL AND NOT ( CASE substr(lpad(cast(fips as varchar),5,'0'),1,2) WHEN '12' THEN zip_code BETWEEN 32000 AND 34999 WHEN '06' THEN zip_code BETWEEN 90000 AND 96199 WHEN '17' THEN zip_code BETWEEN 60000 AND 62999 WHEN '48' THEN zip_code BETWEEN 75000 AND 79999 OR zip_code BETWEEN 73300 AND 73399 OR zip_code BETWEEN 88500 AND 88599 WHEN '04' THEN zip_code BETWEEN 85000 AND 86599 WHEN '08' THEN zip_code BETWEEN 80000 AND 81699 WHEN '23' THEN zip_code BETWEEN 3900 AND 4999 WHEN '41' THEN zip_code BETWEEN 97000 AND 97999 WHEN '47' THEN zip_code BETWEEN 37000 AND 38599 ELSE true END) ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, state, city, zip_code FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND zip_code IS NOT NULL AND NOT ( CASE substr(lpad(cast(fips as varchar),5,'0'),1,2) WHEN '12' THEN zip_code BETWEEN 32000 AND 34999 WHEN '06' THEN zip_code BETWEEN 90000 AND 96199 WHEN '17' THEN zip_code BETWEEN 60000 AND 62999 WHEN '48' THEN zip_code BETWEEN 75000 AND 79999 OR zip_code BETWEEN 73300 AND 73399 OR zip_code BETWEEN 88500 AND 88599 WHEN '04' THEN zip_code BETWEEN 85000 AND 86599 WHEN '08' THEN zip_code BETWEEN 80000 AND 81699 WHEN '23' THEN zip_code BETWEEN 3900 AND 4999 WHEN '41' THEN zip_code BETWEEN 97000 AND 97999 WHEN '47' THEN zip_code BETWEEN 37000 AND 38599 ELSE true END) LIMIT 20 Sample failing rows (Pinellas)
| |||||||||||||||||||||||||||||||||||||||||||
| 22 | ▸E4 | STATE = county stateexpect 0 | 0.00%36 | 0.01%45 | 0.03%1,500 | 0.01%1,254 | 0.01%307 | 0.00%340 | 0.03%1,073 | 0.02%526 | 0.00%41 | 0.00%0 | 0.68%281 | 0.01%146 | 0.00%66 | ||||||||||||||||||||||||||||
The STATE column must match the state implied by the county FIPS. A mismatch is a mislabeled / cross-pasted row. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND state IS NOT NULL AND trim(state) <> '' AND upper(trim(state)) <> ( CASE substr(lpad(cast(fips as varchar),5,'0'),1,2) WHEN '12' THEN 'FL' WHEN '06' THEN 'CA' WHEN '17' THEN 'IL' WHEN '48' THEN 'TX' WHEN '04' THEN 'AZ' WHEN '08' THEN 'CO' WHEN '23' THEN 'ME' WHEN '41' THEN 'OR' WHEN '47' THEN 'TN' ELSE upper(trim(state)) END) ② Inspect the rows (drill-down) · paste into Athena SELECT building_permit_id, state, fips, county_name, city, zip_code FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND state IS NOT NULL AND trim(state) <> '' AND upper(trim(state)) <> ( CASE substr(lpad(cast(fips as varchar),5,'0'),1,2) WHEN '12' THEN 'FL' WHEN '06' THEN 'CA' WHEN '17' THEN 'IL' WHEN '48' THEN 'TX' WHEN '04' THEN 'AZ' WHEN '08' THEN 'CO' WHEN '23' THEN 'ME' WHEN '41' THEN 'OR' WHEN '47' THEN 'TN' ELSE upper(trim(state)) END) LIMIT 20 Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
| 23 | ▸F1 | STREET4JOIN1 ≠ STREET (mis-keyed join)expect 0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | 0.00%0 | ||||||||||||||||||||||||||||
STREET4JOIN1 is the key the ETL uses to attach each permit to a property. It must be THIS row's STREET, normalized (upper-case, no spaces or punctuation). When it instead holds a DIFFERENT address, the permit is joined to the WRONG house. Example: 2672 MONTAGUE CT W, Clearwater has 6 permits — only one carries the correct key (2672MONTAGUECTW); the other five are keyed to unrelated properties (2041 AUSTRALIA WAY, 3343 SAN PEDRO ST, 1175 PINELLAS POINT DR, 1589 W KLOSTERMAN RD, 1412 S BETTY LN). In Pinellas 79% of rows are mis-keyed like this — the permit→property link is silently wrong for most of the county. ① Count query (must hit the expected value) · paste into Athena SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1 WHERE fips = 12103 AND regexp_replace(upper(street), '[^A-Z0-9]', '') <> regexp_replace(upper(street4join1), '[^A-Z0-9]', '') ② Inspect the rows (drill-down) · paste into Athena -- each row's join key (street4join1) should equal norm_street; where it doesn't, it points at another house
SELECT building_permit_id, street AS real_street,
regexp_replace(upper(street), '[^A-Z0-9]', '') AS norm_street,
street4join1 AS join_key, zip_code
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103
AND regexp_replace(upper(street), '[^A-Z0-9]', '')
<> regexp_replace(upper(street4join1), '[^A-Z0-9]', '')
-- reproduce the example: AND street LIKE '2672 MONTAGUE%'
LIMIT 20Sample failing rows (Pinellas) No sample rows captured — run the drill-down query above in Athena to inspect. | |||||||||||||||||||||||||||||||||||||||||||
↔ scrolls sideways · toggle columns with “Show counties” above · click a row to expand it.
| Column | Type | Description | Example |
|---|---|---|---|
| _hoodie_commit_time | varchar | Hudi: commit timestamp that wrote the row. Internal. | |
| _hoodie_commit_seqno | varchar | Hudi: commit sequence number. Internal. | |
| _hoodie_record_key | varchar | Hudi: unique physical record key (lake PK). | |
| _hoodie_partition_path | varchar | Hudi: partition path. Internal. | |
| _hoodie_file_name | varchar | Hudi: source parquet file. Internal. | |
| BUILDING_PERMIT_ID | varchar | Unique permit id (business key). 1 row = 1 permit. | |
| FIPS | integer | 5-digit county code. 12103=Pinellas, 12101=Pasco, 12053=Hernando. | |
| ID_PREFIX | varchar | First char of the record key. Internal. | |
| STATE | varchar | State (FL). | |
| PropertyID | varchar | Gold internal property id. Stable per physical address -> recommended key. | |
| CONTRACTOR_ID | varchar | Contractor id. | |
| PERMIT_NUMBER | varchar | Permit number assigned by the municipality (verifiable on portal). | |
| DESCRIPTION | varchar | Free text of the work. Source for roofing classification. | |
| TYPE | varchar | Permit category (e.g. 'Building - construction permit'). | |
| SUBTYPE | varchar | Subcategory (e.g. 'Roof', 'HVAC'). | |
| PROJECT_NAME | varchar | Project name (sparsely populated). | |
| HOMEOWNER | varchar | Homeowner name. | |
| BUSINESS_NAME | varchar | Business / contractor name. | |
| JOB_VALUE | double | Declared job value (USD). | |
| FEES | double | Fees paid (USD). | |
| PARCEL_NUMBER | varchar | County APN / parcel number. Very incomplete (missing 45-99%). | |
| SUBDIVISION | varchar | Subdivision. | |
| STORIES | integer | Number of stories. | |
| SQUARE_FEET | integer | Square feet. | |
| UNITS | integer | Units. | |
| INITIAL_STATUS_DATE | date | Initial status date. | |
| INITIAL_STATUS | varchar | Initial status (e.g. 'applied'). | |
| LATEST_STATUS_DATE | date | Latest status date. | |
| LATEST_STATUS | varchar | Latest status ('final','complete','expired'...). | |
| APPLIED_DATE | date | Application date. | |
| ISSUED_DATE | date | Issue date. Key for roof recency; often null (52-72%). | |
| COMPLETED_DATE | date | Completion date. | |
| CANCELLED_DATE | date | Cancellation date. | |
| PERMIT_JURISDICTION | varchar | Jurisdiction (e.g. 'FL_Pinellas_Clearwater'). | |
| PARTITION_YEAR | integer | Partition year (derived from dates). | |
| STREET | varchar | Real street address of the permit. | |
| CITY | varchar | Permit city (use silver to group by property). | |
| ZIP_CODE | integer | ZIP code (integer). Sometimes mistyped. | |
| LATITUDE | double | Geocoded latitude. | |
| LONGITUDE | double | Geocoded longitude. | |
| PLACE_FIPS | varchar | Census place FIPS. | |
| PLACE_NAME | varchar | Place name. | |
| COUNTY_FIPS | integer | County FIPS. Must equal FIPS. | |
| COUNTY_NAME | varchar | County name. | |
| CBSA_FIPS | varchar | Metro area (CBSA) FIPS. | |
| CBSA_NAME | varchar | Metro area name. | |
| DateSnapshot | date | Data snapshot date. | |
| PROJECT_TYPE | array<varchar> | Array of classified types (e.g. ['Roofing','Residential']). | |
| AUX_PERMIT_STATUS | varchar | Auxiliary permit status. | |
| AUX_EFFECTIVE_STATUS_DATE | date | Auxiliary status effective date. | |
| STREET4JOIN1 | varchar | Normalized street = ETL join key permit<->property. | |
| Partition_FIPS_Salting | varchar | FIPS + salt for partition distribution (e.g. '12103_-2'). | |
| FA_PROPERTYID | array<integer> | Array of linked First American property ids. | |
| ts | timestamp | Processing timestamp. |
One row per check, across 13 counties (3 FL deep-audit · 5 big + 5 medium-random national sample, seed 8020, 8 states). Green = pass (hits the expected value exactly), red = fail.
Click any row to read it in plain words and copy the exact Athena count + drill-down queries. Use Show counties to add or remove columns and Queries target to point the queries at a county — the copy-paste queries inside each row rewrite to that county so the result matches its chip.
All checks are state-aware — the geo checks (E3 ZIP-vs-state, E4 STATE-vs-FIPS) apply to every county, not just FL. The bugs are NOT FL-specific (finding 99).