Severity methodology v1
Methodology version key: hq_methodology_v1.0_2026
Effective: January 1, 2026
Scope
This document defines how HomeQuotr computes the severity statistics stored in the pricing_aggregates table from the underlying permits corpus. It applies to all city and state-fallback aggregates surfaced through the public website and the B2B REST API.
Source data
Every row in pricing_aggregates is computed from declared permit values filed with municipal building departments. Records with declared_value <= 0 or trade_id IS NULL are excluded. Per-trade valuation caps are applied to suppress data-entry outliers: HVAC $30K, Electrical $25K, Plumbing $20K, Roofing $40K, Foundation $50K, Solar $80K.
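The exclusion rule and cap table above can be sketched as follows. This is a minimal illustration rather than the pipeline's code: the Permit record, the TRADE_CAPS mapping keyed by trade name, and the partition_permits helper are hypothetical names. The text does not say whether over-cap records are dropped or clipped, so the sketch only separates them from in-range records; treating a value exactly equal to the cap as in range is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Per-trade valuation caps in dollars, copied from the text above.
# Keying by trade name (rather than trade_id) is an assumption for readability.
TRADE_CAPS = {
    "HVAC": 30_000,
    "Electrical": 25_000,
    "Plumbing": 20_000,
    "Roofing": 40_000,
    "Foundation": 50_000,
    "Solar": 80_000,
}


@dataclass
class Permit:  # hypothetical record shape
    trade: Optional[str]
    declared_value: float


def partition_permits(permits):
    """Drop invalid records, then split the rest into (in_range, over_cap)."""
    in_range, over_cap = [], []
    for p in permits:
        # Excluded outright: missing trade or non-positive declared value.
        if p.trade is None or p.declared_value <= 0:
            continue
        cap = TRADE_CAPS.get(p.trade)
        # Assumption: a value exactly at the cap counts as in range.
        if cap is not None and p.declared_value > cap:
            over_cap.append(p)
        else:
            in_range.append(p)
    return in_range, over_cap
```

Returning the over-cap records separately leaves the choice between dropping and clipping to the caller.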
Percentiles
For each (metro_id, trade_id, sub_category, aggregate_scope) group, the following percentiles are computed using PostgreSQL PERCENTILE_CONT continuous interpolation:
- p25, median (p50), p75, p90, p95
p90 and p95 are exposed at Growth tier and above. p25 and p75 are exposed at all tiers (subject to differential granularity bucketing for accounts under 30 days old per P6.4).
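For readers checking values outside the database, the sketch below mirrors the linear interpolation that PERCENTILE_CONT performs on a sorted group. The function names percentile_cont and severity_percentiles are illustrative and are not taken from the pipeline.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between adjacent rows."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be in [0, 1]")
    ordered = sorted(values)
    # Zero-based fractional position of the requested percentile.
    pos = fraction * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    if lo == hi:
        return float(ordered[lo])
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def severity_percentiles(values):
    """Compute the five percentiles listed above for one group."""
    fractions = {"p25": 0.25, "median": 0.50, "p75": 0.75, "p90": 0.90, "p95": 0.95}
    return {name: percentile_cont(values, f) for name, f in fractions.items()}
```

With five permits valued 1000 through 5000 in steps of 1000, the median falls on the third row (3000) and p90 is interpolated between the fourth and fifth rows.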
Confidence interval on the median
For sample sizes N >= 30, the 95 percent confidence interval on the median is computed using a robust IQR-based standard error derived from the central limit theorem:
se_median = (p75 - p25) / sqrt(n)
ci_low = max(0, median - 1.8210 * se_median)
ci_high = median + 1.8210 * se_median
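The coefficient 1.8210 folds three normal-theory constants into one: the 95 percent two-sided normal quantile (1.96), the asymptotic standard-error factor of the sample median (sqrt(pi/2), about 1.2533), and the ratio of the IQR to the standard deviation of a normal distribution (about 1.349), so that 1.96 * 1.2533 / 1.349 is approximately 1.8210. In other words, IQR / 1.349 stands in for the population standard deviation, and se_median as written above is the unscaled quantity IQR / sqrt(n). The lower bound is clamped to zero. CIs are valid for N >= 30; sub-category aggregates may use a floor of N = 15, in which case the borderline sample size is flagged in the API response.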
Confidence intervals are exposed at Growth tier and above.
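The interval formulas above can be sketched in a few lines. The function names median_ci and derived_coefficient and the floor parameter are illustrative assumptions. derived_coefficient shows one way to arrive at 1.8210 from standard normal constants and is included only as a consistency check.

```python
import math
from statistics import NormalDist

COEFF = 1.8210  # coefficient quoted in the text


def derived_coefficient():
    """Rebuild the coefficient from normal-theory constants (a check only)."""
    z = NormalDist().inv_cdf(0.975)                 # about 1.96
    median_factor = math.sqrt(math.pi / 2)          # about 1.2533
    iqr_per_sigma = 2 * NormalDist().inv_cdf(0.75)  # about 1.349
    return z * median_factor / iqr_per_sigma


def median_ci(p25, median, p75, n, floor=30):
    """95 percent CI on the median; floor=15 models the sub-category case."""
    if n < floor:
        raise ValueError(f"n={n} is below the sample-size floor of {floor}")
    se_median = (p75 - p25) / math.sqrt(n)
    ci_low = max(0.0, median - COEFF * se_median)  # lower bound clamped to zero
    ci_high = median + COEFF * se_median
    return ci_low, ci_high
```

For p25 = 4000, median = 6000, p75 = 9000 and n = 100, se_median is 500 and the interval is roughly 5089.5 to 6910.5.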
Quality score
permit_count_quality_score is an equal-thirds composite expressed as a 0 to 100 integer:
score = (
    100 * LEAST(1, LOG10(n) / 3)              -- count component
  + 100 * share_of_permits_within_365d        -- recency component
  + 100 * share_of_permits_with_source_url    -- source diversity component
) / 3
Clamped to the closed interval [0, 100]. The count component saturates at N = 1000 (LOG10(1000)/3 = 1). Quality score is exposed at Starter tier and above.
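A minimal Python rendering of the score formula follows. The function name and parameter names are illustrative. The text does not state how the result is converted to an integer, so the use of round() here is an assumption, as is rejecting n < 1.

```python
import math


def quality_score(n, share_within_365d, share_with_source_url):
    """Equal-thirds composite of count, recency and source components."""
    if n < 1:
        raise ValueError("n must be at least 1 for LOG10(n) to be defined")
    count = 100 * min(1.0, math.log10(n) / 3)  # saturates at n = 1000
    recency = 100 * share_within_365d
    source = 100 * share_with_source_url
    raw = (count + recency + source) / 3
    clamped = max(0.0, min(100.0, raw))         # closed interval [0, 100]
    return round(clamped)                       # assumption: nearest integer
```

For example, a group with n = 10, half of its permits filed within the last 365 days, and 20 percent carrying a source URL scores (33.3 + 50 + 20) / 3, which rounds to 34.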
Severity distribution
severity_distribution is a JSONB array of 20 equal-width bins spanning the p5 to p99 range of the underlying permit values. Each bin object has the shape {bucket: int, min: number, max: number, count: int}, matching the legacy distribution_buckets shape so that API clients can parse either field identically. Severity distributions are exposed at Sandbox tier and above (with bin edges rounded to the nearest $50 using banker's rounding for accounts under 30 days old per P6.4).
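The binning described above might look like the sketch below. Several details are assumptions because the text does not specify them: bucket indices start at 0, values outside the p5 to p99 range are left unbinned, and p5 and p99 use the same continuous interpolation as the other percentiles. The helper names and the coarse_edges flag are illustrative.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation."""
    ordered = sorted(values)
    pos = fraction * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def round_to_50(x):
    """Nearest $50; Python's round() resolves ties to even (banker's rounding)."""
    return 50 * round(x / 50)


def severity_distribution(values, bins=20, coarse_edges=False):
    """Build equal-width bins of shape {bucket, min, max, count}."""
    if not values:
        raise ValueError("values must not be empty")
    lo = percentile_cont(values, 0.05)
    hi = percentile_cont(values, 0.99)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue  # assumption: tails outside p5..p99 are not binned
        # The top edge belongs to the last bin; a zero-width range uses bin 0.
        idx = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    result = []
    for i in range(bins):
        b_min, b_max = lo + i * width, lo + (i + 1) * width
        if coarse_edges:  # granularity bucketing for accounts under 30 days old
            b_min, b_max = round_to_50(b_min), round_to_50(b_max)
        result.append({"bucket": i, "min": b_min, "max": b_max, "count": counts[i]})
    return result
```

Rounding is applied only to the reported edges, after counting, so the counts are the same with or without coarse_edges.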
Stamping
Every row in pricing_aggregates and pricing_trends carries methodology_version = 'hq_methodology_v1.0_2026'. Future methodology revisions ship under a new key, and aggregates recomputed under the new methodology are stamped with that key. API responses include the methodology version so clients can pin to a specific revision.
Implementation reference
The canonical implementation is pipeline/scripts/recompute_aggregates_v2.py (Phase 15 P2 task t0, shipped Session 38 W-12). State-fallback aggregates use the same formulas applied to a state-level rollup of permits when a city has fewer than 30 permits in the trade.
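The fallback rule can be illustrated with a small selector. This is a sketch rather than an excerpt from recompute_aggregates_v2.py: the function name select_scope is hypothetical, and the "city" scope label is an assumption, since only the 'state_fallback' value appears in this document.

```python
MIN_CITY_PERMITS = 30  # threshold quoted in the text


def select_scope(city_values, state_values, min_city_permits=MIN_CITY_PERMITS):
    """Pick the permit sample and aggregate_scope label for one trade."""
    if len(city_values) >= min_city_permits:
        return "city", city_values  # assumption: label for city-level rows
    # Fewer than 30 city permits: use the state-level rollup instead.
    return "state_fallback", state_values
```

The same percentile, interval, and score formulas are then applied to whichever sample is returned.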
Limitations
- Declared values are self-reported by permit applicants and may understate actual contractor invoices.
- Geographic scope is limited to the 100 metros in the active pipeline as of January 2026.
- Sub-category classification mixes rule-based regex and Claude API classification; ambiguous permits roll up to sub_category = 'general'.
- CI bounds assume independent samples within a city-trade group; spatial or contractor clustering is not currently corrected.
- National averages are never computed or surfaced. Cities with insufficient data fall back to state-level aggregates with aggregate_scope = 'state_fallback'.