Severity methodology v1

version 1.0

Methodology version key: hq_methodology_v1.0_2026

Effective: January 1, 2026

Scope

This document defines how HomeQuotr computes severity statistics on the pricing_aggregates table from the underlying permits corpus. It applies to all city and state-fallback aggregates surfaced through the public website and the B2B REST API.

Source data

Every row in pricing_aggregates is computed from declared permit values filed with municipal building departments. Records with declared_value <= 0 or trade_id IS NULL are excluded. Per-trade valuation caps are applied to suppress data-entry outliers: HVAC $30K, Electrical $25K, Plumbing $20K, Roofing $40K, Foundation $50K, Solar $80K.
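The exclusion rules above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's code: the (trade_name, declared_value) record shape and the function name filter_permits are hypothetical, and because the text does not say whether values above a cap are dropped or clipped to the cap, the sketch exposes both as an option.

```python
# Per-trade caps in USD, copied from the paragraph above.
VALUATION_CAPS = {
    "HVAC": 30_000,
    "Electrical": 25_000,
    "Plumbing": 20_000,
    "Roofing": 40_000,
    "Foundation": 50_000,
    "Solar": 80_000,
}


def filter_permits(records, over_cap="drop"):
    """Return the (trade, value) pairs that survive the exclusion rules.

    records: iterable of (trade_name, declared_value) pairs. A trade_name of
        None stands in for trade_id IS NULL (hypothetical record shape).
    over_cap: "drop" discards values above the cap, "clip" lowers them to
        the cap. Which one the real pipeline does is an assumption.
    """
    if over_cap not in ("drop", "clip"):
        raise ValueError("over_cap must be 'drop' or 'clip'")
    kept = []
    for trade, value in records:
        # Exclude records with a missing trade or a non-positive value.
        if trade is None or value <= 0:
            continue
        cap = VALUATION_CAPS.get(trade)
        if cap is not None and value > cap:
            if over_cap == "drop":
                continue
            value = cap
        kept.append((trade, value))
    return kept
```

For example, filter_permits([("HVAC", 12000), ("Solar", 95000)]) keeps only the HVAC record, while passing over_cap="clip" keeps both and lowers the Solar value to 80000.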

Percentiles

For each (metro_id, trade_id, sub_category, aggregate_scope) group, the following percentiles are computed using PostgreSQL PERCENTILE_CONT continuous interpolation:

p25, median (p50), p75, p90, p95

p90 and p95 are exposed at Growth tier and above. p25 and p75 are exposed at all tiers (subject to differential granularity bucketing for accounts under 30 days old per P6.4).
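For readers without a database at hand, continuous-interpolation percentiles can be reproduced with a short Python sketch. It follows the linear interpolation between adjacent sorted values that PERCENTILE_CONT performs; the function names percentile_cont and summary_percentiles are illustrative and do not come from the pipeline.

```python
import math


def percentile_cont(values, fraction):
    """Percentile by linear interpolation between adjacent sorted values."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    ordered = sorted(values)
    # Zero-based fractional rank of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


def summary_percentiles(values):
    """Compute the five percentiles listed above for one group of values."""
    return {
        "p25": percentile_cont(values, 0.25),
        "median": percentile_cont(values, 0.50),
        "p75": percentile_cont(values, 0.75),
        "p90": percentile_cont(values, 0.90),
        "p95": percentile_cont(values, 0.95),
    }
```

With two values, 10 and 20, the median falls halfway between them at 15, which is the behavior that distinguishes continuous interpolation from picking an observed value.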

Confidence interval on the median

For sample sizes N >= 30, the 95 percent confidence interval on the median is computed using a robust IQR-based standard error derived from the central limit theorem:

se_median = (p75 - p25) / sqrt(n)
ci_low    = max(0, median - 1.8210 * se_median)
ci_high   = median + 1.8210 * se_median

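The coefficient 1.8210 folds three factors into one: the two-sided 95 percent normal quantile (1.96), the ratio of the median's standard error to the mean's under normality (sqrt(pi/2), about 1.2533), and the conversion from IQR to standard deviation for a normal distribution (1 / 1.349). In other words, IQR / 1.349 stands in for the population standard deviation, and se_median as written above is the unscaled quantity IQR / sqrt(n), where n is the group's sample size N. The lower bound is clamped to zero. CIs are considered valid for N >= 30; sub-category aggregates may use a floor of N = 15, in which case the borderline sample size is flagged in the API response.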

Confidence intervals are exposed at Growth tier and above.
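The interval formulas can be written as a small Python function. This is a minimal sketch under the stated formulas, not the canonical implementation: the name median_ci, the min_n parameter, and the choice to return None below the floor are assumptions made for illustration.

```python
import math

# Product of 1.96 (two-sided 95% normal quantile), sqrt(pi/2) (median vs.
# mean standard-error ratio under normality) and 1/1.349 (IQR to standard
# deviation conversion for a normal distribution), rounded to four decimals.
CI_COEFFICIENT = 1.8210


def median_ci(p25, median, p75, n, min_n=30):
    """Return (ci_low, ci_high) for the median, or None if n < min_n.

    Returning None below the floor is an illustrative choice; the text only
    says the interval is valid from N >= 30 (or 15 for sub-categories).
    """
    if n < min_n:
        return None
    se_median = (p75 - p25) / math.sqrt(n)
    ci_low = max(0.0, median - CI_COEFFICIENT * se_median)  # clamp at zero
    ci_high = median + CI_COEFFICIENT * se_median
    return ci_low, ci_high
```

As a worked case, p25 = 8000, median = 10000, p75 = 12000 and n = 100 give se_median = 400, so the interval is roughly 10000 plus or minus 728.4. Passing min_n=15 models the sub-category floor.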

Quality score

permit_count_quality_score is an equal-thirds composite expressed as a 0 to 100 integer:

score = (
  100 * LEAST(1, LOG10(n) / 3)               -- count component
  + 100 * share_of_permits_within_365d        -- recency component
  + 100 * share_of_permits_with_source_url    -- source diversity component
) / 3

The result is clamped to the closed interval [0, 100]. The count component saturates at N = 1000 (LOG10(1000) / 3 = 1). The quality score is exposed at Starter tier and above.
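The composite can be checked by hand with a short Python version of the formula. The function and argument names below are illustrative. The sketch returns a float because the text does not state how the value is rounded to an integer, and it rejects n < 1 as an assumption, since LOG10 is undefined at zero.

```python
import math


def quality_score(n, share_within_365d, share_with_source_url):
    """Equal-thirds composite of count, recency and source components.

    n: number of permits in the group (assumed to be at least 1).
    share_within_365d, share_with_source_url: fractions in [0, 1].
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    count_component = 100 * min(1.0, math.log10(n) / 3)  # saturates at n = 1000
    recency_component = 100 * share_within_365d
    source_component = 100 * share_with_source_url
    raw = (count_component + recency_component + source_component) / 3
    # Clamp to the closed interval [0, 100].
    return max(0.0, min(100.0, raw))
```

For n = 100 with half of the permits filed in the last 365 days and 80 percent carrying a source URL, the components are about 66.7, 50 and 80, for a score of about 65.6.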

Severity distribution

severity_distribution is a JSONB array of 20 equal-width bins spanning the p5 to p99 range of the underlying permit values. Each bin object has the shape {bucket: int, min: number, max: number, count: int}, matching the legacy distribution_buckets shape so that API clients can parse either field identically. Severity distributions are exposed at Sandbox tier and above; for accounts under 30 days old, bin edges are rounded to the nearest $50 using banker's rounding, per P6.4.
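The binning can be illustrated with the following Python sketch. Several details are assumptions, because the text does not specify them: bucket indices start at 0, values outside the p5 to p99 range are skipped, a value exactly at the top edge lands in the last bin, and the p5 and p99 edges are passed in by the caller. The function names are hypothetical.

```python
def build_severity_distribution(values, low, high, bin_count=20):
    """Build equal-width bins between low (p5) and high (p99)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    width = (high - low) / bin_count
    bins = [
        {"bucket": i, "min": low + i * width, "max": low + (i + 1) * width, "count": 0}
        for i in range(bin_count)
    ]
    for value in values:
        if value < low or value > high:
            continue  # assumption: out-of-range values are not counted
        # The top edge would compute index == bin_count, so cap it.
        index = min(int((value - low) / width), bin_count - 1)
        bins[index]["count"] += 1
    return bins


def round_to_50(amount):
    """Round to the nearest $50; exact halves go to the even multiple.

    Python's built-in round() already rounds halves to even, which is
    what banker's rounding means.
    """
    return 50 * round(amount / 50)
```

Under banker's rounding, 1025 rounds down to 1000 and 1075 rounds up to 1100, since both sit exactly halfway between multiples of 50 and the even multiple wins.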

Stamping

Every row in pricing_aggregates and pricing_trends carries methodology_version = 'hq_methodology_v1.0_2026'. Future methodology revisions ship under a new key, and aggregates recomputed under the new methodology are stamped with that key. API responses include the methodology version so clients can pin to a specific revision.
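A client that pins to this revision might guard its parsing with a check like the one below. This is a hypothetical sketch: it assumes the API response is a JSON object that carries the key under a field named methodology_version, mirroring the column name, which the text does not confirm.

```python
PINNED_METHODOLOGY = "hq_methodology_v1.0_2026"


def require_methodology(payload, pinned=PINNED_METHODOLOGY):
    """Return the payload unchanged, or raise if its version differs.

    payload: a decoded JSON object (dict). The "methodology_version" field
    name is an assumption borrowed from the database column name.
    """
    version = payload.get("methodology_version")
    if version != pinned:
        raise ValueError(
            f"expected methodology {pinned!r}, got {version!r}"
        )
    return payload
```

Failing loudly on a mismatch keeps statistics computed under different formulas from being mixed in one downstream dataset.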

Implementation reference

The canonical implementation is pipeline/scripts/recompute_aggregates_v2.py (Phase 15 P2 task t0, shipped Session 38 W-12). When a city has fewer than 30 permits in a trade, state-fallback aggregates apply the same formulas to a state-level rollup of permits.
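The fallback rule amounts to choosing which pool of permit values feeds the formulas. The sketch below is illustrative and is not taken from recompute_aggregates_v2.py: the function name is hypothetical, and the scope label "city" is an assumption, since only 'state_fallback' is named in this document.

```python
CITY_MIN_PERMITS = 30  # threshold stated in the paragraph above


def select_permit_pool(city_values, state_values, min_city_permits=CITY_MIN_PERMITS):
    """Pick the permit values to aggregate and the matching scope label.

    Returns (aggregate_scope, values). The "city" label is hypothetical;
    "state_fallback" is the value named in this document.
    """
    if len(city_values) < min_city_permits:
        return "state_fallback", list(state_values)
    return "city", list(city_values)
```

A city with 29 permits in a trade would therefore be aggregated from the state rollup, while a city with exactly 30 keeps its own values.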

Limitations

Declared values are self-reported by permit applicants and may understate actual contractor invoices.
Geographic scope is limited to the 100 metros in the active pipeline as of January 2026.
Sub-category classification mixes rule-based regex and Claude API classification; ambiguous permits roll up to sub_category = 'general'.
CI bounds assume independent samples within a city-trade group; spatial or contractor clustering is not currently corrected.
National averages are never computed or surfaced. Cities with insufficient data fall back to state-level aggregates with aggregate_scope = 'state_fallback'.