METHODOLOGY

Open methodology. Reproducible math.

Every reported number on Findrix carries a method ID and confidence bounds. CFOs verify it. Statisticians break it. We ship the formulas, the assumptions, and the limits.

LAYER 1 · CITATION SHARE

Wilson confidence intervals on every count.

When we say "38% citation share", we report the Wilson CI bounds alongside it. With small samples the interval widens, pulling the lower bound further below the point estimate, and we surface that uncertainty rather than hide it.

Example: 38 mentions (cited prompts) out of a sample size of 100 total prompts gives a citation share point estimate of 38.0%, with a Wilson 95% CI of [29.1%, 47.8%].

Wilson Score Interval (95% CI):

$$
\frac{\hat{p} + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}}
$$

where p̂ = mentions / sample, n = sample size, z = 1.96 for 95% CI.

method_id: wilson-v1.2
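A minimal Python sketch of the Wilson formula above, using only the standard library. The name `wilson_interval` is hypothetical and is not the `findrix_stats` API; it simply evaluates the score interval with z = 1.96.

```python
from math import sqrt


def wilson_interval(mentions: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a citation share p_hat = mentions / n.

    Hypothetical helper for illustration; not the findrix_stats API.
    """
    if n <= 0 or not 0 <= mentions <= n:
        raise ValueError("need n > 0 and 0 <= mentions <= n")
    p_hat = mentions / n
    z2 = z * z
    denom = 1 + z2 / n                      # 1 + z^2/n
    center = p_hat + z2 / (2 * n)           # p_hat + z^2/(2n)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return (center - half_width) / denom, (center + half_width) / denom


lo, hi = wilson_interval(38, 100)
print(f"[{lo:.1%}, {hi:.1%}]")  # [29.1%, 47.8%]
```

The same 38% share measured on only 10 prompts (`wilson_interval(4, 10)`) gives roughly [16.8%, 68.7%], which is the small-sample widening described earlier.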

LAYER 2 · LIFT ATTRIBUTION

Difference-in-differences for citation lift.

Did the schema deploy actually move the citation rate? We compare your treatment cohort against a matched control cohort; the difference between the two before/after changes gives the causal lift, assuming both cohorts would otherwise have followed parallel trends. Placebo tests on pre-deploy windows guard against regression-to-the-mean illusions.

Difference-in-differences (DiD):

  ATE = (Y_treatment_post - Y_treatment_pre) - (Y_control_post - Y_control_pre)

Standard error from cluster-robust regression at the (LLM × prompt × week) level.
Placebo test: re-run on (week_-4, week_-2) windows; if "effect" appears, abort.

method_id: did-v1.4 (with placebo guard did-placebo-v1.0)
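A minimal sketch of the DiD point estimate and the placebo guard. It assumes each cohort/period is given as a list of per-observation citation indicators (1 = cited, 0 = not). The function names and the fixed `tolerance` in the placebo check are hypothetical simplifications. The method above uses cluster-robust standard errors and a significance test, and this sketch omits both.

```python
from statistics import mean


def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: treatment change minus control change."""
    treatment_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(ctrl_post) - mean(ctrl_pre)
    return treatment_change - control_change


def placebo_passes(treat_w4, treat_w2, ctrl_w4, ctrl_w2, tolerance=0.02):
    """Re-run the estimator on two pre-deploy windows (week -4 vs week -2).

    No deploy happened between them, so the "effect" should be near zero.
    The fixed tolerance (hypothetical) stands in for a real significance test.
    """
    return abs(did_estimate(treat_w4, treat_w2, ctrl_w4, ctrl_w2)) <= tolerance


# Hypothetical toy data: 1 = the prompt run cited you, 0 = it did not.
lift = did_estimate([0, 0, 1, 0, 1], [1, 1, 0, 1, 1],
                    [0, 1, 0, 0, 0], [0, 1, 0, 1, 0])
print(round(lift, 3))  # 0.2
```

Here treatment rises from 40% to 80% and control from 20% to 40%. The lift attributed to the deploy is the 20-point gap between those two changes, not the raw 40-point jump.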

LAYER 3 · SHARE OF VOICE

BCa Bootstrap for share-of-voice.

Share of voice is a ratio of mention counts bounded between 0 and 1, not a single binomial proportion, so Wilson doesn't fit. We use a bias-corrected and accelerated (BCa) bootstrap with 10,000 resamples to get asymmetric CIs. When citations are concentrated in one source (e.g. only Reddit cited you), the CI reflects that concentration.

BCa Bootstrap:

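For B = 10,000 resamples of (prompt, LLM, week) tuples:
  1. Compute SoV* for each resample
  2. Bias correction: z₀ = Φ⁻¹(P(SoV* < SoV̂))
  3. Acceleration from the jackknife: â = Σᵢ(θ̄ − θ₍ᵢ₎)³ / (6·[Σᵢ(θ̄ − θ₍ᵢ₎)²]^(3/2)),
     where θ₍ᵢ₎ is SoV computed with tuple i left out and θ̄ is the mean of the θ₍ᵢ₎
  4. Adjusted percentiles, with α = 0.05 and z_(x) = Φ⁻¹(x):
     α₁ = Φ(z₀ + (z₀ + z_(α/2)) / (1 − â(z₀ + z_(α/2))))
     α₂ = Φ(z₀ + (z₀ + z_(1−α/2)) / (1 − â(z₀ + z_(1−α/2))))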

CI_BCa = (SoV*_α₁, SoV*_α₂)

method_id: bca-v1.1
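A stdlib-only sketch of the four BCa steps. It assumes each (prompt, LLM, week) tuple is recorded as a pair (your mentions, all brand mentions) and that SoV is the pooled ratio of the two. That data layout, the function names, the fixed seed, and the floor-index percentile lookup are illustrative assumptions, not the bca-v1.1 implementation.

```python
import random
from statistics import NormalDist


def share_of_voice(tuples):
    """Pooled SoV: your mentions / all brand mentions (assumed layout)."""
    return sum(t[0] for t in tuples) / sum(t[1] for t in tuples)


def bca_interval(tuples, stat=share_of_voice, b=10_000, alpha=0.05, seed=0):
    """BCa bootstrap CI; returns (estimate, lower, upper). Hypothetical helper."""
    rng, nd, n = random.Random(seed), NormalDist(), len(tuples)
    theta_hat = stat(tuples)

    # 1. Bootstrap replicates: resample tuples with replacement.
    reps = sorted(stat([tuples[rng.randrange(n)] for _ in range(n)])
                  for _ in range(b))

    # 2. Bias correction from the share of replicates below the estimate,
    #    clipped so inv_cdf stays finite.
    below = min(max(sum(r < theta_hat for r in reps) / b, 1 / b), 1 - 1 / b)
    z0 = nd.inv_cdf(below)

    # 3. Acceleration from leave-one-out (jackknife) estimates.
    jack = [stat(tuples[:i] + tuples[i + 1:]) for i in range(n)]
    jbar = sum(jack) / n
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = sum((jbar - j) ** 3 for j in jack) / den if den else 0.0

    # 4. Adjusted percentiles, read off the sorted replicates.
    def adjusted(q):
        zq = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + zq) / (1 - a * (z0 + zq)))

    lo = reps[min(int(adjusted(alpha / 2) * b), b - 1)]
    hi = reps[min(int(adjusted(1 - alpha / 2) * b), b - 1)]
    return theta_hat, lo, hi


# Hypothetical tuples: (your mentions, all brand mentions).
data = [(1, 3), (0, 2), (2, 4), (1, 1), (0, 3),
        (3, 5), (1, 2), (0, 1), (2, 3), (1, 4)]
print(bca_interval(data))
```

Because the percentiles come from the resampled distribution itself, the interval need not be symmetric around the estimate. With only a handful of tuples driving the mentions, it also stays wide.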

LAYER 4 · MULTIPLE COMPARISONS

FDR correction for prompt-by-prompt tests.

When you test 240 prompts × 4 LLMs simultaneously, you run 960 tests, and a naive p < 0.05 cut would flag about 48 "significant" lifts from noise alone even if nothing changed. Findrix applies Benjamini-Hochberg FDR correction at q = 0.05, so every flagged prompt has passed a threshold adjusted for the multiple-testing burden.

Benjamini-Hochberg FDR:

For sorted p-values p_(1) ≤ p_(2) ≤ ... ≤ p_(m):
  Reject H_0(i) for all i ≤ k
  where k = max{i : p_(i) ≤ (i/m) · q}

q = 0.05 (expected false discovery rate held at or below 5%)
method_id: bh-fdr-v1.0
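A minimal Python sketch of the step-up rule above. The p-values are made-up illustrative numbers, and `benjamini_hochberg` is a hypothetical helper rather than the bh-fdr-v1.0 code.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # rank by ascending p
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank  # largest rank meeting its threshold so far
    return sorted(order[:k])  # reject every hypothesis ranked 1..k


# Hypothetical p-values for ten prompt-level lift tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(sum(x < 0.05 for x in p))  # 5
print(benjamini_hochberg(p))     # [0, 1]
```

Five of the ten p-values clear a naive 0.05 cut, but only the two smallest survive BH at q = 0.05. No p-value from rank 3 upward falls under its threshold. For example, 0.039 would need to be at most 3/10 × 0.05 = 0.015.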

OPEN SOURCE

Verify the math yourself.

All four formulas live in our open-source statistics library. Pull it, run our test fixtures, reproduce any number we report.

# Findrix open-source stats library
$ git clone https://github.com/findrix/findrix-stats
$ cd findrix-stats && pip install -e .

# Reproduce the Wilson CI for 38 mentions out of 100 prompts
>>> from findrix_stats import wilson_ci
>>> wilson_ci(mentions=38, n=100)
(0.291, 0.478)  # method_id: wilson-v1.2

Repository: github.com/findrix/findrix-stats (MIT license · Phase 4)

LIMITS · WHAT THIS METHODOLOGY DOES NOT DO

Honest disclosures.

LLMs are stochastic. Two identical queries 10 minutes apart can yield different outputs. We sample n = 10 calls per (prompt, LLM, window) and report the mean with a CI, but the point estimates themselves are not deterministic: a re-run can return a different mean.
Citation share is a sampled estimate, not a census. The full population of LLM queries is unobservable, so we draw a stratified random sample of prompts within your category.
LLM model versions change. When OpenAI ships GPT-5.5, our citation rate baseline shifts. We re-baseline within 14 days of any major model release.
DiD assumes the control cohort follows the same trend as treatment in the absence of intervention. We validate this with pre-deploy parallel-trend tests; failures block the report.

REPRODUCIBLE BY DESIGN

Run an audit. Get the methodology IDs.
