Test homogeneity of an exposure’s odds ratios across disease subtypes

Description

Given a fitted polytomous (multinomial) case-control model from matcha() (estimator = “polytomous”), tests whether the exposure acts the same way on every disease subtype — the etiologic-heterogeneity question — and reports the efficient pooled ("common") odds ratio that holds under homogeneity. For each exposure term the null hypothesis is that its log odds ratio is equal across the non-reference outcome groups (H0: beta_1 = beta_2 = … = beta_M).

Usage

test_homogeneity(fit, conf_level = 0.95)

Arguments

fit A matchatr_fit returned by matcha() whose engine is “multinom” (i.e. estimator = “polytomous”).
conf_level Numeric confidence level for the common-OR interval, a single number strictly in (0, 1). Defaults to 0.95.

Details

The test is the Wald test of the canonical etiologic-heterogeneity analysis (Begg & Gray, 1984; as implemented in riskclustr::eh_test_subtype). Let b be the stacked subtype log odds ratios (length M = number of non-reference groups), V their joint covariance from the multinomial information matrix, and C a full-rank (M - 1) x M contrast matrix whose rows difference consecutive subtypes. Then

W = (C b)' (C V C')^-1 (C b), which under H0 is asymptotically chi-squared with M - 1 degrees of freedom.
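The statistic can be sketched in base R as below. This is a minimal illustration, not matchatr's implementation: the helper name `wald_homogeneity()` and the values of `b` and `V` (three subtypes with made-up log odds ratios and covariance) are hypothetical.

```r
# Hypothetical helper (not part of matchatr): Wald test that M subtype
# log odds ratios b, with joint covariance V, are all equal.
wald_homogeneity <- function(b, V) {
  M <- length(b)
  stopifnot(M >= 2, all(dim(V) == M))
  # (M - 1) x M contrast matrix; row j is e_j - e_(j+1) (consecutive differences)
  C <- cbind(diag(M - 1), 0) - cbind(0, diag(M - 1))
  Cb <- C %*% b
  W <- drop(t(Cb) %*% solve(C %*% V %*% t(C), Cb))
  list(statistic = W, df = M - 1,
       p.value = pchisq(W, df = M - 1, lower.tail = FALSE))
}

# Made-up inputs for illustration only
b <- log(c(A = 1.9, B = 1.6, C = 2.4))
V <- matrix(c(0.006, 0.002, 0.002,
              0.002, 0.008, 0.002,
              0.002, 0.002, 0.010), nrow = 3)
res <- wald_homogeneity(b, V)
res
```

W does not depend on which full-rank contrast matrix is used (differencing every subtype against the first gives the same value), and with M = 2 it reduces to (b_1 - b_2)^2 / (V_11 + V_22 - 2 V_12).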

The common odds ratio is the minimum-variance (generalized-least-squares / inverse-variance) combination of the subtype log odds ratios — the restricted estimator under the equality constraint, asymptotically equivalent to the constrained maximum-likelihood fit:

b_common = (1' V^-1 b) / (1' V^-1 1), Var(b_common) = 1 / (1' V^-1 1) (1 = length-M vector of ones),

exponentiated to the odds-ratio scale with a Wald interval computed on the log scale (so the interval is asymmetric on the OR scale). Because the constraint is imposed on the already-fitted unconstrained model, no refit is needed and the test handles continuous confounders directly. The pooled estimate is at least as efficient as any single subtype odds ratio (Begg & Gray, 1984): on the log scale its standard error is no larger than that of any subtype log odds ratio it pools.
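A minimal sketch of the pooled estimate and its log-scale Wald interval follows. The helper `common_or_sketch()` and the inputs `b` and `V` are hypothetical and made up for illustration; this is not matchatr's code.

```r
# Hypothetical helper (not part of matchatr): GLS / inverse-variance pooling
# of subtype log odds ratios b with joint covariance V.
common_or_sketch <- function(b, V, conf_level = 0.95) {
  stopifnot(length(conf_level) == 1, conf_level > 0, conf_level < 1)
  one <- rep(1, length(b))
  Vinv_one <- solve(V, one)             # V^-1 1
  info <- sum(Vinv_one)                 # 1' V^-1 1
  b_common <- sum(Vinv_one * b) / info  # 1' V^-1 b / 1' V^-1 1 (V symmetric)
  se_log <- sqrt(1 / info)              # SE of b_common on the log scale
  z <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 for conf_level = 0.95
  c(common_or = exp(b_common),
    se_log    = se_log,
    ci_lower  = exp(b_common - z * se_log),
    ci_upper  = exp(b_common + z * se_log))
}

# Made-up inputs for illustration only
b <- log(c(A = 1.9, B = 1.6, C = 2.4))
V <- matrix(c(0.006, 0.002, 0.002,
              0.002, 0.008, 0.002,
              0.002, 0.002, 0.010), nrow = 3)
common_or_sketch(b, V)
```

The interval is symmetric around log(common OR) but not around the OR itself, and when V is the identity the pooled estimate reduces to exp(mean(b)).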

Each exposure term is tested separately (one "risk factor" per column): a binary or continuous exposure contributes one row, and an unordered factor exposure contributes one row per non-reference level. There is no omnibus test across the levels of a multi-level factor exposure; it yields a separate homogeneity test per level. These tests come from the same fit and are not independent, so adjust for multiple comparisons if several levels are screened. The fit must come from the polytomous multinomial engine (three or more outcome groups); any other engine, or a fit that produced no model, is rejected.
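One way to adjust the per-level tests is a Holm correction with base R's p.adjust(), which controls the family-wise error rate without assuming the tests are independent. In the sketch below, `hom` is a hypothetical stand-in for the homogeneity table of a fit with a three-level factor exposure; the term names and p-values are made up. With real output the p-values would come from the p.value column of the homogeneity table returned by test_homogeneity().

```r
# Hypothetical stand-in for the `homogeneity` table (terms and p-values made up)
hom <- data.frame(term = c("smokeformer", "smokecurrent"),
                  p.value = c(0.030, 0.012))

# Holm step-down adjustment across the per-level homogeneity tests
hom$p.holm <- p.adjust(hom$p.value, method = "holm")
hom
```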

Value

A matchatr_homogeneity object: a list carrying homogeneity (a data.table with one row per exposure term giving the term, the common odds ratio with its standard error and Wald bounds, and the homogeneity chi-squared statistic, df, and p.value), subtype (the per-subtype odds ratios that are pooled), the baseline reference group, the conf_level, the analysis sample size n, and the estimator / engine labels.

References

Begg CB, Gray R (1984). Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika 71(1), 11-18.

Borgan O, Breslow N, Chatterjee N, Gail MH, Scott A, Wild CJ (eds.) (2018). Handbook of Statistical Methods for Case-Control Studies, Chapter 5. Chapman & Hall/CRC.

See Also

matcha(), contrast(), tidy.matchatr_homogeneity()

Examples

library("matchatr")

set.seed(5)
n <- 4000
x <- rbinom(n, 1, 0.4)
# Subtypes A and B share the exposure effect (true OR = 2 for both), so homogeneity holds.
eta <- cbind(control = 0, A = -1 + log(2) * x, B = -1.4 + log(2) * x)
prob <- exp(eta) / rowSums(exp(eta))
g <- apply(prob, 1, function(p) sample(c("control", "A", "B"), 1, prob = p))
d <- data.frame(g = g, x = x)
fit <- matcha(d, outcome = "g", exposure = "x",
              design = unmatched_cc(), estimator = "polytomous",
              reference = "control")
test_homogeneity(fit)
<matchatr_homogeneity>
 Estimator:  polytomous  (engine: multinom)
 Test:       Homogeneity of subtype odds ratios (Wald)
 Reference:  control
 N:          4000

Common (pooled) odds ratio per exposure term and homogeneity test:
     term common_or        se ci_lower ci_upper statistic    df   p.value
   <char>     <num>     <num>    <num>    <num>     <num> <int>     <num>
1:      x  1.868871 0.1216725 1.644985 2.123228   1.25593     1 0.2624228

Per-subtype odds ratios (pooled):
   comparison       or ci_lower ci_upper
       <char>    <num>    <num>    <num>
1:       A: x 1.951515 1.682447 2.263613
2:       B: x 1.751165 1.475982 2.077653

A small p-value is evidence that the exposure odds ratio differs across subtypes. Here p = 0.26, consistent with the common OR of 2 built into the simulation.
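As a cross-check, the same test can be reproduced by hand from an nnet::multinom() fit of the simulated data. This is a sketch under the assumption that matchatr's "multinom" engine fits the same unpenalized multinomial model, so small numerical differences from the printed output are possible. It also relies on nnet naming the vcov() rows as "A:x", "B:x".

```r
library(nnet)  # recommended package shipped with R; used here instead of matchatr

set.seed(5)
n <- 4000
x <- rbinom(n, 1, 0.4)
eta <- cbind(control = 0, A = -1 + log(2) * x, B = -1.4 + log(2) * x)
prob <- exp(eta) / rowSums(exp(eta))
g <- apply(prob, 1, function(p) sample(c("control", "A", "B"), 1, prob = p))
# Put "control" first so it is the multinomial reference category
d <- data.frame(g = factor(g, levels = c("control", "A", "B")), x = x)

mfit <- multinom(g ~ x, data = d, Hess = TRUE, trace = FALSE)
b <- coef(mfit)[c("A", "B"), "x"]        # subtype log odds ratios
idx <- paste0(c("A", "B"), ":x")         # matching vcov() row/column names
V <- vcov(mfit)[idx, idx]                # their joint covariance

# Wald homogeneity test with M = 2 subtypes: one contrast, 1 df
C <- matrix(c(1, -1), nrow = 1)
W <- drop((C %*% b)^2 / (C %*% V %*% t(C)))
p <- pchisq(W, df = 1, lower.tail = FALSE)

# GLS (inverse-variance) common log odds ratio
w <- solve(V, c(1, 1))
b_common <- sum(w * b) / sum(w)
c(statistic = W, p.value = p, common_or = exp(b_common))
```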