Nested case-control hazard ratios

A nested case-control (NCC) study samples controls from inside a cohort by risk-set (incidence-density) sampling: at each case’s failure time, a few controls are drawn at random from the subjects still at risk. The design-faithful analysis is the conditional partial likelihood with each sampled risk set as a stratum — exactly the conditional logistic regression matchatr already uses for matched case-control data, because a matched set and a sampled risk set are the same stratum construction.

The estimand, however, is different. Under proper risk-set sampling the exponentiated conditional coefficient estimates the hazard ratio of the underlying proportional-hazards model directly; no rare-disease approximation is involved (Prentice & Breslow, 1978). So matchatr reports it on the hazard-ratio scale (type = "hr"), which is the default for nested_cc().
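For reference, the likelihood in question can be written out. This is the standard form for simple random sampling of m controls per case (Thomas, 1977; Goldstein & Langholz, 1992), shown for a single exposure x. The symbols D, x_(j) and R-tilde_j are notation introduced here for illustration, not taken from matchatr.

```latex
L(\beta) \;=\; \prod_{j=1}^{D}
  \frac{\exp\{\beta\, x_{(j)}\}}
       {\sum_{k \in \tilde{R}_j} \exp\{\beta\, x_k\}}
```

Here D is the number of cases, x_(j) is the exposure of the j-th case, and R-tilde_j is the sampled risk set (the case plus its m controls). The baseline hazard at each failure time multiplies both numerator and denominator and cancels, which is why beta is the Cox log hazard ratio and no rare-disease assumption is needed.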

Code
library(matchatr)

A cohort and a nested sample

We simulate a cohort with a constant baseline hazard and a known Cox log hazard ratio for a binary exposure x.

Code
set.seed(51)
n <- 3000
x <- rbinom(n, 1, 0.4)
beta <- log(2.2)                       # the true log hazard ratio for x
rate <- 0.08 * exp(beta * x)           # PH hazard: baseline times exp(lp)
time <- rexp(n, rate)
tau  <- 4                              # administrative censoring
cohort <- data.frame(
  id = seq_len(n), t = pmin(time, tau), d = as.integer(time <= tau), x = x
)

sample_ncc() draws the risk-set (incidence-density) sample: for each case it takes m controls at random from the subjects still at risk at that failure time. The result is analysis-ready: it carries a per-set indicator case (distinct from the cohort-wide event d, since a sampled control may itself fail later, as subject 2178 does in the output below), a matched-set identifier set, and the set's failure time risk_time.

Code
set.seed(1)
ncc <- sample_ncc(cohort, time = "t", event = "d", m = 3)
head(ncc)
#>       id           t     d     x   set  case   risk_time
#>    <int>       <num> <int> <int> <int> <int>       <num>
#> 1:    33 0.002633132     1     1     1     1 0.002633132
#> 2:  1018 4.000000000     0     0     1     0 0.002633132
#> 3:   680 4.000000000     0     0     1     0 0.002633132
#> 4:  2178 2.139775944     1     0     1     0 0.002633132
#> 5:   726 0.007580645     1     1     2     1 0.007580645
#> 6:   932 4.000000000     0     0     2     0 0.007580645
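To make the sampling rule concrete, here is a minimal base-R sketch of a single risk-set draw on a toy cohort. It is not matchatr's implementation: draw_risk_set() is a hypothetical helper, and taking all eligible subjects when fewer than m remain is an assumption made only for this illustration.

```r
# Toy cohort: follow-up time t and event indicator d (1 = failed, 0 = censored).
toy <- data.frame(
  id = 1:8,
  t  = c(0.5, 1.2, 1.9, 2.4, 3.0, 3.3, 4.0, 4.0),
  d  = c(1,   0,   1,   0,   1,   0,   0,   0)
)

# Hypothetical helper: sample up to m controls for the case in row i.
draw_risk_set <- function(cohort, i, m) {
  t_case <- cohort$t[i]
  # At risk at t_case: everyone whose follow-up reaches t_case, except the case.
  eligible <- setdiff(which(cohort$t >= t_case), i)
  if (length(eligible) == 0L) stop("no eligible control at this failure time")
  # Index into `eligible` so that a single eligible row is not expanded by sample().
  picked <- eligible[sample.int(length(eligible), min(m, length(eligible)))]
  out <- cohort[c(i, picked), ]
  out$case      <- c(1L, rep(0L, length(picked)))  # per-set case indicator
  out$risk_time <- t_case                          # failure time defining the set
  out
}

set.seed(2)
one_set <- draw_risk_set(toy, i = 3, m = 2)  # the case failing at t = 1.9
one_set
```

Subjects 5 (who fails later, at t = 3.0) is eligible as a control here, which is exactly why the per-set case indicator has to be kept separate from the cohort-wide event column.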

Two further sampling options match real designs: match = ~ sex + birth_cohort confines each case’s controls to its own population stratum, and entry = supplies a delayed-entry (left-truncation) column so a subject counts as at risk only after it enters follow-up. A case left with no eligible control aborts with an informative matchatr_empty_risk_set error rather than a degenerate set.
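The eligibility rule behind these two options can be sketched in base R. This is an illustration under stated assumptions, not the package's code: eligible_controls() is a hypothetical helper, the columns entry and sex are invented for the toy data, and the strict inequality entry < failure time is one common convention for left truncation.

```r
# Toy cohort with delayed entry and one matching factor.
toy <- data.frame(
  id    = 1:6,
  entry = c(0.0, 0.0, 1.5, 0.0, 2.5, 0.0),
  t     = c(2.0, 3.0, 3.5, 1.0, 4.0, 4.0),
  d     = c(1,   0,   0,   0,   0,   0),
  sex   = c("f", "f", "f", "f", "f", "m")
)

# Hypothetical helper: ids eligible as controls for the case in row i.
eligible_controls <- function(cohort, i) {
  t_case <- cohort$t[i]
  # Under observation at t_case: already entered, and not yet failed or censored.
  at_risk <- cohort$entry < t_case & cohort$t >= t_case
  # Matching confines the draw to the case's own stratum.
  same_stratum <- cohort$sex == cohort$sex[i]
  setdiff(cohort$id[at_risk & same_stratum], cohort$id[i])
}

eligible_controls(toy, 1)
#> [1] 2 3
```

Subject 4 has left follow-up before t = 2.0, subject 5 has not yet entered, and subject 6 is in a different stratum, so only subjects 2 and 3 remain. If this set were empty, the case would have no eligible control, which is the situation the error above guards against.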

The risk-set hazard ratio

matcha() with design = nested_cc() fits the conditional partial likelihood; contrast() reports the hazard ratio with the partial-likelihood information-matrix Wald interval.

Code
fit <- matcha(
  ncc,
  outcome = "case", exposure = "x",
  design = nested_cc(strata = "set", time = "risk_time"),
  estimator = "clogit"
)
fit
#> <matchatr_fit>
#>  Design:     Nested case-control
#>  Estimator:  clogit  (engine: clogit)
#>  Outcome:    case
#>  Exposure:   x
#>  Confounders: none
#>  N:          4500  (cases: 1125, controls: 3375)

contrast(fit)        # type = "hr" is the default for a nested design
#> <matchatr_result>
#>  Estimator:  clogit  (engine: clogit)
#>  Estimand:   hazard ratio
#>  Contrast:   Hazard ratio
#>  CI method:  model
#>  N:          4500
#> 
#> Contrasts:
#>    comparison estimate        se ci_lower ci_upper
#>        <char>    <num>     <num>    <num>    <num>
#> 1:          x 2.352281 0.1666529 2.047311  2.70268

The estimate, 2.35, is on the hazard-ratio scale, and its interval (2.05 to 2.70) covers the true hazard ratio of 2.2 (log(2.2) ≈ 0.79 on the log scale). The design's time column records how the controls were sampled; risk-set membership is read from strata, so time does not enter the conditional likelihood here.
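The printed interval can be reproduced by hand, which shows how the Wald interval is built. The sketch below assumes a 95% level and that the printed se is a delta-method standard error on the hazard-ratio scale; both are inferences from the printed numbers, not documented behaviour.

```r
hr    <- 2.352281     # printed estimate
se_hr <- 0.1666529    # printed se, assumed to be on the hazard-ratio scale

# Delta method in reverse: se(log HR) = se(HR) / HR.
se_log <- se_hr / hr

# Wald interval on the log scale, then exponentiated (95% level assumed).
ci <- exp(log(hr) + c(-1, 1) * qnorm(0.975) * se_log)
round(ci, 2)
#> [1] 2.05 2.70
```

Building the interval on the log scale keeps it positive and makes it asymmetric around the estimate, as the printed ci_lower and ci_upper are.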

OR = HR exactly

The whole point of risk-set sampling is that the sampled odds ratio is the hazard ratio, with no rare-disease assumption. We can see this directly: the NCC estimate matches the Cox hazard ratio fit on the full cohort (within sampling error, since the NCC sample is a subset).

Code
full_cohort <- survival::coxph(survival::Surv(t, d) ~ x, data = cohort)
c(ncc_HR    = exp(coef(fit$model)[["x"]]),
  cohort_HR = exp(coef(full_cohort)[["x"]]))
#>    ncc_HR cohort_HR 
#>  2.352281  2.338995
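The same comparison can be run without matchatr, as an independent check that uses only the survival package. This sketch assumes that conditional logistic regression on hand-drawn risk sets is equivalent in spirit to the fit above; the sampling loop is a simplified stand-in for sample_ncc(), so the numbers will differ from those printed.

```r
library(survival)

# Same data-generating process as the cohort above.
set.seed(51)
n    <- 3000
x    <- rbinom(n, 1, 0.4)
time <- rexp(n, 0.08 * exp(log(2.2) * x))
cohort <- data.frame(id = seq_len(n), t = pmin(time, 4),
                     d = as.integer(time <= 4), x = x)

# Risk-set sampling by hand: 3 controls per case from those still at risk.
# Censored subjects all have t = 4, so every case has more than 3 eligible controls.
cases <- which(cohort$d == 1L)
sets <- lapply(seq_along(cases), function(j) {
  i        <- cases[j]
  eligible <- setdiff(which(cohort$t >= cohort$t[i]), i)
  picked   <- eligible[sample.int(length(eligible), 3)]
  data.frame(set = j, case = c(1L, 0L, 0L, 0L), x = cohort$x[c(i, picked)])
})
ncc_manual <- do.call(rbind, sets)

# Conditional logistic regression with one stratum per sampled risk set,
# against a Cox model on the full cohort.
fit_ncc    <- clogit(case ~ x + strata(set), data = ncc_manual)
fit_cohort <- coxph(Surv(t, d) ~ x, data = cohort)

hr <- c(ncc = exp(coef(fit_ncc)[["x"]]), cohort = exp(coef(fit_cohort)[["x"]]))
hr
```

The two values should agree up to sampling error, with the nested estimate slightly less precise because it uses only a few controls per case.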

One scale per design

Each conditional design identifies exactly one scale: the matched design reports the odds ratio, the nested design the hazard ratio. Asking for the off-design scale is declined — the value would be the same number, but it is not the estimand the design targets:

Code
contrast(fit, type = "or")
#> Error in `contrast()`:
#> ! A nested case-control design is reported on the hazard-ratio scale.
#> ℹ Risk-set (incidence-density) sampling identifies the hazard ratio (OR = HR exactly; Prentice & Breslow 1978). Use `type = "hr"` (the default).

As with the other case-control designs, a marginal risk difference or risk ratio needs a source-population prevalence q0 and a case-control-weighted estimator, so it is not available from the conditional fit. The conditional fit also reports the information-matrix interval only, so ci_method = "sandwich" and ci_method = "bootstrap" are declined.

Covered combinations

Legend. ✅ truth-pinned in tests · ⛔ rejected with an informative error.

Sampling | Estimator | Estimand | Status
sample_ncc() risk-set draw (+ match / entry) | | NCC sample from a cohort |
m:1 risk-set | clogit | conditional hazard ratio |
m:1 risk-set + confounder | clogit | conditional hazard ratio |
risk-set + effect_modifier | clogit | stratum-specific hazard ratio |
nested | clogit | odds ratio | ⛔ use type = "hr"
nested | clogit | RD / RR | ⛔ need q0
nested, non-binary outcome | | |

See FEATURE_COVERAGE_MATRIX.md for the authoritative status of every combination.

References

Prentice RL, Breslow NE (1978). Retrospective studies and failure time models. Biometrika 65(1):153-158.

Goldstein L, Langholz B (1992). Asymptotic theory for nested case-control sampling in the Cox regression model. Annals of Statistics 20(4):1903-1928.

Thomas DC (1977). Addendum to “Methods of cohort analysis: appraisal by application to asbestos mining” by Liddell FDK, McDonald JC, Thomas DC. Journal of the Royal Statistical Society A 140(4):469-491.