Fit a case-control-type design

Description

matcha() is the fit verb for matchatr, mirroring causatr::causat() and survatr::surv_fit(). It takes the analysis data, the outcome and exposure roles, a sampling design object, and an estimator, then validates the request and resolves it to an estimation engine. The design and estimator arguments are orthogonal: design selects the sampling structure (strata, time, prevalence q0, inclusion weights), and estimator selects the analysis (conditional vs marginal; odds ratio vs hazard ratio vs risk difference).

Usage

matcha(
  data,
  outcome,
  exposure,
  design,
  confounders = NULL,
  estimator = NULL,
  model_fn = NULL,
  effect_modifier = NULL,
  reference = NULL
)

Arguments

data A data.frame or data.table. Not mutated; a data.table copy is stored on the fit.
outcome Character scalar naming the case-status column. For the binary estimators this is a logical, two-level factor, or numeric 0/1 column; for estimator = "polytomous" it is a factor or character column with three or more groups (multiple case subtypes, or several control groups).
exposure Character scalar naming the exposure column.
design A matchatr_design object from one of the design constructors (unmatched_cc(), matched_cc(), nested_cc(), case_cohort(), two_phase(), counter_matched()).
confounders A one-sided formula of confounders (e.g. ~ age + smoke), or NULL for an unadjusted analysis.
estimator Character scalar naming the analysis, or NULL to use the design’s canonical default. Classical choices are design-specific ("logistic" / "mh" for unmatched CC, "clogit" for matched CC / NCC, "cch" for case-cohort); the case-control-weighted causal estimators "ccw_gformula", "ccw_ipw", "ccw_aipw", "ccw_tmle" apply to any design but require a prevalence q0 on the design.
model_fn Optional model-fitting function for the unmatched case-control logistic engine, with a (formula, family, data) interface. Defaults to stats::glm(); pass e.g. mgcv::gam to adjust for a confounder with a smooth term (confounders = ~ s(age)) while keeping the exposure parametric. Ignored by the other engines.
effect_modifier NULL or a character scalar naming a categorical (logical / character / factor) column whose levels modify the exposure effect. When supplied, the conditional logistic engine fits outcome ~ exposure * effect_modifier + confounders + strata(set) and contrast(type = "or") reports the stratum-specific odds ratio of the exposure within each modifier level (one OR per level, with a Wald interval from the joint partial-likelihood variance). Supported only for estimator = "clogit" with a single-coefficient exposure (binary, continuous, or two-level factor); the modifier may coincide with a matching variable. Defaults to NULL (no effect modification).
reference NULL or a character scalar naming the reference outcome group for estimator = "polytomous". The multinomial logistic model contrasts every other group against this baseline, so the exposure coefficient in each non-reference equation is that subtype’s log odds ratio versus the reference. It must name one of the observed groups; when NULL, the first factor level (or the first level in sorted order, for a character outcome) is used. Supplying it for a non-polytomous estimator is an error. Defaults to NULL.
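
The model described for effect_modifier can be sketched outside matchatr with survival::clogit(). This is a minimal illustration under stated assumptions, not the package's engine: the simulated data, the column name em, and the manual variance arithmetic are hypothetical, and only the formula shape (exposure * effect_modifier + strata(set)) comes from the argument description above.

```r
library(survival)

set.seed(42)
n_sets <- 200

# Hypothetical 1:1 matched data: row i (case) and row n_sets + i (control)
# share matched set i. The modifier `em` is an assumed two-level factor.
dat <- data.frame(
  case = rep(c(1, 0), each = n_sets),
  set  = rep(seq_len(n_sets), times = 2),
  x    = rbinom(2 * n_sets, 1, 0.4),
  em   = factor(sample(c("a", "b"), 2 * n_sets, replace = TRUE))
)

# outcome ~ exposure * effect_modifier + strata(set), here without confounders
fit <- clogit(case ~ x * em + strata(set), data = dat)

b <- coef(fit)
V <- vcov(fit)

# Log odds ratio of x within each level of em:
# level "a" is the main effect, level "b" adds the interaction term.
log_or <- c(a = b[["x"]], b = b[["x"]] + b[["x:emb"]])
se <- c(
  a = sqrt(V["x", "x"]),
  b = sqrt(V["x", "x"] + V["x:emb", "x:emb"] + 2 * V["x", "x:emb"])
)

# One odds ratio per modifier level, with a 95% Wald interval built from
# the joint partial-likelihood variance matrix.
or_table <- data.frame(
  level = names(log_or),
  or    = exp(log_or),
  lower = exp(log_or - qnorm(0.975) * se),
  upper = exp(log_or + qnorm(0.975) * se)
)
print(or_table)
```

The interval for the non-reference level needs the covariance between the main effect and the interaction, which is why the full variance matrix is used rather than the two standard errors alone.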

Details

Weights are never read from or written to data. The design’s weight_spec records the intended scheme; the case-control weights (q0-based, Rose & van der Laan) and design / inclusion-probability weights (Samuelsen, Borgan) are kept in distinct slots on the fit (details$cc_weights, details$design_weights) because their variance consequences differ.
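
To make the q0-based case-control weights concrete, here is a minimal base-R sketch. It assumes the usual form of the Rose & van der Laan weighting (each case weighted q0, each control weighted (1 - q0) / J, with J the number of controls per case) and a hypothetical prevalence q0 = 0.05. It illustrates the idea only and is not matchatr's internal computation; all object names are made up for the example.

```r
set.seed(1)

# Hypothetical unmatched case-control sample
cc <- data.frame(
  case = rep(c(1, 0), each = 50),
  x    = rbinom(100, 1, 0.4),
  age  = rnorm(100, 50, 10)
)

q0 <- 0.05                                   # assumed population prevalence
J  <- sum(cc$case == 0) / sum(cc$case == 1)  # controls per case

# Case-control weights: q0 for cases, (1 - q0) / J for controls
cc_w <- ifelse(cc$case == 1, q0, (1 - q0) / J)

# Weighted outcome regression; quasibinomial avoids the non-integer
# weights warning that binomial() would raise.
fit <- glm(case ~ x + age, family = quasibinomial(), data = cc, weights = cc_w)

# Weighted g-computation: predict under x = 1 and x = 0 for everyone,
# then average with the same weights to get a marginal risk difference.
p1 <- predict(fit, newdata = transform(cc, x = 1), type = "response")
p0 <- predict(fit, newdata = transform(cc, x = 0), type = "response")
risk_difference <- weighted.mean(p1, cc_w) - weighted.mean(p0, cc_w)

# Sanity check: under these weights the cases make up a fraction q0
# of the total weight.
weighted_case_share <- sum(cc_w[cc$case == 1]) / sum(cc_w)
```

Design or inclusion-probability weights would be built differently (from sampling probabilities rather than from q0), which is consistent with keeping the two kinds of weight in separate slots.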

The resolved engine is run as part of the fit. An implemented estimator (the unmatched case-control logistic regression) populates the model slot; an engine whose estimator is not yet wired in leaves it NULL. details$engine records the engine that the (design, estimator) pair resolved to.

Value

A matchatr_fit object: a list with the validated specification (data, outcome, exposure, confounders, design, estimator, engine, effect_modifier), a details list (resolved engine, weighting scheme, reserved variance / weight slots, case and control counts), and the originating call. The model slot holds the fitted estimation object for an implemented engine, or NULL otherwise.

See Also

unmatched_cc(), matched_cc(), nested_cc(), case_cohort()

Examples

library("matchatr")

set.seed(1)
df <- data.frame(
  case  = rep(c(1, 0), each = 50),
  x     = rbinom(100, 1, 0.4),
  age   = rnorm(100, 50, 10),
  smoke = rbinom(100, 1, 0.3),
  set   = rep(1:50, times = 2)
)

# Unmatched case-control -> conditional odds ratio (logistic)
matcha(df, outcome = "case", exposure = "x",
       design = unmatched_cc(), confounders = ~ age + smoke)
#> <matchatr_fit>
#>  Design:     Unmatched case-control
#>  Estimator:  logistic  (engine: glm_logistic)
#>  Outcome:    case
#>  Exposure:   x
#>  Confounders: ~age + smoke
#>  N:          100  (cases: 50, controls: 50)

# Matched case-control -> conditional logistic regression
matcha(df, outcome = "case", exposure = "x",
       design = matched_cc(strata = "set"), estimator = "clogit")
#> <matchatr_fit>
#>  Design:     Matched case-control
#>  Estimator:  clogit  (engine: clogit)
#>  Outcome:    case
#>  Exposure:   x
#>  Confounders: none
#>  N:          100  (cases: 50, controls: 50)
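
The model_fn argument is described as taking any fitter with a (formula, family, data) interface, with mgcv::gam given as the example. The sketch below first calls mgcv::gam() directly to show the kind of model this implies (parametric exposure, smooth confounder), then passes it to matcha() using only the arguments documented above. The second call is guarded because it assumes matchatr is installed and behaves as documented; the data are simulated for illustration.

```r
library(mgcv)

set.seed(1)
df <- data.frame(
  case = rep(c(1, 0), each = 50),
  x    = rbinom(100, 1, 0.4),
  age  = rnorm(100, 50, 10)
)

# Direct call: the exposure x enters linearly, age enters as a smooth term.
gam_fit <- mgcv::gam(case ~ x + s(age), family = binomial(), data = df)
log_or_x <- coef(gam_fit)[["x"]]   # conditional log odds ratio for x

# The same adjustment requested through matcha(), per the Arguments section.
# Assumption: matchatr is available; skipped silently otherwise.
if (requireNamespace("matchatr", quietly = TRUE)) {
  fit <- matchatr::matcha(
    df,
    outcome     = "case",
    exposure    = "x",
    design      = matchatr::unmatched_cc(),
    confounders = ~ s(age),
    model_fn    = mgcv::gam
  )
  print(fit)
}
```

Per the argument description, model_fn applies only to the unmatched case-control logistic engine, so supplying it with a matched design would have no effect.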