Pool causal estimates across multiply-imputed datasets

Description

Fits a causal model with causat() and computes a causal contrast() on every completed dataset stored in a mice mids object, then pools the per-imputation estimates into a single causatr_result. This is the analysis step of a multiple-imputation (MI) workflow: the user imputes missing covariates and/or treatment upstream with mice::mice(), and causat_mice() propagates the imputation uncertainty into the causal estimate and its standard error.

Multiple imputation is the right tool for missing covariates (L) or missing treatment (A) under a missing-at-random mechanism. Missing outcomes (Y) are handled by inverse-probability-of-censoring weighting (ipcw = TRUE) or complete-case analysis, not by imputing Y. Y should nevertheless be included as a predictor in the upstream imputation model.

Usage

causat_mice(
  imp,
  outcome,
  treatment,
  confounders = NULL,
  interventions = NULL,
  estimator = "gcomp",
  family = "gaussian",
  estimand = "ATE",
  type = "difference",
  ci_method = "sandwich",
  conf_level = 0.95,
  by = NULL,
  pool_method = c("rubin", "boot_mi"),
  B = 200L,
  M = 2L,
  parallel = c("no", "future"),
  seed = NULL,
  ...
)

Arguments

imp A mids object returned by mice::mice().
outcome Character scalar naming the outcome column. Passed to causat().
treatment Character scalar (or vector for multivariate treatment) naming the treatment column(s). Passed to causat().
confounders A one-sided formula of baseline confounders, or NULL when per-component formulas are supplied through ... instead. Passed to causat().
interventions A named list of intervention objects (e.g. list(a1 = static(1), a0 = static(0))). Passed to contrast(). Leave NULL for estimator = “snm”, whose estimand is the blip parameter itself and which rejects an interventions argument.
estimator Character causal estimator: “gcomp” (default), “ipw”, “aipw”, “matching”, or “snm”. Passed to causat().
family Outcome family (character or family object). Passed to causat().
estimand Character estimand (“ATE”, “ATT”, “ATC”). Passed to causat().
type Character contrast scale: “difference” (default), “ratio”, or “or”. Passed to contrast().
ci_method Character within-imputation variance method, “sandwich” (default) or “bootstrap”, used for each per-imputation contrast() call. The pooled variance is governed by pool_method, not this argument.
conf_level Numeric confidence level for the pooled intervals. Default 0.95.
by Optional one-sided formula or character naming a baseline stratifier. Pooling is applied per by-stratum row independently. Passed to contrast().
pool_method Character pooling strategy. “rubin” (default) applies Rubin’s rules to the per-imputation sandwich variances. “boot_mi” uses von Hippel’s bootstrap-then-impute two-stage variance, valid under uncongeniality. See Details.
B Integer number of bootstrap resamples for pool_method = “boot_mi”. Default 200. Ignored for “rubin”.
M Integer number of imputations per bootstrap resample for pool_method = “boot_mi”. Default 2 (von Hippel’s efficient variant). Ignored for “rubin”.
parallel Character parallel backend forwarded to the Boot MI engine: “no” (default) or “future” (uses future.apply::future_lapply()).
seed Optional integer seed. For pool_method = “boot_mi” it seeds the bootstrap-and-impute loop reproducibly.
... Additional arguments forwarded to causat() (e.g. id, time, confounders_tv, censoring, ipcw, confounders_outcome, propensity_model_fn, model_fn).

Details

Rubin’s rules (pool_method = “rubin”)

Let \(\hat{Q}_i\) and \(U_i\) be the estimate and variance from imputation \(i\). The pooled estimate is \(\bar{Q} = m^{-1}\sum_i \hat{Q}_i\) and the total variance is \(T = \bar{U} + (1 + 1/m) B\) with within variance \(\bar{U} = m^{-1}\sum_i U_i\) and between variance \(B = (m-1)^{-1}\sum_i (\hat{Q}_i - \bar{Q})^2\). Confidence intervals use Barnard-Rubin degrees of freedom.
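
The formulas above can be sketched in base R. This is an illustration only, not the code causat_mice() runs: pool_rubin_sketch() and its df_com argument (the complete-data degrees of freedom used in the Barnard-Rubin adjustment) are hypothetical names introduced here, and how causatr chooses the complete-data degrees of freedom is an assumption.

```r
# Hypothetical helper, not part of causatr: pool m per-imputation results.
# q: estimates Q_i; u: variances U_i; df_com: complete-data df (assumed input).
pool_rubin_sketch <- function(q, u, df_com = Inf, conf_level = 0.95) {
  m <- length(q)
  q_bar <- mean(q)                       # pooled estimate
  u_bar <- mean(u)                       # within-imputation variance
  b <- var(q)                            # between-imputation variance, (m - 1) denominator
  t_var <- u_bar + (1 + 1 / m) * b       # total variance T
  lambda <- (1 + 1 / m) * b / t_var      # share of T attributable to missingness
  df_old <- (m - 1) / lambda^2           # large-sample df (Rubin 1987)
  if (is.finite(df_com)) {
    # Barnard-Rubin small-sample adjustment
    df_obs <- (df_com + 1) / (df_com + 3) * df_com * (1 - lambda)
    df <- 1 / (1 / df_old + 1 / df_obs)
  } else {
    df <- df_old
  }
  half <- qt(1 - (1 - conf_level) / 2, df) * sqrt(t_var)
  c(estimate = q_bar, se = sqrt(t_var), df = df,
    ci_lower = q_bar - half, ci_upper = q_bar + half)
}

# Five made-up per-imputation estimates with a common variance of 0.03.
pooled <- pool_rubin_sketch(
  q = c(2.9, 3.1, 3.0, 3.2, 2.8),
  u = rep(0.03, 5),
  df_com = 397
)
pooled
```

In this toy input the between variance is 0.025, so the total variance is 0.03 + 1.2 * 0.025 = 0.06: half of it comes from the imputation step, which is why the pooled standard error is noticeably larger than any single-imputation standard error.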

Congeniality

Causal estimands are typically uncongenial with the mice imputation model (the estimand is a functional of the outcome/treatment model under intervention, not a parameter of the imputation model). Under uncongeniality Rubin’s variance can be biased in either direction depending on the situation – conservative for some kinds of uncongeniality (Meng 1994), but anticonservative in others (Bartlett & Hughes 2020). pool_method = “boot_mi” sidesteps Rubin’s variance decomposition with a resampling variance that attains nominal coverage provided the point estimator stays consistent: a bootstrap corrects the variance, not bias in the estimate itself. Always include the outcome, treatment, all confounders, and any effect modifiers as predictors in the upstream mice() call – omitting a key predictor (e.g. the outcome) misspecifies the imputation, which can bias the causal estimate and so defeat both pooling rules. causat_mice() warns when an analysis variable is absent or unused.
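
To make the two-stage idea concrete, the sketch below pools a B x M matrix of estimates (B bootstrap resamples, each imputed M times) with a one-way random-effects variance decomposition, in the spirit of the approach discussed by Bartlett & Hughes (2020). It is a minimal illustration under stated assumptions: boot_mi_var_sketch() is a hypothetical helper, the estimates are simulated rather than produced by causat() and contrast(), and whether causat_mice() uses exactly this formula internally is not confirmed by this page.

```r
# Hypothetical helper, not part of causatr.
# est: B x M matrix; row b holds the M estimates from bootstrap resample b.
# Requires M >= 2 so the within-resample mean square is defined.
boot_mi_var_sketch <- function(est) {
  n_boot <- nrow(est)
  n_imp <- ncol(est)
  row_means <- rowMeans(est)
  grand <- mean(est)                                        # pooled point estimate
  # Mean square within resamples (imputation noise).
  msw <- sum((est - row_means)^2) / (n_boot * (n_imp - 1))
  # Mean square between resamples.
  msb <- n_imp * sum((row_means - grand)^2) / (n_boot - 1)
  # Between-resample variance component, truncated at zero.
  s2_between <- max(0, (msb - msw) / n_imp)
  v <- (1 + 1 / n_boot) * s2_between + msw / (n_boot * n_imp)
  c(estimate = grand, se = sqrt(v))
}

# Simulated stand-in for the per-resample, per-imputation estimates.
set.seed(42)
n_boot <- 200
n_imp <- 2
boot_effect <- rnorm(n_boot, sd = 0.15)                     # resample-level variation
imp_noise <- matrix(rnorm(n_boot * n_imp, sd = 0.05), n_boot, n_imp)
est <- 3 + boot_effect + imp_noise                          # boot_effect recycles down rows
boot_mi_var_sketch(est)
```

Because the imputation noise is averaged over all B * M estimates, a small M is enough as long as B is reasonably large, which is the rationale behind the default M = 2.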

What this does not do

It does not perform the imputation (call mice::mice() first), impute the outcome, handle MNAR mechanisms, or pool omnibus tests across contrasts.
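
Since the imputation happens upstream, a quick check of the mice predictor matrix can confirm that the analysis variables feed the imputation model. The sketch below uses simulated data and made-up variable names (Y, A, L); it relies only on mice::make.predictorMatrix() and mice::mice() and does not call causatr.

```r
if (requireNamespace("mice", quietly = TRUE)) {
  set.seed(2)
  n <- 200
  L <- rnorm(n)
  A <- rbinom(n, 1, plogis(0.5 * L))
  Y <- 2 + 3 * A + 1.5 * L + rnorm(n)
  # Make L missing with a probability that depends on the observed treatment.
  L[rbinom(n, 1, plogis(-1 + 0.8 * A)) == 1] <- NA
  dat <- data.frame(Y = Y, A = A, L = L)

  # Default predictor matrix: every variable predicts every other variable.
  # Rows are the variables being imputed, columns are the predictors.
  pred <- mice::make.predictorMatrix(dat)
  stopifnot(all(pred["L", c("Y", "A")] == 1))

  imp <- mice::mice(dat, m = 5, predictorMatrix = pred, printFlag = FALSE)

  # Predictors retained for L after mice's own checks.
  used <- imp$predictorMatrix["L", ]
  used
}
```

If a custom predictor matrix is supplied, the same row lookup shows at a glance whether the outcome or treatment has been dropped from the imputation model for a confounder.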

Value

A causatr_result with pooled estimates, standard errors, and confidence intervals. Its reported CI method is the pooling method used (“rubin” or “boot_mi”), not the within-imputation ci_method argument. The per-row pooling diagnostics are attached as the “mi_details” attribute.

References

Rubin DB (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.

van Buuren S, Groothuis-Oudshoorn K (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software 45(3):1-67.

von Hippel PT (2020). How many imputations do you need? Sociological Methods & Research 49(3):699-718.

Meng XL (1994). Multiple-imputation inferences with uncongenial sources of input. Statistical Science 9(4):538-558.

Bartlett JW, Hughes RA (2020). Bootstrap inference for multiple imputation under uncongeniality and misspecification. Statistical Methods in Medical Research 29(12):3533-3546.

See Also

causat(), contrast()

Examples

library("causatr")

if (requireNamespace("mice", quietly = TRUE)) {
  set.seed(1)
  n <- 400
  L <- rnorm(n)
  A <- rbinom(n, 1, plogis(0.5 * L))
  Y <- 2 + 3 * A + 1.5 * L + rnorm(n)
  # L missing-at-random on the (observed) treatment.
  L[rbinom(n, 1, plogis(-1 + 0.8 * A)) == 0] <- NA
  dat <- data.frame(Y = Y, A = A, L = L)

  imp <- mice::mice(dat, m = 5, printFlag = FALSE)
  res <- causat_mice(
    imp,
    outcome = "Y",
    treatment = "A",
    confounders = ~L,
    interventions = list(a1 = static(1), a0 = static(0)),
    estimator = "gcomp"
  )
  summary(res)
}
<causatr_result>
 Estimator: G-computation
 Estimand:  ATE
 Contrast:  Difference
 CI method: rubin
 N:         400

Intervention means:
   intervention estimate    se ci_lower ci_upper
         <char>    <num> <num>    <num>    <num>
1:           a1     5.01 0.129     4.75     5.27
2:           a0     2.01 0.124     1.77     2.26

Contrasts:
   comparison estimate    se ci_lower ci_upper
       <char>    <num> <num>    <num>    <num>
1:   a0 vs a1       -3 0.171    -3.37    -2.62

Intervention details:
  a1: static, value = 1
  a0: static, value = 0

Variance-covariance matrix of marginal means:
          [,1]      [,2]
[1,] 0.0166584 0.0000000
[2,] 0.0000000 0.0153196
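
A variant of the example with pool_method = "boot_mi" might look as follows. It is a sketch built only from the arguments documented above; B is lowered from its default to keep the run short, the seed value is arbitrary, and no output is shown because none was run for this page.

```r
if (requireNamespace("causatr", quietly = TRUE) &&
    requireNamespace("mice", quietly = TRUE)) {
  set.seed(1)
  n <- 400
  L <- rnorm(n)
  A <- rbinom(n, 1, plogis(0.5 * L))
  Y <- 2 + 3 * A + 1.5 * L + rnorm(n)
  L[rbinom(n, 1, plogis(-1 + 0.8 * A)) == 0] <- NA
  dat <- data.frame(Y = Y, A = A, L = L)

  imp <- mice::mice(dat, m = 5, printFlag = FALSE)

  res_boot <- causatr::causat_mice(
    imp,
    outcome = "Y",
    treatment = "A",
    confounders = ~L,
    interventions = list(a1 = causatr::static(1), a0 = causatr::static(0)),
    estimator = "gcomp",
    pool_method = "boot_mi",  # bootstrap-then-impute pooling
    B = 50L,                  # fewer resamples than the default 200, for speed
    M = 2L,                   # imputations per bootstrap resample
    seed = 123                # arbitrary; makes the resampling loop reproducible
  )
  summary(res_boot)

  # Per-row pooling diagnostics described under Value.
  attr(res_boot, "mi_details")
}
```

Comparing the standard errors from this call with those from the default "rubin" pooling gives a rough indication of how much uncongeniality matters for the analysis at hand.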