G-computation with causatr

G-computation (the parametric g-formula) estimates causal effects by fitting a model for the outcome conditional on treatment and confounders, then standardising predictions over the target population under each hypothetical intervention. This vignette demonstrates g-computation in causatr using the NHEFS dataset from Hernán & Robins (2025), covering every supported combination of treatment type, outcome type, contrast scale, and inference method for time-fixed treatments.

Data: NHEFS

The examples in this vignette use the NHEFS (National Health and Nutrition Examination Survey — Epidemiologic Follow-up Study) dataset, the running example in Hernán & Robins (2025). The assumed causal structure is:

  • Treatment (\(A\)): qsmk — quit smoking between 1971 and 1982 (binary: 0/1).
  • Outcome (\(Y\)): wt82_71 — weight change in kg over the same period (continuous). A derived binary version, gained_weight (\(\mathbf{1}\{Y > 0\}\)), is used in the binary outcome sections.
  • Confounders (\(L\)): sex, age, race, education, smoking intensity (cigarettes/day), years smoked, exercise, physical activity, and baseline weight. These jointly affect both the probability of quitting and the subsequent weight change.
  • Censoring: the censored column flags individuals lost to follow-up. Censored rows are excluded from model fitting; predictions are standardised over the full target population.

Structural causal model (assumed):

\[ \begin{aligned} L &\sim F_L \\ A &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(\alpha_0 + \alpha_L^\top L)\bigr) \\ Y &= \beta_0 + \beta_A A + \beta_L^\top L + \beta_{AL}^\top (A \cdot L) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2) \end{aligned} \]

The confounders \(L\) are common causes of \(A\) and \(Y\); adjusting for \(L\) blocks all backdoor paths, identifying \(\mathbb{E}[Y^a]\) via the g-formula.

Code
library(causatr)
library(tinytable)
library(tinyplot)

data("nhefs")

nhefs_complete <- nhefs[!is.na(nhefs$wt82_71), ]

nhefs_complete$gained_weight <- as.integer(nhefs_complete$wt82_71 > 0)

nhefs$sex <- factor(nhefs$sex, levels = 0:1, labels = c("Male", "Female"))
nhefs_complete$sex <- factor(
  nhefs_complete$sex,
  levels = 0:1,
  labels = c("Male", "Female")
)

We create a binary outcome gained_weight (1 if weight increased, 0 otherwise) for the binary outcome examples below.

Binary treatment, continuous outcome

This is the core example from Chapter 13 of Hernán & Robins: the average causal effect of quitting smoking (qsmk) on weight change (wt82_71). By default, confounders specifies the same covariate set for all model components. Use confounders_outcome to specify a different set for the outcome model when needed (e.g. for AIPW double robustness).

ATE with sandwich SE

Code
fit_gc <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = ~ sex + age + I(age^2) + race + factor(education) +
    smokeintensity + I(smokeintensity^2) + smokeyrs + I(smokeyrs^2) +
    factor(exercise) + factor(active) + wt71 + I(wt71^2) +
    qsmk:smokeintensity,
  censoring = "censored",
  model_fn = stats::glm
)

res_ate_sw <- contrast(
  fit_gc,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  type = "difference",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_ate_sw

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
quit 5.176 0.437 4.32 6.032
continue 1.66 0.219 1.23 2.09
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 3.516 0.479 2.577 4.455

The book reports ATE \(\approx\) 3.5 kg (95% CI: 2.6, 4.5).

ATE with bootstrap SE

Code
res_ate_bs <- contrast(
  fit_gc,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  type = "difference",
  ci_method = "bootstrap",
  n_boot = 50L
)
#> 117 row(s) with NA predictions excluded from the target population.
res_ate_bs

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: bootstrap  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
quit 5.176 0.466 4.263 6.089
continue 1.66 0.222 1.226 2.094
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 3.516 0.473 2.588 4.443

Sandwich and bootstrap SEs should be in close agreement for correctly specified GLMs.

ATT estimand

The average treatment effect on the treated (ATT) averages only over individuals who actually quit smoking. With g-computation, the estimand can be changed in contrast() without refitting.

Code
res_att <- contrast(
  fit_gc,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  estimand = "ATT",
  ci_method = "sandwich"
)
#> 36 row(s) with NA predictions excluded from the target population.
res_att

Estimator: gcomp  ·  Estimand: ATT  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 428

Intervention means
intervention estimate se ci_lower ci_upper
quit 4.366 0.438 3.508 5.225
continue 0.931 0.286 0.369 1.492
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 3.436 0.464 2.525 4.346

ATC estimand

The average treatment effect on the controls (ATC) averages over those who continued smoking.

Code
res_atc <- contrast(
  fit_gc,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  estimand = "ATC",
  ci_method = "sandwich"
)
#> 81 row(s) with NA predictions excluded from the target population.
res_atc

Estimator: gcomp  ·  Estimand: ATC  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1201

Intervention means
intervention estimate se ci_lower ci_upper
quit 5.465 0.452 4.579 6.35
continue 1.92 0.22 1.49 2.351
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 3.544 0.487 2.59 4.499

Subset estimand

A custom subgroup: effect among individuals aged 50 or older.

Code
res_sub <- contrast(
  fit_gc,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  subset = quote(age >= 50),
  ci_method = "sandwich"
)
#> 20 row(s) with NA predictions excluded from the target population.
res_sub

Estimator: gcomp  ·  Estimand: subset  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 565

Intervention means
intervention estimate se ci_lower ci_upper
quit 2.596 0.476 1.662 3.53
continue -0.874 0.329 -1.518 -0.23
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 3.47 0.47 2.549 4.39

Extracting results programmatically

Code
coef(res_ate_sw)
#>     quit continue 
#> 5.175939 1.660268
confint(res_ate_sw)
#>             lower    upper
#> quit     4.319957 6.031921
#> continue 1.230070 2.090465

Comparing estimands

Code
est_df <- data.frame(
  estimand = c("ATE", "ATT", "ATC", "Subset (age >= 50)"),
  estimate = c(
    res_ate_sw$contrasts$estimate[1],
    res_att$contrasts$estimate[1],
    res_atc$contrasts$estimate[1],
    res_sub$contrasts$estimate[1]
  ),
  ci_lower = c(
    res_ate_sw$contrasts$ci_lower[1],
    res_att$contrasts$ci_lower[1],
    res_atc$contrasts$ci_lower[1],
    res_sub$contrasts$ci_lower[1]
  ),
  ci_upper = c(
    res_ate_sw$contrasts$ci_upper[1],
    res_att$contrasts$ci_upper[1],
    res_atc$contrasts$ci_upper[1],
    res_sub$contrasts$ci_upper[1]
  )
)

tinyplot(
  estimate ~ estimand,
  data = est_df,
  type = "pointrange",
  ymin = est_df$ci_lower,
  ymax = est_df$ci_upper,
  xlab = "Estimand",
  ylab = "Effect on weight change (kg)",
  main = "G-computation estimates by estimand"
)
abline(h = 0, lty = 2, col = "grey40")

Point estimates and confidence intervals for ATE, ATT, ATC, and subset estimands from g-computation.

Binary treatment, binary outcome

Using the gained_weight indicator as a binary outcome, we estimate the risk difference, risk ratio, and odds ratio of quitting smoking on gaining weight.

Risk difference (sandwich)

Code
fit_bin <- causat(
  nhefs_complete,
  outcome = "gained_weight",
  treatment = "qsmk",
  confounders = ~ sex + age + I(age^2) + race + factor(education) +
    smokeintensity + I(smokeintensity^2) + smokeyrs + I(smokeyrs^2) +
    factor(exercise) + factor(active) + wt71 + I(wt71^2) +
    qsmk:smokeintensity,
  family = "binomial",
  model_fn = stats::glm
)

res_rd <- contrast(
  fit_bin,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  type = "difference",
  ci_method = "sandwich"
)
#> 113 row(s) with NA predictions excluded from the target population.
res_rd

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1566

Intervention means
intervention estimate se ci_lower ci_upper
quit 0.77 0.02 0.73 0.809
continue 0.638 0.014 0.611 0.666
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 0.131 0.024 0.084 0.178

Risk difference (bootstrap)

Code
res_rd_bs <- contrast(
  fit_bin,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  type = "difference",
  ci_method = "bootstrap",
  n_boot = 50L
)
#> 113 row(s) with NA predictions excluded from the target population.
res_rd_bs

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: bootstrap  ·  N: 1566

Intervention means
intervention estimate se ci_lower ci_upper
quit 0.77 0.02 0.731 0.809
continue 0.638 0.013 0.613 0.663
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 0.131 0.024 0.083 0.179

Risk ratio

Code
res_rr <- contrast(
  fit_bin,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  type = "ratio",
  ci_method = "sandwich"
)
#> 113 row(s) with NA predictions excluded from the target population.
res_rr

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: ratio  ·  CI method: sandwich  ·  N: 1566

Intervention means
intervention estimate se ci_lower ci_upper
quit 0.77 0.02 0.73 0.809
continue 0.638 0.014 0.611 0.666
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 1.206 0.04 1.13 1.287

Odds ratio

Code
res_or <- contrast(
  fit_bin,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  type = "or",
  ci_method = "sandwich"
)
#> 113 row(s) with NA predictions excluded from the target population.
res_or

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: or  ·  CI method: sandwich  ·  N: 1566

Intervention means
intervention estimate se ci_lower ci_upper
quit 0.77 0.02 0.73 0.809
continue 0.638 0.014 0.611 0.666
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 1.893 0.238 1.479 2.422

Continuous treatment

G-computation supports continuous treatments with all intervention types: static() (set to a fixed value), shift(), scale_by(), threshold(), dynamic(), and NULL (natural course). Here we use smokeintensity (cigarettes per day) as the treatment and wt82_71 as the outcome.

Code
fit_cont <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "smokeintensity",
  confounders = ~ sex + age + I(age^2) + race + factor(education) +
    smokeyrs + I(smokeyrs^2) + factor(exercise) + factor(active) +
    wt71 + I(wt71^2),
  censoring = "censored",
  model_fn = stats::glm
)

Static intervention

Set smoking intensity to fixed values and compare. This answers the question: what would happen if everyone smoked 5 vs 40 cigarettes per day?

Code
res_static <- contrast(
  fit_cont,
  interventions = list(light = static(5), heavy = static(40)),
  reference = "light",
  type = "difference",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_static

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
light 2.62 0.33 1.972 3.267
heavy 2.458 0.421 1.634 3.283
Contrasts
comparison estimate se ci_lower ci_upper
heavy vs light -0.162 0.632 -1.401 1.078

Shift intervention

Reduce smoking intensity by 10 cigarettes per day for everyone.

Code
res_shift <- contrast(
  fit_cont,
  interventions = list(reduce10 = shift(-10), observed = NULL),
  reference = "observed",
  type = "difference",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_shift

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
reduce10 2.594 0.258 2.089 3.099
observed 2.548 0.201 2.154 2.942
Contrasts
comparison estimate se ci_lower ci_upper
reduce10 vs observed 0.046 0.181 -0.308 0.4

Scale intervention

Halve each individual’s smoking intensity.

Code
res_scale <- contrast(
  fit_cont,
  interventions = list(halved = scale_by(0.5), observed = NULL),
  reference = "observed",
  type = "difference",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_scale

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
halved 2.595 0.261 2.084 3.107
observed 2.548 0.201 2.154 2.942
Contrasts
comparison estimate se ci_lower ci_upper
halved vs observed 0.047 0.186 -0.317 0.411

Threshold intervention

Cap smoking intensity at 20 cigarettes per day for everyone.

Code
res_thresh <- contrast(
  fit_cont,
  interventions = list(cap20 = threshold(0, 20), observed = NULL),
  reference = "observed",
  type = "difference",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_thresh

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
cap20 2.568 0.21 2.157 2.979
observed 2.548 0.201 2.154 2.942
Contrasts
comparison estimate se ci_lower ci_upper
cap20 vs observed 0.021 0.081 -0.138 0.179

Comparing multiple interventions

Code
res_multi <- contrast(
  fit_cont,
  interventions = list(
    light    = static(5),
    reduce10 = shift(-10),
    halved   = scale_by(0.5),
    cap20    = threshold(0, 20),
    observed = NULL
  ),
  reference = "observed",
  type = "difference",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_multi

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
light 2.62 0.33 1.972 3.267
reduce10 2.594 0.258 2.089 3.099
halved 2.595 0.261 2.084 3.107
cap20 2.568 0.21 2.157 2.979
observed 2.548 0.201 2.154 2.942
Contrasts
comparison estimate se ci_lower ci_upper
light vs observed 0.072 0.281 -0.479 0.623
reduce10 vs observed 0.046 0.181 -0.308 0.4
halved vs observed 0.047 0.186 -0.317 0.411
cap20 vs observed 0.021 0.081 -0.138 0.179

Visualising intervention effects

Code
est <- res_multi$contrasts
tinyplot(
  estimate ~ comparison,
  data = est,
  type = "pointrange",
  ymin = est$ci_lower,
  ymax = est$ci_upper,
  xlab = "Comparison",
  ylab = "Difference in weight change (kg)",
  main = "Continuous treatment interventions"
)
abline(h = 0, lty = 2, col = "grey40")

Point estimates and confidence intervals for mean weight change under different smoking intensity interventions.

Dynamic intervention on continuous treatment

A dynamic rule that depends on individual characteristics. Here, heavy smokers (> 20 cigarettes/day) are reduced to 20, while others keep their observed level.

Code
res_dyn_cont <- contrast(
  fit_cont,
  interventions = list(
    capped = dynamic(\(data, trt) pmin(trt, 20)),
    observed = NULL
  ),
  reference = "observed",
  type = "difference",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_dyn_cont

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
capped 2.568 0.21 2.157 2.979
observed 2.548 0.201 2.154 2.942
Contrasts
comparison estimate se ci_lower ci_upper
capped vs observed 0.021 0.081 -0.138 0.179

Continuous treatment with bootstrap

Code
res_shift_bs <- contrast(
  fit_cont,
  interventions = list(reduce10 = shift(-10), observed = NULL),
  reference = "observed",
  type = "difference",
  ci_method = "bootstrap",
  n_boot = 50L
)
#> 117 row(s) with NA predictions excluded from the target population.
res_shift_bs

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: bootstrap  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
reduce10 2.594 0.253 2.098 3.09
observed 2.548 0.199 2.157 2.938
Contrasts
comparison estimate se ci_lower ci_upper
reduce10 vs observed 0.046 0.214 -0.374 0.466

Dynamic intervention

A dynamic intervention assigns treatment based on individual characteristics. Here, we assign quitting smoking only to individuals who smoked more than 20 cigarettes per day at baseline.

Code
res_dyn <- contrast(
  fit_gc,
  interventions = list(
    rule = dynamic(\(data, trt) ifelse(data$smokeintensity > 20, 1, 0)),
    all_quit = static(1)
  ),
  reference = "all_quit",
  type = "difference",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_dyn

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
rule 2.852 0.303 2.259 3.445
all_quit 5.176 0.437 4.32 6.032
Contrasts
comparison estimate se ci_lower ci_upper
rule vs all_quit -2.324 0.341 -2.992 -1.656

Effect modification with by

The by argument stratifies causal effect estimates by levels of a variable. It averages the fitted model’s counterfactual predictions over each subset — so it only produces genuine effect-modification estimates when the outcome model itself contains an interaction between treatment and the modifier. Here we refit the model with a qsmk:sex term and then examine how the effect of quitting smoking differs by sex.

Code
fit_gc_hte <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = ~ sex + age + I(age^2) + race + factor(education) +
    smokeintensity + I(smokeintensity^2) + smokeyrs + I(smokeyrs^2) +
    factor(exercise) + factor(active) + wt71 + I(wt71^2) +
    qsmk:smokeintensity + qsmk:sex,
  censoring = "censored",
  model_fn = stats::glm
)

res_by_sex <- contrast(
  fit_gc_hte,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  type = "difference",
  ci_method = "sandwich",
  by = "sex"
)
#> 65 row(s) with NA predictions excluded from the target population.
#> 52 row(s) with NA predictions excluded from the target population.
res_by_sex

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper by n_by
quit 5.276 0.561 4.177 6.375 Male 799
continue 1.668 0.297 1.087 2.249 Male 799
quit 5.088 0.651 3.813 6.363 Female 830
continue 1.654 0.318 1.029 2.278 Female 830
Contrasts
comparison estimate se ci_lower ci_upper by n_by
quit vs continue 3.608 0.624 2.385 4.831 Male 799
quit vs continue 3.435 0.708 2.048 4.821 Female 830
Note

Without the qsmk:sex term in the outcome model, by = "sex" would still run but would return near-identical estimates for both strata (any remaining variation coming only from indirect interactions like qsmk:smokeintensity combined with differing covariate distributions). All four estimation methods support A:modifier terms: g-comp feeds them to the outcome model directly, IPW expands the MSM to Y ~ 1 + modifier, matching expands to Y ~ A + modifier + A:modifier, and ICE auto-expands the interaction across treatment lags (lag1_A:modifier, etc.).

Tidy and glance

causatr results work with the broom ecosystem via tidy() and glance():

Code
tidy(res_ate_sw)
#>               term estimate std.error     type conf.low conf.high
#> 1 quit vs continue 3.515671 0.4791306 contrast 2.576593   4.45475
glance(res_ate_sw)
#>   estimator estimand contrast_type ci_method    n n_interventions
#> 1     gcomp      ATE    difference  sandwich 1629               2

Forest plot

The plot() method produces a forest plot using the forrest package. Forest plots are most useful when displaying multiple estimates, such as effect modification results:

Code
plot(res_by_sex)

Forest plot of the effect of quitting smoking on weight change, stratified by sex.

Categorical treatment (k > 2 levels)

G-computation handles categorical treatments natively by fitting a single outcome model with the treatment as a factor covariate. contrast() then standardises predictions under each named static() level and pairwise differences are read off against the reference category.

DGM. A three-arm trial with one confounder. The treatment is assigned with fixed probabilities (no confounding of treatment assignment in this simple example; \(L\) is a pure precision variable).

\[ \begin{aligned} L &\sim N(0,1) \\ A &\sim \text{Categorical}(0.40,\; 0.35,\; 0.25) \quad\text{levels } \{A, B, C\} \\ Y &= 2 + 1.5 \cdot \mathbf{1}(A{=}B) - 0.8 \cdot \mathbf{1}(A{=}C) + 0.5\,L + \varepsilon, \quad \varepsilon \sim N(0,1) \end{aligned} \]

True contrasts: \(\mathbb{E}[Y^B] - \mathbb{E}[Y^A] = 1.5\); \(\mathbb{E}[Y^C] - \mathbb{E}[Y^A] = -0.8\).

Code
set.seed(42)
n <- 2000
L <- rnorm(n)
A <- sample(c("A", "B", "C"), n, replace = TRUE, prob = c(0.4, 0.35, 0.25))
Y <- 2 + (A == "B") * 1.5 + (A == "C") * -0.8 + 0.5 * L + rnorm(n)
cat_df <- data.frame(Y = Y, A = factor(A, levels = c("A", "B", "C")), L = L)

fit_cat <- causat(
  cat_df,
  outcome = "Y",
  treatment = "A",
  confounders = ~ L,
  model_fn = stats::glm
)

res_cat <- contrast(
  fit_cat,
  interventions = list(
    arm_a = static("A"),
    arm_b = static("B"),
    arm_c = static("C")
  ),
  reference = "arm_a",
  type = "difference",
  ci_method = "sandwich"
)
res_cat

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
arm_a 1.959 0.038 1.884 2.034
arm_b 3.481 0.039 3.404 3.557
arm_c 1.185 0.045 1.096 1.274
Contrasts
comparison estimate se ci_lower ci_upper
arm_b vs arm_a 1.521 0.052 1.419 1.624
arm_c vs arm_a -0.775 0.057 -0.886 -0.663

Structural truth: B − A = 1.5, C − A = −0.8. The sandwich CIs should cover both.

Multivariate (joint) treatment

Pass treatment = c("A1", "A2") for a joint intervention on two or more treatment variables. Interventions are supplied as named sub-lists, one entry per treatment column, and each sub-intervention is independent (the rules can be different types — e.g. static() for one, shift() for another).

DGM. Two treatments with separate confounders. \(A_1\) is binary (confounded by \(L_1\)); \(A_2\) is continuous (confounded by \(L_2\)). The outcome depends additively on both treatments.

\[ \begin{aligned} L_1,\; L_2 &\sim N(0,1) \\ A_1 &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(0.3\,L_1)\bigr) \\ A_2 &= 1 + 0.4\,L_2 + \eta, \quad \eta \sim N(0,1) \\ Y &= 1 + 2\,A_1 + 0.5\,A_2 + L_1 + L_2 + \varepsilon, \quad \varepsilon \sim N(0,1) \end{aligned} \]

True contrast: \(\mathbb{E}[Y^{A_1=1,\,A_2=2}] - \mathbb{E}[Y^{A_1=0,\,A_2=0}] = 2 \times 1 + 0.5 \times 2 = 3.0\).

Code
set.seed(1)
n <- 2000
L1 <- rnorm(n)
L2 <- rnorm(n)
A1 <- rbinom(n, 1, plogis(0.3 * L1))
A2 <- 1 + 0.4 * L2 + rnorm(n)
Y <- 1 + 2 * A1 + 0.5 * A2 + L1 + L2 + rnorm(n)
mv_df <- data.frame(Y = Y, A1 = A1, A2 = A2, L1 = L1, L2 = L2)

fit_mv <- causat(
  mv_df,
  outcome = "Y",
  treatment = c("A1", "A2"),
  confounders = ~ L1 + L2,
  model_fn = stats::glm
)

res_mv <- contrast(
  fit_mv,
  interventions = list(
    both_on  = list(A1 = static(1), A2 = static(2)),
    both_off = list(A1 = static(0), A2 = static(0))
  ),
  reference = "both_off",
  type = "difference"
)
res_mv

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
both_on 3.975 0.05 3.876 4.073
both_off 1.004 0.05 0.906 1.102
Contrasts
comparison estimate se ci_lower ci_upper
both_on vs both_off 2.971 0.061 2.852 3.09

Structural truth: both_on − both_off = 2*1 + 0.5*2 = 3.0.

External (survey) weights

Pre-computed weights (survey weights, calibration weights, externally-fit IPCW) are passed via weights = and validated upfront by check_weights() (NA / Inf / negative / mis-sized vectors are rejected with a specific error). The sandwich engine propagates the weights through both Channel 1 (target-population weighting) and Channel 2 (IWLS weighted score).

DGM. A simple confounded binary treatment with \(\text{ATE} = 3\). The survey weights \(w_i \sim \text{Uniform}(0.5, 2)\) are deliberately uninformative — they carry no information about the outcome — to verify that the IF correctly propagates weights without introducing bias.

\[ \begin{aligned} L &\sim N(0,1) \\ A &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(0.5\,L)\bigr) \\ Y &= 2 + 3\,A + 1.5\,L + \varepsilon, \quad \varepsilon \sim N(0,1) \\ w &\sim \text{Uniform}(0.5,\; 2.0) \end{aligned} \]

Code
set.seed(2)
n <- 2000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * L))
Y <- 2 + 3 * A + 1.5 * L + rnorm(n)
# A random survey weight unrelated to outcome: verifies the IF correctly
# propagates weights without introducing bias when the weights carry no
# real information.
w <- runif(n, 0.5, 2.0)
sv_df <- data.frame(Y = Y, A = A, L = L)

fit_sv <- causat(
  sv_df,
  outcome = "Y",
  treatment = "A",
  confounders = ~ L,
  weights = w,
  model_fn = stats::glm
)

res_sv <- contrast(
  fit_sv,
  interventions = list(a1 = static(1), a0 = static(0)),
  reference = "a0",
  type = "difference"
)
res_sv

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
a1 5.071 0.048 4.977 5.164
a0 2.118 0.048 2.025 2.211
Contrasts
comparison estimate se ci_lower ci_upper
a1 vs a0 2.953 0.047 2.861 3.045

The weighted ATE estimate is still \(\approx\) 3 (the structural truth), and the sandwich SE reflects both the outcome-model and weighted-target sampling uncertainty.

Direct survey::svydesign integration

causat(weights = ...) also accepts a survey::svydesign object directly. The sampling weights are extracted via stats::weights() and the design’s first-stage PSU is auto-adopted as the contrast-time cluster (see the next section for what that means for the sandwich). Explicit cluster = at causat() or contrast() overrides the auto-adoption; single-PSU designs (svydesign(ids = ~1, ...)) propagate only weights.

Code
library(survey)
des <- svydesign(ids = ~psu, weights = ~pw, data = d)
fit <- causat(d, outcome = "Y", treatment = "A",
              confounders = ~ L, weights = des,
              model_fn = stats::glm)
# weights + cluster both applied automatically
contrast(fit, list(a1 = static(1), a0 = static(0)))

Cluster-robust sandwich

For design clusters (site, household, PSU, repeated measures) the sandwich variance should account for within-cluster comovement of the per-individual influence functions. causat(cluster = "col") preserves the cluster column through prepare_data() and stashes its name on the fit; contrast() then defaults to that cluster and runs the sum-within-cluster-then-square aggregation (vcov_from_if(cluster = ...), Liang & Zeger 1986). Equivalent to sandwich::vcovCL(type = "HC0") applied to the final predict-then-average step, up to a small \(G/(G-1)\) cluster-count factor. Pass cluster = ... at contrast() time to override the default or switch clustering off (cluster = NULL).

Code
# cluster at fit time -- auto-propagates to contrast
fit_cl <- causat(d, outcome = "Y", treatment = "A",
                 confounders = ~ L, cluster = "site",
                 model_fn = stats::glm)
contrast(fit_cl, list(a1 = static(1), a0 = static(0)))

# or at contrast time, column name or direct vector
contrast(fit, list(a1 = static(1), a0 = static(0)),
         cluster = "site")
contrast(fit, list(a1 = static(1), a0 = static(0)),
         cluster = d$site)

Poisson count outcome

For rate / count outcomes, pass family = "poisson". The outcome model becomes a log-link GLM and contrast(..., type = "ratio") returns the marginal rate ratio.

DGM. A binary treatment with a Poisson count outcome. The outcome’s conditional mean is log-linear in \(A\) and \(L\), so the conditional rate ratio is \(\exp(0.7) \approx 2.01\).

\[ \begin{aligned} L &\sim N(0,1) \\ A &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(0.5\,L)\bigr) \\ Y &\sim \text{Poisson}\!\bigl(\exp(0.5 + 0.7\,A + 0.3\,L)\bigr) \end{aligned} \]

True marginal rate ratio: \(\approx \exp(0.7) \approx 2.01\).

Code
set.seed(3)
n <- 2000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * L))
Y <- rpois(n, lambda = exp(0.5 + 0.7 * A + 0.3 * L))
pois_df <- data.frame(Y = Y, A = A, L = L)

fit_pois <- causat(
  pois_df,
  outcome = "Y",
  treatment = "A",
  confounders = ~ L,
  family = "poisson",
  model_fn = stats::glm
)

res_rr_pois <- contrast(
  fit_pois,
  interventions = list(a1 = static(1), a0 = static(0)),
  reference = "a0",
  type = "ratio",
  ci_method = "sandwich"
)
res_rr_pois

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: ratio  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
a1 3.428 0.061 3.309 3.547
a0 1.712 0.047 1.62 1.804
Contrasts
comparison estimate se ci_lower ci_upper
a1 vs a0 2.002 0.063 1.882 2.13

Structural truth: marginal RR \(\approx\) exp(0.7) \(\approx\) 2.01.

Gamma outcome

For strictly positive continuous outcomes (e.g. costs, durations), the Gamma family with log link avoids predicting negative values. The marginal rate ratio is a natural summary.

DGM. A binary treatment with a Gamma-distributed positive outcome. The conditional mean is log-linear: \(E[Y \mid A, L] = \exp(1 + 0.5\,A + 0.3\,L)\), so the conditional rate ratio is \(\exp(0.5) \approx 1.65\).

\[ \begin{aligned} L &\sim N(0,1) \\ A &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(0.5\,L)\bigr) \\ Y &\sim \text{Gamma}\!\bigl(\text{shape}=4,\; \text{rate}=4\,/\,\exp(1 + 0.5\,A + 0.3\,L)\bigr) \end{aligned} \]

True marginal rate ratio: \(\approx \exp(0.5) \approx 1.65\).

Code
set.seed(4)
n <- 2000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * L))
Y <- rgamma(n, shape = 4, rate = 4 / exp(1 + 0.5 * A + 0.3 * L))
gamma_df <- data.frame(Y = Y, A = A, L = L)

fit_gamma <- causat(
  gamma_df,
  outcome = "Y",
  treatment = "A",
  confounders = ~ L,
  family = "Gamma",
  model_fn = stats::glm
)

res_gamma <- contrast(
  fit_gamma,
  interventions = list(a1 = static(1), a0 = static(0)),
  reference = "a0",
  type = "ratio",
  ci_method = "sandwich"
)
res_gamma

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: ratio  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
a1 4.428 0.075 4.281 4.575
a0 2.785 0.053 2.682 2.889
Contrasts
comparison estimate se ci_lower ci_upper
a1 vs a0 1.59 0.039 1.516 1.668

Structural truth: marginal RR \(\approx\) exp(0.5) \(\approx\) 1.65.

Code
tt(tidy(res_gamma), digits = 3)
term estimate std.error type conf.low conf.high
a1 vs a0 1.59 0.0388 contrast 1.52 1.67

Quasibinomial outcome

When the outcome is a proportion in (0, 1) — for example a bounded continuous score — or an over-dispersed binary response, family = "quasibinomial" uses a logit link without assuming binomial variance. Sandwich SEs handle the quasi-likelihood correctly.

DGM. A confounded binary treatment with a binary outcome generated from a logistic model. Using quasibinomial instead of binomial relaxes the strict variance assumption.

\[ \begin{aligned} L &\sim N(0,1) \\ A &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(0.3\,L)\bigr) \\ Y &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(-0.5 + 0.6\,A + 0.4\,L)\bigr) \end{aligned} \]

Code
set.seed(5)
n <- 2000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(0.3 * L))
Y <- rbinom(n, 1, plogis(-0.5 + 0.6 * A + 0.4 * L))
quasi_df <- data.frame(Y = Y, A = A, L = L)

fit_quasi <- causat(
  quasi_df,
  outcome = "Y",
  treatment = "A",
  confounders = ~ L,
  family = "quasibinomial",
  model_fn = stats::glm
)

res_quasi <- contrast(
  fit_quasi,
  interventions = list(a1 = static(1), a0 = static(0)),
  reference = "a0",
  type = "difference",
  ci_method = "sandwich"
)
res_quasi

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
a1 0.533 0.016 0.502 0.564
a0 0.369 0.016 0.338 0.399
Contrasts
comparison estimate se ci_lower ci_upper
a1 vs a0 0.164 0.022 0.121 0.207
Code
tt(tidy(res_quasi), digits = 3)
term estimate std.error type conf.low conf.high
a1 vs a0 0.164 0.0221 contrast 0.121 0.207

Negative binomial outcome

For count outcomes with overdispersion (variance exceeds the mean), a negative binomial model is more appropriate than Poisson. Pass model_fn = MASS::glm.nb — note that family is not needed because glm.nb estimates its own dispersion parameter internally.

DGM. A confounded binary treatment with a count outcome from a negative binomial distribution.

\[ \begin{aligned} L &\sim N(2,1) \\ A &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(-1 + 0.5\,L)\bigr) \\ Y &\sim \text{NB}\!\bigl(\mu = \exp(0.5 + 0.3\,A + 0.4\,L),\; \theta = 2\bigr) \end{aligned} \]

The marginal rate ratio is \(\exp(0.3) \approx 1.35\) (the \(L\) term cancels in the ratio).

Code
set.seed(77)
n <- 2000
L <- rnorm(n, 2, 1)
A <- rbinom(n, 1, plogis(-1 + 0.5 * L))
Y <- rnbinom(n, mu = exp(0.5 + 0.3 * A + 0.4 * L), size = 2)
nb_df <- data.frame(Y = Y, A = A, L = L)

fit_nb <- causat(
  nb_df,
  outcome = "Y",
  treatment = "A",
  confounders = ~ L,
  model_fn = MASS::glm.nb
)

res_nb <- contrast(
  fit_nb,
  interventions = list(a1 = static(1), a0 = static(0)),
  reference = "a0",
  type = "ratio",
  ci_method = "sandwich"
)
res_nb

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: ratio  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
a1 5.39 0.151 5.095 5.686
a0 4.133 0.131 3.877 4.389
Contrasts
comparison estimate se ci_lower ci_upper
a1 vs a0 1.304 0.051 1.208 1.408
Code
tt(tidy(res_nb), digits = 3)
term estimate std.error type conf.low conf.high
a1 vs a0 1.3 0.0511 contrast 1.21 1.41

Beta regression outcome

For outcomes bounded in (0, 1) — proportions, rates, indices — beta regression models the conditional mean on the logit scale with a Beta-distributed error. Pass model_fn = betareg::betareg and family = "beta".

DGM. A confounded binary treatment with a beta-distributed outcome.

\[ \begin{aligned} L &\sim N(0,1) \\ A &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(-0.5 + 0.3\,L)\bigr) \\ \mu &= \text{logit}^{-1}(0.2 + 0.5\,A + 0.3\,L) \\ Y &\sim \text{Beta}(\mu\,\phi,\; (1-\mu)\,\phi), \quad \phi = 10 \end{aligned} \]

Code
set.seed(78)
n <- 2000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(-0.5 + 0.3 * L))
mu <- plogis(0.2 + 0.5 * A + 0.3 * L)
Y <- rbeta(n, mu * 10, (1 - mu) * 10)
beta_df <- data.frame(Y = Y, A = A, L = L)

fit_beta <- causat(
  beta_df,
  outcome = "Y",
  treatment = "A",
  confounders = ~ L,
  model_fn = betareg::betareg,
  family = "beta"
)

res_beta <- contrast(
  fit_beta,
  interventions = list(a1 = static(1), a0 = static(0)),
  reference = "a0",
  type = "or",
  ci_method = "sandwich"
)
res_beta

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: or  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
a1 0.665 0.005 0.654 0.675
a0 0.547 0.005 0.538 0.556
Contrasts
comparison estimate se ci_lower ci_upper
a1 vs a0 1.64 0.047 1.55 1.736

The odds ratio is valid because the outcome is in (0, 1):

Code
tt(tidy(res_beta), digits = 3)
term estimate std.error type conf.low conf.high
a1 vs a0 1.64 0.0475 contrast 1.55 1.74

GLM with splines

For flexible confounder adjustment without the overhead of a full GAM, use splines::ns() (natural cubic splines) directly in the formula. This keeps the model as a plain glm, so the analytic sandwich path fires — unlike GAM, no mgcv dependency is needed.

Code
fit_spline <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = ~ sex + splines::ns(age, 3) + race + factor(education) +
    splines::ns(smokeintensity, 3) + splines::ns(smokeyrs, 3) +
    factor(exercise) + factor(active) + splines::ns(wt71, 3),
  censoring = "censored",
  model_fn = stats::glm
)

res_spline <- contrast(
  fit_spline,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_spline

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
quit 5.119 0.421 4.293 5.945
continue 1.654 0.219 1.225 2.083
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 3.465 0.467 2.549 4.38

GAM model via model_fn

Pass mgcv::gam instead of stats::glm for flexible nonlinear confounder adjustment using splines.

Code
fit_gam <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = ~ sex + s(age) + race + factor(education) +
    s(smokeintensity) + s(smokeyrs) + factor(exercise) +
    factor(active) + s(wt71),
  censoring = "censored",
  model_fn = mgcv::gam
)

res_gam <- contrast(
  fit_gam,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue",
  ci_method = "sandwich"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_gam

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
quit 5.093 22.128 -38.277 48.464
continue 1.67 11.029 -19.947 23.287
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 3.424 24.924 -45.426 52.273
NoteGAM sandwich SEs can be larger than GLM SEs

The sandwich variance for GAMs uses the penalised Bayesian covariance (model$Vp) as the bread inverse, which accounts for the smoothing penalty. This can produce noticeably larger SEs than GLM-based models (polynomial or spline), especially when the effective degrees of freedom are high relative to the sample size. If the GAM SE looks disproportionately large, consider using bootstrap inference (ci_method = "bootstrap") or switching to a GLM with splines::ns() terms, which stays on the standard (unpenalised) analytic sandwich path.

WarningMachine learning outcome models require debiased estimation

causatr’s model_fn argument accepts any GLM or GAM, but not machine learning models (random forests, gradient boosting, neural networks). Two reasons:

  1. Bias. Plug-in ML predictions for the g-formula are biased at \(\sqrt{n}\) rate due to overfitting (regularisation bias / Dorn bias). Consistent estimation requires cross-fitting + a bias-correction step (one-step estimator, TMLE, or SDR), which causatr does not implement.
  2. Variance. The sandwich variance estimator requires the parametric likelihood score of the outcome model. ML models do not have one, so causatr cannot compute standard errors.

If you need flexible (nonparametric) outcome models, use the lmtp package, which implements TMLE and SDR with Super Learner stacks and supports the same intervention types (shift, ipsi, etc.). See Hernán & Robins (2025), Ch. 18 and van der Laan & Rose (2011) for the theory behind debiased ML.

Comparing model specifications

All three approaches — standard GLM with polynomial terms, GLM with natural splines, and GAM — target the same causal parameter. Differences in point estimates reflect model flexibility; all three should agree closely when the outcome model is correctly specified.

Code
model_comp <- data.frame(
  Model = c("GLM (polynomial)", "GLM (splines)", "GAM"),
  Estimate = c(
    res_ate_sw$contrasts$estimate[1],
    res_spline$contrasts$estimate[1],
    res_gam$contrasts$estimate[1]
  ),
  SE = c(
    res_ate_sw$contrasts$se[1],
    res_spline$contrasts$se[1],
    res_gam$contrasts$se[1]
  ),
  CI_lower = c(
    res_ate_sw$contrasts$ci_lower[1],
    res_spline$contrasts$ci_lower[1],
    res_gam$contrasts$ci_lower[1]
  ),
  CI_upper = c(
    res_ate_sw$contrasts$ci_upper[1],
    res_spline$contrasts$ci_upper[1],
    res_gam$contrasts$ci_upper[1]
  )
)

tt(model_comp, digits = 3)
Model Estimate SE CI_lower CI_upper
GLM (polynomial) 3.52 0.479 2.58 4.45
GLM (splines) 3.46 0.467 2.55 4.38
GAM 3.42 24.924 -45.43 52.27
Code
tinyplot(
  Estimate ~ Model,
  data = model_comp,
  type = "pointrange",
  ymin = model_comp$CI_lower,
  ymax = model_comp$CI_upper,
  ylab = "ATE (kg)",
  main = "Model specification comparison (NHEFS)"
)
abline(h = 0, lty = 2, col = "grey40")

Forest plot comparing ATE estimates from three model specifications.

Stochastic interventions

stochastic() defines a randomised intervention where the counterfactual treatment for each individual is drawn from a user-supplied distribution. The g-formula evaluates \(E[Y^g]\) via Monte Carlo integration: for each of n_mc draws, the sampler assigns counterfactual treatments, the outcome model predicts, and the predictions are averaged across draws.

Stochastic interventions are only supported under g-computation (point and longitudinal). See vignette("interventions") for a full comparison of intervention types and estimator compatibility.

Binary treatment: covariate-dependent randomisation

Assign quitting with a probability that depends on baseline smoking intensity (heavier smokers are more likely to be assigned to quit):

Code
set.seed(42)
res_stoch <- contrast(
  fit_gc,
  interventions = list(
    random_quit = stochastic(
      \(data, trt) rbinom(nrow(data), 1,
                          plogis(-1 + 0.05 * data$smokeintensity)),
      n_mc = 200L
    ),
    all_quit = static(1)
  ),
  reference = "all_quit",
  type = "difference"
)
#> 117 row(s) with NA predictions excluded from the target population.
res_stoch

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1629

Intervention means
intervention estimate se ci_lower ci_upper
random_quit 3.506 0.278 2.961 4.052
all_quit 5.176 0.437 4.32 6.032
Contrasts
comparison estimate se ci_lower ci_upper
random_quit vs all_quit -1.67 0.228 -2.116 -1.223

The stochastic policy’s marginal mean is between the “all quit” and “no one quits” extremes: heavier smokers are more likely to quit, but no one is forced.

Continuous treatment: random shift

Rather than a fixed shift(-10), add a random perturbation to each person’s smoking intensity:

Code
set.seed(42)
fit_cont_stoch <- causat(
  nhefs_complete,
  outcome = "wt82_71",
  treatment = "smokeintensity",
  confounders = ~ sex + age + I(age^2) + race + factor(education) +
    smokeyrs + I(smokeyrs^2) + factor(exercise) + factor(active) +
    wt71 + I(wt71^2),
  model_fn = stats::glm
)

res_stoch_cont <- contrast(
  fit_cont_stoch,
  interventions = list(
    noisy_reduce = stochastic(
      \(data, trt) pmax(0, trt + rnorm(length(trt), mean = -5, sd = 2)),
      n_mc = 200L
    ),
    observed = NULL
  ),
  reference = "observed",
  type = "difference"
)
#> 113 row(s) with NA predictions excluded from the target population.
res_stoch_cont

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1566

Intervention means
intervention estimate se ci_lower ci_upper
noisy_reduce 2.66 0.209 2.25 3.071
observed 2.638 0.199 2.248 3.028
Contrasts
comparison estimate se ci_lower ci_upper
noisy_reduce vs observed 0.022 0.086 -0.147 0.191
Tip

Choosing n_mc: 100–500 is typical. The sandwich SE accounts for the MC draws used at estimation time, so there is no need to inflate n_mc just for inference. If point estimates visibly change when you re-run with a different seed, increase n_mc.

Summary of covered combinations

Legend. ✅ covered and truth-pinned in tests · 🟡 smoke test only · ⛔ rejected with an informative error.

Treatment Outcome Intervention Estimand Contrast Inference Weights Status
Binary Continuous (gaussian, GLM) Static ATE Difference Sandwich none
Binary Continuous (gaussian, GLM) Static ATE Difference Bootstrap none
Binary Continuous (gaussian, GLM) Static ATT Difference Sandwich none
Binary Continuous (gaussian, GLM) Static ATC Difference Sandwich none
Binary Continuous (gaussian, GLM) Static Subset Difference Sandwich none
Binary Continuous (gaussian, GLM) Static ATE (by sex, A:sex in model) Difference Sandwich none
Binary Continuous (gaussian, GLM) Dynamic ATE Difference Sandwich none
Binary Continuous (gaussian, GLM) Static ATE Difference Sandwich survey
Binary Continuous (gaussian, GAM) Static ATE Difference Sandwich none
Binary Binary (binomial, GLM) Static ATE Difference Sandwich none
Binary Binary (binomial, GLM) Static ATE Difference Bootstrap none
Binary Binary (binomial, GLM) Static ATE Ratio Sandwich none
Binary Binary (binomial, GLM) Static ATE OR Sandwich none
Binary Count (poisson, GLM) Static ATE Ratio Sandwich none
Binary Positive continuous (Gamma, GLM) Static ATE Ratio Sandwich none
Binary Binary (quasibinomial, GLM) Static ATE Difference Sandwich none
Binary Continuous (gaussian, GLM+splines) Static ATE Difference Sandwich none
Continuous Continuous (gaussian, GLM) Static ATE Difference Sandwich none
Continuous Continuous (gaussian, GLM) Shift ATE Difference Sandwich none
Continuous Continuous (gaussian, GLM) Scale ATE Difference Sandwich none
Continuous Continuous (gaussian, GLM) Threshold ATE Difference Sandwich none
Continuous Continuous (gaussian, GLM) Dynamic ATE Difference Sandwich none
Continuous Continuous (gaussian, GLM) Shift ATE Difference Bootstrap none
Categorical (k>2) Continuous (gaussian, GLM) Static ATE Difference Sandwich none
Multivariate Continuous (gaussian, GLM) Joint static ATE Difference Sandwich none
Binary Continuous (gaussian, GLM) Stochastic ATE Difference Sandwich none
Continuous Continuous (gaussian, GLM) Stochastic ATE Difference Sandwich none
Binary Count (NB, MASS::glm.nb) Static ATE Ratio Sandwich none
Binary Proportion (beta, betareg::betareg) Static ATE Difference Sandwich none

See FEATURE_COVERAGE_MATRIX.md for the authoritative coverage status of every method × treatment × outcome × intervention × variance combination, including censoring and the numeric two-tier variance fallback for custom model_fn choices.

References

Hernán MA, Robins JM (2025). Causal Inference: What If. Chapman & Hall/CRC. Chapter 13: Standardization and the parametric g-formula.

Díaz I, van der Laan MJ (2012). Population intervention causal effects based on stochastic interventions. Biometrics 68:541–549.