Methodological triangulation: g-computation, IPW, AIPW, matching, and SNM

When multiple estimation methods agree on the same causal effect, we gain confidence that the result is not an artefact of a single modelling assumption. This vignette compares all five methods on the NHEFS dataset:

Data: NHEFS

All examples use the NHEFS dataset (Hernán & Robins, 2025). The assumed causal structure is:

  • Treatment (\(A\)): qsmk — quit smoking between 1971 and 1982 (binary: 0/1).
  • Outcome (\(Y\)): wt82_71 — weight change in kg (continuous), or gained_weight (\(\mathbf{1}\{Y > 0\}\), binary).
  • Confounders (\(L\)): sex, age, race, education, smoking intensity, years smoked, exercise, physical activity, and baseline weight.

\[ \begin{aligned} L &\sim F_L \\ A &\sim \text{Bernoulli}\!\bigl(\text{logit}^{-1}(\alpha_0 + \alpha_L^\top L)\bigr) \\ Y &= \beta_0 + \beta_A A + \beta_L^\top L + \varepsilon \end{aligned} \]

The confounders \(L\) are common causes of both \(A\) and \(Y\). Under this structure, g-computation adjusts for \(L\) via an outcome model, IPW adjusts via a treatment model, and matching balances \(L\) between treatment groups. When all four methods agree, the result is unlikely to be an artefact of a single modelling assumption.

Code
library(causatr)
library(tinyplot)
data("nhefs")

nhefs_complete <- nhefs[complete.cases(nhefs), ]
nhefs_complete$gained_weight <- as.integer(nhefs_complete$wt82_71 > 0)
nhefs_complete$sex <- factor(
  nhefs_complete$sex,
  levels = 0:1,
  labels = c("Male", "Female")
)

confounders <- ~ sex + age + I(age^2) + race + factor(education) +
  smokeintensity + I(smokeintensity^2) + smokeyrs + I(smokeyrs^2) +
  factor(exercise) + factor(active) + wt71 + I(wt71^2)

Continuous outcome: weight change

Fitting all four methods

Code
fit_gc <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = confounders,
  censoring = "censored",
  model_fn = stats::glm
)

fit_ipw <- causat(
  nhefs_complete,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = confounders,
  estimator = "ipw",
  model_fn = stats::glm,
  propensity_model_fn = stats::glm
)

fit_aipw <- causat(
  nhefs_complete,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = confounders,
  estimator = "aipw",
  model_fn = stats::glm,
  propensity_model_fn = stats::glm
)

fit_m <- causat(
  nhefs_complete,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = confounders,
  estimator = "matching",
  estimand = "ATT"
)

Contrasts

Code
ivs <- list(quit = static(1), continue = static(0))

res_gc   <- contrast(fit_gc,   ivs, reference = "continue")
#> 117 row(s) with NA predictions excluded from the target population.
res_ipw  <- contrast(fit_ipw,  ivs, reference = "continue")
res_aipw <- contrast(fit_aipw, ivs, reference = "continue")
res_m    <- contrast(fit_m,    ivs, reference = "continue")

results <- rbind(
  cbind(method = "G-computation", res_gc$contrasts),
  cbind(method = "IPW",           res_ipw$contrasts),
  cbind(method = "AIPW",          res_aipw$contrasts),
  cbind(method = "Matching (ATT)", res_m$contrasts)
)
results
#>            method       comparison estimate        se ci_lower ci_upper
#>            <char>           <char>    <num>     <num>    <num>    <num>
#> 1:  G-computation quit vs continue 3.462484 0.4677686 2.545674 4.379294
#> 2:            IPW quit vs continue 3.499174 0.4902045 2.538391 4.459958
#> 3:           AIPW quit vs continue 3.477009 0.4828865 2.530569 4.423449
#> 4: Matching (ATT) quit vs continue 3.341010 0.5586286 2.246118 4.435902
NoteWhy AIPW SE may be smaller

When both the outcome and propensity models are correctly specified, AIPW attains the semiparametric efficiency bound — its standard error is no larger than either plain g-computation or plain IPW. In applied data the gain is often modest but never negative.

Forest plot

Code
tinyplot(
  estimate ~ method,
  data = results,
  type = "pointrange",
  ymin = results$ci_lower,
  ymax = results$ci_upper,
  xlab = "Method",
  ylab = "ATE: weight change (kg)",
  main = "Triangulation: continuous outcome"
)
abline(h = 0, lty = 2, col = "grey40")

Forest plot comparing g-computation, IPW, AIPW, and matching estimates of the effect of quitting smoking on weight change.

Binary outcome: gained weight

Code
fit_gc_bin <- causat(
  nhefs_complete,
  outcome = "gained_weight",
  treatment = "qsmk",
  confounders = confounders,
  family = "binomial",
  model_fn = stats::glm
)

fit_ipw_bin <- causat(
  nhefs_complete,
  outcome = "gained_weight",
  treatment = "qsmk",
  confounders = confounders,
  estimator = "ipw",
  model_fn = stats::glm,
  propensity_model_fn = stats::glm
)

fit_aipw_bin <- causat(
  nhefs_complete,
  outcome = "gained_weight",
  treatment = "qsmk",
  confounders = confounders,
  estimator = "aipw",
  family = "binomial",
  model_fn = stats::glm,
  propensity_model_fn = stats::glm
)

fit_m_bin <- causat(
  nhefs_complete,
  outcome = "gained_weight",
  treatment = "qsmk",
  confounders = confounders,
  estimator = "matching",
  estimand = "ATT"
)

res_gc_bin   <- contrast(fit_gc_bin,   ivs, reference = "continue")
res_ipw_bin  <- contrast(fit_ipw_bin,  ivs, reference = "continue")
res_aipw_bin <- contrast(fit_aipw_bin, ivs, reference = "continue")
res_m_bin    <- contrast(fit_m_bin,    ivs, reference = "continue")

results_bin <- rbind(
  cbind(method = "G-computation", res_gc_bin$contrasts),
  cbind(method = "IPW",           res_ipw_bin$contrasts),
  cbind(method = "AIPW",          res_aipw_bin$contrasts),
  cbind(method = "Matching (ATT)", res_m_bin$contrasts)
)
results_bin
#>            method       comparison  estimate         se   ci_lower  ci_upper
#>            <char>           <char>     <num>      <num>      <num>     <num>
#> 1:  G-computation quit vs continue 0.1326922 0.02366249 0.08631463 0.1790699
#> 2:            IPW quit vs continue 0.1379464 0.02498002 0.08898648 0.1869064
#> 3:           AIPW quit vs continue 0.1361954 0.02487782 0.08743580 0.1849550
#> 4: Matching (ATT) quit vs continue 0.1488834 0.03168677 0.08677844 0.2109883
Code
tinyplot(
  estimate ~ method,
  data = results_bin,
  type = "pointrange",
  ymin = results_bin$ci_lower,
  ymax = results_bin$ci_upper,
  xlab = "Method",
  ylab = "Risk difference (gained weight)",
  main = "Triangulation: binary outcome"
)
abline(h = 0, lty = 2, col = "grey40")

Forest plot comparing g-computation, IPW, AIPW, and matching on the binary outcome (gained weight).

Diagnostics

Before interpreting results, check that the assumptions hold. diagnose() provides positivity checks, covariate balance, and method-specific summaries.

Code
diag_ipw <- diagnose(fit_ipw)
#> Note: `s.d.denom` not specified; assuming "pooled".
diag_m <- diagnose(fit_m)

IPW balance

Code
plot(diag_ipw)
#> Note: `s.d.denom` not specified; assuming "pooled".

Love plot for IPW covariate balance in the triangulation analysis.

Matching balance

Code
plot(diag_m)

Love plot for matching covariate balance in the triangulation analysis.

Cross-method effect modification

All four point-treatment methods support effect modification via A:modifier interaction terms in confounders. When the four methods agree on stratum-specific treatment effects, this gives strong evidence that the effect heterogeneity is real rather than an artefact of model specification.

Here we add qsmk:sex to examine whether the effect of quitting smoking on weight change differs by sex. Each method handles the interaction differently (g-comp feeds it to the outcome model; IPW expands the MSM to Y ~ 1 + sex; matching expands to Y ~ qsmk + sex + qsmk:sex), but all should recover the same stratum-specific estimates under correct specification.

Code
confounders_em <- ~ sex + age + I(age^2) + race + factor(education) +
  smokeintensity + I(smokeintensity^2) + smokeyrs + I(smokeyrs^2) +
  factor(exercise) + factor(active) + wt71 + I(wt71^2) +
  qsmk:sex

fit_gc_em <- causat(
  nhefs_complete,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = confounders_em,
  model_fn = stats::glm
)
fit_ipw_em <- causat(
  nhefs_complete,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = confounders_em,
  estimator = "ipw",
  model_fn = stats::glm,
  propensity_model_fn = stats::glm
)
fit_aipw_em <- causat(
  nhefs_complete,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = confounders_em,
  estimator = "aipw",
  model_fn = stats::glm,
  propensity_model_fn = stats::glm
)
fit_m_em <- causat(
  nhefs_complete,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = confounders_em,
  estimator = "matching"
)

res_gc_em <- contrast(
  fit_gc_em, ivs,
  reference = "continue",
  ci_method = "sandwich",
  by = "sex"
)
res_ipw_em <- contrast(
  fit_ipw_em, ivs,
  reference = "continue",
  ci_method = "sandwich",
  by = "sex"
)
res_aipw_em <- contrast(
  fit_aipw_em, ivs,
  reference = "continue",
  ci_method = "sandwich",
  by = "sex"
)
res_m_em <- contrast(
  fit_m_em, ivs,
  reference = "continue",
  ci_method = "sandwich",
  by = "sex"
)

results_em <- rbind(
  cbind(method = "G-computation", res_gc_em$contrasts),
  cbind(method = "IPW",           res_ipw_em$contrasts),
  cbind(method = "AIPW",          res_aipw_em$contrasts),
  cbind(method = "Matching",      res_m_em$contrasts)
)
results_em
#>           method       comparison estimate        se ci_lower ci_upper     by
#>           <char>           <char>    <num>     <num>    <num>    <num> <char>
#> 1: G-computation quit vs continue 3.576000 0.6174533 2.365814 4.786186   Male
#> 2: G-computation quit vs continue 3.333079 0.6924483 1.975906 4.690253 Female
#> 3:           IPW quit vs continue 3.568991 0.6329339 2.328464 4.809519   Male
#> 4:           IPW quit vs continue 3.430123 0.7769946 1.907242 4.953005 Female
#> 5:          AIPW quit vs continue 3.437330 0.6259255 2.210539 4.664122   Male
#> 6:          AIPW quit vs continue 3.511083 0.7378491 2.064925 4.957240 Female
#> 7:      Matching quit vs continue 3.057178 0.5785409 1.923259 4.191097   Male
#> 8:      Matching quit vs continue 4.178475 0.5785409 3.044555 5.312394 Female
#>     n_by
#>    <int>
#> 1:   762
#> 2:   804
#> 3:   762
#> 4:   804
#> 5:   762
#> 6:   804
#> 7:   762
#> 8:   804
Code
tinyplot(
  estimate ~ interaction(method, by),
  data = results_em,
  type = "pointrange",
  ymin = results_em$ci_lower,
  ymax = results_em$ci_upper,
  xlab = "Method x Sex",
  ylab = "ATE: weight change (kg)",
  main = "Triangulation: effect modification by sex"
)
abline(h = 0, lty = 2, col = "grey40")

Forest plot comparing g-computation, IPW, AIPW, and matching on sex-stratified ATE.

Agreement across methods within each stratum provides evidence that the sex-specific effects are real. Disagreement would point to model misspecification in one or more methods.

ImportantEffect modifier must be a baseline variable under IPW, AIPW, and matching

Under IPW, AIPW, and matching, the modifier in A:modifier must be a baseline (pre-treatment) variable. A time-varying modifier in a marginal structural model conditions on a post-treatment variable, which biases the estimand via mediator and collider paths (Robins, Hernán, Brumback 2000). G-computation has no such restriction because it models the outcome, not the treatment.

For time-varying effect modification (modifiers affected by prior treatment), use estimator = "snm" — see vignette("snm").

causatr does not enforce baseline-ness at runtime (time-varying status isn’t inferable from data alone). This is a user responsibility.

When to use which method

Criterion G-computation IPW AIPW SNM Matching
Model required Outcome Treatment Both Treatment Treatment
Doubly robust No No Yes No No
Semiparametrically efficient No No Yes (when both correct) No No
Non-static interventions Yes Yes (shift, scale_by, dynamic, IPSI) Yes (same as IPW) No (treatment_values = only) No (static only)
Continuous treatment Yes Yes (GPS) Yes (shift, scale_by) Yes No
Categorical (k > 2) Yes Yes Yes Pending No
Longitudinal (time-varying) Yes (ICE) Yes Yes (ICE-AIPW) Yes (backward g-estimation) No
Time-varying effect modification Requires correct high-dim outcome model Biased Biased Yes (headline use case) Biased
Variance estimation Sandwich or bootstrap Sandwich or bootstrap Sandwich or bootstrap Sandwich (bootstrap pending) Cluster-robust sandwich

Recommendation:

  • Use AIPW when you have credible models for both the outcome and treatment and want protection against misspecification in either one.
  • Use g-computation when you want to exploit complex outcome-model structure (e.g. threshold() interventions, GAM nuisances) not supported by the density-ratio engine.
  • Use IPW when the treatment model is the primary scientific object and you want to avoid specifying an outcome model.
  • Use SNM when your research question involves effect modification by a variable that is itself affected by treatment — the other methods produce biased results in this setting. See vignette("snm").
  • Use matching for ATT estimation with balance diagnostics as the primary product.

When all methods agree, the result is unlikely to be an artefact of any single modelling assumption. Disagreement points to misspecification — investigate further with diagnose() and covariate balance checks.

For features marked “No”, causatr aborts at fit or contrast time with a pointer to the planning doc rather than silently returning a wrong answer — see FEATURE_COVERAGE_MATRIX.md for the full rejection-path inventory.

Separate confounders for double robustness

AIPW is consistent when either the outcome model or the propensity model is correctly specified. To exploit this, you can pass different covariate sets to each model via confounders_outcome and confounders_treatment:

Code
fit_dr <- causat(
  nhefs_complete,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders_outcome = ~ sex + age + I(age^2) + race +
    factor(education) + smokeintensity + I(smokeintensity^2) +
    smokeyrs + I(smokeyrs^2) + factor(exercise) + factor(active) +
    wt71 + I(wt71^2),
  confounders_treatment = ~ sex + age + race + factor(education) +
    smokeintensity + smokeyrs + factor(exercise) + factor(active) + wt71,
  estimator = "aipw",
  propensity_model_fn = stats::glm
)

The same per-component formulas are available for every estimator (though g-comp only uses the outcome side and IPW only uses the treatment side). Additional per-component formulas exist for the censoring model (confounders_censoring), the transportability sampling model (confounders_sampling), and time-varying confounders (confounders_tv_outcome, confounders_tv_treatment).

The legacy confounders argument still works as a shorthand that sets all components at once.

References

Hernán MA, Robins JM (2025). Causal Inference: What If. Chapman & Hall/CRC.