Introduction to causatr

causatr provides a unified interface for causal effect estimation via five complementary methods:

When multiple methods agree, you can be more confident in your findings. This is called methodological triangulation.

Two-step API

All analyses follow the same two-step pattern:

# Step 1: Fit
fit <- causat(data, outcome = "Y", treatment = "A", confounders = ~ L1 + L2,
              estimator = "gcomp", model_fn = stats::glm)

# Step 2: Contrast
result <- contrast(fit,
  interventions = list(treated = static(1), control = static(0)),
  reference = "control")

Quick example

The example below uses the NHEFS (National Health and Nutrition Examination Survey — Epidemiologic Follow-up Study) dataset from Hernán & Robins (2025). The assumed causal structure is:

  • Treatment (\(A\)): qsmk — whether the individual quit smoking between baseline (1971) and follow-up (1982). Binary (0 = continued, 1 = quit).
  • Outcome (\(Y\)): wt82_71 — change in body weight (kg) from 1971 to
    1. Continuous.
  • Confounders (\(L\)): a set of baseline covariates — sex, age, race, smoking intensity (cigarettes/day), years of smoking, exercise, physical activity level, and baseline weight — that jointly affect both the decision to quit smoking and the change in weight.
  • Censoring (\(C\)): censored — whether the individual was lost to follow-up. Rows with \(C = 1\) are excluded from model fitting; predictions are standardised over the full target population.

Under the causal DAG, \(L\) is a common cause of both \(A\) and \(Y\). Adjusting for \(L\) blocks all backdoor paths from \(A\) to \(Y\), so the g-formula identifies the average causal effect \(\mathbb{E}[Y^{a=1}] - \mathbb{E}[Y^{a=0}]\).

Code
library(causatr)
data("nhefs")

fit <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = ~ sex + age + race + smokeintensity + smokeyrs +
    factor(exercise) + factor(active) + wt71,
  censoring = "censored",
  model_fn = stats::glm
)

result <- contrast(
  fit,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue"
)
result

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1746

Intervention means
intervention estimate se ci_lower ci_upper
quit 5.033 0.404 4.241 5.825
continue 1.877 0.209 1.467 2.287
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 3.156 0.449 2.276 4.035

Intervention types

causatr supports several intervention types for g-computation:

Function Description Example
static(value) Set treatment to a fixed value static(1)
shift(delta) Add a constant to observed treatment shift(-10)
scale_by(factor) Multiply observed treatment scale_by(0.5)
threshold(lower, upper) Clamp treatment within bounds threshold(0, 20)
dynamic(rule) Apply a user-defined rule dynamic(\(d, a) ifelse(d$L > 0, 1, 0))
NULL Natural course (observed values)

IPW supports static(), shift(), scale_by(), dynamic(), and ipsi(). AIPW supports the same set as IPW (all except threshold() and stochastic()). Matching supports only static() interventions.

Contrast types

Type Meaning
"difference" Risk/mean difference (default)
"ratio" Risk ratio
"or" Odds ratio (binary outcomes)

Inference methods

Method Description
"sandwich" Robust sandwich SE (default, fast)
"bootstrap" Nonparametric bootstrap (resamples full pipeline)

Estimands

Estimand Population Applicability
"ATE" All individuals Always (default)
"ATT" Observed treated (A = 1) Binary point treatment only
"ATC" Observed controls (A = 0) Binary point treatment only
subset = quote(...) Custom subgroup Always

For g-computation, the estimand can be changed in contrast() without refitting. For IPW and matching, refit with causat(estimand = ...).

Treatment types

causatr supports binary, continuous, categorical, count, and multivariate treatments:

Treatment type G-comp IPW AIPW Matching
Binary (0/1) Yes Yes Yes Yes
Continuous Yes Yes (GPS) Yes (shift, scale_by) No (MatchIt binary-only)
Categorical (k > 2) Yes Yes Yes No (MatchIt binary-only)
Count (Poisson / NB) Yes Yes (propensity_family =) Yes No
Multivariate (joint) Yes Yes (sequential MTP) Yes No

For multivariate treatments (e.g. treatment = c("A1", "A2")), supply interventions as named lists with one element per treatment variable.

Effect modification via A:modifier interaction terms in confounders is supported across all estimation methods. G-comp and AIPW feed the interaction to the outcome model directly; IPW expands the per-intervention MSM to Y ~ 1 + modifier; matching expands to Y ~ A + modifier + A:modifier; and ICE auto-expands the interaction across treatment lags. Use by = "modifier" in contrast() to obtain stratum-specific estimates.

Effect modification

The by argument in contrast() stratifies estimates by levels of a variable, producing subgroup-specific effects:

Code
contrast(fit,
  interventions = list(quit = static(1), continue = static(0)),
  by = "sex"
)

Survey designs and cluster-robust inference

causat() and contrast() support two inference-infrastructure conveniences for design-aware studies.

Sampling weights via survey::svydesign. Pass a design object directly to weights = and causatr unpacks both the sampling weights (via stats::weights()) and the first-stage cluster (PSU):

Code
library(survey)
des <- svydesign(ids = ~psu, weights = ~pw, data = nhefs)
fit <- causat(nhefs, outcome = "wt82_71", treatment = "qsmk",
              confounders = ~ sex + age + wt71, weights = des,
              model_fn = stats::glm)

Cluster-robust sandwich. Use cluster = "col" on causat() (or contrast()) to aggregate per-individual influence functions within-cluster before squaring (Liang & Zeger 1986). Numerically equivalent to sandwich::vcovCL(type = "HC0") applied to the final predict-then-average step. Supported for gcomp / IPW / ICE; rejected for matching (subclass clustering is already structural).

future backend for bootstrap. contrast(parallel = "future") dispatches replicates through future.apply::future_lapply(), so any future::plan() (local multisession, remote cluster, future.batchtools for HPC) is honoured.

Diagnostics

Use diagnose() to check positivity and covariate balance after fitting:

Code
diag <- diagnose(fit)
diag          # print positivity + balance summary
plot(diag)    # Love plot (requires cobalt)

For IPW and matching, balance is computed via cobalt and includes standardised mean differences before and after adjustment.

Extracting results

causatr result objects support standard R generics:

coef(result)       # named vector of E[Y^a]
confint(result)    # CI matrix (respects custom `level`)
vcov(result)       # vcov of marginal means
tidy(result)       # broom-style tidy data frame
glance(result)     # broom-style one-row summary
plot(result)       # forest plot (requires forrest package)

Limitations and known rejections

causatr rejects unsupported combinations with informative errors rather than silently producing wrong answers. Key restrictions:

What Why
Matching requires binary treatment MatchIt does not support continuous/categorical treatments
ATT/ATC only for binary 0/1 treatment The estimand is not defined for other treatment types
threshold() under IPW The pushforward density has point masses at the bounds (mixed measure)
dynamic() on continuous treatment under IPW A deterministic rule on a continuous treatment has no Lebesgue density
ipsi() only under IPW, binary only The IPSI weight formula requires binary treatment and a propensity model
stochastic() only under g-comp The density ratio of a user-defined stochastic policy is intractable
Missing treatment values Cannot define a counterfactual without observed treatment
na.action = na.exclude rejected Causes silent IF corruption via residual padding mismatch
Effect modifier must be baseline under IPW/matching Time-varying modifiers in MSMs bias via collider paths (not enforced at runtime)
Multivariate longitudinal IPW Not yet supported; use ICE g-computation

For the full matrix of supported estimator × treatment × intervention × outcome combinations, see vignette("interventions") and FEATURE_COVERAGE_MATRIX.md.

WarningConfounder selection: use a DAG, not data-driven methods

The confounders = formula in causat() should reflect a pre-specified causal DAG (directed acyclic graph) encoding your assumptions about the data-generating process. Do not select confounders via stepwise regression, LASSO, or other data-driven variable selection. These methods optimise predictive fit, not causal identification, and can: - Include colliders — conditioning on a common effect of treatment and outcome opens a non-causal path, biasing the estimate toward zero or even flipping its sign. - Include instruments — variables that affect treatment but not outcome inflate variance without reducing confounding bias (a pure efficiency loss under correct specification, but can amplify bias from unmeasured confounders). - Include mediators — variables on the causal path from treatment to outcome block part of the effect you are trying to estimate, attenuating it toward the null.

Use dagitty to encode your DAG and query it for the minimal sufficient adjustment set (dagitty::adjustmentSets()). See Hernán & Robins (2025), Ch. 18.

Learning more

causatr includes detailed vignettes for each estimation method: g-computation, IPW, matching, longitudinal ICE, transportability, and methodological triangulation. Use vignette(package = "causatr") to see all available vignettes.