Introduction to causatr

causatr provides a unified interface for causal effect estimation via three complementary methods:

When all three methods agree, you can be more confident in your findings. This is called methodological triangulation.

Two-step API

All analyses follow the same two-step pattern:

# Step 1: Fit
fit <- causat(data, outcome = "Y", treatment = "A", confounders = ~ L1 + L2,
              estimator = "gcomp")

# Step 2: Contrast
result <- contrast(fit,
  interventions = list(treated = static(1), control = static(0)),
  reference = "control")

Quick example

The example below uses the NHEFS (National Health and Nutrition Examination Survey — Epidemiologic Follow-up Study) dataset from Hernán & Robins (2025). The assumed causal structure is:

  • Treatment (\(A\)): qsmk — whether the individual quit smoking between baseline (1971) and follow-up (1982). Binary (0 = continued, 1 = quit).
  • Outcome (\(Y\)): wt82_71 — change in body weight (kg) from 1971 to
    1. Continuous.
  • Confounders (\(L\)): a set of baseline covariates — sex, age, race, smoking intensity (cigarettes/day), years of smoking, exercise, physical activity level, and baseline weight — that jointly affect both the decision to quit smoking and the change in weight.
  • Censoring (\(C\)): censored — whether the individual was lost to follow-up. Rows with \(C = 1\) are excluded from model fitting; predictions are standardised over the full target population.

Under the causal DAG, \(L\) is a common cause of both \(A\) and \(Y\). Adjusting for \(L\) blocks all backdoor paths from \(A\) to \(Y\), so the g-formula identifies the average causal effect \(\mathbb{E}[Y^{a=1}] - \mathbb{E}[Y^{a=0}]\).

Code
library(causatr)
data("nhefs")

fit <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = ~ sex + age + race + smokeintensity + smokeyrs +
    factor(exercise) + factor(active) + wt71,
  censoring = "censored"
)

result <- contrast(
  fit,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue"
)
result

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 1746

Intervention means
intervention estimate se ci_lower ci_upper
quit 5.033 0.404 4.241 5.825
continue 1.877 0.209 1.467 2.287
Contrasts
comparison estimate se ci_lower ci_upper
quit vs continue 3.156 0.449 2.276 4.035

Intervention types

causatr supports several intervention types for g-computation:

Function Description Example
static(value) Set treatment to a fixed value static(1)
shift(delta) Add a constant to observed treatment shift(-10)
scale_by(factor) Multiply observed treatment scale_by(0.5)
threshold(lower, upper) Clamp treatment within bounds threshold(0, 20)
dynamic(rule) Apply a user-defined rule dynamic(\(d, a) ifelse(d$L > 0, 1, 0))
NULL Natural course (observed values)

IPW supports static(), shift(), scale_by(), dynamic(), and ipsi(). Matching supports only static() interventions.

Contrast types

Type Meaning
"difference" Risk/mean difference (default)
"ratio" Risk ratio
"or" Odds ratio (binary outcomes)

Inference methods

Method Description
"sandwich" Robust sandwich SE (default, fast)
"bootstrap" Nonparametric bootstrap (resamples full pipeline)

Estimands

Estimand Population Applicability
"ATE" All individuals Always (default)
"ATT" Observed treated (A = 1) Binary point treatment only
"ATC" Observed controls (A = 0) Binary point treatment only
subset = quote(...) Custom subgroup Always

For g-computation, the estimand can be changed in contrast() without refitting. For IPW and matching, refit with causat(estimand = ...).

Treatment types

causatr supports binary, continuous, categorical, and multivariate treatments:

Treatment type G-comp IPW Matching
Binary (0/1) Yes Yes Yes
Continuous Yes Yes (GPS) No (MatchIt binary-only)
Categorical (k > 2) Yes Yes No (MatchIt binary-only)
Multivariate (joint) Yes Not yet Not yet

For multivariate treatments (e.g. treatment = c("A1", "A2")), supply interventions as named lists with one element per treatment variable.

Effect modification in the outcome / MSM model itself (e.g. ~ L + sex + A:sex) is currently supported only for g-computation. IPW and matching wrap a saturated Y ~ A MSM around the propensity / matched design, and reject an A:modifier term up front — use estimator = "gcomp" for heterogeneous effects, or by = "modifier" below for stratum-specific summaries of a homogeneous effect. See PHASE_6_INTERACTIONS.md for the planned unified API.

Effect modification

The by argument in contrast() stratifies estimates by levels of a variable, producing subgroup-specific effects:

Code
contrast(fit,
  interventions = list(quit = static(1), continue = static(0)),
  by = "sex"
)

Diagnostics

Use diagnose() to check positivity and covariate balance after fitting:

Code
diag <- diagnose(fit)
diag          # print positivity + balance summary
plot(diag)    # Love plot (requires cobalt)

For IPW and matching, balance is computed via cobalt and includes standardised mean differences before and after adjustment.

Extracting results

causatr result objects support standard R generics:

coef(result)       # named vector of E[Y^a]
confint(result)    # CI matrix (respects custom `level`)
vcov(result)       # vcov of marginal means
tidy(result)       # broom-style tidy data frame
glance(result)     # broom-style one-row summary
plot(result)       # forest plot (requires forrest package)

Learning more

causatr includes detailed vignettes for each estimation method: g-computation, IPW, matching, longitudinal ICE, and methodological triangulation. Use vignette(package = "causatr") to see all available vignettes.