causatr provides a unified interface for causal effect estimation via five complementary methods:
G-computation (parametric g-formula) — fits an outcome model \(E[Y | A, L]\) and standardises predictions over the target population.
Inverse probability weighting (IPW) — reweights the data using propensity scores so that confounders are balanced across treatment groups.
Augmented IPW (AIPW) — combines the outcome model and propensity score into a doubly-robust estimator: consistent if either model is correctly specified, and semiparametrically efficient when both are correct.
Structural nested mean models (SNM) — g-estimation of blip parameters using the treatment model as an instrument. The only method that correctly handles time-varying effect modification (modifiers affected by prior treatment). See vignette("snm").
Propensity score matching — pairs treated and control units with similar propensity scores.
When multiple methods agree, you can be more confident in your findings. This is called methodological triangulation.
The example below uses the NHEFS (National Health and Nutrition Examination Survey — Epidemiologic Follow-up Study) dataset from Hernán & Robins (2025). The assumed causal structure is:
Treatment (\(A\)):qsmk — whether the individual quit smoking between baseline (1971) and follow-up (1982). Binary (0 = continued, 1 = quit).
Outcome (\(Y\)):wt82_71 — change in body weight (kg) from 1971 to
Continuous.
Confounders (\(L\)): a set of baseline covariates — sex, age, race, smoking intensity (cigarettes/day), years of smoking, exercise, physical activity level, and baseline weight — that jointly affect both the decision to quit smoking and the change in weight.
Censoring (\(C\)):censored — whether the individual was lost to follow-up. Rows with \(C = 1\) are excluded from model fitting; predictions are standardised over the full target population.
Under the causal DAG, \(L\) is a common cause of both \(A\) and \(Y\). Adjusting for \(L\) blocks all backdoor paths from \(A\) to \(Y\), so the g-formula identifies the average causal effect \(\mathbb{E}[Y^{a=1}] - \mathbb{E}[Y^{a=0}]\).
Estimator: gcomp · Estimand: ATE · Contrast: difference · CI method: sandwich · N: 1746
Intervention means
intervention
estimate
se
ci_lower
ci_upper
quit
5.033
0.404
4.241
5.825
continue
1.877
0.209
1.467
2.287
Contrasts
comparison
estimate
se
ci_lower
ci_upper
quit vs continue
3.156
0.449
2.276
4.035
Intervention types
causatr supports several intervention types for g-computation:
Function
Description
Example
static(value)
Set treatment to a fixed value
static(1)
shift(delta)
Add a constant to observed treatment
shift(-10)
scale_by(factor)
Multiply observed treatment
scale_by(0.5)
threshold(lower, upper)
Clamp treatment within bounds
threshold(0, 20)
dynamic(rule)
Apply a user-defined rule
dynamic(\(d, a) ifelse(d$L > 0, 1, 0))
NULL
Natural course (observed values)
—
IPW supports static(), shift(), scale_by(), dynamic(), and ipsi(). AIPW supports the same set as IPW (all except threshold() and stochastic()). Matching supports only static() interventions.
Contrast types
Type
Meaning
"difference"
Risk/mean difference (default)
"ratio"
Risk ratio
"or"
Odds ratio (binary outcomes)
Inference methods
Method
Description
"sandwich"
Robust sandwich SE (default, fast)
"bootstrap"
Nonparametric bootstrap (resamples full pipeline)
Estimands
Estimand
Population
Applicability
"ATE"
All individuals
Always (default)
"ATT"
Observed treated (A = 1)
Binary point treatment only
"ATC"
Observed controls (A = 0)
Binary point treatment only
subset = quote(...)
Custom subgroup
Always
For g-computation, the estimand can be changed in contrast() without refitting. For IPW and matching, refit with causat(estimand = ...).
Treatment types
causatr supports binary, continuous, categorical, count, and multivariate treatments:
Treatment type
G-comp
IPW
AIPW
Matching
Binary (0/1)
Yes
Yes
Yes
Yes
Continuous
Yes
Yes (GPS)
Yes (shift, scale_by)
No (MatchIt binary-only)
Categorical (k > 2)
Yes
Yes
Yes
No (MatchIt binary-only)
Count (Poisson / NB)
Yes
Yes (propensity_family =)
Yes
No
Multivariate (joint)
Yes
Yes (sequential MTP)
Yes
No
For multivariate treatments (e.g. treatment = c("A1", "A2")), supply interventions as named lists with one element per treatment variable.
Effect modification via A:modifier interaction terms in confounders is supported across all estimation methods. G-comp and AIPW feed the interaction to the outcome model directly; IPW expands the per-intervention MSM to Y ~ 1 + modifier; matching expands to Y ~ A + modifier + A:modifier; and ICE auto-expands the interaction across treatment lags. Use by = "modifier" in contrast() to obtain stratum-specific estimates.
Effect modification
The by argument in contrast() stratifies estimates by levels of a variable, producing subgroup-specific effects:
causat() and contrast() support two inference-infrastructure conveniences for design-aware studies.
Sampling weights via survey::svydesign. Pass a design object directly to weights = and causatr unpacks both the sampling weights (via stats::weights()) and the first-stage cluster (PSU):
Code
library(survey)des <-svydesign(ids =~psu, weights =~pw, data = nhefs)fit <-causat(nhefs, outcome ="wt82_71", treatment ="qsmk",confounders =~ sex + age + wt71, weights = des,model_fn = stats::glm)
Cluster-robust sandwich. Use cluster = "col" on causat() (or contrast()) to aggregate per-individual influence functions within-cluster before squaring (Liang & Zeger 1986). Numerically equivalent to sandwich::vcovCL(type = "HC0") applied to the final predict-then-average step. Supported for gcomp / IPW / ICE; rejected for matching (subclass clustering is already structural).
future backend for bootstrap.contrast(parallel = "future") dispatches replicates through future.apply::future_lapply(), so any future::plan() (local multisession, remote cluster, future.batchtools for HPC) is honoured.
Diagnostics
Use diagnose() to check positivity and covariate balance after fitting:
For IPW and matching, balance is computed via cobalt and includes standardised mean differences before and after adjustment.
Extracting results
causatr result objects support standard R generics:
coef(result) # named vector of E[Y^a]confint(result) # CI matrix (respects custom `level`)vcov(result) # vcov of marginal meanstidy(result) # broom-style tidy data frameglance(result) # broom-style one-row summaryplot(result) # forest plot (requires forrest package)
Limitations and known rejections
causatr rejects unsupported combinations with informative errors rather than silently producing wrong answers. Key restrictions:
What
Why
Matching requires binary treatment
MatchIt does not support continuous/categorical treatments
ATT/ATC only for binary 0/1 treatment
The estimand is not defined for other treatment types
threshold() under IPW
The pushforward density has point masses at the bounds (mixed measure)
dynamic() on continuous treatment under IPW
A deterministic rule on a continuous treatment has no Lebesgue density
ipsi() only under IPW, binary only
The IPSI weight formula requires binary treatment and a propensity model
stochastic() only under g-comp
The density ratio of a user-defined stochastic policy is intractable
Missing treatment values
Cannot define a counterfactual without observed treatment
na.action = na.exclude rejected
Causes silent IF corruption via residual padding mismatch
Effect modifier must be baseline under IPW/matching
Time-varying modifiers in MSMs bias via collider paths (not enforced at runtime)
Multivariate longitudinal IPW
Not yet supported; use ICE g-computation
For the full matrix of supported estimator × treatment × intervention × outcome combinations, see vignette("interventions") and FEATURE_COVERAGE_MATRIX.md.
WarningConfounder selection: use a DAG, not data-driven methods
The confounders = formula in causat() should reflect a pre-specified causal DAG (directed acyclic graph) encoding your assumptions about the data-generating process. Do not select confounders via stepwise regression, LASSO, or other data-driven variable selection. These methods optimise predictive fit, not causal identification, and can: - Include colliders — conditioning on a common effect of treatment and outcome opens a non-causal path, biasing the estimate toward zero or even flipping its sign. - Include instruments — variables that affect treatment but not outcome inflate variance without reducing confounding bias (a pure efficiency loss under correct specification, but can amplify bias from unmeasured confounders). - Include mediators — variables on the causal path from treatment to outcome block part of the effect you are trying to estimate, attenuating it toward the null.
Use dagitty to encode your DAG and query it for the minimal sufficient adjustment set (dagitty::adjustmentSets()). See Hernán & Robins (2025), Ch. 18.
Learning more
causatr includes detailed vignettes for each estimation method: g-computation, IPW, matching, longitudinal ICE, transportability, and methodological triangulation. Use vignette(package = "causatr") to see all available vignettes.
Source Code
---title: "Introduction to causatr"code-fold: showcode-tools: truevignette: > %\VignetteIndexEntry{Introduction to causatr} %\VignetteEngine{quarto::html} %\VignetteEncoding{UTF-8}---```{r}#| include: falseknitr::opts_chunk$set(collapse =TRUE, comment ="#>")```causatr provides a unified interface for causal effect estimation via fivecomplementary methods:- **G-computation** (parametric g-formula) — fits an outcome model $E[Y | A, L]$ and standardises predictions over the target population.- **Inverse probability weighting (IPW)** — reweights the data using propensity scores so that confounders are balanced across treatment groups.- **Augmented IPW (AIPW)** — combines the outcome model and propensity score into a doubly-robust estimator: consistent if *either* model is correctly specified, and semiparametrically efficient when both are correct.- **Structural nested mean models (SNM)** — g-estimation of blip parameters using the treatment model as an instrument. The only method that correctly handles **time-varying effect modification** (modifiers affected by prior treatment). See `vignette("snm")`.- **Propensity score matching** — pairs treated and control units with similar propensity scores.When multiple methods agree, you can be more confident in your findings.This is called **methodological triangulation**.## Two-step APIAll analyses follow the same two-step pattern:```r# Step 1: Fitfit <-causat(data, outcome ="Y", treatment ="A", confounders =~ L1 + L2,estimator ="gcomp", model_fn = stats::glm)# Step 2: Contrastresult <-contrast(fit,interventions =list(treated =static(1), control =static(0)),reference ="control")```## Quick exampleThe example below uses the **NHEFS** (National Health and NutritionExamination Survey — Epidemiologic Follow-up Study) dataset from Hernán &Robins (2025). The assumed causal structure is:- **Treatment ($A$):** `qsmk` — whether the individual quit smoking between baseline (1971) and follow-up (1982). Binary (0 = continued, 1 = quit).- **Outcome ($Y$):** `wt82_71` — change in body weight (kg) from 1971 to 1982. Continuous.- **Confounders ($L$):** a set of baseline covariates — sex, age, race, smoking intensity (cigarettes/day), years of smoking, exercise, physical activity level, and baseline weight — that jointly affect both the decision to quit smoking and the change in weight.- **Censoring ($C$):** `censored` — whether the individual was lost to follow-up. Rows with $C = 1$ are excluded from model fitting; predictions are standardised over the full target population.Under the causal DAG, $L$ is a common cause of both $A$ and $Y$. Adjustingfor $L$ blocks all backdoor paths from $A$ to $Y$, so the g-formulaidentifies the average causal effect $\mathbb{E}[Y^{a=1}] - \mathbb{E}[Y^{a=0}]$.```{r}#| message: falselibrary(causatr)data("nhefs")fit <-causat( nhefs,outcome ="wt82_71",treatment ="qsmk",confounders =~ sex + age + race + smokeintensity + smokeyrs +factor(exercise) +factor(active) + wt71,censoring ="censored",model_fn = stats::glm)result <-contrast( fit,interventions =list(quit =static(1), continue =static(0)),reference ="continue")result```## Intervention typescausatr supports several intervention types for g-computation:| Function | Description | Example ||---|---|---||`static(value)`| Set treatment to a fixed value |`static(1)`||`shift(delta)`| Add a constant to observed treatment |`shift(-10)`||`scale_by(factor)`| Multiply observed treatment |`scale_by(0.5)`||`threshold(lower, upper)`| Clamp treatment within bounds |`threshold(0, 20)`||`dynamic(rule)`| Apply a user-defined rule |`dynamic(\(d, a) ifelse(d$L > 0, 1, 0))`||`NULL`| Natural course (observed values) | — |IPW supports `static()`, `shift()`, `scale_by()`, `dynamic()`, and `ipsi()`.AIPW supports the same set as IPW (all except `threshold()` and `stochastic()`).Matching supports only `static()` interventions.## Contrast types| Type | Meaning ||---|---||`"difference"`| Risk/mean difference (default) ||`"ratio"`| Risk ratio ||`"or"`| Odds ratio (binary outcomes) |## Inference methods| Method | Description ||---|---||`"sandwich"`| Robust sandwich SE (default, fast) ||`"bootstrap"`| Nonparametric bootstrap (resamples full pipeline) |## Estimands| Estimand | Population | Applicability ||---|---|---||`"ATE"`| All individuals | Always (default) ||`"ATT"`| Observed treated (A = 1) | Binary point treatment only ||`"ATC"`| Observed controls (A = 0) | Binary point treatment only ||`subset = quote(...)`| Custom subgroup | Always |For g-computation, the estimand can be changed in `contrast()` withoutrefitting. For IPW and matching, refit with `causat(estimand = ...)`.## Treatment typescausatr supports binary, continuous, categorical, count, and multivariate treatments:| Treatment type | G-comp | IPW | AIPW | Matching ||---|---|---|---|---|| Binary (0/1) | Yes | Yes | Yes | Yes || Continuous | Yes | Yes (GPS) | Yes (shift, scale\_by) | No (MatchIt binary-only) || Categorical (k > 2) | Yes | Yes | Yes | No (MatchIt binary-only) || Count (Poisson / NB) | Yes | Yes (`propensity_family =`) | Yes | No || Multivariate (joint) | Yes | Yes (sequential MTP) | Yes | No |For multivariate treatments (e.g. `treatment = c("A1", "A2")`), supplyinterventions as named lists with one element per treatment variable.Effect modification via `A:modifier` interaction terms in `confounders` issupported across all estimation methods. G-comp and AIPW feed the interactionto the outcome model directly; IPW expands the per-intervention MSM to`Y ~ 1 + modifier`; matching expands to `Y ~ A + modifier + A:modifier`;and ICE auto-expands the interaction across treatment lags. Use`by = "modifier"` in `contrast()` to obtain stratum-specific estimates.## Effect modificationThe `by` argument in `contrast()` stratifies estimates by levels of avariable, producing subgroup-specific effects:```{r}#| eval: falsecontrast(fit,interventions =list(quit =static(1), continue =static(0)),by ="sex")```## Survey designs and cluster-robust inference`causat()` and `contrast()` support two inference-infrastructureconveniences for design-aware studies.**Sampling weights via `survey::svydesign`.** Pass a design objectdirectly to `weights =` and causatr unpacks both the sampling weights(via `stats::weights()`) and the first-stage cluster (PSU):```{r}#| eval: falselibrary(survey)des <-svydesign(ids =~psu, weights =~pw, data = nhefs)fit <-causat(nhefs, outcome ="wt82_71", treatment ="qsmk",confounders =~ sex + age + wt71, weights = des,model_fn = stats::glm)```**Cluster-robust sandwich.** Use `cluster = "col"` on `causat()` (or`contrast()`) to aggregate per-individual influence functionswithin-cluster before squaring (Liang & Zeger 1986). Numericallyequivalent to `sandwich::vcovCL(type = "HC0")` applied to the finalpredict-then-average step. Supported for gcomp / IPW / ICE; rejectedfor matching (subclass clustering is already structural).**`future` backend for bootstrap.** `contrast(parallel = "future")`dispatches replicates through `future.apply::future_lapply()`, so any`future::plan()` (local multisession, remote cluster,`future.batchtools` for HPC) is honoured.## DiagnosticsUse `diagnose()` to check positivity and covariate balance after fitting:```{r}#| eval: falsediag <-diagnose(fit)diag # print positivity + balance summaryplot(diag) # Love plot (requires cobalt)```For IPW and matching, balance is computed via[cobalt](https://ngreifer.github.io/cobalt/) and includes standardised meandifferences before and after adjustment.## Extracting resultscausatr result objects support standard R generics:```rcoef(result) # named vector of E[Y^a]confint(result) # CI matrix (respects custom `level`)vcov(result) # vcov of marginal meanstidy(result) # broom-style tidy data frameglance(result) # broom-style one-row summaryplot(result) # forest plot (requires forrest package)```## Limitations and known rejectionscausatr rejects unsupported combinations with informative errors rather thansilently producing wrong answers. Key restrictions:| What | Why ||---|---|| **Matching requires binary treatment** | MatchIt does not support continuous/categorical treatments || **ATT/ATC only for binary 0/1 treatment** | The estimand is not defined for other treatment types || **`threshold()` under IPW** | The pushforward density has point masses at the bounds (mixed measure) || **`dynamic()` on continuous treatment under IPW** | A deterministic rule on a continuous treatment has no Lebesgue density || **`ipsi()` only under IPW, binary only** | The IPSI weight formula requires binary treatment and a propensity model || **`stochastic()` only under g-comp** | The density ratio of a user-defined stochastic policy is intractable || **Missing treatment values** | Cannot define a counterfactual without observed treatment || **`na.action = na.exclude` rejected** | Causes silent IF corruption via residual padding mismatch || **Effect modifier must be baseline under IPW/matching** | Time-varying modifiers in MSMs bias via collider paths (not enforced at runtime) || **Multivariate longitudinal IPW** | Not yet supported; use ICE g-computation |For the full matrix of supported estimator × treatment × intervention × outcomecombinations, see `vignette("interventions")` and`FEATURE_COVERAGE_MATRIX.md`.::: {.callout-warning}## Confounder selection: use a DAG, not data-driven methodsThe `confounders =` formula in `causat()` should reflect a pre-specified**causal DAG** (directed acyclic graph) encoding your assumptions about thedata-generating process. Do **not** select confounders via stepwise regression,LASSO, or other data-driven variable selection. These methods optimisepredictive fit, not causal identification, and can:- **Include colliders** — conditioning on a common effect of treatment and outcome opens a non-causal path, biasing the estimate toward zero or even flipping its sign.- **Include instruments** — variables that affect treatment but not outcome inflate variance without reducing confounding bias (a pure efficiency loss under correct specification, but can amplify bias from unmeasured confounders).- **Include mediators** — variables on the causal path from treatment to outcome block part of the effect you are trying to estimate, attenuating it toward the null.Use [`dagitty`](https://cran.r-project.org/package=dagitty) to encode your DAGand query it for the minimal sufficient adjustment set(`dagitty::adjustmentSets()`). See Hernán & Robins (2025), Ch. 18.:::## Learning morecausatr includes detailed vignettes for each estimation method: g-computation,IPW, matching, longitudinal ICE, transportability, and methodologicaltriangulation. Use `vignette(package = "causatr")` to see all availablevignettes.