Introduction to causatr

causatr provides a unified interface for causal effect estimation via five complementary methods:

G-computation (parametric g-formula) — fits an outcome model $E[Y | A, L]$ and standardises predictions over the target population.
Inverse probability weighting (IPW) — reweights the data using propensity scores so that confounders are balanced across treatment groups.
Augmented IPW (AIPW) — combines the outcome model and propensity score into a doubly-robust estimator: consistent if either model is correctly specified, and semiparametrically efficient when both are correct.
Structural nested mean models (SNM) — g-estimation of blip parameters using the treatment model as an instrument. The only method that correctly handles time-varying effect modification (modifiers affected by prior treatment). See vignette("snm").
Propensity score matching — pairs treated and control units with similar propensity scores.

When multiple methods agree, you can be more confident in your findings. This is called methodological triangulation.

Two-step API

All analyses follow the same two-step pattern:

# Step 1: Fit
fit <- causat(data, outcome = "Y", treatment = "A", confounders = ~ L1 + L2,
              estimator = "gcomp", model_fn = stats::glm)

# Step 2: Contrast
result <- contrast(fit,
  interventions = list(treated = static(1), control = static(0)),
  reference = "control")

Quick example

The example below uses the NHEFS (National Health and Nutrition Examination Survey — Epidemiologic Follow-up Study) dataset from Hernán & Robins (2025). The assumed causal structure is:

Treatment ($A$): qsmk — whether the individual quit smoking between baseline (1971) and follow-up (1982). Binary (0 = continued, 1 = quit).
Outcome ($Y$): wt82_71 — change in body weight (kg) from 1971 to
1. Continuous.
Confounders ($L$): a set of baseline covariates — sex, age, race, smoking intensity (cigarettes/day), years of smoking, exercise, physical activity level, and baseline weight — that jointly affect both the decision to quit smoking and the change in weight.
Censoring ($C$): censored — whether the individual was lost to follow-up. Rows with $C = 1$ are excluded from model fitting; predictions are standardised over the full target population.

Under the causal DAG, $L$ is a common cause of both $A$ and $Y$. Adjusting for $L$ blocks all backdoor paths from $A$ to $Y$, so the g-formula identifies the average causal effect $\mathbb{E}[Y^{a=1}] - \mathbb{E}[Y^{a=0}]$.

Code

library(causatr)
data("nhefs")

fit <- causat(
  nhefs,
  outcome = "wt82_71",
  treatment = "qsmk",
  confounders = ~ sex + age + race + smokeintensity + smokeyrs +
    factor(exercise) + factor(active) + wt71,
  censoring = "censored",
  model_fn = stats::glm
)

result <- contrast(
  fit,
  interventions = list(quit = static(1), continue = static(0)),
  reference = "continue"
)
result

Estimator: gcomp · Estimand: ATE · Contrast: difference · CI method: sandwich · N: 1746

Intervention means
intervention	estimate	se	ci_lower	ci_upper
quit	5.033	0.404	4.241	5.825
continue	1.877	0.209	1.467	2.287

Contrasts
comparison	estimate	se	ci_lower	ci_upper
quit vs continue	3.156	0.449	2.276	4.035

Intervention types

causatr supports several intervention types for g-computation:

Function	Description	Example
`static(value)`	Set treatment to a fixed value	`static(1)`
`shift(delta)`	Add a constant to observed treatment	`shift(-10)`
`scale_by(factor)`	Multiply observed treatment	`scale_by(0.5)`
`threshold(lower, upper)`	Clamp treatment within bounds	`threshold(0, 20)`
`dynamic(rule)`	Apply a user-defined rule	`dynamic(\(d, a) ifelse(d$L > 0, 1, 0))`
`NULL`	Natural course (observed values)	—

IPW supports static(), shift(), scale_by(), dynamic(), and ipsi(). AIPW supports the same set as IPW (all except threshold() and stochastic()). Matching supports only static() interventions.

Contrast types

Type	Meaning
`"difference"`	Risk/mean difference (default)
`"ratio"`	Risk ratio
`"or"`	Odds ratio (binary outcomes)

Inference methods

Method	Description
`"sandwich"`	Robust sandwich SE (default, fast)
`"bootstrap"`	Nonparametric bootstrap (resamples full pipeline)

Estimands

Estimand	Population	Applicability
`"ATE"`	All individuals	Always (default)
`"ATT"`	Observed treated (A = 1)	Binary point treatment only
`"ATC"`	Observed controls (A = 0)	Binary point treatment only
`subset = quote(...)`	Custom subgroup	Always

For g-computation, the estimand can be changed in contrast() without refitting. For IPW and matching, refit with causat(estimand = ...).

Treatment types

causatr supports binary, continuous, categorical, count, and multivariate treatments:

Treatment type	G-comp	IPW	AIPW	Matching
Binary (0/1)	Yes	Yes	Yes	Yes
Continuous	Yes	Yes (GPS)	Yes (shift, scale_by)	No (MatchIt binary-only)
Categorical (k > 2)	Yes	Yes	Yes	No (MatchIt binary-only)
Count (Poisson / NB)	Yes	Yes (`propensity_family =`)	Yes	No
Multivariate (joint)	Yes	Yes (sequential MTP)	Yes	No

For multivariate treatments (e.g. treatment = c("A1", "A2")), supply interventions as named lists with one element per treatment variable.

Effect modification via A:modifier interaction terms in confounders is supported across all estimation methods. G-comp and AIPW feed the interaction to the outcome model directly; IPW expands the per-intervention MSM to Y ~ 1 + modifier; matching expands to Y ~ A + modifier + A:modifier; and ICE auto-expands the interaction across treatment lags. Use by = "modifier" in contrast() to obtain stratum-specific estimates.

Effect modification

The by argument in contrast() stratifies estimates by levels of a variable, producing subgroup-specific effects:

Code

contrast(fit,
  interventions = list(quit = static(1), continue = static(0)),
  by = "sex"
)

Survey designs and cluster-robust inference

causat() and contrast() support two inference-infrastructure conveniences for design-aware studies.

Sampling weights via survey::svydesign. Pass a design object directly to weights = and causatr unpacks both the sampling weights (via stats::weights()) and the first-stage cluster (PSU):

Code

library(survey)
des <- svydesign(ids = ~psu, weights = ~pw, data = nhefs)
fit <- causat(nhefs, outcome = "wt82_71", treatment = "qsmk",
              confounders = ~ sex + age + wt71, weights = des,
              model_fn = stats::glm)

Cluster-robust sandwich. Use cluster = "col" on causat() (or contrast()) to aggregate per-individual influence functions within-cluster before squaring (Liang & Zeger 1986). Numerically equivalent to sandwich::vcovCL(type = "HC0") applied to the final predict-then-average step. Supported for gcomp / IPW / ICE; rejected for matching (subclass clustering is already structural).

future backend for bootstrap. contrast(parallel = "future") dispatches replicates through future.apply::future_lapply(), so any future::plan() (local multisession, remote cluster, future.batchtools for HPC) is honoured.

Diagnostics

Use diagnose() to check positivity and covariate balance after fitting:

Code

diag <- diagnose(fit)
diag          # print positivity + balance summary
plot(diag)    # Love plot (requires cobalt)

For IPW and matching, balance is computed via cobalt and includes standardised mean differences before and after adjustment.

Extracting results

causatr result objects support standard R generics:

coef(result)       # named vector of E[Y^a]
confint(result)    # CI matrix (respects custom `level`)
vcov(result)       # vcov of marginal means
tidy(result)       # broom-style tidy data frame
glance(result)     # broom-style one-row summary
plot(result)       # forest plot (requires forrest package)

Limitations and known rejections

causatr rejects unsupported combinations with informative errors rather than silently producing wrong answers. Key restrictions:

What	Why
Matching requires binary treatment	MatchIt does not support continuous/categorical treatments
ATT/ATC only for binary 0/1 treatment	The estimand is not defined for other treatment types
`threshold()` under IPW	The pushforward density has point masses at the bounds (mixed measure)
`dynamic()` on continuous treatment under IPW	A deterministic rule on a continuous treatment has no Lebesgue density
`ipsi()` only under IPW, binary only	The IPSI weight formula requires binary treatment and a propensity model
`stochastic()` only under g-comp	The density ratio of a user-defined stochastic policy is intractable
Missing treatment values	Cannot define a counterfactual without observed treatment
`na.action = na.exclude` rejected	Causes silent IF corruption via residual padding mismatch
Effect modifier must be baseline under IPW/matching	Time-varying modifiers in MSMs bias via collider paths (not enforced at runtime)
Multivariate longitudinal IPW	Not yet supported; use ICE g-computation

For the full matrix of supported estimator × treatment × intervention × outcome combinations, see vignette("interventions") and FEATURE_COVERAGE_MATRIX.md.

Confounder selection: use a DAG, not data-driven methods

The confounders = formula in causat() should reflect a pre-specified causal DAG (directed acyclic graph) encoding your assumptions about the data-generating process. Do not select confounders via stepwise regression, LASSO, or other data-driven variable selection. These methods optimise predictive fit, not causal identification, and can: - Include colliders — conditioning on a common effect of treatment and outcome opens a non-causal path, biasing the estimate toward zero or even flipping its sign. - Include instruments — variables that affect treatment but not outcome inflate variance without reducing confounding bias (a pure efficiency loss under correct specification, but can amplify bias from unmeasured confounders). - Include mediators — variables on the causal path from treatment to outcome block part of the effect you are trying to estimate, attenuating it toward the null.

Use dagitty to encode your DAG and query it for the minimal sufficient adjustment set (dagitty::adjustmentSets()). See Hernán & Robins (2025), Ch. 18.

Learning more

causatr includes detailed vignettes for each estimation method: g-computation, IPW, matching, longitudinal ICE, transportability, and methodological triangulation. Use vignette(package = "causatr") to see all available vignettes.

--- title: "Introduction to causatr" code-fold: show code-tools: true vignette: > %\VignetteIndexEntry{Introduction to causatr} %\VignetteEngine{quarto::html} %\VignetteEncoding{UTF-8} --- ```{r} #| include: false knitr::opts_chunk$set(collapse = TRUE, comment = "#>") ``` causatr provides a unified interface for causal effect estimation via five complementary methods: - **G-computation** (parametric g-formula) — fits an outcome model $E[Y | A, L]$ and standardises predictions over the target population. - **Inverse probability weighting (IPW)** — reweights the data using propensity scores so that confounders are balanced across treatment groups. - **Augmented IPW (AIPW)** — combines the outcome model and propensity score into a doubly-robust estimator: consistent if *either* model is correctly specified, and semiparametrically efficient when both are correct. - **Structural nested mean models (SNM)** — g-estimation of blip parameters using the treatment model as an instrument. The only method that correctly handles **time-varying effect modification** (modifiers affected by prior treatment). See `vignette("snm")`. - **Propensity score matching** — pairs treated and control units with similar propensity scores. When multiple methods agree, you can be more confident in your findings. This is called **methodological triangulation**. ## Two-step API All analyses follow the same two-step pattern: ```r # Step 1: Fit fit <- causat(data, outcome = "Y", treatment = "A", confounders = ~ L1 + L2, estimator = "gcomp", model_fn = stats::glm) # Step 2: Contrast result <- contrast(fit, interventions = list(treated = static(1), control = static(0)), reference = "control") ``` ## Quick example The example below uses the **NHEFS** (National Health and Nutrition Examination Survey — Epidemiologic Follow-up Study) dataset from Hernán & Robins (2025). The assumed causal structure is: - **Treatment ($A$):** `qsmk` — whether the individual quit smoking between baseline (1971) and follow-up (1982). Binary (0 = continued, 1 = quit). - **Outcome ($Y$):** `wt82_71` — change in body weight (kg) from 1971 to 1982. Continuous. - **Confounders ($L$):** a set of baseline covariates — sex, age, race, smoking intensity (cigarettes/day), years of smoking, exercise, physical activity level, and baseline weight — that jointly affect both the decision to quit smoking and the change in weight. - **Censoring ($C$):** `censored` — whether the individual was lost to follow-up. Rows with $C = 1$ are excluded from model fitting; predictions are standardised over the full target population. Under the causal DAG, $L$ is a common cause of both $A$ and $Y$. Adjusting for $L$ blocks all backdoor paths from $A$ to $Y$, so the g-formula identifies the average causal effect $\mathbb{E}[Y^{a=1}] - \mathbb{E}[Y^{a=0}]$. ```{r} #| message: false library(causatr) data("nhefs") fit <- causat( nhefs, outcome = "wt82_71", treatment = "qsmk", confounders = ~ sex + age + race + smokeintensity + smokeyrs + factor(exercise) + factor(active) + wt71, censoring = "censored", model_fn = stats::glm ) result <- contrast( fit, interventions = list(quit = static(1), continue = static(0)), reference = "continue" ) result ``` ## Intervention types causatr supports several intervention types for g-computation: | Function | Description | Example | |---|---|---| | `static(value)` | Set treatment to a fixed value | `static(1)` | | `shift(delta)` | Add a constant to observed treatment | `shift(-10)` | | `scale_by(factor)` | Multiply observed treatment | `scale_by(0.5)` | | `threshold(lower, upper)` | Clamp treatment within bounds | `threshold(0, 20)` | | `dynamic(rule)` | Apply a user-defined rule | `dynamic(\(d, a) ifelse(d$L > 0, 1, 0))` | | `NULL` | Natural course (observed values) | — | IPW supports `static()`, `shift()`, `scale_by()`, `dynamic()`, and `ipsi()`. AIPW supports the same set as IPW (all except `threshold()` and `stochastic()`). Matching supports only `static()` interventions. ## Contrast types | Type | Meaning | |---|---| | `"difference"` | Risk/mean difference (default) | | `"ratio"` | Risk ratio | | `"or"` | Odds ratio (binary outcomes) | ## Inference methods | Method | Description | |---|---| | `"sandwich"` | Robust sandwich SE (default, fast) | | `"bootstrap"` | Nonparametric bootstrap (resamples full pipeline) | ## Estimands | Estimand | Population | Applicability | |---|---|---| | `"ATE"` | All individuals | Always (default) | | `"ATT"` | Observed treated (A = 1) | Binary point treatment only | | `"ATC"` | Observed controls (A = 0) | Binary point treatment only | | `subset = quote(...)` | Custom subgroup | Always | For g-computation, the estimand can be changed in `contrast()` without refitting. For IPW and matching, refit with `causat(estimand = ...)`. ## Treatment types causatr supports binary, continuous, categorical, count, and multivariate treatments: | Treatment type | G-comp | IPW | AIPW | Matching | |---|---|---|---|---| | Binary (0/1) | Yes | Yes | Yes | Yes | | Continuous | Yes | Yes (GPS) | Yes (shift, scale\_by) | No (MatchIt binary-only) | | Categorical (k > 2) | Yes | Yes | Yes | No (MatchIt binary-only) | | Count (Poisson / NB) | Yes | Yes (`propensity_family =`) | Yes | No | | Multivariate (joint) | Yes | Yes (sequential MTP) | Yes | No | For multivariate treatments (e.g. `treatment = c("A1", "A2")`), supply interventions as named lists with one element per treatment variable. Effect modification via `A:modifier` interaction terms in `confounders` is supported across all estimation methods. G-comp and AIPW feed the interaction to the outcome model directly; IPW expands the per-intervention MSM to `Y ~ 1 + modifier`; matching expands to `Y ~ A + modifier + A:modifier`; and ICE auto-expands the interaction across treatment lags. Use `by = "modifier"` in `contrast()` to obtain stratum-specific estimates. ## Effect modification The `by` argument in `contrast()` stratifies estimates by levels of a variable, producing subgroup-specific effects: ```{r} #| eval: false contrast(fit, interventions = list(quit = static(1), continue = static(0)), by = "sex" ) ``` ## Survey designs and cluster-robust inference `causat()` and `contrast()` support two inference-infrastructure conveniences for design-aware studies. **Sampling weights via `survey::svydesign`.** Pass a design object directly to `weights =` and causatr unpacks both the sampling weights (via `stats::weights()`) and the first-stage cluster (PSU): ```{r} #| eval: false library(survey) des <- svydesign(ids = ~psu, weights = ~pw, data = nhefs) fit <- causat(nhefs, outcome = "wt82_71", treatment = "qsmk", confounders = ~ sex + age + wt71, weights = des, model_fn = stats::glm) ``` **Cluster-robust sandwich.** Use `cluster = "col"` on `causat()` (or `contrast()`) to aggregate per-individual influence functions within-cluster before squaring (Liang & Zeger 1986). Numerically equivalent to `sandwich::vcovCL(type = "HC0")` applied to the final predict-then-average step. Supported for gcomp / IPW / ICE; rejected for matching (subclass clustering is already structural). **`future` backend for bootstrap.** `contrast(parallel = "future")` dispatches replicates through `future.apply::future_lapply()`, so any `future::plan()` (local multisession, remote cluster, `future.batchtools` for HPC) is honoured. ## Diagnostics Use `diagnose()` to check positivity and covariate balance after fitting: ```{r} #| eval: false diag <- diagnose(fit) diag # print positivity + balance summary plot(diag) # Love plot (requires cobalt) ``` For IPW and matching, balance is computed via [cobalt](https://ngreifer.github.io/cobalt/) and includes standardised mean differences before and after adjustment. ## Extracting results causatr result objects support standard R generics: ```r coef(result) # named vector of E[Y^a] confint(result) # CI matrix (respects custom `level`) vcov(result) # vcov of marginal means tidy(result) # broom-style tidy data frame glance(result) # broom-style one-row summary plot(result) # forest plot (requires forrest package) ``` ## Limitations and known rejections causatr rejects unsupported combinations with informative errors rather than silently producing wrong answers. Key restrictions: | What | Why | |---|---| | **Matching requires binary treatment** | MatchIt does not support continuous/categorical treatments | | **ATT/ATC only for binary 0/1 treatment** | The estimand is not defined for other treatment types | | **`threshold()` under IPW** | The pushforward density has point masses at the bounds (mixed measure) | | **`dynamic()` on continuous treatment under IPW** | A deterministic rule on a continuous treatment has no Lebesgue density | | **`ipsi()` only under IPW, binary only** | The IPSI weight formula requires binary treatment and a propensity model | | **`stochastic()` only under g-comp** | The density ratio of a user-defined stochastic policy is intractable | | **Missing treatment values** | Cannot define a counterfactual without observed treatment | | **`na.action = na.exclude` rejected** | Causes silent IF corruption via residual padding mismatch | | **Effect modifier must be baseline under IPW/matching** | Time-varying modifiers in MSMs bias via collider paths (not enforced at runtime) | | **Multivariate longitudinal IPW** | Not yet supported; use ICE g-computation | For the full matrix of supported estimator × treatment × intervention × outcome combinations, see `vignette("interventions")` and `FEATURE_COVERAGE_MATRIX.md`. ::: {.callout-warning} ## Confounder selection: use a DAG, not data-driven methods The `confounders =` formula in `causat()` should reflect a pre-specified **causal DAG** (directed acyclic graph) encoding your assumptions about the data-generating process. Do **not** select confounders via stepwise regression, LASSO, or other data-driven variable selection. These methods optimise predictive fit, not causal identification, and can: - **Include colliders** — conditioning on a common effect of treatment and outcome opens a non-causal path, biasing the estimate toward zero or even flipping its sign. - **Include instruments** — variables that affect treatment but not outcome inflate variance without reducing confounding bias (a pure efficiency loss under correct specification, but can amplify bias from unmeasured confounders). - **Include mediators** — variables on the causal path from treatment to outcome block part of the effect you are trying to estimate, attenuating it toward the null. Use [`dagitty`](https://cran.r-project.org/package=dagitty) to encode your DAG and query it for the minimal sufficient adjustment set (`dagitty::adjustmentSets()`). See Hernán & Robins (2025), Ch. 18. ::: ## Learning more causatr includes detailed vignettes for each estimation method: g-computation, IPW, matching, longitudinal ICE, transportability, and methodological triangulation. Use `vignette(package = "causatr")` to see all available vignettes.