Longitudinal treatments: ICE g-computation

When treatment and confounders vary over time, standard regression gives biased estimates even when every confounder is measured and adjusted for. G-methods solve this. causatr implements ICE (iterated conditional expectation) g-computation (Zivich et al., 2024), which requires models only for the outcome — no models for the time-varying confounders are needed.

This vignette demonstrates ICE g-computation in causatr, covering:

  • why naive regression fails under treatment-confounder feedback (the null paradox of Table 20.1);
  • fitting with causat() and contrasting static, dynamic, and modified treatment policies;
  • sandwich and bootstrap inference;
  • binary outcomes, three or more time periods, multivariate treatments, external weights, and censoring.

Setup

Code
library(causatr)
library(tinytable)
library(tinyplot)

The problem: treatment-confounder feedback

When \(A_0\) affects a time-varying confounder \(L_1\), and \(L_1\) in turn affects both \(A_1\) and \(Y\) (\(A_0 \to L_1 \to A_1\) and \(L_1 \to Y\)), ordinary regression cannot recover the marginal counterfactual mean \(\mathbb{E}[Y^{a_0, a_1}]\):

  • Adjusting for \(L_1\) blocks part of the causal path from \(A_0\) to \(Y\) (and, depending on the DAG, can open a collider path through unmeasured common causes of \(L_1\) and \(Y\)).
  • Not adjusting leaves \(L_1\)-confounding of the \(A_1 \to Y\) arrow.
  • ICE g-computation handles both correctly by iterating conditional expectations and marginalising over the \(L_1\) distribution induced by each intervention.

Table 20.1: the null paradox

We reconstruct the dataset from Table 20.1 of Hernan & Robins (2025), scaled to 3,200 individuals observed at 2 time points. (The original table has 32,000; we use 1/10th for faster vignette rendering.) The DGP has a very specific property: every joint regime \((a_0, a_1)\) has the same marginal counterfactual mean \(\mathbb{E}[Y^{a_0, a_1}] = 60\), so the true causal contrast between any two regimes is exactly zero — the “null is true”. Crucially, the null holds only marginally: within strata of \(L_1\), the conditional effect of \(A_0\) is \(-8\). Treatment-confounder feedback is exactly the mechanism that lets a non-zero conditional effect average out to a zero marginal effect.

DGM. Two periods (\(k = 0, 1\)) with binary treatment \(A_k \in \{0, 1\}\), binary time-varying confounder \(L_1 \in \{0, 1\}\) (affected by \(A_0\)), and continuous outcome \(Y\) measured at \(k = 1\). The causal structure is \(A_0 \to L_1 \to A_1\), \(A_0 \to L_1 \to Y\), and \(A_0 \to Y\) (direct). The table is constructed so that the conditional effect of \(A_0\) given \(L_1\) is \(-8\), but this conditional effect averages to exactly zero marginally because \(A_0\) shifts the \(L_1\) distribution in a perfectly offsetting way. The structural equations implied by the table are:

\[ \begin{aligned} A_0 &\sim \text{Bernoulli}(p_{A_0}), \\ L_1 \mid A_0 &\sim \text{Bernoulli}\bigl(P(L_1 = 1 \mid A_0)\bigr), \\ A_1 \mid L_1 &\sim \text{Bernoulli}\bigl(P(A_1 = 1 \mid L_1)\bigr), \\ Y &= 84 - 8 \cdot A_0 - 32 \cdot L_1. \end{aligned} \]

The outcome is deterministic given \((A_0, L_1)\) and does not depend on \(A_1\) directly. The key property: \(P(L_1 = 1 \mid A_0 = 1) = 0.5\) while \(P(L_1 = 1 \mid A_0 = 0) = 0.75\), so \(A_0\) lowers the chance that \(L_1 = 1\); and since \(L_1\) reduces \(Y\) by 32, that shift raises \(Y\) by \(0.25 \times 32 = 8\), exactly offsetting the direct \(-8\). The known true value is \(\mathbb{E}[Y^{a_0, a_1}] = 60\) for all four joint regimes, so the ATE between any two regimes is exactly 0.

Code
groups <- data.frame(
  A0 = c(0, 0, 0, 0, 1, 1, 1, 1),
  L1 = c(0, 0, 1, 1, 0, 0, 1, 1),
  A1 = c(0, 1, 0, 1, 0, 1, 0, 1),
  Y  = c(84, 84, 52, 52, 76, 76, 44, 44),
  N  = c(240, 160, 240, 960, 480, 320, 160, 640)
)
rows <- lapply(seq_len(nrow(groups)), function(i) {
  g <- groups[i, ]
  off <- sum(groups$N[seq_len(i - 1)])
  data.frame(id = seq_len(g$N) + off, A0 = g$A0, L1 = g$L1, A1 = g$A1, Y = g$Y)
})
wide <- do.call(rbind, rows)

t0 <- data.frame(id = wide$id, time = 0L, A = wide$A0, L = NA_real_, Y = NA_real_)
t1 <- data.frame(id = wide$id, time = 1L, A = wide$A1, L = wide$L1, Y = wide$Y)
long <- rbind(t0, t1)
head(long, 4)
#>   id time A  L  Y
#> 1  1    0 0 NA NA
#> 2  2    0 0 NA NA
#> 3  3    0 0 NA NA
#> 4  4    0 0 NA NA

The data are in person-period (long) format: one row per individual per time point. L is a time-varying confounder (measured only at time 1).
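Before fitting anything, the marginal null can be verified with a plug-in g-formula calculation (a hand check, not a causatr call). Since \(Y = 84 - 8 A_0 - 32 L_1\) does not involve \(A_1\), standardising the stratum means over the \(A_0\)-specific distribution of \(L_1\) gives \(\mathbb{E}[Y^{a_0, a_1}]\) directly; the N column implies \(P(L_1 = 1 \mid A_0 = 0) = 1200/1600\) and \(P(L_1 = 1 \mid A_0 = 1) = 800/1600\).

```r
# Plug-in g-formula: E[Y^{a0, a1}] = (84 - 8 a0) - 32 * P(L1 = 1 | A0 = a0).
# Probabilities come straight from the N column of the 8-stratum table.
p_l1 <- c(`0` = 1200 / 1600, `1` = 800 / 1600)  # P(L1 = 1 | A0)
ey <- function(a0) (84 - 8 * a0) - 32 * p_l1[[as.character(a0)]]
c(never = ey(0), always = ey(1))
#>  never always
#>     60     60
```

Both regimes (and, since \(A_1\) is irrelevant, all four joint regimes) give exactly 60.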

Demonstrating bias from naive methods

Hernan & Robins (2025, Ch. 20) use this table to make a pointed argument: neither of the two conventional regression strategies recovers the null. The truth is \(\mathbb{E}[Y^{1,1}] - \mathbb{E}[Y^{0,0}] = 0\); in fact, all four joint regimes have the same marginal counterfactual mean of \(60\).

Strategy 1 — unadjusted. Drop \(L_1\) and hope randomisation at each time point takes care of confounding:

Code
wide_sub <- wide[!is.na(wide$Y), ]
naive_unadj <- lm(Y ~ A0 + A1, data = wide_sub)
round(coef(naive_unadj), 3)
#> (Intercept)          A0          A1 
#>      68.711      -1.244     -12.444

Both coefficients are biased away from zero (\(A_0 \approx -1.24\), \(A_1 \approx -12.44\)). The \(A_1\) estimate is especially biased: \(L_1\) confounds \(A_1 \to Y\), and without adjustment that confounding leaks directly into the coefficient.

Strategy 2 — adjust for \(L_1\). The standard epidemiological reflex:

Code
naive_fit   <- lm(Y ~ A0 + A1 + L1, data = wide_sub)
naive_coefs <- coef(naive_fit)
round(naive_coefs, 3)
#> (Intercept)          A0          A1          L1 
#>          84          -8           0         -32

Now the \(A_1\) coefficient is 0 (correct: in this DGP, \(Y\) is a deterministic function of \(A_0\) and \(L_1\)), but the \(A_0\) coefficient is \(-8\) — the conditional effect of \(A_0\) within strata of \(L_1\). That is not the causal contrast we want: adjusting for \(L_1\) blocks the \(A_0 \to L_1 \to Y\) path and leaves only the direct effect, so \(A_0\) looks protective when the marginal causal effect is in fact zero.

The takeaway. One naive strategy biases \(A_1\); the other biases \(A_0\). There is no regression specification on this table that jointly returns \(A_0 = 0\) and \(A_1 = 0\). This is the null paradox that motivates g-methods: the only way to recover the null is to marginalise over the intervention-specific distribution of \(L_1\), which is exactly what ICE g-computation does below.

ICE g-computation: fitting

With causat(), specify id and time to trigger longitudinal mode. Baseline confounders go in confounders, time-varying confounders in confounders_tv.

Code
fit <- causat(
  long,
  outcome = "Y",
  treatment = "A",
  confounders = ~ 1,
  confounders_tv = ~ L,
  id = "id",
  time = "time"
)
fit
#> <causatr_fit>
#>  Estimator:  G-computation
#>  Type:       longitudinal
#>  Outcome:    Y (gaussian)
#>  Treatment:  A
#>  Estimand:   ATE
#>  Confounders: ~1 
#>  TV conf.:   ~L 
#>  ID:         id
#>  Time:       time
#>  N:          6400

No models are fitted at this stage. ICE defers model fitting to contrast() because the sequential outcome models depend on the intervention being evaluated.

ICE g-computation: contrasts with sandwich SE

Compare “always treat” (A = 1 at every time) vs “never treat” (A = 0 at every time):

Code
result <- contrast(
  fit,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  type = "difference",
  ci_method = "sandwich"
)
#> ICE step (time_idx = 0): dropping all-NA column(s) from the outcome model formula: 'L'. Check your `confounders_tv` spec if this is unexpected.
#> This message is displayed once every 8 hours.
result

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 3200

Intervention means
intervention estimate se ci_lower ci_upper
always 60 0.4 59.216 60.784
never 60 0.346 59.321 60.679
Contrasts
comparison estimate se ci_lower ci_upper
always vs never -0.00000000000009947598 0.529 -1.037 1.037

ICE correctly recovers \(\mathbb{E}[Y^{\text{always}}] = \mathbb{E}[Y^{\text{never}}] = 60\) and an ATE of 0, as expected from the data-generating process.
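What contrast() just did can be sketched by hand for this two-period case (an illustrative reimplementation, not the package's actual code path): fit the final-time outcome model, predict under the intervention to form pseudo-outcomes, regress those on earlier history, and average.

```r
# Hand-rolled ICE for "always treat" on the same 3,200-person table
# (rebuilt compactly here so the snippet is self-contained).
g <- data.frame(
  A0 = rep(0:1, each = 4),
  L1 = rep(rep(0:1, each = 2), 2),
  A1 = rep(0:1, 4),
  Y  = c(84, 84, 52, 52, 76, 76, 44, 44),
  N  = c(240, 160, 240, 960, 480, 320, 160, 640)
)
w <- g[rep(seq_len(nrow(g)), g$N), ]

m1 <- lm(Y ~ A0 + A1 + L1, data = w)          # step 1: final-time outcome model
q1 <- predict(m1, transform(w, A1 = 1))       # pseudo-outcome under A1 = 1
m0 <- lm(q1 ~ A0, data = w)                   # step 2: regress pseudo-outcome on history
mean(predict(m0, transform(w, A0 = 1)))       # standardise under A0 = 1
#> [1] 60
```

Replacing the 1s with 0s gives the never regime, also 60. The ICE machinery generalises this backward recursion to any number of periods and interventions.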

Visualising the result

Code
naive_ci <- confint(naive_fit)
est_df <- data.frame(
  method = c("Naive (A0)", "Naive (A1)", "ICE"),
  estimate = c(naive_coefs["A0"], naive_coefs["A1"], result$contrasts$estimate[1]),
  ci_lower = c(naive_ci["A0", 1], naive_ci["A1", 1], result$contrasts$ci_lower[1]),
  ci_upper = c(naive_ci["A0", 2], naive_ci["A1", 2], result$contrasts$ci_upper[1])
)
tinyplot(
  estimate ~ method,
  data = est_df,
  type = "pointrange",
  ymin = est_df$ci_lower,
  ymax = est_df$ci_upper,
  ylab = "Estimated treatment effect",
  xlab = "",
  main = "ICE recovers the null effect"
)
abline(h = 0, lty = 2, col = "grey40")

Point estimates and confidence intervals comparing naive regression to ICE g-computation. Naive estimates are biased away from zero while ICE correctly recovers ATE = 0.

Sandwich SE: stacked estimating equations

The default ci_method = "sandwich" uses the empirical sandwich estimator for stacked estimating equations (Zivich et al., 2024). This accounts for uncertainty from all K sequential outcome models, not just the final model.
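Concretely, stacking works as in generic M-estimation (a textbook sketch, not causatr's exact internals): let \(\psi_i(\theta)\) stack the score contributions of all K sequential outcome models together with the marginal-mean component for individual \(i\). The sandwich variance is then

\[ \widehat{V}(\hat\theta) = \frac{1}{n} \, B(\hat\theta)^{-1} F(\hat\theta) \bigl(B(\hat\theta)^{-1}\bigr)^{\top}, \qquad B(\hat\theta) = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial \psi_i(\hat\theta)}{\partial \theta^{\top}}, \qquad F(\hat\theta) = \frac{1}{n} \sum_{i=1}^{n} \psi_i(\hat\theta) \, \psi_i(\hat\theta)^{\top}. \]

The off-diagonal blocks of \(B\) are what carry the uncertainty of each fitted model into the next step of the recursion.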

Code
result$estimates[, c("intervention", "estimate", "se")]
#>    intervention estimate        se
#>          <char>    <num>     <num>
#> 1:       always       60 0.4000000
#> 2:        never       60 0.3464102

These standard errors come from a single analytic pass (no resampling), while still accounting for the uncertainty contributed by both sequential outcome models.

Bootstrap SE

For robustness or when using flexible models (e.g., GAMs), bootstrap resamples individuals (all their time points together) and re-runs the full ICE procedure.
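"Resampling individuals" means the cluster bootstrap: draw ids with replacement and keep every drawn id's full block of rows. A toy sketch (not causatr's internal code):

```r
# Cluster bootstrap on person-period data: ids are the resampling unit,
# so each draw carries an individual's complete trajectory.
toy <- data.frame(id = rep(1:3, each = 2), time = rep(0:1, 3), y = 1:6)
set.seed(1)
ids <- sample(unique(toy$id), replace = TRUE)
boot <- do.call(rbind, lapply(seq_along(ids), function(j) {
  b <- toy[toy$id == ids[j], ]
  b$id <- j    # re-key so a twice-drawn individual counts as two clusters
  b
}))
nrow(boot)     # always 6: complete 2-row blocks are preserved
```

Resampling rows instead of ids would break the within-person dependence that the ICE recursion relies on.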

Code
result_boot <- contrast(
  fit,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  type = "difference",
  ci_method = "bootstrap",
  n_boot = 20L
)
result_boot

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: bootstrap  ·  N: 3200

Intervention means
intervention estimate se ci_lower ci_upper
always 60 0.372 59.27 60.73
never 60 0.333 59.346 60.654
Contrasts
comparison estimate se ci_lower ci_upper
always vs never -0.00000000000009947598 0.551 -1.08 1.08

The parallel and ncpus arguments enable parallel bootstrap replicates for larger datasets:

Code
result_par <- contrast(
  fit,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  ci_method = "bootstrap",
  n_boot = 20L,
  parallel = "multicore",
  ncpus = 4L
)

Comparing sandwich and bootstrap SEs

Code
se_df <- data.frame(
  method = c("Sandwich", "Bootstrap"),
  estimate = c(result$contrasts$estimate[1], result_boot$contrasts$estimate[1]),
  ci_lower = c(result$contrasts$ci_lower[1], result_boot$contrasts$ci_lower[1]),
  ci_upper = c(result$contrasts$ci_upper[1], result_boot$contrasts$ci_upper[1])
)
tinyplot(
  estimate ~ method,
  data = se_df,
  type = "pointrange",
  ymin = se_df$ci_lower,
  ymax = se_df$ci_upper,
  ylab = "ATE (always vs never)",
  xlab = "Inference method",
  main = "Sandwich vs bootstrap inference"
)
abline(h = 0, lty = 2, col = "grey40")

Point estimates and confidence intervals from sandwich and bootstrap inference for the ICE ATE estimate, showing close agreement.

Dynamic interventions

ICE supports dynamic treatment regimes where treatment depends on time-varying covariates. Here, individuals receive treatment only when L = 1 (the confounder is present):

Code
result_dyn <- contrast(
  fit,
  interventions = list(
    adaptive = dynamic(\(data, trt) ifelse(!is.na(data$L) & data$L > 0, 1L, 0L)),
    always   = static(1),
    never    = static(0)
  ),
  reference = "never",
  ci_method = "sandwich"
)
result_dyn

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 3200

Intervention means
intervention estimate se ci_lower ci_upper
adaptive 60 0.346 59.321 60.679
always 60 0.4 59.216 60.784
never 60 0.346 59.321 60.679
Contrasts
comparison estimate se ci_lower ci_upper
adaptive vs never -0.00000000000006394885 0 -0 0
always vs never -0.00000000000009947598 0.529 -1.037 1.037

Comparing multiple interventions

Code
est <- result_dyn$estimates
tinyplot(
  estimate ~ intervention,
  data = est,
  type = "pointrange",
  ymin = est$ci_lower,
  ymax = est$ci_upper,
  ylab = "E[Y^a]",
  xlab = "Intervention",
  main = "Marginal means under different interventions"
)
abline(h = 60, lty = 2, col = "grey40")

Forest plot of marginal means under three interventions: always treat, never treat, and adaptive (treat if L=1). All are close to 60 in this null-effect example.

All three interventions should return a marginal mean close to 60 (the structural truth for every joint regime in this null-effect DGP).

The history parameter

The history parameter controls the Markov order — how many lags of treatment and time-varying confounders enter each outcome model.

  • history = 1 (default): Include one lag of A and L (first-order Markov).
  • history = 2: Include two lags (A_{k-1}, A_{k-2}, L_{k-1}, L_{k-2}).
  • history = Inf: Include full treatment and confounder history up to each time point.

For this 2-time-point example, history = 1 and history = Inf are equivalent. With longer follow-up, higher-order histories capture longer-range dependencies but require more data per model.

Code
fit_inf <- causat(
  long,
  outcome = "Y",
  treatment = "A",
  confounders = ~ 1,
  confounders_tv = ~ L,
  id = "id",
  time = "time",
  history = Inf
)

result_inf <- contrast(
  fit_inf,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  ci_method = "sandwich"
)
result_inf

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 3200

Intervention means
intervention estimate se ci_lower ci_upper
always 60 0.4 59.216 60.784
never 60 0.346 59.321 60.679
Contrasts
comparison estimate se ci_lower ci_upper
always vs never -0.00000000000009947598 0.529 -1.037 1.037

Simulation: ICE with a non-null effect

To show ICE working with a known non-null effect, we simulate data with treatment-confounder feedback.

DGM. Two periods (\(k = 0, 1\)) with binary treatment \(A_k \in \{0, 1\}\), binary time-varying confounder \(L_k \in \{0, 1\}\), and continuous outcome \(Y\). Treatment assignment at each period depends on the current confounder via a logistic model; the confounder at \(k = 1\) is affected by prior treatment \(A_0\) (treatment-confounder feedback). The structural equations are:

\[ \begin{aligned} L_0 &\sim \text{Bernoulli}(0.5), \\ A_0 \mid L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.5 + 0.5 \, L_0)\bigr), \\ L_1 \mid A_0, L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.5 + 0.8 \, A_0 + 0.3 \, L_0)\bigr), \\ A_1 \mid L_1 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.5 + 0.5 \, L_1)\bigr), \\ Y \mid A_0, A_1, L_1 &= 50 + 2.5 \, A_0 + 2.5 \, A_1 - 3 \, L_1 + \varepsilon, \quad \varepsilon \sim N(0, 25). \end{aligned} \]

The structural ATE of “always treat” vs “never treat” is not simply \(2.5 + 2.5 = 5\), because \(A_0\) affects \(L_1\) (coefficient \(0.8\)), and \(L_1\) enters the outcome with coefficient \(-3\). Under “always treat”, \(A_0 = 1\) increases \(\mathbb{E}[L_1]\), and since \(L_1\) has a negative effect on \(Y\), this indirect path partially offsets the direct benefit:

\[ \text{ATE} = \underbrace{(2.5 + 2.5)}_{\text{direct}} - 3 \bigl(\mathbb{E}[L_1 \mid A_0\!=\!1] - \mathbb{E}[L_1 \mid A_0\!=\!0]\bigr) \approx 5 - 3 \times 0.196 \approx 4.41. \]

This is the same treatment-confounder feedback that makes naive regression biased.
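The \(\approx 0.196\) shift, and hence the \(\approx 4.41\), can be reproduced analytically by averaging the \(L_1\) model over \(L_0 \sim \text{Bernoulli}(0.5)\) (a hand check of the derivation, independent of causatr):

```r
# E[L1 | do(A0 = a0)], averaged over the two equally likely values of L0.
e_l1 <- function(a0) mean(plogis(-0.5 + 0.8 * a0 + 0.3 * c(0, 1)))
delta <- e_l1(1) - e_l1(0)   # ~ 0.196
5 - 3 * delta                # structural ATE, ~ 4.41
```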

Code
set.seed(42)
n <- 2000
sim_id <- rep(seq_len(n), each = 2)
sim_time <- rep(0:1, n)

L0 <- rbinom(n, 1, 0.5)
A0 <- rbinom(n, 1, plogis(-0.5 + 0.5 * L0))
L1 <- rbinom(n, 1, plogis(-0.5 + 0.8 * A0 + 0.3 * L0))
A1 <- rbinom(n, 1, plogis(-0.5 + 0.5 * L1))
Y <- 50 + 2.5 * A0 + 2.5 * A1 - 3 * L1 + rnorm(n, 0, 5)

sim_long <- data.frame(
  id = sim_id,
  time = sim_time,
  A = c(rbind(A0, A1)),
  L = c(rbind(L0, L1)),
  Y = c(rbind(NA_real_, Y))
)

fit_sim <- causat(
  sim_long,
  outcome = "Y",
  treatment = "A",
  confounders = ~ 1,
  confounders_tv = ~ L,
  id = "id",
  time = "time"
)

res_sim <- contrast(
  fit_sim,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  ci_method = "sandwich"
)
res_sim

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
always 53.134 0.227 52.69 53.579
never 48.832 0.187 48.465 49.199
Contrasts
comparison estimate se ci_lower ci_upper
always vs never 4.303 0.336 3.644 4.961

The estimated ATE should be close to the true value of \(\approx 4.41\).

Binary outcome

ICE also handles binary outcomes. The first outcome model uses binomial (actual 0/1 response), while subsequent pseudo-outcome models use quasibinomial (fractional logistic for predicted probabilities in [0, 1]).
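The binomial/quasibinomial split matters because pseudo-outcomes are predicted probabilities, not 0/1 responses. A small standalone illustration with base glm() (independent of causatr): a quasibinomial fit accepts fractional responses cleanly and recovers the mean model, whereas binomial() would emit "non-integer #successes" warnings.

```r
# Fractional-response ("fractional logistic") fit with quasibinomial:
# the response is a probability in [0, 1], as in the ICE pseudo-outcome steps.
set.seed(1)
x <- rnorm(200)
p <- plogis(0.5 + x)                        # noiseless "pseudo-outcomes"
fit_q <- glm(p ~ x, family = quasibinomial())
coef(fit_q)                                 # ~ (0.5, 1): mean model recovered
```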

DGM. This reuses the non-null simulation above (same SEM) with the continuous outcome binarised at \(Y > 50\):

\[ \begin{aligned} Y^* &= 50 + 2.5 \, A_0 + 2.5 \, A_1 - 3 \, L_1 + \varepsilon, \quad \varepsilon \sim N(0, 25), \\ Y &= \mathbf{1}(Y^* > 50). \end{aligned} \]

The resulting binary outcome does not have a clean closed-form ATE on the risk-difference scale, but the direction is positive (“always treat” increases \(P(Y = 1)\) relative to “never treat”) because both treatment coefficients are positive in the latent linear predictor.

Code
sim_long_bin <- sim_long
sim_long_bin$Y[!is.na(sim_long_bin$Y)] <-
  as.integer(sim_long_bin$Y[!is.na(sim_long_bin$Y)] > 50)

fit_bin <- causat(
  sim_long_bin,
  outcome = "Y",
  treatment = "A",
  confounders = ~ 1,
  confounders_tv = ~ L,
  family = "binomial",
  id = "id",
  time = "time"
)

res_bin <- contrast(
  fit_bin,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  type = "difference",
  ci_method = "sandwich"
)
res_bin

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
always 0.728 0.018 0.693 0.764
never 0.415 0.018 0.381 0.449
Contrasts
comparison estimate se ci_lower ci_upper
always vs never 0.313 0.029 0.256 0.371

Risk ratio for binary longitudinal outcomes

Code
res_rr <- contrast(
  fit_bin,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  type = "ratio",
  ci_method = "sandwich"
)
res_rr

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: ratio  ·  CI method: sandwich  ·  N: 2000

Intervention means
intervention estimate se ci_lower ci_upper
always 0.728 0.018 0.693 0.764
never 0.415 0.018 0.381 0.449
Contrasts
comparison estimate se ci_lower ci_upper
always vs never 1.754 0.098 1.572 1.958

Three or more time periods

The examples above use 2 time points for simplicity, but ICE scales to any number of periods. Here we simulate 4-period data with a binary treatment and binary outcome. The backward recursion fits 4 sequential models.

DGM. Four periods (\(k = 1, \dots, 4\)) with binary treatment \(A_k\), continuous time-varying confounder \(L_k\), continuous baseline confounder \(L_0\), and binary outcome \(Y\) measured at \(k = 4\). The confounder evolves with treatment feedback: at each step, \(L_k\) depends on \(L_{k-1}\) plus noise, and after observing \(A_k\) the next confounder shifts by \(0.3 \, A_k\). The structural equations are:

\[ \begin{aligned} L_0 &\sim N(0, 1), \\ L_k \mid L_{k-1}, A_{k-1} &= L_{k-1} + 0.3 \, A_{k-1} + \eta_k, \quad \eta_k \sim N(0, 0.25), \\ A_k \mid L_k &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.3 + 0.4 \, L_k)\bigr), \\ Y \mid A_1, \dots, A_4, L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-1.5 + 0.4 \, \textstyle\sum_{k=1}^{4} A_k + 0.3 \, L_0)\bigr). \end{aligned} \]

The outcome depends on cumulative treatment exposure \(\sum_k A_k\) and the baseline confounder \(L_0\). Under “always treat” (\(\sum A_k = 4\)) the log-odds shift by \(0.4 \times 4 = 1.6\) relative to “never treat” (\(\sum A_k = 0\)), so the risk difference on the probability scale is positive but does not have a simple closed form (it depends on the marginal distribution of \(L_0\)).

Code
set.seed(99)
n_id <- 600
K <- 4

ids <- rep(seq_len(n_id), each = K)
times <- rep(seq_len(K), n_id)

L0 <- rep(rnorm(n_id), each = K)
sim_4p <- data.frame(id = ids, time = times, L0 = L0)

sim_4p$L <- NA_real_
sim_4p$A <- NA_integer_
for (i in seq_len(n_id)) {
  rows <- which(sim_4p$id == i)
  l_prev <- sim_4p$L0[rows[1]]
  for (k in seq_along(rows)) {
    sim_4p$L[rows[k]] <- l_prev + rnorm(1, 0, 0.5)
    sim_4p$A[rows[k]] <- rbinom(1, 1, plogis(-0.3 + 0.4 * sim_4p$L[rows[k]]))
    l_prev <- sim_4p$L[rows[k]] + 0.3 * sim_4p$A[rows[k]]
  }
}

sim_4p$Y <- NA_real_
for (i in seq_len(n_id)) {
  rows <- which(sim_4p$id == i)
  cum_a <- sum(sim_4p$A[rows])
  sim_4p$Y[rows[K]] <- rbinom(
    1, 1,
    plogis(-1.5 + 0.4 * cum_a + 0.3 * sim_4p$L0[rows[1]])
  )
}

fit_4p <- causat(
  sim_4p,
  outcome = "Y",
  treatment = "A",
  confounders = ~ L0,
  confounders_tv = ~ L,
  family = "binomial",
  id = "id",
  time = "time"
)

res_4p <- contrast(
  fit_4p,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  type = "difference",
  ci_method = "sandwich"
)
res_4p

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 600

Intervention means
intervention estimate se ci_lower ci_upper
always 0.414 0.037 0.343 0.486
never 0.228 0.028 0.173 0.283
Contrasts
comparison estimate se ci_lower ci_upper
always vs never 0.186 0.053 0.083 0.29
Code
plot(res_4p)

Forest plot showing the risk difference between always-treat and never-treat over 4 periods.

Code
tt(tidy(res_4p), digits = 3)
term estimate std.error type conf.low conf.high
always vs never 0.186 0.0528 contrast 0.0828 0.29

The ICE sandwich variance correctly propagates uncertainty through all 4 sequential models via the stacked estimating-equation cascade.

Continuous treatment with modified treatment policies

ICE also supports continuous time-varying treatments combined with the modified-treatment-policy (MTP) interventions: shift(), scale_by(), and threshold(). The intervention is applied to the current-time treatment at each backward step; ice_apply_intervention_long() deliberately does not recompute the lag columns from the intervened values, since doing so would double-count a non-static intervention.

DGM. Two periods (\(k = 0, 1\)) with continuous treatment \(A_k \in \mathbb{R}\), continuous confounder \(L_k\), and continuous outcome \(Y\). Treatment-confounder feedback is present: \(A_0\) affects \(L_1\) (coefficient \(0.5\)), and \(L_1\) affects \(A_1\) (coefficient \(0.5\)). The structural equations are:

\[ \begin{aligned} L_0 &\sim N(0, 1), \\ A_0 &= 1 + 0.4 \, L_0 + \eta_0, \quad \eta_0 \sim N(0, 1), \\ L_1 &= 0.3 \, L_0 + 0.5 \, A_0 + \eta_1, \quad \eta_1 \sim N(0, 1), \\ A_1 &= 1 + 0.5 \, L_1 + \eta_2, \quad \eta_2 \sim N(0, 1), \\ Y &= 2 + 1.5 \, A_0 + 2 \, A_1 + 0.5 \, L_0 - 0.3 \, L_1 + \varepsilon, \quad \varepsilon \sim N(0, 1). \end{aligned} \]

For a shift(-1) MTP (reduce each \(A_k\) by 1 at each period), the structural causal effect on \(Y\) relative to the natural course accounts for both direct and indirect paths. Shifting \(A_0\) by \(-1\) also shifts \(L_1\) by \(-0.5\) (coefficient \(0.5\) in the \(L_1\) equation), which in turn shifts the natural value of \(A_1\) by \(-0.25\) (coefficient \(0.5\) in the \(A_1\) equation). The shifted \(A_1\) is then reduced by a further \(-1\) by the MTP. The total effect on \(Y\) is:

\[ 1.5 \times (-1) + 2 \times (-1.25) + (-0.3) \times (-0.5) = -1.5 - 2.5 + 0.15 = -3.85. \]
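The \(-3.85\) can be confirmed by simulating the structural equations under the natural course and under the policy, using shared noise so that the contrast is exact (a hand check of the derivation, independent of causatr):

```r
# Counterfactual simulation of the shift(-1) MTP: every noise draw is shared
# between regimes, so the per-person effect is exactly -3.85.
set.seed(1)
m  <- 1e4
L0 <- rnorm(m); e0 <- rnorm(m); e1 <- rnorm(m); e2 <- rnorm(m); ey <- rnorm(m)
sim_y <- function(d) {             # d = amount the policy adds to each A_k
  A0 <- 1 + 0.4 * L0 + e0 + d
  L1 <- 0.3 * L0 + 0.5 * A0 + e1   # the shifted A0 propagates into L1 ...
  A1 <- 1 + 0.5 * L1 + e2 + d      # ... and into the natural value of A1
  2 + 1.5 * A0 + 2 * A1 + 0.5 * L0 - 0.3 * L1 + ey
}
mean(sim_y(-1)) - mean(sim_y(0))
#> [1] -3.85
```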

Code
set.seed(101)
n <- 800
sim_id <- rep(seq_len(n), each = 2)
sim_time <- rep(0:1, n)

L0 <- rnorm(n)
A0 <- 1 + 0.4 * L0 + rnorm(n)
L1 <- 0.3 * L0 + 0.5 * A0 + rnorm(n)
A1 <- 1 + 0.5 * L1 + rnorm(n)
Y <- 2 + 1.5 * A0 + 2 * A1 + 0.5 * L0 - 0.3 * L1 + rnorm(n)

cont_long <- data.frame(
  id = sim_id,
  time = sim_time,
  A = c(rbind(A0, A1)),
  L = c(rbind(L0, L1)),
  Y = c(rbind(NA_real_, Y))
)

fit_cont <- causat(
  cont_long,
  outcome = "Y",
  treatment = "A",
  confounders = ~ 1,
  confounders_tv = ~ L,
  id = "id",
  time = "time"
)

res_mtp <- contrast(
  fit_cont,
  interventions = list(
    observed = NULL,
    shift_m1 = shift(-1),
    halved   = scale_by(0.5),
    capped   = threshold(0, 1)
  ),
  reference = "observed",
  type = "difference",
  ci_method = "sandwich"
)
res_mtp

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 800

Intervention means
intervention estimate se ci_lower ci_upper
observed 5.904 0.117 5.675 6.134
shift_m1 2.061 0.147 1.773 2.349
halved 3.799 0.075 3.652 3.946
capped 4.326 0.059 4.21 4.443
Contrasts
comparison estimate se ci_lower ci_upper
shift_m1 vs observed -3.844 0.086 -4.013 -3.674
halved vs observed -2.105 0.066 -2.235 -1.976
capped vs observed -1.578 0.084 -1.743 -1.414

Structural truth for shift(-1) is \(-3.85\) (see derivation above). The shift_m1 estimate should be close to this value.

Multivariate time-varying treatment

When multiple treatments vary over time (e.g. two drugs dosed jointly), pass a character vector to treatment = and supply each intervention as a named sub-list. Lag columns are built per-component by prepare_data().

DGM. Two periods (\(k = 0, 1\)) with two binary treatments \(A_{1,k}, A_{2,k} \in \{0, 1\}\), a continuous confounder \(L_k\), and continuous outcome \(Y\). Both treatments at \(k = 0\) feed into the confounder \(L_1\), creating treatment-confounder feedback. The structural equations are:

\[ \begin{aligned} L_0 &\sim N(0, 1), \\ A_{1,0} \mid L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(0.3 \, L_0)\bigr), \\ A_{2,0} \mid L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(0.1 \, L_0)\bigr), \\ L_1 &= 0.3 \, L_0 + 0.4 \, A_{1,0} + 0.2 \, A_{2,0} + \eta, \quad \eta \sim N(0, 1), \\ A_{1,1} \mid L_1 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(0.3 \, L_1)\bigr), \\ A_{2,1} \mid L_1 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(0.1 \, L_1)\bigr), \\ Y &= 2 + 1.5 \, A_{1,0} + 1.5 \, A_{1,1} + 0.8 \, A_{2,0} + 0.8 \, A_{2,1} + 0.5 \, L_1 + \varepsilon, \quad \varepsilon \sim N(0, 1). \end{aligned} \]

The outcome is linear in both treatments at both periods. The structural ATE of “both on” vs “both off” includes the indirect path through \(L_1\): setting \(A_{1,0} = A_{2,0} = 1\) shifts \(\mathbb{E}[L_1]\) by \(0.4 + 0.2 = 0.6\) relative to “both off”, and \(L_1\) enters \(Y\) with coefficient \(0.5\), adding \(0.3\) to the direct sum \(2 \times (1.5 + 0.8) = 4.6\):

\[ \text{ATE} = \underbrace{4.6}_{\text{direct}} + \underbrace{0.5 \times 0.6}_{\text{indirect via } L_1} = 4.9. \]
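Because every equation is linear in the treatments, the 4.9 can be checked with a shared-noise counterfactual simulation (a hand check, not a causatr call):

```r
# Counterfactuals under "both on" (a = 1) vs "both off" (a = 0); the shared
# noise cancels, so the contrast equals the structural ATE of 4.9 exactly.
set.seed(1)
m   <- 1e4
L0  <- rnorm(m); eta <- rnorm(m); ey <- rnorm(m)
cf_y <- function(a) {
  L1 <- 0.3 * L0 + 0.4 * a + 0.2 * a + eta   # both k = 0 treatments shift L1
  2 + 1.5 * a + 1.5 * a + 0.8 * a + 0.8 * a + 0.5 * L1 + ey
}
mean(cf_y(1)) - mean(cf_y(0))
#> [1] 4.9
```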

Code
set.seed(202)
n <- 600
sim_id <- rep(seq_len(n), each = 2)
sim_time <- rep(0:1, n)
L0 <- rnorm(n)
A10 <- rbinom(n, 1, plogis(0.3 * L0))
A20 <- rbinom(n, 1, plogis(0.1 * L0))
L1 <- 0.3 * L0 + 0.4 * A10 + 0.2 * A20 + rnorm(n)
A11 <- rbinom(n, 1, plogis(0.3 * L1))
A21 <- rbinom(n, 1, plogis(0.1 * L1))
Y <- 2 + 1.5 * A10 + 1.5 * A11 + 0.8 * A20 + 0.8 * A21 + 0.5 * L1 + rnorm(n)

mv_long <- data.frame(
  id = sim_id,
  time = sim_time,
  A1 = c(rbind(A10, A11)),
  A2 = c(rbind(A20, A21)),
  L = c(rbind(L0, L1)),
  Y = c(rbind(NA_real_, Y))
)

fit_mv_long <- causat(
  mv_long,
  outcome = "Y",
  treatment = c("A1", "A2"),
  confounders = ~ 1,
  confounders_tv = ~ L,
  id = "id",
  time = "time"
)

res_mv_long <- contrast(
  fit_mv_long,
  interventions = list(
    both_on  = list(A1 = static(1), A2 = static(1)),
    both_off = list(A1 = static(0), A2 = static(0))
  ),
  reference = "both_off",
  type = "difference"
)
res_mv_long

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 600

Intervention means
intervention estimate se ci_lower ci_upper
both_on 6.914 0.095 6.728 7.1
both_off 1.967 0.098 1.774 2.16
Contrasts
comparison estimate se ci_lower ci_upper
both_on vs both_off 4.947 0.17 4.613 5.281

Structural truth: both_on − both_off = 4.9 (direct effects \(4.6\) plus indirect \(0.3\) via \(L_1\)).

External weights (IPCW / survey)

External weights — survey weights or pre-computed IPCW — pass through every backward-step model via model_fn’s weights = argument. The ICE sandwich variance engine accounts for heterogeneous weights via the \(A_{k-1,k}\) cascade gradient (see variance-theory.qmd §5.4).

DGM. Two periods (\(k = 0, 1\)) with binary treatment \(A_k\), binary confounder \(L_k\), continuous outcome \(Y\), and uninformative survey weights drawn independently from \(\text{Uniform}(0.5, 2.0)\). The structural equations are:

\[ \begin{aligned} L_0 &\sim \text{Bernoulli}(0.5), \\ A_0 \mid L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.2 + 0.6 \, L_0)\bigr), \\ L_1 \mid A_0, L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.2 + 0.8 \, A_0 + 0.3 \, L_0)\bigr), \\ A_1 \mid L_1 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.2 + 0.6 \, L_1)\bigr), \\ Y &= 2 + 2.5 \, A_0 + 2.5 \, A_1 - 0.5 \, L_1 + \varepsilon, \quad \varepsilon \sim N(0, 1), \\ w_i &\sim \text{Uniform}(0.5, 2.0). \end{aligned} \]

Since the weights are non-informative (independent of everything), the structural ATE of “always treat” vs “never treat” is the same as the unweighted case. The direct effects sum to \(2.5 + 2.5 = 5\), but \(A_0\) shifts \(\mathbb{E}[L_1]\) by \(\approx 0.19\), and \(L_1\) enters \(Y\) with coefficient \(-0.5\), giving a true ATE \(\approx 4.90\). The purpose of this example is to demonstrate that the ICE sandwich variance engine correctly propagates heterogeneous weights through the backward recursion.

Code
set.seed(303)
n <- 500
sim_id <- rep(seq_len(n), each = 2)
sim_time <- rep(0:1, n)
L0 <- rbinom(n, 1, 0.5)
A0 <- rbinom(n, 1, plogis(-0.2 + 0.6 * L0))
L1 <- rbinom(n, 1, plogis(-0.2 + 0.8 * A0 + 0.3 * L0))
A1 <- rbinom(n, 1, plogis(-0.2 + 0.6 * L1))
Y <- 2 + 2.5 * A0 + 2.5 * A1 - 0.5 * L1 + rnorm(n)
weights_long <- runif(2 * n, 0.5, 2.0)

w_long_df <- data.frame(
  id = sim_id,
  time = sim_time,
  A = c(rbind(A0, A1)),
  L = c(rbind(L0, L1)),
  Y = c(rbind(NA_real_, Y))
)

fit_w <- causat(
  w_long_df,
  outcome = "Y",
  treatment = "A",
  confounders = ~ 1,
  confounders_tv = ~ L,
  id = "id",
  time = "time",
  weights = weights_long
)

res_w <- contrast(
  fit_w,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  type = "difference",
  ci_method = "sandwich"
)
res_w

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 500

Intervention means
intervention estimate se ci_lower ci_upper
always 6.756 0.075 6.61 6.902
never 1.629 0.099 1.434 1.823
Contrasts
comparison estimate se ci_lower ci_upper
always vs never 5.128 0.139 4.854 5.401

Censoring (IPCW-style row filter)

Pass a time-varying censoring indicator via censoring =. At each backward step, ICE restricts the fit to uncensored rows; predictions are made for the target population regardless of censoring status, so the estimator is valid under the usual sequential-randomization assumption on the censoring mechanism.

DGM. Two periods (\(k = 0, 1\)) with binary treatment \(A_k\), binary confounder \(L_k\), continuous outcome \(Y\), and a censoring indicator \(C_1\) at \(k = 1\) that depends on treatment and baseline confounder. About 20% of individuals are censored at the second period. The structural equations are:

\[ \begin{aligned} L_0 &\sim \text{Bernoulli}(0.5), \\ A_0 \mid L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.2 + 0.6 \, L_0)\bigr), \\ L_1 \mid A_0, L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.2 + 0.8 \, A_0 + 0.3 \, L_0)\bigr), \\ A_1 \mid L_1 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-0.2 + 0.6 \, L_1)\bigr), \\ Y^* &= 2 + 2 \, A_0 + 2 \, A_1 - 0.5 \, L_1 + \varepsilon, \quad \varepsilon \sim N(0, 1), \\ C_1 \mid A_0, L_0 &\sim \text{Bernoulli}\bigl(\text{logit}^{-1}(-2 + 0.5 \, A_0 + 0.3 \, L_0)\bigr), \\ Y &= \begin{cases} Y^* & \text{if } C_1 = 0, \\ \text{NA} & \text{if } C_1 = 1. \end{cases} \end{aligned} \]

The structural ATE of “always treat” vs “never treat” on the full (uncensored) outcome is \(\approx 3.90\) (direct effects \(2 + 2 = 4\) minus \(0.5 \times 0.19 \approx 0.10\) for the indirect path \(A_0 \to L_1 \to Y\)). Under the censoring-at-random assumption (\(Y^{a_0, a_1} \perp C_1 \mid A_0, L_0\)), ICE with row filtering recovers this effect by fitting the outcome model on uncensored rows and standardising over the full population.
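The arithmetic behind that \(\approx 3.90\) can be checked directly from the structural equations (a quick illustration, not causatr output):

```r
# Numeric check of the structural truth derived above.
# Direct effects of A0 and A1 on Y sum to 2 + 2 = 4; the indirect path
# A0 -> L1 -> Y contributes -0.5 times the shift A0 induces in
# P(L1 = 1), averaged over L0 ~ Bernoulli(0.5).
shift_L1 <- mean(plogis(-0.2 + 0.8 + 0.3 * c(0, 1)) -
                 plogis(-0.2 + 0.3 * c(0, 1)))
ate_true <- 4 - 0.5 * shift_L1
round(ate_true, 2)
#> [1] 3.9
```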

Code
set.seed(404)
n <- 500
sim_id <- rep(seq_len(n), each = 2)
sim_time <- rep(0:1, n)
L0 <- rbinom(n, 1, 0.5)
A0 <- rbinom(n, 1, plogis(-0.2 + 0.6 * L0))
L1 <- rbinom(n, 1, plogis(-0.2 + 0.8 * A0 + 0.3 * L0))
A1 <- rbinom(n, 1, plogis(-0.2 + 0.6 * L1))
Y_full <- 2 + 2 * A0 + 2 * A1 - 0.5 * L1 + rnorm(n)
# 20% censoring at t = 1 depending on A0 + L0.
C1 <- rbinom(n, 1, plogis(-2 + 0.5 * A0 + 0.3 * L0))
Y <- ifelse(C1 == 1, NA_real_, Y_full)

cens_long <- data.frame(
  id = sim_id,
  time = sim_time,
  A = c(rbind(A0, A1)),
  L = c(rbind(L0, L1)),
  C = c(rbind(0L, C1)),
  Y = c(rbind(NA_real_, Y))
)

fit_c <- causat(
  cens_long,
  outcome = "Y",
  treatment = "A",
  confounders = ~ 1,
  confounders_tv = ~ L,
  id = "id",
  time = "time",
  censoring = "C"
)

res_c <- contrast(
  fit_c,
  interventions = list(always = static(1), never = static(0)),
  reference = "never",
  type = "difference",
  ci_method = "sandwich"
)
res_c

Estimator: gcomp  ·  Estimand: ATE  ·  Contrast: difference  ·  CI method: sandwich  ·  N: 500

Intervention means
  intervention   estimate     se  ci_lower  ci_upper
  always            5.721  0.081     5.562     5.879
  never             1.683  0.090     1.506     1.860

Contrasts
  comparison        estimate     se  ci_lower  ci_upper
  always vs never      4.038  0.138     3.767     4.309

Structural truth \(\approx 3.90\) (see derivation above). The estimate should be close, and the sandwich SE reflects the censoring-induced reduction in effective sample size at each step.

How ICE works

ICE g-computation (Zivich et al., 2024) works backward in time:

  1. Final time \(K\): Fit \(\mathbb{E}[Y \mid A_K, L_K, \text{baseline}]\) among uncensored individuals, then predict under the intervention to obtain pseudo-outcomes.
  2. Each earlier time \(k\): Use the pseudo-outcomes as the response, fit \(\mathbb{E}[\text{pseudo} \mid A_k, L_k, \text{baseline}]\), and predict under the intervention.
  3. First time: The mean of the final pseudo-outcomes is the marginal mean \(\mathbb{E}[Y^{a}]\).

This requires only outcome models — unlike forward simulation (“big g-formula”), no models for time-varying confounders are needed.
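The three steps can be hand-rolled with plain `lm()` fits for the two-period DGM used earlier in this vignette. This is a didactic sketch only (no variance estimation, no censoring handling); `causat()` wraps this recursion together with the sandwich machinery:

```r
# Hand-rolled ICE estimate of E[Y^{1,1}] ("always treat") for the
# two-period DGM above, using only lm() and predict().
set.seed(1)
n  <- 5000
L0 <- rbinom(n, 1, 0.5)
A0 <- rbinom(n, 1, plogis(-0.2 + 0.6 * L0))
L1 <- rbinom(n, 1, plogis(-0.2 + 0.8 * A0 + 0.3 * L0))
A1 <- rbinom(n, 1, plogis(-0.2 + 0.6 * L1))
Y  <- 2 + 2 * A0 + 2 * A1 - 0.5 * L1 + rnorm(n)
dat <- data.frame(L0, A0, L1, A1, Y)

# Step 1 -- final time: fit the outcome model, predict with A1 set to 1.
m1 <- lm(Y ~ A1 + L1 + A0 + L0, data = dat)
dat$pseudo <- predict(m1, newdata = transform(dat, A1 = 1))

# Step 2 -- earlier time: fit the pseudo-outcome model on time-0 history,
# predict with A0 set to 1.
m0 <- lm(pseudo ~ A0 + L0, data = dat)

# Step 3 -- average the final pseudo-outcomes.
mu_11 <- mean(predict(m0, newdata = transform(dat, A0 = 1)))
mu_11  # structural value: 6 - 0.5 * E[P(L1 = 1 | A0 = 1, L0)], about 5.66
```

Note that no model for \(L_1\) is ever fit: marginalising over the intervention-specific \(L_1\) distribution happens implicitly when the step-2 regression averages the step-1 predictions.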

Why ICE, not forward simulation?

                         ICE g-comp                   Forward simulation
Models needed            Outcome only (at each time)  Outcome + every time-varying confounder
Inference                Sandwich (stacked EE)        Bootstrap only
Speed                    Fast                         Slow (many models + simulation)
Model misspecification   K models                     K + p*K models
Implementation           causatr                      gfoRmula

Key parameters

  • confounders: Baseline (time-invariant) confounders. Enter every outcome model at every time step.
  • confounders_tv: Time-varying confounders. Included with their lags (controlled by history).
  • history: Markov order. history = 1 (default) includes one lag of treatment and TV confounders. history = Inf includes full history.
  • censoring: Time-varying censoring indicator (1 = censored). ICE restricts to uncensored at each backward step; the resulting estimator is IPCW-valid when the censoring mechanism is conditionally random given the covariate history.
  • weights: External survey weights or pre-computed IPCW. Validated upfront by check_weights() (which rejects NA, infinite, negative, and mis-sized weights) and propagated through every step of the backward recursion. The ICE sandwich variance engine accounts for the weights via the A_{k-1,k} cascade gradient; dropping them would understate the SE under heterogeneous weights by roughly a factor of two.
  • parallel / ncpus: Passed straight through to boot::boot() on the bootstrap path. Defaults to getOption("boot.parallel", "no") / getOption("boot.ncpus", 1L), so a session-wide options(boot.parallel = "multicore") turns parallelism on for every contrast() call.
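Putting these parameters together, a full-history fit with parallel bootstrap inference might look like the following. This is an illustrative sketch: `long_df` stands in for a long-format data frame like the ones built above, and `ci_method = "bootstrap"` is assumed to select the bootstrap path described in this section.

```r
# Sketch: full-history ICE with parallel bootstrap CIs.
# Assumes a long-format data frame `long_df` with columns id, time, A, L, Y.
options(boot.parallel = "multicore", boot.ncpus = 4)

fit <- causat(
  long_df,
  outcome        = "Y",
  treatment      = "A",
  confounders    = ~ 1,   # baseline (time-invariant) confounders
  confounders_tv = ~ L,   # time-varying confounders, lags per `history`
  id             = "id",
  time           = "time",
  history        = Inf    # condition on the full covariate history
)

contrast(
  fit,
  interventions = list(always = static(1), never = static(0)),
  reference     = "never",
  type          = "difference",
  ci_method     = "bootstrap"  # picks up boot.parallel / boot.ncpus
)
```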

For a full enumeration of the sandwich-vs-bootstrap, continuous/binary, external-weights, censoring, multivariate-treatment, and ipsi-rejection combinations, see FEATURE_COVERAGE_MATRIX.md §“Method 4 — ICE longitudinal g-computation” (20+ truth-pinned rows against lmtp::lmtp_tmle / analytic targets / Hernán & Robins Table 20.1).

Summary of covered combinations

Legend. ✅ covered and truth-pinned in tests (either against Table 20.1 or against lmtp::lmtp_tmle()) · 🟡 smoke test only · ⛔ rejected with an informative error.

Treatment     Outcome     Intervention               Estimand  Contrast    Inference             Weights    Status
Binary        Continuous  Static (always vs never)   ATE       Difference  Sandwich              none       ✅ Table 20.1
Binary        Continuous  Static                     ATE       Difference  Bootstrap             none       ✅ Table 20.1
Binary        Continuous  Dynamic (rule on L)        ATE       Difference  Sandwich              none
Binary        Continuous  Static                     ATE       Difference  Sandwich + bootstrap  survey
Continuous    Continuous  Shift / Scale / Threshold  ATE       Difference  Sandwich              none       ✅ (vs lmtp)
Continuous    Continuous  Shift                      ATE       Difference  Bootstrap             none       ✅ (vs lmtp)
Multivariate  Continuous  Static / Shift             ATE       Difference  Sandwich              none
Binary        Binary      Static / Dynamic           ATE       Difference  Sandwich / bootstrap  none       ✅ (2 periods)
Binary        Binary      Static                     ATE       Difference  Sandwich              none       ✅ (4 periods)
Binary        Binary      Static                     ATE       Difference  Bootstrap (parallel)  none
Binary        Continuous  Static                     ATE       Difference  Sandwich              censoring
Binary        Any         IPSI(\(\delta\))           —         —           —                     —          ⛔ Phase 4
Binary        Any         Static                     ATT       —           —                     —          ⛔ ICE is ATE-only

See FEATURE_COVERAGE_MATRIX.md for the authoritative coverage status of every method × treatment × outcome × intervention × variance combination.

References

Hernán MA, Robins JM (2025). Causal Inference: What If. Chapman & Hall/CRC. Chapters 19–21: Time-varying treatments.

Zivich PN, Ross RK, Shook-Sa BE, Cole SR, Edwards JK (2024). Empirical sandwich variance estimator for iterated conditional expectation g-computation. Statistics in Medicine 43:5562–5572.