causatr (development version)
2026-06-03 — Phase 22b-5: Flexible-treatment ICE term (treatment_form)
causat() gains a treatment_form argument for longitudinal g-computation (ICE), letting the treatment enter the per-step outcome models via a transformed term — treatment_form = ~ factor(A) (categorical / ordinal dose) or ~ splines::ns(A, df = 3) (smooth dose-response) — instead of the default bare numeric main effect. The intervention (or natural-history policy) still sets the numeric treatment column; only the model’s design term changes, so a nonlinear or kinked counterfactual response is no longer forced through a single linear slope. Lag terms are expanded automatically (factor(A) → factor(lag1_A)), reusing the same parse-tree substitution as transformed time-varying confounders.
This is the enabler in the Phase 22b roadmap: it removes the bare-numeric misspecification of a kinked capped-dose response under the natural-history MTP engine — the cap_escalation() gap to the forward-MC truth closes from ≈0.034 to ≈0.0015 under ~ factor(A). It benefits all of longitudinal ICE, not just G-LMTPs. treatment_form = NULL (the default) is byte-identical to the previous behaviour. The flexible term flows through both the standard ICE recursion and the augmented G-LMTP recursion, and through the ID-cluster bootstrap refit; the analytic ICE sandwich handles the wider design matrix transparently. Validated against exact analytic-contrast truths (categorical and continuous) and the G-LMTP forward-MC truth. Rejected for point treatments and non-gcomp estimators (causatr_treatment_form_not_ice) and for non-treatment terms / non-formula input (causatr_treatment_form_bad).
2026-06-03 — Phase 22b: Natural-history modified treatment policies (G-LMTPs)
Two new intervention constructors, grace_period() and carry_forward(), add support for modified treatment policies that depend on the natural-value history of treatment (delays, grace periods, last-observation-carried-forward) under longitudinal g-computation. These regimes are not expressible with dynamic(): the standard iterated-conditional-expectation recursion conditions on the observed lagged treatment, whereas the policy needs the counterfactual natural value, which under treatment-state feedback differs from the observed value — dynamic(\(d, trt) d$lag1_A) runs but silently targets the wrong estimand.
The estimator is the augmented-data sequential regression of Díaz, Williams, Morzywołek & Rudolph (2026, Modified treatment policies that depend on the natural history of treatment, arXiv:2605.24167): each observation is augmented with every possible natural-treatment-history sequence, carried as a label through the backward recursion so the conditioning treatment value is decoupled from the policy-input value. causatr’s parametric-GLM plug-in is √n-consistent under correct specification. grace_period(window) delays treatment initiation by window periods (the paper’s Section-6 policy); carry_forward() is the degenerate LOCF policy.
Supported for estimator = "gcomp" longitudinal fits with a discrete treatment; ID-cluster bootstrap variance. Validated against the forward Monte-Carlo truth (the paper’s Proposition 1), replicating the Díaz et al. (2026) Section-6 delay result, with exact limiting-case agreement against standard ICE (carry_forward() ≡ a baseline regime; grace_period(0) ≡ the natural course). Rejections are classed (causatr_glmtp_not_ice, _mixed, _continuous_trt, _too_many). The augmented-data sandwich is deferred; ci_method = "sandwich" aborts with causatr_glmtp_sandwich and points to the bootstrap.
The augmented engine handles any single discrete treatment, not just binary: the ordinal ({0,1,2}) path is validated to numerical precision — carry_forward() matches the equivalent baseline regime to 1e-14, and the engine matches an independent hand-coded recursion to 1e-9 for a dose-escalation cap. The public ordered-policy constructor (cap_escalation()) and multivariate G-LMTP are deferred (designed in PHASE_22_ICE_ENHANCEMENTS.md §3.7): the parametric plug-in is consistent for these only under a flexible-dose model that causatr’s ICE does not yet expose (the treatment enters as a bare numeric term, which misspecifies the kinked pseudo-response and biases the point estimate ~3% when a dose cap binds — confirmed specification error, not a bug, since a factor(dose) model recovers the truth).
2026-06-02 — Phase 22a: Stratified ICE (stratified = "G")
causat(..., stratified = "G") adds stratified iterated-conditional-expectation g-computation for longitudinal data. When the outcome–treatment relationship differs structurally across a baseline subgroup G, a single pooled per-step model is misspecified; stratified ICE fits one outcome / pseudo-outcome model per stratum of G at every backward step and merges per-stratum predictions back into the recursion.
- Estimand-preserving. Because
Gis baseline and individuals never cross strata in the recursion, stratified ICE on the full panel is exactly equal to running pooled ICE separately on each stratum subset — the target marginal \(E[Y^{\bar d}]\) (and anyby/subset) is unchanged. The within-stratum formula drops the constant stratum term automatically. - Both variance paths.
ci_method = "bootstrap"(ID-cluster, refits all per-stratum per-period models per replicate) andci_method = "sandwich"(the analytic per-stratum × per-time stacked-EE sandwich). BecauseGis baseline and the stacked estimating equations are block-diagonal across strata, the sandwich reuses the pooled IF engine: a global Channel-1 sampling term plus the per-step nuisance-correction cascade run once per stratum on disjoint rows. - Composes with binary/continuous treatment, gaussian/binomial outcome,
static/shift/dynamic/stochasticinterventions, lags / TV confounders, censoring row-filter, and external/IPCW weights. - Validated by exact equivalence to per-stratum pooled ICE; an exact reduction to the pooled sandwich at one stratum; agreement of the per-stratum sandwich with
delicatessen’s M-estimationMEstimatorto ~1e-7 on every per-arm mean / SE for gaussian and binomial DGPs (the same plug-in sandwich); bootstrap parity under shift and weights; a truth-based binary DGP; and anlmtpper-stratum point cross-check (test-ice-stratified.R). Misuse is rejected with classed errors (causatr_stratified_not_ice,_not_baseline,_too_many,_not_found,_na).
This is Phase 22a. Phase 22b (natural-history modified treatment policies — grace periods, carry-forward) is designed in PHASE_22_ICE_ENHANCEMENTS.md (an augmented-data sequential-regression G-LMTP engine, per arXiv:2605.24167) but not yet implemented; the earlier “thin dynamic() wrapper” idea was found to be theoretically unsound and is not shipped.
2026-06-01 — Classed confounder-placement warnings
The two prepare_data() warnings that flag a likely confounders / confounders_tv mix-up are now classed so downstream code (and tests) can match on class rather than message text: causatr_tv_confounder_constant (a time-varying confounder that never varies within an individual) and causatr_baseline_confounder_varying (a baseline confounder that does vary within individuals). No behaviour change beyond the added condition class.
2026-06-01 — Phase 21: Multiple imputation via causat_mice()
causat_mice() is now implemented and exported. It is the analysis step of a multiple-imputation workflow: the user imputes missing covariates and/or treatment upstream with mice::mice(), and causat_mice() fits causat() + contrast() on every completed dataset and pools the per-imputation estimates into a single causatr_result. This is the supported way to handle missing-at-random covariates (L) or treatment (A) — orthogonal to IPCW, which handles missing outcomes (Y).
Two pooling engines.
pool_method = "rubin"(default) applies Rubin’s- rules with Barnard-Rubin (1999) degrees of freedom, cross-checked to machine precision against
mice::pool.scalar().pool_method = "boot_mi"implements von Hippel’s (2020) two-stage bootstrap-then-impute variance, which attains nominal coverage without relying on Rubin’s congeniality assumption — provided the point estimator stays consistent (a bootstrap corrects the variance, not bias; Bartlett & Hughes 2020). Under uncongeniality Rubin’s variance can be biased in either direction (Meng 1994), so it is not uniformly conservative. Means and difference contrasts pool on the natural scale; ratio / odds-ratio contrasts pool on the log scale.
- rules with Barnard-Rubin (1999) degrees of freedom, cross-checked to machine precision against
Estimator-agnostic. Every
causat()configuration flows through unchanged: all five estimators (gcomp, IPW, AIPW, matching, SNM), every treatment type and outcome family, all interventions and contrast scales,by-stratification and ATT, stabilized weights, longitudinal (ICE + longitudinal IPW), and IPCW (missing L imputed while censored Y is reweighted) all pool. SNM pools per blip parameter.S3 integration.
confint()andtidy()reconstructt-based intervals from the per-row pooling degrees of freedom (stored in the"mi_details"attribute) so the user’s confidence level is honored.parallel = "future"forwards the Boot MI bootstrap tofuture.apply.Guards.
causat_mice()warns when an analysis variable is absent from or unused by the imputation model (a uncongeniality risk), drops individual imputations that fail to fit (aborting only if none survive), and degrades gracefully to a single complete-data analysis whenm = 1. Boot MI warns (causatr_mi_boot_floor) when a variance component goes non-positive and is floored — a signal thatB/Mare too small for the random-effects decomposition rather than a genuinely tiny standard error.Coverage study + theory correction (21h). A Monte-Carlo study (
test-mi-coverage.R) confirms Rubin and Boot MI both attain ~nominal coverage when the imputation model keeps the causal estimator consistent (DGP-MI1), with Rubin’s intervals marginally wider. It also corrects an earlier claim in the design doc: excluding the outcome from the predictor matrix (DGP-MI5) is a misspecification that biases the estimate, so neither pooling rule recovers coverage — a bootstrap fixes variance, not bias (Bartlett & Hughes 2020). The MI workflow guide is themissing-data.qmdvignette (21i).
New internal engines live in R/pool_rubin.R and R/pool_boot_mi.R. The check_treatment_nas() hint now advertises MI as a third way to handle missing treatment alongside IPCW and complete-case analysis.
2026-05-31 — Phase 20: Per-period IPSI under longitudinal IPW
Univariate ipsi(delta) interventions now work under longitudinal IPW (estimator = "ipw" with id = and time =). The previously emitted causatr_longitudinal_ipsi_pending rejection in compute_ipw_contrast_longitudinal() has been lifted for the univariate case.
- No new estimation math: Kennedy’s (2019) closed-form incremental propensity-score weight extends to a per-period product
W_i = prod_t (delta * A_t + (1 - A_t)) / (delta * p_t + (1 - p_t)). Each period already routes throughcompute_density_ratio_weights()’s IPSI branch on the period-tsubset, andcompute_longitudinal_weights()multiplies the per-period factors within an id exactly as forstatic/shift/scale_by. - Stacked sandwich is automatic:
make_weight_fn_longitudinal()builds per-period sub-closures viamake_weight_fn(), whose IPSI branch is smooth in the propensity coefficients through thedelta * p_t + (1 - p_t)denominator, sonumDeriv::jacobian()on the stacked closure produces the block-diagonal cross-derivative without modification. Sandwich and bootstrap SEs agree within Monte-Carlo error. - Two combinations stay rejected with dedicated classes: multivariate IPSI (
causatr_longitudinal_ipsi_multivariate— Kennedy’s closed form is binary-univariate, the per-component density chain has no IPSI shortcut) and IPSI +stabilize = "marginal"(causatr_longitudinal_ipsi_stabilize— the IPSI weight is already a bounded propensity-odds ratio with no separate marginal numerator). Longitudinal AIPW IPSI remains unsupported (the ICE recursion needs a counterfactual treatment stream IPSI does not provide).
2026-05-30 — Phase 25: Multivariate longitudinal AIPW
Doubly-robust estimation of joint time-varying interventions now works: estimator = "aipw" with treatment = c("A1", "A2"), id =, and time =. The previously emitted causatr_longitudinal_multivariate_pending rejection in fit_aipw_longitudinal() has been removed.
- A low-risk composition, no new estimation math: the outcome (ICE) side of the Bang & Robins (2005) backward recursion already loops over vector
treatment, and Phase 19 already built the multivariate per-period density chain (fit_treatment_models(),compute_density_ratio_weights_mv(),make_weight_fn_mv()).fit_aipw_longitudinal()now dispatches onlength(treatment)to fit a per-period K-component propensity chain, andice_aipw_iterate()uses the multivariate per-period density-ratio weight. - Stacked sandwich: Channel 2b of
variance_if_aipw_long_one()stacksT × Kblock-diagonal propensity corrections (block-diagonal across time via disjoint period subsets, block-diagonal across components via the chain rule) instead ofT. Bootstrap is unchanged — id-clustered resampling preserves both axes. - Stabilization rejected:
stabilize = "marginal"is now an explicit classed error (causatr_stabilize_longitudinal_aipw) for longitudinal AIPW (univariate and multivariate).fit_aipw()never threadedstabilizeinto the longitudinal path, so a stabilized request was previously honoured nowhere; users wanting stabilized longitudinal weights should useestimator = "ipw". - Truth-based + cross-method tests: on the static binary
make_em_mv_long_scm()DGP (analytical truths 9 / 15 by sex), MV AIPW recovers the stratified truths, agrees with MV ICE g-computation and MV longitudinal IPW cross-method, and satisfies one-sided double-robustness (wrong outcome + correct propensity, and vice versa). A continuous-shift DGP checks finiteness and sandwich-vs-bootstrap SE parity. No external single-call oracle exists for longitudinal AIPW (lmtp’s SDR uses a different nuisance-fixing convention), so triangulation + DR + truth recovery stand in.
Unbalanced-panel sandwich caveat
The longitudinal AIPW sandwich SE now emits a classed warning (causatr_longitudinal_aipw_unbalanced_sandwich) when the panel is unbalanced (some individuals are not observed at every period, e.g. monotone dropout or a censoring row-filter). A Monte-Carlo study (300 reps) showed the per-period rescaled sandwich underestimates the true SE by ~15% under informative dropout, because dropped ids contribute exactly zero to later-period channels rather than their unobserved counterfactual contribution — a limitation of the row-filtering influence function that a constant rescale cannot repair. The bootstrap reproduces the truth, so the warning steers users to ci_method = "bootstrap". Affects both univariate and multivariate longitudinal AIPW.
2026-05-29 — Phase 19c: Effect modification for multivariate longitudinal IPW
Baseline effect modification (A:modifier terms in confounders) now works for joint time-varying treatments under estimator = "ipw", type = "longitudinal". The previously emitted causatr_longitudinal_mv_em_pending rejection has been removed.
- No multivariate-specific EM code needed: lifting the rejection lets the existing wirings compose. Each per-period, per-component propensity in
fit_treatment_models()strips its own treatment-touching interaction terms (Phase 8b), the per-period formula strips them viaem_info$confounder_terms(Phase 10c), and the single final-period MSM expands fromY ~ 1toY ~ 1 + modifierviabuild_ipw_msm_formula(). The modifier stays as a confounder main effect in every propensity; only theA:modifierinteractions are dropped. - Baseline-only constraint carries over (Robins 2000): the modifier must be a pre-treatment covariate. A time-varying modifier in the MSM conditions on a post-treatment variable. Not enforced at runtime; use
estimator = "snm"for time-varying effect modification. - Truth-based test:
make_em_mv_long_scm()is a 2-period × 2-component binary DGP with sex effect modification whose static contrast has analytical truths 9 (sex = 0) and 15 (sex = 1). Tests confirm IPW recovers these, agrees with ICE g-computation cross-method, strips the interactions per component, and is invariant to stabilization under a static intervention.
2026-05-29 — Phase 19b: Stabilized multivariate longitudinal IPW
Stabilized weights (stabilize = "marginal") now compose with joint time-varying treatments under estimator = "ipw", type = "longitudinal". The marginal numerator factorises over both axes, mirroring the T × K denominator:
g(Ā1, Ā2 | V) = ∏_t ∏_k g_{t,k}(A_{t,k} | A_{t,1..k-1}, Ā_{t-1}, V)
- Per-period × per-component numerator sweep: when
stabilize = "marginal"andlength(treatment) > 1L, each period fits a K-component numerator via the denominatorfit_treatment_models()path, but with confounders set toremove_response(build_longitudinal_numerator_ps_formula(...))— this drops the time-varying L while retaining the treatment lags and baseline numerator covariatesV. The per-component chain rule then layers the prior componentsA_{1..k-1}on automatically. Numerator models are stashed as thenumerator_models/stabilizeattributes on each period’s denominatorcausatr_treatment_modelslist. - Weight + variance composition: the MV weight and IF closures (
compute_density_ratio_weights_mv(),make_weight_fn_mv()) read the numerator attributes and apply the stabilized per-component ratio. Numerator parameters (gamma) are held fixed under the sandwich (nuisance-fixed convention, as for sigma/theta); bootstrap refits both numerator and denominator. - Invariants: under static interventions on discrete treatments the stabilized estimate and sandwich SE are identical to unstabilized (the Hájek-mean invariant carries over from Phase 8e / 10b). The
causatr_longitudinal_mv_stabilize_pendingrejection is removed.
2026-05-24 — Phase 19a: Multivariate longitudinal IPW
New support for joint time-varying treatments (treatment = c("A1", "A2")) under estimator = "ipw" with type = "longitudinal". Composes Phase 8 (multivariate point IPW, K components) with Phase 10 (longitudinal IPW, T periods) to produce the T × K propensity factorisation:
W_i = ∏_t ∏_k w_{t,k,i}
- Per-period multivariate fitting: each time period fits K per-component propensity models via
fit_treatment_models()with sequential conditioning (A_k ~ A_{1..k-1} + confounders + lags). The resulting per-period models arecausatr_treatment_modelsobjects. - Weight computation:
compute_longitudinal_weights()detects multivariate models and dispatches tocompute_density_ratio_weights_mv()per period, then takes the row-product across periods. - Sandwich variance:
make_weight_fn_longitudinal()delegates tomake_weight_fn_mv()per period, producing a T×K block-diagonal stacked alpha. The variance engine (compute_ipw_if_self_contained_long_one()) iterates over periods, then over components within each period, for the per-model corrections. - Bootstrap: works automatically via
refit_ipw()→fit_longitudinal_ipw(). - Deferred: stabilized weights (
stabilize = "marginal") and effect modification (A:modifier) with multivariate longitudinal IPW are rejected with classed errors (causatr_longitudinal_mv_stabilize_pending,causatr_longitudinal_mv_em_pending).
2026-05-23 — Phase 19-trim: Cross-cutting weight truncation
New trim argument on contrast() for density-ratio weight truncation (winsorization). Clips per-component density ratios at the trim-th quantile before they enter any product (multivariate cross-component or longitudinal cross-period), following the approach of Cole & Hernan (2008) and the lmtp default of 0.999. Recommended values: 0.999 (mild) or 0.99 (moderate, per Spreafico et al. 2025).
trim = 1(default): no truncation, all existing behavior unchanged.- Applied to all weight-computing paths: point IPW, multivariate IPW, longitudinal IPW, point AIPW, longitudinal AIPW.
- Sandwich variance: truncation threshold is fixed at the original data’s quantile and held constant under numDeriv perturbation, so the SE reflects the truncated weight surface at a fixed cutoff.
- Bootstrap: each replicate recomputes its own quantile threshold on the resampled data, capturing the full variance including truncation.
- NOT applied to sampling weights (transport) or IPCW weights (censoring), which define the target population rather than causal reweighting.
- Cross-validated against
lmtp::lmtp_sdr()with matching.trimvalues for both multivariate point IPW and longitudinal IPW.
2026-05-24 — Fix: MV IPW sandwich crash with natural-course intervention
- numDeriv::jacobian crash: multivariate continuous IPW sandwich variance crashed with “subscript out of bounds” when any intervention was
NULL(natural course). Root cause:make_weight_fn_mv()returnsalpha_hat = numeric(0)for natural course (no propensity parameters to optimize), andnumDeriv::jacobian(fn, x = numeric(0))cannot handle zero-length input. Fixed by adding an early return incompute_ipw_if_self_contained_mv_one()that skips the cross-derivative computation when no propensity parameters exist — the propensity correction is zero in this case since the weight function is constant (all ones).
2026-05-24 — Phase 19-trim critical review fix
- Sandwich/bootstrap truncation semantics mismatch: the variance-engine closures (
make_weight_fn_longitudinal,make_weight_fn_mv) applied truncation to the post-product cumulative weight, while the weight-computation path (compute_longitudinal_weights,compute_density_ratio_weights_mv) applied per-component/per-period truncation before the product. This caused sandwich and bootstrap SEs to disagree by ~10% under aggressive trim values. Fixed by distributing per-component/per-period truncation thresholds to each sub-closure, matching the per-component semantics throughout.
2026-05-22 — Phase 18 critical review fixes
3rd-round critical review of Phase 18 (SNM implementation):
- Bootstrap metadata fix: longitudinal SNM bootstrap now correctly forwards
confounders_outcome_rawto replicated fits, ensuring metadata consistency on the bootstrapcausatr_fitobjects. - Predict consistency:
snm_treatment_design()now usesnewdata = datainpredict()for scalar treatments, matching the categorical branch and guarding against future data-vs-model mismatch. - File split: extracted
compute_snm_contrast()fromsnm_point.R(488 lines) intosnm_contrast.R(191 lines), bringing both files under the 300-line guideline. - Roxygen: added missing
@returntags tovariance_if_snm_longitudinal_notf()andvariance_if_snm_longitudinal_tf(). - delicatessen cross-checks: added two new cross-language oracle tests for longitudinal SNM — treatment-free model (TF, no EM) and time-varying effect modification with TF (TV-EM + TF). Both use Python delicatessen stacked M-estimation as the reference, with tight point-estimate tolerances (1e-4) and SE tolerances (0.3–1%).
2026-05-22 — Phase 18 complete: Structural Nested Mean Models
Phase 18 is complete. The snm.qmd vignette now covers all shipped features: categorical and count treatment extensions (18h), bootstrap confidence intervals (18i), and S3 method output (18j), in addition to the existing coverage of point and longitudinal SNMs with time-varying effect modification, treatment-free efficiency augmentation, and sandwich inference.
2026-05-22 — SNM-aware S3 dispatch (Phase 18j)
All user-facing S3 methods now recognize SNM results and display blip-parameter-specific output instead of the intervention-mean layout used by gcomp/IPW/AIPW/matching.
print.causatr_result() shows “Blip parameters:” (point), “Per-stage blip parameters:” (longitudinal), or “Averaged blip effect:” (Path B with treatment_values), and suppresses the empty contrasts table for Path A. summary.causatr_result() skips the “Intervention details” section for SNM and labels the vcov as “blip parameters”. plot.causatr_result() produces a forest plot with parameter names as row labels, a zero reference line, and an SNM-specific title. coef() and tidy() correctly use the parameter column (instead of intervention) present in SNM result objects, and tidy() tags rows with type = "parameter".
2026-05-22 — Categorical and count treatment extensions for SNM (Phase 18h)
Extends SNM g-estimation to categorical (k>2) and count (Poisson/NB) treatment types. The new snm_treatment_design() helper abstracts the construction of treatment-action (AM) and treatment-residual (RM) matrices so the g-estimating equation solve, variance engine, and bootstrap are uniform across all treatment types.
Categorical treatments use multinomial residualisation via nnet::multinom. The per-level indicators D_j = 1{A=j} and residuals R_j = D_j - P(A=j|L) produce (K−1) × p_mod blip parameters, one set per non-reference treatment level. The sandwich variance uses prepare_model_if_multinom() for the multinomial treatment model IF and a softmax-based cross-derivative closure. treatment_values is rejected for categorical SNM — the per-level blip parameters are the estimand directly.
Count treatments work with zero code changes: the existing scalar residual path R = A − E[A|L] handles Poisson and negative binomial treatment models via propensity_family = "poisson" or "negbin".
13 new truth-based tests covering point estimation, sandwich variance, bootstrap consistency, treatment-free model, and longitudinal backward sequential estimation for both categorical and count treatments.
2026-05-22 — Bootstrap variance for SNM (Phase 18i)
Adds bootstrap variance estimation for both point and longitudinal SNM estimators, completing the dual inference path (sandwich + bootstrap) that all other estimators already support.
Point bootstrap resamples rows and re-solves the g-estimating equation on each replicate, handling all three paths: standard (no effect modification), with A:modifier interactions, and with treatment_free joint (β, ψ) estimation. When treatment_values is supplied, the bootstrap statistic is the scalar averaged blip effect directly — no delta-method transformation needed on the bootstrap vcov.
Longitudinal bootstrap uses ID-clustered resampling (resample entire individual trajectories with clone-and-reassign for duplicates) and re-runs the full backward sequential g-estimation on each replicate. Forwards per-component confounders (confounders_outcome, confounders_tv_outcome, etc.) to ensure the blip specification is correctly reconstructed on bootstrap replicates even when outcome and treatment confounders are specified separately.
All bootstrap SE estimates agree with sandwich SEs within a factor of 2 across 9 test configurations (point ± EM ± treatment-free ± treatment_values ± binary/continuous treatment; longitudinal ± TF ± TV-EM).
2026-05-21 — Triangulation tests, delicatessen cross-check, history=0 support (Phase 18f)
Adds the Robins triangle invariant tests: under correct specification and no time-varying effect modification, the SNM blip sum, longitudinal IPW-MSM ATE, and ICE g-comp ATE must agree on binary static interventions. Three pairwise comparisons (SNM vs IPW, SNM vs ICE, 3-way) plus a negative control where IPW is biased by collider conditioning on a post-treatment modifier but SNM recovers the correct blip.
Cross-language validation against Python’s delicatessen package (Zivich 2022) using stacked M-estimation on shared fixture data (data-raw/snm_longitudinal_fixture.csv). Both R and Python operate on the exact same dataset, enabling tight tolerances: 1e-4 for point estimates, 0.3% for sandwich SEs.
The history parameter now accepts 0 (no-lag Markov model) for all longitudinal estimators (SNM, IPW, ICE). Previously history < 1 was rejected. The validator error message updated from “positive integer” to “non-negative integer.” Tests added for all three estimators with history=0.
Removed 18 suppressMessages/suppressWarnings wrappers around causatr calls in the test suite (test-snm.R, test-missing-data.R, test-longitudinal-ipw.R). All underlying messages were either eliminated at the source or are expected informational output.
2026-05-21 — Longitudinal TV-EM truth test + formula dedup fix (Phase 18e)
Adds the headline truth-based test for longitudinal SNMs with time-varying effect modification. The 2-period DGP has a post-treatment modifier M_1 = 1{L_1 > 0} (where L_1 depends on A_0) at both stages, with known blip parameters (psi_00 = 1.15, psi_0M = 2, psi_10 = 2, psi_1M = 2). Validates that the SNM correctly recovers all four parameters, while point-treatment IPW conditioning on M_1 is detectably biased due to collider bias (A_0 -> L_1 -> M_1).
DTRreg cross-check uses history = 0 for exact treatment-model match. Cluster bootstrap consistency check and TF-model efficiency test included.
Also fixes build_longitudinal_ps_formula() to deduplicate baseline terms that overlap with time-varying terms. R formulas silently dropped the duplicate, so the fitted model was unaffected, but the formula object was misleading to anyone who inspected it.
2026-05-21 — Fix longitudinal SNM variance with treatment-free model
The sandwich variance for longitudinal estimator = "snm" with treatment_free now correctly accounts for the joint (beta, psi) system at each stage. Previously, the variance engine used the psi-only estimating equation scores and bread, ignoring the treatment-free nuisance parameters. When E[R * Z] ≈ 0 (correctly specified treatment model, TF covariates subset of PS covariates), the error cancelled exactly. When E[R * Z] ≠ 0 (TF formula includes covariates absent from the treatment model), the sandwich SEs were inflated by the unabsorbed TF prediction. The fix implements the full joint system with per-stage (beta_k, psi_k) blocks, cross-stage derivatives, and psi-marginal extraction — mirroring the point-treatment TF variance that was already correct.
2026-05-21 — Longitudinal SNMM with per-stage blip estimation (Phase 18d)
estimator = "snm" now supports type = "longitudinal" for multi-period panel data. The implementation uses backward sequential g-estimation (Robins 1994): starting at the final stage K, each stage’s blip parameters are estimated using the backward-transformed outcome H_k, yielding per-stage parameters (e.g. stage0_psi_intercept, stage1_psi_intercept). Each stage gets its own treatment model fit via the existing longitudinal propensity infrastructure.
The cluster-robust sandwich variance accounts for cross-stage dependencies: earlier-stage blip parameters depend on later-stage estimates through the backward-transformed outcome, creating off-diagonal blocks in the bread matrix. Treatment-model corrections are applied per-stage via the standard numDeriv::jacobian approach.
Testing: truth-based tests on binary and continuous treatment DGPs, DTRreg cross-checks (binary treatment), vcov PSD checks, rejection tests for treatment_values and bootstrap. DGP truth values account for the mediated A_0 -> L_1 -> Y path correctly (validated against DTRreg at n = 500k).
2026-05-21 — Time-varying effect modification for SNMs (Phase 18c)
SNMs now explicitly support time-varying (post-treatment) modifiers — the headline feature motivating the addition of SNMMs to causatr. The modifier M can depend on treatment A, unlike IPW-MSM where this opens collider paths (Robins 2000). With treatment_free = ~ L, the blip parameters recover the structural truth; without it, the moment-condition estimates capture a different (still valid) quantity that absorbs the A → M → Y indirect path.
Documentation: updated parse_effect_mod() to clarify the SNM exception to the baseline-only modifier constraint. New vignette snm.qmd with a worked example demonstrating IPW bias with post-treatment modifiers and correct SNM identification.
Testing: truth-based tests on DGPs where M = f(A, L), delicatessen cross-checks (4 scenarios: continuous/binary × no-TF/TF), DTRreg cross-check, IPW bias demonstration, vcov PSD checks.
2026-05-20 — Treatment-free outcome model for SNM efficiency (Phase 18b½)
causat(..., treatment_free = ~ L) enables a treatment-free outcome model that reduces blip parameter standard errors by 30–45% without changing point estimates. The approach follows Vansteelandt & Joffe (2014) and matches DTRreg’s tf.mod argument: the treatment-free model parameters β and blip parameters ψ are solved jointly from a linear system, rather than fitting E[Y | L] separately.
The sandwich variance engine extends to a stacked system (α_treatment, θ_joint = (β, ψ)) with the joint bread computed analytically and the cross-derivative A_{θ,α} via numDeriv::jacobian(). Validated against delicatessen on three DGPs (continuous + EM, binary + EM, two modifiers) and against DTRreg on the EM case — point estimates and SEs match to 0.01 tolerance.
Breaking change: none. The treatment_free parameter defaults to NULL, preserving the existing behavior. Non-SNM estimators reject treatment_free with a causatr_treatment_free_not_snm classed error.
2026-05-20 — SNM point estimation and sandwich variance (Phase 18b)
Point-treatment SNM g-estimation is now fully operational. contrast(fit) returns the blip parameter table (ψ₀, ψ_M, …) with stacked-EE sandwich standard errors and confidence intervals. contrast(fit, treatment_values = c(0, 1)) computes the population-averaged blip effect with delta-method variance.
The sandwich variance engine (variance_if_snm()) implements the full stacked M-estimation system: treatment model score block + blip estimating equation block + cross-derivative A_{ψ,α} via numDeriv::jacobian(). Validated against delicatessen (Zivich et al. 2024) on three DGPs (continuous treatment + single modifier, binary treatment + single modifier, continuous treatment + two modifiers) with point estimate and SE agreement to 0.01 tolerance.
New files: R/variance_if_snm.R (split from snm.R following the existing variance file pattern), data-raw/snm_reference.py (delicatessen reference script), fixture CSVs.
2026-05-20 — SNM routing and validation (Phase 18a)
Added estimator = "snm" to causat() for structural nested mean model (SNMM) g-estimation — the third leg of the Robins triangle alongside g-computation and IPW-MSM. Chunk 18a ships the routing infrastructure: fit_snm() fits the treatment model via the existing fit_treatment_model() machinery, parses effect-modification terms (A:modifier in confounders) into a linear blip specification, and stores the result for downstream g-estimation. Supports binary and continuous point treatments.
Rejection paths: multivariate treatment (causatr_snm_multivariate), longitudinal data (causatr_snm_longitudinal_pending), and contrast(interventions = ...) (causatr_snm_no_interventions) — SNM estimates blip parameters directly, not counterfactual means.
Point estimation, sandwich variance, and the contrast pathway are deferred to chunk 18b. DTRreg added to Suggests as the primary R oracle for cross-checks.
2026-05-20 — Phase 17 complete: transportability documentation (Phase 17m)
The transportability vignette (transportability.qmd) is expanded with sections for longitudinal transport (IPW and ICE with target = and id = / time =), multivariate treatment transport (joint-treatment IPW with sampling weights), separate confounder specifications for the sampling model (confounders_sampling =), and a summary-of-combinations table. All worked examples include full causat() + contrast() code and rendered output.
This completes Phase 17: transportability and generalizability. All chunks (17a–17m) are shipped. The CLAUDE.md guide files section and PHASE_17_TRANSPORTABILITY.md status are updated accordingly.
2026-05-18 — MTP + transportability via MC marginalization (Phase 17l)
Modified treatment policy (MTP) interventions (shift(), scale_by(), threshold(), dynamic(), and natural-course NULL) can now be combined with target = "S" under g-computation and AIPW. Previously, target-population rows (S=0) had no observed treatment, so MTP interventions that depend on observed A produced NA predictions and silently dropped all target rows.
The fix uses Monte Carlo marginalization: for each target row with covariates L_i, causatr draws treatment values from P(A|L_i, S=1) — the study treatment distribution — applies the intervention d(A,L), predicts outcomes from the fitted model, and averages. For binary treatments, exact enumeration replaces MC sampling. A simple GLM treatment model is fitted internally at contrast time for g-computation; AIPW reuses its existing propensity model.
IPW + MTP + transport requires no MC marginalization (density-ratio weights operate on study rows only) and continues to support sandwich variance. Sandwich variance for gcomp and AIPW + MTP + transport is not yet supported (the MC marginalization introduces treatment-model dependence not captured by the current IF chain); ci_method = "sandwich" aborts with class causatr_mtp_transport_sandwich, pointing users to bootstrap.
2026-05-18 — IPCW × transportability (Phase 17k)
causat() now supports target = "S" together with ipcw = TRUE for all three estimators (gcomp, IPW, AIPW). The composition produces a triply/quadruply-weighted estimator: treatment density-ratio weights, stabilized IPCW weights, and sampling-odds weights all multiply pointwise into the MSM prior weights (IPW) or the outcome-model fitting weights (gcomp/AIPW). The sandwich variance engine already handled the IPCW and sampling-model score blocks independently; this chunk verifies the full four-block composition (propensity + censoring + sampling + plug-in) works end-to-end.
Truth-based simulation tests validate transportability and generalizability ATE recovery for all three estimators, sandwich-vs-bootstrap SE agreement, AIPW double-robustness under outcome misspecification, and cross-estimator agreement (gcomp ≈ IPW ≈ AIPW). External cross-checks against delicatessen (stacked M-estimation sandwich) and TransportHealth (gcomp transport with manual IPCW weights) confirm the composition. A longitudinal IPW + IPCW + transport smoke test verifies the person-period pathway produces finite estimates.
2026-05-18 — Multivariate IPW × transportability (Phase 17j)
causat() now supports target = "S" together with multivariate treatment (treatment = c("A1", "A2", ...), IPW estimator). Previously, the weight computation (joint density-ratio × sampling-odds) was already correct, and the bootstrap refitted the sampling model per replicate. The only missing piece was the sandwich variance: the multivariate branch of variance_if_ipw() exited early via return() before the sampling-model score block could be appended, so the gamma uncertainty was absent from the SE. The fix is a single structural change: store the multivariate IF, conditionally subtract compute_ipw_sampling_correction(), and return the corrected value. The correction is independent of the number of propensity components.
Truth-based simulation tests verify transportability (target ATE ≈ 3.5 + E[L|S=0]) and generalizability (ATE = 3.5) recovery under binary × binary static interventions, with SE plausibility against bootstrap. Stabilized weights (stabilize = "marginal") also produce a finite, positive sandwich SE.
2026-05-18 — Longitudinal IPW × transportability (Phase 17i)
causat() now supports target = "S" together with id = / time = (longitudinal IPW). Passing a sampling_model_fn alongside the longitudinal arguments fits P(S=1 | L) on the first period of each individual, broadcasts the resulting sampling-odds weight (1 − p̂) / p̂ to all person-period rows for that individual, and multiplies it into the per-period DR treatment weight inside the longitudinal Hájek MSM. The result is a transportable (or generalizable, via target_subset = "all") longitudinal causal estimate.
Sandwich variance extends the Phase 10 stacked EE to include a sampling-model score block; bootstrap refits the sampling model per replicate. ICE g-computation with longitudinal transport is also supported: NA lag treatment columns for target individuals (who have no observed treatment history) are filled with the static intervention value before the backward ICE iteration, so counterfactuals can be predicted over the target population. Stabilized weights (stabilize = "marginal") compose without modification.
Truth-based simulation tests verify transportability (ATE ≈ 4 + E[L|S=0] ≈ 3.67) and generalizability (ATE = 4) recovery, cross-estimator agreement between longitudinal IPW and ICE gcomp transport, SE plausibility against bootstrap, and correct bootstrap refitting of the sampling model.
2026-05-18 — External cross-check against TransportHealth (Phase 17h)
Two cross-check tests confirm that causatr’s transportability estimates agree to numerical precision with the independent TransportHealth package (CoreClinicalSciences/TransportHealth, GitHub). TransportHealth::transportGC() implements the same Dahabreh 2020 standardization — fit outcome model on study rows, average counterfactual predictions over target rows — and agrees with causatr gcomp to within 1e-4. TransportHealth::transportIP() fits the same P(S=1|L) sampling model and P(A=1|L) propensity model, combines them into a weighted MSM, and agrees with causatr IPW to within 1e-4. Both tests skip on CRAN; TransportHealth is added to Suggests. The DGP used is the standard observational transport DGP with confounded treatment and treatment-effect heterogeneity (A:L interaction).
2026-05-18 — Sampling model predictor-set validation (Phase 17g)
fit_sampling_model() now emits a causatr_transport_predictor_subset warning when a baseline covariate appears in the confounders formula only via an interaction with treatment (e.g. ~ L1 + L2:A). In this situation the stripping step that removes treatment terms from the sampling formula also removes the covariate, so the sampling model silently omits a variable that the outcome and treatment models condition on. The transportability identification assumption requires the sampling model to control for all baseline confounders; a misspecified sampling model breaks the exchangeability-over-populations condition (Dahabreh et al. 2020, assumption 3). The fix is to add the missing variable as a main effect in confounders. The warning fires through causat() as well as when calling fit_sampling_model() directly.
2026-05-17 — Sampling-model diagnostics for transportability (Phase 17f)
diagnose() now includes a sampling-model panel when the fit uses transportability or generalizability (target =). The panel reports: the sampling-score distribution (quantiles of P(S=1|L) overall and per population), the sampling-weight distribution on study rows (mean, SD, min, max, ESS), and a count of extreme sampling weights that may signal positivity violations. The panel appears in every per-intervention diagnostic and is shared across interventions (it depends only on the sampling model, not the treatment intervention). print.causatr_diag() renders it automatically.
2026-05-17 — AIPW test coverage parity + model_fn default warnings (Phase 16p, 16q)
AIPW now has full test coverage parity with gcomp and IPW for non-standard outcome families: Gamma, quasibinomial, negative binomial, and beta regression (betareg). Also covers external weights, survey design auto-extract, cluster-robust variance, MCAR outcome NAs, subset, GAM outcome and propensity models, and all compositions thereof.
causat() now emits a one-per-session warning when model_fn or propensity_model_fn is left at the implicit stats::glm default. The warning encourages deliberate model specification and is silenced once the user supplies an explicit function. Matching is excluded (it delegates model choice to MatchIt).
Fixed a cluster-robust variance bug where the cluster vector was not subset to fit_rows before being passed to the AIPW IF engine, producing misaligned cluster assignments when outcome NAs caused row-dropping.
2026-05-17 — AIPW + IPCW + transport sandwich fix
Fixed an error in variance_if_aipw() where the IPCW censoring correction (Channel 2d) referenced the non-transport variable target_fit unconditionally. When ipcw = TRUE and target = were both supplied, contrast(..., ci_method = "sandwich") failed with “object ‘target_fit’ not found”. The censoring correction closure now branches on transport mode: the transport path varies IPCW weights in the study-row augmentation term with a fixed target-population denominator, while the non-transport path retains the original Hájek ratio.
2026-05-07 — Multivariate point AIPW + AIPW-IPCW composition (Phase 16m, 16o)
Multivariate AIPW (chunk 16m). estimator = "aipw" now supports multivariate (joint) treatments treatment = c("A1", "A2", ...). The doubly-robust augmentation term uses per-component density-ratio weights from the sequential-MTP factorisation. Binary × binary, continuous × continuous, and mixed-type combinations are supported. stabilize = "marginal" composes correctly (stabilized weights in the augmentation term match unstabilized point estimates). Rejected for univariate AIPW (univariate already benefits from Hájek normalization).
AIPW + IPCW (chunk 16o). AIPW now composes with built-in IPCW (ipcw = TRUE) for triply-weighted doubly-robust estimation. The censoring weights multiply into the augmentation term alongside the treatment and (optionally) sampling weights. The stacked estimating equation adds the censoring-model block. Consistent if any two of the three models (outcome, treatment, censoring) are correctly specified.
2026-05-06 — Transportability and generalizability core (Phase 17a–17e)
causat(target = "S") enables transportability and generalizability estimation. The sampling indicator S (1 = study, 0 = target) tags rows in a mixed-data input; the sampling model P(S = 1 | L) is fit via sampling_model_fn (default logistic GLM) on all rows.
Three estimators transport the estimand to the target population:
- Gcomp transport: outcome model fit on S = 1 rows, standardized over target-population covariates.
- IPW transport: sampling × treatment density-ratio weights multiply pointwise in the Hájek MSM on study rows.
- AIPW transport: 2-out-of-3 doubly-robust composition (Dahabreh et al. 2020) — consistent if any two of outcome, treatment, and sampling models are correctly specified.
target_subset = "target" (default) targets S = 0 only (transportability); target_subset = "all" targets S = 0 ∪ S = 1 (generalizability). Sandwich variance extends the stacked estimating equation with the sampling-model block. Bootstrap refits the sampling model per replicate.
Truth-based simulation tests verify that the transport estimators recover the target-population ATE on a DGP where study and target ATEs diverge via an A × L interaction. Cross-estimator agreement (gcomp, IPW, AIPW) and AIPW efficiency (SE ≤ gcomp ∧ IPW) are pinned.
2026-05-05 — Point AIPW estimator (Phase 16a–16j)
New estimator = "aipw" implements the classical doubly-robust estimator: the augmented IPW functional
ψ(g) = E[m(g(A,L), L)] + E[W(g) · (Y − m(A, L))]
where m is the outcome model and W is the density-ratio weight from the self-contained IPW engine. Consistent if either the outcome model or the treatment model is correctly specified.
Supported: binary / continuous / categorical / count (Poisson) treatments, all contrast types, static / dynamic / shift / scale_by / ipsi interventions, ATT / ATC / ATE estimands, effect modification (by =), sandwich and bootstrap variance. Stacked estimating equation adds the outcome-model and propensity-model score blocks plus the plug-in row, computed via the existing prepare_model_if() and apply_model_correction() primitives.
Cross-validated against delicatessen (Python) for both binary static and continuous shift DGPs — point estimates and SEs match to 3 significant figures.
2026-05-05 — Longitudinal AIPW (Phase 16i)
estimator = "aipw" with id = and time = implements the ICE-AIPW sequential doubly-robust estimator (Bang & Robins 2005). At each backward step k, the augmentation term corrects the ICE pseudo- outcome using the period-k density-ratio weight and the period-k outcome model residual. The cumulative product of per-period augmentation terms gives the longitudinal DR correction.
Supported: binary / continuous treatments, static / shift / scale_by / dynamic interventions, gaussian / binomial outcomes, effect modification (by =), sandwich and bootstrap variance. Double robustness verified: deliberately misspecifying either the outcome model or the propensity model still recovers the correct ATE.
Cross-validated against lmtp::lmtp_sdr() for binary static and continuous shift settings.
2026-05-05 — Polish: target_trial(), ICE formula builder, feasibility diagnostic (Phase 15)
target_trial()constructor + S3 print method for documenting target trial emulation parameters (eligibility, treatment strategy, follow-up, outcome, causal contrast, analysis plan).ice_build_formula()now handles transformed time-varying confounders (poly(),ns(),log(),I()) via the newsubstitute_vars_in_term()helper. Previously, ICE silently dropped any term that wasn’t a bare column name when building per-step formulas.compute_pct_intervened()reports what fraction of observations are affected by each intervention (feasibility diagnostic). Appears in thepct_intervenedslot of per-intervention diagnostic panels. Returns NULL for natural-course and ipsi interventions. Longitudinal variant reports per-period and overall feasibility.
2026-05-04 — Built-in IPCW for MAR outcome censoring (Phase 14)
causat(censoring = "C", ipcw = TRUE) now fits an internal censoring model P(C = 0 | A, L) and applies inverse-probability-of-censoring weights to the estimator, replacing the previous row-filter-only censoring = behavior when ipcw = TRUE is specified.
Point estimators (14b). Gcomp, IPW, and matching all receive IPCW weights. For gcomp and matching, the IPCW weights enter the outcome model via the weights argument. For IPW, the IPCW weight multiplies into the treatment density-ratio weight. The sandwich variance engine extends with the censoring-model score block.
Longitudinal IPCW (14c). ICE and longitudinal IPW fit per-period censoring models and apply cumulative IPCW weights. The per-period censoring probability enters the per-step pseudo-outcome weighting.
Variance regression (14e). Sandwich SEs for IPCW fits agree with bootstrap SEs within the standard MC tolerance (ratio in 0.5–2.0).
Diagnostics (14f). diagnose() includes a censoring panel with IPCW weight distribution, censoring prevalence, and censoring-model positivity quantiles. Longitudinal IPCW shows per-period and cumulative censoring diagnostics.
Cross-validated against lmtp::lmtp_sdr() for both point and longitudinal IPCW settings.
2026-05-04 — Extended outcome type coverage (Phase 13)
causat(model_fn = MASS::glm.nb) and causat(model_fn = betareg::betareg, family = "beta") now work end-to-end with sandwich and bootstrap variance. Previously both errored because fit_gcomp_point() always passed a family = argument that neither function accepts.
Infrastructure changes:
- New
fn_accepts_family()helper conditionally stripsfamilyfrom model args when the fitting function doesn’t accept it. resolve_family("beta")returns a sentinel family object after checking thatbetaregis installed.variance_if_numeric()Jacobian now handles betareg’s list-structured coefficients ($mean+$precision).betaregadded to Suggests.
Expanded test coverage for all non-standard outcome families (Poisson, Gamma, quasibinomial, negative binomial, beta regression): binary and continuous treatments, multiple contrast types (difference/ratio/OR with rejection tests where invalid), sandwich and bootstrap SE agreement, IPW cross-checks against g-computation, and matching smoke tests. ~50 new truth-based simulation tests.
2026-05-03 — Stochastic interventions under g-computation (Phase 12)
New stochastic() intervention constructor accepts a user-supplied sampler function and implements Monte Carlo g-formula integration for both point and longitudinal (ICE) estimators.
sampler <- function(a, data) rnorm(length(a), mean = a - 1, sd = 0.5)
contrast(fit,
interventions = list(stoch = stochastic(sampler, n_mc = 100)),
reference = "observed")- Sandwich variance uses MC-averaged influence functions; bootstrap draws fresh MC samples per replicate.
- Rejection paths for IPW and matching with informative errors (
causatr_stochastic_ipw,causatr_stochastic_matching). - Cross-validated against
lmtp::lmtp_sdr()for point and longitudinal settings. - 57 new tests covering binary, continuous, categorical, and multivariate treatments across gaussian, binomial, Poisson, and Gamma outcome families.
2026-05-03 — Suppress non-integer binomial and NB iteration warnings
- IPW outcome MSMs fit with non-integer density-ratio weights now use
quasibinomial()instead ofbinomial()via the newmsm_family()helper (identical coefficients and SEs, no “non-integer #successes” warning). Applied in both point IPW and longitudinal IPW. - Propensity models under external survey weights use
quasibinomial()for the same reason.
2026-05-03 — Plot overhaul and diagnostics vignette (Phase 11 chunk 11e)
plot.causatr_diag() now supports three plot types via which = c("balance", "weights", "positivity"). The balance plot (default) remains the cobalt Love plot. Weight-distribution histograms (which = "weights") use tinyplot, facet by arm for binary treatments, and accept a log_scale argument. Positivity plots (which = "positivity") show the propensity-score density (binary), conditional density histogram (continuous/count), or per-level probability histogram (categorical).
New vignettes/diagnostics.qmd provides a user-facing tour of every supported diagnostic scenario: binary/continuous/categorical IPW, matching, gcomp, ATT/ATC, by = stratification, longitudinal, and all three plot types.
This completes Phase 11. All causatr_diag_*_pending gates from chunk 11a are lifted, and diagnose() now handles every supported combination of estimator, treatment type, estimand, and timing.
2026-05-02 — Treatment-type and longitudinal dispatch for diagnose() (Phase 11 chunks 11b, 11c)
Chunk 11b — treatment-type dispatch. diagnose() now handles continuous, categorical, count (Poisson / NB), and multivariate IPW fits. Continuous IPW reports density-range positivity (flagging observations in the low-density tails), categorical IPW reports per-level probability positivity, count IPW reports density-range positivity, and multivariate IPW reports per-component positivity plus the combined product-weight distribution. Previously all non-binary IPW fits hit a _pending classed error.
Chunk 11c — longitudinal dispatch. Longitudinal IPW and ICE fits now produce per-period diagnostics: per-period positivity and weight distribution for longitudinal IPW, per-period covariate balance for both. The cumulative (product) weight summary flags compounding instability across periods. Previously longitudinal fits hit a _pending classed error.
2026-04-26 — diagnose() rewrite foundation (Phase 11 chunk 11a)
diagnose() now accepts an interventions = argument that mirrors contrast()’s signature, and the returned causatr_diag object exposes a nested per_intervention named list (one panel per intervention key with positivity, balance, and weights slots). The default call diagnose(fit) is unchanged: a single panel keyed obs is auto- injected, populated with the observed-treatment Horvitz-Thompson view (1/p on treated, 1/(1-p) on controls). Top-level positivity / balance / weights slots continue to point at the first panel so every existing flat-shape consumer keeps working without modification.
diag <- diagnose(fit,
interventions = list(a1 = static(1), a0 = static(0)))
diag$per_intervention$a1$weights # weight summary under static(1)
diag$per_intervention$a0$weights # weight summary under static(0)User-named interventions route through compute_density_ratio_weights() to build the per-intervention HT-style weight; the resulting treated / control / overall summary uses the same Kish effective sample size formula as the legacy view, with a divide-by-zero guard so the zero-weight off-arm under a static-arm intervention reports ess = 0 cleanly.
Seven causatr_diag_*_pending classed errors gate every sub-feature that the rewrite has not yet absorbed: causatr_diag_continuous_pending (continuous IPW), causatr_diag_categorical_pending (categorical IPW), causatr_diag_count_pending (Poisson / negbin IPW), causatr_diag_multivariate_pending (multivariate IPW), causatr_diag_longitudinal_pending (replaces the pre-chunk-11a flat abort on ICE fits), causatr_diag_estimand_pending (IPW under ATT / ATC — gcomp and matching are unaffected because their diagnostic paths are estimand-agnostic), and causatr_diag_em_pending (the by = argument is reserved at the signature level so future chunks can lift the rejection without a signature break). The chunk 11a balance view remains the unadjusted SMD across treatment groups; per-intervention post-weighting balance lands in chunk 11d alongside the ATT / ATC rewrite.
The 12 pre-existing test blocks in test-diagnose.R continue to pass unchanged as regression anchors; 12 new test blocks cover the per- intervention dispatch path (with manual 1/p ESS reconstruction sanity), the multi-panel print() output, and one classed-error check per _pending gate.
2026-04-26 — Effect modification under longitudinal IPW (Phase 10 chunk 10c)
A:modifier interactions in confounders = are now supported under longitudinal IPW. The per-period propensity formulas strip every treatment-touching term via parse_effect_mod() (modifier main effects are kept; only the A:sex interaction is removed because A is the response of the propensity model), and the final-period MSM expands from Y ~ 1 to Y ~ 1 + modifier via the existing build_ipw_msm_formula() so predict() returns stratum-specific counterfactual means.
fit <- causat(d, outcome = "Y", treatment = "A",
confounders = ~ L0 + sex + A:sex,
confounders_tv = ~ L,
id = "id", time = "time", estimator = "ipw")
contrast(fit, list(a = static(1), z = static(0)),
reference = "z", by = "sex")Known limitation. The modifier MUST be a baseline covariate (Robins 2000). A time-varying modifier in an MSM conditions on a post-treatment variable, biasing the estimand. Not enforced at runtime (time-varying status isn’t inferable from data); doc-level constraint only. Use a structural nested model (Phase 18) for time-varying effect modification.
The chunk 10a causatr_longitudinal_em_pending rejection is removed. Bare treatment in confounders (~ L + A) continues to be rejected upstream by check_em_compat() with causatr_bare_treatment_in_confounders.
2026-04-26 — Stabilized longitudinal IPW (Phase 10 chunk 10b)
causat(estimator = "ipw", id, time, stabilize = "marginal") now fits per-period numerator models \(g_k(A_k \mid \bar A_{k-1}, V)\) that drop time-varying confounders from the conditioning set. The per-period weight becomes \(g_k(d(A_k) \mid \bar A_{k-1}, V) \cdot
\lvert \mathrm{Jac} \rvert / f_k(A_k \mid \bar A_{k-1}, \bar L_k)\), dampening the multiplicative L-dependence across periods that inflates the cumulative product.
fit <- causat(d, outcome = "Y", treatment = "A",
confounders = ~ L0, confounders_tv = ~ L,
id = "id", time = "time", estimator = "ipw",
stabilize = "marginal", numerator = ~ L0)- Default
numerator = NULLkeeps treatment lags only (drops L from every period). Customnumerator = ~ Vadds the user-supplied baseline conditioning set on top of the treatment lags so the chain-rule factorisation of \(g_k\) stays valid. - Sandwich variance treats numerator parameters \(\gamma\) as fixed (same nuisance-fixed convention as Phase 8e for multivariate IPW and \(\sigma\) / \(\theta\) for Gaussian / negbin densities). Bootstrap refits both numerator and denominator per replicate and captures the full uncertainty.
- Stabilized + static binary recovers identical point estimate and SE as unstabilized (HT indicator collapses surviving rows so numerator and denominator densities coincide).
- Stabilized +
numerator =withoutstabilize = "marginal"is rejected with acausatr_longitudinal_numerator_without_stabilizeclassed error (no semantically valid use case for a custom numerator without stabilization).
2026-04-26 — Longitudinal IPW core (Phase 10 chunk 10a)
causat(estimator = "ipw", id = ..., time = ...) now ships: fits a per-period treatment density chain \(f(A_k \mid \bar A_{k-1},
\bar L_k)\) across time_points, builds the cumulative density-ratio weight per individual under sequential-MTP semantics (Robins, Hernán, Brumback 2000; Díaz et al. 2023), refits an intercept-only Hájek MSM on final-period outcomes, and reads \(\hat\mu_a\) off the intercept.
fit <- causat(d, outcome = "Y", treatment = "A",
confounders = ~ L0, confounders_tv = ~ L,
id = "id", time = "time", estimator = "ipw")
contrast(fit, list(s = shift(0.5), n = NULL),
reference = "n", ci_method = "sandwich")- Supported interventions:
static()/shift()/scale_by()/dynamic()(per-period dispatch through the existingcheck_intervention_family_compat()gates). - Variance: stacked sandwich (
variance_if_ipw_longitudinal()) with block-diagonal propensity bread across periods, plus id-clustered nonparametric bootstrap that resamples entire individual trajectories. - Sequential positivity: per-period weight tails are scanned and a
causatr_longitudinal_seq_positivitywarning surfaces the offending period(s) (default threshold 100). - Rejected (deferred to follow-up chunks): effect-modification (
A:modifier),stabilize = "marginal"/numerator =, multivariate longitudinal treatment,ipsi()per period. Each rejection ships with a_pendingclassed error pointing at the alternative. - The longitudinal-IPW rejection in
check_causat_inputs()is removed; matching is the only estimator still gated for longitudinal data.
2026-04-22 — future backend for bootstrap (Phase 9c)
contrast(parallel = "future") now dispatches bootstrap replicates through future.apply::future_lapply(), so any active future::plan() is honoured. Users with remote / cluster / future.batchtools workflows no longer have to set up socket clusters by hand via parallel = "snow".
library(future)
plan(multisession, workers = 4)
contrast(fit, list(a1 = static(1), a0 = static(0)),
ci_method = "bootstrap", n_boot = 500,
parallel = "future")- The
"no" / "multicore" / "snow"paths remain unchanged and still go throughboot::boot()."future"bypassesboot::boot()to let thefutureplan own the scheduling. ncpusis ignored under"future"— the plan’s worker count is the single source of truth.- Reproducibility:
future.seed = TRUEgives each worker a deterministic L’Ecuyer stream, soset.seed()at the dispatch site governs every replicate. futureandfuture.applyare Suggests-only;check_pkg()gates the dispatch.
2026-04-22 — survey::svydesign integration (Phase 9b)
causat(weights = ) now accepts a survey::svydesign object directly. The sampling weights are extracted via stats::weights(design) and the design’s first-stage cluster (PSU) is auto-propagated into the fit’s cluster slot so the contrast-time sandwich is cluster-robust by default.
library(survey)
des <- svydesign(ids = ~psu, weights = ~pw, data = d)
fit <- causat(d, outcome = "Y", treatment = "A",
confounders = ~ L, weights = des)
# weights and cluster both applied automatically
contrast(fit, list(a1 = static(1), a0 = static(0)))Behaviour:
- Equivalent to
causat(weights = weights(des), cluster = names(des$cluster)[1])— identical point estimate and SE (pinned intest-survey-design.R). - Users can override the auto-adopted PSU by passing
cluster =explicitly; the override wins. - Single-PSU designs (
svydesign(ids = ~1, ...)) do not auto-adopt a cluster — only the weights flow through. - Row counts must match: a design built on a larger / different data frame aborts with
causatr_bad_svydesign. surveyis Suggests-only;check_pkg("survey")gates the unpack path.
2026-04-22 — General cluster-robust sandwich (Phase 9a)
causat() and contrast() accept a new cluster argument naming a column for cluster-robust sandwich variance aggregation. Before this change the only cluster-robust path was matching’s internal subclass aggregation; users with design clusters (site, household, PSU, repeated measures) had to hand-roll their own SE.
The implementation threads through the existing influence-function engine: variance_if() now accepts a cluster_vec argument that arrives at vcov_from_if(cluster = ...), which has supported the sum-within-cluster-then-square Liang & Zeger (1986) aggregation since the matching branch was wired. Numerically equivalent to sandwich::vcovCL(type = "HC0") applied to the final predict-then-average step (agreement to within a G / (G - 1) small-cluster correction, pinned in test-cluster-sandwich.R).
Scope
- Supported:
estimator = "gcomp","ipw","ice"(includingbystratification and longitudinal ICE; for ICE the cluster vector is read from the first-time-point rows). - Rejected at fit + contrast:
estimator = "matching"already clusters on its ownsubclass; layering a user cluster on top would double-count within-subclass rows. Classed ascausatr_bad_cluster. - Tier 2 numerical fallback (no
sandwich::estfunmethod and no analytic bread): only the Channel-1 block is cluster-adjusted; theJ V_beta J^Tblock uses the model’s standard vcov. Emits a classed warningcausatr_tier2_cluster_partial.
API
fit <- causat(data, outcome = "Y", treatment = "A",
confounders = ~ L, cluster = "site")
# cluster auto-propagates from fit to contrast
contrast(fit, list(a1 = static(1), a0 = static(0)))
# or specify at contrast time only
fit <- causat(data, outcome = "Y", treatment = "A", confounders = ~ L)
contrast(fit, list(a1 = static(1), a0 = static(0)),
cluster = "site") # column name
contrast(fit, list(a1 = static(1), a0 = static(0)),
cluster = data$site) # direct vectorTruth testing
- gcomp oracle: direct match against
sandwich::vcovCL(fit$model, cluster = d$cl, type = "HC0")on a saturatedY ~ A + Lmodel. Agreement to machine precision after theG / (G - 1)correction. - IPW + ICE: cluster-randomised design (cluster-level treatment assignment + cluster-level outcome shock) produces SE inflation vs independent aggregation; regression-tested at 1.05x-1.1x lower bounds.
- Singleton clusters: one cluster per row reduces to the standard independent aggregation (tested by equality to the
cluster = NULLpath).
2026-04-22 — Multivariate treatment IPW (Phase 8)
estimator = "ipw" now accepts multivariate (joint) treatments treatment = c("A1", "A2", ...). Previously the path hard-aborted; users needed estimator = "gcomp" for joint interventions. The new implementation factorises the joint conditional density via the chain rule
\[ f(A_1, \ldots, A_K \mid L) = \prod_{k=1}^{K} f_k(A_k \mid A_{1..k-1}, L), \]
fits one univariate density model per component (sequential factorisation), and forms the product density-ratio weight per intervention.
Estimand semantics: sequential MTP (Díaz et al. 2023)
Multivariate IPW implements the sequential modified treatment policy estimand (Díaz, Williams, Hoffman, Schenck 2023, JASA), the standard in the causal-inference MTP literature and what lmtp computes. Under this interpretation, the counterfactual at stage \(k\) is \(A_k^d =
d_k(A_k^n)\) where \(A_k^n\) is drawn from the conditional density of \(A_k\) given the COUNTERFACTUAL history up to stage \(k\). At identification time this plugs observed upstream treatments into the \(k\)-th factor’s conditioning:
\[ w_k(\alpha_k) = \frac{f_k(d_k^{-1}(A_k) \mid A_{1..k-1}^{\mathrm{obs}}, L; \alpha_k) \cdot |\mathrm{Jac}\,d_k^{-1}|}{f_k(A_k \mid A_{1..k-1}^{\mathrm{obs}}, L; \alpha_k)}. \]
Under natural course on component \(k\) (no intervention), the ratio is identically \(1\) — downstream components do not accumulate an “upstream conditioning shift” from natural-course components.
Multivariate gcomp implements a different estimand — deterministic joint transformation, \(E[Y^d] = E\{E[Y \mid A_1 = d_1(A_1^{\mathrm{obs}}),\, A_2 = d_2(A_2^{\mathrm{obs}}),\, L]\}\) via simultaneous per-individual substitution of each counterfactual column. For static-only interventions the two estimators coincide (the indicator on the HT branch collapses the conditioning); for non-static interventions on non-final components they can differ by the upstream → downstream cross-dependence (e.g. shift + shift on a continuous × continuous DGP with \(A_1 \to A_2\) coefficient \(0.3\): gcomp gives \(-0.9\), IPW gives \(-1.02\)). This difference is intentional and pinned by a regression test.
Supported combinations
- Binary × binary (and K = 2, 3 binary), all-static, gauss / binom outcome, RD / RR / OR contrasts.
- Binary × continuous mixed type, static + shift / scale_by / dynamic.
- Continuous × continuous, shift + shift, scale + scale, dynamic components.
- Effect modification via
A_k:modifier(per-component propensity formulas strip treatment-touching terms; MSM expands toY ~ 1 + modifier_main_effects). Baseline-modifier constraint from Phase 6 carries over. - Categorical components (via
nnet::multinomper component; multinomial softmax closure inmv_ht_closure). Cat × bin, bin × cat, and cat × cat all supported. - Count components (Poisson / negative binomial) via per-component
propensity_familyopt-in (NULL/ length-1 / length-K). Truth-based tests cover bin \(\times\) Poisson, bin \(\times\) negbin, and Poisson \(\times\) continuous (the last exercises the upstream \(\to\) downstream chain-rule structure under sequential MTP: shift(\(+1\)) on \(A_1\) propagates to \(E[A_2 \mid A_1^d, L]\) through the fitted \(A_1 \to A_2\) coefficient, giving truth \(0.3 + 0.4 \cdot 0.2 = 0.38\) on the canonical DGP). - ATE only (matches the existing single-treatment ATT/ATC gate that rejects multivariate at fit time).
- Sandwich + bootstrap variance,
by =,subset =, survey weights.
Internal bits
fit$details$treatment_models is the list of per-component causatr_treatment_model objects (length K, named by treatment column). The legacy treatment_model slot stays populated for univariate fits. fit_treatment_models() in R/treatment_model.R does per-component fitter auto-selection. The mv sandwich (compute_ipw_if_self_contained_mv_one()) exploits that the K propensity models are fit independently, so the stacked propensity bread is block-diagonal — the propensity correction sums K single-model apply_model_correction() results.
Rejected with classed errors
ipsi()in any component (causatr_multivariate_ipsi) — Kennedy’s closed form has no joint analogue.- Multivariate matching still rejected (MatchIt is binary-only).
Stabilized weights (Phase 8e)
Multivariate IPW now accepts a stabilize = c("none", "marginal") argument. Under "marginal", the per-component numerator factor is swapped for a separately-fit density \(g_k(A_k \mid A_{1..k-1})\) that drops the full covariate vector \(L\) from the conditioning (Robins, Hernán, Brumback 2000, extended to multivariate). The per-component weight becomes
\[ w_k^{s}(\alpha_k) = \frac{g_k\bigl(d_k^{-1}(A_k) \mid A_{1..k-1}\bigr) \cdot |\mathrm{Jac}|}{f_k(A_k \mid A_{1..k-1}, L; \alpha_k)}. \]
The numerator model’s parameters are held fixed in the sandwich variance (nuisance-fixed, same convention as \(\sigma\) for Gaussian and \(\theta\) for negative binomial). Bootstrap refits both numerator and denominator models on every replicate and captures the full variance correctly. Implemented for all treatment families the mv engine already supports (binary, continuous, categorical, Poisson, negative binomial) and every intervention shape (static, shift, scale, dynamic, natural course). Rejected with causatr_stabilize_univariate under univariate IPW (univariate already benefits from the Hájek normalization on the MSM).
Whether stabilization reduces or inflates the sandwich SE depends on the DGP: when the marginal density of \(A_k\) has smaller variance than the full-\(L\) conditional, weights are dampened and the SE shrinks; when the full-\(L\) conditional is tighter than the marginal, stabilization can inflate SE. This is a known property of MV MTP weights (Díaz et al. 2023) and a DGP-dependent tradeoff, not a bug.
2026-04-20 — Breaking change: survival analysis removed
Causal survival analysis has been extracted from causatr into its own etverse package. This removes the scaffolded Phase 7 surface — the causat_survival() export, type = "survival" branch in contrast(), PHASE_7_SURVIVAL.md, the survival.qmd vignette, and all survival tests and snapshots. to_person_period() stays in causatr because it also serves general longitudinal data.
If you were using causat_survival(), switch to the new survival package (etverse/survatr). Until then, manual pooled-logistic regression on person-period data (via causat(..., estimator = "gcomp") plus custom intervention + cumulative-product code) remains possible.
2026-04-20 — Bug fixes
bread_inv()now aborts withcausatr_gam_missing_vpwhen a GAM-classed fit object lacks$Vp. Previously it fell through to the GLM-styleX'WXbread onmodel.matrix(model), silently miscomputing the sandwich variance for any caller that built a GAM-subclassed object without the Bayesian posterior covariance. Properly fittedmgcv::gam()objects always carry$Vp, so this is a defensive guard rather than a regression for standard users. Sixth-round critical review Issue L2.ICE g-computation now emits a rate-limited, classed
causatr_ice_boundary_saturationwarning when any binary-outcome prediction lands exactly at 0 or 1. Such rows enter the next backward pseudo-outcome model’s quasibinomial fit with zero IWLS working weightmu * (1 - mu)and are silently excluded from that fit, biasing the counterfactual chain. The warning is emitted on both the final-time model’s predictions and each subsequent backward step, and is rate-limited viarlang::warn(.frequency = "once")so long chains and bootstrap loops don’t flood stderr. Sixth-round critical review Issue L3.Tier-2 sandwich variance fallback in
variance_if_numeric()now tags the returned vcov withattr(., "tier2_approximate") = TRUE, andprint.causatr_result()surfaces a one-line note when the flag is present. Previously the approximate-variance path was only announced via a classedrlang::warn()that scrolled off quickly and was not detectable post-hoc without re-running the variance calculation. Users and batch pipelines can now queryattr(result$vcov, "tier2_approximate")to decide whether to preferci_method = "bootstrap". Sixth-round critical review Issue L1.expand_em_lag_terms()now correctly finds all treatment components in multivariate treatment EM terms (e.g.A1:A2:sexwithtreatment = c("A1", "A2")). Previously,which(components == trt_var)relied on R’s==vector recycling, which silently returned zero matches whentreatment_varwas a vector of length > 1 whose length did not evenly divide the component count. The function returnedcharacter(0), dropping all lag EM terms from ICE formulas. Fixed by switching to%in%. Fifth-round critical review Issue #B1.IPW
compute_ipw_contrast_point()now predicts and averages onfit_data(outcome-complete rows) instead of the full data. When the MSM includes modifier terms (e.g.Y ~ 1 + sexfrom effect modification), predictions vary across rows. Previously, the marginal mean included NA-outcome rows that the variance engine excluded, creating a centering mismatch in Channel 1 of the sandwich IF. Under modifier-dependent outcome missingness the sandwich SE was biased (9% underestimate at 70% missingness in a repro). ForY ~ 1(no EM) the bug was invisible because all predictions were constant. Fifth-round critical review Issue #B2.Removed unused
em_confounder_terms()helper fromR/effect_modification.R. Added a diagnosticrlang::warn()inexpand_em_lag_terms()when no treatment components are found in an EM term (previously silent). Fifth-round critical review Issues #S1, #S2.
2026-04-17 — Phase 6 complete: unified effect-modification API
Phase 6 (chunks 6a–6e) delivers a unified effect-modification API across all four estimation methods. Users write confounders = ~ L + sex + A:sex and every method does the right thing:
- G-comp feeds the interaction to the outcome model directly.
- IPW expands the per-intervention MSM from
Y ~ 1toY ~ 1 + modifierviabuild_ipw_msm_formula(). - Matching expands the outcome MSM from
Y ~ AtoY ~ A + modifier + A:modifierviabuild_matching_msm_formula(). - ICE auto-expands the interaction across treatment lags (
lag1_A:modifier,lag2_A:modifier, …) viaexpand_em_lag_terms().
Cross-method triangulation test confirms that gcomp, IPW, and matching agree on stratum-specific ATEs within cross-method tolerance on the same DGP. Vignettes updated with worked examples for all four methods.
2026-04-17 — Phase 6 chunk 6d: ICE lag auto-expansion for effect modification
ICE g-computation now auto-expands A:modifier interaction terms across treatment lags. When confounders = ~ L0 + sex + A:sex, the ICE outcome model at each backward step includes not only A:sex (current period) but also lag1_A:sex, lag2_A:sex, etc. for all available lags. Without this expansion, ICE captured effect modification only at the current treatment period, collapsing roughly half the intended heterogeneity.
- New
expand_em_lag_terms()helper inR/effect_modification.Rsplits the interaction term on:, substitutes the treatment component withlag{k}_{treatment}, and returns the expanded terms. fit_ice()now parses and storesem_infofromparse_effect_mod().ice_build_formula()acceptsem_infoand appends lag-expanded EM terms to the formula RHS, subject to the same all-NA column validity check as other dynamic terms.- The sandwich variance engine and bootstrap path require no changes: both operate generically on the model matrices, which simply grow to accommodate the additional interaction columns.
- Tests: 2-period truth (ATE|sex=0 = 5, ATE|sex=1 = 8), 3-period truth (ATE|sex=0 = 8, ATE|sex=1 = 12.5), multiple EM terms smoke test, bootstrap coverage, regression guard.
2026-04-17 — Phase 6 chunks 6a–6c: effect modification infrastructure + IPW/matching MSM expansion
Phase 6 chunk 6a — parse_effect_mod() infrastructure + gate refactoring.
- New
R/effect_modification.Rwith the shared effect-modification parser that all four estimation methods consult.parse_effect_mod()classifies each term inconfoundersas either a pure confounder or an effect-modification interaction (A:modifier), returning a structuredcausatr_em_infoobject. - Companion helpers:
build_ipw_msm_formula(),build_matching_msm_formula(), andcheck_em_compat()provide method-specific MSM construction and validation logic. - The old blanket-reject
check_confounders_no_treatment()is refactored intocheck_confounders_treatment(), which delegates to the new parser. Bare treatment terms (~ L + A) are still rejected; true EM terms (A:sex) now pass through the parser for method-specific dispatch.
Phase 6 chunk 6b — IPW MSM expansion for effect modification.
- The IPW MSM expands from intercept-only (
Y ~ 1) toY ~ A + modifier + A:modifierwhenparse_effect_mod()detects an interaction term.build_ipw_msm_formula()constructs the expanded formula;compute_ipw_contrast_point()passes it to the per-intervention weighted GLM refit. - The variance engine (
variance_if_ipw) generalises unchanged:J,X_star, andphi_barall extend fromp_beta = 1top_beta > 1. - Tests: DGP-4 truth-based simulation (binary treatment x binary modifier, sandwich SE), bootstrap parity, gcomp cross-check.
Phase 6 chunk 6c — Matching MSM expansion for effect modification.
- The matching MSM expands from
Y ~ AtoY ~ A + modifier + A:modifierwhenparse_effect_mod()detects an interaction term inconfounders.build_matching_msm_formula()constructs the saturated MSM; the expanded model recovers stratum-specific treatment effects via the matched-sample outcome regression. refit_matching()(bootstrap path) replays the EM-expanded MSM formula so bootstrap SEs correctly track the effect-modification structure.- Formula environment fix:
stats::glm()evaluatesweightsin the formula’s environment, but formulas returned from helper functions carry the helper’s frame. Bothfit_matching()andrefit_matching()now reset the formula environment to the local frame. - The variance engine (
variance_if_matching) generalises unchanged:prepare_model_if()andapply_model_correction()work on the expanded GLM’s coefficients, score, and bread without modification. - Tests: DGP-4 truth-based simulation (binary treatment x binary modifier, sandwich SE), bootstrap parity, gcomp cross-check, regression guard.
2026-04-17 — CRAN compliance: non-ASCII character cleanup
R CMD check --as-cran flags non-ASCII characters in .R files. Replaced all occurrences across 22 R/ files: em/en dashes, arrows, multiplication signs to ASCII equivalents; accented names to unaccented; Unicode math symbols to \eqn{} LaTeX in roxygen and plain ASCII in comments.
2026-04-17 — Documentation overhaul
Data-generating mechanism descriptions in vignettes. All vignettes that use simulated data now include explicit DGM descriptions before each simulation code block, providing:
- A plain-language description of the causal structure.
- Structural equations (SEM) in display math.
- The known true value of the estimand when analytically available.
Vignettes using the NHEFS dataset now include a description of the assumed causal structure (treatment, outcome, confounders, DAG) at the start. Affected vignettes: introduction, gcomp, ipw, matching, interventions, longitudinal, triangulation, missing-data, validation.
Vignette coverage expansion. New sections in existing vignettes to cover feature combinations that had test coverage but no user-facing documentation: Gamma outcome and quasibinomial outcome (gcomp, matching), GLM+splines model specification, IPSI dose-response curve (ipw), 4-period ICE example (longitudinal), missing-data vignette (new).
Unicode-to-LaTeX conversion in docs. Converted Unicode math symbols in vignettes and CLAUDE.md to proper LaTeX for correct MathJax rendering.
Other documentation changes:
gcomp.qmdgains a callout note explaining that GAM sandwich SEs use the penalised Bayesian covariance (model$Vp) as the bread inverse.longitudinal.qmdgains a truth reference line atE[Y] = 60on the marginal-means comparison plot for the Table 20.1 null-effect example.- Variance-theory vignettes enable
tinytable_html_mathjaxfor correct LaTeX rendering in table captions. - Numerical tables in
variance-theory.qmdreduced from 6 to 4 significant digits for readability.
2026-04-17 — print.causatr_result() uses digits = 3
Console output for causatr_result objects now prints with digits = 3 (down from 4), matching the knit_print method for more compact display.
2026-04-16 — Phase 4 chunk 3j: count treatment IPW (Poisson / negative binomial)
causat() gains a propensity_family parameter ("poisson" or "negbin") that enables IPW estimation on integer-valued count treatments. Auto-detection never infers count (non-negative integers like age are legitimately Gaussian), so this is an explicit opt-in.
Runtime: - fit_count_density() fits either stats::glm(family = poisson()) or MASS::glm.nb() (auto-selected for "negbin" when propensity_model_fn = NULL). - evaluate_density() gains Poisson (dpois) and NB (dnbinom) branches. - make_weight_fn() gains a count closure using the log link lambda = exp(X %*% alpha). - check_intervention_family_compat() allows shift() (integer delta only) and scale_by() (inverse must preserve integer support) on count treatments; rejects static, dynamic, threshold, ipsi.
Variance: no changes needed — Poisson and NB GLMs inherit from glm, so bread_inv() and prepare_model_if() work out of the box. NB theta is treated as fixed (same convention as Gaussian sigma).
Tests: truth-based Poisson DGP (shift(1) recovers ATE = 1.5), NB parity (NB nests Poisson on the same DGP), 6 rejection snapshot tests for disallowed intervention types.
2026-04-16 — Fifth-round critical review S2: near-zero intervened density warning
compute_density_ratio_weights() now warns (class causatr_near_zero_intervened_density) when > 80% of observations have near-zero density at the intervened treatment value under shift() or scale_by(). This catches the case where the intervention pushes treatment far outside the fitted distribution’s support, producing density-ratio weights that are effectively zero for the entire sample. The Hájek estimator still runs (the math is valid — zero weights just mean no contribution), but the resulting estimate is degenerate and should not be interpreted. Previously this situation was silent.
2026-04-16 — Fifth-round critical review R1: diagnose() censoring alignment
diagnose() now correctly excludes censored rows when computing positivity, balance, and simple-balance diagnostics for gcomp fits that use a censoring column. Previously, compute_positivity(), compute_balance(), and compute_balance_simple() called get_fit_rows(data, outcome) without passing fit$censoring, so diagnostics were computed on the full non-NA-outcome sample (including censored individuals) while the main gcomp pipeline excluded them. The diagnostics thus described a different sample than the one used for estimation. All three sites now call get_fit_rows(data, outcome, fit$censoring). Regression test in tests/testthat/test-diagnose.R.
2026-04-15 — Fourth-round critical review: reject na.action = na.exclude
causat() and causat_survival() now abort via check_dots_na_action() when na.action = na.exclude is forwarded through .... Under na.exclude, stats::residuals(model, "working") is padded with NAs to the original data length while model.matrix(model) drops NA rows — the length mismatch propagated into prepare_model_if()’s r_score and then silently corrupted the Channel-2 correction via R’s recycling (only a “longer object length” warning, no abort). Sandwich SEs were mathematically wrong. Only na.omit (default) and na.fail are accepted now; users with NAs should drop them explicitly before calling causat(). New error class causatr_bad_na_action. Regression test in tests/testthat/test-causat.R.
2026-04-15 — Third-round critical review: dots audit + replay_fit() helper + T6/T7/T9
A full audit of the fit$details$dots plumbing (capture sites in fit_gcomp_point() / fit_ipw() / fit_matching() / fit_ice(), replay sites in refit_gcomp() / refit_ipw() / refit_matching() / ice_iterate()) flagged three systemic risks:
- R1 (T2): positional dots with empty names slipped through the hand-written
setdiff(names(dots), ...)strips. - R2 (C5): four duplicate-strip blocks had to stay in sync by hand.
- R4: the strip lists were drifting — adding a new reserved key to one site but forgetting the others was a silent regression.
Fix — Alternative E in the audit: one central helper. New replay_fit(fn, base_args, dots, reserved) in R/utils.R owns the c(base_args, dots) + do.call(fn, ...) composition. All four refit sites (plus fit_ipw() itself) now call it. Behavior:
- Unnamed (positional) entries in
dotsare dropped. Fixes T2. - Keys already present in
base_argsare stripped fromdotsso the caller’s explicit value always wins. Covers C5 centrally. - A
reservedvector lets callers block additional keys the target function uses implicitly (e.g.WeightIt::weightit’ss.weightswhen external weights are not supplied).
Other non-dots fixes in the same pass:
- T6 —
causat_survival()now usesstats::quasibinomial()when external weights are supplied, suppressing the spurious “non-integer #successes in a binomial glm” warning without changing coefficients or SEs (same score equations, free dispersion). - T7 —
bread_inv()’s singular-bread warning now carries acausatr_singular_breadclass. The bootstrapwithCallingHandlersinvariance_bootstrap()matches on class, not on the English substring, so a future wording change cannot silently break the demotion. - T9 —
causat_survival()’s internal-column strip now sources the list from a newCAUSATR_SURVIVAL_INTERNAL_COLSconstant inR/utils.R(derived assetdiff(CAUSATR_RESERVED_COLS, ".pseudo_y")), so adding a new reserved column propagates automatically.
Audit items R3 (unknown dots silently ignored by target function), R5 (backward-compat with old fit objects missing $dots), R6 (replay opacity), and R7 (arg aliasing across target functions) were verified as non-bugs via direct R scripts — base R catches unknown args at fit time, %||% list() handles missing $dots, fit$details$dots is inspectable, and explicit base_args locks override dots.
Tests. New tests/testthat/test-replay-fit.R adds 14 unit tests pinning replay_fit() edge cases (base-wins precedence, positional dots dropping, reserved-key blocking, NULL dots, glm.control catching unknown args, backward-compat with missing $dots, end-to-end gcomp and IPW bootstrap replay). All 44 earlier critical-review + weight-edge-case tests still pass unchanged.
2026-04-15 — Second-round critical-review follow-ups (C3, C5, C12) + B7 gradient fix + weight edge-case tests
A second-round self-review of the 2026-04-15 critical-review sweep flagged twelve concerns (C1–C12). C1, C2, C4, C6, C7, C8, C9, C10, C11 were either false alarms (misread R semantics) or acceptable as-is. C3, C5, and C12 landed as real fixes; running the regression tests also surfaced an incomplete B7 fix. A new tests/testthat/test-weights-edge-cases.R adds 24 weight-specific tests covering every branch the critical review flagged as weight-sensitive.
C3 — Classed abort for empty target population.
build_point_channel_pieces()now emitsrlang::abort(class = "causatr_empty_target", ...), and thebyskip path incompute_contrast()matches on class, not on the English message text. Any future wording drift can no longer silently turn skipped strata into real aborts.C5 — Duplicate key stripping in
c(args, dots)compositions.fit_ipw(),refit_ipw(),refit_gcomp(), andice_iterate()now strip keys already present inargsfrom the stasheddotsbefore thec().do.call()always takes the first named duplicate, so the previous behavior was correct but fragile — explicit stripping makes the precedence unambiguous.C12 —
.causatr_prev_event/.causatr_prev_censstripped fromfit$data. The within-id risk-set bookkeeping columns incausat_survival()are now dropped before thecausatr_fitis returned, so they no longer leak into user-visible state.B7 gradient follow-up — unify ICE step_i == 1 gradient on the weighted form. The B7 fix in the first sweep only touched the Channel-1 IF vector, not the corresponding sensitivity gradient computed at step 1 of the backward iteration, which still used
n_target(now removed from the enclosing scope). Running the new regression test surfaced an_target not foundabort. The gradient is now computed via the samew_t * mu_eta / sum_w_targetform and reduces to the original unweighted expression whenw_t = 1.Weight edge-case coverage. New test file exercises: NA / Inf / NaN / negative / non-numeric / wrong-length weights rejected by
check_weights(); zero weights as row exclusion; uniform weights producing identical SEs to the unweighted path (gcomp + ICE); heterogeneous weights changing SEs non-trivially; IPW survey weights attaching ass.weights; matched outcome model carryingmatch_weight * external_weight;causat_survival()weight plumbing to the hazard fit.
2026-04-15 — Critical-review sweep (B1-B8, R3-R12)
A second full-codebase critical review flagged ~twenty correctness and robustness issues across the variance engine, bootstrap refit path, contrast() dispatch, and causat_survival(). Every blocking fix ships with a regression test in tests/testthat/test-critical-review-2026-04.R.
Blocking correctness fixes
B1 —
subsetexpressions now resolve caller-scoped variables.contrast(fit, subset = quote(age > cutoff))previously evaluatedsubsetwithparent.frame()deep inside internal dispatch frames, socutoff(defined in the user’s session) failed to resolve. The caller’s frame is now captured once atcontrast()’s top level and threaded throughcompute_contrast(),get_target_idx(),variance_bootstrap(), andice_variance_bootstrap()as an explicitsubset_envargument.get_target_idx()also validates that the evaluated expression has lengthnrow(data)so a scalarTRUEcannot recycle silently.B2 — Bootstrap replicates replay the user’s
...verbatim.refit_gcomp(),refit_ipw(), andrefit_matching()previously dropped the user’s extras (method = "cbps",stabilize = TRUE,caliper,ratio,gamma, GAM smoothing, etc.), so bootstrap SEs for any non-default estimator silently corresponded to a different estimator than the point estimate.fit_ipw()/fit_matching()/fit_gcomp_point()now stashlist(...)ontofit$details$dotsand each refit function threads those back throughdo.call().B5 —
causat_survival()implements the documented censoring rule. The roxygen promised “subsequent rows for that individual are also dropped”, but the implementation only excluded the current row. A.causatr_prev_censwithin-id cumsum + lag column now drops all rows at and after the first censor per id.B6 — External weights enter WeightIt as
s.weights.fit_ipw()post-multiplied external weights ontow$weightsafterWeightIt::weightit()had finished, so the Mparts sandwich correction silently under-corrected survey-weighted IPW. External weights now flow in ass.weights, andrefit_ipw()mirrors the change at bootstrap time.B7 — ICE Channel-1 IF unified on a single formula.
variance_if_ice_one()previously used(n / n_target) * (Y - mu)unweighted andn * (w / sum_w) * (Y - mu)weighted. These agree only whensum(w) == n_target, so heterogeneous weights mis-scaled the Channel-1/Channel-2 cross-term. The unweighted branch now setsw = 1and shares the weighted formula. A regression test pins uniform-weighted and unweighted SEs to machine precision.B8 —
byskips empty strata instead of aborting the whole call.by_valsis now enumerated fromfit$details$fit_rows(not the full data), and any per-level call that still ends up with an empty target is caught by atryCatch, collected into a singlerlang::warn()at the end, and dropped from the combined result.
Required robustness fixes
R3 — Branch-A propensity IF reconciles via
na.action. When the MSM and propensity fits drop different rows,prepare_propensity_if_weightit()now intersectsfit_idxwith the MSM’sna.actioninstead of aborting.R6 — OR validation no longer aborts on NA
mu_hat. Themu_hat >= 1check now pre-filters NAs consistently with the<= 0check.R7 —
bread_inv()prefersweights(model, type = "working"). Keeps GLM subclasses (glm_weightit, glmnet’s glm wrapper) correctly aligned with the sandwich convention.R8 — Matched cluster aggregation uses a first-seen factor.
vcov_from_if(..., cluster = subclass)no longer sorts the cluster levels alphabetically / numerically, which had desynchronisedrowsum(..., reorder = FALSE)fromIF_mat’s row order.R9 —
causat_survival()sorts once up front viasetkeyv(). Previously twoby = idaggregations relied on data.table’s group-order staying consistent between calls.R10 — Robust Mparts detection. Handles both list-shaped and flag-shaped
attr(w, "Mparts")across WeightIt versions.R11 — Dead
is.integerbranch removed ininterventions.R.is.integer(x)already impliesis.numeric(x).R12 — Reserved column names centralised. New constant
CAUSATR_RESERVED_COLSand helpercheck_reserved_cols()inR/utils.R, called fromprepare_data(),fit_ice(), andcausat_survival(). The previously-unprefixedprev_eventinternal column was renamed to.causatr_prev_event.
Suggestions addressed
- S1 —
subsetlength validated inget_target_idx(). - S5 —
Reduce(&, ...)incompute_contrast()gets an explicitinit = rep(TRUE, nrow(data))so an emptypreds_listfalls through cleanly.
2026-04-15 — Documentation overhaul: vignettes, math rendering, and knit_print
A pass over every package vignette plus a small package-level addition so that causatr_result objects render as readable HTML tables inside Quarto / knitr documents.
New package feature: knitr-aware causatr_result rendering
R/knit_print.Rdefines aknit_print.causatr_result()S3 method that renders acausat() |> contrast()result as a metadata header plus two tinytable HTML tables (intervention means + contrasts) when the result object is evaluated inside a knitr chunk. The method degrades gracefully to the existingprint.causatr_result()if tinytable or knitr is unavailable.R/zzz.Rconditionally registers the method via.onLoad()(requireNamespace("knitr", quietly = TRUE)), so the dependency on knitr stays Suggests-only.DESCRIPTIONaddstinytableandlitedowntoSuggests(the latter is needed bytinytable::format_tt(markdown = TRUE), which the variance-theory vignette uses for math-in-cells rendering).
Vignette fixes (no behavioural changes to the engine)
longitudinal.qmd— Table 20.1 null-effect section rewritten. The previous explanation claimed “individual effects of A_0 and A_1 are both zero” and that naive regression gives non-zero estimates for both. In fact (i) the true marginal counterfactual means are all equal to 60, not the conditional effects, (ii)lm(Y ~ A0 + A1 + L1)givesA1 = 0exactly (in the DGP Y does not depend on A1), and- the bias shows up in
A0only, as the conditional-within-L1 effect of \(-8\). The section now shows both naive strategies (unadjusted and L1-adjusted) failing, demonstrating the null paradox as stated in Hernán & Robins Chapter 20.
- the bias shows up in
by = "sex"examples fixed across three vignettes.gcomp.qmdnow refits with an explicitqsmk:sexterm inconfounders, so the stratified estimates actually move (Male \(\approx\) 3.61, Female \(\approx\) 3.44).ipw.qmdandmatching.qmdreplace theby = "sex"code chunks with a Phase 8 callout explaining that IPW and matching wrap a saturatedY ~ AMSM and soby = "sex"is a no-op until the unified effect-modification API lands (PHASE_6_INTERACTIONS.md).
Unified “Summary of covered combinations” tables. gcomp, ipw, matching, and longitudinal now share one 8-column schema (Treatment | Outcome | Intervention | Estimand | Contrast | Inference | Weights | Status) with a common ✅ / 🟡 / ⛔ legend and a pointer to
FEATURE_COVERAGE_MATRIX.md.code-fold: show+code-tools: truein every vignette YAML. Combined with the newknit_printhook, readers can fold the R code of any example chunk and still see a nicely formatted result table.
variance-theory.qmd — roadmap + verified numerics
- New Section 0 “Roadmap” walks through Sections 1 \(\to\) 6 as a single narrative and introduces the running example (a small logistic GLM) used by every verification block below.
- Eight new Math / Code / Result tabset panels, each of which instantiates a key equation on the running example and checks it against either
sandwich::sandwich(),sandwich::vcovCL(), orcausatr::variance_if():- Sec 1.4 — Sandwich variance of \(\hat{\beta}\) vs
sandwich::sandwich(fit)(agreement ≲ 1e-8). - Sec 2.3 — \((1/n^2) \sum IF_i IF_i^\top\) equals the sandwich to machine precision.
- Sec 2.4 — IF for a sample mean reproduces the textbook \(s/\sqrt n\) up to the \((n-1)/n\) d.f. correction.
- Sec 3.2 — Two-channel IF for g-comp \(\hat\mu_1\) matches causatr’s sandwich SE to $$1e-9.
- Sec 3.3 — Delta-method decomposition: shows numerically that \(J V_\beta J^\top\) equals \(\text{Ch.2}^2\), and that dropping \(\text{Ch.1}^2\) + cross term underestimates the SE by about 3% on this example.
- Sec 4.2 — Horvitz-Thompson IPW \(\hat\mu_1\) equals causatr’s IPW point estimate, with SE from the propensity-corrected adjusted score.
- Sec 4.3 — Cluster-robust sandwich: per-cluster IFs summed before squaring reproduce
sandwich::vcovCL(..., cadjust = FALSE)on a toy subclass assignment. - Sec 5.5 — ICE chained IF on a 2-period linear-Gaussian DGP recovers the structural truth \(0.8 + 0.6 = 1.4\) within the sandwich CI.
- Sec 1.4 — Sandwich variance of \(\hat{\beta}\) vs
validation.qmd — tinytable comparisons
compare_row()now returns a tinytable with a third row of absolute differences between causatr and each reference implementation (stdReg2, WeightIt, marginaleffects, lmtp). Agreement — or the lack of it — is readable without having to compare 6 decimal places by eye.
2026-04-17 — Breaking: causat() / causat_survival() argument method renamed to estimator
causat(method = ...) shadowed WeightIt::weightit(method = ...) and MatchIt::matchit(method = ...), making it impossible to forward those packages’ own method selector through .... The causatr argument has been renamed to estimator; extra ... arguments now flow cleanly into WeightIt / MatchIt. This is a hard rename with no deprecation shim — update call sites from method = "gcomp" | "ipw" | "matching" to estimator = "...", and from fit$method / result$method / diag$method to fit$estimator / result$estimator / diag$estimator. The glance.causatr_result() output column is renamed in the same way.
After the rename, forwarding e.g. a CBPS propensity model is a one-liner:
fit <- causat(data, outcome = "Y", treatment = "A", confounders = ~ L1 + L2,
estimator = "ipw", method = "cbps")See vignette("ipw") for a worked example. ci_method is a separate concept and is unchanged.
2026-04-14 — Critical review sweep (R1–R9 + S1–S9)
Systematic audit and hardening of the whole package in a single pass. All fixes below are covered by existing or newly-added tests; the FEATURE_COVERAGE_MATRIX.md now carries a CI-enforced round-trip check so future drift between the matrix and the test files fails loudly.
Required changes (correctness / silent-failure fixes)
contrast()ratio / odds-ratio guard (R2). The log-scale delta method was silently returningNaNCIs whenmu_hathad mixed signs (legal for Gaussian outcomes, fatal forlog(R)).contrast(type = "ratio")now aborts upfront if any intervention mean is non-positive, andtype = "or"additionally aborts if any mean is \(\geq 1\). Error messages point totype = "difference"or a binomial / poisson / gamma refit.variance_bootstrap()warning suppression narrowed (R3). The refit closure previously wrapped the whole pipeline insuppressWarnings(), hiding GLM non-convergence, bread-singular warnings, and everything else. Replaced with awithCallingHandlers()that demotes only the three expected noisy warnings (fitted probabilities numerically 0 or 1,X'WX is singular,Fewer (control|treated) units). Genuine failures now surface throughtryCatch’s error path and show up as failed replicates inboot_info.- Shared match-weight alignment helper (R4). Extracted
combine_match_and_external_weights()(inR/matching.R) so bothfit_matching()andrefit_matching()share the rowname-invariant check. A future change toas.data.frame.data.table’s rowname behavior now fails loudly in exactly one place instead of silently producing misaligned weights in the bootstrap. dynamic()docstring corrected (R5). The previous wording claimed the rule received data “subset to the current time point”; in reality it is called once per intervention with the full counterfactual panel. Updated the roxygen prose and the example to show how to branch on the time column or a lag column.- ICE all-NA column drop is no longer silent (R6).
ice_build_formula()still dropslag*_columns at step 0 (a structural requirement), but now emits arlang::inform()when a user-supplied column is dropped at a step because it happened to be all NA (e.g. a time-varying confounder first measured at time > 0). Gated via.frequency = "regularly"so the message does not spam long pipelines. boot_info$n_fail_by_int(R7).process_boot_results()now records per-intervention NA counts in addition to the whole-row totals. This lets users see whether bootstrap failures cluster on a single intervention (e.g. a rarestatic()level triggering separation) versus being spread across the whole replicate. The field is surfaced inboot_infoand tested intest-s3-methods.R.causat_survival()wide-format check (R8). The previousmax(rows_per_id$N) == 1Ltest only caught uniformly wide data; a mixed frame with some single-row ids and some multi-row ids would slip through. Now usesany(rows_per_id$N == 1L)and additionally asserts thattimevaries within each id (no duplicated rows at the same time value).FEATURE_COVERAGE_MATRIX.md↔︎tests/round-trip check (R9). Newtests/testthat/test-coverage-matrix.Rparses the matrix, extracts everytest-*.Rreference, and asserts both that the referenced file exists and that every test file on disk is referenced somewhere in the matrix. Guards against the exact kind of drift the rest of this sweep was cleaning up.
Suggestions (hardening / clarity)
fit_gcomp_point()resolvesfamilyviaresolve_family()(S1) so custommodel_fnimplementations that don’t auto-coerce a bare string see a fully-evaluated family object. Matches the behavior already infit_ipw()andfit_matching().- Multivariate sub-intervention name validation (S2).
apply_intervention()now assertssetequal(names(iv), treatment)before iterating the sub-interventions. A typo likelist(X = static(1), A2 = static(0))fortreatment = c("A1", "A2")now aborts with a clear message instead of silently creating a spuriousXcolumn via data.table. ipsi()constructor informs about its Phase-4 status (S3). The constructor succeeds (so users can build their interventions list) but surfaces a one-per-sessionrlang::inform()pointing to the knowncontrast()abort, so users hit the wall at construction rather than deep in a pipeline. Use.frequency_id = "causatr_ipsi_dead_end"to silence or grep for it.check_estimand_trt_compat()error message (S4) now explicitly says “binary point treatments coded as 0/1” and suggests recoding. Factor- or1/2-coded treatments previously got the same message without any hint at the coding issue.- Tier 2 numeric variance warning is now classed (S5).
variance_if_numeric()’s delta-shortcut warning carriesclass = "causatr_tier2_fallback"so batch pipelines and CI can grep for it viawithCallingHandlers(). compute_weight_summary()binary split pulls levels from WeightIt (S6) instead of hard-codingtrt == 0/trt == 1. Factor-coded binary treatments and integer codings likec(1, 2)now get correctly-populated “treated” / “control” rows.contrast()subset evaluation preserves lexical scope (S7). The threeeval(subset, envir = as.list(data))call sites were copying every column ofdatainto a fresh list and losing the caller’s enclosing environment. Switched toeval(subset, envir = data, enclos = parent.frame()), which skips the copy and letssubsetexpressions reference session variables likequote(age > cutoff).- FEATURE_COVERAGE_MATRIX survival section split (S8). Replaced the mixed “fit-only smoke (Phase 7 contrast pending)” rows with an explicit three-table layout (fit path / contrast path / rejection path) so the Phase 7 gap is visible without status annotations that mean different things on different rows.
.pseudo_ycolumn collision guard in ICE (S9).fit_ice()now aborts upfront ifdataalready has a column named.pseudo_y, which is the internal bookkeeping column ICE writes predicted pseudo-outcomes into during the backward iteration.
Doc drift — A:modifier behaviour (R1)
CLAUDE.md, NEWS.md (this file’s older entries), PHASE_6_INTERACTIONS.md, and CAUSATR_SCAFFOLD.md all still described IPW and matching as “silently ignoring” A:modifier interaction terms in confounders. That code path was hardened months ago: fit_ipw() and fit_matching() now call check_confounders_no_treatment() and abort upfront with a Phase-8 pointer. Every doc that mis-described the old behavior has been updated to match the current behavior.
Vignette expansion
Every method vignette (excluding variance-theory.qmd) now covers every combination supported by FEATURE_COVERAGE_MATRIX.md:
gcomp.qmd— added worked examples for categorical (k > 2) treatment, multivariate (joint) treatment, external survey weights, and a Poisson count outcome with a log-link ratio contrast. Summary table now records the family / weights columns.ipw.qmd— added categorical and continuous (GPS) treatment sections, an external-survey-weights section, a parallel-bootstrap section, and a non-Mparts WeightIt fallback example. Updated the Channel-2 technical note to reference the currentprepare_propensity_if()/apply_propensity_correction()architecture (was still referring to the renamedcorrect_propensity()).matching.qmd— added an ATT + external survey weights section demonstratingcombine_match_and_external_weights(), and extended the summary table with a categorical / continuous rejection row.longitudinal.qmd— added continuous-treatment + MTP section (shift / scale_by / threshold), multivariate time-varying treatment, external weights (with a note pointing to the ICE cascade-gradient correction in §5.4 of the theory vignette), and a censoring / IPCW-style example.survival.qmd— replaced the 3-line TODO stub with a full narrative of the current fit-only Phase 7 status, the intended end-to-end contrast algorithm, and the tracking pointers.triangulation.qmd— introduction.qmd and the comparison table both corrected for the drift above (matching is binary-only; continuous / categorical are not “Limited”).
2026-04-14 — Sweep and audit
- ICE sandwich fix under non-uniform external weights. The cascade gradient at step k > 0 was dropping the
w_{k-1}factor from \(A_{k-1,k} = E[\partial s_{k-1} / \partial \beta_k]\), silently underestimating the SE by ~2x on multi-period DGPs. Verified againstlmtp::lmtp_tmleand a 300-replicate Monte-Carlo. (Followed by a separate fix for the earlier regression where the ICE backward loop dropped external weights entirely.) - ICE double-counting fix for
shift/scale_by/threshold.ice_apply_intervention_long()was recomputing lag columns from the intervened treatment, double-counting the shift through both the lag path and the current-A prediction path. Now matcheslmtpon the canonical linear-Gaussian DGP. - New
FEATURE_COVERAGE_MATRIX.md. Single source of truth for what’s tested and at what fidelity (truth / smoke / none / rejected). New test-writing rule inCLAUDE.md: every feature change MUST update the matrix in the same PR. - Coverage sweep closing most residual gaps:
- gcomp × quasibinomial, Gamma(log), probit, cloglog, Poisson × bootstrap, categorical (3-level) treatment
- IPW × categorical (3-level), continuous, survey weights, ATT bootstrap, non-Mparts WeightIt method at contrast time
- matching × Poisson, Gamma(log), ATC bootstrap, categorical rejection (MatchIt is binary-only; verified against upstream docs)
- ICE ×
scale_by,threshold,shiftbootstrap, bootstrap × survey weights, binomial × ratio/OR,by(sex)stratification, external-weights truth test - Survival × censored fit (smoke; contrasts still Phase 7)
- End-to-end Tier 2 numeric variance via a custom
model_fn vcov_from_if(cluster = ...)unit test vs hand-computed referenceconfint()level monotonicity on both sandwich and bootstrap
- Defensive guards across
causat(),causat_survival(), IPW, matching, ICE, diagnose. Front-door validation for NA / Inf / negative / mis-sized / non-numericweights; rejection of duplicated or empty intervention names; explicit longitudinal rejection indiagnose()(previously crashed inside cobalt); categorical treatment rejection inmatchingwith a clearer causatr-side message. ~ sex + A:sexno longer triggers a false-positive “Baseline confounder ‘A’ varies within some individuals” warning. Treatment columns are now excluded fromwarn_confounder_variation().- altdoc sidebar fix. Single-item sections (“Getting started”, “Theory”) were silently dropped from the deployed site because
yaml::yaml.load()collapses a one-element scalar sequence into a scalar that Quarto’s sidebar can’t parse. Both sections now use the explicittext:/file:form, making thevariance-theoryvignette visible on the website. vignettes/variance-theory.qmdSection 5.4 now derives the ICE cascade gradient with the explicitw_{k-1}factor and the pseudo-code shows theprior.weightslookup.PHASE_6_INTERACTIONS.mddesign doc. Plans a unified effect-modification API across gcomp / IPW / matching / ICE. IPW and matching currently hardcode a saturatedY ~ AMSM and hard-abort on anyA:modifierterm inconfoundersviacheck_confounders_no_treatment()(rather than silently dropping it and returning a pooled ATE); ICE handles current-period A × modifier but not lagged A × modifier. Phase 8 will replace both the abort and the lag-gap with a proper MSM builder.
2026-04-13 — Variance engine unification
- Single influence-function engine. Replaced four method-specific variance paths with one
variance_if()dispatcher that routes tovariance_if_gcomp()/variance_if_ipw()/variance_if_matching()/variance_if_ice(). All four shareprepare_model_if()/apply_model_correction()primitives andvcov_from_if()aggregator. - Prep / apply split.
prepare_model_if()computes the gradient-independent half of Channel 2 once per model;apply_model_correction()finishes it per intervention. Saves an O(k) bread solve whencontrast()is called with many interventions. - Matching uses
cluster = subclassin the aggregator (cluster-robust IF-level sum-then-square).
2026-04-12 — Large refactor + feature pass
- Point-treatment sandwich now uses the full influence function (previous version was effectively a delta-method shortcut and could underestimate the SE on non-saturated MSMs).
variance_if()as single entry point. Introducescorrect_model()andcorrect_propensity()as the two per-model correction kernels. Newvignettes/variance-theory.qmdderives the two-channel IF from first principles.- Log-scale CIs for
ratioandorcontrasts. Respects the (0, ∞) support and gives better coverage than Wald bounds. static()accepts character values for categorical treatments (e.g.static("never")).- Analytical Jacobian for GLMs in sandwich variance (previously used numerical differencing).
- External weights moved to
fit$details$weights(never a column indata) so every method consumes them uniformly. - New
test-complex-dgp.Rcovering nonlinear confounding, GAMs, splines, and multi-confounder designs. - ICE IF scaling fix for subset / by-stratified targets, plus a
confint()ordering fix and a vectorized IF loop for speed.
2026-04-11 — Bootstrap percentile CIs
- Bootstrap path stores raw replicates in
object$boot_tandconfint()recomputes percentile CIs from them perlevel. The sandwich path continues to reconstruct Wald CIs from the stored SE. - External weights now propagate through every method (gcomp, IPW, matching, ICE); previously ICE dropped them in the backward loop.
2026-04-10 — Polish and hardening
by-stratificationvcovfix — stratum-specific vcov matrices now land inres$vcov[[level]]as a list instead of being collapsed to the pooled matrix.familynow threads through IPW and matching (previously silently forced Gaussian on the MSM regardless of the user’sfamilyargument).scale()\(\to\)scale_by()rename to avoid colliding with the base Rscale().by + estimandbug fix — previously the ATT/ATC restriction and the by-level restriction fought over the target population and returned the wrong per-stratum estimate.typeparameter oncausat()for explicit point / longitudinal selection; autodetects fromid+timeotherwise.
2026-04-09 — Effect modification, S3 polish, broom integration
- New
byargument oncontrast()for stratified (by-level) effect estimates. Computes separate target means and vcov per stratum;confint()andtidy()respect the stratification. - broom integration via
generics—tidy.causatr_result()andglance.causatr_result()return the expected columns. - Forest plots via the
forrestpackage forcausatr_resultobjects. - S3 method polish across
print,summary,plot,coef,vcov,confint,tidy,glance.
2026-04-02 / 2026-04-03 — Longitudinal scaffolding + R CMD check
- ICE sandwich variance with scaling fix; first batch of simulation-based ICE tests.
- Clean R CMD check: man pages regenerated, MASS / knitr declared, Windows vignette skip to work around a toolchain issue.
2026-04-01 — Phase 2–3 implementation
causat(estimator = "gcomp" / "ipw" / "matching")fully working, with:- parametric g-formula via a user-supplied
model_fn - IPW via
WeightIt::weightit()andWeightIt::glm_weightit() - matching via
MatchIt::matchit()+ weighted MSM
- parametric g-formula via a user-supplied
diagnose()for all three methods: positivity, balance (viacobalt), weight summaries, match quality, Love plots.- First vignettes:
gcomp.qmd,ipw.qmd,matching.qmd,triangulation.qmd.
2026-03-30 / 2026-03-31 — Phase 1 scaffolding
- Initial package structure:
DESCRIPTION,NAMESPACE,R/layout, testthat setup, dev tooling, GitHub Actions CI. - Core API stubs:
causat(),contrast(),diagnose(), intervention constructors (static,shift,scale_by,threshold,dynamic,ipsi), S3 classes. - NHEFS dataset bundled; input-validation helpers in
checks.R.