IPW density-ratio weights and variance theory

This vignette develops the density-ratio weight formulas behind causatr’s self-contained IPW engine and shows how they feed into the sandwich variance estimator. It is a companion to vignette("variance-theory"), which derives the general stacked M-estimation framework for IPW in Section 4.2. Here we zoom into the pieces that Section 4.2 treats as black boxes: the weight formulas themselves, the make_weight_fn() closure that the variance engine differentiates through, and the numerical cross-derivative \(A_{\beta\alpha}\) that propagates propensity-score uncertainty into the final standard error.

For practical usage examples see vignette("ipw") and vignette("interventions").

1. Density-ratio weights: the general idea

IPW reweights the observed sample so that, under the reweighted distribution, confounders are balanced and the weighted mean of \(Y\) estimates the counterfactual mean \(E[Y^d]\) under intervention \(d\). The key object is the density-ratio weight:

\[ w_i = \frac{f_d(A_i^{\text{obs}} \mid L_i)}{f(A_i^{\text{obs}} \mid L_i)}, \]

where \(f(\cdot \mid L)\) is the conditional treatment density under the observed data and \(f_d(\cdot \mid L)\) is the density of the treatment under intervention \(d\). The nature of \(f_d\) depends on the intervention type:

  • For point-mass interventions (static, dynamic) the intervened distribution is a Dirac, and the weight collapses to a Horvitz–Thompson indicator.
  • For smooth MTPs (shift, scale) the intervened distribution is a pushforward of \(f\) under a diffeomorphism, and the weight is a smooth density ratio.
  • For IPSI (incremental propensity-score interventions) the weight has a closed-form expression that bypasses density evaluation entirely.

The rest of this section works through each case.

1.1 Horvitz–Thompson indicator form (static, dynamic)

When the intervention sets each individual’s treatment to a deterministic value \(a^* = d(A_i, L_i)\), the intervened “density” is a point mass \(\delta_{a^*}\). The Radon–Nikodym derivative of a point mass w.r.t. a density \(f\) is:

\[ \frac{d\delta_{a^*}}{df}(A_i^{\text{obs}}) = \frac{\mathbb{1}\{A_i = a^*\}}{f(a^* \mid L_i)}. \]

For binary treatment with static(1), this becomes the familiar \(w_i = A_i / e(L_i)\) where \(e(L_i) = P(A_i=1 \mid L_i)\) is the propensity score.
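
As a base-R sketch (not causatr's code path; hypothetical data-generating process), the static(1) weight and its Hájek-normalized weighted mean look like:

```r
# Base-R sketch of the static(1) Horvitz-Thompson weight.
set.seed(1)
n <- 5e4
L <- rnorm(n)
A <- rbinom(n, 1, plogis(0.8 * L))
Y <- 1 + 2 * A + L + rnorm(n)                  # truth: E[Y^1] = 3
p_hat <- fitted(glm(A ~ L, family = binomial)) # fitted propensity e(L)
w <- A / p_hat                                 # 1{A = 1} / e(L)
mu1_hat <- sum(w * Y) / sum(w)                 # Hajek-normalized weighted mean
```

With `n` this large, `mu1_hat` lands close to the true counterfactual mean \(E[Y^1] = 3\).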

Estimand-specific numerator (ATE / ATT / ATC)

The ATE targets the whole population; ATT and ATC target subpopulations. To recover \(E[Y^a \mid A = A^*]\) from observed data, the standard Bayes-rule rewrite (Imbens 2004; Hernán & Robins Ch. 12) gives a per-individual numerator \(f^*_i\):

\[ w_i = \frac{\mathbb{1}\{A_i = a\} \cdot f^*_i}{f(A_i \mid L_i)}, \]

where:

| Estimand | Target subpopulation | \(f^*_i\) |
|---|---|---|
| ATE | Whole population | \(1\) |
| ATT | The treated (\(A^* = 1\)) | \(p(L_i)\) |
| ATC | The controls (\(A^* = 0\)) | \(1 - p(L_i)\) |

This single formula reproduces all three per-arm weight schemes in one branch. For the degenerate arm (e.g. ATT with static(1) on the \(A=1\) rows), \(f^* = p\) and \(f(1 \mid L) = p\) cancel, giving \(w_i = A_i\) — the direct sample functional \(E[Y \mid A=1]\) with no propensity uncertainty, as expected.
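
The degenerate-arm cancellation is easy to see numerically (base-R sketch, hypothetical data; not causatr's implementation):

```r
set.seed(2)
n <- 1000
L <- rnorm(n)
A <- rbinom(n, 1, plogis(L))
p <- fitted(glm(A ~ L, family = binomial))
# ATT with static(1): numerator f* = p and denominator f(1 | L) = p
# cancel exactly, leaving w = A (no propensity dependence).
w_att_treated <- (A == 1) * p / p
stopifnot(all(w_att_treated == A))
# ATT with static(0): f* = p over f(0 | L) = 1 - p gives odds weights
# on the control rows, zero on the treated rows.
w_att_control <- (A == 0) * p / (1 - p)
```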

causatr supports ATT/ATC only with static() on a binary 0/1 treatment. All other interventions are ATE-only; non-ATE requests are rejected by check_estimand_intervention_compat().

Note: Categorical treatment

For a categorical treatment with \(K\) levels, the same HT form applies: \(w_i = \mathbb{1}\{A_i = a^*\} / P(A_i = a^* \mid L_i)\), where the multinomial probabilities come from a nnet::multinom model. Only ATE is supported (ATT/ATC are ambiguous when there are more than two treatment levels).
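
A minimal sketch of the categorical branch, assuming a 3-level treatment and using nnet::multinom as the vignette describes (hypothetical data; not causatr's internals):

```r
set.seed(3)
n <- 2000
L <- rnorm(n)
A <- factor(sample(c("a", "b", "c"), n, replace = TRUE,
                   prob = c(0.3, 0.4, 0.3)))
fit <- nnet::multinom(A ~ L, trace = FALSE)
probs <- predict(fit, type = "probs")       # n x K matrix of fitted probs
a_star <- "b"                               # static("b")
w <- as.numeric(A == a_star) / probs[, a_star]
```

Rows with the off-target treatment get weight zero, and the weighted sample averages to roughly the size of the full sample (mean weight near 1).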

1.2 Smooth pushforward (shift, scale_by)

When the intervention is a smooth, invertible map \(d: a \mapsto d(a, l)\) on a continuous treatment, the intervened density is the pushforward of \(f\) under \(d\). By the change-of-variables formula:

\[ f_d(y \mid l) = f\bigl(d^{-1}(y, l) \mid l\bigr) \cdot \lvert\det Dd^{-1}(y, l)\rvert. \]

Evaluating at \(y = A_i^{\text{obs}}\) gives the weight:

\[ w_i = \frac{f\bigl(d^{-1}(A_i, L_i) \mid L_i\bigr) \cdot |\text{Jac}|}{f(A_i \mid L_i)}. \tag{1}\]

For the two MTPs causatr supports:

| Intervention | Map \(d(a)\) | Inverse \(d^{-1}(y)\) | Jacobian \(\lvert\det Dd^{-1}\rvert\) |
|---|---|---|---|
| shift(delta) | \(a + \delta\) | \(y - \delta\) | \(1\) |
| scale_by(c) | \(c \cdot a\) | \(y / c\) | \(1 / \lvert c \rvert\) |

Warning: The pushforward sign trap

A common mistake is to compute \(f(d(A_i) \mid L_i) / f(A_i \mid L_i)\) — i.e. evaluating the fitted density at the intervened treatment value. This is wrong: the correct formula evaluates at \(d^{-1}(A_i^{\text{obs}})\), not \(d(A_i^{\text{obs}})\).

For shift(delta), the difference is a sign flip: evaluating at \(A_i + \delta\) gives \(E[Y^{\text{shift}(-\delta)}]\) instead of \(E[Y^{\text{shift}(\delta)}]\). The mistake is easy to write and hard to catch without a truth-based simulation test.

In compute_density_ratio_weights():

# shift: d(a) = a + delta, d^{-1}(y) = y - delta, |Jac| = 1.
# f_d(A_obs | l) = f(A_obs - delta | l).
a_eval <- a_obs - delta                       # <-- d^{-1}, not d
f_int  <- evaluate_density(tm, a_eval, data)
w      <- f_int / f_obs

For scale_by(c):

# d(a) = c*a, d^{-1}(y) = y/c, |Jac| = 1/|c|.
a_eval <- a_obs / fct                         # <-- d^{-1}, not d
f_int  <- evaluate_density(tm, a_eval, data)
w      <- (f_int / f_obs) / abs(fct)
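
A truth-based check of the sign convention, in the spirit of the simulation test mentioned above (base-R sketch with a hypothetical Gaussian data-generating process; not causatr code):

```r
set.seed(4)
n <- 1e5
L <- rnorm(n)
A <- rnorm(n, mean = L, sd = 1)
Y <- A + L + rnorm(n)            # truth: E[Y^{shift(delta)}] = delta
delta <- 1
fit <- lm(A ~ L)
mu <- fitted(fit)
s  <- summary(fit)$sigma
w_right <- dnorm(A - delta, mu, s) / dnorm(A, mu, s)  # evaluate at d^{-1}
w_wrong <- dnorm(A + delta, mu, s) / dnorm(A, mu, s)  # evaluate at d
mean(w_right * Y)   # ~ +1: targets E[Y^{shift(+1)}]
mean(w_wrong * Y)   # ~ -1: silently targets E[Y^{shift(-1)}]
```

The wrong evaluation point produces a perfectly plausible-looking estimate of the wrong estimand, which is exactly why the mistake is hard to catch without a truth-based test.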

Why threshold() and dynamic() are rejected on continuous treatment

threshold(lo, hi) clamps \(A\) to \([\text{lo}, \text{hi}]\). The pushforward of a continuous density under a boundary clamp is a mixed measure: continuous on \((\text{lo}, \text{hi})\) plus point masses at the boundaries (from all the probability mass outside the interval that gets piled up at the clamp points). A mixed measure has no Lebesgue density, so the density ratio in Equation 1 is not defined. Users should switch to estimator = "gcomp", where the clamped-treatment counterfactual is handled cleanly via predict-then-average on the outcome model.

Similarly, dynamic(rule) on a continuous treatment defines a Dirac \(\delta_{d(L_i)}\) per individual — a degenerate “density” with zero Lebesgue density. The weight is zero almost surely. Users should rewrite the rule as a smooth shift() or scale_by(), or switch to gcomp.

1.3 IPSI closed form (Kennedy 2019)

Incremental propensity-score interventions (Kennedy 2019) act on the propensity score rather than on the treatment value. The intervention multiplies the odds of treatment by a factor \(\delta > 0\):

\[ \text{odds}_d(L_i) = \delta \cdot \text{odds}(L_i) \quad\Rightarrow\quad p_d(L_i) = \frac{\delta \, p(L_i)}{1 - p(L_i) + \delta \, p(L_i)}. \]

Because the intervention is defined in terms of the propensity score rather than a counterfactual treatment value, there is no pushforward density to evaluate. The density-ratio weight simplifies to a closed form:

\[ w_i = \frac{\delta \, A_i + (1 - A_i)}{\delta \, p_i + (1 - p_i)}, \tag{2}\]

where \(p_i = P(A_i = 1 \mid L_i)\) is the fitted propensity score.

This formula has several appealing properties:

  • No density evaluation. Unlike shift() / scale_by(), IPSI never evaluates \(f(a \mid L)\) at a counterfactual treatment value. The weight is a direct function of the propensity.
  • Finite weights. For finite \(\delta > 0\), the denominator \(\delta \, p + (1-p)\) is bounded between \(\min(1, \delta)\) and \(\max(1, \delta)\), so the weight is always finite. No positivity guard needed.
  • Smooth in \(\delta\). The dose–response curve \(\delta \mapsto E[Y^{d_\delta}]\) is smooth, making it a natural candidate for a marginal structural model across a grid of \(\delta\) values.

In causatr:

ipsi_weight_formula <- function(a_obs, p, delta) {
  (delta * a_obs + (1 - a_obs)) / (delta * p + (1 - p))
}
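
The boundedness and mean-one properties of Equation 2 can be checked directly (base-R sketch reusing the formula above; true propensities are known by construction):

```r
ipsi_weight_formula <- function(a_obs, p, delta) {
  (delta * a_obs + (1 - a_obs)) / (delta * p + (1 - p))
}
set.seed(5)
n <- 1e5
delta <- 3
p <- plogis(rnorm(n))        # true propensities
A <- rbinom(n, 1, p)
w <- ipsi_weight_formula(A, p, delta)
# Bounded: the denominator lies in [min(1, delta), max(1, delta)],
# so w <= delta here and is always finite and positive.
stopifnot(all(w <= delta), all(w > 0))
# Mean one: E[w | L] = (delta p + 1 - p) / (delta p + 1 - p) = 1
# when p is the true propensity.
mean(w)   # ~ 1
```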

2. The make_weight_fn() closure

The sandwich variance estimator needs the cross-derivative \(A_{\beta\alpha} = -\partial\bar\psi_\beta/\partial\alpha\), which captures how the MSM score changes when the propensity parameters shift. causatr computes this numerically via numDeriv::jacobian(). The numerical Jacobian needs a function \(\alpha \mapsto w_i(\alpha)\) that recomputes the weight vector under a candidate propensity parameter.

make_weight_fn() builds this closure. It captures everything it needs by value at creation time:

| Captured quantity | Description | Varies under \(\alpha\) perturbation? |
|---|---|---|
| X_prop | Propensity design matrix (\(n \times q\)) | No — matrix is fixed |
| a_obs | Observed treatment values | No |
| a_int / a_eval | Intervened or inverse-intervened values | No — intervention is fixed |
| sigma | Residual SD (continuous treatments) | No — fixed at fit time |
| delta | Odds multiplier (IPSI only) | No |
| family tag | Dispatches to the right formula | No |

The only quantity that varies is \(\alpha\) itself. Inside the closure, predictions are recomputed from \(\alpha\) via a single matrix multiply X_prop %*% alpha, avoiding formula parsing on every numDeriv step.

Closure branches

IPSI:

function(alpha) {
  eta <- as.numeric(X_prop %*% alpha)
  p   <- plogis(eta)
  ipsi_weight_formula(a_obs, p, delta)
}

HT indicator (Bernoulli):

function(alpha) {
  eta   <- as.numeric(X_prop %*% alpha)
  p     <- plogis(eta)
  f_obs <- ifelse(a_obs == 1, p, 1 - p)
  ind * f_star_fn(p) / f_obs
}

Here ind (the HT indicator \(\mathbb{1}\{A_i = a^*\}\)) and f_star_fn (the estimand-specific Bayes numerator) are captured at creation time. For ATE, f_star_fn(p) = 1; for ATT, f_star_fn(p) = p; for ATC, f_star_fn(p) = 1 - p. Baking the ATT/ATC numerator into the closure ensures that numDeriv::jacobian() picks up the propensity-dependence of the numerator correctly.

HT indicator (categorical/multinomial):

function(alpha) {
  alpha_mat <- matrix(alpha, nrow = K-1, ncol = p, byrow = TRUE)
  eta       <- X_prop %*% t(alpha_mat)
  exp_eta   <- exp(eta)
  denom     <- 1 + rowSums(exp_eta)
  prob_mat  <- cbind(1/denom, exp_eta/denom)     # softmax
  f_obs     <- prob_mat[cbind(1:n, col_idx)]
  ind / f_obs                                    # ATE only
}

The flattened \(\alpha\) vector encodes \((K-1) \times p\) coefficients (row-major). Each numDeriv step perturbs one entry; the closure reconstructs the coefficient matrix, applies the softmax, and extracts the per-individual probability at the observed treatment level.
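
The row-major convention is easy to get wrong; a round-trip check (hypothetical shapes, base R — the closure above writes the covariate count as p, here it is q to avoid clashing with the propensity p):

```r
K <- 3    # treatment levels
q <- 4    # propensity coefficients per non-baseline level
set.seed(6)
alpha_mat  <- matrix(rnorm((K - 1) * q), nrow = K - 1, ncol = q)
alpha_flat <- as.vector(t(alpha_mat))   # row-major flatten
rebuilt    <- matrix(alpha_flat, nrow = K - 1, ncol = q, byrow = TRUE)
stopifnot(identical(rebuilt, alpha_mat))
```

A column-major flatten paired with `byrow = TRUE` (or vice versa) would silently scramble the coefficients, and every numDeriv perturbation would land on the wrong entry.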

Smooth pushforward (Gaussian):

function(alpha) {
  mu    <- as.numeric(X_prop %*% alpha)
  f_obs <- dnorm(a_obs, mean = mu, sd = sigma)
  f_eval <- dnorm(a_eval, mean = mu, sd = sigma)
  (f_eval / f_obs) * jac_abs
}

Here a_eval (\(= d^{-1}(A_i^{\text{obs}})\)) and jac_abs are precomputed once. The residual SD sigma is held fixed under \(\alpha\) perturbation. Treating \(\sigma\) as an additional nuisance parameter would require a joint \((\mu, \sigma)\) M-estimation setup — deferred.

Note: Why sigma is fixed

The conditional treatment density is \(A_i \mid L_i \sim N(\mu_i, \sigma^2)\) where \(\mu_i = L_i^T\alpha\). Under maximum likelihood the score for \(\alpha\) and the score for \(\sigma\) are orthogonal (information matrix is block-diagonal in \((\alpha, \sigma)\)). Fixing \(\sigma\) at its MLE and treating only \(\alpha\) as the propensity nuisance is equivalent to profiling out \(\sigma\) — the resulting sandwich is asymptotically correct for \(\hat\mu_a\) up to an \(O(n^{-1})\) term from the profiled \(\sigma\) uncertainty (Liang & Zeger 1986). In practice the effect on the SE is negligible because \(\sigma\) enters the weight ratio only through the Normal pdf, where it cancels to leading order.
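
The block-diagonality claim is quick to verify from the Normal log-likelihood:

\[ \ell_i = -\log\sigma - \frac{(A_i - \mu_i)^2}{2\sigma^2}, \qquad \frac{\partial^2 \ell_i}{\partial \mu_i \, \partial \sigma} = -\frac{2(A_i - \mu_i)}{\sigma^3}, \]

whose conditional expectation is zero because \(E[A_i - \mu_i \mid L_i] = 0\) at the truth, so the \((\alpha, \sigma)\) cross-block of the information matrix vanishes.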

3. The cross-derivative \(A_{\beta\alpha}\)

vignette("variance-theory") Section 4.2 derives the full IF for \(\hat\mu_a\) under IPW as a three-term sum (Channel 1 + MSM correction + propensity correction). The propensity correction chain is:

\[ \text{propensity correction}_i = J \, A_{\beta\beta}^{-1} \, A_{\beta\alpha} \, A_{\alpha\alpha}^{-1} \, \psi_{\alpha,i}, \]

where \(A_{\beta\alpha}\) is the cross-derivative of the weighted MSM score w.r.t. the propensity parameters:

\[ A_{\beta\alpha} = -\frac{1}{n} \sum_{i=1}^{n} \frac{\partial \psi_{\beta,i}}{\partial \alpha^T} = -\frac{1}{n} \sum_{i=1}^{n} X_i \, \frac{\partial w_i}{\partial \alpha^T} \, \frac{(Y_i - \mu_i)\,\mu'(\eta_i)}{V(\mu_i)}. \]

The weight derivative \(\partial w_i / \partial\alpha^T\) depends on the intervention type (HT indicator, smooth pushforward, IPSI — each is a different function of \(\alpha\)). Rather than deriving a closed-form formula for each case, causatr evaluates \(A_{\beta\alpha}\) numerically via numDeriv::jacobian() on a function that computes the average weighted MSM score as a function of \(\alpha\):

\[ \bar\psi_\beta(\alpha) = \frac{1}{n} \sum_{i=1}^{n} X_i \, w_i(\alpha) \, \frac{(Y_i - \hat\mu_i)\,\mu'(\hat\eta_i)}{V(\hat\mu_i)}. \]

Everything except \(w_i(\alpha)\) is held fixed at \(\hat\beta\). The Jacobian of \(\bar\psi_\beta\) w.r.t. \(\alpha\) is \(+\partial\bar\psi_\beta/\partial\alpha\), so the negative-Hessian convention gives:

\[ A_{\beta\alpha} = -\texttt{numDeriv::jacobian}(\bar\psi_\beta, \hat\alpha). \]

phi_bar <- function(alpha) {
  w_alpha <- weight_fn(alpha)
  s_per_i <- w_alpha * mu_eta * r_fit / var_mu
  as.numeric(crossprod(X_msm, s_per_i)) / n_fit
}
A_beta_alpha <- -numDeriv::jacobian(phi_bar, x = alpha_hat)

The weight_fn here is the closure from Section 2, which encodes the intervention-specific weight formula. By building the closure once and handing it to numDeriv::jacobian(), we get a single code path for the cross-derivative across all intervention types.

Numerical verification

causatr’s test suite (test-ipw-cross-derivative.R) verifies the numerical \(A_{\beta\alpha}\) against hand-derived analytic formulas for the static binary case, where the closed form is available:

\[ (A_{\beta\alpha})_{j,k} = -\frac{1}{n}\sum_{i=1}^{n} X_{i,j}^{\text{msm}} \cdot \frac{(Y_i - \hat\mu_i)\,\mu'(\hat\eta_i)}{V(\hat\mu_i)} \cdot \frac{\partial w_i}{\partial \alpha_k}. \]

For static(1) with Bernoulli treatment (\(w_i = A_i / p_i\)):

\[ \frac{\partial w_i}{\partial \alpha_k} = -\frac{A_i}{p_i^2} \cdot p_i(1-p_i) \cdot X_{i,k}^{\text{prop}} = -\frac{A_i(1-p_i)}{p_i} \cdot X_{i,k}^{\text{prop}}. \]
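
This closed form can be checked against a simple central difference (base-R sketch; a hand-rolled difference stands in for the numDeriv call, and the data are hypothetical):

```r
set.seed(7)
n <- 500
X_prop <- cbind(1, rnorm(n))              # intercept + one covariate
alpha  <- c(-0.2, 0.7)
A <- rbinom(n, 1, plogis(as.numeric(X_prop %*% alpha)))
w_of <- function(a) {                     # static(1) weight as function of alpha
  p <- plogis(as.numeric(X_prop %*% a))
  A / p
}
h <- 1e-6                                 # perturb the second coefficient
num <- (w_of(alpha + c(0, h)) - w_of(alpha - c(0, h))) / (2 * h)
p   <- plogis(as.numeric(X_prop %*% alpha))
ana <- -A * (1 - p) / p * X_prop[, 2]     # hand-derived dw/dalpha_2
max(abs(num - ana))                       # small: the two agree
```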

For shift(delta) with Gaussian treatment (\(w_i = \phi(a_i - \delta; \mu_i, \sigma) / \phi(a_i; \mu_i, \sigma)\)):

\[ \frac{\partial w_i}{\partial \alpha_k} = -w_i \cdot \frac{\delta}{\sigma^2} \cdot X_{i,k}^{\text{prop}}, \]

since \(\log w_i = \delta(a_i - \mu_i)/\sigma^2 - \delta^2/(2\sigma^2)\) and \(\partial \mu_i / \partial \alpha_k = X_{i,k}^{\text{prop}}\).
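
The same finite-difference check for the shift branch (base-R sketch, hypothetical data; note that differentiating \(\log w_i = \delta(a_i-\mu_i)/\sigma^2 - \delta^2/(2\sigma^2)\) in \(\mu_i\) gives \(-\delta/\sigma^2\), so the derivative carries a negative sign):

```r
set.seed(8)
n <- 500
X_prop <- cbind(1, rnorm(n))
alpha <- c(0.3, 1.0); sigma <- 1.2; delta <- 0.5
A <- rnorm(n, as.numeric(X_prop %*% alpha), sigma)
w_of <- function(a) {                     # shift(delta) weight as function of alpha
  mu <- as.numeric(X_prop %*% a)
  dnorm(A - delta, mu, sigma) / dnorm(A, mu, sigma)
}
h <- 1e-6
num <- (w_of(alpha + c(0, h)) - w_of(alpha - c(0, h))) / (2 * h)
w   <- w_of(alpha)
ana <- -w * delta / sigma^2 * X_prop[, 2]  # note the negative sign
max(abs(num - ana))                        # small: the two agree
```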

For IPSI with Bernoulli treatment:

\[ \frac{\partial w_i}{\partial \alpha_k} = \frac{(\delta A_i + 1 - A_i) \cdot (-1)(\delta - 1) \cdot p_i(1-p_i)} {(\delta p_i + 1 - p_i)^2} \cdot X_{i,k}^{\text{prop}}. \]

The test asserts agreement at \(\sim 10^{-6}\) tolerance across all five weight branches (static(1), static(0), shift, IPSI, natural course).

A compact version of this check:
library(causatr)

set.seed(42)
n <- 200
L <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * L))
Y <- 2 + 3 * A + L + rnorm(n)
dat <- data.table::data.table(Y = Y, A = A, L = L)

fit <- causat(dat, outcome = "Y", treatment = "A", confounders = ~ L,
              estimator = "ipw")
res <- contrast(fit, interventions = list(a1 = static(1), a0 = static(0)),
                reference = "a0", ci_method = "sandwich")

tm <- fit$details$treatment_model
p <- as.numeric(stats::predict(tm$model, type = "response"))

# Hand-derived dw/dalpha for static(1): -A*(1-p)/p * X_prop
X_prop <- tm$X_prop
dw_dalpha_hand <- sweep(X_prop, 1, -A * (1 - p) / p, "*")

# Numerical dw/dalpha via the make_weight_fn closure
wfn <- causatr:::make_weight_fn(tm, dat, static(1))
dw_dalpha_num <- numDeriv::jacobian(wfn, x = tm$alpha_hat)

max_diff <- max(abs(dw_dalpha_hand - dw_dalpha_num))
cat(sprintf("Max |hand - numerical| for dw/dalpha under static(1): %.2e\n",
            max_diff))
#> Max |hand - numerical| for dw/dalpha under static(1): 8.41e-10

4. Assembly: the full IF for \(\hat\mu_a\) under IPW

compute_ipw_if_self_contained_one() assembles the per-individual influence function for one intervention. The three channels, following the IF derivation in vignette("variance-theory"), are:

\[ \text{IF}_i(\hat\mu_a) = \underbrace{g^{-1}(X_i^{*T}\hat\beta) - \hat\mu_a}_{\text{Channel 1}} + \underbrace{J \, A_{\beta\beta}^{-1} \, \psi_{\beta,i}}_{\text{MSM correction}} - \underbrace{J \, A_{\beta\beta}^{-1} \, A_{\beta\alpha} \, A_{\alpha\alpha}^{-1} \, \psi_{\alpha,i}}_{\text{propensity correction}}. \]

Note the minus sign on the propensity correction. This comes from block inversion of the lower-triangular bread:

\[ A^{-1} = \begin{pmatrix} A_{\alpha\alpha}^{-1} & 0 \\ -A_{\beta\beta}^{-1} A_{\beta\alpha} A_{\alpha\alpha}^{-1} & A_{\beta\beta}^{-1} \end{pmatrix}. \]
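
Multiplying the bread by this candidate inverse confirms it:

\[
\begin{pmatrix} A_{\alpha\alpha} & 0 \\ A_{\beta\alpha} & A_{\beta\beta} \end{pmatrix}
\begin{pmatrix} A_{\alpha\alpha}^{-1} & 0 \\ -A_{\beta\beta}^{-1} A_{\beta\alpha} A_{\alpha\alpha}^{-1} & A_{\beta\beta}^{-1} \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ A_{\beta\alpha} A_{\alpha\alpha}^{-1} - A_{\beta\alpha} A_{\alpha\alpha}^{-1} & I \end{pmatrix}
= I,
\]

and it is the lower-left block that carries the \(-A_{\beta\beta}^{-1} A_{\beta\alpha} A_{\alpha\alpha}^{-1}\) composition, and hence the minus sign, into the propensity channel.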

Note: Sign convention

vignette("variance-theory") writes the IF with a plus sign on the propensity correction because it uses Wooldridge’s convention \(A_{\beta\alpha}^W = +(1/n)\sum \partial\psi_\beta/\partial\alpha\), whereas the code uses the negative-Hessian convention \(A_{\beta\alpha} = -(1/n)\sum \partial\psi_\beta/\partial\alpha\). The two differ by a sign on \(A_{\beta\alpha}\), which flips the composition sign. The numerical result is the same.

Implementation walkthrough

  1. MSM correction. prepare_model_if(msm_model) + apply_model_correction(msm_prep, J) gives \(A_{\beta\beta}^{-1}\psi_{\beta,i}\) projected onto the gradient \(J\). This is the same primitive used by g-comp and matching — one code path for all estimators.

  2. Cross-derivative. numDeriv::jacobian(phi_bar, alpha_hat) gives \(A_{\beta\alpha}\) numerically, using the weight_fn closure from Section 2. The per-step cost is \(O(n \cdot q)\) for a \(q\)-parameter propensity model — one matrix multiply per perturbation.

  3. Propensity correction. Form the gradient \(g_{\text{prop}} = A_{\beta\alpha}^T \cdot h_{\text{msm}}\) where \(h_{\text{msm}} = n \cdot A_{\beta\beta}^{-1} J\) (the \(n\)-scaling accounts for bread_inv() returning the raw \((X^TWX)^{-1}\)). Then apply_model_correction(prop_prep, g_prop) gives \(g_{\text{prop}}^T A_{\alpha\alpha}^{-1} \psi_{\alpha,i}\) per individual — the propensity channel.

  4. Assembly. Ch1_i + msm_res$correction - prop_res$correction.

When does the propensity correction matter?

For static binary ATE, the propensity correction reduces the SE (Lunceford & Davidian 2004; Hirano et al. 2003). Using estimated rather than known propensity scores is more efficient, so the naive sandwich (which ignores propensity uncertainty) is conservative.

For non-static interventions the picture is richer:

  • Shift: the per-intervention SE typically drops 5–8% from the propensity correction (the negative cross-term from the stacked M-estimation outweighs the added propensity-uncertainty term).
  • IPSI: the per-intervention effect on the SE is small, but the contrast SE can drop dramatically (by roughly 90%) because the shared propensity model induces large positive covariance between the IPSI and natural-course marginal means. When you subtract two marginal means that share the same propensity noise, the noise largely cancels.

Both effects are verified in test-ipw-variance-regression.R, which asserts that the full IF-based SE materially differs from the \(J V_\beta J^T\) shortcut (which omits the propensity correction).

5. Intervention × family compatibility matrix

Not every combination of intervention type and treatment family is valid under density-ratio IPW. The compatibility matrix, enforced by check_intervention_family_compat(), is:

| Intervention | Bernoulli | Gaussian | Categorical |
|---|---|---|---|
| static() | \(\checkmark\) HT | \(\times\) | \(\checkmark\) HT |
| shift() | \(\times\) | \(\checkmark\) pushforward | \(\times\) |
| scale_by() | \(\times\) | \(\checkmark\) pushforward | \(\times\) |
| threshold() | \(\times\) | \(\times\) | \(\times\) |
| dynamic() | \(\checkmark\) HT | \(\times\) | \(\checkmark\) HT |
| ipsi() | \(\checkmark\) Kennedy | \(\times\) | \(\times\) |

Each \(\times\) has a specific reason rooted in the density-ratio framework:

  • static(v) on continuous: the HT indicator \(\mathbb{1}\{A_i = v\}\) is zero almost surely for a continuous treatment.
  • shift() / scale_by() on discrete: factors have no additive or multiplicative structure.
  • threshold() on continuous: pushforward is a mixed measure (point masses at the clamp boundaries), so there is no Lebesgue density for the ratio.
  • dynamic() on continuous: a deterministic per-individual rule is a Dirac per individual. Same pushforward degeneracy as threshold().
  • ipsi() on non-binary: Kennedy’s (2019) closed-form formula is Bernoulli-specific.

All rejections point the user to the right alternative (typically estimator = "gcomp" or a smooth MTP constructor).

References

  • Hernán MA, Robins JM (2025). Causal Inference: What If. Chapman & Hall/CRC. Ch. 12 (IP weighting).
  • Hirano K, Imbens GW, Ridder G (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71:1161–1189.
  • Imbens GW (2004). Nonparametric estimation of average treatment effects under exogeneity: a review. Review of Economics and Statistics 86:4–29.
  • Kennedy EH (2019). Nonparametric causal effects based on incremental propensity score interventions. JASA 114:645–656.
  • Lunceford JK, Davidian M (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine 23:2937–2960.
  • Díaz I, Williams N, Hoffman KL, Schenck EJ (2023). Non-parametric causal effects based on longitudinal modified treatment policies. JASA 118:846–857.
  • Haneuse S, Rotnitzky A (2013). Estimation of the effect of interventions that modify the received treatment. Statistics in Medicine 32:5260–5277.