surv_fit – survatr

Fit a causal survival hazard model on person-period data

Description

Fit-only entry point for survatr. Builds the risk set and fits the pooled-logistic discrete-time hazard model logit h(t | A, L) = alpha(t) + beta_A A + beta_L L on the at-risk person-period rows. Survival curves, risk and RMST contrasts, and variance estimation live in contrast(), which returns a time-indexed, curve-shaped result. This two-step split lets the user fit the hazard model once and then cheaply contrast many interventions against the same fit.
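The model and the step from hazards to survival curves can be sketched in base R. This is a minimal illustration under stated assumptions, not survatr's implementation: the simulated data, the column names, and the helper surv_curve() are all hypothetical, and period dummies stand in for alpha(t).

```r
# Base-R sketch only; survatr's internals may differ.
set.seed(42)
n_id <- 200L
K <- 5L
pp <- data.frame(
  id = rep(seq_len(n_id), each = K),
  t  = rep(seq_len(K), times = n_id),
  A  = rep(rbinom(n_id, 1L, 0.5), each = K),
  L  = rep(rnorm(n_id), each = K)
)
pp$Y <- rbinom(nrow(pp), 1L, plogis(-2 + 0.5 * pp$A + 0.3 * pp$L))

# Risk set: keep rows up to and including each id's first event.
prior_events <- ave(pp$Y, pp$id, FUN = function(y) cumsum(y) - y)
at_risk <- pp[prior_events == 0, ]

# Pooled-logistic hazard: logit h(t | A, L) = alpha(t) + beta_A A + beta_L L.
haz_fit <- glm(Y ~ factor(t) + A + L, family = binomial(), data = at_risk)

# Standardized survival under "set A = a": predict hazards on the full grid,
# take the running product of (1 - h) within id, then average over ids.
surv_curve <- function(a) {
  grid <- pp
  grid$A <- a
  h <- predict(haz_fit, newdata = grid, type = "response")
  s <- ave(1 - h, grid$id, FUN = cumprod)
  tapply(s, grid$t, mean)
}

s1 <- surv_curve(1)
s0 <- surv_curve(0)
risk_diff <- (1 - s1) - (1 - s0)  # risk difference at each period
```

Because the hazard model is fit once and surv_curve() only predicts from it, evaluating further interventions costs one prediction pass each, which is the motivation for the fit / contrast split.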

Usage

surv_fit(
  data,
  outcome,
  treatment,
  confounders,
  id,
  time,
  censoring = NULL,
  competing = NULL,
  time_formula = ~splines::ns(time, 4),
  weights = NULL,
  estimator = "gcomp",
  model_fn = stats::glm,
  propensity_model_fn = stats::glm,
  trim = NULL,
  confounders_tv = NULL,
  history = Inf,
  ipcw = NULL,
  censoring_model_fn = stats::glm,
  ...
)

Arguments

data A person-period (long) data.frame or data.table, rectangular across ids: every unique id must have one row at every unique time value. Ragged person-period data (ids whose rows stop after the event or after censoring) is rejected with class survatr_ragged_pp. Pad ragged data before calling surv_fit() by appending rows with outcome = 0 and (if used) censoring = 1 so the risk-set builder drops them from the fit. Wide data (one row per id across a multi-period study) must be reshaped with causatr::to_person_period().
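One way to pad ragged data into the rectangular shape described above is sketched below in base R. The data frame and its column names (id, t, A, Y, C) are hypothetical, and carrying the treatment forward assumes it is constant within id.

```r
# Hypothetical ragged input: id 2 has its event at t = 2 and no later rows.
ragged <- data.frame(
  id = c(1, 1, 1, 2, 2),
  t  = c(1, 2, 3, 1, 2),
  A  = c(0, 0, 0, 1, 1),
  Y  = c(0, 0, 0, 0, 1),
  C  = c(0, 0, 0, 0, 0)
)

# Full id x time grid; merge() leaves NA in the rows that were missing.
grid <- expand.grid(id = unique(ragged$id), t = sort(unique(ragged$t)))
padded <- merge(grid, ragged, by = c("id", "t"), all.x = TRUE)
padded <- padded[order(padded$id, padded$t), ]

is_pad <- is.na(padded$Y)
padded$Y[is_pad] <- 0  # no event on padding rows
padded$C[is_pad] <- 1  # flag as censored so they leave the risk set

# Carry the within-id constant treatment onto the padding rows.
padded$A <- ave(padded$A, padded$id, FUN = function(a) a[!is.na(a)][1])
rownames(padded) <- NULL
```

Baseline covariates would need the same carry-forward treatment as A before the padded data is passed on.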

outcome Character scalar. Column name of the event indicator (1 = event at this period, 0 = no event). Must be in data.

treatment Character scalar. Column name of the treatment. For point-treatment g-computation the treatment is constant within id. Under the longitudinal ICE-hazard estimator (estimator = "ice") the treatment may vary within id and must be numeric (binary, or a numeric dose entered linearly); factor / categorical (k > 2) treatments are rejected with class survatr_ice_treatment_unsupported (a treatment-design formula path ships in a later chunk).

confounders A one-sided formula (e.g. ~ L1 + L2) describing the baseline (time-invariant) covariate adjustment set. Under the longitudinal ICE-hazard estimator (estimator = "ice"), time-varying covariates go in confounders_tv instead; baseline terms here are never lagged.

id Character scalar. Column name of the individual identifier.

time Character scalar. Column name of the discrete period index (integer-valued; sorted within id).

censoring Character scalar or NULL. Column name of the censoring indicator (1 = censored at this period); NA or 0 means uncensored. When NULL, no period is treated as censored and every row stays at risk until the first event.

competing Character scalar or NULL (the default). When non-NULL, activates the competing-risks path: competing names a multi-valued event-type column (0 = no event this period, 1..J = the cause of the event this period), and surv_fit() fits J parallel cause-specific pooled-logistic hazard models on a shared all-cause risk set. It must name the same column as outcome; competing risks are gcomp / point-treatment only in this release (a non-"gcomp" estimator, fewer than two distinct causes, or outcome != competing aborts with survatr_competing_estimator / survatr_competing_misuse). Cumulative-incidence functions (CIF), CIF contrasts, and all-cause survival come from contrast(). Fine–Gray / subdistribution hazards are out of scope (cause-specific only).

time_formula One-sided formula for the baseline hazard alpha(t). Defaults to ~ splines::ns(time, 4) (4 df natural spline on the time variable). Pass ~ factor(time) for period dummies or ~ 1 for a time-constant hazard.
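The three choices differ in how many columns they contribute to the design matrix. The sketch below uses only base R and splines on a hypothetical eight-period time column; it illustrates the formulas themselves rather than anything survatr does internally.

```r
# Hypothetical time column: 3 ids observed over 8 periods.
d <- data.frame(time = rep(1:8, times = 3))

X_spline <- model.matrix(~ splines::ns(time, 4), data = d)  # natural spline, 4 df
X_dummy  <- model.matrix(~ factor(time), data = d)          # one level per period
X_const  <- model.matrix(~ 1, data = d)                     # time-constant hazard

# Column counts, intercept included.
c(spline = ncol(X_spline), dummies = ncol(X_dummy), constant = ncol(X_const))
```

The spline keeps the baseline hazard at a fixed four degrees of freedom however many periods there are, whereas period dummies grow with the number of unique times.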

weights Optional numeric vector of external weights, length nrow(data). When supplied, the hazard model is fit with stats::quasibinomial() rather than stats::binomial() (same score equations, free dispersion – drops the "non-integer #successes" warning). The variance engine in later chunks reads the family from fit$family to pick the right dispersion.
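The family switch can be checked directly with stats::glm on simulated data. The variables below are hypothetical; the point is only that both families give the same coefficients under non-integer weights, while quasibinomial() estimates the dispersion and does not warn.

```r
set.seed(7)
n <- 200L
x <- rnorm(n)
y <- rbinom(n, 1L, plogis(-1 + 0.8 * x))
w <- runif(n, 0.5, 2)  # hypothetical non-integer external weights

# binomial() warns about non-integer #successes; silence it for the comparison.
fit_bin <- suppressWarnings(glm(y ~ x, family = binomial(), weights = w))
fit_qb  <- glm(y ~ x, family = quasibinomial(), weights = w)

same_coef <- isTRUE(all.equal(coef(fit_bin), coef(fit_qb)))

# binomial() fixes the dispersion at 1; quasibinomial() estimates it.
disp <- c(
  binomial      = summary(fit_bin)$dispersion,
  quasibinomial = summary(fit_qb)$dispersion
)
```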

estimator Character scalar. "gcomp" (pooled-logistic standardization), "ipw" (weighted marginal hazard MSM with stabilized density-ratio weights), or "ice" (longitudinal ICE-hazard: iterated-conditional-expectation hazards for a time-varying treatment). Matching is a hard reject with class survatr_matching_rejected pointing to survival::coxph(..., weights = , cluster = ).

model_fn Fitting function for the hazard model. Defaults to stats::glm. Accepts any function matching the stats::glm interface (formula, data, family, weights, …), e.g. mgcv::gam with an s(time) term in time_formula.

propensity_model_fn Fitting function for the treatment model under estimator = "ipw" (ignored otherwise). Same stats::glm-style interface as model_fn. Defaults to stats::glm. The treatment model is fit on the baseline rows (one per id) with confounders as predictors; the hazard MSM then uses A only.
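For a binary treatment, the baseline-row treatment model and the resulting stabilized weights might look like the base-R sketch below. The data, the column names, and the marginal-over-conditional form of the weight are assumptions made for illustration; survatr's own weight construction may differ in detail.

```r
set.seed(5)
n_id <- 400L
base <- data.frame(id = seq_len(n_id), L1 = rnorm(n_id))  # one row per id
base$A <- rbinom(n_id, 1L, plogis(0.6 * base$L1))

# Treatment model on the baseline rows, confounders as predictors.
ps_fit <- glm(A ~ L1, family = binomial(), data = base)
ps <- predict(ps_fit, type = "response")
p_marg <- mean(base$A)

# Stabilized weight: marginal P(A = a) over conditional P(A = a | L).
base$w <- ifelse(base$A == 1, p_marg / ps, (1 - p_marg) / (1 - ps))

# Broadcast the per-id weight to every person-period row of that id.
K <- 3L
pp <- data.frame(id = rep(base$id, each = K), t = rep(seq_len(K), times = n_id))
pp$w <- base$w[match(pp$id, base$id)]
```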

trim Numeric scalar in (0, 1] or NULL (the default). Under estimator = "ipw", the per-id stabilized treatment weights are winsorized at the trim-th quantile (Cole & Hernán 2008) before broadcast. Under IPCW (ipcw non-NULL), the same quantile is used for the per-period censoring weights (applied separately at each time period, targeting the heaviest late-time tails). NULL / 1 means no truncation. All resolved fixed cutoffs are reused by the sandwich.
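Winsorizing at a quantile can be sketched in a few lines of base R. The weights below are simulated, and capping only the upper tail is an assumption of this sketch; the text above does not say whether the lower tail is also truncated.

```r
set.seed(3)
w <- exp(rnorm(500, sd = 1))  # hypothetical heavy-tailed per-id weights
trim <- 0.99

# Resolve the cutoff once, then cap every weight above it.
cutoff <- unname(quantile(w, probs = trim))
w_trim <- pmin(w, cutoff)

# trim = 1 puts the cutoff at the maximum, so nothing is truncated.
w_untrimmed <- pmin(w, unname(quantile(w, probs = 1)))
```

Storing the resolved cutoff, rather than re-deriving it, is what allows a later variance step to treat it as fixed.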

confounders_tv A one-sided formula of time-varying confounders for the longitudinal ICE-hazard estimator (estimator = "ice"), lag-expanded at each backward step (e.g. ~ L builds L + lag1_L + …). NULL (the default) means no time-varying confounders. Ignored by the point-treatment estimators (gcomp / ipw).

history Markov lag order for the longitudinal ICE-hazard estimator. Inf (the default) uses the full available history (capped at n_times - 1); an integer restricts the lag structure (e.g. history = 1 for first-order Markov). Ignored by point-treatment g-computation.
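The lag expansion controlled by confounders_tv and history can be pictured with a small base-R sketch. The helper lag_k() and the column name lag2_L are hypothetical (only lag1_L appears in the text above), and padding the first periods with NA is an assumption about how missing history is represented.

```r
pp <- data.frame(
  id = rep(1:2, each = 3),
  t  = rep(1:3, times = 2),
  L  = c(0.1, 0.4, 0.9, 1.2, 1.0, 0.7)
)

# Shift a vector back by k periods, padding the start with NA.
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

# Lags are built within id so history never leaks across individuals.
pp$lag1_L <- ave(pp$L, pp$id, FUN = function(x) lag_k(x, 1))
pp$lag2_L <- ave(pp$L, pp$id, FUN = function(x) lag_k(x, 2))
```

With three periods the full history has at most two lags, matching the n_times - 1 cap; a first-order Markov restriction (history = 1) would correspond to keeping lag1_L only.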

ipcw One-sided formula for the censoring-model covariates (e.g. ~ L1 + L2) or NULL (the default, no built-in IPCW). When non-NULL, survatr fits a per-period censoring hazard on the person-period grid and forms stabilized per-period cumulative inverse-probability-of-censoring weights W^C_{i,k}. These are multiplied into the IPW weighted hazard MSM row weight, so the combined row weight is w_i * W^C_{i,k} (treatment weight × censoring weight). Requires estimator = "ipw" and a non-NULL censoring column; activating IPCW with any other estimator or without a censoring column aborts with a classed error. The censoring column (censoring =) switches from a pure row filter to the modelled path: the at-risk row set is unchanged (hazard MSM still fit on uncensored rows), but the IPCW weights reweight survivors to account for informative censoring.
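A stabilized cumulative censoring weight of this general kind can be sketched in base R as follows. Everything here is an assumption made for illustration: the simulated data, the time-only numerator model, and the convention that the weight at period k includes period k itself. It is not a description of survatr's exact construction.

```r
set.seed(11)
n_id <- 300L
K <- 4L
pp <- data.frame(
  id = rep(seq_len(n_id), each = K),
  t  = rep(seq_len(K), times = n_id),
  L1 = rep(rnorm(n_id), each = K)
)
# C = 1 marks censoring at this period; censoring depends on L1.
pp$C <- rbinom(nrow(pp), 1L, plogis(-2.5 + 0.7 * pp$L1))

# Keep rows up to and including each id's first censoring period.
prior_cens <- ave(pp$C, pp$id, FUN = function(cc) cumsum(cc) - cc)
obs <- pp[prior_cens == 0, ]

# Denominator: censoring hazard given covariates. Numerator: time only.
den_fit <- glm(C ~ factor(t) + L1, family = binomial(), data = obs)
num_fit <- glm(C ~ factor(t), family = binomial(), data = obs)

p_den <- 1 - predict(den_fit, type = "response")  # P(uncensored at k | L1)
p_num <- 1 - predict(num_fit, type = "response")  # P(uncensored at k)

# Stabilized cumulative weight: running product of the ratio within id.
obs$w_c <- ave(p_num / p_den, obs$id, FUN = cumprod)
```

A combined row weight would then be the per-id treatment weight multiplied by w_c on each row.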

censoring_model_fn Fitting function for the censoring hazard model under ipcw (default stats::glm, same stats::glm-style interface as model_fn). Ignored when ipcw = NULL. Stored in the fit so the bootstrap can refit the censoring model per replicate.

... Forwarded to model_fn. na.action = na.exclude is rejected with class survatr_bad_na_action – na.exclude pads working residuals with NAs while model.matrix() drops them, which silently corrupts the sandwich variance downstream.
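The length mismatch behind this rejection is easy to reproduce with stats::glm alone. The toy data below is hypothetical; it only shows that, under na.exclude, residuals and the design matrix stop lining up row for row.

```r
d <- data.frame(
  y = c(0, 1, 0, 1, 1, 0, 1, 0),
  x = c(0.2, NA, 1.1, 0.7, NA, -0.3, 1.5, -1.0)
)

fit_ex <- glm(y ~ x, family = binomial(), data = d, na.action = na.exclude)

n_resid  <- length(residuals(fit_ex, type = "working"))  # padded to nrow(d)
n_design <- nrow(model.matrix(fit_ex))                   # complete cases only
c(residuals = n_resid, design_rows = n_design)
```

Any score computation that multiplies the two objects row by row would either fail or recycle silently, which is why the option is refused up front.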

Value

An object of class survatr_fit holding the fitted hazard model, the person-period data (internal .survatr_* columns stripped), the time grid, and metadata needed by contrast() and diagnose().

Examples

library("survatr")

# Small rectangular person-period dataset: 30 ids over 4 periods.
set.seed(1)
n_id <- 30L
K <- 4L
pp <- data.frame(
  id = rep(seq_len(n_id), each = K),
  t = rep(seq_len(K), times = n_id),
  A = rep(rbinom(n_id, 1L, 0.5), each = K),
  Y = rbinom(n_id * K, 1L, 0.1)
)

# Pooled-logistic hazard with period dummies for the baseline hazard.
fit <- surv_fit(
  pp,
  outcome = "Y", treatment = "A", confounders = ~1,
  id = "id", time = "t",
  time_formula = ~ factor(t)
)
fit

#> <survatr_fit>
#>   Track:       A
#>   Estimator:   gcomp
#>   Family:      binomial
#>   Outcome:     Y
#>   Treatment:   A
#>   ID:          id
#>   Time:        t
#>   Censoring:   none
#>   N:           30 individuals, 120 PP rows (104 at risk)
#>   Time grid:   [1, 4] (4 unique times)