| Type: | Package | 
| Title: | Cross-Fitting for Doubly Robust Evaluation of High-Dimensional Surrogate Markers | 
| Version: | 1.1.2 | 
| Description: | Doubly robust methods for evaluating surrogate markers as outlined in: Agniel D, Hejblum BP, Thiebaut R & Parast L (2022). "Doubly robust evaluation of high-dimensional surrogate markers", Biostatistics <doi:10.1093/biostatistics/kxac020>. You can use these methods to determine how much of the overall treatment effect is explained by a (possibly high-dimensional) set of surrogate markers. | 
| License: | MIT + file LICENSE | 
| Depends: | R (≥ 3.6.0) | 
| Imports: | dplyr, gbm, glmnet, glue, parallel, pbapply, purrr, ranger, RCAL, rlang, SIS, stats, SuperLearner, tibble, tidyr | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-04-04 12:54:18 UTC; boris | 
| Author: | Denis Agniel [aut, cre], Boris P. Hejblum [aut], Layla Parast [aut] | 
| Maintainer: | Denis Agniel <dagniel@rand.org> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-04-08 13:50:02 UTC | 
crossurr
Description
The main functions of this package are xf_surrogate (cross-fitting) and xfr_surrogate (repeated cross-fitting).
Author(s)
Maintainer: Denis Agniel dagniel@rand.org
Authors:
Boris P. Hejblum boris.hejblum@u-bordeaux.fr
Layla Parast parast@austin.utexas.edu
lasso
Description
lasso
Usage
lasso(
  x = NULL,
  y = NULL,
  data = NULL,
  newX = NULL,
  newX0 = NULL,
  newX1 = NULL,
  relax = TRUE,
  ps_fit = FALSE,
  ...
)
Ordinary Least Squares
Description
Ordinary Least Squares
Usage
ols(
  x = NULL,
  y = NULL,
  data = NULL,
  test_data = NULL,
  test_data0 = NULL,
  test_data1 = NULL,
  ...
)
A simple function to simulate example data.
Description
A simple function to simulate example data.
Usage
sim_data(n, p)
Arguments
| n | number of simulated observations |
| p | number of simulated variables |
Value
A toy dataset for demonstrating the methods, with outcome y, treatment a, covariates x.1, x.2, and surrogates s.1, s.2, ...
A function for estimating the proportion of treatment effect explained using cross-fitting.
Description
A function for estimating the proportion of treatment effect explained using cross-fitting.
Usage
xf_surrogate(
  ds,
  x = NULL,
  s,
  y,
  a,
  K = 5,
  outcome_learners = NULL,
  ps_learners = outcome_learners,
  interaction_model = TRUE,
  trim_at = 0.05,
  outcome_family = gaussian(),
  mthd = "superlearner",
  n_ptb = 0,
  ncores = parallel::detectCores() - 1,
  ...
)
Arguments
| ds | a dataset containing the covariates, surrogates, outcome, and treatment variables |
| x | names of all covariates in ds |
| s | names of surrogates in ds |
| y | name of the outcome in ds |
| a | name of the treatment variable (e.g., groups) in ds. Expects a binary variable |
| K | number of folds for cross-fitting. Default is 5 |
| outcome_learners | string vector indicating learners to be used for estimation of the outcome function (e.g., "SL.ridge") |
| ps_learners | string vector indicating learners to be used for estimation of the propensity score function (e.g., "SL.glm") |
| interaction_model | logical indicating whether outcome functions for treated and control should be estimated separately. Default is TRUE |
| trim_at | threshold at which to trim propensity scores. Default is 0.05 |
| outcome_family | default is gaussian() |
| mthd | selected regression method. Default is "superlearner" |
| n_ptb | number of perturbations. Default is 0 |
| ncores | number of CPUs used for parallel computations. Default is parallel::detectCores() - 1 |
| ... | additional parameters (in particular for super_learner) |
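The trim_at argument can be pictured with a small base R sketch. It is only an illustration under stated assumptions: the toy variables, the logistic propensity model, and both trimming conventions shown (clipping the scores, or flagging an overlap region) are hypothetical, and this page does not say which convention xf_surrogate applies internally.

```r
# Hypothetical illustration: none of these objects come from crossurr.
set.seed(1)
n <- 200
x <- rnorm(n)
a <- rbinom(n, size = 1, prob = plogis(1.5 * x))  # binary treatment

# Estimate propensity scores with a plain logistic regression.
ps_fit <- glm(a ~ x, family = binomial())
ps <- fitted(ps_fit)  # fitted probabilities P(a = 1 | x)

trim_at <- 0.05

# Convention 1 (assumed): clip scores into [trim_at, 1 - trim_at].
ps_clipped <- pmin(pmax(ps, trim_at), 1 - trim_at)

# Convention 2 (assumed): flag observations inside the overlap region.
in_overlap <- ps >= trim_at & ps <= 1 - trim_at
pi_o_sketch <- mean(in_overlap)  # share of observations with overlap
```

Under the second convention, pi_o_sketch plays a role similar to a proportion of overlap, but that correspondence is an assumption rather than something documented here.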
Value
a tibble with columns:
- R: estimate of the proportion of treatment effect explained (PTE), equal to 1 - deltahat_s/deltahat.
- R_se: standard error for the PTE.
- deltahat_s: residual treatment effect estimate.
- deltahat_s_se: standard error for the residual treatment effect.
- pi_o: estimate of the proportion of overlap.
- R_o: PTE only in the overlap region.
- R_o_se: standard error for R_o.
- deltahat_s_o: residual treatment effect in the overlap region.
- deltahat_s_se_o: standard error for deltahat_s_o.
- deltahat: overall treatment effect estimate.
- deltahat_se: standard error for the overall treatment effect estimate.
- delta_diff: difference between the treatment effects, equal to the numerator of the PTE.
- dd_se: standard error for delta_diff.
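The relationship between R, deltahat, deltahat_s, and delta_diff described above can be checked with a short arithmetic sketch. The numbers are hypothetical and chosen only for illustration; they are not output from the package.

```r
# Hypothetical numbers chosen for illustration only.
deltahat   <- 2.0   # overall treatment effect
deltahat_s <- 0.5   # residual treatment effect

delta_diff <- deltahat - deltahat_s   # numerator of the PTE
R <- 1 - deltahat_s / deltahat        # proportion of treatment effect explained

# The two expressions for the PTE agree.
stopifnot(isTRUE(all.equal(R, delta_diff / deltahat)))
R  # 0.75
```

A value of R close to 1 means that little treatment effect remains once the surrogates are accounted for.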
Examples
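n <- 300  # number of observations
p <- 50   # number of simulated variables (used below as surrogates s.1, ..., s.p)
q <- 2    # number of covariates (x.1, x.2)
wds <- sim_data(n = n, p = p)
if (interactive()) {
  # Cross-fitting with SuperLearner ensembles for the outcome and
  # propensity score models
  sl_est <- xf_surrogate(ds = wds,
    x = paste('x.', 1:q, sep = ''),
    s = paste('s.', 1:p, sep = ''),
    a = 'a',
    y = 'y',
    K = 4,
    trim_at = 0.01,
    mthd = 'superlearner',
    outcome_learners = c("SL.mean", "SL.lm", "SL.svm", "SL.ridge"),
    ps_learners = c("SL.mean", "SL.glm", "SL.svm", "SL.lda"),
    ncores = 1)
  # Same analysis using the lasso as the regression method
  lasso_est <- xf_surrogate(ds = wds,
    x = paste('x.', 1:q, sep = ''),
    s = paste('s.', 1:p, sep = ''),
    a = 'a',
    y = 'y',
    K = 4,
    trim_at = 0.01,
    mthd = 'lasso',
    ncores = 1)
}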
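For readers unfamiliar with the role of K, the following is a generic sketch of K-fold cross-fitting in base R. It is not the crossurr implementation: the toy data, the lm learner, and all object names are assumptions made for illustration. The idea is that each observation's prediction comes from a model that was fitted without that observation.

```r
# Generic cross-fitting sketch; not the crossurr implementation.
set.seed(42)
n <- 120
K <- 4
toy <- data.frame(x = rnorm(n))
toy$y <- 1 + 2 * toy$x + rnorm(n)

# Randomly assign each observation to one of K folds of equal size.
fold <- sample(rep(seq_len(K), length.out = n))

# For each fold, fit on the other K - 1 folds and predict on the held-out fold.
yhat <- numeric(n)
for (k in seq_len(K)) {
  held_out <- fold == k
  fit <- lm(y ~ x, data = toy[!held_out, ])
  yhat[held_out] <- predict(fit, newdata = toy[held_out, ])
}

# Every observation now has exactly one out-of-fold prediction.
oof_mse <- mean((toy$y - yhat)^2)
```

In a doubly robust analysis the same pattern would be applied to both the outcome and propensity score models, with the out-of-fold predictions plugged into the estimator.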
Title
Description
Title
Usage
xfit_dr(
  ds,
  x,
  y,
  a,
  K = 5,
  outcome_learners = NULL,
  ps_learners = outcome_learners,
  interaction_model = TRUE,
  trim_at = 0.05,
  outcome_family = gaussian(),
  mthd = "superlearner",
  ncores = parallel::detectCores() - 1,
  ...
)
A function for estimating the proportion of treatment effect explained using repeated cross-fitting.
Description
A function for estimating the proportion of treatment effect explained using repeated cross-fitting.
Usage
xfr_surrogate(
  ds,
  x = NULL,
  s,
  y,
  a,
  splits = 50,
  K = 5,
  outcome_learners = NULL,
  ps_learners = NULL,
  interaction_model = TRUE,
  trim_at = 0.05,
  outcome_family = gaussian(),
  mthd = "superlearner",
  n_ptb = 0,
  ...
)
Arguments
| ds | a dataset containing the covariates, surrogates, outcome, and treatment variables |
| x | names of all covariates in ds |
| s | names of surrogates in ds |
| y | name of the outcome in ds |
| a | name of the treatment variable (e.g., groups) in ds. Expects a binary variable |
| splits | number of data splits to perform. Default is 50 |
| K | number of folds for cross-fitting. Default is 5 |
| outcome_learners | string vector indicating learners to be used for estimation of the outcome function (e.g., "SL.ridge") |
| ps_learners | string vector indicating learners to be used for estimation of the propensity score function (e.g., "SL.glm") |
| interaction_model | logical indicating whether outcome functions for treated and control should be estimated separately. Default is TRUE |
| trim_at | threshold at which to trim propensity scores. Default is 0.05 |
| outcome_family | default is gaussian() |
| mthd | selected regression method. Default is "superlearner" |
| n_ptb | number of perturbations. Default is 0 |
| ... | additional parameters (in particular for super_learner) |
Value
a tibble with columns:
- Rm: estimate of the proportion of treatment effect explained (PTE), computed as the median over the repeated splits.
- R_se0: standard error for the PTE, accounting for the variability due to splitting.
- R_cil0: lower confidence interval value for the PTE.
- R_cih0: upper confidence interval value for the PTE.
- Dm: estimate of the overall treatment effect, computed as the median over the repeated splits.
- D_se0: standard error for the overall treatment effect, accounting for the variability due to splitting.
- D_cil0: lower confidence interval value for the overall treatment effect.
- D_cih0: upper confidence interval value for the overall treatment effect.
- Dsm: estimate of the residual treatment effect, computed as the median over the repeated splits.
- Ds_se0: standard error for the residual treatment effect, accounting for the variability due to splitting.
- Ds_cil0: lower confidence interval value for the residual treatment effect.
- Ds_cih0: upper confidence interval value for the residual treatment effect.
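The median-over-splits summaries above can be illustrated with a small sketch. The per-split estimates and standard errors are hypothetical, and the variance rule is an assumption borrowed from the double machine learning literature (each split's variance is inflated by its squared deviation from the median estimate, then the median is taken). This page does not state the exact formula xfr_surrogate uses.

```r
# Hypothetical per-split estimates and standard errors (not crossurr output).
est <- c(0.62, 0.70, 0.66, 0.74, 0.68)
se  <- c(0.10, 0.12, 0.11, 0.10, 0.13)

# Point estimate: median over the repeated splits.
est_med <- median(est)

# Assumed rule: add each split's squared deviation from the median
# estimate to its variance, then take the median across splits.
se_med <- sqrt(median(se^2 + (est - est_med)^2))

# Normal-approximation 95% confidence interval.
ci <- est_med + c(-1, 1) * qnorm(0.975) * se_med
```

Taking the median rather than the mean makes the summary less sensitive to a single unlucky split.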
Examples
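n <- 100  # number of observations
p <- 20   # number of simulated variables (used below as surrogates s.1, ..., s.p)
q <- 2    # number of covariates (x.1, x.2)
wds <- sim_data(n = n, p = p)
if (interactive()) {
  # Repeated cross-fitting with the lasso; splits and K are kept small
  # so that the example runs quickly
  lasso_est <- xfr_surrogate(ds = wds,
    x = paste('x.', 1:q, sep = ''),
    s = paste('s.', 1:p, sep = ''),
    a = 'a',
    y = 'y',
    splits = 2,
    K = 2,
    trim_at = 0.01,
    mthd = 'lasso',
    ncores = 1)  # not a named argument of xfr_surrogate; passed through ...
}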