This function performs the cram method (simultaneous learning and evaluation) on simulation data, for which the data generation process (DGP) is known. The data generation process for X can be given directly as a function or induced by a provided dataset via row-wise bootstrapping. Results are averaged across Monte Carlo replicates for the given DGP.
Usage
cram_simulation(
  X = NULL,
  dgp_X = NULL,
  dgp_D,
  dgp_Y,
  batch,
  nb_simulations,
  nb_simulations_truth = NULL,
  sample_size,
  model_type = "causal_forest",
  learner_type = "ridge",
  alpha = 0.05,
  baseline_policy = NULL,
  parallelize_batch = FALSE,
  model_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  propensity = NULL
)
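The covariate DGP can be supplied in one of two ways: directly as a function via dgp_X, or induced empirically by row-wise bootstrapping of a dataset passed as X. A minimal sketch of the two options (the dgp_D and dgp_Y arguments are omitted here; full definitions appear in the Examples section):
# Option 1: supply the covariate DGP directly as a function
toy_dgp_X <- function(n) {
  data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
}
# cram_simulation(dgp_X = toy_dgp_X, dgp_D = ..., dgp_Y = ..., batch = 5,
#                 nb_simulations = 100, sample_size = 500)

# Option 2: supply a dataset as X; its rows are bootstrapped to induce the DGP
toy_X <- toy_dgp_X(200)
# cram_simulation(X = toy_X, dgp_D = ..., dgp_Y = ..., batch = 5,
#                 nb_simulations = 100, sample_size = 500)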
Arguments
- X
Optional. A matrix or data frame of covariates, one row per sample; the covariate DGP is induced empirically by row-wise bootstrapping of this data.
- dgp_X
Optional. A function to generate covariate data for simulations.
- dgp_D
A vectorized function to generate binary treatment assignments for each sample.
- dgp_Y
A vectorized function to generate the outcome variable for each sample given the treatment and covariates.
- batch
Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample.
- nb_simulations
The number of simulations (Monte Carlo replicates) to run.
- nb_simulations_truth
Optional. The number of additional simulations (Monte Carlo replicates), beyond nb_simulations, used to compute the true policy value difference (delta) and the true policy value (psi).
- sample_size
The number of samples in each simulation.
- model_type
The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest". Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your own custom model.
- learner_type
The learner type for the chosen model. Options include "ridge" for Ridge Regression, "fnn" for Feedforward Neural Network, and "caret" for Caret. Default is "ridge". If model_type is "causal_forest", set learner_type to NULL; if model_type is "s_learner" or "m_learner", choose between "ridge", "fnn", and "caret".
- alpha
Significance level for confidence intervals. Default is 0.05 (95% confidence).
- baseline_policy
A list providing the baseline policy (binary 0 or 1) for each sample. If NULL, defaults to a list of zeros with the same length as the number of rows in X.
- parallelize_batch
Logical. Whether to parallelize batch processing. The cram method learns T policies, where T is the number of batches; these are learned in parallel when parallelize_batch is TRUE and sequentially (using an efficient data.table structure, recommended for lightweight training) when it is FALSE. Defaults to FALSE.
- model_params
A list of additional parameters to pass to the model; any parameter defined in the underlying model's reference package can be supplied. Defaults to NULL.
- custom_fit
A custom, user-defined function that returns a fitted model given training data (allows flexibility); see the sketch after this argument list. Defaults to NULL.
- custom_predict
A custom, user-defined function that makes predictions given a fitted model and test data (allows flexibility); see the sketch after this argument list. Defaults to NULL.
- propensity
The propensity score model (the probability of treatment given covariates).
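When model_type is NULL, a custom model can be plugged in through custom_fit and custom_predict. The sketch below is purely illustrative: the argument names and signatures are assumptions, not the package's documented interface.
# Hypothetical custom model pair (signatures are assumptions, not the
# package's documented interface)
my_custom_fit <- function(X, Y, D) {
  # Fit an outcome regression that includes the treatment indicator
  df <- data.frame(X, D = D, Y = Y)
  lm(Y ~ ., data = df)
}
my_custom_predict <- function(model, X_new) {
  # Predicted treated-minus-untreated outcome difference (a CATE estimate)
  predict(model, newdata = data.frame(X_new, D = 1)) -
    predict(model, newdata = data.frame(X_new, D = 0))
}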
Value
A list containing:
avg_proportion_treated
The average proportion of treated individuals across simulations.
avg_delta_estimate
The average delta estimate across simulations.
avg_delta_standard_error
The average standard error of delta estimates.
delta_empirical_bias
The empirical bias of delta estimates.
delta_empirical_coverage
The empirical coverage of delta confidence intervals.
avg_policy_value_estimate
The average policy value estimate across simulations.
avg_policy_value_standard_error
The average standard error of policy value estimates.
policy_value_empirical_bias
The empirical bias of policy value estimates.
policy_value_empirical_coverage
The empirical coverage of policy value confidence intervals.
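For reference, a confidence interval at level 1 - alpha can be reconstructed from the reported estimate and standard error under a normal approximation. This is a sketch only; it assumes the returned list uses the element names listed above, with result being an object returned by cram_simulation as in the Examples below.
# Normal-approximation confidence interval for delta (illustrative)
alpha <- 0.05
ci_delta <- result$avg_delta_estimate +
  c(-1, 1) * qnorm(1 - alpha / 2) * result$avg_delta_standard_error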
Examples
# \donttest{
set.seed(123)
# Alternative: supply the covariate DGP directly as a function via dgp_X
# dgp_X <- function(n) {
#   data.table::data.table(
#     binary = rbinom(n, 1, 0.5),
#     discrete = sample(1:5, n, replace = TRUE),
#     continuous = rnorm(n)
#   )
# }
n <- 100
X_data <- data.table::data.table(
  binary = rbinom(n, 1, 0.5),
  discrete = sample(1:5, n, replace = TRUE),
  continuous = rnorm(n)
)
dgp_D <- function(X) rbinom(nrow(X), 1, 0.5)
dgp_Y <- function(D, X) {
  theta <- ifelse(
    X[, binary] == 1 & X[, discrete] <= 2,        # Group 1: High benefit
    1,
    ifelse(X[, binary] == 0 & X[, discrete] >= 4, # Group 3: Negative benefit
           -1,
           0.1)                                   # Group 2: Neutral effect
  )
  Y <- D * (theta + rnorm(length(D), mean = 0, sd = 1)) +
    (1 - D) * rnorm(length(D))                    # Outcome for untreated
  return(Y)
}
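# Optional sanity check of the DGP on a single draw before running the
# full simulation (illustrative; uses the objects defined above)
D_check <- dgp_D(X_data)
Y_check <- dgp_Y(D_check, X_data)
mean(D_check)   # treated share, roughly 0.5 under this treatment DGP
summary(Y_check)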
# Parameters
nb_simulations <- 100
nb_simulations_truth <- 200
batch <- 5
# Perform CRAM simulation
result <- cram_simulation(
  X = X_data,
  dgp_D = dgp_D,
  dgp_Y = dgp_Y,
  batch = batch,
  nb_simulations = nb_simulations,
  nb_simulations_truth = nb_simulations_truth,
  sample_size = 500
)
result$raw_results
#> Metric Value
#> 1 Average Proportion Treated 0.52724
#> 2 Average Delta Estimate 0.22597
#> 3 Average Delta Standard Error 0.10424
#> 4 Delta Empirical Bias 0.01224
#> 5 Delta Empirical Coverage 0.96000
#> 6 Variance Delta Empirical Bias 0.00220
#> 7 Average Policy Value Estimate 0.22580
#> 8 Average Policy Value Standard Error 0.10080
#> 9 Policy Value Empirical Bias 0.01210
#> 10 Policy Value Empirical Coverage 0.92000
#> 11 Variance Policy Value Empirical Bias 0.00071
result$interactive_table
# }
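A variation of the example above using a different policy learner; this is a sketch that reuses the DGP and data objects defined earlier and sets model_type and learner_type to documented option values.
# Same DGP, but with an S-learner and ridge regression as the policy learner
result_slearner <- cram_simulation(
  X = X_data,
  dgp_D = dgp_D,
  dgp_Y = dgp_Y,
  batch = batch,
  nb_simulations = nb_simulations,
  nb_simulations_truth = nb_simulations_truth,
  sample_size = 500,
  model_type = "s_learner",
  learner_type = "ridge"
)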