This function performs the cram method (simultaneous learning and evaluation) on simulation data, for which the data generation process (DGP) is known. The data generation process for X can be given directly as a function or induced by a provided dataset via row-wise bootstrapping. Results are averaged across Monte Carlo replicates for the given DGP.
Usage
cram_simulation(
  X = NULL,
  dgp_X = NULL,
  dgp_D,
  dgp_Y,
  batch,
  nb_simulations,
  nb_simulations_truth = NULL,
  sample_size,
  model_type = "causal_forest",
  learner_type = "ridge",
  alpha = 0.05,
  baseline_policy = NULL,
  parallelize_batch = FALSE,
  model_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  propensity = NULL
)
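The covariate DGP can be supplied in one of two ways: directly as a function via dgp_X, or induced empirically by row-wise bootstrapping of a dataset passed as X. A minimal sketch of the two options (the dgp_D and dgp_Y arguments are omitted here; full definitions appear in the Examples section):
# Option 1: supply the covariate DGP directly as a function
toy_dgp_X <- function(n) {
  data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
}
# cram_simulation(dgp_X = toy_dgp_X, dgp_D = ..., dgp_Y = ..., batch = 5,
#                 nb_simulations = 100, sample_size = 500)

# Option 2: supply a dataset as X; its rows are bootstrapped to induce the DGP
toy_X <- toy_dgp_X(200)
# cram_simulation(X = toy_X, dgp_D = ..., dgp_Y = ..., batch = 5,
#                 nb_simulations = 100, sample_size = 500)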
Arguments
- X
Optional. A matrix or data frame of covariates, one row per sample; the covariate DGP is induced empirically by row-wise bootstrapping of this data.
- dgp_X
Optional. A function to generate covariate data for simulations.
- dgp_D
A vectorized function to generate binary treatment assignments for each sample.
- dgp_Y
A vectorized function to generate the outcome variable for each sample given the treatment and covariates.
- batch
Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample.
- nb_simulations
The number of simulations (Monte Carlo replicates) to run.
- nb_simulations_truth
Optional. The number of additional simulations (Monte Carlo replicates), beyond nb_simulations, used to compute the true policy value difference (delta) and the true policy value (psi).
- sample_size
The number of samples in each simulation.
- model_type
The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest". Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your own custom model.
- learner_type
The learner type for the chosen model. Options include "ridge" for Ridge Regression, "fnn" for Feedforward Neural Network, and "caret" for Caret. Default is "ridge". If model_type is "causal_forest", set learner_type to NULL; if model_type is "s_learner" or "m_learner", choose between "ridge", "fnn", and "caret".
- alpha
Significance level for confidence intervals. Default is 0.05 (95% confidence).
- baseline_policy
A list providing the baseline policy (binary 0 or 1) for each sample. If NULL, defaults to a list of zeros with the same length as the number of rows in X.
- parallelize_batch
Logical. Whether to parallelize batch processing. The cram method learns T policies, where T is the number of batches; these are learned in parallel when parallelize_batch is TRUE and sequentially (using an efficient data.table structure, recommended for lightweight training) when it is FALSE. Defaults to FALSE.
- model_params
A list of additional parameters to pass to the model; any parameter defined in the underlying model's reference package can be supplied. Defaults to NULL.
- custom_fit
A custom, user-defined function that returns a fitted model given training data (allows flexibility); see the sketch after this argument list. Defaults to NULL.
- custom_predict
A custom, user-defined function that makes predictions given a fitted model and test data (allows flexibility); see the sketch after this argument list. Defaults to NULL.
- propensity
The propensity score model (the probability of treatment given covariates).
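When model_type is NULL, a custom model can be plugged in through custom_fit and custom_predict. The sketch below is purely illustrative: the argument names and signatures are assumptions, not the package's documented interface.
# Hypothetical custom model pair (signatures are assumptions, not the
# package's documented interface)
my_custom_fit <- function(X, Y, D) {
  # Fit an outcome regression that includes the treatment indicator
  df <- data.frame(X, D = D, Y = Y)
  lm(Y ~ ., data = df)
}
my_custom_predict <- function(model, X_new) {
  # Predicted treated-minus-untreated outcome difference (a CATE estimate)
  predict(model, newdata = data.frame(X_new, D = 1)) -
    predict(model, newdata = data.frame(X_new, D = 0))
}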
Value
A list containing:
avg_proportion_treated
The average proportion of treated individuals across simulations.
avg_delta_estimate
The average delta estimate across simulations.
avg_delta_standard_error
The average standard error of delta estimates.
delta_empirical_bias
The empirical bias of delta estimates.
delta_empirical_coverage
The empirical coverage of delta confidence intervals.
avg_policy_value_estimate
The average policy value estimate across simulations.
avg_policy_value_standard_error
The average standard error of policy value estimates.
policy_value_empirical_bias
The empirical bias of policy value estimates.
policy_value_empirical_coverage
The empirical coverage of policy value confidence intervals.
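For reference, a confidence interval at level 1 - alpha can be reconstructed from the reported estimate and standard error under a normal approximation. This is a sketch only; it assumes the returned list uses the element names listed above, with result being an object returned by cram_simulation as in the Examples below.
# Normal-approximation confidence interval for delta (illustrative)
alpha <- 0.05
ci_delta <- result$avg_delta_estimate +
  c(-1, 1) * qnorm(1 - alpha / 2) * result$avg_delta_standard_error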
Examples
# \donttest{
set.seed(123)
# Alternative: supply the covariate DGP directly as a function via dgp_X
# dgp_X <- function(n) {
#   data.table::data.table(
#     binary = rbinom(n, 1, 0.5),
#     discrete = sample(1:5, n, replace = TRUE),
#     continuous = rnorm(n)
#   )
# }
n <- 100
X_data <- data.table::data.table(
  binary = rbinom(n, 1, 0.5),
  discrete = sample(1:5, n, replace = TRUE),
  continuous = rnorm(n)
)
dgp_D <- function(X) rbinom(nrow(X), 1, 0.5)
dgp_Y <- function(D, X) {
  theta <- ifelse(
    X[, binary] == 1 & X[, discrete] <= 2,        # Group 1: High benefit
    1,
    ifelse(X[, binary] == 0 & X[, discrete] >= 4, # Group 3: Negative benefit
           -1,
           0.1)                                   # Group 2: Neutral effect
  )
  Y <- D * (theta + rnorm(length(D), mean = 0, sd = 1)) +
    (1 - D) * rnorm(length(D))                    # Outcome for untreated
  return(Y)
}
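# Optional sanity check of the DGP on a single draw before running the
# full simulation (illustrative; uses the objects defined above)
D_check <- dgp_D(X_data)
Y_check <- dgp_Y(D_check, X_data)
mean(D_check)   # treated share, roughly 0.5 under this treatment DGP
summary(Y_check)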
# Parameters
nb_simulations <- 100
nb_simulations_truth <- 200
batch <- 5
# Perform CRAM simulation
result <- cram_simulation(
  X = X_data,
  dgp_D = dgp_D,
  dgp_Y = dgp_Y,
  batch = batch,
  nb_simulations = nb_simulations,
  nb_simulations_truth = nb_simulations_truth,
  sample_size = 500
)
result$raw_results
#> Metric Value
#> 1 Average Proportion Treated 0.52724
#> 2 Average Delta Estimate 0.22597
#> 3 Average Delta Standard Error 0.10424
#> 4 Delta Empirical Bias 0.01224
#> 5 Delta Empirical Coverage 0.96000
#> 6 Variance Delta Empirical Bias 0.00220
#> 7 Average Policy Value Estimate 0.22580
#> 8 Average Policy Value Standard Error 0.10080
#> 9 Policy Value Empirical Bias 0.01210
#> 10 Policy Value Empirical Coverage 0.92000
#> 11 Variance Policy Value Empirical Bias 0.00071
result$interactive_table
# }
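A variation of the example above using a different policy learner; this is a sketch that reuses the DGP and data objects defined earlier and sets model_type and learner_type to documented option values.
# Same DGP, but with an S-learner and ridge regression as the policy learner
result_slearner <- cram_simulation(
  X = X_data,
  dgp_D = dgp_D,
  dgp_Y = dgp_Y,
  batch = batch,
  nb_simulations = nb_simulations,
  nb_simulations_truth = nb_simulations_truth,
  sample_size = 500,
  model_type = "s_learner",
  learner_type = "ridge"
)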