
This function performs the cram method (simultaneous learning and evaluation) on simulation data, for which the data generation process (DGP) is known. The data generation process for X can be given directly as a function or induced by a provided dataset via row-wise bootstrapping. Results are averaged across Monte Carlo replicates for the given DGP.
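A covariate DGP induced by a dataset can be pictured as drawing rows uniformly with replacement. The sketch below is a minimal base-R illustration of that idea, not cramR's internal code. The helper name `make_bootstrap_dgp_X` and the toy data are hypothetical.

```r
# Hypothetical helper (not part of cramR): turn an observed covariate table
# into a sampler that draws n rows uniformly with replacement.
make_bootstrap_dgp_X <- function(X) {
  X <- as.data.frame(X)
  function(n) {
    idx <- sample.int(nrow(X), n, replace = TRUE)
    out <- X[idx, , drop = FALSE]
    rownames(out) <- NULL  # drop the duplicated row names created by resampling
    out
  }
}

set.seed(1)
X_obs <- data.frame(binary = rbinom(100, 1, 0.5), continuous = rnorm(100))
dgp_X_boot <- make_bootstrap_dgp_X(X_obs)
X_sim <- dgp_X_boot(500)  # 500 resampled rows drawn from 100 observed ones
dim(X_sim)                # 500 2
```

Every simulated row is an exact copy of an observed row, so the simulated covariate distribution is the empirical distribution of the supplied data.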

Usage

cram_simulation(
  X = NULL,
  dgp_X = NULL,
  dgp_D,
  dgp_Y,
  batch,
  nb_simulations,
  nb_simulations_truth = NULL,
  sample_size,
  model_type = "causal_forest",
  learner_type = "ridge",
  alpha = 0.05,
  baseline_policy = NULL,
  parallelize_batch = FALSE,
  model_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  propensity = NULL
)

Arguments

X

Optional. A matrix or data frame of covariates whose rows are resampled with replacement (row-wise bootstrap) to induce the covariate DGP empirically. Provide either X or dgp_X.

dgp_X

Optional. A function that generates covariate data for each simulation, such as a function of the sample size n (see the commented-out dgp_X in the examples).

dgp_D

A vectorized function of the covariates that generates binary (0/1) treatment assignments for each sample.

dgp_Y

A vectorized function to generate the outcome variable for each sample given the treatment and covariates.

batch

Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample.

nb_simulations

The number of simulations (Monte Carlo replicates) to run.

nb_simulations_truth

Optional. The number of additional simulations (Monte Carlo replicates), beyond nb_simulations, used to compute the true policy value difference (delta) and the true policy value (psi).

sample_size

The number of samples in each simulation.

model_type

The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest". Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your custom model.

learner_type

The learner type for the chosen model. Options include "ridge" for ridge regression, "fnn" for a feedforward neural network and "caret" for models trained through the caret package. Default is "ridge". If model_type is "causal_forest", set learner_type to NULL; if model_type is "s_learner" or "m_learner", choose among "ridge", "fnn" and "caret".

alpha

Significance level for confidence intervals. Default is 0.05 (95% confidence).

baseline_policy

A list providing the baseline policy (binary 0 or 1) for each sample. If NULL, defaults to a list of zeros with the same length as the number of rows in X.

parallelize_batch

Logical. Whether to parallelize batch processing. The cram method learns T policies, where T is the number of batches. When parallelize_batch is TRUE, these policies are learned in parallel; when FALSE, they are learned sequentially using the efficient data.table structure, which is recommended for lightweight training. Defaults to FALSE.

model_params

A list of additional parameters to pass to the model; any parameter accepted by the package implementing the chosen model can be used. Defaults to NULL.

custom_fit

A custom user-defined function that returns a fitted model given training data, allowing models beyond those offered by model_type. Defaults to NULL.

custom_predict

A custom user-defined function that makes predictions given a fitted model and test data, used together with custom_fit. Defaults to NULL.

propensity

The propensity score model. Defaults to NULL.
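The batch and parallelize_batch arguments above can be illustrated with a toy base-R sketch. An integer batch count is expanded into a random, balanced batch index (one plausible scheme; the package's exact assignment may differ), and, assuming the cram method's sequential scheme, policy t is fit on batches 1 to t, giving T policies. The helpers `make_batches` and `make_policy` are hypothetical, and the learner is a simple `lm()` rule that treats whenever the estimated effect is positive; it stands in for, and is not, the causal forest, ridge, FNN or caret learners. Because the T fits do not depend on one another, they are natural candidates for parallel execution.

```r
# Hypothetical helper: expand an integer batch count into a per-individual
# batch index (random, balanced partition); pass vectors through unchanged.
make_batches <- function(batch, n) {
  if (length(batch) == 1) {
    return(sample(rep(seq_len(batch), length.out = n)))
  }
  stopifnot(length(batch) == n)
  batch
}

# Hypothetical toy learner: treat when the fitted effect of D at x is positive.
make_policy <- function(coefs) {
  force(coefs)
  function(x_new) as.integer(coefs[["D"]] + coefs[["D:x"]] * x_new > 0)
}

set.seed(7)
n <- 400
T_batches <- 4
x <- rnorm(n)
D <- rbinom(n, 1, 0.5)
Y <- D * x + rnorm(n)            # treatment helps exactly when x > 0
dat <- data.frame(Y = Y, D = D, x = x)
batch_id <- make_batches(T_batches, n)

# Assumed sequential scheme: policy t is learned on batches 1..t.
policies <- vector("list", T_batches)
for (t in seq_len(T_batches)) {
  fit <- lm(Y ~ D * x, data = dat[batch_id <= t, ])
  policies[[t]] <- make_policy(coef(fit))
}

table(batch_id)                                # 100 individuals per batch
sapply(policies, function(p) mean(p(dat$x)))   # share treated by each policy
```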

Value

A list containing:

avg_proportion_treated

The average proportion of treated individuals across simulations.

avg_delta_estimate

The average delta estimate across simulations.

avg_delta_standard_error

The average standard error of delta estimates.

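delta_empirical_bias

The empirical bias of delta estimates relative to the true delta.

delta_empirical_coverage

The empirical coverage of delta confidence intervals, i.e. the proportion of simulations whose interval contains the true delta.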

avg_policy_value_estimate

The average policy value estimate across simulations.

avg_policy_value_standard_error

The average standard error of policy value estimates.

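policy_value_empirical_bias

The empirical bias of policy value estimates relative to the true policy value.

policy_value_empirical_coverage

The empirical coverage of policy value confidence intervals, i.e. the proportion of simulations whose interval contains the true policy value.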
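The bias and coverage summaries above are standard Monte Carlo quantities. The following base-R sketch computes them from replicate-level estimates and standard errors, assuming a two-sided normal (Wald) interval at level alpha and a known true value. The function `summarise_replicates` and the simulated inputs are hypothetical and do not reproduce cramR's internals.

```r
# Hypothetical summary of Monte Carlo replicates for one target (delta or psi).
# est, se: one estimate and standard error per simulation; truth: true value.
summarise_replicates <- function(est, se, truth, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)          # about 1.96 for alpha = 0.05
  lower <- est - z * se
  upper <- est + z * se
  c(
    avg_estimate       = mean(est),
    avg_standard_error = mean(se),
    empirical_bias     = mean(est - truth),                     # average error
    empirical_coverage = mean(lower <= truth & truth <= upper)  # share of CIs covering truth
  )
}

# Illustrative inputs: unbiased, normally distributed estimates with correct SEs.
set.seed(2024)
truth <- 0.2
est <- rnorm(1000, mean = truth, sd = 0.1)
se <- rep(0.1, 1000)
summarise_replicates(est, se, truth)  # coverage should land near 0.95
```

With correctly sized standard errors, the empirical coverage should be close to 1 - alpha; coverage well below that suggests the standard errors are too small or the estimates are biased.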

Examples

# \donttest{
set.seed(123)

# dgp_X <- function(n) {
#   data.table::data.table(
#     binary     = rbinom(n, 1, 0.5),
#     discrete   = sample(1:5, n, replace = TRUE),
#     continuous = rnorm(n)
#   )
# }

n <- 100

X_data <- data.table::data.table(
  binary     = rbinom(n, 1, 0.5),
  discrete   = sample(1:5, n, replace = TRUE),
  continuous = rnorm(n)
)


dgp_D <- function(X) rbinom(nrow(X), 1, 0.5)

dgp_Y <- function(D, X) {
  theta <- ifelse(
    X[, binary] == 1 & X[, discrete] <= 2,  # Group 1: High benefit
    1,
    ifelse(X[, binary] == 0 & X[, discrete] >= 4,  # Group 3: Negative benefit
           -1,
           0.1)  # Group 2: Neutral effect
  )
  Y <- D * (theta + rnorm(length(D), mean = 0, sd = 1)) +
    (1 - D) * rnorm(length(D))  # Outcome for untreated
  return(Y)
}

# Parameters
nb_simulations <- 100
nb_simulations_truth <- 200
batch <- 5

# Perform CRAM simulation
result <- cram_simulation(
  X = X_data,
  dgp_D = dgp_D,
  dgp_Y = dgp_Y,
  batch = batch,
  nb_simulations = nb_simulations,
  nb_simulations_truth = nb_simulations_truth,
  sample_size = 500
)

result$raw_results
#>                                  Metric   Value
#> 1            Average Proportion Treated 0.52724
#> 2                Average Delta Estimate 0.22597
#> 3          Average Delta Standard Error 0.10424
#> 4                  Delta Empirical Bias 0.01224
#> 5              Delta Empirical Coverage 0.96000
#> 6         Variance Delta Empirical Bias 0.00220
#> 7         Average Policy Value Estimate 0.22580
#> 8   Average Policy Value Standard Error 0.10080
#> 9           Policy Value Empirical Bias 0.01210
#> 10      Policy Value Empirical Coverage 0.92000
#> 11 Variance Policy Value Empirical Bias 0.00071
result$interactive_table
# }
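Because the DGP in the example is known, the true value difference of any fixed policy against the all-zero baseline can be approximated by brute-force Monte Carlo, using V(pi) - V(pi0) = E[(pi(X) - pi0(X)) theta(X)], where theta(X) is the treatment effect encoded in dgp_Y. The sketch below does this for a hypothetical oracle policy that treats whenever theta(X) > 0, with covariates drawn as in the commented-out dgp_X rather than bootstrapped from X_data. It is an illustration only, not the procedure cram_simulation uses with nb_simulations_truth.

```r
# Treatment effect theta(X) from the example's dgp_Y:
# 1 in group 1, -1 in group 3, 0.1 otherwise.
theta_fn <- function(binary, discrete) {
  ifelse(binary == 1 & discrete <= 2, 1,
         ifelse(binary == 0 & discrete >= 4, -1, 0.1))
}

# Covariates drawn as in the commented-out dgp_X (continuous is unused here).
set.seed(99)
N <- 1e5
binary <- rbinom(N, 1, 0.5)
discrete <- sample(1:5, N, replace = TRUE)
theta <- theta_fn(binary, discrete)

pi_oracle <- as.integer(theta > 0)   # hypothetical policy: treat iff theta > 0
pi_base   <- rep(0L, N)              # all-zero baseline, as when baseline_policy = NULL

delta_true <- mean((pi_oracle - pi_base) * theta)
delta_true  # close to 0.2 * 1 + 0.6 * 0.1 = 0.26
```

Among all policies, the oracle maximises this quantity, so it bounds the true delta of any learned policy under this covariate distribution. With covariates bootstrapped from X_data, the group shares 0.2, 0.6 and 0.2 would be replaced by their empirical frequencies in X_data.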