
Performs the Cram method for simultaneous machine learning and evaluation.
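The sketch below is a deliberately simplified illustration of the sequential batch pattern that "simultaneous learning and evaluation" refers to. The data are split into batches, the model is refit on a growing set of batches, and each intermediate fit is scored only on the batches it has not yet seen. It is not the cramR estimator. It does not compute the Cram loss estimate, its standard error, or the confidence interval. The variable names are hypothetical, and only base R is used.

```r
# Simplified illustration of sequential batch learning/evaluation.
# NOTE: NOT the cramR estimator; it only shows the data-usage pattern.
set.seed(1)
n <- 100
K <- 5
df <- data.frame(x = rnorm(n))
df$y <- 2 * df$x + rnorm(n)

# Random, balanced assignment of rows to K batches
batch_id <- sample(rep(seq_len(K), length.out = n))

holdout_mse <- numeric(K - 1)
for (k in seq_len(K - 1)) {
  train_rows <- batch_id <= k          # batches 1..k are used for fitting
  eval_rows  <- batch_id > k           # batches k+1..K are used for scoring
  fit_k <- lm(y ~ x, data = df[train_rows, ])
  pred  <- predict(fit_k, newdata = df[eval_rows, ])
  holdout_mse[k] <- mean((df$y[eval_rows] - pred)^2)
}

# The final model is trained on all K batches
final_fit <- lm(y ~ x, data = df)
holdout_mse
```

Every row ends up in the final model's training data, while earlier fits are still evaluated on rows they never saw.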

Usage

cram_ml(
  data,
  batch,
  formula = NULL,
  caret_params = NULL,
  parallelize_batch = FALSE,
  loss_name = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  custom_loss = NULL,
  alpha = 0.05,
  classify = FALSE
)

Arguments

data

A matrix or data frame of covariates. For supervised learning, must include the target variable specified in formula.

batch

An integer specifying the number of batches, or a vector of pre-defined batch assignments.
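As a sketch of the second form, the code below builds an explicit assignment vector that gives every row of `data` one batch label in `1..nb_batch`. The assumption here is that cram_ml() expects exactly one label per row. Check the package documentation before relying on this. The random, balanced split shown is one reasonable choice, not necessarily the one cram_ml() uses internally when given an integer.

```r
# Explicit batch assignment: one label in 1..nb_batch per row (assumed format)
n <- 100
nb_batch <- 5

set.seed(42)
batch_vec <- sample(rep(seq_len(nb_batch), length.out = n))
table(batch_vec)   # 20 rows in each of the 5 batches
```

Such a vector would then be passed as `batch = batch_vec` instead of `batch = nb_batch`.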

formula

Formula for supervised learning (e.g., y ~ .).

caret_params

List of parameters for caret::train() containing:

  • method: Model type (e.g., "rf", "glm", "xgbTree" for supervised learning)

  • Additional arguments for caret::train() (e.g., trControl) and method-specific parameters

parallelize_batch

Logical indicating whether to parallelize batch processing (default = FALSE).

loss_name

Name of the loss metric (supported: "se", "logloss", "accuracy").
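The per-observation quantities behind these names can be written out in base R. This sketch assumes that "se" denotes squared error and that "logloss" is the binary log loss on predicted probabilities. The helper names are hypothetical and are not cramR functions. Accuracy is a score (higher is better), unlike the other two, which are losses.

```r
# Hypothetical helpers; NOT part of cramR.
# Squared error per observation (assuming "se" = squared error)
se_loss <- function(pred, y) (pred - y)^2

# Binary log loss; `prob` is the predicted probability of class 1, y in {0, 1}.
# Probabilities are clipped away from 0 and 1 to avoid log(0).
logloss <- function(prob, y, eps = 1e-15) {
  prob <- pmin(pmax(prob, eps), 1 - eps)
  -(y * log(prob) + (1 - y) * log(1 - prob))
}

# Accuracy indicator: 1 if the predicted label matches the truth, else 0
accuracy <- function(label, y) as.numeric(label == y)

se_loss(c(1, 2), c(1.5, 2))                # 0.25 0.00
logloss(0.5, 1)                            # log(2), about 0.693
mean(accuracy(c(1, 0, 1), c(1, 1, 1)))     # 2/3
```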

custom_fit

Optional custom model training function.

custom_predict

Optional custom prediction function.

custom_loss

Optional custom loss function.
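The following is a hedged sketch of what the three custom hooks might look like for the linear-regression setting used in the Examples. The signatures are assumptions and are not confirmed by this page. `custom_fit(data)` is assumed to return a fitted model, `custom_predict(model, data)` to return predictions, and `custom_loss(predictions, data)` to return one loss value per row. The standalone check at the end uses only base R and does not call cram_ml().

```r
# Hypothetical custom functions; signatures are ASSUMED, not confirmed:
#   custom_fit(data)               -> fitted model
#   custom_predict(model, data)    -> numeric predictions
#   custom_loss(predictions, data) -> one loss value per row
custom_fit <- function(data) {
  lm(Y ~ x1 + x2 + x3, data = data)
}

custom_predict <- function(model, data) {
  predict(model, newdata = data)
}

custom_loss <- function(predictions, data) {
  (predictions - data$Y)^2   # squared error per observation
}

# Standalone check on simulated data with the same columns as the Examples
set.seed(42)
data_df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100),
                      Y = rnorm(100))
m <- custom_fit(data_df)
losses <- custom_loss(custom_predict(m, data_df), data_df)
mean(losses)
```

If the assumed signatures match, these functions would presumably be supplied through `custom_fit`, `custom_predict` and `custom_loss` in place of `formula`, `caret_params` and `loss_name`.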

alpha

Significance level for the confidence intervals; the intervals have confidence level 1 - alpha (default = 0.05, i.e. 95% intervals).
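As a worked check, the interval printed in the Examples output (estimate 0.86429, standard error 0.73665, bounds -0.57952 and 2.30809) is consistent with a normal-approximation interval, estimate ± z(1 - alpha/2) × SE. This is an observation about those numbers, not a statement about cramR's internals. The small discrepancy in the last digit comes from the rounded estimate.

```r
# Normal-approximation interval: estimate +/- qnorm(1 - alpha/2) * SE
# (values copied from the Examples output)
alpha    <- 0.05
estimate <- 0.86429
se       <- 0.73665

z  <- qnorm(1 - alpha / 2)            # about 1.96 when alpha = 0.05
ci <- estimate + c(-1, 1) * z * se
round(ci, 5)
```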

classify

Logical indicating whether this is a classification problem (default = FALSE).

Value

A list containing:

  • raw_results: Data frame with performance metrics

  • interactive_table: The same performance metrics displayed as an interactive table

  • final_ml_model: Trained model object
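raw_results can be queried like any data frame. The sketch below rebuilds a data frame with the same `Metric`/`Value` layout and values printed for `result$raw_results` in the Examples, then extracts a single metric. The column names are taken from that printed output. With a real result you would use `result$raw_results` directly.

```r
# Same layout and values as the printed `result$raw_results` in the Examples
raw_results <- data.frame(
  Metric = c("Expected Loss Estimate", "Expected Loss Standard Error",
             "Expected Loss CI Lower", "Expected Loss CI Upper"),
  Value  = c(0.86429, 0.73665, -0.57952, 2.30809)
)

# Pull out one metric by name
est <- raw_results$Value[raw_results$Metric == "Expected Loss Estimate"]
est
```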

See also

caret::train() for model training parameters

Examples

# Load necessary libraries
library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice

# Set seed for reproducibility
set.seed(42)

# Generate example dataset
X_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
Y_data <- rnorm(100)  # Continuous target variable for regression
data_df <- data.frame(X_data, Y = Y_data)  # Ensure target variable is included

# Define caret parameters for simple linear regression (no cross-validation)
caret_params_lm <- list(
  method = "lm",
  trControl = trainControl(method = "none")
)

nb_batch <- 5

# Run ML learning function
result <- cram_ml(
  data = data_df,
  formula = Y ~ .,  # Linear regression model
  batch = nb_batch,
  loss_name = 'se',
  caret_params = caret_params_lm
)

result$raw_results
#>                         Metric    Value
#> 1       Expected Loss Estimate  0.86429
#> 2 Expected Loss Standard Error  0.73665
#> 3       Expected Loss CI Lower -0.57952
#> 4       Expected Loss CI Upper  2.30809
result$interactive_table
result$final_ml_model
#> Linear Regression 
#> 
#> 100 samples
#>   3 predictor
#> 
#> No pre-processing
#> Resampling: None