Cram ML: Simultaneous Machine Learning and Evaluation

Performs the Cram method for simultaneous machine learning and evaluation.

Usage

cram_ml(
  data,
  batch,
  formula = NULL,
  caret_params = NULL,
  parallelize_batch = FALSE,
  loss_name = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  custom_loss = NULL,
  alpha = 0.05,
  classify = FALSE
)

Arguments

data

A matrix or data frame of covariates. For supervised learning, must include the target variable specified in formula.

batch

Either an integer giving the number of batches or a vector of pre-defined batch assignments.
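The two forms of batch can be sketched as follows. This is only an illustration: it assumes that an assignment vector holds one batch label per row of data, with labels running from 1 to the number of batches, which this page does not spell out. The names n_obs, batch_count, and batch_assignments are hypothetical.

```r
set.seed(42)

n_obs <- 100      # number of rows in the data (hypothetical)
n_batches <- 5    # desired number of batches

# Form 1: a single integer, leaving the split into batches to cram_ml()
batch_count <- n_batches

# Form 2: an explicit assignment vector, assumed to hold one label per row.
# rep() produces equal-sized batches; sample() shuffles which rows get which label.
batch_assignments <- sample(rep(seq_len(n_batches), length.out = n_obs))

# Inspect how many rows fall into each batch
table(batch_assignments)
```

Either batch_count or batch_assignments would then be passed as the batch argument.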

formula

Formula for supervised learning (e.g., y ~ .).

caret_params

List of parameters for caret::train() containing:

method: Model type (e.g., "rf", "glm", "xgbTree" for supervised learning)
Additional method-specific parameters

parallelize_batch

Logical indicating whether to parallelize batch processing (default = FALSE).

loss_name

Name of loss metric (supported: "se", "logloss", "accuracy").
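For orientation, the conventional per-observation definitions behind these three names can be written in base R as below. This is a sketch of the textbook formulas, not the package's internal code, which may differ in details such as clipping of probabilities; all variable names are hypothetical.

```r
# Regression: squared error per observation
y_reg <- c(1.0, 2.0, 3.0)
pred_reg <- c(1.5, 2.0, 2.0)
se <- (y_reg - pred_reg)^2

# Binary classification: log loss per observation from predicted probabilities
y_bin <- c(1, 0, 1)
prob <- c(0.9, 0.2, 0.6)
logloss <- -(y_bin * log(prob) + (1 - y_bin) * log(1 - prob))

# Binary classification: accuracy per observation (1 if the predicted class is correct)
pred_class <- as.integer(prob > 0.5)
accuracy <- as.numeric(pred_class == y_bin)
```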

custom_fit

Optional custom model training function.

custom_predict

Optional custom prediction function.

custom_loss

Optional custom loss function.
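A minimal sketch of what the three custom functions might look like for a linear regression on a target column named Y. The signatures used here (custom_fit(data), custom_predict(model, data), and custom_loss(predictions, data) returning one loss value per row) are assumptions for illustration and are not stated on this page; check the package documentation before relying on them.

```r
# Assumed signature: takes a data frame, returns a fitted model
custom_fit <- function(data) {
  lm(Y ~ x1 + x2 + x3, data = data)
}

# Assumed signature: takes the fitted model and new data, returns predictions
custom_predict <- function(model, data) {
  predictors <- data[, setdiff(names(data), "Y"), drop = FALSE]
  predict(model, newdata = predictors)
}

# Assumed signature: takes predictions and the data, returns per-row losses
custom_loss <- function(predictions, data) {
  (predictions - data$Y)^2
}

# Standalone check on toy data, independent of cram_ml()
set.seed(42)
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20), Y = rnorm(20))
model <- custom_fit(toy)
losses <- custom_loss(custom_predict(model, toy), toy)
length(losses)
```

Under these assumptions, the three functions would be supplied through custom_fit, custom_predict, and custom_loss in place of formula, caret_params, and loss_name.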

alpha

Significance level for confidence intervals (default = 0.05, which gives 95% intervals).
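To see how alpha relates to the reported interval, the sketch below builds a two-sided interval from an estimate and its standard error. It assumes a normal approximation, which is not stated on this page but agrees, to within rounding of the displayed values, with the estimate, standard error, and bounds shown under Examples.

```r
alpha <- 0.05          # significance level, i.e. a 95% interval
estimate <- 0.86429    # Expected Loss Estimate from the Examples output
std_error <- 0.73665   # Expected Loss Standard Error from the Examples output

# Two-sided critical value under the assumed normal approximation
z <- qnorm(1 - alpha / 2)

ci <- c(lower = estimate - z * std_error,
        upper = estimate + z * std_error)
round(ci, 5)
```

A smaller alpha gives a larger critical value and therefore a wider interval.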

classify

Logical indicating whether this is a classification problem (default = FALSE).

Value

A list containing:

raw_results: Data frame with performance metrics
interactive_table: The same performance metrics in a user-friendly interface
final_ml_model: Trained model object

Examples

# Load necessary libraries
library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice

# Set seed for reproducibility
set.seed(42)

# Generate example dataset
X_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
Y_data <- rnorm(100)  # Continuous target variable for regression
data_df <- data.frame(X_data, Y = Y_data)  # Ensure target variable is included

# Define caret parameters for simple linear regression (no cross-validation)
caret_params_lm <- list(
  method = "lm",
  trControl = trainControl(method = "none")
)

nb_batch <- 5

# Run the Cram ML function
result <- cram_ml(
  data = data_df,
  formula = Y ~ .,  # Linear regression model
  batch = nb_batch,
  loss_name = 'se',
  caret_params = caret_params_lm
)

result$raw_results
#>                         Metric    Value
#> 1       Expected Loss Estimate  0.86429
#> 2 Expected Loss Standard Error  0.73665
#> 3       Expected Loss CI Lower -0.57952
#> 4       Expected Loss CI Upper  2.30809
result$interactive_table

result$final_ml_model
#> Linear Regression 
#> 
#> 100 samples
#>   3 predictor
#> 
#> No pre-processing
#> Resampling: None
