Performs the Cram method for simultaneous machine learning and evaluation.
Usage
cram_ml(
data,
batch,
formula = NULL,
caret_params = NULL,
parallelize_batch = FALSE,
loss_name = NULL,
custom_fit = NULL,
custom_predict = NULL,
custom_loss = NULL,
alpha = 0.05,
classify = FALSE
)
Arguments
- data
A matrix or data frame of covariates. For supervised learning, must include the target variable specified in formula.
- batch
Integer specifying number of batches or vector of pre-defined batch assignments.
- formula
Formula for supervised learning (e.g., y ~ .).
- caret_params
List of parameters for caret::train() containing:
method: Model type (e.g., "rf", "glm", "xgbTree" for supervised learning)
Additional method-specific parameters
- parallelize_batch
Logical indicating whether to parallelize batch processing (default = FALSE).
- loss_name
Name of loss metric (supported: "se", "logloss", "accuracy").
- custom_fit
Optional custom model training function.
- custom_predict
Optional custom prediction function.
- custom_loss
Optional custom loss function.
- alpha
Significance level for confidence intervals (default = 0.05, which gives 95% intervals).
- classify
Logical indicating whether this is a classification problem (default = FALSE).
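The batch and custom_* arguments accept richer inputs than the defaults suggest. The following is a minimal sketch, not taken from the package itself: the signatures custom_fit(data, ...), custom_predict(model, data), and custom_loss(predictions, data), as well as the column name Y, are assumptions for illustration and should be checked against the package vignette. The functions are exercised by hand here rather than through cram_ml().

```r
set.seed(42)

# Toy data; the column names are illustrative only
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$Y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(n)

# A vector of pre-defined batch assignments: one label per row,
# here 5 batches of equal size in random order
batch_assignments <- sample(rep(1:5, length.out = n))

# ASSUMED signature: takes the training data, returns a fitted model
custom_fit <- function(data, ...) {
  lm(Y ~ x1 + x2, data = data)
}

# ASSUMED signature: takes a fitted model and new data, returns predictions
custom_predict <- function(model, data) {
  predict(model, newdata = data)
}

# ASSUMED signature: returns one loss value per row (squared error here)
custom_loss <- function(predictions, data) {
  (predictions - data$Y)^2
}

# Exercise the three functions by hand: fit on batch 1, evaluate on the rest
train_rows <- batch_assignments == 1
model <- custom_fit(toy[train_rows, ])
preds <- custom_predict(model, toy[!train_rows, ])
losses <- custom_loss(preds, toy[!train_rows, ])
mean(losses)
```

If the assumed signatures hold, these objects would be passed as cram_ml(data = toy, batch = batch_assignments, custom_fit = custom_fit, custom_predict = custom_predict, custom_loss = custom_loss).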
Value
A list containing:
raw_results: Data frame with performance metrics
interactive_table: The same performance metrics in a user-friendly interface
final_ml_model: Trained model object
See also
caret::train() for model training parameters
Examples
# Load necessary libraries
library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice
# Set seed for reproducibility
set.seed(42)
# Generate example dataset
X_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
Y_data <- rnorm(100) # Continuous target variable for regression
data_df <- data.frame(X_data, Y = Y_data) # Ensure target variable is included
# Define caret parameters for simple linear regression (no cross-validation)
caret_params_lm <- list(
method = "lm",
trControl = trainControl(method = "none")
)
nb_batch <- 5
# Run the Cram ML procedure
result <- cram_ml(
data = data_df,
formula = Y ~ ., # Linear regression model
batch = nb_batch,
loss_name = 'se',
caret_params = caret_params_lm
)
result$raw_results
#>                         Metric    Value
#> 1       Expected Loss Estimate  0.86429
#> 2 Expected Loss Standard Error  0.73665
#> 3       Expected Loss CI Lower -0.57952
#> 4       Expected Loss CI Upper  2.30809
result$interactive_table
result$final_ml_model
#> Linear Regression
#>
#> 100 samples
#> 3 predictor
#>
#> No pre-processing
#> Resampling: None
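The interval printed in raw_results above is consistent with a normal-approximation interval, estimate ± z × standard error, at the default alpha = 0.05. This is an inference from the printed numbers, not a statement about the package internals. The sketch below recomputes the bounds from the rounded values shown, so agreement is only expected to about four decimal places.

```r
# Values copied from the printed raw_results above (rounded to 5 decimals)
estimate <- 0.86429
std_error <- 0.73665
alpha <- 0.05

# Two-sided normal critical value for a (1 - alpha) interval
z <- qnorm(1 - alpha / 2)

# Lower and upper bounds under the assumed normal approximation
ci <- estimate + c(-1, 1) * z * std_error
round(ci, 4)
```

Because the lower bound is negative while a squared-error loss cannot be, the interval is best read as a large-sample approximation rather than a range of attainable values.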