Cram ML: Generalized ML Learning — ml

This function performs batch-wise learning for machine learning models.

Usage

ml_learning(
  data,
  formula = NULL,
  batch,
  parallelize_batch = FALSE,
  loss_name = NULL,
  caret_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  custom_loss = NULL,
  n_cores = detectCores() - 1,
  classify = FALSE
)

Arguments

data: A matrix or data frame containing the features and the target variable.
formula: Formula specifying the relationship between the target and predictors for supervised learning.
batch: Either an integer specifying the number of batches (randomly sampled) or a vector of length equal to the sample size indicating batch assignment for each observation.
parallelize_batch: Logical. Whether to parallelize batch processing. Defaults to `FALSE`.
loss_name: The name of the loss function to be used (e.g., `"se"`, `"logloss"`).
caret_params: A list of parameters to pass to the `caret::train()` function. Must include `method` (e.g., `"glm"`, `"rf"`).
custom_fit: A custom function for training user-defined models. Defaults to `NULL`.
custom_predict: A custom function for making predictions from user-defined models. Defaults to `NULL`.
custom_loss: Optional custom function for computing the loss of a trained model on the data. Should return a vector containing per-instance losses.
n_cores: Number of CPU cores to use for parallel processing. Used only when `parallelize_batch = TRUE`. Defaults to `detectCores() - 1`.
classify: Logical. Whether this is a classification problem. Defaults to `FALSE`.
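
To illustrate how `custom_fit`, `custom_predict`, and `custom_loss` might fit together, here is a minimal sketch in base R. The function names `my_fit`, `my_predict`, and `my_loss`, their argument lists (`data`; `model, data`; `predictions, data`), and the target column `Y` are assumptions made for this example, not signatures confirmed by this page. The main point is that the loss function returns one value per observation rather than a single aggregate.

```r
# Hypothetical custom hooks; names and argument lists are assumptions.
set.seed(42)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$Y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(50, sd = 0.1)

# Fit: takes the training data and returns a model object.
my_fit <- function(data) {
  lm(Y ~ x1 + x2, data = data)
}

# Predict: takes a fitted model and data, returns numeric predictions.
my_predict <- function(model, data) {
  as.numeric(predict(model, newdata = data))
}

# Loss: returns a vector of per-instance squared errors (not their mean).
my_loss <- function(predictions, data) {
  (data$Y - predictions)^2
}

# Train on the first 30 rows, evaluate on the remaining 20.
model <- my_fit(toy[1:30, ])
preds <- my_predict(model, toy[31:50, ])
per_instance <- my_loss(preds, toy[31:50, ])
length(per_instance)  # one loss value per evaluated row
```

Under these assumptions, the functions would be supplied as `custom_fit = my_fit`, `custom_predict = my_predict`, and `custom_loss = my_loss`.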

Value

A list containing:

final_ml_model: The final trained ML model.
losses: A matrix of losses where each column represents a batch's trained model. The first column contains zeros (baseline model).
batch_indices: The indices of observations in each batch.
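
The two accepted forms of `batch` and the resulting per-batch indices can be sketched in base R as follows. This is an assumption about how an integer batch count might be turned into a random assignment, not the package's internal code; `n`, `n_batches`, and `assignment` are illustrative names.

```r
set.seed(123)
n <- 12          # sample size (illustrative)
n_batches <- 3   # integer form of `batch`

# Integer form: randomly assign each observation to one of n_batches batches.
assignment <- sample(rep(seq_len(n_batches), length.out = n))

# Vector form: a vector such as `assignment` could be passed directly,
# with one batch label per observation.

# Indices of the observations in each batch, one list element per batch.
batch_indices <- split(seq_len(n), assignment)
lengths(batch_indices)
```

Every observation lands in exactly one batch, so the elements of `batch_indices` partition `1:n`.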