This function performs batch-wise learning for machine learning models.

Usage

ml_learning(
  data,
  formula = NULL,
  batch,
  parallelize_batch = FALSE,
  loss_name = NULL,
  caret_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  custom_loss = NULL,
  n_cores = detectCores() - 1,
  classify = FALSE
)

Arguments

data

A matrix or data frame of features. Must include the target variable.

formula

Formula specifying the relationship between the target and predictors for supervised learning.

batch

Either an integer specifying the number of batches (randomly sampled) or a vector of length equal to the sample size indicating batch assignment for each observation.
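As a rough illustration of the two accepted forms of `batch`, the sketch below builds an explicit assignment vector in base R. How `ml_learning()` draws its random batches internally is not stated here, so the balanced random split is an assumption, and `n`, `n_batches`, `batch_vec`, and `batch_indices` are illustrative names only.

```r
set.seed(42)
n <- 10          # sample size (illustrative)
n_batches <- 3   # what passing the integer `batch = 3` would request

# Explicit form: one batch label per observation, length equal to n.
# Assumption: a balanced random split; the package may sample differently.
batch_vec <- sample(rep(seq_len(n_batches), length.out = n))

# Observation indices grouped by batch label.
batch_indices <- split(seq_len(n), batch_vec)
```

Passing a vector such as `batch_vec` keeps the split reproducible and lets batches follow a natural grouping in the data (for example, collection date) instead of a random draw.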

parallelize_batch

Logical. Whether to parallelize batch processing. Defaults to `FALSE`.

loss_name

The name of the loss function to be used (e.g., `"se"`, `"logloss"`).

caret_params

A list of parameters to pass to the `caret::train()` function. Must include `method` (e.g., `"glm"`, `"rf"`).
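A minimal sketch of what such a list might look like. Only `method` is documented as required; `tuneLength` is a standard `caret::train()` argument included here purely as an example of an optional entry.

```r
# `method` is the one entry the documentation marks as required.
caret_params <- list(
  method = "rf",
  tuneLength = 3   # optional; a standard caret::train() argument
)

# Simple sanity check before handing the list over.
stopifnot("method" %in% names(caret_params))
```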

custom_fit

A custom function for training user-defined models. Defaults to `NULL`.

custom_predict

A custom function for making predictions from user-defined models. Defaults to `NULL`.

custom_loss

Optional custom function for computing the loss of a trained model on the data. Should return a vector containing per-instance losses.
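The key requirement is that the function returns one loss per observation, not a single aggregated value. The sketch below shows a per-instance squared-error loss in base R. The argument names `y_true` and `y_pred` are hypothetical: the signature that `ml_learning()` expects from `custom_loss` is not documented in this section, so adapt it to the actual interface.

```r
# Hypothetical signature: the argument names are assumptions,
# not taken from the package.
squared_error_loss <- function(y_true, y_pred) {
  (y_true - y_pred)^2   # vector of per-instance losses, no averaging
}

# Stand-alone check on a built-in dataset with a simple linear model.
fit <- lm(mpg ~ wt, data = mtcars)
losses <- squared_error_loss(mtcars$mpg, predict(fit, newdata = mtcars))

length(losses) == nrow(mtcars)   # one loss per row
```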

n_cores

Number of CPU cores to use for parallel processing. Only used when `parallelize_batch = TRUE`. Defaults to `detectCores() - 1`.

classify

Logical. Whether this is a classification problem. Defaults to `FALSE`.

Value

A list containing:

final_ml_model

The final trained ML model.

losses

A matrix of losses where each column represents a batch's trained model. The first column contains zeros (baseline model).

batch_indices

The indices of observations in each batch.