regional_benchmark() is a wrapper that calls a number of helper functions in sequence. See Details.

regional_benchmark(
  regions = c("ALLSAC", "SFE", "K", "NC", "NCC", "SCC", "SC", "SJT"),
  LRN_IDS,
  TUNELENGTH,
  INNER,
  ITERS,
  PROB,
  NU,
  REPS,
  PREPROC,
  FINAL,
  PATH,
  REDUCED,
  MES,
  INFO,
  FS,
  FS_NUM
)

Arguments

regions

character, vector of region identifiers (e.g. "SFE")

LRN_IDS

character, vector of learner identifiers (e.g. "classif.randomForest")

TUNELENGTH

numeric, number of values tried per hyper-parameter in the discrete (grid) tuning

INNER

ResampleDesc, the inner folds for the nested resampling

ITERS

numeric, number of iterations for the random tuning

PROB

logical, selects the type of prediction: probabilities if TRUE, the response if FALSE

NU

numeric, number of folds for the \(\nu\)-fold cross-validation

REPS

numeric, number of repetitions for the repeated \(\nu\)-fold cross-validation

PREPROC

character, vector of preprocessing identifiers (e.g. "scale")

FINAL

logical, selects the type of run: if FALSE, models are trained; if TRUE, the final models are used for prediction

PATH

character (e.g. built with file.path()), path to the output directory

REDUCED

logical, legacy option

MES

list, measures from the mlr package; the first one is optimized against

INFO

logical, controls the information printed by the training process

FS

logical, if TRUE activates the feature selection

FS_NUM

numeric, number of features to select if FS is TRUE

Value

a list of mlr benchmark results (BenchmarkResult objects)
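
An illustrative call is shown below. All argument values are hypothetical (they are not defaults from the package), and the measure objects come from mlr:

```r
library(mlr)

# Hypothetical invocation; every value here is illustrative, not a default.
res <- regional_benchmark(
  regions    = c("SFE", "NC"),
  LRN_IDS    = c("classif.randomForest", "classif.svm"),
  TUNELENGTH = 5,                                  # 5 values per hyper-parameter (grid tuning)
  INNER      = makeResampleDesc("CV", iters = 5),  # inner folds of the nested resampling
  ITERS      = 50,                                 # iterations for random tuning
  PROB       = TRUE,                               # predict probabilities
  NU         = 10,                                 # folds of the repeated cross-validation
  REPS       = 5,                                  # repetitions of the repeated cross-validation
  PREPROC    = c("center", "scale"),
  FINAL      = FALSE,                              # train; do not compute final models
  PATH       = "output/",
  REDUCED    = FALSE,
  MES        = list(auc, bac),                     # auc (first element) is optimized against
  INFO       = TRUE,
  FS         = TRUE,
  FS_NUM     = 20
)
```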

Details

Here is some pseudo-code that explains what happens behind the scenes.

  1. Skip. Because regional_benchmark() is called inside the for-loop for (FS_NUM in FS_NUM_LIST) (see above), if FINAL is TRUE, regional_benchmark() skips the regions for which final models do not need to be computed.

  2. Data loading. This is handled by get_training_data().

  3. Data formatting. This is handled by fmt_labels(), sanitize_data() and get_coords().

  4. Feature selection. If FINAL is TRUE the selected features are retrieved from get_bestBMR_tuning_results(). If FINAL is FALSE the selected features are derived from transformed training data using get_ppc() and preproc_data(). The resulting transformed data are filtered for correlation higher than 0.95 with caret::findCorrelation(). Then, 500 subsampled mlr Tasks are created with mlr::makeResampleDesc(), mlr::makeClassifTask(), mlr::makeResampleInstance() and mlr::filterFeatures(). The FS_NUM most commonly selected features across the 500 realizations are retained.
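
  The correlation filter and the repeated subsampled filtering of step 4 can be sketched as follows. The filter method ("randomForest.importance"), the subsample split ratio, the data objects (train_data, "label") and the helper structure are assumptions for illustration, not taken from the package:

```r
library(mlr)
library(caret)

# Drop one feature of every pair correlated above 0.95
# (feature columns only; the target column is excluded from cor()).
high_cor   <- findCorrelation(cor(train_data[, setdiff(names(train_data), "label")]),
                              cutoff = 0.95)
train_data <- train_data[, -high_cor]

task <- makeClassifTask(data = train_data, target = "label")
desc <- makeResampleDesc("Subsample", iters = 500, split = 0.8)  # split ratio assumed
rin  <- makeResampleInstance(desc, task)

# Filter features on each of the 500 subsamples and count how often each survives.
picks <- lapply(seq_len(500), function(i) {
  sub <- subsetTask(task, subset = rin$train.inds[[i]])
  getTaskFeatureNames(filterFeatures(sub, method = "randomForest.importance",
                                     abs = FS_NUM))
})
counts   <- sort(table(unlist(picks)), decreasing = TRUE)
selected <- names(counts)[seq_len(FS_NUM)]  # FS_NUM most commonly selected features
```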

  5. Pre-processing. The target and training data are transformed using get_ppc() on the target data and preproc_data() on the training data. SMOTE is applied using get_smote_data() and get_smote_coords(), which both call resolve_class_imbalance().
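
  resolve_class_imbalance() is not shown here; if the oversampling step were done directly with mlr's own smote(), it could look like the following sketch (the balancing rate and data objects are assumptions):

```r
library(mlr)

task <- makeClassifTask(data = train_data, target = "label")

# Oversample the minority class until the classes are roughly balanced.
tab  <- table(getTaskTargets(task))
rate <- max(tab) / min(tab)
task <- smote(task, rate = rate, nn = 5)  # nn = 5 nearest neighbours (mlr's default)
```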

  6. Tasks. Tasks are obtained using mlr::makeClassifTask().

  7. Learners. Learners are constructed using get_learners() or get_final_learners().
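
  get_learners() itself is not shown in the source; a plausible sketch of what step 7 produces, using standard mlr tuning wrappers, is given below. The parameter set is illustrative for a random forest; real sets depend on the learner:

```r
library(mlr)

# Hypothetical helper mirroring get_learners(); id is an mlr learner id.
make_tuned_learner <- function(id, inner, iters, prob) {
  lrn <- makeLearner(id, predict.type = if (prob) "prob" else "response")
  # Illustrative parameter set; the real ranges are learner-specific.
  ps <- makeParamSet(
    makeIntegerParam("mtry",  lower = 2L,   upper = 10L),
    makeIntegerParam("ntree", lower = 100L, upper = 1000L)
  )
  ctrl <- makeTuneControlRandom(maxit = iters)  # ITERS random tries
  makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl)
}

learners <- lapply(c("classif.randomForest"), make_tuned_learner,
                   inner = makeResampleDesc("CV", iters = 5),  # INNER
                   iters = 50,                                 # ITERS
                   prob  = TRUE)                               # PROB
```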

  8. Compute benchmark. The benchmark is run with compute_final_model() or compute_benchmark() (which needs to retrieve the outer folds of the nested resampling with get_outers()).
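
  The benchmark of step 8 then reduces to a standard mlr::benchmark() call over all regional tasks. The outer resampling is sketched here as repeated cross-validation matching NU and REPS; the tasks and learners objects are those built in steps 6 and 7:

```r
library(mlr)

# Outer resampling of the nested scheme, as retrieved by get_outers() in the real code.
outer <- makeResampleDesc("RepCV", folds = 10, reps = 5)  # NU folds, REPS repetitions

bmr <- benchmark(
  learners    = learners,       # tuned learners from step 7
  tasks       = tasks,          # one task per region from step 6
  resamplings = outer,
  measures    = list(auc, bac)  # the first measure is optimized against
)
# regional_benchmark() collects such BenchmarkResult objects into the returned list.
```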