regional_benchmark() is a wrapper that calls a number of helper functions in sequence. See Details.

regional_benchmark(
  regions = c("ALLSAC", "SFE", "K", "NC", "NCC", "SCC", "SC", "SJT"),
  LRN_IDS,
  TUNELENGTH,
  INNER,
  ITERS,
  PROB,
  NU,
  REPS,
  PREPROC,
  FINAL,
  PATH,
  REDUCED,
  MES,
  INFO,
  FS,
  FS_NUM
)

Arguments

regions

character, vector of region identifiers (e.g. "SFE")

LRN_IDS

character, vector of learner identifiers (e.g. "classif.randomForest")

TUNELENGTH

numeric, number of values tried per hyper-parameter in the discrete (grid) tuning

INNER

ResampleDesc, the inner folds for the nested resampling

ITERS

numeric, number of iterations for the random tuning

PROB

logical, selects the type of prediction: probabilities if TRUE, the response if FALSE

NU

numeric, number of folds for the \(\nu\)-fold cross-validation

REPS

numeric, number of repetitions for the repeated \(\nu\)-fold cross-validation

PREPROC

character, vector of preprocessing identifiers (e.g. "scale")

FINAL

logical, selects the type of run: if FALSE, models are trained; if TRUE, the final models are used for prediction

PATH

character (e.g. built with file.path()), path to the output directory

REDUCED

logical, legacy option

MES

list, measures from the mlr package; the first one is optimized against

INFO

logical, controls the information printed by the training process

FS

logical, if TRUE activates the feature selection

FS_NUM

numeric, number of features to select if FS is TRUE

Value

a list of mlr benchmark results (BenchmarkResult objects)
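
An illustrative call is shown below. All argument values are hypothetical (they are not defaults from the package), and the measure objects come from mlr:

```r
library(mlr)

# Hypothetical invocation; every value here is illustrative, not a default.
res <- regional_benchmark(
  regions    = c("SFE", "NC"),
  LRN_IDS    = c("classif.randomForest", "classif.svm"),
  TUNELENGTH = 5,                                  # 5 values per hyper-parameter (grid tuning)
  INNER      = makeResampleDesc("CV", iters = 5),  # inner folds of the nested resampling
  ITERS      = 50,                                 # iterations for random tuning
  PROB       = TRUE,                               # predict probabilities
  NU         = 10,                                 # folds of the repeated cross-validation
  REPS       = 5,                                  # repetitions of the repeated cross-validation
  PREPROC    = c("center", "scale"),
  FINAL      = FALSE,                              # train; do not compute final models
  PATH       = "output/",
  REDUCED    = FALSE,
  MES        = list(auc, bac),                     # auc (first element) is optimized against
  INFO       = TRUE,
  FS         = TRUE,
  FS_NUM     = 20
)
```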

Details

Here is some pseudo-code that explains what happens behind the scenes.

  1. Skip. Because regional_benchmark() is called inside the for-loop for (FS_NUM in FS_NUM_LIST) (see above), if FINAL is TRUE, regional_benchmark() skips the regions for which final models do not need to be computed.

  2. Data loading. This is handled by get_training_data().

  3. Data formatting. This is handled by fmt_labels(), sanitize_data() and get_coords().

  4. Feature selection. If FINAL is TRUE the selected features are retrieved from get_bestBMR_tuning_results(). If FINAL is FALSE the selected features are derived from transformed training data using get_ppc() and preproc_data(). The resulting transformed data are filtered for correlation higher than 0.95 with caret::findCorrelation(). Then, 500 subsampled mlr Tasks are created with mlr::makeResampleDesc(), mlr::makeClassifTask(), mlr::makeResampleInstance() and mlr::filterFeatures(). The FS_NUM most commonly selected features across the 500 realizations are retained.
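
  The correlation filter and the repeated subsampled filtering of step 4 can be sketched as follows. The filter method ("randomForest.importance"), the subsample split ratio, the data objects (train_data, "label") and the helper structure are assumptions for illustration, not taken from the package:

```r
library(mlr)
library(caret)

# Drop one feature of every pair correlated above 0.95
# (feature columns only; the target column is excluded from cor()).
high_cor   <- findCorrelation(cor(train_data[, setdiff(names(train_data), "label")]),
                              cutoff = 0.95)
train_data <- train_data[, -high_cor]

task <- makeClassifTask(data = train_data, target = "label")
desc <- makeResampleDesc("Subsample", iters = 500, split = 0.8)  # split ratio assumed
rin  <- makeResampleInstance(desc, task)

# Filter features on each of the 500 subsamples and count how often each survives.
picks <- lapply(seq_len(500), function(i) {
  sub <- subsetTask(task, subset = rin$train.inds[[i]])
  getTaskFeatureNames(filterFeatures(sub, method = "randomForest.importance",
                                     abs = FS_NUM))
})
counts   <- sort(table(unlist(picks)), decreasing = TRUE)
selected <- names(counts)[seq_len(FS_NUM)]  # FS_NUM most commonly selected features
```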

  5. Pre-processing. The target and training data are transformed using get_ppc() on the target data and preproc_data() on the training data. SMOTE is applied using get_smote_data() and get_smote_coords(), which both call resolve_class_imbalance().
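
  resolve_class_imbalance() is not shown here; if the oversampling step were done directly with mlr's own smote(), it could look like the following sketch (the balancing rate and data objects are assumptions):

```r
library(mlr)

task <- makeClassifTask(data = train_data, target = "label")

# Oversample the minority class until the classes are roughly balanced.
tab  <- table(getTaskTargets(task))
rate <- max(tab) / min(tab)
task <- smote(task, rate = rate, nn = 5)  # nn = 5 nearest neighbours (mlr's default)
```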

  6. Tasks. Tasks are obtained using mlr::makeClassifTask().

  7. Learners. Learners are constructed using get_learners() or get_final_learners().
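
  get_learners() itself is not shown in the source; a plausible sketch of what step 7 produces, using standard mlr tuning wrappers, is given below. The parameter set is illustrative for a random forest; real sets depend on the learner:

```r
library(mlr)

# Hypothetical helper mirroring get_learners(); id is an mlr learner id.
make_tuned_learner <- function(id, inner, iters, prob) {
  lrn <- makeLearner(id, predict.type = if (prob) "prob" else "response")
  # Illustrative parameter set; the real ranges are learner-specific.
  ps <- makeParamSet(
    makeIntegerParam("mtry",  lower = 2L,   upper = 10L),
    makeIntegerParam("ntree", lower = 100L, upper = 1000L)
  )
  ctrl <- makeTuneControlRandom(maxit = iters)  # ITERS random tries
  makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl)
}

learners <- lapply(c("classif.randomForest"), make_tuned_learner,
                   inner = makeResampleDesc("CV", iters = 5),  # INNER
                   iters = 50,                                 # ITERS
                   prob  = TRUE)                               # PROB
```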

  8. Compute benchmark. The benchmark is run with compute_final_model() or compute_benchmark() (which needs to retrieve the outer folds of the nested resampling with get_outers()).
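
  The benchmark of step 8 then reduces to a standard mlr::benchmark() call over all regional tasks. The outer resampling is sketched here as repeated cross-validation matching NU and REPS; the tasks and learners objects are those built in steps 6 and 7:

```r
library(mlr)

# Outer resampling of the nested scheme, as retrieved by get_outers() in the real code.
outer <- makeResampleDesc("RepCV", folds = 10, reps = 5)  # NU folds, REPS repetitions

bmr <- benchmark(
  learners    = learners,       # tuned learners from step 7
  tasks       = tasks,          # one task per region from step 6
  resamplings = outer,
  measures    = list(auc, bac)  # the first measure is optimized against
)
# regional_benchmark() collects such BenchmarkResult objects into the returned list.
```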