Function reference • mikropml

Main

The foundations for training machine learning models.

mikropml mikropml-package: mikropml: User-Friendly R Package for Robust Machine Learning Pipelines

preprocess_data(): Preprocess data prior to running machine learning

run_ml(): Run the machine learning pipeline
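
A minimal sketch of how the two main functions might be chained. It assumes mikropml is installed and that the bundled otu_mini_bin dataset stores its outcome in a column named dx. The small kfold and cv_times values are illustrative choices to keep the run short, not recommended settings.

```r
# Guarded so the script does nothing where mikropml is not installed.
if (requireNamespace("mikropml", quietly = TRUE)) {
  library(mikropml)

  # Preprocess the bundled example data; the outcome column is assumed to be "dx".
  preproc <- preprocess_data(dataset = otu_mini_bin, outcome_colname = "dx")
  dat <- preproc$dat_transformed

  # Train and evaluate a glmnet model on a single train/test split.
  results <- run_ml(
    dat,
    "glmnet",
    outcome_colname = "dx",
    kfold = 2,      # illustrative: few folds to keep the example fast
    cv_times = 5,   # illustrative: few CV repeats to keep the example fast
    seed = 2019
  )

  # One-row table of performance metrics for this split.
  print(results$performance)
}
```

Setting seed makes the train/test split and cross-validation reproducible, which helps when comparing runs across methods.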

Model evaluation

Evaluate and interpret models.

get_feature_importance(): Get feature importance using the permutation method

get_performance_tbl(): Get model performance metrics as a one-row tibble

calc_model_sensspec() calc_mean_roc() calc_mean_prc(): Calculate and summarize performance for ROC and PRC plots

calc_mean_perf(): Generic function to calculate mean performance curves for multiple models

calc_baseline_precision(): Calculate the fraction of positives, i.e., the baseline precision for a precision-recall curve (PRC)

calc_balanced_precision(): Calculate balanced precision given actual and baseline precision

compare_models(): Perform permutation tests to compare the performance metric across all pairs of a group variable.

permute_p_value(): Calculate a permuted p-value comparing two models

bootstrap_performance(): Calculate a bootstrap confidence interval for the performance on a single train/test split
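
To make the baseline and balanced precision entries above concrete, here is a base R sketch of the arithmetic. The helper names are hypothetical and are not part of mikropml. The balanced precision formula shown is one common formulation, and it is an assumption that calc_balanced_precision() computes the same quantity.

```r
# Hypothetical helpers written in base R for illustration only.

# Baseline precision: the fraction of observations in the positive class.
baseline_precision_sketch <- function(outcomes, pos_outcome) {
  mean(outcomes == pos_outcome)
}

# Balanced precision (assumed formulation): rescales precision by the prior
# so that a classifier whose precision equals the prior scores 0.5.
balanced_precision_sketch <- function(precision, prior) {
  precision * (1 - prior) /
    (precision * (1 - prior) + (1 - precision) * prior)
}

# Toy outcome vector: 2 of 5 samples are positive ("cancer").
outcomes <- c("cancer", "cancer", "normal", "normal", "normal")
prior <- baseline_precision_sketch(outcomes, pos_outcome = "cancer")

# A model whose precision equals the prior is no better than the baseline.
balanced <- balanced_precision_sketch(precision = 0.4, prior = prior)

print(prior)     # 0.4
print(balanced)  # 0.5
```

Under this formulation, when the classes are perfectly balanced (prior of 0.5), balanced precision equals ordinary precision.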

Plotting helpers

Visualize results to help you tune hyperparameters and choose model methods.

plot_mean_roc() plot_mean_prc(): Plot ROC and PRC curves

plot_hp_performance(): Plot hyperparameter performance metrics

plot_model_performance(): Plot performance metrics for multiple ML runs with different parameters

tidy_perf_data(): Tidy the performance dataframe

get_hp_performance(): Get hyperparameter performance metrics

combine_hp_performance(): Combine hyperparameter performance metrics for multiple train/test splits

Package Data

datasets

otu_small: Small OTU abundance dataset

otu_mini_bin: Mini OTU abundance dataset

otu_mini_multi: Mini OTU abundance dataset with 3 categorical variables

otu_mini_multi_group: Groups for otu_mini_multi

otu_data_preproc: Mini OTU abundance dataset - preprocessed

ML results

otu_mini_bin_results_glmnet: Results from running the pipeline with L2 logistic regression on otu_mini_bin with feature importance and grouping

otu_mini_bin_results_rf: Results from running the pipeline with random forest on otu_mini_bin

otu_mini_bin_results_rpart2: Results from running the pipeline with rpart2 on otu_mini_bin

otu_mini_bin_results_svmRadial: Results from running the pipeline with svmRadial on otu_mini_bin

otu_mini_bin_results_xgbTree: Results from running the pipeline with xgbTree on otu_mini_bin

otu_mini_cont_results_glmnet: Results from running the pipeline with glmnet on otu_mini_bin with Otu00001 as the outcome

otu_mini_cont_results_nocv: Results from running the pipeline with glmnet on otu_mini_bin with Otu00001 as the outcome column, using a custom train control scheme that does not perform cross-validation

otu_mini_multi_results_glmnet: Results from running the pipeline with glmnet on otu_mini_multi for multiclass outcomes

misc

otu_mini_cv: Cross validation on train_data_mini with grouped features.

replace_spaces(): Replace spaces in all elements of a character vector with underscores

Pipeline customization

Customize various steps of the pipeline beyond the arguments provided by run_ml() and preprocess_data().

remove_singleton_columns(): Remove columns that appear in threshold row(s) or fewer.

get_caret_processed_df(): Get preprocessed dataframe for continuous variables

randomize_feature_order(): Randomize feature order to eliminate any position-dependent effects

get_partition_indices(): Select indices to partition the data into training & testing sets.

get_outcome_type(): Get outcome type.

get_hyperparams_list(): Set hyperparameters based on ML method and dataset characteristics

get_tuning_grid(): Generate the tuning grid for tuning hyperparameters

define_cv(): Define cross-validation scheme and training parameters

get_perf_metric_name(): Get default performance metric name

get_perf_metric_fn(): Get default performance metric function

train_model(): Train model using caret::train().

calc_perf_metrics(): Get performance metrics for test data

group_correlated_features(): Group correlated features