Main

The foundations for training machine learning models.

mikropml

mikropml: User-Friendly R Package for Robust Machine Learning Pipelines

preprocess_data()

Preprocess data prior to running machine learning

run_ml()

Run the machine learning pipeline
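
A minimal sketch of how the two main functions fit together, using the packaged otu_mini_bin dataset. It assumes the outcome column is named dx, that preprocess_data() returns the transformed data in a dat_transformed element, and that glmnet is available; the low cv_times value and the seed are illustrative choices to keep the run short and reproducible, not recommended settings.

```r
library(mikropml)

# Preprocess the packaged example data; "dx" is assumed to be the outcome column.
preproc <- preprocess_data(otu_mini_bin, outcome_colname = "dx")
dat <- preproc$dat_transformed

# Train and evaluate a glmnet model on one train/test split.
# cv_times is kept small here only so the example runs quickly.
results <- run_ml(dat, "glmnet",
  outcome_colname = "dx",
  cv_times = 2,
  seed = 2019
)

# Performance on the held-out test data.
results$performance
```

The returned list also carries the trained model, which the plotting helpers take as input.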

Plotting helpers

Visualize performance to help you tune hyperparameters and choose model methods.

plot_hp_performance()

Plot hyperparameter performance metrics

plot_model_performance()

Plot performance metrics for multiple ML runs with different parameters

tidy_perf_data()

Tidy the performance dataframe

get_hp_performance()

Get hyperparameter performance metrics

combine_hp_performance()

Combine hyperparameter performance metrics for multiple train/test splits
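
As a rough illustration of the hyperparameter helpers, the sketch below pulls cross-validation metrics out of the packaged otu_mini_bin_results_glmnet result and plots them. It assumes that result has a trained_model element, that lambda was the tuned hyperparameter, and that AUC was the metric; plotting probably also needs ggplot2 installed.

```r
library(mikropml)

# Extract cross-validation performance for each hyperparameter value
# from a packaged run_ml() result.
hp <- get_hp_performance(otu_mini_bin_results_glmnet$trained_model)

# The list names the tuned parameter(s) and the metric alongside the data.
hp$params
hp$metric

# Plot the metric against the tuned hyperparameter (assumed: lambda and AUC).
plot_hp_performance(hp$dat, lambda, AUC)
```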

Package Data

Datasets

otu_small

Small OTU abundance dataset

otu_mini_bin

Mini OTU abundance dataset

otu_mini_multi

Mini OTU abundance dataset with 3 categorical variables

ML results

otu_mini_bin_results_glmnet

Results from running the pipeline with L2 logistic regression on otu_mini_bin with feature importance and grouping

otu_mini_bin_results_rf

Results from running the pipeline with random forest on otu_mini_bin

otu_mini_bin_results_rpart2

Results from running the pipeline with rpart2 on otu_mini_bin

otu_mini_bin_results_svmRadial

Results from running the pipeline with svmRadial on otu_mini_bin

otu_mini_bin_results_xgbTree

Results from running the pipeline with xgbTree on otu_mini_bin

otu_mini_cont_results_glmnet

Results from running the pipeline with glmnet on otu_mini_bin with Otu00001 as the outcome

otu_mini_multi_results_glmnet

Results from running the pipeline with glmnet on otu_mini_multi for multiclass outcomes

Misc

otu_mini_cv

Cross-validation on train_data_mini with grouped features

Pipeline customization

These are functions called by preprocess_data() or run_ml(). We make them available in case you would like to customize various steps of the pipeline beyond the arguments provided by the main functions.

remove_singleton_columns()

Remove columns that appear in threshold or fewer rows

get_caret_processed_df()

Get preprocessed dataframe for continuous variables

randomize_feature_order()

Randomize feature order to eliminate any position-dependent effects

get_partition_indices()

Select indices to partition the data into training & testing sets.

get_outcome_type()

Get outcome type.

get_hyperparams_list()

Set hyperparameters based on ML method and dataset characteristics

get_tuning_grid()

Generate the tuning grid for tuning hyperparameters

define_cv()

Define cross-validation scheme and training parameters

get_perf_metric_name()

Get default performance metric name

get_perf_metric_fn()

Get default performance metric function

train_model()

Train model

calc_perf_metrics()

Get performance metrics for test data

get_performance_tbl()

Get model performance metrics as a one-row tibble

get_feature_importance()

Get feature importance using the permutation method

group_correlated_features()

Group correlated features
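
A short sketch of calling two of these helpers directly, outside of run_ml(). It assumes get_outcome_type() takes a vector of outcomes, that get_partition_indices() returns the row indices of the training set, and that otu_mini_bin has an outcome column named dx; the 0.8 training fraction is an illustrative choice.

```r
library(mikropml)

# Classify the kind of outcome from a vector of outcome values.
get_outcome_type(c(1, 2, 1)) # numeric outcomes
get_outcome_type(c("a", "b", "b")) # two distinct classes
get_outcome_type(c("a", "b", "c")) # three distinct classes

# Choose rows for the training set, then split the data manually.
training_inds <- get_partition_indices(otu_mini_bin$dx, training_frac = 0.8)
train_data <- otu_mini_bin[training_inds, ]
test_data <- otu_mini_bin[-training_inds, ]

# Every row ends up in exactly one of the two sets.
nrow(train_data) + nrow(test_data) == nrow(otu_mini_bin)
```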