Main

The foundations for training machine learning models.

mikropml

mikropml: User-Friendly R Package for Robust Machine Learning Pipelines

preprocess_data()

Preprocess data prior to running machine learning

run_ml()

Run the machine learning pipeline
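
As a rough illustration of how these two functions might be chained, the sketch below preprocesses the bundled otu_mini_bin dataset and passes the result to run_ml(). It assumes the outcome column is named dx, and it lowers cv_times from its default only to keep the example quick; the seed value is arbitrary.

```r
library(mikropml)

# Preprocess the bundled example dataset.
# Assumption: "dx" is the outcome column of otu_mini_bin.
preproc <- preprocess_data(otu_mini_bin, outcome_colname = "dx")

# Train and evaluate a glmnet model on the transformed data.
# cv_times is reduced here purely for speed; the seed is arbitrary.
results <- run_ml(
  preproc$dat_transformed,
  method = "glmnet",
  outcome_colname = "dx",
  cv_times = 5,
  seed = 2019
)

# Performance metrics for the held-out test data.
results$performance
```

The returned list also carries the trained model, which the plotting and evaluation helpers listed below take as input.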

Plotting helpers

Visualize results to help you tune hyperparameters and choose model methods.

plot_hp_performance()

Plot hyperparameter performance metrics

plot_model_performance()

Plot performance metrics for multiple ML runs with different parameters

tidy_perf_data()

Tidy the performance dataframe

get_hp_performance()

Get hyperparameter performance metrics

combine_hp_performance()

Combine hyperparameter performance metrics for multiple train/test splits
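
A minimal sketch of how the hyperparameter helpers can fit together, using the bundled otu_mini_bin_results_glmnet object rather than a fresh pipeline run. It assumes ggplot2 is installed, and that lambda and AUC are the tuned hyperparameter and metric columns for this particular result.

```r
library(mikropml)

# Extract hyperparameter performance from a previously trained model.
hp_metrics <- get_hp_performance(otu_mini_bin_results_glmnet$trained_model)

# Inspect the tidy table of hyperparameter values and metric scores.
head(hp_metrics$dat)

# Plot the metric against the hyperparameter.
# Assumption: the columns are named lambda and AUC for this glmnet result.
p <- plot_hp_performance(hp_metrics$dat, lambda, AUC)
p
```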

Model evaluation

Evaluate and interpret models.

get_feature_importance()

Get feature importance using the permutation method

get_performance_tbl()

Get model performance metrics as a one-row tibble

compare_models()

Perform permutation tests to compare the performance metric across all pairs of a group variable.

permute_p_value()

Calculate a permuted p-value comparing two models
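
To make the permutation comparison concrete, here is a small sketch with made-up AUC values for three methods. The data frame, its column names, and the low nperm are illustrative assumptions, not results from any real run.

```r
library(mikropml)

# Hypothetical performance values for illustration only.
perf <- dplyr::tibble(
  model = c("rf", "rf", "glmnet", "glmnet", "svmRadial", "svmRadial"),
  AUC = c(0.2, 0.3, 0.8, 0.9, 0.85, 0.95)
)

# Permutation test comparing the AUC of two of the groups.
# nperm is kept small here so the example runs quickly.
set.seed(123)
p_val <- permute_p_value(perf, "AUC", "model", "rf", "glmnet", nperm = 100)
p_val
```

compare_models() applies the same kind of test across every pair of groups in the grouping column.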

Package Data

datasets

otu_small

Small OTU abundance dataset

otu_mini_bin

Mini OTU abundance dataset

otu_mini_multi

Mini OTU abundance dataset with 3 categorical variables

otu_mini_multi_group

Groups for otu_mini_multi

ML results

otu_mini_bin_results_glmnet

Results from running the pipeline with L2 logistic regression on otu_mini_bin with feature importance and grouping

otu_mini_bin_results_rf

Results from running the pipeline with random forest on otu_mini_bin

otu_mini_bin_results_rpart2

Results from running the pipeline with rpart2 on otu_mini_bin

otu_mini_bin_results_svmRadial

Results from running the pipeline with svmRadial on otu_mini_bin

otu_mini_bin_results_xgbTree

Results from running the pipeline with xgbTree on otu_mini_bin

otu_mini_cont_results_glmnet

Results from running the pipeline with glmnet on otu_mini_bin with Otu00001 as the outcome

otu_mini_cont_results_nocv

Results from running the pipeline with glmnet on otu_mini_bin with Otu00001 as the outcome column, using a custom train control scheme that does not perform cross-validation

otu_mini_multi_results_glmnet

Results from running the pipeline with glmnet on otu_mini_multi for multiclass outcomes

misc

otu_mini_cv

Cross-validation on train_data_mini with grouped features.

replace_spaces()

Replace spaces in all elements of a character vector with underscores

Pipeline customization

Customize various steps of the pipeline beyond the arguments provided by run_ml() and preprocess_data().

remove_singleton_columns()

Remove columns that appear in threshold row(s) or fewer, where threshold is a function argument.

get_caret_processed_df()

Get preprocessed dataframe for continuous variables

randomize_feature_order()

Randomize feature order to eliminate any position-dependent effects

get_partition_indices()

Select indices to partition the data into training & testing sets.

get_outcome_type()

Get outcome type.

get_hyperparams_list()

Set hyperparameters based on ML method and dataset characteristics

get_tuning_grid()

Generate the tuning grid for tuning hyperparameters

define_cv()

Define cross-validation scheme and training parameters

get_perf_metric_name()

Get default performance metric name

get_perf_metric_fn()

Get default performance metric function

train_model()

Train model using caret::train().

calc_perf_metrics()

Get performance metrics for test data

group_correlated_features()

Group correlated features
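
Several of the customization helpers above are small enough to try on their own. The sketch below, using toy outcome vectors invented for illustration, shows how get_outcome_type() classifies an outcome vector and how get_perf_metric_name() maps that type to a default metric name.

```r
library(mikropml)

# Toy outcome vectors (hypothetical values).
type_cont <- get_outcome_type(c(1, 2, 1))
type_bin <- get_outcome_type(c("a", "b", "b"))
type_multi <- get_outcome_type(c("a", "b", "c"))

# Look up the default performance metric name for each outcome type.
metric_names <- vapply(
  c(type_cont, type_bin, type_multi),
  get_perf_metric_name,
  character(1)
)
metric_names
```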