
Main

The foundations for training machine learning models.

mikropml mikropml-package
mikropml: User-Friendly R Package for Robust Machine Learning Pipelines
preprocess_data()
Preprocess data prior to running machine learning
run_ml()
Run the machine learning pipeline

Model evaluation

Evaluate and interpret models.

get_feature_importance()
Get feature importance using the permutation method
get_performance_tbl()
Get model performance metrics as a one-row tibble
calc_model_sensspec() calc_mean_roc() calc_mean_prc()
Calculate and summarize performance for ROC and PRC plots
calc_mean_perf()
Generic function to calculate mean performance curves for multiple models
calc_baseline_precision()
Calculate the fraction of positives, i.e., the baseline precision for a PRC curve
calc_balanced_precision()
Calculate balanced precision given actual and baseline precision
compare_models()
Perform permutation tests to compare the performance metric across all pairs of levels of a group variable.
permute_p_value()
Calculate a permuted p-value comparing two models
bootstrap_performance()
Calculate a bootstrap confidence interval for the performance on a single train/test split
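The arithmetic behind three of the evaluation helpers above can be sketched briefly. This is a minimal Python illustration and not mikropml's R code. All function names in it are hypothetical.

- Baseline precision is the prevalence of the positive class.
- Balanced precision is assumed here to mean precision rescaled to a 50/50 class balance. It follows from precision = πTPR / (πTPR + (1 − π)FPR) evaluated at π = 0.5.
- The permuted p-value is shown as a generic two-sided permutation test on the difference in mean performance. The package's exact procedure may differ.

```python
import random


def baseline_precision(labels, positive):
    """Fraction of observations labelled with the positive class (its prevalence)."""
    return sum(1 for y in labels if y == positive) / len(labels)


def balanced_precision(precision, prior):
    """Rescale precision observed at class prior `prior` to a 50/50 class balance.

    Assumption: this mirrors the idea behind calc_balanced_precision(), where
    `prior` is the baseline precision; it is not the package's own code.
    """
    numerator = precision * (1 - prior)
    return numerator / (numerator + (1 - precision) * prior)


def permutation_p_value(perf_a, perf_b, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference in mean performance."""
    rng = random.Random(seed)  # fixed seed so results are reproducible

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(perf_a) - mean(perf_b))
    pooled = list(perf_a) + list(perf_b)
    n_a = len(perf_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign values to the two models
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)


labels = ["cancer", "normal", "normal", "normal"]
prior = baseline_precision(labels, "cancer")       # 0.25
print(prior, balanced_precision(prior, prior))     # a classifier at baseline maps to 0.5
print(permutation_p_value([0.9] * 10, [0.5] * 10))
```

The sanity check in the usage lines reflects a design choice. A model whose precision equals the baseline precision lands at 0.5 after balancing, whatever the class imbalance. This makes balanced precision comparable across datasets with different prevalences.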

Plotting helpers

Visualize results to help you tune hyperparameters and choose model methods.

plot_mean_roc() plot_mean_prc()
Plot ROC and PRC curves
plot_hp_performance()
Plot hyperparameter performance metrics
plot_model_performance()
Plot performance metrics for multiple ML runs with different parameters
tidy_perf_data()
Tidy the performance dataframe
get_hp_performance()
Get hyperparameter performance metrics
combine_hp_performance()
Combine hyperparameter performance metrics for multiple train/test splits
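Hyperparameter performance is typically summarized across folds or train/test splits before it is plotted. The sketch below shows that summary step in Python, as an analogue of what the helpers above are described as doing. Two things are assumptions: the input layout (a list of `(hp_value, metric)` pairs) and the function names. Neither reflects mikropml's data frames or API.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_hp_performance(results):
    """Mean and standard deviation of a metric for each hyperparameter value.

    `results` is a list of (hp_value, metric) pairs, one per CV fold or
    train/test split (a hypothetical input format for illustration).
    """
    by_value = defaultdict(list)
    for hp_value, metric in results:
        by_value[hp_value].append(metric)
    summary = {}
    for hp_value, metrics in sorted(by_value.items()):
        # A single observation has no spread, so report 0.0 in that case.
        sd = stdev(metrics) if len(metrics) > 1 else 0.0
        summary[hp_value] = (mean(metrics), sd)
    return summary


def best_hp(summary):
    """Hyperparameter value with the highest mean metric."""
    return max(summary, key=lambda v: summary[v][0])


# Metrics pooled from two splits for three values of a penalty hyperparameter.
results = [(0.01, 0.62), (0.01, 0.66), (0.1, 0.70), (0.1, 0.74), (1.0, 0.68), (1.0, 0.64)]
summary = summarize_hp_performance(results)
print(summary)
print(best_hp(summary))
```

Combining results from several splits before taking the mean and standard deviation is what lets a plot show both the typical performance of each hyperparameter value and its variability.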

Package Data

datasets

otu_small
Small OTU abundance dataset
otu_mini_bin
Mini OTU abundance dataset
otu_mini_multi
Mini OTU abundance dataset with 3 categorical variables
otu_mini_multi_group
Groups for otu_mini_multi
otu_data_preproc
Mini OTU abundance dataset - preprocessed

ML results

otu_mini_bin_results_glmnet
Results from running the pipeline with L2-regularized logistic regression on otu_mini_bin with feature importance and grouping
otu_mini_bin_results_rf
Results from running the pipeline with random forest on otu_mini_bin
otu_mini_bin_results_rpart2
Results from running the pipeline with rpart2 on otu_mini_bin
otu_mini_bin_results_svmRadial
Results from running the pipeline with svmRadial on otu_mini_bin
otu_mini_bin_results_xgbTree
Results from running the pipeline with xgbTree on otu_mini_bin
otu_mini_cont_results_glmnet
Results from running the pipeline with glmnet on otu_mini_bin with Otu00001 as the outcome
otu_mini_cont_results_nocv
Results from running the pipeline with glmnet on otu_mini_bin with Otu00001 as the outcome column, using a custom train control scheme that does not perform cross-validation
otu_mini_multi_results_glmnet
Results from running the pipeline with glmnet on otu_mini_multi for multiclass outcomes

misc

otu_mini_cv
Cross-validation on train_data_mini with grouped features.
replace_spaces()
Replace spaces in all elements of a character vector with underscores
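The behaviour described for replace_spaces() amounts to a simple element-wise substitution. Below is a Python equivalent. The function name `underscore_spaces` is hypothetical and is not part of mikropml.

```python
def underscore_spaces(values):
    """Replace every space in each string with an underscore."""
    return [v.replace(" ", "_") for v in values]


print(underscore_spaces(["big tree", "no_space", "a b c"]))
```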

Pipeline customization

Customize various steps of the pipeline beyond the arguments provided by run_ml() and preprocess_data().

remove_singleton_columns()
Remove columns appearing in only threshold row(s) or fewer.
get_caret_processed_df()
Get preprocessed dataframe for continuous variables
randomize_feature_order()
Randomize feature order to eliminate any position-dependent effects
get_partition_indices()
Select indices to partition the data into training & testing sets.
get_outcome_type()
Get outcome type.
get_hyperparams_list()
Set hyperparameters based on ML method and dataset characteristics
get_tuning_grid()
Generate the tuning grid for tuning hyperparameters
define_cv()
Define cross-validation scheme and training parameters
get_perf_metric_name()
Get default performance metric name
get_perf_metric_fn()
Get default performance metric function
train_model()
Train model using caret::train().
calc_perf_metrics()
Get performance metrics for test data
group_correlated_features()
Group correlated features
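Two of the customization steps above, dropping near-empty columns and grouping correlated features, can be sketched in Python. Several details are assumptions rather than the package's documented behaviour:

- A column "appears" in a row when its value there is nonzero.
- Features are grouped as connected components of the graph linking pairs whose absolute Pearson correlation reaches the cutoff.
- Group members are joined with `|`.

The function names are hypothetical.

```python
import math


def drop_singleton_columns(columns, threshold=1):
    """Drop columns with nonzero values in `threshold` or fewer rows.

    `columns` maps column name -> list of numbers (an assumed layout).
    Returns (kept_columns, removed_names).
    """
    kept, removed = {}, []
    for name, values in columns.items():
        if sum(1 for v in values if v != 0) > threshold:
            kept[name] = values
        else:
            removed.append(name)
    return kept, removed


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def group_correlated(columns, cor_cutoff=1.0, tol=1e-9):
    """Group features linked by |r| >= cor_cutoff, using union-find.

    `tol` absorbs floating-point error, so perfectly correlated features
    whose computed r is 0.9999999999999998 still meet a cutoff of 1.0.
    """
    names = list(columns)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= cor_cutoff - tol:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return ["|".join(g) for g in groups.values()]


data = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2], "d": [0, 0, 5, 0]}
kept, removed = drop_singleton_columns(data)
print(removed)                  # "d" is nonzero in only one row
print(group_correlated(kept))   # "a" and "b" are perfectly correlated
```

Using connected components means correlation is treated transitively. If a~b and b~c meet the cutoff, all three share one group even when a and c fall below it. At the default cutoff of 1.0 this only merges exact linear duplicates.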