
Function reference
-
mikropml
mikropml-package
- mikropml: User-Friendly R Package for Robust Machine Learning Pipelines
-
preprocess_data()
- Preprocess data prior to running machine learning
-
run_ml()
- Run the machine learning pipeline
-
get_feature_importance()
- Get feature importance using the permutation method
-
get_performance_tbl()
- Get model performance metrics as a one-row tibble
-
calc_model_sensspec()
calc_mean_roc()
calc_mean_prc()
- Calculate and summarize performance for ROC and PRC plots
-
calc_mean_perf()
- Generic function to calculate mean performance curves for multiple models
-
calc_baseline_precision()
- Calculate the fraction of positives, i.e. baseline precision for a PRC curve
-
calc_balanced_precision()
- Calculate balanced precision given actual and baseline precision
-
compare_models()
- Perform permutation tests to compare the performance metric across all pairs of a group variable.
-
permute_p_value()
- Calculate a permuted p-value comparing two models
-
bootstrap_performance()
- Calculate a bootstrap confidence interval for the performance on a single train/test split
-
plot_mean_roc()
plot_mean_prc()
- Plot ROC and PRC curves
-
plot_hp_performance()
- Plot hyperparameter performance metrics
-
plot_model_performance()
- Plot performance metrics for multiple ML runs with different parameters
-
tidy_perf_data()
- Tidy the performance dataframe
-
get_hp_performance()
- Get hyperparameter performance metrics
-
combine_hp_performance()
- Combine hyperparameter performance metrics for multiple train/test splits
-
otu_small
- Small OTU abundance dataset
-
otu_mini_bin
- Mini OTU abundance dataset
-
otu_mini_multi
- Mini OTU abundance dataset with 3 categorical variables
-
otu_mini_multi_group
- Groups for otu_mini_multi
-
otu_data_preproc
- Mini OTU abundance dataset - preprocessed
-
otu_mini_bin_results_glmnet
- Results from running the pipeline with L2 logistic regression on otu_mini_bin with feature importance and grouping
-
otu_mini_bin_results_rf
- Results from running the pipeline with random forest on otu_mini_bin
-
otu_mini_bin_results_rpart2
- Results from running the pipeline with rpart2 on otu_mini_bin
-
otu_mini_bin_results_svmRadial
- Results from running the pipeline with svmRadial on otu_mini_bin
-
otu_mini_bin_results_xgbTree
- Results from running the pipeline with xgbTree on otu_mini_bin
-
otu_mini_cont_results_glmnet
- Results from running the pipeline with glmnet on otu_mini_bin with Otu00001 as the outcome
-
otu_mini_cont_results_nocv
- Results from running the pipeline with glmnet on otu_mini_bin with Otu00001 as the outcome column, using a custom train control scheme that does not perform cross-validation
-
otu_mini_multi_results_glmnet
- Results from running the pipeline with glmnet on otu_mini_multi for multiclass outcomes
-
otu_mini_cv
- Cross-validation on train_data_mini with grouped features.
-
replace_spaces()
- Replace spaces in all elements of a character vector with underscores
Pipeline customization
Customize various steps of the pipeline beyond the arguments provided by run_ml() and preprocess_data().
-
remove_singleton_columns()
- Remove columns appearing in only threshold row(s) or fewer.
-
get_caret_processed_df()
- Get preprocessed dataframe for continuous variables
-
randomize_feature_order()
- Randomize feature order to eliminate any position-dependent effects
-
get_partition_indices()
- Select indices to partition the data into training & testing sets.
-
get_outcome_type()
- Get outcome type.
-
get_hyperparams_list()
- Set hyperparameters based on ML method and dataset characteristics
-
get_tuning_grid()
- Generate the tuning grid for tuning hyperparameters
-
define_cv()
- Define cross-validation scheme and training parameters
-
get_perf_metric_name()
- Get default performance metric name
-
get_perf_metric_fn()
- Get default performance metric function
-
train_model()
- Train model using caret::train().
-
calc_perf_metrics()
- Get performance metrics for test data
-
group_correlated_features()
- Group correlated features
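
As a rough illustration of how the two main entry points in this reference, preprocess_data() and run_ml(), fit together, here is a minimal sketch. It assumes that the bundled otu_mini_bin dataset uses "dx" as its outcome column and that preprocess_data() returns a list with a dat_transformed element. The kfold, cv_times, and seed values are arbitrary choices to keep the run short, not recommended settings.

```r
# Minimal sketch: preprocess a bundled dataset, then train and evaluate a model.
library(mikropml)

# otu_mini_bin ships with the package; "dx" is assumed to be its outcome column.
preproc <- preprocess_data(dataset = otu_mini_bin, outcome_colname = "dx")

# kfold, cv_times, and seed are illustrative values chosen to keep the run short.
results <- run_ml(
  dataset = preproc$dat_transformed,
  method = "glmnet",
  outcome_colname = "dx",
  kfold = 2,
  cv_times = 5,
  seed = 2019
)

# Performance metrics for this single train/test split.
print(results$performance)
```

The functions listed under Pipeline customization cover the individual steps that run_ml() performs internally, so they are mainly of interest when the arguments shown here are not flexible enough.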