
Select indices to partition the data into training & testing sets.
Source: R/partition.R
Use this function to get the row indices for the training set.
Usage
get_partition_indices(
  outcomes,
  training_frac = 0.8,
  groups = NULL,
  group_partitions = NULL
)
Arguments
- outcomes
Vector of outcomes.
- training_frac
Fraction of data for training set (default: 0.8). Rows from the dataset will be randomly selected for the training set, and all remaining rows will be used in the testing set. Alternatively, if you provide a vector of integers, these will be used as the row indices for the training set. All remaining rows will be used in the testing set.
- groups
Vector of groups to keep together when splitting the data into train and test sets. If the number of groups in the training set is larger than kfold, the groups will also be kept together for cross-validation. Length matches the number of rows in the dataset (default: NULL).
- group_partitions
Specify how to assign groups to the training and testing partitions (default: NULL). If groups specifies that some samples belong to group "A" and some belong to group "B", then setting group_partitions = list(train = c("A", "B"), test = c("B")) will result in all samples from group "A" being placed in the training set, some samples from "B" also in the training set, and the remaining samples from "B" in the testing set. The partition sizes will be as close to training_frac as possible. If the number of groups in the training set is larger than kfold, the groups will also be kept together for cross-validation.
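To make the idea of keeping groups together concrete, here is a minimal base R sketch of a grouped split. It is an illustration only, not the package's create_grouped_data_partition(): the helper name sketch_grouped_partition and its group-selection strategy are assumptions made for this example, and it does not model group_partitions.

```r
# Illustrative sketch only; NOT the package's create_grouped_data_partition().
# `sketch_grouped_partition` is a hypothetical helper invented for this example.
sketch_grouped_partition <- function(groups, training_frac = 0.8) {
  group_sizes <- table(groups)
  # Visit the groups in random order.
  shuffled <- sample(names(group_sizes))
  # Fraction of all rows covered after adding each group in turn.
  cum_frac <- cumsum(as.vector(group_sizes[shuffled])) / length(groups)
  # Keep the number of whole groups that lands closest to the target fraction.
  n_keep <- which.min(abs(cum_frac - training_frac))
  train_groups <- shuffled[seq_len(n_keep)]
  # Return the row indices whose group was assigned to the training set.
  which(groups %in% train_groups)
}

set.seed(2019)
groups <- rep(c("A", "B", "C", "D", "E"), each = 4)
train_idx <- sketch_grouped_partition(groups, training_frac = 0.8)
test_idx <- setdiff(seq_along(groups), train_idx)

# Groups are never split: no group label appears on both sides.
shared_groups <- intersect(unique(groups[train_idx]), unique(groups[test_idx]))
length(shared_groups)
```

Because whole groups are assigned at once, the achieved training fraction can only approximate training_frac when group sizes do not divide the data evenly.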
Details
If groups is NULL, uses createDataPartition.
Otherwise, uses create_grouped_data_partition().
Set the seed prior to calling this function if you would like your data partitions to be reproducible (recommended).
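The effect of setting the seed can be sketched in base R as follows. The helper random_training_indices is a hypothetical stand-in for a random split, not the documented function, and the toy data frame is made up for illustration. The same pattern (call set.seed(), then request the training indices, then subset the rows) should carry over to get_partition_indices().

```r
# Hypothetical stand-in for a random split; not the documented function itself.
random_training_indices <- function(n_rows, training_frac = 0.8) {
  sort(sample(seq_len(n_rows), size = round(training_frac * n_rows)))
}

# Toy dataset, made up for this example.
dat <- data.frame(outcome = rep(c("case", "control"), 10), x = 1:20)

set.seed(42)
first <- random_training_indices(nrow(dat))
set.seed(42)
second <- random_training_indices(nrow(dat))

# Same seed, so the two draws select the same rows.
identical(first, second)

# Training rows are the selected indices; all remaining rows form the testing set.
train_dat <- dat[first, ]
test_dat <- dat[-first, ]
```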