Enables comparing analyses at different taxonomic resolutions, as seen in doi:10.1128/mbio.03161-21. Implementation adapted from here.
Arguments
- otu_shared_dat
data frame created from a shared file at the OTU level.
- otu_tax_dat
data frame created from a taxonomy file at the OTU level. Must be from the same dataset as the shared file.
- taxon_level
taxonomic level to pool OTUs into, given as a character string that names a column in otu_tax_dat. Options: "kingdom", "phylum", "class", "order", "family", "genus".
Value
a list with two elements: shared, a shared data frame with the OTUs pooled at the specified taxon_level, and tax, a corresponding taxonomy data frame with new OTU numbers.
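As a rough illustration of what pooling means, the sketch below sums the counts of all OTUs that share a taxon, sample by sample, using only base R. It is not the package's implementation: the toy `shared` and `tax` data frames and their values are hypothetical, and the result is a plain matrix rather than the list that pool_taxon_counts() returns.

```r
# Hypothetical toy inputs (not the package's test data).
shared <- data.frame(
  label = "0.03",
  Group = c("s1", "s2"),
  numOtus = 3,
  Otu001 = c(1, 0),
  Otu002 = c(2, 3),
  Otu003 = c(4, 5)
)
tax <- data.frame(
  otu = c("Otu001", "Otu002", "Otu003"),
  genus = c("Bacteroides", "Bacteroides", "Acinetobacter")
)

# Arrange counts with one row per OTU and one column per sample.
counts <- t(as.matrix(shared[, tax$otu]))
colnames(counts) <- shared$Group

# Sum the rows (OTUs) that belong to the same genus.
pooled <- rowsum(counts, group = tax$genus)
pooled
```

Here the two Bacteroides OTUs collapse into a single row, while Acinetobacter keeps its original counts.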
Examples
tax_dat <- read_tax(system.file("extdata", "test.taxonomy",
package = "schtools"
))
shared_dat <- readr::read_tsv(system.file("extdata", "test.shared",
package = "schtools"
))
#> Rows: 10 Columns: 15
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (1): Group
#> dbl (14): label, numOtus, Otu0001, Otu0003, Otu0004, Otu00008, Otu0044, Otu0...
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pool_taxon_counts(shared_dat, tax_dat, "genus")
#> $shared
#> # A tibble: 10 × 13
#> label Group numOtus Otu01 Otu02 Otu03 Otu04 Otu05 Otu06 Otu07 Otu08 Otu09
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 genus p1 10 0 0 0 2 0 1 0 1 0
#> 2 genus p10 10 1 0 1 0 1 0 1 1 1
#> 3 genus p2 10 1 1 0 1 0 1 0 0 1
#> 4 genus p3 10 0 1 0 1 0 0 1 0 1
#> 5 genus p4 10 1 1 1 0 0 0 0 0 0
#> 6 genus p5 10 1 1 0 1 0 0 0 0 1
#> 7 genus p6 10 1 0 1 1 1 1 0 0 1
#> 8 genus p7 10 0 0 0 1 1 0 1 0 1
#> 9 genus p8 10 0 1 1 2 0 0 1 1 0
#> 10 genus p9 10 0 1 1 2 0 0 1 1 1
#> # ℹ 1 more variable: Otu10 <dbl>
#>
#> $tax
#> # A tibble: 10 × 3
#> otu size genus
#> <chr> <dbl> <chr>
#> 1 Otu01 5 Bacteroides
#> 2 Otu02 6 Porphyromonadaceae unclassified
#> 3 Otu03 5 Enterobacteriaceae unclassified
#> 4 Otu04 11 Bacteria unclassified
#> 5 Otu05 3 Acinetobacter
#> 6 Otu06 3 Clostridium XlVa
#> 7 Otu07 5 Betaproteobacteria unclassified
#> 8 Otu08 4 Clostridium XVIII
#> 9 Otu09 7 Candidatus Saccharibacteria unclassified
#> 10 Otu10 5 Clostridiales Incertae Sedis XIII unclassified
#>
pool_taxon_counts(shared_dat, tax_dat, "family")
#> $shared
#> # A tibble: 10 × 13
#> label Group numOtus Otu01 Otu02 Otu03 Otu04 Otu05 Otu06 Otu07 Otu08 Otu09
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 family p1 10 0 0 0 2 0 1 0 1 0
#> 2 family p10 10 1 0 1 0 1 0 1 1 1
#> 3 family p2 10 1 1 0 1 0 1 0 0 1
#> 4 family p3 10 0 1 0 1 0 0 1 0 1
#> 5 family p4 10 1 1 1 0 0 0 0 0 0
#> 6 family p5 10 1 1 0 1 0 0 0 0 1
#> 7 family p6 10 1 0 1 1 1 1 0 0 1
#> 8 family p7 10 0 0 0 1 1 0 1 0 1
#> 9 family p8 10 0 1 1 2 0 0 1 1 0
#> 10 family p9 10 0 1 1 2 0 0 1 1 1
#> # ℹ 1 more variable: Otu10 <dbl>
#>
#> $tax
#> # A tibble: 10 × 3
#> otu size family
#> <chr> <dbl> <chr>
#> 1 Otu01 5 Bacteroidaceae
#> 2 Otu02 6 Porphyromonadaceae
#> 3 Otu03 5 Enterobacteriaceae
#> 4 Otu04 11 Bacteria unclassified
#> 5 Otu05 3 Moraxellaceae
#> 6 Otu06 3 Lachnospiraceae
#> 7 Otu07 5 Betaproteobacteria unclassified
#> 8 Otu08 4 Erysipelotrichaceae
#> 9 Otu09 7 Candidatus Saccharibacteria unclassified
#> 10 Otu10 5 Clostridiales Incertae Sedis XIII
#>
pool_taxon_counts(shared_dat, tax_dat, "phylum")
#> $shared
#> # A tibble: 10 × 8
#> label Group numOtus Otu1 Otu2 Otu3 Otu4 Otu5
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 phylum p1 5 0 0 2 2 0
#> 2 phylum p10 5 1 3 0 1 1
#> 3 phylum p2 5 2 0 1 2 1
#> 4 phylum p3 5 1 1 1 1 1
#> 5 phylum p4 5 2 1 0 1 0
#> 6 phylum p5 5 2 0 1 0 1
#> 7 phylum p6 5 1 2 1 1 1
#> 8 phylum p7 5 0 2 1 1 1
#> 9 phylum p8 5 1 2 2 1 0
#> 10 phylum p9 5 1 2 2 2 1
#>
#> $tax
#> # A tibble: 5 × 3
#> otu size phylum
#> <chr> <dbl> <chr>
#> 1 Otu1 11 Bacteroidetes
#> 2 Otu2 13 Proteobacteria
#> 3 Otu3 11 Bacteria unclassified
#> 4 Otu4 12 Firmicutes
#> 5 Otu5 7 Candidatus Saccharibacteria
#>
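One way to sanity-check a pooled result is to compare the two list elements against each other. In the phylum-level output above, the size column of the tax element appears to equal each pooled OTU's total count across samples. The sketch below re-enters those printed values by hand (the object names `shared_phylum` and `tax_phylum` are illustrative) and checks that relationship. It is an observation about this example, not a documented guarantee of the function.

```r
# Phylum-level counts copied from the printed $shared output above.
shared_phylum <- data.frame(
  Group = c("p1", "p10", "p2", "p3", "p4", "p5", "p6", "p7", "p8", "p9"),
  Otu1 = c(0, 1, 2, 1, 2, 2, 1, 0, 1, 1),
  Otu2 = c(0, 3, 0, 1, 1, 0, 2, 2, 2, 2),
  Otu3 = c(2, 0, 1, 1, 0, 1, 1, 1, 2, 2),
  Otu4 = c(2, 1, 2, 1, 1, 0, 1, 1, 1, 2),
  Otu5 = c(0, 1, 1, 1, 0, 1, 1, 1, 0, 1)
)

# Sizes copied from the printed $tax output above.
tax_phylum <- data.frame(
  otu = paste0("Otu", 1:5),
  size = c(11, 13, 11, 12, 7)
)

# Total count per pooled OTU across all samples.
totals <- colSums(shared_phylum[, tax_phylum$otu])

# TRUE if every pooled OTU's total matches its reported size.
all(totals == tax_phylum$size)
```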