| Title: | Fetch and Explore the Cornell Lab of Ornithology Open Tree of Life Avian Phylogeny |
|---|---|
| Description: | Fetches the Cornell Lab of Ornithology Open Tree of Life (clootl) tree in a specified taxonomy. Optionally prune it to a given set of study taxa. Provide a recommended citation list for the studies that informed the extracted tree. Tree generated as described in McTavish et al. (2024) <doi:10.1101/2024.05.20.595017>. |
| Authors: | Eliot Miller [aut, cre], Emily Jane McTavish [aut], Luna L. Sanchez Reyes [ctb, aut] |
| Maintainer: | Eliot Miller <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.4 |
| Built: | 2026-05-23 09:15:20 UTC |
| Source: | https://github.com/eliotmiller/clootl |
A dataset containing taxonomy files, summary phylogenies, constituent study information, and other data needed for the package to function properly.
clootl_data
List of csv files, phylogenies, and other data components.
The data object, clootl_data, stores the most up-to-date stable version of the
tree mapped to each of the different taxonomy years, the annotations of how each
study contributed to the tree, the citation information for each study that
contributed to the tree, the taxonomy crosswalks for different years, and
some other variables.
The structure of the data store (a list) is as follows:
clootl_data$taxonomies: A list of data frames. Each element corresponds to a taxonomy year:
year2025
year2024
year2023
year2022
year2021
These originate as CSV files linking the Clements taxonomy for each of these years to OTT ids, Avibase ids, and other bird taxonomies (see README of https://github.com/McTavishLab/AvesData).
clootl_data$trees: A list with one element per tree version (currently Aves_1.6), each containing:
summary.trees: phylo objects of complete dated trees mapped to the Clements taxonomy year; currently only year2025.
These are generated from summary_dated_clements.nex
(see the README of https://github.com/McTavishLab/AvesData).
annotations: Complete annotations of the OpenTree synthetic tree for this version, used to determine appropriate subtree citations.
clootl_data$study_info: A mapping of OpenTree study IDs to full citations. Used with the annotations to generate appropriate citations for trees and subtrees.
clootl_data$versions: A character vector of all possible tree versions. To access older versions,
download the data repository using get_avesdata_repo().
clootl_data$tax_years: A character vector of all available taxonomy years. The current tree version is mapped to each of these taxonomies, along with crosswalks linking the Clements taxonomy for each year to other identifiers.
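To make the nesting above concrete, here is a minimal sketch that navigates a toy stand-in for clootl_data. The toy object, its values, and the helper get_taxonomy_by_year() are hypothetical illustrations, not part of the clootl API; the only things taken from the description are the year-prefixed element names (year2021 to year2025) and the tax_years vector. In principle the same helper could be pointed at the real object after data(clootl_data).

```r
# Toy stand-in for clootl_data: values are illustrative only
toy_data <- list(
  versions = c("1.2", "1.3", "1.4", "1.5", "1.6"),
  tax_years = c("2021", "2022", "2023", "2024", "2025"),
  taxonomies = list(
    year2025 = data.frame(
      SCI_NAME = c("Turdus migratorius", "Sitta canadensis"),
      FAMILY = c("Turdidae (Thrushes and Allies)", "Sittidae (Nuthatches)"),
      stringsAsFactors = FALSE
    )
  )
)

# Hypothetical helper: accepts numeric or character years
get_taxonomy_by_year <- function(data, taxonomy_year) {
  taxonomy_year <- as.character(taxonomy_year)
  if (!taxonomy_year %in% data$tax_years) {
    stop("taxonomy_year must be one of: ", paste(data$tax_years, collapse = ", "))
  }
  key <- paste0("year", taxonomy_year)  # elements are named year2021 ... year2025
  tax <- data$taxonomies[[key]]
  if (is.null(tax)) {
    stop("no taxonomy table stored under ", key)
  }
  tax
}

tax2025 <- get_taxonomy_by_year(toy_data, 2025)
tax2025$SCI_NAME[tax2025$FAMILY == "Sittidae (Nuthatches)"]
```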
This data object is generated using the following code:
clootl_data <- list()
clootl_data$versions <- c("1.2","1.3","1.4","1.5","1.6")
clootl_data$tax_years <- c("2021","2022","2023","2024", "2025")
clootl_data$combinations <- c(c(1.2, 2021),
c(1.2, 2022),
c(1.2, 2023),
c(1.3, 2021),
c(1.3, 2022),
c(1.3, 2023),
c(1.4, 2021),
c(1.4, 2022),
c(1.4, 2023),
c(1.5, 2021), # this combination will have every tip in AVONET
c(1.5, 2022),
c(1.5, 2023),
c(1.5, 2024),
c(1.6, 2025))
fullTree2025 <- treeGet("1.6","2025", data_path="~/projects/otapi/AvesData")
clootl_data$trees$`Aves_1.6`$summary.trees$year2025 <- fullTree2025
tax2021 <- taxonomyGet(2021, data_path="~/projects/otapi/AvesData")
tax2022 <- taxonomyGet(2022, data_path="~/projects/otapi/AvesData")
tax2023 <- taxonomyGet(2023, data_path="~/projects/otapi/AvesData")
tax2024 <- taxonomyGet(2024, data_path="~/projects/otapi/AvesData")
tax2025 <- taxonomyGet(2025, data_path="~/projects/otapi/AvesData")
clootl_data$taxonomies$year2021 <- tax2021
clootl_data$taxonomies$year2022 <- tax2022
clootl_data$taxonomies$year2023 <- tax2023
clootl_data$taxonomies$year2024 <- tax2024
clootl_data$taxonomies$year2025 <- tax2025
annot_filename <- "~/projects/otapi/AvesData/Tree_versions/Aves_1.6/OpenTreeSynth/annotated_supertree/annotations.json"
all_nodes <- jsonlite::fromJSON(txt=annot_filename)
clootl_data$trees$Aves_1.6$annotations <- all_nodes
studies <- c()
for (inputs in all_nodes$source_id_map) studies <- c(studies, inputs$study_id)
studies <- unique(studies)
study_info <- clootl:::api_studies_lookup(studies)
clootl_data$study_info <- study_info
save(clootl_data, file="~/projects/otapi/clootl/data/clootl_data.rda", compress="xz")
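The combinations element appears to record which tree version and taxonomy year pairs exist. Because c() flattens nested vectors, the object built above is a single numeric vector in which each pair occupies two consecutive slots. The sketch below, a hedged illustration rather than package code, recovers the pairs as a data frame; is_valid_combination() is a hypothetical helper, not a clootl function.

```r
# c(c(1.2, 2021), ...) flattens into one numeric vector of alternating values
combinations <- c(c(1.2, 2021), c(1.2, 2022), c(1.2, 2023),
                  c(1.3, 2021), c(1.3, 2022), c(1.3, 2023),
                  c(1.4, 2021), c(1.4, 2022), c(1.4, 2023),
                  c(1.5, 2021), c(1.5, 2022), c(1.5, 2023), c(1.5, 2024),
                  c(1.6, 2025))

# Rebuild one row per (version, taxonomy year) pair
pairs <- matrix(combinations, ncol = 2, byrow = TRUE)
combo_df <- data.frame(
  version = as.character(pairs[, 1]),
  taxonomy_year = as.character(pairs[, 2]),
  stringsAsFactors = FALSE
)

# Hypothetical helper: is this version/taxonomy pair listed?
is_valid_combination <- function(version, taxonomy_year, combos = combo_df) {
  any(combos$version == as.character(version) &
        combos$taxonomy_year == as.character(taxonomy_year))
}

is_valid_combination("1.6", 2025)
```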
Source: https://github.com/eliotmiller/clootl
This function extracts one or more phylogenies in the desired taxonomy and tree version. It defaults to the pre-packaged summary trees, but can also be used to extract sets of phylogenies expressing uncertainty, once they have been downloaded from the online repository.
extractTree( species = "all_species", label_type = "scientific", taxonomy_year = 2025, version = "1.6", data_path = FALSE, force = FALSE )
species |
A character vector either of scientific names (directly as they come out of the
eBird taxonomy, i.e. without underscores) or of six-letter eBird species codes. Any elements of
the species vector that do not match a species-level taxon in the specified eBird taxonomy
will result in an error. eBird taxonomy files can be accessed using |
label_type |
Either "scientific" or "code". Default is set to "scientific". |
taxonomy_year |
The eBird taxonomy year the tree should be output in. Current options are 2021-2025 (see clootl_data$tax_years). Both numeric and character inputs are acceptable here. Any value aside from these years will result in an error. Default is 2025, the most recent year. |
version |
The desired version of the tree. Defaults to the most recent version of the tree. Other available versions are listed in clootl_data$versions and can be passed as a character string or as numeric. |
data_path |
Defaults to FALSE. If a summary, dated tree is desired, this is sufficient
and does not need to be modified. However, if a user wishes to extract a set of complete
dated trees, for example to iterate an analysis across a cloud of trees, or to use an
older version of the tree than the current one packaged in the data object, this function
can also accept a path to the downloaded set of trees. If you have already downloaded the AvesData repo
available at https://github.com/McTavishLab/AvesData, set data_path to the download location.
Alternately, you can download the full data repo using |
force |
Defaults to FALSE. If FALSE, a tree will be returned only if every requested species has an exact match in the tree. This function first ensures that the requested output species overlap with species-level taxa in the requested eBird taxonomy; if they do not, the function will error out. The onus is on the user to ensure the requested taxa are valid. This is critical to avoid unexpected problems later in an analysis: you don't want to find out many steps later that your dataset doesn't match your phylogeny. If force=TRUE, a tree will be returned for the species that do match, even if not every taxon in the requested species list has a match. |
One or more phylogenies of the specified taxa in the specified eBird taxonomy version and clootl tree version.
Eliot Miller, Luna Sanchez Reyes, Emily Jane McTavish
ex1 <- extractTree(species=c("amerob", "canwar", "reevir1", "yerwar", "gockin"), label_type="code")
ex2 <- extractTree(species=c("Turdus migratorius", "Setophaga dominica", "Setophaga ruticilla", "Sitta canadensis"), label_type="scientific", taxonomy_year="2025", version="1.6")
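The validation rules described for species and force can be sketched in base R as follows. check_species() is a hypothetical helper, not clootl's internal code, and the inputs are toy data: one species is deliberately left out of the toy tip labels so the force behavior can be exercised. The sketch assumes requested names are already in the same form as the tree's tip labels; the real function may convert between spaces and underscores.

```r
# Toy inputs (hypothetical): names in the requested taxonomy, and the tree's tips
taxonomy_names <- c("Turdus migratorius", "Setophaga dominica",
                    "Setophaga ruticilla", "Sitta canadensis")
tree_tips <- c("Turdus migratorius", "Setophaga ruticilla", "Sitta canadensis")

check_species <- function(species, taxonomy_names, tree_tips, force = FALSE) {
  # Step 1: every requested name must be a taxon in the requested taxonomy
  not_in_taxonomy <- setdiff(species, taxonomy_names)
  if (length(not_in_taxonomy) > 0) {
    stop("not in the requested taxonomy: ", paste(not_in_taxonomy, collapse = ", "))
  }
  # Step 2: unless force = TRUE, every requested name must also be a tip
  not_in_tree <- setdiff(species, tree_tips)
  if (length(not_in_tree) > 0 && !force) {
    stop("not found in the tree: ", paste(not_in_tree, collapse = ", "))
  }
  # With force = TRUE, keep only the species that do match
  intersect(species, tree_tips)
}

check_species(c("Turdus migratorius", "Sitta canadensis"), taxonomy_names, tree_tips)
check_species(c("Setophaga dominica", "Sitta canadensis"), taxonomy_names, tree_tips,
              force = TRUE)
```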
Pull down full AvesData repository to a working directory
get_avesdata_repo(path, overwrite = FALSE)
path |
Path to download data zipfile to, and where it will be unpacked. To download into your working directory, use "." |
overwrite |
Defaults to FALSE. |
Downloads the full data repo from https://github.com/McTavishLab/AvesData.
This data is required to use sampleTrees() to sample from the distribution of dated trees,
or to access earlier versions of the complete tree.
This function will download the data and set the environment variable AVESDATA_PATH to the location of the data download.
When AVESDATA_PATH is set, the data_path argument of any clootl function that has one will default to this value.
To manually set AVESDATA_PATH to the location of your downloaded AvesData repo, use set_avesdata_repo_path().
No return value. This function is used to download the Aves Data repository.
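The data_path defaulting rule described above can be sketched as follows. resolve_data_path() is a hypothetical helper that only reproduces the documented precedence (an explicit path wins; otherwise AVESDATA_PATH is used if set; otherwise FALSE, meaning the packaged data object); it is not clootl's actual code. The path used in the demonstration is a placeholder.

```r
resolve_data_path <- function(data_path = FALSE) {
  # An explicitly supplied path always takes precedence
  if (!isFALSE(data_path)) {
    return(data_path)
  }
  # Sys.getenv() returns "" when the variable is unset
  env_path <- Sys.getenv("AVESDATA_PATH")
  if (nzchar(env_path)) env_path else FALSE
}

# Demonstration with a placeholder path; the original setting is restored afterwards
old_value <- Sys.getenv("AVESDATA_PATH")
Sys.setenv(AVESDATA_PATH = "/path/to/AvesData")
resolve_data_path()          # "/path/to/AvesData"
resolve_data_path("/other")  # "/other"
if (nzchar(old_value)) Sys.setenv(AVESDATA_PATH = old_value) else Sys.unsetenv("AVESDATA_PATH")
```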
Get path to Aves Data folder, if set.
get_avesdata_repo_path()
Based on https://github.com/CornellLabofOrnithology/auk/blob/main/R/auk-set-ebd-path.r
Use this function to check the stored path to a downloaded AvesData folder from https://github.com/McTavishLab/AvesData.
When AVESDATA_PATH is set, the data_path argument of any clootl function that has one will default to this value.
A string: the path to the AvesData folder, if set. Returns "" if not set.
## Not run:
get_avesdata_repo_path()
## End(Not run)
Quantify the contribution of studies informing an extracted tree, and obtain DOI and citation information for those studies.
getCitations(tree, version = 1.6, data_path = FALSE)
tree |
A phylogeny obtained from extractTree (see details). |
version |
The version of the tree used in extractTree. Defaults to the most recent version of the tree, and can be passed as a character string or as numeric. If a different version was used to create the tree, this function may fail or give incomplete or incorrect citation information. |
data_path |
Defaults to FALSE. If you are gathering citations for an
older version of the tree than the current one packaged in the data object, you will have
already downloaded the data repo in order to generate that tree.
The data is available at https://github.com/McTavishLab/AvesData.
If you have manually downloaded the repo, set data_path to the download location.
Alternately, you can download the full data repo using |
The function will determine what proportion of nodes in your phylogeny are supported by each study that goes into creating the final clootl tree. We use 'supported by' in the sense described in Redelings and Holder, PeerJ (2017) https://peerj.com/articles/3058/, and as shown in the tree.opentreeoflife.org tree viewer. We normalize these values to a percentage of internal nodes in the target tree supported by each study. In any resulting publication, please cite the synthetic tree (McTavish et al. 2025), clootl (Miller et al. 2025), and "all" the trees/DOIs that contributed to your phylogeny. That said, we are well aware of the citation and word-count limits that plague modern publishing, and for this reason we quantify the contribution of each study; depending on your phylogeny, it is quite possible that one or two studies contributed the majority of the information. This function relies on the phylogenetic synthesis information directly, and is agnostic to taxonomy version.
A data frame giving the percentage of internal nodes supported by each study, along with the DOI of that study. The proportion of taxa in the tree supported by taxonomic addition only is also included in the data frame.
Eliot Miller, Emily Jane McTavish
library(clootl)
# pull the taxonomy file out
data(clootl_data)
tax <- clootl_data$taxonomies$year2025
names(tax)
# simulate extracting a tree for a particular family
temp <- tax[tax$FAMILY == "Rhinocryptidae (Tapaculos)", ]
spp <- temp$SCI_NAME
# get your tree
prunedTree <- extractTree(species=spp, label_type="scientific", taxonomy_year=2025, version="1.6")
# get your citation data frame
yourCitations <- getCitations(tree=prunedTree)
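A minimal sketch of the normalization described in the details, assuming you already know which studies support each internal node of your tree. The node-to-study mapping and the study IDs are toy values, and percent_supported() is a hypothetical helper, not the getCitations() implementation, which derives support from the synthesis annotations. Because a node can be supported by several studies, the percentages need not sum to 100.

```r
# Toy input: for each internal node, the IDs of studies supporting it
# (hypothetical IDs; character(0) means no study supports the node)
node_support <- list(
  node1 = c("study_A", "study_B"),
  node2 = "study_A",
  node3 = character(0),
  node4 = c("study_B", "study_C")
)

# Hypothetical helper: percent of internal nodes supported by each study
percent_supported <- function(node_support) {
  n_nodes <- length(node_support)
  # unique() so a study is counted at most once per node
  counts <- table(unlist(lapply(node_support, unique)))
  out <- data.frame(
    study_id = names(counts),
    percent = 100 * as.numeric(counts) / n_nodes,
    stringsAsFactors = FALSE
  )
  # Largest contributors first; ties broken alphabetically
  out <- out[order(-out$percent, out$study_id), ]
  rownames(out) <- NULL
  out
}

percent_supported(node_support)
```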
Extract a cloud of trees from the complete Avian Phylogeny for a set of species
sampleTrees( species = "all_species", label_type = "scientific", taxonomy_year = "2025", version = "1.6", count = 100, data_path = FALSE )
species |
A character vector either of scientific names (directly as they come out of the eBird taxonomy, i.e. without underscores) or of six-letter eBird species codes. Any elements of the species vector that do not match a species-level taxon in the specified eBird taxonomy will result in an error. Default is set to "all_species". |
label_type |
Either "scientific" or "code". Default is set to "scientific". |
taxonomy_year |
The eBird taxonomy year the tree should be output in. Current options are 2021-2025. Both numeric and character inputs are acceptable here. Any value aside from these years will result in an error. Default is "2025". |
version |
The desired version of the tree. Defaults to the most recent version of the tree. Other available versions are listed in the data object as clootl_data$versions. The version can be passed as a character string or as numeric. |
count |
The desired number of sampled trees. This is a work in progress: currently only 100 trees can be sampled. |
data_path |
Defaults to FALSE. data_path is not necessary if the path has been stored in the environment, as occurs when the data was downloaded using get_avesdata_repo(). Otherwise, pass in the path to the directory where the AvesData repo was downloaded. |
This function first ensures that the requested output species overlap with species-level taxa in the requested eBird taxonomy. If they do not, the function will error out. The onus is on the user to ensure the requested taxa are valid. This is critical to avoid unexpected problems later in an analysis: you don't want to find out many steps later that your dataset doesn't match your phylogeny. The eBird database currently (as of Mar 2025) uses the 2024 taxonomy. Trees in the 2024 taxonomy will be available by June 2025. The 2025 taxonomy will be released to the public in October or November 2025. The intention is to release a tree in the 2025 taxonomy concurrently with the publication of the taxonomy itself.
A set of count phylogenies of the specified taxa, in the specified eBird taxonomy version and clootl
tree version.
Eliot Miller, Luna Sanchez Reyes, Emily Jane McTavish
if (Sys.getenv("AVESDATA_PATH") != "") {
  ex2 <- sampleTrees(species=c("Turdus migratorius", "Setophaga dominica", "Setophaga ruticilla", "Sitta canadensis"))
}
Set the path to an AvesData folder that is already on your computer
set_avesdata_repo_path(path, overwrite = FALSE, warn = TRUE)
path |
A character vector with the path to the Aves Data folder |
overwrite |
Boolean, defaults to FALSE. |
warn |
Boolean, defaults to TRUE. |
Based on https://github.com/CornellLabofOrnithology/auk/blob/main/R/auk-set-ebd-path.r
Use this function to manually set or update the location of a downloaded AvesData folder from https://github.com/McTavishLab/AvesData.
When AVESDATA_PATH is set, the data_path argument of any clootl function that has one will default to this value.
No return value, called to set the path to the Aves Data folder.
## Not run:
set_avesdata_repo_path("/home/ejmctavish/AvesData")
## End(Not run)
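For intuition about the overwrite behavior, here is a session-only sketch built on base R's Sys.getenv() and Sys.setenv(). set_path_sketch() is hypothetical: it does not model the warn argument, it assumes the path must be an existing directory, and it does not persist the setting across R sessions (the auk function this one is modeled on writes to .Renviron; whether clootl does the same is not stated here).

```r
set_path_sketch <- function(path, overwrite = FALSE) {
  # Assumption for this sketch: the folder must already exist
  if (!dir.exists(path)) {
    stop("directory does not exist: ", path)
  }
  current <- Sys.getenv("AVESDATA_PATH")
  # Refuse to replace an existing value unless overwrite = TRUE
  if (nzchar(current) && !overwrite) {
    stop("AVESDATA_PATH is already set to ", current,
         "; use overwrite = TRUE to replace it")
  }
  # Session-only: nothing is written to .Renviron
  Sys.setenv(AVESDATA_PATH = normalizePath(path))
  invisible(Sys.getenv("AVESDATA_PATH"))
}
```

For example, set_path_sketch(tempdir(), overwrite = TRUE) points the variable at R's temporary directory for the rest of the session.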
taxonomyGet either reads a taxonomy file and loads it
as a data frame, or loads the default taxonomy data object.
taxonomyGet(taxonomy_year, data_path = FALSE, from_file = FALSE)
taxonomy_year |
The eBird taxonomy year to load. Current options are 2021-2025. Both numeric and character inputs are acceptable here. Any value aside from these years will result in an error. This argument has no default and must be supplied. |
data_path |
Defaults to FALSE. If a summary, dated tree is desired, this is sufficient
and does not need to be modified. However, if a user wishes to extract a set of complete
dated trees, for example to iterate an analysis across a cloud of trees, or to use an
older version of the tree than the current one packaged in the data object, this function
can also accept a path to the downloaded set of trees. If you have already downloaded the AvesData repo
available at https://github.com/McTavishLab/AvesData, set data_path to the download location.
Alternately, you can download the full data repo using |
from_file |
Defaults to FALSE. If TRUE, forces taxonomyGet to use a local copy of the taxonomy. This is useful for testing changes and/or updating the clootl_data object. |
This will return a data object that has the taxonomy of the requested year.
A data.frame with 17 columns of taxonomic information: order, species code, taxon concept, common name, scientific name, family, OpenTree Taxonomy data, etc.
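A minimal sketch of the "reads a taxonomy file and loads it as a data frame" path, using base R's read.csv(). read_taxonomy_csv() and the taxonomy_<year>.csv file name are hypothetical; the actual file names and layout inside AvesData are not documented here. The demonstration writes a two-row toy file to a temporary folder and then filters it by family, as in the getCitations() example.

```r
read_taxonomy_csv <- function(taxonomy_year, folder) {
  taxonomy_year <- as.character(taxonomy_year)
  allowed <- c("2021", "2022", "2023", "2024", "2025")
  if (!taxonomy_year %in% allowed) {
    stop("taxonomy_year must be one of: ", paste(allowed, collapse = ", "))
  }
  # Assumed file-naming scheme, for illustration only
  file <- file.path(folder, paste0("taxonomy_", taxonomy_year, ".csv"))
  if (!file.exists(file)) stop("no taxonomy file at ", file)
  read.csv(file, stringsAsFactors = FALSE)
}

# Demonstration with a toy two-row file in a temporary folder
demo_dir <- tempfile("taxonomy_demo_")
dir.create(demo_dir)
write.csv(
  data.frame(SCI_NAME = c("Turdus migratorius", "Sitta canadensis"),
             FAMILY = c("Turdidae (Thrushes and Allies)", "Sittidae (Nuthatches)")),
  file.path(demo_dir, "taxonomy_2025.csv"),
  row.names = FALSE
)
tax <- read_taxonomy_csv(2025, demo_dir)
tax$SCI_NAME[tax$FAMILY == "Sittidae (Nuthatches)"]
```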