Get scientific, peer-reviewed information on time of lineage divergence openly available for a given set of taxon names
Source: R/datelife_search.R
datelife_search is the core DateLife function to find and get all openly available, peer-reviewed scientific information on time of lineage divergence for a set of input taxon names given as a character vector, a newick character string, a phylo or multiPhylo object, or an already processed datelifeQuery object obtained with make_datelife_query().
Usage
datelife_search(
  input = c("Rhea americana", "Pterocnemia pennata", "Struthio camelus"),
  use_tnrs = FALSE,
  get_spp_from_taxon = FALSE,
  partial = TRUE,
  cache = "opentree_chronograms",
  summary_format = "phylo_all",
  na_rm = FALSE,
  summary_print = c("citations", "taxa"),
  taxon_summary = c("none", "summary", "matrix"),
  criterion = "taxa"
)
Arguments
- input
One of the following:
- A character vector
With taxon names as a single comma-separated string or concatenated with c().
- A phylogenetic tree with taxon names as tip labels
As a phylo or multiPhylo object, OR as a newick character string.
- A datelifeQuery object
An output from make_datelife_query().
- use_tnrs
Whether to use Open Tree of Life's Taxonomic Name Resolution Service (TNRS) to process input taxon names. Defaults to FALSE, as shown under Usage. If TRUE, it corrects misspellings and taxonomic name variations with tnrs_match(), a wrapper of rotl::tnrs_match_names().
- get_spp_from_taxon
Whether to search ages for all species belonging to a given taxon or not. Defaults to FALSE. If TRUE, it must have the same length as input. If input is a newick string with some clades, it will be converted to a phylo object, and the order of get_spp_from_taxon will match phy$tip.label.
- partial
Whether to return or exclude partially matching source chronograms, i.e., those that match some but not all of the taxa given in datelife_query. Options are TRUE or FALSE. Defaults to TRUE: return all matching source chronograms.
- cache
A character vector of length one, with the name of the data object to cache. Defaults to "opentree_chronograms", a data object storing Open Tree of Life's database chronograms and other associated information.
- summary_format
A character vector of length one, indicating the output format for results of the DateLife search. Available output formats are:
- "citations"
A character vector of references where chronograms with some or all of the target taxa are published (source chronograms).
- "mrca"
A named numeric vector of most recent common ancestor (mrca) ages of target taxa defined in input, obtained from the source chronograms. Names of mrca vector are equal to citations.
- "newick_all"
A named character vector of newick strings corresponding to target chronograms derived from source chronograms. Names of newick_all vector are equal to citations.
- "newick_sdm"
Only if multiple source chronograms are available. A character vector with a single newick string corresponding to a target chronogram obtained with SDM supertree method (Criscuolo et al. 2006).
- "newick_median"
Only if multiple source chronograms are available. A character vector with a single newick string corresponding to a target chronogram from the median of all source chronograms.
- "phylo_sdm"
Only if multiple source chronograms are available. A phylo object with a single target chronogram obtained with SDM supertree method (Criscuolo et al. 2006).
- "phylo_median"
Only if multiple source chronograms are available. A phylo object with a single target chronogram obtained from source chronograms with median method.
- "phylo_all"
A named list of phylo objects corresponding to each target chronogram obtained from available source chronograms. Names of phylo_all list correspond to citations.
- "phylo_biggest"
The chronogram with the most taxa. In the case of a tie, the chronogram with clade age closest to the median age of the equally large trees is returned.
- "html"
A character vector with an html string that can be saved and then opened in any web browser. It contains a 4 column table with data on target taxa: mrca, number of taxa, citations of source chronogram and newick target chronogram.
- "data_frame"
A 4 column data.frame with data on target taxa: mrca, number of taxa, citations of source chronograms and newick string.
- na_rm
If TRUE, it drops rows containing NAs from the datelifeResult patristic matrix; if FALSE, it returns NA where there are missing entries.
- summary_print
A character vector specifying the type of summary information to be printed to screen. Options are:
- "citations"
Prints references of chronograms where target taxa are found.
- "taxa"
Prints a summary of the number of chronograms where each target taxon is found.
- "none"
Nothing is printed to screen.
Defaults to c("citations", "taxa"), which displays both.
- taxon_summary
A character vector specifying if data on target taxa missing in source chronograms should be added to the output as a "summary" or as a presence/absence "matrix". Defaults to "none": no information on taxon_summary is added to the output.
- criterion
Defaults to criterion = "taxa". Used for chronogram summarizing, i.e., obtaining a single summary chronogram from a group of input chronograms. Summarizing approaches that return a single summary tree from a group of phylogenetic trees require that the trees form a grove, roughly, a set of trees with sufficiently overlapping taxa; see Ané et al. (2009) doi:10.1007/s00026-009-0017-x. In rare cases, a group of trees can have multiple groves. This argument indicates whether to get the grove with the most trees (criterion = "trees") or the most taxa (criterion = "taxa").
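The choice made by criterion can be illustrated with a toy example. The sketch below is not DateLife's implementation: it assumes the groves have already been identified, and the object groves and the helper pick_grove() are hypothetical names introduced only for illustration. Each grove is represented as a list of character vectors, one per tree, holding that tree's tip labels.

```r
# Hypothetical toy data: two groves, each a list of tip-label vectors
# (one vector per tree). Not produced by DateLife.
groves <- list(
  grove_1 = list(
    c("A", "B", "C"),
    c("B", "C", "D"),
    c("A", "C", "D")
  ),
  grove_2 = list(
    c("E", "F", "G", "H", "I"),
    c("G", "H", "I", "J", "K")
  )
)

# Hypothetical helper: return the name of the grove with the most trees
# (criterion = "trees") or the most distinct taxa (criterion = "taxa").
# Ties are resolved in favour of the first grove, a choice made here
# for simplicity only.
pick_grove <- function(groves, criterion = c("taxa", "trees")) {
  criterion <- match.arg(criterion)
  score <- if (criterion == "trees") {
    vapply(groves, length, integer(1))
  } else {
    vapply(groves, function(g) length(unique(unlist(g))), integer(1))
  }
  names(groves)[which.max(score)]
}

pick_grove(groves, criterion = "trees") # "grove_1": 3 trees vs. 2
pick_grove(groves, criterion = "taxa")  # "grove_2": 7 taxa vs. 4
```

With these toy groves the two criteria disagree, which is the situation the argument is meant to resolve.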
Value
The output is determined by the argument summary_format:
- If summary_format = "citations"
The function returns a character vector of references.
- If summary_format = "mrca"
The function returns a named numeric vector of most recent common ancestor (mrca) ages.
- If summary_format = "newick_[all, sdm, or median]"
The function returns output chronograms as newick strings.
- If summary_format = "phylo_[all, sdm, median, or biggest]"
The function returns output chronograms as phylo or multiPhylo objects.
- If summary_format = "html" or "data_frame"
The function returns a 4 column table with data on mrca ages, number of taxa, references, and output chronograms as newick strings.
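Because the return type changes with summary_format, downstream code may need to check what it received. The following is a minimal base R sketch under that assumption; describe_result() is a hypothetical helper, not part of datelife, and the objects passed to it are stand-ins with invented values rather than real search results.

```r
# Hypothetical helper: describe the kind of object returned, based only
# on the types listed in the Value section above.
describe_result <- function(x) {
  if (inherits(x, "phylo")) {
    "single chronogram (phylo)"
  } else if (inherits(x, "multiPhylo")) {
    "several chronograms (multiPhylo)"
  } else if (is.data.frame(x)) {
    "table (data.frame)"
  } else if (is.numeric(x)) {
    "named numeric vector of mrca ages"
  } else if (is.character(x)) {
    "character vector (citations, newick strings or html)"
  } else if (is.list(x)) {
    "list of chronograms"
  } else {
    "unrecognised"
  }
}

# Stand-in objects with invented values, used only to exercise the helper.
describe_result(c("Hypothetical study A" = 100, "Hypothetical study B" = 80))
describe_result("(A:1,B:1);")
describe_result(data.frame(mrca = 100, ntax = 2))
```

The data.frame check comes before the list check because a data.frame is also a list in R.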
Examples
if (FALSE) {
# This example writes its output to a temporary directory obtained with
# tempdir(); use any working directory you prefer:
tempwd <- tempdir()
# Obtain median ages from a set of source chronograms in newick format:
ages <- datelife_search(c(
"Rhea americana", "Pterocnemia pennata", "Struthio camelus",
"Mus musculus"
), summary_format = "newick_median")
# Save the tree in the temp working directory in newick format:
write(ages, file = file.path(tempwd, "some.bird.ages.txt"))
# Obtain median ages from a set of source chronograms in phylo format
# Will produce same tree as above but in "phylo" format:
ages.again <- datelife_search(c(
"Rhea americana", "Pterocnemia pennata", "Struthio camelus",
"Mus musculus"
), summary_format = "phylo_median")
# Attach ape first so that its plotting helpers and .PlotPhyloEnv are available:
library(ape)
plot(ages.again)
ape::axisPhylo()
mtext("Time (million years ago)",
  side = 1, line = 2,
  at = max(get("last_plot.phylo", envir = .PlotPhyloEnv)$xx) * 0.5
)
# Save "phylo" object in newick format
write.tree(ages.again, file = file.path(tempwd, "some.bird.tree.again.txt"))
# Obtain MRCA ages and target chronograms from all source chronograms
# Generate an html output readable in any web browser:
ages.html <- datelife_search(c(
"Rhea americana", "Pterocnemia pennata", "Struthio camelus",
"Mus musculus"
), summary_format = "html")
write(ages.html, file = file.path(tempwd, "some.bird.trees.html"))
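# Open the file in the default browser ("open" is a macOS command;
# utils::browseURL() is an alternative on other platforms):
system(paste("open", file.path(tempwd, "some.bird.trees.html")))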
} # end dontrun
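The two character forms of input described under Arguments can be built with base R alone. This sketch only constructs the inputs and checks that they carry the same names; how datelife_search parses them internally is not shown on this page, so the splitting step is an illustration rather than the package's own logic. The final call follows the same not-run convention as the examples above.

```r
# Taxon names concatenated with c():
taxa_vector <- c("Rhea americana", "Pterocnemia pennata", "Struthio camelus")

# The same names as a single comma-separated string:
taxa_string <- paste(taxa_vector, collapse = ", ")

# Illustrative base R check that both forms hold the same names:
split_names <- trimws(strsplit(taxa_string, ",", fixed = TRUE)[[1]])
identical(split_names, taxa_vector)

if (FALSE) {
  # Not run: requires the datelife package.
  refs <- datelife::datelife_search(
    input = taxa_string,
    summary_format = "citations"
  )
}
```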