Branch length modification of a phylogenetic tree in R

I have a tree file in Newick format with a few very short branches (approx. 1.042e-06), and I need to "eliminate" these tiny distances.
I thought of multiplying all distances by 10, because this multiplication has no effect on what I need the tree for later.
To do that I found the ape package in R and its function compute.brlen, since this function lets you change the branch lengths using a function.
Any idea how to multiply the branch lengths by 10 with this function?
I have tried compute.brlen(tree, main=expression(rho==10)), but I think this is incorrect for what I want.

Try this:
library(ape) # load the ape package
mytree <- rtree(10) # a random tree; with real data use read.tree(path_to_tree_file) instead
mytree$edge.length <- mytree$edge.length * 10 # or any other scalar you want
Keep in mind that this scales all the branch lengths in the phylogeny. compute.brlen is not the right tool here: it generates new branch lengths (Grafen's method by default) rather than rescaling the existing ones.
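A slightly fuller sketch of the same idea, reading a tree from Newick text and writing the rescaled tree back out. The Newick string and the factor of 10 are made-up illustration values; with a real file you would pass its path to read.tree() and a file name to write.tree().

```r
library(ape)

# Hypothetical three-tip tree with one very short branch (1.042e-06).
newick <- "((A:1.042e-06,B:0.2):0.1,C:0.3);"
tree <- read.tree(text = newick)

# Multiply every branch length by the same factor; the topology is untouched.
scale_factor <- 10
scaled <- tree
scaled$edge.length <- tree$edge.length * scale_factor

# Compare before and after, then serialise back to Newick.
print(cbind(original = tree$edge.length, scaled = scaled$edge.length))
scaled_newick <- write.tree(scaled)  # returns a string when no file is given
print(scaled_newick)
```

Because every branch is multiplied by the same factor, the relative branch lengths are unchanged; only the absolute scale differs.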

Related

How do I implement a non-default dissimilarity metric with vegan function metaMDS()?

I have constructed a distance matrix from phylogenetic data using the Claddis function MorphDistMatrix() with the distance metric "MORD" (Maximum Observable Rescaled Distance). I now want to use this dissimilarity matrix to run an NMDS using the vegan function metaMDS(). However, although metaMDS has many distance metrics to choose from, "MORD" is not one of them. How do I enable metaMDS() to have this metric as an option?
Edit: here is some example code:
nexus.data<-ReadMorphNexus("example.nex")
Reading in Nexus file
dist<- MorphDistMatrix(nexus.data, distance = "MORD")
Claddis command for creating distance matrix. Instead of using the Gower dissimilarity (distance = "GC"), I would like to use Maximum Observable Rescaled Distance (distance = "MORD"), which is a modified form of Gower for use with ordered characters (Lloyd 2016). So far so good.
nmds<-metaMDS(dist$DistanceMatrix, k=2, trymax=1000, distance = "GC")
Here is where I run into trouble: as I understand it, the distance used in the metaMDS command should be the same one that was used to construct the distance matrix, but MORD is not an option for "distance" in metaMDS. If I constructed the distance matrix under Gower dissimilarity it wouldn't be a problem, as that is also available in metaMDS.
Lloyd, G. T., 2016. Estimating morphological diversity and tempo with discrete character-taxon matrices: implementation, challenges, progress, and future directions. Biological Journal of the Linnean Society, 118, 131-151.
metaMDS has an argument distfun for selecting a dissimilarity function other than vegdist. Such a function should accept an argument method that selects the dissimilarity measure, and it should return a regular dissimilarity object that inherits from the standard R "dist" class. I do not know the Claddis package: does it return regular dissimilarities? Your example hints that it returns something that is not a regular "dist" object. Alternatively, you can pass pre-calculated dissimilarities to metaMDS as input. Again, these should be regular "dist" dissimilarities. So you need to check the following with your dissimilarities:
inherits(dist, "dist") # your dist result: should be TRUE
inherits(dist$DistanceMatrix, "dist") # alternatively this should be TRUE
## if the latter was TRUE, you can extract that with
d <- dist$DistanceMatrix
## if d is not a "dist" object, you can see if it can be turned into one
d <- as.dist(dist$DistanceMatrix)
inherits(d, "dist") # TRUE: OK, FALSE: no hope
## if it was OK, you just do
metaMDS(d)
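To make those checks concrete, here is a small sketch with simulated data. The list `res` with a square `DistanceMatrix` element only mimics the structure shown in the question; it is an assumption about what Claddis returns, not its documented output. The metaMDS() call is skipped if vegan is not installed.

```r
set.seed(1)

# Stand-in for the Claddis result: a list holding a square, symmetric matrix.
pts <- matrix(rnorm(30), nrow = 15,
              dimnames = list(paste0("taxon", 1:15), NULL))
res <- list(DistanceMatrix = as.matrix(dist(pts)))

# A plain matrix is not a "dist" object, but as.dist() converts it.
is_dist_before <- inherits(res$DistanceMatrix, "dist")
d <- as.dist(res$DistanceMatrix)
is_dist_after <- inherits(d, "dist")

# Missing values would have to be dealt with before ordination.
n_missing <- sum(is.na(d))

# With a "dist" input, metaMDS() uses the dissimilarities as given,
# so no distance argument is needed.
if (requireNamespace("vegan", quietly = TRUE)) {
  nmds <- vegan::metaMDS(d, k = 2, trymax = 50, trace = FALSE)
  print(nmds$stress)
}
```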

Different results between fpc::dbscan and dbscan::dbscan

I want to run DBSCAN in R on some GPS coordinates. I have a distance matrix (dis_matrix) that I fed into the following functions:
dbscan::dbscan(dis_matrix, eps=50, minPts = 5,borderPoints=TRUE)
fpc::dbscan(dis_matrix,eps = 50,MinPts = 5,method = "dist")
and I'm getting very different results from the two functions, both in the number of clusters and in whether a point is a noise point or belongs to a cluster. Basically, the results are inconsistent between the two implementations. I have no clue why they generate such different results, although here
http://www.sthda.com/english/wiki/wiki.php?id_contents=7940
we see that for the iris data both functions give the same result.
My distance matrix comes from a function (geosphere::distm) that calculates the spatial distance between more than 2000 coordinates.
Furthermore, I coded DBSCAN myself according to this pseudo-code
source: https://cse.buffalo.edu/~jing/cse601/fa13/materials/clustering_density.pdf
My results are equal to what I obtained from the fpc package.
Can anyone see why they are different? I have already looked into both functions and haven't found anything.
The documentation of geosphere::distm says that it returns a matrix, not a dist object. dbscan::dbscan assumes that a matrix is a data matrix and not distances. Convert your matrix into a dist object with as.dist first. This should resolve the problem.
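A minimal sketch of that fix. The coordinates are simulated planar points and the full square matrix `dis_matrix` stands in for the geosphere::distm() output; `eps` and `minPts` are illustrative values. The clustering call is guarded so the snippet also runs without the dbscan package.

```r
set.seed(42)

# Two simulated groups of points; the full square matrix mimics distm() output.
xy <- rbind(matrix(rnorm(40, mean = 0, sd = 10), ncol = 2),
            matrix(rnorm(40, mean = 200, sd = 10), ncol = 2))
dis_matrix <- as.matrix(dist(xy))

# A square matrix is still a plain matrix, so it would be read as
# 40 observations with 40 features rather than as pairwise distances.
print(is.matrix(dis_matrix))
print(inherits(dis_matrix, "dist"))

# Convert it so the values are recognised as pairwise distances.
d <- as.dist(dis_matrix)

if (requireNamespace("dbscan", quietly = TRUE)) {
  cl <- dbscan::dbscan(d, eps = 50, minPts = 5)
  print(table(cl$cluster))  # cluster 0 holds the noise points
}
```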

Issues using spatial autocorrelation in R at specific lags (in m)

For a few days I have been struggling with a spatial analysis that involves spatial autocorrelation in R. Specifically, I want to check the autocorrelation between points laid out on a grid of roughly 50 m. My aim is to test the autocorrelation between these points (the locations where I collected the data) and to verify whether it decreases as the distance between them increases (this is expected). My idea is to define radii of specific lengths around each point (50 m, 100 m, 150 m and so on) and to compute Moran's I autocorrelation index for each. Finally, I would like to use ggplot to display Moran's I at each distance (but this is easy once I have the Moran's I outputs).
My starting data frame contains 4 columns: the ID of the point where the data were collected, the value measured at that point (z), a column with longitude (x) and a column with latitude (y). The data look like this:
# load libraries
library(sp)
library(spdep)
library(splm)
library(ape)
ID<- c(1,2,3,4,5,6)
x<-c(20.99984,20.99889, 20.99806,20.99800,20.99700,20.99732)
y<-c(52.21511,52.21489,52.21464,52.21410,52.21327,52.21278)
z<-c(1.16,0.54,0.89,0.60,1.27,1.45)
data <- data.frame(ID,x,y,z)
I read many things online and found this tutorial
https://mgimond.github.io/Spatial/spatial-autocorrelation-in-r.html#morans-i-as-a-function-of-a-distance-band
which shows exactly what I'm interested in. However, it doesn't work for me from the very beginning: I think there is a problem with my coordinates, and I don't know how to transform them into a proper format for R. This is the error message I get:
data <- data.frame(dataPOL$Long , dataPOL$Lat, dataPOL$Human_presence)
coordinates(data) <- c('x','y')
proj4string(data) <- "+init=epsg:4326"
S.dist <- dnearneigh(coordinates, 0, 50) #radius of 50 meters
Error in dnearneigh(coordinates, 0, 50) : Data non-numeric
I did not receive any answer, but I ended up finding a solution.
I found that the most used packages for spatial autocorrelation in R (in my case, Moran's I) are spdep and ape.
I tried both: I have not got spdep to work yet, but ape did. Here is the tutorial I followed for my specific case:
https://stats.idre.ucla.edu/r/faq/how-can-i-calculate-morans-i-in-r/
Before calculating Moran's I, you need to generate a distance matrix. I did it with rdist.earth from the package 'fields'.
This function measures the distance between each pair of data points based on their coordinates. It takes into account that the world is not flat, and so calculates what are known as great-circle distances. I specified the distance in km for my case.
To calculate Moran's I, I ran this:
library(ape)
pop.dists.1 <- (popdists > 0 & popdists <= .06) # radius of 60 m (remember that the fields package works in km or miles)
Moran.I(mydataframe$myzvariable, pop.dists.1)
This is the output I got at this specific radius:
pop.dists.1 <- (popdists > 0 & popdists <= .06) #60m
Moran.I(dataPOL$Human_presence, pop.dists.1)
$observed
[1] 0.3841241 # Moran's I: between -1 and 1; here, points within 60 m are autocorrelated
$expected
[1] -0.009615385
$sd
[1] 0.08767598
$p.value
[1] 7.094019e-06
I repeated the calculation for each distance I am interested in. It works really well: as the distance increases, Moran's I approaches 0 (which is what I expected).
I am going to plot the individual outputs with ggplot, as usual, to follow the trend of spatial autocorrelation for my z variable.
Hope this helps someone!
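The repeated calculation can be wrapped in a loop. The sketch below uses the six example points from the question, a hand-written haversine helper (`haversine_km`, a hypothetical stand-in for fields::rdist.earth with an assumed Earth radius of 6371 km) and illustrative radii. Six points are far too few for a meaningful test; the point is only the structure of the loop.

```r
library(ape)

# Example points from the question (longitude x, latitude y, measured value z).
x <- c(20.99984, 20.99889, 20.99806, 20.99800, 20.99700, 20.99732)
y <- c(52.21511, 52.21489, 52.21464, 52.21410, 52.21327, 52.21278)
z <- c(1.16, 0.54, 0.89, 0.60, 1.27, 1.45)

# Hypothetical helper: pairwise great-circle distances in km (haversine formula).
haversine_km <- function(lon, lat, radius_km = 6371) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  dlat <- outer(lat, lat, "-")
  dlon <- outer(lon, lon, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(sqrt(a), 1))
}

dists_km <- haversine_km(x, y)

# Moran's I for each radius; weights are 1 for pairs within the radius, else 0.
radii_km <- c(0.10, 0.15, 0.20)
moran_by_radius <- do.call(rbind, lapply(radii_km, function(r) {
  w <- (dists_km > 0 & dists_km <= r) * 1
  mi <- Moran.I(z, w)
  data.frame(radius_km = r, observed = mi$observed,
             expected = mi$expected, p.value = mi$p.value)
}))
print(moran_by_radius)
```

The resulting data frame has one row per radius, which is a convenient shape for plotting the observed index against distance with ggplot.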

Convert a list (class numeric) into a distance structure in R

I have a list that looks like this. It is a measure of dispersion for each sample.
1 2 3 4 5
0.11829384 0.24987017 0.08082147 0.13355495 0.12933790
To analyze this further I need it to be a distance structure: the vegan package needs it as a 'dist' object.
I found some solutions that convert matrices to dist, but how can I change my current data into a dist object?
I am using the FD package. In the manual I found:
Still, one potential advantage of FDis over Rao’s Q is that in the unweighted case
(i.e. with presence-absence data), it opens possibilities for formal statistical tests for differences in
FD between two or more communities through a distance-based test for homogeneity of multivariate
dispersions (Anderson 2006); see betadisper for more details
I wanted to use the vegan function betadisper to test whether there are differences among regions (I provided these through the object "region", which also has a column "region"):
functional <- FD(trait, comun)
mod <- betadisper(functional$FDis, region$region)
Using gowdis or fdisp from FD didn't work either.
distancias <- gowdis(rasgo)
mod <- betadisper(distancias, region$region)
dispersion <- fdisp(distancias, presence)
mod <- betadisper(dispersion, region$region)
I tried this, but the result is a list object. I thought I could pass those results to betadisper.
You cannot do this: FD::fdisp() does not return dissimilarities. It returns a list of three elements: the dispersions FDis for each sampling unit (SU), and the results of the eigen decomposition of the input dissimilarities (eig for eigenvalues, vectors for orthonormal eigenvectors). The FDis values are summarized for each original SU, but they carry no information on the differences among SUs. The eigen decomposition can be used to reconstruct the original input dissimilarities (your distancias from FD::gowdis()), but you can use the input dissimilarities directly. Function FD::gowdis() returns a regular "dist" structure that you can pass straight to vegan::betadisper() if that gives you a meaningful analysis. For this, your grouping variable must be based on the same units as your distancias. In a typical application of fdisp, the units are species (taxa), but it seems you want an analysis for communities, sites or similar units. This will not be possible with these tools.
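What betadisper() needs is a "dist" object plus a grouping vector with one entry per unit in that object. A hedged sketch with simulated data follows: base R dist() stands in for FD::gowdis(), and the `region` factor and the `site_data` matrix are invented for illustration. The betadisper() call runs only if vegan is installed.

```r
set.seed(7)

# Simulated data: 12 sampling units described by 4 numeric variables.
site_data <- matrix(rnorm(48), nrow = 12,
                    dimnames = list(paste0("site", 1:12), paste0("var", 1:4)))
region <- factor(rep(c("north", "south", "west"), each = 4))

# Dissimilarities among the same units that the grouping refers to.
d <- dist(site_data)

# One group label per unit in the "dist" object is required.
stopifnot(inherits(d, "dist"), attr(d, "Size") == length(region))

# A plain vector of per-unit values (like FDis) has no pairwise structure,
# so it is not a "dist" object and cannot be turned into one meaningfully.
fdis_like <- runif(12)
print(inherits(fdis_like, "dist"))

if (requireNamespace("vegan", quietly = TRUE)) {
  mod <- vegan::betadisper(d, region)
  print(anova(mod))
}
```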

Using cutree with a phylo object (unrooted tree) in R

I would like to use the cutree() function to cluster a phylogenetic tree into a specified number of clades. However, the phylo object (an unrooted phylogenetic tree) is not ultrametric, so as.hclust.phylo() returns an error. The goal is to sub-sample the tips of a tree while retaining maximum diversity, hence the desire to cluster into a specified number of clades (and then randomly sample one tip from each clade). This will be done for a number of trees with varying numbers of desired samples. Any help in coercing the unrooted tree into an hclust object, or a suggestion for a different method of systematically collapsing the trees (phylo objects) into a predefined number of clades, would be greatly appreciated.
library("ape")
library("ade4")
tree <- rtree(n = 32)
tree.hclust <- as.hclust.phylo(tree)
Returns:
"Error in as.hclust.phylo(tree) : the tree is not ultrametric"
If I make a distance matrix of the branch lengths between all tips, I am able to use hclust to generate clusters and then cutree to cut them into the desired number of clusters:
dm <- cophenetic.phylo(tree)
single <- hclust(as.dist(dm), method="single")
cutSingle <- as.data.frame(cutree(single, k=10))
color <- cutSingle[match(tree$tip.label, rownames(cutSingle)), 'cutree(single, k = 10)']
plot.phylo(tree, tip.color=color)
However, the results are not desirable because very basal branches get clustered together. Basing the clustering on the tree structure, or the tip to root distance would be more desirable.
Any suggestions are appreciated!
I don't know if this is what you want, but first you have to make the tree ultrametric with chronos().
Here's an answer that could help you out:
How to convert a tree to a dendrogram in R?
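For the sub-sampling goal in the question (cut into k groups, then keep one random tip per group), one possible sketch works directly on the tip-to-tip distances without requiring an ultrametric tree. It uses average linkage instead of the single linkage tried in the question; that is a swapped-in choice, not something the answer above recommends, and whether it groups basal branches more sensibly depends on the tree. `k` and the seed are illustrative.

```r
library(ape)
set.seed(123)

tree <- rtree(n = 32)
k <- 10

# Cluster tips on tip-to-tip (patristic) distances.
hc <- hclust(as.dist(cophenetic(tree)), method = "average")
groups <- cutree(hc, k = k)  # named integer vector: tip label -> group

# Draw one tip label at random from each group.
picked <- unname(vapply(split(names(groups), groups),
                        function(tips) sample(tips, 1),
                        character(1)))

# Prune the tree down to the sampled tips.
subtree <- keep.tip(tree, picked)
print(picked)
```

Running this for several trees only requires changing `tree` and `k`; the sampled tips differ between runs unless the seed is fixed.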
