Comparing dissimilarity measures using PCA in R

I would like to compare the behaviour of several dissimilarity measures (e.g. Bray-Curtis, Jaccard, Gower). I have seen this done using a principal component biplot (e.g. see Legendre and De Cáceres, 2013).
Any suggestions on how one goes about this? Sample data are provided below:
# Load the required packages
library(ade4)
library(vegan)
library(FD)
# Load data
data(dune)
# Calculate a series of dissimilarity measures for the data
dune.bc <- vegdist(dune, method="bray")
dune.mh <- vegdist(dune, method="manhattan")
dune.eu <- vegdist(dune, method="euclidean")
dune.cn <- vegdist(dune, method="canberra")
dune.k <- vegdist(dune, method="kulczynski")
dune.j <- vegdist(dune, method="jaccard")
dune.g <- vegdist(dune, method="gower")
dune.m <- vegdist(dune, method="morisita")
dune.h <- vegdist(dune, method="horn")
dune.mf <- vegdist(dune, method="mountford")
dune.r <- vegdist(dune, method="raup")
dune.bi <- vegdist(dune, method="binomial")
dune.c <- vegdist(dune, method="chao")
# Compare the behaviour of the dissimilarity measures using a PCA plot
# Suggestions on how to proceed with this step would be greatly appreciated!

Hmm, that's not what the authors do. If you read that paper, the PCA biplot is of the matrix of properties of each dissimilarity coefficient, not a PCA of the k dissimilarity matrices themselves. Basically, they analysed Table 2 in the paper via PCA (minus the column at the far right, labelled Dmax).
I don't know of a way to compare dissimilarity matrices, other than via a Procrustes rotation and the associated PROTEST permutation test, or perhaps a Mantel test: see procrustes(), protest() and mantel() in vegan.
As another comparison, you can use rankindex() to look at the rank correlation of each coefficient with the gradient values.
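The three comparisons mentioned above can be sketched as follows. This is only a minimal illustration, assuming the dune data from the question; the choice of Bray-Curtis versus Euclidean, the two PCoA axes and the use of dune.env$A1 as the gradient are arbitrary assumptions made for the example.

```r
library(vegan)

data(dune)
data(dune.env)

# Two of the dissimilarity matrices from the question
dune.bc <- vegdist(dune, method = "bray")
dune.eu <- vegdist(dune, method = "euclidean")

# Mantel test: rank correlation between the two sets of pairwise dissimilarities
mantel.res <- mantel(dune.bc, dune.eu, method = "spearman", permutations = 999)
mantel.res

# Procrustes/PROTEST compare configurations of points, so first embed each
# matrix with principal coordinates analysis (2 axes is an arbitrary choice)
pcoa.bc <- cmdscale(dune.bc, k = 2)
pcoa.eu <- cmdscale(dune.eu, k = 2)
protest.res <- protest(pcoa.bc, pcoa.eu, permutations = 999)
protest.res

# rankindex(): rank correlation between each index and gradient separation;
# A1 is used here purely as an example gradient
rankindex(dune.env$A1, dune,
          indices = c("bray", "euclidean", "jaccard", "gower"))
```

Note that the Mantel and PROTEST tests each compare only two matrices at a time, so comparing many coefficients this way means running them pairwise.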

It sounds like what you are trying to do is a second-stage analysis?
Take several dissimilarity matrices and generate pairwise rank correlations between all of them; this creates a dissimilarity matrix of your dissimilarity matrices (after converting the correlations to dissimilarities). From there you can use NMDS to plot them all. In general you'll find that similar calculations (e.g. the Euclidean family, the Bray-Curtis family, etc.) group closely together.
Check out:
Exploring interactions by second-stage community analyses. (2006) Clarke, Somerfield, Airoldi, and Warwick
Here they do what you suggest, or want:
On resemblance measures for ecological studies, including taxonomic dissimilarities and a zero-adjusted Bray-Curtis coefficient for denuded assemblages. (2006) Clarke, Somerfield, and Chapman.
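A rough sketch of the second-stage idea with the dune data might look like the following. The subset of measures, the use of 1 minus the Spearman correlation as the second-stage dissimilarity, and vegan's monoMDS() for the NMDS are all assumptions made for illustration, not necessarily what the cited papers do.

```r
library(vegan)

data(dune)

# First stage: several dissimilarity matrices for the same 20 sites
methods <- c("bray", "kulczynski", "horn", "euclidean",
             "manhattan", "canberra", "gower")
d.list <- lapply(methods, function(m) vegdist(dune, method = m))
names(d.list) <- methods

# Second stage: Spearman rank correlation between every pair of matrices,
# computed on the vector of pairwise site dissimilarities of each matrix
d.vals <- sapply(d.list, as.vector)   # one column per measure
rho <- cor(d.vals, method = "spearman")

# Convert correlations to dissimilarities (1 - rho is one possible choice)
second.stage <- as.dist(1 - rho)

# NMDS of the second-stage matrix; nearby points are measures that rank
# the site pairs in a similar way
set.seed(42)
nmds <- monoMDS(second.stage, k = 2)
plot(nmds$points, type = "n", xlab = "NMDS1", ylab = "NMDS2")
text(nmds$points, labels = methods)
```

With only a handful of measures the NMDS is very easy to fit, so the plot is best read as a rough map of which measures behave alike rather than as a precise ordination.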

Related

R: Comparing dissimilarity between metabolic models with Discrete Wavelet Transformation

I'm working on comparing bacterial metabolic models. Each model has a set of metabolites and their concentrations at 200 time points. I'm in the process of comparing the models in order to cluster them by similarity.
One method I followed was a pairwise comparison of each metabolite pair in two models using Euclidean distance. Below is what my data look like. This is a sample data file.
I computed the pairwise Euclidean distance between Met1 from Model A and Met1 from Model B. Likewise, I computed the distances for all the metabolites common to the two models (e.g. Met4 in Model A and Met4 in Model B) and summed them to get a distance (dissimilarity) between the two models. In the same way I computed the dissimilarity matrix for all the models and used hierarchical clustering to cluster them.
Now I want to compute the dissimilarity of the models using the Discrete Wavelet Transform as my distance measure. However, I couldn't find anything in the package documentation on how to compare two time series. I would like to know how to use the Discrete Wavelet Transform to compute a dissimilarity between two time series, and hence between my models.
Take a look at the TSclust package. Here is how you would apply it to your sample data.
library(TSclust)
# Read in the data
model_a <- read.csv("~/Desktop/Model A.csv", header = TRUE, stringsAsFactors = FALSE)
model_b <- read.csv("~/Desktop/Model B.csv", header = TRUE, stringsAsFactors = FALSE)
# The series must be in rows rather than columns
model_a <- as.data.frame(t(model_a))
model_b <- as.data.frame(t(model_b))
# Calculate dissimilarities between matching metabolites in models A and B
met1_DWT.diss <- as.numeric(diss.DWT(rbind(model_a['Met1', ], model_b['Met1', ])))
met1_DWT.diss
[1] 90.80332
met2_DWT.diss <- as.numeric(diss.DWT(rbind(model_a['Met2', ], model_b['Met2', ])))
met2_DWT.diss
[1] 1.499241
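To show the idea behind a wavelet-based dissimilarity and how the per-metabolite values could be summed into a model-level distance, here is a small base R sketch. It is a simplified stand-in and not the algorithm used by diss.DWT(): it keeps only the Haar approximation coefficients at a fixed, arbitrarily chosen level. The helper names (haar_approx, model_diss, sim_model) and the simulated data are hypothetical.

```r
# Haar approximation coefficients after `levels` rounds of pairwise
# (scaled) averaging; length(x) must be divisible by 2^levels
haar_approx <- function(x, levels = 3) {
  stopifnot(length(x) %% 2^levels == 0)
  for (i in seq_len(levels)) {
    odd  <- x[seq(1, length(x), by = 2)]
    even <- x[seq(2, length(x), by = 2)]
    x <- (odd + even) / sqrt(2)
  }
  x
}

# Dissimilarity between two models: sum, over shared metabolites, of the
# Euclidean distance between their Haar approximations
model_diss <- function(a, b, levels = 3) {
  common <- intersect(rownames(a), rownames(b))
  sum(vapply(common, function(m) {
    sqrt(sum((haar_approx(a[m, ], levels) - haar_approx(b[m, ], levels))^2))
  }, numeric(1)))
}

# Simulated stand-in data: metabolites in rows, 200 time points in columns
set.seed(1)
sim_model <- function(mets) {
  m <- t(sapply(mets, function(i) cumsum(rnorm(200))))
  rownames(m) <- mets
  m
}
models <- list(A = sim_model(c("Met1", "Met2", "Met4")),
               B = sim_model(c("Met1", "Met3", "Met4")),
               C = sim_model(c("Met1", "Met2", "Met4")))

# Pairwise model dissimilarities, then hierarchical clustering
n <- length(models)
D <- matrix(0, n, n, dimnames = list(names(models), names(models)))
for (i in seq_len(n)) for (j in seq_len(n)) {
  D[i, j] <- model_diss(models[[i]], models[[j]])
}
hc <- hclust(as.dist(D), method = "average")
plot(hc)
```

Because the sum runs over shared metabolites only, pairs of models with more metabolites in common tend to get larger totals; dividing by the number of shared metabolites is one way to reduce that effect.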

Correcting for spatial autocorrelation in community dissimilarity datasets

I have a community assembly dataset with 299 species at 15 sites. I'm interested in correcting for the effect of spatial autocorrelation on beta diversity (or species turnover). Dataset here.
There is an obvious effect of the spatial distance between sites on community dissimilarity. I have completed a Mantel correlogram to show this effect, but now I'm a bit stuck on how to account or correct for it in beta diversity analyses. More importantly, how does the effect of distance between sites compare to my environmental effect (precip)?
# Load libraries
library(vegan)
library(fossil)
# Load datasets
spp.list <- read.csv('Spatial Autocorrelation.csv', header = TRUE)
rownames(spp.list) <- spp.list$Site
sites.xy <- spp.list[, 2:3]
precip <- spp.list[, 4]
spp.list <- spp.list[, 5:303]
# Check for spatial autocorrelation and environmental correlation using Mantel tests
# Build a species dissimilarity matrix (vegdist defaults to Bray-Curtis)
dist.mat <- vegdist(spp.list)
# Build a geographic distance matrix for the sites
geo.dist.mat <- earth.dist(sites.xy)
# Build a distance matrix for the precip values
precip.dist <- dist(precip)
# Mantel correlogram of community dissimilarity against geographic distance
correlog <- mantel.correlog(dist.mat, geo.dist.mat, nperm = 999)
summary(correlog)
correlog
plot(correlog)
# Partial Mantel test including both distance and precipitation differences
library(ecodist)
natt.pmgram <- pmgram(dist.mat, geo.dist.mat, precip.dist, nperm = 999)
plot(natt.pmgram)
I'm not sure how to interpret these correlograms. Does anyone have any suggestions or explanations? And can these analyses be used to correct for the obvious spatial autocorrelation in my dataset?
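One possible way to weigh the distance effect against the precipitation effect is a pair of partial Mantel tests with vegan's mantel.partial(). The sketch below uses simulated stand-in data (the real dataset is not reproduced here), plain Euclidean distances between made-up coordinates instead of earth.dist(), and hypothetical object names. It only illustrates the calls and does not by itself correct for spatial autocorrelation.

```r
library(vegan)

# Simulated stand-in data: 15 sites, 30 species, a precipitation
# gradient that is itself spatially structured
set.seed(123)
n.sites <- 15
xy <- cbind(x = runif(n.sites), y = runif(n.sites))
precip <- 500 + 300 * xy[, "x"] + rnorm(n.sites, sd = 50)
comm <- matrix(rpois(n.sites * 30, lambda = 3), nrow = n.sites)

comm.dist <- vegdist(comm)      # Bray-Curtis by default
geo.dist <- dist(xy)            # planar distances between sites
precip.dist <- dist(precip)

# Community ~ precipitation, with geographic distance partialled out
pm.precip <- mantel.partial(comm.dist, precip.dist, geo.dist,
                            permutations = 999)
# Community ~ geographic distance, with precipitation partialled out
pm.geo <- mantel.partial(comm.dist, geo.dist, precip.dist,
                         permutations = 999)

pm.precip
pm.geo
```

Comparing the two partial Mantel statistics gives a rough sense of which effect remains once the other is held constant.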

Clustering Variables

What are some proven methods, easily implemented in R, for finding groupings of highly correlated variables within a large, high-dimensional binary dataset (think 200,000+ rows and 150+ fields)? I want groupings of variables that lend themselves to interpretation, so I don't think PCA would be the best method.
library(Hmisc)
mtc <- mtcars[,2:8]
mtcn <- data.matrix(mtc)
clust <- varclus(mtcn)
clust
plot(clust)
?varclus: Does a hierarchical cluster analysis on variables, using the Hoeffding D statistic, squared Pearson or Spearman correlations, or proportion of observations for which two variables are both positive as similarity measures. Variable clustering is used for assessing collinearity, redundancy, and for separating variables into clusters that can be scored as a single variable, thus resulting in data reduction.
For binary variables:
library(cluster)
data(animals)
ma <- mona(animals)
ma
plot(ma)
?mona: Returns a list representing a divisive hierarchical clustering of a dataset with binary variables only.

Applying PCA on the user's defined covariance matrix

I am trying to apply principal component analysis to a covariance matrix estimated from the relationships among all individuals; see Mm in the following example.
I would greatly appreciate it if anyone could show me how to do it.
Example:
library(BLR)
library(rrBLUP)
data(wheat)
# Mm is the covariance matrix on which I want to run PCA
Mm <- A.mat(X)
Try the following using your dataset. I'm not sure if this is what you are after.
pca <- prcomp(Mm, scale. = TRUE)
# Check out what's in it
str(pca)
# Print the variance summary for all principal components
summary(pca)
# Access a subset of components
summary(pca)$importance[, 1:20]
library(GGally)
# ggpairs() expects a data frame
ggpairs(as.data.frame(pca$x[, 1:4]))
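Note that prcomp() treats its input as a data matrix, so passing it a covariance matrix computes the PCA of that matrix's columns rather than using it as the covariance structure. If the aim is to use a matrix directly as the covariance matrix, an eigendecomposition is the usual route. Here is a minimal base R sketch with simulated data (X and S are placeholders, not the wheat objects):

```r
set.seed(1)
X <- matrix(rnorm(50 * 5), nrow = 50)   # 50 observations, 5 variables
S <- cov(X)                             # user-supplied covariance matrix

# PCA from the covariance matrix: eigenvalues are the component variances,
# eigenvectors are the loadings
eig <- eigen(S, symmetric = TRUE)
prop.var <- eig$values / sum(eig$values)
round(prop.var, 3)

# Scores can be recovered only if the original data are available
scores <- scale(X, center = TRUE, scale = FALSE) %*% eig$vectors

# Same variances as prcomp() on the raw data
pca <- prcomp(X)
all.equal(eig$values, pca$sdev^2)
```

For a relationship matrix among individuals, such as the one returned by A.mat(), the eigenvectors are indexed by individuals, so plotting the first few eigenvectors against each other is one way to look at structure among individuals.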

How to calculate the checkerboard score (C-score) under a null model by vegan package in R

I would like to use the checkerboard score (C-score) under a null model to calculate p-values for a co-occurrence analysis, as done in the paper "Using network analysis to explore co-occurrence patterns in soil microbial communities".
Unfortunately, the vegan commands and arguments used were not well described in the paper.
I believe there must be vegan experts here who have done this kind of co-occurrence analysis based on the checkerboard score under a null model.
Could anyone help with the scripts, or the commands and arguments I should use, to calculate the C-score under a null model in R?
Will the C-score return a matrix of p-values that I could use to indicate co-occurrence?
I guess this may answer your question:
library(vegan)
library(bipartite)
# species is your species-by-sites matrix; C.score returns a single number here, so no statistic argument is needed
null.model <- oecosimu(species, bipartite::C.score, "swap", burnin = 100, thin = 10, nsimul = 10000)
print(null.model)
This should give a quick estimate of the C-score:
library(vegan)
library(bipartite)
cscore.speciesN <- C.score(species, normalise = TRUE, FUN = mean, na.rm = TRUE)
cscore.speciesS <- C.score(species, normalise = FALSE, FUN = mean, na.rm = TRUE)
cscore.speciesN
cscore.speciesS
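For reference, the quantity behind the C-score can be sketched in base R. For a pair of species i and j, the number of checkerboard units is (r_i - S_ij)(r_j - S_ij), where r_i is the number of sites occupied by species i and S_ij the number of sites shared by both; the C-score is the mean over all species pairs. The function name checker_units and the toy matrix are hypothetical, and species are assumed to be in rows.

```r
# Pairwise checkerboard units for a species-by-sites matrix
checker_units <- function(pa) {
  pa <- (pa > 0) * 1           # reduce to presence/absence
  shared <- pa %*% t(pa)       # S_ij: sites shared by species i and j
  r <- diag(shared)            # r_i: sites occupied by species i
  (r - shared) * t(r - shared) # (r_i - S_ij) * (r_j - S_ij)
}

# Toy example: 3 species, 4 sites
pa <- rbind(sp1 = c(1, 1, 0, 0),
            sp2 = c(0, 0, 1, 1),
            sp3 = c(1, 0, 1, 0))

cu <- checker_units(pa)
cu                                   # pairwise matrix; sp1-sp2 has 4 units
c_score <- mean(cu[lower.tri(cu)])   # unnormalised C-score
c_score                              # 2 for this toy matrix
```

The oecosimu() call above tests the overall mean C-score against the null model, so it returns one statistic with its p-value rather than a matrix of pairwise p-values. Pairwise p-values would require comparing each entry of a matrix like cu with the same entry computed from simulated communities.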
