R - Vegan package. metaMDS error - r

I would like to know why I'm getting this error when running metaMDS:
'comm' has negative data: 'autotransform', 'noshare' and 'wascores' set to FALSE
I would like to produce NMDS and dendrogram graphs but cannot do so because of the error above.
My data set is available for download if anyone wants to check DATASET. After importing the data, I replaced the NA values with 0 and then transposed the columns and rows before trying to run metaMDS.
abundance <- read.table("1_abundance.txt", header = TRUE)
abundance[is.na(abundance)] <- 0
abundance_trans <- t(abundance)
metaMDS(abundance_trans, distance = "bray", k = 2, trymax = 50)

It is not an error message but information: metaMDS tells you that you have negative data entries, so it switches off the steps it applies by default to non-negative data ('autotransform', 'noshare' and 'wascores').
The second issue is that you ask for Bray-Curtis dissimilarities, which are only applicable to non-negative data.
You have two alternatives: either take care of the negative values, or use a dissimilarity measure that can handle them. If you think that you do not have negative data, you are wrong: the computer knows. You may have made an error when reading in your data, and you may have columns or rows that you should not have. Check your data.
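A minimal base-R sketch of what checking the data could look like. The matrix `comm` and its values are made up for illustration and are not the asker's data set; the idea is simply to locate every negative cell before deciding what to do with it.

```r
# Hypothetical community matrix standing in for the imported data;
# the -1 mimics a stray negative entry.
comm <- matrix(c(3,  0, 5,
                 2, -1, 0,
                 0,  4, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("site", 1:3), paste0("sp", 1:3)))

# Row and column positions of every negative cell
neg_cells <- which(comm < 0, arr.ind = TRUE)
print(neg_cells)

# Overall range: a negative minimum confirms what metaMDS reported
print(range(comm))
```

With a data frame from read.table, the same test works after as.matrix(), provided all columns are numeric; a non-numeric column is itself a sign that something was read in that should not be there.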

Related

Species Scores not appearing from metaMDS

I'm not really sure what is happening considering it works on my computer at work, but doesn't transfer over when I'm on my laptop at home :/. For background, I am working with microbial community data using unifrac distances and trying to see how my different sample types cluster together.
Anywho, first I create a square distance matrix using my unifrac data file and my meta data.
generate.square.dist.matrix(unifrac,metadata,1)
unifrac.matrix.all = dist.matrix1
unifrac.dist.all = square.dist.matrix
Then run my NMDS
nmds.all = metaMDS(unifrac.dist.all, k = 4, trymax = 500, distance = "bray")
but when I attempt to extract my scores...
nmds.all.scores = as.data.frame(scores(nmds.all))
I receive the following error: "Error in x$species[, choices, drop = FALSE] :
incorrect number of dimensions".
There is no reproducible example, but I can still reproduce this. There is a bug in the scores function: it fails when the metaMDS result object has no species scores. It will not have species scores if you supply a dissimilarity matrix as input, because dissimilarities carry no information about species. I'll fix this in vegan. Meanwhile you can circumvent the problem by asking only for site scores, so that the function will not try to extract the non-existent species scores:
scores(nmds.all, display="sites")

How to deal with zero in dataset when trying to create a Dissimilarity matrix?

I have a simple question, which cost me about a hundred hours of googling today, and it is still unresolved. I hope someone here can miraculously answer it.
I am trying to make a Bray-Curtis dissimilarity matrix and an nMDS, and then run a PERMANOVA on my species community data. The problem is that when I assign the community to each plot, not all the species are present, for obvious reasons, right? Now the function metaMDS from the vegan package does not let me create anything. How do I deal with the zeros in a matrix? Does anybody have any scripts or ideas or any magical things to fix my day??
this is my code so far:
Crabdata3<- read.csv("Crab_Edited_Plots_02_12.csv")
str(Crabdata3)
Crabbie1= Crabdata3[,-(1:2),drop=FALSE]
Crabbie2 <-as.matrix(Crabbie1)
is.matrix(Crabbie2)
Disscrabmatrix1 <-data.matrix(Crabbie2)
Disscrabmatrix2=matrix(Crabbie2,nrow=42,ncol=18, byrow =TRUE,
dimnames= list(paste("community", 1:42,sep="")
,paste(colnames(Crabbie2,1:18))))
example_NMDS=metaMDS(Disscrabmatrix2, distance ="bray", k=2)## number of reduced dimensions
This is the error I am getting
example_NMDS=metaMDS(Disscrabmatrix2, distance ="bray", k=2)## number of reduced dimensions
Square root transformation
Wisconsin double standardization
Error in cmdscale(dist, k = k) : NA values not allowed in 'd'
In addition: Warning messages:
1: In distfun(comm, method = distance, ...) :
you have empty rows: their dissimilarities may be meaningless in method “bray”
2: In distfun(comm, method = distance, ...) : missing values in results
I'd recommend looking at this link https://www.tutorialspoint.com/how-to-remove-rows-that-contains-all-zeros-in-an-r-data-frame
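The author suggests using df1[rowSums(df1[])>0,] to remove all-zero rows. Zeros as such are not the problem; the empty rows named in the warning are, because the Bray-Curtis dissimilarity between two all-zero rows is 0/0. I'm doing a similar nMDS/PERMANOVA analysis and found this did the trick.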
Cheers.
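As a small illustration of that row filter, here is a base-R sketch on a made-up data frame (`df1`, its species columns and plot names are hypothetical, not the Crab data). It also records which rows were dropped, since any grouping variable used later in the PERMANOVA would presumably need the same rows removed.

```r
# Hypothetical site-by-species counts; plot2 contains no individuals at all.
df1 <- data.frame(sp1 = c(2, 0, 1),
                  sp2 = c(0, 0, 3),
                  sp3 = c(5, 0, 0),
                  row.names = c("plot1", "plot2", "plot3"))

# Keep only the rows whose total abundance is positive
df1_nonempty <- df1[rowSums(df1) > 0, , drop = FALSE]
print(df1_nonempty)

# Names of the rows that were dropped, so metadata can be subset to match
dropped <- setdiff(rownames(df1), rownames(df1_nonempty))
print(dropped)
```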

Why are polychoric correlation coefficients in matrices calculated by different R packages slightly different for the same data?

I calculated polychoric correlation matrices for the same data frame (20 ordinal variables, 190 missing values) in R, using three different packages and the coefficients for same variables are slightly different from each other.
I used the lavCor function from "lavaan" (I did list the ordinal variables when calling the function), polychoric function from "psych" (1.9.1) (took the rhos), and cor_auto function from "qgraph" (which is supposed to automatically calculate polychoric correlations for ordinal data). I am confused because I thought they were supposed to give exactly the same results. I read package documentations but could not find anything that helped me understand why. Could anyone let me know why this happens? I am sure I am missing some tiny difference between those, but I cannot figure it out.
PS: I guess this could have happened because psych package adjusts missing values (I have 190) using the correction for continuity, but I still do not understand why qgraph yields different results than lavaan as qgraph says it uses lavaan's lavCor function to calculate polychoric correlations.
Thanks!!
depanx<-data[1:20]
cor.depanx<-cor_auto(depanx)
polychor<-polychoric(depanx)
polymat<-polychor$rho
lav<-lavCor(depanx,ordered=c("unh","enj","trd","rst","noG","cry","cnc","htd","bdp","lnl","lov",
"cmp","wrg","pst","sch","dss","hlt","bad","ftr","oth"))
# as a result, matrices "cor.depanx", "polymat", and "lav" are different from each other.
Nice question! I do not know what the "data" dataset in your example is, but I recreated the two scenarios that have most probably caused the discrepancy between the cor_auto and lavCor results. In summary, first you must set the "ordinalLevelMax" argument in cor_auto based on your data, and second you need to synchronize the "missing" argument in the two functions. A detailed explanation is in the code snippet below:
library(qgraph) # provides cor_auto(); lavaan is called via lavaan::
depanx<-data.frame(lapply(1:5,function(x)sample(1:6,100,replace = TRUE)),
stringsAsFactors = FALSE)
colnames(depanx)=LETTERS[1:5]
lav<-lavaan::lavCor(depanx,ordered=colnames(depanx))
cor.depanx<-cor_auto(depanx)
all(lav==cor.depanx)#TRUE
# The first argument in cor_auto that you need to pay attention to is
# "ordinalLevelMax". It is set to 7 by default, so any variable with more
# than 7 levels is sent to lavCor as plain numeric and not as ordinal.
# Now we create the same dataset with 8-level variables. lavCor treats all of
# them as ordinal, since we have labelled them so with its "ordered" argument,
# so it uses polychoric correlations. Since "ordinalLevelMax" in cor_auto is 7
# by default and you have not changed it, cor_auto detects none of them as
# ordinal and does not pass them to lavCor as ordinal variables, so lavCor
# computes Pearson correlations between all of them.
depanx2<-data.frame(lapply(1:5,function(x)sample(1:8,100,replace = TRUE)),
stringsAsFactors = FALSE)
colnames(depanx2)=LETTERS[1:5]
lav2<-lavaan::lavCor(depanx2,ordered=colnames(depanx2))
cor.depanx2<-cor_auto(depanx2)
all(lav2==cor.depanx2)#FALSE
# The next argument you must synchronise between lavCor and cor_auto is
# "missing", which by default is set to "pairwise" in cor_auto and to
# "listwise" in lavCor.
# Here we set rows 10:20 of the fifth variable to NA, without synchronising
# the argument.
depanx3<-data.frame(lapply(1:5,function(x)sample(1:6,100,replace = TRUE)),
stringsAsFactors = FALSE)
colnames(depanx3)=LETTERS[1:5]
depanx3[10:20,5]<-NA
lav3<-lavaan::lavCor(depanx3,ordered=colnames(depanx3))
cor.depanx3<-cor_auto(depanx3)
all(lav3==cor.depanx3)#FALSE

Removing nodes with non-finite edge weights in plots when using qgraph package in R

I want to see the relationships in my data as a network and have used the qgraph package to do so. My data set is called combined.data. The correlation matrix of my data, which I passed as input, has a lot of NA values. The command I used to get the network plot is
qgraph(cor(combined.data, method="spearman"),layout="spring", groups=gr, labels=nm,
label.scale=FALSE, label.cex=1)
# I chose spearman because the data variables are on ordinal scale
gr is a list of the groups, and nm is a vector containing the tags/labels of the nodes.
The command runs well but comes with a warning
Warning message:
In qgraph(cor(combined.data, method = "spearman"), layout = "spring", :
Non-finite weights are omitted
The network has a lot of empty edges (non-finite weights) and I want to remove the nodes with the non-finite weights. I have tried to set the minimum and maximum arguments but it still comes up with those redundant nodes. Any suggestion on how to achieve this will be appreciated.
Probably you have missing data leading to NA in the correlation matrix? I always use cor(combined.data, method="spearman", use = "pairwise.complete.obs") which gives no NA correlations.
Alternatively, easiest is to change the input:
foo <- cor(combined.data, method="spearman")
foo[!is.finite(foo)] <- 0
qgraph(foo)
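To see the difference between the two `use` settings, here is a base-R sketch on a small made-up data frame (the values in `combined.data` below are hypothetical, chosen only so that two variables contain missing values). It compares the default behaviour of cor() with pairwise deletion.

```r
# Hypothetical ordinal-style data with a few missing values
combined.data <- data.frame(a = c(1, 2, 3, 4, 5, NA),
                            b = c(2, 1, 4, 3, NA, 5),
                            c = c(5, 4, 3, 2, 1, 1))

# Default (use = "everything"): an NA in a variable propagates
# to that variable's correlations with the others
cor_default <- cor(combined.data, method = "spearman")

# Pairwise deletion: each correlation uses the rows that are
# complete for that particular pair of variables
cor_pairwise <- cor(combined.data, method = "spearman",
                    use = "pairwise.complete.obs")

print(anyNA(cor_default))
print(anyNA(cor_pairwise))
```

Pairwise deletion can still return NA for a pair with too few complete rows or with a constant variable, so replacing the remaining non-finite entries with 0 may still be needed before plotting.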

modelTest and negative edges lengths with phangorn in R

I'm analysing protein data sets. I'm trying to build a tree with the package phangorn in R.
When I construct it, I get negative edge lengths that sometimes make it difficult to proceed with the analysis (modelTest).
Depending on the size of the dataset (more than 250 proteins), I can't perform a modelTest. Apparently there is a problem due to negative edge lengths. However, for smaller datasets I can perform a modelTest even though there are some negative edge lengths.
I am running it directly from my terminal.
library(phangorn)
dat = read.phyDat(file, format="fasta", type="AA")
tax <- read.table("organism_names.txt", sep="\t", row.names=1)
names(dat) <- tax[,1]
distance <- dist.ml(dat, model="WAG")
tree <- bionj(distance)
mt <- modelTest(dat, tree, model=c("WAG", "LG", "cpREV", "mtArt", "MtZoa", "mtREV24"),multicore=TRUE)
Error: NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In pml(tree, data) : negative edges length changed to 0!
Does somebody have any idea what I can do?
cheers, Alba
As @Marc said, your example isn't really reproducible...
If the problem really is negative or zero branch lengths, you could try to make them a really small positive number, for instance:
tree$edge.length[which(tree$edge.length <= 0)] <- 0.0000001
Another tip is to subscribe to R-sig-phylo, a mailing list about phylogenies in R. People there are really knowledgeable and usually respond pretty fast.
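For reference, the branch-length replacement suggested above can be tried without any phylogenetics package. The sketch below uses a plain list as a stand-in for a tree object; only the `edge.length` element is mimicked, and the numbers are invented. It counts the offending branches before overwriting them, which may help judge how much of the tree is affected.

```r
# Plain list standing in for a tree object; only edge.length is mimicked,
# and the values are invented for illustration.
tree <- list(edge.length = c(0.12, -0.003, 0, 0.4))

# How many branches are zero or negative?
n_bad <- sum(tree$edge.length <= 0)
print(n_bad)

# Replace them with a very small positive length
tree$edge.length[tree$edge.length <= 0] <- 1e-7
print(tree$edge.length)
```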

Resources