Bray-Curtis Pairwise Analysis in R

I am trying to calculate and visualize the Bray-Curtis dissimilarity between communities at paired/pooled sites using the Vegan package in R.
Below is a simplified example data frame:
Site = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
PoolNumber = c(1, 3, 4, 2, 4, 1, 2, 3, 4, 4)
Sp1 = c(3, 10, 7, 0, 12, 9, 4, 0, 4, 3)
Sp2 = c(2, 1, 17, 1, 2, 9, 3, 1, 6, 7)
Sp3 = c(5, 12, 6, 10, 2, 4, 0, 1, 3, 3)
Sp4 = c(9, 6, 4, 8, 13, 5, 2, 20, 13, 3)
df = data.frame(Site, PoolNumber, Sp1, Sp2, Sp3, Sp4)
"Site" is a variable indicating the location where each sample was taken.
The "Sp" columns hold the abundance values of each species at each site.
I want to compare pairs of sites that have the same "PoolNumber" and get a dissimilarity value for each comparison.
Most examples suggest I should create a matrix with only the "Sp" columns and use this code:
library(vegan)
sp <- df[, 3:6]   # species columns only
braycurtis <- vegdist(sp, method = "bray")
hist(braycurtis)
However, I'm not sure how to tell R which rows to compare if I eliminate the columns with "PoolNumber" and "Site". Would this involve organizing by "PoolNumber", using this as a row name and then writing a loop to compare every 2 rows?
I am also finding the output difficult to interpret. Lower Bray-Curtis values (closer to 0) indicate more similar communities, while higher values (closer to 1) indicate more dissimilar communities. But is there a way to tell directionality, i.e., which site of a pair is more diverse?
I am a beginner R user, so I apologize for any misuse of terminology/formatting. All suggestions are appreciated.
Thank you

Do you mean that you want the subset of dissimilarities between sites with equal PoolNumber? The vegdist function gives you all dissimilarities, and you can pick your pairs from those. This is easiest if you first transform the dissimilarities into a symmetric matrix and then pick your subset from that matrix:
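library(vegan)
braycurtis <- vegdist(df[, 3:6])   # Bray-Curtis is vegdist's default method
as.matrix(braycurtis)[df$PoolNumber == 4, df$PoolNumber == 4]            # square sub-matrix for pool 4
as.dist(as.matrix(braycurtis)[df$PoolNumber == 4, df$PoolNumber == 4])   # same subset as a "dist" object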
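To get every within-pool comparison at once, rather than one pool at a time, the same subsetting idea can be applied to all pairs of sites. The sketch below is only an illustration (the names `pairs` and `within_pool` are made up here, not vegan functions): it lists each pair of sites that share a PoolNumber together with its Bray-Curtis value, so no loop over pools is needed.

```r
library(vegan)

df <- data.frame(
  Site = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  PoolNumber = c(1, 3, 4, 2, 4, 1, 2, 3, 4, 4),
  Sp1 = c(3, 10, 7, 0, 12, 9, 4, 0, 4, 3),
  Sp2 = c(2, 1, 17, 1, 2, 9, 3, 1, 6, 7),
  Sp3 = c(5, 12, 6, 10, 2, 4, 0, 1, 3, 3),
  Sp4 = c(9, 6, 4, 8, 13, 5, 2, 20, 13, 3)
)

# Full symmetric Bray-Curtis matrix
bc <- as.matrix(vegdist(df[, 3:6], method = "bray"))

# All unordered pairs of row indices, one row per pair (i < j)
pairs <- t(combn(nrow(df), 2))

# Keep only the pairs whose two sites share a PoolNumber
same_pool <- df$PoolNumber[pairs[, 1]] == df$PoolNumber[pairs[, 2]]
pairs <- pairs[same_pool, , drop = FALSE]

within_pool <- data.frame(
  Pool       = df$PoolNumber[pairs[, 1]],
  Site1      = df$Site[pairs[, 1]],
  Site2      = df$Site[pairs[, 2]],
  BrayCurtis = bc[pairs]   # a two-column index matrix picks bc[i, j] for each row
)
within_pool <- within_pool[order(within_pool$Pool), ]
within_pool
```

Pool 4 contains four sites (C, E, I, J), so it contributes six pairs, while each of the other pools contributes one. hist(within_pool$BrayCurtis) would then plot only the within-pool values.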
If you only want averages, the vegan::meandist function will give you those:
meandist(braycurtis, df$PoolNumber)
Here the diagonal values are the mean dissimilarities within each PoolNumber, and the off-diagonal values are the mean dissimilarities between different PoolNumbers. Looking at the code of vegan::meandist, you can see how this is done.
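For the within-pool (diagonal) part, a hand-written equivalent may make the calculation easier to follow. This is a minimal sketch, not vegan's own code (`within_mean` is a hypothetical name): it takes each pool's sub-matrix and averages its lower triangle, so every pair of sites is counted once. The result is expected to correspond to the diagonal of meandist(), which is printed at the end for comparison.

```r
library(vegan)

df <- data.frame(
  Site = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  PoolNumber = c(1, 3, 4, 2, 4, 1, 2, 3, 4, 4),
  Sp1 = c(3, 10, 7, 0, 12, 9, 4, 0, 4, 3),
  Sp2 = c(2, 1, 17, 1, 2, 9, 3, 1, 6, 7),
  Sp3 = c(5, 12, 6, 10, 2, 4, 0, 1, 3, 3),
  Sp4 = c(9, 6, 4, 8, 13, 5, 2, 20, 13, 3)
)

bc <- as.matrix(vegdist(df[, 3:6], method = "bray"))
pools <- sort(unique(df$PoolNumber))

within_mean <- sapply(pools, function(p) {
  idx <- which(df$PoolNumber == p)
  sub <- bc[idx, idx, drop = FALSE]
  mean(sub[lower.tri(sub)])   # each pair of sites counted once
})
names(within_mean) <- pools
within_mean

# For comparison: the diagonal of vegan's meandist()
diag(meandist(vegdist(df[, 3:6]), df$PoolNumber))
```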
Bray-Curtis dissimilarities (like all ordinary dissimilarities) are symmetric: the value for sites i and j is the same as for j and i, so they carry no information about which site is more diverse. You can assess how diverse each site is, but first you need to tell us what you mean by "diverse" (a diversity index, or something else?). Then you just need to use those per-site values in your calculations.
If you just want to look at the number of species (richness), the following call gives you the differences in the lower triangle (the upper-triangle values would be the same with the sign switched):
designdist(df[,3:6], "A-B", "binary")
Alternatively, you can work with row-wise statistics and look at their differences. This is an example with the Shannon-Weaver diversity index:
H <- diversity(df[,3:6])
outer(H, H, "-")
To get the subsets, proceed in the same way as with the Bray-Curtis index.
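Putting these pieces together for the directionality question, the sketch below (hypothetical names `res`, `dH`, `dS`) reports, for each within-pool pair, the Bray-Curtis value next to the signed difference in Shannon diversity and in species richness. Bray-Curtis itself stays symmetric; it is the sign of `dH` or `dS` that says which site of the pair is "more diverse" under that particular definition.

```r
library(vegan)

df <- data.frame(
  Site = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  PoolNumber = c(1, 3, 4, 2, 4, 1, 2, 3, 4, 4),
  Sp1 = c(3, 10, 7, 0, 12, 9, 4, 0, 4, 3),
  Sp2 = c(2, 1, 17, 1, 2, 9, 3, 1, 6, 7),
  Sp3 = c(5, 12, 6, 10, 2, 4, 0, 1, 3, 3),
  Sp4 = c(9, 6, 4, 8, 13, 5, 2, 20, 13, 3)
)
sp <- df[, 3:6]

H  <- diversity(sp)    # Shannon index per site (vegan's default index)
S  <- specnumber(sp)   # species richness (number of species present) per site
bc <- as.matrix(vegdist(sp, method = "bray"))

# Unordered site pairs (i < j) that share a PoolNumber
pairs <- t(combn(nrow(df), 2))
pairs <- pairs[df$PoolNumber[pairs[, 1]] == df$PoolNumber[pairs[, 2]], , drop = FALSE]
i <- pairs[, 1]
j <- pairs[, 2]

res <- data.frame(
  Pool       = df$PoolNumber[i],
  Site1      = df$Site[i],
  Site2      = df$Site[j],
  BrayCurtis = bc[pairs],
  dH         = unname(H[i] - H[j]),  # > 0: Site1 has the higher Shannon diversity
  dS         = unname(S[i] - S[j])   # > 0: Site1 has more species
)
res[order(res$Pool), ]
```

For example, sites B and H (pool 3) get a positive `dS` because H has no individuals of Sp1, while B has all four species.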

Related

Adjusted Chi-squared test

I am analysing questionnaire data with R and testing whether different metadata explain differences in the answers, using a chi-squared test. I show two examples here, where the question is which pet a person has, and I am analysing whether people from different countries and different professions answer the question differently:
tab <- matrix(c(7, 5, 14, 19, 3, 2, 17, 6, 12), ncol=3, byrow=TRUE)
colnames(tab) <- c('dog','cat','rabbit')
rownames(tab) <- c('Italy','Greece','Hungary')
tab <- as.table(tab)
tab
chisq.test(tab)
tab2 <- matrix(c(9, 8, 12, 18, 1, 5, 16, 5, 11), ncol=3, byrow=TRUE)
colnames(tab2) <- c('dog','cat','rabbit')
rownames(tab2) <- c('Nurse','Technician','Teacher')
tab2 <- as.table(tab2)
tab2
chisq.test(tab2)
However, I know that "country" and "profession" are not independent; there is indeed a statistically significant association between them. My question is: how could I do some kind of adjusted chi-squared test, to test the association of country and of profession with the answers independently of each other? Or how would you handle the data?
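A common way to test the country/pet association while holding profession fixed is to work with the three-way country x pet x profession table instead of two separate two-way tables. The sketch below assumes respondent-level data with columns named `country`, `profession` and `pet` (hypothetical names). Because the joint table is not given above, the data are simulated with no built-in association, purely so the code runs. It shows two standard base-R options: the generalized Cochran-Mantel-Haenszel test (`mantelhaen.test`) and a likelihood-ratio comparison of nested Poisson log-linear models.

```r
# Simulated respondent-level data, for illustration only (NOT the asker's data)
set.seed(1)
n <- 300
survey <- data.frame(
  country    = factor(sample(c("Italy", "Greece", "Hungary"), n, replace = TRUE)),
  profession = factor(sample(c("Nurse", "Technician", "Teacher"), n, replace = TRUE)),
  pet        = factor(sample(c("dog", "cat", "rabbit"), n, replace = TRUE))
)

# Three-way table: country x pet, stratified by profession
tab3 <- xtabs(~ country + pet + profession, data = survey)

# Generalized CMH test. H0: country and pet are independent within every profession
cmh <- mantelhaen.test(tab3)
cmh

# Log-linear alternative: does a country:pet term improve a model that already
# allows country:profession and pet:profession associations?
counts <- as.data.frame(tab3)   # columns country, pet, profession, Freq
m0  <- glm(Freq ~ country * profession + pet * profession,
           family = poisson, data = counts)
m1  <- update(m0, . ~ . + country:pet)
lrt <- anova(m0, m1, test = "Chisq")
lrt
```

Swapping the roles of country and profession in the table and the model formulas gives the profession/pet test adjusted for country.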

Calculating the value to know the trend in a set of numeric values

I have a requirement where I have a set of numeric values, for example: 2, 4, 2, 5, 0
The trend in these numbers is mixed, but since the latest number is 0, I would consider the value to be going DOWN. Is there any way to measure the trend (whether it is going up or down)?
Is there any R package available for that?
Thanks
Suppose your vector is c(2, 4, 2, 5, 0) and you want to know whether the last value went up, stayed the same, or went down. Then you could use the diff function with a lag of 1 and check the sign of the last difference. Below is an example.
MyVec <- c(2, 4, 2, 5, 0)
Lagged_vec <- diff(MyVec, lag = 1)   # successive differences: 2 -2 3 -5
last_change <- Lagged_vec[length(Lagged_vec)]
if (last_change < 0) {
  print("Decreasing")
} else if (last_change == 0) {
  print("Constant")
} else {
  print("Increasing")
}
Please let me know if this is what you wanted.
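The code above only looks at the last step. If "trend" should instead summarise the whole series (an assumption about what is wanted), one simple option is the slope of a least-squares line fitted against position: a negative slope means the values tend to go down. `trend_slope` is a hypothetical helper name. For noisy series, a rank-based measure such as Kendall's tau between position and value (cor(seq_along(x), x, method = "kendall") in base R) is a common alternative.

```r
# Hypothetical helper: least-squares slope of x against its position 1, 2, ..., n
trend_slope <- function(x) {
  idx <- seq_along(x)
  unname(coef(lm(x ~ idx))[2])   # average change in value per step
}

x <- c(2, 4, 2, 5, 0)
s <- trend_slope(x)
s   # -0.3

if (s < 0) {
  print("Decreasing")
} else if (s > 0) {
  print("Increasing")
} else {
  print("Constant")
}
```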

How to calculate the distance between an array and a matrix

Consider a matrix A and an array b. I would like to calculate the distance between b and each row of A. For instance consider the following data:
A <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15), 3, 5, byrow=TRUE)
b <- c(1, 2, 3, 4, 5)
I would expect as output some array of the form:
distance_array = c(0, 11.18, 22.36)
where the value 11.18 comes from the Euclidean distance between A[2,] and b:
sqrt(sum((A[2,]-b)^2))
This seems pretty basic, but so far all the R functions I have found compute distance matrices between all pairs of rows of a matrix, not this array-to-matrix calculation.
I would recommend putting the rows of A in a list instead of a matrix, as it might allow for faster processing. But here is how I would do it with your example:
A <- matrix(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15), 3, 5, byrow=TRUE)
b <- c(1, 2, 3, 4, 5)
apply(A,1,function(x)sqrt(sum((x-b)^2)))
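A loop-free alternative to apply() is to subtract b from every row at once and sum the squared differences row-wise. This minimal sketch uses only base R and keeps A as a matrix; for large matrices it avoids one function call per row, though any speed gain should be checked on real data.

```r
A <- matrix(1:15, nrow = 3, ncol = 5, byrow = TRUE)
b <- c(1, 2, 3, 4, 5)

# sweep() subtracts b from every row of A (MARGIN = 2 matches b to the columns);
# plain A - b would recycle b down the columns instead, which is wrong here.
d <- sqrt(rowSums(sweep(A, 2, b)^2))
round(d, 2)   # 0.00 11.18 22.36
```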

Multivariate Granger's causality

I'm having issues doing a multivariate Granger causality test. I'd like to check whether conditioning on a third variable affects the results of a causal test.
Here is a sample for a single dependent and independent variable, based on an earlier question I asked ("Granger's causality test by column") that was answered by Alex:
library(lmtest)
M1<- matrix( c(2,3, 1, 4, 3, 3, 1,1, 5, 7), nrow=5, ncol=2)
M2<- matrix( c(7,3, 6, 9, 1, 2, 1,2, 8, 1), nrow=5, ncol=2)
M3<- matrix( c(1, 3, 1,5, 7,3, 1, 3, 3, 4), nrow=5, ncol=2)
For example, the equation for a conditioned linear regression will be
formula = y ~ w + x * z
How do I carry out this test conditional on a third or fourth variable?
1. The solution for stationary variables is well established: see the FIAR (v 0.3) package.
The paper accompanying the package includes a concrete example of multivariate Granger causality (for the case where all variables are stationary).
Page 12: Theory, Page 15: Practice.
2. For mixed (stationary and nonstationary) variables, first make the nonstationary ones stationary (by differencing, etc.) and leave the stationary ones as they are. Then finish with the procedure from case 1.
3. For non-cointegrated nonstationary variables, there is no need for a VECM. Run a VAR on the variables after making them stationary, then apply FIAR::condGranger etc.
4. For cointegrated nonstationary variables, the answer is much longer:
Johansen procedure (detect the cointegration rank via urca::ca.jo)
Apply vars::vec2var to convert the VECM to a VAR (since FIAR is based on VAR).
John Hunter's latest book nicely summarizes what can happen and what can be done in this last case.
You may want to read this as well.
To my knowledge, conditional/partial Granger causality supersedes GC via the "block exogeneity Wald test over VAR".
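The idea behind conditional Granger causality can also be written out directly with base R: regress y on its own lags and on the lags of the conditioning variable z, then use an F test to check whether adding the lags of x improves the fit. The sketch below assumes stationary series (case 1 above) and a fixed lag order p; `cond_granger` is a hypothetical helper name, not a FIAR function, and the data are simulated so that x does Granger-cause y. For a system-level version, the vars package provides VAR() and causality().

```r
# Hypothetical helper: conditional Granger test of x -> y given z, with p lags.
# Assumes all three series are stationary and of equal length.
cond_granger <- function(y, x, z, p = 1) {
  n <- length(y)
  # Column k holds v lagged by k, aligned with y[t] for t = p+1, ..., n
  lags <- function(v) sapply(1:p, function(k) v[(p + 1 - k):(n - k)])
  yt <- y[(p + 1):n]
  Ly <- lags(y)
  Lx <- lags(x)
  Lz <- lags(z)
  restricted   <- lm(yt ~ Ly + Lz)        # y's own past plus z's past
  unrestricted <- lm(yt ~ Ly + Lz + Lx)   # ... plus x's past
  anova(restricted, unrestricted)         # F test: do lags of x add information?
}

# Simulated example: y depends on lagged x and lagged z
set.seed(42)
n <- 200
x <- as.numeric(arima.sim(list(ar = 0.3), n))
z <- as.numeric(arima.sim(list(ar = 0.5), n))
y <- numeric(n)
for (i in 2:n) y[i] <- 0.4 * y[i - 1] + 0.8 * x[i - 1] + 0.3 * z[i - 1] + rnorm(1)

res <- cond_granger(y, x, z, p = 2)
res
```

Adding a fourth variable amounts to adding its lag matrix to both the restricted and the unrestricted model.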

modeling a beta-binomial regression

Consider this simple example:
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)), levels = c(1, 2, 3), labels = c("none", "some", "marked"))
numberofdrugs<-rpois(84, 50)+1
healthvalue<-rpois(84,5)
y<-data.frame(healthvalue,numberofdrugs, treatment, improved)
test<-lm(healthvalue~numberofdrugs+treatment+improved, y)
What am I supposed to do if I'd like to estimate a beta-binomial regression in R? Is anybody familiar with it? Any thoughts are appreciated!
I don't see how this example relates to beta-binomial regression: you have generated count data, rather than counts of successes out of a known total number of trials. To simulate beta-binomial data, see rbetabinom in either the emdbook or the rmutil package ...
library(sos); findFn("beta-binomial") finds a number of useful starting points, including
aod (analysis of overdispersed data), betabin function
betabinomial family in VGAM
hglm package
emdbook package (for dbetabinom) plus the bbmle package (for mle2)
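To make the model itself concrete, here is a minimal sketch that fits a beta-binomial regression by maximum likelihood with base R's optim(), so no package API is assumed. It uses one common mean/precision parameterisation (an assumption; packages such as VGAM and aod parameterise the overdispersion differently): the success probability has mean mu = plogis(X %*% beta) and follows Beta(mu * theta, (1 - mu) * theta). The data are simulated as successes out of a known number of trials, which is the kind of response a beta-binomial model needs; `negll` is a hypothetical helper name.

```r
set.seed(1)
n_obs <- 500
x     <- rnorm(n_obs)
size  <- rep(20, n_obs)                 # number of trials per observation
mu    <- plogis(-0.5 + 1.0 * x)         # true mean success probability
theta <- 10                             # true precision (smaller = more overdispersion)
p     <- rbeta(n_obs, mu * theta, (1 - mu) * theta)
y     <- rbinom(n_obs, size, p)         # successes out of `size`

# Negative log-likelihood; par = (regression coefficients, log(theta))
negll <- function(par, y, size, X) {
  beta  <- par[-length(par)]
  theta <- exp(par[length(par)])
  mu    <- plogis(drop(X %*% beta))
  a <- mu * theta
  b <- (1 - mu) * theta
  -sum(lchoose(size, y) + lbeta(y + a, size - y + b) - lbeta(a, b))
}

X   <- cbind(Intercept = 1, x = x)
fit <- optim(c(0, 0, 0), negll, y = y, size = size, X = X,
             method = "BFGS", hessian = TRUE, control = list(maxit = 500))
est <- fit$par
se  <- sqrt(diag(solve(fit$hessian)))   # approximate standard errors
cbind(estimate = est, se = se)          # rows: intercept, slope, log(theta)
```

The packages listed above wrap the same kind of likelihood with formula interfaces and proper summaries, so they are usually the better choice for real analyses.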
