R vegan: adjusted p values for permanova (adonis2)

I am running an analysis of variance on a large distance matrix using adonis2 as described here: https://www.rdocumentation.org/packages/vegan/versions/2.4-2/topics/adonis
That method is frequently used in microbiome analysis to test for differences in beta diversity. That's also what I would like to do, i.e. to find out whether my community composition differs in response to a (continuous) environmental variable.
Permanova returns one p value and there is no "official" post hoc test yet. That's where my question comes in:
I've come across publications saying they adjusted their permanova result using FDR/BH method. I cannot wrap my head around this. I'm confident I understand how FDR correction is calculated, I just don't see how that would be done for PERMANOVA, or, even more, how I would code it.
Can anyone help me out here?

It would be clearer if you provided an example of such a publication. You are right that for each variable, permanova returns one p-value. However, if the model includes many variables, you get one p-value per variable, and that set of p-values needs to be corrected for FDR.
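As for how you would code it, here is a minimal sketch, not a definitive recipe. It assumes the adonis2 result behaves like a data frame with the permutation p-values in a column named "Pr(>F)"; the helper name add_bh and the use of vegan's built-in dune data are only for illustration.

```r
# Sketch: append Benjamini-Hochberg adjusted p-values to a PERMANOVA table.
# Assumption: `tab` is data-frame-like and stores p-values in "Pr(>F)".
add_bh <- function(tab) {
  tab <- as.data.frame(tab)
  p <- tab[["Pr(>F)"]]
  tested <- !is.na(p)                  # Residual/Total rows have no p-value
  tab$p_adj_BH <- NA_real_
  tab$p_adj_BH[tested] <- p.adjust(p[tested], method = "BH")
  tab
}

# Illustration on vegan's example data (skipped if vegan is not installed).
if (requireNamespace("vegan", quietly = TRUE)) {
  data(dune, dune.env, package = "vegan")
  set.seed(1)                          # permutation p-values are random
  fit <- vegan::adonis2(dune ~ Management + A1 + Moisture,
                        data = dune.env, by = "margin", permutations = 999)
  print(add_bh(fit))
}
```

The adjustment only makes sense across several tested terms (or several separate models); with a single p-value, BH returns it unchanged.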
For example, in this publication on variation in the gut microbiome, the authors wrote:
To calculate the variation explained by each of our collected host
factors, we performed an Adonis test implemented in QIIME. Each host
factor was calculated according to its explanation rate, and P values
were generated based on 1,000 permutations. All P values were then
adjusted using the Benjamini–Hochberg method.
You can also see an example of this in Table S2 (screenshot attached).

Related

Testing that an individual species in a matrix has a significant impact on the entire assemblage

I feel like this is kind of an elementary question, but with all my reading in and around the subject I have yet to arrive at a conclusive answer.
I'm testing an ecological (site*species) data set as part of an investigation into an invasive species (Rosa rugosa). I have grouped my data according to the quadrant they were part of. I don't know how relevant this is, but the species is only seen at certain quadrants along the transect.
Is there a test that I can run to test the null hypothesis that this specific species has no impact on the entire assemblage? I believe that I am looking for some kind of multivariate t-test, such as Hotelling's T^2 test, although many of the functions in the vegan package seem to test along similar lines. Any of envfit, betadisper, adonis, or even anosim seem to allow a call of roughly this form:
>nta.bray ## bray-curtis matrix w/wisconsin transformation
>test(nta.bray, Rosa_rugosa ~ Quadrant, data=nta)
Does anyone know of a test that will allow me to test along these lines, or even a paper that performed a similar hypothesis test?

Using permanova in r to analyse the effect of 3 independent variables on reef systems

I am trying to understand how to run PERMANOVA using adonis2 in R to analyse some data that I have collected. I have been looking online, but as often happens, the explanations are a bit convoluted, so I am asking for your help. I have some fish and coral groups as columns, as well as 3 independent variables (reef age, depth, and material).
[Image: snapshot of my dataset structure]
I think I have understood that p-values are not the only important part of the output, and that the R2 values indicate how much each variable contributes to the model. Is there something wrong here, or something that I am missing?
Also, I think I understood that I should check for homogeneity of variance, but I have not understood whether I should check it for each variable independently or include them all in the same call (which does not seem to work). Here are the bits of code that I am using to run the PERMANOVA (1) and to assess homogeneity of variance, which does not work (2).
(1) adonis2(species ~ Age + Material + Depth, data = data.var, by = "margin")
'species' is the subset of the dataset including all the species counts, while 'data.var' is the subset including the 3 independent variables. Also, what is the difference between using '+' and '*' in the code? When I use '*' it gives me: 'Error in qr.X(object$CCA$QR) : need larger value of 'ncol' as pivoting occurred'. What does this mean?
(2) variance.check<-betadisper(species.distance,data.var, type=c("centroid"), bias.adjust= FALSE)
'species.distance' is a matrix calculated through 'vegdist' using the Bray-Curtis method. I used 'data.var' to check variance on all 3 independent variables at once, but it does not work, while it works if I check them independently (3). Why is that?
(3) variance.check<-betadisper(species.distance, data$Depth, type=c("centroid"), bias.adjust= FALSE)
Thank you in advance for your responses, and for your help. It will really help me to get my head around it (and sorry for the many questions).
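For context on (2) versus (3): betadisper expects a single grouping vector (usually a factor) as its second argument, not a whole data frame, so one way to cover several factors is to loop over them. A minimal sketch on vegan's built-in dune data, standing in for the asker's 'species' and 'data.var'; the column names used here belong to dune.env, not to the dataset in the question.

```r
# Sketch: test homogeneity of multivariate dispersion one factor at a time.
# dune / dune.env are vegan example data used as stand-ins (an assumption).
if (requireNamespace("vegan", quietly = TRUE)) {
  data(dune, dune.env, package = "vegan")
  d <- vegan::vegdist(dune, method = "bray")     # Bray-Curtis distances

  grouping_vars <- c("Management", "Use")        # categorical columns only
  disp <- lapply(grouping_vars, function(v) {
    # second argument is ONE grouping vector, not a data frame
    vegan::betadisper(d, dune.env[[v]], type = "centroid")
  })
  names(disp) <- grouping_vars

  set.seed(1)
  # permutation test of equal dispersion, separately for each factor
  print(lapply(disp, vegan::permutest, permutations = 999))
}
```

A continuous predictor has no groups to compare, so it would need to be binned before it could be used this way.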

Meaning of ICC in regression

I'm stuck on this question and cannot find a logical explanation.
I'm given the following regression output:
The question is this: a one-way analysis of variance model was mistakenly fitted to explain the variable “level of violence” using “grade” as a random factor. The grade factor in this study is a fixed factor. The partial results in the output are based on a balanced experiment.
Does it make sense in this case to calculate the ICC? Is it at all possible to calculate it manually from this output data only?
I know that the ICC describes the relationship between the observations within the groups. So I thought maybe to describe the connection within the classes, and between the different classes. But how can the ICC be reached by manual calculation from the data in the output?
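For reference, in a balanced one-way random-effects layout the ICC can be written in terms of the between-group and within-group mean squares of the ANOVA table. A small sketch with made-up numbers, since the output itself is not shown here; the function and argument names are illustrative, and whether the result is meaningful when the factor is really fixed is a separate matter.

```r
# Sketch: ICC from a balanced one-way ANOVA table.
# ms_between / ms_within are the two mean squares; n_per_group is the
# (equal) number of observations in each group.
icc_oneway <- function(ms_between, ms_within, n_per_group) {
  # method-of-moments estimate of the between-group variance component
  var_between <- (ms_between - ms_within) / n_per_group
  var_between / (var_between + ms_within)
}

# Illustrative values only, not taken from the asker's output.
icc_oneway(ms_between = 10, ms_within = 2, n_per_group = 5)
```

Algebraically this is the same as (MSB - MSW) / (MSB + (n - 1) * MSW).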

How to take a Probability Proportional to Size (PPS) Unequal Probability sample using R?

I have very little programming experience, but I'm working on a statistics project and would like to generate an unequal probability sample where the inclusion probability of a unit is based on its size (PPS).
Basically, I have two datasets:
ds1 lists US states and the parameter I'm trying to estimate
ds2 has the population size of each state.
My questions:
I want to use R to select a random sample from the first dataset using inclusion probabilities based on the population of each state (second dataset).
Also is there any way to use R to calculate these Generalized Unequal Probability Estimator formulas?
Also just a note on the formulas: pi_i is inclusion probability and pi_ij is joint inclusion probability.
There is an R package for this called pps; see its documentation.
There is also another package called survey, which has some documentation as well.
I'm not sure of the difference between the two and haven't used them myself. Hope this is what you're looking for.
Yes, that's called weighted sampling. Simply set the weight to the size of the state. Strictly speaking, you don't even need to normalize the weights by 1/sum(sizes), although it's always good practice to do so. There are tons of duplicate posts on SO showing how to do weighted sampling.
The only tiny complication is that you need to do a join() of the datasets ds1, ds2. Show us what code you've tried if it's causing problems. Recommend you use either dplyr or data.table.
Your second question should be asked as a separate question, and is off-topic on SO, or at least won't get a great response. It's best to ask statistical questions at the sister site Cross Validated.
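The join plus weighted sampling described above could look roughly like this in base R. The two data frames are tiny made-up stand-ins for ds1 and ds2, and the column names (state, y, pop) are assumptions.

```r
# Hypothetical stand-ins for the asker's ds1 and ds2 (values are invented).
ds1 <- data.frame(state = c("A", "B", "C", "D", "E"),
                  y     = c(10, 20, 30, 40, 50))
ds2 <- data.frame(state = c("A", "B", "C", "D", "E"),
                  pop   = c(100, 200, 300, 400, 1000))

# Join the two datasets on the shared key.
frame <- merge(ds1, ds2, by = "state")

# Draw 2 states with selection weights proportional to population size.
set.seed(42)
idx <- sample(nrow(frame), size = 2, replace = FALSE,
              prob = frame$pop / sum(frame$pop))
pps_sample <- frame[idx, ]
pps_sample
```

One caveat: with replace = FALSE, sample() draws units one after another, so the resulting inclusion probabilities are only approximately proportional to size. If exact PPS inclusion probabilities matter for the estimator, a dedicated sampling package is the safer route.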

LASSO coefficients equal to 0 using opt1D

I have a question about LASSO. I'm going crazy because it is something that I cannot solve with my background alone. I'm a biologist.
Briefly, I ran LASSO using the R library "penalized". In particular, I used the opt1D function with around 500 simulations on a (numerical) data.frame with around 30 columns, which are the biomarkers (gene expression) I want to test, and 3000 rows, which are people, of whom around 50 are tumours and all the others are normals.
Unfortunately, with L1 regularization, every single coefficient across the 500 simulations is 0. If I check the L2 matrix of coefficients, they are close to 0. Now my point is that I cannot believe that none of my biomarkers can distinguish between normals and tumours.
I don't know whether what I have done is all I can do to check the discriminatory potential of my molecules. Is there something else I can do to understand why they are all 0, and is there something else I can do to verify that they really are not able to stratify my cohort?
Did you consider fitting your data without penalization before using regularization? L1 regularization will naturally result in a significant number of zero coefficients.
As a side note, I would first run PCA/PCoA and see whether or not your genes separate according to your class variable. This could save you some time and allow you to trim your data set to those genes that show the greatest differences across your class variable. Also, if you have relatively little experience with R, I would suggest using a linear modeling package such as limma, since it has excellent documentation and many examples that are easy to follow.
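To make the two suggestions concrete, here is a rough base-R sketch on simulated data. The matrix X and outcome y are invented placeholders for the expression table and the tumour/normal label, so the numbers themselves mean nothing.

```r
# Simulated placeholder data: 200 samples, 5 "genes", binary class label.
set.seed(1)
n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("gene", seq_len(p))))
y <- rbinom(n, 1, plogis(-2 + 1.5 * X[, 1]))   # only gene1 carries signal

# 1) Unpenalized logistic regression as a baseline before any LASSO.
fit <- glm(y ~ ., data = data.frame(y = y, X), family = binomial)
print(summary(fit)$coefficients)

# 2) PCA on the scaled expression matrix; compare scores between classes.
pca <- prcomp(X, center = TRUE, scale. = TRUE)
print(aggregate(pca$x[, 1:2], by = list(class = y), FUN = mean))
# For a visual check: plot(pca$x[, 1:2], col = y + 1, pch = 19)
```

If the unpenalized fit already shows no clear signal, all-zero LASSO coefficients are less surprising. With about 50 tumours among 3000 samples, the class imbalance is also worth keeping in mind when reading either result.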
