Different results using ANOSIM and SIMPER in R

I have run ANOSIM and SIMPER to analyse community similarity between two treatments.
When I run ANOSIM the output is:
ANOSIM statistic R: -0.04465
Significance: 0.749
meaning the communities are similar to each other?
But when I run SIMPER, it says the composition of the two treatments is 62% different.
I am not sure how to interpret the outputs of the two tests. Why are they saying different things?

I looked up the documentation.
Are they similar to each other? Yes:
The divisor is chosen so that R will be in the interval -1 … +1, value
0 indicating completely random grouping.
Does 62% mean the groups are different from each other? No, greater than 70% is required.
The function displays most important species for each pair of groups. These species contribute at least to 70 % of the differences between groups.
There is also this note:
The results of simper can be very difficult to interpret. The method
very badly confounds the mean between group differences and within
group variation, and seems to single out variable species instead of
distinctive species (Warton et al. 2012). Even if you make groups that
are copies of each other, the method will single out species with high
contribution, but these are not contributions to non-existing
between-group differences but to within-group variation in species
abundance.
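The note above can be demonstrated without any packages. This is a minimal base-R sketch (simulated data, hand-rolled Bray-Curtis rather than vegan's internals): two groups are drawn from the same distribution, yet the mean between-group dissimilarity, which is what SIMPER decomposes, is still well above zero, because it also absorbs within-group variation.

```r
# Two groups drawn from the SAME abundance distribution: no real
# between-group difference exists, yet the mean between-group
# Bray-Curtis dissimilarity is clearly nonzero.
set.seed(1)

# 10 samples x 5 species, identical distribution in both groups
comm  <- matrix(rpois(10 * 5, lambda = 5), nrow = 10)
group <- rep(c("A", "B"), each = 5)

# Bray-Curtis dissimilarity between two abundance vectors
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

# mean dissimilarity over all between-group pairs of samples
pairs <- expand.grid(i = which(group == "A"), j = which(group == "B"))
d_between <- mean(mapply(function(i, j) bray(comm[i, ], comm[j, ]),
                         pairs$i, pairs$j))
d_between   # noticeably > 0, purely from within-group variation
```

So a SIMPER-style "62% different" figure on its own is not evidence of a group effect; the ANOSIM R near 0 is the part that speaks to that.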


Clustering a set of customers with heterogeneous variables

I have a set of customers with different attribute types: continuous, categorical, binary and ordinal.
How can I cluster them, given that we cannot apply the same distance metric to these different types of attributes?
Thank you in advance
As mentioned already, the daisy() function (from the cluster package) is an option: it automatically selects a suitable distance metric based on the data types. But I would suggest the following approach, and invite experts to chime in.
Rather than relying on automatic selection, first identify and remove some correlated variables, for example:
Pearson correlation: for continuous variables
Chi-square test: for categorical variables
Categorical vs. numerical: one-way ANOVA test, etc.
Taking the subset of useful variables, consider one-hot encoding the categorical variables, and perhaps converting ordinal variables to continuous (or to categorical, then one-hot encoding them). Test different distance metrics such as Euclidean and Manhattan to evaluate the result. You will get better clarity on the overall clustering process this way.
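For the daisy() route specifically, a short sketch of the workflow might look like this. The customer columns are invented for illustration; Gower's coefficient handles mixed types directly, and the cluster package ships with R.

```r
library(cluster)

# toy customer table with one column of each type (names invented)
set.seed(42)
customers <- data.frame(
  age     = rnorm(20, 40, 10),                           # continuous
  segment = factor(sample(c("a", "b", "c"), 20, TRUE)),  # categorical
  active  = factor(sample(c("yes", "no"), 20, TRUE)),    # binary
  tier    = ordered(sample(1:3, 20, TRUE))               # ordinal
)

# daisy() uses Gower's coefficient for mixed-type data
d <- daisy(customers, metric = "gower")

# cluster on the resulting dissimilarity matrix, e.g. with PAM
fit <- pam(d, k = 3)
table(fit$clustering)
</imports>
```

Any method that accepts a precomputed dissimilarity (pam, hclust, etc.) can consume the daisy() output, so the variable-screening step above slots in before the daisy() call.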

How do I analyze movement between points in R?

So I have a lot of points, kind of like this:
animalid1;A;time
animalid1;B;time
animalid1;C;time
animalid2;A;time
animalid2;B;time
animalid2;A;time
animalid2;B;time
animalid2;C;time
animalid3;A;time
animalid3;B;time
animalid3;C;time
animalid3;B;time
animalid3;A;time
What I want to do, first of all, is make R understand that the points A, B, C are connected. Then I want comparisons of movement from A to C: how long it takes, how many steps were used, etc. So maybe I have a movement sequence like ABC in 20 animals, then ABABC in 10 animals, and then ABCBA in 5 animals. I want some sort of statistical test to see whether the total time differs between these groups, and so on.
I bet this has been done before. But my Google skills are not good enough to find it.
Look at the msm package (msm stands for Multi-State Model). Given observations of states at different times, it will estimate transition probabilities and the average time spent in each state.
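Before fitting a multi-state model, the raw rows need collapsing into one movement sequence per animal. A base-R sketch, using made-up data in the same format as the question:

```r
# observations in the question's format: one row per sighting
obs <- data.frame(
  id    = c(rep("animal1", 3), rep("animal2", 5), rep("animal3", 5)),
  point = c("A", "B", "C",
            "A", "B", "A", "B", "C",
            "A", "B", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# one sequence string per animal, preserving row order within each animal
seqs <- tapply(obs$point, obs$id, paste, collapse = "")
seqs        # "ABC", "ABABC", "ABCBA"

# number of steps (transitions) per animal
steps <- nchar(seqs) - 1

# how many animals share each sequence: the groups you could then compare
table(seqs)
```

With a time column added, the same grouping gives per-animal total times, which you could then compare across sequence groups (e.g. with anova or kruskal.test) or feed into msm as longitudinal state observations.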

Covariates in correlation?

I'm not sure if this is a question for Stack Overflow or Cross Validated.
I'm looking for a way to include covariate measures when calculating the correlation between two measures. For example, let's say I have 100 samples, for which I have two measurements, x and y. Now let's say I also have a third measure, a covariate (say, age). I want to measure the correlation between x and y, but I also want to ignore any of that correlation that comes from the covariate, age.
If I'm fitting a linear model, I could simply add the term to the model:
lm(y~x+age)
I know you can't calculate correlation with this kind of model in R (using ~).
So I want to know:
Does what I'm asking even make sense to do? I suspect it may not.
If it does, what R packages should I be using?
It sounds like you're asking for a semipartial correlation: you want the correlation between x and y, partialling out the correlation with the covariate (age). You should read about partial and semipartial correlations.
The ppcor package in R will then help you with the calculations.
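The distinction can also be seen in base R via lm() residuals (simulated data; the variable names follow the question). Partial correlation removes age from both x and y; semipartial removes it from only one of them.

```r
# simulated example: x and y both depend on age
set.seed(7)
age <- rnorm(100, 50, 10)
x   <- 0.5 * age + rnorm(100)
y   <- 0.3 * age + 0.4 * x + rnorm(100)

rx <- resid(lm(x ~ age))   # x with age partialled out
ry <- resid(lm(y ~ age))   # y with age partialled out

partial     <- cor(rx, ry)  # age removed from both x and y
semipartial <- cor(rx, y)   # age removed from x only

c(raw = cor(x, y), partial = partial, semipartial = semipartial)
```

ppcor's pcor.test() and spcor.test() compute the same quantities and also give p-values.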

Comparison of regression models built at two time points

I have two multiple linear regression models, built using the same group of subjects and the same variables; the only difference is the time point: one uses baseline data and the other uses data obtained some time later.
I want to test whether there is any statistically significant difference between the two models. I have seen articles saying that AIC may be a better option than p-values when comparing models.
My question is: does it make sense to just compare the AIC using extractAIC in R, or to use anova(lm)?
It is not standard to test for statistical significance between observations recorded at two points in time by estimating two different models.
You may mean that you are testing whether the observations recorded at the second point in time are statistically different from the first, by including some dummy variables and testing the coefficients on them. Still, this is estimating only one model.
In that model you would have dummy variables for the second point in time: either an intercept dummy alone, or an intercept dummy plus an interaction dummy.
Then you should do both: test the p-value significance of either or both of these dummy coefficients, and also look at the AIC. There is no definitive 'better', as the articles likely described.
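A sketch of that single pooled model, on simulated data (all names invented): `period` is the time-point dummy, its main effect tests an intercept shift and the interaction tests a slope change.

```r
# simulate the same subjects measured at two time points
set.seed(123)
n      <- 50
x      <- rnorm(2 * n)
period <- factor(rep(c("baseline", "followup"), each = n))
y      <- 1 + 2 * x + 0.5 * (period == "followup") + rnorm(2 * n)

# one pooled model with intercept and interaction dummies
pooled <- lm(y ~ x * period)
summary(pooled)   # p-values on periodfollowup and x:periodfollowup

# nested-model comparison: anova() gives the F-test, extractAIC() the AIC
restricted <- lm(y ~ x)
anova(restricted, pooled)
extractAIC(restricted)
extractAIC(pooled)
```

Note that anova() on nested models and a comparison of extractAIC() values answer slightly different questions (hypothesis test vs. model selection), which is why looking at both is reasonable.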

Should categorical predictors within a linear model be normally distributed?

I am running simple linear models (Y ~ X) in R where my predictor is a categorical variable (0-10). However, this variable is not normally distributed, and none of the usual transformation techniques (e.g. log, square root) help, as the data is not negatively or positively skewed but rather all over the place. I am aware that for lm the outcome variable (Y) has to be normally distributed, but is this also required for predictors? If yes, any suggestions on how to achieve this would be more than welcome.
Also, as the data I am looking at has two groups, patients vs. controls (I am interested in group differences, as you can guess), do I have to check whether the data is normally distributed within each of the two groups or overall across both groups?
Thanks.
See @Roman Luštrik's comment above: it does not matter how your predictors are distributed (except for problems with multicollinearity). What is important is that the residuals be normal (and have homogeneous variances).
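In other words, the checks belong on the residuals of the fitted model, not on the predictor. A base-R sketch on simulated data (an 11-level categorical predictor with a deliberately uneven distribution):

```r
# unevenly distributed categorical predictor: that in itself is fine
set.seed(99)
g <- factor(sample(0:10, 200, replace = TRUE))
y <- as.numeric(g) * 0.3 + rnorm(200)

fit <- lm(y ~ g)

# diagnostics go on the residuals, not on g:
shapiro.test(resid(fit))        # formal normality test of residuals
qqnorm(resid(fit)); qqline(resid(fit))   # visual normality check
plot(fitted(fit), resid(fit))   # spread should be similar across groups
```

For the two-group question, the same logic applies: fit the model including the group term (and any interaction of interest) and inspect the residuals of that model, rather than testing raw Y within each group.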
