adonis function from vegan doesn't work - r

I've got a problem fighting one error. Here is the line I try to execute:
library(vegan)
adonis(data = dset, adiv ~ N+P+K)
It returns a failure message:
Error in rowSums(x, na.rm = TRUE) :
'x' must be an array of at least two dimensions
Everything seems to be alright with the dataset, because aov(data = dset, adiv ~ N+P+K) works just fine. I know that such errors appear when some functions drop data frame dimensions, but I don't know how to fix it in this case.
Edit. Adding a piece of my dataset.
treatment N P K M adiv
N 1 0 0 0 0.2059
P 0 1 0 0 0.20856
K 0 0 1 0 0.22935
O 0 0 0 0 0.10729
NP 1 1 0 0 0.30674
NK 1 0 1 0 0.30509
PK 0 1 1 0 0.30606
NPK+ 1 1 1 1 0.50389
NPK 1 1 1 0 0.40731
manure 0 0 0 1 0.2085
Before I try to execute adonis I convert treatment values into factors with:
dset$N <- as.factor(dset$N)
dset$P <- as.factor(dset$P)
dset$K <- as.factor(dset$K)
dset$M <- as.factor(dset$M)
Then I just try to execute the function and get the error.
As I've already mentioned, everything works just fine when I try aov() or lm().

This is guesswork, since there is nothing reproducible in your question. However, I can trigger a similar error if I use a univariate response: adonis is intended for multivariate responses and may not work with univariate ones. The adonis help page (see ?adonis) says that the left-hand side of the formula should be "either a dissimilarity object (inheriting from class "dist") or data frame or a matrix." Following this advice works when I try it (although I really cannot reproduce your example): you could try a left-hand side of as.matrix(Nitrososphaeraceae) or dist(Nitrososphaeraceae).
The adonis function is really intended for multivariate responses, and using it with univariate responses needs care. You should also carefully consider the type of dissimilarity (or distance) you use with such models. For instance, the two alternatives above will give different results because they use different dissimilarity measures. I am not at all sure that it makes much sense to use distance-based methods like adonis with univariate responses.
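As a rough illustration of the point above, the sketch below rebuilds the data posted in the question and shows why a bare vector fails in rowSums() while a matrix or a "dist" object does not. The names vec_error, lhs_matrix and lhs_dist are made up for this example. The final call assumes that vegan is installed and uses adonis2(), the current replacement for adonis() in recent vegan releases; it is guarded so that the rest of the code runs without vegan.

```r
# Data copied from the question; N, P and K are factors, as the asker describes
dset <- data.frame(
  N = factor(c(1, 0, 0, 0, 1, 1, 0, 1, 1, 0)),
  P = factor(c(0, 1, 0, 0, 1, 0, 1, 1, 1, 0)),
  K = factor(c(0, 0, 1, 0, 0, 1, 1, 1, 1, 0)),
  adiv = c(0.2059, 0.20856, 0.22935, 0.10729, 0.30674,
           0.30509, 0.30606, 0.50389, 0.40731, 0.2085)
)

# A bare numeric vector has no dim attribute, so rowSums() rejects it
vec_error <- tryCatch(rowSums(dset$adiv), error = function(e) conditionMessage(e))
print(vec_error)

# Either alternative gives the left-hand side a shape the help page allows
lhs_matrix <- as.matrix(dset$adiv)  # one-column matrix
lhs_dist <- dist(dset$adiv)         # Euclidean distances between observations

# Assumption: vegan is installed; adonis2() is used here instead of adonis()
if (requireNamespace("vegan", quietly = TRUE)) {
  fit <- vegan::adonis2(lhs_dist ~ N + P + K, data = dset)
  print(fit)
}
```

Whether a distance-based test is sensible for a single response variable is a separate question, as noted above; this only shows how to get past the dimension error.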


Create a new Variable of values of another variable-multilevel regression

I am about to run a multilevel analysis (and I am a total newbie).
In this analysis I want to test whether a high value of one predictor (here: senseofhumor, a numeric value recoded into "high", "low", "medium") predicts the numeric outcome better than the other numeric predictors (senseofhumor, seriousness, friendlyness).
I have a dataset with many people and groups, and I want to compare the outcome between the groups with regard to the influence of SenseofhumorHIGH.
The code for that might look like this
RandomslopeEC <- lme(criteria(timepoint1) ~ senseofhumor + seriousness + friendlyness , data = DATA, random = ~ **SenseofhumorHIGH**|group)
For that reason I created values "high" "low" "medium" for my numeric predictor via
library(tidyverse)
DATA <- DATA %>%
  mutate(predictorNew = case_when(
    senseofhumor < quantile(senseofhumor, 0.5) ~ 'low',
    senseofhumor > quantile(senseofhumor, 0.75) ~ 'high',
    TRUE ~ 'med'))
Now they look like this:
Person Group senseofhumor
1 56 low
7 1 high
87 7 low
764 45 high
Now I realize that I might need to split this variable's values into separate variables if I want to test my idea.
Does anyone know how to generate variables so that they look like this?
Person Group senseofhumorHIGH senseofhumorMED senseofhumorLOW
1 56 0 0 1
7 1 1 0 0
87 7 0 0 1
764 45 1 0 0
51 3 1 0 0
362 9 1 0 0
87 27 0 0 1
Does this make any sense to you regarding my approach? Or do you have a better idea?
Thanks a lot in advance
Welcome to learning R. You will want to convert these types of variables to "factors," and R will be able to process them accordingly. To do that, use as.factor(variable) - so for you it may be DATA$senseofhumor <- as.factor(DATA$senseofhumor). If you need to convert multiple columns, you can use:
factor_cols <- c("Var1","Var2","Var3") # list columns you want as factors
DATA[factor_cols] <- lapply(DATA[factor_cols], as.factor)
Since you are new, note that this forum is typically for questions whose answers can't easily be found online. This question is relatively routine, and more details can be found with a quick Google search. While SO is a great place to learn R, you may be penalized by the SO community in the future for routine questions like this. Just trying to help ensure you keep learning!
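If explicit 0/1 columns like the ones sketched in the question are still wanted, one possible route is to let model.matrix() expand the factor. This is only a sketch: the four rows are the ones shown in the question, the factor level order is an assumption, and the name dummies is made up for the example.

```r
# The four example rows shown in the question
DATA <- data.frame(
  Person = c(1, 7, 87, 764),
  Group = c(56, 1, 7, 45),
  predictorNew = c("low", "high", "low", "high")
)

# Convert to a factor with an explicit level order (assumed here)
DATA$predictorNew <- factor(DATA$predictorNew, levels = c("low", "med", "high"))

# "- 1" drops the intercept so that every level gets its own indicator column
dummies <- model.matrix(~ predictorNew - 1, data = DATA)
colnames(dummies) <- c("senseofhumorLOW", "senseofhumorMED", "senseofhumorHIGH")

DATA <- cbind(DATA, dummies)
print(DATA)
```

In most model formulas the factor itself can be used directly, so the separate indicator columns are mainly useful when a single level has to be referred to by name.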

Regression with several dummy variables

I am running a logistic regression and I want to control for the country of the respondents. I have 12 countries. I used the "fastDummies" package to create dummies for each country:
ALL<-dummy_cols(ALL, select_columns = "country")
I get something like this:
country_Japan 1 1 0 0 0 0
country_Taiwan 0 0 1 1 0 0
country_China 0 0 0 0 1 1
and so on...
As you can see, the sum of all variables makes a perfect collinearity. For this reason, I cannot estimate the model.
I read that I need to include a variable with 0s as the last country dummy to avoid this collinearity. Is this correct? I included the intercept (a column with 1s) , but it did not help.
I would appreciate your suggestions. Thanks
Check the remove_first_dummy parameter in the dummy_cols function, i.e. set it to TRUE. This should solve your problem of multicollinearity.
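The collinearity can be checked in base R without any extra package. The sketch below uses the three countries shown in the question (two respondents each) and compares the rank of the design matrix with and without a dropped reference level; the names full and reduced are made up for the example.

```r
# Three of the countries from the question, two respondents each
country <- factor(c("Japan", "Japan", "Taiwan", "Taiwan", "China", "China"))

# Intercept plus one dummy per country: the dummies sum to the intercept column
full <- cbind(intercept = 1, model.matrix(~ country - 1))
cat("columns:", ncol(full), "rank:", qr(full)$rank, "\n")

# Default treatment coding drops the first level, which removes the redundancy
reduced <- model.matrix(~ country)
cat("columns:", ncol(reduced), "rank:", qr(reduced)$rank, "\n")
```

This is also why passing the country column to glm() as a factor usually works without creating dummies by hand: R drops one reference level automatically.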

Matches in binary columns-R

I am building some prediction models. I have two binary columns, one with the predicted values and the other with the actual values.
Because the columns record people with cancer, they contain few ones. I want to see how many cases the model detected (how many real ones it predicted) and the percentage of sick people correctly predicted.
Brief description of the data: the first column shows the real values and the second one shows the predicted values:
> predictedvsreal
real prediction
39240 0 0
39241 0 0
39242 0 0
39243 1 0
39244 0 1
39245 0 0
39246 0 0
39247 0 0
39248 1 1
39249 0 0
39250 0 0
39251 0 0
39252 0 0
Thanks!
Next time please include a reproducible example as it makes the question much better - both for letting people who answer have a concrete example to work with and to catch edge-cases, and for future readers to see a real example.
There are lots of good recommendations for how to create nice, minimal, reproducible examples at this link.
From what you describe, you want the table function, probably like this:
with(your_data, table(your_first_column_name, your_second_column_name))
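Applied to the rows printed in the question, the suggestion above could look like the following sketch. The data frame is retyped from the question (row names omitted); tab, detected and pct_detected are names made up for the example.

```r
# The 13 rows shown in the question
predictedvsreal <- data.frame(
  real       = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0),
  prediction = c(0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0)
)

# Cross-tabulate actual against predicted values
tab <- with(predictedvsreal, table(real, prediction))
print(tab)

# Number of real cases the model detected, and that number as a share of all real cases
detected <- tab["1", "1"]
pct_detected <- 100 * detected / sum(tab["1", ])
cat("detected:", detected, "of", sum(tab["1", ]), "real cases\n")
cat("percent detected:", pct_detected, "\n")
```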

plotting variables of procrustes analysis in r?

I have performed non-metric multidimensional scaling (NMDS) on two data frames, each containing different variables but for the same sites. I am using the vegan package:
> head (ResponsesS3)
R1_S3 R10_S3 R11_S3 R12_S3 R2_S3 R3_S3 R4_S3 R6_S3 R7_S3 R8_S3 R9_S3
4 0 0 0 0 0 1 0 0 0 0 0
5 0 0 0 0 0 1 0 0 0 0 0
7 1 0 0 1 0 0 0 0 0 0 0
12 0 0 0 0 0 1 0 0 0 0 0
14 2 2 0 0 0 0 2 0 0 0 0
16 0 0 1 0 0 0 0 0 0 1 0
> head (EnvtS3)
Dep_Mark Dep_Work Dep_Ext Use_For Use_Fish Use_Ag Div_Prod
4 0.06222836 1.0852315 0.8367309 1.1415929 1.644670 0.1006964 0.566474
5 0.25946808 1.3342266 0.0000000 1.7123894 0.822335 0.0000000 0.283237
7 2.20668862 0.0000000 0.8769881 0.4280973 0.822335 0.5244603 0.849711
12 2.26323697 0.0000000 0.8090991 1.1415929 0.000000 1.4957609 1.416185
14 1.65107675 0.5195901 0.2921132 0.5707965 0.822335 1.7873609 0.849711
16 1.82230225 0.4760163 0.1915366 2.2831858 0.000000 1.6614904 0.849711
> ResponsesS3.mds = metaMDS (ResponsesS3, k =2, trymax = 100)
> EnvtS3.mds = metaMDS (EnvtS3, k =2, trymax = 100)
I fit the results using a procrustean superimposition
> pro.ResponsesS3.EnvtS3.mds <- procrustes(ResponsesS3.mds,EnvtS3.mds)
I am most interested in understanding how the variables from each dataset fit together. I would like to use the plot() function to return a graph of the variables from ResponsesS3 and from EnvtS3, rather than the sites (which is what the plot function returns by default).
Is this possible?
No, this is not possible. The problem is that the two datasets have different numbers of variables, which causes the procrustes() method to fail if you try procrustes(..., scores = "species").
Even if you fit with procrustes(..., scores = "sites") (the default), how do you propose to draw the plot if we could extract the species information? The current plot joins rows from one matrix with the rows of the other; this works in the default setting because the datasets are assumed to be measurements on the same locations/sites. But this is not possible with species/variables. More fundamentally, how should we pair up species with environmental variables?
Finally, you are trying to look at how the variables compare, yet you have used a method that essentially throws this information away once dissimilarities are computed.
I would look at the method of coinertia analysis, of which there is a crude interface in my cocorresp package and a fuller one in the ade4 package. If you find yourself wanting to compare two sets of species data, try cocorrespondence analysis, which cocorresp fits.
Like Gav said, the points must match each other one to one for Procrustes rotation. However, once you have a Procrustes rotation, you can naturally apply it to other matrices with the same number of columns. The number of columns is crucial: if you have a 2-dimensional NMDS, your variables must also be mapped into these 2 dimensions. Function metaMDS() will get you such column scores corresponding to your ordination of row scores, but I don't know how adequate these are in your case. The easiest way to rotate those scores in vegan is to use the predict method with newdata. Continuing with your example:
predict(pro.ResponsesS3.EnvtS3.mds, newdata=scores(EnvtS3.mds, "species"))
This will only rotate your column scores ("species") in the same way as your row scores are rotated.
We do not know what you try to achieve, and indeed there may be better ways to achieve your goal (check Gavin's answer for a starter). However, this will do the rotation.
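To make the idea of reusing a rotation concrete, here is a base R sketch of a plain orthogonal Procrustes rotation. It is not vegan's procrustes() code: it ignores the translation and scaling steps, and all matrices (X, Y, species) are hypothetical stand-ins for ordination scores. It only shows that a rotation estimated from matched rows can be applied to any other matrix with the same number of columns.

```r
# Hypothetical 2-dimensional site scores: X is the target configuration
raw <- matrix(c(1, 2, 3, 4, 5, 2, 1, 4, 3, 6), ncol = 2)
X <- sweep(raw, 2, colMeans(raw))  # centre the columns

# Y is the same configuration rotated by 30 degrees
theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
Y <- X %*% rot

# Orthogonal Procrustes: the rotation of Y onto X from the SVD of t(X) %*% Y
s <- svd(crossprod(X, Y))
A <- s$v %*% t(s$u)
Y_rotated <- Y %*% A

# Hypothetical column ("species") scores in the same 2 dimensions as Y
species <- matrix(c(0.5, -1, 1.5, 0.2, 0.8, -0.4), ncol = 2)
species_rotated <- species %*% A

print(all.equal(Y_rotated, X))
print(species_rotated)
```

The same requirement shows up here as in the answer: species must have exactly as many columns as Y, otherwise the multiplication by A is not defined.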

ChoiceModelR - Hierarchical Bayes Multinomial Logit Model

I hope that some of you are a bit experienced with the R package ChoiceModelR by Sermas and Colias, to estimate a Hierarchical Bayes Multinomial Logit Model. Actually, I am quite a newbie on both R and Hierarchical Bayes. However, I tried to get some estimates by using the script provided by Sermas and Colias in the help file. I have a data set in the same structure as they use (ID, choice set, alternative, independent variables, and choice variable). I have four independent variables all of them binary coded as categorical variables, none of them restricted. I have eight choice sets with three alternatives within each set as well as one no-choice-option as fourth alternative. I tried the following script:
library (ChoiceModelR)
data <- read.delim("Z:/KLU/CSR/CBC/mp3_vio.txt")
xcoding=c(0,0,0,0)
mcmc = list(R = 10, use = 10)
options = list(none=FALSE, save=TRUE, keep=1)
attlevels=c(2,2,2,2)
c1=matrix(c(0,0,0,0),2,2)
c2=matrix(c(0,0,0,0),2,2)
c3=matrix(c(0,0,0,0),2,2)
c4=matrix(c(0,0,0,0),2,2)
constraints = list(c1, c2, c3, c4)
out = choicemodelr(data, xcoding, mcmc = mcmc, options = options, constraints = constraints)
and have got the following error message:
Error in 1:nalts[i] : result would be too long a vector
In addition: There were 50 or more warnings (use warnings() to see the first 50). The mentioned warnings are of the following:
In max(temp[temp[, 2] == j, 3]) : no non-missing arguments to max; returning -Inf
In max(temp[temp[, 2] == j, 3]) : no non-missing arguments to max; returning -Inf
Actually, I have no idea what went wrong, as I used the same data structure, even though I have more independent variables, more choice sets, and more alternatives within a choice set. It would be fantastic if anybody could shed some light into the darkness.
I know that this may not be helpful since you posted so long ago, but if it comes up again in the future, this could prove useful.
One of the most common reasons for this error (in my experience) has been that either the scenario variable or the alternative variable is not in ascending order within your data.
id scenario alt x1 ... y
1 1 1 4 1
1 1 2 1 0
1 3 1 4 2
1 3 2 5 0
2 1 4 3 1
2 1 5 1 0
2 2 1 4 2
2 2 2 3 0
This dataset will give you errors since the scenario and alternative variables must be ascending, and they must not skip any values. Just to fully reiterate what I mean, the scenario and alt variables must be reordered as follows in order to work:
id scenario alt x1 ... y
1 1 1 4 1
1 1 2 1 0
1 2 1 4 2
1 2 2 5 0
2 1 1 3 1
2 1 2 1 0
2 2 1 4 2
2 2 2 3 0
I work with ChoiceModelR quite frequently, and this is what has caused these errors for me in the past. If you have a github account, you can also post your data (or modified data) there if you end up wanting to have other users take a look.
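The renumbering described above can be done in base R. The following is a sketch under assumptions: the column names id, scenario and alt follow the example tables in this answer (the x1 and y columns are left out), and dat is a hypothetical data frame holding the problematic rows. It does not call ChoiceModelR itself.

```r
# The problematic rows from the first example table (id, scenario, alt only)
dat <- data.frame(
  id       = c(1, 1, 1, 1, 2, 2, 2, 2),
  scenario = c(1, 1, 3, 3, 1, 1, 2, 2),
  alt      = c(1, 2, 1, 2, 4, 5, 1, 2)
)

# Map values to consecutive integers 1, 2, 3, ... in ascending order
renumber <- function(v) match(v, sort(unique(v)))

# Renumber scenarios within each respondent, then alternatives within each scenario
dat$scenario <- ave(dat$scenario, dat$id, FUN = renumber)
dat$alt <- ave(dat$alt, dat$id, dat$scenario, FUN = renumber)

# Sort so that both variables ascend within each respondent
dat <- dat[order(dat$id, dat$scenario, dat$alt), ]
print(dat)
```

After this step the id, scenario and alt columns match the corrected table above.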
