I have difficulty interpreting my data with regard to dispersion and composition. I have 6 groups and used adonis2() to test the compositional difference between them. Furthermore, I used betadisper() to check dispersion per group and compared the groups with anova(). Now I want to visualize this, and an elegant way seemed to be ordihull() in my NMDS plot.
Now my question: can I use ordihull to visualize group dispersion in an NMDS ordination? It looks like this:
Could I interpret this as meaning that the groups with the largest hull area (indicated by the coloured outlines) have the highest dispersion?
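For reference, a minimal sketch of this workflow in vegan; comm (a site-by-species matrix) and grp (the grouping factor with 6 levels) are placeholder names for my data:

library(vegan)
# Placeholder inputs: comm is the community matrix, grp the grouping factor
dis <- vegdist(comm, method = "bray")      # distance matrix used throughout
ord <- metaMDS(comm, distance = "bray")    # NMDS ordination
meta <- data.frame(grp = grp)
adonis2(dis ~ grp, data = meta)            # compositional differences
disp <- betadisper(dis, grp)               # dispersion per group
anova(disp)
# NMDS plot with convex hulls per group
plot(ord, display = "sites", type = "n")
points(ord, display = "sites", col = as.numeric(grp), pch = 19)
ordihull(ord, groups = grp, draw = "polygon", col = 1:6, label = TRUE)
# Hulls only outline the most extreme points; ellipses based on the standard
# deviation of site scores track the "typical" spread more closely
ordiellipse(ord, groups = grp, kind = "sd", col = 1:6, label = TRUE)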
I am not able to understand what this plot depicts and what the relation between the rankings means. Let me add a few more things to make it a bit more precise. This plot is generated using the plot_importance_rankings function from the randomForestExplainer package.
A sample of the code which generates this plot is:
plot_importance_rankings(var_importance_frame)
where var_importance_frame contains the variable importance measures, which we obtain from
var_importance_frame <- measure_importance(rf_model)
Here rf_model is the trained random forest model. A worked example can be found at this link: RandomForestExplainer - sample example.
The randomForestExplainer package implements several measures to assess a given variable's importance in a random forest model. On this plot you have mean_min_depth (the average minimum depth of a variable across all trees), accuracy_decrease (the accuracy lost by randomly permuting a given variable), gini_decrease (the average gain in purity from splitting on a given variable), no_of_nodes (the number of nodes that split on a given variable across all trees), and times_a_root (the number of times a given variable is used as the root of a tree). Ideally you would want these importance measures to be broadly consistent, in the sense that a variable ranked as highly important by one measure is also ranked highly by the others. This plot is exactly that sanity check: each dot on a scatter plot represents a variable, and each panel plots the variables' rankings under one measure against their rankings under another. In your case the variable importances are largely consistent and positively correlated.
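As a self-contained sketch of how such a plot is produced (using the built-in iris data purely for illustration; substitute your own rf_model):

library(randomForest)
library(randomForestExplainer)
set.seed(1)
# Illustrative forest on iris; localImp = TRUE is needed for the accuracy measure
rf_model <- randomForest(Species ~ ., data = iris, localImp = TRUE, ntree = 500)
# One row per predictor with mean_min_depth, accuracy_decrease, gini_decrease,
# no_of_nodes, times_a_root, and related measures
var_importance_frame <- measure_importance(rf_model)
# Pairwise plots of the rankings implied by each importance measure; tight,
# monotone point clouds mean the measures agree on the variable ranking
plot_importance_rankings(var_importance_frame)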
I have constructed a mixed-effects model using lmer() with the aim of comparing the growth in reading scores for four different groups of children as they age.
I would like to plot a graph of the 4 different slopes with confidence intervals in R in order to visualize this relationship, but I keep getting stuck.
I have tried to use the plot function and some versions of ggplot, as I have done for previous lm() models, but it isn't working so far. Here is my attempted model, which I hope looks at how the change in reading scores over time (age) interacts with a child's SESDLD grouping (this indicates whether a child has a language problem and whether or not they are high or low income).
AgeSES.model <- lmer(ReadingMeasure ~ Age.c*SESDLD1 + (1|childid), data = reshapedomit, REML = FALSE)
The ReadingMeasure is a continuous score, and Age.c is centered age measured in months. SESDLD1 is a categorical measure with 4 levels. I would expect four positive ReadingMeasure growth trajectories with different intercepts and probably differing slopes.
I would really appreciate any pointers on how to do this!
Thank you so much!!
The type of plot I would like to achieve (this one was made in Stata):
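For reference, a rough sketch of one way to draw this kind of plot from the fitted model, using the ggeffects package (the object and term names follow the model above):

library(lme4)
library(ggeffects)
library(ggplot2)
# Model from the question
AgeSES.model <- lmer(ReadingMeasure ~ Age.c * SESDLD1 + (1 | childid),
                     data = reshapedomit, REML = FALSE)
# Predicted ReadingMeasure across Age.c for each SESDLD1 group, with
# confidence intervals around the fixed-effect predictions
pred <- ggpredict(AgeSES.model, terms = c("Age.c", "SESDLD1"))
# One line (and ribbon) per group
plot(pred) +
  labs(x = "Age (centered, months)", y = "Reading measure", colour = "SESDLD group")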
I have some population genomics data where I have allele frequencies of SNPs with respect to the ancestral genome for each treatment.
I wanted to look at whether beta diversity differed between treatments, so I used the vegan package and betadisper() on Euclidean distances.
After extracting all the information from the model and putting it into data frames that ggplot2 likes, I can get this plot.
Although to my eye this shows higher beta diversity in mixed (circle) than static (triangle) treatments, the anova(), permutest() and TukeyHSD() methods give results where we do not reject the null hypothesis of homogeneity of variances. In addition, the p values for most of these tests are p > 0.8.
From what I can work out, these tests on a betadisper() model object look at differences in the mean distance to the centroid, which is not different across treatments.
However the spread of distance to centroid does seem to be different between the treatments.
I was just wondering whether I am OK doing something like a Bartlett test or Levene test (in the car package) to look at differences in the variance of the distances from the centroid for each group, as another metric of "beta diversity" (variance across groups). Or are there methods within vegan that look at the variance of the distance to centroid as well as changes in the mean distance to centroid?
Your graphics are misleading: you should use an equal aspect ratio (isometric scaling) in PCoA, but the horizontal axis is stretched and the vertical axis compressed in your plot. Moreover, the convex hull can be misleading because it focuses on the extreme observations, whereas the test focuses on "typical" distances from the centroid. So your "eye" was wrong and misled by the graphics. We do provide correct graphics as methods for betadisper, and using these instead of self-concocted ggplot2 graphics would have saved you from this problem; at the very least you could use them to cross-check your own versions.
Please note that betadisper already works with "homogeneity" of variances, and a variance of variances (= variance of distances from centroids) may not be useful or easily interpreted. The pair of functions we provide are adonis2 for differences of centroids and betadisper for sizes of dispersion w.r.t. centroids.
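For example, a minimal sketch using the built-in methods (d is your Euclidean distance matrix and treatment your grouping factor):

library(vegan)
mod <- betadisper(d, treatment)
# Ordination plot of the dispersion object, with centroids and convex hulls
plot(mod, hull = TRUE, ellipse = FALSE)
# Boxplot of the distances to group centroids: this is what the tests compare
boxplot(mod)
# The tests themselves
anova(mod)
permutest(mod, pairwise = TRUE)
TukeyHSD(mod)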
I have used the lsmeans package in R to get the average estimate over all observations for my treatment factor (across the levels of a block factor in the experimental design that was included as a systematic effect because it only had 3 levels). I have used a sqrt transformation for my response variable.
Thus I have used the following commands in R.
First, fitting the model:
model <- lm(sqrt(response) ~ treatment + block)
Then applying lsmeans
model_lsmeans<-lsmeans(model,~treatment)
Then plotting this
plot(model_lsmeans, ylab = "treatment", xlab = "response (with 95% CI)")
This gives a very nice graph with estimates and 95% confidence intervals for the different treatments.
The problem is just that this graph is for the transformed response.
How do I get the same plot with the back-transformed response (so the squared response)?
I have tried to create a new data frame and extract the lsmean, lower.CL, and upper.CL:
a<-summary(model_lsmeans)
New_dataframe<-as.data.frame(a[c("treatment","lsmean","lower.CL","upper.CL")])
And then square these:
New_dataframe$lsmean<-New_dataframe$lsmean^2
New_dataframe$lower.CL<-New_dataframe$lower.CL^2
New_dataframe$upper.CL<-New_dataframe$upper.CL^2
New_dataframe
This gives me the estimates and CI boundaries squared that I need.
The problem is that I cannot make the same graph for these estimates and CIs as the one I made with lsmeans above.
How can I do this? The reason I ask is that I want all the graphs in my article to have a similar style. Since I very much like this lsmeans plot, and it is very convenient to use on the non-transformed response variables, I would like to have all my graphs in this style.
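For concreteness, a rough ggplot2 sketch of the kind of dot-and-interval plot I mean, built from New_dataframe (only the column names created above are assumed):

library(ggplot2)
# Horizontal dot-and-interval plot of the back-transformed estimates,
# mimicking the layout of the lsmeans plot
ggplot(New_dataframe,
       aes(x = lsmean, y = treatment, xmin = lower.CL, xmax = upper.CL)) +
  geom_pointrange() +
  labs(x = "response (with 95% CI)", y = "treatment")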
Thank you very much for your help! Hope everything is clear!
Kind regards
Ditlev
I'm studying the effect of different predictors (dummy, categorical and continuous variables) on the presence of birds, obtained from bird counts at sea. To do that I used the glmmadmb function with a binomial family.
I've plotted the relationship between the response variable and the predictors in order to assess the model fit and the marginal effect of each predictor. To draw the graphs I used the visreg function, specifying the scale of the vertical axis:
visreg(modelo.bn7, type="conditional", scale="response", ylab= "Bird Presence")
The output graphs showed very wide confidence bands when I used the original scale of the response variable (covering the whole vertical axis). For the graphs without the transformation, the confidence bands were narrower, but they had the same extent across the different levels of the dummy variables. Does anyone know how the confidence bands are calculated for a binomial model? Could this reflect a problem in the estimated coefficients or in the model fit?
The confidence bands are computed from the standard errors of the fitted values on the link scale and then back-transformed to the response (probability) scale; for a more detailed explanation you can ask on stats.stackexchange.com. If the bands are very wide (and the interpretation of 'wide' is subjective and mostly depends on your goal), it shows that your estimates may not be very precise. Wide bands are usually due to a small or insufficient number of observations used for building the model. If the number of observations is large, they can indicate a poor fit.
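As a generic illustration of the mechanism (using glm and made-up data rather than glmmadmb, purely to show the back-transformation step):

# Bands are built on the link (logit) scale and then back-transformed, which is
# why they can span almost the whole 0-1 range when the standard errors are large
set.seed(1)
dat <- data.frame(x = rnorm(50), y = rbinom(50, 1, 0.5))
fit <- glm(y ~ x, family = binomial, data = dat)
newdat <- data.frame(x = seq(-2, 2, length.out = 100))
pr <- predict(fit, newdata = newdat, type = "link", se.fit = TRUE)
newdat$fit   <- plogis(pr$fit)                    # predicted probability
newdat$lower <- plogis(pr$fit - 1.96 * pr$se.fit) # 95% band, response scale
newdat$upper <- plogis(pr$fit + 1.96 * pr$se.fit)
head(newdat)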