I need to plot an ordination plot showing only, let's say, the 20 most abundant species.
I tried taking the column sums of the species matrix and then selecting only species above a certain total:
abu <- colSums(dune)
abu
sol <- metaMDS(dune)
sol
plot(sol, type="text", display="species", select = abu > 40)
I get this error: "select" is not a graphical parameter.
I would expect to see only a small number of species, but that does not happen.
How do you show only a small number of species in an NMDS plot?
This is not straightforward. You are getting an error because select is not a parameter of the plot. Unfortunately, the result of the analysis is not a data.frame that could be handled easily (e.g. with the tidyverse), and even more unfortunately, the plot() call does not dispatch to the standard plot, but to a method defined specifically for objects of this class. The authors of that method did not foresee your need, so we must make the plot manually. But to do that, we need to understand what is being plotted and how.
Let us find out more about the object sol:
class(sol)
# [1] "metaMDS" "monoMDS"
methods(class="metaMDS")
# [1] goodness nobs plot points print scores sppscores<- text
Oh good, we have a plot method. After a moment of digging, we find it in the vegan package (not exported, so we need to access it via vegan:::plot.metaMDS). It appears to be a wrapper around a function called ordiplot. We inspect the function with edit() to figure out what it is doing. Once you strip away the code we do not need, it essentially boils down to the following:
Y <- scores(sol, display="species")
plot(Y, type="n")
text(Y[,1], Y[,2], rownames(Y), col="red")
This is, more or less, your plot. Choosing the species to show is now trivial, but first we must make sure that the rows of Y are in the same order as the columns of dune:
all(colnames(dune) == rownames(Y))
Y.sel <- Y[colSums(dune) > 40, ]
plot(Y.sel[,1], Y.sel[,2], type="n", xlim=c(-.8, .8), ylim=c(-.4, .4))
text(Y.sel[,1], Y.sel[,2], rownames(Y.sel), col="red")
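As an aside, vegan itself ships orditorp(), which labels only the points that can be drawn without overplotting and uses a priority vector to decide which labels win, so the abundances can drive the labelling directly. A minimal sketch:
plot(sol, type="n", display="species")
orditorp(sol, display="species", priority=abu, col="red")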
We can of course make a much nicer plot, for example with ggplot2 (it is definitely possible to make a much nicer plot with base R as well). We can even show the abundance of each species using the size aesthetic:
library(ggplot2)
library(ggrepel)
Y <- data.frame(Y)
Y$abundance <- colSums(dune)
Y$labels <- rownames(Y)
ggplot(Y, aes(x=NMDS1, y=NMDS2, size=abundance)) +
  geom_point() + geom_text_repel(aes(label=labels)) +
  theme_minimal()
To filter the species by abundance, we can now do the following:
library(tidyverse)
Y %>% filter(abundance > 40) %>%
  ggplot(aes(x=NMDS1, y=NMDS2, size=abundance)) +
  geom_point() + geom_text_repel(aes(label=labels)) +
  theme_minimal()
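The question asked for the 20 most abundant species rather than a fixed threshold; the same pipeline works if you rank the abundances first (a sketch; top20 is a helper name introduced here):
top20 <- names(sort(colSums(dune), decreasing=TRUE))[1:20]
Y %>% filter(labels %in% top20) %>%
  ggplot(aes(x=NMDS1, y=NMDS2, size=abundance)) +
  geom_point() + geom_text_repel(aes(label=labels)) +
  theme_minimal()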
I have a dataset of leaf trait measurements made at multiple sites at two contrasting seasons. I am interested to explore the association/line fit between a pair of traits and to differentiate the seasons at each site.
Rather than a linear regression, I would prefer to use the Standardised Major Axis approach within the smatr package:
e.g. sma.site1 <- sma(TraitA ~ TraitB * Visit, data=subset(myfile, Site=="Site1")) # testing the null hypothesis of common slopes for the two Visits (Seasons) at a given Site.
I can produce a handy lattice plot in ggplot2 with a separate panel for each Site and the points differentiated by Visit:
e.g. qplot(TraitB, TraitA, data=myfile, colour=Visit) + facet_wrap(~Site, ncol=2)
However, if I add trend lines fitted with the additional argument in ggplot2:
+ geom_smooth(aes(group=Visit), method="lm", se=F)
…then those lines are not a good match for the sma coefficients.
What I would like to do is fit the lines suggested by the sma test onto the ggplot lattice. Is there an easy, or efficient, way to do that?
I know that I can subset the data, produce a plot for each site, add the relevant lines with + geom_abline(), and then stitch the separate plots together with grid.arrange(). But that feels very long-winded.
I would be grateful for any pointers.
I don't know anything about the smatr package, but you should be able to tweak this to get the right values. Since you provided no data, I used the leaf data from the example in the package. The basic idea is to pull the slope and intercept out of the returned sma object and then facet the geom_abline. I may be misinterpreting the object, though.
library(smatr)
library(ggplot2)
data(leaflife)
fits <- do.call(rbind, lapply(unique(leaflife$site), function(x) {
  obj <- sma(longev ~ lma * rain, data=subset(leaflife, site=x))
  data.frame(site=x,
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
}))
gg <- ggplot(leaflife)
gg <- gg + geom_point(aes(x=lma, y=longev, color=soilp))
gg <- gg + geom_abline(data=fits, aes(slope=slope, intercept=intercept))
gg <- gg + facet_wrap(~site, ncol=2)
gg
I just saw this question and am not sure if you are still interested in this. I ran the code by hrbrmstr and found that actually the only thing you need to change is:
obj <- sma(longev~lma*rain, data=subset(leaflife, site == x))
Then you can get the plot with four lines, one for each group.
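For completeness, here is hrbrmstr's loop re-assembled with that one-character fix applied (== instead of =, so subset() actually filters by site):
fits <- do.call(rbind, lapply(unique(leaflife$site), function(x) {
  obj <- sma(longev ~ lma * rain, data=subset(leaflife, site == x))
  data.frame(site=x,
             intercept=obj$coef[[1]][1, 1],
             slope=obj$coef[[1]][2, 1])
}))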
I am trying to find the best way to create barplots in R with standard errors displayed. I have seen other articles, but I cannot figure out the code to use with my own data (I have not used ggplot before, and it seems to be the most common approach, while barplot does not cooperate with dataframes). I need to use this in two cases, for which I have created two example dataframes:
Plot df1 so that the x-axis has sites a-c, with the y-axis displaying the mean value of V1 and the standard errors highlighted, similar to this example, with a grey colour. Here, plant biomass should be the mean V1 value and the treatments should be each of my sites.
Plot df2 in the same way, but so that Before and After are located next to each other, similar to this, with pre-test and post-test equating to Before and After in my example.
x <- factor(LETTERS[1:3])
site <- rep(x, each = 8)
values <- as.data.frame(matrix(sample(0:10, 3*8, replace=TRUE), ncol=1))
df1 <- cbind(site,values)
z <- factor(c("Before","After"))
when <- rep(z, each = 4)
df2 <- data.frame(when,df1)
Apologies for the simplicity to more experienced R users, particularly those that use ggplot, but I cannot apply the snippets of code that I have found elsewhere to my data. I cannot even get enough code together to produce the start of a graph, so I hope my descriptions are sufficient. Thank you in advance.
Something like this?
library(ggplot2)
# standard error of the mean, returned as the ymin/ymax pair stat_summary() expects
get.se <- function(y) {
  se <- sd(y)/sqrt(length(y))
  mu <- mean(y)
  c(ymin=mu-se, ymax=mu+se)
}
ggplot(df1, aes(x=site, y=V1)) +
stat_summary(fun.y=mean, geom="bar", fill="lightgreen", color="grey70")+
stat_summary(fun.data=get.se, geom="errorbar", width=0.1)
ggplot(df2, aes(x=site, y=V1, fill=when)) +
stat_summary(fun.y=mean, geom="bar", position="dodge", color="grey70")+
stat_summary(fun.data=get.se, geom="errorbar", width=0.1, position=position_dodge(width=0.9))
So this takes advantage of the stat_summary(...) function in ggplot to, first, summarize y for a given x using mean(...) (for the bars), and then to summarize y for a given x using the get.se(...) function (for the error bars). Another option would be to summarize your data prior to using ggplot, and then use geom_bar(...) and geom_errorbar(...), as sketched below.
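A minimal sketch of that pre-summarized route, using base aggregate() (the se helper and the agg name are introduced here):
# summarize first, then plot the summaries directly
se <- function(y) sd(y)/sqrt(length(y))
agg <- aggregate(V1 ~ site, data=df1, FUN=function(y) c(mean=mean(y), se=se(y)))
agg <- do.call(data.frame, agg)   # flatten the matrix column into V1.mean / V1.se
ggplot(agg, aes(x=site, y=V1.mean)) +
  geom_bar(stat="identity", fill="lightgreen", color="grey70") +
  geom_errorbar(aes(ymin=V1.mean-V1.se, ymax=V1.mean+V1.se), width=0.1)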
Also, plotting +/- 1 SE is not a great practice (although it is used often enough). You would be better served plotting legitimate confidence limits, which you could do, for instance, using the built-in mean_cl_normal function instead of the contrived get.se(...). mean_cl_normal returns the 95% confidence limits based on the assumption that the data are normally distributed (or you can set the confidence level to something else; read the documentation).
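For instance (mean_cl_normal is shipped with ggplot2 but relies on the Hmisc package being installed):
ggplot(df1, aes(x=site, y=V1)) +
  stat_summary(fun.y=mean, geom="bar", fill="lightgreen", color="grey70") +
  stat_summary(fun.data=mean_cl_normal, geom="errorbar", width=0.1)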
I used the group_by and summarise_each functions from dplyr, together with the std.error function from the plotrix package.
library(plotrix) # for std error function
library(dplyr) # for group_by and summarise_each function
library(ggplot2) # for creating ggplot
For the df1 plot:
# Group data by site
grouped_df1 <- group_by(df1, site)
# Summarise the grouped data: mean and standard error (std.error from plotrix)
summarised_df1 <- summarise_each(grouped_df1, funs(mean=mean, std_error=std.error))
# Define the top and bottom of the errorbars
limits <- aes(ymax = mean + std_error, ymin = mean - std_error)
# Begin the ggplot: site vs mean
g <- ggplot(summarised_df1, aes(site, mean))
# Bars of the means
g <- g + geom_bar(stat = "identity", position = position_dodge())
# Error bars
g <- g + geom_errorbar(limits, width = 0.25, position = position_dodge(width = 0.9))
# Print graph
g
For the df2 plot:
# Group data by when and site
grouped_df2 <- group_by(df2, when, site)
# Summarise the grouped data: mean and standard error (std.error from plotrix)
summarised_df2 <- summarise_each(grouped_df2, funs(mean=mean, std_error=std.error))
# Define the top and bottom of the errorbars
limits <- aes(ymax = mean + std_error, ymin = mean - std_error)
# Begin the ggplot: site vs mean, filled by the factor variable when
g <- ggplot(summarised_df2, aes(site, mean, fill=when))
# position_dodge ensures side-by-side bars for the factor levels
g <- g + geom_bar(stat = "identity", position = position_dodge())
# Error bars
g <- g + geom_errorbar(limits, width = 0.25, position = position_dodge(width = 0.9))
# Print graph
g
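A side note: summarise_each() has since been superseded in dplyr. With current dplyr, the grouped summary would read roughly as follows (a sketch under the same column names):
summarised_df2 <- df2 %>%
  group_by(when, site) %>%
  summarise(mean = mean(V1), std_error = std.error(V1))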
The qqmath function from lattice makes great caterpillar plots of random effects from models fitted with lme4's lmer(). That is, qqmath is great at plotting the intercepts from a hierarchical model with their errors around the point estimate. An example of the lmer and qqmath functions is below, using the built-in Dyestuff data from the lme4 package. The code fits the hierarchical model and produces a nice plot with qqmath.
library("lme4")
data(package = "lme4")
# Dyestuff
# a balanced one-way classification of Yield
# from samples produced from six Batches
summary(Dyestuff)
# Batch is an example of a random effect
# Fit 1-way random effects linear model
fit1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)
summary(fit1)
coef(fit1) #intercept for each level in Batch
# qqplot of the random effects with their variances
qqmath(ranef(fit1, postVar = TRUE), strip = FALSE)$Batch
The last line of code produces a really nice plot of each intercept with the error around each estimate. But formatting the qqmath plot seems to be very difficult, and I've been struggling with it. I've come up with a few questions that I cannot answer, and that I think others could also benefit from if they are using the lmer/qqmath combination:
1. Is there a way to take the qqmath call above and add a few options, such as making certain points empty vs. filled-in, or giving different points different colors? For example, can you make the points for A, B, and C of the Batch variable filled, but leave the rest of the points empty?
2. Is it possible to add axis labels for each point (maybe along the top or right y axis, for example)?
3. My data has closer to 45 intercepts, so is it possible to add spacing between the labels so they do not run into each other?
MAINLY, I am interested in distinguishing/labeling the points on the graph, which seems to be cumbersome/impossible with qqmath.
So far, adding any additional option to the qqmath call produces errors where a standard plot would not, so I'm at a loss.
Also, if you feel there is a better package/function for plotting intercepts from lmer output, I'd love to hear it! (for example, can you do points 1-3 using dotplot?)
EDIT: I'm also open to an alternative dotplot if it can be reasonably formatted. I just like the look of a qqmath plot, so I'm starting with a question about that.
One possibility is to use the ggplot2 library to draw a similar graph; you can then adjust the appearance of the plot freely.
First, the ranef object is saved as randoms. Then the variances of the intercepts are saved in the object qq.
randoms <- ranef(fit1, postVar = TRUE)
qq <- attr(randoms[[1]], "postVar")
The object rand.interc contains just the random intercepts, with level names.
rand.interc <- randoms$Batch
All objects are put into one data frame. For the error intervals, sd.interc is calculated as twice the square root of the variance.
df <- data.frame(Intercepts=randoms$Batch[,1],
                 sd.interc=2*sqrt(qq[,,1:length(qq)]),
                 lev.names=rownames(rand.interc))
If you want the intercepts ordered by value in the plot, lev.names should be reordered. This line can be skipped if the intercepts should be ordered by level names.
df$lev.names<-factor(df$lev.names,levels=df$lev.names[order(df$Intercepts)])
This code produces the plot. The points will differ in shape according to factor level.
library(ggplot2)
p <- ggplot(df,aes(lev.names,Intercepts,shape=lev.names))
#Added horizontal line at y=0, error bars to points and points with size two
p <- p + geom_hline(yintercept=0) +geom_errorbar(aes(ymin=Intercepts-sd.interc, ymax=Intercepts+sd.interc), width=0,color="black") + geom_point(aes(size=2))
#Removed legends and with scale_shape_manual point shapes set to 1 and 16
p <- p + guides(size=FALSE,shape=FALSE) + scale_shape_manual(values=c(1,1,1,16,16,16))
#Changed appearance of plot (black and white theme) and x and y axis labels
p <- p + theme_bw() + xlab("Levels") + ylab("")
#Final adjustments of plot
p <- p + theme(axis.text.x=element_text(size=rel(1.2)),
axis.title.x=element_text(size=rel(1.3)),
axis.text.y=element_text(size=rel(1.2)),
panel.grid.minor=element_blank(),
panel.grid.major.x=element_blank())
#To put levels on y axis you just need to use coord_flip()
p <- p+ coord_flip()
print(p)
Didzis' answer is great! Just to wrap it up a little bit, I put it into its own function that behaves a lot like qqmath.ranef.mer() and dotplot.ranef.mer(). In addition to Didzis' approach, it also handles models with multiple correlated random effects (like qqmath() and dotplot() do). A comparison to qqmath():
require(lme4) ## for lmer(), sleepstudy
require(lattice) ## for dotplot()
fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
ggCaterpillar(ranef(fit, condVar=TRUE)) ## using ggplot2
qqmath(ranef(fit, condVar=TRUE)) ## for comparison
Comparison to dotplot():
ggCaterpillar(ranef(fit, condVar=TRUE), QQ=FALSE)
dotplot(ranef(fit, condVar=TRUE))
Sometimes, it might be useful to have different scales for the random effects; dotplot() enforces identical scales. When I tried to relax this, I had to change the faceting (see this answer).
ggCaterpillar(ranef(fit, condVar=TRUE), QQ=FALSE, likeDotplot=FALSE)
## re = object of class ranef.mer
ggCaterpillar <- function(re, QQ=TRUE, likeDotplot=TRUE) {
  require(ggplot2)
  f <- function(x) {
    pv   <- attr(x, "postVar")
    cols <- 1:(dim(pv)[1])
    se   <- unlist(lapply(cols, function(i) sqrt(pv[i, i, ])))
    ord  <- unlist(lapply(x, order)) + rep((0:(ncol(x) - 1)) * nrow(x), each=nrow(x))
    pDf  <- data.frame(y=unlist(x)[ord],
                       ci=1.96*se[ord],
                       nQQ=rep(qnorm(ppoints(nrow(x))), ncol(x)),
                       ID=factor(rep(rownames(x), ncol(x))[ord], levels=rownames(x)[ord]),
                       ind=gl(ncol(x), nrow(x), labels=names(x)))

    if(QQ) {  ## normal QQ-plot
      p <- ggplot(pDf, aes(nQQ, y))
      p <- p + facet_wrap(~ ind, scales="free")
      p <- p + xlab("Standard normal quantiles") + ylab("Random effect quantiles")
    } else {  ## caterpillar dotplot
      p <- ggplot(pDf, aes(ID, y)) + coord_flip()
      if(likeDotplot) {  ## imitate dotplot() -> same scales for random effects
        p <- p + facet_wrap(~ ind)
      } else {           ## different scales for random effects
        p <- p + facet_grid(ind ~ ., scales="free_y")
      }
      p <- p + xlab("Levels") + ylab("Random effects")
    }

    p <- p + theme(legend.position="none")
    p <- p + geom_hline(yintercept=0)
    p <- p + geom_errorbar(aes(ymin=y-ci, ymax=y+ci), width=0, colour="black")
    p <- p + geom_point(aes(size=1.2), colour="blue")
    return(p)
  }

  lapply(re, f)
}
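Since ggCaterpillar() returns lapply(re, f), the result is a named list of ggplot objects, one per grouping factor, so single plots can be pulled out and tweaked further, e.g. (the title here is arbitrary):
plots <- ggCaterpillar(ranef(fit, condVar=TRUE))
plots[["Subject"]] + ggtitle("Random effects for Subject")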
Another way to do this is to extract simulated values from the distribution of each of the random effects and plot those. Using the merTools package, it is easy to get the simulations from an lmer or glmer object and to plot them.
library(lme4); library(merTools) ## for lmer(), sleepstudy
fit <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
randoms <- REsim(fit, n.sims = 500)
randoms is now an object that looks like this:
head(randoms)
groupFctr groupID term mean median sd
1 Subject 308 (Intercept) 3.083375 2.214805 14.79050
2 Subject 309 (Intercept) -39.382557 -38.607697 12.68987
3 Subject 310 (Intercept) -37.314979 -38.107747 12.53729
4 Subject 330 (Intercept) 22.234687 21.048882 11.51082
5 Subject 331 (Intercept) 21.418040 21.122913 13.17926
6 Subject 332 (Intercept) 11.371621 12.238580 12.65172
It provides the name of the grouping factor, the level of the factor we are obtaining an estimate for, the term in the model, and the mean, median, and standard deviation of the simulated values. We can use this to generate a caterpillar plot similar to those above:
plotREsim(randoms)
One nice feature of the resulting plot is that values whose confidence interval does not overlap zero are highlighted in black. You can modify the width of the interval via the level parameter to plotREsim, making the confidence intervals wider or narrower as needed.
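For instance, a minimal sketch narrowing the intervals to 90% (using the level parameter described in the merTools documentation):
plotREsim(randoms, level = 0.9)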
Yet another way to obtain the desired plot is through the plot_model() command integrated in the sjPlot package. The advantage is that the command returns a ggplot object, and hence there are many options to adjust the figure as desired. I kept the example simple because there are many options to individualize the visualisation; just check ?plot_model for all of them.
library(lme4)
library(sjPlot)
#?plot_model
data(Dyestuff, package = "lme4")
summary(Dyestuff)
fit1 <- lmer(Yield ~ 1 + (1|Batch), Dyestuff)
summary(fit1)
plot_model(fit1, type="re",
           vline.color="#A9A9A9", dot.size=1.5,
           show.values=TRUE, value.offset=.2)
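Because the return value is a ggplot object, further layers can simply be chained on, for example (a sketch; the theme and title are arbitrary choices):
library(ggplot2)
plot_model(fit1, type="re") + theme_minimal() + ggtitle("Random intercepts by Batch")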
I can't find a way to ask ggplot2 to show an empty level in a boxplot without padding my dataframe with explicit missing values.
Here is reproducible code:
# fake data
dftest <- expand.grid(time=1:10,measure=1:50)
dftest$value <- rnorm(dim(dftest)[1],3+0.1*dftest$time,1)
# and let's suppose we didn't observe anything at time 2
# doesn't work even when forcing with factor(..., levels=...)
p <- ggplot(data=dftest[dftest$time!=2,],aes(x=factor(time,levels=1:10),y=value))
p + geom_boxplot()
# the only way seems to be having at least one explicit missing value in the dataframe
dftest2 <- dftest
dftest2[dftest2$time==2,"value"] <- NA
p <- ggplot(data=dftest2,aes(x=factor(time),y=value))
p + geom_boxplot()
So I guess I'm missing something. This is not a problem when dealing with a balanced experiment, where these missing data might be explicit in the dataframe. But with observed data in a cohort, for example, it means padding the data with missing values for every unobserved combination...
Thanks for your help.
You can control the breaks in a suitable scale function, in this case scale_x_discrete. Make sure you use the argument drop=FALSE:
p <- ggplot(data=dftest[dftest$time!=2,],aes(x=factor(time,levels=1:10),y=value))
p + geom_boxplot() +
scale_x_discrete("time", breaks=factor(1:10), drop=FALSE)
I like to do my data manipulation in advance of sending it to ggplot; I think this makes the code more readable. This is how I would do it myself, and the results are the same. Note, however, that the ggplot scale gets much simpler, since you don't have to specify the breaks:
dfplot <- dftest[dftest$time!=2, ]
dfplot$time <- factor(dfplot$time, levels=1:10)
ggplot(data=dfplot, aes(x=time, y=value)) +
geom_boxplot() +
scale_x_discrete("time", drop=FALSE)