I am using the following in R to generate a Boxplot out of a given set of data:
ggplot(data = daten, aes(x=Bodentyp, y=Fracht)) + geom_boxplot(aes(fill=Bewirtschaftungsform))
Now I want to display the number of data points going into each category of the column "Bodentyp". How do I achieve this?
You can use fun.datato apply a function (f) to the grouped data to return a count (length(y)) and a position for the label (median(y))
f <- function(y)
c(label=length(y), y=median(y))
library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x=as.factor(cyl), y=mpg)) +
geom_boxplot() + theme_bw() +
stat_summary(fun.data=f, geom="text", vjust=-0.5, col="blue")
Related
How do I draw a horizontal line indicating the Highest (Posterior) Density interval for faceted density plots in ggplot2? This is what I have tried:
# Functions to calculate lower and upper part of HPD.
hpd_lower = function(x) coda::HPDinterval(as.mcmc(x))[1]
hpd_upper = function(x) coda::HPDinterval(as.mcmc(x))[2]
# Data: two groups with different means
df = data.frame(value=c(rnorm(500), rnorm(500, mean=5)), group=rep(c('A', 'B'), each=500))
# Plot it
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(aes(x=hpd_lower(value), xend=hpd_upper(value), y=0, yend=0), size=3)
As you can see, geom_segment computes on all data for both facets whereas I would like it to respect the faceting. I would also like a solution where HPDinterval is only run once per facet.
Pre-calculate the hpd intervals. ggplot evaluates the calculations in the aes() function in the entire data frame, even when data are grouped.
# Plot it
library(dplyr)
df_hpd <- group_by(df, group) %>% summarize(x=hpd_lower(value), xend=hpd_upper(value))
ggplot(df, aes(x=value)) +
geom_density() +
facet_wrap(~group) +
geom_segment(data = df_hpd, aes(x=x, xend=xend, y=0, yend=0), size=3)
I output each of boxplot of my response variable with respect to my categorical feautures but I cannot highlight number of observations of each category. I tried stat_summary and geom_text() options which are stated here but they are not working.
How can I show them in my boxplots?
Below is my code:
for(i in 3:ncol(Train_factor)){
b<-paste("Boxplot for",colnames(Train_factor[i]))
p10 <- (ggplot(data=Train_factor, aes_string(x = names(Train_factor)[i],
y = "Response",fill=variable)) +
geom_boxplot())
plot_list[[i]] = p10
}
for (i in 3:ncol(Train_factor)) {
file_name = paste("boxplot", i, ".tiff", sep="")
tiff(file_name)
print(plot_list[[i]])
dev.off()
}
You haven't provided a reproducible example, so here's a generic example using the built-in mtcars data frame. We use geom_text() but instead of stat="identity" (the default) we use stat="count" and label=..count.. (which is the internally calculated count of the number of values) so that the displayed value will be the count of values.
library(ggplot2)
ggplot(mtcars, (aes(x=factor(cyl), y=mpg))) +
geom_boxplot() +
geom_text(aes(label=..count..), y=0, stat='count', colour="red", size=4) +
coord_cartesian(ylim=c(0,max(mtcars$mpg))) +
theme_classic()
In ggplot in R, is it possible to plot each point with a unique number but without circles surrounded? I tried to use color "white" but it doesn't work.
I would recommend geom_text.
set.seed(101)
dd <- data.frame(x=rnorm(50),y=rnorm(50),id=1:50)
library(ggplot2)
ggplot(dd,aes(x,y))+geom_text(aes(label=id))
I'll show how to do it with geom_text and/or geom_point.
Using geom_text (recommended)
For this example I'll use the built-in dataset mtcars and let's pretend the numbers you want to display are the weights (wt) variable:
data(mtcars)
p <- ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars)))
p + geom_text(aes(label = wt),
parse = TRUE)
or if you want an example with truly unique numbers, we can just make up an index using seq:
data(mtcars)
p <- ggplot(mtcars, aes(wt, mpg, label = rownames(mtcars)))
p + geom_text(aes(label = seq(1:32)),
parse = TRUE)
Using geom_point
While it would require more work, it actually is possible to do this with geom_point.
This is a reference image of some of the shapes you can use with geom_point:
As you can see, shapes 48 to 57 are 0 to 9. You can leverage these shapes (and combinations of them to form an infinite amount of numbers) via geom_point like this:
d=data.frame(p=c(48:57))
ggplot() +
scale_y_continuous(name="") +
scale_x_continuous(name="") +
scale_shape_identity() +
geom_point(data=d, mapping=aes(x=p%%16, y=p%/%16, shape=p), size=5, fill="red")
Finally, a trivial example using mtcars + geom_point with arbitrary numbers:
d=data.frame(p=c(48:57,48:57,48:57,48,49))
attach(mtcars)
ggplot(mtcars) +
scale_y_continuous(name="") +
scale_x_continuous(name="") +
scale_shape_identity() +
geom_point(data=d, mapping=aes(x=wt, y=mpg, shape=p), size=5, fill="red")
I have this simple data frame holding three replicates (value) for each factor (CT). I would like to plot it as geom_point and than the means of the point as geom_line.
gene <- c("Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5","Ckap5")
value <- c(0.86443, 0.79032, 0.86517, 0.79782, 0.79439, 0.89221, 0.93071, 0.87170, 0.86488, 0.91133, 0.87202, 0.84028, 0.83242, 0.74016, 0.86656)
CT <- c("ET","ET","ET", "HP","HP","HP","HT","HT","HT", "LT","LT","LT","P","P","P")
df<- cbind(gene,value,CT)
df<- data.frame(df)
So, I can make the scatter plot.
ggplot(df, aes(x=CT, y=value)) + geom_point()
How do I get a geom_line representing the means for each factor. I have tried the stat_summary:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(y = value,group = CT), fun.y=mean, colour="red", geom="line")
But it does not work.
"geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?"
But each group has three observations, what is wrong?
Ps. I am also interested in a smooth line.
You should set the group aes to 1:
ggplot(df, aes(x=CT, y=value)) + geom_point() +
stat_summary(aes(y = value,group=1), fun.y=mean, colour="red", geom="line",group=1)
You can use the dplyr package to get the means of each factor.
library(dplyr)
group_means <- df %>%
group_by(CT) %>%
summarise(mean = mean(value))
Then you will need to convert the factors to numeric to let you plot lines on the graph using the geom_segment function. In addition, the scale_x_continuous function will let you set the labels for the x axis.
ggplot(df, aes(x=as.numeric(CT), y=value)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=as.character(df$CT), breaks=as.numeric(df$CT))
Following on from hrbrmstr's comment you can add the smooth line using the following:
ggplot(df, aes(x=as.numeric(CT), y=value, group=1)) + geom_point() +
geom_segment(aes(x=as.numeric(CT)-0.4, xend=as.numeric(CT)+0.4, y=mean, yend=mean),
data=group_means, colour="red") +
scale_x_continuous("name", labels=as.character(df$CT), breaks=as.numeric(df$CT)) +
geom_smooth()
I am using ggplot2 to produce a plot that has 3 facets. Because I am comparing two different data sets, I would like to then be able to plot a second data set using the same y scale for the facets as in the first plot. However, I cannot find a simple way to save the settings of the first plot to then re-use them with the second plot. Since each facet has its own y scale, it will be a pain to specify them by hand for the second plot. Does anyone know of a quick way of re-using scales? To make this concrete, here is how I am generating first my plot:
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + facet_wrap(~ cyl, scales = "free_y")
EDIT
When applying one of the suggestions below, I found out that my problem was more specific than described in the original post, and it had to do specifically with scaling of the error bars. Concretely, the error bars look weird when I rescale the second plot as suggested. Does anyone have any suggestions on how to keep the same scale for both plots and dtill display the error bars correctly? I am attaching example below for concreteness:
#Create sample data
d1 <- data.frame(fixtype=c('ff','ff','fp','fp'), detype=c('det','pro','det','pro'),
diffscore=c(-1,-15,3,-17),se=c(2,3,1,2))
d2 <- data.frame(fixtype=c('ff','ff','fp','fp'), detype=c('det','pro','det','pro'),
diffscore=c(-1,-3,-2,-1),se=c(4,3,5,3))
#Plot for data frame 1, this is the scale I want to keep
lim_d1 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d1, aes(colour=detype, y=diffscore, x=detype)) +
geom_point(aes(size=1), shape=15) +
geom_errorbar(lim_d1, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
#Plot for data frame 2 original scale
lim_d2 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d2, aes(colour=detype, y=diffscore, x=detype)) +
geom_point(aes(size=1), shape=15) +
geom_errorbar(lim_d2, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
#Plot for data frame 2 adjusted scale. This is where things go wrong!
#As suggested below, first I plot the first plot, then I draw a blank screen and try
#to plot the second data frame on top.
lim_d2 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d1, aes(colour=detype, y=diffscore, x=detype)) +
geom_blank() +
geom_point(data=d2, aes(size=1), shape=15) +
geom_errorbar(lim_d2, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
#If the error bars are fixed, by adding data=d2 to geom_errorbar(), then
#the error bars are displayed correctly but the scale gets distorted again
lim_d2 <- aes(ymax = diffscore + se, ymin=diffscore - se)
ggplot(d1, aes(colour=detype, y=diffscore, x=detype)) +
geom_blank() +
geom_point(data=d2, aes(size=1), shape=15) +
geom_errorbar(data=d2,lim_d2, width=0.2,size=1) +
facet_wrap(~fixtype, nrow=2, ncol=2, scales = "free_y")
You may first call ggplot on your original data where you add a geom_blank as a first layer. This sets up a plot area, with axes and legends based on the data provided in ggplot.
Then add geoms which use data other than the original data. In the example, I use a simple subset of the original data.
From ?geom_blank: "The blank geom draws nothing, but can be a useful way of ensuring common scales between different plots.".
ggplot(data = mtcars, aes(mpg, wt)) +
geom_blank() +
geom_point(data = subset(mtcars, wt < 3)) +
facet_wrap(~ cyl, scales = "free_y")
Here is an ugly hack that assumes you have an identical facetting layout in both plots.
It replaces the panel element of the ggplot build.
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p1 <- p + facet_wrap(~ cyl, scales = "free_y") + labs(title = 'original')
# create "other" data.frame
n <- nrow(mtcars)
set.seed(201405)
mtcars2 <- mtcars[sample(seq_len(n ),n-15),]
# create this second plot
p2 <- p1 %+% mtcars2 + labs(title = 'new data')
# and a copy so we can attempt to fix
p3 <- p2 + labs(title = 'new data original scale')
# use ggplot_build to construct the plots for rendering
p1b <- ggplot_build(p1)
p3b <- ggplot_build(p3)
# replace the 'panel' information in plot 2 with that
# from plot 1
p3b[['panel']] <- p1b[['panel']]
# render the revised plot
# for comparison
library(gridExtra)
grid.arrange(p1 , p2, ggplot_gtable(p3b))