exact value of the mean of a boxplot - r

I generated a boxplot with three variables ("Jahreszeit","Fracht","Bewirtschaftungsform") like this:
ggplot(daten,aes(x=Jahreszeit, y=Fracht))+ geom_boxplot() +
facet_wrap(~ Bewirtschaftungsform)+
geom_point(position = position_jitter(width = 0.1))+
stat_summary(fun.data=f, geom="text", vjust=+1.5, col="black")
My question is whether there is a way to extract the exact value of the mean of each category of the factor.

I would approach such a task using aggregate or plyr. With aggregate you get the group means (of Fracht I assume) with the following call:
groupMeans <- aggregate(Fracht ~ Bewirtschaftungsform, daten, mean)
Rounding is suggested for printing:
groupMeans$Fracht <- round(groupMeans$Fracht, 2)
Within the ggplot object you can then just add:
+ geom_text(data=groupMeans,aes(label=Fracht,y=0,x=0))
The last term may require some tweaking for the x and y values to optimize the position.
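If you need one mean per box (each Jahreszeit within each Bewirtschaftungsform) rather than one per facet, the same aggregate idea extends to two grouping variables. The following is only a sketch: the toy `daten` below is made up because the real data were not posted, and `cellMeans` and `label` are names chosen for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for `daten`; replace with the real data frame
set.seed(1)
daten <- data.frame(
  Jahreszeit = rep(c("Sommer", "Winter"), each = 10),
  Bewirtschaftungsform = rep(c("A", "B"), times = 10),
  Fracht = runif(20, 0, 50)
)

# One mean per season within each management type
cellMeans <- aggregate(Fracht ~ Jahreszeit + Bewirtschaftungsform, daten, mean)
cellMeans$label <- round(cellMeans$Fracht, 2)

# The text layer inherits x = Jahreszeit and y = Fracht from the main aes(),
# so each label is drawn at the height of its own mean in the matching facet
p <- ggplot(daten, aes(x = Jahreszeit, y = Fracht)) +
  geom_boxplot() +
  facet_wrap(~ Bewirtschaftungsform) +
  geom_text(data = cellMeans, aes(label = label), vjust = -0.5)
```

The exact values are then available in cellMeans, independent of the plot.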

geom_density blind in terms of the aesthetics supplied?

I have to admit that it has been a while since I used ggplot, but this seems a bit silly. Either I am missing something fundamental when trying to make a density plot, or there is a bug in ggplot2 (v3.3.2)
test <- data.frame(Time=rnorm(100),Age=rnorm(100))
ggplot(test,aes(y=Time,x=Age)) +
geom_density(aes(y=Time,x=Age))
produces
Error: geom_density requires the following missing aesthetics: y
How could the 'y' aesthetic be missing?
There are two cases when using geom_density(). It depends which stat layer you're specifying:
The standard case is stat = "density", which makes geom_density() compute its y values from a kernel density estimate of the given x values. In this case you must NOT provide a y aesthetic, because the y values are computed behind the scenes.
Then there is a second case, which is yours, and which you have to request explicitly by changing the stat to identity. This is needed if, for some reason, you have precalculated values that you want to feed directly into geom_density().
Your problem arises when you mix cases 1) and 2). But I agree that the error message is not really clear; it could remind you to check that the stat in use is the desired one.
library(ggplot2)
test <- data.frame(time = rnorm(100), age = rnorm(100))
#if you want to use precalculated y values you have to change the used stat to identity:
ggplot(test) +
geom_density(aes(x = age, y = time),
stat = "identity")
# compared to the case with the default value of stat: stat = "density"
ggplot(test) +
geom_density(aes(x = age))
Created on 2020-08-04 by the reprex package (v0.3.0)
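For the identity case, the y values would normally be densities you computed yourself rather than a second raw variable. A minimal sketch, assuming base R's density() is an acceptable way to precompute them (the names `pre` and `dens` are illustrative):

```r
library(ggplot2)

set.seed(42)
age <- rnorm(100)

# Precompute the kernel density estimate outside ggplot
d <- density(age)
pre <- data.frame(age = d$x, dens = d$y)

# stat = "identity" tells geom_density() to draw the supplied y values as-is
p <- ggplot(pre, aes(x = age, y = dens)) +
  geom_density(stat = "identity")
```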
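If you want to plot both variables in the same graphic, you need to "melt" the data first.
library(data.table)
library(ggplot2)
test <- data.frame(Time=rnorm(100),Age=rnorm(100))
dt <- data.table(test)
# melt both columns into long format: one "variable" and one "value" column
dt_melt <- melt(dt, measure.vars = c("Time", "Age"))
ggplot(dt_melt,aes(x=value, fill=variable)) + geom_density(alpha=0.25)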

r - scatterplot summary stat (e.g. sum or mean) for each point instead of individual data points

I am looking for a way to summarize data within a ggplot call, not before. I could pre-aggregate the data and then plot it, but I know there is a way to do it within a ggplot call. I'm just unsure how.
In this example, I want to get a mean for each (x,y) combo, and map it onto the colour aes
library(tidyverse)
df <- tibble(x = rep(c(1,2,4,1,5),10),
y = rep(c(1,2,3,1,5),10),
col = sample(c(1:100), 50))
df_summar <- df %>%
group_by(x,y) %>%
summarise(col_mean = mean(col))
ggplot(df_summar, aes(x=x, y=y, col=col_mean)) +
geom_point(size = 5)
I think there must be a better way to avoid the pre-ggplot step (yes, I could also have piped dplyr transformations into the ggplot, but the mechanics would be the same).
For instance, geom_count() counts the instances and plots them onto size aes:
ggplot(df, aes(x=x, y=y)) + geom_count()
I want the same, but mean instead of count, and col instead of size
I'm guessing I need stat_summary() or a stat() call (a replacement for ..xxx.. notation), but I can't get it to give me what I need.
You'll need stat_summary_2d:
ggplot(df, aes(x, y, z = col)) +
stat_summary_2d(aes(col = ..value..), fun = 'mean', geom = 'point', size = 5)
(Or stat(value) in ggplot2 3.0.0 and later, and after_stat(value) from 3.3.0 on. The calc() name used in the development version was not kept.)
You can pass any arbitrary function to fun.
While stat_summary seems like it would be useful, it is not in this case. It is specialized for the most common plotting transformation: summarizing a range of y values, grouped by x, into a set of summary statistics that are plotted as y (and optionally ymin and ymax). You want to group by both x and y, so 2d it is.
Note that this uses binning, however, so to get the points to line up accurately you need to increase the number of bins (e.g. bins = 1e3). Unfortunately, there is no non-binning 2d summary stat.
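Putting the pieces together, a sketch of the binned summary with many narrow bins might look like this. It assumes ggplot2 3.3.0 or later for after_stat(); the data frame mirrors the one in the question, and `ref` is only a pre-aggregated reference for checking the colours against.

```r
library(ggplot2)

set.seed(1)
# Same shape as the question's data, rebuilt as a plain data.frame
df <- data.frame(
  x = rep(c(1, 2, 4, 1, 5), 10),
  y = rep(c(1, 2, 3, 1, 5), 10),
  col = sample(1:100, 50)
)

# Many narrow bins, so that each distinct (x, y) pair falls in its own bin
p <- ggplot(df, aes(x, y, z = col)) +
  stat_summary_2d(aes(colour = after_stat(value)),
                  fun = "mean", geom = "point", size = 5, bins = 1e3)

# Reference means computed outside ggplot, for comparison with the legend
ref <- aggregate(col ~ x + y, df, mean)
```

The points are drawn at bin centres, so with fewer bins they drift away from the true (x, y) positions.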

How do I create a barplot in R with a cumulative standard deviation?

I want to make a plot similar to the one attached by Lindfield et al. 2016. I'm familiar with the ggplot command in R with the format:
ggplot(dataframe, aes(x, y)) + geom_bar(stat = 'identity')
However, I don't know how to add a cumulative standard error (SE) to a stacked barplot; I only know how to do it with a position_dodge command.
I know that there are disadvantages to using stacked bars with se errors, but for my data set, it is more presentable than using the unstacked barplots.
Thanks.
I don't know how you get the cumulative standard errors in an appropriate way (I guess it depends on how your values are generated), but I think you need to calculate them and store them in a second DF. For example, if you have an initial data.frame created like this:
DF <- data.frame( x=c("a","a","b","b"),
sp=c("shark","cod","shark","cod"),
y=c(10,5,15,7),
stringsAsFactors=FALSE )
where y is the value associated with each species at each x point, then you'd create a second DF containing the lower and upper limits of your s.e. for each x value, e.g.
seDF <- data.frame( x=c('a','b'),
yl=c(12,18),
yu=c(17,24),
stringsAsFactors=FALSE )
Then you can create your plot with:
ggplot() +
geom_bar( data=DF, mapping=aes(x=x,y=y,fill=sp),
position="stack", stat="identity") +
geom_linerange( data=seDF, mapping=aes(x=x, ymin=yl, ymax=yu) )
I used geom_linerange rather than geom_errorbar as it doesn't create crossbars at either end.
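If the limits are meant to sit symmetrically around the top of each stack, seDF can be derived from DF instead of typed by hand. This is only a sketch: the `se` values are hypothetical placeholders, and how to obtain a valid standard error for the stacked total depends on your data.

```r
library(ggplot2)

DF <- data.frame(x = c("a", "a", "b", "b"),
                 sp = c("shark", "cod", "shark", "cod"),
                 y = c(10, 5, 15, 7),
                 stringsAsFactors = FALSE)

# Height of each full stack
seDF <- aggregate(y ~ x, DF, sum)
names(seDF)[2] <- "total"

# Hypothetical SE of each stacked total; replace with values from your own analysis
seDF$se <- c(2.5, 3)
seDF$yl <- seDF$total - seDF$se
seDF$yu <- seDF$total + seDF$se

p <- ggplot() +
  geom_bar(data = DF, mapping = aes(x = x, y = y, fill = sp),
           position = "stack", stat = "identity") +
  geom_linerange(data = seDF, mapping = aes(x = x, ymin = yl, ymax = yu))
```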

ggplot smooth pass aes variable to method.args

After many google searches I decided to ask for your help, guys.
I am plotting some observations at different time points and I want to add a linear regression with stat_smooth. However, I want the linear model to have its intercept at 100 (because the data are percentages relative to time 0). To do that, I found that the easiest way is to use the offset parameter in lm. The problem is how to get the number of 'y' observations per group (col and facet groups) to pass it to the offset parameter.
If I use data with the same number of observations per group (10 in my case), I can just write the number and it works great:
myplot <- ggplot(mydt2, aes(x=Time_point, y=GFP_rel, col=Gene, fill=Gene,group=Gene))
myplot <- myplot + stat_smooth(method='lm', formula = y ~ x + 0, method.args=list(offset=rep(100,10))) +
facet_wrap(~Cell_line)
However, this is not very elegant and/or flexible. My question is: how can I pass the number of observations to method.args? I tried offset(100,..count..), but I get the error: (list) object cannot be coerced to type 'integer').
Any suggestions?
Thanks
You can use the I(y - 100) coding in the formula instead of using an offset.
However, the predicted values for stat_smooth will then be predictions for y - 100, not y, so the fitted line will go through 0. You can move the lines back into position, so that they display predictions of the original y variable, using position_nudge.
So the stat_smooth code would look something like
stat_smooth(method = "lm", formula = I(y - 100) ~ x + 0,
position = position_nudge(y = 100))
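To see that the shifted response gives the same slope as the offset approach from the question, the two fits can be compared directly with lm. A small sketch on simulated data (the data frame `d` and its columns are made up for illustration):

```r
library(ggplot2)

set.seed(7)
# Simulated percentages that start near 100 at x = 0, in two groups
d <- data.frame(x = rep(0:9, times = 2), g = rep(c("a", "b"), each = 10))
d$y <- 100 - 3 * d$x + rnorm(nrow(d))

# Shifted response versus explicit offset: both force the line through (0, 100)
fit_shift  <- lm(I(y - 100) ~ x + 0, data = d)
fit_offset <- lm(y ~ x + 0, data = d, offset = rep(100, nrow(d)))
same_slope <- isTRUE(all.equal(unname(coef(fit_shift)), unname(coef(fit_offset))))

# In the plot, the fit is done on y - 100 and then nudged back up by 100
p <- ggplot(d, aes(x = x, y = y, colour = g)) +
  geom_point() +
  stat_smooth(method = "lm", formula = I(y - 100) ~ x + 0,
              position = position_nudge(y = 100))
```

Because the formula no longer needs a vector of length n, it works for groups of any size.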

want to layer aes in ggplot2

I would like to plot another series of data on top of a current graph. The additional data only contains information for 3 (out of 6) spp, which are used in the facet_wrap call.
The other series of data is currently a column (in the same data file).
Current graph:
ped.num <- ggplot(data, aes(ped.length, seeds.inflorstem))
ped.num + geom_point(size=2) + theme_bw() + facet_wrap(~spp, scales = "free_y")
Additional layer would be:
aes(ped.length, seeds.filled)
I feel I should be able to plot them using the same y-axis, because they have just slightly smaller values. How do I go about adding this layer?
@ialm's solution should work fine, but I recommend calling the aes function separately in each geom_* because it makes the code easier to read.
ped.num <- ggplot(data) +
geom_point(aes(x=ped.length, y=seeds.inflorstem), size=2) +
theme_bw() +
facet_wrap(~spp, scales="free_y") +
geom_point(aes(x=ped.length, y=seeds.filled))
(You'll always get better answers if you include example data, but I'll take a shot in the dark)
Since you want to plot two variables that are on the same data.frame, it's probably easiest to reshape the data before feeding it into ggplot:
library(reshape2)
# Melting data gives you exactly one observation per row - ggplot likes that
dat.melt <- melt(dat,
id.var = c("spp", "ped.length"),
measure.var = c("seeds.inflorstem", "seeds.filled")
)
# Plotting is slightly different - instead of explicitly naming each variable,
# you'll refer to "variable" and "value"
ggplot(dat.melt, aes(x = ped.length, y = value, color = variable)) +
geom_point(size=2) +
theme_bw() +
facet_wrap(~spp, scales = "free_y")
The seeds.filled values should plot only on the facets for the corresponding species.
I prefer this to Drew's (totally valid) approach of explicitly mapping different layers because you only need a single geom_point() whether you have two variables or twenty and it's easy to map a variety of aesthetics to variable.
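Since no example data were posted, here is a small sketch of the reshaping approach on made-up data, in which seeds.filled is only recorded for one species. The data frame `dat` and its values are hypothetical; na.rm = TRUE drops the missing rows so the second series appears only where it exists.

```r
library(reshape2)
library(ggplot2)

# Hypothetical data: seeds.filled was only recorded for species "A"
dat <- data.frame(
  spp = rep(c("A", "B"), each = 3),
  ped.length = c(10, 12, 14, 11, 13, 15),
  seeds.inflorstem = c(20, 25, 30, 18, 22, 27),
  seeds.filled = c(15, 19, 24, NA, NA, NA)
)

# Long format: one row per (plant, measured variable), missing values dropped
dat.melt <- melt(dat,
                 id.vars = c("spp", "ped.length"),
                 measure.vars = c("seeds.inflorstem", "seeds.filled"),
                 na.rm = TRUE)

p <- ggplot(dat.melt, aes(x = ped.length, y = value, color = variable)) +
  geom_point(size = 2) +
  theme_bw() +
  facet_wrap(~spp, scales = "free_y")
```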
