R ggplot - box plot in generalized function - r

With ggplot I'm trying to write a custom function that draws a boxplot for a single column of a dataframe, so that it can be used with any dataframe.
Specific Example
male = data.frame(male = c(127,44,28,83,0,6,78,6,5,213,73,20,214,28,11)) # data from
ggplot(data = male, aes(x = "", y = male)) + geom_boxplot() +
stat_summary(fun=mean, geom="point", shape=20, size=2, color="red", fill="red")
This gives the expected boxplot with the mean shown as a point.
Generalized function - here the operation done in the specific example is wrapped into a generalized function
boxPlotFn = function (df, colName) {
ggplot(data = df, aes_string(x = "", y = colName)) + geom_boxplot() +
stat_summary(fun=mean, geom="point", shape=20, size=2, color="red", fill="red")
}
And I call the function like below
boxPlotFn(male, "male")
However, this gives the error Error: No expression to parse - rlang::last_error() indicates that the error is happening at the call to ggplot. What am I not doing right here?

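That's a bit tricky but easily solved. aes_string parses each string it receives as an R expression, and the empty string "" contains no expression, hence the error. To make your function work with aes_string you have to quote the "double quotes" mapped on x, e.g. with single quotes, so that the parsed expression is itself an empty string. Additionally it should probably be data = df inside your function: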
library(ggplot2)
male = data.frame(male = c(127,44,28,83,0,6,78,6,5,213,73,20,214,28,11)) # data from
boxPlotFn = function (df, colName) {
ggplot(data = df, aes_string(x = '""', y = colName)) +
geom_boxplot() +
stat_summary(fun=mean, geom="point", shape=20, size=2, color="red", fill="red")
}
boxPlotFn(male, "male")
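In recent ggplot2 releases aes_string is deprecated, so here is a minimal sketch of the same idea using the .data pronoun inside aes() instead. It assumes ggplot2 3.3.0 or later (for the fun argument); the name boxPlotFn2 is made up for this example.

```r
library(ggplot2)

male <- data.frame(male = c(127, 44, 28, 83, 0, 6, 78, 6, 5, 213, 73, 20, 214, 28, 11))

# Hypothetical variant of boxPlotFn: colName is still a string,
# but it is looked up through the .data pronoun instead of being parsed.
boxPlotFn2 <- function(df, colName) {
  ggplot(data = df, aes(x = "", y = .data[[colName]])) +
    geom_boxplot() +
    stat_summary(fun = mean, geom = "point", shape = 20, size = 2, color = "red") +
    labs(y = colName)  # set the axis title explicitly to the column name
}

p <- boxPlotFn2(male, "male")
```

Because nothing is parsed, the constant x = "" needs no extra quoting here.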

Related

changing the facet_wrap labels using labeller in ggplot2

In my ggplot below, I'm trying to change the 10 facet labels of facet_wrap using labeller(sch.id=paste0("sch.id:", unique(ten$sch.id))).
However, the plot shows NA instead of the correct facet labels, I wonder what the fix is?
library(ggplot2)
hsb <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
ten <- subset(hsb, sch.id %in% unique(sch.id)[1:10])
p <- ten %>% ggplot() + aes(ses, math) + geom_point() +
facet_wrap(~sch.id) + geom_smooth(method = "lm", se = FALSE)
p + facet_wrap(~sch.id, labeller = labeller(sch.id=paste0("sch.id:", unique(ten$sch.id)))) ## HERE ##
The problem seems to be the unnamed character vector you pass to labeller. A character vector given for a faceting variable is used as a lookup table keyed by the facet values, and since your vector has no names, every lookup fails and the result is NA.
The solution is to create a labeller function as a function of a variable x (or any other name as long as it's not the faceting variables' names) and then coerce to labeller with as_labeller.
Note that there is no need for unique, just like there is no need for it in the facet_wrap formula.
p <- ggplot(ten) + aes(ses, math) + geom_point() + # no pipe needed, so ggplot2 alone is enough
geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
cust_labeller <- function(x) paste0("sch.id:", x)
p + facet_wrap(~ sch.id,
labeller = as_labeller(cust_labeller)) ## HERE ##
I think the easiest way would be to change sch.id before plotting.
library(ggplot2)
ten$sch.id <- paste0("sch.id:", ten$sch.id)
ggplot(ten) + aes(ses, math) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~sch.id)
If you don't want to modify your data and want to use the labeller argument you can create a named vector and use it in labeller.
cust_label <- setNames(paste0("sch.id:", unique(ten$sch.id)), unique(ten$sch.id))
ggplot(ten) + aes(ses, math) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~sch.id, labeller = as_labeller(cust_label))
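The difference between the unnamed and the named vector can be seen with plain base R indexing, which is roughly what a lookup-table labeller does with the facet values. This is only an illustration: the ids below are made up and are not taken from hsb.csv.

```r
ids <- c(1224, 1288, 1296)  # made-up ids, for illustration only

# Unnamed vector: there are no names to match the facet values against
unnamed <- paste0("sch.id:", ids)
unnamed[as.character(ids)]          # every lookup fails and yields NA

# Named vector: each facet value finds its own label
named <- setNames(paste0("sch.id:", ids), ids)
unname(named[as.character(ids)])    # the three "sch.id:..." labels
```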

Error with facet_grid() when using a function created for plotting

I have created a function that returns a plot in R. There seems to be an issue with facet_grid() that appears when plotting with the function but not when I run the same code outside it (even though I use the exact same lines of code).
# function for plotting
barplot_fill <- function(dataset, x, y, fill, jaar) {
p <- ggplot(dataset, aes(x=x, y=y, fill=fill)) +
geom_bar(stat = "identity") +
facet_grid(~ jaar) +
theme_bw() +
scale_y_continuous(labels=comma)
return(p)
}
I would like to plot variables from the following data frame:
df <- data.frame(V1=c(1,2,3,4), V2=c(20,25,46,13), V3=c('a','a','b','b'), V4=c(2018,2019,2018,2017))
When calling the function, I get the following error:
barplot_fill(df, V1, V2, V3, V4)
Error: At least one layer must contain all faceting variables: dataset$jaar.
* Plot is missing dataset$jaar
* Layer 1 is missing dataset$jaar
When I don't call the created function and just create the plot using the ggplot lines of code, R creates the plot and the error does not appear.
ggplot(df, aes(x=V1, y=V2, fill=V3)) +
geom_bar(stat = "identity") +
theme_bw() +
facet_grid(~ V4) +
scale_y_continuous(labels=comma)
I can't figure out why I get an error inside the function but not when I run the exact same lines of code outside it. Can anyone explain to me why the error appears when calling the function?
The problem is that jaar is not evaluated in the facet_grid call, but ggplot is looking for a jaar column in the data set you provide. Actually, something similar happens in the ggplot call for x, y, and fill if you remove the facet_grid part of the function:
barplot_fill_no_facet <- function(dataset, x, y, fill, jaar) {
p <- ggplot(dataset, aes(x = x, y = y, fill = fill)) +
geom_bar(stat = "identity") +
theme_bw() +
scale_y_continuous()
return(p)
}
barplot_fill_no_facet(df, V1, V2, V3, V4)
Error in FUN(X[[i]], ...) : object 'V1' not found
One solution uses aes_string and formula for facet_grid:
barplot_fill <- function(dataset, x, y, fill, jaar) {
p <- ggplot(dataset, aes_string(x = x, y = y, fill = fill)) +
geom_bar(stat = "identity") +
facet_grid(formula(paste("~", jaar))) +
theme_bw() +
scale_y_continuous()
return(p)
}
barplot_fill(df, "V1", "V2", "V3", "V4")
Apart from a little glitch with scale_y_continuous (comma is not defined unless you load the scales package), the problem is the evaluation of your variables. For aes, you can use aes_string and pass strings, but facet_grid expects its variables either as a formula or wrapped in vars(). See under Variable Facets here.
barplot_fill <- function(dataset, x, y, fill, jaar) {
jaar <- enquo(jaar)
p <- ggplot(dataset, aes_string(x=x, y=y, fill=fill)) +
geom_bar(stat = "identity") +
facet_grid(cols = vars(!!jaar)) +
theme_bw()
return(p)
}
df <- data.frame(V1=c(1,2,3,4), V2=c(20,25,46,13), V3=c('a','a','b','b'), V4=c(2018,2019,2018,2017))
barplot_fill(df, "V1", "V2", "V3", V4)
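If you would rather pass every column as a bare name, as in the original call, here is a sketch using the embrace operator {{ }}. It assumes a reasonably recent ggplot2 together with rlang 0.4.0 or later; the name barplot_fill_bare is made up for this example.

```r
library(ggplot2)

df <- data.frame(V1 = c(1, 2, 3, 4), V2 = c(20, 25, 46, 13),
                 V3 = c("a", "a", "b", "b"), V4 = c(2018, 2019, 2018, 2017))

# Hypothetical variant: every column is passed as a bare name and
# forwarded with {{ }} so that it is looked up in `dataset`.
barplot_fill_bare <- function(dataset, x, y, fill, jaar) {
  ggplot(dataset, aes(x = {{ x }}, y = {{ y }}, fill = {{ fill }})) +
    geom_col() +                           # same as geom_bar(stat = "identity")
    facet_grid(cols = vars({{ jaar }})) +
    theme_bw()
}

p <- barplot_fill_bare(df, V1, V2, V3, V4)
```

This keeps the call identical to the one in the question, barplot_fill(df, V1, V2, V3, V4), without mixing strings and bare names.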

combining geom_ribbon when x is a factor

Leading on from this question.
I cannot generate the shaded area when my x is a factor.
Here is some sample data
time <- as.factor(c('A','B','C','D'))
x <- c(1.00,1.03,1.03,1.06)
x.upper <- c(0.91,0.92,0.95,0.90)
x.lower <- c(1.11,1.13,1.17,1.13)
df <- data.frame(time, x, x.upper, x.lower)
ggplot(data = df,aes(time,x))+
geom_ribbon(aes(x=time, ymax=x.upper, ymin=x.lower), fill="pink", alpha=.5) +
geom_point()
When I substitute factor into the aes() I still cannot get the shaded region. The same happens if I try this:
ggplot()+
geom_ribbon(data = df, aes(x=time, ymax=x.upper, ymin=x.lower), fill="pink", alpha=.5) +
geom_point(data = df, aes(time,x))
I still cannot get the shading. Any ideas how to overcome this...
I think aosmith was exactly right, you simply need to convert your factor variable to numeric. I think the following code is what you're looking for:
ggplot(data = df,aes(as.numeric(time),x))+
geom_ribbon(aes(x=as.numeric(time), ymax=x.upper, ymin=x.lower),
fill="pink", alpha=.5) +
geom_point()
EDIT: To change the x-axis labels back to their original values (taken from @aosmith in the comments below):
ggplot(data = df,aes(as.numeric(time),x))+
geom_ribbon(aes(x=as.numeric(time), ymax=x.upper, ymin=x.lower),
fill="pink", alpha=.5) +
geom_point() + labs(title="My ribbon plot",x="Time",y="Value") +
scale_x_continuous(breaks = 1:4, labels = levels(df$time))
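An alternative that keeps time as a factor is to put all rows into one group. With a discrete x, each level forms its own group by default, so the ribbon has only one point per group and nothing to connect. The sketch below reuses the sample data from the question unchanged and only adds group = 1.

```r
library(ggplot2)

time <- as.factor(c("A", "B", "C", "D"))
x <- c(1.00, 1.03, 1.03, 1.06)
x.upper <- c(0.91, 0.92, 0.95, 0.90)
x.lower <- c(1.11, 1.13, 1.17, 1.13)
df <- data.frame(time, x, x.upper, x.lower)

# group = 1 makes the ribbon treat the four factor levels as one series
p <- ggplot(df, aes(time, x)) +
  geom_ribbon(aes(ymin = x.lower, ymax = x.upper, group = 1),
              fill = "pink", alpha = 0.5) +
  geom_point()
```

With this approach the axis keeps its A to D labels, so no scale_x_continuous call is needed.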

R - How to overlay the average of a set of iid RVs

In the code below I build a 40x1000 data frame where in each column I have the cumulative means for successive random draws from an exponential distribution with parameter lambda = 0.2.
I add an additional column to host the specific number of the "draw".
I also calculate the rowmeans as df_means.
How do I add df_means (as a black line) on top of all my simulated RVs? I don't understand ggplot well enough to do this.
df <- data.frame(replicate(1000,cumsum(rexp(40,lambda))/(1:40)))
df$draw <- seq(1,40)
df_means <- rowMeans(df)
Molten <- melt(df, id.vars="draw")
ggplot(Molten, aes(x = draw, y = value, colour = variable)) + geom_line() + theme(legend.position = "none") + geom_line(df_means)
How would I add plot(df_means, type="l") to my ggplot, below?
Thank you,
You can make another data.frame with the means and ids and use that to draw the line:
df_means <- rowMeans(df[, names(df) != "draw"]) # leave the draw index out of the mean
means <- data.frame(id=1:40, mu=df_means)
ggplot(Molten, aes(x=draw, y=value, colour=variable)) +
geom_line() +
theme(legend.position = "none") +
geom_line(data=means, aes(x=id, y=mu), color="black")
As described here
stat_sum_single <- function(fun, geom="line", ...) {
# group = 1 pools all series, so the mean is taken across them at each draw
stat_summary(aes(group=1), fun=fun, colour="black", geom=geom, ...)
}
k<-ggplot(Molten, aes(x = draw, y = value, colour = variable)) + geom_line() + theme(legend.position = "none")
k+stat_sum_single(mean) #gives you the required plot
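Putting the pieces of this question together, here is a self-contained sketch of the simulation and the overlay. The seed, the grey colour for the individual series, and the use of base R stack() instead of melt() are choices made for this example, not part of the original code.

```r
library(ggplot2)

set.seed(1)            # arbitrary seed, for reproducibility only
lambda <- 0.2
n_draws <- 40

# One column per simulated series of cumulative means
sims <- data.frame(replicate(1000, cumsum(rexp(n_draws, lambda)) / (1:n_draws)))

# Row means are taken before any id column is added
means <- data.frame(draw = 1:n_draws, mu = rowMeans(sims))

# Long format: stack() returns the columns `values` and `ind`
long <- data.frame(draw = rep(1:n_draws, times = ncol(sims)), stack(sims))

p <- ggplot(long, aes(x = draw, y = values, group = ind)) +
  geom_line(colour = "grey70") +
  geom_line(data = means, aes(x = draw, y = mu),
            colour = "black", inherit.aes = FALSE)
```

inherit.aes = FALSE is needed because means has no ind column for the group mapping to find.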

Boxplot show the value of mean

In this boxplot we can see the mean but how can we have also the number value on the plot for every mean of every box plot?
ggplot(data=PlantGrowth, aes(x=group, y=weight, fill=group)) + geom_boxplot() +
stat_summary(fun.y=mean, colour="darkred", geom="point",
shape=18, size=3,show_guide = FALSE)
First, you can calculate the group means with aggregate:
means <- aggregate(weight ~ group, PlantGrowth, mean)
This dataset can be used with geom_text:
library(ggplot2)
ggplot(data=PlantGrowth, aes(x=group, y=weight, fill=group)) + geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point",
shape=18, size=3, show.legend=FALSE) +
geom_text(data = means, aes(label = weight, y = weight + 0.08))
Here, + 0.08 is used to place the label above the point representing the mean.
An alternative version without ggplot2:
means <- aggregate(weight ~ group, PlantGrowth, mean)
boxplot(weight ~ group, PlantGrowth)
points(1:3, means$weight, col = "red")
text(1:3, means$weight + 0.08, labels = means$weight)
You can use the output value from stat_summary()
ggplot(data=PlantGrowth, aes(x=group, y=weight, fill=group)) +
geom_boxplot() +
stat_summary(fun=mean, colour="darkred", geom="point", shape=18, size=3, show.legend=FALSE) +
stat_summary(fun=mean, colour="red", geom="text", show.legend=FALSE,
vjust=-0.7, aes(label=round(after_stat(y), digits=1)))
You can also use a function within stat_summary to calculate the mean and the vjust argument to place the text. You need an additional function but no additional data frame:
fun_mean <- function(x){
return(data.frame(y=mean(x,na.rm=TRUE),label=mean(x,na.rm=TRUE)))}
ggplot(PlantGrowth,aes(x=group,y=weight)) +
geom_boxplot(aes(fill=group)) +
stat_summary(fun = mean, geom="point",colour="darkred", size=3) +
stat_summary(fun.data = fun_mean, geom="text", vjust=-0.7)
The Magrittr way
I know there is an accepted answer already, but I wanted to show one cool way to do it in single command with the help of magrittr package.
PlantGrowth %$% # open dataset and make colnames accessible with '$'
split(weight,group) %T>% # split by group and side-pipe it into boxplot
boxplot %>% # plot
lapply(mean) %>% # data from split can still be used thanks to side-pipe '%T>%'
unlist %T>% # convert to atomic and side-pipe it to points
points(pch=18) %>% # add points for means to the boxplot
text(x=.+0.06,labels=.) # use the values to print text
This code will produce a boxplot with means printed as points and values.
I split the command on multiple lines so I can comment on what each part does, but it can also be entered as a oneliner. You can learn more about this in my gist.
