Why isn't my custom errorbar function working in R?

I have tried to make a function to quickly make an error bar based on a grouping factor and a numerical value as defined below:
#### Function ####
quick.error <- function(data,x,y){
d <- data
plot.d <- d %>%
mutate(x = as.factor(x)) %>%
group_by(x) %>%
summarise(
sd = sd(y, na.rm = TRUE),
mean = mean(y, na.rm=TRUE)
) %>%
ggplot(aes(x,
mean,
fill=x)) +
geom_col(color = "black") +
geom_errorbar(aes(ymin = mean-sd,
ymax = mean+sd),
width = 0.2) +
theme(legend.position = "none")
return(plot.d)
}
However, when I try to run this with the iris dataset:
#### Test ####
quick.error(data=iris,
x=Species,
y=Petal.Length)
This gives me an error:
Error in `mutate()`:
! Problem while computing `x = as.factor(x)`.
Caused by error in `is.factor()`:
! object 'Species' not found
Running it explicitly with $ operators gives me a different issue:
#### Test ####
quick.error(data=iris,
x=iris$Species,
y=iris$Petal.Length)
As you can see here, it has made all the bars the same, I assume because it did not group the mean like it was supposed to:
How do I fix this problem?

As I indicate in my comment, this is a typical non-standard evaluation problem. Here's a revised function that I believe gives you what you want.
quick.error <- function(data,x,y){
d <- data
plot.d <- d %>%
mutate({{ x }} := as.factor({{ x }})) %>%
group_by({{ x }}) %>%
summarise(
sd = sd({{ y }}, na.rm = TRUE),
mean = mean({{ y }}, na.rm=TRUE)
) %>%
ggplot(aes({{ x }},
mean,
fill={{ x }})) +
geom_col(color = "black") +
geom_errorbar(aes(ymin = mean-sd,
ymax = mean+sd),
width = 0.2) +
theme(legend.position = "none")
return(plot.d)
}
quick.error(data=iris,
x=Species,
y=Petal.Length)

Passing unquoted column names to a function
... requires injection with the embracing operator {{ or, in more complex cases, the injection operator !!.
For more on that see e.g. this vignette.
Hence you could make your function work by wrapping x and y inside your function in {{:
quick.error <- function(data, x, y) {
d <- data
plot.d <- d %>%
mutate(x = as.factor({{ x }})) %>%
group_by(x) %>%
summarise(
sd = sd({{ y }}, na.rm = TRUE),
mean = mean({{ y }}, na.rm = TRUE)
) %>%
ggplot(aes(x,
mean,
fill = x
)) +
geom_col(color = "black") +
geom_errorbar(aes(
ymin = mean - sd,
ymax = mean + sd
),
width = 0.2
) +
theme(legend.position = "none")
return(plot.d)
}
library(ggplot2)
library(dplyr)
quick.error(
data = iris,
x = Species,
y = Petal.Length
)
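To check that the grouping really happens once the arguments are embraced, the summary step can be run on its own. This is a minimal sketch; `quick.summary` is a hypothetical helper name, not part of the original function.

```r
library(dplyr)

# Hypothetical helper: the summary step of quick.error() without the plot
quick.summary <- function(data, x, y) {
  data %>%
    group_by({{ x }}) %>%                  # embrace the unquoted column name
    summarise(
      sd = sd({{ y }}, na.rm = TRUE),
      mean = mean({{ y }}, na.rm = TRUE)
    )
}

res <- quick.summary(iris, Species, Petal.Length)
res  # one row per species, each with its own mean and sd
```

With `x = iris$Species` and `y = iris$Petal.Length`, by contrast, `y` is the full vector rather than a column of the grouped data, so every group is summarised over all 150 values, which would explain the identical bars.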

Related

R: How do you order error plots by variance?

Is it possible to order error plots in R by their variance? So that they are from greatest variance to least?
Code:
library(ggplot2)
df <- ToothGrowth
df$dose <- as.factor(df$dose)
head(df, 3)
library(dplyr)
df.summary <- df %>%
group_by(dose) %>%
summarise(
sd = sd(len, na.rm = TRUE),
len = mean(len)
)
df.summary
f <- ggplot(
df.summary,
aes(x = dose, y = len, ymin = len-sd, ymax = len+sd)
)
f + geom_pointrange()
# Standard error bars
f + geom_errorbar(width = 0.2) +
geom_point(size = 1.5)
Any help at all would be greatly appreciated!
Continue the pipe, coercing dose to an ordered factor whose level order is given by the numeric sd (largest first). Then plot as in the question. All that needs to change is the mutate below.
df.summary <- df %>%
group_by(dose) %>%
summarise(
sd = sd(len, na.rm = TRUE),
len = mean(len)
) %>%
mutate(i = order(sd, decreasing = TRUE),
dose = ordered(dose, levels = dose[i])) %>%
select(-i)
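As an alternative sketch (not part of the answer above), base R's `reorder()` can set the level order from the negated sd in a single step. The result is a plain factor rather than an ordered one, which ggplot2 lays out along the axis in the same level order.

```r
library(dplyr)

df.summary <- ToothGrowth %>%
  mutate(dose = as.factor(dose)) %>%
  group_by(dose) %>%
  summarise(
    sd = sd(len, na.rm = TRUE),
    len = mean(len)
  ) %>%
  # Levels sorted by -sd, i.e. largest sd first
  mutate(dose = reorder(dose, -sd))

levels(df.summary$dose)
```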

"invalid argument type" passing html colour through argument of function

The function is:
violin_rating_by_time <- function(df,var,color,title){
label <- df %>%
filter(!is.na(Decade))%>%
filter(!is.na(!!as.name(var)))%>%
mutate(Decade = fct_reorder(factor(Decade),Release.Date))%>%
group_by(Decade)%>%
summarise(temp=median(!!as.name(var)))
names(label)[names(label)=="temp"] <- var
plot <- df %>%
filter(!is.na(Decade))%>%
mutate(Decade = fct_reorder(factor(Decade),Release.Date))%>%
ggplot(aes(x=Decade,y=!!as.name(var)))+
geom_violin(color=!!as.character(color),fill=!!as.character(color),position="dodge",outlier.colour="transparent",alpha = 0.4,draw_quantiles = c(0.5))+
geom_label(data = label, aes(label = !!as.name(var)),color=!!as.character(color),vjust=-0.4,hjust=0.9)+
theme_aes()+
theme(legend.position = "NA") +
ggtitle(title)
return(plot)
}
I pass arguments into the function (df is just a data frame I'm using, not too important):
violin_rt <- violin_rating_by_time(df=df
,var="Metacritic.Score"
,color="#EDDDD4"
,title="Distribution of MC Scores by Decade")
I get the following error message
Error in !as.character(color) : invalid argument type
It's coming from the geom_violin function. When I plot without colour, it works fine.
Thanks!
Edit: theme_aes() is a function I use for some basic theme arguments in ggplot. It's not causing the error.
I'm not sure of the root cause, but it works for me if each !!as.character(color) is replaced with {{ color }}.
violin_rating_by_time <- function(df,var,color,title){
label <- df %>%
filter(!is.na(Decade))%>%
filter(!is.na(!!as.name(var)))%>%
mutate(Decade = fct_reorder(factor(Decade),Release.Date))%>%
group_by(Decade)%>%
summarise(temp=median(!!as.name(var)))
names(label)[names(label)=="temp"] <- var
plot <- df %>%
filter(!is.na(Decade))%>%
mutate(Decade = fct_reorder(factor(Decade),Release.Date))%>%
ggplot(aes(x=Decade,y=!!as.name(var)))+
geom_violin(color= {{ color }},fill= {{ color }},position="dodge",alpha = 0.4,draw_quantiles = c(0.5))+
geom_label(data = label, aes(label = !!as.name(var)),color= {{ color }},vjust=-0.4,hjust=0.9)+
theme(legend.position = "NA") +
ggtitle(title)
return(plot)
}
Fake data
df <- data.frame(Decade = 2000,
Metacritic.Score = rnorm(10, 10),
Release.Date = 2005)
Test
violin_rating_by_time(df=df
,var="Metacritic.Score"
,color="#EDDDD4"
,title="Distribution of MC Scores by Decade")
There is no need for !! or {{. First, color is already a color code passed as character string. So you could simply pass it to the color and/or fill argument in geom_violin as is. Additionally, !! will only work within a quasiquotation context, e.g. inside a dplyr pipeline:
library(dplyr)
dd <- data.frame(x = 1)
foo <- "foo"
## This works
dd %>%
mutate(y = !!as.character(foo))
#> x y
#> 1 1 foo
## This does not work
dd$z <- !!as.character(foo)
#> Error in !as.character(foo): invalid argument type
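The error message makes sense once you see that, outside a quasiquotation context, `!!x` is simply parsed as two logical negations, `!(!x)`. A small base R sketch (the variable names are only for illustration):

```r
# Outside tidy evaluation, `!!x` is just `!(!x)`
double_neg_lgl <- !!TRUE   # negating twice gives back TRUE
double_neg_num <- !!5      # 5 is coerced to logical, so this is TRUE

# A character string cannot be negated, hence the error from the question
err <- tryCatch(
  !!"#EDDDD4",
  error = function(e) conditionMessage(e)
)
err
```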
Second, when you pass a column name as a character string you could access the column inside a dplyr pipeline or inside aes() via the .data pronoun which is also the recommended way to do so. See Programming with dplyr and Best practices for programming with ggplot2.
Making use of the example data provided by @JonSpring:
violin_rating_by_time <- function(df, var, color, title) {
label <- df %>%
filter(!is.na(Decade)) %>%
filter(!is.na(.data[[var]])) %>%
mutate(Decade = fct_reorder(factor(Decade), Release.Date)) %>%
group_by(Decade) %>%
summarise(temp = median(.data[[var]]))
names(label)[names(label) == "temp"] <- var
df %>%
filter(!is.na(Decade)) %>%
mutate(Decade = fct_reorder(factor(Decade), Release.Date)) %>%
ggplot(aes(x = Decade, y = .data[[var]])) +
geom_violin(color = color, fill = color, position = "dodge", outlier.colour = "transparent", alpha = 0.4, draw_quantiles = c(0.5)) +
geom_label(data = label, aes(label = .data[[var]]), color = color, vjust = -0.4, hjust = 0.9) +
#theme_aes() +
theme(legend.position = "NA") +
ggtitle(title)
}
library(ggplot2)
library(dplyr)
library(forcats)
set.seed(42)
df <- data.frame(Decade = 2000,
Metacritic.Score = rnorm(10, 10),
Release.Date = 2005)
violin_rating_by_time(
df = df,
var = "Metacritic.Score",
color = "#EDDDD4",
title = "Distribution of MC Scores by Decade"
)
#> Warning: Ignoring unknown parameters: outlier.colour

Use scale_x_continuous with labeller function that also takes a data frame as an argument as well as default breaks

Here's a code block:
# scale the log of price per group (cut)
my_diamonds <- diamonds %>%
mutate(log_price = log(price)) %>%
group_by(cut) %>%
mutate(scaled_log_price = scale(log_price) %>% as.numeric) %>% # scale within each group as opposed to overall
nest() %>%
mutate(mean_log_price = map_dbl(data, ~ .x$log_price %>% mean)) %>%
mutate(sd_log_price = map_dbl(data, ~ .x$log_price %>% sd)) %>%
unnest(data) %>%
select(cut, price, scaled_log_price:sd_log_price) %>%
ungroup
# for each cut, find the back transformed actual values (exp) of each unit of zscore between -3:3
for (i in -3:3) {
my_diamonds <- my_diamonds %>%
mutate(!! paste0('mean_', ifelse(i < 0 , 'less_', 'plus_'), abs(i), 'z') := map2(.x = mean_log_price, .y = sd_log_price, ~ (.x + (i * .y)) %>% exp) %>% unlist)
}
my_diamonds_split <- my_diamonds %>% group_split(cut)
split_names <- my_diamonds %>% mutate(cut = as.character(cut)) %>% group_keys(cut) %>% pull(cut)
names(my_diamonds_split) <- split_names
I now have a variable my_diamonds_split that is a list of data frames. I would like to loop over these data frames and each time create a new ggplot.
I can use a custom labeller function with a single df, but I don't know how to do this within a loop:
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(ex_df$price) * x + mean(ex_df$price)))
}
ex_df <- my_diamonds_split$Ideal
ex_df %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller, limits = c(-3, 3))
This creates a plot for the 'Ideal' cut of diamonds. Each x-axis tick carries two labels: the z-score (-2, 0 and 2) and the corresponding raw dollar value (3.8K, 3.9K and 11.8K).
When I define the labeller function, I must specify the df to scale with. I tried using the dot in place of ex_df, hoping that ggplot would pick up the data frame of the current iteration:
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(.$price) * x + mean(.$price)))
}
ex_df <- my_diamonds_split$Ideal
ex_df %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller, limits = c(-3, 3))
Returns:
Error in is.data.frame(x) : object '.' not found
I then tried writing the function to accept an argument for the df to scale with:
labeller <- function(x, df) {
paste0(x,"\n", scales::dollar(sd(df$price) * x + mean(df$price)))
}
ex_df <- my_diamonds_split$Ideal
ex_df %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller(df = ex_df), limits = c(-3, 3)) # because when it comes to running in real life, I will try something like labeller(df = my_diamonds_split[[i]])
Error in paste0(x, "\n", scales::dollar(sd(df$price) * x + mean(df$price))) :
argument "x" is missing, with no default
Bearing in mind that the scaling must be done per iteration, how could I loop over my_diamonds_split, and on each iteration generate a ggplot per above?
labeller <- function(x) {
# how can I make df variable
paste0(x,"\n", scales::dollar(sd(df$price) * x + mean(df$price)))
}
for (i in split_names) {
my_diamonds_split[[i]] %>%
ggplot(aes(x = scaled_log_price)) +
geom_density() +
scale_x_continuous(label = labeller, # <--- here, labeller must be defined with df$price except that will difer on each iteration
limits = c(-3, 3))
}
There's a hacky way to get this result in facets. Basically, after converting to z scores, you add different amounts (say, multiples of 1000) to each group's z scores. Then you set all the breaks to this collection of points and label them with pre-calculated labels.
library(ggplot2)
library(dplyr)
f <- function(x) {
y <- diamonds$price[diamonds$cut == x]
paste(seq(-3, 3), scales::dollar(round(mean(y) + seq(-3, 3) * sd(y))), sep = "\n")
}
breaks <- as.vector(sapply(levels(diamonds$cut), f))
diamonds %>%
group_by(cut) %>%
mutate(z = scale(price) + 3 + 1000 * as.numeric(cut)) %>%
ggplot(aes(z)) +
geom_point(aes(x = z - 2, y = 1), alpha = 0) +
geom_density() +
scale_x_continuous(breaks = as.vector(sapply(1:5 * 1000, "+", 0:6)),
labels = breaks) +
facet_wrap(vars(cut), scales = "free_x") +
theme(text = element_text(size = 16),
axis.text.x = element_text(size = 6))
You would have to increase the plot size to make the dollar values more visible of course.
Created on 2020-08-04 by the reprex package (v0.3.0)
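For the loop in the question, another option is a function factory: a function that takes the data frame and returns a one-argument labeller bound to it. This is only a sketch under assumptions: `make_labeller` and `diamonds_split` are hypothetical names, the split list is a simplified stand-in for `my_diamonds_split`, and the label formula is copied from the question as is.

```r
library(ggplot2)
library(dplyr)

# Hypothetical factory: returns a labeller that remembers `df`
make_labeller <- function(df) {
  force(df)  # evaluate now so each labeller keeps its own data frame
  function(x) {
    paste0(x, "\n", scales::dollar(sd(df$price) * x + mean(df$price)))
  }
}

# Simplified stand-in for my_diamonds_split: z-scored log price per cut
scaled <- diamonds %>%
  group_by(cut) %>%
  mutate(scaled_log_price = as.numeric(scale(log(price)))) %>%
  ungroup()
diamonds_split <- split(scaled, scaled$cut)

# One plot per cut, each with a labeller built from its own data frame
plots <- lapply(diamonds_split, function(df) {
  ggplot(df, aes(x = scaled_log_price)) +
    geom_density() +
    scale_x_continuous(labels = make_labeller(df), limits = c(-3, 3))
})
```

`plots$Ideal` then prints the plot for the 'Ideal' cut. The earlier attempt `labeller(df = ex_df)` failed because it called the labeller immediately, without `x`, instead of handing ggplot a function to call later.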

Using ggplot2 within map(), how to refer objects in formula of sec_axis()?

I created graphs of ggplot2 using map(). I want to create second y-axis referring other objects, but object names in formula of sec_axis() aren't parsed within map(). How to refer other objects in formula of sec_axis()?
Any help would be greatly appreciated. Below is an example code and the outputs:
library(tidyverse)
set.seed(1)
d <- data_frame(n = sample(500:1000, 15),
group = letters[rep(1:5, 3)],
year = rep(2011:2013, each = 5)) %>%
nest(-year)
d <- d %>%
mutate(
gg1 = map2(data, year, ~ {
total <- sum(.x$n)
ggplot(.x, aes(x = group, y = n)) +
geom_bar(stat = "identity") +
ggtitle(paste0("year = ", .y, "; total = ", total)) # no problem
}),
gg2 = map2(data, year, ~ {
total <- sum(.x$n)
ggplot(.x, aes(x = group, y = n)) +
geom_bar(stat = "identity") +
ggtitle(paste0("year = ", .y, "; total = ", total)) +
scale_y_continuous(sec.axis = sec_axis(~ ./total)) # problem line
}))
d$gg1[[1]] # run
d$gg2[[1]] # Error in eval(expr, data, expr_env) : object 'total' not found
Thank you for your response. I noticed that formula() can give sec_axis() a formula with the value of the object already pasted in, which works within map2(). (This method doesn't solve the problem within map(); I think this is because . isn't explicit in map().)
d <- d %>%
mutate(
gg2 = map2(data, year, ~ {
total <- sum(.x$n)
ggplot(.x, aes(x = group, y = n)) +
geom_bar(stat = "identity") +
ggtitle(paste0("year = ", .y, "; total = ", total)) +
scale_y_continuous(sec.axis = sec_axis(formula(paste0(" ~ ./", total))))
}),
gg3 = map(data, ~ {
total <- sum(.x$n)
ggplot(.x, aes(x = group, y = n)) +
geom_bar(stat = "identity") +
scale_y_continuous(sec.axis = sec_axis(formula(paste0(" ~ ./", total))))
}))
d$gg2[[1]] # run
d$gg3[[1]] # Error in as.list.environment(x, all.names = TRUE) :
# the ... list does not contain 2 elements
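A possible way to sidestep the formula lookup entirely is to pass an ordinary function to sec_axis(), since a function created inside the plotting code closes over `total`. This is a hedged sketch: it assumes a ggplot2 version whose sec_axis() accepts a function as its first argument, and `plot_with_share` is a hypothetical helper name.

```r
library(ggplot2)

# Hypothetical helper: builds one plot with a secondary axis showing shares
plot_with_share <- function(df) {
  total <- sum(df$n)
  ggplot(df, aes(x = group, y = n)) +
    geom_col() +  # same as geom_bar(stat = "identity")
    # The anonymous function captures `total` from this function's environment
    scale_y_continuous(sec.axis = sec_axis(function(y) y / total))
}

# Made-up example data, one year's worth of groups
one_year <- data.frame(n = c(600, 700, 800), group = c("a", "b", "c"))
p <- plot_with_share(one_year)
```

Because the helper uses a named argument instead of `.x`, it can be called from either map() or map2(), for example `map(d$data, plot_with_share)`.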

ggplot and dplyr showing standard error

I have a plot where I plot multiple lines and for each point I'd like to show the error bars. I need to use
geom_errorbar(aes(ymax=ymax, ymin=ymin), width=0.25) + xlab('points')
My question is how best to make the ymax and ymin columns.
Currently the data frame looks like this
data1 <- data.frame(
group=c("A","A","A","A","B","B","B","B"),
x= c(1,2,3,4,5,6,7,8),
y = c(1,2,3,4,5,6,7,8),
z= c(10,20,30,40,50,60,70,80)) # sample data matrix
data2 = as.data.frame (data1 %>% group_by( group ) %>%
summarise(
MU_Y= mean(y),
upper_limit_Y =MU_Y+(1.96*sd(y, na.rm = TRUE)/sqrt(sum(!is.na(y)))),
lower_limit_Y = MU_Y-(1.96*sd(y, na.rm = TRUE)/sqrt(sum(!is.na(y)))),
MU_Z= mean(z),
upper_limit_Z =MU_Z+(1.96*sd(z, na.rm = TRUE)/sqrt(sum(!is.na(z)))),
lower_limit_Z = MU_Z-(1.96*sd(z, na.rm = TRUE)/sqrt(sum(!is.na(z))))
) %>%
gather(key =Metric, value = Value ,
#c(MU_Y,lower_limit_Y,upper_limit_Y,MU_Z, upper_limit_Z,lower_limit_Z) )
c(MU_Y,MU_Z) )
)
group upper_limit_Y lower_limit_Y upper_limit_Z lower_limit_Z Metric Value
1 A 3.765175 1.234825 37.65175 12.34825 MU_Y 2.5
2 B 7.765175 5.234825 77.65175 52.34825 MU_Y 6.5
3 A 3.765175 1.234825 37.65175 12.34825 MU_Z 25.0
4 B 7.765175 5.234825 77.65175 52.34825 MU_Z 65.0
ggplot(data2, aes(x = group, y= Value, group = Metric ))+
geom_line()+
geom_point()
I need to add 2 new columns to the data frame, ymin and ymax, that hold the appropriate lower/upper limits. The new columns should look like this (I didn't copy all decimal places):
ymin ymax
1.23.. 3.76..
5.23.. 7.76..
12.34.. 37.65..
52... 77.65...
then I'd be able to plot the lines and the error bars for each point.
ggplot(data2, aes(x = group, y= Value, group = Metric ))+
geom_line()+
geom_point() +
geom_errorbar(aes(ymax=ymax, ymin=ymin), width=0.25) +
xlab('points')
I think you have your gather in the wrong place.
data2 <- data1 %>%
gather(key = Metric, value = Value, -group, -x) %>%
group_by(group, Metric) %>%
summarise(
MU = mean(Value),
SD = sd(Value, na.rm = TRUE),
N = sum(!is.na(Value)),
upper_limit = MU + SD/sqrt(N),
lower_limit = MU - SD/sqrt(N)
)
ggplot(data2, aes(x = group, y= MU, group = Metric ))+
geom_line()+
geom_point() +
geom_errorbar(aes(ymax=upper_limit, ymin=lower_limit), width=0.25) +
xlab('points')
Does this do what you want?
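If the columns should be named ymin and ymax and use the 1.96 multiplier from the question, the same reshape-then-summarise idea can be sketched as follows. It assumes tidyr's pivot_longer() in place of gather() and dplyr 1.0 or later for the `.groups` argument.

```r
library(dplyr)
library(tidyr)

data1 <- data.frame(
  group = c("A", "A", "A", "A", "B", "B", "B", "B"),
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(1, 2, 3, 4, 5, 6, 7, 8),
  z = c(10, 20, 30, 40, 50, 60, 70, 80)
)

data2 <- data1 %>%
  # Long format first: one row per (group, Metric, Value)
  pivot_longer(c(y, z), names_to = "Metric", values_to = "Value") %>%
  group_by(group, Metric) %>%
  summarise(
    MU = mean(Value, na.rm = TRUE),
    SE = sd(Value, na.rm = TRUE) / sqrt(sum(!is.na(Value))),
    .groups = "drop"
  ) %>%
  # Limits as in the question: mean plus or minus 1.96 standard errors
  mutate(ymin = MU - 1.96 * SE, ymax = MU + 1.96 * SE)

data2
```

The result can be plotted with `aes(x = group, y = MU, group = Metric)` and `geom_errorbar(aes(ymin = ymin, ymax = ymax), width = 0.25)`.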
