I am trying to write a function that creates a bar plot, but I have trouble getting the fill aesthetic right.
Using fill = !!x leads to the error "Quosures can only be unquoted within a quasiquotation context."
Using fill = x leads to "Aesthetics must be either length 1 or the same as the data (4): fill".
My code:
genBar <- function(data, x, y) {
x <- enquo(x)
y <- enquo(y)
plot <- ggplot(data) +
geom_bar(aes(!!x, !!y),
stat = 'identity',
fill = <help>)
return(plot)
}
fill should be inside aes(). Try:
library(ggplot2)
genBar <- function(data, x, y) {
plot <- ggplot(data) +
geom_bar(aes({{x}}, {{y}}, fill = {{x}}),
stat = 'identity')
return(plot)
}
genBar(mtcars, cyl, mpg)
If you want to pass column names as strings, use the .data pronoun.
genBar <- function(data, x, y) {
plot <- ggplot(data) +
geom_bar(aes(.data[[x]], .data[[y]], fill = .data[[x]]),
stat = 'identity')
return(plot)
}
genBar(mtcars, "cyl", "mpg")
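One possible variation on the answers above, offered as a sketch rather than a definitive version: geom_col() is shorthand for geom_bar(stat = "identity"), and wrapping the fill variable in factor() gives a discrete legend when the column is numeric (as cyl is in mtcars). The name gen_col is hypothetical and not part of the original question.

```r
library(ggplot2)

# Hypothetical helper: same idea as genBar(), written with geom_col()
gen_col <- function(data, x, y) {
  ggplot(data) +
    # fill is mapped inside aes(), so it is evaluated against the data;
    # factor() makes the numeric column a discrete fill variable
    geom_col(aes({{ x }}, {{ y }}, fill = factor({{ x }})))
}

p <- gen_col(mtcars, cyl, mpg)
```

Printing p draws the plot. If a continuous colour gradient is wanted instead, dropping factor() should be enough.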
Are you looking for something like this?
library(dplyr)
library(ggplot2)
genBar <- function(data, x, y) {
x <- enquo(x)
y <- enquo(y)
plot <- ggplot(data) +
geom_bar(aes(!!x, !!y, fill = !!x),
stat = 'identity')
return(plot)
}
iris %>%
group_by(Species) %>%
summarize(Size = mean(Petal.Length)) %>%
genBar(Species, Size)
Created on 2020-12-04 by the reprex package (v0.3.0)
Related
I want to plot a data frame within a function. The legend should be ordered in a specific way. To keep things simple in my example I just reverse the order. I actually want to select a specific row and push it to the last position of the legend.
By the way I am creating a new R package, if this is in any way relevant.
Plotting outside of a function
attach(iris)
library(ggplot2)
# This is a normal plot
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = Species) ) +
geom_bar( stat = "identity")
p
# This is a plot with reversed legend
iris$Species <- factor(iris$Species, levels = rev(levels(iris$Species)))
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = Species) ) +
geom_bar( stat = "identity")
p
Plotting inside a function
The simplest approach was to just use variables, which obviously doesn't work:
f1 <- function(myvariable) {
iris$myvariable <- factor(iris$myvariable, levels = rev(levels(iris$myvariable)))
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = Species) ) +
geom_bar( stat = "identity")
p
}
f1("Species")
#> Error in `$<-.data.frame`(`*tmp*`, myvariable, value = integer(0)) :
#>   replacement has 0 rows, data has 150
I tried to use quasiquotation, but this approach only let me plot the data frame. I cannot reverse the order yet.
library(rlang)
# Only plotting works
f2 <- function(myvariable) {
v1 <- ensym(myvariable)
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = eval(expr(`$`(iris, !!v1))))) +
geom_bar( stat = "identity")
p
}
f2("Species")
# This crashes
f3 <- function(myvariable) {
v1 <- ensym(myvariable)
expr(`$`(iris, !!v1)) <- factor(expr(`$`(iris, !!v1)), levels = rev(levels(expr(`$`(iris, !!v1)))))
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = eval(expr(`$`(iris, !!v1))))) +
geom_bar( stat = "identity")
p
}
f3("Species")
#> Error in `*tmp*`$!!v1 : invalid subscript type 'language'
So the main problem is that I cannot assign something using quasiquotation.
A couple of things:
You can use [[ instead of $ to access data frame columns programmatically.
You can use ggplot's aes_string to minimize notes during R CMD check (since you mentioned you're doing a package).
You can "send" a factor level to the end with fct_relevel from package forcats.
Which translates to:
f <- function(df, var) {
lev <- levels(df[[var]])
df[[var]] <- forcats::fct_relevel(df[[var]], lev[1L], after = length(lev) - 1L)
ggplot(df, aes_string(x = "Sepal.Width", y = "Sepal.Length", fill = var)) +
geom_bar(stat = "identity")
}
f(iris, "Species")
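Since aes_string() is soft-deprecated in recent ggplot2 releases, here is a sketch of the same function using the .data pronoun instead. The name f_data is hypothetical, and after = Inf is used as an alternative way of sending a level to the end; treat it as one way to do this, not the only one.

```r
library(ggplot2)

# Hypothetical variant of f(): column names arrive as strings and are
# looked up through the .data pronoun instead of aes_string()
f_data <- function(df, var) {
  lev <- levels(df[[var]])
  # after = Inf moves the chosen level to the end of the level order
  df[[var]] <- forcats::fct_relevel(df[[var]], lev[1L], after = Inf)
  ggplot(df, aes(x = .data[["Sepal.Width"]],
                 y = .data[["Sepal.Length"]],
                 fill = .data[[var]])) +
    geom_bar(stat = "identity") +
    labs(x = "Sepal.Width", y = "Sepal.Length", fill = var)
}

p <- f_data(iris, "Species")
levels(p$data$Species)
```

The explicit labs() call keeps the axis and legend titles readable regardless of how the mapping expressions are labelled by default.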
How can I set different y axis limits in each plot when using purrr::map2?
I would like to set the y-axis lower limit to half the maximum y-axis value, something like: max(y-axis value/2).
data(mtcars)
library(tidyverse)
mtcars_split <-
mtcars %>%
split(mtcars$cyl)
plots <- map2(
mtcars_split,
names(mtcars_split),
~ggplot(data = .x, mapping = aes(y = mpg, x = wt)) +
geom_jitter() +
ggtitle(.y)+
scale_y_continuous(limits=c(max(.y)/2,NA))
)
plots
Error in max(.y)/2 : non-numeric argument to binary operator
.y is the name of each data frame (a character string such as "4"), which is why max(.y)/2 is giving you that error. This should give you what you want:
plots <- imap(
mtcars_split,
~ggplot(data = .x, mapping = aes(y = mpg, x = wt)) +
geom_jitter() +
ggtitle(.y) +
scale_y_continuous(limits=c(max(.x$mpg)/2,NA))
)
Note that imap(x, ...) is just shorthand for map2(x, names(x), ...).
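A small check of that equivalence, using a toy named list that is purely illustrative:

```r
library(purrr)

# Toy named list; the values are made up for the example
x <- list(a = 1:3, b = 4:6)

# .x is the element, .y is its name in both calls
via_imap <- imap(x, ~ paste(.y, max(.x)))
via_map2 <- map2(x, names(x), ~ paste(.y, max(.x)))

identical(via_imap, via_map2)
```

Both calls return the same named list, so imap() mainly saves typing names(x).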
This doesn't work based on the y-axis value, but it gets the job done if you don't mind specifying your y-column twice:
plots <- map2(
mtcars_split,
names(mtcars_split),
~ggplot(data = .x, mapping = aes(y = mpg, x = wt)) +
geom_jitter() +
ggtitle(.y)+
scale_y_continuous(limits=c(max(.x$mpg)/2,NA))
)
Or maybe a safer option:
plots <- map2(
mtcars_split,
names(mtcars_split),
~{
ploty <- 'mpg'
plotx <- 'wt'
ggplot(data = .x, mapping = aes_string(y = ploty, x = plotx)) +
geom_jitter() +
ggtitle(.y)+
scale_y_continuous(limits=c(max(.x[[ploty]])/2,NA))
}
)
I have written a function that draws histograms with ggplot2 for the numeric columns of a data frame passed to it. The function stores these plots in a list and then returns the list.
However, when I run the function I get the same plot again and again.
My code follows, along with a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
library(ggplot2)
library(ggthemes)
data = as.data.frame(data)
variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
variables_not_to_plot = c(class, variables_to_exclude)
variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
indices = match(variables_to_plot, names(data))
index_of_class = match(class, names(data))
plots = list()
for (i in (1 : length(variables_to_plot))){
p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
geom_histogram(aes(y=..density..), alpha=0.3,
position="identity", bins = 100)+ theme_economist() +
geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
name = names(data)[indices[i]]
plots[[name]] = p
}
plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.
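Simply use aes_string to pass string variables into the ggplot() call. Right now, your plot uses data sources that are not tied to ggplot's data argument. Below, x, color, and fill are separate vectors; they derive from the same source, but ggplot does not know that. In addition, aes() stores these expressions unevaluated, so i is only looked up when a plot is drawn; by then the loop has finished and every plot shows the last column: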
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
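A stripped-down sketch of the same idea without aes_string(), assuming the .data pronoun is acceptable. The name hist_list is hypothetical. It uses lapply() rather than a for loop on purpose: each call gets its own environment, so the column name captured by aes() is still the right one when the plot is eventually drawn.

```r
library(ggplot2)

# Hypothetical, simplified version of hist_of_columns()
hist_list <- function(data, class, vars) {
  plots <- lapply(vars, function(v) {
    # v is local to this call, so delayed evaluation in aes() is safe here
    ggplot(data, aes(x = .data[[v]], fill = .data[[class]])) +
      geom_histogram(bins = 30, alpha = 0.3, position = "identity") +
      labs(x = v, fill = class)
  })
  names(plots) <- vars
  plots
}

df <- mtcars
df$am <- factor(df$am)
plots <- hist_list(df, "am", c("wt", "hp"))
```

Writing .data[[v]] inside a plain for loop would run into the same delayed lookup as the original code, which is why the loop is replaced here.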
Here is a strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
# tidyeval overhead
class_enq <- enquo(class)
drop_enqs <- enquo(drop_vars)
data %>%
group_by(!!class_enq) %>% # keep the 'class' column always
select(-!!drop_enqs) %>% # drop any 'drop_vars'
select_if(is.numeric) %>% # keep only numeric columns
gather("key", "value", -!!class_enq) %>% # go to long form
split(.$key) %>% # make a list of data frames
map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
geom_histogram() +
geom_density(alpha = .5) +
labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))
I want to create multiple plots that have the same x but different y's using purrr package methodology. That is, I would like to use the map() or walk() functions to perform this.
Using mtcars dataset for simplicity.
ggplot(data = mtcars, aes(x = hp, y = mpg)) + geom_point()
ggplot(data = mtcars, aes(x = hp, y = cyl)) + geom_point()
ggplot(data = mtcars, aes(x = hp, y = disp)) + geom_point()
Edit: so far I have tried
y <- list("mpg", "cyl", "disp")
mtcars %>% map(y, ggplot(., aes(hp, y)) + geom_point()
This is one possibility
ys <- c("mpg","cyl","disp")
ys %>% map(function(y)
ggplot(mtcars, aes(hp)) + geom_point(aes_string(y=y)))
It's just like any other map function, you just need to configure your aesthetics properly in the function.
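The same approach can be sketched with the .data pronoun in place of aes_string(), which is soft-deprecated in recent ggplot2 releases. Naming the vector with set_names() is an optional extra so the resulting list can be indexed by variable name.

```r
library(ggplot2)
library(purrr)

ys <- c("mpg", "cyl", "disp")

plots <- ys %>%
  set_names() %>%                       # name each element after itself
  map(~ ggplot(mtcars, aes(x = hp, y = .data[[.x]])) +
        geom_point() +
        labs(y = .x))                   # label the axis with the column name
```

After this, plots$mpg, plots$cyl and plots$disp each hold one plot; walk(plots, print) would draw them one after another.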
I've made a somewhat more general function for this, because it's part of the EDA protocol (Zuur et al., 2010). This article from Ariel Muldoon helped me.
plotlist <- function(data, resp, efflist) {
require(ggplot2)
require(purrr)
y <- enquo(resp)
map(efflist, function(x)
ggplot(data, aes(!!sym(x), !!y)) +
geom_point(alpha = 0.25, color = "darkgreen") +
ylab(NULL)
)
}
where:
data is your data frame
resp is the response variable
efflist is a character vector of effects (independent variables)
Of course, you may change the geom and/or aesthetics as needed. The function returns a list of plots which you can pass to e.g. cowplot or gridExtra, as in this example:
library(gridExtra)
library(dplyr) # just for pipes
plotlist(mtcars, hp, c("mpg","cyl","disp")) %>%
grid.arrange(grobs = ., left = "HP")
The following code plots the predicted probability of several models against time. Having all the plots on one graph was not readable, so I divided the result into a grid.
I was wondering if it is possible to build only one ggplot with all the models and then somehow specify which model goes where with grid.arrange.
Current :
p2.dat1 <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4 )
mdf1 <- melt(p2.dat1 , id.vars="EXPOSURE")
plm.plot.all1 <- ggplot(data = mdf1,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
p2.dat2 <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm5,predp.glm.step )
mdf2 <- melt(p2.dat2 , id.vars="EXPOSURE")
plm.plot.all2 <- ggplot(data = mdf2,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all1, plm.plot.all2, nrow=2)
Expected:
p2.dat <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step)
mdf <- melt(p2.dat , id.vars="EXPOSURE")
plm.plot.all <- ggplot(data = mdf,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all[some_selection_somehow], plm.plot.all[same], nrow=2)
Thanks,
You can do this with grid.arrange by writing some helper functions. It can be done more succinctly, but I prefer small focused functions that can be used with pipes.
library(tidyverse)
library(gridExtra)
# Helper Functions ----
plot_function <- function(x) {
ggplot(x, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
labs(title = unique(x$variable)) +
theme(legend.position = "none")
}
grid_plot <- function(x, selection) {
order <- c(names(x)[grepl(selection,names(x))], names(x)[!grepl(selection,names(x))])
grid.arrange(grobs = x[order], nrow = 2)
}
# Actually make the plot ----
ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
split(.$variable) %>%
map(plot_function) %>%
grid_plot("predp.glm3")
Or you could do this with ggplot, using facet_wrap and setting the factor levels of the variable column to the desired order. This has the benefit of shared axes across the plots, which makes comparison easy. You can alter the helper functions in the first approach to set the axes explicitly and achieve the same effect, but it's just easier to keep it in ggplot.
library(tidyverse)
selection <- "predp.glm3"
plot_data <- ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
mutate(variable = fct_relevel(variable, selection)) # move the selected model to the front; the other levels keep their order
ggplot(plot_data, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
facet_wrap( ~variable, nrow = 2) +
theme(legend.position = "none")
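Because ppf is not shown in the question, here is a self-contained sketch of the facet approach on toy data. The data frame toy, its model names m1 to m3 and its values are all made up for illustration; only the reordering and faceting steps mirror the answer.

```r
library(ggplot2)

# Toy long-format data standing in for the gathered predictions (made-up values)
toy <- data.frame(
  EXPOSURE = rep(1:5, times = 3),
  variable = rep(c("m1", "m2", "m3"), each = 5),
  value    = c(1:5, (1:5)^2, sqrt(1:5))
)

# Move the selected model to the front; facets follow the factor level order
selection <- "m3"
toy$variable <- forcats::fct_relevel(toy$variable, selection)
levels(toy$variable)

p <- ggplot(toy, aes(x = EXPOSURE, y = value, colour = variable)) +
  geom_line() +
  facet_wrap(~ variable, nrow = 2) +
  theme(legend.position = "none")
```

With this ordering the m3 panel comes first, followed by m1 and m2. Passing scales = "free_y" to facet_wrap() would give each panel its own y axis if shared axes are not wanted.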