Optional facets as a function parameter - r

How can you specify a facet parameter that is optional for facet_wrap(), without a weird additional label showing up?
For facet_wrap(), it works as expected when facets are specified. But if it's NULL, there is a weird (all) facet. Is it possible to get rid of that facet label without adding another parameter to the function?
foo_wrap <- function(x) {
ggplot(mtcars) +
aes(x = mpg, y = disp) +
geom_point() +
facet_wrap(vars({{ x }}))
}
foo_wrap (cyl) # cylinder facets
foo_wrap (NULL) # how to get rid of "(all)"?
How can you get rid of "(all)"?
Adding these examples as references in case people find this by searching
Below is an example function with optional facets for facet_grid(), where it works as expected:
foo_grid <- function(x) {
ggplot(mtcars) +
aes(x = mpg, y = disp) +
geom_point() +
facet_grid(rows=NULL, cols=vars({{ x }}))
}
foo_grid (cyl) # cylinder facets
foo_grid (NULL) # no facets, as expected
Here is an example with hard coded rows facetting. Note that you need to call vars():
foo_grid_am <- function(x) {
ggplot(mtcars) +
aes(x = mpg, y = disp) +
geom_point() +
facet_grid(rows=vars(am), cols=vars({{ x }}))
}
foo_grid_am (cyl) # automatic-manual x cylinder facets

One option would be to add a conditional facet layer where I use rlang::quo_is_null(rlang::enquo(x)) to check whether a faceting variable was provided or not:
Note: I made NULL the default.
library(ggplot2)
library(rlang)
foo_wrap <- function(x = NULL) {
facet_layer <- if (!rlang::quo_is_null(rlang::enquo(x))) facet_wrap(vars({{ x }}))
ggplot(mtcars) +
aes(x = mpg, y = disp) +
geom_point() +
facet_layer
}
foo_wrap(cyl)
foo_wrap(NULL)

Related

for-loop to create ggplots

I trying to make boxplots with ggplot2.
The code I have to make the boxplots with the format that I want is as follows:
p <- ggplot(mg_data, aes(x=Treatment, y=CD68, color=Treatment)) +
geom_boxplot(mg_data, mapping=aes(x=Treatment, y=CD68))
p+ theme_classic() + geom_jitter(shape=16, position=position_jitter(0.2))
I can was able to use the following code to make looped boxplots:
variables <- mg_data %>%
select(10:17)
for(i in variables) {
print(ggplot(mg_data, aes(x = Treatment, y = i, color=Treatment)) +
geom_boxplot())
}
With this code I get the boxplots however, they do not have the name label of what variable is being select for the y-axis, unlike the original code when not using the for loop. I also do not know how to add the formating code to the loop:
p + theme_classic() + geom_jitter(shape=16, position=position_jitter(0.2))
Here is a way. I have tested with built-in data set iris, just change the data name and selected columns and it will work.
suppressPackageStartupMessages({
library(dplyr)
library(ggplot2)
})
variables <- iris %>%
select(1:4) %>%
names()
for(i in variables) {
g <- ggplot(iris, aes(x = Species, y = get(i), color=Species)) +
geom_boxplot() +
ylab(i)
print(g)
}
Edit
Answering to a comment by user TarJae, reproduced here because answers are less deleted than comments:
Could you please expand with saving all four files. Many thanks.
The code above can be made to save the plots with a ggsave instruction at the loop end. The filename is the variable name and the plot is the default, the return value of last_plot().
for(i in variables) {
g <- ggplot(iris, aes(x = Species, y = get(i), color=Species)) +
geom_boxplot() +
ylab(i)
print(g)
ggsave(paste0(i, ".png"), device = "png")
}
Try this:
variables <- mg_data %>%
colnames() %>%
`[`(10:17)
for (i in variables) {
print(ggplot(mg_data, aes(
x = Treatment, y = {{i}}, color = Treatment
)) +
geom_boxplot())
}
Another option is to use lapply. It's approximately the same as using a loop, but it hides the actual looping part and can make your code look a little cleaner.
variables = iris %>%
select(1:4) %>%
names()
lapply(variables, function(x) {
ggplot(iris, aes(x = Species, y = get(x), color=Species)) +
geom_boxplot() + ylab(x)
})

Outliers appearing in box plot when I use plotly with ggplot2 (R)?

I have a problem with outliers appearing when they shouldn't when I use plotly, and I'm wondering how to stop this from happening. I'll use the mtcars dataset as an example.
I'll set an obvious outlier:
mtcars[1,1] = 60
The first plot, with outliers included:
p <- ggplot(mtcars) +
+ geom_boxplot(
+ aes(x = cyl, y = mpg, group = cyl))
Now I make a plot with outliers removed:
p <- ggplot(mtcars) +
geom_boxplot(
aes(x = cyl, y = mpg, group = cyl),
outlier.shape = NA)
Now I convert the plot to ggplotly and save
p <- plotly::ggplotly(p)
The outliers are showing. Does anyone know how I can get around this problem / have a solution when using ggplotly?
Found a solution:
for (i in 1:3) {
p$x$data[[i]]$marker <- list(opacity = 0)
}

R: Unexplainable behavior of ggplot inside a function

I have composed a function that develops histograms using ggplot2 on the numerical columns of a dataframe that will be passed to it. The function stores these plots into a list and then returns the list.
However when I run the function I get the same plot again and again.
My code is the following and I provide also a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
library(ggplot2)
library(ggthemes)
data = as.data.frame(data)
variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
variables_not_to_plot = c(class, variables_to_exclude)
variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
indices = match(variables_to_plot, names(data))
index_of_class = match(class, names(data))
plots = list()
for (i in (1 : length(variables_to_plot))){
p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
geom_histogram(aes(y=..density..), alpha=0.3,
position="identity", bins = 100)+ theme_economist() +
geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
name = names(data)[indices[i]]
plots[[name]] = p
}
plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.
Simply use aes_string to pass string variables into the ggplot() call. Right now, your plot uses different data sources, not aligned with ggplot's data argument. Below x, color, and fill are separate, unrelated vectors though they derive from same source but ggplot does not know that:
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
Here is strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
# tidyeval overhead
class_enq <- enquo(class)
drop_enqs <- enquo(drop_vars)
data %>%
group_by(!!class_enq) %>% # keep the 'class' column always
select(-!!drop_enqs) %>% # drop any 'drop_vars'
select_if(is.numeric) %>% # keep only numeric columns
gather("key", "value", -!!class_enq) %>% # go to long form
split(.$key) %>% # make a list of data frames
map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
geom_histogram() +
geom_density(alpha = .5) +
labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))

automagically using labels (haven semantics) in ggplot2 plots

I'm plotting data marked up using haven semantics, i.e. variables and values have labels defined via attributes.
Often, these labels are also what I want in my axis titles and ticks.
library(ggplot2)
mtcars$mpg = haven::labelled(mtcars$mpg, labels = c("low" = 10, "high" = 30))
attributes(mtcars$mpg)$label = "miles per gallon"
ggplot(mtcars, aes(mpg, cyl)) + geom_point() +
scale_x_continuous(attributes(mtcars$mpg)$label,
breaks = attributes(mtcars$mpg)$labels,
labels = names(attributes(mtcars$mpg)$labels))
Could I write a helper that replaces that laborious scale_x_continuous statement with something that can more easily be iterated? E.g. something like
scale_x_continuous(label_from_attr, breaks = breaks_from_attr, labels = value_labels_from_attr). Or maybe even + add_labels_from_attributes() to replace the whole thing?
I'm aware that I can write/use helpers like Hmisc::label to slightly shorten the attribute-code above, but that's not what I want here.
I don't have a good scale, but you can use a function like this:
label_x <- function(p) {
b <- ggplot_build(p)
x <- b$plot$data[[b$plot$labels$x]]
p + scale_x_continuous(
attributes(x)$label,
breaks = attributes(x)$labels,
labels = names(attributes(x)$labels)
)
}
Then use as (+ won't do):
p <- ggplot(mtcars, aes(mpg, cyl)) + geom_point()
label_x(p)
Alternatively, use a pipe:
mtcars %>% { ggplot(., aes(mpg, cyl)) + geom_point() } %>% label_x()
Old solution
use_labelled <- function(l, axis = "x") {
if (axis == "x") {
scale_x_continuous(attributes(l)$label,
breaks = attributes(l)$labels,
labels = names(attributes(l)$labels))
}
if (axis == "y") {
scale_y_continuous(attributes(l)$label,
breaks = attributes(l)$labels,
labels = names(attributes(l)$labels))
}
}
Then you just give:
ggplot(mtcars, aes(mpg, cyl)) + geom_point() + use_labelled(mtcars$cyl)
Or for the y-axis:
ggplot(mtcars, aes(cyl, mpg)) + geom_point() + use_labelled(mtcars$cyl, "y")
Another approach is to write a wrapper for ggplot() that has its own class. Then attributes have full visibility when the corresponding print method is called. See ?ag.print from package 'yamlet' (0.2.1).
library(ggplot2)
library(yamlet)
library(magrittr)
mtcars$disp %<>% structure(label = 'displacement', unit = 'cu. in.')
mtcars$mpg %<>% structure(label = 'mileage', unit = 'miles/gallon')
mtcars$am %<>% factor(levels = c(0,1), labels = c('automatic','manual'))
mtcars$am %<>% structure(label = 'transmission')
agplot(mtcars, aes(disp, mpg, color = am)) + geom_point()

ggplot using purrr map() to plot same x with multiple y's

I want to create multiple plots that have the same x but different y's using purrr package methodology. That is, I would like to use the map() or walk() functions to perform this.
Using mtcars dataset for simplicity.
ggplot(data = mtcars, aes(x = hp, y = mpg)) + geom_point()
ggplot(data = mtcars, aes(x = hp, y = cyl)) + geom_point()
ggplot(data = mtcars, aes(x = hp, y = disp)) + geom_point()
edit
So far I have tried
y <- list("mpg", "cyl", "disp")
mtcars %>% map(y, ggplot(., aes(hp, y)) + geom_point()
This is one possibility
ys <- c("mpg","cyl","disp")
ys %>% map(function(y)
ggplot(mtcars, aes(hp)) + geom_point(aes_string(y=y)))
It's just like any other map function, you just need to configure your aesthetics properly in the function.
I've made a bit more general function for this, because it's part of EDA protocol (Zuur et al., 2010). This article from Ariel Muldoon helped me.
plotlist <- function(data, resp, efflist) {
require(ggplot2)
require(purrr)
y <- enquo(resp)
map(efflist, function(x)
ggplot(data, aes(!!sym(x), !!y)) +
geom_point(alpha = 0.25, color = "darkgreen") +
ylab(NULL)
)
}
where:
data is your dataframe
resp is response variable
efflist is a char of effects (independent variables)
Of course, you may change the geom and/or aesthetics as it needs. The function returns a list of plots which you can pass to e.g. cowplot or gridExtra as in example:
library(gridExtra)
library(dplyr) # just for pipes
plotlist(mtcars, hp, c("mpg","cyl","disp")) %>%
grid.arrange(grobs = ., left = "HP")

Resources