We can see how to plot a single variable (along with its index).
How can we pipe to a ggplot?
Example
Since
library(ggplot2)
qplot(seq_along(iris$Sepal.Length), iris$Sepal.Length)
yields
I expected
iris$Sepal.Length %>% { qplot(seq_along(.), .) }
to yield the same. But
Error: Discrete value supplied to continuous scale
Question
How do we pipe a single variable to a ggplot?
It seems that, to get it working, you need to print the plot explicitly when it is inside a chain.
library(magrittr)
library(ggplot2)
iris$Sepal.Length %>% {print(qplot(seq_along(.), .))}
You can use the following code
library(tidyverse)
iris %>% ggplot(aes(seq_along(Sepal.Length), Sepal.Length))+
geom_point() + theme_bw()+
labs(title="Plot of Sepal length",x="Sepal.Length seq", y = "Sepal.Length")
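As a hedged alternative sketch (not taken from the answers above): ggplot() expects a data frame, so one option is to wrap the bare vector in a data frame first and then pipe that. The column names index and value, and the object names sepal_df and p, are made up for illustration.

```r
library(magrittr)
library(ggplot2)

# Wrap the bare vector in a data frame; `index` and `value` are
# illustrative column names, not anything ggplot2 requires.
sepal_df <- iris$Sepal.Length %>%
  { data.frame(index = seq_along(.), value = .) }

# The pipe feeds the data frame to ggplot(); layers are then added with +
p <- sepal_df %>%
  ggplot(aes(index, value)) +
  geom_point()

print(p)  # explicit print, so the plot is also drawn inside scripts and loops
```

Because the data frame carries both columns, aes() can refer to them by name and no dot placeholder is needed inside the plotting call.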
Related
My question is about using a for loop to repeat a data analysis for each level of a categorical variable.
Using the built-in iris data set, how would I run a for loop on the code below so that it produces this chart first for just setosa, then versicolor, and then virginica, without me having to manually change/set the species?
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
geom_point()
I'm just starting out and have no idea what I'm doing
You need to use print() as described here
library(tidyverse)
data(iris)
species <- iris |> distinct(Species) |> unlist()
for(i in species) {
p <- iris |>
filter(Species == i) |>
ggplot() +
geom_point(aes(x=Sepal.Length, y=Sepal.Width)) +
ggtitle(i)
print(p)
}
You can use a for loop as u/DanY posted; however, it's harder to store and retrieve plots in a universal way with that structure. Running the loop code makes it difficult to retrieve any one particular plot - you would only see the last plot in the output window and have to go "back" to see the others. I would suggest using a list structure instead to allow you to retrieve any one of the individual plots in subsequent functions.
For this, you can use lapply() rather than for(...) { ... }.
Here's an example which uses dplyr:
library(ggplot2)
library(dplyr)
unique_species <- unique(iris$Species)
myPlots <- lapply(unique_species, function(x) {
  ggplot(
    data = iris %>% dplyr::filter(Species == x),
    mapping = aes(x = Sepal.Length, y = Sepal.Width)
  ) +
    geom_point() +
    labs(title = paste("Plot of", x))
})
You then have the plots stored in myPlots. You can access each plot via myPlots[[1]], myPlots[[2]] or myPlots[[3]] (single brackets, as in myPlots[1], return a list of length one rather than the plot itself), or you can plot them all together via patchwork or a similar package. Here's one way using cowplot:
cowplot::plot_grid(plotlist = myPlots, nrow=1)
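If retrieving plots by position feels fragile, a named list may be easier to work with. The following is only a sketch using base R's split(); the names by_species and species_plots are made up for illustration.

```r
library(ggplot2)

# split() returns a named list with one data frame per species
by_species <- split(iris, iris$Species)

# Iterate over the names so each plot can be titled and stored by name
species_plots <- lapply(names(by_species), function(sp) {
  ggplot(by_species[[sp]], aes(Sepal.Length, Sepal.Width)) +
    geom_point() +
    labs(title = paste("Plot of", sp))
})
names(species_plots) <- names(by_species)

species_plots[["versicolor"]]  # retrieve one plot by species name
```

A named list can still be passed as plotlist to cowplot::plot_grid() in the same way as the unnamed one.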
Problem: purrr::pmap() output incompatible with ggplot2::aes()
The following reprex boils down to a single question: is there any way to use quoted variable names inside ggplot2::aes() instead of bare names? Example: we typically write ggplot(mpg, aes(displ, cyl)); how can we make aes() work the same way with ggplot(mpg, aes("displ", "cyl"))?
If you understood my question, the remainder of this reprex adds no new information. I include it only to give the full picture of the problem.
More details: I want to use purrr functions to create a batch of routine exploratory data analysis plots with little effort. The problem is that purrr::pmap() passes the variable names as quoted strings, which ggplot2::aes() doesn't interpret as columns. As far as I understand, cat() and as.name() can take a quoted variable name and return it unquoted, the form aes() normally expects. However, neither of them worked. The following reprex reproduces the problem. I commented the code to spare you the pain of figuring out what it does.
library(tidyverse)
# Divide the variables into numeric and non-numeric. Goal: place a combination of numeric variables on the axes while encoding a non-numeric variable.
mpg_numeric <- map_lgl(.x = seq_along(mpg), .f = ~ mpg[[.x]] %>% class() %in% c("numeric","integer"))
mpg_factor <- map_lgl(.x = seq_along(mpg), .f = ~ mpg[[.x]] %>% class() %in% c("factor","character"))
# create all possible combinations of the variables
eda_routine_combinations <- expand_grid(num_1 = mpg[mpg_numeric] %>% names(),
num_2 = mpg[mpg_numeric] %>% names(),
fct = mpg[mpg_factor] %>% names()) %>%
filter(num_1 != num_2) %>% slice_head(n = 2) # for simplicity, keep only the first 2 combinations
# use purrr::pmap() to create all the plots we want in a single call
pmap(.l = list(eda_routine_combinations$num_1,
eda_routine_combinations$num_2,
eda_routine_combinations$fct) ,
.f = ~ mpg %>%
ggplot(aes(..1 , ..2, col = ..3)) +
geom_point() )
Next we pinpoint the problem using a typical ggplot2 call.
This is what we want purrr::pmap() to create in its iterations:
mpg %>%
ggplot(aes(displ , cyl, fill = drv)) +
geom_boxplot()
However, this is what purrr::pmap() renders, with quoted variable names:
mpg %>%
ggplot(aes("displ" , "cyl", fill = "drv")) +
geom_boxplot()
Failing attempts
Using cat() to transform the quoted variable names from pmap() into unquoted form for aes() to understand fails.
mpg %>%
ggplot(aes(cat("displ") , cat("cyl"), fill = cat("drv"))) +
geom_boxplot()
Using as.name() to transform the quoted variable names from pmap() into unquoted form for aes() to understand fails.
mpg %>%
ggplot(aes(as.name("displ") , as.name("cyl"), fill = as.name("drv"))) +
geom_boxplot()
Bottom line
Is there a way to make ggplot(aes("quoted_var_name")) work properly?
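One commonly suggested route, sketched here under assumptions rather than as the definitive answer, is the .data pronoun that recent ggplot2 versions make available inside aes(): .data[["displ"]] looks a column up by its string name. The small combos data frame below is a made-up stand-in for eda_routine_combinations, and it uses colour with geom_point() as in the pmap() call above.

```r
library(ggplot2)
library(purrr)

# Column names held as strings, as pmap() would supply them
# (illustrative values, not the full set of combinations)
combos <- data.frame(
  num_1 = c("displ", "displ"),
  num_2 = c("cyl", "cty"),
  fct   = c("drv", "class"),
  stringsAsFactors = FALSE
)

# pmap() passes each row's values as named arguments;
# .data[[name]] resolves the string to the matching column of mpg
plots <- pmap(combos, function(num_1, num_2, fct) {
  ggplot(mpg, aes(.data[[num_1]], .data[[num_2]], colour = .data[[fct]])) +
    geom_point()
})

plots[[1]]  # displ vs cyl, coloured by drv
```

By contrast, cat() prints its argument and returns NULL, and as.name() yields a symbol rather than a column of values, which is presumably why neither attempt worked inside aes().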
I'm looking for a way to apply a function to either specified labels, or to all labels that are included in the plot. The goal is to have neat human readable labels that derive from the default labels, without having to specify each.
To demonstrate what I am looking for in terms of the input variable names and the output, I am including an example based on the starwars data set, that uses the versatile snakecase::to_sentence_case() function, but this could apply to any function, including ones that expand short variable names in pre-determined ways:
library(tidyverse)
library(snakecase)
starwars %>%
filter(mass < 1000) %>%
mutate(species = species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point() +
labs(
x = to_sentence_case("height"),
y = to_sentence_case("mass"),
color = to_sentence_case("species"),
size = to_sentence_case("birth_year")
)
Which produces the following graph:
The graph is the desired output, but requires that each of the labels be specified by hand, increasing the possibility of error if the variables are later changed. Note that if I had not specified the labels, all the labels would have been applied automatically, but with the variable names instead of the prettier versions.
This issue seems to be somewhat related to what the labeller() function is intended for, but it seems that it only applies to facetting. Another related issue is raised in this question. However, both of these seem to apply only to values contained within the data, not to the variable names that are being used in the plot, which is what I am looking for.
The very helpful answer by @z-lin showed me a simple way to do this: modify the plot object before printing.
The intended result can be achieved with the help of gg_apply_labs(), a short function that will apply an arbitrary string processing function to the $labels of a plot object. The resulting code should be a self-contained illustration of this approach:
# Packages
library(tidyverse)
library(snakecase)
# This applies fun to each label present in the plot object
#
# fun should accept and return character vectors; it can either be a simple
# prettifying function or it can perform a more complex lookup to replace
# variable names with variable labels
gg_apply_labs <- function(p, fun) {
p$labels <- lapply(p$labels, fun)
p
}
# This gives the intended result
# Note: The plot is assigned to a named variable before piping to gg_apply_labs()
p <- starwars %>%
filter(mass < 1000) %>%
mutate(species = species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point()
p %>% gg_apply_labs(to_sentence_case)
# This also gives the intended result, in a single pipeline
# Note: It is important to put in the extra parentheses!
(starwars %>%
filter(mass < 1000) %>%
mutate(species = species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point()) %>%
gg_apply_labs(to_sentence_case)
# This DOES NOT give the intended result
# Note: %>% has higher precedence than +, so only geom_point() is piped into gg_apply_labs()
starwars %>%
filter(mass < 1000) %>%
mutate(species = species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point() %>%
gg_apply_labs(to_sentence_case)
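The precedence issue behind the last example can be seen without any plotting. In this minimal sketch (the variable names are made up), %>% is applied before +, just as it is when a pipe follows the last + of a ggplot chain:

```r
library(magrittr)

# %>% binds tighter than +, so this is 1 + (2 %>% sqrt()),
# not (1 + 2) %>% sqrt()
unparenthesised <- 1 + 2 %>% sqrt()

# Parentheses force the sum to be computed first and then piped
parenthesised <- (1 + 2) %>% sqrt()

unparenthesised  # 1 + sqrt(2)
parenthesised    # sqrt(3)
```

This is why wrapping the whole ggplot expression in parentheses, or assigning it to a variable first, makes the pipe act on the finished plot rather than on the last layer alone.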
A simple solution is to pipe through rename_all (or rename_if if you want more control) before plotting:
library(tidyverse)
library(snakecase)
starwars %>%
filter(mass<1000) %>%
mutate(species=species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
rename_all(to_sentence_case) %>%
#rename_if(is.character, to_sentence_case) %>%
ggplot(aes(Height, Mass, color=Species, size=`Birth year`)) +
geom_point()
#> Warning: Removed 23 rows containing missing values (geom_point).
Created on 2019-11-25 by the reprex package (v0.3.0)
Note, though, that the variables given to aes in ggplot in this case must be modified to match the modified sentence case variable names.
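In dplyr 1.0.0 and later, rename_all() is superseded by rename_with(), which takes the renaming function directly. The sketch below assumes that version of dplyr and, to stay self-contained, uses a hypothetical prettify() helper as a stand-in for snakecase::to_sentence_case().

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for snakecase::to_sentence_case():
# replace underscores with spaces and capitalise the first letter
prettify <- function(x) {
  x <- gsub("_", " ", x)
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

# rename_with() applies the function to every column name by default
pretty_starwars <- starwars %>%
  filter(mass < 1000) %>%
  rename_with(prettify)

# aes() must use the renamed columns; names with spaces need backticks
p <- ggplot(pretty_starwars, aes(Height, Mass, size = `Birth year`)) +
  geom_point()
```

The same caveat applies as with rename_all(): every name used in aes() has to match the renamed columns.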
You can modify a ggplot object's appearance at the point of printing / plotting it, without affecting the original plot object, using trace:
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
<whatever string function you desire>)))
This will change the appearance of all existing / new ggplot objects you wish to plot / save, until you turn off the trace via either untrace(...) or tracingState(on = FALSE).
Illustration
Create a normal plot with default labels in lower case:
library(tidyverse)
p <- starwars %>%
filter(mass < 1000) %>%
mutate(species=species %>% fct_infreq %>% fct_lump(5) %>% fct_explicit_na) %>%
ggplot(aes(height, mass, color=species, size=birth_year)) +
geom_point() +
theme_bw()
p # if we print the plot now, all labels will be lower-case
Apply a function to modify the appearance of all labels:
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
snakecase::to_sentence_case)))
p # all labels will be in sentence case
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
snakecase::to_screaming_snake_case)))
p # all labels will be in upper case
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
snakecase::to_random_case)))
p # all letters in all labels may be in upper / lower case randomly
# (exact order can change every time we print the plot again, unless we set the same
# random seed for reproducibility)
trace(what = ggplot2:::ggplot_build.ggplot,
tracer = quote(plot$labels <- lapply(plot$labels,
function(x) paste("!!!", x, "$$$"))))
p # all labels now have "!!!" in front & "$$$" behind (this is a demonstration for
# an arbitrary user-defined function, not a demonstration of good taste in labels)
Toggle between applying & not applying the function:
tracingState(on = FALSE)
p # back to sanity, temporarily
tracingState(on = TRUE)
p # plot labels are affected by the function again
untrace(ggplot2:::ggplot_build.ggplot)
p # back to sanity, permanently
Using the iris dataset.
Sample code and function:
plotfunction <- function(whatspecies){
baz <- iris %>% filter(Species == whatspecies) %>%
ggplot(aes(Petal.Width, Petal.Length)) +
geom_point() +
labs(title = whatspecies)
ggsave(filename = paste0(whatspecies,".png"),
path = getwd())
return(baz)
}
What I'd like to do is to loop over the Species variable to create 3 plots in my working directory. In my real data frame I have many more factors so I was wondering if there is a better way to do this rather than running the function n number of times - as in this instance I only care about modifying/looping over one variable in each graph.
Edit: In my circumstance I require independent plots so I can't use facets or different aesthetics.
Is this what you are looking for?
library(dplyr)
library(ggplot2)
for (sp in levels(iris[["Species"]])) {
plotfunction(sp)
}
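If you also want to keep the plot objects, lapply() can replace the for loop. This is a sketch under assumptions, not the original function: save_species_plot() is a hypothetical variant of plotfunction() that takes an added out_dir argument (defaulting to a temporary directory), passes the plot to ggsave() explicitly, and uses arbitrary width and height values.

```r
library(dplyr)
library(ggplot2)

# Hypothetical variant of plotfunction(): `out_dir` is an added argument,
# and the plot is handed to ggsave() explicitly instead of via last_plot()
save_species_plot <- function(whatspecies, out_dir = tempdir()) {
  p <- iris %>%
    filter(Species == whatspecies) %>%
    ggplot(aes(Petal.Width, Petal.Length)) +
    geom_point() +
    labs(title = whatspecies)
  ggsave(filename = paste0(whatspecies, ".png"), plot = p,
         path = out_dir, width = 5, height = 4)
  p
}

# One call per factor level; the result is a list of plots named by species
species_levels <- levels(iris$Species)
saved_plots <- lapply(species_levels, save_species_plot)
names(saved_plots) <- species_levels
```

With more factors, the same pattern works on any character vector of levels, and purrr::walk() is an option when only the side effect of saving matters.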
I can't find a solution to the following two issues:
First I try this:
library(tidyverse)
gg <- mtcars %>%
mutate(group=ifelse(gear==3,1,2)) %>%
ggplot(aes(x=carb, y=drat)) + geom_point(shape=group)
Error in layer(data = data, mapping = mapping, stat = stat, geom =
GeomPoint,:object 'group' not found
which is obviously not working. But using something like .$group is also not successful. Of note, I have to specify the shape outside of aes().
The second problem is this. I'm not able to call a saved ggplot (gg) within a pipe.
gg <- mtcars %>%
mutate(group=ifelse(gear==3,1,2)) %>%
ggplot(aes(x=carb, y=drat)) + geom_point()
mtcars %>%
filter(vs == 0) %>%
gg + geom_point(aes(x=carb, y=drat), size = 4)
Error in gg(.) : could not find function "gg"
Thanks for your help!
Edit
After a long time I found a solution here. One has to wrap the complete ggplot expression in {}.
mtcars %>%
mutate(group=ifelse(gear==3,1,2)) %>% {
ggplot(.,aes(carb,drat)) +
geom_point(shape=.$group)}
If you wrap your shape definition in aes() you can get the desired behavior. To use shape outside of aes() you can pass it a single value (i.e. shape=1). Also note that group is converted to a discrete variable here, because geom_point throws an error when you map a continuous variable to shape.
library(tidyverse)
gg <- mtcars %>%
mutate(group=ifelse(gear==3,1,2)) %>%
ggplot(aes(x=carb, y=drat)) +
geom_point(aes(shape=as.factor(group)))
gg
Second, the %>% operator, when called as lhs %>% rhs, assumes that the rhs is a function. So, as the error shows, you are calling gg as a function. Calling a plot as a function on a data frame (i.e. gg(mtcars)) isn't a valid operation.
See @docendo discimus's comment on the question for how to use {} to add a layer to an existing ggplot object from a magrittr pipeline.
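A sketch of that {} approach, under the assumption that the goal is to highlight the vs == 0 rows on top of the saved plot (the name gg_highlight is made up):

```r
library(dplyr)
library(ggplot2)

# The saved base plot, built from the full data set
gg <- ggplot(mtcars, aes(x = carb, y = drat)) +
  geom_point()

# Inside {}, `.` is the filtered data frame, so it can be passed to the new
# layer's `data` argument while `gg` is used as an ordinary object
gg_highlight <- mtcars %>%
  filter(vs == 0) %>%
  { gg + geom_point(data = ., aes(x = carb, y = drat), size = 4) }

gg_highlight
```

The braces stop %>% from treating gg as a function to call, which is what produced the "could not find function" error.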