I would like to subset a large dataframe and create a ggplot of each grouping. Sounds like a perfect candidate for dplyr but I'm running into issues calling functions on the group_by results. Any hints would be greatly appreciated.
# what I want to do using base functions: "groupby" the elements in a column
# and create/save a plot for each group
for (i in levels(iris$Species)){
df = iris[iris$Species == i,]
p <- ggplot(df, aes(x=Sepal.Length, y=Sepal.Width) + geom_point())
ggsave(p, filename=paste(i,".pdf",sep=""))
}
# I'm trying to get something like this using dplyr
library(dplyr)
iris %>%
group_by(Species) %>%
do({
p <- ggplot(., aes(x=Sepal.Length, y=Sepal.Width) + geom_point())
ggsave(p, filename=paste(quote(Species),".pdf",sep=""))
})
Well, you have a parenthesis problem and a file naming problem so maybe it's one of those that you are referring to. I'm assuming
iris %>%
group_by(Species) %>%
do({
p <- ggplot(., aes(x=Sepal.Length, y=Sepal.Width)) + geom_point()
ggsave(p, filename=paste0(unique(.$Species),".pdf"))
})
would fix your problem.
Related
I want to arrange N ggplot (each one is facetted) on a grid with grid.arrange.
library(tidyverse)
library(ggplot2)
library(gridExtra)
plots <- lapply(unique(mtcars$cyl), function(cyl) {
data <- mtcars %>% filter(cyl == cyl)
ggplot(data, aes(x=mpg, y=hp))+
geom_point(color = "blue")+
facet_wrap(.~carb)}) %>%
do.call(grid.arrange, .)
do.call(grid.arrange, plots )
The problem is that all the plots are based on the entire dataset and they render the same plot, while they shuold be different as I filter them in line
data <- mtcars %>% filter(cyl == cyl).
filter deals with cyl too letteral and treated as a string, therefore cyl==cyl is TRUE for the entire dataset. You can solve this by unquote cyl using !! or use another variable name in the function e.g. x.
#Option 1
data <- mtcars %>% filter(cyl == !!cyl)
#Option 2
... function(x) {
data <- mtcars %>% filter(cyl == x)
...
Here is a tidyverse approach
library(tidyverse)
group_plots <- mtcars %>%
group_split(cyl) %>%
map(~ggplot(., aes(x = mpg, y = hp))+
geom_point(color = "blue") +
facet_wrap(.~carb))
do.call(gridExtra::grid.arrange, group_plots)
Try use split() first:
library(tidyverse)
library(gridExtra)
l <- split(mtcars, mtcars$cyl) # divide based on cyl in a list
plots <- lapply(l, function(x) {
ggplot(x, aes(x=mpg, y=hp)) +
geom_point(color = "blue") +
facet_wrap(.~carb)
}) # call lapply() on each element
do.call(grid.arrange, plots)
I'm trying to loop through every column of the iris data set and plot a histogram in ggplot. So I'm expecting 5 different histograms to appear. However, my for loop below returns nothing. How can I fix this?
library(ggplot2)
for (i in colnames(iris)){
ggplot(iris, aes(x = i))+
geom_histogram()
}
Instead of using a for loop, the tidyverse/ggplot way would be to reshape the data from wide to long and then plot using facet_wrap
library(tidyverse)
iris %>%
gather(key, val, -Species) %>%
ggplot(aes(val)) +
geom_histogram(bins = 30) +
facet_wrap(~key, scales = "free_x")
Using dplyr, tidyr and ggplot:
library(ggplot2)
library(dplyr)
library(tidyr)
iris %>%
gather(Mesure, Value, -Species) %>%
ggplot(aes(x=Value)) + geom_histogram() + facet_grid(rows=vars(Species), cols=vars(Mesure))
Result:
I've got a grouped dataframe generated in dplyr where each group reflects a unique combination of factor variable levels. I'd like to plot the different groups using code similar to this post. However, I can't figure out how to include two (or more) variables in the title of my plots, which is a hassle since I've got a bunch of different combinations.
Fake data and plotting code:
library(dplyr)
library(ggplot2)
spiris<-iris
spiris$site<-as.factor(rep(c("A","B","C")))
spiris$year<-as.factor(rep(2012:2016))
spiris$treatment<-as.factor(rep(1:2))
g<-spiris %>%
group_by(site, Species) %>%
do(plots=ggplot(data=.) +
aes(x=Petal.Width)+geom_histogram()+
facet_grid(treatment~year))
##Need code for title here
g[[3]] ##view plots
I need the title of each plot to reflect both "site" and "Species". Any ideas?
Use split() %>% purrr::map2() instead of group_by() %>% do() like this:
spiris %>%
split(list(.$site, .$Species)) %>%
purrr::map2(.y = names(.),
~ ggplot(data=., aes(x=Petal.Width)) +
geom_histogram()+
facet_grid(treatment~year) +
labs(title = .y) )
You just need to set the title with ggtitle():
g <- spiris %>% group_by(site, Species) %>% do(plots = ggplot(data = .) +
aes(x = Petal.Width) + geom_histogram() + facet_grid(treatment ~
year) + ggtitle(paste(.$Species,.$site,sep=" - ")))
I want to plot a subset of my dataframe. I am working with dplyr and ggplot2. My code only works with version 1, not version 2 via piping. What's the difference?
Version 1 (plotting is working):
data <- dataset %>% filter(type=="type1")
ggplot(data, aes(x=year, y=variable)) + geom_line()
Version 2 with piping (plotting is not working):
data %>% filter(type=="type1") %>% ggplot(data, aes(x=year, y=variable)) + geom_line()
Error:
Error in ggplot.data.frame(., data, aes(x = year, :
Mapping should be created with aes or aes_string
Thanks for your help!
Solution for version 2: a dot . instead of data:
data %>%
filter(type=="type1") %>%
ggplot(., aes(x=year, y=variable)) +
geom_line()
I usually do this, which also dispenses with the need for the .:
library(dplyr)
library(ggplot2)
mtcars %>%
filter(cyl == 4) %>%
ggplot +
aes(
x = disp,
y = mpg
) +
geom_point()
During typing with piping if you reenter the data name as you have as I shown with bold below, function confuses the sequence of arguments.
data %>% filter(type=="type1") %>% ggplot(***data***, aes(x=year, y=variable)) + geom_line()
Hope it works for you.
I would like to create one separate plot per group in a data frame and include the group in the title.
With the iris dataset I can in base R and ggplot do this
plots1 <- lapply(split(iris, iris$Species),
function(x)
ggplot(x, aes(x=Petal.Width, y=Petal.Length)) +
geom_point() +
ggtitle(x$Species[1]))
Is there an equivalent using dplyr?
Here's an attempt using facets instead of title.
p <- ggplot(data=iris, aes(x=Petal.Width, y=Petal.Length)) + geom_point()
plots2 = iris %>% group_by(Species) %>% do(plots = p %+% . + facet_wrap(~Species))
where I use %+% to replace the dataset in p with the subset for each call.
or (working but complex) with ggtitle
plots3 = iris %>%
group_by(Species) %>%
do(
plots = ggplot(data=.) +
geom_point(aes(x=Petal.Width, y=Petal.Length)) +
ggtitle(. %>% select(Species) %>% mutate(Species=as.character(Species)) %>% head(1) %>% as.character()))
The problem is that I can't seem to set the title per group with ggtitle in a very simple way.
Thanks!
Use .$Species to pull the species data into ggtitle:
iris %>% group_by(Species) %>% do(plots=ggplot(data=.) +
aes(x=Petal.Width, y=Petal.Length) + geom_point() + ggtitle(unique(.$Species)))
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
plots3 <- iris %>%
group_by(Species) %>%
group_map(~ ggplot(.) + aes(x=Petal.Width, y=Petal.Length) + geom_point() + ggtitle(.y[[1]]))
length(plots3)
#> [1] 3
# for example, the second plot :
plots3[[2]]
Created on 2021-11-19 by the reprex package (v2.0.1)
This is another option using rowwise:
plots2 = iris %>%
group_by(Species) %>%
do(plots = p %+% .) %>%
rowwise() %>%
do(x=.$plots + ggtitle(.$Species))