My question is about using a for loop to repeat a data analysis for each level of a categorical variable.
Using the built-in iris dataset, how would I run a for loop on the code below so that it produces this chart first for just setosa, then versicolor, and then virginica, without me having to manually change/set the species?
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point()
I'm just starting out and have no idea what I'm doing.
Inside a for loop, a ggplot object is not printed automatically, so you need to call print() on it explicitly:
library(tidyverse)
data(iris)
species <- iris |> distinct(Species) |> unlist()
for (i in species) {
  p <- iris |>
    filter(Species == i) |>
    ggplot() +
    geom_point(aes(x = Sepal.Length, y = Sepal.Width)) +
    ggtitle(i)
  print(p)
}
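A variation on the loop above, sketched with base R doing the subsetting: split() cuts iris into one data frame per species and the loop walks over the resulting names. The names by_species and sp are illustrative and not part of the original answer.

```r
library(ggplot2)

# split() returns a named list with one data frame per level of Species
by_species <- split(iris, iris$Species)

for (sp in names(by_species)) {
  p <- ggplot(by_species[[sp]], aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    ggtitle(sp)
  # ggplot objects are not auto-printed inside a loop, so print explicitly
  print(p)
}
```

Because the list is named, by_species[["setosa"]] can also be pulled out later without re-filtering.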
You can use a for loop as u/DanY posted; however, it's harder to store and retrieve plots in a universal way with that structure. Running the loop code makes it difficult to retrieve any one particular plot - you would only see the last plot in the output window and have to go "back" to see the others. I would suggest using a list structure instead to allow you to retrieve any one of the individual plots in subsequent functions.
For this, you can use lapply() rather than for(...) { ... }.
Here's an example which uses dplyr:
library(ggplot2)
library(dplyr)
unique_species <- unique(iris$Species)
myPlots <- lapply(unique_species, function(x) {
  ggplot(
    data = iris %>% dplyr::filter(Species == x),
    mapping = aes(x = Sepal.Length, y = Sepal.Width)
  ) +
    geom_point() +
    labs(title = paste("Plot of", x))
})
You then have the plots stored within myPlots. You can access each plot via myPlots[[1]], myPlots[[2]] or myPlots[[3]] (double brackets return the plot itself, whereas myPlots[1] returns a list of length one), or you can plot them all together via patchwork or another similar package. Here's one way using cowplot:
cowplot::plot_grid(plotlist = myPlots, nrow=1)
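If retrieving one particular plot is the goal, naming the list elements may be more convenient than remembering positions. The following is a minimal sketch under that assumption; plots_by_name and species_names are illustrative names, not part of the answer above.

```r
library(ggplot2)

species_names <- levels(iris$Species)

plots_by_name <- lapply(species_names, function(sp) {
  ggplot(iris[iris$Species == sp, ], aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    labs(title = paste("Plot of", sp))
})
# Attach the species names so plots can be looked up by name
names(plots_by_name) <- species_names

# [[ ]] returns the plot itself; [ ] returns a list of length one
versicolor_plot <- plots_by_name[["versicolor"]]
class(plots_by_name["versicolor"])  # "list"
```

The named list can still be handed to cowplot::plot_grid(plotlist = plots_by_name) in the same way as the unnamed one.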
I have the following bit of code and don't understand why the for loop isn't working. I'm new to this, so excuse me if this is obvious, but it's not actually producing a combined set of graphs (as the more brute-force method below does); it just prints out each graph individually.
library(ggpubr)
graphs <- lapply(names(hemi_split), function(i) {
  ggplot(data = hemi_split[[i]], aes(x = type, y = shoot.mass)) +
    geom_point() +
    facet_wrap(. ~ host, scales = "free") +
    theme_minimal() +
    labs(title = i)
}); graphs
for (i in 1:length(graphs)) {
  ggarrange(graphs[[i]])
} ## not working
## this works, and is the desired output
ggarrange(graphs[[1]], graphs[[2]], graphs[[3]],
          graphs[[4]], graphs[[5]], graphs[[6]],
          graphs[[7]], graphs[[8]], graphs[[9]],
          graphs[[10]], graphs[[11]])
thank you!
You can use do.call to provide all of the list elements of graphs as arguments of ggarrange:
library(ggpubr)
graphs <- lapply(names(mtcars)[2:5], function(x) {
  # .data[[x]] looks up the column named by the string x (aes_string() is deprecated)
  ggplot(mtcars, aes(x = .data[[x]], y = mpg)) +
    geom_point()
})
do.call(ggarrange, graphs)
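Since do.call() builds the call from a list, layout arguments can be appended to that same list as named elements. A rough sketch, assuming a 2 x 2 layout is wanted; the name combined is illustrative.

```r
library(ggplot2)
library(ggpubr)

graphs <- lapply(names(mtcars)[2:5], function(x) {
  ggplot(mtcars, aes(x = .data[[x]], y = mpg)) +
    geom_point()
})

# do.call(f, list(a, b, c)) is the same call as f(a, b, c);
# named list elements are passed as named arguments
combined <- do.call(ggarrange, c(graphs, list(ncol = 2, nrow = 2)))
combined
```

This is why the original for loop did not combine anything: each iteration called ggarrange() with a single plot instead of passing all of them in one call.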
Another solution, using purrr:
library(tidyverse)
library(ggpubr)  # for ggarrange()
ggraphs <- map(names(mtcars)[2:5],
               ~ ggplot(mtcars, aes(x = .data[[.x]], y = mpg)) +
                 geom_point())
ggarrange(plotlist = ggraphs)
Using the iris dataset.
Sample code and function:
plotfunction <- function(whatspecies) {
  baz <- iris %>%
    filter(Species == whatspecies) %>%
    ggplot(aes(Petal.Width, Petal.Length)) +
    geom_point() +
    labs(title = whatspecies)
  # pass the plot explicitly rather than relying on last_plot()
  ggsave(filename = paste0(whatspecies, ".png"),
         plot = baz,
         path = getwd())
  return(baz)
}
What I'd like to do is to loop over the Species variable to create 3 plots in my working directory. In my real data frame I have many more factors so I was wondering if there is a better way to do this rather than running the function n number of times - as in this instance I only care about modifying/looping over one variable in each graph.
Edit: In my circumstance I require independent plots so I can't use facets or different aesthetics.
Is this what you are looking for?
library(dplyr)
library(ggplot2)
for (sp in levels(iris[["Species"]])) {
  plotfunction(sp)
}
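The same idea can be written so that the loop also records which files were written. This is only a sketch: it saves into tempdir() so it can be run without touching the working directory, and out_dir, saved_files and the plot dimensions are assumptions, not part of the question.

```r
library(ggplot2)

out_dir <- tempdir()  # assumption: swap in your own output folder

saved_files <- vapply(levels(iris$Species), function(sp) {
  p <- ggplot(iris[iris$Species == sp, ], aes(Petal.Width, Petal.Length)) +
    geom_point() +
    labs(title = sp)
  out_file <- file.path(out_dir, paste0(sp, ".png"))
  # passing plot = explicitly avoids relying on last_plot()
  ggsave(filename = out_file, plot = p, width = 6, height = 4)
  out_file
}, character(1))

# One logical per species, confirming each file was written
file.exists(saved_files)
```

With many factor levels, the returned vector of paths makes it easy to check that none were skipped.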
I'm trying to plot multiple plots on a grid using ggplot2 in a for loop, followed by grid.arrange. But all the plots are identical afterwards.
library(ggplot2)
library(grid)
library(gridExtra)  # provides grid.arrange()
test = data.frame(matrix(rnorm(320), ncol=16 ))
names(test) = sapply(1:16, function(x) paste0("var_",as.character(x)))
plotlist = list()
for (i in 1:(dim(test)[2]-1)){
  plotlist[[i]] = ggplot(test) +
    geom_point(aes(get(x=names(test)[dim(test)[2]]), y=get(names(test)[i])))
}
pdf("output.pdf")
do.call(grid.arrange, list(grobs=plotlist, nrow=3))
dev.off()
When running this code, it seems like the get() calls are only evaluated at the time of the grid.arrange call, so the y vectors in all of the plots are identical: every plot shows var_15. Is there a way to force get to be evaluated immediately, so that I get 15 different plots?
Thanks!
Here are two ways that use purrr::map functions instead of a for-loop. I find that I have less of a clear sense of what's going on when I try to use loops, and since there are functions like the apply and map families that fit so neatly into R's vector operations paradigm, I generally go with mapping instead.
The first example makes use of cowplot::plot_grid, which can take a list of plots and arrange them. The second uses the newer patchwork package, which lets you add plots together—like literally saying plot1 + plot2—and add a layout. To do all those additions, I use purrr::reduce with + as the function being applied to all the plots.
library(tidyverse)
set.seed(722)
test = data.frame(matrix(rnorm(320), ncol=16 ))
names(test) = sapply(1:16, function(x) paste0("var_",as.character(x)))
# extract all but last column
xvars <- test[, -ncol(test)]
By using purrr::imap, I can map over all the columns and apply a function with 2 arguments: the column itself, and its name. That way I can set an x-axis label that specifies the column name. I can also easily access the column of data without having to use get or any tidyeval tricks (although for something more complicated, a tidyeval solution might be better).
plots <- imap(xvars, function(variable, var_name) {
  # tibble() replaces the deprecated data_frame()
  df <- tibble(x = variable, y = test[, ncol(test)])
  ggplot(df, aes(x = x, y = y)) +
    geom_point() +
    xlab(var_name)
})
cowplot::plot_grid(plotlist = plots, nrow = 3)
library(patchwork)
# same as plots[[1]] + plots[[2]] + plots[[3]] + ...
reduce(plots, `+`) + plot_layout(nrow = 3)
Created on 2018-07-22 by the reprex package (v0.2.0).
Try this:
library(ggplot2)
library(grid)
library(gridExtra)
set.seed(1234)
test = data.frame(matrix(rnorm(320), ncol=16 ))
names(test) = sapply(1:16, function(x) paste0("var_",as.character(x)))
plotlist = list()
for (i in 1:(dim(test)[2]-1)) {
  # Define here the dataset for the i-th plot
  df <- data.frame(x=test$var_16, y=test[, i])
  plotlist[[i]] = ggplot(data=df, aes(x=x, y=y)) + geom_point()
}
grid.arrange(grobs=plotlist, nrow=3)
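Another way to avoid the late-evaluation problem, sketched here as an alternative to the answers above, is to build each plot inside a function and refer to the column through the .data pronoun. Each function call has its own copy of the column name, so the plots no longer share a single loop variable. The names y_vars and v are illustrative.

```r
library(ggplot2)
library(gridExtra)

set.seed(1234)
test <- data.frame(matrix(rnorm(320), ncol = 16))
names(test) <- paste0("var_", 1:16)

y_vars <- names(test)[-ncol(test)]  # var_1 ... var_15

plotlist <- lapply(y_vars, function(v) {
  # v is local to this function call, so each plot keeps its own column name
  ggplot(test, aes(x = var_16, y = .data[[v]])) +
    geom_point() +
    ylab(v)
})

grid.arrange(grobs = plotlist, nrow = 3)
```

Compared with building a new data frame per plot, this keeps the original column names available for axis labels.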
When doing data analysis, we often pass a function to the data argument of a specific geom, using dplyr to filter or modify the plot's data frame for that layer only. This allows us to change the default data frame of the ggplot later and have everything still work.
template <- ggplot(db, aes(x=time, y=value)) +
  geom_line(data=function(db){db %>% filter(event=="Bla")}) +
  geom_ribbon(aes(ymin=low, ymax=up))
# ggsave() takes the filename first, then the plot
ggsave("global.png", template)
for (i in unique(db$simulation))
  ggsave(paste0(i, ".png"), template %+% subset(db, simulation==i))
Is there a nicer/shorter way to specify the filter command, e.g. using some magical .?
EDIT
To clarify some of the comments: By using geom_line(data = db %>% filter(event=="Bla")), the layer would not be updated when I change the default dataframe later using %+%. I am really aiming to use the data argument of geom_* as a function.
Upon reading the documentation of %>% better, I have found the solution:
Using the dot-place holder as lhs
When the dot is used as lhs, the result will be a functional sequence, i.e. a function which applies the entire chain of right-hand sides in turn to its input. See the examples.
Therefore, the nicest way to formulate the above example, incorporating the suggestions from above as well:
library(dplyr)
library(ggplot2)
db <- diamonds
template <- ggplot(db, aes(x=carat, y=price, color=cut)) +
  geom_point() +
  geom_smooth(data=. %>% filter(color=="J")) +
  labs(caption="Smooths only for J color")
# ggsave() takes the filename first, then the plot
ggsave("global.png", template)
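# name the do() result so it is kept as a list column; an unnamed do() expects a data frame back
db %>% group_by(cut) %>% do(
  saved = ggsave(paste0(.$cut[1], ".png"), plot=template %+% .)
)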
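To make the quoted documentation concrete, here is a small sketch showing that a pipeline starting with the dot is an ordinary one-argument function, which is why it can be handed to a layer's data argument. The names only_j and p are illustrative.

```r
library(dplyr)
library(ggplot2)

# A pipeline that starts with "." is a function of one argument
only_j <- . %>% filter(color == "J")

# Calling it gives the same rows as filtering directly
identical(only_j(diamonds), filter(diamonds, color == "J"))

# Any one-argument function works as a layer's data argument; it is applied
# to whatever the plot's default data frame is when the plot is built
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  geom_smooth(data = function(d) filter(d, color == "J"))
```

The explicit function(d) form and the dot form are interchangeable here; the dot form is just shorter.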
I'm quite new to R and trying to find my way around. I have created a new data frame based on the "original" data frame.
library(dplyr)
prdgrp <- as.vector(mth['MMITCL'])
prdgrp %>% distinct(MMITCL)
When doing this, the result is a list of unique values of the column MMITCL. I would like to use this data in a loop that first creates a new subset of the original data and then prints a graph based on it:
#START LOOP
for (i in 1:length(prdgrp))
{
  # mth[c(MMITCL==prdgrp[i],]
  mth_1 <- mth[c(mth$MMITCL==prdgrp[i]),]
  # Development of TPC by month
  library(ggplot2)
  library(scales)
  ggplot(mth_1, aes(Date, TPC_MTD))+ geom_line()
}
# END LOOP
Doing this gives me the following error message:
Error in mth$MMITCL == prdgrp[i] :
comparison of these types is not implemented
In addition: Warning:
In `[.data.frame`(mth, c(mth$MMITCL == prdgrp[i]), ) :
Incompatible methods ("Ops.factor", "Ops.data.frame") for "=="
What am I doing wrong?
If you just want to plot the outputs, there is no need to subset the data frame first; it is simpler to put ggplot in a loop (or, more likely, use facet_wrap). Without seeing your data it is a bit hard to give you a precise answer. However, there are two generic iris examples below; hopefully these will also show where you made the error in subsetting your data frame. Please let me know if you have any questions.
library(ggplot2)
#looping example
for (i in 1:length(unique(iris$Species))) {
  g <- ggplot(data = iris[iris$Species == unique(iris$Species)[i], ],
              aes(x = Sepal.Length,
                  y = Sepal.Width)) +
    geom_point()
  print(g)
}
#facet_wrap example
g <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_wrap(~Species)
g
However, if you need to save the data frames for later use, one option is to put them into a list. If you only need the data frame within the loop, you can drop the list and use whatever variable name you wish.
myData4Later <- list()
for (i in 1:length(unique(iris$Species))) {
  myData4Later[[i]] <- iris[iris$Species == unique(iris$Species)[i], ]
  g <- ggplot(data = myData4Later[[i]],
              aes(x = Sepal.Length,
                  y = Sepal.Width)) +
    geom_point()
  print(g)
}
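Applied to the question's own column names, the loop might look like the sketch below. The mth data frame here is a made-up stand-in, since the real data was not posted. The error message mentions Ops.data.frame, which suggests prdgrp was still a one-column data frame rather than a vector, so the sketch takes the unique values straight from the column.

```r
library(ggplot2)

# Hypothetical stand-in for the real `mth` data frame
mth <- data.frame(
  MMITCL  = factor(rep(c("A", "B"), each = 3)),
  Date    = rep(as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")), times = 2),
  TPC_MTD = c(10, 12, 11, 20, 18, 25)
)

# mth$MMITCL is a plain vector; mth["MMITCL"] would be a one-column data frame,
# and comparing a factor with a data frame is what triggers the "==" error
prdgrp <- unique(mth$MMITCL)

for (grp in as.character(prdgrp)) {
  mth_1 <- mth[mth$MMITCL == grp, ]
  g <- ggplot(mth_1, aes(Date, TPC_MTD)) +
    geom_line() +
    ggtitle(grp)
  print(g)  # needed inside a loop
}
```

Note also that the result of distinct() in the question was never assigned, so prdgrp kept all rows; unique() on the column avoids both issues.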