I created a function that splits a dataframe by group variable 'gear' and makes plots for each new df. How do I change the title for each plot then?
rank20 <- function(df){
sdf <- df
clusterName <- paste0("cluster",sdf$gear)
splitData <- split(sdf,clusterName)
plot <- lapply(splitData, function (x) {ggplot(x, aes(mpg, cyl)) + geom_point()+
labs(x="mpg", y="cyl",
title="This Needs To Be Changed") +
theme_minimal()})
do.call(grid.arrange,plot)
}
rank20(mtcars)
I want the following titles: gear3, gear4, etc. (corresponding to their gear value)
UPD: both results are right if using mtcars. But in my real case, I transform my initial df, so I should have put my question in a different way. I need to take the titles from the names of the splitData data frames themselves rather than from the column.
I find it easier with lapply to use the indexes as the input. This provides more flexibility if you need to link the list to another element, e.g. the name of the list:
rank20 <- function(df, col="gear"){
sdf <- df
clusterName <- paste0("cluster", sdf[[col]])
splitData <- split(sdf, clusterName)
# do the apply across the splitData indexes instead and pull out cluster
# name from the column
plot <- lapply(seq_along(splitData), function(i) {
X <- splitData[[i]]
i_title <- paste0(col, X[[col]][1])
## to use the name of the split data frame instead, e.g. cluster3 instead of gear3:
#i_title <- names(splitData)[i]
ggplot(X, aes(mpg, cyl)) +
geom_point() +
labs(x="mpg", y="cyl", title=i_title) +
theme_minimal()
})
do.call(grid.arrange,plot)
}
rank20(mtcars)
As you say you wanted the titles to be gear3, gear4, I've gone with this but commented out the alternative i_title that uses the clusterName value instead.
In the input of the function you can change the col value to switch to a different column, so gear isn't hard-coded.
May I suggest a slightly different approach (yet also using the loop over indices as suggested in Jonny Phelps's answer). I am creating a list of plots and then using patchwork::wrap_plots for plotting. I find it smoother.
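For the UPD case, where the title should come from the name of each split data frame rather than from a column, a minimal sketch with base R's Map() might look like this. It assumes the list is named (split() does that for you); split_data and plots are just illustrative names, and grid.arrange is left out to keep it short.

```r
library(ggplot2)

# Split by a derived label, as in the question (illustrative naming)
split_data <- split(mtcars, paste0("cluster", mtcars$gear))

# Map() walks the list and its names in parallel, so each title is
# taken from the name of the list element, not from a column
plots <- Map(function(d, nm) {
  ggplot(d, aes(mpg, cyl)) +
    geom_point() +
    labs(x = "mpg", y = "cyl", title = nm) +
    theme_minimal()
}, split_data, names(split_data))

names(plots)
```

The resulting list keeps the names of split_data, so it can be passed to do.call(grid.arrange, plots) just like before.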
library(tidyverse)
library(patchwork)
len_ind <- length(unique(mtcars$cyl))
ls_plot <-
mtcars %>%
split(., .$cyl) %>%
map2(1:len_ind, ., function(x, y) {
ggplot(y, aes(mpg, cyl)) +
geom_point() +
labs(x = "mpg", y = "cyl",
title = names(.)[x]
) +
theme_minimal()
})
wrap_plots(ls_plot) + plot_layout(ncol = 1)
Just noticed this was the wrong column - using cyl instead of gear. Oops. It was kind of fun to wrap this into a function:
plot_col <- function(x, col, plotx, ploty){
# use the data frame passed in as x rather than hard-coding mtcars
len_ind <- length(unique(x[[col]]))
x_name <- deparse(substitute(plotx))
y_name <- deparse(substitute(ploty))
ls_plot <-
x %>%
split(., .[[col]]) %>%
map2(1:len_ind, ., function(x, y) {
ggplot(y, aes({{plotx}}, {{ploty}})) +
geom_point() +
labs(x = x_name, y = y_name,
title = names(.)[x]
) +
theme_minimal()
})
wrap_plots(ls_plot) + plot_layout(ncol = 1)
}
plot_col(mtcars, "gear", mpg, cyl)
Related
I am trying to make boxplots with ggplot2.
The code I have to make the boxplots with the format that I want is as follows:
p <- ggplot(mg_data, aes(x=Treatment, y=CD68, color=Treatment)) +
geom_boxplot(mg_data, mapping=aes(x=Treatment, y=CD68))
p+ theme_classic() + geom_jitter(shape=16, position=position_jitter(0.2))
I was able to use the following code to make looped boxplots:
variables <- mg_data %>%
select(10:17)
for(i in variables) {
print(ggplot(mg_data, aes(x = Treatment, y = i, color=Treatment)) +
geom_boxplot())
}
With this code I get the boxplots; however, the y-axis is not labelled with the name of the variable being plotted, unlike the original code without the for loop. I also do not know how to add the formatting code to the loop:
p + theme_classic() + geom_jitter(shape=16, position=position_jitter(0.2))
Here is a way. I have tested with built-in data set iris, just change the data name and selected columns and it will work.
suppressPackageStartupMessages({
library(dplyr)
library(ggplot2)
})
variables <- iris %>%
select(1:4) %>%
names()
for(i in variables) {
g <- ggplot(iris, aes(x = Species, y = get(i), color=Species)) +
geom_boxplot() +
ylab(i)
print(g)
}
Edit
Answering a comment by user TarJae, reproduced here because answers are deleted less often than comments:
Could you please expand with saving all four files. Many thanks.
The code above can be made to save the plots with a ggsave instruction at the loop end. The filename is the variable name and the plot is the default, the return value of last_plot().
for(i in variables) {
g <- ggplot(iris, aes(x = Species, y = get(i), color=Species)) +
geom_boxplot() +
ylab(i)
print(g)
ggsave(paste0(i, ".png"), device = "png")
}
Try this:
variables <- mg_data %>%
colnames() %>%
`[`(10:17)
for (i in variables) {
# i is a column name (a string), so look it up with the .data pronoun
print(ggplot(mg_data, aes(
x = Treatment, y = .data[[i]], color = Treatment
)) +
geom_boxplot() +
ylab(i))
}
Another option is to use lapply. It's approximately the same as using a loop, but it hides the actual looping part and can make your code look a little cleaner.
variables = iris %>%
select(1:4) %>%
names()
lapply(variables, function(x) {
ggplot(iris, aes(x = Species, y = get(x), color=Species)) +
geom_boxplot() + ylab(x)
})
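Neither loop above adds the theme_classic() and geom_jitter() formatting from the question. One way to do that, sketched here on iris, is to wrap the plot in a small helper and look columns up by name with the .data pronoun. make_boxplot and its arguments are hypothetical names, not part of any package.

```r
library(ggplot2)

# Hypothetical helper: one formatted boxplot per call
make_boxplot <- function(data, group_col, y_col) {
  ggplot(data, aes(x = .data[[group_col]],
                   y = .data[[y_col]],
                   color = .data[[group_col]])) +
    geom_boxplot() +
    geom_jitter(shape = 16, position = position_jitter(0.2)) +
    # label the axes with the column names instead of the .data expressions
    labs(x = group_col, y = y_col, color = group_col) +
    theme_classic()
}

y_cols <- names(iris)[1:4]
plots <- lapply(y_cols, make_boxplot, data = iris, group_col = "Species")
names(plots) <- y_cols
```

Printing plots[["Sepal.Width"]] (or looping over the list with print) shows the individual plots; for the original data, swap in mg_data, "Treatment" and the names of columns 10:17.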
Here's a reprex that illustrates the obstacle I'm facing.
library(tidyverse)
co2_list <- CO2 %>%
group_split(Type)
reprex_fun <- function(x){
x %>%
ggplot(aes(conc, uptake)) +
geom_point() +
facet_wrap(~Plant, ncol = 2)
}
lapply(co2_list, reprex_fun)
Since the listed data frames are split by the Type value, how can I add the corresponding Type as a title to each of the plots I just made?
You can also try labs, similar to ggtitle:
#Data
data("CO2")
#Plot
co2_list <- CO2 %>%
group_split(Type)
#Function
reprex_fun <- function(x){
x %>%
ggplot(aes(conc, uptake)) +
geom_point() +
labs(title = unique(x$Type))+
facet_wrap(~Plant, ncol = 2)
}
#Plots
lapply(co2_list, reprex_fun)
I am trying to create 2 line plots.
But I noticed that using a for loop will generate two plots with y=mev2 (instead of a plot based on y=mev1 and another one based on y=mev2).
The code below shows the observation here.
mev1 <- c(1,3,7)
mev2 <- c(9,8,2)
Period <- c(1960, 1970, 1980)
df <- data.frame(Period, mev1, mev2)
library(ggplot2)
# Method 1: Creating plot1 and plot2 without using "for" loop (hard-code)
plot1 <- ggplot(data = df, aes(x=Period, y=unlist(as.list(df[2])))) + geom_line()
plot2 <- ggplot(data = df, aes(x=Period, y=unlist(as.list(df[3])))) + geom_line()
# Method 2: Creating plot1 and plot2 using "for" loop
for (i in 1:2) {
y_var <- unlist(as.list(df[i+1]))
assign(paste("plot", i, sep = ""), ggplot(data = df, aes(x=Period, y=y_var)) + geom_line())
}
It seems this is due to some aspect of how ggplot() works that I am not aware of.
Question:
If I want to use Method 2, how should I modify the logic?
People said that using assign() is not an "R-style", so I wonder what's an alternate way to do this? Say, using list?
One possible answer that adds no tidyverse commands is:
library(ggplot2)
y_var <- colnames(df)
for (i in 1:2) {
assign(paste("plot", i, sep = ""),
ggplot(data = df, aes_string(x=y_var[1], y=y_var[1 + i])) +
geom_line())
}
plot1
plot2
You may use aes_string (note that current ggplot2 deprecates it in favour of tidy evaluation). I hope it helps.
EDIT 1
If you want to store your plots in a list, you can use this:
Initialize your list:
n <- 2 # number of plots
list_plot <- vector(mode = "list", length = n)
names(list_plot) <- paste("plot", 1:n)
Fill it:
for (i in 1:2) {
list_plot[[i]] <- ggplot(data = df, aes_string(x=y_var[1], y=y_var[1 + i])) +
geom_line()
}
Display:
list_plot[[1]]
list_plot[[2]]
For lines in different "plots", you can simplify it with facet_wrap():
library(tidyverse)
df %>%
gather(variable, value, -c(Period)) %>% # wide to long format
ggplot(aes(Period, value)) + geom_line() + facet_wrap(vars(variable))
You can also put it in a loop if necessary and store the results in a list:
# empty list
listed <- list()
# fill the list with the plots
for (i in c(2:3)){
# keep Period plus column i, so listed[[1]] is mev1 and listed[[2]] is mev2
listed[[i-1]] <- df[, c(1, i)] %>%
gather(variable, value, -c(Period)) %>%
ggplot(aes(Period, value)) + geom_line()
}
# to get the plots
listed[[1]]
listed[[2]]
Why do you want 2 separate plots? The ggplot2 way to do this would be to get the data into long format and then plot.
library(tidyverse)
df %>%
pivot_longer(cols = -Period) %>%
ggplot() + aes(Period, value, color = name) + geom_line()
Here is an alternative approach using a function and lapply. I recognize that you asked how to solve this using a loop. Still, I think it might be useful to consider this approach.
library(ggplot2)
mev1 <- c(1,3,7)
mev2 <- c(9,8,2)
Period <- c(1960, 1970, 1980)
df <- data.frame(Period, mev1, mev2)
myplot <- function(yvar){
plot <- ggplot(df, aes(Period, !!sym(yvar))) + geom_line()
return(plot)
}
colnames <- c("mev1","mev2")
list <- lapply(colnames, myplot)
names(list) <- paste0("plot_", colnames)
# Alternative naming: names(list) <- paste0("plot", 1:2)
Using this approach you can easily apply your plot function to whatever columns you like. You can specify the columns by name, which may be preferable to specifying them by position. Plots are saved in a list, and they are named afterwards using the names attribute. In my example I named the plots plot_mev1 and plot_mev2, but you can easily adjust this to some other naming, e.g. write names(list) <- paste0("plot", 1:2) to get plot1 and plot2.
Note that I used !!sym() in the ggplot call. This is essentially an alternative to aes_string, which was used in the answer of Rémi Coulaud. In this way ggplot understands, even in the context of a function or a loop, that "mev1" is a column of your dataset and not just a text string.
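As for why Method 2 produced two identical plots: as far as I understand it, aes() stores the expression y_var and only evaluates it when the plot is drawn, by which point the loop has overwritten y_var with the mev2 values. If you do want a plain for loop, a sketch that forces the current column name into the mapping with !! could look like this (plots and col are illustrative names; sym() is called via rlang, which ggplot2 depends on):

```r
library(ggplot2)

df <- data.frame(Period = c(1960, 1970, 1980),
                 mev1 = c(1, 3, 7),
                 mev2 = c(9, 8, 2))

plots <- list()
for (col in c("mev1", "mev2")) {
  # !! inlines the column symbol right now, instead of leaving a
  # reference to `col` that is only looked up when the plot is drawn
  plots[[col]] <- ggplot(df, aes(Period, !!rlang::sym(col))) + geom_line()
}

# layer_data() builds each plot; the y values should differ between the two
layer_data(plots$mev1)$y
layer_data(plots$mev2)$y
```

Storing the plots in a named list also avoids assign(): plots$mev1 and plots$mev2 take the place of plot1 and plot2.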
I have a for loop plotting 3 geom_lines, how do I add a label/legend so they won't all be 3 indiscernible black lines?
methods.list <- list(rwf,snaive,meanf)
cv.list <- lapply(methods.list, function(method) {
taylor%>% tsCV(forecastfunction = method, h=48)
})
gg <- ggplot(NULL, aes(x))
for (i in seq(1,3)){
gg <- gg + geom_line(aes_string( y=sqrt(colMeans(cv.list[[i]]^2, na.rm=TRUE))))
}
gg + guides(colour=guide_legend(title="Forecast"))
If I don't use a loop, I can use aes instead of that horrible aes_string and then everything works, but I have to write the same code 3 times and replace the loop with this:
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[1]]^2, na.rm=TRUE)), colour=names(cv.list)[1]))
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[2]]^2, na.rm=TRUE)), colour=names(cv.list)[2]))
gg <- gg + geom_line(aes(y=sqrt(colMeans(cv.list[[3]]^2, na.rm=TRUE)), colour=names(cv.list)[3]))
and then there are nice automatic colors and legend. What am I missing? Why is r being so noob-unfriendly?
The example is not reproducible (there is no data!), but it seems you have some information in a list cv.list which contains multiple data.frames, and you want to plot some summary statistic of each against a common variable stored in x.
The simplest method is simply to create a data.frame and plot using the data.frame.
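#Create 3 data.frames with data (forecast?)
df <- lapply(1:3, function(group){
# index cv.list by the function argument
summ_stat <- sqrt(colMeans(cv.list[[group]]^2, na.rm=TRUE))
# a factor gives discrete colours and one legend entry per group
data.frame(summ_stat, group = factor(group), x = x)
})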
#bind the data.frames into a single data.frame
df <- do.call(rbind, df)
#Create the plot
ggplot(data = df, aes(x = x, y = summ_stat, colour = group)) +
geom_line() +
labs(colour = "Forecast")
Note the labs() call: it sets the title of the colour legend, which exists because colour is mapped inside aes().
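Since the original data is not available, here is a self-contained sketch of the same idea with made-up numbers. Everything in rmse_list is a placeholder standing in for sqrt(colMeans(cv.list[[i]]^2, na.rm=TRUE)); the method names are taken from the question, and long_df, h and rmse are illustrative names.

```r
library(ggplot2)

# Placeholder values only, NOT real forecast errors
rmse_list <- list(rwf    = c(1.0, 1.5, 2.0),
                  snaive = c(0.8, 1.1, 1.3),
                  meanf  = c(2.0, 2.1, 2.2))

# One data.frame per method, stacked into long format
long_df <- do.call(rbind, lapply(names(rmse_list), function(nm) {
  data.frame(h = seq_along(rmse_list[[nm]]),
             rmse = rmse_list[[nm]],
             method = nm)
}))

# colour is mapped to a column, so ggplot2 picks colours and builds the legend
p <- ggplot(long_df, aes(h, rmse, colour = method)) +
  geom_line() +
  labs(colour = "Forecast")
```

Using the method name rather than a number for the grouping column means the legend entries are readable without further relabelling.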
I have composed a function that develops histograms using ggplot2 on the numerical columns of a dataframe that will be passed to it. The function stores these plots into a list and then returns the list.
However when I run the function I get the same plot again and again.
My code is the following, and I also provide a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
library(ggplot2)
library(ggthemes)
data = as.data.frame(data)
variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
variables_not_to_plot = c(class, variables_to_exclude)
variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
indices = match(variables_to_plot, names(data))
index_of_class = match(class, names(data))
plots = list()
for (i in (1 : length(variables_to_plot))){
p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
geom_histogram(aes(y=..density..), alpha=0.3,
position="identity", bins = 100)+ theme_economist() +
geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
name = names(data)[indices[i]]
plots[[name]] = p
}
plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.
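Simply use aes_string to pass column names as strings into the ggplot() call. Right now, the aesthetics are built from expressions such as data[, indices[i]] rather than from columns named in ggplot's data argument. aes() does not evaluate those expressions when the plot is created; they are evaluated when the plot is drawn, by which time the loop has finished and i holds its final value, so every plot in the list shows the last variable. Below, x, color, and fill are separate vectors that ggplot cannot tie back to data, even though they derive from the same source: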
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
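If you would rather avoid aes_string, a rough equivalent can be sketched with the .data pronoun inside a helper function, so that each call gets its own copy of the column name. hist_one, var and class_col are hypothetical names; after_stat(density) is used in place of ..density.., and bins is lowered only to keep the example light.

```r
library(ggplot2)

# Hypothetical helper: histogram plus density for one numeric column
hist_one <- function(data, var, class_col) {
  ggplot(data, aes(x = .data[[var]],
                   colour = .data[[class_col]],
                   fill = .data[[class_col]])) +
    geom_histogram(aes(y = after_stat(density)), alpha = 0.3,
                   position = "identity", bins = 30) +
    geom_density(alpha = 0.2) +
    labs(x = var, fill = class_col) +
    guides(colour = "none")
}

df <- transform(mtcars, am = factor(am))
# numeric columns except the excluded one; am is a factor, so it drops out
vars <- setdiff(names(df)[sapply(df, is.numeric)], "mpg")
plots <- lapply(setNames(vars, vars), hist_one, data = df, class_col = "am")
```

Because var is an argument of hist_one, each plot keeps its own value even though it is only evaluated when the plot is drawn.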
Here is a strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
# tidyeval overhead
class_enq <- enquo(class)
drop_enqs <- enquo(drop_vars)
data %>%
group_by(!!class_enq) %>% # keep the 'class' column always
select(-!!drop_enqs) %>% # drop any 'drop_vars'
select_if(is.numeric) %>% # keep only numeric columns
gather("key", "value", -!!class_enq) %>% # go to long form
split(.$key) %>% # make a list of data frames
map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
geom_histogram() +
geom_density(alpha = .5) +
labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))