I am trying to make boxplots with ggplot2.
The code I have to make the boxplots with the format that I want is as follows:
p <- ggplot(mg_data, aes(x=Treatment, y=CD68, color=Treatment)) +
geom_boxplot(mg_data, mapping=aes(x=Treatment, y=CD68))
p+ theme_classic() + geom_jitter(shape=16, position=position_jitter(0.2))
I was able to use the following code to make looped boxplots:
variables <- mg_data %>%
select(10:17)
for(i in variables) {
print(ggplot(mg_data, aes(x = Treatment, y = i, color=Treatment)) +
geom_boxplot())
}
With this code I get the boxplots; however, they do not show the name of the variable selected for the y-axis, unlike the original code without the for loop. I also do not know how to add the formatting code to the loop:
p + theme_classic() + geom_jitter(shape=16, position=position_jitter(0.2))
Here is a way. I have tested with built-in data set iris, just change the data name and selected columns and it will work.
suppressPackageStartupMessages({
library(dplyr)
library(ggplot2)
})
variables <- iris %>%
select(1:4) %>%
names()
for(i in variables) {
g <- ggplot(iris, aes(x = Species, y = get(i), color=Species)) +
geom_boxplot() +
ylab(i)
print(g)
}
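The formatting from the question can go inside the loop body as well. The following is a minimal sketch on iris, assuming a ggplot2 version that provides the `.data` pronoun (used here in place of `get()`). Each plot is printed immediately, because `aes()` only evaluates `i` when the plot is drawn.

```r
library(ggplot2)

variables <- names(iris)[1:4]

for (i in variables) {
  g <- ggplot(iris, aes(x = Species, y = .data[[i]], color = Species)) +
    geom_boxplot() +
    geom_jitter(shape = 16, position = position_jitter(0.2)) +
    theme_classic() +
    ylab(i)  # label the y-axis with the column name
  print(g)   # draw now, while i still refers to this column
}
```

For mg_data, swap in the data name, `Treatment` for `Species`, and the selected columns.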
Edit
In answer to a comment by user TarJae, reproduced here because comments are more likely to be deleted than answers:
Could you please expand with saving all four files. Many thanks.
The code above can be made to save the plots with a ggsave instruction at the loop end. The filename is the variable name and the plot is the default, the return value of last_plot().
for(i in variables) {
g <- ggplot(iris, aes(x = Species, y = get(i), color=Species)) +
geom_boxplot() +
ylab(i)
print(g)
ggsave(paste0(i, ".png"), device = "png")
}
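As a variation, the plot can be passed to ggsave explicitly instead of relying on last_plot(), which also makes the print() call optional. This is only a sketch: `out_dir`, the size, and the resolution are illustrative assumptions, not part of the original answer.

```r
library(ggplot2)

out_dir <- tempdir()  # assumption: write to a temporary directory

for (i in names(iris)[1:4]) {
  g <- ggplot(iris, aes(x = Species, y = .data[[i]], color = Species)) +
    geom_boxplot() +
    ylab(i)
  # plot = g saves this specific object; width and height are in inches
  ggsave(file.path(out_dir, paste0(i, ".png")), plot = g,
         width = 6, height = 4, dpi = 150)
}
```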
Try this:
variables <- mg_data %>%
colnames() %>%
`[`(10:17)
for (i in variables) {
  # i holds a column name as a string, so look it up with the .data pronoun
  print(ggplot(mg_data, aes(
    x = Treatment, y = .data[[i]], color = Treatment
  )) +
    geom_boxplot() +
    ylab(i))
}
Another option is to use lapply. It's approximately the same as using a loop, but it hides the actual looping part and can make your code look a little cleaner.
variables = iris %>%
select(1:4) %>%
names()
lapply(variables, function(x) {
ggplot(iris, aes(x = Species, y = get(x), color=Species)) +
geom_boxplot() + ylab(x)
})
I would like to create shorthand notations or functions that combine multiple geoms for ggplot.
For example, instead of
mtcars %>%
ggplot(aes(x = cyl, y = mpg)) +
geom_point() +
geom_smooth(method = "lm") +
ggpubr::stat_cor()
I would like to be able to create a function to combine the geoms like so
lm_and_cor <- function() {
geom_smooth(method = "lm", se = FALSE) +
stat_cor()
}
mtcars %>%
ggplot(aes(x = cyl, y = mpg)) +
geom_point() +
lm_and_cor()
I am aware that I can create functions that do all of the plotting, basically
plot_data <- function(x) {
x %>%
ggplot(aes(x = cyl, y = mpg)) +
geom_point() +
geom_smooth(method = "lm") +
ggpubr::stat_cor()
}
which to be fair does what I want, to some degree. However, I would instead like to combine multiple geoms in a single function, as the underlying geom (e.g. point, lines, etc.) will not always be the same. Is this doable, and is it feasible?
With ggplot2 you can add a list of elements to a plot:
lm_and_cor <- function()
list(geom_smooth(method = "lm", se = FALSE),
ggpubr::stat_cor()
)
mtcars %>%
ggplot(aes(x = cyl, y = mpg)) +
geom_point() +
lm_and_cor()
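The same list mechanism accepts arguments and non-geom components such as labels and themes. Below is a minimal sketch that uses only ggplot2; `lm_layers` and its parameters are hypothetical names chosen for illustration.

```r
library(ggplot2)

# Hypothetical helper: returns a list of components that `+` adds in order.
lm_layers <- function(se = FALSE, title = NULL) {
  list(
    geom_smooth(method = "lm", formula = y ~ x, se = se),
    labs(title = title),
    theme_minimal()
  )
}

# The helper is independent of the underlying geom.
p1 <- ggplot(mtcars, aes(x = cyl, y = mpg)) + geom_point() + lm_layers()
p2 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_line() +
  lm_layers(se = TRUE, title = "mpg vs wt")
```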
Do you mean something like this?
You can store multiple geoms in a list object.
Edit: I misunderstood the question. This should do what you expect.
data(iris)
library(ggplot2)
x <- list(geom_point(), geom_line())
ggplot(iris, aes(Sepal.Length, Sepal.Width)) + x
Or, if you want a function that plots by column, embrace the arguments with {{ }}:
library(dplyr)
plotting <- function(data, x, y){
data %>%
ggplot(aes({{x}}, {{y}})) +
geom_point() +
geom_smooth(method = "lm")}
plotting(iris, Sepal.Length, Sepal.Width)
I have a list of values and a list of ggplots. I would like to attach the values from the list on to the ggplots. Is there a good way to do that?
Here's what I have for the list of ggplots:
p.list <- lapply(sort(unique(ind_steps$AnimalID)), function(i){
ggplot(ind_steps[ind_steps$AnimalID == i,], aes(x = t2, y = NSD)) +
geom_line() + theme_bw() +
theme(axis.text.x = element_text(angle = 90)) +
scale_x_datetime(date_breaks = '10 days', date_labels = '%y%j') +
facet_grid( ~ AnimalID, scales = "free") +
scale_colour_manual(values=hcl(seq(15,365,length.out=4)[match(i, sort(unique(ind_steps$AnimalID)))], 100, 65))
})
Assume I have another list of the same length, where each element holds a single value.
I want to pair the ggplots with the list of values so that each value shows up on its respective plot in the list of plots.
Since you don't provide any example data, here is an example with the built-in iris dataset. You can add values to plots with geom_text or geom_label (if I understood correctly what you want). For example, here we add the R^2 value to each plot in a list:
library(ggplot2)
data(iris)
species <- unique(iris$Species)
# Squared Pearson correlation (R^2) for each species
rsq <- lapply(seq_along(species), function(i) {
  d <- iris[iris$Species == species[i], ]
  cor(d$Sepal.Length, d$Petal.Length)^2
})
p.list <- lapply(seq_along(species), function(i) {
  ggplot(iris[iris$Species == species[i], ], aes(x = Sepal.Length, y = Petal.Length)) +
    geom_point() + theme_bw() +
    # check_overlap = TRUE draws the label once instead of once per row
    geom_text(aes(x = min(Sepal.Length), y = max(Petal.Length), label = paste0("R^2 = ", round(rsq[[i]], 2))), check_overlap = TRUE)
})
library(gridExtra)
grid.arrange(p.list[[1]],p.list[[2]],p.list[[3]],nrow=3)
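If the plots and the values already exist as two separate lists, as in the question, they can be paired afterwards. The sketch below uses base R's Map() and annotate(); `values` holds arbitrary placeholder numbers and `p.annotated` is a hypothetical name.

```r
library(ggplot2)

# Stand-ins for the question's objects: one plot and one value per species.
species <- levels(iris$Species)
p.list <- lapply(species, function(s) {
  ggplot(iris[iris$Species == s, ], aes(x = Sepal.Length, y = Petal.Length)) +
    geom_point() + theme_bw()
})
values <- list(10, 20, 30)  # arbitrary placeholders, one per plot

# Map() walks both lists in parallel and adds one annotation per plot.
# x = -Inf and y = Inf pin the text to the top-left corner of the panel.
p.annotated <- Map(function(p, v) {
  p + annotate("text", x = -Inf, y = Inf, hjust = -0.1, vjust = 1.5,
               label = paste("value =", v))
}, p.list, values)
```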
How do I pass multiple arguments through to my ggplot function?
Here is an example of the plot I want to automate.
library(ggplot2)
library(scales)
p <- ggplot(diamonds, aes(x=cut, y=price) ) +
geom_boxplot() +
scale_y_continuous(labels = dollar)
p
But I want to graph multiple different variables and use the appropriate scale e.g. price, depth etc, some are in dollars.
So I made a function
myfunction <- function(var1,var2){
p <- ggplot(diamonds, aes(x=cut, y= var1) ) +
geom_boxplot() +
scale_y_continuous(labels = var2)
p
return(p)
}
When I test the function, it doesn't work. Both arguments cause different errors on their own.
myfunction("price","dollar")
For var1 I get:
Error: Discrete value supplied to continuous scale
and var2:
Error in f(..., self = self) : Breaks and labels are different lengths
Question 1. Why doesn't that work? This is the most important question for me.
I then wish to make multiple graphs, which I can do with a for loop, but I keep hearing I should do it with apply. Here's what I tried.
Question 2. How would you make the multiple graphs work with apply?
FirstPlotData <- c("price","dollar")
SecondPlotData <- c("depth", "comma")
plotMetaData <- data.frame(FirstPlotData,SecondPlotData)
lapply doesn't work for me with multiple arguments. Can it pass multiple arguments?
lapply(plotMetaData, function(avar,bvar)myfunction(avar, bvar))
Would mapply work? How?
mapply(mytestfunction,plotMetaData[1,],plotMetaDataList[2,])
Thanks in advance. I note that I could do the multiple graphs with facet, but for my more complex example, with hiding outliers, scaling, and also doing stats, then doing the multiple plots and putting in a {cowplot} grid seems easier.
Try this
library(ggplot2)
library(scales)
library(rlang) # for sym
myfunction <- function(var1,var2){
p <- ggplot(diamonds, aes(x=cut, y= !! sym(var1)) ) +
geom_boxplot() +
scale_y_continuous(labels = get(var2))
p
return(p)
}
myfunction('price','dollar')
You probably want aes_string. This function was designed to make programming with ggplot easier (similar ideas have also been applied to dplyr commands), although it is deprecated in current ggplot2 releases in favour of the .data pronoun. The following works:
library(tidyverse)
data(diamonds)
myfunction <- function(var1){
p <- ggplot(diamonds, aes_string(x="cut", y= var1) ) +
geom_boxplot()
p
return(p)
}
myfunction("price")
Why?
Contrast the following:
# works
ggplot(diamonds, aes(x=cut, y= price) ) + geom_boxplot()
# these 2 are equivalent, but do not work
ggplot(diamonds, aes(x=cut, y= "price") ) + geom_boxplot()
var1 = "price"
ggplot(diamonds, aes(x=cut, y= var1) ) + geom_boxplot()
# these 2 are equivalent, both work, but inputs are strings
ggplot(diamonds, aes_string(x="cut", y= "price") ) + geom_boxplot()
var1 = "price"
ggplot(diamonds, aes_string(x="cut", y= var1) ) + geom_boxplot()
Using apply?
For this purpose I would be inclined to use loops (others are welcome to disagree). If you are set on an apply approach, then you probably want mapply (or its wrapper Map), which iterates over several arguments in parallel. lapply iterates over a single list, and sapply and vapply are variants of lapply that simplify the result.
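Passing two arguments per call can be sketched with base R's Map(). The function below is an illustrative variant of myfunction, not the asker's original: it looks the column up through the `.data` pronoun and fetches the labelling function by name with getExportedValue(), assuming the scales package is installed.

```r
library(ggplot2)

myfunction <- function(var1, var2) {
  ggplot(diamonds, aes(x = cut, y = .data[[var1]])) +
    geom_boxplot() +
    # var2 names a function exported by scales, e.g. "dollar" or "comma"
    scale_y_continuous(labels = getExportedValue("scales", var2))
}

# Map() calls myfunction("price", "dollar"), then myfunction("depth", "comma").
plots <- Map(myfunction, c("price", "depth"), c("dollar", "comma"))
```

The result is a list named after the first argument, so `plots$price` and `plots$depth` hold the two plots.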
A more idiomatic ggplot2 way of doing this now is to use the .data pronoun.
library(ggplot2)
myfunction <- function(var1, var2) {
p <- ggplot(diamonds, aes(x = cut, y = .data[[var1]])) +
geom_boxplot() +
scale_y_continuous(
labels = getFromNamespace(x = var2, ns = "scales")
)
p
return(p)
}
myfunction("price", "dollar")
myfunction("price", "comma")
Then, to create multiple plots with this function by passing multiple arguments, a tidier approach is to use the map functions from {purrr}:
plots <- purrr::map2(
.x = c("price", "price"),
.y = c("dollar", "comma"),
.f = myfunction
)
So, plots[[1]] contains the 1st plot with var1 = "price" and var2 = "dollar" and plots[[2]] contains the 2nd plot with var1 = "price" and var2 = "comma".
I have composed a function that develops histograms using ggplot2 on the numerical columns of a dataframe that will be passed to it. The function stores these plots into a list and then returns the list.
However when I run the function I get the same plot again and again.
My code is the following and I provide also a reproducible example.
hist_of_columns = function(data, class, variables_to_exclude = c()){
library(ggplot2)
library(ggthemes)
data = as.data.frame(data)
variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]
variables_not_to_plot = c(class, variables_to_exclude)
variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)
indices = match(variables_to_plot, names(data))
index_of_class = match(class, names(data))
plots = list()
for (i in (1 : length(variables_to_plot))){
p = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
geom_histogram(aes(y=..density..), alpha=0.3,
position="identity", bins = 100)+ theme_economist() +
geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)
name = names(data)[indices[i]]
plots[[name]] = p
}
plots
}
data(mtcars)
mtcars$am = factor(mtcars$am)
data = mtcars
variables_to_exclude = 'mpg'
class = 'am'
plots = hist_of_columns(data, class, variables_to_exclude)
If you check the list plots you will discover that it contains the same plot repeated.
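Simply use aes_string to pass string variables into the ggplot() call. Right now, your plot uses data sources that are not aligned with ggplot's data argument: below, x, color, and fill are separate vectors, and ggplot does not know that they derive from data. In addition, aes() captures these expressions and only evaluates them when the plot is rendered, by which time the loop has finished and i holds its last value, so every stored plot shows the same column: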
ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))
However, with aes_string, passing string names to x, color, and fill will point to data:
ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
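Since aes_string is deprecated in recent ggplot2 releases, the same fix can be sketched with the `.data` pronoun. This is a hedged rewrite of the asker's function, not a drop-in replacement: it assumes ggplot2 3.3.0 or later for after_stat(), drops the ggthemes theme, and uses 30 bins. lapply() gives each plot its own copy of `v`, so lazy evaluation no longer makes the plots identical.

```r
library(ggplot2)

hist_of_columns <- function(data, class, variables_to_exclude = character()) {
  data <- as.data.frame(data)
  numeric_vars <- names(data)[vapply(data, is.numeric, logical(1))]
  vars <- setdiff(numeric_vars, c(class, variables_to_exclude))
  plots <- lapply(vars, function(v) {
    ggplot(data, aes(x = .data[[v]], color = .data[[class]],
                     fill = .data[[class]])) +
      geom_histogram(aes(y = after_stat(density)), alpha = 0.3,
                     position = "identity", bins = 30) +
      geom_density(alpha = 0.2) +
      labs(x = v, fill = class) +
      guides(color = "none")
  })
  setNames(plots, vars)  # name each plot after its column
}

mtcars$am <- factor(mtcars$am)
plots <- hist_of_columns(mtcars, "am", "mpg")
```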
Here is a strategy using tidyeval that does what you are after:
library(rlang)
library(tidyverse)
hist_of_cols <- function(data, class, drop_vars) {
# tidyeval overhead
class_enq <- enquo(class)
drop_enqs <- enquo(drop_vars)
data %>%
group_by(!!class_enq) %>% # keep the 'class' column always
select(-!!drop_enqs) %>% # drop any 'drop_vars'
select_if(is.numeric) %>% # keep only numeric columns
gather("key", "value", -!!class_enq) %>% # go to long form
split(.$key) %>% # make a list of data frames
map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
geom_histogram() +
geom_density(alpha = .5) +
labs(x = unique(.$key)))
}
hist_of_cols(mtcars, am, mpg)
hist_of_cols(mtcars, am, c(mpg, wt))
I want to create multiple plots that have the same x but different y's using purrr package methodology. That is, I would like to use the map() or walk() functions to perform this.
Using mtcars dataset for simplicity.
ggplot(data = mtcars, aes(x = hp, y = mpg)) + geom_point()
ggplot(data = mtcars, aes(x = hp, y = cyl)) + geom_point()
ggplot(data = mtcars, aes(x = hp, y = disp)) + geom_point()
edit
So far I have tried
y <- list("mpg", "cyl", "disp")
mtcars %>% map(y, ggplot(., aes(hp, y)) + geom_point()
This is one possibility
ys <- c("mpg","cyl","disp")
ys %>% map(function(y)
ggplot(mtcars, aes(hp)) + geom_point(aes_string(y=y)))
It's just like any other map function, you just need to configure your aesthetics properly in the function.
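A variant of the same idea that avoids the deprecated aes_string is sketched below, assuming a ggplot2 version with the `.data` pronoun. set_names() is used so the resulting list is named after the y variables, and walk() prints each plot for its side effect.

```r
library(ggplot2)
library(purrr)

ys <- c("mpg", "cyl", "disp")

# set_names(ys) names the vector with its own values, so map() returns
# a list whose names are "mpg", "cyl" and "disp".
plots <- map(set_names(ys), function(y) {
  ggplot(mtcars, aes(x = hp, y = .data[[y]])) +
    geom_point() +
    labs(y = y)
})

walk(plots, print)  # draw each plot in turn
```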
I've made a somewhat more general function for this, because it's part of an EDA protocol (Zuur et al., 2010). This article from Ariel Muldoon helped me.
plotlist <- function(data, resp, efflist) {
require(ggplot2)
require(purrr)
y <- enquo(resp)
map(efflist, function(x)
ggplot(data, aes(!!sym(x), !!y)) +
geom_point(alpha = 0.25, color = "darkgreen") +
ylab(NULL)
)
}
where:
data is your dataframe
resp is the response variable
efflist is a character vector of effects (independent variables)
Of course, you may change the geom and/or aesthetics as needed. The function returns a list of plots which you can pass to e.g. cowplot or gridExtra, as in this example:
library(gridExtra)
library(dplyr) # just for pipes
plotlist(mtcars, hp, c("mpg","cyl","disp")) %>%
grid.arrange(grobs = ., left = "HP")