Looping over variables in ggplot - r

I want to use ggplot to loop over several columns to create multiple plots, but using the placeholder in the for loop changes the behavior of ggplot.
If I have this:
t <- data.frame(w = c(1, 2, 3, 4), x = c(23,45,23, 34),
y = c(23,34,54, 23), z = c(23,12,54, 32))
This works fine:
ggplot(data=t, aes(w, x)) + geom_line()
But this does not:
i <- 'x'
ggplot(data=t, aes(w, i)) + geom_line()
Which is a problem if I want to eventually loop over x, y and z.
Any help?

You just need to use aes_string instead of aes, like this:
ggplot(data=t, aes_string(x = "w", y = i)) + geom_line()
Note that w then needs to be specified as a string, too.
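To connect this to the loop the question asks about, here is a minimal sketch of a plain for loop over x, y and z. It swaps aes_string for rlang::sym with !!, because aes_string is soft-deprecated in newer ggplot2 releases. The names dat and plots are assumptions made for illustration (dat avoids masking base::t). Unquoting with !! fixes the column at the moment the plot is created; a bare aes(y = .data[[i]]) in a for loop would instead be evaluated when the plot is drawn, by which time i holds its last value.

```r
library(ggplot2)

# Toy data from the question; `dat` is a hypothetical name
dat <- data.frame(w = c(1, 2, 3, 4), x = c(23, 45, 23, 34),
                  y = c(23, 34, 54, 23), z = c(23, 12, 54, 32))

plots <- list()
for (i in c("x", "y", "z")) {
  # sym() turns the string into a symbol; !! inserts it into aes() right away
  plots[[i]] <- ggplot(dat, aes(x = w, y = !!rlang::sym(i))) +
    geom_line()
}
```

Inside a for loop a ggplot object is not printed automatically, so call print(plots[[i]]) in the loop body (or print the list afterwards) to actually draw the plots.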

ggplot2 3.0.0 and later supports the tidy evaluation pronoun .data, so we can do the following:
Build a function that takes the x and y column names as inputs. Note the use of .data[[ ]].
Then loop through every column using purrr::map.
library(rlang)
library(tidyverse)
dt <- data.frame(
w = c(1, 2, 3, 4), x = c(23, 45, 23, 34),
y = c(23, 34, 54, 23), z = c(23, 12, 54, 32)
)
Define a function that accepts strings as input
plot_for_loop <- function(df, x_var, y_var) {
ggplot(df, aes(x = .data[[x_var]], y = .data[[y_var]])) +
geom_point() +
geom_line() +
labs(x = x_var, y = y_var) +
theme_classic(base_size = 12)
}
Loop through every column
plot_list <- colnames(dt)[-1] %>%
map( ~ plot_for_loop(dt, colnames(dt)[1], .x))
# view all plots individually (not shown)
plot_list
# Combine all plots
library(cowplot)
plot_grid(plotlist = plot_list,
ncol = 3)
Edit: the above function can also be written w/ rlang::sym & !! (bang bang).
plot_for_loop2 <- function(df, .x_var, .y_var) {
# convert strings to variable
x_var <- sym(.x_var)
y_var <- sym(.y_var)
# unquote variables using !!
ggplot(df, aes(x = !! x_var, y = !! y_var)) +
geom_point() +
geom_line() +
labs(x = .x_var, y = .y_var) + # label with the original strings, not the symbols
theme_classic(base_size = 12)
}
Or we can just use facet_grid/facet_wrap after converting the data frame from wide to long format (tidyr::gather)
dt_long <- dt %>%
tidyr::gather(key, value, -w)
dt_long
#> w key value
#> 1 1 x 23
#> 2 2 x 45
#> 3 3 x 23
#> 4 4 x 34
#> 5 1 y 23
#> 6 2 y 34
#> 7 3 y 54
#> 8 4 y 23
#> 9 1 z 23
#> 10 2 z 12
#> 11 3 z 54
#> 12 4 z 32
### facet_grid
ggp1 <- ggplot(dt_long,
aes(x = w, y = value, color = key, group = key)) +
facet_grid(. ~ key, scales = "free", space = "free") +
geom_point() +
geom_line() +
theme_bw(base_size = 14)
ggp1
### facet_wrap
ggp2 <- ggplot(dt_long,
aes(x = w, y = value, color = key, group = key)) +
facet_wrap(. ~ key, nrow = 2, ncol = 2) +
geom_point() +
geom_line() +
theme_bw(base_size = 14)
ggp2
### bonus: reposition legend
# https://cran.r-project.org/web/packages/lemon/vignettes/legends.html
library(lemon)
reposition_legend(ggp2 + theme(legend.direction = 'horizontal'),
'center', panel = 'panel-2-2')
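The tidyr documentation marks gather as superseded by pivot_longer (added in tidyr 1.0.0). As a sketch of the same wide-to-long step with the newer function, keeping the column names key and value used above:

```r
library(tidyr)

dt <- data.frame(
  w = c(1, 2, 3, 4), x = c(23, 45, 23, 34),
  y = c(23, 34, 54, 23), z = c(23, 12, 54, 32)
)

# Every column except w is stacked; names go to `key`, values to `value`
dt_long <- pivot_longer(dt, cols = -w, names_to = "key", values_to = "value")
```

The result is a tibble with the same 12 rows as the gather output, although the row order differs (rows are grouped by w rather than by key). That does not affect the faceted plots.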

The problem is how you access the data frame t. As you probably know, there are several ways of doing so, but unfortunately a character string is not one of them in ggplot's aes().
One way that might work is to use the numerical position of the column, e.g., i <- 2. Whether this works depends on ggplot, which I have never used (though I know other work by Hadley, so I guess it should).
Another way around this is to create a new temporary data frame every time you call ggplot, e.g.:
tmp <- data.frame(a = t[['w']], b = t[[i]])
ggplot(data=tmp, aes(a, b)) + geom_line()

Depending on what you are trying to do, I find facet_wrap or facet_grid to work well for creating multiple plots with the same basic structure. Something like this should get you in the right ballpark:
library(reshape2) # provides melt()
t.m = melt(t, id="w")
ggplot(t.m, aes(w, value)) + facet_wrap(~ variable) + geom_line()

Related

Detecting programmatically whether axis labels overlap

Is there a way to detect whether axis labels overlap in ggplot2 programmatically?
Suppose I create the following graph:
library(dplyr)
library(tibble)
library(ggplot2)
dt <- mtcars %>% rownames_to_column("name") %>%
dplyr::filter(cyl == 8)
ggplot(dt, aes(x = name, y = mpg)) + geom_point()
I want to programmatically detect whether x-axis labels are overlapping and apply the following first remedy:
ggplot(dt, aes(x = name, y = mpg)) + geom_point() +
scale_x_discrete(guide = guide_axis(n.dodge = 2))
Here is the tricky part. Say the dimensions are different and the labels still overlap after the first remedy.
I want to apply a second remedy like this:
ggplot(dt, aes(x = name, y = mpg)) + geom_point() +
theme(axis.text.x = element_text(angle=45, hjust = 1, vjust = 1))
Is it possible without visually inspecting the graph?
Not a definitive solution, but if we consider the margins constant, we can do some simple subtraction:
library(dplyr)
library(tibble)
library(ggplot2)
dt <- mtcars %>% rownames_to_column("name") %>%
dplyr::filter(cyl == 8)
p <- ggplot(dt, aes(x = name, y = mpg)) +
geom_point()
# variable part
font_size <- 9 # points; close to the ggplot default axis text size (0.8 * 11 = 8.8)
full_width <- 15 #cm
full_height <- 10 #cm
cm_to_pt <- 28.35 # 1 cm = 28.35 points
# try varying width
for(full_width in c(30, 40, 45, 50)){
axis_text_length_pt <- ceiling(max(nchar(dt$name))/2)*font_size
axis_available_pt <- full_width/n_distinct(dt$name)*cm_to_pt
do_not_touch <- axis_text_length_pt <= axis_available_pt
# store the plot explicitly instead of relying on last_plot()
q <- p +
theme(axis.text.x = element_text(size=font_size)) +
geom_text(aes(x=5,y=15, label=do_not_touch))
ggsave(paste0("tmp_",full_width,".png"), plot = q,
width = full_width, height = full_height, units = "cm")
}
At 40 cm we still have the Hornet Sportabout and the Lincoln Continental touching, at 45 cm they separate.
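The arithmetic in the loop can be pulled into a small helper so that a script can pick a remedy without looking at the image. This is only a sketch of the same rough heuristic (half a font size per character, constant margins). The function names labels_fit and choose_remedy and the rows argument are assumptions made for illustration, and treating n.dodge = 2 as doubling the available width is a simplification.

```r
# TRUE if the longest label is estimated to fit in its share of the axis
labels_fit <- function(labels, width_cm, font_size = 9, rows = 1) {
  cm_to_pt <- 28.35                                   # 1 cm = 28.35 points
  text_pt <- ceiling(max(nchar(labels)) / 2) * font_size
  available_pt <- width_cm / length(unique(labels)) * cm_to_pt * rows
  text_pt <= available_pt
}

# Pick the mildest remedy that the heuristic accepts
choose_remedy <- function(labels, width_cm, font_size = 9) {
  if (labels_fit(labels, width_cm, font_size)) {
    "none"
  } else if (labels_fit(labels, width_cm, font_size, rows = 2)) {
    "dodge"   # e.g. guide_axis(n.dodge = 2)
  } else {
    "rotate"  # e.g. element_text(angle = 45, hjust = 1, vjust = 1)
  }
}

car_names <- rownames(mtcars)[mtcars$cyl == 8]
choose_remedy(car_names, width_cm = 40)
```

The returned string can then drive an if/else that adds either the scale_x_discrete(guide = guide_axis(n.dodge = 2)) layer or the rotated axis.text.x theme from the question.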

R: pairs plot of one variable with the rest of the variables

I would like to generate a correlation plot with my "True" variable pairs with all of the rest (People variables). I am pretty sure this has been brought up somewhere but solutions I have found do not work for me.
library(ggplot2)
set.seed(0)
dt = data.frame(matrix(rnorm(120, 100, 5), ncol = 6) )
colnames(dt) = c('Salary', paste0('People', 1:5))
ggplot(dt, aes(x=Salary, y=value)) +
geom_point() +
facet_grid(.~Salary)
Where I got error: Error: Column y must be a 1d atomic vector or a list.
I know one of the solutions is writing out all of the variables in y - which I am trying to avoid because my true data has 15 columns.
Also, I am not entirely sure what "value" and "variable" refer to in the ggplot call. I have seen them a lot in example code.
Any suggestion is appreciated!
You want to convert your data from wide to long format, for example with tidyr::gather(). Here is a solution using packages from the tidyverse:
library(tidyr)
library(ggplot2)
theme_set(theme_bw(base_size = 14))
set.seed(0)
dt = data.frame(matrix(rnorm(120, 100, 5), ncol = 6) )
colnames(dt) = c('Salary', paste0('People', 1:5))
### convert data frame from wide to long format
dt_long <- gather(dt, key, value, -Salary)
head(dt_long)
#> Salary key value
#> 1 106.31477 People1 98.87866
#> 2 98.36883 People1 101.88698
#> 3 106.64900 People1 100.66668
#> 4 106.36215 People1 104.02095
#> 5 102.07321 People1 99.71447
#> 6 92.30025 People1 102.51804
### plot
ggplot(dt_long, aes(x = Salary, y = value)) +
geom_point() +
facet_grid(. ~ key)
### if you want to add regression lines
library(ggpmisc)
# define regression formula
formula1 <- y ~ x
ggplot(dt_long, aes(x = Salary, y = value)) +
geom_point() +
facet_grid(. ~ key) +
geom_smooth(method = 'lm', se = TRUE) +
stat_poly_eq(aes(label = paste(..eq.label.., ..rr.label.., sep = "~~")),
label.x.npc = "left", label.y.npc = "top",
formula = formula1, parse = TRUE, size = 3) +
coord_equal()
### if you also want ggpairs() from the GGally package
library(GGally)
ggpairs(dt)
Created on 2019-02-28 by the reprex package (v0.2.1.9000)
You need to stack() your data first, probably that's what you have "seen".
dt <- setNames(stack(dt), c("value", "Salary"))
library(ggplot2)
ggplot(dt, aes(x=Salary, y=value)) +
geom_point() +
facet_grid(.~Salary)
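Note that stack(dt) as written also stacks the Salary column itself. If the goal is Salary on the x-axis against each People column, one possible base R variant (a sketch, not the answerer's code) is to stack only the People columns and keep Salary alongside. The name long is an assumption; stack() names its output columns values and ind, which is likely where the generic "value"/"variable"-style names in examples come from.

```r
set.seed(0)
dt <- data.frame(matrix(rnorm(120, 100, 5), ncol = 6))
colnames(dt) <- c("Salary", paste0("People", 1:5))

# Stack everything except Salary; Salary is recycled once per People column
long <- cbind(Salary = dt$Salary, stack(dt[-1]))
names(long) <- c("Salary", "value", "key")
head(long)
```

With this layout, ggplot(long, aes(x = Salary, y = value)) + geom_point() + facet_grid(. ~ key) gives one panel per People column.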

Referring to a column by its name in ggplot in a loop [duplicate]

df <- data.frame(id = rep(1:6, each = 50), a = rnorm(50*6, mean = 10, sd = 5),
b = rnorm(50*6, mean = 20, sd = 10),
c = rnorm(50*6, mean = 30, sd = 15))
I have three variables: a, b and c. To plot one variable for every id, I use:
ggplot(df, aes(a)) + geom_histogram() + facet_wrap(~id)
I have a loop for which I have to plot a, b and c.
var.names <- c("a","b","c")
for(v in seq_along(var.names)){
variable <- var.names[v]
ggplot(df, aes(x = paste0(variable))) + geom_histogram() + facet_wrap(~id)
}
This loop does not work. How do I refer to a column by its name in the command above? My actual data has many variables, which is why I am doing it this way.
We can use aes_string to pass strings
l1 <- vector("list", length(var.names))
for(v in seq_along(var.names)){
variable <- var.names[v]
l1[[v]] <- ggplot(df, aes_string(x = variable)) +
geom_histogram() +
facet_wrap(~id)
}
Another option, available since ggplot2 3.0.0, is to convert the string to a symbol (rlang::sym) and unquote it (!!) within aes
for(v in seq_along(var.names)){
variable <- rlang::sym(var.names[v])
l1[[v]] <- ggplot(df, aes(x = !!variable)) +
geom_histogram() +
facet_wrap(~id)
}
The plots stored in the list can be saved in a .pdf file
library(gridExtra)
l2 <- lapply(l1, ggplotGrob) # base R, so purrr is not needed
ggsave('plots.pdf', plot = marrangeGrob(grobs = l2, nrow = 1, ncol = 1))
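If gridExtra is not available, a multi-page PDF can also be written with the base pdf() device, which puts one printed plot on each page. The sketch below is self-contained and uses a smaller version of the question's data. The names plots and out_file, the choice of bins = 30, and writing to tempdir() are assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(id = rep(1:6, each = 50),
                 a = rnorm(50 * 6, mean = 10, sd = 5),
                 b = rnorm(50 * 6, mean = 20, sd = 10))
var.names <- c("a", "b")

# One histogram per variable; the function call gives each plot its own `v`
plots <- lapply(var.names, function(v) {
  ggplot(df, aes(x = .data[[v]])) +
    geom_histogram(bins = 30) +
    facet_wrap(~id) +
    labs(x = v)
})

out_file <- file.path(tempdir(), "plots.pdf")
pdf(out_file, width = 7, height = 5)
for (p in plots) print(p)   # each print() starts a new page
invisible(dev.off())
```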
If we need to overlay the three plots in a single page, use gather to convert to 'long' format
library(tidyr)
library(dplyr)
gather(df, key, val, all_of(var.names)) %>% # all_of() makes the external character vector explicit
ggplot(., aes(x = val, fill = key)) +
geom_histogram() +
facet_wrap(~id)

Remove Factors with no data in facet grouping variable

I have the following data :
data <- data.frame(x = letters[1:6],
group = rep(letters[1:2], each = 3),
y = 1:6)
x group y
1 a a 1
2 b a 2
3 c a 3
4 d b 4
5 e b 5
6 f b 6
And I would like to plot y ~ x and split into facets by groups with ggplot2.
ggplot(data, aes(x, y)) +
geom_bar(stat = "identity") +
facet_grid(group ~ .)
The problem is that some (x, group) combinations do not exist in my data (for example, there is no data for x = a and group = b), but they are still kept on the x-axis of both facets. I would like to remove them so that each facet shows only the levels present in its group, without the empty space.
I thought scales = "free_x" or drop = TRUE could do the trick, but I couldn't get either to work.
Any help would be appreciated, Thanks !
Use facet_wrap instead:
ggplot(data, aes(x, y)) +
geom_col() +
facet_wrap(~group, scales = 'free', nrow = 2, strip.position = 'right')
Also note geom_col as an alternative to geom_bar(stat = "identity").
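The reason scales = "free_x" had no visible effect in the question is that facet_grid shares one x scale among all panels in the same column, and facet_grid(group ~ .) puts both panels in a single column. If facet_grid is preferred over facet_wrap, one possible variant is to place the groups in columns instead, so each panel can have its own x scale. This is a sketch of that alternative layout, not part of the original answer; space = "free_x" additionally sizes each panel by its number of levels.

```r
library(ggplot2)

data <- data.frame(x = letters[1:6],
                   group = rep(letters[1:2], each = 3),
                   y = 1:6)

# Groups side by side: each column gets its own x scale and drops unused levels
p <- ggplot(data, aes(x, y)) +
  geom_col() +
  facet_grid(. ~ group, scales = "free_x", space = "free_x")
p
```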
