With the amazing help of @Tung we have created a function that draws a series of ggplots in a loop using purrr::pwalk. However, the problem is that the plots are printed automatically and I have not found a way to save them as a list of plots. I am coming from this post: Passing labels to xlab and ylab in ggplot2
NOTE: I need to change the ylab and xlab labels from each plot.
The function to plot is as follows:
library(tidyverse)
plot_scatter_with_label <- function(dat,
var_x,
var_y,
label_x,
label_y,
geom_smooth = FALSE,
point_shape = 16,
point_color = "#EB3300",
point_size = 1,
point_alpha = 1,
smooth_method = "loess",
smooth_se = FALSE,
smooth_color = "navy") {
if (is.character(var_x)) {
print('character column names supplied, use rlang::sym()')
var_x <- rlang::sym(var_x)
} else {
print('bare column names supplied, use dplyr::enquo()')
var_x <- enquo(var_x)
}
if (is.character(var_y)) {
var_y <- rlang::sym(var_y)
} else {
var_y <- enquo(var_y)
}
p <- ggplot(dat, aes(x = !! var_x, y = !! var_y)) +
geom_point(shape = point_shape, color = point_color,
size = point_size, alpha = point_alpha) +
ylab(label_y) +
xlab(label_x) +
ggtitle(paste0(label_x, " ~ ", label_y))
print(p)
}
Create a data frame so that we can loop through every row and column
var_y = c("mpg", "hp")
label_y = c("Miles per gallon [Mpg]", "Horse power [CV]")
var_x = c("cyl", "gear")
label_x = c("Cylinders [n]", "Gear [n]")
var_xy <- expand.grid(var_x, var_y, stringsAsFactors = FALSE)
label_xy <- expand.grid(label_x, label_y, stringsAsFactors = FALSE)
select_dat <- data.frame(var_xy, label_xy, stringsAsFactors = FALSE)
pwalk(select_dat, ~ plot_scatter_with_label(mtcars, ..1, ..2,..3,..4))
I am guessing that, because of the print(p) inside plot_scatter_with_label, pwalk displays the plots automatically. Instead, I would like to save them in a list of plots. For example, I would like:
p_list = pwalk(select_dat, ~ plot_scatter_with_label(mtcars, ..1, ..2,..3,..4))
where p_list is a list of plots to "play with" using some function to arrange them like
cowplot::plot_grid(plotlist=p_list, nrow=3,ncol=2)
or
ggpubr::ggarrange(plotlist=p_list, nrow=3,ncol=2)
@Tung has recommended that I have a look at this post: Multiple plots in for loop ignoring par
However, I am still unable to find the solution.
Any help will be highly appreciated.
Thanks a lot in advance,
Best regards,
Juan Antonio
Edit:: corrected spelling of function plotlist
The pwalk function is explicitly designed not to return an output, but instead to focus on side-effects (like printing, reading/writing, or plotting). It is an alternative function to pmap, which does return its output.
You could return the plots in a list like this:
Change the last line of your custom function from print(p) to return(p)
Use the pmap function rather than pwalk
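The two steps above can be sketched as follows. This is a minimal, simplified version of the question's function (the name plot_scatter_returning is hypothetical, and the optional styling arguments are dropped); it assumes ggplot2 3.0.0 or later so that !! works inside aes().

```r
library(ggplot2)
library(purrr)

# Simplified version of the question's function: it returns the plot
# instead of printing it. Column names are expected as strings.
plot_scatter_returning <- function(dat, var_x, var_y, label_x, label_y) {
  p <- ggplot(dat, aes(x = !!rlang::sym(var_x), y = !!rlang::sym(var_y))) +
    geom_point(color = "#EB3300") +
    xlab(label_x) +
    ylab(label_y) +
    ggtitle(paste0(label_x, " ~ ", label_y))
  p  # returned, not printed
}

# Same grid of variables and labels as in the question
var_xy   <- expand.grid(c("cyl", "gear"), c("mpg", "hp"),
                        stringsAsFactors = FALSE)
label_xy <- expand.grid(c("Cylinders [n]", "Gear [n]"),
                        c("Miles per gallon [Mpg]", "Horse power [CV]"),
                        stringsAsFactors = FALSE)
select_dat <- data.frame(var_xy, label_xy, stringsAsFactors = FALSE)

# pmap() collects each return value into a list, one plot per row
p_list <- pmap(select_dat, ~ plot_scatter_returning(mtcars, ..1, ..2, ..3, ..4))
```

The resulting p_list should then be usable wherever a plotlist argument is accepted, for example in the cowplot::plot_grid() or ggpubr::ggarrange() calls from the question.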
I am trying to plot multiple ggplot objects on the same page using a for loop. The for loop is used because the number of plots is not known in advance (it is dynamic). The x-axis of the plots changes with each iteration. I have read many articles about creating an empty list, adding each plot at a specific index of the list, and then using the "multiplot" function to display all the plots in the list on one page. However, this is not working!
The problem is that the program ends up printing only the last plot, the one saved at the last index of the list, with different labels. The code and the figure below illustrate the idea.
Howmany <- readline(prompt="Specify the number of the independent variables: ")
Howmany <- as.numeric(Howmany)
plot_lst <- vector("list", length = Howmany) #' an empty list
for ( i in 1:Howmany){
plot_lst[[i]] <- ggplot(data=data, aes(x=data[, c(i)], y=data$gender)) +
geom_point(aes(size = 5)) +
scale_color_discrete(name = "dependent_variable" + labs(
title = (paste("Logistic Regression Fitting Model",i)),
x = names(data)[i],
y = "gender")
}
multiplot(plotlist = plot_lst, cols = 1)
I really appreciate any suggestion.
I also tried what were suggested in this link: show multiple plots from ggplot on one page in r
However, still am facing the same problem.
Because ggplot2's aes() is evaluated lazily, you need to force evaluation in each iteration of the loop (otherwise every plot is drawn with the final value of i).
One way to do this is to wrap the right-hand side of the assignment in local() and use i <- i:
The labs(x = ...) call did not seem to be correct, so I rewrote it as x = names(data)[i]; please check if that works for you.
plot_lst <- vector("list", length = Howmany) #' an empty list
for (i in 1:Howmany) {
plot_lst[[i]] <- local({
i <- i
ggplot(data=data, aes(x=data[, c(i)], y=data$gender)) +
geom_point(aes(size = 5)) +
scale_color_discrete(name = "dependent_variable") +
labs(
title = (paste("Logistic Regression Fitting Model", i)),
x = names(data)[i],
y = "gender")
})
}
Below is one example using the iris data set. If we print plot_lst we can see three different plots.
I assume the function multiplot is from the scatter package, which is not working with the latest R version, so I can't reproduce if this is working correctly.
Howmany <- readline(prompt="Specify the number of the independent variables: ")
Howmany <- as.numeric(Howmany)
plot_lst <- vector("list", length = Howmany) #' an empty list
for ( i in 1:Howmany){
plot_lst[[i]] <- local({
i <- i
ggplot(data = iris,
aes(x = iris[, c(i)],
y = iris$Species)) +
geom_point(aes(size = 5)) +
scale_color_discrete(name = "dependent_variable") +
labs(
title = paste("Logistic Regression Fitting Model", i),
x = names(iris)[i],
y = "species"
)
})
}
plot_lst
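As an alternative sketch (not part of the answer above), the loop index can be avoided altogether by mapping over column names with lapply() and referring to columns through the .data pronoun, assuming ggplot2 3.0.0 or later. Each call to the anonymous function gets its own col, so every plot keeps its own x variable. The choice of the first three iris columns is only for illustration.

```r
library(ggplot2)

# Columns to plot against Species (first three numeric columns of iris)
x_cols <- names(iris)[1:3]

# One plot per column name; col is local to each function call,
# so lazy evaluation in aes() cannot pick up a stale loop index
plot_lst <- lapply(x_cols, function(col) {
  ggplot(iris, aes(x = .data[[col]], y = Species)) +
    geom_point(size = 2) +
    labs(title = paste("Species vs.", col), x = col, y = "Species")
})
names(plot_lst) <- x_cols
```

Printing plot_lst should then show three different plots, one per column.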
This sounds like a very trivial question at first, but no one has managed to help me so far, hence I'm reaching out to you all.
I'd like to do the following:
I'm writing a simple function that allows me to plot two variables against each other, with a third variable coloring the observation points (depending on the corresponding value of the color variable). The code looks like that:
scatterplot <- function(data_used, x.variable, y.variable, color.variable) {
ggplot(data_used, aes(x=x.variable, y = y.variable)) +
geom_point(aes_string(color = color.variable))
}
scatterplot(data_used = example_data, x.variable = example_data$education,
y.variable = example_data$wages,
color.variable = example_data$sex)
What I would like R to do now is to label the x- and y-axis (respectively) by the corresponding variable's name that I decide to be plotted. In this example here, x-axis would be 'education', y-axis would be 'wages'.
I tried to simply put + labs (x = x.variable, y = y.variable) and it doesn't work (when doing that, R labels the axes by the variable values!). By default, R just names the axes "x.variable" and "y.variable".
Can someone help me achieve what I'm trying to do?
Best regards,
xifrix
jpenzer's answer is a good one. Here it is without the quasi-quotation stuff.
scatterplot <- function(data_used, x.variable, y.variable, color.variable) {
ggplot(data_used, aes_string(x=x.variable, y = y.variable)) +
geom_point(aes_string(color = color.variable)) +
labs(x=x.variable, y=y.variable, colour=color.variable)
}
mtcars %>%
mutate(am = as.factor(am)) %>%
scatterplot(., x.variable = "hp",
y.variable = "mpg",
color.variable = "am")
I'm not sure the quasi-quotation stuff is 100% necessary in hindsight, but this is the pattern I use for similar needs:
my_scatterplot <- function(data, x, y){
.x = rlang::enquo(x)
.y = rlang::enquo(y)
data %>%
ggplot(aes(x = !!.x, y = !!.y))+
geom_point()+
labs(x = rlang::as_label(.x),
y = rlang::as_label(.y))
}
Let me know if it doesn't work for you, it should though. edit: Should add after DaveArmstrong's answer, the function would be called without quotes for the x / y variable e.g.
diamonds %>% my_scatterplot(price, table)
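To pass a column name in the function you could use double curly braces {{...}} around the desired column name in the function body. Note that in the code below the column names are passed as strings to aes_string(), where the braces simply return the string; the tidy-evaluation meaning of {{...}} applies to bare column names inside aes().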
library(dplyr)
library(ggplot2)
scatterplot <- function(data_used, x.variable, y.variable, color.variable) {
ggplot(data_used, aes_string({{x.variable}}, {{y.variable}})) +
geom_point(aes_string(color = {{color.variable}})) +
labs(x=x.variable, y=y.variable, colour=color.variable)
}
scatterplot(mtcars %>% mutate(am = as.factor(am)), x.variable = "mpg",
y.variable = "hp",
color.variable = "am")
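For comparison, here is a hedged sketch of the two tidy-evaluation idioms, assuming ggplot2 3.0.0 or later and rlang 0.4.0 or later: the .data pronoun when column names arrive as strings, and {{ }} inside aes() when they arrive bare. The function names scatter_chr and scatter_bare are hypothetical.

```r
library(ggplot2)

# Column names passed as strings: look them up with the .data pronoun
scatter_chr <- function(data_used, x.variable, y.variable, color.variable) {
  ggplot(data_used, aes(x = .data[[x.variable]], y = .data[[y.variable]])) +
    geom_point(aes(color = .data[[color.variable]])) +
    labs(x = x.variable, y = y.variable, colour = color.variable)
}

# Bare column names: embrace them with {{ }} and build labels with as_label()
scatter_bare <- function(data_used, x.variable, y.variable, color.variable) {
  ggplot(data_used, aes(x = {{ x.variable }}, y = {{ y.variable }})) +
    geom_point(aes(color = {{ color.variable }})) +
    labs(x = rlang::as_label(rlang::enquo(x.variable)),
         y = rlang::as_label(rlang::enquo(y.variable)),
         colour = rlang::as_label(rlang::enquo(color.variable)))
}

mt <- transform(mtcars, am = as.factor(am))
p1 <- scatter_chr(mt, "hp", "mpg", "am")   # quoted names
p2 <- scatter_bare(mt, hp, mpg, am)        # bare names
```

Both calls should give axes labelled hp and mpg; which form to prefer depends mainly on whether the names come from user code (bare) or from a character vector (strings).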
I'd like to create a map that shows the value of variable for a given state. The dataset contains around a thousand variables and is at the state level, for about 100 years.
The code I have and works is:
plot_usmap(data = database, values = "var1") + scale_fill_continuous(
low = "white", high = "blue", na.value="light gray", name = "Title of graph", label = scales::comma
) + theme(legend.position = "right")
Now what I'd like to do is create this map for a list of about 15 variables and 10 years.
I'm usually a STATA user, and there I could define a variable list and then loop through the variable list. On page 7 of this document of "A Quick Introduction to R (for Stata Users)", I tried applying the following solution:
vars <- c("database$var1", "database$var2", "database$var3","database$var4", "database$var5", "database$var6", "database$var7", "database$var8", "database$var9", "database$var10", "database$var11", "database$var12")
for(var in vars) {
v <- get(var)
plot_usmap(data = database, values = "v") +
scale_fill_continuous(low = "white", high = "blue", na.value="light gray", name = "v", label = scales::comma) + theme(legend.position = "right")}
With this code, I get the error "Error in get(var) : object 'database$var1' not found". When I try view(database$var1), it appears. The next problem is that I'd like the name of the graph to be the label of the variable rather than the variable name. In the example above, I'd restricted the whole data to only include 1 year, so if there's a way to set the code up so that I could use the whole database but map only selected years, that would be great.
Any insights would be appreciated! I read that in R, "for" isn't used as much, so if there is a better way to do it, please let me know.
Basically it's not that different in R. First, there is no need to use get, and in general it should be avoided. Second, while for loops are fine, the more R-ish way would be to use lapply. Especially when making plots via ggplot2 it is recommended to use lapply.
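To illustrate why get() fails here, a small base R sketch: get() looks up an object by its name, and "database$var1" is an expression, not the name of an object. Indexing the data frame with the column name as a string avoids the problem. The toy database below is a hypothetical stand-in for the real data.

```r
# Toy stand-in for the real data (hypothetical values)
database <- data.frame(var1 = c(1, 2, 3), var2 = c(10, 20, 30))

# get() searches for an object literally called "database$var1" and fails;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(get("database$var1"), error = function(e) conditionMessage(e))

# Index the column by its name as a string instead
vars <- c("var1", "var2")
col_sums <- vapply(vars, function(v) sum(database[[v]]), numeric(1))
col_sums
```

The same string-based lookup is what lets a vector such as vars be passed directly to the values argument in the plotting code.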
Making use of some fake example data to mimic your database:
library(usmap)
library(ggplot2)
# Example data
database <- statepop
names(database) <- c("fips", "abbr", "full", "var1")
database$var2 <- database$var1
vars <- c("var1", "var2")
lapply(vars, function(x) {
plot_usmap(data = database, values = x) +
scale_fill_continuous(
low = "white", high = "blue", na.value="light gray", name = "Title of graph", label = scales::comma
) +
theme(legend.position = "right") +
labs(title = x)
})
#> [[1]]
#>
#> [[2]]
EDIT Assuming that your data contains a column with years, I would suggest wrapping the plotting code inside a function which takes your database, a vector of vars and the desired year as arguments. But there are other approaches, and which works best depends on your desired result.
library(usmap)
library(ggplot2)
library(labelled)
# Example data
database <- statepop
names(database) <- c("fips", "abbr", "full", "var1")
database$year <- 2015
database <- rbind(database, transform(database, year = 2020))
var_label(database$var1) <- "Population"
vars <- c("var1")
names(vars) <- vars
map_vars <- function(.data, vars, year) {
lapply(vars, function(x, year) {
# Keep only the rows for the requested year and plot that subset
dat_year <- .data[.data$year == year, ]
plot_usmap(data = dat_year, values = x) +
scale_fill_continuous(
low = "white", high = "blue", na.value = "light gray", name = "Title of graph", label = scales::comma
) +
theme(legend.position = "right") +
labs(title = paste(var_label(.data[[x]]), "in", year))
}, year = year)
}
map_vars(database, vars, 2015)
#> $var1
map_vars(database, vars, 2020)
#> $var1
For a function, I need to keep variable names in a vector, and I use a function to plot density graphs of my variables.
In summary, my problem is as follows:
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
plotter <- function(x){
plot(density(x),
main = "",
xlab = "",
ylab = "")
title(var_names)
}
par(mfrow=c(4,3),mar=c(1,1,1,1))
apply(mtcars,2,plotter)
I can't work out how to match each name to its plot.
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
plotter <- function(x, var){
plot(density(x[[var]]),
main = var,
xlab = "",
ylab = "")
}
par(mfrow=c(4,3),mar=c(2.1,2.1,2.1,1))
for(vn in var_names) plotter(mtcars, vn)
will yield
for loops are often discouraged in R on the grounds that they are slow. However, when combined with plotting, which is slow in its own right, or when the loop runs only 11 times as in this example, for loops are perfectly fine and beginner-friendly.
If you really need the plotter passed to an apply-family function to take only one argument, the following will do:
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
plotter <- function(x){
plot(density(x[[1]]),
main = names(x),
xlab = "",
ylab = "")
}
par(mfrow=c(4,3),mar=c(2.1,2.1,2.1,1))
sapply(1:11,function(n) plotter(mtcars[n]))
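Another way to match each column to its name, sketched here in base R, is to walk the columns and the names in parallel with Map(). The two-argument plotter below is a variant of the ones above that takes the column itself rather than the whole data frame.

```r
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")

# Two-argument plotter: the data column and the title to draw above it
plotter <- function(x, var) {
  plot(density(x), main = var, xlab = "", ylab = "")
}

par(mfrow = c(4, 3), mar = c(2.1, 2.1, 2.1, 1))
# Map() pairs each column with its name; assigning the result
# avoids printing a list of NULLs to the console
res <- Map(plotter, mtcars[var_names], var_names)
```

Because mtcars[var_names] selects the columns by name, the pairing stays correct even if var_names is a subset or is in a different order from the columns of the data frame.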
I would suggest a tidyverse approach with ggplot2 and the vector of names you have. You can reshape your data to long format and then filter the desired variables. Using facets and geom_density() you can avoid issues with titles. Here is the code:
library(tidyverse)
#Vector
var_names <- c("mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb")
#Data
mtcars %>% pivot_longer(cols = everything()) %>%
filter(name %in% var_names) %>%
ggplot(aes(x=value))+
geom_density()+
facet_wrap(.~name,scales = 'free')+
theme_bw()
Output:
This question is related to
Create custom geom to compute summary statistics and display them *outside* the plotting region
(NOTE: All functions have been simplified; no error checks for correct objects types, NAs, etc.)
In base R, it is quite easy to create a function that produces a stripchart with the sample size indicated below each level of the grouping variable: you can add the sample size information using the mtext() function:
stripchart_w_n_ver1 <- function(data, x.var, y.var) {
x <- factor(data[, x.var])
y <- data[, y.var]
# Need to call plot.default() instead of plot because
# plot() produces boxplots when x is a factor.
plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
levels.x <- levels(x)
x.ticks <- 1:length(levels(x))
axis(1, at = x.ticks, labels = levels.x)
n <- sapply(split(y, x), length)
mtext(paste0("N=", n), side = 1, line = 2, at = x.ticks)
}
stripchart_w_n_ver1(mtcars, "cyl", "mpg")
or you can add the sample size information to the x-axis tick labels using the axis() function:
stripchart_w_n_ver2 <- function(data, x.var, y.var) {
x <- factor(data[, x.var])
y <- data[, y.var]
# Need to set the second element of mgp to 1.5
# to allow room for two lines for the x-axis tick labels.
o.par <- par(mgp = c(3, 1.5, 0))
on.exit(par(o.par))
# Need to call plot.default() instead of plot because
# plot() produces boxplots when x is a factor.
plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
n <- sapply(split(y, x), length)
levels.x <- levels(x)
axis(1, at = 1:length(levels.x), labels = paste0(levels.x, "\nN=", n))
}
stripchart_w_n_ver2(mtcars, "cyl", "mpg")
While this is a very easy task in base R, it is maddeningly complex in ggplot2 because it is very hard to get at the data being used to generate the plot, and while there are functions equivalent to axis() (e.g., scale_x_discrete, etc.) there is no equivalent to mtext() that lets you easily place text at specified coordinates within the margins.
I tried using the built in stat_summary() function to compute the sample sizes (i.e., fun.y = "length") and then place that information on the x-axis tick labels, but as far as I can tell, you can't extract the sample sizes and then somehow add them to the x-axis tick labels using the function scale_x_discrete(), you have to tell stat_summary() what geom you want it to use. You could set geom="text", but then you have to supply the labels, and the point is that the labels should be the values of the sample sizes, which is what stat_summary() is computing but which you can't get at (and you would also have to specify where you want the text to be placed, and again, it is difficult to figure out where to place it so that it lies directly underneath the x-axis tick labels).
The vignette "Extending ggplot2" (http://docs.ggplot2.org/dev/vignettes/extending-ggplot2.html) shows you how to create your own stat function that allows you to get directly at the data, but the problem is that you always have to define a geom to go with your stat function (i.e., ggplot thinks you want to plot this information within the plot, not in the margins); as far as I can tell, you can't take the information you compute in your custom stat function, not plot anything in the plot area, and instead pass the information to a scales function like scale_x_discrete(). Here was my try at doing it this way; the best I could do was place the sample size information at the minimum value of y for each group:
StatN <- ggproto("StatN", Stat,
required_aes = c("x", "y"),
compute_group = function(data, scales) {
y <- data$y
y <- y[!is.na(y)]
n <- length(y)
data.frame(x = data$x[1], y = min(y), label = paste0("n=", n))
}
)
stat_n <- function(mapping = NULL, data = NULL, geom = "text",
position = "identity", inherit.aes = TRUE, show.legend = NA,
na.rm = FALSE, ...) {
ggplot2::layer(stat = StatN, mapping = mapping, data = data, geom = geom,
position = position, inherit.aes = inherit.aes, show.legend = show.legend,
params = list(na.rm = na.rm, ...))
}
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_point() + stat_n()
I thought I had solved the problem by simply creating a wrapper function to ggplot:
ggstripchart <- function(data, x.name, y.name,
point.params = list(),
x.axis.params = list(labels = levels(x)),
y.axis.params = list(), ...) {
if(!is.factor(data[, x.name]))
data[, x.name] <- factor(data[, x.name])
x <- data[, x.name]
y <- data[, y.name]
params <- list(...)
point.params <- modifyList(params, point.params)
x.axis.params <- modifyList(params, x.axis.params)
y.axis.params <- modifyList(params, y.axis.params)
point <- do.call("geom_point", point.params)
stripchart.list <- list(
point,
theme(legend.position = "none")
)
n <- sapply(split(y, x), length)
x.axis.params$labels <- paste0(x.axis.params$labels, "\nN=", n)
x.axis <- do.call("scale_x_discrete", x.axis.params)
y.axis <- do.call("scale_y_continuous", y.axis.params)
stripchart.list <- c(stripchart.list, x.axis, y.axis)
ggplot(data = data, mapping = aes_string(x = x.name, y = y.name)) + stripchart.list
}
ggstripchart(mtcars, "cyl", "mpg")
However, this function does not work correctly with faceting. For example:
ggstripchart(mtcars, "cyl", "mpg") + facet_wrap(~am)
shows the sample sizes for both facets combined in each facet. I would have to build faceting into the wrapper function, which defeats the point of trying to use everything ggplot has to offer.
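The mismatch can be seen directly in the counts. A small base R sketch: the wrapper computes one count per level of cyl over all rows, whereas each facet would need the count within its own value of am.

```r
# Counts per cylinder level, pooled over all rows (what the wrapper computes)
n_pooled <- sapply(split(mtcars$mpg, mtcars$cyl), length)
n_pooled

# Counts per cylinder level within each value of am (what each facet needs)
n_by_facet <- table(am = mtcars$am, cyl = mtcars$cyl)
n_by_facet
```

The pooled counts are 11, 7 and 14, while the per-facet counts are 3, 4 and 12 for am = 0 and 8, 3 and 2 for am = 1, so a label computed once outside the facets cannot be right for both panels.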
If anyone has any insights to this problem I would be grateful. Thanks so much for your time!
I have updated the EnvStats package to include a stat called stat_n_text which will add the sample size (the number of unique y-values) below each unique x-value. See the help file for stat_n_text for more information and a list of examples. Below is a simple example:
library(ggplot2)
library(EnvStats)
p <- ggplot(mtcars,
aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
theme(legend.position = "none")
p + geom_point() +
stat_n_text() +
labs(x = "Number of Cylinders", y = "Miles per Gallon")
My solution might be a little simple but it works well.
Given an example with faceting by am, I start by creating labels using paste and \n.
mtcars2 <- mtcars %>%
group_by(cyl, am) %>% mutate(n = n()) %>%
mutate(label = paste0(cyl,'\nN = ',n))
I then use these labels instead of cyl in the ggplot code
ggplot(mtcars2,
aes(x = factor(label), y = mpg, color = factor(label))) +
geom_point() +
xlab('cyl') +
facet_wrap(~am, scales = 'free_x') +
theme(legend.position = "none")
To produce something like the figure below.
You can print the counts below the x-axis labels using geom_text if you turn off clipping, but you'll probably have to tweak the placement. I've included a "nudge" parameter for that in the code below. Also, the method below is intended for cases where all the facets (if any) are column facets.
I realize you ultimately want code that will work inside a new geom, but perhaps the examples below can be adapted for use in a geom.
library(ggplot2)
library(dplyr)
pgg = function(dat, x, y, facet=NULL, nudge=0.17) {
# Convert x-variable to a factor
dat[,x] = as.factor(dat[,x])
# Plot points
p = ggplot(dat, aes_string(x, y)) +
geom_point(position=position_jitter(w=0.3, h=0)) + theme_bw()
# Summarise data to get counts by x-variable and (if present) facet variables
grp = c(facet, x)
nn = dat %>% group_by(across(all_of(grp))) %>% tally()
# If there are facets, add them to the plot
if (!is.null(facet)) {
p = p + facet_grid(paste("~", paste(facet, collapse="+")))
}
# Add counts as text labels
p = p + geom_text(data=nn, aes(label=paste0("N = ", n)),
y=min(dat[,y]) - nudge*1.05*diff(range(dat[,y])),
colour="grey20", size=3.5) +
theme(axis.title.x=element_text(margin=unit(c(1.5,0,0,0),"lines")))
# Turn off clipping and return plot
p <- ggplot_gtable(ggplot_build(p))
p$layout$clip[p$layout$name=="panel"] <- "off"
grid::grid.draw(p)
}
pgg(mtcars, "cyl", "mpg")
pgg(mtcars, "cyl", "mpg", facet=c("am","vs"))
Another, potentially more flexible, option is to add the counts to the bottom of the plot panel. For example:
pgg = function(dat, x, y, facet_r=NULL, facet_c=NULL) {
# Convert x-variable to a factor
dat[,x] = as.factor(dat[,x])
# Plot points
p = ggplot(dat, aes_string(x, y)) +
geom_point(position=position_jitter(w=0.3, h=0)) + theme_bw()
# Summarise data to get counts by x-variable and (if present) facet variables
grp = c(facet_r, facet_c, x)
nn = dat %>% group_by(across(all_of(grp))) %>% tally()
# If there are facets, add them to the plot
if (!is.null(facet_r) | !is.null(facet_c)) {
facets = paste(ifelse(is.null(facet_r),".",facet_r), " ~ " ,
ifelse(is.null(facet_c),".",facet_c))
p = p + facet_grid(facets)
}
# Add counts as text labels
p + geom_text(data=nn, aes(label=paste0("N = ", n)),
y=min(dat[,y]) - 0.15*min(dat[,y]), colour="grey20", size=3) +
scale_y_continuous(limits=range(dat[,y]) + c(-0.1*min(dat[,y]), 0.01*max(dat[,y])))
}
pgg(mtcars, "cyl", "mpg")
pgg(mtcars, "cyl", "mpg", facet_c="am")
pgg(mtcars, "cyl", "mpg", facet_c="am", facet_r="vs")