I am trying to run a for loop over the levels of a factor (treatment in this case) to plot one graph per level using a function. My goal is to save several graphs to my working directory, one for each treatment level.
Problem: The outcome is always a single messed-up barplot with all the variables and error bars included.
The dataset looks something like this:
set.seed(108)
test <- data.frame(
n = 1:12,
treatment = factor(paste("trt", 1:2)),
rep= factor(paste("rep", 1:2)),
type = sample(LETTERS, 3),
mean= sample(1:100, 12),
sd= sample(1:50, 12),
var3 = sample(1:100, 12),
var4 = sample(1:100, 12))
I believe that I'm missing something on my for loop code:
df$treatment<- as.factor(df$treatment)
treatment_levels<- unique(levels(df$treatment))
for(i in 1:length(treatment_levels)){
df <- df[treatment_levels[i],]
x <- df$type
avg <- df$mean
sd <- df$sd
grp<- df$rep
title<- treatment_levels[i]
xtitle<- "type"
ytitle<- " "
fig_name <- paste(title,"_bp")
bpfunction(df, x, avg, sd, grp, title, xtitle, ytitle, fig_name)
}
my function to plot a barplot is:
bpfunction <- function(df, x, avg, sd, grp, title, xtitle, ytitle, fig_name)
{
bp <- ggplot(df, aes(x = x, y = avg, fill = grp)) +
geom_bar(stat = 'identity', aes(fill = grp), size = 1) +
geom_errorbar(aes(ymin=avg-sd , ymax=avg +sd))+
labs(x = x, y = avg, title = title)
ggsave(paste(fig_name, "png", sep = "."), plot = bp)
}
There are several issues with your implementation:
Your row index is a factor level (a character string) rather than a logical vector, so R looks for a row with that name instead of filtering by treatment:
df <- df[treatment_levels[i],]
Filter on the column instead:
sub_df <- df[df$treatment == treatment_levels[i],]
Better yet, avoid the bookkeeping of unique treatment levels and use by (an object-oriented wrapper for tapply) or split + lapply.
You reassign df inside the for loop. After the first iteration, df no longer holds the rows for the other treatment values, so use a different object name. (In fact, avoid df altogether in favor of a more descriptive name.)
Your labs call receives whole data vectors (x and avg) as axis labels, where single strings are expected.
Avoid passing vectors that point to data frame columns into aes. Instead, pass the column names as strings and look them up with .data[[ ]] (ggplot2 v3.0.0+). The double curly brace {{ }} is meant for column names passed unquoted, not for strings, so it is not used here:
bpfunction <- function(treatment_df, x, avg, sd, grp, title, xtitle, ytitle, fig_name)
{
bp <- ggplot(treatment_df, aes(x = .data[[x]], y = .data[[avg]], fill = .data[[grp]])) +
geom_bar(stat = 'identity', aes(fill = .data[[grp]]), size = 1) +
geom_errorbar(aes(ymin = .data[[avg]] - .data[[sd]], ymax = .data[[avg]] + .data[[sd]])) +
labs(x = xtitle, y = ytitle, title = title)
ggsave(fig_name, plot = bp)
return(bp)
}
# REFACTOR USING by
treatment_plots <- by(df, df$treatment, function(sub_df)
bpfunction(
sub_df,
x = "type",
avg = "mean",
sd = "sd",
grp = "rep",
title = as.character(sub_df$treatment[1]),
xtitle = "type",
ytitle = " ",
fig_name = paste0(sub_df$treatment[1], "_bp.png")
)
)
# REFACTOR USING split + lapply
treatment_plots <- split(df, df$treatment) |> lapply(
function(sub_df) bpfunction(...same as above...)
)
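The difference between the two row-indexing styles, and what split hands to lapply, can be checked without any plotting. This is a minimal base R sketch on a toy data frame whose column names mirror the question; the values are made up for illustration.

```r
# Toy data with the same column names as the question (values are illustrative)
test <- data.frame(
  treatment = factor(rep(c("trt 1", "trt 2"), times = 3)),
  mean = c(10, 20, 30, 40, 50, 60)
)

# A character row index is matched against row names ("1", "2", ...),
# so nothing matches and R returns a single row of NAs
bad <- test["trt 1", ]

# A logical row index keeps exactly the rows of that treatment
good <- test[test$treatment == "trt 1", ]

# split() returns one data frame per factor level, ready for lapply()
pieces <- split(test, test$treatment)
sizes <- vapply(pieces, nrow, integer(1))
```

Because split names each piece after its factor level, those names can also be reused when building file names.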
I'm trying to add multiple lines to a saved ggplot object. The coordinates for the lines are stored in a list of data frames, one data frame for each individual plot. I have successfully created multiple plots using lapply; however, the code fails when calling geom_segment. Example data and code are below.
library(ggplot2)
library(tidyverse)
data(iris)
#Dataframes
m.slen <- iris[,c(1,5)]
m.swid <- iris[,c(2,5)]
m.plen <- iris[,c(3,5)]
m.pwid <- iris[,c(4,5)]
#List of dataframes
m.list = list(m.slen = m.slen,
m.swid = m.swid,
m.plen = m.plen,
m.pwid = m.pwid)
#Setting col names
m.list <- lapply(m.list, setNames, nm = c("data", "species"))
#Creating list of data frames with coordinates for geom_segment
meanV = lapply(m.list, function(x) mean(x$data, na.rm = TRUE))
coordy1 = lapply(m.list, function(x) x %>%
group_by(species) %>%
summarise(max = max(data, na.rm=TRUE)) %>%
pull(max) + 2)
#Table with dynamic values
line.plot <- list()
for(i in 1:4) {
line.plot[[i]] <-
tibble(x1 = meanV[[i]],
x2 = meanV[[i]]+1,
y1 = coordy1[[i]][1],
y2 = coordy1[[i]][1])
}
#Creating first set of plots, using first list of DFs
plots <- lapply(m.list,function(x)
p <- ggplot(x, aes( x= data, fill = species)) +
geom_histogram(stat = "count") +
ggtitle(names(m.list)))
print(plots)
#Adding segments using second list of DFs
final_plots <- lapply(plots,function(x)
plots + geom_segment(data = line.plot,
aes(x = x1, y = y1, xend = x2, yend = y2)))
Everything works until the last step, where I get the following error:
Error in fortify(): ! data must be a data frame, or other
object coercible by fortify(), not a list
Any input or advice is welcome. Thanks
The issue is that line.plot is a list. To achieve your desired result you could use purrr::map2 to loop over both your list of plots and the list of dataframes for the segments:
Note: I also added inherit.aes = FALSE to geom_segment because otherwise you will get an error, too.
final_plots <- purrr::map2(plots, line.plot, function(x, y) {
x + geom_segment(
data = y,
aes(x = x1, y = y1, xend = x2, yend = y2), inherit.aes = FALSE
)
})
final_plots[[1]]
EDIT: Using base R, you could achieve the same result with mapply:
final_plots <- mapply(function(x, y) {
x + geom_segment(
data = y,
aes(x = x1, y = y1, xend = x2, yend = y2), inherit.aes = FALSE
)
}, x = plots, y = line.plot, SIMPLIFY = FALSE)
or, thanks to the comment by @Parfait, using Map:
final_plots <- Map(function(x, y) {
x + geom_segment(
data = y,
aes(x = x1, y = y1, xend = x2, yend = y2), inherit.aes = FALSE
)
}, x = plots, y = line.plot)
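How Map pairs the two lists can be seen in a small base R sketch. The character strings below are hypothetical stand-ins for the ggplot objects, so the snippet runs without ggplot2; only the pairing logic is illustrated.

```r
# Stand-ins for the named list of plots (hypothetical placeholders, not ggplot objects)
plots_stub <- list(m.slen = "plot A", m.swid = "plot B")

# One data frame of segment coordinates per plot, as in line.plot
segments <- list(
  data.frame(x1 = 1, x2 = 2),
  data.frame(x1 = 3, x2 = 4)
)

# The whole list is not a data frame, which is what the fortify() error reports;
# each element is one
whole_is_df <- is.data.frame(segments)
element_is_df <- is.data.frame(segments[[1]])

# Map() walks both lists in parallel by position and keeps the names of the first
paired <- Map(
  function(p, seg) paste(p, "gets the segment starting at", seg$x1),
  plots_stub, segments
)
```

Since the pairing is positional, the two lists need to be in the same order.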
I have
a data.frame df
df = data.frame(year=c(2018,2019,2020), value1=rnorm(3,1,0.5), value2=rnorm(3,2,0.5))
a ggplot-function called ScatterPlot (function code see below)
a for loop that I want to use to run the ggplot-function over my df
My intent is to plot (scatter) value1 over years and value2 over years somewhat automatically (using scatterplot function and my for loop).
For some reason, the for loop below only generates one plot (the last one in my df). Can someone tell me what I am missing?
for loop:
# Create the loop.vector (all the columns)
loop.vector <- ncol(df)-1
for (i in loop.vector) { # Loop over loop.vector
# store data in column.i as x
x <- df[i]
x = unlist(x) #necessary. otherwise ggplot will generate an error
plotname = colnames(df[i])
#plot
jpeg(filename=paste0("/R-Outputs/plots/",plotname,".jpeg"))
plot= ScatterPlot(df,df$year,"year", x, plotname)
print(plot)
dev.off()
}
Scatterplot Function (this works):
ScatterPlot <- function(df, x, x_var_label,y, y_var_label) {
# Input:
# df: a data frame
# x: a column from df in the form of a character vector
# y: a column from df in the form of a character vector
#
# Output:
# a ggplot2 plot
require(ggplot2)
x_title = x_var_label
y_title = y_var_label
time_labels = c("2018", "2019", "2020")
ggplot(data = df, aes(x = x, y = y)) +
geom_point(col="#69b3a2",fill="#69b3a2",alpha=0.5, size = 0) +
geom_line()+
geom_smooth(method = "lm", se = FALSE, size = 0.8, col="red") +
xlab(label = x_title) +
ylab(label = y_title) +
theme_bw()+
theme(axis.text.x=element_text(angle=45, hjust = 1))+
labs(title = paste0(y_title," over time"))+
scale_x_continuous("year", labels = as.character(time_labels),
breaks = as.integer((time_labels)))
}
You don't need to pass both the values and the column name. Pass only the column names to ScatterPlot and look them up inside the function with .data[[ ]].
library(ggplot2)
ScatterPlot <- function(df, x_var_label,y_var_label) {
# Input:
# df: a data frame
# x_var_label: name of the x column in df, as a string
# y_var_label: name of the y column in df, as a string
#
# Output:
# a ggplot2 plot
time_labels = c("2018", "2019", "2020")
ggplot(data = df, aes(x = .data[[x_var_label]], y = .data[[y_var_label]])) +
geom_point(col="#69b3a2",fill="#69b3a2",alpha=0.5, size = 0) +
geom_line()+
geom_smooth(method = "lm", se = FALSE, size = 0.8, col="red") +
xlab(label = x_var_label) +
ylab(label = y_var_label) +
theme_bw()+
theme(axis.text.x=element_text(angle=45, hjust = 1))+
labs(title = paste0(y_var_label," over time"))+
scale_x_continuous("year", labels = time_labels,
breaks = as.integer(time_labels))
}
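To call this function in a loop, iterate over the column names. Your original loop ran only once because loop.vector <- ncol(df)-1 is a single number rather than a sequence.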
#column names to loop over
loop.vector <- names(df[-1])
plot <- vector('list', length(loop.vector))
for (i in seq_along(loop.vector)) { # Loop over loop.vector
jpeg(filename=paste0("/R-Outputs/plots/",loop.vector[i],".jpeg"))
plot[[i]] = ScatterPlot(df,"year", loop.vector[i])
print(plot[[i]])
dev.off()
}
We are also saving individual plots in a list which you can verify with plot[[1]], plot[[2]] etc.
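The number of iterations each loop performs can be verified without plotting. This is a minimal base R sketch using fixed illustrative values in place of the random ones from the question.

```r
# Same shape as the question's data frame, with fixed values instead of rnorm()
df <- data.frame(
  year = c(2018, 2019, 2020),
  value1 = c(1.1, 0.9, 1.3),
  value2 = c(2.2, 1.8, 2.1)
)

# ncol(df) - 1 is the single number 2, so the loop body runs exactly once
loop.vector <- ncol(df) - 1
visited <- c()
for (i in loop.vector) visited <- c(visited, i)

# Looping over the column names visits every value column
cols <- names(df)[-1]
visited_names <- character(0)
for (nm in cols) visited_names <- c(visited_names, nm)
```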
This question is related to "Create custom geom to compute summary statistics and display them *outside* the plotting region".
(NOTE: All functions have been simplified; no error checks for correct objects types, NAs, etc.)
In base R, it is quite easy to create a function that produces a stripchart with the sample size indicated below each level of the grouping variable: you can add the sample size information using the mtext() function:
stripchart_w_n_ver1 <- function(data, x.var, y.var) {
x <- factor(data[, x.var])
y <- data[, y.var]
# Need to call plot.default() instead of plot because
# plot() produces boxplots when x is a factor.
plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
levels.x <- levels(x)
x.ticks <- 1:length(levels(x))
axis(1, at = x.ticks, labels = levels.x)
n <- sapply(split(y, x), length)
mtext(paste0("N=", n), side = 1, line = 2, at = x.ticks)
}
stripchart_w_n_ver1(mtcars, "cyl", "mpg")
or you can add the sample size information to the x-axis tick labels using the axis() function:
stripchart_w_n_ver2 <- function(data, x.var, y.var) {
x <- factor(data[, x.var])
y <- data[, y.var]
# Need to set the second element of mgp to 1.5
# to allow room for two lines for the x-axis tick labels.
o.par <- par(mgp = c(3, 1.5, 0))
on.exit(par(o.par))
# Need to call plot.default() instead of plot because
# plot() produces boxplots when x is a factor.
plot.default(x, y, xaxt = "n", xlab = x.var, ylab = y.var)
n <- sapply(split(y, x), length)
levels.x <- levels(x)
axis(1, at = 1:length(levels.x), labels = paste0(levels.x, "\nN=", n))
}
stripchart_w_n_ver2(mtcars, "cyl", "mpg")
While this is a very easy task in base R, it is maddeningly complex in ggplot2 because it is very hard to get at the data being used to generate the plot, and while there are functions equivalent to axis() (e.g., scale_x_discrete, etc.), there is no equivalent to mtext() that lets you easily place text at specified coordinates within the margins.
I tried using the built in stat_summary() function to compute the sample sizes (i.e., fun.y = "length") and then place that information on the x-axis tick labels, but as far as I can tell, you can't extract the sample sizes and then somehow add them to the x-axis tick labels using the function scale_x_discrete(), you have to tell stat_summary() what geom you want it to use. You could set geom="text", but then you have to supply the labels, and the point is that the labels should be the values of the sample sizes, which is what stat_summary() is computing but which you can't get at (and you would also have to specify where you want the text to be placed, and again, it is difficult to figure out where to place it so that it lies directly underneath the x-axis tick labels).
The vignette "Extending ggplot2" (http://docs.ggplot2.org/dev/vignettes/extending-ggplot2.html) shows you how to create your own stat function that allows you to get directly at the data, but the problem is that you always have to define a geom to go with your stat function (i.e., ggplot thinks you want to plot this information within the plot, not in the margins); as far as I can tell, you can't take the information you compute in your custom stat function, not plot anything in the plot area, and instead pass the information to a scales function like scale_x_discrete(). Here was my try at doing it this way; the best I could do was place the sample size information at the minimum value of y for each group:
StatN <- ggproto("StatN", Stat,
required_aes = c("x", "y"),
compute_group = function(data, scales) {
y <- data$y
y <- y[!is.na(y)]
n <- length(y)
data.frame(x = data$x[1], y = min(y), label = paste0("n=", n))
}
)
stat_n <- function(mapping = NULL, data = NULL, geom = "text",
position = "identity", inherit.aes = TRUE, show.legend = NA,
na.rm = FALSE, ...) {
ggplot2::layer(stat = StatN, mapping = mapping, data = data, geom = geom,
position = position, inherit.aes = inherit.aes, show.legend = show.legend,
params = list(na.rm = na.rm, ...))
}
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_point() + stat_n()
I thought I had solved the problem by simply creating a wrapper function to ggplot:
ggstripchart <- function(data, x.name, y.name,
point.params = list(),
x.axis.params = list(labels = levels(x)),
y.axis.params = list(), ...) {
if(!is.factor(data[, x.name]))
data[, x.name] <- factor(data[, x.name])
x <- data[, x.name]
y <- data[, y.name]
params <- list(...)
point.params <- modifyList(params, point.params)
x.axis.params <- modifyList(params, x.axis.params)
y.axis.params <- modifyList(params, y.axis.params)
point <- do.call("geom_point", point.params)
stripchart.list <- list(
point,
theme(legend.position = "none")
)
n <- sapply(split(y, x), length)
x.axis.params$labels <- paste0(x.axis.params$labels, "\nN=", n)
x.axis <- do.call("scale_x_discrete", x.axis.params)
y.axis <- do.call("scale_y_continuous", y.axis.params)
stripchart.list <- c(stripchart.list, x.axis, y.axis)
ggplot(data = data, mapping = aes_string(x = x.name, y = y.name)) + stripchart.list
}
ggstripchart(mtcars, "cyl", "mpg")
However, this function does not work correctly with faceting. For example:
ggstripchart(mtcars, "cyl", "mpg") + facet_wrap(~am)
shows the sample sizes for both facets combined in each facet. I would have to build faceting into the wrapper function, which defeats the point of trying to use everything ggplot has to offer.
If anyone has any insights to this problem I would be grateful. Thanks so much for your time!
I have updated the EnvStats package to include a stat called stat_n_text which will add the sample size (the number of unique y-values) below each unique x-value. See the help file for stat_n_text for more information and a list of examples. Below is a simple example:
library(ggplot2)
library(EnvStats)
p <- ggplot(mtcars,
aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
theme(legend.position = "none")
p + geom_point() +
stat_n_text() +
labs(x = "Number of Cylinders", y = "Miles per Gallon")
My solution might be a little simple but it works well.
Given an example with faceting by am, I start by creating labels using paste and \n.
library(dplyr)
mtcars2 <- mtcars %>%
group_by(cyl, am) %>% mutate(n = n()) %>%
mutate(label = paste0(cyl,'\nN = ',n))
I then use these labels instead of cyl in the ggplot code
ggplot(mtcars2,
aes(x = factor(label), y = mpg, color = factor(label))) +
geom_point() +
xlab('cyl') +
facet_wrap(~am, scales = 'free_x') +
theme(legend.position = "none")
To produce something like the figure below.
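The same per-group labels can also be built without dplyr. The sketch below swaps in base R's ave() for group_by + mutate; that substitution is an assumption made for illustration, not part of the answer above.

```r
# ave() returns, for every row, the result of FUN applied to that row's group;
# with FUN = length this is the number of cars sharing the same cyl and am
n_per_group <- ave(mtcars$mpg, mtcars$cyl, mtcars$am, FUN = length)

# Two-line tick labels: the cyl value on top, the group size underneath
label <- paste0(mtcars$cyl, "\nN = ", n_per_group)

mtcars2 <- cbind(mtcars, n = n_per_group, label = label)
```

The resulting label column can be mapped to x in the same way as in the ggplot call above.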
You can print the counts below the x-axis labels using geom_text if you turn off clipping, but you'll probably have to tweak the placement. I've included a "nudge" parameter for that in the code below. Also, the method below is intended for cases where all the facets (if any) are column facets.
I realize you ultimately want code that will work inside a new geom, but perhaps the examples below can be adapted for use in a geom.
library(ggplot2)
library(dplyr)
pgg = function(dat, x, y, facet=NULL, nudge=0.17) {
# Convert x-variable to a factor
dat[,x] = as.factor(dat[,x])
# Plot points
p = ggplot(dat, aes_string(x, y)) +
geom_point(position=position_jitter(w=0.3, h=0)) + theme_bw()
# Summarise data to get counts by x-variable and (if present) facet variables
dots = lapply(c(facet, x), as.symbol)
nn = dat %>% group_by_(.dots=dots) %>% tally
# If there are facets, add them to the plot
if (!is.null(facet)) {
p = p + facet_grid(paste("~", paste(facet, collapse="+")))
}
# Add counts as text labels
p = p + geom_text(data=nn, aes(label=paste0("N = ", nn$n)),
y=min(dat[,y]) - nudge*1.05*diff(range(dat[,y])),
colour="grey20", size=3.5) +
theme(axis.title.x=element_text(margin=unit(c(1.5,0,0,0),"lines")))
# Turn off clipping and return plot
p <- ggplot_gtable(ggplot_build(p))
p$layout$clip[p$layout$name=="panel"] <- "off"
grid::grid.draw(p)
}
pgg(mtcars, "cyl", "mpg")
pgg(mtcars, "cyl", "mpg", facet=c("am","vs"))
Another, potentially more flexible, option is to add the counts to the bottom of the plot panel. For example:
pgg = function(dat, x, y, facet_r=NULL, facet_c=NULL) {
# Convert x-variable to a factor
dat[,x] = as.factor(dat[,x])
# Plot points
p = ggplot(dat, aes_string(x, y)) +
geom_point(position=position_jitter(w=0.3, h=0)) + theme_bw()
# Summarise data to get counts by x-variable and (if present) facet variables
dots = lapply(c(facet_r, facet_c, x), as.symbol)
nn = dat %>% group_by_(.dots=dots) %>% tally
# If there are facets, add them to the plot
if (!is.null(facet_r) | !is.null(facet_c)) {
facets = paste(ifelse(is.null(facet_r),".",facet_r), " ~ " ,
ifelse(is.null(facet_c),".",facet_c))
p = p + facet_grid(facets)
}
# Add counts as text labels
p + geom_text(data=nn, aes(label=paste0("N = ", nn$n)),
y=min(dat[,y]) - 0.15*min(dat[,y]), colour="grey20", size=3) +
scale_y_continuous(limits=range(dat[,y]) + c(-0.1*min(dat[,y]), 0.01*max(dat[,y])))
}
pgg(mtcars, "cyl", "mpg")
pgg(mtcars, "cyl", "mpg", facet_c="am")
pgg(mtcars, "cyl", "mpg", facet_c="am", facet_r="vs")
I am writing a function that plots a heat map for users. In the following example, it plots the change in grade over time for each gender.
However, this is a special case. The "Gender" column may have another name, such as "Class".
I want to let users supply their own column name and have ggplot label each axis correctly.
How do I modify my function "heatmap()" to do this?
sampledata <- matrix(c(1:60,1:60,rep(0:1,each=60),sample(1:3,120,replace = T)),ncol=3)
colnames(sampledata) <- c("Time","Gender","Grade")
sampledata <- data.frame(sampledata)
heatmap=function(sampledata,Gender)
{
sampledata$Time <- factor(sampledata$Time)
sampledata$Grade <- factor(sampledata$Grade)
sampledata$Gender <- factor(sampledata$Gender)
color_palette <- colorRampPalette(c("#31a354","#2c7fb8", "#fcbfb8","#f03b20"))(length((levels(factor(sampledata$Grade)))))
ggplot(data = sampledata) + geom_tile( aes(x = Time, y = Gender, fill = Grade))+scale_x_discrete(breaks = c("10","20","30","40","50"))+scale_fill_manual(values =color_palette,labels=c("0-1","1-2","2-3","3-4","4-5","5-6",">6"))+ theme_bw()+scale_y_discrete(labels=c("Female","Male"))
}
The easiest solution is to redefine the function using aes_string (note that aes_string has been soft-deprecated since ggplot2 3.0.0). When you call the function, pass the name of the column you want to use as a string.
heatmap=function(sampledata,y)
{
sampledata$Time <- factor(sampledata$Time)
sampledata$Grade <- factor(sampledata$Grade)
sampledata$new_var <- factor(sampledata[,y])
color_palette <- colorRampPalette(c("#31a354","#2c7fb8", "#fcbfb8","#f03b20"))(length((levels(factor(sampledata$Grade)))))
ggplot(data = sampledata) + geom_tile( aes_string(x = "Time", y = "new_var", fill = "Grade"))+scale_x_discrete(breaks = c("10","20","30","40","50"))+scale_fill_manual(values =color_palette,labels=c("0-1","1-2","2-3","3-4","4-5","5-6",">6"))+ theme_bw()+scale_y_discrete(labels=c("Female","Male")) + ylab(y)
}
# Below an example of how you call the newly defined function
heatmap(sampledata, "Gender")
Alternatively if you want to retain the quote free syntax, there is a slightly more complex solution:
heatmap=function(sampledata,y)
{
arguments <- as.list(match.call())
axis_label <- deparse(substitute(y))
y = eval(arguments$y, sampledata)
sampledata$Time <- factor(sampledata$Time)
sampledata$Grade <- factor(sampledata$Grade)
sampledata$y <- factor(y)
color_palette <- colorRampPalette(c("#31a354","#2c7fb8", "#fcbfb8","#f03b20"))(length((levels(factor(sampledata$Grade)))))
ggplot(data = sampledata) + geom_tile( aes(x = Time, y = y, fill = Grade))+scale_x_discrete(breaks = c("10","20","30","40","50"))+scale_fill_manual(values =color_palette,labels=c("0-1","1-2","2-3","3-4","4-5","5-6",">6"))+ theme_bw()+scale_y_discrete(labels=c("Female","Male")) + ylab(axis_label)
}
# Below an example of how you call the newly defined function
heatmap(sampledata, Gender)
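For newer ggplot2 versions, the string-based variant can be written with the .data pronoun instead of aes_string. This is a minimal sketch under a few assumptions: the function name heatmap_by is hypothetical (chosen to avoid masking stats::heatmap), the sample data is rebuilt as a plain data frame, and the manual fill palette and y-axis labels from the original are left out for brevity.

```r
library(ggplot2)

# Sample data with the same columns as in the question
set.seed(1)
sampledata <- data.frame(
  Time = rep(1:60, times = 2),
  Gender = rep(0:1, each = 60),
  Grade = sample(1:3, 120, replace = TRUE)
)

# heatmap_by is a hypothetical name; y is the column name as a string
heatmap_by <- function(data, y) {
  data$Time <- factor(data$Time)
  data$Grade <- factor(data$Grade)
  data[[y]] <- factor(data[[y]])
  ggplot(data) +
    # .data[[y]] looks up the column whose name is stored in y
    geom_tile(aes(x = Time, y = .data[[y]], fill = Grade)) +
    scale_x_discrete(breaks = c("10", "20", "30", "40", "50")) +
    theme_bw() +
    ylab(y)
}

p <- heatmap_by(sampledata, "Gender")
```

A different column, for example one named "Class", would be passed the same way, and the y-axis title follows the string automatically.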