Combining two ggplot objects from different function calls? - r

I am currently trying to implement a graphing library where I need a bit more flexibility than ggplot currently provides, and I would like to take a functional programming approach.
Currently, I have a barchart which is defined as
make_bar <- function(data, x, n_cols)
{
#Data: Dataframe or tibble
#x: Factor singular column
#output: ggplot object
n_colors = nrow(distinct(data[x]))
if (n_colors != length(n_cols)) {
difference <- abs(n_colors - length(colors))
colors <- head(colors, difference)
}
plot <- ggplot(data, aes(x = .data[[x]],
tooltip = .data[[x]],
data_id = .data[[x]])) +
geom_bar_interactive(fill=custom_colour_palette(colors))
}
This very nicely returns a bar chart. Now I want to write a function called "add_line" that can be applied to the bar chart if one wishes to do so. The line function as it stands right now is:
add_line <- function(data, x) {
data %>%
count(.data[[x]]) %>%
ggplot(aes(.data[[x]], n)) +
geom_line(group=1)
}
So now I have two ggplot objects. Is there an easy, or best-practice, way to combine two such objects into one plot with the line overlaid on the bar chart?
Code for reproducibility can be called with:
data <- mpg
h <- add_line(data, 'manufacturer')
x <- make_bar(data, 'manufacturer', 15)
# x + h ? does not work and shouldn't but such a functionality would be nice

Adding to what @MrFlick has said, here's how you return a geom object in add_line that can be added onto the base bar chart:
add_line <- function(data, x) {
geom_line(
aes_string(x = x, y = "n"),
data = count(data, .data[[x]]),
group = 1
)
}
Then the following should work:
x <- make_bar(mpg, "manufacturer", 15)
h <- add_line(mpg, "manufacturer")
x + h
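aes_string() allows using character strings rather than expressions, which is really useful for dynamic column choices. Note that aes_string() is deprecated in recent ggplot2 releases; aes() with the .data[[x]] pronoun, as already used in make_bar, does the same job.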
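For readers on a recent ggplot2, the same idea can be sketched with aes() and the .data pronoun instead of aes_string(). This is a minimal sketch rather than the answerer's code: it assumes a plain geom_bar() base (the name base is made up here) in place of make_bar(), so that it runs without ggiraph or the question's custom_colour_palette().

```r
library(ggplot2)
library(dplyr)

# Returns a layer (not a full plot), so it can be added to an existing plot with `+`.
add_line <- function(data, x) {
  geom_line(
    aes(x = .data[[x]], y = .data$n, group = 1),
    data = count(data, .data[[x]])  # one row per level of `x`, with the count in `n`
  )
}

# Assumption: a simplified stand-in for make_bar() from the question.
base <- ggplot(mpg, aes(x = .data[["manufacturer"]])) +
  geom_bar()

p <- base + add_line(mpg, "manufacturer")
```

Because add_line() returns a layer rather than a plot, it composes with any base plot that shares the same x column.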

Related

How to combine jobs to avoid nested lapply

I have a data frame on which I would like to perform multiple operations. Here is an example that creates a list of plots:
library(tidyverse)
plot_fun = function(data, geom) {
plot = ggplot(data, aes(x = factor(0), y = Sepal.Length))
if (geom == 'bar') {
plot = plot + geom_col()
} else if (geom == 'box') {
plot = plot + geom_boxplot()
}
plot +
labs(x = unique(data$Species)) +
theme_bw() +
theme(axis.text.x = element_blank())
}
As you can see, this function takes a data frame and draws one of two plot types depending on the geom parameter.
In my real problem, I have to split the data frame by one or more factors and then do the job. Never mind the specifics of this example (I know I can put iris$Species on the x-axis).
iris_ls = split(iris, iris$Species)
geom_ls = c('bar', 'box')
lapply(geom_ls, function(g) {
lapply(iris_ls, function(x) {
plot_fun(x, g)
})
})
My problem is that if I want to create both types of plots, I have to write a nested lapply (which performs badly when parallelizing).
So my question is: how should I avoid the nested lapply?
Should I multiply the length of iris_ls by the length of the geom_ls vector?
I do not know how to approach this. Imagine I have multiple geom-like parameters in my function.
PS: Using drop = TRUE in split() does not drop unused factor levels within each element of the list. I do not know if that is the correct way to do it; I have to use another lapply to drop them.
Use the purrr package (note that purrr::cross() is deprecated as of purrr 1.0.0 in favour of tidyr::expand_grid()):
cross_ls <- purrr::cross(list(iris = split(iris, iris$Species),
geom = list('bar', 'box')))
cross_ls %>% purrr::map(~{plot_fun(.x$iris,.x$geom)})
or in its parallel version:
library(furrr)
# multiprocess is deprecated in the future package; multisession is the portable replacement
plan(multisession)
cross_ls %>% furrr::future_map(~{plot_fun(.x$iris,.x$geom)})
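The same flattening can also be sketched in base R, without purrr::cross(). The sketch below is an assumption-laden alternative, not the answerer's code: it uses a simplified copy of plot_fun, and the names grid and plots are made up here. expand.grid() builds every (species, geom) combination, and a single Map() call then iterates over the rows.

```r
library(ggplot2)

# Simplified version of plot_fun from the question.
plot_fun <- function(data, geom) {
  plot <- ggplot(data, aes(x = factor(0), y = Sepal.Length))
  plot <- if (geom == "bar") plot + geom_col() else plot + geom_boxplot()
  plot + labs(x = as.character(unique(data$Species))) + theme_bw()
}

iris_ls <- split(iris, iris$Species)
geom_ls <- c("bar", "box")

# One row per combination; the first column varies fastest.
grid <- expand.grid(species = names(iris_ls), geom = geom_ls,
                    stringsAsFactors = FALSE)

# A single, flat iteration over all combinations.
plots <- Map(function(s, g) plot_fun(iris_ls[[s]], g), grid$species, grid$geom)
names(plots) <- paste(grid$species, grid$geom, sep = "_")
```

Adding another geom-like parameter only means adding another column to expand.grid() and another argument to the Map() function, and the flat list is also what a parallel map expects as input.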

ggplot add horizontal line inside for loop [duplicate]

Summary: When I use a "for" loop to add layers to a violin plot (in ggplot), the only layer that is added is the one created by the final loop iteration. Yet in explicit code that mimics the code that the loop would produce, all the layers are added.
Details: I am trying to create violin graphs with overlapping layers, to show the extent to which estimate distributions do or do not overlap for several survey question responses, stratified by place. I want to be able to include any number of places, so I have one column in my dataframe for each place, and am trying to use a "for" loop to generate one ggplot layer per place. But the loop only adds the layer from the loop's final iteration.
This code illustrates the problem, and some suggested approaches that failed:
library(ggplot2)
# Create a dataframe with 500 random normal values for responses to 3 survey questions from two cities
topic <- c("Poverty %","Mean Age","% Smokers")
place <- c("Chicago","Miami")
n <- 500
mean <- c(35, 40,58, 50, 25,20)
var <- c( 7, 1.5, 3, .25, .5, 1)
df <- data.frame( topic=rep(topic,rep(n,length(topic)))
,c(rnorm(n,mean[1],var[1]),rnorm(n,mean[3],var[3]),rnorm(n,mean[5],var[5]))
,c(rnorm(n,mean[2],var[2]),rnorm(n,mean[4],var[4]),rnorm(n,mean[6],var[6]))
)
names(df)[2:dim(df)[2]] <- place # Name those last two columns with the corresponding place name.
head(df)
# This "for" loop seems to only execute the final loop (i.e., where p=3)
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
g <- g + geom_violin(aes(y = df[,p], colour = place[p-1]), alpha = 0.3)
}
g
# But mimicking what the for loop does in explicit code works fine, resulting in both "place"s being displayed in the graph.
g <- ggplot(df, aes(factor(topic), df[,2]))
g <- g + geom_violin(aes(y = df[,2], colour = place[2-1]), alpha = 0.3)
g <- g + geom_violin(aes(y = df[,3], colour = place[3-1]), alpha = 0.3)
g
## per http://stackoverflow.com/questions/18444620/set-layers-in-ggplot2-via-loop , I tried
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
df1 <- df[,c(1,p)]
g <- g + geom_violin(aes(y = df1[,2], colour = place[p-1]), alpha = 0.3)
}
g
# but got the same undesired result
# per http://stackoverflow.com/questions/15987367/how-to-add-layers-in-ggplot-using-a-for-loop , I tried
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in names(df)[-1]) {
cat(p,"\n")
g <- g + geom_violin(aes_string(y = p, colour = p), alpha = 0.3) # produced this error: Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
# g <- g + geom_violin(aes_string(y = p ), alpha = 0.3) # produced this error: Error: stat_ydensity requires the following missing aesthetics: y
}
g
# but that failed to produce any graphic, per the errors noted in the "for" loop above
This happens because of ggplot's "lazy evaluation". It is a common problem when ggplot is used this way (making the layers separately in a loop, rather than having ggplot do it for you, as in @hrbrmstr's solution).
ggplot stores the arguments to aes(...) as expressions, and only evaluates them when the plot is rendered. So, in your loops, something like
aes(y = df[,p], colour = place[p-1])
gets stored as is, and evaluated when you render the plot, after the loop completes. At this point, p=3 so all the plots are rendered with p=3.
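The effect can be reproduced without ggplot2. The base R sketch below is only an analogy: quote() stands in for the way aes() stores an unevaluated expression, bquote() stands in for forcing the value at capture time, and all variable names are made up for illustration.

```r
env <- environment()

lazy   <- vector("list", 3)
forced <- vector("list", 3)

for (p in 1:3) {
  lazy[[p]]   <- quote(p * 10)       # stores the expression `p * 10`, not its value
  forced[[p]] <- bquote(.(p) * 10)   # substitutes the current value of p right away
}

# Evaluation happens only now, after the loop, when p is 3.
lazy_values   <- vapply(lazy,   function(e) eval(e, envir = env), numeric(1))
forced_values <- vapply(forced, function(e) eval(e, envir = env), numeric(1))

lazy_values    # 30 30 30
forced_values  # 10 20 30
```

Every lazily stored expression sees the final value of p, which is why all violin layers end up drawn from the last column.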
So the "right" way to do this is to use melt(...) in the reshape2 package to convert your data from wide to long format, and let ggplot manage the layers for you. I put "right" in quotes because in this particular case there is a subtlety. When calculating the distributions for the violins using the melted data frame, ggplot uses the grand total (for both Chicago and Miami) as the scale. If you want violins based on frequency scaled individually, you need to use loops (sadly).
The way around the lazy evaluation problem is to put any reference to the loop index in the data=... definition. This is not stored as an expression; the actual data is stored in the plot definition. So you could do this:
g <- ggplot(df,aes(x=topic))
for (p in 2:length(df)) {
gg.data <- data.frame(topic=df$topic,value=df[,p],city=names(df)[p])
g <- g + geom_violin(data=gg.data,aes(y=value, color=city))
}
g
which gives the same result as yours. Note that the index p does not show up in aes(...).
Update: A note about scale="width" (mentioned in a comment). This causes all the violins to have the same width (see below), which is not the same scaling as in OP's original code. IMO this is not a great way to visualize the data, as it suggests there is much more data in the Chicago group.
ggplot(gg) +geom_violin(aes(x=topic,y=value,color=variable),
alpha=0.3,position="identity",scale="width")
You can do it w/o a loop:
df.2 <- melt(df)
gg <- ggplot(df.2, aes(x=topic, y=value))
gg <- gg + geom_violin(position="identity", aes(color=variable), alpha=0.3)
gg
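For readers who prefer tidyr over reshape2, the reshaping step can be sketched with pivot_longer(). This is an assumption on my part rather than the answerer's code: it presumes tidyr is installed, rebuilds a smaller version of the question's data frame (n = 50), and the names df_long, place and value are made up here.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
n <- 50
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age", "% Smokers"), each = n),
  Chicago = c(rnorm(n, 35, 7),   rnorm(n, 58, 3),    rnorm(n, 25, 0.5)),
  Miami   = c(rnorm(n, 40, 1.5), rnorm(n, 50, 0.25), rnorm(n, 20, 1))
)

# Wide to long: one row per (topic, place) observation.
df_long <- pivot_longer(df, cols = -topic, names_to = "place", values_to = "value")

gg <- ggplot(df_long, aes(x = topic, y = value, colour = place)) +
  geom_violin(position = "identity", alpha = 0.3)
```

With the data in long format, adding a third place is just another column in df; no plotting code changes.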
You can use aes_() rather than aes(), which appears to stop the lazy evaluation. Answer found on a closed question that links here (Update a ggplot using a for loop (R)), but thought it should be here since the other question was closed.
While reshaping the data is generally preferred, with newer versions of ggplot2 (3.0.0 and later) you can use !! to inject values into aes(). For example, you can do:
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
g <- g + geom_violin(aes(y = df[,!!p], colour = place[!!p-1]), alpha = 0.3)
}
g
This gives the desired result. The !! forces evaluation immediately rather than leaving it lazy, which is the default.

how to pass an arguments to function to get a line plot using ggplot2?

I am trying to write a function to create a time series plot (line graph). How do I pass an argument to the function so that the plot is created? I tried different ways, such as using aes_string, but with no success.
lineplotfun <- function(feature){
ggplot(aes(x = 1:length(feature), y = feature), data = mtcars) +
geom_line()
}
lineplotfun(mpg)
I want to pass mpg as string or name.
There are numerous problems with the code in the question.
1) y is not in aes()
2) if ggplot2 is loaded, mpg is a tibble
3) y = feature with data = mtcars is meaningless
4) 1:length(feature) only makes sense if feature is a vector
One way of achieving what you want is to set data = NULL and pass a vector to the function:
lineplotfun <- function(feature){
require(ggplot2)
ggplot2::ggplot(data = NULL, aes(x = seq_along(feature), y = feature)) +
ggplot2::geom_line()
}
lineplotfun(mtcars$mpg)
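Since the question asks for the column to be passed as a string, a variant using the .data pronoun can be sketched as follows. This is a minimal sketch under my own assumptions, not the answerer's code: the function takes the data frame as an explicit first argument, and the axis labels are my choice.

```r
library(ggplot2)

# `feature` is a column name given as a character string.
lineplotfun <- function(data, feature) {
  ggplot(data, aes(x = seq_along(.data[[feature]]), y = .data[[feature]])) +
    geom_line() +
    labs(x = "index", y = feature)
}

p <- lineplotfun(mtcars, "mpg")
```

Passing the data frame explicitly avoids the ambiguity noted above, where mpg refers to a ggplot2 dataset rather than the mtcars column.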

how to use aes_string for groups in ggplot2 inside a function when making boxplot

new to ggplot2, I've scoured the web but still couldn't figure this out.
I understand how to plot a boxplot in ggplot2, my problem is that I can't pass along the variable I use for groups when it is inside a function.
so, normally (i.e. NOT inside a function), I would write this:
ggplot(myData, aes(factor(Variable1), Variable2)) +
geom_boxplot(fill="grey", colour="black")+
labs(title = "Variable1 vs. Variable2" ) +
labs (x = "variable1", y = "Variable2")
Where myData is my data frame
Variable 1 is a 2 level factor variable
Variable 2 is a continuous variable
I want to make boxplots of Variable 1 by its 2 levels/groups
and this works fine,
but as soon as I write this inside a function I couldn't get it to work.
my attempt in writing the function:
myfunction = function (data, Variable1) {
ggplot(data=myData, aes_string(factor("Variable1"), "Variable2"))+
geom_boxplot(fill="grey", colour="black")+
labs(title = paste("Variable1 vs. Variable2" )) +
labs (x = "variable1", y = "Variable2")
}
this only gives me a single boxplot(instead of 2), as if it never understood the factor(Variable1) command (and did a single boxplot of the entire Variable 2, rather than separate them by Variable 1 level first, then boxplot them).
aes_string() evaluates the entire string, so if you use sprintf("factor(%s)",Variable1) you get the desired result. As a further remark: your function has a data argument, but inside the plotting call you use myData. I have also edited the x-label and title, so that you can pass 'Variable3' and get proper labels.
With some example data:
set.seed(123)
dat <- data.frame(Variable2=rnorm(100),Variable1=c(0,1),Variable3=sample(0:1,100,T))
myfunction = function (data, Variable1) {
ggplot(data=data, aes_string(sprintf("factor(%s)",Variable1), "Variable2"))+
geom_boxplot(fill="grey", colour="black")+
labs(title = sprintf("%s and Variable2", Variable1)) +
labs (x = Variable1, y = "Variable2")
}
p1 <- myfunction(dat,"Variable1")
p2 <- myfunction(dat,"Variable3")
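On recent ggplot2 versions, where aes_string() is deprecated, the same function can be sketched with aes() and the .data pronoun, which removes the need to build code strings with sprintf(). The function name box_by and the y_var argument are hypothetical names introduced for this sketch.

```r
library(ggplot2)

set.seed(123)
dat <- data.frame(Variable2 = rnorm(100),
                  Variable1 = c(0, 1),
                  Variable3 = sample(0:1, 100, TRUE))

# group_var and y_var are column names given as character strings.
box_by <- function(data, group_var, y_var = "Variable2") {
  ggplot(data, aes(x = factor(.data[[group_var]]), y = .data[[y_var]])) +
    geom_boxplot(fill = "grey", colour = "black") +
    labs(title = sprintf("%s vs. %s", group_var, y_var),
         x = group_var, y = y_var)
}

p1 <- box_by(dat, "Variable1")
p2 <- box_by(dat, "Variable3")
```

Here factor() is applied to the column values inside aes(), so each level gets its own boxplot.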

Remove a layer from a ggplot2 chart

I'd like to remove a layer (in this case the results of geom_ribbon) from a ggplot2 created grid object. Is there a way I can remove it once it's already part of the object?
library(ggplot2)
dat <- data.frame(x=1:3, y=1:3, ymin=0:2, ymax=2:4)
p <- ggplot(dat, aes(x=x, y=y)) + geom_ribbon(aes(ymin=ymin, ymax=ymax), alpha=0.3) +
geom_line()
# This has the geom_ribbon
p
# This overlays another ribbon on top
p + geom_ribbon(aes(ymin=ymin, ymax=ymax, fill=NA))
I'd like this functionality to allow me to build more complicated plots on top of less complicated ones. I am using functions that return a grid object and then printing out the final plot once it is fully assembled. The base plot has a single line with a corresponding error bar (geom_ribbon) surrounding it. The more complicated plot will have several lines and the multiple overlapping geom_ribbon objects are distracting. I'd like to remove them from the plots with multiple lines. Additionally, I'll be able to quickly create alternative versions using facets or other ggplot2 functionality.
Edit: Accepting @mnel's answer as it works. Now I need to determine how to dynamically access the geom_ribbon layer, which is captured in the SO question here.
Edit 2: For completeness, this is the function I created to solve this problem:
remove_geom <- function(ggplot2_object, geom_type) {
layers <- lapply(ggplot2_object$layers, function(x) if(x$geom$objname == geom_type) NULL else x)
layers <- layers[!sapply(layers, is.null)]
ggplot2_object$layers <- layers
ggplot2_object
}
Edit 3: See the accepted answer below for the latest versions of ggplot (>=2.x.y)
For ggplot2 version 2.2.1, I had to modify the proposed remove_geom function like this:
remove_geom <- function(ggplot2_object, geom_type) {
# Delete layers that match the requested type.
layers <- lapply(ggplot2_object$layers, function(x) {
if (class(x$geom)[1] == geom_type) {
NULL
} else {
x
}
})
# Delete the unwanted layers.
layers <- layers[!sapply(layers, is.null)]
ggplot2_object$layers <- layers
ggplot2_object
}
Here's an example of how to use it:
library(ggplot2)
set.seed(3000)
d <- data.frame(
x = runif(10),
y = runif(10),
label = sprintf("label%s", 1:10)
)
p <- ggplot(d, aes(x, y, label = label)) + geom_point() + geom_text()
Let's show the original plot:
p
Now let's remove the labels and show the plot again:
p <- remove_geom(p, "GeomText")
p
If you look at
p$layers
[[1]]
mapping: ymin = ymin, ymax = ymax
geom_ribbon: na.rm = FALSE, alpha = 0.3
stat_identity:
position_identity: (width = NULL, height = NULL)
[[2]]
geom_line:
stat_identity:
position_identity: (width = NULL, height = NULL)
You will see that you want to remove the first layer
You can do this by redefining the layers as just the second component in the list.
p$layers <- p$layers[2]
Now build and plot p
p
Note that p$layers[[1]] <- NULL would work as well. I agree with @Andrie and @Joran's comments about the cases in which this might be useful, and would not expect this to be necessarily reliable.
As this problem looked interesting, I have expanded my 'ggpmisc' package with functions to manipulate the layers in a ggplot object (currently in package 'gginnards'). The functions are more polished versions of the example in my earlier answer to this same question. However, be aware that in most cases this is not the best way of working, as it violates the Grammar of Graphics. In most cases one can assemble different variations of the same figure in the normal way with the + operator, possibly "packaging" groups of layers into lists to have combined building blocks that can simplify the assembly of complex figures. Exceptionally, we may want to edit an existing plot or a plot output by a higher-level function whose definition we cannot modify. In such cases these layer manipulation functions can be useful. The example above becomes:
library(gginnards)
p1 <- delete_layers(p, match_type = "GeomText")
See the documentation of the package for other examples, and for information on the companion functions useful for modifying the ordering of layers, and for inserting new layers at arbitrary positions.
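The "packaging groups of layers into lists" approach mentioned above can be sketched as follows, using the data from the question. This is a minimal sketch, not code from gginnards; the function name line_with_band and its show_band argument are hypothetical. It relies on ggplot2 accepting a list of layers on the right-hand side of + and ignoring NULL entries in that list.

```r
library(ggplot2)

dat <- data.frame(x = 1:3, y = 1:3, ymin = 0:2, ymax = 2:4)

# A reusable building block: the ribbon is optional instead of being removed later.
line_with_band <- function(show_band = TRUE) {
  list(
    if (show_band) geom_ribbon(aes(ymin = ymin, ymax = ymax), alpha = 0.3),
    geom_line()
  )
}

base <- ggplot(dat, aes(x = x, y = y))

p_band  <- base + line_with_band()                   # ribbon + line
p_plain <- base + line_with_band(show_band = FALSE)  # line only
```

Deciding at assembly time whether a layer is included avoids editing the plot object's internals afterwards.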
@Kamil Slowikowski Thanks! Very useful. However, I could not stop myself from creating a new variation on the same theme, hopefully easier to understand than the one in the original post or the updated version by Kamil, and also avoiding some assignments.
remove_geoms <- function(x, geom_type) {
# Find layers that match the requested type.
selector <- sapply(x$layers,
function(y) {
class(y$geom)[1] == geom_type
})
# Delete the layers.
x$layers[selector] <- NULL
x
}
This version is functionally identical to Kamil's function, so the usage example above does not need to be repeated here.
As an aside, this function can be easily adapted to select the layers based on the class of the stat instead of the class of the geom.
remove_stats <- function(x, stat_type) {
# Find layers that match the requested type.
selector <- sapply(x$layers,
function(y) {
class(y$stat)[1] == stat_type
})
# Delete the layers.
x$layers[selector] <- NULL
x
}
@Kamil and @Pedro Thanks a lot! For those interested, one can also augment Pedro's function to select only specific layers, as shown here with a last_only argument:
remove_geoms <- function(x, geom_type, last_only = T) {
# Find layers that match the requested type.
selector <- sapply(x$layers,
function(y) {
class(y$geom)[1] == geom_type
})
if(last_only)
selector <- max(which(selector))
# Delete the layers.
x$layers[selector] <- NULL
x
}
Coming back to @Kamil's example plot:
library(magrittr) # provides the %>% pipe used below
set.seed(3000)
d <- data.frame(
x = runif(10),
y = runif(10),
label = sprintf("label%s", 1:10)
)
p <- ggplot(d, aes(x, y, label = label)) + geom_point() + geom_point(color = "green") + geom_point(size = 5, color = "red")
p
p %>% remove_geoms("GeomPoint")
p %>% remove_geoms("GeomPoint") %>% remove_geoms("GeomPoint")
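One caveat with the last_only version: if no layer matches, max(which(selector)) is called on an empty vector and the subsequent assignment fails. A more defensive variant can be sketched as below. The name remove_geoms_safe is made up, and unlike the functions above it matches with inherits(), so it also catches geoms that inherit from the requested class (which may or may not be what you want).

```r
library(ggplot2)

remove_geoms_safe <- function(plot, geom_type, last_only = TRUE) {
  # Indices of the layers whose geom is of (or inherits from) the requested class.
  matches <- which(vapply(plot$layers,
                          function(layer) inherits(layer$geom, geom_type),
                          logical(1)))
  # Nothing to remove: return the plot unchanged.
  if (length(matches) == 0) return(plot)
  if (last_only) matches <- max(matches)
  plot$layers[matches] <- NULL
  plot
}

set.seed(3000)
d <- data.frame(x = runif(10), y = runif(10))
p <- ggplot(d, aes(x, y)) +
  geom_point() +
  geom_point(color = "green") +
  geom_point(size = 5, color = "red")

p_minus_one <- remove_geoms_safe(p, "GeomPoint")                     # drops the red layer
p_empty     <- remove_geoms_safe(p, "GeomPoint", last_only = FALSE)  # drops all three
p_still     <- remove_geoms_safe(p_empty, "GeomPoint")               # no match, unchanged
```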
