Changing legend order in ggplot within a function - r

I want to plot a data frame within a function. The legend should be ordered in a specific way. To keep things simple in my example I just reverse the order. I actually want to select a specific row and push it to the last position of the legend.
By the way I am creating a new R package, if this is in any way relevant.
Plotting outside of a function
attach(iris)
library(ggplot2)
# This is a normal plot
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = Species) ) +
geom_bar( stat = "identity")
p
# This is a plot with reversed legend
iris$Species <- factor(iris$Species, levels = rev(levels(iris$Species)))
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = Species) ) +
geom_bar( stat = "identity")
p
Plotting inside a function
The simplest approach was to use the argument as a variable directly, which obviously doesn't work:
f1 <- function(myvariable) {
iris$myvariable <- factor(iris$myvariable, levels = rev(levels(iris$myvariable)))
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = Species) ) +
geom_bar( stat = "identity")
p
}
f1("Species")
#> Error in `$<-.data.frame`(`*tmp*`, myvariable, value = integer(0)) :
#>   replacement has 0 rows, data has 150
I tried to use quasiquotation, but this approach only let me plot the data frame. I cannot reverse the order yet.
library(rlang)
# Only plotting works
f2 <- function(myvariable) {
v1 <- ensym(myvariable)
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = eval(expr(`$`(iris, !!v1))))) +
geom_bar( stat = "identity")
p
}
f2("Species")
# This crashes
f3 <- function(myvariable) {
v1 <- ensym(myvariable)
expr(`$`(iris, !!v1)) <- factor(expr(`$`(iris, !!v1)), levels = rev(levels(expr(`$`(iris, !!v1)))))
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, fill = eval(expr(`$`(iris, !!v1))))) +
geom_bar( stat = "identity")
p
}
f3("Species")
#> Error in `*tmp*`$!!v1 : invalid subscript type 'language'
So the main problem is that I cannot assign to a column using quasiquotation.

A couple of things:
You can use [[ instead of $ to access data frame columns programmatically.
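You can use ggplot's aes_string to minimize notes during R CMD check (since you mentioned you're doing a package). Note that aes_string is deprecated as of ggplot2 3.0.0; the .data pronoun inside aes() serves the same purpose.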
You can "send" a factor level to the end with fct_relevel from package forcats.
Which translates to:
f <- function(df, var) {
lev <- levels(df[[var]])
df[[var]] <- forcats::fct_relevel(df[[var]], lev[1L], after = length(lev) - 1L)
ggplot(df, aes_string(x = "Sepal.Width", y = "Sepal.Length", fill = var)) +
geom_bar(stat = "identity")
}
f(iris, "Species")
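For reference, here is a minimal sketch of the same idea without aes_string or forcats, assuming ggplot2 3.0.0 or later for the .data pronoun. The names move_level_last and plot_level_last are made up for this illustration and are not part of any package; the level to move is passed explicitly rather than always taking the first one.

```r
library(ggplot2)

# Hypothetical helper: return `f` as a factor with `level` moved to the last position.
move_level_last <- function(f, level) {
  f <- factor(f)
  factor(f, levels = c(setdiff(levels(f), level), level))
}

# Hypothetical wrapper: reorder the fill column, then plot it.
plot_level_last <- function(df, var, level) {
  df[[var]] <- move_level_last(df[[var]], level)
  ggplot(df, aes(x = .data[["Sepal.Width"]], y = .data[["Sepal.Length"]],
                 fill = .data[[var]])) +
    geom_col() +            # same as geom_bar(stat = "identity")
    labs(x = "Sepal.Width", y = "Sepal.Length", fill = var)
}

p <- plot_level_last(iris, "Species", "versicolor")
```

Printing p should show versicolor as the last legend entry. Keeping the releveling in its own small function makes it easy to check without drawing anything.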

Related

How to conditionally choose which variables to include in a plot (ggplot2)? [duplicate]

This question already has answers here:
How to use a variable to specify column name in ggplot
I would like to create a function that takes a column name and creates a plot based on that. For example, I want to be able to plot the Petal.Length column of the iris dataset against other variables. The way to do it without indirection is
ggplot(data = iris) + geom_point(aes(x = Petal.Width, y = Petal.Length))
This is the plot I would like to replicate through indirection, but none of the following attempts work. These two return similar undesired plots:
colname = "Petal.Width"
ggplot(data = iris) + geom_point(x = colname, y = Petal.Length)
ggplot(data = iris) + geom_point(x = {{colname}}, y = Petal.Length)
The following attempt does not work either; it returns an error:
ggplot(data = iris) + geom_point(aes(x = !!!rlang::syms(colname), y = Petal.Length))
#> Warning in geom_point(aes(x = !!!rlang::syms(colname_x), y = Petal.Length)):
#> Ignoring unknown aesthetics:
#> Error in `geom_point()`:
#> ! Problem while setting up geom.
#> ℹ Error occurred in the 1st layer.
#> Caused by error in `compute_geom_1()`:
#> ! `geom_point()` requires the following missing aesthetics: x
Any hint on how we could do this? The idea is to have a function that is able to plot that kind of chart just by giving a string corresponding to one x variable that appears in the dataset.
Using the .data pronoun and by wrapping inside aes() you could do:
library(ggplot2)
colname <- "Petal.Width"
ggplot(data = iris) +
geom_point(aes(x = .data[[colname]], y = Petal.Length))
Your plot #2 is invariant in x because it takes "Petal.Width" as a literal value (as in data.frame(x="Petal.Width")), obviously not what you intend.
There are a few ways to work with programmatic variables:
We can use the .data pronoun in ggplot, as in
var1 <- "mpg"
var2 <- "disp"
ggplot(mtcars, aes(x = .data[[var1]], y = .data[[var2]])) +
geom_point()
We can use rlang::quo and !! for interactive work:
x <- rlang::quo(mpg)
y <- rlang::quo(disp)
ggplot(mtcars, aes(x = !!x, y = !!y)) +
geom_point()
If in a function, we can enquo (and !!) them:
fun <- function(data, x, y) {
x <- rlang::enquo(x)
y <- rlang::enquo(y)
ggplot(data, aes(!!x, !!y)) +
geom_point()
}
fun(mtcars, mpg, disp)
Another (clumsier) approach with base R:
colname = "Petal.Width"
ggplot(data = iris) +
geom_point(aes(x = eval(parse(text = colname)), y = Petal.Length)
)
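Since the question asks for a function that takes the column name as a string, the .data approach can be wrapped as in the sketch below. The function name plot_petal_length_vs is hypothetical, and the early check on the column name is an addition of mine rather than something from the answers above.

```r
library(ggplot2)

# Hypothetical helper: scatter Petal.Length against a column given by name.
plot_petal_length_vs <- function(data, colname) {
  # Fail early with a clear error if the column does not exist.
  stopifnot(is.character(colname), length(colname) == 1, colname %in% names(data))
  ggplot(data) +
    geom_point(aes(x = .data[[colname]], y = .data[["Petal.Length"]])) +
    labs(x = colname, y = "Petal.Length")
}

p <- plot_petal_length_vs(iris, "Petal.Width")
```

Setting the axis labels explicitly avoids the default label, which would otherwise be the literal expression .data[[colname]].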

Create graphs by group using ggplot in R

I'm relatively new to ggplot2 in R and have been struggling with this for a while. I have figured out how to get everything from one data frame onto one graph (that is pretty easy...) and how to write a loop that puts each observation (id in the example below) on its own graph. What I can't figure out is how to create a separate graph per group, with multiple ids on each graph, when the id and group values can change each time I run the code. Here is some sample data and the output I am trying to produce.
x <- c(1,3,6,12,24,48,72,1,3,6,12,24,48,72,1,3,6,12,24,48,72,1,3,6,12,24,48,72)
y <- c(8,27,67,193,271,294,300,10,30,70,195,280,300,310,5,25,60,185,250,275,300,15,40,80,225,275,325,330)
group <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
id <- c(100,100,100,100,100,100,100,101,101,101,101,101,101,101,102,102,102,102,102,102,102,103,103,103,103,103,103,103)
df <- data.frame(x,y,group,id)
Similar questions were asked here and here but I still can't figure out how to do what I need because I need separate graphs (not facets) by group with multiple id on the same graph.
Edit to add attempt -
l <- unique(df$group)
for(l in df$group){
print(ggplot(df, aes(x = x, y = y, group = group, color = id))+
geom_line())
}
To achieve your desired result
Split your dataframe by group using e.g. split
Use lapply to loop over the list of split data frames to create your plots. If you want to add the group labels to the title, you could loop over names(df_split) instead.
Note: I converted the id variable to factor. Also, you have to map id on the group aesthetic to get lines per group. However, as your x variable is a numeric there is actually no need for the group aesthetic.
library(ggplot2)
df_split <- split(df, df$group)
lapply(df_split, function(df) {
ggplot(df, aes(x = x, y = y, group = id, color = factor(id))) +
geom_line()
})
lapply(names(df_split), function(i) {
ggplot(df_split[[i]], aes(x = x, y = y, group = id, color = factor(id))) +
geom_line() +
labs(title = paste("group =", i))
})
#> [[1]]
#>
#> [[2]]
Although I would recommend lapply, the same can be achieved with a for loop:
for (i in names(df_split)) {
print(
ggplot(df_split[[i]], aes(x = x, y = y, group = id, color = factor(id))) +
geom_line() +
labs(title = paste("group =", i))
)
}
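Because the question mentions that the id and group columns can change between runs, the split-and-loop idea can be sketched as a function that takes both column names as strings. The name plots_by_group is hypothetical, and the x and y columns are assumed to be called x and y as in the sample data.

```r
library(ggplot2)

# Sample data in the same shape as the question (shortened).
df <- data.frame(
  x     = rep(c(1, 3, 6), times = 4),
  y     = c(8, 27, 67, 10, 30, 70, 5, 25, 60, 15, 40, 80),
  group = rep(c(1, 2), each = 6),
  id    = rep(c(100, 101, 102, 103), each = 3)
)

# Hypothetical helper: one line plot per group, one coloured line per id.
plots_by_group <- function(df, group_col, id_col) {
  pieces <- split(df, df[[group_col]])
  keys <- stats::setNames(names(pieces), names(pieces))  # keep names on the result
  lapply(keys, function(g) {
    ggplot(pieces[[g]], aes(x = x, y = y, colour = factor(.data[[id_col]]))) +
      geom_line() +
      labs(title = paste(group_col, "=", g), colour = id_col)
  })
}

plots <- plots_by_group(df, "group", "id")
```

plots is a named list, so plots[["1"]] prints the graph for group 1, and something like for (p in plots) print(p) draws them all.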
Use facet_grid() or facet_wrap()
library(ggplot2)
ggplot(df, aes(x= x, y=y, colour= factor(id))) + geom_line() + facet_grid(group ~ .)
Edit: OP clarifies in comments they want separate graphs, not faceting
# 1
ggplot(df[df$group == 1,], aes(x= x, y=y, colour= factor(id))) + geom_line()
# 2
ggplot(df[df$group == 2,], aes(x= x, y=y, colour= factor(id))) + geom_line()

Passing argument to facet grid in function -ggplot

I am trying to write a function to plot graphs in a grid. I am using ggplot and facet grid. I am unable to pass the argument for facet grid. I wonder if anybody can point me in the right direction.
The data example:
Year = as.factor(rep(c("01", "02"), each = 4, times = 1))
Group = as.factor(rep(c("G1", "G2"), each = 2, times = 2))
Gender = as.factor(rep(c("Male", "Female"), times = 4))
Percentage = as.integer(c("80","20","50","50","45","55","15","85"))
df1 = data.frame (Year, Group, Gender, Percentage)
The code for the grid plot without function is:
p = ggplot(data=df1, aes(x=Year, y=Percentage, fill = Gender)) + geom_bar(stat = "identity")
p = p + facet_grid(~ Group, scales = 'free')
p
This produces a plot like the ones I want to do. However, when I put it into a function:
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
And then run:
MyGridPlot(df1, df1$Year, df1$Percentage, df1$Gender, df1$Group)
It comes up with the error:
Error: At least one layer must contain all faceting variables: `fgrid`.
* Plot is missing `fgrid`
* Layer 1 is missing `fgrid`
I have tried using aes_string, which works for the x, y and fill but not for the grid.
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes_string(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
and then run:
MyGridPlot(df1, Year, Percentage, Gender, Group)
This produces the same error. If I delete the facet_grid line, both versions of the function run fine, but without the grid :-(
Thanks a lot for helping this beginner.
Gustavo
Your problem is that in your function, ggplot is looking for variable names (x_axis, y_axis, etc.), but you're giving it objects (df1$Year...).
There are a couple ways you could deal with this. Maybe the simplest would be to rewrite the function so that it expects objects. For example:
MyGridPlot <- function(x_axis, y_axis, bar_fill, fgrid){ # Note no df parameter here
df1 <- data.frame(x_axis = x_axis, y_axis = y_axis, bar_fill = bar_fill, fgrid = fgrid) # Create a data frame from inputs
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
MyGridPlot(Year, Percentage, Gender, Group)
Alternatively, you could set up the function with a data frame and variable names. There isn't really much reason to do this if you're working with individual objects the way you are here, but if you're working with a data frame, it might make your life easier:
library(dplyr)
library(ggplot2)
MyGridPlot <- function(df, x_var, y_var, fill_var, grid_var){
# Need to "tell" R to treat parameters as variable names:
# enquo() captures the argument, !! unquotes it inside mutate() (UQ() is the older, deprecated spelling of !!).
df <- df %>% mutate(x_var = !!enquo(x_var), y_var = !!enquo(y_var), fill_var = !!enquo(fill_var), grid_var = !!enquo(grid_var))
p = ggplot(data = df, aes(x = x_var, y = y_var, fill = fill_var)) + geom_bar(stat = "identity")
p = p + facet_grid(~grid_var, scales = 'free')
return(p)
}
MyGridPlot(df1, Year, Percentage, Gender, Group)
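With more recent versions (assuming ggplot2 3.0.0 or later and rlang 0.4.0 or later), the copy step can be skipped by passing the columns straight through with the embrace operator and vars(). This is a sketch of that alternative rather than the approach from the answer above; my_grid_plot is just an illustrative name.

```r
library(ggplot2)

df1 <- data.frame(
  Year       = factor(rep(c("01", "02"), each = 4)),
  Group      = factor(rep(c("G1", "G2"), each = 2, times = 2)),
  Gender     = factor(rep(c("Male", "Female"), times = 4)),
  Percentage = c(80L, 20L, 50L, 50L, 45L, 55L, 15L, 85L)
)

# Hypothetical rewrite: {{ }} forwards unquoted column names to aes() and vars().
my_grid_plot <- function(df, x_var, y_var, fill_var, grid_var) {
  ggplot(df, aes(x = {{ x_var }}, y = {{ y_var }}, fill = {{ fill_var }})) +
    geom_col() +                                  # same as geom_bar(stat = "identity")
    facet_grid(cols = vars({{ grid_var }}), scales = "free")
}

p <- my_grid_plot(df1, Year, Percentage, Gender, Group)
```

The facet specification is the part that aes_string could not cover: facet_grid accepts vars() for its rows and cols arguments, and vars() supports the same unquoting as aes().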

More compact use of ggplot : grid spaghetti plot

The following code plots the predicted probability of several models against time. Having all the lines on one graph was not readable, so I divided the result into a grid.
I was wondering whether it is possible to have only one ggplot with all the models and then somehow specify which goes where with grid.arrange.
Current :
p2.dat1 <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4 )
mdf1 <- melt(p2.dat1 , id.vars="EXPOSURE")
plm.plot.all1 <- ggplot(data = mdf1,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
p2.dat2 <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm5,predp.glm.step )
mdf2 <- melt(p2.dat2 , id.vars="EXPOSURE")
plm.plot.all2 <- ggplot(data = mdf2,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all1, plm.plot.all2, nrow=2)
Expected:
p2.dat <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step)
mdf <- melt(p2.dat , id.vars="EXPOSURE")
plm.plot.all <- ggplot(data = mdf,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all[some_selection_somehow], plm.plot.all[same], nrow=2)
Thanks,
You can do this with grid.arrange by writing some helper functions. It can be done more succinctly, but I prefer small focused functions that can be used with pipes.
library(tidyverse)
library(gridExtra)
# Helper Functions ----
plot_function <- function(x) {
ggplot(x, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
labs(title = unique(x$variable)) +
theme(legend.position = "none")
}
grid_plot <- function(x, selection) {
order <- c(names(x)[grepl(selection,names(x))], names(x)[!grepl(selection,names(x))])
grid.arrange(grobs = x[order], nrow = 2)
}
# Actually make the plot ----
ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
split(.$variable) %>%
map(plot_function) %>%
grid_plot("predp.glm3")
Or you could do this with ggplot, facet_wrap, and a factor on the variable column with its levels in the desired order. This has the benefit of shared axes across the plots, which makes comparison easy. You can alter the helper functions in the first approach to set the axes explicitly and achieve the same effect, but it's just easier to keep it in ggplot.
library(tidyverse)
selection <- "predp.glm3"
plot_data <- ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
mutate(variable = fct_relevel(variable, selection)) # move the selected model to the first panel; the other levels keep their order
ggplot(plot_data, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
facet_wrap( ~variable, nrow = 2) +
theme(legend.position = "none")

ggplot mixture model R

I have a dataset with numeric values and a categorical variable. The distribution of the numeric variable differs for each category. I want to plot a density for each category so that each one sits visually below the density of the entire dataset.
This is similar to the components of a mixture model, without calculating the mixture model (as I already know the categorical variable which splits the data).
If I let ggplot group by the categorical variable, each of the four densities is a real density and integrates to one.
library(ggplot2)
ggplot(iris, aes(x = Sepal.Width)) + geom_density() + geom_density(aes(x = Sepal.Width, group = Species, colour = 'Species'))
What I want is to have the density of each category as a sub-density (not integrating to 1), similar to the following code (which I only implemented for two of the three iris species):
library(data.table)
myIris <- as.data.table(iris)
# calculate density for entire dataset
dens_entire <- density(myIris[, Sepal.Width], cut = 0)
dens_e <- data.table(x = dens_entire[[1]], y = dens_entire[[2]])
# calculate density for dataset with setosa
dens_setosa <- density(myIris[Species == 'setosa', Sepal.Width], cut = 0)
dens_sa <- data.table(x = dens_setosa[[1]], y = dens_setosa[[2]])
# calculate density for dataset with versicolor
dens_versicolor <- density(myIris[Species == 'versicolor', Sepal.Width], cut = 0)
dens_v <- data.table(x = dens_versicolor[[1]], y = dens_versicolor[[2]])
# plot densities as mixture model
ggplot(dens_e, aes(x=x, y=y)) + geom_line() + geom_line(data = dens_sa, aes(x = x, y = y/2.5, colour = 'setosa')) +
geom_line(data = dens_v, aes(x = x, y = y/1.65, colour = 'versicolor'))
resulting in
Above I hard-coded the number to reduce the y values. Is there any way to do it with ggplot? Or to calculate it?
Thanks for your ideas.
Do you mean something like this? You need to change the scale though.
# after_stat(count) is the current spelling of ..count.. (ggplot2 >= 3.3.0)
ggplot(iris, aes(x = Sepal.Width)) +
geom_density(aes(y = after_stat(count))) +
geom_density(aes(x = Sepal.Width, y = after_stat(count),
group = Species, colour = Species))
Another option may be
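# Dividing by 3 works here only because the three species have equal group sizes.
ggplot(iris, aes(x = Sepal.Width)) +
geom_density(aes(y = after_stat(density))) +
geom_density(aes(x = Sepal.Width, y = after_stat(density / 3),
group = Species, colour = Species))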
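To avoid hard-coding the divisor, one possibility is to scale each group's curve by its share of the data. stat_density reports count as density times the number of points in the group, so dividing count by the total number of rows gives a sub-density whose area is that group's proportion. This is a sketch assuming ggplot2 3.3.0 or later; n_total is a name introduced here.

```r
library(ggplot2)

n_total <- nrow(iris)  # total number of observations across all groups

p <- ggplot(iris, aes(x = Sepal.Width)) +
  # Overall density, integrating to (about) 1.
  geom_density() +
  # Per-species curves, each scaled by n_group / n_total.
  geom_density(aes(y = after_stat(count / n_total), colour = Species))
```

Each species curve then integrates to roughly its share of the rows (one third each for iris). The sub-curves will not sum exactly to the overall curve, because each group's bandwidth is chosen separately by default.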
