More compact use of ggplot : grid spaghetti plot - r

The following code plots the predicted probability of several models against time. Having all the plots on one graph was not readable, so I divided the result into a grid.
I was wondering if it is possible to have only one ggplot with all the models and then somehow specify which goes where with grid.arrange.
Current :
p2.dat1 <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4 )
mdf1 <- melt(p2.dat1 , id.vars="EXPOSURE")
plm.plot.all1 <- ggplot(data = mdf1,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
p2.dat2 <- select(ppf, EXPOSURE, predp.glm.gen, predp.glm5,predp.glm.step )
mdf2 <- melt(p2.dat2 , id.vars="EXPOSURE")
plm.plot.all2 <- ggplot(data = mdf2,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all1, plm.plot.all2, nrow=2)
Expected:
p2.dat <- select(ppf, EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step)
mdf <- melt(p2.dat , id.vars="EXPOSURE")
plm.plot.all <- ggplot(data = mdf,
aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line()
grid.arrange(plm.plot.all[some_selection_somehow], plm.plot.all[same], nrow=2)
Thanks,

You can do this with grid.arrange by writing some helper functions. It can be done more succinctly, but I prefer small focused functions that can be used with pipes.
library(tidyverse)
library(gridExtra)
# Helper Functions ----
plot_function <- function(x) {
ggplot(x, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
labs(title = unique(x$variable)) +
theme(legend.position = "none")
}
grid_plot <- function(x, selection) {
order <- c(names(x)[grepl(selection,names(x))], names(x)[!grepl(selection,names(x))])
grid.arrange(grobs = x[order], nrow = 2)
}
# Actually make the plot ----
ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
split(.$variable) %>%
map(plot_function) %>%
grid_plot("predp.glm3")
Or you could do this within ggplot, using facet_wrap and converting the variable column to a factor with the levels in the desired order. This has the benefit of shared axes across the plots, which makes comparison easy. You can alter the helper functions in the first approach to set the axes explicitly and achieve the same effect, but it's just easier to keep it in ggplot.
library(tidyverse)
selection <- "predp.glm3"
plot_data <- ppf %>%
select(EXPOSURE, predp.glm.gen,predp.glm1, predp.glm2,predp.glm3,predp.glm4,predp.glm5,predp.glm.step) %>%
gather(variable, value, -EXPOSURE) %>%
mutate(variable = fct_relevel(variable, selection)) # move the selected model to the front; the other levels keep their order
ggplot(plot_data, aes(x = EXPOSURE, y = value, colour = variable)) +
geom_line() +
facet_wrap( ~variable, nrow = 2) +
theme(legend.position = "none")
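The reordering step can be checked in isolation. Below is a minimal sketch on made-up data; the `toy` data frame and its values are assumptions standing in for the gathered `ppf`, which is not shown in the question.

```r
library(ggplot2)
library(forcats)

# Hypothetical long-format data standing in for the gathered ppf
toy <- data.frame(
  EXPOSURE = rep(1:5, times = 3),
  variable = rep(c("predp.glm1", "predp.glm2", "predp.glm3"), each = 5),
  value    = c((1:5) / 10, (2:6) / 10, (3:7) / 10)
)

# fct_relevel() converts the character column to a factor and moves
# the selected level to the front; the remaining levels keep their order
toy$variable <- fct_relevel(toy$variable, "predp.glm3")
levels(toy$variable)

# facet_wrap() lays out panels in level order, so predp.glm3 comes first
p_toy <- ggplot(toy, aes(x = EXPOSURE, y = value, colour = variable)) +
  geom_line() +
  facet_wrap(~variable, nrow = 2) +
  theme(legend.position = "none")
```

Because the panel order follows the factor levels, changing the selection only requires changing the string passed to fct_relevel().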

Related

How can I change the size of a bar in a grouped bar chart when one group has no data? [duplicate]

Is there a way to set a constant width for geom_bar() in the event of missing data in the time series example below? I've tried setting width in aes() with no luck. Compare May '11 to June '11 width of bars in the plot below the code example.
colours <- c("#FF0000", "#33CC33", "#CCCCCC", "#FFA500", "#000000" )
iris$Month <- rep(seq(from=as.Date("2011-01-01"), to=as.Date("2011-10-01"), by="month"), 15)
d<-aggregate(iris$Sepal.Length, by=list(iris$Month, iris$Species), sum)
d$quota<-seq(from=2000, to=60000, by=2000)
colnames(d) <- c("Month", "Species", "Sepal.Width", "Quota")
d$Sepal.Width<-d$Sepal.Width * 1000
g1 <- ggplot(data=d, aes(x=Month, y=Quota, color="Quota")) + geom_line(size=1)
g1 + geom_bar(data=d[c(-1:-5),], aes(x=Month, y=Sepal.Width, width=10, group=Species, fill=Species), stat="identity", position="dodge") + scale_fill_manual(values=colours)
Some new options for position_dodge() and the new position_dodge2(), both introduced in ggplot2 3.0.0, can help.
You can use preserve = "single" in position_dodge() to base the widths off a single element, so the widths of all bars will be the same.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge(preserve = "single") ) +
scale_fill_manual(values = colours)
Using position_dodge2() changes the way things are centered, centering each set of bars at each x axis location. It has some padding built in, so use padding = 0 to remove.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge2(preserve = "single", padding = 0) ) +
scale_fill_manual(values = colours)
The easiest way is to supplement your data set so that every combination is present, even if it has NA as its value. Taking a simpler example (as yours has a lot of unneeded features):
dat <- data.frame(a=rep(LETTERS[1:3],3),
b=rep(letters[1:3],each=3),
v=1:9)[-2,]
ggplot(dat, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
This shows the behavior you are trying to avoid: in group "B", there is no group "a", so the bars are wider. Supplement dat with a dataframe with all the combinations of a and b:
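dat.all <- rbind(dat, cbind(expand.grid(a=unique(dat$a), b=unique(dat$b), stringsAsFactors=FALSE), v=NA)) # unique() rather than levels(): since R 4.0 these columns are character, not factor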
ggplot(dat.all, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
I had the same problem but was looking for a solution that works with the pipe (%>%). Using tidyr::spread and tidyr::gather from the tidyverse does the trick. I use the same data as @Brian Diggs, but with uppercase variable names so that I don't end up with duplicate variable names when transforming to wide:
library(tidyverse)
dat <- data.frame(A = rep(LETTERS[1:3], 3),
B = rep(letters[1:3], each = 3),
V = 1:9)[-2, ]
dat %>%
spread(key = B, value = V, fill = NA) %>% # turn data to wide, using fill = NA to generate missing values
gather(key = B, value = V, -A) %>% # go back to long, with the missings
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())
Edit:
There is actually an even simpler solution to this problem that also works with the pipe. Using tidyr::complete gives the same result in one line:
dat %>%
complete(A, B) %>%
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())
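To see what complete() adds before plotting, here is a small sketch on the same `dat`; the name `dat_full` is only for illustration.

```r
library(tidyr)

dat <- data.frame(A = rep(LETTERS[1:3], 3),
                  B = rep(letters[1:3], each = 3),
                  V = 1:9)[-2, ]

# complete() adds one row for every A/B combination that is absent,
# filling the remaining columns (here V) with NA
dat_full <- complete(dat, A, B)

nrow(dat)       # 8 rows: the A = "B", B = "a" combination is missing
nrow(dat_full)  # 9 rows: the missing combination is now present
dat_full[is.na(dat_full$V), ]
```

The NA row produces no visible bar, but it reserves a slot in the dodge, so the remaining bars in that group keep the same width as everywhere else.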

create single ggplot figure with a standalone ggplot and a ggplot created with facet_wrap()?

I have an empty panel in my facetted ggplot. I would like to insert my standalone plot into it. Is this possible? See below for example code.
I found a possible solution here, but can't get it to 'look nice'. By 'look nice' I mean that the standalone plot should have the same dimensions as one of the facetted plots.
library(ggplot2)
library(plotly)
data("mpg")
first_plot = ggplot(data = mpg, aes(x = trans, y = cty)) +
geom_point(size= 1.3)
facet_plot = ggplot(data = mpg, aes(x = year, y = cty)) +
geom_point(size = 1.3) +
facet_wrap(~manufacturer)
facet_plot # room for one more panel which I want first_plot to go?
# try a merge, but this makes the first plot huge compared with the facetted plots
subplot(first_plot, facet_plot, which_layout = 2)
Besides the options to manipulate the gtable or using patchwork one approach to achieve your desired result would be via some data wrangling to add the standalone plot as an additional facet. Not sure whether this will work for your real data but at least for mpg you could do:
library(ggplot2)
library(dplyr)
mpg_bind <- list(standalone = mpg, facet = mpg) %>%
bind_rows(.id = "id") %>%
mutate(x = ifelse(id == "standalone", trans, year),
facet = ifelse(id == "standalone", "all", manufacturer),
facet = forcats::fct_relevel(facet, "all", after = 1000))
ggplot(data = mpg_bind, aes(x = x, y = cty)) +
geom_point(size = 1.3) +
facet_wrap(~facet, scales = "free_x")

Same variables from 2 separate data.frames sided by side in ggplot2

I want to plot the exact same variable names (ses & math) from 2 separate data.frames (dat1 & dat2) but side by side so I can visually compare them.
I have tried the following but it places both data.frames on top of each other.
Is there a function within ggplot2 to plot ses vs. math from dat1 and the same from dat2 side by side and placed on the same axes scales?
library(ggplot2)
dat1 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/hsb.csv')
dat2 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/sm.csv')
ggplot(dat1, aes(x = ses, y = math, colour = factor(sector))) +
geom_point() +
geom_point(data = dat2, aes(x = ses, y = math, colour = factor(sector)))
You can try faceting combining the two datasets :
library(dplyr)
library(ggplot2)
list(dat1 = dat1 %>%
select(sector,ses, math) %>%
mutate(sector = as.character(sector)) ,
dat2 = dat2 %>% select(sector,ses, math)) %>%
bind_rows(.id = 'name') %>%
ggplot() +
aes(x = ses, y = math, colour = factor(sector)) +
geom_point() +
facet_wrap(.~name)
Another option is to create a list of plots and arrange them with grid.arrange:
list_plots <- lapply(list(dat1, dat2), function(df) {
ggplot(df, aes(x = ses, y = math, colour = factor(sector))) + geom_point()
})
do.call(gridExtra::grid.arrange, c(list_plots, ncol = 2))

Adding a single label per group in ggplot with stat_summary and text geoms

I would like to add counts to a ggplot that uses stat_summary().
I am having an issue with the requirement that the text vector be the same length as the data.
With the examples below, you can see that what is being plotted is the same label multiple times.
The workaround to set the location on the y axis has the effect that multiple labels are stacked up. The visual effect is a bit strange (particularly when you have thousands of observations) and not sufficiently professional for my purposes. You will have to trust me on this one - the attached picture doesn't fully convey the weirdness of it.
I was wondering if someone else has worked out another way. It is for a plot in shiny that has dynamic input, so text cannot be overlaid in a hardcoded fashion.
I'm pretty sure ggplot wasn't designed for the kind of behaviour with stat_summary that I am looking for, and I may have to abandon stat_summary and create a new summary dataframe, but thought I would first check if someone else has some wizardry to offer up.
This is the plot without setting the y location:
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_x <- df_x %>%
group_by(Group) %>%
mutate(w_count = n())
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
geom_text(aes(label = w_count)) +
coord_flip() +
theme_classic()
and this is with my hack
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
geom_text(aes(y = 1, label = w_count)) +
coord_flip() +
theme_classic()
Create a df_text that has the grouped info for your labels. Then use annotate:
library(dplyr)
library(ggplot2)
set.seed(123)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
df_text <- df_x %>%
group_by(Group) %>%
summarise(avg = mean(Value),
n = n()) %>%
ungroup()
yoff <- 0.0
xoff <- -0.1
ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
annotate("text",
x = 1:2 + xoff,
y = df_text$avg + yoff,
label = df_text$n) +
coord_flip() +
theme_classic()
I found another way which is a little more robust for when the plot is dynamic in its ordering and filtering, and works well for faceting. More robust, because it uses stat_summary for the text.
library(dplyr)
library(ggplot2)
df_x <- data.frame("Group" = c(rep("A",1000), rep("B",2) ),
"Value" = rnorm(1002))
counts_df <- function(y) {
return( data.frame( y = 1, label = paste0('n=', length(y)) ) )
}
p <- ggplot(df_x, aes(x = Group, y = Value)) +
stat_summary(fun.data="mean_cl_boot", size = 1.2) +
coord_flip() +
theme_classic()
p + stat_summary(geom="text", fun.data=counts_df)
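For comparison, the summary data frame route mentioned in the question can be sketched as follows. This is only an illustration: the names `df_n` and `p_n` are made up, and the mean is drawn as a plain point (via `fun = mean`, which assumes ggplot2 3.3.0 or later) instead of `mean_cl_boot` so that the example does not need Hmisc.

```r
library(ggplot2)

set.seed(123)
df_x <- data.frame(Group = c(rep("A", 1000), rep("B", 2)),
                   Value = rnorm(1002))

# One row per group, so each label is drawn exactly once
df_n <- as.data.frame(table(Group = df_x$Group), responseName = "n")
df_n$label <- paste0("n=", df_n$n)

p_n <- ggplot(df_x, aes(x = Group, y = Value)) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  # The text layer gets its own data, keyed by Group, at a fixed y position
  geom_text(data = df_n, aes(y = 1, label = label)) +
  coord_flip() +
  theme_classic()
```

Since the labels are matched on Group and not on a numeric x position, they stay attached to the right group when the data are filtered or reordered.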

Passing argument to facet grid in function -ggplot

I am trying to write a function to plot graphs in a grid. I am using ggplot and facet grid. I am unable to pass the argument for facet grid. I wonder if anybody can point me in the right direction.
The data example:
Year = as.factor(rep(c("01", "02"), each = 4, times = 1))
Group = as.factor(rep(c("G1", "G2"), each = 2, times = 2))
Gender = as.factor(rep(c("Male", "Female"), times = 4))
Percentage = as.integer(c("80","20","50","50","45","55","15","85"))
df1 = data.frame (Year, Group, Gender, Percentage)
The code for the grid plot without function is:
p = ggplot(data=df1, aes(x=Year, y=Percentage, fill = Gender)) + geom_bar(stat = "identity")
p = p + facet_grid(~ Group, scales = 'free')
p
This produces a plot like the ones I want to do. However, when I put it into a function:
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
And then run:
MyGridPlot(df1, df1$Year, df1$Percentage, df1$Gender, df1$Group)
It comes up with the error:
Error: At least one layer must contain all faceting variables: `fgrid`.
* Plot is missing `fgrid`
* Layer 1 is missing `fgrid`
I have tried using aes_string, which works for the x, y and fill but not for the grid.
MyGridPlot <- function (df, x_axis, y_axis, bar_fill, fgrid){
p = ggplot(data=df1, aes_string(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
and then run:
MyGridPlot(df1, Year, Percentage, Gender, Group)
This produces the same error. If I delete the facet grid, both versions of the function run fine, though with no grid :-(
Thanks a lot for helping this beginner.
Gustavo
Your problem is that in your function, ggplot is looking for variable names (x_axis, y_axis, etc.), but you're giving it objects (df1$Year, ...).
There are a couple ways you could deal with this. Maybe the simplest would be to rewrite the function so that it expects objects. For example:
MyGridPlot <- function(x_axis, y_axis, bar_fill, fgrid){ # Note no df parameter here
df1 <- data.frame(x_axis = x_axis, y_axis = y_axis, bar_fill = bar_fill, fgrid = fgrid) # Create a data frame from inputs
p = ggplot(data=df1, aes(x=x_axis, y=y_axis, fill = bar_fill)) + geom_bar(stat = "identity")
p = p + facet_grid(~ fgrid, scales = 'free')
return(p)
}
MyGridPlot(Year, Percentage, Gender, Group)
Alternatively, you could set up the function with a data frame and variable names. There isn't really much reason to do this if you're working with individual objects the way you are here, but if you're working with a data frame, it might make your life easier:
MyGridPlot <- function(df, x_var, y_var, fill_var, grid_var){
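# Need to "tell" R to treat parameters as variable names (requires dplyr for %>%, mutate and enquo).
# !!enquo() is the current spelling of the older UQ(enquo()) form.
df <- df %>% mutate(x_var = !!enquo(x_var), y_var = !!enquo(y_var), fill_var = !!enquo(fill_var), grid_var = !!enquo(grid_var))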
p = ggplot(data = df, aes(x = x_var, y = y_var, fill = fill_var)) + geom_bar(stat = "identity")
p = p + facet_grid(~grid_var, scales = 'free')
return(p)
}
MyGridPlot(df1, Year, Percentage, Gender, Group)
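With more recent versions of ggplot2 and rlang, the same idea can be written without the intermediate mutate by using the embrace operator `{{ }}` inside aes() and vars(). The sketch below is one possible variant, not the method from the answer above; the function name `my_grid_plot` is made up, and it assumes ggplot2 3.0.0 or later together with rlang 0.4.0 or later.

```r
library(ggplot2)

# Sketch: {{ }} forwards unquoted column names into aes() and vars()
my_grid_plot <- function(df, x_var, y_var, fill_var, grid_var) {
  ggplot(df, aes(x = {{ x_var }}, y = {{ y_var }}, fill = {{ fill_var }})) +
    geom_col() +
    # vars() is the non-formula way to specify faceting variables
    facet_grid(cols = vars({{ grid_var }}), scales = "free")
}

# Same example data as in the question
df1 <- data.frame(
  Year       = factor(rep(c("01", "02"), each = 4)),
  Group      = factor(rep(c("G1", "G2"), each = 2, times = 2)),
  Gender     = factor(rep(c("Male", "Female"), times = 4)),
  Percentage = c(80L, 20L, 50L, 50L, 45L, 55L, 15L, 85L)
)

p <- my_grid_plot(df1, Year, Percentage, Gender, Group)
```

A side effect of this form is that the axis and legend titles show the actual column names (Year, Percentage, Gender) and not the parameter names.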

Resources