I am using the multiplot() function in R with the coefplot() function to create a plot with coefficients from several models.
multiplot(mod1, mod2, mod3, mod4, mod5, mod6, mod7, mod8, mod9,
intercept = F, zeroType = 1,
zeroColor = 'black',
title = 'The Effect of Masculinity on Sexism Items',
ylab = 'Masculinity Index',
plot.shapes = T,
xlab = 'Sexism',
sort = 'natural',
legend.reverse = F,
names = c('Control', 'Offend', 'Prob', 'Protect', 'Rescue',
'R/W', 'Equal', 'Hire', 'Harass'),
newNames = c(masc_ind = 'Masculinity'))+
scale_color_manual(values = c('green', 'green', 'green', 'blue', 'blue',
'blue', 'red', 'red', 'red'))+
scale_shape_manual(values = c(15,16,17,18,19,0,1,2,5))
Which produces the above plot:
I noticed that the order of the models does not follow the order in which they are entered in the code (i.e. mod1, mod2, mod3). Instead, the models appear to be listed in reverse alphabetical order. I know there are ways to override this in ggplot() for factor-level variables in a single model, but I haven't been able to find much on reordering models in ggplot(). Is there a way to override this so I can list the models in a preferred order?
The difficulty with ggplot wrappers like multiplot is that what you gain in ease of use, you often lose in flexibility. However, the data frame that multiplot produces for its plot data is a very nice little summary table, which you can plot directly in ggplot however you choose.
Obviously, we don't have your data, but I have created a data set with all the same names and structure as your own (see below), so the following code should work for you:
library(coefplot)
var_names <- c('Control', 'Offend', 'Prob', 'Protect', 'Rescue',
'R/W', 'Equal', 'Hire', 'Harass')
plot_data <- multiplot(mod1, mod2, mod3, mod4, mod5, mod6, mod7, mod8, mod9,
intercept = F, zeroType = 1,
zeroColor = 'black',
title = 'The Effect of Masculinity on Sexism Items',
ylab = 'Masculinity Index',
plot.shapes = T,
xlab = 'Sexism',
sort = 'natural',
legend.reverse = F,
names = var_names,
newNames = c(masc_ind = 'Masculinity'))$data
p <- ggplot(plot_data, aes(Value, factor(Model, rev(var_names)), color = Model)) +
geom_vline(xintercept = 0) +
geom_errorbarh(aes(xmin = LowOuter, xmax = HighOuter), height = 0) +
geom_errorbarh(aes(xmin = LowInner, xmax = HighInner), height = 0, size = 1) +
geom_point(aes(shape = Model), size = 4) +
labs(x = "Sexism", y = "Masculinity Index") +
theme(legend.position = "bottom")
p + scale_color_manual(values = setNames(c('green', 'green', 'green', 'blue',
'blue', 'blue', 'red', 'red', 'red'),
var_names), name = "Variable") +
scale_shape_manual(values = setNames(c(15, 16, 17, 18, 19, 0, 1, 2, 5),
var_names), name = "Variable")
And of course you can style this however you choose:
p +
theme_minimal(base_size = 16) +
scale_color_manual(values = setNames(rep(c("red3", "orange2",
"steelblue4"), each = 3), var_names),
name = NULL) +
scale_shape_manual(values = setNames(c(15, 16, 17, 18, 19, 0, 1, 2, 5),
var_names), name = NULL) +
labs(x = "Sexism", y = "Masculinity Index") +
theme(legend.position = "bottom")
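The reordering in the code above comes entirely from the factor levels passed to the y aesthetic. As a minimal sketch of that mechanism, assuming a made-up three-row table (toy, wanted and p_toy are hypothetical names, and the values are invented rather than taken from any model):

```r
library(ggplot2)

# Hypothetical stand-in for the `$data` table returned by multiplot()
toy <- data.frame(
  Model = c("Control", "Offend", "Prob"),
  Value = c(0.2, -0.1, 0.4)
)

# Order in which the models should read from top to bottom
wanted <- c("Control", "Offend", "Prob")

# A discrete y axis is drawn bottom-to-top in level order, so the
# desired order is reversed before it is used as the factor levels
toy$Model <- factor(toy$Model, levels = rev(wanted))

p_toy <- ggplot(toy, aes(Value, Model)) +
  geom_point()
```

Printing p_toy should place Control at the top of the axis and Prob at the bottom.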
Data used
set.seed(1)
var_names <- c('Control', 'Offend', 'Prob', 'Protect', 'Rescue',
'RW', 'Equal', 'Hire', 'Harass', 'masc_ind')
df <- setNames(as.data.frame(matrix(rbinom(500, 1, 0.5), 50)), var_names)
list2env(setNames(lapply(head(var_names, -1), function(x) {
df2 <- data.frame(masculinity = df[[x]], masc_ind = df$masc_ind)
do.call("glm",
list(formula = masc_ind ~ masculinity,
data = quote(df2),
family = quote(binomial)))
}), paste0("mod", 1:9)), globalenv())
Related
I have a large number of variables and would like to create scatterplots comparing all variables to a single variable. I have been able to do this in base R using lapply, but I cannot complete the same task in ggplot2 using lapply.
Below is an example dataset.
df <- data.frame("ID" = 1:16)
df$A <- c(1,2,3,4,5,6,7,8,9,10,11,12,12,14,15,16)
df$B <- c(5,6,7,8,9,10,13,15,14,15,16,17,18,18,19,20)
df$C <- c(11,12,14,16,10,12,14,16,10,12,14,16,10,12,14,16)
I define the variables I would like to generate scatterplots with, using the code below:
df_col_names <- df %>% select(A:C) %>% colnames(.)
Below is how I have been able to successfully complete the task of plotting all variables against variable A, using lapply in base R:
lapply(df_col_names, function(x) {
tiff(filename=sprintf("C:\\Documents\\%s.tiff", x),
width = 1000, height = 1000, res=200)
plot(df$A, df[[x]],
pch=19,
cex = 1.5,
ylab = x,
ylim = c(0, 20),
xlim = c(0, 20))
dev.off()
})
Below is my attempt at completing the task in ggplot2 without any success. It generates the tiff images, although they are empty.
lapply(df_col_names, function(x) {
tiff(filename=sprintf("C:\\Documents\\%s.tiff", x),
width = 1000, height = 1000, res=200)
ggplot(df) +
geom_point(data = df,
aes(x = A, y = df_col_names[[x]], size = 3)) +
geom_smooth(aes(x = A, y = df_col_names[[x]], size = 0), method = "lm", size=0.5) +
coord_fixed(ratio = 1, xlim = c(0, 20), ylim = c(0, 20)) +
guides(size = FALSE, color = FALSE) +
theme_bw(base_size = 14)
dev.off()
})
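The TIFF files come out empty because a ggplot object is only drawn when it is printed, and automatic printing does not happen inside a function such as the one passed to lapply. Saving with ggsave avoids the problem and works for me. Also note that you are passing column names as strings, so use the .data pronoun (.data[[x]]) to refer to the actual column values.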
library(ggplot2)
lapply(df_col_names, function(x) {
  # .data[[x]] looks up the column whose name is stored in the string x
  plt <- ggplot(df, aes(x = A, y = .data[[x]])) +
    # constant sizes are set outside aes(), so no size legend is created
    geom_point(size = 3) +
    geom_smooth(method = "lm", size = 0.5) +
    coord_fixed(ratio = 1, xlim = c(0, 20), ylim = c(0, 20)) +
    theme_bw(base_size = 14)
  # ggsave() renders the plot object itself, so no explicit print() is needed
  ggsave(sprintf("%s.tiff", x), plt)
})
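If you would rather keep the open-a-device, draw, close-the-device pattern from the base R version, an explicit print() call is what is needed. The following is only a sketch under assumptions: df_demo, out_dir and files are hypothetical names, and pdf() with a temporary folder is used purely so the example runs anywhere; tiff() with your own path should slot in the same way.

```r
library(ggplot2)

# Small made-up data set, not the one from the question
df_demo <- data.frame(A = 1:5,
                      B = c(2, 4, 5, 8, 9),
                      C = c(5, 3, 4, 1, 2))

out_dir <- tempdir()  # assumption: write to a temporary folder

files <- vapply(c("B", "C"), function(x) {
  path <- file.path(out_dir, paste0(x, ".pdf"))
  pdf(path, width = 5, height = 5)
  p <- ggplot(df_demo, aes(x = A, y = .data[[x]])) +
    geom_point()
  # Explicit print(): plots are not auto-printed inside a function
  print(p)
  dev.off()
  path
}, character(1))
```

files then holds the paths of the two files that were written.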
I would like to use my own fill colours (e.g. c("red", "blue", "grey50", "black")) when using scale_fill_binned() within ggplot code. How can I do this?
Here is a minimal reproducible example:
library(tidyverse)
dat <- mtcars %>%
group_by(cyl) %>%
summarise(n = n(),
mean_hp = mean(hp)) %>%
ungroup
ggplot(data = dat, aes(x = cyl, y = mean_hp, size = n, fill = n)) +
geom_point(shape = 21) +
scale_size_binned(breaks = c(8, 10, 12), guide = guide_bins(show.limits = T)) +
scale_fill_binned(breaks = c(8, 10, 12), guide = guide_bins(show.limits = T), type = "viridis") +
labs(x = "Cylinder", y = "Mean hp", fill = "Nb of cars", size = "Nb of cars") +
theme_minimal()
Here is what the output looks like:
To use this family of functions, you need to provide a function that returns an object with class "ScaleContinuous" "Scale" "ggproto" "gg" (i.e. the equivalent of what scale_fill_viridis_c returns).
scale_fill_custom <- function (..., alpha = 1, begin = 0, end = 1, direction = 1,
option = "D", values = NULL, space = "Lab", na.value = "grey50",
guide = "colourbar", aesthetics = "fill") {
continuous_scale(aesthetics, scale_name = "custom",
palette = scales::gradient_n_pal(c("red", "blue", "grey50", "black"),
values, space), na.value = na.value,
guide = guide, ...)
}
ggplot(data = dat, aes(x = cyl, y = mean_hp, size = n, fill = n)) +
geom_point(shape = 21) +
scale_size_binned(breaks = c(8, 10, 12), guide = guide_bins(show.limits = T)) +
scale_fill_binned(breaks = c(8, 10, 12), guide = guide_bins(show.limits = T),
type = scale_fill_custom) +
labs(x = "Cylinder", y = "Mean hp", fill = "Nb of cars", size = "Nb of cars") +
theme_minimal()
Note that you are using colour as a scale to be translated by the eye into numerically meaningful difference. The colours are interpolated between the manually applied points, so will not actually be your exact colours. If you wish to band your averages by colour it would be preferable to create a factor, then manually apply your theme.
ggplot(data = mutate(dat, n = cut(n, breaks = c(0, 8, 10, 12, 20))),
aes(x = cyl, y = mean_hp, size = n, fill = n)) +
geom_point(shape = 21) +
scale_size_discrete() +
scale_fill_manual(values = c("red", "blue", "grey50", "black")) +
labs(x = "Cylinder", y = "Mean hp", fill = "Nb of cars", size = "Nb of cars") +
theme_minimal()
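To see what the cut() step in the factor approach produces, here is a small base R sketch. The names n, bands and cols are only illustrative. Naming the colour vector by level is an extra precaution that is not in the code above: it ties each band to its colour even when a band (here 8 to 10) contains no observations.

```r
# Number of cars per cylinder class in mtcars (4, 6 and 8 cylinders)
n <- as.vector(table(mtcars$cyl))

# Band the counts with the same breaks as in the plot
bands <- cut(n, breaks = c(0, 8, 10, 12, 20))

# One colour per band, named by level
cols <- setNames(c("red", "blue", "grey50", "black"), levels(bands))

# Colour that each cylinder class ends up with
cols[as.character(bands)]
```

A vector named like this can be passed as scale_fill_manual(values = cols).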
Following the comment from @teunbrand, I was able to come up with something.
cols <- c("red", "blue", "grey50", "black")
ggplot(data = dat, aes(x = cyl, y = mean_hp, size = n, fill = n)) +
geom_point(shape = 21) +
scale_size_binned(breaks = c(8, 10, 12), guide = guide_bins(show.limits = T)) +
labs(x = "Cylinder", y = "Mean hp", fill = "Nb of cars", size = "Nb of cars") +
theme_minimal() +
binned_scale(aesthetics = "fill", scale_name = "custom",
palette = ggplot2:::binned_pal(scales::manual_pal(values = cols)),
guide = "bins",
breaks = c(8, 10, 12), limits = c(min(dat$n), max(dat$n)), show.limits = T)
Here is what the output looks like:
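Conceptually, three breaks split the range into four bins, and each bin receives one of the four colours. The base R sketch below only illustrates that idea with findInterval(); it is not ggplot2's internal code, and the handling of values that fall exactly on a break may differ. The names breaks, n and bin are chosen for illustration.

```r
cols   <- c("red", "blue", "grey50", "black")
breaks <- c(8, 10, 12)

# Number of cars per cylinder class in mtcars (4, 6 and 8 cylinders)
n <- as.vector(table(mtcars$cyl))

# findInterval() counts how many breaks lie at or below each value,
# so adding 1 gives a bin index between 1 and 4
bin <- findInterval(n, breaks) + 1

# Colour of the bin that each count falls into
cols[bin]
```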
I would like to produce a graphic combining four facets of a graph with insets in each facet showing a detail of the respective plot. This is one of the things I tried:
#create data frame
n_replicates <- c(rep(1:10,15),rep(seq(10,100,10),15),rep(seq(100,1000,100),15),rep(seq(1000,10000,1000),15))
sim_years <- rep(sort(rep((1:15),10)),4)
sd_data <- rep (NA,600)
for (i in 1:600) {
sd_data[i]<-rnorm(1,mean=exp(0.1 * sim_years[i]), sd= 1/n_replicates[i])
}
max_rep <- sort(rep(c(10,100,1000,10000),150))
data_frame <- cbind.data.frame(n_replicates,sim_years,sd_data,max_rep)
#do first basic plot
library(ggplot2)
plot1<-ggplot(data=data_frame, aes(x=sim_years,y=sd_data,group =n_replicates, col=n_replicates)) +
geom_line() + theme_bw() +
labs(title ="", x = "year", y = "sd")
plot1
#make four facets
my_breaks = c(2, 10, 100, 1000, 10000)
facet_names <- c(
`10` = "2, 3, ..., 10 replicates",
`100` = "10, 20, ..., 100 replicates",
`1000` = "100, 200, ..., 1000 replicates",
`10000` = "1000, 2000, ..., 10000 replicates"
)
plot2 <- plot1 +
facet_wrap( ~ max_rep, ncol=2, labeller = as_labeller(facet_names)) +
scale_colour_gradientn(name = "number of replicates", trans = "log",
breaks = my_breaks, labels = my_breaks, colours = rainbow(20))
plot2
#extract inlays (this is where it goes wrong I think)
library(ggpmisc)
library(tibble)
library(dplyr)
inset <- tibble(x = 0.01, y = 10.01,
plot = list(plot2 +
facet_wrap( ~ max_rep, ncol=2, labeller = as_labeller(facet_names)) +
coord_cartesian(xlim = c(13, 15),
ylim = c(3, 5)) +
labs(x = NULL, y = NULL, color = NULL) +
scale_colour_gradient(guide = FALSE) +
theme_bw(10)))
plot3 <- plot2 +
expand_limits(x = 0, y = 0) +
geom_plot_npc(data = inset, aes(npcx = x, npcy = y, label = plot)) +
annotate(geom = "rect",
xmin = 13, xmax = 15, ymin = 3, ymax = 5,
linetype = "dotted", fill = NA, colour = "black")
plot3
That leads to the following graphic:
As you can see, the colours in the insets are wrong, and all four of them appear in each of the facets even though I only want the corresponding inset of course. I read through a lot of questions here (to even get me this far) and also some examples in the ggpmisc user guide but unfortunately I am still a bit lost on how to achieve what I want. Except maybe to do it by hand extracting four insets and then combining them with plot2. But I hope there will be a better way to do this. Thank you for your help!
Edit: better graphic now thanks to this answer, but problem remains partially unsolved:
The following code produces good insets, but unfortunately the colours are not preserved. As in the version above, each inset builds its own rainbow colour scale instead of inheriting the partial rainbow scale from the facet it belongs to. Does anyone know why, and how I could change this? In the comments I have put another (bad) attempt at solving this; it preserves the colours but puts all four insets in each facet.
library(ggpmisc)
library(tibble)
library(dplyr)
# #extract inlays: good colours, but produces four insets.
# fourinsets <- tibble(#x = 0.01, y = 10.01,
# x = c(rep(0.01, 4)),
# y = c(rep(10.01, 4)),
# plot = list(plot2 +
# facet_wrap( ~ max_rep, ncol=2) +
# coord_cartesian(xlim = c(13, 15),
# ylim = c(3, 5)) +
# labs(x = NULL, y = NULL, color = NULL) +
# scale_colour_gradientn(name = "number of replicates", trans = "log", guide = FALSE,
# colours = rainbow(20)) +
# theme(
# strip.background = element_blank(),
# strip.text.x = element_blank()
# )
# ))
# fourinsets$plot
library(purrr)
pp <- map(unique(data_frame$max_rep), function(x) {
plot2$data <- plot2$data %>% filter(max_rep == x)
plot2 +
coord_cartesian(xlim = c(12, 14),
ylim = c(3, 4)) +
labs(x = NULL, y = NULL) +
theme(
strip.background = element_blank(),
strip.text.x = element_blank(),
legend.position = "none",
axis.text=element_blank(),
axis.ticks=element_blank()
)
})
#pp[[2]]
inset_new <- tibble(x = c(rep(0.01, 4)),
y = c(rep(10.01, 4)),
plot = pp,
max_rep = unique(data_frame$max_rep))
final_plot <- plot2 +
geom_plot_npc(data = inset_new, aes(npcx = x, npcy = y, label = plot, vp.width = 0.3, vp.height =0.6)) +
annotate(geom = "rect",
xmin = 12, xmax = 14, ymin = 3, ymax = 4,
linetype = "dotted", fill = NA, colour = "black")
#final_plot
final_plot then looks like this:
I hope this clarifies the problem a bit. Any ideas are very welcome :)
Modifying @user63230's excellent answer:
pp <- map(unique(data_frame$max_rep), function(x) {
plot2 +
aes(alpha = ifelse(max_rep == x, 1, 0)) +
coord_cartesian(xlim = c(12, 14),
ylim = c(3, 4)) +
labs(x = NULL, y = NULL) +
scale_alpha_identity() +
facet_null() +
theme(
strip.background = element_blank(),
strip.text.x = element_blank(),
legend.position = "none",
axis.text=element_blank(),
axis.ticks=element_blank()
)
})
Explanation:
Instead of filtering the data passed into plot2 (which affects the mapping of colours), we impose a new aesthetic alpha, where lines belonging to the other replicate numbers are assigned 0 for transparency;
Use scale_alpha_identity() to tell ggplot that the alpha mapping is to be used as-is: i.e. 1 for 100%, 0 for 0%.
Add facet_null() to override plot2's existing facet_wrap, which removes the facet for the inset.
Everything else is unchanged from the code in the question.
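A stripped-down sketch of the alpha trick, using a made-up two-group data set (d, full and masked are hypothetical names), suggests why the colours survive: the hidden group stays in the data, so the colour scale is still trained on the full range.

```r
library(ggplot2)

# Two groups with very different values of the colour variable g
d <- data.frame(x = rep(1:3, 2),
                y = c(1, 2, 3, 2, 3, 4),
                g = rep(c(10, 1000), each = 3))

full <- ggplot(d, aes(x, y, group = g, colour = g)) +
  geom_line()

# Hide the g == 1000 line instead of filtering it out of the data
masked <- full +
  aes(alpha = ifelse(g == 10, 1, 0)) +
  scale_alpha_identity()

# Computed aesthetics for both versions of the plot
full_layer   <- layer_data(full)
masked_layer <- layer_data(masked)
```

Comparing full_layer$colour with masked_layer$colour should show the same colours in both plots, while masked_layer$alpha holds the 0/1 values taken as-is.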
I think this will get you started, although it's tricky to get the size of the inset plot right (when you include a legend).
#set up data
library(ggpmisc)
library(tibble)
library(dplyr)
library(ggplot2)
library(purrr)  # provides map(), used below
# create data frame
n_replicates <- c(rep(1:10, 15), rep(seq(10, 100, 10), 15), rep(seq(100,
1000, 100), 15), rep(seq(1000, 10000, 1000), 15))
sim_years <- rep(sort(rep((1:15), 10)), 4)
sd_data <- rep(NA, 600)
for (i in 1:600) {
sd_data[i] <- rnorm(1, mean = exp(0.1 * sim_years[i]), sd = 1/n_replicates[i])
}
max_rep <- sort(rep(c(10, 100, 1000, 10000), 150))
data_frame <- cbind.data.frame(n_replicates, sim_years, sd_data, max_rep)
# make four facets
my_breaks = c(2, 10, 100, 1000, 10000)
facet_names <- c(`10` = "2, 3, ..., 10 replicates", `100` = "10, 20, ..., 100 replicates",
`1000` = "100, 200, ..., 1000 replicates", `10000` = "1000, 2000, ..., 10000 replicates")
Get overall plot:
# overall facet plot
overall_plot <- ggplot(data = data_frame, aes(x = sim_years, y = sd_data, group = n_replicates, col = n_replicates)) +
geom_line() +
theme_bw() +
labs(title = "", x = "year", y = "sd") +
facet_wrap(~max_rep, ncol = 2, labeller = as_labeller(facet_names)) +
scale_colour_gradientn(name = "number of replicates", trans = "log", breaks = my_breaks, labels = my_breaks, colours = rainbow(20))
#plot
overall_plot
which gives:
Then from the overall plot you want to extract each plot, see here. We can map over the list to extract one at a time:
pp <- map(unique(data_frame$max_rep), function(x) {
overall_plot$data <- overall_plot$data %>% filter(max_rep == x)
overall_plot + # coord_cartesian(xlim = c(13, 15), ylim = c(3, 5)) +
labs(x = NULL, y = NULL) +
theme_bw(10) +
theme(legend.position = "none")
})
If we look at one of these (I've removed the legend) e.g.
pp[[1]]
#pp[[2]]
#pp[[3]]
#pp[[4]]
Gives:
Then we want to add these inset plots into a dataframe so that each plot has its own row:
inset <- tibble(x = c(rep(0.01, 4)),
y = c(rep(10.01, 4)),
plot = pp,
max_rep = unique(data_frame$max_rep))
Then merge this into the overall plot:
overall_plot +
expand_limits(x = 0, y = 0) +
geom_plot_npc(data = inset, aes(npcx = x, npcy = y, label = plot, vp.width = 0.8, vp.height = 0.8))
Gives:
Here is a solution based on Z. Lin's answer, but using ggforce::facet_wrap_paginate() to do the filtering and keeping colourscales consistent.
First, we can make the 'root' plot containing all the data with no facetting.
library(ggpmisc)
library(tibble)
library(dplyr)
n_replicates <- c(rep(1:10,15),rep(seq(10,100,10),15),rep(seq(100,1000,100),15),rep(seq(1000,10000,1000),15))
sim_years <- rep(sort(rep((1:15),10)),4)
sd_data <- rep (NA,600)
for (i in 1:600) {
sd_data[i]<-rnorm(1,mean=exp(0.1 * sim_years[i]), sd= 1/n_replicates[i])
}
max_rep <- sort(rep(c(10,100,1000,10000),150))
data_frame <- cbind.data.frame(n_replicates,sim_years,sd_data,max_rep)
my_breaks = c(2, 10, 100, 1000, 10000)
facet_names <- c(
`10` = "2, 3, ..., 10 replicates",
`100` = "10, 20, ..., 100 replicates",
`1000` = "100, 200, ..., 1000 replicates",
`10000` = "1000, 2000, ..., 10000 replicates"
)
base <- ggplot(data=data_frame,
aes(x=sim_years,y=sd_data,group =n_replicates, col=n_replicates)) +
geom_line() +
theme_bw() +
scale_colour_gradientn(
name = "number of replicates",
trans = "log10", breaks = my_breaks,
labels = my_breaks, colours = rainbow(20)
) +
labs(title ="", x = "year", y = "sd")
Next, the main plot will be just the root plot with facet_wrap().
main <- base + facet_wrap(~ max_rep, ncol = 2, labeller = as_labeller(facet_names))
Then the new part is to use facet_wrap_paginate with nrow = 1 and ncol = 1 for every max_rep, which we'll use as insets. The nice thing is that this does the filtering and it keeps colour scales consistent with the root plot.
nmax_rep <- length(unique(data_frame$max_rep))
insets <- lapply(seq_len(nmax_rep), function(i) {
base + ggforce::facet_wrap_paginate(~ max_rep, nrow = 1, ncol = 1, page = i) +
coord_cartesian(xlim = c(12, 14), ylim = c(3, 4)) +
guides(colour = "none", x = "none", y = "none") +
theme(strip.background = element_blank(),
strip.text = element_blank(),
axis.title = element_blank(),
plot.background = element_blank())
})
insets <- tibble(x = rep(0.01, nmax_rep),
y = rep(10.01, nmax_rep),
plot = insets,
max_rep = unique(data_frame$max_rep))
main +
geom_plot_npc(data = insets,
aes(npcx = x, npcy = y, label = plot,
vp.width = 0.3, vp.height = 0.6)) +
annotate(geom = "rect",
xmin = 12, xmax = 14, ymin = 3, ymax = 4,
linetype = "dotted", fill = NA, colour = "black")
Created on 2020-12-15 by the reprex package (v0.3.0)
I'm trying to fix my legend text so that the text is representing the appropriate symbols and color. However, I have a lot of variables that I need to include in the legend, and they are all in different columns. Does anyone know a quick way to indicate what the colours and symbol are in the ggplot legend?
Here is some sample code
#sample data
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
temps$Shape = if_else(temps$Growth.Phase == "exponential", 21,
if_else(temps$Growth.Phase == "stationary", 22, 23))
#Graph code
ggplot(data = temps, aes(x = Temperature, y = "Proportions", shape = factor(Shape))) +
geom_point(aes(y = Carbohydrates),colour = "darkred",
fill = "darkred", size = 3) +
geom_line(aes(y = Carbohydrates), size = 1, col = "darkred") +
geom_point(aes(y = Lipids), colour = "darkblue",
fill = "darkblue", size = 3, col ="darkblue") +
geom_line(aes(y = Lipids), size = 1) +
geom_point(aes(y = Protein), colour = "violet",
fill = "violet", size = 3) +
geom_line(aes(y = Protein), size = 1, col ="violet") +
geom_point(aes(y = DNA.RNA), colour = "darkorange",
fill = "darkorange", size = 3) +
geom_line(aes(y = DNA.RNA), size = 1, col = "darkorange") +
geom_point(aes(y = Chlorophyll), size = 3, colour = "darkgreen",
fill = "darkgreen") +
geom_line(aes(y = Chlorophyll), size = 1, col = "darkgreen") +
labs(x = "Temperature (°C)", y = "Proportion")
This is the image I am getting
But as you can see it's not giving me the correct text in the legend. I would like the symbols to specify which Growth.Phase they are and the colour to specify what column I have plotted (ie. Carbohydrate, Protein etc....). Does anyone know a quick fix?
When I use my own data this is what the graph looks like, please note the lines are going through the same symbols, and are the same colours
I'm not sure whether I got the legend right, but the idea is the same as in @dc37's answer. Your plot can be considerably simplified using pivot_longer:
#sample data
temps = data.frame(Temperature= c(15,25,35),
Growth.Phase = c("exponential", "stationary", "death"),
Carbohydrates = sample(c(3:10), 9, replace = T),
Lipids = sample(c(10:25), 9, replace = T),
Chlorophyll = sample(c(2:15), 9),
DNA.RNA = sample(c(3:15), 9),
Protein = sample(c(5:20), 9))
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)
temps_long <- temps %>%
pivot_longer(-c(Temperature, Growth.Phase)) %>%
mutate(
shape = case_when(
Growth.Phase == "exponential" ~ 21,
Growth.Phase == "stationary" ~ 22,
TRUE ~ 23
),
color = case_when(
name == "Carbohydrates" ~ "darkred",
name == "Lipids" ~ "darkblue",
name == "Protein" ~ "violet",
name == "DNA.RNA" ~ "darkorange",
name == "Chlorophyll" ~ "darkgreen",
TRUE ~ NA_character_
),
)
# named color vector
colors <- select(temps_long, name, color) %>%
distinct() %>%
deframe()
# named shape vector
shapes <- select(temps_long, Growth.Phase, shape) %>%
distinct() %>%
deframe()
ggplot(data = temps_long, aes(x = Temperature, y = value, shape = Growth.Phase, color = name, fill = name, group = Temperature)) +
geom_point(size = 3) +
geom_line(size = 1) +
scale_shape_manual(values = shapes) +
scale_fill_manual(values = colors) +
scale_color_manual(values = colors) +
labs(x = "Temperature (C)", y = "Proportion", color = "XXXX") +
guides(fill = "none", shape = guide_legend(override.aes = list(fill = "black")))
Created on 2020-04-04 by the reprex package (v0.3.0)
To make your code simpler and avoid repeating the same line several times, you can transform your data into a longer format and then use the new variables for the color, fill and shape arguments in your aes.
Then, using scale_color_manual or scale_shape_manual, you can set the appropriate colors and shapes.
To draw lines between the appropriate points, I add a "Rep" column to mimic the presence of replicates in your experiment. Otherwise, geom_line can't decide which points belong together.
library(tidyr)
library(dplyr)
library(ggplot2)
temps %>% mutate(Rep = rep(1:3,each = 3)) %>%
pivot_longer(cols = Carbohydrates:Protein, names_to = "Type", values_to = "proportions") %>%
ggplot(aes(x = Temperature, y = proportions))+
geom_point(aes(fill = Type, shape = Growth.Phase, color = Type), size = 3)+
geom_line(aes( color = Type, group =interaction(Rep, Type)))+
scale_color_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_fill_manual(values = c("darkred","darkgreen","darkorange","darkblue","violet"))+
scale_shape_manual(values = c(23,21,22))+
labs(x = "Temperature (°C)", y = "Proportion")
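The reshaping step used above can be looked at in isolation. This is a small sketch with invented numbers (wide and long are illustrative names) showing what pivot_longer returns when given the same names_to and values_to arguments:

```r
library(tidyr)

# Made-up wide table: one column per measured component
wide <- data.frame(Temperature = c(15, 25, 35),
                   Lipids      = c(12, 18, 20),
                   Protein     = c(6, 9, 14))

# One row per temperature/component pair; the former column names
# end up in `Type` and the measurements in `proportions`
long <- pivot_longer(wide, cols = Lipids:Protein,
                     names_to = "Type", values_to = "proportions")
```

Because Type is now an ordinary column, it can be mapped to color and fill once instead of adding one layer per component.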
Does this answer your question?
This is my first question to StackExchange, and I've searched for answers that have been helpful, but haven't really gotten me to where I'd like to be.
This is a stacked bar chart, combined with a point chart, combined with a line.
Here's my code:
library(ggplot2)
theme_set(theme_light())
library(lubridate)
FM <- as.Date('2018-02-01')
x.range <- c(FM - months(1) - days(1) - days(day(FM) - 1), FM - days(day(FM) - 1) + months(1))
x.ticks <- seq(x.range[1] + days(1), x.range[2], by = 2)
#populate example data
preds <- data.frame(FM = FM, DATE = seq(x.range[1] + days(1), x.range[2] - days(1), by = 1))
preds <- data.frame(preds, S_O = round(seq(1, 1000000, by = 1000000/nrow(preds))))
preds <- data.frame(preds, S = round(ifelse(month(preds$FM) == month(preds$DATE), day(preds$DATE) / 30.4, 0) * preds$S_O))
preds <- data.frame(preds, O = preds$S_O - preds$S)
preds <- data.frame(preds, pred_sales = round(1000000 + rnorm(nrow(preds), 0, 10000)))
preds$ma <- with(preds, stats::filter(pred_sales, rep(1/5, 5), sides = 1))
y.max <- ceiling(max(preds$pred_sales) / 5000) * 5000 + 15000
line.cols <- c(O = 'palegreen4', S = 'steelblue4',
P = 'maroon', MA = 'blue')
fill.cols <- c(O = 'palegreen3', S = 'steelblue3',
P = 'red')
p <- ggplot(data = preds,
mapping = aes(DATE, pred_sales))
p <- p +
geom_bar(data = reshape2::melt(preds[,c('DATE', 'S', 'O')], id.var = 'DATE'),
mapping = aes(DATE, value, group = 1, fill = variable, color = variable),
width = 1,
stat = 'identity',
alpha = 0.5) +
geom_point(mapping = aes(DATE, pred_sales, group = 2, fill = 'P', color = 'P'),
shape = 22, #square
alpha = 0.5,
size = 2.5) +
geom_line(data = preds[!is.na(preds$ma),],
mapping = aes(DATE, ma, group = 3, color = 'MA'),
alpha = 0.8,
size = 1) +
geom_text(mapping = aes(DATE, pred_sales, label = formatC(pred_sales / 1000, format = 'd', big.mark = ',')),
angle = 90,
size = 2.75,
hjust = 1.25,
vjust = 0.4) +
labs(title = sprintf('%s Sales Predictions - %s', 'Overall', format(FM, '%b %Y')),
x = 'Date',
y = 'Volume in MMlbs') +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 8),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
legend.title = element_blank(),
legend.position = 'bottom',
legend.text = element_text(size = 8),
legend.margin = margin(t = 0.25, unit = 'cm')) +
scale_x_date(breaks = x.ticks,
date_labels = '%b %e',
limits = x.range) +
scale_y_continuous(limits = c(0, y.max),
labels = function(x) { formatC(x / 1000, format='d', big.mark=',') }) +
scale_color_manual(values = line.cols,
breaks = c('MA'),
labels = c(MA = 'Mvg Avg (5)')) +
scale_fill_manual(values = fill.cols,
breaks = c('P', 'O', 'S'),
labels = c(O = 'Open Orders', S = 'Sales', P = 'Predictions'))
p
The chart it generates is this:
As you can see, the legend does a couple of funky things. It's close, but not quite there. I only want boxes with exterior borders for Predictions, Open Orders, and Sales, and only a blue line for the Mvg Avg (5).
Any advice would be appreciated.
Thanks!
Rather late, but if you are still interested in understanding this problem, the following should work. Explanations are included as comments within the code:
library(dplyr)
preds %>%
# scale the values for ALL numeric columns in the dataset, before
# passing the dataset to ggplot()
mutate_if(is.numeric, ~./1000) %>%
# since x / y mappings are stated in the top level ggplot(), there's
# no need to repeat them in the subsequent layers UNLESS you want to
# override them
ggplot(mapping = aes(x = DATE, y = pred_sales)) +
# 1. use data = . to inherit the top level data frame, & modify it on
# the fly for this layer; this is neater as you are essentially
# using a single data source for the ggplot object.
# 2. geom_col() is a more succinct way to say geom_bar(stat = "identity")
# (I'm using tidyr rather than reshape package, since ggplot2 is a
# part of the tidyverse packages, & the two play together nicely)
geom_col(data = . %>%
select(S, O, DATE) %>%
tidyr::gather(variable, value, -DATE),
aes(y = value, fill = variable, color = variable),
width = 1, alpha = 0.5) +
# don't show legend for this layer (o/w the fill / color legend would
# include a square shape in the centre of each legend key)
geom_point(aes(fill = 'P', color = 'P'),
shape = 22, alpha = 0.5, size = 2.5, show.legend = FALSE) +
# use data = . %>% ... as above.
# since the fill / color aesthetic mappings from the geom_col layer would
# result in a border around all fill / color legends, avoid it all together
# here by hard coding the line color to "blue", & map its linetype instead
# to create a separate linetype-based legend later.
geom_line(data = . %>% na.omit(),
aes(y = ma, linetype = 'MA'),
color = "blue", alpha = 0.8, size = 1) +
# scales::comma is a more succinct alternative to formatC for this use case
geom_text(aes(label = scales::comma(pred_sales)),
angle = 90, size = 2.75, hjust = 1.25, vjust = 0.4) +
labs(title = sprintf('%s Sales Predictions - %s', 'Overall', format(FM, '%b %Y')),
x = 'Date',
y = 'Volume in MMlbs') +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size = 8),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
legend.title = element_blank(),
legend.position = 'bottom',
legend.text = element_text(size = 8),
legend.margin = margin(t = 0.25, unit = 'cm')) +
scale_x_date(breaks = x.ticks,
date_labels = '%b %e',
limits = x.range) +
# as above, scales::comma is more succinct
scale_y_continuous(limits = c(0, y.max / 1000),
labels = scales::comma) +
# specify the same breaks & labels for the manual fill / color scales, so that
# a single legend is created for both
scale_color_manual(values = line.cols,
breaks = c('P', 'O', 'S'),
labels = c(O = 'Open Orders', S = 'Sales', P = 'Predictions')) +
scale_fill_manual(values = fill.cols,
breaks = c('P', 'O', 'S'),
labels = c(O = 'Open Orders', S = 'Sales', P = 'Predictions')) +
# create a separate line-only legend using the linetype mapping, with
# value = 1 (i.e. unbroken line) & specified alpha / color to match the
# geom_line layer
scale_linetype_manual(values = 1,
labels = 'Mvg Avg (5)',
guide = guide_legend(override.aes = list(alpha = 1,
color = "blue")))
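On the na.omit() in the geom_line layer: the ma column comes from stats::filter() with sides = 1, which is a trailing five-point moving average, so its first four values are NA. A small base R sketch on the numbers 1 to 10 (x and ma_demo are illustrative names only) shows this:

```r
x <- 1:10

# Trailing 5-point moving average: each value is the mean of the
# current observation and the four before it
ma_demo <- as.numeric(stats::filter(x, rep(1 / 5, 5), sides = 1))

# The first four entries have no complete window and are NA
ma_demo
```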