Merging 2 Legends In a Specific way - r

I have a plot of my data that includes both a boxplot and a point plot (data from mtcars for illustration):
library(ggplot2)
ggplot(mtcars,aes(x=factor(cyl), y=mpg))+
geom_boxplot(data=subset(mtcars,am==1),aes(x = factor(cyl), y = mpg,fill=factor(carb),shape=factor(vs)),outlier.shape = NA, alpha = 0.85, width = .65, colour = "BLACK") +
geom_point(data=subset(mtcars,am==1 & vs==1),aes(x = factor(cyl), y = mpg,fill=factor(carb),shape=factor(vs)),outlier.shape = NA,size=5,alpha=.4,shape=1, colour = "BLACK", position = position_dodge(width = 0.65))
My objective is:
1. for there to be a single legend instead of two, showing all the colors associated with the fill (based on carb) and a single element that explains what the open circles correspond to (i.e. vs==1);
2. for that single element (which corresponds to geom_point) to display an open circle, matching the open circles in the graph, rather than a boxplot as it currently does.
Any help will be greatly appreciated.

Remove the shape aesthetic from geom_boxplot. Also, there is generally no need to specify colour = "BLACK": geom_point already defaults to black, and geom_boxplot to a near-black "grey20".
The version I was running online threw a warning about outlier.shape (it is a geom_boxplot parameter, not a geom_point one), so I have removed it.
Add shape as a constant inside aes() in geom_point and use scale_shape_manual to define the shape (use shape = 21 if you want a fill, as your code suggests, or shape = 1 if you don't). When you remove the legend title, the two legends look fairly "merged".
However, I'm not sure exactly what you mean by "merged legend". Could you show a desired output?
library(ggplot2)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(data = subset(mtcars, am == 1), aes(fill = factor(carb)),
               alpha = 0.85, width = .65) +
  # a constant mapped inside aes() gives a shape legend with one entry
  geom_point(data = subset(mtcars, am == 1 & vs == 1),
             aes(fill = factor(carb), shape = "vs = 1"),
             size = 5, alpha = .4, position = position_dodge(width = 0.65)) +
  scale_shape_manual(NULL, values = 21)  # NULL drops the legend title
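If you want open circles (shape = 1) instead, one option is to map only group = factor(carb) on the points rather than fill. This is a sketch of that idea, not checked against your full data: the points are still dodged per carb level, but they no longer draw into the fill legend, so the fill legend shows only boxplots and the untitled shape legend shows a single open circle.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(data = subset(mtcars, am == 1), aes(fill = factor(carb)),
               alpha = 0.85, width = 0.65) +
  # group (not fill) drives the dodging, so the points stay out of the fill legend
  geom_point(data = subset(mtcars, am == 1 & vs == 1),
             aes(group = factor(carb), shape = "vs = 1"),
             size = 5, alpha = 0.4, position = position_dodge(width = 0.65)) +
  # shape 1 is an open circle; NULL drops the shape legend title
  scale_shape_manual(NULL, values = 1) +
  labs(x = "cyl", fill = "carb")
p
```

Whether the points line up exactly with the boxes depends on which carb levels are present at each cyl, just as in the fillable version above.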

Related

Overlay violin plots in r

I am trying to plot overlaying violin plots by condition within the same variable.
Var <- rnorm(100,50)
Cond <- rbinom(100, 1, 0.5)
df2 <- data.frame(Var,Cond)
ggplot(df2)+
aes(x=factor(Cond),y=Var, colour = Cond)+
geom_violin(alpha=0.3,position="identity")+
coord_flip()
So, where do I specify that I want them to overlap? Preferably, I want them to become lighter where they overlap and darker where they don't, so that their differences are clear. Any clues?
If you don't want them to have different (flipped) x-values, set x to a constant instead of x = factor(Cond). And if you want them filled in, set a fill aesthetic. Map colour and fill to factor(Cond) so that each condition forms its own group and gets its own violin.
ggplot(df2)+
aes(x=0,y=Var, colour = factor(Cond), fill = factor(Cond))+
geom_violin(alpha=0.3,position="identity")+
coord_flip()
coord_flip isn't often needed anymore: since version 3.3.0 (released in early 2020), geoms such as geom_violin can be oriented in either direction. I'd recommend simplifying as below for a similar result.
df2$Cond = factor(df2$Cond)
ggplot(df2) +
aes(y = 0, x = Var, colour = Cond, fill = Cond) +
geom_violin(alpha = 0.3, position = "identity")
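A self-contained version of the simplified approach, assuming ggplot2 3.3.0 or later. The seed is arbitrary and only makes the random data reproducible, and orientation = "y" is set explicitly here rather than relying on automatic orientation detection.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the random data are reproducible
df2 <- data.frame(Var = rnorm(100, 50),
                  Cond = factor(rbinom(100, 1, 0.5)))

# Both violins sit at y = 0, so they overlap; position = "identity" prevents
# dodging and alpha = 0.3 keeps the overlapping regions visible.
p <- ggplot(df2, aes(x = Var, y = 0, colour = Cond, fill = Cond)) +
  geom_violin(alpha = 0.3, position = "identity", orientation = "y")
p
```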

how to add legends from stat_summary and remove legends from the main plot?

I want to plot the values of df1 by two groups, i.e. product and start_date, and also plot a crossbar for the mean of df1 (blue) and the mean of df2 (red), as in the attached diagram.
df1 <- data.frame(product = c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D","D"),
start_date =as.Date(c('2020-02-01', '2020-02-02', '2020-02-03', '2020-02-04', '2020-02-05', '2020-02-06', '2020-02-07')),
value = c(15.71,17.37,19.93,14.28,15.85,10.5,8.58,5.62,5.19,5.44,4.6,7.04,6.29,3.3,20.35,27.92,23.07,12.83,22.28,21.32,31.46,34.82,23.68,29.11,14.48,25.2,16.91,27.79))
df2 <- data.frame(product = c("A","A","A","A","A","A","B","B","B","B","B","B","C","C","C","C","C","C","D","D","D","D","D","D"),
start_date =as.Date(c('2019-07-09', '2019-07-10', '2019-07-11', '2019-07-12', '2019-07-13', '2019-07-14')),
value = c(9.06,10.74,14.64,7.67,8.72,11.21,4.76,4.53,3.81,4.32,3.95,5.2,20.36,21.17,19.51,16.25,17.93,16.94,14.51,14.65,23.28,10.84,16.71,12.48))
# Plot the graph
graph1 <- ggplot(df1, aes(
y = value, x = product, fill = product, color = factor(start_date))) +
geom_col(data = df1, stat = "identity",position = position_dodge(width = 0.8), width = 0.7, inherit.aes = TRUE, size = 0) +
xlab("Product") + ylab("Values") + ylim(c(0,40)) +
scale_fill_manual(values=c("#008FCC", "#FFAA00", "#E60076", "#B00000")) +
stat_summary(data = df1, aes(x = factor(product),y = value),fun = "mean",geom = "crossbar", color = "blue", size = 1, width = 0.8, inherit.aes = FALSE) +
stat_summary(data = df2, aes(x = factor(product),y = value),fun = "mean",geom = "crossbar", color = "red", size = 1, width = 0.8, inherit.aes = FALSE)
Is there any way to remove the borders of the bars and add a legend for the two crossbars in the top-right corner of the plot?
Additionally, is there a way to add just the date from df1 below each bar in the plot?
Your question about adjusting the plot has multiple parts. To summarize a few points:
Change color = factor(start_date) to group = factor(start_date). This removes the colored outline around the bars but keeps the bars separated by start_date.
Use theme(legend.position = ...) to specify the precise placement of the legend within the plot area. Use theme(legend.direction = 'horizontal') too when appropriate.
Map color = inside aes() in the stat_summary(geom = 'crossbar', ...) calls in order to "add" them both to a legend, then use scale_color_manual to specify the colors if you don't like the defaults.
Minor suggestion: use ylim(X, Y) instead of ylim(c(X, Y)). ylim() accepts the two limits directly, so there is no need to put them into a vector. Both forms work, which is why this point is minor.
You don't need data = df1 in the first stat_summary call, since layers use the data set in ggplot(...) by default. You still need x = and y = in its aes(), though, because inherit.aes = FALSE stops it from inheriting the plot's mappings.
Here's the adjusted code from implementing the notes above:
ggplot(df1, aes(y = value, x = product, fill = product, group = factor(start_date))) +
  # no colour mapping, so the bars are drawn without a border
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  xlab("Product") + ylab("Values") + ylim(0, 60) +
  scale_fill_manual(values = c("#008FCC", "#FFAA00", "#E60076", "#B00000")) +
  # mapping a constant string to color puts each crossbar into the color legend
  stat_summary(aes(x = factor(product), y = value, color = 'mean1'),
               fun = "mean", geom = "crossbar",
               size = 1, width = 0.8, inherit.aes = FALSE) +
  stat_summary(data = df2, aes(x = factor(product), y = value, color = 'mean2'),
               fun = "mean", geom = "crossbar",
               size = 1, width = 0.8, inherit.aes = FALSE) +
  # relative (x, y) position between 0 and 1; ggplot2 >= 3.5.0 prefers
  # legend.position = "inside" with legend.position.inside = c(0.75, 0.8)
  theme(legend.position = c(0.75, 0.8), legend.direction = 'horizontal') +
  scale_color_manual(values = c(mean1 = 'blue', mean2 = 'red'))
Explanation: the point of changing to group = factor(start_date) is to keep the bars for each product split by date, a concept known as "dodging". ggplot2 dodges by group, and groups are formed from the discrete variables mapped in aes(). Because your original color = factor(start_date) was inside aes(), it created a legend and also defined the groups that geom_col dodged by (x and fill are both product, so they don't split anything within a product). If you remove color =, each product becomes a single group and its bars are drawn on top of each other, so you see one bar per product. Even if you specify position = 'dodge', geom_col would not dodge them, because there's no information about how to do that. That's why you include the group = aesthetic: it tells geom_col how to dodge without adding a legend.
Mappings inside aes() tell ggplot which legends to create. Aesthetics mapped to x or y are just used for positioning, and group is used for dodging and other per-group operations, but basically any other aesthetic (size, shape, color, fill, linetype, etc.) creates a legend. If both stat_summary calls map a color aesthetic, a single combined color legend is created. The problem here is that there is no column to map to color (the distinction is which of your two datasets the mean comes from), so we create one by mapping a constant character string ("mean1" and "mean2").
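For the last part of the question (the date under each bar), one option is a geom_text layer that uses the same group = factor(start_date) and the same dodge width as the bars, so each label is dodged onto its own bar. This is a minimal sketch: the date format "%d %b", the label size and the lower axis limit of -8 are arbitrary choices that you may need to tune for your plot size.

```r
library(ggplot2)

# df1 from the question, written more compactly (start_date is recycled)
df1 <- data.frame(
  product = rep(c("A", "B", "C", "D"), each = 7),
  start_date = as.Date("2020-02-01") + 0:6,
  value = c(15.71, 17.37, 19.93, 14.28, 15.85, 10.5, 8.58,
            5.62, 5.19, 5.44, 4.6, 7.04, 6.29, 3.3,
            20.35, 27.92, 23.07, 12.83, 22.28, 21.32, 31.46,
            34.82, 23.68, 29.11, 14.48, 25.2, 16.91, 27.79)
)

p <- ggplot(df1, aes(x = product, y = value, fill = product,
                     group = factor(start_date))) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  # Same group and dodge width as the bars, so each label sits under its bar.
  # With angle = 90, hjust = 1.1 ends each label just below y = 0.
  geom_text(aes(y = 0, label = format(start_date, "%d %b")),
            position = position_dodge(width = 0.8),
            angle = 90, hjust = 1.1, size = 2.5) +
  # Room below zero for the rotated labels (arbitrary; tune to your plot size)
  scale_y_continuous(limits = c(-8, 40)) +
  scale_fill_manual(values = c("#008FCC", "#FFAA00", "#E60076", "#B00000")) +
  labs(x = "Product", y = "Values")
p
```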
Final point: it might be easier to plot this if you combine your datasets. You may still want to record which dataset each row came from, so something like this works:
df1$origin_df <- 'df1'
df2$origin_df <- 'df2'
df <- rbind(df1, df2)
Then plot with df and not df1. You can then use one stat_summary call where you specify color=origin_df.
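A sketch of that combined-data version, assuming you still want the bars to show only the df1 rows (selected here with subset()). The crossbars then come from a single stat_summary call whose color follows origin_df. The df1/df2 definitions are the question's data written more compactly, and the crossbar line width is left at its default.

```r
library(ggplot2)

# df1 and df2 from the question, written more compactly (dates are recycled)
df1 <- data.frame(
  product = rep(c("A", "B", "C", "D"), each = 7),
  start_date = as.Date("2020-02-01") + 0:6,
  value = c(15.71, 17.37, 19.93, 14.28, 15.85, 10.5, 8.58,
            5.62, 5.19, 5.44, 4.6, 7.04, 6.29, 3.3,
            20.35, 27.92, 23.07, 12.83, 22.28, 21.32, 31.46,
            34.82, 23.68, 29.11, 14.48, 25.2, 16.91, 27.79)
)
df2 <- data.frame(
  product = rep(c("A", "B", "C", "D"), each = 6),
  start_date = as.Date("2019-07-09") + 0:5,
  value = c(9.06, 10.74, 14.64, 7.67, 8.72, 11.21,
            4.76, 4.53, 3.81, 4.32, 3.95, 5.2,
            20.36, 21.17, 19.51, 16.25, 17.93, 16.94,
            14.51, 14.65, 23.28, 10.84, 16.71, 12.48)
)

df1$origin_df <- "df1"
df2$origin_df <- "df2"
df <- rbind(df1, df2)

p <- ggplot(df, aes(x = product, y = value)) +
  # bars: only the df1 rows, dodged by date
  geom_col(data = subset(df, origin_df == "df1"),
           aes(fill = product, group = factor(start_date)),
           position = position_dodge(width = 0.8), width = 0.7) +
  # one call: a mean crossbar per product and per origin_df
  stat_summary(aes(color = origin_df), fun = "mean", geom = "crossbar",
               width = 0.8) +
  scale_fill_manual(values = c("#008FCC", "#FFAA00", "#E60076", "#B00000")) +
  scale_color_manual(values = c(df1 = "blue", df2 = "red")) +
  labs(x = "Product", y = "Values")
```

Print p to draw it; the theme(legend.position = ...) settings from the answer can be added unchanged.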

Small ggplot2 plots placed on coordinates on a ggmap

I would like to first use ggmap to plot a specific area with longitude and latitude as axes.
Then I would like to put small ggplot2 plots on the specific locations, given their longitude and latitude. These can be barplots with minimal theme.
My database may have the columns:
1. town
2. longitude
3. latitude
4. through 6. value A, B, C
I generate a plot (pseudocode)
p <- ggmap(coordinates)
and I have my minimal ggplot2 design
q <- ggplot() + geom_bar(....) + ...  # null x-axis, null y-axis, minimal template
How can I combine the two designs to get a ggmap with small, minimal ggplot plots placed at specific coordinates on the map?
Here's one I did using pie charts as points on a scatterplot. You can use the same concept to put barcharts on a map at specific lat/long coordinates.
R::ggplot2::geom_points: how to swap points with pie charts?
Needs further update. Some of the code used was abbreviated from another answer, which has since been deleted. If you find this answer via a search engine, drop a comment and I'll get around to fleshing it back out.
Updated:
This mostly uses the adapted code from your answer, but I had to update a few lines.
library(ggmap)
p <- ggmap(Poland) + coord_quickmap(xlim = c(13, 25), ylim = c(48.8, 55.5), expand = FALSE)
This change makes a better projection and eliminates the warnings about duplicated scales.
library(dplyr)  # for %>%, do() and mutate()
# assumes df is already grouped per location, with lon and lat among the
# grouping columns, so that do() builds one subplot per location
df.grobs <- df %>%
  do(subplots = ggplot(., aes(1, value, fill = component)) +
       geom_col(position = position_dodge(width = 1),
                alpha = 0.75, colour = "white") +
       geom_text(aes(label = round(value, 1), group = component),
                 position = position_dodge(width = 1),
                 size = 3) +
       theme_void() + guides(fill = "none")) %>%
  mutate(subgrobs = list(annotation_custom(ggplotGrob(subplots),
                                           xmin = lon - 0.5, ymin = lat - 0.5,
                                           xmax = lon + 0.5, ymax = lat + 0.5)))
Here I explicitly specified the dodge width for your geom_col so I could match it with geom_text. I used round(value, 1) for the label aesthetic, and it automatically inherits the x and y aesthetics from the subplots = ggplot(...) call. I also manually set the size to be quite small, so the labels would fit, but then I increased the overall bounding box for each subgrob, from 0.35 to 0.5 in each direction.
df.grobs %>%
{p +
.$subgrobs +
geom_text(data=df, aes(label = name), vjust = 3.5, nudge_x = 0.065, size=2) +
geom_col(data = df,
aes(Inf, Inf, fill = component),
colour = "white")}
The only change I made here was for the aesthetics of the "ghost" geom_col. When they were set to 0,0 they weren't plotted at all since that wasn't within the x and y limits. By using Inf,Inf they're plotted at the far upper right corner, which is enough to make them invisible, but still plotted for the legend.
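To try the annotation_custom() idea without downloading map tiles, here is a minimal self-contained sketch. It is a stand-in under stated assumptions, not the code above: a plain ggplot with coord_quickmap() takes the place of ggmap(Poland), the towns data frame is hypothetical, and lapply() over split() replaces the do()/mutate() pipeline. Each town's bar chart is turned into a grob with ggplotGrob() and placed in a 1 x 1 degree box around its coordinates.

```r
library(ggplot2)

# Hypothetical example data: one row per town and component
towns <- data.frame(
  name = rep(c("Town1", "Town2"), each = 3),
  lon = rep(c(17, 21), each = 3),
  lat = rep(c(51, 53), each = 3),
  component = rep(c("A", "B", "C"), times = 2),
  value = c(3, 5, 2, 4, 1, 6)
)

# A plain ggplot stands in for ggmap(...) as the base layer
base <- ggplot(unique(towns[, c("name", "lon", "lat")]), aes(lon, lat)) +
  geom_point() +
  coord_quickmap(xlim = c(14, 24), ylim = c(49, 55), expand = FALSE)

# One minimal bar chart per town, converted to a grob and placed in a
# 1 x 1 degree box centred on that town's coordinates
subplot_layers <- lapply(split(towns, towns$name), function(d) {
  sub <- ggplot(d, aes(component, value, fill = component)) +
    geom_col() +
    theme_void() +
    guides(fill = "none")
  annotation_custom(ggplotGrob(sub),
                    xmin = d$lon[1] - 0.5, xmax = d$lon[1] + 0.5,
                    ymin = d$lat[1] - 0.5, ymax = d$lat[1] + 0.5)
})

p <- base + subplot_layers
p
```

Adding a list of layers with + works the same way as .$subgrobs in the answer; annotation_custom() needs a Cartesian-based coordinate system, which coord_quickmap() is.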

In R, using geom_path with different size and color parameters

I'm really struggling with the size and color parameters of geom_path in ggplot2. Let me share my data and code (both short) with you first, then show the plot I'm getting, then explain what plot I am trying to obtain. I'm really confused with this output right now:
# the data - x and y coordinates to plot
x_loc = c(39.29376, 39.44371, 39.59578, 39.7439, 39.88808, 40.18122,
40.92207, 41.91831, 42.09564, 42.27909, 81.77751, 81.79779, 81.81031,
81.81723, 81.81997, 81.81846)
y_loc = c(21.02953, 20.91538, 20.80633, 20.69479, 20.58158, 20.37095,
19.87498, 19.38372, 19.31743, 19.26005, 35.55103, 35.64354, 35.7384,
35.82535, 35.9067, 35.98656)
# creating the factor with which to base size and color off of
end = length(x_loc)
distances = sqrt(((x_loc[2:end] - x_loc[1:(end-1)]) ^ 2) + ((y_loc[2:end] - y_loc[1:(end-1)]) ^ 2))
my_colors = c('black', ifelse(distances > 0.5, 'red', 'black'))
# and now for my plot
ggplot() +
geom_point(aes(x = x_loc, y = y_loc)) +
geom_path(aes(x = x_loc, y = y_loc, col = as.factor(my_colors), size = as.factor(my_colors)),
alpha = 1) +
scale_color_manual(values = c("black", "red")) +
scale_size_manual(values = c(1.5, 0.45))
Running the code above gives a plot that is not what I want. My objective here is to plot the coordinate points with lines connecting them, so I use separate layers for geom_point() and geom_path(). However, for very long lines (long distances between consecutive coordinates, as measured in the distances vector), I would like the line to be red and thin. For short distances, I would like the line to be black and thicker.
What's wrong with my plot is that the long black line should not be there. There is also an additional black line where the other red line is, which shouldn't appear either.
It appears that by splitting the coordinates into groups (by size and by color, both set using the my_colors vector), geom_path creates a separate path for each group, each with the correct size and color. However, this results in the wrong plot.
Let me know if I'm not explaining this correctly. I really want to get to the bottom of this, somehow. I'll work now on manually creating a plot similar to what I would want, and will edit shortly with it!
Thanks!
EDIT: Here's what I'm hoping to get:
which was created by cheating somewhat (cheating in the sense that I can get away with this for 16 coordinates, but not for 100K), using the following 5 geom_path layers:
ggplot() + geom_point(aes(x = x_loc, y = y_loc)) +
geom_path(aes(x = x_loc[1:6], y = y_loc[1:6]),
color = 'black',
size = 1.5,
alpha = 1) +
geom_path(aes(x = x_loc[6:8], y = y_loc[6:8]),
color = 'red',
size = 0.45,
alpha = 1) +
geom_path(aes(x = x_loc[8:10], y = y_loc[8:10]),
color = 'black',
size = 1.5,
alpha = 1) +
geom_path(aes(x = x_loc[10:11], y = y_loc[10:11]),
color = 'red',
size = 0.45,
alpha = 1) +
geom_path(aes(x = x_loc[11:16], y = y_loc[11:16]),
color = 'black',
size = 1.5,
alpha = 1)
I think I solved this myself: for anybody working on this, it has to do with grouping. I will edit this with a solution shortly!
EDIT:
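# geom_path styles each segment by its starting point, so the flag for the
# segment from point i to point i + 1 must sit at position i (pad at the end)
seg_colors <- c(ifelse(distances > 0.5, 'red', 'black'), 'black')
my_group <- 1  # one group: a single continuous path through all points
ggplot() +
  geom_point(aes(x = x_loc, y = y_loc)) +
  geom_path(aes(x = x_loc, y = y_loc, col = seg_colors, size = seg_colors, group = my_group),
            alpha = 1) +
  scale_color_manual(values = c("black", "red")) +
  scale_size_manual(values = c(1.5, 0.45))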
This gets the job done! Everything needed to be in the same group before splitting up the colors and sizes.
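An alternative that avoids the grouping question entirely, and scales to many points, is to draw each connection as its own segment with geom_segment(), so the color and width belong to the segment rather than to a point. A minimal sketch, assuming ggplot2 3.4.0 or later (for the linewidth aesthetic and scale_linewidth_manual()) and reusing the 0.5 distance threshold from the question:

```r
library(ggplot2)

# Coordinates from the question
x_loc <- c(39.29376, 39.44371, 39.59578, 39.7439, 39.88808, 40.18122,
           40.92207, 41.91831, 42.09564, 42.27909, 81.77751, 81.79779, 81.81031,
           81.81723, 81.81997, 81.81846)
y_loc <- c(21.02953, 20.91538, 20.80633, 20.69479, 20.58158, 20.37095,
           19.87498, 19.38372, 19.31743, 19.26005, 35.55103, 35.64354, 35.7384,
           35.82535, 35.9067, 35.98656)

n <- length(x_loc)
# One row per segment: segment i joins point i to point i + 1
segs <- data.frame(x = x_loc[-n], y = y_loc[-n],
                   xend = x_loc[-1], yend = y_loc[-1])
seg_len <- sqrt((segs$xend - segs$x)^2 + (segs$yend - segs$y)^2)
segs$type <- ifelse(seg_len > 0.5, "long", "short")

p <- ggplot() +
  geom_point(aes(x = x_loc, y = y_loc)) +
  geom_segment(data = segs,
               aes(x = x, y = y, xend = xend, yend = yend,
                   colour = type, linewidth = type)) +
  scale_colour_manual(values = c(long = "red", short = "black")) +
  scale_linewidth_manual(values = c(long = 0.45, short = 1.5))
p
```

With this data, segments 6, 7 and 10 are the long ones, which matches the red pieces in the manually built plot.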

Can I fix overlapping dashed lines in a histogram in ggplot2?

I am trying to plot a histogram of two overlapping distributions in ggplot2. Unfortunately, the graphic needs to be in black and white. I tried representing the two categories with different shades of grey, with transparency, but the result is not as clear as I would like. I tried adding outlines to the bars with different linetypes, but this produced some strange results.
require(ggplot2)
set.seed(65)
a = rnorm(100, mean = 1, sd = 1)
b = rnorm(100, mean = 3, sd = 1)
dat <- data.frame(category = rep(c('A', 'B'), each = 100),
values = c(a, b))
ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 1) +
scale_fill_grey()
Notice that one of the lines that should appear dotted is in fact solid (at a value of x = 4). I think this must be a result of it actually being two lines - one from the 3-4 bar and one from the 4-5 bar. The dots are out of phase so they produce a solid line. The effect is rather ugly and inconsistent.
Is there any way of fixing this overlap?
Can anyone suggest a more effective way of clarifying the difference between the two categories, without resorting to colour?
Many thanks.
One possibility would be to use a 'hollow histogram', as described here:
# assign your original plot object to a variable
p1 <- ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
# p1
# extract relevant variables from the plot object to a new data frame
# your grouping variable 'category' is named 'group' in the plot object
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
# plot using geom_step
ggplot(data = df, aes(x = xmin, y = y, linetype = factor(group))) +
geom_step()
If you want to vary both linetype and fill, you need to plot a histogram first (which can be filled). Set the outline colour of the histogram to transparent, then add the geom_step. Use theme_bw to avoid grey elements on a grey background.
p1 <- ggplot() +
geom_histogram(data = dat, aes(x = values, fill = category),
colour = "transparent", position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
df$category <- factor(df$group, labels = c("A", "B"))
p1 +
geom_step(data = df, aes(x = xmin, y = y, linetype = category)) +
theme_bw()
First, I would recommend theme_set(theme_bw()) or theme_set(theme_classic()): either sets the background to white, which makes shades of grey much easier to see.
Second, you could try something like scale_linetype_manual(values=c(1,3)). This won't completely eliminate the artifacts you're unhappy about, but it might make them a little less prominent, since linetype 3 (dotted) is sparser than linetype 2 (dashed).
Short of drawing density plots instead (which won't work very well for small samples and may not be familiar to your audience), dodging the positions of the histograms (which is ugly), or otherwise departing from histogram conventions, I can't think of a better solution.
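Putting the first two suggestions together, a minimal sketch using the data from the question (regenerated with the same seed): theme_bw() gives a white background, and scale_linetype_manual(values = c(1, 3)) draws category A solid and category B dotted instead of dashed. As noted above, this only makes the out-of-phase overlaps less prominent; it does not remove them.

```r
library(ggplot2)

theme_set(theme_bw())  # white background so the grey fills are easier to tell apart

set.seed(65)
dat <- data.frame(category = rep(c("A", "B"), each = 100),
                  values = c(rnorm(100, mean = 1, sd = 1),
                             rnorm(100, mean = 3, sd = 1)))

p <- ggplot(dat, aes(x = values, linetype = category, fill = category)) +
  geom_histogram(colour = "black", position = "identity",
                 alpha = 0.4, binwidth = 1) +
  scale_fill_grey() +
  # linetype 1 = solid, 3 = dotted (sparser than 2 = dashed)
  scale_linetype_manual(values = c(1, 3))
p
```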
