ggplot legends when plot is built from two data frames - r

I have data coming from two different data frames. I am trying to create legend for each data frame. I know I can combine the data frame and do it, but because of my data source it makes the most sense to plot from two different data frames.
Please find the simplified example below. I have gotten close but the 'Main Forecast' in the legend is only white color. I want to show where 'Main Forecast' is red on the outside and white on the inside.
x = seq(1,10, 1)
y = seq(10,100, 10)
df = data.frame(x=x, y=y)
df2 = data.frame(x=5, y=50)
p = ggplot(data=df) +
geom_point(data=df,aes(x=x, y=y, color="Weekly Forecast"), fill="red", size=5, shape=16) +
geom_line(data=df,aes(x=x, y=y), color="red", size=1) +
geom_point(data=df2, aes(x=x, y=y, color="Main Forecast"), size=2, shape=16) +
scale_color_manual("Legend Title", breaks=c("Weekly Forecast", "Main Forecast"), values = c("white","red"))
p
Any assistance will be greatly appreciated.

You need to use one of the symbols that takes a fill (pch = 21:25). You then need to use override.aes to get the legend right. I've moved shared data and aes into the ggplot command.
ggplot(data=df, aes(x=x, y=y)) +
geom_point(aes(color="Weekly Forecast"), shape=16, size = 5) +
geom_line(color="red", size=1) +
geom_point(data=df2, aes(color="Main Forecast"), shape=21, fill = "white", size = 5) +
scale_color_manual("Legend Title", limits=c("Weekly Forecast", "Main Forecast"), values = c("red","red")) +
guides(colour = guide_legend(override.aes = list(pch = c(16, 21), fill = c("red", "white"))))
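To see why the shape matters, here is a minimal sketch (the data frame `shapes` and the object name `p_shapes` are made up for illustration). It draws shapes 21 to 25, the ones that have both an outline (colour) and an interior (fill), with a red outline and white interior:

```r
library(ggplot2)

# Hypothetical demo data: one point per fillable shape (21 to 25)
shapes <- data.frame(x = 1:5, y = 1, shape = 21:25)

p_shapes <- ggplot(shapes, aes(x = x, y = y, shape = shape)) +
  # colour sets the outline, fill sets the interior for these shapes
  geom_point(colour = "red", fill = "white", size = 5, stroke = 1.5) +
  # use the numeric shape codes as they are, without a legend
  scale_shape_identity()

p_shapes
```

With shape 16, as in the question, fill is ignored, which is why the legend key could only ever show a single colour.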
This can also be done without override.aes:
ggplot(data=df, aes(x=x, y=y)) +
geom_line(aes(color="Main Forecast"), size=1) +
geom_point(aes(color="Weekly Forecast", fill="Weekly Forecast"), shape=21, size = 5) +
geom_point(data=df2, aes(color="Main Forecast", fill="Main Forecast"), shape=21, size = 5) +
scale_color_manual(name="", values = c("red","red")) +
scale_fill_manual(name="", values=c("white","red"))

Related

R - Changing shape only in legend in ggplot2

In my plot's legend, the key is currently drawn as the letter "a". I want to change the legend keys to points that show only the respective colours. This is my code so far:
ggplot(data=pca2.data, aes(x=X, y=Y, label=Sample, colour = col)) +
geom_text() +
xlab(paste("PC1 - ", pca2.var.per[1], "%", sep="")) +
ylab(paste("PC2 - ", pca2.var.per[2], "%", sep="")) +
theme_bw() +
ggtitle("My PCA Graph") +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
scale_color_manual(values=c("black", "red", "green"), labels = c("No significant difference", "Sharpe Decrease", "Sharpe Increase")) +
theme(legend.position = 'bottom') + guides(color=guide_legend(""))
I already tried adding "shape = c(20, 20, 20)" inside of "guide_legend", but it changed nothing.
Just add an empty point layer and don't draw a legend for geom_text.
As you didn't provide data, I've used the mtcars dataset, but it should translate to your problem:
ggplot(mtcars, aes(mpg, cyl, label = rownames(mtcars), color = factor(carb))) +
geom_point(shape = NA) +
geom_text(show.legend = FALSE) +
guides(colour = guide_legend(override.aes = list(shape = 16)))
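A possible alternative, sketched here under the assumption that ggplot2 3.2.0 or later is installed (that is where the `key_glyph` argument appeared, as far as I know): ask geom_text to draw its legend key as a point, so no empty point layer is needed. The object name `p_text` is only for illustration.

```r
library(ggplot2)

p_text <- ggplot(mtcars, aes(mpg, cyl, label = rownames(mtcars), colour = factor(carb))) +
  # key_glyph = "point" swaps the default "a" legend key for a point
  geom_text(key_glyph = "point")

p_text
```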

How to fix: when overlaying two scatter plots with using reorder of aes, the reorder gets lost

I have two scatter plots obtained from two data sets that I would like to overlay. When creating a single plot with ggplot2, I use a log scale and then order the values so that the scatter plot falls into a kind of horizontal S shape. But when I overlay the two, the reordering is lost and the plot loses its shape.
This is what the data frame looks like (one has 1076 entries and the other 1448):
protein Light_Dark log10
AT1G01080 1.1744852 0.06984755
AT1G01090 1.0710359 0.02980403
AT1G01100 0.4716955 -0.32633823
AT1G01320 156.6594802 2.19495668
AT1G02500 0.6406005 -0.19341276
AT1G02560 1.3381804 0.12651467
AT1G03130 0.6361147 -0.19646458
AT1G03475 0.7529015 -0.12326181
AT1G03630 0.7646064 -0.11656207
AT1G03680 0.8340107 -0.07882836
This is the code for a single plot:
p1 <- ggplot(ratio_log_ENR4, aes(x=reorder(protein, -log10), y=log10)) +
geom_point(size = 1) +
#coord_cartesian(xlim = c(0, 1000)) +
geom_hline(yintercept=0.1, col = "red") + #check gene
geom_hline(yintercept=-0.12, col = "red") +#check gene
labs(x = "Protein")+
theme_classic()+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
labs(y = "ratio Light_Dark log10")+
labs(x="Protein")
image=p1
ggsave(file="p1_ratio_data_ENR4_cys.svg", plot=image, width=10, height=8)
And for the overlay:
p1_14a <- ggplot(ratio_log_ENR1, aes(x=reorder(protein, -log10), y=log10)) +
geom_point(size = 1) +
#coord_cartesian(xlim = c(0, 1000)) +
geom_hline(yintercept=0.1, col = "red") + #check gene
geom_hline(yintercept=-0.12, col = "red") +#check gene
labs(x = "Protein")+
theme_classic()+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
labs(y = "ratio Light_Dark log10")+
labs(x="Protein")+
geom_point()+
geom_point(data=ratio_log_ENR4, color="red")
p=ggplot(ratio_log_ENR1, aes(x=reorder(protein, -log10), y=log10)) +
geom_point(size = 1) +
#coord_cartesian(xlim = c(0, 1000)) +
geom_hline(yintercept=0.1, col = "red") + #check gene
geom_hline(yintercept=-0.12, col = "red") +#check gene
labs(x = "Protein")+
theme_classic()+
theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
labs(y = "ratio Light_Dark log10")+
labs(x="Protein")
p = p + geom_point(data=ratio_log_ENR4, aes(x=reorder(protein, -log10), y=log10), color ="red" )
p
I tried changing the column classes, but that can't be the problem, since the single plot works as it is.
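The easiest solution I see for you is to bind your two data frames together before plotting (a and b below stand for your two data frames).
library(dplyr) ## needed for %>%
library(forcats) ## needed for fct_reorder()
library(ggplot2)
a$color <- 'red'
b$color <- 'blue'
ab <- a %>%
rbind(b)
ggplot(ab, aes(x = fct_reorder(protein, -log10), y = log10, color = color)) +
geom_point() +
scale_color_identity()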
You can find a nice cheat-sheet for working with factors here: https://stat545.com/block029_factors.html
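If each data set should keep its own S shape, one option is to plot the rank within each data set on the x axis instead of the protein name. This is only a sketch under assumptions: `enr1` and `enr4` are made-up stand-ins for ratio_log_ENR1 and ratio_log_ENR4, and `rank_in_set` is a hypothetical column name. Note that this changes what the x axis means, so the same protein no longer lines up vertically across data sets.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Made-up stand-ins for the two data frames in the question
enr1 <- data.frame(protein = paste0("P", 1:8), log10 = rnorm(8))
enr4 <- data.frame(protein = paste0("P", 4:12), log10 = rnorm(9))

ranked <- bind_rows(ENR1 = enr1, ENR4 = enr4, .id = "dataset") %>%
  group_by(dataset) %>%
  # rank 1 is the largest log10 value within each data set
  mutate(rank_in_set = rank(-log10, ties.method = "first")) %>%
  ungroup()

p_ranked <- ggplot(ranked, aes(x = rank_in_set, y = log10, color = dataset)) +
  geom_point(size = 1) +
  theme_classic()

p_ranked
```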

ggplot2: doughnuts, how to conditional color fill with if_else

Following guides like ggplot Donut chart, I am trying to draw small gauges (doughnuts with a label in the middle), with the intention of putting them on a map later.
If the value reaches a certain threshold, I would like the fill of the doughnut to change to red. Is it possible to achieve this with if_else? It would be the most natural approach, but it does not work.
library(tidyverse)
df <- tibble(ID=c("A","B"),value=c(0.7,0.5)) %>% gather(key = cat,value = val,-ID)
ggplot(df, aes(x = val, fill = cat)) + scale_fill_manual(aes,values = c("red", "yellow"))+
geom_bar(position="fill") + coord_polar(start = 0, theta="y")
ymax <- max(df$val)
ymin <- min(df$val)
p2 = ggplot(df, aes(fill=cat, y=0, ymax=1, ymin=val, xmax=4, xmin=3)) +
geom_rect(colour="black",stat = "identity") +
scale_fill_manual(values = if_else (val > 0.5, "red", "black")) +
geom_text( aes(x=0, y=0, label= scales::percent (1-val)), position = position_dodge(0.9))+
coord_polar(theta="y") +
xlim(c(0, 4)) +
theme_void() +
theme(legend.position="none") +
scale_y_reverse() + facet_wrap(facets = "ID")
The scale_fill_manual(values = if_else(...)) part does not work. The error says: Error in if_else(val > 0.5, "red", "black") : object 'val' not found. Is it my error, or is there some other solution?
I also realize my code is not optimal. Initially, gather was there so that more variables could be included in the plot, but I failed to stack one variable on top of the other. Now one variable should be enough to indicate the percentage of completion, so my code is redundant for the purpose. Can you help me out?
scale_fill_manual() is evaluated outside of aes(), so it cannot see columns of df such as val. A solution for the color problem is to first create a variable in the data and then use that to map the color in the plot:
df <- tibble(ID=c("A","B"),value=c(0.7,0.5)) %>% gather(key = cat,value = val,-ID) %>%
mutate(color = if_else(val > 0.5, "red", "black"))
p2 = ggplot(df, aes(fill=color, y=0, ymax=1, ymin=val, xmax=4, xmin=3)) +
geom_rect(colour="black",stat = "identity") +
scale_fill_manual(values = c(`red` = "red", `black` = "black")) +
geom_text( aes(x=0, y=0, label= scales::percent (1-val)), position = position_dodge(0.9))+
coord_polar(theta="y") +
xlim(c(0, 4)) +
theme_void() +
theme(legend.position="none") +
scale_y_reverse() + facet_wrap(facets = "ID")
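Since the new column already holds valid colour names, a shorter variant could use scale_fill_identity() instead of a named manual scale. This is a trimmed-down sketch, not the original gauge: the names `gauges`, `fill_col` and `p_gauge` are made up, the labels are omitted, and `ylim(c(0, 1))` is my assumption so that the arc covers the share 1 - val of the full circle.

```r
library(dplyr)
library(ggplot2)

gauges <- tibble(ID = c("A", "B"), val = c(0.7, 0.5)) %>%
  # compute the fill colour in the data, where val is visible
  mutate(fill_col = if_else(val > 0.5, "red", "black"))

p_gauge <- ggplot(gauges, aes(xmin = 3, xmax = 4, ymin = val, ymax = 1, fill = fill_col)) +
  geom_rect(colour = "black") +
  # use the colour names stored in fill_col directly
  scale_fill_identity() +
  coord_polar(theta = "y") +
  xlim(c(0, 4)) +
  ylim(c(0, 1)) +
  theme_void() +
  facet_wrap(~ID)

p_gauge
```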

Use position_jitterdodge to plot points, and add highlighted points that are also dodged

I have some data where x is categorical, y is numeric, and color.var is another categorical variable that I would like to color by. My goal is to plot all of the points using position_jitterdodge(), and then highlight a couple of the points, draw a line between them, and add labels, while making sure these highlighted points line up with the corresponding strips of points that were plotted using position_jitterdodge(). The highlighted points are aligned properly when all factors are present in the variable used to dodge, but it does not work well when some factors are missing.
Minimal (non-)working example
library(ggplot2)
Generate some data
d = data.frame(x = c(rep('x1', 1000), rep('x2', 1000)),
y = runif(n=2000, min=0, max=1),
color.var= rep(c('color1', 'color2'), 1000),
facet.var = rep(c('facet1', 'facet1', 'facet2', 'facet2'), 500))
head(d)
dd = d[c(1,2,3,4,1997,1998, 1999,2000),]
dd
df1 = dd[dd$color.var=='color1',] ## data for first set of points, labels, and the line connecting them
df2 = dd[dd$color.var=='color2',] ## data for second set of points, labels, and the line connecting them
df1
dw = .75 ## Define the dodge.width
Plot all points
Here are all of the points, separated using position_jitterdodge() and the aesthetic fill.
ggplot() +
geom_point(data=d, aes(x=x, y=y, fill=color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
theme(axis.title = element_blank()) +
theme(legend.position="top")
That works well.
Additional highlighted points.
Here is the same plot, with additional points in dd added.
ggplot() +
geom_point(data=d, aes(x=x, y=y, fill =color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
geom_point(data=dd, aes(x=x, y=y, color=color.var ), position=position_dodge(width=.75), size=4 ) +
geom_line(data=dd, aes(x=x, y=y, color=color.var, group=color.var ), position=position_dodge(width=.75), size=1 ) +
geom_label(data=dd, aes(x=x, y=y, color=color.var, group=color.var, label=round(y,1)), position=position_dodge(width=.75), vjust=-.5) +
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
scale_color_manual(values=c( 'blue', 'gray40')) +
theme(axis.title = element_blank())+
theme(legend.position="top")
This is what I want it to look like. However, this only works properly if both factors of the color.var variable are in the set of points to highlight.
If both factors aren't present in the new data, the horizontal alignment fails.
Highlight points, only one factor present
Here is an example where only the 'color1' factor (blue) is present. Note that data=dd was replaced with data=df1 (data that only contains blue highlighted dots) in this code.
ggplot() +
geom_point(data=d, aes(x=x, y=y, fill =color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
geom_point(data=df1, aes(x=x, y=y, color=color.var ), position=position_dodge(width=.75), size=4 ) +
geom_line(data=df1, aes(x=x, y=y, color=color.var, group=color.var ), position=position_dodge(width=.75), size=1 ) +
geom_label(data=df1, aes(x=x, y=y, color=color.var, group=color.var, label=round(y,1)), position=position_dodge(width=.75), vjust=-.5) +
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
scale_color_manual(values=c( 'blue', 'gray40')) +
theme(axis.title = element_blank())+
theme(legend.position="top") +
scale_x_discrete(drop=F)
The highlighted blue dots appear between the blue and gray dots, instead of aligned with the blue dots. Note that the additional code scale_x_discrete(drop=F) had no apparent effect on the alignment.
A manual solution
One possible fix is to edit the x coordinate manually, like this
ggplot(data=d, aes(x=x, y=y)) +
geom_point(aes(fill=color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray') +
geom_point(data=df1, aes(x=as.numeric(x)-dw/4, y=y), alpha=.9, size=4 , color='blue') + ## first set of points
geom_line( data=df1, aes(x=as.numeric(x)-dw/4, y=y , group=color.var ), color='blue', size=1) + ## first line
geom_label(data=df1, aes(x=as.numeric(x)-dw/4, y=y , label=round(y,1)), color='blue', vjust=-.25)+ ## first set of labels
facet_wrap(~facet.var) +
scale_fill_manual(values=c( 'lightblue','gray'))+
theme(axis.title = element_blank()) +
theme(legend.position="top")
An adjustment of 1/4 of the dodge.width seems to work. This works fine, but it seems like there should be a better way, especially since I will eventually want to do this with 4-5 sets of highlighted points/lines, which may all be the same color.var, like the blue 'color1' factor above. Repeating this 4-5 times would be cumbersome. I will also eventually want to do this with 5-10 different figures. I suppose dodge.width*1/4 will always work, and copying and pasting might do the trick, but I would like to know if there is a better way.
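The 1/4 figure is probably not a coincidence. As far as I understand ggplot2's dodging, each group at a given x position is shifted by width * ((k - 0.5) / n - 0.5), where n is the number of groups present and k is the group's index. The sketch below reproduces that arithmetic; `dodge_offsets` is a hypothetical helper written for illustration, not a ggplot2 function.

```r
# Hypothetical helper (not part of ggplot2): horizontal offsets for n_groups dodged groups
dodge_offsets <- function(n_groups, width) {
  group_index <- seq_len(n_groups)
  width * ((group_index - 0.5) / n_groups - 0.5)
}

dw <- 0.75
dodge_offsets(2, dw)  # two groups present: -dw/4 and +dw/4
dodge_offsets(1, dw)  # one group present: 0, so nothing is shifted
```

Under that reading, a layer containing only one level of color.var has n = 1 and is not shifted at all, which would explain why the highlighted points land in the middle. With more than two levels the offsets are no longer dw/4, so the manual adjustment would need to change.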
Here is a solution based on @aosmith's comment. Basically, you just need to add this code before calling ggplot:
library(dplyr) ## needed for group_by()
library(tidyr) ## needed for complete()
df1 = df1 %>% group_by(facet.var, x) %>% complete(color.var)
That adds extra rows to the data so that all the levels of color.var are present. Then the code given in the question, along with a couple of small edits that fix the legend, can be used:
ggplot() +
geom_point(data=d , aes(x=x, y=y, fill =color.var), position=position_jitterdodge(dodge.width=dw), size=3, alpha=1, shape=21, color='darkgray', show.legend=T) +
geom_point(data=df1, aes(x=x, y=y, color=color.var ), position=position_dodge(width=.75), size=4, show.legend=T ) +
geom_line( data=df1, aes(x=x, y=y, color=color.var, group=color.var ), position=position_dodge(width=.75), size=1, show.legend=F ) +
geom_label(data=df1, aes(x=x, y=y, color=color.var, group=color.var, label=round(y,1)), position=position_dodge(width=.75), vjust=-.5, show.legend=F) +
facet_wrap(~facet.var) +
scale_fill_manual( values=c( 'lightblue','gray'), name='Background dots', guide=guide_legend(override.aes = list(color=c('lightblue', 'gray')))) +
scale_color_manual(values=c( 'blue', 'gray40') , name='Highlighted dots') +
theme(axis.title = element_blank())+
theme(legend.position="top")+
scale_x_discrete(drop=F)
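One caveat, offered as a sketch rather than as part of the original answer: complete() can only add levels it knows about. If color.var is a character column (the default for data.frame() since R 4.0.0), it may be necessary to convert it to a factor with the full set of levels first. The small `df1` below is a made-up stand-in for the highlighted rows, and `df1_full` is a name chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the highlighted rows: only 'color1' is present
df1 <- data.frame(x = c("x1", "x1", "x2", "x2"),
                  y = c(0.2, 0.4, 0.6, 0.8),
                  color.var = "color1",
                  facet.var = c("facet1", "facet2", "facet1", "facet2"))

df1_full <- df1 %>%
  # assumption: the full set of levels is known in advance
  mutate(color.var = factor(color.var, levels = c("color1", "color2"))) %>%
  group_by(facet.var, x) %>%
  # for factors, complete() fills in every level, with NA for y
  complete(color.var) %>%
  ungroup()

df1_full
```

The added rows have NA for y, so ggplot2 will warn about removed rows when drawing them, but they still count as groups when the dodge positions are computed.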

How to add multiple geom_hlines with color equal to grouping variable

I've created a grouped boxplot and added three specific geom_hlines to the plot. However, I want to set the hline colors to fill=factor(Training.Location), rather than trying to match the colors manually with a color palette. Is there a way to do this?
ggplot(aes(x=factor(CumDes),y=Mn_Handle), data=NH_C) +
geom_boxplot( aes(fill=factor(Training.Location))) +
geom_point( aes(color=factor(Training.Location)),
position=position_dodge(width=0.75) ) +
theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
coord_cartesian(ylim = c(0, 2000)) +
geom_hline(yintercept=432, linetype="dashed", lwd=1.2) +
geom_hline(yintercept=583, linetype="dashed", lwd=1.2) +
geom_hline(yintercept=439, linetype="dashed", lwd=1.2)
This is the sort of thing that seems easiest with a new dataset. I'm not sure how you are calculating the values you are using for the horizontal lines, but often times I want to calculate these from the original dataset and use some sort of aggregation function/package for that.
Here is a modified example from the help page for geom_hline.
Make the dataset to give to geom_hline, including the values for the horizontal lines as well as the grouping variable.
mean_wt = data.frame(cyl = c(4, 6, 8), wt = c(2.28, 3.11, 4.00))
Then just plot with the new dataset for that layer, using whatever aesthetic you wish with the grouping variable.
ggplot(mtcars, aes(x = factor(vs), wt) ) +
geom_boxplot(aes(fill = factor(cyl))) +
geom_point(aes(color = factor(cyl)), position = position_dodge(.75)) +
geom_hline(data = mean_wt, aes(yintercept = wt, color = factor(cyl)) )
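If the line positions are group means, they can be computed from the data instead of typed in by hand. A minimal sketch using base R's aggregate() on mtcars; the dashed linetype and the object name `p_hline` are my additions:

```r
library(ggplot2)

# One row per cyl group, with the mean weight as the line position
mean_wt <- aggregate(wt ~ cyl, data = mtcars, FUN = mean)

p_hline <- ggplot(mtcars, aes(x = factor(vs), y = wt)) +
  geom_boxplot(aes(fill = factor(cyl))) +
  geom_point(aes(color = factor(cyl)), position = position_dodge(0.75)) +
  # the colour of each line is mapped to the same grouping variable
  geom_hline(data = mean_wt, aes(yintercept = wt, color = factor(cyl)),
             linetype = "dashed")

p_hline
```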
Here's a somewhat hackish solution (I had to improvise on the data, feel free to improve)
# install.packages("ggplot2", dependencies = TRUE)
library(ggplot2)
col <- c("#CC6666", "#9999CC", "#66CC99")
ggplot(mtcars, aes(x = factor(cyl), y=mpg)) +
geom_boxplot(aes(fill=factor(gear))) +
geom_point( aes(color=factor(gear)),
position=position_dodge(width=0.75) ) +
scale_colour_manual(values= col) +
theme(axis.ticks = element_blank(), axis.text.x = element_blank()) + coord_cartesian(ylim = c(8, 35)) +
geom_hline(yintercept=12, linetype="dashed", lwd=1.2, color=col[1]) +
geom_hline(yintercept=18, linetype="dashed", lwd=1.2, color=col[2]) +
geom_hline(yintercept=28, linetype="dashed", lwd=1.2, color=col[3])
