R ggplot2 overlapping histogram, adding in legend for overlapping part

I have a histogram that plots two groups that partly overlap. I have colored the groups manually and a legend entry is generated for each group, but how can I add a color and label for the overlapping part to the legend?
For example, in the histogram above I would like a legend entry for the purplish region where A and B overlap, labeled "Overlap" and placed underneath B.
Code for generating the histogram above:
library(ggplot2)
set.seed(42)
n <- 100
dat <- data.frame(id=1:n,
group=rep(LETTERS[1:2], n/2),
x=rnorm(n))
ggplot(dat, aes(x=x, fill=group)) + geom_histogram(alpha=.5, position="identity") +
scale_fill_manual(values=c("blue","red"))

A solution with partially overlapping (dodged) bars:
Sample code:
library(ggplot2)
ggplot(dat, aes(x=x, fill=group)) +
geom_histogram(position = position_dodge(width = 0.6))+
scale_fill_manual(values=c("blue","red"))+
scale_y_continuous(expand=c(0,0))+
theme_bw()
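Neither snippet above adds the requested "Overlap" legend key. One way to get it, sketched here as an assumption rather than a built-in ggplot2 feature, is to compute the bin counts yourself with shared breaks, split each bin into an A-only part, a B-only part and an overlap part (the smaller of the two counts), and draw them as stacked columns. The 30 equal-width bins, the helper names (breaks, count_a, count_b, bins) and the "purple" fill are illustrative choices, so the bins will not match the question's plot exactly.

```r
library(ggplot2)

# Same data as in the question
set.seed(42)
n <- 100
dat <- data.frame(id = 1:n,
                  group = rep(LETTERS[1:2], n / 2),
                  x = rnorm(n))

# Shared bin edges for both groups (30 bins, an assumption mirroring
# geom_histogram()'s default bin count)
breaks <- seq(min(dat$x), max(dat$x), length.out = 31)
mids <- head(breaks, -1) + diff(breaks) / 2

# Per-bin counts for each group, using the same breaks
count_a <- hist(dat$x[dat$group == "A"], breaks = breaks, plot = FALSE)$counts
count_b <- hist(dat$x[dat$group == "B"], breaks = breaks, plot = FALSE)$counts

# In each bin the overlapping part is the smaller count;
# whatever is left above it belongs to A or B alone
overlap <- pmin(count_a, count_b)
bins <- data.frame(
  x = rep(mids, times = 3),
  count = c(count_a - overlap, count_b - overlap, overlap),
  part = factor(rep(c("A", "B", "Overlap"), each = length(mids)),
                levels = c("A", "B", "Overlap"))
)

# Stacked columns: each bar's total height is max(count_a, count_b),
# the same outline as the overlaid histogram, and "Overlap" gets its own key
p <- ggplot(bins, aes(x = x, y = count, fill = part)) +
  geom_col(width = diff(breaks)[1]) +
  scale_fill_manual(values = c(A = "blue", B = "red", Overlap = "purple"))
print(p)
```

Because the factor levels are A, B, Overlap, the legend lists "Overlap" underneath B, as the question asks.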

Related

Is it possible to make a column plot using ggplot in which the column fill is controlled by a third variable?

I have a data frame with three continuous variables (x, y, z). I want a column plot in which x defines the x-axis position of the columns, y defines the length of the columns, and z defines the column colors, which vary along each column as a function of y. The test code below shows the setup.
require(ggplot2)
require(viridis)
# Create a dummy data frame
x <- c(rep(0.0, 5),rep(0.5,10),rep(1.0,15))
y <- c(seq(0.0,-5,length.out=5),
seq(0.0,-10,length.out=10),
seq(0.0,-15,length.out=15))
z <- c(seq(10,0,length.out=5),
seq(8,0,length.out=10),
seq(6,0,length.out=15))
df <- data.frame(x=x, y=y, z=z)
pbase <- ggplot(df, aes(x=x, y=y, fill=z))
ptest <- pbase + geom_col(width=0.5, position="identity") +
scale_fill_viridis(option="turbo",
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
print(ptest)
The legend has the correct colors but the columns do not. Perhaps this is not the correct way to make this type of plot. I tried geom_bar(), which creates bars with the correct colors, but the y-values are incorrect.
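It looks like you have three x values that appear 5, 10, and 15 times respectively. With position="identity" the bars are drawn on top of one another in row order, so the last and longest bar in each column (the one with z = 0) hides all the others. Do you want the bars overlaid like this? If you add alpha = 0.5 to the geom_col() call, you'll see the overlapping bars.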
Alternatively, you might use dodging to show the bars next to one another instead of on top of one another.
ggplot(df, aes(x=x, y=y, fill=z, group = z)) +
geom_col(width=0.5, position=position_dodge()) +
scale_fill_viridis_c(option="turbo", # scale_fill_viridis_c() was added in ggplot2 3.0.0 (2018)
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
Or you could sort the data by y so that the shorter bars are drawn last and stay visible on top:
ggplot(dplyr::arrange(df,y), aes(x=x, y=y, fill=z))+
geom_col(width=0.5, position="identity") +
scale_fill_viridis_c(option="turbo",
limits = c(0,10),
breaks=seq(0,10,2.5),
labels=c("0","2.5","5.0","7.5","10.0"))
I solved this by using geom_tile() in place of geom_col().
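The geom_tile() fix is only described in words. A minimal sketch of what it might look like, assuming each tile should be as tall as the gap between consecutive y values in its column (the h column is a hypothetical helper, not from the question):

```r
library(ggplot2)

# Same dummy data as in the question
x <- c(rep(0.0, 5), rep(0.5, 10), rep(1.0, 15))
y <- c(seq(0.0, -5, length.out = 5),
       seq(0.0, -10, length.out = 10),
       seq(0.0, -15, length.out = 15))
z <- c(seq(10, 0, length.out = 5),
       seq(8, 0, length.out = 10),
       seq(6, 0, length.out = 15))
df <- data.frame(x = x, y = y, z = z)

# Assumed tile height: spacing between consecutive y values within each column
df$h <- ave(df$y, df$x, FUN = function(v) abs(v[2] - v[1]))

# One tile per row, so every tile keeps its own z color
p <- ggplot(df, aes(x = x, y = y, fill = z)) +
  geom_tile(aes(height = h), width = 0.5) +
  scale_fill_viridis_c(option = "turbo",
                       limits = c(0, 10),
                       breaks = seq(0, 10, 2.5))
print(p)
```

Tiles are centred on their y value, so each column sticks out half a tile above 0 and below its minimum; geom_rect() with explicit ymin/ymax would avoid that if it matters.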

ggplot2: Varying facet width with independent `Y` axes

Dummy data
d = data.frame(
x = factor(LETTERS[c(1,2,3,4,1,2,3,4,1,2,1,2,1,2,1,2)]),
y = c(100,80,70,60,130,90,65,60,2,3,3,3,2,2,1,2),
grid = rep(letters[1:2], each=8)
)
Issue
ggplot(d, aes(x=x, y=y)) + facet_grid(~grid, scales="free",space="free_x") + geom_point()
I like this graph. My only issue is that both facets use the same Y axis. So I tried using facet_wrap instead of facet_grid and got
ggplot(d, aes(x=x, y=y)) + facet_wrap(~grid, scales="free") + geom_point()
But unfortunately, facet_wrap does not have a "space" parameter and as a result the right and the left graph are of the same width.
Question
How can I make the spacing between levels of d$x equal in both facets (so that the facets have different widths) AND give each facet its own Y axis? Of course, I would like the facets to stay aligned horizontally.
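Use a ggplot grob and modify the panel widths in its gtable. Starting from facet_wrap() keeps a separate Y axis for each facet:
library(ggplot2)
library(grid)
# Capture the plot
q = ggplot(d, aes(x=x, y=y)) + facet_wrap(~grid, scales="free") + geom_point()
gt = ggplotGrob(q)
# Find the panel columns by name instead of hard-coding indices such as 5 and 9,
# which change between ggplot2 versions
panel_cols = sort(unique(gt$layout$l[grepl("^panel", gt$layout$name)]))
# Make each panel's width proportional to its number of x levels (4 and 2 here),
# so the spacing between levels is roughly equal in both facets
n_levels = tapply(d$x, d$grid, function(v) length(unique(v)))
gt$widths[panel_cols] = unit(as.numeric(n_levels), "null")
# Plot the graph
grid.newpage()
grid.draw(gt)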

Add label to abline ggplot2 [duplicate]

I'd like to label a horizontal line on a ggplot with multiple series, without associating the label with a series. R ggplot2: Labelling a horizontal line on the y axis with a numeric value asks about the single-series case, which geom_text solves. However, geom_text associates the label with one of the series via color and legend.
Consider the same example from that question, with another color column:
library(ggplot2)
df <- data.frame(y=1:10, x=1:10, col=c("a", "b")) # Added col
h <- 7.1
plot1 <- ggplot(df, aes(x=x, y=y, color=col)) + geom_point()
plot2 <- plot1 + geom_hline(aes(yintercept=h))
# Applying top answer https://stackoverflow.com/a/12876602/1840471
plot2 + geom_text(aes(0, h, label=h, vjust=-1))
How can I label the line without associating the label to one of the series?
Is this what you had in mind?
library(ggplot2)
df <- data.frame(y=1:10, x=1:10, col=c("a", "b")) # Added col
h <- 7.1
ggplot(df, aes(x=x,y=y)) +
geom_point(aes(color=col)) +
geom_hline(yintercept=h) +
geom_text(data=data.frame(x=0,y=h), aes(x, y), label=h, vjust=-1)
First, you can make the color mapping local to the points layer. Second, you do not have to put all the aesthetics into calls to aes(...), only those you want mapped to columns of the dataset. Third, you can give a layer its own dataset using data=... in the call to a specific geom_*.
You can use annotate instead:
plot2 + annotate(geom="text", label=h, x=1, y=h, vjust=-1)
Edit: Removed drawback that x is required, since that's also true of geom_text.
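A related variant, not taken from the answers above, keeps the global color mapping and instead stops the text layer from inheriting it with geom_text()'s inherit.aes = FALSE argument. The one-row label data frame is an illustrative choice; without it, the constant label would be repeated once per row of df.

```r
library(ggplot2)

df <- data.frame(y = 1:10, x = 1:10, col = c("a", "b"))
h <- 7.1

# color = col stays global, so geom_point() is still colored by series
p <- ggplot(df, aes(x = x, y = y, color = col)) +
  geom_point() +
  geom_hline(yintercept = h) +
  # inherit.aes = FALSE: this layer does not pick up color = col,
  # so the label gets no series color and does not join the color legend
  geom_text(data = data.frame(x = 0, y = h),
            aes(x = x, y = y, label = y),
            vjust = -1, inherit.aes = FALSE)
print(p)
```

The argument is needed here: with inheritance left on, the text layer would look for col in its one-row data frame and fail.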

Plotting continuous and discrete series in ggplot with facet

I have data that plots over time with four different variables. I would like to combine them in one plot using facet_grid, where each variable gets its own sub-plot. The following code resembles my data and the way I'm presenting it:
require(ggplot2)
require(reshape2)
subm <- melt(economics, id='date', c('psavert','uempmed','unemploy'))
mcsm <- melt(data.frame(date=economics$date, q=quarters(economics$date)), id='date')
mcsm$value <- factor(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line() +
facet_grid(variable~., scale='free_y') +
geom_step(data=mcsm, aes(date, value)) +
scale_y_discrete(breaks=levels(mcsm$value))
If I leave out scale_y_discrete, R complains that I'm trying to combine discrete values with a continuous scale. If I include scale_y_discrete, my continuous series lose their scale.
Is there a neat way to solve this, i.e., to get all the scales correct? I also see that the legend is sorted alphabetically; can I change it so the legend is in the same order as the sub-plots?
The problem with your data is that for data frame subm, value is numeric (continuous), but for mcsm, value is a factor (discrete). You can't use the same scale for continuous and discrete values, so you get y values only for the last (discrete) facet. It is also not possible to use two scale_y...() functions in one plot.
My approach would be to convert mcsm$value to numeric (saved as value2) and use that instead; it will plot the quarters as 1, 2, 3, and 4. To fix the legend order, use scale_color_discrete() and supply breaks= in the order you need.
mcsm$value2<-as.numeric(mcsm$value)
ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scales='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
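The conversion works because as.numeric() on a factor returns each value's level index, not a parsed label. The levels "Q1" to "Q4" sort in calendar order, so the codes match the quarter numbers. A small check (the dates are illustrative):

```r
# Four dates, one in each quarter
dates <- as.Date(c("2020-02-01", "2020-05-01", "2020-08-01", "2020-11-01"))
q <- factor(quarters(dates))
levels(q)        # "Q1" "Q2" "Q3" "Q4"
as.numeric(q)    # 1 2 3 4

# If a quarter were missing from the data, the codes would shift,
# so fixing the levels explicitly keeps them aligned with the quarter number
q2 <- factor(c("Q2", "Q4"), levels = paste0("Q", 1:4))
as.numeric(q2)   # 2 4
```

For the economics data all four quarters occur, so the plain conversion above is sufficient.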
UPDATE - solution using grobs
Another approach is to use grobs and library gridExtra to plot your data as separate plots.
First, save the plot with all legends and data (code as above) as object p. Then convert it to a grob object gp with ggplot_build() and ggplot_gtable(), and extract only the part that draws the legend (saved as gp.leg). In the original answer this was list element number 17, but that index depends on the ggplot2 version, so it is safer to look the legend up by name.
library(gridExtra)
p<-ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scales='free_y') + geom_step(data=mcsm, aes(date, value2)) +
scale_color_discrete(breaks=c('psavert','uempmed','unemploy','q'))
gp<-ggplot_gtable(ggplot_build(p))
# The legend is named "guide-box" in older ggplot2 and "guide-box-right" in newer releases
gp.leg<-gp$grobs[[which(gp$layout$name %in% c("guide-box", "guide-box-right"))[1]]]
Make two new plots, p1 and p2: the first plots only the data of subm and the second only the data of mcsm. Use scale_color_manual() to set the same colors as in plot p. For the first plot, remove the x-axis title, text, and ticks, and use plot.margin= to set its lower margin to a negative number. For the second plot, set the upper margin to a negative number. Use facet_grid() for both plots to get the faceted look.
p1 <- ggplot(subm, aes(date, value, col=variable, group=1)) + geom_line()+
facet_grid(variable~., scales='free_y')+
theme(plot.margin = unit(c(0.5,0.5,-0.25,0.5), "lines"),
axis.text.x=element_blank(),
axis.title.x=element_blank(),
axis.ticks.x=element_blank())+
scale_color_manual(values=c("#F8766D","#00BFC4","#C77CFF"),guide="none")
p2 <- ggplot(data=mcsm, aes(date, value,group=1,col=variable)) + geom_step() +
facet_grid(variable~., scales='free_y')+
theme(plot.margin = unit(c(-0.25,0.5,0.5,0.5), "lines"))+ylab("")+
scale_color_manual(values="#7CAE00",guide="none")
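Convert p1 and p2 to grob objects and give both the same widths for the left-hand columns, so that their panels line up.
gp1 <- ggplot_gtable(ggplot_build(p1))
gp2 <- ggplot_gtable(ggplot_build(p2))
# Columns 2:3 held the y-axis title and labels in the ggplot2 version used here;
# inspect gp1$layout if the panels do not line up in newer versions
maxWidth = grid::unit.pmax(gp1$widths[2:3],gp2$widths[2:3])
# Assign the unit directly; the older as.list(maxWidth) idiom can fail with R >= 4.0 units
gp1$widths[2:3] <- maxWidth
gp2$widths[2:3] <- maxWidth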
Finally, use grid.arrange() and arrangeGrob() to combine both plots and the legend into one figure.
grid.arrange(arrangeGrob(arrangeGrob(gp1,gp2,heights=c(3/4,1/4),ncol=1),
gp.leg,widths=c(7/8,1/8),ncol=2))

Resources