Line plot with bars in secondary axis with different scales in ggplot2 - r

I'm trying to plot a line graph (data points between 0 and 2.5, with an interval of 0.5). I want to plot some bars in the same chart against the right-hand axis (between 0 and 60, with an interval of 10). There is a mistake in my code that makes the bars get plotted against the left-hand axis.
Here's some sample data and code:
Month <- c("J","F","M","A")
Line <- c(2.5,2,0.5,3.4)
Bar <- c(30,33,21,40)
df <- data.frame(Month,Line,Bar)
ggplot(df, aes(x=Month)) +
geom_line(aes(y = Line,group = 1)) +
geom_col(aes(y=Bar))+
scale_y_continuous("Line",
sec.axis = sec_axis(trans= ~. /50, name = "Bar"))
Here's the output
Thanks in advance.

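Try this approach with a scaling factor. sec_axis() only relabels the right-hand axis; it does not rescale any layer. The bars therefore have to be multiplied by a scaling factor so that they fit the left-hand axis, and the secondary axis then divides by the same factor to show the original values. I have made slight changes to your code: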
library(tidyverse)
#Data
Month <- c("J","F","M","A")
Line <- c(2.5,2,0.5,3.4)
Bar <- c(30,33,21,40)
df <- data.frame(Month,Line,Bar)
#Scale factor
sfactor <- max(df$Line)/max(df$Bar)
#Plot
ggplot(df, aes(x=Month)) +
geom_line(aes(y = Line,group = 1)) +
geom_col(aes(y=Bar*sfactor))+
scale_y_continuous("Line",
sec.axis = sec_axis(trans= ~. /sfactor, name = "Bar"))
Output:
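The arithmetic behind the scaling factor can be checked without drawing anything. The sketch below is only an illustration using the sample data from the question; bar_scaled and bar_labels are names made up here and are not part of ggplot2.

```r
# Sample data from the question
Line <- c(2.5, 2, 0.5, 3.4)
Bar  <- c(30, 33, 21, 40)

# Same scaling factor as in the answer: ratio of the two maxima
sfactor <- max(Line) / max(Bar)

# Values actually drawn by geom_col(), expressed in left-axis units
bar_scaled <- Bar * sfactor

# Values the secondary axis reports after applying ~ . / sfactor
bar_labels <- bar_scaled / sfactor

print(range(bar_scaled))
print(bar_labels)
```

Because the tallest bar is mapped exactly onto the largest Line value, both layers share the same vertical range, and the right-hand axis reads in the original Bar units.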

Related

R: plotting a line and horizontal barplot on the same plot

I am trying to combine a line plot and horizontal barplot on the same plot. The difficult part is that the barplot is actually counts of the y values of the line plot.
Can someone show me how this can be done using the example below ?
library(ggplot2)
library(plyr)
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
counts <- ddply(dff, ~ y1, summarize, y2 = sum(y2))
# line plot
ggplot(data=dff) + geom_line(aes(x=x,y=y1))
# bar plot
ggplot() + geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
I believe what I need is presented in the pseudocode below but I do not know how to write it out in R.
Apologies. I actually meant the secondary x axis representing the value of counts for the barplot, while primary y-axis is the y1.
ggplot(data=dff) + geom_line(aes(x=x,y=y1)) + geom_bar(data=counts , aes(primary y axis = y1,secondary x axis =y2),stat="identity")
I just want the bars to be plotted horizontally, so I tried the code below, which flips both the line chart and the bar plot, which is also not what I wanted.
ggplot(data=dff) +
geom_line(aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y2,y=y1),stat="identity") + coord_flip()
You can combine two plots in ggplot like you want by specifying different data = arguments in each geom_ layer (and none in the original ggplot() call).
ggplot() +
geom_line(data=dff, aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
The following plot is the result. However, since x and y1 have different ranges, are you sure this is what you want?
Perhaps you want y1 on the vertical axis for both plots. Something like this works:
ggplot() +
geom_line(data=dff, aes(x=y1 ,y = x)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity", color = "red") +
coord_flip()
Maybe you are looking for this. Based on your last code, you are looking for a double axis. Using dplyr, you can store the counts in the same data frame and then plot all variables. Here is the code:
library(ggplot2)
library(dplyr)
#Data
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
#Code
dff %>% group_by(y1) %>% mutate(Counts=sum(y2)) -> dff2
#Scale factor
sf <- max(dff2$y1)/max(dff2$Counts)
# Plot
ggplot(data=dff2)+
geom_line(aes(x=x,y=y1),color='blue',size=1)+
geom_bar(stat='identity',aes(x=x,y=Counts*sf),fill='tomato',color='black')+
scale_y_continuous(name="y1", sec.axis = sec_axis(~./sf, name="Counts"))
Output:

ggplot2 - Create a stacked density plot with respect to the total sample size

Suppose we have two groups, "a" and "b", of different sample size.
n = 10000
set.seed(123)
dist1 = round(rnorm(n, mean = 1, sd=0.5), digits = 1)
dist2 = round(rnorm(n/10, mean = 2, sd = 0.2), digits = 1)
df = data.frame(group=c(rep("a", n), rep("b", n/10)), value=c(dist1,dist2))
I would like to translate the following stacked barplot to a stacked density plot.
library(ggplot2)
ggplot(data=df, aes(x=value, y=(..count..)/sum(..count..), fill=group)) +
geom_bar()
I know there is an option position="stack" for density plots. However, the result looks as follows, since the height of the density is with respect to the group sample size, not the total sample size. Hence, the small group is, in a way, overrepresented.
ggplot(data=df, aes(x=value, fill=group)) +
geom_density(position="stack")
Is there a way to create a density plot that corresponds to the above barplot?
Does just doing the same thing with the density chart as you did with the bar chart not give you what you're looking for?
ggplot(data=df, aes(x=value, fill=group)) +
geom_density( aes(y = ..count../sum(..count..)), position="stack", alpha=.7)
which gives
If you do a density plot, the y-axis is different from the one you get from the first histogram, where the y-axis reflects the counts over the total. To get something close to that, you can try the code below, where the hist function is used to get the counts, which are then converted to frequencies and stacked:
library(dplyr)
library(ggplot2)
RN =range(df$value)
df %>% group_by(group) %>%
do(data.frame(hist(.$value,breaks=seq(RN[1],RN[2],
length.out=40),plot=FALSE)[c("mids","counts")])) %>%
mutate(freq=counts/nrow(df)) %>%
ggplot(aes(x=mids,y=freq,col=group)) + geom_line(position="stack")
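Another way to read "with respect to the total sample size" is to weight each group's density by its share of all observations, so that the stacked curves integrate to 1 together. The following is a minimal base R sketch of that idea, reusing the data definition from the question; the names lo, hi, w1, w2 and total_area are made up for illustration, and this is not necessarily what either answer above computes.

```r
n <- 10000
set.seed(123)
dist1 <- round(rnorm(n, mean = 1, sd = 0.5), digits = 1)
dist2 <- round(rnorm(n / 10, mean = 2, sd = 0.2), digits = 1)
n_total <- length(dist1) + length(dist2)

# Evaluate both densities on one shared grid so they can be stacked
lo <- min(dist1, dist2) - 1
hi <- max(dist1, dist2) + 1
d1 <- density(dist1, from = lo, to = hi, n = 512)
d2 <- density(dist2, from = lo, to = hi, n = 512)

# Weight each group's density by its share of the total sample
w1 <- d1$y * length(dist1) / n_total
w2 <- d2$y * length(dist2) / n_total

# Riemann-sum check: the stacked curves should integrate to about 1
step <- d1$x[2] - d1$x[1]
total_area <- sum(w1 + w2) * step
share_b <- sum(w2) * step
print(c(total_area = total_area, share_b = share_b))
```

In ggplot2 terms, the count variable computed by geom_density() is the density multiplied by the number of points in the group, so aes(y = after_stat(count) / nrow(df)) is expected to give the same weighting. Treat that as an assumption to verify on your ggplot2 version.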

How to plot histograms of raw data on the margins of a plot of interpolated data

I would like to show interpolated data and a histogram of the raw data of each predictor in the same plot. I have seen other threads, like this one, where people explain how to do marginal histograms of the same data shown in a scatter plot; in this case, however, the histogram is based on other data (the raw data).
Suppose we see how price is related to carat and table in the diamonds dataset:
library(ggplot2)
p = ggplot(diamonds, aes(x = carat, y = table, color = price)) + geom_point()
We can add a marginal frequency plot e.g. with ggMarginal
library(ggExtra)
ggMarginal(p)
How do we add something similar to a tile plot of predicted diamond prices?
library(mgcv)
model = gam(price ~ s(table, carat), data = diamonds)
newdat = expand.grid(seq(55,75, 5), c(1:4))
names(newdat) = c("table", "carat")
newdat$predicted_price = predict(model, newdat)
ggplot(newdat,aes(x = carat, y = table, fill = predicted_price)) +
geom_tile()
Ideally, the histograms go even beyond the margins of the tileplot, as these data points also influence the predictions. I would, however, be already very happy to know how to plot a histogram for the range that is shown in the tileplot. (Maybe the values that are outside the range could just be added to the extreme values in different color.)
PS. I managed to more or less align histograms to the margins of a tile plot, using the method of the accepted answer in the linked thread, but only if I removed all kinds of labels. It would be particularly good to keep the color legend, if possible.
EDIT:
eipi10 provided an excellent solution. I tried to modify it slightly to add the sample size in numbers and to graphically show values outside the plotted range since they also affect the interpolated values.
I intended to include them in a different color in the histograms at the side. To do so, I attempted to count them towards the lower and upper ends of the plotted range. I also attempted to plot the sample size in numbers somewhere on the plot. However, I failed with both.
This was my attempt to graphically illustrate the sample size beyond the plotted area:
plot_data = diamonds
plot_data <- transform(plot_data, carat_range = ifelse(carat < 1 | carat > 4, "outside", "within"))
plot_data <- within(plot_data, carat[carat < 1] <- 1)
plot_data <- within(plot_data, carat[carat > 4] <- 4)
plot_data$carat_range = as.factor(plot_data$carat_range)
p2 = ggplot(plot_data, aes(carat, fill = carat_range)) +
geom_histogram() +
thm +
coord_cartesian(xlim=xrng)
I tried to add the sample size in numbers with geom_text. I tried fitting it in the far right panel but it was difficult (/impossible for me) to adjust. I tried to put it on the main graph (which would anyway probably not be the best solution), but it didn’t work either (it removed the histogram and legend, on the right side and it did not plot all geom_texts). I also tried to add a third row of plots and writing it there. My attempt:
n_table_above = nrow(subset(diamonds, table > 75))
n_table_below = nrow(subset(diamonds, table < 55))
n_table_within = nrow(subset(diamonds, table >= 55 & table <= 75))
text_p = ggplot()+
geom_text(aes(x = 0.9, y = 2, label = paste0("N(>75) = ", n_table_above)))+
geom_text(aes(x = 1, y = 2, label = paste0("N = ", n_table_within)))+
geom_text(aes(x = 1.1, y = 2, label = paste0("N(<55) = ", n_table_below)))+
thm
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, text_p, ggplot(), widths=c(6,1), heights =c(6,1))
I would be very happy to receive help on either or both tasks (adding sample size as text & adding values outside plotted range in a different color).
Based on your comment, maybe the best approach is to roll your own layout. Below is an example. We create the marginal plots as separate ggplot objects and lay them out with the main plot. We also extract the legend and put it outside the marginal plots.
Set-up
library(ggplot2)
library(cowplot)
# Function to extract legend
#https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend) }
thm = list(theme_void(),
guides(fill="none"),
theme(plot.margin=unit(rep(0,4), "lines")))
xrng = c(0.6,4.4)
yrng = c(53,77)
Plots
p1 = ggplot(newdat, aes(x = carat, y = table, fill = predicted_price)) +
geom_tile() +
theme_classic() +
coord_cartesian(xlim=xrng, ylim=yrng)
leg = g_legend(p1)
p1 = p1 + thm[-1]
p2 = ggplot(diamonds, aes(carat)) +
geom_line(stat="density") +
thm +
coord_cartesian(xlim=xrng)
p3 = ggplot(diamonds, aes(table)) +
geom_line(stat="density") +
thm +
coord_flip(xlim=yrng)
plot_grid(
plot_grid(plotlist=list(p2, ggplot(), p1, p3), ncol=2,
rel_widths=c(4,1), rel_heights=c(1,4), align="hv", scale=1.1),
leg, rel_widths=c(5,1))
UPDATE: Regarding your comment about the space between the plots: This is an Achilles heel of plot_grid and I don't know if there's a way to fix it. Another option is ggarrange from the experimental egg package, which doesn't add so much space between plots. Also, you need to save the output of ggarrange first and then lay out the saved object with the legend. If you run ggarrange inside grid.arrange you get two overlapping copies of the plot:
# devtools::install_github('baptiste/egg')
library(egg)
pobj = ggarrange(p2, ggplot(), p1, p3,
ncol=2, widths=c(4,1), heights=c(1,4))
grid.arrange(pobj, leg, widths=c(6,1))
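For the follow-up about values outside the plotted range, one possible starting point is to flag and clamp the values before building the marginal histogram. This is a minimal base R sketch on a made-up vector (carat_demo is hypothetical, not the diamonds data); the limits 1 and 4 are taken from the attempt in the question.

```r
# Hypothetical stand-in for diamonds$carat
carat_demo <- c(0.3, 0.9, 1.0, 2.5, 4.0, 4.5, 5.01)

# Flag values outside the plotted range before they are modified
carat_range <- factor(ifelse(carat_demo < 1 | carat_demo > 4,
                             "outside", "within"))

# Clamp values to the plotted range so they pile up at the edges
carat_clamped <- pmin(pmax(carat_demo, 1), 4)

# Sample sizes that could be printed as text on the plot
n_by_range <- table(carat_range)
print(n_by_range)
print(carat_clamped)
```

The flag can then be mapped to fill in the marginal histogram, as in the p2 attempt in the question, and the counts in n_by_range can be pasted into a label.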

Different behavior between ggplot2 and plotly using ggplotly

I want to make a line chart in plotly whose color varies along its length, following a continuous scale. This is easy in ggplot2, but when I translate it to plotly using the ggplotly function, the variable determining the color behaves like a categorical variable.
require(dplyr)
require(ggplot2)
require(plotly)
df <- data_frame(
x = 1:15,
group = rep(c(1,2,1), each = 5),
y = 1:15 + group
)
gg <- ggplot(df) +
aes(x, y, col = group) +
geom_line()
gg # ggplot2
ggplotly(gg) # plotly
ggplot2 (desired):
plotly:
I found one work-around that, on the other hand, behaves oddly in ggplot2.
df2 <- df %>%
tidyr::crossing(col = unique(.$group)) %>%
mutate(y = ifelse(group == col, y, NA)) %>%
arrange(col)
gg2 <- ggplot(df2) +
aes(x, y, col = col) +
geom_line()
gg2
ggplotly(gg2)
I also did not find a way to do this in plotly directly. Maybe there is no solution at all. Any ideas?
It looks like ggplotly is treating group as a factor, even though it's numeric. You could use geom_segment as a workaround to ensure that segments are drawn between each pair of points:
gg2 = ggplot(df, aes(x,y,colour=group)) +
geom_segment(aes(x=x, xend=lead(x), y=y, yend=lead(y)))
gg2
ggplotly(gg2)
Regarding @rawr's (now deleted) comment, I think it would make sense to have group be continuous if you want to map line color to a continuous variable. Below is an extension of the OP's example to a group column that's continuous, rather than having just two discrete categories.
set.seed(49)
df3 <- data_frame(
x = 1:50,
group = cumsum(rnorm(50)),
y = 1:50 + group
)
Plot gg3 below uses geom_line, but I've also included geom_point. You can see that ggplotly is plotting the points. However, there are no lines, because no two points have the same value of group. If we hadn't included geom_point, the graph would be blank.
gg3 <- ggplot(df3, aes(x, y, colour = group)) +
geom_point() + geom_line() +
scale_colour_gradient2(low="red",mid="yellow",high="blue")
gg3
ggplotly(gg3)
Switching to geom_segment gives us the lines we want with ggplotly. Note, however, that line color will be based on the value of group at the first point in the segment (whether using geom_line or geom_segment), so there might be cases where you want to interpolate the value of group between each (x,y) pair in order to get smoother color gradations:
gg4 <- ggplot(df3, aes(x, y, colour = group)) +
geom_segment(aes(x=x, xend=lead(x), y=y, yend=lead(y))) +
scale_colour_gradient2(low="red",mid="yellow",high="blue")
ggplotly(gg4)
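The interpolation mentioned above can be sketched by building the segment table by hand, one row per pair of consecutive points, and coloring each segment by the mean of group at its two ends. This is an illustrative base R sketch on a small made-up series; the segments data frame and the group_mid column are hypothetical names, not something ggplot2 or plotly provides.

```r
# Small made-up series in the spirit of the question's data
x <- 1:5
group <- c(1, 1, 2, 2, 1)
y <- x + group
n <- length(x)

# One row per segment: start point, end point, and the midpoint value of group
segments <- data.frame(
  x = x[-n],
  xend = x[-1],
  y = y[-n],
  yend = y[-1],
  group_mid = (group[-n] + group[-1]) / 2
)
print(segments)
```

A table like this has no trailing NA row, unlike the lead() version, which leaves NA in xend and yend for the last point. It could be passed to geom_segment() with colour = group_mid.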

Line up columns of bar graph with points of line plot with ggplot

Is there any way to line up the points of a line plot with the bars of a bar graph using ggplot when they have the same x-axis? Here is the sample data I'm trying to do it with.
library(ggplot2)
library(gridExtra)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line()
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity")
grid.arrange(no, yes)
Here is the output:
The first point of the line plot is to the left of the first bar, and the last point of the line plot is to the right of the last bar.
Thank you for your time.
Extending @Stibu's post a little: to align the plots, use gtable (or see the answers to your earlier question).
library(ggplot2)
library(gtable)
data=data.frame(x=rep(1:27, each=5), y = rep(1:5, times = 27))
yes <- ggplot(data, aes(x = x, y = y))
yes <- yes + geom_point() + geom_line() +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
other_data = data.frame(x = 1:27, y = 50:76 )
no <- ggplot(other_data, aes(x=x, y=y))
no <- no + geom_bar(stat = "identity") +
scale_x_continuous(limits = c(0,28), expand = c(0,0))
gYes = ggplotGrob(yes) # get the ggplot grobs
gNo = ggplotGrob(no)
plot(rbind(gNo, gYes, size = "first")) # Arrange and plot the grobs
Edit To change heights of plots:
g = rbind(gNo, gYes, size = "first") # Combine the plots
panels <- g$layout$t[grepl("panel", g$layout$name)] # Get the positions for plot panels
g$heights[panels] <- unit(c(0.7, 0.3), "null") # Replace heights with your relative heights
plot(g)
I can think of (at least) two ways to align the x-axes in the two plots:
The two axes do not align because in the bar plot the geoms cover the x-axis from roughly 0.5 to 27.5, while in the other plot the data only ranges from 1 to 27. The reason is that the bars have a width and the points don't. You can force the axes to align by explicitly specifying an x-axis range. Using the definitions from your plot, this can be achieved by
yes <- yes + scale_x_continuous(limits=c(0,28))
no <- no + scale_x_continuous(limits=c(0,28))
grid.arrange(no, yes)
limits sets the range of the x-axis. Note, though, that the alignment is still not quite perfect. The y-axis labels take up a little more space in the upper plot, because the numbers have two digits. The plot looks as follows:
The other solution is a bit more complicated but it has the advantage that the x-axis is drawn only once and that ggplot makes sure that the alignment is perfect. It makes use of faceting and the trick described in this answer. First, the data must be combined into a single data frame by
all <- rbind(data.frame(other_data,type="other"),data.frame(data,type="data"))
and then the plot can be created as follows:
ggplot(all,aes(x=x,y=y)) + facet_grid(type~.,scales = "free_y") +
geom_bar(data=subset(all,type=="other"),stat="identity") +
geom_point(data=subset(all,type=="data")) +
geom_line(data=subset(all,type=="data"))
The trick is to let the facets be constructed by the variable type which was used before to label the two data sets. But then each geom only gets the subset of the data that should be drawn with that specific geom. In facet_grid, I also used scales = "free_y" because the two y-axes should be independent. This plot looks as follows:
You can change the labels of the facets by giving other names when you define the data frame all. If you want to remove them altogether, then add the following to your plot:
+ theme(strip.background = element_blank(), strip.text = element_blank())
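The range mismatch behind the first approach can be checked with a little arithmetic. The sketch below assumes ggplot2's default bar width of 0.9 of the data resolution (an assumption stated here, not taken from the answers), in which case the bars end slightly inside the 0.5 and 27.5 quoted above; bar_width, bar_range, point_range and shared_limits are illustrative names.

```r
x <- 1:27

# Assumed default bar width for unit-spaced x values
bar_width <- 0.9

# Horizontal extent covered by the bars versus by the points
bar_range <- c(min(x) - bar_width / 2, max(x) + bar_width / 2)
point_range <- range(x)

# Limits used in the answers; they must contain both ranges
shared_limits <- c(0, 28)
covers_both <- shared_limits[1] <= min(bar_range, point_range) &&
  shared_limits[2] >= max(bar_range, point_range)

print(bar_range)
print(point_range)
print(covers_both)
```

Since the two layers span different ranges by default, each plot gets its own automatic limits; fixing the limits on both plots, or sharing one x-axis through facets, removes that difference.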
