Add secondary axis to stacked bar chart

I have this code which creates the plot below
library(dplyr)
library(ggplot2)
sec_axis_data <- mpg %>%
group_by(manufacturer) %>%
summarise(entries = n())
p <- ggplot(mpg, aes(x = manufacturer, fill = class == "compact")) +
geom_bar(position = "fill") +
scale_fill_manual(values = c('blue', 'red')) +
scale_y_continuous(sec.axis = sec_axis(~. * 50))
p
However, I'm not sure how to get the secondary axis data to display properly as a line across the plot. When, for example, I try:
p <- ggplot(mpg, aes(x = manufacturer, fill = class == "compact")) +
geom_bar(position = "fill") +
scale_fill_manual(values = c('blue', 'red')) +
scale_y_continuous(sec.axis = sec_axis(~. * 50)) +
geom_line(data = sec_axis_data, aes(x = manufacturer, y = entries))
p
... I get an error. I think the issue is linked to the different lengths of the data for mpg and sec_axis_data, but I'm not sure how to resolve this.

You were quite close to the solution.
You need to add inherit.aes = FALSE because the fill aesthetic refers to a column (class) that is not found in your second data frame.
Also, to place the line at the appropriate height, you need to divide your "entries" values by the same factor you used to build the second axis in sec_axis(). The line also needs group = 1 so that its points are connected across the discrete x axis:
library(ggplot2)
ggplot(mpg, aes(x = manufacturer, fill = class == "compact")) +
geom_bar(position = "fill", alpha = 0.5) +
scale_fill_manual(values = c('blue', 'red')) +
scale_y_continuous(sec.axis = sec_axis(~. * 50, name = "Second axis")) +
geom_line(inherit.aes = FALSE, data = sec_axis_data,
aes(x = manufacturer, y = entries/50, group = 1), size = 2)
Does this answer your question?
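The factor of 50 above is a hand-picked value. As a minimal sketch of the same idea, the factor can instead be derived from the data so that the largest count lands at the top of the 0 to 1 proportion axis. The names counts and scale_factor are illustrative and not part of the original answer, and the counts are built with base R's table() rather than dplyr.

```r
library(ggplot2)

# Rows per manufacturer, computed with base R so no extra packages are needed.
counts <- as.data.frame(table(manufacturer = mpg$manufacturer),
                        responseName = "entries")

# Illustrative choice: the largest count maps to 1 on the primary (proportion) axis.
scale_factor <- max(counts$entries)

p <- ggplot(mpg, aes(x = manufacturer, fill = class == "compact")) +
  geom_bar(position = "fill", alpha = 0.5) +
  scale_fill_manual(values = c("blue", "red")) +
  # Divide the counts by the factor to draw them on the primary axis ...
  geom_line(data = counts, inherit.aes = FALSE,
            aes(x = manufacturer, y = entries / scale_factor, group = 1)) +
  # ... and multiply by the same factor to label the secondary axis in counts.
  scale_y_continuous(sec.axis = sec_axis(~ . * scale_factor, name = "Entries"))
p
```

The division in aes() and the multiplication in sec_axis() must use the same factor, otherwise the line no longer matches the labels on the secondary axis.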

Related

mean line in every facet_wrap

ggplot(data = results, aes(x = inst, y = value, group = inst)) +
geom_boxplot() +
facet_wrap(~color) +
#geom_line(data = mean,
#mapping = aes(x = inst, y = average, group = 1))
theme_bw()
When I run the code above with the geom_line() lines commented out, it runs and gives the graph below, but I want a line joining the means on the boxplots, drawn separately for the color category shown in each facet. Any ideas how I can do that?

Your code is generally correct (though you'll want to add color = color to the aes() specification in geom_line()), so I suspect your mean dataset isn't set up correctly. Do you have means grouped by both your x axis and faceting variable? Using ggplot2::mpg as an example:
library(dplyr) # >= v1.1.0
library(ggplot2)
mean_dat <- summarize(mpg, average = mean(hwy), .by = c(cyl, drv))
ggplot(mpg, aes(factor(cyl), hwy)) +
geom_boxplot() +
geom_line(
data = mean_dat,
aes(y = average, group = 1, color = drv),
linewidth = 1.5,
show.legend = FALSE
) +
facet_wrap(~drv) +
theme_bw()
Alternatively, you could use stat = "summary" and not have to create a means dataframe at all:
ggplot(mpg, aes(factor(cyl), hwy)) +
geom_boxplot() +
geom_line(
aes(group = 1, color = drv),
stat = "summary",
linewidth = 1.5,
show.legend = FALSE
) +
facet_wrap(~drv) +
theme_bw()
# same result as above
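If dplyr is not available, a rough base R equivalent of the means table can be built with aggregate(). This is only a sketch of the same grouping idea; the check on the number of rows (n_combos is an illustrative name) is one way to confirm that the means are grouped by both the x variable and the faceting variable.

```r
library(ggplot2)

# One mean per cyl x drv combination, using base R instead of dplyr.
mean_dat <- aggregate(hwy ~ cyl + drv, data = mpg, FUN = mean)
names(mean_dat)[names(mean_dat) == "hwy"] <- "average"

# Sanity check: exactly one row per combination that occurs in the data.
n_combos <- nrow(unique(mpg[c("cyl", "drv")]))
stopifnot(nrow(mean_dat) == n_combos)

p <- ggplot(mpg, aes(factor(cyl), hwy)) +
  geom_boxplot() +
  geom_line(data = mean_dat,
            aes(x = factor(cyl), y = average, group = 1, color = drv),
            show.legend = FALSE) +
  facet_wrap(~drv) +
  theme_bw()
p
```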

Make a line separated by group in bar chart

I am trying to overlay two sets of data that will be used in bar charts. The first is the main set of data and I want that to be the main focus. For the second dataset I want just a line marking where on the chart it would be. I can get close to what I want by doing this:
Tbl = data.frame(Factor = c("a","b","c","d"),
Percent = c(43,77,37,55))
Tbl2 = data.frame(Factor = c("a","b","c","d"),
Percent = c(58,68,47,63))
ggplot(aes(x = Factor, y = Percent), data = Tbl) +
geom_bar(position = "stack", stat = "identity", fill = "blue") +
ylim(0,100) +
geom_bar(aes(x = Factor, y = Percent), data = Tbl2,
position = "stack", stat = "identity", fill = NA, color = "black") +
theme_bw()
What I have so far
I believe I can accomplish what I want by using geom_vline if there is a way to separate it by groups. Another option I came up with is if it is possible to change the colors of the "sides" of the bars in the overlay to white while keeping the "top" of each bar chart as black.
An idea of what I want (Edited in paint)

You could use geom_errorbar where the ymin and ymax are the same values like this:
library(ggplot2)
ggplot(aes(x = Factor, y = Percent), data = Tbl) +
geom_bar(position = "stack", stat = "identity", fill = "blue") +
ylim(0,100) +
geom_errorbar(aes(x = Factor, ymin = Percent, ymax = Percent), data = Tbl2,
stat = "identity", color = "black") +
theme_bw()
Created on 2022-12-28 with reprex v2.0.2

Another option is geom_point with shape = 95 (line) and the size adjusted to suit:
library(tidyverse)
Tbl = data.frame(Factor = c("a","b","c","d"),
Percent = c(43,77,37,55))
Tbl2 = data.frame(Factor = c("a","b","c","d"),
Percent = c(58,68,47,63))
ggplot(aes(x = Factor, y = Percent), data = Tbl) +
geom_bar(position = "stack", stat = "identity", fill = "blue") +
ylim(0,100) +
geom_point(aes(x = Factor, y = Percent), data = Tbl2,
position = "stack", stat = "identity", color = "black", shape = 95, size = 30) +
theme_bw()
Created on 2022-12-28 with reprex v2.0.2

Here is one more using geom_segment(). Some would say too fancy, but anyway:
For this we have to extend Tbl2:
library(ggplot2)
library(dplyr)
ggplot(aes(x = Factor, y = Percent), data = Tbl) +
geom_bar(position = "stack", stat = "identity", fill = "blue") +
ylim(0,100) +
geom_segment(data = Tbl2 %>%
mutate(x = c(0.55, 1.55, 2.55, 3.55),
xend = x+0.9), aes(x=x,xend=xend, y = Percent, yend=Percent), size=2)+
theme_bw()
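The hard-coded x positions in the geom_segment() version follow from how a discrete axis places categories at 1, 2, 3, and so on, with bars 0.9 wide by default. As a sketch under that assumption, the positions can be computed from the factor levels instead of typed in. The names pos and half_width are illustrative, and linewidth assumes ggplot2 3.4.0 or later (older versions use size).

```r
library(ggplot2)

Tbl  <- data.frame(Factor = c("a", "b", "c", "d"),
                   Percent = c(43, 77, 37, 55))
Tbl2 <- data.frame(Factor = c("a", "b", "c", "d"),
                   Percent = c(58, 68, 47, 63))

# Integer position of each category on the discrete x axis (a = 1, b = 2, ...).
pos <- as.numeric(factor(Tbl2$Factor, levels = unique(Tbl$Factor)))

# Half of the default bar width of 0.9.
half_width <- 0.45
Tbl2$x    <- pos - half_width
Tbl2$xend <- pos + half_width

p <- ggplot(Tbl, aes(x = Factor, y = Percent)) +
  geom_bar(stat = "identity", fill = "blue") +
  ylim(0, 100) +
  geom_segment(data = Tbl2, inherit.aes = FALSE,
               aes(x = x, xend = xend, y = Percent, yend = Percent),
               linewidth = 2) +
  theme_bw()
p
```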

How to graph two different columns on one ggplot?

I am trying to plot one column by Date (different color points for each animal category) and on the same graph, plot a second column by Date as well. The second column has entries for the days but only for certain categories, Large Dog. There is no adoption_with_discount for small or medium dogs (please see the reproducible example data set, example_data). When I plot them separately they visualize fine but not when plotted together. I thought I would just overlay a separate geom but that is not working.
I want to combine the two plots into one. My goal is for the points plot to have the line graph on top of it. I am trying to visualize the adoption as points colored by animal and put a line on the same graph of adoption_with_discount.
Thank you for your help!
# Make example -----------------------------------------------------------
# Here is an example data set
# You can see in the `adoption_with_discount` the values I want to add as a line.
library(lubridate)
library(tidyverse)
example_days <- data.frame(Date = c(seq.Date(from = as.Date('2022-03-01'), to = as.Date('2022-04-30'), by = 'days')))
example_small <-
example_days %>%
mutate(animal = "Small Dog")
a <-sample(100:150, nrow(example_small), rep = TRUE)
example_small <-
example_small %>%
mutate(adoption = a,
adoption_with_discount = NA)
example_med <-
example_days %>%
mutate(animal = "Medium Dog")
b <-sample(150:180, nrow(example_med), rep = TRUE)
example_med <-
example_med %>%
mutate(adoption = b,
adoption_with_discount = NA)
example_large <-
example_days %>%
mutate(animal = "Large Dog")
c <-sample(150:200, nrow(example_large), rep = TRUE)
example_large <-
example_large %>%
mutate(adoption = c)
example_large <-
example_large %>%
mutate(adoption_with_discount = adoption - 15)
example_data <- rbind(example_small, example_med, example_large)
# Plot --------------------------------------------------------------------
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
ggtitle("Dog Adoption by Size") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
# Plot with Fee -----------------------------------------------------------
# This is where the problem is occurring
# When I want to add a line that plots the adoption with discount by day
# on top of the points, it does not populate.
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
# See if just Discount will Plot -----------------------------------------
#This plots separately
ggplot(data = example_large) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Discount") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))

While subsetting is an option to fix the issue, the reason why no line is plotted is simply the missing grouping, i.e. in geom_line you are trying to plot observations for all three dog types as one group or line. Because of the NAs, no line shows up. An easy way to solve that is to explicitly map animal to the group aesthetic. Additionally, I added na.rm = TRUE to silence the warning about removed NAs. Finally, I right-aligned your axis labels by adding hjust = 1:
library(ggplot2)
ggplot(data = example_data) +
geom_point(mapping = aes(
x = Date,
y = adoption,
color = animal
)) +
geom_line(
mapping = aes(
x = Date,
y = adoption_with_discount,
group = animal
),
color = "black",
na.rm = TRUE
) +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
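To see why grouping matters, here is a small base R sketch of what happens to the rows. It assumes, as described above, that the line is drawn by ordering rows by x within each group and connecting only consecutive non-missing values. The toy data frame and the helper count_segments() are illustrative, not part of ggplot2.

```r
# Toy version of the data: one category with only NAs, one with values.
dates <- as.Date("2022-03-01") + 0:4
example <- rbind(
  data.frame(Date = dates, animal = "Small Dog", discount = NA_real_),
  data.frame(Date = dates, animal = "Large Dog",
             discount = c(150, 160, 155, 170, 165))
)

# Number of line segments that can be drawn: pairs of consecutive non-NA values.
count_segments <- function(y) {
  ok <- !is.na(y)
  sum(ok[-1] & ok[-length(ok)])
}

# Without grouping, rows are ordered by Date only, so NA and non-NA values alternate.
ungrouped <- example[order(example$Date), ]
segments_ungrouped <- count_segments(ungrouped$discount)

# With group = animal, each category is handled on its own.
segments_grouped <- sum(tapply(example$discount, example$animal, count_segments))

segments_ungrouped  # 0
segments_grouped    # 4
```

With one group, every value is surrounded by NAs and nothing can be connected; with one group per animal, the five Large Dog values form four segments.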

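Based on a related discussion, I found that you can use a subset argument in the aes of geom_line to select values that are not NA in the adoption_with_discount column. Note that subset is not a documented ggplot2 aesthetic, so recent versions may ignore it with a warning; if that happens, filter the data passed to the layer instead.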
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount,
subset = !is.na(adoption_with_discount)),
color = "black") +
ggtitle("Discount") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
The result:

It looks like the NA values included in the geom_line() layer are causing the issue, so you can filter them out before plotting the line (filter() and %>% come from dplyr, loaded here via tidyverse):
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(data=example_data %>% filter(!is.na(adoption_with_discount)),
mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))

geom_histogram with proportions and factor data

I'm trying to consistently plot histograms for zonal statistics from a thematic map. The data within a single zone often looks something like this:
dat <- data.frame("CLASS" = sample(LETTERS[1:6], 250, replace = TRUE,
prob = c(.15, .06, .35, .4, .02, 0)))
dat$CLASS <- factor(dat$CLASS, levels = LETTERS[1:6], ordered = T)
wherein not all possible classes may have been present in the zone.
I can pre-compute the data summary and use geom_bar and a manual colour scale to get consistent bar colours regardless of missing data:
library(dplyr)
library(ggplot2)
library(viridis)
dat_summ <- dat %>%
group_by(CLASS, .drop = FALSE) %>%
summarise(percentage = n() / nrow(.) * 100)
mancols <- viridis_pal()(6)
names(mancols) <- LETTERS[1:6]
ggplot(dat_summ) +
geom_bar(aes(x = CLASS, y = percentage, fill = CLASS),
stat = 'identity', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_manual(values = mancols, drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
But I can't keep the colours consistent across plots when I try to use geom_histogram:
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = ..x..), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_c() +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
If any of the outside-edge columns (A, F) are count = 0, the colours rescale to where data is present. This doesn't happen if there's a gap in one of the middle classes. Using scale_fill_viridis_b() doesn't solve the problem - it always rescales the palette against the number of non-0 columns.
Is it possible to prevent this behaviour and output consistent colours no matter which columns are count = 0, or am I stuck with my geom_bar approach?

Maybe scale_fill_discrete/scale_fill_viridis_d(drop = F) is what you want (with fill = CLASS).
ggplot(dat) +
geom_histogram(aes(x = CLASS,
y = (..count../sum(..count..)) * 100,
fill = CLASS), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_viridis_d(drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())

I think that the problem is that you pass the calculated variable ..x.. to fill in the aesthetics. It appears the length of this variable changes with your data set. You could replace it with scale_fill_manual and you will get the same plot colours regardless of how many levels there are in your CLASS variable:
ggplot(dat) +
geom_histogram(aes(x = CLASS, y = stat(count/sum(count) * 100), fill = CLASS), stat = 'count', show.legend = FALSE) +
scale_x_discrete(drop = FALSE) +
scale_fill_manual(values = c("#FF0000FF", "#CCFF00FF", "#00FF66FF", "#0066FFFF", "#CC00FFFF", "#FF99FFFF"), drop = FALSE) +
labs(x = 'Class', y = 'Percent') +
theme_minimal() +
theme(panel.grid.minor = element_blank())
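For reference, a sketch of the same plot using after_stat(), which replaces the ..count.. and stat() notations in ggplot2 3.3.0 and later, together with a named colour vector so that each class keeps its colour regardless of which classes are present. The sample data and the name class_cols are illustrative; hcl.colors() from base R (3.6 or later) stands in for the viridis package here.

```r
library(ggplot2)

set.seed(1)
# Illustrative data: class F is a valid level but never occurs.
dat <- data.frame(
  CLASS = factor(sample(LETTERS[1:5], 250, replace = TRUE),
                 levels = LETTERS[1:6])
)

# Named colours: each level is tied to one colour by name, not by position.
class_cols <- setNames(hcl.colors(6, "viridis"), LETTERS[1:6])

p <- ggplot(dat, aes(x = CLASS, fill = CLASS)) +
  geom_bar(aes(y = after_stat(count / sum(count) * 100)),
           show.legend = FALSE) +
  scale_x_discrete(drop = FALSE) +
  scale_fill_manual(values = class_cols, drop = FALSE) +
  labs(x = "Class", y = "Percent") +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())
p
```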

Ordering of multiple legends/guides (what is the automatic logic & how to change it?)

I stumbled onto this weird behavior with ggplot2's ordering of legends and just can't figure out what the logic behind the automatic placement of the legends is:
My aim: In a plot with multiple scales I want to arrange them in a different (thematic) order than the automatic one. But I couldn't find a command in opts() or guides() to do this for me. And just to be clear: I don't want to change the items within the legends, that works fine, but the arrangement of multiple complete legends.
So first I assumed they were ordered by type, i.e. scale, colour etc. But that is not the case, their position changes (see below).
Alphabetical order? No.
library(ggplot2) ## v0.9
## Scale_colour on top
qplot(data = mpg,x = displ, y = cty, size = hwy, colour = class) +
scale_size(name = "A") + scale_colour_discrete(name = "B")
## Reverse names --> scale_colour on bottom
qplot(data = mpg,x = displ, y = cty, size = hwy, colour = class) +
scale_size(name = "B") + scale_colour_discrete(name = "A")
## Change name B to C --> scale_colour on bottom
qplot(data = mpg,x = displ, y = cty, size = hwy, colour = class) +
scale_size(name = "C") + scale_colour_discrete(name = "A")
## Change name B to D --> scale_colour on top
qplot(data = mpg,x = displ, y = cty, size = hwy, colour = class) +
scale_size(name = "D") + scale_colour_discrete(name = "A")
Further positions of scale_colour (for exchanged scale_size name)
"E": bottom
"F" - "L": top
"M" - "N": bottom
and it continues to appear on top and at the bottom.
Factorial order? No.
## From top to bottom: C - B - A
fname <- factor(c("A","B","C"), levels = c("A","B","C"))
qplot(data = mpg,x = displ, y = cty, size = hwy, colour = class, alpha = cyl) +
scale_size(name = fname[1]) + scale_colour_discrete(name = fname[2]) + scale_alpha(name=fname[3])
## From top to bottom: B - C - A
fname <- factor(c("A","B","C"), levels = c("C","B","A"))
qplot(data = mpg,x = displ, y = cty, size = hwy, colour = class, alpha = cyl) +
scale_size(name = fname[1]) + scale_colour_discrete(name = fname[2]) + scale_alpha(name=fname[3])
## From top to bottom: B - C - A
fname <- factor(c("A","B","C"), levels = c("B","C","A"))
qplot(data = mpg,x = displ, y = cty, size = hwy, colour = class, alpha = cyl)+
scale_size(name = fname[1]) + scale_colour_discrete(name = fname[2]) + scale_alpha(name=fname[3])
Length of title? No.
I'll stop for now with example code, but that one also yielded fluctuating orders independent of character length.
Any ideas?

As I mentioned in the comment above, there is no way to control or predict the position of the legend boxes.
I wasn't aware of this problem. Thank you for pointing it out.
Since some people may need to control the order of the legend boxes, here is a quick fix:
# run this code before calling ggplot2 function
guides_merge <- function(gdefs) {
gdefs <- lapply(gdefs, function(g) { g$hash <- paste(g$order, g$hash, sep = "z"); g})
tapply(gdefs, sapply(gdefs, function(g)g$hash), function(gs)Reduce(guide_merge, gs))
}
environment(guides_merge) <- environment(ggplot)
assignInNamespace("guides_merge", guides_merge, pos = "package:ggplot2")
and then you can use the order argument of guide_legend (and also guide_colorbar):
# specify the order of the legend.
qplot(data = mpg,x = displ, y = cty, size = hwy, colour = class, alpha = cyl)+
guides(size = guide_legend(order = 1), colour = guide_legend(order = 2), alpha = guide_legend(order = 3))
qplot(data = mpg,x = displ, y = cty, size = hwy, colour = class, alpha = cyl)+
guides(size = guide_legend(order = 3), colour = guide_legend(order = 1), alpha = guide_legend(order = 2))
The order argument should be a positive integer. The legends are arranged according to this order.
Note that this is a quick fix so the interface may be changed in the next official version of ggplot2.
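In current ggplot2 releases, order is a documented argument of guide_legend() and guide_colourbar(), so the namespace patch should no longer be needed. A minimal sketch, assuming a recent ggplot2 version and using ggplot() with geom_point() in place of the deprecated qplot():

```r
library(ggplot2)

# Legends are arranged by their order value: size first, then colour, then alpha.
p <- ggplot(mpg, aes(x = displ, y = cty,
                     size = hwy, colour = class, alpha = cyl)) +
  geom_point() +
  guides(size   = guide_legend(order = 1),
         colour = guide_legend(order = 2),
         alpha  = guide_legend(order = 3))
p
```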
