I am trying to obtain a back-to-back bar plot (or pyramid plot) similar to the ones shown here:
Population pyramid with gender and comparing across two time periods with ggplot2
Basically, a pyramid plot of a quantitative variable whose values have to be displayed for combinations of three categorical variables.
library(ggplot2)
library(dplyr)

set.seed(1)  # make the random values reproducible
df <- data.frame(Gender = rep(c("M", "F"), each = 20),
                 Age = rep(c("0-10", "11-20", "21-30", "31-40", "41-50",
                             "51-60", "61-70", "71-80", "81-90", "91-100"), 4),
                 Year = factor(rep(c(2009, 2010, 2009, 2010), each = 10)),
                 Value = sample(seq(50, 100, 5), 40, replace = TRUE)) %>%
  # negative values for women put their bars on the other side of the axis
  mutate(Value = ifelse(Gender == "F", Value * -1, Value))

ggplot(df) +
  geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
               y = Value,
               x = Age),
           position = "dodge") +
  scale_y_continuous(labels = abs,
                     expand = c(0, 0)) +
  scale_fill_manual(values = hcl(h = c(15, 195, 15, 195),
                                 c = 100,
                                 l = 65,
                                 alpha = c(0.4, 0.4, 1, 1)),
                    name = "") +
  coord_flip() +
  facet_wrap(. ~ Gender,
             scales = "free_x",
             strip.position = "bottom") +
  theme_minimal() +
  theme(legend.position = "bottom",
        panel.spacing.x = unit(0, "pt"),
        strip.background = element_rect(colour = "black"))
[Figure: example of the back-to-back bar plot I want to mimic]
When I try to mimic this example with my data, things go wrong from the very first ggplot call: the bars are not dodged on both sides of the axis:
mydf <- read.table("https://raw.githubusercontent.com/gilles-guillot/IPUMS_R/main/tmp/df.csv",
                   header = TRUE, sep = ";")

ggplot(mydf) +
  geom_col(aes(fill = interaction(mig, ISCO08WHO_yrstud, sep = "-"),
               x = country,
               y = f),
           position = "dodge")
[Figure: failed attempt to get a back-to-back bar plot]
whereas I expected something like the output of:
ggplot(df) +
  geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
               y = Value,
               x = Age),
           position = "dodge")
[Figure: geom_col plot with bars dodged symmetrically around the axis]
In the example you are following, df$Value is made negative when Gender == "F". You need to do something similar with your data (negate f for one of the groups) to get bars dodged symmetrically around the axis.
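A minimal sketch of that idea follows. The data frame is hypothetical: it only mimics the column names of mydf (country, mig, ISCO08WHO_yrstud, f), and the levels "native" and "migrant" are invented for illustration, since the real values in df.csv are not shown here. The only essential step is the mutate() that flips the sign of f for one group.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: it only mimics the column names of mydf
toy <- expand.grid(country = c("A", "B", "C"),
                   mig = c("native", "migrant"),
                   ISCO08WHO_yrstud = c("low", "high"),
                   stringsAsFactors = FALSE)
toy$f <- seq(0.05, by = 0.05, length.out = nrow(toy))

# Make f negative for one mig group so its bars extend to the other side of 0
toy <- toy %>% mutate(f = ifelse(mig == "migrant", -f, f))

p <- ggplot(toy) +
  geom_col(aes(fill = interaction(mig, ISCO08WHO_yrstud, sep = "-"),
               x = country,
               y = f),
           position = "dodge") +
  scale_y_continuous(labels = abs) +  # print magnitudes on both sides of 0
  labs(fill = NULL)
```

With the real data you would replace "migrant" by whichever mig level should go on the negative side; coord_flip() and the facets from the example can then be added as before.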
Related
I have a swimlane plot which I want to order by a group variable. I was also wondering if it is possible to label the groups on the ggplot.
Here is the code to create the dataset and plot the data:
library(ggplot2)

dataset <- data.frame(subject = c("1002", "1002", "1002", "1002", "10034", "10034", "10034", "10034",
                                  "10054", "10054", "10054", "1003", "1003", "1003", "1003"),
                      exdose = c(5, 10, 20, 5, 5, 10, 20, 20, 5, 10, 20, 5, 20, 10, 5),
                      p = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4),
                      diff = c(3, 3, 9, 7, 3, 3, 4, 5, 3, 5, 6, 3, 5, 6, 7),
                      group = c("grp1", "grp1", "grp1", "grp1", "grp2", "grp2", "grp2", "grp2",
                                "grp1", "grp1", "grp1", "grp2", "grp2", "grp2", "grp2"))

ggplot(dataset, aes(x = diff + 1, y = subject, group = p)) +
  geom_col(aes(fill = as.factor(exdose)), position = position_stack(reverse = TRUE))
I want the y axis ordered by group and, if possible, a label on the side identifying the groups.
As you can see from the plot, it is ordered by subject number, but I want it ordered by group, with some indicator of the group.
I tried reorder(), but I was unable to get the desired plot.
As Stefan points out, facets are probably the way to go here, but you can use them with subtle theme tweaks to make it look as though you have just added a grouping variable on the y axis:
library(tidyverse)

dataset %>%
  mutate(group = factor(group),
         subject = reorder(subject, as.numeric(group)),
         exdose = factor(exdose)) %>%
  ggplot(aes(x = diff + 1, y = subject, group = p)) +
  geom_col(aes(fill = exdose), color = "gray50",
           position = position_stack(reverse = TRUE)) +
  scale_y_discrete(expand = c(0.1, 0.4)) +
  scale_fill_brewer(palette = "Set2") +
  facet_grid(group ~ ., scales = "free_y", switch = "y") +
  theme_minimal(base_size = 16) +
  theme(strip.background = element_rect(color = "gray"),
        strip.text = element_text(face = 2),
        panel.spacing.y = unit(0, "mm"),
        panel.background = element_rect(fill = "#f9f8f6", color = NA))
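The ordering step in the answer relies on reorder(), which sorts the levels of its first argument by a summary (the mean, by default) of its second argument within each level. A small base R sketch with one row per subject from the question's dataset shows the effect; on a discrete y axis the first level is drawn at the bottom.

```r
# One row per subject, taken from the question's dataset
subjects <- data.frame(subject = c("1002", "10034", "10054", "1003"),
                       group   = factor(c("grp1", "grp2", "grp1", "grp2")))

# reorder() sorts the levels of subject by the mean of the second argument
# within each subject; group is constant per subject, so this is its group number
ordered_subject <- reorder(subjects$subject, as.numeric(subjects$group))
levels(ordered_subject)
```

Both grp1 subjects end up before both grp2 subjects, which is what lets the facets (or any other group indicator) line up with contiguous blocks of subjects.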
I have the following data & code to produce a barplot (building on this answer)
library(tibble)
library(ggplot2)

tmpdf <- tibble(class = c("class 1", rep("class 2", 4), rep("class 3", 4)),
                var_1 = c("none", rep(c("A", "B", "C", "D"), 2)),
                y_ = as.integer(c(runif(9, min = 100, max = 250))))
# pad with NA rows for every class/var_1 combination
tmpdf <- rbind(tmpdf, cbind(expand.grid(class = levels(as.factor(tmpdf$class)),
                                        var_1 = levels(as.factor(tmpdf$var_1))),
                            y_ = NA))

ggplot(data = tmpdf, aes(x = class, y = y_, fill = var_1, width = 0.75)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.90),
           color = "black", size = 0.2)
This produces the following plot:
However, since not all class / var_1 combinations are present, some space on the x-axis is lost. I would now like to remove the empty space on the x-axis without making the bars wider(!).
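Can someone point me in the right direction?
You can use na.omit to remove the NA rows that pad the unused class/var_1 combinations, and then use facet_grid with scales = "free_x" and space = "free_x", so that each facet's width is proportional to the number of bars it contains.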
ggplot(data = na.omit(tmpdf), aes(x = var_1, y = y_, fill = var_1, width = 0.75)) +
  geom_col(position = position_dodge(width = 0.90), color = "black", size = 0.2) +
  facet_grid(~ class, scales = "free_x", space = "free_x", switch = "x") +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        strip.background = element_blank())
Technically, you could tweak a column chart (geom_col) to the desired effect, like so:
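library(dplyr)
library(ggplot2)

# keep only the 9 observed rows (drop the NA padding added above)
mpdf <- na.omit(tmpdf)

mpdf %>%
  # hand-picked x positions: class 1 at 1.6, classes 2 and 3 in steps of 0.2
  mutate(xpos = c(1.6, 2 + .2 * 0:3, 3 + .2 * 0:3)) %>%
  ggplot() +
  geom_col(aes(x = xpos, y = y_, fill = var_1)) +
  scale_x_continuous(breaks = c(1.6, 2.3 + 0:1), labels = unique(mpdf$class))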
However, the resulting bar plot (condensed or not) might be difficult to interpret if you want to convey differences between classes. For example, the plot has to be studied carefully to notice that variable D runs against the pattern of increasing values from class 2 to class 3.
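If you do go the geom_col route, the hard-coded positions can instead be derived from the data. The sketch below is one possible way, not the answerer's code: every bar gets a consecutive slot of width step, each new class is shifted right by an extra gap, and the axis breaks sit at the centre of each class. The names step, gap, obs and centres are hypothetical, and obs stands in for the observed (non-NA) rows of tmpdf.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the observed (non-NA) rows of tmpdf
obs <- tibble(class = c("class 1", rep("class 2", 4), rep("class 3", 4)),
              var_1 = c("none", rep(c("A", "B", "C", "D"), 2)),
              y_    = as.integer(runif(9, min = 100, max = 250)))

step <- 0.2  # distance between neighbouring bars
gap  <- 0.2  # extra space added between classes

pos <- obs %>%
  arrange(class) %>%
  mutate(class_id = as.integer(factor(class)),
         # consecutive slots for all bars, shifted right by one gap per class
         xpos = step * (row_number() - 1) + gap * (class_id - 1))

# Axis breaks at the centre of each class's bars
centres <- pos %>%
  group_by(class) %>%
  summarise(mid = mean(xpos))

p <- ggplot(pos) +
  geom_col(aes(x = xpos, y = y_, fill = var_1), width = 0.9 * step) +
  scale_x_continuous(breaks = centres$mid, labels = centres$class)
```

Because the positions depend only on how many bars each class has, missing combinations simply take up no space, and bar widths stay fixed at 0.9 * step.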
With this data:
df <- data.frame(value = c(20, 50, 90),
                 group = c(1, 2, 3))
I can get a bar chart:
library(dplyr)
library(ggplot2)

df %>% ggplot(aes(x = group, y = value, fill = value)) +
  geom_col() +
  coord_flip() +
  scale_fill_viridis_c(option = "C") +
  theme(legend.position = "none")
But I would like the color within each bar to vary along its length according to the values it covers (from 0 up to the bar's value), i.e. a gradient fill.
I have managed to change them using geom_raster:
ggplot() +
  geom_raster(aes(x = c(0:20), y = .9, fill = c(0:20)),
              interpolate = TRUE) +
  geom_raster(aes(x = c(0:50), y = 2, fill = c(0:50)),
              interpolate = TRUE) +
  geom_raster(aes(x = c(0:90), y = 3.1, fill = c(0:90)),
              interpolate = TRUE) +
  scale_fill_viridis_c(option = "C") +
  theme(legend.position = "none")
This approach is not efficient when I have many groups in real data. Any suggestions to get it done more efficiently would be appreciated.
I found the accepted answer to a similar previous question, but there "These numbers needs to be adjusted depending on the number of x values and range of y". I was looking for an approach in which I do not have to adjust numbers to the data. David Gibson's answer fits my purpose.
It does not look like this is supported natively in ggplot. I was able to get something close by adding extra rows to the data (one for each value from 0 up to value), then using geom_tile and separating the bars by specifying width.
library(tidyverse)

df <- data.frame(value = c(20, 50, 90),
                 group = c(1, 2, 3))

# one row per integer from 0 up to each group's value
df_expanded <- df %>%
  rowwise() %>%
  summarise(group = group,
            value = list(0:value)) %>%
  unnest(cols = value)

df_expanded %>%
  ggplot() +
  geom_tile(aes(
    x = group,
    y = value,
    fill = value,
    width = 0.9
  )) +
  coord_flip() +
  scale_fill_viridis_c(option = "C") +
  theme(legend.position = "none")
If this is too pixelated, you can increase the number of rows generated by replacing list(0:value) with list(seq(0, value, by = 0.1)).
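The same expansion can also be written in base R, without rowwise() or unnest(), and the step size becomes a parameter. This is a sketch, not part of the original answer; expand_bars() is a hypothetical helper name. It builds one row per tile for any number of groups, so nothing has to be adjusted by hand as groups are added.

```r
library(ggplot2)

df <- data.frame(value = c(20, 50, 90), group = c(1, 2, 3))

# Hypothetical helper: one row per tile, from 0 up to each bar's value
expand_bars <- function(d, step = 1) {
  pieces <- lapply(seq_len(nrow(d)), function(i) {
    data.frame(group = d$group[i], value = seq(0, d$value[i], by = step))
  })
  do.call(rbind, pieces)
}

tiles <- expand_bars(df, step = 1)

p <- ggplot(tiles) +
  geom_tile(aes(x = group, y = value, fill = value), width = 0.9) +
  coord_flip() +
  scale_fill_viridis_c(option = "C") +
  theme(legend.position = "none")
```

A smaller step (for example expand_bars(df, step = 0.1)) gives a smoother gradient at the cost of more rows; geom_tile() takes the tile height from the spacing of the values.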
This is a real hack using ggforce. This package has a geom that can take color gradients but it is for a line segment. I've just increased the size to make the line segment look like a bar. I made all the bars the same length to get the correct gradient, then covered a portion of each bar over with the same color as the background color to make them appear to be the correct length. Had to hide the grid lines, however. :-)
library(ggplot2)
library(ggforce)

df %>%
  ggplot() +
  # full-length gradient segments: index runs from 0 to 1 along each link
  geom_link(aes(x = 0, xend = max(value), y = group, yend = group,
                color = after_stat(index)), size = 30) +
  # grey segments cover each bar beyond its own value
  geom_link(aes(x = value, xend = max(value), y = group, yend = group),
            color = "grey", size = 31) +
  scale_color_viridis_c(option = "C") +
  theme(legend.position = "none",
        panel.background = element_rect(fill = "grey"),
        panel.grid = element_blank()) +
  ylim(0.5, max(df$group) + 0.5)
I want to make this kind of graph. I know it is a bar chart, but the only thing I can do is a simple bar chart with the maximum value above the bars. I suppose it is more complicated, and I am not sure whether ggplot can make it. Any ideas?
library(ggplot2)

tab1 <- read.csv("/Users/vladalexandru/Documente/vlad_R/R_markdown/Rmrkd/Tab/Tx_tn.csv")
tab1$dat <- as.Date(tab1$dat)
# mean of each column per calendar month (Group.1 holds the month number)
tab2 <- aggregate(tab1, by = list(format(tab1$dat, "%m")), FUN = "mean")

ggplot(data = tab2, aes(x = Group.1, y = tx)) +
  geom_bar(stat = "identity", fill = "yellow") +
  geom_text(aes(label = round(tx, 0)), vjust = -0.3, size = 3.5) +
  labs(x = "month") +
  scale_y_continuous(name = "°C") +
  theme_minimal()
It is actually possible to make a gradient-filled bar chart like this in ggplot, though it's a hassle. You have to add a transparent bar at the bottom of a stacked bar chart, then construct the gradient by slicing the bar into thin pieces and coloring them:
library(dplyr)
library(ggplot2)

set.seed(123)
a <- sample(10:20, 12, TRUE)
b <- sample(1:10, 12, TRUE)

# per month: 39 equal slices (the gradient) plus one transparent spacer
data.frame(vals = c(sapply(1:12, function(i) c(rep(a[i] / 39, 39), 20 - b[i]))),
           month = factor(rep(month.abb, each = 40), levels = month.abb),
           fills = rep(c(1:39, "top"), 12)) %>%
  ggplot(aes(x = month, y = vals, fill = fills)) +
  geom_col(fill = "gray95", aes(y = Inf), width = 0.7) +  # light background column
  geom_col(position = position_stack(), width = 0.5) +
  scale_fill_manual(values = c("#00000000",  # fully transparent
                               colorRampPalette(colors = c("forestgreen",
                                                           "gold", "orange"))(38),
                               "#00000000"),
                    guide = guide_none()) +
  theme_classic()
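A variation on the same slicing idea, offered as a sketch rather than as the answer above, draws each month's range directly as thin geom_rect() slices between a minimum and a maximum, which avoids the transparent spacer and the ordering of stack levels. The monthly data, the columns tn and tx (chosen to echo the question's file) and n_slices are all hypothetical. The fill is mapped to each slice's midpoint temperature, so the same colour means the same temperature in every month.

```r
library(ggplot2)

# Hypothetical monthly minima/maxima standing in for the question's data
set.seed(123)
monthly <- data.frame(month = factor(month.abb, levels = month.abb),
                      tn = sample(0:10, 12, TRUE))
monthly$tx <- monthly$tn + sample(5:15, 12, TRUE)

n_slices <- 40  # more slices give a smoother gradient

# One row per slice: each month's [tn, tx] range cut into n_slices pieces
slices <- do.call(rbind, lapply(seq_len(nrow(monthly)), function(i) {
  breaks <- seq(monthly$tn[i], monthly$tx[i], length.out = n_slices + 1)
  data.frame(month = monthly$month[i],
             ymin = head(breaks, -1),
             ymax = tail(breaks, -1),
             mid  = (head(breaks, -1) + tail(breaks, -1)) / 2)
}))

p <- ggplot(slices) +
  geom_rect(aes(xmin = as.numeric(month) - 0.25, xmax = as.numeric(month) + 0.25,
                ymin = ymin, ymax = ymax, fill = mid)) +
  scale_x_continuous(breaks = seq_along(levels(slices$month)),
                     labels = levels(slices$month)) +
  scale_fill_gradientn(colours = c("forestgreen", "gold", "orange"), guide = "none") +
  labs(x = "month", y = "°C") +
  theme_classic()
```

Mapping fill to the absolute value (rather than to the slice index, as in the stacked version) is a design choice; use the slice index instead if every bar should run through the full palette.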
I'm plotting three columns (character vectors) in a faceted bar graph and would like to show "smoker" as stacked bars inside each bar.
I'm using ggplot2. I've managed to plot "edu" and "sex" already, but I'd also like to see the count of each "y" and "n" inside each bar of "sex" (divided along the x-axis by "edu"). I have attached an image of my graph, which I produced with the code below.
I tried adding fill = smoker to aes(), but this didn't work.
I would also be very grateful for suggestions on how to clean up the code I used to make the graph faceted and express it as percentages, as I took it from somewhere else.
library(ggplot2)

test <- read.csv('test.csv', header = TRUE)

ggplot(test, aes(x = edu, group = sex)) +
  geom_bar(aes(y = after_stat(prop), fill = factor(after_stat(x))),
           stat = "count", show.legend = FALSE) +
  geom_text(aes(label = scales::percent(after_stat(prop)),
                y = after_stat(prop)),
            stat = "count", vjust = -.5, size = 3) +
  labs(y = NULL, x = "education") +
  facet_grid(~sex) +
  scale_y_continuous(labels = scales::percent)
I'm not sure if this is what you are looking for, but here is my best attempt at answering your question.
library(tidyverse)
library(scales)

test <- tibble(
  edu = c(rep("hs", 5), rep("bsc", 3), rep("msc", 3)),
  sex = c(rep("m", 3), rep("f", 4), rep("m", 4)),
  smoker = c("y", "n", "n", "y", "y", rep("n", 3), "y", "n", "n"))

test %>%
  count(sex, edu, smoker) %>%
  group_by(sex) %>%
  # share of each edu/smoker combination within each sex
  mutate(percentage = n / sum(n)) %>%
  ggplot(aes(edu, percentage, fill = smoker)) +
  geom_col() +
  geom_text(aes(label = percent(percentage)),
            position = position_stack(vjust = 0.5)) +
  facet_wrap(~sex) +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("#A0CBE8", "#F28E2B"))