I'm plotting three character columns in a faceted bar graph with ggplot2 and would like to show "smoker" as a stacked fill inside each bar.
I've managed to plot "edu" and "sex" already, but I'd also like to see the count of each "y" and "n" inside each bar of "sex" (divided along the x-axis by "edu"). I have attached an image of my graph,
which I produced with the code below.
I tried adding fill = smoker to aes(), but this didn't work.
If anyone has suggestions on how to clean up the code I used to facet the graph and express it as percentages, I would also be very grateful, as I took it from somewhere else.
test <- read.csv('test.csv', header = TRUE)
library(ggplot2)
ggplot(test, aes(x= edu, group=sex)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat="count", show.legend = FALSE) +
geom_text(aes( label = scales::percent(..prop..),
y= ..prop.. ), stat= "count", vjust = -.5, size = 3) +
labs(y = NULL, x="education") +
facet_grid(~sex) +
scale_y_continuous(labels = scales::percent)
I'm not sure this is exactly what you are looking for, but here is my best attempt at answering your question.
library(tidyverse)
library(scales)  # percent()
test <- tibble(
edu = c(rep("hs", 5), rep("bsc", 3), rep("msc", 3)),
sex = c(rep("m", 3), rep("f", 4), rep("m", 4)),
smoker = c("y", "n", "n", "y", "y", rep("n", 3), "y", "n", "n"))
test %>%
count(sex, edu, smoker) %>%
group_by(sex) %>%
mutate(percentage = n/sum(n)) %>%
ggplot(aes(edu, percentage, fill = smoker)) +
geom_col() +
geom_text(aes(label = percent(percentage)),
position = position_stack(vjust = 0.5)) +
facet_wrap(~sex) +
scale_y_continuous(labels = scales::percent) +
scale_fill_manual(values = c("#A0CBE8", "#F28E2B"))
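The percentages in the answer above are computed before plotting, so they can be inspected on their own. The following is a minimal sketch using the same toy data (built with data.frame instead of tibble so that only dplyr is needed); the name pct is just an illustrative choice. Because of group_by(sex), each segment is a share of its facet, not of its individual bar.

```r
library(dplyr)

# Same toy data as in the answer above
test <- data.frame(
  edu = c(rep("hs", 5), rep("bsc", 3), rep("msc", 3)),
  sex = c(rep("m", 3), rep("f", 4), rep("m", 4)),
  smoker = c("y", "n", "n", "y", "y", rep("n", 3), "y", "n", "n"))

pct <- test %>%
  count(sex, edu, smoker) %>%          # one row per observed sex/edu/smoker combination
  group_by(sex) %>%                    # shares are taken within each facet
  mutate(percentage = n / sum(n)) %>%
  ungroup()

# The segments of each facet should add up to 1 (i.e. 100 %)
pct %>%
  group_by(sex) %>%
  summarise(total = sum(percentage))
```

If the share of smokers within each education bar is wanted instead, grouping by both sex and edu before the mutate step would be the corresponding change.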
I have the following data & code to produce a barplot (building on this answer)
tmpdf <- tibble(class = c("class 1", rep("class 2", 4), rep("class 3", 4)),
var_1 = c("none", rep(c("A", "B", "C", "D"), 2)),
y_ = as.integer(c(runif(9, min = 100, max=250))))
tmpdf <- rbind(tmpdf, cbind(expand.grid(class = levels(as.factor(tmpdf$class)),
var_1 = levels(as.factor(tmpdf$var_1))),
y_ = NA))
ggplot(data=tmpdf, aes(x = class, y = y_, fill=var_1, width=0.75 )) +
geom_bar(stat = "identity", position=position_dodge(width = 0.90), color="black", size=0.2)
This produces the below plot:
However, since not all class / var_1 combinations are present, some space on the x-axis is lost. I would now like to remove the empty space on the x-axis without making the bars wider(!).
Can someone point me in the right direction?
You can use na.omit to drop the rows whose y_ is NA (the padded class / var_1 combinations), and then use facet_grid with scales = "free_x" and space = "free_x" so that each panel is only as wide as the bars it contains.
ggplot(data=na.omit(tmpdf), aes(x = var_1, y = y_, fill=var_1, width=0.75)) +
geom_col(position=position_dodge(width = 0.90), color="black", size=0.2) +
facet_grid(~ class, scales = "free_x", space = "free_x", switch = "x") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
strip.background = element_blank())
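To see what the na.omit step does to the question's data, here is a small sketch in base R. It rebuilds the data with data.frame instead of tibble, and the names padding, padded and observed as well as the set.seed call are illustrative additions, not part of the original code.

```r
set.seed(1)  # only so the sketch is reproducible

# The nine observed class / var_1 combinations
tmpdf <- data.frame(class = c("class 1", rep("class 2", 4), rep("class 3", 4)),
                    var_1 = c("none", rep(c("A", "B", "C", "D"), 2)),
                    y_ = as.integer(runif(9, min = 100, max = 250)))

# All 3 x 5 combinations with a missing y_, as added in the question
padding <- cbind(expand.grid(class = unique(tmpdf$class),
                             var_1 = unique(tmpdf$var_1),
                             stringsAsFactors = FALSE),
                 y_ = NA)
padded <- rbind(tmpdf, padding)

# na.omit drops every row that contains an NA, i.e. all padded rows
observed <- na.omit(padded)

nrow(padded)
nrow(observed)
table(observed$class)
```

Only the observed combinations remain, so each facet later contains just the bars that actually exist in its class.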
Technically, you could tweak a column chart (geom_col) to the desired effect, like so:
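# Assumption: mpdf is the question's data without the NA padding (nine rows)
mpdf <- na.omit(tmpdf)
mpdf %>%
mutate(xpos = c(1.6, 2 + .2 * 0:3, 3 + .2 * 0:3)) %>%
ggplot() +
geom_col(aes(x = xpos, y = y_, fill = var_1)) +
scale_x_continuous(breaks = c(1.6, 2.3 + 0:1), labels = unique(mpdf$class))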
However, the resulting barplot (condensed or not) might be difficult to interpret as long as you want to convey differences between classes. For example, the plot has to be studied carefully to detect that variable D runs against the pattern of increasing values from class 2 to 3.
The geom_text labels are automatically placed in decreasing order instead of following the order in the data frame.
The question concerns this part of the snippet: "geom_text(aes(label = Freq)..."
Here you can clearly see that geom_text does not follow the order of the bars, although Freq is decreasing in all categories.
ggplot(df_beine_clan, aes(x = Var2, y = Freq, fill = Var1)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Freq), vjust = 0, size = 5, nudge_y = 2, nudge_x = -0.5)
See Freq Order
How can I make sure the order is not changed when the labels are rendered on the bar chart?
You could add position_stack like this:
library(ggplot2)
ggplot(df_beine_clan, aes(x = Var2, y = Freq, fill = Var1)) +
geom_bar(stat = "identity") +
geom_text(aes(label = Freq), position = position_stack(vjust = 0.5), size = 5)
Created on 2022-09-03 with reprex v2.0.2
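The reason this helps is that stacked bars and text use different default positions: the bars are stacked, while geom_text places each label at its raw y value unless it is given the same stacking. The sketch below checks this on made-up toy data (not df_beine_clan, which was not posted) by comparing the computed layer data of both layers.

```r
library(ggplot2)

# Toy data, purely illustrative
df <- data.frame(Group = c("A", "A", "A", "B", "B", "B"),
                 Value = c(2, 4, 6, 8, 10, 12),
                 Response = c("Yes", "No", "Maybe", "Yes", "No", "Maybe"))

p <- ggplot(df, aes(x = Group, y = Value, fill = Response)) +
  geom_col() +
  geom_text(aes(label = Value), position = position_stack(vjust = 0.5))

bars   <- layer_data(p, 1)  # computed segment limits: ymin and ymax
labels <- layer_data(p, 2)  # computed label positions: y

# With vjust = 0.5 every label should sit at the midpoint of one bar segment
sort((bars$ymin + bars$ymax) / 2)
sort(labels$y)
```

Both vectors should agree, which confirms that every label is placed inside the segment it describes.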
Answering your question without an MWE is a bit tricky. Hence, as Julien mentioned, the output of dput(df_beine_clan) would be helpful.
I tried to recreate an example, but it might not match the structure of your data. It might, however, give you an idea of how to tackle the problem. I have created a column in the data frame that contains the label I think you want to add to your plot. Having a separate column gives you more flexibility inside ggplot.
library(tidyverse)
Group <- c("A", "A", "A", "B", "B", "B")
Value <- c(2,4,6,8,10,12)
Response <- c("Yes","No","Maybe","Yes","No","Maybe")
label <- Value
df <- data.frame(Group, Value, Response, label)
ggplot(df, aes(x = Group, y = Value, fill = Response)) + geom_bar(stat = "identity") +
geom_text(aes(label = label), position = position_stack(vjust = 0.5))
Changing the variable label to label <- sort(Value, decreasing = T) or label <- c("Blue", "Green", "Red", "Blue", "Green", "Red") gives you the two figures below.
I am having problems applying ggrepel in an alluvial plot with different variables on the columns. Some observations are so small that I need ggrepel to make them readable.
Because there are three columns, I want to apply a different ggrepel call to each column:
Left (region): Align labels to the left of the axis
Middle (supplySector): Do nothing (i.e. leave text in axis)
Right (demandSector): Align labels to the right of the axis
I've found these issues:
https://cran.r-project.org/web/packages/ggalluvial/vignettes/labels.html
and
How to align and label the stratum in ggalluvial using ggrepel (or otherwise)
The difference is that these examples only have two columns, and the columns are made of the same variable (different subsets of it). The previously published fixes use an ifelse() that selects a subset within that variable.
ReprEx:
library(ggplot2)
library(ggrepel)
library(tidyr)
library(dplyr)
df <- data.frame(region = c("A","A","A","B","B","B"),
supplySector = c("coal","gas","wind","coal","gas","wind"),
demandSector = c("resid","indus","ag","resid","indus","ag"),
Freq = 20*runif(6)); df
p<- ggplot(df, aes(y = Freq, axis1 = region, axis2 = supplySector, axis3=demandSector, label = after_stat(stratum))) +
ggalluvial::geom_alluvium(aes(fill = demandSector), width = 1/12, color="black", alpha=0.8) +
ggalluvial::geom_stratum(width = 1/3, fill = "grey70", color = "grey10", alpha=1) +
scale_x_discrete(limits = c("Region", "Supply Sector", "Demand Sector"), expand = c(0.3,0),drop=F) +
scale_y_continuous("Frequency (n)")+
theme_classic()+
theme(legend.position = "none")
I've tried to feed the colnames(df) == "region" to get a true/false vector into
p + ggrepel::geom_text_repel(
aes(label = ifelse(colnames(df) == "region", as.character(region), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = -.5
)
I would then repeat this for aes(label = ifelse(colnames(df) == "demandSector" with nudge_x = 1.5.
Maybe I got you wrong, but after a closer look at your example I would call it a duplicate of my answer that you linked in your post.
library(ggplot2)
library(ggrepel)
library(ggalluvial)
p + ggrepel::geom_text_repel(
aes(label = ifelse(after_stat(x) == 1, as.character(after_stat(stratum)), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = -.5
) + ggrepel::geom_text_repel(
aes(label = ifelse(after_stat(x) == 2, as.character(after_stat(stratum)), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = 0
) + ggrepel::geom_text_repel(
aes(label = ifelse(after_stat(x) == 3, as.character(after_stat(stratum)), NA)),
stat = "stratum", size = 4, direction = "y", nudge_x = +.5
)
I am trying to obtain a back-to-back bar plot (or pyramid plot) similar to the ones shown here:
Population pyramid with gender and comparing across two time periods with ggplot2
Basically, a pyramid plot of a quantitative variable whose values have to be displayed for combinations of three categorical variables.
library(ggplot2)
library(dplyr)
df <- data.frame(Gender = rep(c("M", "F"), each = 20),
Age = rep(c("0-10", "11-20", "21-30", "31-40", "41-50",
"51-60", "61-70", "71-80", "81-90", "91-100"), 4),
Year = factor(rep(c(2009, 2010, 2009, 2010), each= 10)),
Value = sample(seq(50, 100, 5), 40, replace = TRUE)) %>%
mutate(Value = ifelse(Gender == "F", Value *-1 , Value))
ggplot(df) +
geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
y = Value,
x = Age),
position = "dodge") +
scale_y_continuous(labels = abs,
expand = c(0, 0)) +
scale_fill_manual(values = hcl(h = c(15,195,15,195),
c = 100,
l = 65,
alpha=c(0.4,0.4,1,1)),
name = "") +
coord_flip() +
facet_wrap(.~ Gender,
scales = "free_x",
strip.position = "bottom") +
theme_minimal() +
theme(legend.position = "bottom",
panel.spacing.x = unit(0, "pt"),
strip.background = element_rect(colour = "black"))
example of back-to-back barplot I want to mimic
When I try to mimic this example on my data, things go wrong from the first ggplot function call, as the bars are not dodged on both sides of the axis:
mydf = read.table("https://raw.githubusercontent.com/gilles-guillot/IPUMS_R/main/tmp/df.csv",
header=TRUE,sep=";")
ggplot(mydf) +
geom_col(aes(fill = interaction(mig,ISCO08WHO_yrstud, sep = "-"),
x = country,
y = f),
position = "dodge")
failed attempt to get a back-to-back bar plot
rather than what I expected from:
ggplot(df) +
geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
y = Value,
x = Age),
position = "dodge")
geom_col plot with bars dodged symmetrically around the axis
In the example you are following, df$Value is made negative if Gender == 'F'. You need to do something similar to your y variable (f) for one of the two groups to get bars dodged symmetrically around the axis.
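A minimal sketch of that idea follows. The linked CSV is not reproduced here, so mydf is a made-up stand-in: the column names come from the question, while the values, the mig levels ("native", "migrant"), the ISCO08WHO_yrstud levels and the helper column f_signed are assumptions for illustration only.

```r
library(ggplot2)

# Hypothetical stand-in for mydf; only the column names are from the question
mydf <- data.frame(
  country = rep(c("X", "Y"), each = 4),
  mig = rep(c("native", "migrant"), times = 4),
  ISCO08WHO_yrstud = rep(rep(c("low", "high"), each = 2), times = 2),
  f = c(0.2, 0.3, 0.1, 0.4, 0.25, 0.15, 0.35, 0.05))

# Flip the sign for one mig level so its bars extend to the other side of zero
mydf$f_signed <- ifelse(mydf$mig == "migrant", -mydf$f, mydf$f)

p <- ggplot(mydf) +
  geom_col(aes(x = country, y = f_signed,
               fill = interaction(mig, ISCO08WHO_yrstud, sep = "-")),
           position = "dodge") +
  scale_y_continuous(labels = abs) +  # show magnitudes, not signed values
  coord_flip()
p
```

Which level gets the negative sign is arbitrary; it only decides which group ends up on which side of the axis.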
I am trying to use ggsignif to display significance stars on top of paired bar graphs using facet_wrap. However, I can't find a way of displaying one significance bar per facet. Here is what I mean:
dat <- data.frame(Group = c("S1", "S1", "S2", "S2"),
Sub = c("A", "B", "A", "B"),
Value = c(3,5,7,8))
ggplot(dat, aes(Group, Value)) +
geom_bar(aes(fill = Sub), stat="identity", position="dodge", width=.5) +
geom_signif(y_position=c(5.3, 8.3), xmin=c(0.8, 1.8), xmax=c(1.2, 2.2),
annotation=c("**", "NS"), tip_length=0) +
scale_fill_manual(values = c("grey80", "grey20")) +
facet_grid(~ Group, scales = "free")
Is there a way of making sure that each facet has its individual significance label?
The main problem, it seems to me, is that the geom_signif layer doesn't know which panel its values belong to, since it has no data argument provided.
I'm not that familiar with the package, but the documentation seems to suggest that manual = TRUE is recommended for plotting in different facets. Doing that and making some adjustments for the errors that were thrown, I got the following to work:
ggplot(dat, aes(Group, Value)) +
geom_bar(aes(fill = Sub), stat="identity", position="dodge", width=.5) +
geom_signif(data = data.frame(Group = c("S1","S2")),
aes(y_position=c(5.3, 8.3), xmin=c(0.8, 0.8), xmax=c(1.2, 1.2),
annotations=c("**", "NS")), tip_length=0, manual = T) +
scale_fill_manual(values = c("grey80", "grey20")) +
facet_grid(~ Group, scales = "free")
The key seemed to be to provide a data argument from which the facetting code could deduce what bit goes in what panel.
Have you considered using ggpubr and stat_compare_means?
https://rpkgs.datanovia.com/ggpubr/reference/stat_compare_means.html
Since your example only contains one observation per bar, it does not work, but if you include multiple observations you can get what you want.
Rewrite the test data:
library(dplyr)  # provides %>%
dat <- data.frame(A_S1 = sample(rnorm(20, 3, 1)),
B_S1 = sample(rnorm(20, 5, 1)),
A_S2 = sample(rnorm(20, 7, 1)),
B_S2 = sample(rnorm(20, 8, 1))) %>%
tidyr::gather("G", "value") %>%
tidyr::separate("G", c("Sub", "Group"))
Plot the data using the ggpubr package:
library(ggpubr)
ggerrorplot(dat, x = "Sub", y = "value",
facet.by = "Group",
error.plot = "pointrange") +
stat_compare_means(aes(label = ..p.signif..),
method = "t.test", ref.group = "A")