Get equal width bars and dense x-axis in ggplot barplot - r

I have the following data & code to produce a barplot (building on this answer)
tmpdf <- tibble(class = c("class 1", rep("class 2", 4), rep("class 3", 4)),
var_1 = c("none", rep(c("A", "B", "C", "D"), 2)),
y_ = as.integer(c(runif(9, min = 100, max=250))))
tmpdf <- rbind(tmpdf, cbind(expand.grid(class = levels(as.factor(tmpdf$class)),
var_1 = levels(as.factor(tmpdf$var_1))),
y_ = NA))
ggplot(data=tmpdf, aes(x = class, y = y_, fill=var_1, width=0.75 )) +
geom_bar(stat = "identity", position=position_dodge(width = 0.90), color="black", size=0.2)
This produces the below plot:
However, since not all class / var_1 combinations are present, some space on the x-axis is lost. I would now like to remove the empty space on the x-axis without making the bars wider(!).
Can someone point me to the right direction?

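You can use na.omit to drop the rows with missing y_ values (the padding rows for absent class / var_1 combinations), and then use facet_grid with scales = "free_x" and space = "free_x" so that each panel is only as wide as the bars it contains.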
ggplot(data=na.omit(tmpdf), aes(x = var_1, y = y_, fill=var_1, width=0.75)) +
geom_col(position=position_dodge(width = 0.90), color="black", size=0.2) +
facet_grid(~ class, scales = "free_x", space = "free_x", switch = "x") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
strip.background = element_blank())

Technically, you could tweak a column chart (geom_col) to the desired effect, like so:
tmpdf %>%
na.omit() %>% # keep only the 9 observed rows; the xpos vector below has 9 entries
mutate(xpos = c(1.6, 2 + .2 * 0:3, 3 + .2 * 0:3)) %>%
ggplot() +
geom_col(aes(x = xpos, y = y_, fill = var_1)) +
scale_x_continuous(breaks = c(1.6, 2.3 + 0:1), labels = unique(tmpdf$class))
However, the resulting barplot (condensed or not) might be difficult to interpret if you want to convey differences between classes. For example, the plot has to be studied carefully to detect that variable D runs against the pattern of increasing values from class 2 to class 3.
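The hand-typed positions in the geom_col approach can also be computed. The following is only a sketch: the names bars, bar_width, class_gap and centers are made up for illustration, and the spacing values are arbitrary assumptions, not taken from the answer.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
# Same shape as the question's data, without the NA padding rows
bars <- tibble(
  class = c("class 1", rep("class 2", 4), rep("class 3", 4)),
  var_1 = c("none", rep(c("A", "B", "C", "D"), 2)),
  y_    = as.integer(runif(9, min = 100, max = 250))
)

bar_width <- 0.2  # assumed width of every bar
class_gap <- 0.2  # assumed extra gap between neighbouring classes

# Consecutive bars sit bar_width apart; each new class adds class_gap
bars <- bars %>%
  mutate(xpos = (row_number() - 1) * bar_width +
           (as.integer(factor(class)) - 1) * class_gap)

# Put one axis label at the centre of each class
centers <- bars %>%
  group_by(class) %>%
  summarise(x = mean(xpos))

p <- ggplot(bars, aes(x = xpos, y = y_, fill = var_1)) +
  geom_col(width = bar_width, color = "black") +
  scale_x_continuous(breaks = centers$x, labels = centers$class)
```

Printing p draws the chart. Because the width is fixed in geom_col, every bar is equally wide no matter how many bars a class contains.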

Related

modify horizontal barplots for combination (tight design)

I have the following sample data:
library(tidyverse)
df <- data.frame(col=rep(c("A_B", "A_C", "A_D",
"B_A", "C_A", "D_A",
"B_C", "B_D",
"C_B", "D_B",
"C_D", "D_C"), 2),
level=c(rep("lower_level", 12), rep("higher_level", 12)),
value=abs(rnorm(24, mean=5, sd=2)))%>% tibble()
df[c('origin', 'target')] <- str_split_fixed(df$col, '_', 2)
df <- df %>% select(c(origin, target, level, value))
I now want to create horizontal stacked barplots for each target (df %>% filter(target=="A")). I do this using the following code:
# plot
p1 <- ggplot(data = df %>% filter(target=="A"),
aes(x = factor(level), y = value, fill = factor(origin)))+
geom_bar(stat="identity", position="fill", width = .1) +
scale_fill_manual(values = c("A"="yellow", "B" = "green", "C"="red", "D"="blue")) +
coord_flip()
Since I want to combine multiple such plots later (see below), I would like to:
remove the empty space between y-axis and the bars (or manipulate it to value X)
have the fill label displayed on the right side
have one value on the left, saying "target: A"
and have fill legend and y axis shared between all plots.
See annotated plot:
For reference, I create additional plots with this code:
p2 <- ggplot(data = df %>% filter(target=="B"),
aes(x = factor(level), y = value, fill = factor(origin)))+
geom_bar(stat="identity", position="fill", width = .1) +
scale_fill_manual(values = c("A"="yellow", "B" = "green", "C"="red", "D"="blue")) +
coord_flip()
p3 <- ggplot(data = df %>% filter(target=="C"),
aes(x = factor(level), y = value, fill = factor(origin)))+
geom_bar(stat="identity", position="fill", width = .1) +
scale_fill_manual(values = c("A"="yellow", "B" = "green", "C"="red", "D"="blue")) +
coord_flip()
p4 <- ggplot(data = df %>% filter(target=="D"),
aes(x = factor(level), y = value, fill = factor(origin)))+
geom_bar(stat="identity", position="fill", width = .1) +
scale_fill_manual(values = c("A"="yellow", "B" = "green", "C"="red", "D"="blue")) +
coord_flip()
And combine them with this code (but happy to use other ways of combining them if needed).
library("gridExtra")
grid.arrange(p1, p2, p3, p4, ncol = 1, nrow = 4)
It sounds very much as though you simply want to facet by target. No need for stitching multiple plots here.
ggplot(data = df %>% mutate(target = paste('Target:', target)),
aes(x = factor(level), y = value, fill = factor(origin)))+
geom_col(position = "fill", width = 0.9) +
scale_fill_manual(values = c("A"="yellow", "B" = "green",
"C"="red", "D"="blue"), name = 'origin') +
facet_grid(target~., switch = 'y') +
coord_flip() +
theme(strip.placement = 'outside',
strip.background = element_blank(),
axis.title.y = element_blank())
Three suggestions:
1. To remove the offset between axis and bars, set the expansion of the value axis to zero. Here level is discrete and value is mapped to y (drawn horizontally after coord_flip()), so that is the y scale:
scale_y_continuous(..., expand = c(0, 0))
2. Instead of tediously subsetting the data frame, use the facet_wrap or facet_grid option of ggplot:
ggplot(data = df,
aes(x = factor(level), y = value, fill = factor(origin))) +
## other plot instructions
facet_wrap( ~target)
See ?facet_wrap for layout options such as the number of plot columns.
3. The vertical spacing between bars will be adjusted to the output dimensions (here: figure height) anyway.
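Put together, the suggestions might look like the sketch below. The data frame combos is a made-up stand-in for the question's df with the same columns, and ncol = 1 is just one possible layout.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for df: every origin/target pair except origin == target
combos <- subset(
  expand.grid(origin = LETTERS[1:4], target = LETTERS[1:4],
              level = c("lower_level", "higher_level"),
              stringsAsFactors = FALSE),
  origin != target
)
combos$value <- abs(rnorm(nrow(combos), mean = 5, sd = 2))

p <- ggplot(combos, aes(x = level, y = value, fill = origin)) +
  geom_col(position = "fill", width = 0.9) +
  # value is on the y scale, so this removes the gap between axis and bars
  scale_y_continuous(expand = c(0, 0)) +
  coord_flip() +
  # one panel per target, stacked vertically, sharing axis and legend
  facet_wrap(~ target, ncol = 1)
```

Faceting gives the shared legend and shared value axis for free, which is the main reason to prefer it over grid.arrange here.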

how to change / specify fill color which exceeds the limits of a gradient bar?

In ggplot2/geom_tile, how can I change the fill color for values which exceed the limits?
As the image shows, region_4 and region_5 are outside the limits (1, 11), so their fill color is the default grey. How can I change region_4 to 'darkblue' and region_5 to 'black'? Thanks!
library(tidyverse)
library(RColorBrewer)
tile_data <- data.frame(category=letters[1:5],
region=paste0('region_',1:5),
sales=c(1,2,5,0.1,300))
tile_data %>% ggplot(aes(x=category,
y=region,
fill=sales))+
geom_tile()+
scale_fill_gradientn(limits=c(1,11),
colors=brewer.pal(11,'Spectral'))+ # the Spectral palette has at most 11 colors
theme_minimal()
If you want to keep the gradient scale and have two additional discrete values for off limits above and below, I think the easiest way would be to have separate fill scales for "in-limit" and "off-limit" values. This can be done with separate calls to geom_tile on subsets of your data and with packages such as {ggnewscale}.
I think it then would make sense to place the discrete "off-limits" at the respective extremes of your gradient color bar. You need then three geom_tile calls and three scale_fill calls, and you will need to specify the guide order within each scale_fill call. You will then need to play around with the legend margins, but it's not a big problem to make it look OK.
library(tidyverse)
library(RColorBrewer)
tile_data <- data.frame(
category = letters[1:5],
region = paste0("region_", 1:5),
sales = c(1, 2, 5, 0.1, 300)
)
ggplot(tile_data, aes(
x = category,
y = region,
fill = sales
)) +
geom_tile(data = filter(tile_data, sales <= 11 & sales >=1)) +
scale_fill_gradientn(NULL,
limits = c(1, 11),
colors = brewer.pal(11, "Spectral"),
guide = guide_colorbar(order = 2)
) +
ggnewscale::new_scale_fill() +
geom_tile(data = filter(tile_data, sales > 11), mapping = aes(fill = sales > 11)) +
scale_fill_manual("Sales", values = "black", labels = "> 11", guide = guide_legend(order = 1)) +
ggnewscale::new_scale_fill() +
geom_tile(data = filter(tile_data, sales < 1), mapping = aes(fill = sales < 1)) +
scale_fill_manual(NULL, values = "darkblue", labels = "< 1", guide = guide_legend(order = 3)) +
theme_minimal() +
theme(legend.spacing.y = unit(-6, "pt"),
legend.title = element_text(margin = margin(b = 10)))
Created on 2021-11-22 by the reprex package (v2.0.1)
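You can try scales::squish: define the limits and pass squish as the out-of-bounds (oob) handler of the scale. Values outside the limits are then clamped to the nearest limit, so they take the colors at the two ends of the gradient rather than separate colors:
p <- tile_data %>% ggplot(aes(x = category, y = region, fill = sales)) + geom_tile()
p + scale_fill_gradientn(colors = brewer.pal(11, "Spectral"),
limits = c(1, 11), oob = scales::squish)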
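To see what the oob handler does to the data before it reaches the color scale, squish can be called directly on the sales values from the question. This is only an illustration; the variable name squished is made up.

```r
library(scales)

sales <- c(1, 2, 5, 0.1, 300)

# Clamp every value to the range [1, 11]; in-range values are unchanged
squished <- squish(sales, range = c(1, 11))
print(squished)
```

The value 0.1 becomes 1 and 300 becomes 11, so region_4 and region_5 are drawn in the two end colors of the Spectral gradient. If they must be 'darkblue' and 'black' specifically, separate fill scales are needed instead.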

Back-to-back bar plot with ggplot2 in R

I am trying to obtain a back-to-back bar plot (or pyramid plot) similar to the ones shown here:
Population pyramid with gender and comparing across two time periods with ggplot2
Basically, a pyramid plot of a quantitative variable whose values have to be displayed for combinations of three categorical variables.
library(ggplot2)
library(dplyr)
df <- data.frame(Gender = rep(c("M", "F"), each = 20),
Age = rep(c("0-10", "11-20", "21-30", "31-40", "41-50",
"51-60", "61-70", "71-80", "81-90", "91-100"), 4),
Year = factor(rep(c(2009, 2010, 2009, 2010), each= 10)),
Value = sample(seq(50, 100, 5), 40, replace = TRUE)) %>%
mutate(Value = ifelse(Gender == "F", Value *-1 , Value))
ggplot(df) +
geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
y = Value,
x = Age),
position = "dodge") +
scale_y_continuous(labels = abs,
expand = c(0, 0)) +
scale_fill_manual(values = hcl(h = c(15,195,15,195),
c = 100,
l = 65,
alpha=c(0.4,0.4,1,1)),
name = "") +
coord_flip() +
facet_wrap(.~ Gender,
scale = "free_x",
strip.position = "bottom") +
theme_minimal() +
theme(legend.position = "bottom",
panel.spacing.x = unit(0, "pt"),
strip.background = element_rect(colour = "black"))
example of back-to-back barplot I want to mimic
When I try to mimic this example on my data, things go wrong from the first ggplot function call, as the bars are not dodged on both sides of the axis:
mydf = read.table("https://raw.githubusercontent.com/gilles-guillot/IPUMS_R/main/tmp/df.csv",
header=TRUE,sep=";")
ggplot(mydf) +
geom_col(aes(fill = interaction(mig,ISCO08WHO_yrstud, sep = "-"),
x = country,
y = f),
position = "dodge")
failed attempt to get a back-to-back bar plot
whereas I expected something like the result of:
ggplot(df) +
geom_col(aes(fill = interaction(Gender, Year, sep = "-"),
y = Value,
x = Age),
position = "dodge")
geom_col plot with bars dodged symmetrically around the axis
In the example you are following, df$Value is made negative if Gender == 'F' (the mutate() step). You need to do something similar to f in your data to get bars dodged symmetrically around the axis.
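A minimal sketch of that idea follows. It uses a made-up data frame toy in place of mydf, since the actual values of mig are not shown in the question; the group labels "native" and "migrant" and the column f_signed are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for mydf
toy <- data.frame(
  country = rep(c("X", "Y"), each = 4),
  mig     = rep(c("native", "migrant"), times = 4),
  yrstud  = rep(rep(c("low", "high"), each = 2), times = 2),
  f       = c(0.20, 0.30, 0.40, 0.10, 0.25, 0.35, 0.15, 0.25)
)

# Flip the sign for one group so its bars point the other way
toy <- toy %>%
  mutate(f_signed = ifelse(mig == "migrant", -f, f))

p <- ggplot(toy) +
  geom_col(aes(x = country, y = f_signed,
               fill = interaction(mig, yrstud, sep = "-")),
           position = "dodge") +
  scale_y_continuous(labels = abs) +  # show magnitudes on the axis
  coord_flip()
```

Which group gets the negative sign is arbitrary; it only decides which side of the axis that group appears on.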

Include 2nd variable labels on an existing Variable vs sample plot geom_jitter

I have a geom_jitter plot showing Variables between 2 samples, I would like to include the Group-variable parameters on the left of the plot, setting a separation by lines like in the figure below. Thus, Variables are organised by Group.
Here is a reproducible example:
data<- tibble::tibble(
Variable = c("A","B","C","D","E", "F"),
Group = c("Asia","Asia","Europe","Europe","Africa","America"),
sample1 = c(0.38,0.22,0.18,0.12,0.1,0),
sample2 = c(0.23,0.2,0,0.12,0.11,0.15))
library(reshape2)
data2<- melt(data,
id.vars=c("Variable", "Group"),
measure.vars=c("sample1", "sample2"),
variable.name="Sample",
value.name="value")
data2[is.na(data2)] <- 0
library(ggplot2)
ggplot(data2, aes(x = Sample, y = Variable, label=NA)) +
geom_point(aes(size = value, colour = value)) +
geom_text(hjust = 1, size = 2) +
# scale_size(range = c(1,3)) +
theme_bw()+
scale_color_gradient(low = "lightblue", high = "darkblue")
Here is the current output I have:
And this is the format I would like:
To get a polished version of the plot most similar to your ideal plot, you can use facet_grid() plus some theme() customization.
ggplot(data2, aes(x = Sample, y = Variable, label=NA)) +
geom_point(aes(size = value, colour = value)) +
geom_text(hjust = 1, size = 2) +
# scale_size(range = c(1,3)) +
theme_bw()+
scale_color_gradient(low = "lightblue", high = "darkblue") +
facet_grid(Group~., scales = "free", switch = "y") +
theme(strip.placement = "outside",
strip.text.y = element_text(angle = 180),
panel.spacing = unit(0, "cm"))

How do I represent percent of a variable in a filled barplot?

I have a data frame (t1) and I want to illustrate the shares of companies in relation to their size.
I added a dummy variable in order to get one filled bar rather than three:
t1$row <- 1
Company sizes are separated into medium, small and micro:
f_size <- factor(t1$size,
ordered = TRUE,
levels = c("medium", "small", "micro"))
The plot is built with theme_economist (from ggthemes):
ggplot(t1, aes(x = "Size", y = prop.table(row), fill = f_size)) +
geom_col() +
geom_text(aes(label = as.numeric(f_size)),
position = position_stack(vjust = 0.5)) +
theme_economist(base_size = 14) +
scale_fill_economist() +
theme(legend.position = "right",
legend.title = element_blank()) +
theme(axis.title.y = element_text(margin = margin(r = 20))) +
ylab("Percentage") +
xlab(NULL)
How can I modify my code to get the share for medium, small and micro in the middle of the three filled parts in the barplot?
Thanks in advance!
Your question isn't quite clear to me and I suggest you rephrase it for clarity. But I believe you're trying to get the annotations accurately aligned on the y-axis. For this, pre-calculate the labels and then use annotate:
library(data.table)
library(ggplot2)
set.seed(3432)
df <- data.table(
cat= sample(LETTERS[1:3], 1000, replace = TRUE)
, x= rpois(1000, lambda = 5)
)
tmp <- df[, .(pct= sum(x) / sum(df[,x])), cat][, cumsum := cumsum(pct)]
ggplot(tmp, aes(x= 'size', y= pct, fill= cat)) + geom_bar(stat='identity') +
annotate('text', y= tmp[,cumsum] - 0.15, x= 1, label= as.character(tmp[,pct]))
But this is a poor decision graphically. A stacked bar of shares sums to 100% by definition. Rather than labeling the components with text, just let the graphic do this for you via the axis labels:
ggplot(tmp, aes(x= cat, y= pct, fill= cat)) + geom_bar(stat='identity') + coord_flip() +
scale_y_continuous(breaks= seq(0,1,.05))
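If labels centered inside each segment are still wanted, one option is to pre-compute the shares and let position_stack(vjust = 0.5) place the text, as the question's own geom_text call attempted. The sketch below uses made-up counts per size class (the real t1 is not shown), and the names shares, n, pct and label are illustrative.

```r
library(ggplot2)

# Hypothetical counts per size class; replace with counts derived from t1
shares <- data.frame(
  size = factor(c("medium", "small", "micro"),
                levels = c("medium", "small", "micro")),
  n = c(20, 50, 130)
)
shares$pct   <- shares$n / sum(shares$n)
shares$label <- paste0(round(100 * shares$pct), "%")

p <- ggplot(shares, aes(x = "Size", y = pct, fill = size)) +
  geom_col() +
  # vjust = 0.5 centres each label vertically within its own segment
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  ylab("Percentage") +
  xlab(NULL)
```

Because the bars and the labels are stacked from the same rows with the same fill grouping, each label lands in the segment it describes, without computing label heights by hand.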
