Adjusting space between bars after coord_flip() in ggplot

I have created a chart with ggplot.
I have set the width of each bar, but I also want to set the spacing between the bars to a specific value (for example, reduce the spacing marked in red to 0.1). I know there are options like position_dodge, but that does not seem to work in combination with coord_flip().
In this related post it was suggested to use theme(aspect.ratio = .2), but that does not let me also set a specific width for the bars.
Are there any suggestions to achieve this?
Code:
library(ggplot2)
set.seed(0)
numbers <- runif(5, 0, 10)
names <- LETTERS[seq(1, 5)]
df <- cbind.data.frame(names, numbers)
ggplot(data = df, aes(x = names, y = numbers)) +
geom_bar(stat = "identity", fill = "blue", width = 0.30) +
coord_flip()

I think the solution is a combination of the width argument of geom_bar() (the fraction of the space reserved for each bar that the bar actually fills) and the aspect.ratio argument of theme(), which squeezes the plot vertically, leading to 'small' bars.
With the following code:
library(ggplot2)
## your data
set.seed(0)
numbers <- runif(5, 0, 10)
names <- LETTERS[seq(1, 5)]
df <- cbind.data.frame(names, numbers) ## corrected args
ggplot(data = df, aes(x = names, y = numbers)) +
geom_bar(stat="identity",
fill = "blue",
width=0.9) + ### increased
theme(aspect.ratio = .2) + ### aspect ratio added
coord_flip()
you get the following graph:

I personally prefer the ggstance package, which avoids messing around with coord_flip(). You need to swap your x and y aesthetics:
library(ggplot2)
library(ggstance)
ggplot(df, aes(x = numbers, y = names)) +
geom_colh(fill = "blue", width = 0.9)
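A similar result may be possible without an extra package. This is a minimal sketch, assuming ggplot2 3.3.0 or later, where geom_col() works out the bar orientation from the aesthetics. Because the categories sit one unit apart, width = 0.9 should leave a gap of 0.1 between neighbouring bars. The overall bar thickness still depends on the plot height or on theme(aspect.ratio = ...).

```r
library(ggplot2)

set.seed(0)
df <- data.frame(names = LETTERS[1:5], numbers = runif(5, 0, 10))

# Mapping the categorical variable to y gives horizontal bars directly.
# width is measured in category units: 0.9 leaves 1 - 0.9 = 0.1 between bars.
p <- ggplot(df, aes(x = numbers, y = names)) +
  geom_col(fill = "blue", width = 0.9)

p
```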

Related

Add new geom as new row in ggplot2, preventing layering of plots

I am pretty sure that this is easy to do, but I can't seem to find a proper way to phrase this question for Google or Stack Overflow, so here we are:
I have a plot made in ggplot2 which makes use of geom_jitter(), efficiently creating one row for each element in a factor and plotting its values.
I would like to add a complementary geom_violin() to the plot, but just adding the extra geom_ function to the plot code returns two layers: the jitter and the violin, one on top of the other (as usually expected).
EDIT:
This is what the plot looks like:
How can I have the violin as a separate row, without generating a second plot?
Side quest: how can I have the jitter and the violin geoms interleaved? (i.e. element A jitter row followed by element A violin row, and then element B jitter row followed by element B violin row)
This is the minimum required code to make it (without all the theme() embellishments):
P1 <- ggplot(data=TEST_STACK_SUB, aes(x=E, y=C, col=A)) +
theme(... , aspect.ratio=0.3) +
geom_point(position = position_jitter(w = 0.30, h = 0), alpha=0.2, size=0.5) +
geom_violin(data=TEST_STACK_SUB, mapping=aes(x=E, y=C), position="dodge") +
scale_x_discrete() +
scale_y_continuous(limits=c(0,1), breaks=seq(0,1,0.1),
labels=c(seq(0,1,0.1))) +
scale_color_gradient2(breaks=seq(0,100,20),
limits=c(0,100),
low="green3",
high="darkorchid4",
midpoint=50,
name="") +
coord_flip()
options(repr.plot.width=8, repr.plot.height=2)
plot(P1)
Here is a subset of the data to generate it (for you to try):
data
How about manipulating your factor as a continuous variable and nudging the entries across the aes() calls like so:
library(dplyr)
library(ggplot2)
set.seed(42)
tibble(x = rep(c(1, 3), each = 10),
y = c(rnorm(10, 2), rnorm(10))) -> plot_data
ggplot(plot_data) +
geom_jitter(aes(x = x - 0.5, y = y), width = 0.25) +
geom_violin(aes(x = x + 0.5, y = y, group = x), width = 0.5) +
coord_flip() +
labs(x = "x") +
scale_x_continuous(breaks = c(1, 3),
labels = paste("Level", 1:2),
trans = scales::reverse_trans())
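The same nudging idea can be sketched for data that already has a factor column, which is closer to the question. Everything here is illustrative: the column names level and pos and the 0.2 offset are assumptions, not taken from the original data. The factor is converted to its integer codes, each geom is shifted to one side of that position, and the axis is relabelled with the factor levels.

```r
library(ggplot2)

set.seed(42)
plot_data <- data.frame(
  level = factor(rep(c("A", "B"), each = 10)),
  y     = c(rnorm(10, 2), rnorm(10))
)

# Integer position of each factor level (A = 1, B = 2)
plot_data$pos <- as.numeric(plot_data$level)

p <- ggplot(plot_data, aes(y = y)) +
  # points sit slightly below the level position, violins slightly above
  geom_jitter(aes(x = pos - 0.2), width = 0.1, height = 0) +
  geom_violin(aes(x = pos + 0.2, group = level), width = 0.3) +
  # put the factor labels back on the now-continuous axis
  scale_x_continuous(breaks = seq_along(levels(plot_data$level)),
                     labels = levels(plot_data$level)) +
  coord_flip()

p
```

Each level then gets a jitter row directly followed by its violin row, which is the interleaving asked about in the side quest.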

Case dependent scaling of plot size in ggplot loop

I am running several ggplot barplots in a loop, including added text on top of each bar. I have defined the plot scale via coord_fixed and expand_limits. Unfortunately, the y-axis differs from plot to plot, so the scale settings do not fit in all cases, i.e. the text gets cut off and/or the axes get compressed. Let me illustrate:
period <- c(rep("A",4),rep("B",4))
group <- rep(c("C","C","D","D"),2)
size <- rep(c("E","F"),4)
value <- c(23,29,77,62,18,30,54,81)
df <- data.frame(period,group,size,value)
library(ggplot2)
for (i in unique(df$group)) # levels() returns NULL when group is a character column (the data.frame() default since R 4.0)
{
p <- ggplot(subset(df, group==i), aes(x=size, y=value, fill = period)) +
geom_bar(position="dodge", stat="identity", show.legend=F) +
geom_text(data=subset(df, group==i), aes(x=size, y=value,label=value),
size=10, fontface="bold", position = position_dodge(width=1),vjust = -0.5) +
expand_limits(y = max(df$value)*0.6) +
coord_fixed(ratio = 0.01)
ggsave(paste0("yourfilepath",i,".png"), width=7.72, height=4.5, units="in", p)
}
I would like the settings of coord_fixed and expand_limits to depend on the case, i.e. on value. I have experimented with e.g. expand_limits(y = max(df$value * ifelse(df$value <= 50, 0.6, 1))), but that doesn't work the way I had hoped. Any suggestions will be greatly appreciated!
Based on @Z.Lin's comment, I restricted the values to the current group inside the ifelse() call: expand_limits(y = max(df$value[df$group==i] * ifelse(df$value[df$group==i] <= 50, 5, 8))).
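One way to make this more systematic might be to derive the limit from the tallest bar in each subset. The sketch below is only an illustration: label_limit() is a hypothetical helper, and the 25% headroom is an assumed value that would need tuning to the label size and output dimensions. It leaves out coord_fixed and ggsave and just builds one plot per group.

```r
library(ggplot2)

df <- data.frame(
  period = rep(c("A", "B"), each = 4),
  group  = rep(c("C", "C", "D", "D"), 2),
  size   = rep(c("E", "F"), 4),
  value  = c(23, 29, 77, 62, 18, 30, 54, 81)
)

# Hypothetical helper: upper y limit that leaves `headroom`
# (a fraction of the tallest bar) free for the text labels.
label_limit <- function(values, headroom = 0.25) {
  max(values) * (1 + headroom)
}

# One plot per group; the limit is computed from that group's values only.
plots <- lapply(split(df, df$group), function(d) {
  ggplot(d, aes(x = size, y = value, fill = period)) +
    geom_col(position = "dodge", show.legend = FALSE) +
    geom_text(aes(label = value, group = period),
              position = position_dodge(width = 0.9), vjust = -0.5) +
    expand_limits(y = label_limit(d$value))
})

plots[["C"]]
```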

Adding Space between my geom_histogram bars-not barplot

Let's say I want to make a histogram
So I use the following code
v100<-c(runif(100))
v100
library(ggplot2)
private_plot <- ggplot() + aes(v100) +
geom_histogram(binwidth = 0.1, boundary = 0) +
scale_x_continuous(breaks = seq(0, 1, 0.1), lim = c(0, 1))
private_plot
How do I separate my columns so that the whole thing is more pleasing to the eye?
I tried this but it somehow doesn't work:
Adding space between bars in ggplot2
Thanks
You could set the line color of the histogram bars with the col parameter, and the filling color with the fill parameter. This is not really adding space between the bars, but it makes them visually distinct.
library(ggplot2)
set.seed(9876)
v100<-c(runif(100))
### use col = "grey" to set the line color
ggplot() +
aes(v100) +
geom_histogram(binwidth = 0.1, fill="black", col="grey") +
scale_x_continuous(breaks = seq(0,1,0.1), lim = c(0,1))
Yielding this graph:
Please let me know whether this is what you want.
If you want to increase the space, e.g. to indicate that the values are discrete, one option is to plot your histogram as a bar plot. In that case, you have to summarize the data yourself and use geom_col() instead of geom_histogram(). If you want to increase the space further, you can use the width parameter.
library(tidyverse)
lambda <- 1:6
pois_bar <-
map(lambda, ~rpois(1e5, .x)) %>%
set_names(lambda) %>%
as_tibble() %>%
gather(lambda, value, convert = TRUE) %>%
count(lambda, value)
pois_bar %>%
ggplot() +
aes(x = value, y = n) +
geom_col(width = .5) +
facet_wrap(~lambda, scales = "free", labeller = "label_both")
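For continuous data like the v100 vector in the question, the "summarize it yourself" step could look roughly like this. It is a sketch that assumes bins of width 0.1 on [0, 1], and it uses base R's hist() with plot = FALSE only to do the counting. The bars are then drawn narrower than the bin width (0.08 instead of 0.1) so a gap appears between them.

```r
library(ggplot2)

set.seed(9876)
v100 <- runif(100)

# Count observations per bin without drawing anything
h <- hist(v100, breaks = seq(0, 1, 0.1), plot = FALSE)
bins <- data.frame(mid = h$mids, count = h$counts)

# Bars are centred on the bin midpoints and are narrower than the bins
p <- ggplot(bins, aes(x = mid, y = count)) +
  geom_col(width = 0.08) +
  scale_x_continuous(breaks = seq(0, 1, 0.1))

p
```

Note that the gaps are purely cosmetic: each bar still represents the full bin, so the x-axis breaks remain at the bin edges.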
Just use color and fill options to distinguish between the body and border of bins:
library(ggplot2)
set.seed(1234)
df <- data.frame(sex=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5))))
ggplot(df, aes(x=weight)) +
geom_histogram(color="black", fill="white")
In cases where you are creating a "histogram" over a range of integers, you could use:
ggplot(data) + geom_bar(aes(x = value, y = after_stat(count))) # after_stat(count) replaces the deprecated ..count.. (ggplot2 >= 3.3.0)
I just came across this issue. My solution was to add vertical lines at the points separating my bins. I use "theme_classic" and have a white background. I set my bins to break at 10, 20, 30, etc. So I just added 9 vertical lines with:
geom_vline(xintercept=10, linetype="solid", color = "white", size=2)+
geom_vline(xintercept=20, linetype="solid", color = "white", size=2)+
etc
A silly hack, but it works.
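The repeated geom_vline() calls can probably be collapsed into one, since xintercept accepts a vector. The sketch below uses made-up data on [0, 100] and assumes ggplot2 3.4.0 or later, where line thickness is set with linewidth (older versions use size, as in the answer above).

```r
library(ggplot2)

set.seed(1)
d <- data.frame(value = runif(200, 0, 100))  # illustrative data only

p <- ggplot(d, aes(x = value)) +
  geom_histogram(breaks = seq(0, 100, 10), fill = "black") +
  # one white line at every interior bin edge: 10, 20, ..., 90
  geom_vline(xintercept = seq(10, 90, 10), colour = "white", linewidth = 2) +
  theme_classic()

p
```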

ggplot2: how to create correct legend after using scale_xx_manual

I have a plot with three different lines. I want one of those lines to have points on it as well. I also want the two lines without points to be thicker than the one with points. I have managed to get the plot I want, but the legend isn't keeping up.
library(ggplot2)
y <- c(1:10, 2:11, 3:12)
x <- c(1:10, 1:10, 1:10)
testnames <- c(rep('mod1', 10), rep('mod2', 10), rep('meas', 10))
df <- data.frame(testnames, y, x)
ggplot(data=df, aes(x=x, y=y, colour=testnames)) +
geom_line(aes(size=testnames)) +
scale_size_manual("", values=c(0.5,1,1)) +
geom_point(aes(alpha=testnames), size=5, shape=4) +
scale_alpha_manual("", values=c(1, 0, 0))
I can remove the second (black) legend:
ggplot(data = df, aes(x=x, y=y, colour=testnames)) +
geom_line(aes(size=testnames)) +
scale_size_manual("", values=c(0.5,1,1), guide='none') +
geom_point(aes(alpha=testnames), size=5, shape=4) +
scale_alpha_manual("", values=c(1, 0.05, 0.05), guide='none')
But what I really want is a merge of the two legends - a legend with colours, cross only on the first variable (meas) and the lines of mod1 and mod2 thicker than the first line. I have tried guide and override, but with little luck.
You don't need transparency to hide the shapes for mod1 and mod2. You can omit these points from the plot and legend by setting their shape to NA in scale_shape_manual:
ggplot(data = df, aes(x = x, y = y, colour = testnames, size = testnames)) +
geom_line() +
geom_point(aes(shape = testnames), size = 5) +
scale_size_manual(values=c(0.5, 2, 2)) +
scale_shape_manual(values=c(8, NA, NA))
This gives the following plot:
NOTE: I used some more distinct values in the size-scale and another shape in order to better illustrate the effect.

Can I fix overlapping dashed lines in a histogram in ggplot2?

I am trying to plot a histogram of two overlapping distributions in ggplot2. Unfortunately, the graphic needs to be in black and white. I tried representing the two categories with different shades of grey, with transparency, but the result is not as clear as I would like. I tried adding outlines to the bars with different linetypes, but this produced some strange results.
require(ggplot2)
set.seed(65)
a = rnorm(100, mean = 1, sd = 1)
b = rnorm(100, mean = 3, sd = 1)
dat <- data.frame(category = rep(c('A', 'B'), each = 100),
values = c(a, b))
ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 1) +
scale_fill_grey()
Notice that one of the lines that should appear dotted is in fact solid (at a value of x = 4). I think this must be a result of it actually being two lines - one from the 3-4 bar and one from the 4-5 bar. The dots are out of phase so they produce a solid line. The effect is rather ugly and inconsistent.
Is there any way of fixing this overlap?
Can anyone suggest a more effective way of clarifying the difference between the two categories, without resorting to colour?
Many thanks.
One possibility would be to use a 'hollow histogram', as described here:
# assign your original plot object to a variable
p1 <- ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
# p1
# extract relevant variables from the plot object to a new data frame
# your grouping variable 'category' is named 'group' in the plot object
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
# plot using geom_step
ggplot(data = df, aes(x = xmin, y = y, linetype = factor(group))) +
geom_step()
If you want to vary both linetype and fill, you need to plot a histogram first (which can be filled). Set the outline colour of the histogram to transparent. Then add the geom_step. Use theme_bw to avoid 'grey elements on grey background'
p1 <- ggplot() +
geom_histogram(data = dat, aes(x = values, fill = category),
colour = "transparent", position = 'identity', alpha = 0.4, binwidth = 0.4) +
scale_fill_grey()
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
df$category <- factor(df$group, labels = c("A", "B"))
p1 +
geom_step(data = df, aes(x = xmin, y = y, linetype = category)) +
theme_bw()
First, I would recommend theme_set(theme_bw()) or theme_set(theme_classic()) (this sets the background to white, which makes it (much) easier to see shades of gray).
Second, you could try something like scale_linetype_manual(values=c(1,3)) -- this won't completely eliminate the artifacts you're unhappy about, but it might make them a little less prominent since linetype 3 is sparser than linetype 2.
Short of drawing density plots instead (which won't work very well for small samples and may not be familiar to your audience), dodging the positions of the histograms (which is ugly), or otherwise departing from histogram conventions, I can't think of a better solution.
