In ggplot, I want to label some error bars with asterisks ('*') to indicate significance level. The graph is arranged with category labels on the y axis, so that they are easily legible. This means that the error bars are horizontal, and the *'s need to align vertically with them. However, the symbol '*' is not vertically centred in a line of text, so it gets plotted too high using geom_text.
Reproducible example
library(ggplot2)
set.seed(123)
x = data.frame(grp = LETTERS[1:8], val = sample(10,8))
se = runif(8, 0.1,2)
x$upper = x$val + se
x$lower = x$val - se
x$labs = sample(c('*','**', '***', ''), 8, replace = TRUE)
gg = ggplot(x, aes(grp,val)) +
geom_point() +
geom_errorbar(aes(ymax = upper, ymin=lower), width=0.3) +
scale_y_continuous(limits = c(-2,12)) +
coord_flip()
gg + geom_text(aes(y=upper+0.2, label=labs), size=8, hjust='left')
I know that I can nudge the label position like this:
gg + geom_text(aes(y=upper+0.2, label=labs), size=8, nudge_x = -0.2, hjust='left')
However, getting the correct value of nudge_x needs to be done in an ad-hoc manner and the correct value varies with size of graphics output, font size, number of categories on the y scale etc. Is there a way to get the labels to automatically align vertically? I tried using geom_point with shape=42 instead of geom_text to draw the asterisks. Although this solves the vertical alignment issue, it introduces its own problem with getting the spacing between a horizontal row of asterisks correct (i.e. getting '**' and '***' to print with the correct separation between adjacent symbols).
Just eyeballing it on my machine, this vjust adjustment seems to work, and I think it may be fairly robust to changes in device output size, font size, etc.
gg + geom_text(aes(y=upper+0.2, label=labs), size=8, hjust='left',vjust = 0.77)
Related
I have a plot with a continuous y-axis and a discrete x-axis.
For the data I have a group factor with 3 levels and 2 measurement points, so 6 geoms are created.
I would like to keep the width of the individual geoms but add space between the two measurement points, i.e. between the two groups of geoms. Like: 3 geoms - gap - 3 geoms. Is there any possibility of adjusting the position of a group of geoms on the x-axis in ggplot?
preferences %>%
pivot_longer(c(F1_life_satisfaction_pre, F1_life_satisfaction_current), names_to = "variables", values_to = "ratings")%>%
ggplot( aes(y=ratings, x=fct_inorder(variables), fill=fct_inorder(playing_preference))) +
geom_violin(scale="width", adjust=0.5, width=0.8, alpha= 0.2, position = position_dodge(1)) +
stat_summary(fun=mean, geom="point", shape=23, size=2, position = position_dodge(1)) +
stat_summary(aes(group=fct_inorder(playing_preference)), fun=mean, geom = "line", size= 0.5, position = position_dodge(1)) +
stat_summary(fun.data=mean_cl_normal, fun.args=list(mult=1),aes(x=fct_inorder(variables), y=ratings), geom="errorbar",
width=0.05, position = position_dodge(1)) +
scale_x_discrete(labels = c("pre-Pokemon-Go", "current"),expand = c(0, 0.3)) +
theme(axis.text.x = element_text(color = "black", size=10)) +
scale_y_continuous(breaks = c(1, 2, 3, 4, 5, 6, 7), limits=c(1,7)) +
geom_segment(aes(x = 0, y=4, yend=4, xend=3), color="grey") +
theme(axis.ticks.x = element_blank()) +
labs(fill = "playing preference") +
labs(x="life satisfaction") +
theme(axis.title = element_text(size = 10))+
theme(legend.text = element_text(size = 10)) +
theme(legend.title = element_text(size = 10)) +
labs(y = "mean ratings") +
geom_boxplot(width=0.1,color="black", alpha=0.2, position = position_dodge(1)) +
scale_fill_viridis(discrete=T)
TL;DR - play with width= and position_dodge(width=...) in your geom_violin and geom_boxplot calls to adjust the positions, along with scale_x_discrete(expand=expansion(...)).
The first point is that how close together or far apart things appear on your plot depends on the size of your plotting window. With that being said, the positioning of the plot elements relative to one another can be controlled via ggplot. In particular, you want to change the values of width= and position_dodge(width=...) in your geom_violin call (and your geom_boxplot call).
Example Dataset
I'll use an example dataset to illustrate the idea, where I'll plot boxplots... but the idea is identical. The example dataset contains two x values ("Group1" and "Group2"), and each of those has subdivisions that are either "A", "B", or "C", containing a separate normal distribution of 50 datapoints for every x and x.subdiv.
set.seed(8675309)
df <- data.frame(
x=c(rep('Group1', 150), rep('Group2', 150)),
x.subdiv=rep(c(rep('A', 50), rep('B',50), rep('C',50)), 2),
y=unlist(lapply(1:6, function(x){rnorm(50, runif(1,10,15), runif(1,0,7))}))
)
Width of position_dodge
Here's the simple boxplot, where I'll use 0.5 as the value for both width= and position_dodge(width=...). Note that the first argument in position_dodge is width=, so you can just supply that number directly to that function without explicitly assigning to the width argument.
p <- ggplot(df, aes(x=x, y=y)) + theme_bw()
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.5))
The rule to note here is:
geom_boxplot(width=...) controls how wide the overall spread of box plots is around each x= value.
position_dodge(width=...) controls the amount of spread (the amount of "dodging") for the groups around the x= aesthetic.
So this is what happens when you change to position_dodge(width=1), but leave geom_boxplot(width=0.5):
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(1))
The width of each box remains the same as before, but the positioning of each box around x= is more "spread out". In effect, each is "dodged" more. If you set position_dodge(width=0.2), you'll see the opposite effect, where the boxes become squished together (because they are not spread out as much around x=):
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.2))
The interesting thing is how geom_boxplot(width=) and position_dodge(width=) are related:
If geom_boxplot(width=) is equal to position_dodge(width=), the boxes will be touching
If geom_boxplot(width=) is less than position_dodge(width=), the boxes will be separated from one another
If geom_boxplot(width=) is greater than position_dodge(width=), the boxes will be overlapping one another
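These three rules can be checked with a little arithmetic. The sketch below is a simplified model of how dodging lays out n equally wide boxes around one x value, based on my reading of how position_dodge behaves. dodge_layout() and classify_gap() are hypothetical helper names, not ggplot2 functions.

```r
# Simplified model of dodging (an assumption, not ggplot2's API):
# the n boxes share the geom width, and their centres are spread
# across the dodge width.
dodge_layout <- function(x, n, width, dodge) {
  idx <- seq_len(n)
  centre <- x + dodge * ((idx - 0.5) / n - 0.5)  # centre of each dodged box
  half <- width / n / 2                          # half the width of one box
  data.frame(group = idx, xmin = centre - half, xmax = centre + half)
}

# Gap between the first two boxes (needs n >= 2):
# zero = touching, positive = separated, negative = overlapping.
classify_gap <- function(width, dodge, n = 3) {
  lay <- dodge_layout(x = 1, n = n, width = width, dodge = dodge)
  gap <- lay$xmin[2] - lay$xmax[1]
  if (abs(gap) < 1e-9) "touching" else if (gap > 0) "separated" else "overlapping"
}

classify_gap(width = 0.5, dodge = 0.5)  # equal widths
classify_gap(width = 0.5, dodge = 1)    # dodge wider than geom
classify_gap(width = 0.5, dodge = 0.2)  # dodge narrower than geom
```

Under this model the gap between neighbours is simply (dodge - width) / n, which is why only the difference between the two settings decides whether boxes touch.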
Width of the geom
The width= of the geom itself relates to how wide the boxplots are. Keep these two points in mind:
The width= is the sum of all the widths of the individual dodged geoms for that particular x= aesthetic.
width=1 is the width between two values on a discrete axis, meaning when you set width=1, the boxes will be wide enough to touch
That means that if we set geom_boxplot(width=1), the combined total of all the boxes for "Group1" will be wide enough to touch the boxes of "Group2"... but you would only see that if there were no overlap among the boxes (meaning that position_dodge(width=) would be equal to geom_boxplot(width=)).
So this makes the boxes wide enough to be touching, but position_dodge(width) is less than geom_boxplot(width)... so the boxes overlap, but "Group1" boxes are separated from "Group2" boxes:
p + geom_boxplot(aes(fill=x.subdiv), width=1, position=position_dodge(0.8))
If we want everything to touch, you have to set them equal, and both equal to 1:
p + geom_boxplot(aes(fill=x.subdiv), width=1, position=position_dodge(1))
Control both widths
In the end, it's probably best to control both. If we go from the previous plot, you probably want the plots to have separation between "Group1" and "Group2". That means you need to make the width of all boxes smaller (which we control by geom_boxplot(width)). However, you probably still want the dodging to leave a bit of space between the boxes, so we'll have to set position_dodge(width) to be greater than geom_boxplot(width), but not too large so that we lose the separation between "Group1" and "Group2". Something like this works pretty well:
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.55))
In your case, you have both geom_violin and geom_boxplot, so you'll need to adjust those together and work out the proper look.
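To pick values with less trial and error, it may help to compute the two gaps directly. This is a sketch under a simplified model of dodging (n equally wide boxes per x value, x values one unit apart on the discrete axis). dodge_gaps() is a hypothetical helper, and the results are in x-axis units.

```r
# Assumed model: each box is width / n wide and box centres are dodge / n apart.
dodge_gaps <- function(width, dodge, n = 3) {
  c(
    within  = (dodge - width) / n,               # gap between neighbouring boxes in one group
    between = 1 - (dodge * (n - 1) + width) / n  # gap between the outer boxes of adjacent groups
  )
}

dodge_gaps(width = 1,   dodge = 0.8)   # within < 0 (overlap), between > 0
dodge_gaps(width = 1,   dodge = 1)     # both 0: everything touches
dodge_gaps(width = 0.5, dodge = 0.55)  # small gap within, larger gap between
```

A negative value means overlap. For violins and boxplots drawn together, the same dodge value in both layers keeps their centres aligned, while each geom can keep its own width.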
EDIT: "Shift Left and Right" and "Squish"
If the width= and position_dodge(width= arguments are just not quite getting you what you need, there is another parameter that can work in concert with them to move things around. This would be to use scale_x_discrete(expand=... to control the amount of space to the left and right of your x axis items. Used together with width= and position_dodge(width=, this actually gives you precise control of where to position your data along the x axis while still respecting the automated plotting that ggplot2 provides.
width= controls the whitespace between data along the x axis
position_dodge(width= controls the amount of whitespace between subgroups in the data positioned along the x axis
scale_x_discrete(expand=... controls white space to the left and right sides of the panel.
I'll demonstrate the functionality using the same dataset as before. Note that proper use of the expand= argument for scale_x_discrete should call expansion() and you will need to provide a 1 or 2 length vector to either add= or mult=. Play around with both and numbers to see the effect, but here's kind of what to expect.
The expansion() function takes mult= and/or add= as arguments, each of which can be a vector of length 2 (where the first value is applied to the left side and the second to the right side) or of length 1 (where the number is applied to both sides). Numbers sent to mult= are multiplied by the range of the scale to give the amount of padding, while numbers sent to add= are in axis units (the default for a discrete scale is add=0.6). So the code below sets the extra whitespace on the left and on the right to 30% of the axis range:
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.55)) +
scale_x_discrete(expand=expansion(mult=0.3))
Sending two values, you can adjust each side separately. This pads the left side by 100% of the axis range and the right side by 50%:
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.55)) +
scale_x_discrete(expand=expansion(mult=c(1,0.5)))
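For reference, the ggplot2 documentation describes mult= as a multiplicative factor on the range of the scale and add= as a constant in axis units. The sketch below mimics that arithmetic for a discrete axis whose levels sit at 1, 2, ..., n. expanded_limits() is a hypothetical helper, and the real panel limits can end up wider if a geom extends past them.

```r
# Assumed arithmetic: padding = mult * (range of the scale) + add, per side.
expanded_limits <- function(n_levels, mult = 0, add = 0) {
  mult <- rep(mult, length.out = 2)  # c(left, right)
  add  <- rep(add,  length.out = 2)
  rng  <- n_levels - 1               # distance between first and last level
  c(lower = 1 - (mult[1] * rng + add[1]),
    upper = n_levels + (mult[2] * rng + add[2]))
}

expanded_limits(2, add = 0.6)         # discrete default: 0.6 units per side
expanded_limits(2, mult = 0.3)        # 30% of the range per side
expanded_limits(2, mult = c(1, 0.5))  # more room on the left than on the right
```

With only two groups the range is one unit, so mult= and add= happen to give the same padding for the same number. With more groups, mult= grows with the axis while add= stays fixed.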
Bottom Line: Seems like by using all three arguments for width=, position_dodge(width=, and scale_x_discrete(expand=expansion(..., you can theoretically place your x groupings anywhere along your plot. Just keep in mind that the resolution and aspect ratio of your graphics device will change how things are laid out a bit, so additional control can be adjusted by resizing the graphics window.
When using a histogram with x as a POSIXct value, I'm not sure how you're supposed to line the ticks up with the binsize of the graph.
Setting the tick spacing to match the bin size makes the ticks line up slightly off, and the offset accumulates from bin to bin until the ticks are no longer accurate.
bymonth <- ggplot() +
scale_x_datetime("", breaks = date_breaks("60 days"), labels = date_format("%m-%y")) +
...
lots of geom_rects for background colors
...
theme(legend.title = element_blank()) +
geom_histogram(data=dat, aes(x = iso, fill = name), binwidth = 30*24*60*60, position = 'dodge')
I tried using annotate() as well as experimenting with the spacing of the ticks, but I think my approach here might be wrong in the first place.
This leads to a graph looking something like this
Which is quite annoying
I haven't been able to remove extra white space flanking groups of bars in geom_plot.
I'd like to do what Roland achieves here: Remove space between bars ggplot2 but when I try to implement his solution I get the error "Warning message:
geom_bar() no longer has a binwidth parameter. Please use geom_histogram() instead."
I added this line of code to my plot (trying different widths):
geom_histogram(binwidth = 0.5) +
which returns "Error: stat_bin() must not be used with a y aesthetic." and no plot.
Data:
mydf<- data.frame(Treatment = c("Con", "Con", "Ex", "Ex"),
Response = rep(c("Alive", "Dead"), times=2),
Count = c(259,10,290,21))
aPalette<-c("#009E73", "#D55E00")
Plot:
example<-ggplot(mydf, aes(factor(Response), Count, fill = Treatment)) +
geom_bar(stat="identity", position = position_dodge(width = 0.55), width = 0.5) +
scale_fill_manual(values = aPalette, name = "Treatment") + #legend title
theme_classic() +
labs(x = "Response",
y = "Count") +
scale_y_continuous(breaks = c(0,50,100,150,200,250,275), expand = c(0,0),
limits = c(0, 260)) +
theme(legend.position = c(0.7, 0.3)) +
theme(text = element_text(size = 15)) #change all text size
example
Returns:
Note: I don't know why I'm getting "Warning message: Removed 1 rows containing missing values (geom_bar)." but I'm not concerned about it because that doesn't happen using my actual data
**Edit re: note - this is happening because I set the limit for the y-axis lower than the max value for the bar that was removed. I'm not going to change the code so I don't have to redraw my figure, but changing
limits = c(0, 260)
to
limits = c(0, 300)
will show all the bars, in case someone else has a similar problem. I'm going to find a post related to this issue and will make this edit more concise when I can link an answer.
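The warning can be traced to specific rows: with limits set on a continuous scale, values outside them are treated as missing, so their bars are dropped. A quick check using the data from the question (y_limits and dropped are just illustrative variable names):

```r
mydf <- data.frame(Treatment = c("Con", "Con", "Ex", "Ex"),
                   Response  = rep(c("Alive", "Dead"), times = 2),
                   Count     = c(259, 10, 290, 21))

y_limits <- c(0, 260)

# Rows whose Count falls outside the limits are the ones that get removed
dropped <- mydf[mydf$Count < y_limits[1] | mydf$Count > y_limits[2], ]
dropped
```

Only the Ex/Alive row (Count 290) exceeds the upper limit, which matches the single row reported in the warning.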
Forgive me if I completely missed what you're trying to accomplish here, but the only reason that ggplot has included so much white space is that you constrained the bars to a particular width and increased the size of the graph.
The white space within the graph is an output of width of the bars and width of the graph.
Using your original graph...
We notice a lot of whitespace, but you both made the bins small and your graph wide. Think of the space as a compromise between bins and whitespace. It's illogical to expect a wide graph with small bins and no whitespace. To fix this we can either decrease the graph size or increase the bin size.
First we increase the bin size back to normal by removing your constraints.
Which looks ridiculous....
But looking at the Remove space between bars ggplot2 link that you included above, all he did was remove constraints and limit the width. Doing so would result in a similar graph...
Including the graph from your link above....
And removing all of your constraints....
example<-ggplot(mydf, aes(factor(Response), Count, fill = Treatment)) +
geom_bar(stat="identity",position = position_dodge()) +
scale_fill_manual(values = aPalette, name = "Treatment") +
theme_bw() +
labs(x = "Response", y = "Count")
example
If your goal was not to make your graph similar to the one in the link by removing whitespace, let me know; other than that, I hope this helped.
Being new to R, I produced very simple horizontal bar plots using ggplot2 and coord_flip().
Notably, I insert the values of the x variable at the left side of the bar by default (or at the right side if the label does not fit) using the following command:
geom_text(aes(x=TYPE, y=COUNT, ymax=COUNT, label=COUNT,
hjust=ifelse(COUNT>1000, 1.5, -0.3)),
size=3.5, position = position_dodge(width=0.8))
The problem is that, depending on the data-sets, the x values can vary significantly (e.g. dataset_1 x values can be between 1 to 200; dataset_2 x values can be between 10,000 to 100,000; ...), which causes the label of the shortest bar to be misplaced with the ifelse statement I am using (see brown bar in figure A below).
In this case I cannot just use a constant COUNT>1000 condition for all the datasets.
Figure A:
I could modify manually the value of the hjust=ifelse(COUNT>1000,...statement for each dataset.
But I was wondering if it is possible to automatically move the label outside of the bar if it does not fit between the axis and the top of the bar, without modifying the value of the ifelse condition for each dataset, like in figure B below.
Figure B :
EDIT
Workaround (not perfect but better):
Placing the label at the right of the bar if the value is less than 5% of the maximum value
MAXI <- max(data[,2])
geom_text(aes(x=TYPE, y=COUNT, ymax=COUNT, label=COUNT,
hjust=ifelse((COUNT/MAXI)<0.05, -0.3, 1.3)))
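The same rule can be wrapped in a small helper so that the threshold scales with each dataset. This is a sketch assuming non-negative counts. label_hjust() and the example counts are made up for illustration.

```r
label_hjust <- function(count, threshold = 0.05, inside = 1.3, outside = -0.3) {
  # Bars shorter than `threshold` times the longest bar get their label outside
  ifelse(count / max(count) < threshold, outside, inside)
}

counts <- c(12, 4800, 150000, 97000)
label_hjust(counts)
```

It should be usable directly inside the aesthetic mapping, e.g. aes(hjust = label_hjust(COUNT)), since max() is then computed over the plotted data rather than a hard-coded cutoff.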
Having some labels outside the bars and some inside can distort the visual encoding of magnitude as the length of the bar. Another option is to put the values in the middle of the bar but set geom_text to skip values that are small relative to the maximum bar. Or, if you want to include text for all the bar values added, you can put them below the bars in order to keep a clean visual pattern for the bar lengths. Examples of both options are below:
# Fake data
dat = data.frame(x = LETTERS[1:5], y=c(432, 1349, 10819, 5489, 12123))
ggplot(dat, aes(x, y, fill=x)) +
geom_bar(stat="identity") +
geom_text(aes(label=ifelse(y < 0.05*max(dat$y), "", format(y, big.mark=",")), y=0.5*y),
colour="white") +
coord_flip(xlim=c(0.4,5.6), ylim=c(0, 1.03*max(dat$y)), expand=FALSE) +
guides(fill="none")
ggplot(dat, aes(x, y, fill=x)) +
geom_hline(yintercept=0, lwd=0.3, colour="grey40") +
geom_bar(stat="identity") +
geom_text(aes(label=format(y, big.mark=","), y=-0.01*max(dat$y)),
size=3.5, hjust=1) +
coord_flip(ylim = c(-0.04*max(dat$y), max(dat$y))) +
guides(fill="none")
I know, 3D bar charts are a sin. But I'm asked to do them, and as a trade-off I suggested only drawing a border, in a slightly darker color than the bars, on the top and right side of each bar. That way the bars would have some kind of "shadow" (urgh), but at least you would still be able to compare them.
Is there any way to do this?
ggplot(diamonds, aes(clarity)) + geom_bar()
Another possibility is to use two sets of geom_bar. The first set, the green ones, are made slightly higher and offset to the right. I borrow the data from @Didzis Elferts.
ggplot(data = df2) +
geom_bar(aes(x = as.numeric(clarity) + 0.1, y = V1 + 100),
width = 0.8, fill = "green", stat = "identity") +
geom_bar(aes(x = as.numeric(clarity), y = V1),
width = 0.8, stat = "identity") +
scale_x_continuous(name = "clarity",
breaks = as.numeric(df2$clarity),
labels = levels(df2$clarity))+
ylab("count")
As you already said - 3D barcharts are "bad". You can't do it directly in ggplot2 but here is a possible workaround for this.
First, make new data frame that contains levels of clarity and corresponding count for each level.
library(plyr)
df2<-ddply(diamonds,.(clarity),nrow)
Then, in the ggplot() call, use the new data frame with clarity as x values and V1 (counts) as y values, and add geom_blank(); this creates an x axis with the levels we need. Then add geom_rect() to produce the shading for the bars. Here the xmin and xmax values are made from as.numeric() of clarity plus a constant: for xmin the constant should be less than half of the bar width, and for xmax it should be larger than half of the bar width. ymin is 0 and ymax is V1 (counts) plus some constant. Finally, add geom_bar(stat="identity") on top of this shadow to draw the actual bar plot.
ggplot(df2,aes(clarity,V1)) + geom_blank()+
geom_rect(aes(xmin=as.numeric(clarity)-0.38,
xmax=as.numeric(clarity)+.5,
ymin=0,
ymax=V1+250),fill="green")+
geom_bar(width=0.8,stat="identity")