How to modify whiskers of a boxplot in ggplot2? - r

I'll start with an MWE:
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(am)))
p + geom_boxplot()
I'd like to modify the colour of the whiskers, e.g., set it to red. I don't think it's possible to do this directly with geom_boxplot, so this is my workaround:
library(Hmisc)  # median_hilow() wraps Hmisc::smedian.hilow()
stat_sum_df <- function(fun, geom = "crossbar", ...) {
  stat_summary(fun.data = fun, geom = geom, width = 0.4, ...)
}
p + stat_boxplot(geom = 'linerange', colour = "red", position = "dodge") +
  stat_sum_df("median_hilow", fun.args = list(conf.int = 0.5), position = "dodge")
The line ranges are drawn on top of each other, so my next try was:
p + stat_boxplot(geom = 'linerange', colour = "red", position = position_dodge(width = .5)) +
  stat_sum_df("median_hilow", fun.args = list(conf.int = 0.5), position = position_dodge(width = .5))
Looks nicer, but now there is a fixed space between the boxes (compare cyl = 8 in the first and third plots). As I'm going to use this code for different numbers of levels of am (in my real data it's not am, of course), I don't know in advance how wide the boxes themselves will be, so I can't set a fixed width for the linerange without specifying a fixed width for the boxes.
Is there a way either to selectively modify whiskers of a boxplot or to adjust space between linerange elements according to space between the boxes?

How about plotting two boxplots on top of each other? The first one has red lines, and the second one, drawn on top of it, has no whiskers at all.
p + geom_boxplot(color = "red") + geom_boxplot(aes(ymin = after_stat(lower), ymax = after_stat(upper)))
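The overlay works because stat_boxplot() computes, for every box, the hinges (lower, upper) and the whisker ends (ymin, ymax); the second layer simply maps the whisker ends onto the hinges. To show where those numbers come from, here is a hypothetical base-R helper, `whisker_ends()`, assuming ggplot2's default coef = 1.5 and R's default quantile type 7. It is a sketch of the rule, not ggplot2's own code; layer_data() on a real plot is the authoritative check.

```r
# Hypothetical helper (not part of ggplot2): approximate the whisker ends
# that stat_boxplot() reports as ymin/ymax for a single box.
whisker_ends <- function(y, coef = 1.5) {
  hinges <- stats::quantile(y, c(0.25, 0.75), names = FALSE)  # quantile type 7
  iqr <- hinges[2] - hinges[1]
  # Points further than coef * IQR from the hinges count as outliers
  outlier <- y < hinges[1] - coef * iqr | y > hinges[2] + coef * iqr
  # Whiskers reach the most extreme non-outlying points, never inside the box
  ends <- range(c(hinges, y[!outlier]))
  c(ymin = ends[1], ymax = ends[2])
}

mpg8 <- mtcars$mpg[mtcars$cyl == 8]
whisker_ends(mpg8)            # whiskers at 13.3 and 18.7; 10.4 and 19.2 are outliers
whisker_ends(mpg8, coef = 0)  # coef = 0 pulls the whiskers back onto the hinges
```

The same coef argument is what the error-bar approach below sets to 0 in geom_boxplot() so that only the separately drawn red whiskers remain visible.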

Another option is to plot error bars and on top of them the boxplots without the whiskers:
library(ggplot2)
p + stat_boxplot(
geom = "errorbar",
colour = "red",
width = 0,
position = position_dodge(0.75)
) +
geom_boxplot(coef = 0, outlier.shape = NA)
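Why does position_dodge(0.75) line up with the boxes whatever the number of am levels? As far as I can tell, stat_boxplot() gives each x position a total box width of 0.75 of the spacing between categories, and dodging splits that total into n equal slots, so the dodge width describes the whole group rather than a single box. The hypothetical helper `dodge_centres()` below sketches that arithmetic; it assumes the default width and ignores the padding that geom_boxplot()'s default dodge2 positioning adds between boxes (which narrows the boxes, not their centres).

```r
# Hypothetical helper: centre of each dodged element at one x position,
# assuming the n groups share a total width of `width` (0.75 = boxplot default).
dodge_centres <- function(x, n, width = 0.75) {
  slot <- width / n                       # each group gets an equal slot
  x + (seq_len(n) - (n + 1) / 2) * slot   # offsets are symmetric around x
}

dodge_centres(1, n = 2)  # two levels of am at the first cyl: 0.8125 and 1.1875
dodge_centres(1, n = 3)  # three levels: 0.75, 1, 1.25 (still spanning 0.75 in total)
```

Because the dodge width is the total for the group, the same value keeps the red whiskers centred on the boxes whether there are two levels or five.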

Related

Merging 2 Legends In a Specific way

I have a plot of my data that includes both a boxplot and a point plot (data from mtcars for illustration):
ggplot(mtcars, aes(x = factor(cyl), y = mpg), fill = factor(carb), shape = factor(vs)) +
  geom_boxplot(data = subset(mtcars, am == 1), aes(x = factor(cyl), y = mpg, fill = factor(carb), shape = factor(vs)),
               outlier.shape = NA, alpha = 0.85, width = .65, colour = "BLACK") +
  geom_point(data = subset(mtcars, am == 1 & vs == 1), aes(x = factor(cyl), y = mpg, fill = factor(carb), shape = factor(vs)),
             outlier.shape = NA, size = 5, alpha = .4, shape = 1, colour = "BLACK",
             position = position_dodge(width = 0.65))
My objective is for there to be a single legend instead of two legends. That legend should show all the colors associated with the fill (based on carb) and a single element explaining what the open circles correspond to (i.e. vs == 1).
That single element (which corresponds to geom_point) should display an open circle, matching the open circles in the graph, and not a boxplot as it currently does.
Any help will be greatly appreciated.
Remove the shape aesthetic from geom_boxplot. Also, in general there is no need to specify color = "black", as the default outline colour is already black (or near-black) for both geom_boxplot and geom_point.
geom_point has no outlier.shape argument, and the version I was running online threw an "unknown parameters" warning about it, so I have removed it.
Map shape to a constant inside aes() in geom_point and use scale_shape_manual to define your shape (use shape = 21 if you want a fill, as your code suggests, or shape = 1 if you don't). When you remove the legend title, the two legends look fairly "merged".
However, I'm not sure exactly what you mean by "merged legend". Could you show the desired output?
library(ggplot2)
ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(data = subset(mtcars, am == 1), aes(fill = factor(carb)), alpha = 0.85, width = .65) +
  # constant shape mapping gives the points their own legend key
  geom_point(data = subset(mtcars, am == 1 & vs == 1), aes(fill = factor(carb), shape = "vs = 1"),
             size = 5, alpha = .4, position = position_dodge(width = 0.65)) +
  scale_shape_manual(name = NULL, values = 21)  # name = NULL drops the legend title

How to properly form ggplot graphs, without cutting off important parts of the graph?

I have created a bar chart using the ggplot() + geom_bar() functions from the ggplot2 package. I have also used coord_flip() to switch the orientation of the bars and geom_text() to add the values at the top of each bar. Some of the bars have different colors, so there is a legend next to the graph. What I get as a result is a picture half occupied by the graph and half by the legend, with the values on top of the longest bars cut off because the graph is so small.
Any ideas on how to enlarge the graph and shrink the legend, so that the values on the bars are not cut off?
Thank you
This is my code on imaginary data:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)  # cbind() would have turned freq into character
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(aes(label = freq), size = 3.5, hjust = -0.2)
And this is the resulting graph:
There are a few fixes to this:
Change your Limits
As indicated by @Dave2e; see his response.
Change the size of your output
The interesting thing about graphics in R is that the aspect ratio and resolution of the graphics device change how a plot looks. When I ran your code, no clipping was observed. You can test this by creating the plot and then saving it at different sizes. Taking your code as-is, here is what I get when saving it as a PNG with ggsave() and different width= and height= values:
ggsave('a1.png', width=10, height=5)
ggsave('a2.png', width=15, height=5)
Set an Expansion
The third way is to set an expansion on the scale limits. By default, ggplot2 adds some "padding" to the ends of a continuous scale, so if you set your limits from 0 to 10, the plot area actually extends a bit beyond them (5% of the range on each side by default). You can change that setting with the expand= argument of the scale_... functions, for example:
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
geom_text(label = freq, size = 3.5, hjust = -0.2) +
scale_y_continuous(expand=expansion(mult=c(0,0.15)))
You can define the lower and upper expansion of an axis separately: the code above sets no expansion at the lower end of the y scale and a multiplier of 0.15 (15% of the range) at the upper end. The default for continuous scales is a multiplier of 0.05 (5%) on each side.
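To see what the multiplier means in numbers: each end of a continuous range is pushed out by its multiplier times the width of the range, plus an additive constant (0 here). The hypothetical helper `expanded_range()` below sketches that arithmetic for the freq data, assuming the bar range runs from 0 to max(freq); it is an illustration, not ggplot2's internal function.

```r
# Hypothetical helper: apply expansion(mult = , add = ) style padding
# to a continuous range (lower end first, upper end second).
expanded_range <- function(limits, mult = c(0.05, 0.05), add = c(0, 0)) {
  width <- diff(limits)
  c(limits[1] - mult[1] * width - add[1],
    limits[2] + mult[2] * width + add[2])
}

freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
expanded_range(c(0, max(freq)))                     # default: about -0.518 to 10.886
expanded_range(c(0, max(freq)), mult = c(0, 0.15))  # as above: 0 to about 11.923
```

So with mult = c(0, 0.15) the axis ends roughly 1.55 units past the longest bar, which leaves room for its label.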
You can override the default limits of the y axis scale with the ylim() function.
labels <- c("A","B","C","D","E")
freq <- c(10.3678, 5.84554, 1.5673, 2.313, 7.111)
type <- c("rich","poor","poor","poor","rich")
df <- data.frame(labels, freq, type)
#set the max y axis limit to allow enough room for the label
ylimitmax <- 11
library(ggplot2)
ggplot(df, aes(x = reorder(labels,freq), y= freq, fill = type)) +
geom_bar(stat = "identity", alpha = 1, width = 0.9)+
coord_flip()+
xlab("")+
ylab("Mean frequency")+
scale_fill_manual(name = "Type", values = c("red", "blue")) +
ggtitle("Mean frequency of different labels")+
ylim(0, ylimitmax) +
geom_text(label = freq, size = 3.5, hjust = -0.2)
The script shows how to set the limits manually, but you may want to automate the limit calculation with something like ylimitmax <- max(freq) * 1.2.
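A minimal sketch of that automated version, assuming 20% headroom (the 1.2 factor) is enough for the label text at your output size; layer_scales() is used only to confirm the limits the y scale ended up with.

```r
library(ggplot2)

df <- data.frame(labels = c("A", "B", "C", "D", "E"),
                 freq   = c(10.3678, 5.84554, 1.5673, 2.313, 7.111),
                 type   = c("rich", "poor", "poor", "poor", "rich"))

# Upper limit follows the data instead of being hard-coded (assumed 20% headroom)
ylimitmax <- max(df$freq) * 1.2

p <- ggplot(df, aes(x = reorder(labels, freq), y = freq, fill = type)) +
  geom_col(width = 0.9) +
  coord_flip() +
  labs(x = NULL, y = "Mean frequency", title = "Mean frequency of different labels") +
  scale_fill_manual(name = "Type", values = c("red", "blue")) +
  ylim(0, ylimitmax) +
  geom_text(aes(label = freq), size = 3.5, hjust = -0.2)

layer_scales(p)$y$get_limits()  # limits now track max(freq)
```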

Is there a way to give a fixed width and a fixed height of a text (string) on the ggplot

It seems that one can only change the size of the text, which scales the text in both width and height in ggplot. Does anyone know if there is a way to plot a text with a given width and height at a given x, y position on the plot? In other words, I want to be able to stretch the text in the x or y dimension without affecting the other, like the image provided here.
Much appreciated.
What you want is the ggfittext package.
You can play around with it for what you need, but borrowing a few examples from its documentation gives this:
library(ggplot2)
library(ggfittext)
library(ggpubr) # for ggarrange
g1 <- ggplot(animals, aes(x = type, y = flies, label = animal)) +
geom_tile(fill = "white", colour = "black") +
geom_fit_text() + ggtitle("no options")
g2 <- ggplot(animals, aes(x = type, y = flies, label = animal)) +
geom_tile(fill = "white", colour = "black") +
geom_fit_text(grow = TRUE) + ggtitle("grow = TRUE")
ggarrange(g1, g2, ncol = 2)

R barplot - highest value on top is hidden

Simple barplot with values on top of the bars (I know it is silly; I was forced to add them :)). text() works fine, but the value above the highest bar is hidden. I tried changing the margins, but that moves the whole plot instead of only the plotting area. What can you suggest? Thanks!
x = c(28,1,4,17,2)
lbl = c("1","2","3","4+","tough guys\n(type in)")
bp = barplot(x,names.arg=lbl,main="Ctrl-C clicks",col="grey")
text(x = bp, y = x, labels = x, pos = 3, cex = 0.8, col = "red", font = 2)
Plot example:
You can fix this by extending the y-axis range with ylim:
bp = barplot(x,names.arg=lbl,main="Ctrl-C clicks",col="grey", ylim=c(0,30))
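If you would rather not hard-code 30, one option is to derive the upper limit from the data. A minimal sketch, assuming roughly 10% headroom is enough for a cex = 0.8 label (increase the factor if the label is still clipped):

```r
x <- c(28, 1, 4, 17, 2)
lbl <- c("1", "2", "3", "4+", "tough guys\n(type in)")

# Leave ~10% of the tallest bar as headroom for the text drawn above it
ytop <- max(x) * 1.1
bp <- barplot(x, names.arg = lbl, main = "Ctrl-C clicks", col = "grey",
              ylim = c(0, ytop))
# pos = 3 places each label just above its bar
text(x = bp, y = x, labels = x, pos = 3, cex = 0.8, col = "red", font = 2)
```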
Another solution using ggplot2:
library(ggplot2)
x = c(28,1,4,17,2)
lbl = c("1","2","3","4+","tough guys \n(type in)")
test <- data.frame(x, lbl)
bp = ggplot(test, aes(x=lbl, y= x))+
geom_bar(color = "grey", stat="identity")+ ## grey bar outlines; stat = "identity" uses the values in x as bar heights
geom_text(aes(label= x), vjust = -1, color = "red")+
ggtitle("Ctrl-C clicks")+
theme_bw()+ ## give black and white theme
theme(plot.title = element_text(hjust = 0.5),## adjust position of title
panel.grid.minor=element_blank(), ## suppress minor grid lines
panel.grid.major=element_blank() ##suppress major grid lines
)+
scale_y_continuous(limits = c(0,30)) ## set scale limits
bp

How to set automatic label position based on box height

In a previous question, I asked about moving the label of a bar in a barplot outside the bar if the bar was too small. I was given the following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, with an ifelse() statement used to reposition the label. Is there a way to determine the cutoff value automatically?
A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make any manual adjustments a bit easier.
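Packaging the rule into a function might look like this. `label_y()` is a hypothetical helper, and the 0.3 cutoff and 0.1 offset are the same illustrative fractions used above, not recommended values:

```r
library(ggplot2)

# Hypothetical helper: put labels inside tall bars and above short ones,
# with both the cutoff and the shift expressed as fractions of the tallest bar.
label_y <- function(y, cutoff = 0.3, offset = 0.1) {
  ydiff <- max(y, na.rm = TRUE)
  ifelse(y < cutoff * ydiff, y + offset * ydiff, y - offset * ydiff)
}

dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1, Anno = 1:10)
dataset$lab_y <- label_y(dataset$Riserva_Riv_Fine_Periodo)

ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
  geom_col(width = 0.8) +
  geom_text(aes(y = lab_y, label = round(Riserva_Riv_Fine_Periodo, 0)),
            angle = 90, col = "red", size = 4)
```

Only cutoff and offset need adjusting when the rendered size changes; the data column itself stays untouched.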
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they contain the information you want to convey while being able to have all the labels inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(labels = dollar, expand = c(0, 1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", linewidth=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(labels = dollar, expand = c(0, 1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")
