Consider the following:
library(ggplot2)
d <- data.frame(y = rnorm(120),
                x = rep(c("bar", "long category name", "foo"), each = 40))
ggplot(d, aes(x = x, y = y)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(size = 15, angle = 90))
The x-axis labels are aligned on their centers. Is it possible to align them on the right automatically, so that every label ends right below the plot?
This is precisely what the hjust and vjust parameters are for in ggplot. They control the horizontal and vertical justification respectively and range from 0 to 1. See this question for more details on justifications and their values (What do hjust and vjust do when making a plot using ggplot?).
To get the labels the way you want, you can use:
hjust = 0.95 (to leave some space between the labels and the axis)
vjust = 0.2 (to center them in this case)
ggplot(d,aes(x=x,y=y)) + geom_boxplot() +
theme(axis.text.x=element_text(size=15, angle=90,hjust=0.95,vjust=0.2))
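If you are on a newer ggplot2 (3.3.0 or later is assumed here), guide_axis() can take the angle and pick hjust/vjust for you, which avoids tuning the numbers by hand. A minimal sketch, reusing the data from the question:

```r
library(ggplot2)

d <- data.frame(y = rnorm(120),
                x = rep(c("bar", "long category name", "foo"), each = 40))

# guide_axis(angle = 90) rotates the tick labels and uses heuristics to
# choose hjust/vjust, so the rotated labels end just below the panel.
p <- ggplot(d, aes(x = x, y = y)) +
  geom_boxplot() +
  scale_x_discrete(guide = guide_axis(angle = 90)) +
  theme(axis.text.x = element_text(size = 15))
p
```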
Alternatively, flip the axes. Your customers will thank you and have less neck pain (plus, I find most boxplots easier to interpret in this orientation):
ggplot(d, aes(x = x, y = y)) +
geom_boxplot() +
coord_flip()
In a previous question, I asked about moving the label of a bar in a barplot outside the bar if the bar was too small. I was given the following example:
library(ggplot2)
options(scipen=2)
dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1,
Anno = 1:10)
ggplot(data = dataset,
aes(x = Anno,
y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity",
width=0.8,
position="dodge") +
geom_text(aes( y = Riserva_Riv_Fine_Periodo,
label = round(Riserva_Riv_Fine_Periodo, 0),
angle=90,
hjust= ifelse(Riserva_Riv_Fine_Periodo < 3000000, -0.1, 1.2)),
col="red",
size=4,
position = position_dodge(0.9))
And I obtain this graph:
The problem with the example is that the value at which the label is moved must be hard-coded into the plot, and an ifelse() statement is used to reposition the label. Is there a way to determine the cutoff value automatically?
A slightly better option might be to base the test and the positioning of the labels on the height of the bar relative to the height of the highest bar. That way, the cutoff value and label-shift are scaled to the actual vertical range of the plot. For example:
ydiff = max(dataset$Riserva_Riv_Fine_Periodo)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo)) +
geom_bar(stat = "identity", width=0.8) +
geom_text(aes(label = round(Riserva_Riv_Fine_Periodo, 0), angle=90,
y = ifelse(Riserva_Riv_Fine_Periodo < 0.3*ydiff,
Riserva_Riv_Fine_Periodo + 0.1*ydiff,
Riserva_Riv_Fine_Periodo - 0.1*ydiff)),
col="red", size=4)
You would still need to tweak the fractional cutoff in the test condition (I've used 0.3 in this case), depending on the physical size at which you render the plot. But you could package the code into a function to make any manual adjustments a bit easier.
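Packaging the relative-cutoff idea into a function could look like the sketch below. The name label_bars() and its cutoff/shift arguments are made up for illustration; both arguments are fractions of the tallest bar, and the .data pronoun assumes ggplot2 3.0 or later.

```r
library(ggplot2)

# Hypothetical helper: labels sit inside tall bars and above short ones.
# `cutoff` and `shift` are both expressed as fractions of the tallest bar.
label_bars <- function(data, x, y, cutoff = 0.3, shift = 0.1) {
  ymax <- max(data[[y]])
  short <- data[[y]] < cutoff * ymax
  # Short bars: label above the bar; tall bars: label inside the bar
  data$label_y <- ifelse(short, data[[y]] + shift * ymax, data[[y]] - shift * ymax)
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_col(width = 0.8) +
    geom_text(aes(y = label_y, label = round(.data[[y]], 0)),
              angle = 90, colour = "red", size = 4)
}

dataset <- data.frame(Riserva_Riv_Fine_Periodo = 1:10 * 10^6 + 1, Anno = 1:10)
p <- label_bars(dataset, "Anno", "Riserva_Riv_Fine_Periodo", cutoff = 0.3)
p
```

Adjusting the plot for a different output size then only means changing cutoff and shift in one call.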
It's probably possible to automate this by determining the actual sizes of the various grobs that make up the plot and setting the condition and the positioning based on those sizes, but I'm not sure how to do that.
Just as an editorial comment, a plot with labels inside some bars and above others risks confusing the visual mapping of magnitudes to bar heights. I think it would be better to find a way to shrink, abbreviate, recode, or otherwise tweak the labels so that they still convey the information you want while all of them fit inside the bars. Maybe something like this:
library(scales)
ggplot(dataset, aes(x = Anno, y = Riserva_Riv_Fine_Periodo/1000)) +
geom_col(width=0.8, fill="grey30") +
geom_text(aes(label = format(Riserva_Riv_Fine_Periodo/1000, big.mark=",", digits=0),
y = 0.5*Riserva_Riv_Fine_Periodo/1000),
col="white", size=3) +
scale_y_continuous(labels=dollar, expand=c(0,1e2)) +
theme_classic() +
labs(y="Riserva (thousands)")
Or maybe go with a line plot instead of bars:
ggplot(dataset, aes(Anno, Riserva_Riv_Fine_Periodo/1e3)) +
geom_line(linetype="11", size=0.3, colour="grey50") +
geom_text(aes(label=format(Riserva_Riv_Fine_Periodo/1e3, big.mark=",", digits=0)),
size=3) +
theme_classic() +
scale_y_continuous(labels=dollar, expand=c(0,1e2)) +
expand_limits(y=0) +
labs(y="Riserva (thousands)")
I was asked this question on Twitter and thought it might be good to have it here.
When making labeled, side-by-side plots with plot_grid(), things work as expected for single-letter labels:
library(ggplot2)
library(cowplot)
p1 <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
geom_density(alpha = 0.7) +
ggtitle("") + theme_minimal()
p2 <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
geom_density(alpha = 0.7) +
ggtitle("") +
scale_fill_grey() + theme_minimal()
plot_grid(p1, p2, labels = c("A", "B"))
However, if we use longer strings as labels, the labels move to the right, and the longer the string, the farther it moves:
plot_grid(p1, p2, labels = c("Density plot in color", "In gray"))
How can this be fixed?
Disclaimer: I'm the author of the package. Posting this here with answer in the hope it will be useful.
The default settings for the parameters hjust and label_x in plot_grid() are optimized for single-letter labels, and they don't work for longer labels. Overriding the settings fixes the problem:
plot_grid(p1, p2, labels = c("Density plot in color", "In gray"),
hjust = 0, label_x = 0.01)
In particular, the default hjust setting is hjust = -0.5. This moves the label to the right by an amount equivalent to half its width. This makes sense for single-letter labels, because then we can have the letters appear half a letter width away from the left border by setting label_x = 0, and this will work irrespective of label font size or any other plot features the user may have chosen.
However, moving a label by half its width doesn't make any sense at all for longer labels, and in particular labels of differing lengths.
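The arithmetic behind this can be sketched with a simplified model (not cowplot's actual drawing code): a label anchored at x with justification hjust occupies [x - hjust * w, x + (1 - hjust) * w], where w is the label's width. The helper name left_edge() and the widths below are made up for illustration, as fractions of a subplot's width.

```r
# Left edge of a text box of width `width`, anchored at `x`
# with horizontal justification `hjust`.
left_edge <- function(x, hjust, width) x - hjust * width

widths <- c(single_letter = 0.02, long_label = 0.40)  # illustrative widths

# Default settings: label_x = 0, hjust = -0.5
left_edge(0, -0.5, widths)
# single_letter 0.01, long_label 0.20: the offset grows with the label width

# Overridden settings: label_x = 0.01, hjust = 0
left_edge(0.01, 0, widths)
# both 0.01: every label starts at the same place
```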
ggplot(G, aes(x=State, y=Score, fill=State))+
geom_bar(stat="identity", position="dodge")+
scale_y_continuous(labels = scales::comma)
Please help me make this more readable.
This is the output.
Also, I want to draw a line at a score of 236. I tried
abline(v = 236)
but it did not work!
Try this; it works for me:
barplot(c(1,2,3,4),space=c(1,1,1,1)) # equally spaced bars as expected
barplot(c(1,2,3,4),space=c(1,20,1,1)) # massive gap before the 2nd bar
barplot(c(1,2,3,4),space=c(20,1,1,1)) # the same as the first plot
That's a lot of bars. You can make the bars narrower by specifying their width inside geom_bar() (as a proportion, 1 is touching, 0.5 is equal amounts of bar and gap, the default is 0.9).
ggplot(G, aes(x = State, y = Score, fill = State)) +
geom_bar(stat = "identity", position = "dodge", width = 0.8) +
scale_y_continuous(labels = scales::comma)
Also note that the position = "dodge" isn't doing anything in your example.
For a plot with that many bars, if you want them all labeled, I would suggest adding + coord_flip() to your plot - usually it's easier to have lots of vertical space than lots of horizontal space, and the long labels won't overlap. When you have over 50 bars, you're going to need a fair amount of space.
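On the abline() part of the question: abline() draws on base graphics, not on ggplot objects, so in ggplot2 a reference line is added as a layer. Because Score is mapped to y, a line at a score of 236 is geom_hline(yintercept = 236), which appears vertical after coord_flip(). The data frame G below is a made-up stand-in, since the original data isn't shown.

```r
library(ggplot2)

# Made-up stand-in for the asker's data frame G (the real one isn't shown)
G <- data.frame(State = state.name,
                Score = round(seq(100, 400, length.out = 50)))

p <- ggplot(G, aes(x = State, y = Score, fill = State)) +
  geom_col(width = 0.8, show.legend = FALSE) +         # same as geom_bar(stat = "identity")
  geom_hline(yintercept = 236, linetype = "dashed") +  # reference line at Score = 236
  scale_y_continuous(labels = scales::comma) +
  coord_flip()                                         # long state names stay readable
p
```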
I know, 3D bar charts are a sin. But I'm asked to make them, and as a trade-off I suggested only adding a border, in a slightly darker color than the bar, on the top and right side of each bar. That way, the bars would have some kind of "shadow" (urgh), but at least you would still be able to compare them.
Is there any way to do this?
ggplot(diamonds, aes(clarity)) + geom_bar()
Another possibility is to use two sets of geom_bar(). The first set, the green ones, is made slightly taller and offset to the right. I borrow the data (df2) from @Didzis Elferts' answer.
ggplot(data = df2) +
geom_bar(aes(x = as.numeric(clarity) + 0.1, y = V1 + 100),
width = 0.8, fill = "green", stat = "identity") +
geom_bar(aes(x = as.numeric(clarity), y = V1),
width = 0.8, stat = "identity") +
scale_x_continuous(name = "clarity",
breaks = as.numeric(df2$clarity),
labels = levels(df2$clarity))+
ylab("count")
As you already said - 3D barcharts are "bad". You can't do it directly in ggplot2 but here is a possible workaround for this.
First, make a new data frame that contains the levels of clarity and the corresponding count for each level.
library(plyr)
df2<-ddply(diamonds,.(clarity),nrow)
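plyr is retired, so if you would rather not load it, a base-R sketch can build the same kind of counts table, renaming the count column to V1 so the plotting code below works unchanged:

```r
library(ggplot2)  # only needed here for the diamonds dataset

# One row per clarity level with its count, like ddply(diamonds, .(clarity), nrow)
df2 <- as.data.frame(table(clarity = diamonds$clarity))
names(df2)[names(df2) == "Freq"] <- "V1"
df2
```

One difference: clarity comes back as a plain (unordered) factor, but its level order matches the original, so as.numeric(clarity) gives the same bar positions.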
Then, in the ggplot() call, use the new data frame with clarity as the x values and V1 (the counts) as the y values, and add geom_blank(); this sets up an x axis with the levels we need. Next, add geom_rect() to draw the shading: xmin and xmax are computed as as.numeric(clarity) plus a constant. For xmin, the offset should be smaller than half the bar width (so the shadow starts behind the bar), and for xmax it should be larger than half the bar width (so the shadow sticks out on the right). ymin is 0 and ymax is V1 (the counts) plus some constant. Finally, add geom_bar(stat = "identity") on top of this shadow to draw the actual bars.
ggplot(df2,aes(clarity,V1)) + geom_blank()+
geom_rect(aes(xmin=as.numeric(clarity)-0.38,
xmax=as.numeric(clarity)+.5,
ymin=0,
ymax=V1+250),fill="green")+
geom_bar(width=0.8,stat="identity")
Every time I make a plot using ggplot, I spend a little while trying different values for hjust and vjust in a line like
+ opts(axis.text.x = theme_text(hjust = 0.5))
to get the axis labels to line up where the axis labels almost touch the axis, and are flush against it (justified to the axis, so to speak). However, I don't really understand what's going on. Often, hjust = 0.5 gives such dramatically different results from hjust = 0.6, for example, that I haven't been able to figure it out just by playing around with different values.
Can anyone point me to a comprehensive explanation of how hjust and vjust options work?
The values of hjust and vjust are only defined between 0 and 1:
0 means left-justified
1 means right-justified
Source: ggplot2, Hadley Wickham, page 196
(Yes, I know that in most cases you can use it beyond this range, but don't expect it to behave in any specific way. This is outside spec.)
hjust controls horizontal justification and vjust controls vertical justification.
An example should make this clear:
td <- expand.grid(
hjust=c(0, 0.5, 1),
vjust=c(0, 0.5, 1),
angle=c(0, 45, 90),
text="text"
)
ggplot(td, aes(x=hjust, y=vjust)) +
geom_point() +
geom_text(aes(label=text, angle=angle, hjust=hjust, vjust=vjust)) +
facet_grid(~angle) +
scale_x_continuous(breaks=c(0, 0.5, 1), expand=c(0, 0.2)) +
scale_y_continuous(breaks=c(0, 0.5, 1), expand=c(0, 0.2))
To understand what happens when you change the hjust in axis text, you need to understand that the horizontal alignment for axis text is defined in relation not to the x-axis, but to the entire plot (where this includes the y-axis text). (This is, in my view, unfortunate. It would be much more useful to have the alignment relative to the axis.)
DF <- data.frame(x=LETTERS[1:3],y=1:3)
p <- ggplot(DF, aes(x,y)) + geom_point() +
ylab("Very long label for y") +
theme(axis.title.y=element_text(angle=0))
p1 <- p + theme(axis.title.x=element_text(hjust=0)) + xlab("X-axis at hjust=0")
p2 <- p + theme(axis.title.x=element_text(hjust=0.5)) + xlab("X-axis at hjust=0.5")
p3 <- p + theme(axis.title.x=element_text(hjust=1)) + xlab("X-axis at hjust=1")
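# align.plots() is not part of the current CRAN ggExtra;
# cowplot::plot_grid() can stack the plots with aligned panels instead
library(cowplot)
plot_grid(p1, p2, p3, ncol = 1, align = "v")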
To explore what happens with vjust alignment of axis labels:
DF <- data.frame(x=c("a\na","b","cdefghijk","l"),y=1:4)
p <- ggplot(DF, aes(x,y)) + geom_point()
p1 <- p + theme(axis.text.x=element_text(vjust=0, colour="red")) +
xlab("X-axis labels aligned with vjust=0")
p2 <- p + theme(axis.text.x=element_text(vjust=0.5, colour="red")) +
xlab("X-axis labels aligned with vjust=0.5")
p3 <- p + theme(axis.text.x=element_text(vjust=1, colour="red")) +
xlab("X-axis labels aligned with vjust=1")
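# align.plots() is not part of the current CRAN ggExtra;
# cowplot::plot_grid() can stack the plots with aligned panels instead
library(cowplot)
plot_grid(p1, p2, p3, ncol = 1, align = "v")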
Probably the most definitive is Figure B.1(d) of the ggplot2 book, the appendices of which are available at http://ggplot2.org/book/appendices.pdf.
However, it is not quite that simple. hjust and vjust, as described there, are how justification works in geom_text and theme_text (sometimes). One way to think about it is to imagine a box around the text and ask where the reference point lies relative to that box, in units of the box's size (and thus different for texts of different sizes). An hjust of 0.5 and a vjust of 0.5 center the box on the reference point. Reducing hjust moves the box right by the box width times (0.5 - hjust); thus when hjust = 0, the left edge of the box is at the reference point. Increasing hjust moves the box left by the box width times (hjust - 0.5); when hjust = 1, the box is moved half a box width left from centered, which puts the right edge on the reference point. If hjust = 2, the right edge of the box is one box width left of the reference point (the center is 2 - 0.5 = 1.5 box widths left of it). For vertical justification, smaller values move the box up and larger values move it down. This is effectively what Figure B.1(d) says, but extrapolated beyond [0, 1].
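The box model above reduces to one line of arithmetic. In this sketch (centre_offset() is a made-up name), the offset of the box centre from the reference point is (0.5 - just) times the box size; positive means right for hjust (using the width) and up for vjust (using the height).

```r
# Offset of the text box's centre from the reference point, in units of
# the box size: (0.5 - just) * size. Positive = right (hjust) or up (vjust).
centre_offset <- function(just, size = 1) (0.5 - just) * size

centre_offset(c(0, 0.5, 1, 2))
# [1]  0.5  0.0 -0.5 -1.5
```

The hjust = 2 case gives -1.5, matching the "1.5 box widths left" figure in the text.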
But, sometimes this doesn't work. For example
DF <- data.frame(x=c("a","b","cdefghijk","l"),y=1:4)
p <- ggplot(DF, aes(x,y)) + geom_point()
# theme()/element_text() replace the defunct opts()/theme_text()
p + theme(axis.text.x = element_text(vjust = 0))
p + theme(axis.text.x = element_text(vjust = 1))
p + theme(axis.text.x = element_text(vjust = 2))
The three latter plots are identical. I don't know why that is. Also, if text is rotated, then it is more complicated. Consider
p + theme(axis.text.x = element_text(hjust = 0, angle = 90))
p + theme(axis.text.x = element_text(hjust = 0.5, angle = 90))
p + theme(axis.text.x = element_text(hjust = 1, angle = 90))
p + theme(axis.text.x = element_text(hjust = 2, angle = 90))
The first has the labels left justified (against the bottom), the second has them centered in some box so their centers line up, and the third has them right justified (so their right sides line up next to the axis). The last one, well, I can't explain in a coherent way. It has something to do with the size of the text, the size of the widest text, and I'm not sure what else.
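Part of the rotated case can be understood with a simplified geometric model (an assumption, not ggplot2's actual axis-layout code): hjust and vjust are applied in the text's own, unrotated frame, and the box is then rotated by angle degrees counter-clockwise around the reference point. The name box_centre() is made up for illustration, and the model ignores the size of the axis-label area, which the hjust = 2 behaviour seems to depend on.

```r
# Offset (x, y) of a text box's centre from its reference point, after
# applying hjust/vjust in the text's frame and rotating by `angle` degrees.
box_centre <- function(hjust, vjust, width, height, angle = 0) {
  local <- c((0.5 - hjust) * width, (0.5 - vjust) * height)
  th <- angle * pi / 180
  rot <- matrix(c(cos(th), sin(th), -sin(th), cos(th)), nrow = 2)  # CCW rotation
  drop(rot %*% local)
}

# A 4-unit-wide, 1-unit-high label rotated 90 degrees:
round(box_centre(hjust = 1, vjust = 0.5, width = 4, height = 1, angle = 90), 10)
# 0 -2: the box hangs below the point, so the end of the text touches it
round(box_centre(hjust = 0, vjust = 0.5, width = 4, height = 1, angle = 90), 10)
# 0 2: the box sits above the point, so the start of the text touches it
```

With angle = 90, hjust acts vertically and vjust horizontally, which is why hjust = 1, vjust = 0.5 is the usual choice for right-justified, centred tick labels.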