Does ggplot's vjust of axis.text depend on angle? [duplicate] - r

Every time I make a plot using ggplot, I spend a little while trying different values for hjust and vjust in a line like
+ opts(axis.text.x = theme_text(hjust = 0.5))
to get the axis labels to line up where the axis labels almost touch the axis, and are flush against it (justified to the axis, so to speak). However, I don't really understand what's going on. Often, hjust = 0.5 gives such dramatically different results from hjust = 0.6, for example, that I haven't been able to figure it out just by playing around with different values.
Can anyone point me to a comprehensive explanation of how hjust and vjust options work?

The value of hjust and vjust are only defined between 0 and 1:
0 means left-justified
1 means right-justified
Source: ggplot2, Hadley Wickham, page 196
(Yes, I know that in most cases you can use it beyond this range, but don't expect it to behave in any specific way. This is outside spec.)
hjust controls horizontal justification and vjust controls vertical justification.
An example should make this clear:
td <- expand.grid(
hjust=c(0, 0.5, 1),
vjust=c(0, 0.5, 1),
angle=c(0, 45, 90),
text="text"
)
ggplot(td, aes(x=hjust, y=vjust)) +
geom_point() +
geom_text(aes(label=text, angle=angle, hjust=hjust, vjust=vjust)) +
facet_grid(~angle) +
scale_x_continuous(breaks=c(0, 0.5, 1), expand=c(0, 0.2)) +
scale_y_continuous(breaks=c(0, 0.5, 1), expand=c(0, 0.2))
To understand what happens when you change the hjust in axis text, you need to understand that the horizontal alignment for axis text is defined in relation not to the x-axis, but to the entire plot (where this includes the y-axis text). (This is, in my view, unfortunate. It would be much more useful to have the alignment relative to the axis.)
DF <- data.frame(x=LETTERS[1:3],y=1:3)
p <- ggplot(DF, aes(x,y)) + geom_point() +
ylab("Very long label for y") +
theme(axis.title.y=element_text(angle=0))
p1 <- p + theme(axis.title.x=element_text(hjust=0)) + xlab("X-axis at hjust=0")
p2 <- p + theme(axis.title.x=element_text(hjust=0.5)) + xlab("X-axis at hjust=0.5")
p3 <- p + theme(axis.title.x=element_text(hjust=1)) + xlab("X-axis at hjust=1")
library(ggExtra)
align.plots(p1, p2, p3)
To explore what happens with vjust aligment of axis labels:
DF <- data.frame(x=c("a\na","b","cdefghijk","l"),y=1:4)
p <- ggplot(DF, aes(x,y)) + geom_point()
p1 <- p + theme(axis.text.x=element_text(vjust=0, colour="red")) +
xlab("X-axis labels aligned with vjust=0")
p2 <- p + theme(axis.text.x=element_text(vjust=0.5, colour="red")) +
xlab("X-axis labels aligned with vjust=0.5")
p3 <- p + theme(axis.text.x=element_text(vjust=1, colour="red")) +
xlab("X-axis labels aligned with vjust=1")
library(ggExtra)
align.plots(p1, p2, p3)

Probably the most definitive is Figure B.1(d) of the ggplot2 book, the appendices of which are available at http://ggplot2.org/book/appendices.pdf.
However, it is not quite that simple. hjust and vjust as described there are how it works in geom_text and theme_text (sometimes). One way to think of it is to think of a box around the text, and where the reference point is in relation to that box, in units relative to the size of the box (and thus different for texts of different size). An hjust of 0.5 and a vjust of 0.5 center the box on the reference point. Reducing hjust moves the box right by an amount of the box width times 0.5-hjust. Thus when hjust=0, the left edge of the box is at the reference point. Increasing hjust moves the box left by an amount of the box width times hjust-0.5. When hjust=1, the box is moved half a box width left from centered, which puts the right edge on the reference point. If hjust=2, the right edge of the box is a box width left of the reference point (center is 2-0.5=1.5 box widths left of the reference point. For vertical, less is up and more is down. This is effectively what that Figure B.1(d) says, but it extrapolates beyond [0,1].
But, sometimes this doesn't work. For example
DF <- data.frame(x=c("a","b","cdefghijk","l"),y=1:4)
p <- ggplot(DF, aes(x,y)) + geom_point()
p + opts(axis.text.x=theme_text(vjust=0))
p + opts(axis.text.x=theme_text(vjust=1))
p + opts(axis.text.x=theme_text(vjust=2))
The three latter plots are identical. I don't know why that is. Also, if text is rotated, then it is more complicated. Consider
p + opts(axis.text.x=theme_text(hjust=0, angle=90))
p + opts(axis.text.x=theme_text(hjust=0.5 angle=90))
p + opts(axis.text.x=theme_text(hjust=1, angle=90))
p + opts(axis.text.x=theme_text(hjust=2, angle=90))
The first has the labels left justified (against the bottom), the second has them centered in some box so their centers line up, and the third has them right justified (so their right sides line up next to the axis). The last one, well, I can't explain in a coherent way. It has something to do with the size of the text, the size of the widest text, and I'm not sure what else.

Related

How to remove empty spaces between tiles in geom_tile and change tile size

I have a df with the following structure:
id col1 col2 col3
#1 A 1 3 3
#2 B 2 2 3
#3 C 1 2 3
#4 D 3 1 1
I wanted to create a "heatmap-like" figure where col1-col3 are treated as a factor variable (with five levels 1-5, not all shown here) and depending on their value they receive a different color. I've gotten relatively far with the following code:
df <- melt(df, id.vars="id")
p <- ggplot(df, aes(x=variable, y=id, label=value, fill=as.factor(value))) +
geom_tile(colour="white", alpha=0.2, aes(width=0.4)) +
scale_fill_manual(values=c("yellow", "orange", "red", "green", "grey")) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(x = "Value", y="id") +
scale_x_discrete(expand=c(0,0))+
scale_y_discrete(expand=c(0,0))
However, for some reason my tiles have large grey empty spaces between them on the x-axis (i.e. between each factor level).
The output image looks something like this
Additionally, I have these thin white lines in the middle of each tile that
So what I'd like to do is:
1- change the tile size and shape (would like it to be a square & smaller than now)
2- remove white line in the middle of tile.
Thank you!
OP. I noticed that in your response to another answer, you've refined your question a bit. I would recommend you edit your original question to reflect some of what you were looking to do, but here's the overall picture to summarize what you wanted to know:
How to remove the gray space between tiles
How to make the tiles smaller
How to make the tiles more square
Here's how to address each one in turn.
How to remove gray space between tiles
This was already answered in a comment and in the other answer from #dy_by. The tile geom has the attribute width which determines how big the tile is relative to the coordinate system, where width=1 means the tiles "touch" one another. This part is important, because the size of the tile is different than the size of the tile relative to the coordinate system. If you set width=0.4, then the size of the tile is set to take up 40% of the area between one discrete value in x and y. This means, if you have any value other than width=1, then you will have "space" between the tiles.
How to make the tiles square
The tile geom draws a square tile, so the reason that your tiles are not square in the output has nothing to do with the geom - it has to do with your coordinate system and the graphics device drawing it in your program. By default, ggplot2 will draw your coordinate system in an aspect ratio to match that of your graphics device. Change the size of the device viewport (the window), and the aspect ratio of your coordinate system (and tiles) will change. There is an easy way to fix this to be "square", which is to use coord_fixed(). You can set any aspect ratio you want, but by default, it will be set to 1 (square).
How to make the tiles smaller
Again, the size of your tiles is not controlled by the geom_tile() function... or the coordinate system. It's controlled by the viewport you set in your graphics device. Note that the coordinate system and geoms will resize, but the text will remain constant. This means that if you scale down a viewport or window, your tiles will become smaller, but the size of the text will (relatively-speaking) seem larger. Try this out by calling ggsave() with different arguments for width= with your plot.
Putting it together
Therefore, here's my suggestion for how to change your code to fix all of that. Note I'm also suggesting you change the theme to theme_classic() or something similar, which removes the gridlines by default and the background color is set to white. It works well for tile maps like this.
p <- ggplot(df, aes(x=variable, y=id, label=value, fill=as.factor(value))) +
geom_tile(colour="white", alpha=0.2, width=1) +
scale_fill_manual(values=c("yellow", "orange", "red", "green", "grey")) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(x = "Value", y="id") +
scale_x_discrete(expand=c(0,0))+
scale_y_discrete(expand=c(0,0)) +
coord_fixed() +
theme_classic()
p
Now for saving that plot with different width= settings to show you how things change for sizing. You don't have to specify height=, since the aspect ratio is fixed at 1.
ggsave("example_big.png", plot=p, width=12)
ggsave("example_small.png", plot=p, width=3)
gaps between tiles: change width=0.4 to width=1 or remove it.
white lines between tiles: they come from parameter colour="white" - remove it if you want
lines on tiles are backround lines, couse transparency parameter alpha=0.2 - change it to higher value or remove lines by + theme(panel.grid.major = element_blank()) at the end
summary:
ggplot(df, aes(x=variable, y=id, label=value, fill=as.factor(value))) +
geom_tile(alpha=0.2) +
scale_fill_manual(values=c("yellow", "orange", "red", "green", "grey")) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
labs(x = "Value", y="id") +
scale_x_discrete(expand=c(0,0))+
scale_y_discrete(expand=c(0,0))+
theme(panel.grid.major = element_blank())

Justify x-axis labels to align with x-axis in R [duplicate]

Consider the following
d = data.frame(y=rnorm(120),
x=rep(c("bar", "long category name", "foo"), each=40))
ggplot(d,aes(x=x,y=y)) +
geom_boxplot() +
theme(axis.text.x=element_text(size=15, angle=90))
The x-axis labels are aligned by the center of the label. Is it possible to automatically align on the right so that every label would end right below the graph?
This is precisely what the hjust and vjust parameters are for in ggplot. They control the horizontal and vertical justification respectively and range from 0 to 1. See this question for more details on justifications and their values (What do hjust and vjust do when making a plot using ggplot?).
To get the labels the way you want you can use:
hjust = 0.95 (to leave some space between the labels and the axis)
vjust = 0.2 (to center them in this case)
ggplot(d,aes(x=x,y=y)) + geom_boxplot() +
theme(axis.text.x=element_text(size=15, angle=90,hjust=0.95,vjust=0.2))
Alternatively, flip the axis, your customers will thank you and have less neck pain (plus, I find most boxplots easier to interpret with this orientation):
ggplot(d, aes(x = x, y = y)) +
geom_boxplot() +
coord_flip()

Adjusting position on discrete x-axis of two goups in ggplot

I have a plot with a continous y-axis and discrete x-axis.
For the data I have a group factor with 3 levels and 2 meausement points, so 6 geoms are created
1
I would like to keep the width of the single geoms but adding space between the two measurement points, respectively the two groups of geoms. Like: 3 geoms - gap - 3 geoms. Is there any possibility of adjusting the position of a group of geoms on the x-axis in ggplot?
preferences %>%
pivot_longer(c(F1_life_satisfaction_pre, F1_life_satisfaction_current), names_to = "variables", values_to = "ratings")%>%
ggplot( aes(y=ratings, x=fct_inorder(variables), fill=fct_inorder(playing_preference))) +
geom_violin(scale="width", adjust=0.5, width=0.8, alpha= 0.2, position = position_dodge(1)) +
stat_summary(fun=mean, geom="point", shape=23, size=2, position = position_dodge(1)) +
stat_summary(aes(group=fct_inorder(playing_preference)), fun=mean, geom = "line", size= 0.5, position = position_dodge(1)) +
stat_summary(fun.data=mean_cl_normal, fun.args=list(mult=1),aes(x=fct_inorder(variables), y=ratings), geom="errorbar",
width=0.05, position = position_dodge(1)) +
scale_x_discrete(labels = c("pre-Pokemon-Go", "current"),expand = c(0, 0.3)) +
theme(axis.text.x = element_text(color = "black", size=10)) +
scale_y_continuous(breaks = c(1, 2, 3, 4, 5, 6, 7), limits=c(1,7)) +
geom_segment(aes(x = 0, y=4, yend=4, xend=3), color="grey") +
theme(axis.ticks.x = element_blank()) +
labs(fill = "playing preference") +
labs(x="life satisfaction") +
theme(axis.title = element_text(size = 10))+
theme(legend.text = element_text(size = 10)) +
theme(legend.title = element_text(size = 10)) +
labs(y = "mean ratings") +
geom_boxplot(width=0.1,color="black", alpha=0.2, position = position_dodge(1)) +
scale_fill_viridis(discrete=T)
TL;DR - play with width= and position_dodge(width=...) within your line for geom_violin and geom_boxplot to adjust the positions along with scale_x_discrete(expand=expansion(...).
The first point is that the resolution (and how close and far apart) things are on your plot will be related to the size of your window. With that being said, the positioning relationship of the plot elements between one another can be controlled via ggplot. In particular, you want to change the values of width= and position_dodge(width=...) in your geom_violin call (and your geom_boxplot call).
Example Dataset
I'll use an example dataset to illustrate the idea, where I'll plot boxplots... but the idea is identical. The example dataset contains two x values ("Group1" and "Group2"), and each of those has subdivisions that are either "A", "B", or "C", containing a separate normal distribution of 50 datapoints for every x and x.subdiv.
set.seed(8675309)
df <- data.frame(
x=c(rep('Group1', 150), rep('Group2', 150)),
x.subdiv=rep(c(rep('A', 50), rep('B',50), rep('C',50)), 2),
y=unlist(lapply(1:6, function(x){rnorm(50, runif(1,10,15), runif(1,0,7))}))
)
Width of position_dodge
Here's the simple boxplot, where I'll use 0.5 as the value for both width= and position_dodge(width=...). Note that the first argument in position_dodge is width=, so you can just supply that number directly to that function without explicitly assigning to the width argument.
p <- ggplot(df, aes(x=x, y=y)) + theme_bw()
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.5))
The rule to note here is:
geom_boxplot(width=...) controls how wide the overall spread of box plots are around each x= value.
position_dodge(width=...) controls the amount of spread (the amount of "dodging") for the groups around the x= aesthetic.
So this is what happens when you change position_dodge(width=1), but leave geom_boxplot(width=0.5):
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(1))
The width of each box remains the same as before, but the positioning of each box around x= is more "spread out". In effect, each is "dodged" more. If you set position_dodge(width=0.2), you'll see the opposite effect, where the boxes become squished together (because they are not spread out as much around x=):
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.2))
The interesting thing is how geom_boxplot(width=) and position_dodge(width=) are related:
If geom_boxplot(width=) is equal to position_dodge(width=), the boxes will be touching
If geom_boxplot(width=) is less than position_dodge(width=), the boxes will be separated from one another
If geom_boxplot(width=) is greater than position_dodge(width=), the boxes will be overlapping one another
Width of the geom
The width= of the geom itself relates to how wide the boxplots are. The point to keep in mind are these two points:
The width= is the sum of all the widths of the individual dodged geoms for that particular x= aesthetic.
width=1 is the width between two values on a discrete axis, meaning when you set width=1, the boxes will be wide enough to touch
That means that if we set geom_boxplot(width=1), the combined total of all the boxes for "Group1" will be wide enough to touch the boxes of "Group2"... but you would only see that if there were no overlap among the boxes (meaning that position_dodge(width=) would be equal to geom_boxplot(width=)).
So this makes the boxes wide enough to be touching, but position_dodge(width) is less than geom_boxplot(width)... so the boxes overlap, but "Group1" boxes are separated from "Group2" boxes:
p + geom_boxplot(aes(fill=x.subdiv), width=1, position=position_dodge(0.8))
If we want everything to touch, you have to set them equal, and both equal to 1:
p + geom_boxplot(aes(fill=x.subdiv), width=1, position=position_dodge(1))
Control both widths
In the end, it's probably best to control both. If we go from the previous plot, you probably want the plots to have separation between "Group1" and "Group2". That means you need to make the width of all boxes smaller (which we control by geom_boxplot(width)). However, you probably still want the dodging to leave a bit of space between the boxes, so we'll have to set position_dodge(width) to be greater than geom_boxplot(width), but not too large so that we lose the separation between "Group1" and "Group2". Something like this works pretty well:
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.55))
In your case, you have both geom_violin and geom_boxplot, so you'll need to adjust those together and work out the proper look.
EDIT: "Shift Left and Right" and "Squish"
If the width= and position_dodge(width= arguments are just not quite getting you what you need, there is another parameter that can work in concert with them to move things around. This would be to use scale_x_discrete(expand=... to control the amount of space to the left and right of your x axis items. Used together with width= and position_dodge(width=, this actually gives you precise control of where to position your data along the x axis while still respecting the automated plotting that ggplot2 provides.
width= controls the whitespace between data along the x axis
position_dodge(width= controls the amount of whitespace between subgroups in the data positioned along the x axis
scale_x_discrete(expand=... controls white space to the left and right sides of the panel.
I'll demonstrate the functionality using the same dataset as before. Note that proper use of the expand= argument for scale_x_discrete should call expansion() and you will need to provide a 1 or 2 length vector to either add= or mult=. Play around with both and numbers to see the effect, but here's kind of what to expect.
The expansion() function takes either mult= or add= as arguments, which can either be a vector of length 2 (where 1 is applied to left side and 2 is applied to the right side, or length 1 (where the number is applied to both sides). Numbers sent to mult= are multiplied by the normal expansion to give you the new amount, so the code below sets the extra whitespace to the left and the right equal to 30% (0.3 * normal) of the typical expansion for both sides:
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.55)) +
scale_x_discrete(expand=expansion(mult=0.3))
Sending two values, you can adjust separately. This sets the left side to be 100% (normal) and the right side to be reduced to 50% of normal:
p + geom_boxplot(aes(fill=x.subdiv), width=0.5, position=position_dodge(0.55)) +
scale_x_discrete(expand=expansion(mult=c(1,0.5)))
Bottom Line: Seems like by using all three arguments for width=, position_dodge(width=, and scale_x_discrete(expand=expansion(..., you can theoretically place your x groupings anywhere along your plot. Just keep in mind that the resolution and aspect ratio of your graphics device will change how things are laid out a bit, so additional control can be adjusted by resizing the graphics window.

vertically align asterisks (stars) in ggplot

In ggplot, I want to label some error bars with asterisks ('*') to indicate significance level. The graph is arranged with category labels on the y axis, so that they are easily legible. This means that the error bars are horizontal, and the *'s need to align vertically with them. However, the symbol '*' is not vertically centred in a line of text, so it gets plotted too high using geom_text.
Reproducible example
set.seed(123)
x = data.frame(grp = LETTERS[1:8], val = sample(10,8))
se = runif(8, 0.1,2)
x$upper = x$val + se
x$lower = x$val - se
x$labs = sample(c('*','**', '***', ''), 8, T)
gg = ggplot(x, aes(grp,val)) +
geom_point() +
geom_errorbar(aes(ymax = upper, ymin=lower), width=0.3) +
scale_y_continuous(limits = c(-2,12)) +
coord_flip()
gg + geom_text(aes(y=upper+0.2, label=labs), size=8, hjust='left')
I know that I can nudge the label position like this:
gg + geom_text(aes(y=upper+0.2, label=labs), size=8, nudge_x = -0.2, hjust='left')
However, getting the correct value of nudge_x needs to be done in an ad-hoc manner and the correct value varies with size of graphics output, font size, number of categories on the y scale etc. Is there a way to get the labels to automatically align vertically? I tried using geom_point with shape=42 instead of geom_text to draw the asterisks. Although this solves the vertical alignment issue, it introduces its own problem with getting the spacing between a horizontal row of asterisks correct (i.e. getting '**' and '***' to print with the correct separation between adjacent symbols).
Just eyeballing it on my machine, it looks like this vjust adjustment seems to work, and I think it may be fairly robust to changes in device output size, font size, etc.
gg + geom_text(aes(y=upper+0.2, label=labs), size=8, hjust='left',vjust = 0.77)

Longer plot labels move to the right when arranging plots with plot_grid from cowplot

I was asked this question on Twitter and thought it might be good to have it here.
When making labeled, side-by-side plots with plot_grid(), things work as expected for single-letter labels:
library(cowplot)
p1 <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
geom_density(alpha = 0.7) +
ggtitle("") + theme_minimal()
p2 <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
geom_density(alpha = 0.7) +
ggtitle("") +
scale_fill_grey() + theme_minimal()
plot_grid(p1, p2, labels = c("A", "B"))
However, if we're using longer strings as labels, the labels move to the right, and they move the more the longer the strings are:
plot_grid(p1, p2, labels = c("Density plot in color", "In gray"))
How can this be fixed?
Disclaimer: I'm the author of the package. Posting this here with answer in the hope it will be useful.
The default settings for the parameters hjust and label_x in plot_grid() are optimized for single-letter labels, and they don't work for longer labels. Overriding the settings fixes the problem:
plot_grid(p1, p2, labels = c("Density plot in color", "In gray"),
hjust = 0, label_x = 0.01)
In particular, the default hjust setting is hjust = -0.5. This moves the label to the right by an amount equivalent to half its width. This makes sense for single letter labels, because then we can have the letters appear half a letter width away from the left border by setting label_x = 0, and this will work irrespective of label font size or any other plot features the user may have chosen.
However, moving a label by half its width doesn't make any sense at all for longer labels, and in particular labels of differing lengths.

Resources