When x-axis labels are rotated in ggplot2, the labels are sometimes cut off.
I looked at these posts: "How can I manipulate a ggplot in R to allow extra room on lhs for angle=45 long x-axis labels?" and "ggplot2 plot area margins?". The suggestion in both cases is to use the plot.margin parameter. But I'm wondering if there's a more elegant and dynamic solution to the problem. In my application, users will be allowed to change the font size of the axis labels, so setting a hardcoded value for the plot margin does not seem to be a good approach. Are there any other ways to avoid this effect? Is it possible to manipulate the layout somehow?
Code to reproduce:
library(ggplot2)
categories <- c(
  "Entertainment",
  "Research",
  "Development",
  "Support",
  "Classic",
  "Old Time"
)
years <- 2020:2021
types <- c(
  "Easy",
  "Pro",
  "Free",
  "Trial",
  "Subscription"
)
# One row per category/type/year combination
d <- expand.grid(category = categories,
                 type = types,
                 year = years)
counts <- sample(0:100, size = nrow(d))
d$n <- counts
ggplot(
  data = d,
  aes(x = category, y = n, fill = category)
) + geom_bar(stat = "identity") +
  facet_grid(rows = vars(year), cols = vars(type)) +
  theme(
    axis.text.x = element_text(
      angle = 22.5,
      hjust = 1,
      size = 12
    )
  )
I don't see any way to do this automatically with native ggplot2 tools, so why not write a small function that sets the size of the margin based on the number of characters in the leftmost x category value?
margin_spacer <- function(x) {
  # where x is the column in your dataset
  left_length <- nchar(levels(factor(x)))[1]
  if (left_length > 8) {
    return((left_length - 8) * 4)
  }
  else
    return(0)
}
The function accepts a character or factor column and checks the number of characters in the first level (which appears leftmost in the plot). After some fiddling, it seemed that any label longer than 8 characters got cut off in this plot, so the function adds 4 points to the margin for every character beyond 8.
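To make the rule concrete, here is a small worked example that applies the same function to the five sample labels shown further down. It is only a sketch: 8 and 4 are the empirical values from above, and the labels vector is introduced here purely for illustration.

```r
# Same rule as above: 4 extra points per character beyond 8
margin_spacer <- function(x) {
  left_length <- nchar(levels(factor(x)))[1]
  if (left_length > 8) (left_length - 8) * 4 else 0
}

# Illustrative labels, each assumed to be the leftmost category in its plot
labels <- c("Enter", "Entertain", "Entertainer",
            "Entertainment", "Quality Entertainment")

# Extra left margin in points for each label
extra <- vapply(labels, margin_spacer, numeric(1))
print(extra)
```

With these inputs the extra margin comes out as 0, 4, 12, 20 and 52 points respectively.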
Note also that I changed the angle of the x-axis text on your plot. I think 22.5 is a bit too shallow, and at your text size the labels overlap a lot on my graphics device. This means that the values 8 and 4 may not work quite as well for you, but here's how it works for a few different data frames.
Here's the new plot code:
ggplot(data = d, aes(x = category, y = n, fill = category)) +
  geom_bar(stat = "identity") +
  facet_grid(rows = vars(year), cols = vars(type)) +
  theme(
    axis.text.x = element_text(angle = 40, hjust = 1, size = 12),
    plot.margin = margin(l = 0 + margin_spacer(d$category))
  )
I created the following plots by changing the code where categories is defined. Each output uses the code above, with only the first entry in categories <- c(...) changed. You'll note it works pretty well unless the label is extremely long. As the text gets really long, the text size may have to be adjusted as well. If you think your users are going to go overboard with labels, you can use a similar strategy to adjust the text size, but that's probably overkill.
"Enter" (5 characters)
"Entertain" (9 characters)
"Entertainer" (11 characters)
"Entertainment" (13 characters)
"Quality Entertainment" (21 characters)
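Since the question mentions that users can change the font size, the same idea can be made to scale with it. The following is only a rough sketch, not a ggplot2 feature: margin_spacer_scaled, char_ratio and free_pts are hypothetical names, and the assumption that an average character is about half the font size wide is a heuristic that will need tuning for your font and graphics device.

```r
# Hypothetical helper: rough left-margin estimate in points.
# Assumes an average character is `char_ratio` times the font size wide and
# that `free_pts` points of space are already available left of the first tick.
margin_spacer_scaled <- function(x, size = 12, angle = 40,
                                 char_ratio = 0.5, free_pts = 30) {
  first_label <- levels(factor(x))[1]
  # Approximate horizontal extent of the rotated label, in points
  label_width <- nchar(first_label) * size * char_ratio * cos(angle * pi / 180)
  max(0, label_width - free_pts)
}

margin_spacer_scaled("Enter")                             # short label
margin_spacer_scaled("Quality Entertainment")             # long label
margin_spacer_scaled("Quality Entertainment", size = 24)  # larger font
```

The result could then be passed to the theme in the same way as before, for example plot.margin = margin(l = margin_spacer_scaled(d$category, size = 12, angle = 40)), using the same size and angle values that are given to element_text().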
I'm trying to make a heatmap with ggplot2 using the geom_tile function.
Here is my code:
p <- ggplot(data, aes(Treatment, organisms)) + geom_tile(aes(fill = S)) +
  scale_fill_gradient(low = "black", high = "red") +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  theme(legend.position = "right",
        axis.ticks = element_blank(),
        axis.text.x = element_text(size = base_size, angle = 90, hjust = 0, colour = "black"),
        axis.text.y = element_text(size = base_size, hjust = 1, colour = "black"))
data is my data.csv file
my X axis is types of Treatment
my Y axis is types of organisms
I'm not too familiar with commands and programming and I'm relatively new at this. I just want to be able to specify the order of the labels on the x axis. In this case, I'm trying to specify the order of "Treatment". By default, it orders alphabetically. How do I override this/keep the data in the same order as in my original csv file?
I've tried this command
scale_x_discrete(limits=c("Y","X","Z"))
where X, Y and Z are my treatment conditions, listed in the order I want. However, it doesn't work very well and gives me missing heat boxes.
It is a little difficult to answer your specific question without a full, reproducible example. However, something like this should work:
#Turn your 'treatment' column into a character vector
data$Treatment <- as.character(data$Treatment)
#Then turn it back into a factor with the levels in the correct order
data$Treatment <- factor(data$Treatment, levels=unique(data$Treatment))
In this example, the order of the factor will be the same as in the data.csv file.
If you prefer a different order, you can order them by hand:
data$Treatment <- factor(data$Treatment, levels=c("Y", "X", "Z"))
However, this is risky if you have a lot of levels: any value whose level is misspelled or missing from the vector becomes NA.
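A minimal base R illustration of both points, using made-up treatment values (the treatment vector here is an assumption, not the asker's data):

```r
# Treatment values in the order they appear in the file (made-up data)
treatment <- c("Y", "X", "Z", "Y", "X", "Z")

# Default: levels are sorted alphabetically
levels(factor(treatment))

# Order of first appearance is preserved
by_appearance <- factor(treatment, levels = unique(treatment))
levels(by_appearance)

# A misspelled level ("z" instead of "Z") silently turns values into NA
with_typo <- factor(treatment, levels = c("Y", "X", "z"))
sum(is.na(with_typo))
```

The NA values produced in the last step are the kind of thing that shows up as missing tiles in a heatmap.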
One can also simply factorise within the aes() call directly. I am not sure why setting the limits doesn't work for you - I assume you get NA's because you might have typos in your level vector.
The below is certainly not much different than user Drew Steen's answer, but with the important difference of not changing the original data frame.
library(ggplot2)
## this vector might be useful for other plots/analyses
level_order <- c('virginica', 'versicolor', 'setosa')
p <- ggplot(iris)
p + geom_bar(aes(x = factor(Species, levels = level_order)))
## or directly in the aes() call without a pre-created vector:
p + geom_bar(aes(x = factor(Species, levels = c('virginica', 'versicolor', 'setosa'))))
## plot identical to the above - not shown
## or use your vector as limits in scale_x_discrete
p + geom_bar(aes(x = Species)) +
  scale_x_discrete(limits = level_order)
Created on 2022-11-20 with reprex v2.0.2
Sometimes I'd like to present data that refer to periods (not to points in time) as a step function. When e.g. data are per-period averages, this seems more appropriate than using a line connecting points (with geom_line). Consider, as a MWE, the following:
df = data.frame(x=1:8,y=rnorm(8,5,2))
ggplot(df,aes(x=x,y=y))+geom_step(size=1)+scale_x_continuous(breaks=seq(0,8,2))
This gives
However, the result is not fully satisfactory, as (1) I'd like the final observation to be represented by a horizontal segment and (2) I'd like the labels on the x-axis to be aligned with the center of each horizontal segment. What I want can be obtained with some hacking:
library(dplyr)  # provides %>% and mutate()
df %>% rbind(tail(df, 1) %>% mutate(x = x + 1)) %>%
  ggplot(aes(x, y)) + geom_step(size = 1) +
  scale_x_continuous(breaks = seq(0, 12, 2)) +
  theme(axis.ticks.x = element_blank(), axis.text.x = element_text(hjust = -2))
which produces:
This corresponds to what I am looking for (except that the horizontal alignment of labels requires some fine tuning and is not perfect). However, I am not sure this is the best way to proceed and I wonder if there is a better way.
Does this work for you? It comes down to altering the data as it is passed rather than changing the plotting code per se (as is often the case in ggplot)
Essentially what we do is add an extra copy of the final y value on to the end of the data frame at an incremented x value.
To make the horizontal segments line up to the major axis breaks, we simply subtract 0.5 from the x value.
ggplot(rbind(df, data.frame(x = 9, y = tail(df$y, 1))),
       aes(x = x - 0.5, y = y)) +
  geom_step(size = 1) +
  scale_x_continuous(breaks = seq(0, 8, 2), name = "x",
                     minor_breaks = seq(0, 8, 1) + 0.5) +
  theme_bw() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor = element_line())
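The two data steps described above (repeat the last value one period later, then shift everything by half a period) can also be wrapped in a small helper so the end point is not hardcoded. This is only a sketch: as_step_data is a hypothetical name, and it assumes columns called x and y, rows sorted by x, and equally wide periods.

```r
# Hypothetical helper: prepare per-period data for geom_step()
as_step_data <- function(df, width = 1) {
  last <- df[nrow(df), , drop = FALSE]
  last$x <- last$x + width      # repeat the final y one period later
  out <- rbind(df, last)
  out$x <- out$x - width / 2    # centre each horizontal segment on its x value
  out
}

# Fixed example values so the result is easy to check by eye
df <- data.frame(x = 1:8, y = c(5, 3, 6, 4, 7, 5, 2, 6))
step_df <- as_step_data(df)
print(step_df)
```

The resulting data frame could then be plotted with ggplot(step_df, aes(x, y)) + geom_step(), so that the segment for period k runs from k - 0.5 to k + 0.5.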
I am opening this question for three reasons: first, to re-open the dual-axis discussion for ggplot; second, to ask whether there is a generic, torture-free approach to it; and finally, to ask for your help with a workaround.
I realize that there are multiple discussions and questions on how to add a secondary axis to a ggplot. Those usually end up in one of two conclusions:
It's bad, don't do it: Hadley Wickham answered the same question here, concluding that it is not possible. He had a very good argument that "using separate y scales (not y-scales that are transformations of each other) are fundamentally flawed".
If you insist, over-complicate your life and use grids: for example here and here
However, here are some situations that I often face, in which the visualization would greatly benefit from dual-axis. I abstracted the concepts below.
1. The plot is wide, so duplicating the y-axis on the right side (or the x-axis on the top) would ease interpretation. (We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far away.)
2. I need to add a new axis that is a transformation of the original axis (e.g. percentages, quantiles, ...). (I am currently facing a problem with that. Reproducible example below.)
3. And finally, adding grouping/meta information: I run into this when using categorical data with multiple levels (e.g. Categories = {1,2,x,y,z}, which are "meta-divided" into letters and numerics). Even though color-coding the meta-levels and adding a legend, or even facetting, solves the issue, things get a little simpler with a secondary axis, where the user won't need to match the color of the bars to that of the legend.
General question: Given the new extensibility features in ggplot2 2.0.0, is there a more robust, no-torture way to have dual axes without using grids?
And one final comment: I absolutely agree that the wrong use of dual-axis can be dangerously misleading... But, isn't that the case for information visualization and data science in general?
Work-around question:
Currently, I need to have a percentage axis (2nd case). I used annotate and geom_hline as a workaround. However, I can't move the text outside the main plot. hjust also didn't seem to work for me.
Reproducible example:
library(ggplot2)
# Random values generation - with some manipulation:
maxVal = 500
value = sample(1:maxVal, size = 100, replace = TRUE)
value[value < 400] = value[value < 400] * 0.2
value[value > 400] = value[value > 400] * 0.9
# Data frame preparation:
labels = paste0(sample(letters[1:3], replace = TRUE, size = length(value)), as.character(1:length(value)))
df = data.frame(sample = factor(labels, levels = labels), value = sort(value, decreasing = TRUE))
# Plotting: adding percentages/quantiles as lines
ggplot(data = df, aes(x = sample, y = value)) +
  geom_bar(stat = "identity", fill = "grey90", aes(y = maxVal)) +
  geom_bar(stat = "identity", fill = "#00bbd4") +
  geom_hline(yintercept = c(0, maxVal)) + # Min and max values
  geom_hline(yintercept = c(maxVal*0.25, maxVal*0.5, maxVal*0.75), alpha = 0.2) + # Marking the 25%, 50% and 75% values
  annotate(geom = "text", x = rep(100, 3), y = c(maxVal*0.25, maxVal*0.5, maxVal*0.75),
           label = c("25%", "50%", "75%"), vjust = 0, hjust = 0.2) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  theme(panel.background = element_blank()) +
  theme(plot.background = element_blank()) +
  theme(plot.margin = unit(rep(2, 4), units = "lines"))
In response to #1
We've all stumbled across one of those plots where we need to use a ruler on the screen, because the axis is too far
The cowplot package can do this:
library(cowplot)
# Assign your original plot to a variable first: gpv <- ggplot(...)
ggdraw(switch_axis_position(gpv, axis = "y", keep = "y"))
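For the second case (an extra axis that is a transformation of the original one), ggplot2 itself offers the sec.axis argument of scale_y_continuous() together with sec_axis(). The following is a minimal sketch, assuming ggplot2 2.2.0 or later; to_percent and max_val are names made up for this example, and the data are a simplified stand-in for the question's data frame.

```r
library(ggplot2)

max_val <- 500
set.seed(1)
df <- data.frame(
  sample = factor(1:20),
  value = sort(sample(1:max_val, 20), decreasing = TRUE)
)

# Hypothetical helper for this sketch: raw value as a percentage of max_val
to_percent <- function(v) v / max_val * 100

p <- ggplot(df, aes(x = sample, y = value)) +
  geom_col(fill = "#00bbd4") +
  scale_y_continuous(
    limits = c(0, max_val),
    # The secondary axis is a one-to-one transformation of the primary axis
    sec.axis = sec_axis(~ to_percent(.), name = "Percent of maximum",
                        breaks = c(25, 50, 75, 100))
  )
p
```

Because the secondary axis is strictly a transformation of the primary one, this stays within the argument quoted above about separate y scales. For the first case (simply repeating the axis on the other side), sec.axis = dup_axis() should serve the same purpose.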
I am trying to build a horizontal bar chart.
library(ggplot2)
library(plyr)
salary <- read.csv('September 15 2015 Salary Information - Alphabetical.csv', na.strings = '')
head(salary)
salary$X <- NULL
salary$X.1 <- NULL
salary$Club <- as.factor(salary$Club)
levels(salary$Club)
salary$Base.Salary <- gsub(',', '', salary$Base.Salary)
salary$Base.Salary <- as.numeric(as.character(salary$Base.Salary))
salary$Base.Salary <- salary$Base.Salary / 1000000
salary <- ddply(salary, .(Club), transform, pos = cumsum(Base.Salary) - (0.5 * Base.Salary))
ggplot(salary, aes(x = Club, y = Base.Salary, fill = Base.Salary)) +
  geom_bar(stat = 'identity') +
  ylab('Base Salary in millions of dollars') +
  theme(axis.title.y = element_blank()) +
  coord_flip() +
  geom_text(data = subset(salary, Base.Salary > 2), aes(label = Last.Name, y = pos))
(credits to this thread: Showing data values on stacked bar chart in ggplot2 for the text position calculation)
and the resulting plot is this:
I was thoroughly confused for a while, because I was using xlab to specify the label, and theme(axis.title.y = element_blank()) to hide the y label. However, this didn't work, and I got it to work by changing it to ylab. This seems rather confusing, is it intended?
This seems rather confusing, is it intended?
Yes.
Rather than using theme() to hide the y label, I think
labs(x = "My x label",
y = "")
is more straightforward.
When you flip x and y, they take their labels with them. If this weren't the case, a graph compared with and without coordinate flip would have incorrect axis labels in one of the two cases - which seems confusing and inconsistent. As-is, the labels will be correct always (with and without coord_flip).
Theming, on the other hand, is applied after-the-fact.
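The difference can be seen in a small example. This is a sketch using the built-in mtcars data rather than the salary file from the question:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Cylinders", y = "Miles per gallon")

# After the flip, "Cylinders" is drawn on the vertical axis, yet it is still
# the label of the x aesthetic.
flipped <- p + coord_flip()

# theme() refers to positions on the page: axis.title.y is the title of the
# vertical axis, so this hides "Cylinders" (the x label).
flipped + theme(axis.title.y = element_blank())
```

So labs(), xlab() and ylab() follow the aesthetics, while theme elements such as axis.title.y follow the physical axes of the final plot.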