I have a project using boxplots, but one of the boxplots is a little different from all the others. I want to add some more space between this box and the others while keeping the spacing between the other boxes the same. I also want to add a dashed line between this box and the others.
Here is a reproducible example:
library(ggplot2)
ggplot(data = mtcars, aes(x = factor(gear), y = mpg)) +
geom_boxplot(width = 0.5) +
geom_vline(xintercept = 4.5) +
theme_classic()
What I want is to add some extra space between factor(gear) 4 and 5 while keeping the space between 3 and 4 the same, and to add a dashed line between 4 and 5.
I tried to google this but did not find a good answer. Any suggestions will be greatly appreciated.
To make things more realistic, let's start off with gear as a factor rather than converting it inside ggplot:
mtcars2 <- within(mtcars, gear <- factor(gear))
The trick is to make the discrete axis a continuous axis with custom labels. We therefore need to convert the factor to numeric and shift the rightmost value further right (here by one unit):
xvals <- as.numeric(mtcars2$gear)
xvals[xvals == max(xvals)] <- xvals[xvals == max(xvals)] + 1
mtcars2$xvals <- xvals
Now we plot using xvals on the x axis, but using the factor levels from gear to label the breaks. Note that we could use words instead of the characters "3", "4" and "5" even though this is a numeric axis.
ggplot(data = mtcars2, aes(x = xvals, y = mpg, group = gear)) +
geom_boxplot(width = 0.5) +
geom_vline(xintercept = max(xvals) - 1, linetype = 2) +
scale_x_continuous(breaks = sort(unique(xvals)), labels = levels(mtcars2$gear)) +
labs(x = "gear") +
theme_classic()
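The same idea can be wrapped in a small helper so that the gap can follow any level, not just the last one. This is only a sketch: gap_positions() and its arguments after and gap are names made up for illustration, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): numeric x positions for a
# factor, with `gap` extra units inserted after the level named in `after`.
gap_positions <- function(f, after, gap = 1) {
  f <- as.factor(f)
  idx <- as.numeric(f)                  # 1, 2, 3, ... in level order
  cut_at <- match(after, levels(f))     # index of the level left of the gap
  stopifnot(!is.na(cut_at))
  idx + ifelse(idx > cut_at, gap, 0)    # shift everything right of the gap
}

mtcars2 <- within(mtcars, gear <- factor(gear))
mtcars2$xvals <- gap_positions(mtcars2$gear, after = "4", gap = 1)

# The separator sits midway between the boxes on either side of the gap
sep_at <- match("4", levels(mtcars2$gear)) + (1 + 1) / 2

p <- ggplot(mtcars2, aes(x = xvals, y = mpg, group = gear)) +
  geom_boxplot(width = 0.5) +
  geom_vline(xintercept = sep_at, linetype = "dashed") +
  scale_x_continuous(breaks = sort(unique(mtcars2$xvals)),
                     labels = levels(mtcars2$gear)) +
  labs(x = "gear") +
  theme_classic()
p
```

Changing after moves the gap to a different pair of boxes, and gap controls how much extra room is added.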
To manually add some additional space between boxplots we could add an additional factor level, as shown in the post linked by @MrFlick in his comment. However, you also want to add a separating vertical line without altering the spacing between the other categories.
In my opinion the easiest way to achieve both is to convert the factor to numeric. Try this:
library(ggplot2)
library(dplyr)
mtcars$gear <- factor(mtcars$gear)
# Save factor labels
labels <- levels(mtcars$gear)
mtcars %>%
mutate(
# Convert factor to numeric
gear = as.numeric(gear),
# Recode the special category, i.e. shift to the left. Here: Category 1
gear = ifelse(gear == 1, 0.5, gear)
) %>%
ggplot(aes(x = gear, y = mpg, group = gear)) +
geom_boxplot(width = 0.5) +
# Add dashed line midway between the shifted box (0.5) and the next one (2)
geom_vline(xintercept = 1.25, linetype = "dashed") +
# Set breaks and labels
scale_x_continuous(breaks = c(0.5, 2:3), labels = labels) +
theme_classic()
I am struggling with changing the position of a few points' labels in the geom_point in ggplot. So far, my code:
p <- ggplot(all_short_filled, aes(x=Modernization, y=Passionate_love)) +
geom_point(size=2)+geom_abline(intercept = 0.965830, slope = -0.001127)+ theme_bw()
p1 <- p + geom_text(label=all_short_filled$Country, position = position_dodge(width = 1),
vjust = -0.5)
p1
It gives me something like this:
And I want to change the position of a few overlapping labels (such as Russia and Serbia, or the Netherlands and Belgium) so that, e.g., the label of Serbia would go below the dot, not above it. Please, send help :-)
You could create two label columns in your dataset: one for countries that should be plotted above their point and the other for those plotted below. This requires you to know in advance which countries need moving, and the list is hardcoded.
Since I do not have a sample of your data I used the mtcars dataset to create a reproducible example:
library(datasets) # used to create fake data
library(tidyverse)
# create fake dataset for example
df <- tail(datasets::mtcars) %>%
tibble::rownames_to_column("car")
below <- c("Ferrari Dino", "Maserati Bora")
# create two columns for geom_text labels
data <- df %>%
dplyr::mutate(label_above = ifelse(car %in% below, "", car),
label_below = ifelse(car %in% below, car, ""))
# ignore scale_x.. and scale_y.. those were to fit points/labels neatly
ggplot2::ggplot(data, aes(x = hp, y = mpg)) +
geom_point() +
geom_text(aes(label = label_above), vjust = -0.5) + # labels above their points
geom_text(aes(label = label_below), vjust = 1) + # labels below their points
scale_x_continuous(expand = ggplot2::expansion(0.3)) +
scale_y_continuous(expand = ggplot2::expansion(0.15))
That being said, as mentioned in the comments ggrepel is usually very good at handling this sort of thing.
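For comparison, a minimal ggrepel sketch on the same kind of fake data might look like the following. It assumes the ggrepel package is installed, and it lets the package choose the label positions instead of hardcoding which labels go below.

```r
library(ggplot2)
library(ggrepel)  # assumed to be installed; not used in the example above

# fake data, as above: the last six cars in mtcars
df <- tail(mtcars)
df$car <- rownames(df)

p <- ggplot(df, aes(x = hp, y = mpg, label = car)) +
  geom_point() +
  # labels are nudged away from each other and from the points;
  # the seed makes the iterative placement repeatable
  geom_text_repel(seed = 42)
p
```

The trade-off is control: ggrepel decides where each label ends up, whereas the two-column approach puts specific labels exactly where you ask.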
I'm aware there are similar posts but I could not get those answers to work in my case.
e.g. Here and here.
Example:
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut))
Returns a plot:
Since I used scale, the numbers at each break are z-scores, i.e. standard deviations away from the mean.
I would like to add a row underneath showing the equivalent unscaled raw value for each break.
Tried:
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut)) +
geom_text(aes(label = price))
Gives:
Error: geom_text requires the following missing aesthetics: y
My primary question is how can I add the raw values underneath -3:3 of each break? I don't want to change those breaks, I still want 6 breaks between -3:3.
Secondary question, how can I get -3 and 3 to actually show up in the chart? They have been trimmed.
[edit]
I've been trying to make it work with geom_text but keep hitting errors:
diamonds %>%
ggplot(aes(x = scale(price) %>% as.vector)) +
geom_density() +
xlim(-3, 3) +
facet_wrap(vars(cut)) +
geom_text(label = price)
Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomText, :
object 'price' not found
I then tried changing my call to geom_text()
geom_text(data = diamonds, aes(price), label = price)
This results in the same error message.
You can make a custom labeling function for your axis. It takes each label on the axis and applies a transformation of your choice. In your case you could paste together the z-score, a line break, and the z-score times the standard deviation plus the mean. Because of the distribution of prices in the diamonds data set, z-scores below about -1 correspond to negative prices. This may not be a problem in your own data. For clarity I have drawn in a vertical line representing $0.
labeller <- function(x) {
paste0(x,"\n", scales::dollar(sd(diamonds$price) * x + mean(diamonds$price)))
}
diamonds %>%
ggplot(aes(scale(price) %>% as.vector)) +
geom_density() +
geom_vline(aes(xintercept = -0.98580251364833), linetype = 2) +
facet_wrap(vars(cut)) +
scale_x_continuous(labels = labeller, limits = c(-3, 3)) +
xlab("price")
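The arithmetic behind the label function can be checked on its own. The sketch below is only an illustration (the variable names m, s, z and raw are mine); it shows that multiplying a z-score by the standard deviation and adding the mean recovers the original price, and where the position of the $0 line comes from.

```r
library(ggplot2)  # only for the diamonds data set

m <- mean(diamonds$price)
s <- sd(diamonds$price)

z   <- as.vector(scale(diamonds$price))  # what the plot puts on the x axis
raw <- z * s + m                         # what the label function prints

# The round trip recovers the original prices
all.equal(raw, as.numeric(diamonds$price))

# Raw value that would be shown under each break from -3 to 3
data.frame(z = -3:3, price = (-3:3) * s + m)

# z-score at which the raw price is 0 (where the dashed line is drawn)
zero_z <- (0 - m) / s
zero_z
```

Note that scale() here uses the mean and standard deviation of the whole data set, not of each facet, so the same raw values apply to every panel.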
We can use the sec_axis functionality in scale_x_continuous. To use this functionality we need to manually scale your data. This will add a secondary axis at the top of the plot, not underneath. So it's not quite exactly what you're looking for.
library(tidyverse)
# manually scale the data
mean_price <- mean(diamonds$price)
sd_price <- sd(diamonds$price)
diamonds$price_scaled <- (diamonds$price - mean_price) / sd_price
# make the plot
ggplot(diamonds, aes(price_scaled))+
geom_density()+
facet_wrap(~cut)+
scale_x_continuous(sec.axis = sec_axis(~ mean_price + (sd_price * .)),
limits = c(-3, 4), breaks = -3:3)
You could cheat a bit by passing some dummy data to geom_text:
geom_text(data = tibble(label = round(((-3:3) * sd_price) + mean_price),
y = -0.25,
x = -3:3),
aes(x, y, label = label))
Is there a way to specify that an annotation_custom only applies to one facet of a ggplot?
For example, if I run the following code
library(tidyverse)
library(grid)
text_grob=grobTree(textGrob("text",x=0.5, y=0.6, rot=90,
gp=gpar(col="red")))
ggplot(mtcars, aes(x=mpg, y =drat))+
geom_point() +
facet_wrap(~cyl) +
annotation_custom(text_grob)
I get this
How can I only keep the rightmost red "text" annotation and not add the "text" annotation to the first two facets? Note I can't use geom_text or annotate because I need to make use of textGrob's relative text positioning
egg has geom_custom,
library(ggplot2)
library(grid)
library(egg)
d = data.frame(cyl=6, drat = 4, mpg = 15)
d$grob <- list(textGrob("text",rot=90, hjust = 0, gp=gpar(col="red")))
ggplot(mtcars, aes(x=mpg, y=drat))+
geom_point() +
facet_wrap(~cyl) +
geom_custom(data = d, aes(data = grob), grob_fun = identity)
You could also do this using geom_text by calculating the relative position of the text needed. Note that here, the relative position is slightly different than the one you use above because here I define the relative position as some proportion of the dynamic range. You can choose a different value for rel to get the position you need. I find that this way makes the positioning less arbitrary.
library(tidyverse)
rel_pos <- function(.data, var, rel){
var <- enquo(var)
.data %>%
summarise(x = sum(max(!!var), min(!!var))* rel) %>% .[1, "x"]
}
my_text <- tibble(mpg = rel_pos(mtcars, mpg, 0.5),
drat = rel_pos(mtcars, drat, 0.6) ,
cyl = 8, lab = "text")
ggplot(mtcars, aes(x=mpg, y =drat))+
geom_point() +
facet_wrap(~cyl)+
geom_text(data = my_text, aes(label = lab), color = "red", angle = 90)
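Note that the helper above multiplies min + max by rel, which gives the midpoint for rel = 0.5 but is not a fraction of the range for other values. If a position measured from the lower end of the data range is wanted instead, a variant could look like the sketch below. rel_in_range() is a made-up name, and the result is relative to the data range, not to the panel area that textGrob uses (the axis expansion is ignored).

```r
library(ggplot2)

# Hypothetical helper: the value lying a fraction `rel` of the way
# from min(x) to max(x)
rel_in_range <- function(x, rel) {
  rng <- range(x, na.rm = TRUE)
  rng[1] + rel * diff(rng)
}

# One-row data frame, so the label is drawn only in the cyl == 8 facet
my_text <- data.frame(
  mpg  = rel_in_range(mtcars$mpg, 0.5),
  drat = rel_in_range(mtcars$drat, 0.6),
  cyl  = 8,
  lab  = "text"
)

p <- ggplot(mtcars, aes(x = mpg, y = drat)) +
  geom_point() +
  facet_wrap(~cyl) +
  geom_text(data = my_text, aes(label = lab), color = "red", angle = 90)
p
```

Because facet_wrap uses fixed scales by default, computing the range from the whole data set matches the axis range of every panel up to the expansion padding.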
Created on 2018-08-15 by the reprex package (v0.2.0).
I am learning R with the Australian athletes data set.
By using ggplot, I can plot a histogram like this.
library(DAAG)
ggplot(ais, aes(wt, fill = sex)) +
geom_histogram(binwidth = 5)
Using summary(ais$wt), the 3rd quartile is 84.12. Now I want to split the data at wt = 84.12 and plot two similar histograms side by side.
The split is:
ais1 = ais$wt[which(ais$wt>=0 & ais$wt<=84.12)]
ais2 = ais$wt[which(ais$wt>84.12)]
But I don’t know how to fit them in the plotting. I tried but it doesn't work:
ggplot(ais1, aes(wt, fill = sex)) +...
How can I plot the histograms (2 similar histograms accordingly, side by side)?
Add the split as a column to your data
ais$wt_3q = ifelse(ais$wt <= 84.12, "Quartiles 1-3", "Quartile 4")
Then use facets:
ggplot(ais, aes(wt, fill = sex)) +
geom_histogram(binwidth = 5) +
facet_wrap(~ wt_3q)
The created variable is a character vector, so the facets are ordered alphabetically. If you convert it to a factor and specify the order of the levels, you can order the facets differently (there are lots of questions on here showing how if you search for them; it is the same as reordering bars in a ggplot barplot). You can also let the scales vary; see ?facet_wrap for more details.
Generally, you shouldn't create more data frames. Creating ais1 and ais2 is usually avoidable, and your life will be simpler if you use a single data frame for a single data set. Adding a new column for grouping makes it easy to keep things organized.
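As a sketch of the same approach without hardcoding 84.12, the cut point can be computed with quantile() and the grouping column created as a factor with an explicit level order. split_at_q3() is a hypothetical helper name, and mtcars stands in for ais so that the example runs without the DAAG package.

```r
library(ggplot2)

# Hypothetical helper: label values at or below the 3rd quartile vs above it
split_at_q3 <- function(x) {
  q3 <- quantile(x, 0.75, na.rm = TRUE)
  factor(ifelse(x <= q3, "Quartiles 1-3", "Quartile 4"),
         levels = c("Quartiles 1-3", "Quartile 4"))  # facet order, left to right
}

# mtcars stands in for ais here; with the real data this would be
# ais$wt_3q <- split_at_q3(ais$wt)
cars <- mtcars
cars$wt_3q <- split_at_q3(cars$wt)

p <- ggplot(cars, aes(wt, fill = factor(am))) +
  geom_histogram(binwidth = 0.5) +
  facet_wrap(~ wt_3q, scales = "free_x")
p
```

Setting the levels explicitly puts "Quartiles 1-3" in the left facet, which alphabetical ordering would not do.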
We can do this with ggarrange to arrange the plot objects for each subset
library(DAAG)
library(dplyr) # for %>% and filter()
library(ggplot2)
library(ggpubr)
p2 <- ais %>%
filter(wt>=0, wt<=84.12) %>%
ggplot(., aes(wt, fill = sex)) +
geom_histogram(binwidth = 5) +
coord_cartesian(ylim = c(0, 30))
p1 <- ais %>%
filter(wt>84.12) %>%
ggplot(., aes(wt, fill = sex)) +
geom_histogram(binwidth = 5) +
coord_cartesian(ylim = c(0, 30))
ggarrange(p1, p2, ncol =2, nrow = 1, labels = c("p1", "p2"))
If I want to order the bars in a ggplot2 barchart from largest to smallest, then I'd usually update the factor levels of the bar category, like so
one_group <- data.frame(
height = runif(5),
category = gl(5, 1)
)
o <- order(one_group$height, decreasing = TRUE)
one_group$category <- factor(one_group$category, levels = one_group$category[o])
p_one_group <- ggplot(one_group, aes(category, height)) +
geom_bar(stat = "identity")
p_one_group
If I have several groups of barcharts that I'd like in different facets, with each facet having bars ordered from largest to smallest (and different x-axes), then the technique breaks down.
Given some sample data
two_groups <- data.frame(
height = runif(10),
category = gl(5, 2),
group = gl(2, 1, 10, labels = letters[1:2])
)
and the plotting code
p_two_groups <- ggplot(two_groups, aes(category, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x")
p_two_groups
what do I need to do to get the bar ordering right?
If it helps, an equivalent problem to solve is: how do I update factor levels after I've done the faceting?
Here is a hack:
two_groups <- transform(two_groups, category2 = factor(paste(group, category)))
two_groups <- transform(two_groups, category2 = reorder(category2, rank(height)))
ggplot(two_groups, aes(category2, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
scale_x_discrete(labels=two_groups$category, breaks=two_groups$category2)
1. Make a UNIQUE factor variable for all entries (category2).
2. Reorder the variable based on the height.
3. Plot on the variable: aes(x=category2).
4. Re-label the axis using the original value (category) for the variable (category2) in scale_x_discrete.
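The same steps are also packaged in the tidytext package as reorder_within() and scale_x_reordered(). The sketch below assumes tidytext is installed and rebuilds the sample data with a fixed seed; it is offered as an alternative to the hack above, not as what the original answer used.

```r
library(ggplot2)
library(tidytext)  # assumed installed; provides reorder_within() and scale_x_reordered()

set.seed(1)
two_groups <- data.frame(
  height = runif(10),
  category = gl(5, 2),
  group = gl(2, 1, 10, labels = letters[1:2])
)

# Build a factor that is unique per category/group pair and ordered by
# decreasing height (hence the minus sign)
two_groups$category2 <- reorder_within(two_groups$category,
                                       -two_groups$height,
                                       two_groups$group)

p <- ggplot(two_groups, aes(category2, height)) +
  geom_bar(stat = "identity") +
  facet_grid(. ~ group, scales = "free_x") +
  scale_x_reordered() +   # strips the group suffix from the axis labels
  labs(x = "category")
p
```

As with the hack, scales = "free_x" is what lets each facet show only its own bars in its own order.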
Here is a hack to achieve what you want. I was unable to figure out how to get the category values below the tick marks. So if someone can help fix that, it would be wonderful. Let me know if this works
library(plyr) # provides ddply()
# add a height rank variable to the data frame
two_groups <- ddply(two_groups, .(group), transform, hrank = rank(height))
# plot the graph
p_two_groups <- ggplot(two_groups, aes(-hrank, height)) +
geom_bar(stat = "identity") +
facet_grid(. ~ group, scales = "free_x") +
theme(axis.text.x = element_blank()) + # opts() and theme_blank() were removed from ggplot2
geom_text(aes(y = 0, label = category), vjust = 1.5)