Label every n-th x-axis tick on boxplot - r

I would like to label only every n-th x-axis tick on a geom_boxplot (ggplot2) and leave the other tick labels blank.
For example take this dummy dataframe:
library(ggplot2)
library(scales)  # provides pretty_breaks()
Lat <- c(rep(50.70,3), rep(51.82,3), rep(52.78,3), rep(56.51,3))
y <- c(seq(1,2, by=0.5), seq(1,3, by=1), seq(2,6,by=2), seq(1,5,by=2))
df <- as.data.frame(cbind(Lat, y))
I can make a ggplot boxplot like so:
box_plot <- ggplot(df, aes(x=as.factor(Lat), y=y))+
  geom_boxplot()+
  labs(x="Latitude")+
  scale_y_continuous(breaks = pretty_breaks(n=6)) +
  theme_classic()
box_plot
However I would like to remove the labels from the middle two boxes.
I know I can achieve this by changing the labels to simply be blank (as below).
However, my real data frame has many more than 4 ticks, so this would be time-consuming and prone to human error.
box_plot2 <- ggplot(df, aes(x=as.factor(Lat), y=y))+
  geom_boxplot()+
  labs(x="Latitude")+
  scale_y_continuous(breaks = pretty_breaks(n=6)) +
  scale_x_discrete(labels=c("50.70", " ", " ", "56.51"))+
  theme_classic()
box_plot2
Is there a way to produce the above plot without having to manually set the labels?
For example label every n-th tick on the x axis?
Thanks in advance!

This can be achieved as follows. As an example, I plot only every third tick. The basic idea is to add an index for the factor levels, which can then be used to select the breaks (ticks) to show. Try this:
Lat <- c(rep(50.70,3), rep(51.82,3), rep(52.78,3), rep(56.51,3))
y <- c(seq(1,2, by=0.5), seq(1,3, by=1), seq(2,6,by=2), seq(1,5,by=2))
df <- as.data.frame(cbind(Lat, y))
library(ggplot2)
library(scales)
library(dplyr)
df <- df %>%
  mutate(Lat1 = as.factor(Lat),
         Lat1_index = as.integer(Lat1))
# Which ticks should be shown on x-axis
breaks <- df %>%
  # e.g. plot only every third tick
  mutate(ticks_to_plot = Lat1_index %% 3 == 0) %>%
  filter(ticks_to_plot) %>%
  pull(Lat1)
box_plot2 <- ggplot(df, aes(x=Lat1, y=y))+
  geom_boxplot()+
  labs(x="Latitude")+
  scale_y_continuous(breaks = pretty_breaks(n=6)) +
  scale_x_discrete(breaks = breaks)+
  theme_classic()
box_plot2
Created on 2020-03-30 by the reprex package (v0.3.0)
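If the goal is to keep every tick mark and only blank out most of the labels, a labelling function passed to scale_x_discrete(labels = ...) is one option. The sketch below is a variation on the answer above rather than part of it; every_nth_label() is a hypothetical helper name introduced here for illustration.

```r
library(ggplot2)

# Hypothetical helper (not from the original answer): returns a labelling
# function that keeps every n-th label, starting with the first, and
# replaces the others with empty strings.
every_nth_label <- function(n) {
  function(x) {
    x <- as.character(x)
    keep <- (seq_along(x) - 1) %% n == 0
    x[!keep] <- ""
    x
  }
}

df <- data.frame(
  Lat = c(rep(50.70, 3), rep(51.82, 3), rep(52.78, 3), rep(56.51, 3)),
  y = c(seq(1, 2, by = 0.5), seq(1, 3, by = 1), seq(2, 6, by = 2), seq(1, 5, by = 2))
)

# The scale calls the labelling function with the vector of breaks
# (here the factor levels) and uses the returned strings as labels.
box_plot3 <- ggplot(df, aes(x = as.factor(Lat), y = y)) +
  geom_boxplot() +
  labs(x = "Latitude") +
  scale_x_discrete(labels = every_nth_label(3)) +
  theme_classic()
box_plot3
```

With n = 3 and four latitudes, only the first and fourth boxes are labelled, which matches the manually labelled plot in the question. Passing a selection to breaks instead, as in the answer above, also drops the tick marks of the hidden levels.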

Related

Combine scale_x_upset with scale_y_break

I made an upset plot using the ggupset package and added a break to the y axis with scale_y_break from the ggbreak package.
However, when I add scale_y_break, the combination matrix under the bar plot disappears.
Is there a way to combine the combination matrix of the plot made without scale_y_break with the bar plot portion of a plot made with scale_y_break? I can't seem to be able to access the grobs of these plots or use any other workaround. If anyone could help, I would greatly appreciate it!
Example with scale_x_upset and scale_y_break:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
I would like to combine the barplot portion of the plot created with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)+ scale_y_break(breaks = c(750,1000))
with the combination matrix portion of the plot made with:
df = tidy_movies %>% distinct(title, year, length, .keep_all=TRUE)
ggplot(df, aes(x=Genres)) + geom_bar() + scale_x_upset(n_intersections = 20)
Thanks!

Inserting horizontal line to line chart in ggplot2

I must plot 25 plots, each with its own dataset. I need to insert a horizontal line into each plot. Problem is, the coordinates cannot be hardcoded as each dataset's range varies.
I need the horizontal line to always sit at the first value of the corresponding dataset.
This is the geom I tried for the line (the y-intercept is hardcoded in this case, which doesn't help).
+ geom_hline(yintercept=c(75,0), linetype="dotted")
I can grab the value for each line's y-intercept (it is at the same position in every dataset) with this:
dataset[1, 6]
which I could also store in a vector like this
coord <- dataset[1, 6]
But I am not having any success bringing this together.
I tried this, with no luck:
+ geom_hline(yintercept=coord, linetype="dotted")
Example Code:
a <- c(10,40,30,22)
b <- c(1,2,3,4)
df <- data.frame(a,b)
try <- df %>% ggplot(aes(x = b, y = a)) + geom_line() + scale_y_continuous(expand = c(0,0), limits = c(0, NA)) + geom_hline(yintercept=c(30,0), linetype="dotted") + theme_tq()
Thanks in advance
I don't understand what exactly is causing you trouble. If I loop through a list of dataframes, I can set the yintercept of each corresponding plot without too much trouble. Example below:
library(ggplot2)
library(patchwork)
# Split the economics dataset as an example
datasets <- split(economics, cut(seq_len(nrow(economics)), 9))
# Loop through list of dataframes, set hline to [1, 6] (drop because tibble)
plots <- lapply(datasets, function(df) {
  ggplot(df, aes(date, unemploy)) +
    geom_line() +
    scale_y_continuous(limits = c(0, NA)) +
    geom_hline(yintercept = c(df[1, 6, drop = TRUE], 0),
               linetype = "dotted")
})
# For visualisation purposes
wrap_plots(plots)
Created on 2020-12-04 by the reprex package (v0.3.0)
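Applied to the small example from the question, the same idea might look like the sketch below. The function name plot_with_first_value_line() and its arguments are hypothetical, chosen here for illustration; the sketch assumes the column of interest is passed by name rather than by position.

```r
library(ggplot2)

# Hypothetical helper: line chart with dotted horizontal lines at zero and
# at the first value of the y column, looked up per dataset.
plot_with_first_value_line <- function(data, x, y) {
  first_y <- data[[y]][1]  # first value of the y column
  ggplot(data, aes(x = .data[[x]], y = .data[[y]])) +
    geom_line() +
    scale_y_continuous(expand = c(0, 0), limits = c(0, NA)) +
    geom_hline(yintercept = c(first_y, 0), linetype = "dotted")
}

# Example data from the question
df <- data.frame(a = c(10, 40, 30, 22), b = c(1, 2, 3, 4))
p <- plot_with_first_value_line(df, x = "b", y = "a")
p
```

Because the intercept is computed inside the function, calling it once per dataset (for example with lapply over a list of 25 data frames) gives each plot its own line without hardcoding coordinates.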

Format ggplot2 axis labels such that only numbers > 9999 have commas

I'm trying to adhere to a publication style guide whereby only numbers with 5 or more digits have commas. Have searched this but not found a way to override the defaults when using 'labels=comma.' Below is an example:
require(dplyr)
require(ggplot2)
require(scales)
# create mock dataframe
temp <- mpg %>% mutate(newvar=(hwy*300))
ggplot(temp, aes(x=cyl, y=newvar)) + geom_point() +
scale_y_continuous(labels=comma) +
labs(title="When using 'labels=comma'...",
subtitle="How format axis labels such that commas only appear for numbers > 9999?")
Using this example, would like the lowermost y-axis labels to read "4000", "6000" etc. Could achieve this manually but that's not worth the bother, as have many graphs with scales encompassing this range. Any suggestions?
We can use an anonymous function within scale_x_continuous:
library(scales)
library(ggplot2)
# generate dummy data
x <- 9998:10004
df <- data.frame(x, y = seq_along(x))
ggplot(df, aes(x = x, y = y))+
  geom_point()+
  scale_x_continuous(labels = function(l) ifelse(l <= 9999, l, comma(l)))
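The same approach can be wrapped in a named function and applied to the y axis of the example from the question. This is a minimal sketch; label_comma_5plus() is a hypothetical name introduced here, not a function from scales.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: add thousands separators only to numbers with five
# or more digits, and leave smaller numbers unformatted.
label_comma_5plus <- function(x) {
  ifelse(abs(x) > 9999, comma(x), as.character(x))
}

label_comma_5plus(c(4000, 9999, 10000, 12000))
# "4000" "9999" "10,000" "12,000"

# Mock data from the question: hwy scaled so the axis spans the 9999 boundary
temp <- transform(mpg, newvar = hwy * 300)
ggplot(temp, aes(x = cyl, y = newvar)) +
  geom_point() +
  scale_y_continuous(labels = label_comma_5plus)
```

Defining the function once makes it easy to reuse across many graphs by passing it to the labels argument of each scale.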

Add labels above top axis in ggplot2 graph while keeping original x axis on bottom

I'm trying to add some labels to a ggplot2 boxplot to indicate the number of observations, and I'd like that annotation to appear above the top axis of the graph. I can add them inside the graph pretty easily, and I suspect there's an application of ggplot_gtable that might do this, but I don't understand how to use that (a point in the direction of a good tutorial would be much appreciated). Here's some example data with labels:
Count <- sample(100:500, 3)
MyData <- data.frame(Category = c(rep("A", Count[1]), rep("B", Count[2]),
                                  rep("C", Count[3])),
                     Value = c(rnorm(Count[1], 10),
                               rnorm(Count[2], 20),
                               rnorm(Count[3], 30)))
MyCounts <- data.frame(Category = c("A", "B", "C"),
                       Count = Count)
MyCounts$Label <- paste("n =", MyCounts$Count)
ggplot(MyData, aes(x = Category, y = Value)) +
  geom_boxplot() +
  annotate("text", x = MyCounts$Category, y = 35,
           label = MyCounts$Label)
What I'd love is for the "n = 441" and other labels to appear above the graph rather than just inside the upper boundary. Any suggestions?
Rather than separately calculating the counts, you can add the counts with geom_text and the original data frame (MyData). The key is that we need to add stat="count" inside geom_text so that counts will be calculated and can be used as the text labels.
theme_set(theme_classic())
ggplot(MyData, aes(x = Category, y = Value)) +
  geom_boxplot() +
  geom_text(stat="count", aes(label=paste0("n=",..count..)), y=1.05*max(MyData$Value)) +
  expand_limits(y=1.05*max(MyData$Value))
To put the labels above the plot, add some space above the plot area for the text labels and then use the code in the answer linked by @aosmith to override clipping:
library(grid)
theme_set(theme_bw())
p = ggplot(MyData, aes(x = Category, y = Value)) +
  geom_boxplot() +
  geom_text(stat="count", aes(label=paste0("n=",..count..)),
            y=1.06*max(MyData$Value), size=5) +
  theme(plot.margin=margin(t=20))
# Override clipping
gt <- ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name == "panel"] <- "off"
grid.draw(gt)
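In more recent ggplot2 versions (3.0.0 and later), clipping can also be switched off with coord_cartesian(clip = "off"), which avoids editing the gtable by hand. The sketch below swaps in that technique and uses the precomputed MyCounts labels from the question; it is an alternative under those assumptions, not the method from the answer above. The fixed seed is added here only for reproducibility.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
Count <- sample(100:500, 3)
MyData <- data.frame(
  Category = rep(c("A", "B", "C"), times = Count),
  Value = c(rnorm(Count[1], 10), rnorm(Count[2], 20), rnorm(Count[3], 30))
)
MyCounts <- data.frame(Category = c("A", "B", "C"), Count = Count)
MyCounts$Label <- paste("n =", MyCounts$Count)

p <- ggplot(MyData, aes(x = Category, y = Value)) +
  geom_boxplot() +
  # y = Inf places the text at the top edge of the panel;
  # a negative vjust pushes it just above that edge
  geom_text(data = MyCounts, aes(x = Category, y = Inf, label = Label),
            vjust = -0.5, inherit.aes = FALSE) +
  coord_cartesian(clip = "off") +       # allow drawing outside the panel
  theme(plot.margin = margin(t = 20))   # room for the labels above the panel
p
```

The top margin may need adjusting to the text size, since the labels are drawn in the margin area rather than in the panel.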

Plotting two variables using ggplot2 - same x axis

I have two graphs with the same x axis - the range of x is 0-5 in both of them.
I would like to combine both of them to one graph and I didn't find a previous example.
Here is what I got:
c <- ggplot(survey, aes(often_post,often_privacy)) + stat_smooth(method="loess")
c <- ggplot(survey, aes(frequent_read,often_privacy)) + stat_smooth(method="loess")
How can I combine them?
The y axis is "often privacy" and in each graph the x axis is "often post" or "frequent read".
I thought I can combine them easily (somehow) because the range is 0-5 in both of them.
Many thanks!
Example code for Ben's solution.
library(ggplot2)
library(reshape2)  # assumed source of melt()
#Sample data
survey <- data.frame(
  often_post = runif(10, 0, 5),
  frequent_read = 5 * rbeta(10, 1, 1),
  often_privacy = sample(10, replace = TRUE)
)
#Reshape the data frame
survey2 <- melt(survey, measure.vars = c("often_post", "frequent_read"))
#Plot using colour as an aesthetic to distinguish lines
(p <- ggplot(survey2, aes(value, often_privacy, colour = variable)) +
  geom_point() +
  geom_smooth()
)
You can use + to combine other plots on the same ggplot object. For example, to plot points and smoothed lines for both pairs of columns:
ggplot(survey, aes(often_post,often_privacy)) +
  geom_point() +
  geom_smooth() +
  geom_point(aes(frequent_read,often_privacy)) +
  geom_smooth(aes(frequent_read,often_privacy))
Try this:
df <- data.frame(x=x_var, y=y1_var, type='y1')
df <- rbind(df, data.frame(x=x_var, y=y2_var, type='y2'))
ggplot(df, aes(x, y, group=type, col=type)) + geom_line()
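The rbind approach above uses placeholder names (x_var, y1_var, y2_var). A concrete sketch for the survey columns from the question could look like the following; the simulated data, the seed, and the name survey_long are assumptions made here for illustration.

```r
library(ggplot2)

set.seed(42)  # assumption: simulated data standing in for the real survey
survey <- data.frame(
  often_post = runif(50, 0, 5),
  frequent_read = runif(50, 0, 5),
  often_privacy = sample(10, 50, replace = TRUE)
)

# Stack the two x variables into one long data frame, tagging each row
# with the variable it came from
survey_long <- rbind(
  data.frame(x = survey$often_post, y = survey$often_privacy, type = "often_post"),
  data.frame(x = survey$frequent_read, y = survey$often_privacy, type = "frequent_read")
)

# One smoothed line per original variable, sharing the 0-5 x axis
ggplot(survey_long, aes(x, y, colour = type)) +
  stat_smooth(method = "loess", formula = y ~ x) +
  labs(x = "Frequency (0-5)", y = "often_privacy")
```

This gives the same long layout as the melt() solution using base R only, at the cost of naming each column explicitly.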

Resources