Format ggplot2 axis labels such that only numbers > 9999 have commas - r

I'm trying to adhere to a publication style guide under which only numbers with five or more digits have commas. I have searched for this but have not found a way to override the defaults when using 'labels=comma'. Below is an example:
require(dplyr)
require(ggplot2)
require(scales)
# create mock dataframe
temp <- mpg %>% mutate(newvar=(hwy*300))
ggplot(temp, aes(x=cyl, y=newvar)) + geom_point() +
scale_y_continuous(labels=comma) +
labs(title="When using 'labels=comma'...",
subtitle="How format axis labels such that commas only appear for numbers > 9999?")
Using this example, I would like the lowermost y-axis labels to read "4000", "6000", etc. I could achieve this manually, but that is not worth the bother, as I have many graphs with scales spanning this range. Any suggestions?

We can pass an anonymous function to the labels argument. It is shown here with scale_x_continuous; the same approach works with scale_y_continuous:
library(scales)
library(ggplot2)
# generate dummy data
x <- 9998:10004
df <- data.frame(x, y = seq_along(x))
ggplot(df, aes(x = x, y = y))+
geom_point()+
scale_x_continuous(labels = function(l) ifelse(l <= 9999, l, comma(l)))
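The same idea can be wrapped in a small reusable labelling function and applied to the y axis from the question. This is a minimal sketch rather than part of the original answer: label_comma_from and its min_digits argument are hypothetical names, and format(..., scientific = FALSE) is used for the small numbers so that they are never printed in scientific notation.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: returns a function suitable for the `labels` argument.
# Numbers with fewer than `min_digits` digits are printed plainly;
# larger numbers get thousands separators via scales::comma().
label_comma_from <- function(min_digits = 5) {
  threshold <- 10^(min_digits - 1)
  function(x) {
    plain <- format(x, scientific = FALSE, trim = TRUE)
    ifelse(abs(x) < threshold, plain, comma(x))
  }
}

# Quick check on a plain vector
labels_demo <- label_comma_from()(c(4000, 9999, 10000, 25000))
labels_demo
# "4000" "9999" "10,000" "25,000"

# Applied to the mock data from the question
temp <- mpg
temp$newvar <- temp$hwy * 300
p <- ggplot(temp, aes(x = cyl, y = newvar)) +
  geom_point() +
  scale_y_continuous(labels = label_comma_from())
```

Because the helper returns a function, it can be reused across many plots without repeating the ifelse() call.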

Change axes label and scale using ggplot and patchwork in R

(I am trying to make this question as short and concise as possible, as other related answers may be tough for the non-savvy like myself.)
With the following code in mind, is it possible to have both y-axes on the same scale (that of the graph with the highest y-limit), and to have independent labels for each of the axes (namely the y-axes)? (I tried to use facet_wrap but haven't so far been able to succeed, as Layer 1 is missing.)
library(ggplot2)
library(patchwork)
d <- cars
d$Obs <- c(1:50)
f1 <- function(a) {
ggplot(data=d, aes_string(x="Obs", y=a)) +
geom_line() +
labs(x="Observation",y="Speed/Distance")
}
f1("speed") + f1("dist")
You could add an argument to your function for the axis label and set the desired limits with scale_y_continuous, here the combined range of both variables:
library(ggplot2)
library(patchwork)
d <- cars
d$Obs <- c(1:50)
f1 <- function(a, y_lab) {
ggplot(data = d, aes_string(x = "Obs", y = a)) +
geom_line() +
scale_y_continuous(limits = range(c(d$speed, d$dist))) +
labs(x = "Observation", y = y_lab)
}
f1("speed", "Speed") + f1("dist", "Distance")
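aes_string() has been deprecated in recent ggplot2 releases. The sketch below is a variant of the function above that uses the .data pronoun instead and also exposes the limits as an argument. The names limits and shared_limits are introduced here for illustration and are not part of the original answer.

```r
library(ggplot2)
library(patchwork)

d <- cars
d$Obs <- seq_len(nrow(d))

# Shared y range across both variables (illustrative name)
shared_limits <- range(c(d$speed, d$dist))

f1 <- function(a, y_lab, limits = shared_limits) {
  # .data[[a]] looks up the column whose name is stored in `a`
  ggplot(d, aes(x = Obs, y = .data[[a]])) +
    geom_line() +
    scale_y_continuous(limits = limits) +
    labs(x = "Observation", y = y_lab)
}

p <- f1("speed", "Speed") + f1("dist", "Distance")
p
```

Passing the limits as an argument keeps the default behaviour while allowing a different range for a particular panel if needed.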
Alternatively, reshape the data from wide to long and then facet. Instead of different y-axis labels, each panel gets a facet label, and the panels share one y scale by default:
library(ggplot2)
library(tidyr)
pivot_longer(d, 1:2, names_to = "grp") %>%
ggplot(aes(x = Obs, y = value)) +
geom_line() +
facet_wrap(vars(grp))

Label every n-th x-axis tick on boxplot

I would like to remove the x-axis tick labels from a geom_boxplot (ggplot), keeping only every n-th one.
For example take this dummy dataframe:
Lat <- c(rep(50.70,3), rep(51.82,3), rep(52.78,3), rep(56.51,3))
y <- c(seq(1,2, by=0.5), seq(1,3, by=1), seq(2,6,by=2), seq(1,5,by=2))
df <- as.data.frame(cbind(Lat, y))
I can make a ggplot boxplot like so:
box_plot <- ggplot(df, aes(x=as.factor(Lat), y=y))+
geom_boxplot()+
labs(x="Latitude")+
scale_y_continuous(breaks = pretty_breaks(n=6)) +
theme_classic()
box_plot
However, I would like to remove the labels from the middle two boxes.
I know I can achieve this by setting those labels to blanks (as below).
However, my real data frame has many more than 4 ticks, so this would be time-consuming and prone to human error.
box_plot2 <- ggplot(df, aes(x=as.factor(Lat), y=y))+
geom_boxplot()+
labs(x="Latitude")+
scale_y_continuous(breaks = pretty_breaks(n=6)) +
scale_x_discrete(labels=c("50.70", " ", " ", "56.51"))+
theme_classic()
box_plot2
Is there a way to produce the above plot without having to manually set the labels?
For example label every n-th tick on the x axis?
Thanks in advance!
This can be achieved as follows. As an example, I plot only every third tick. The basic idea is to add an index for the factor levels, which can then be used to specify the breaks (ticks) to plot. Try this:
Lat <- c(rep(50.70,3), rep(51.82,3), rep(52.78,3), rep(56.51,3))
y <- c(seq(1,2, by=0.5), seq(1,3, by=1), seq(2,6,by=2), seq(1,5,by=2))
df <- as.data.frame(cbind(Lat, y))
library(ggplot2)
library(scales)
library(dplyr)
df <- df %>%
mutate(Lat1 = as.factor(Lat),
Lat1_index = as.integer(Lat1))
# Which ticks should be shown on x-axis
breaks <- df %>%
# e.g. plot only every third tick
mutate(ticks_to_plot = Lat1_index %% 3 == 0) %>%
filter(ticks_to_plot) %>%
pull(Lat1)
box_plot2 <- ggplot(df, aes(x=Lat1, y=y))+
geom_boxplot()+
labs(x="Latitude")+
scale_y_continuous(breaks = pretty_breaks(n=6)) +
scale_x_discrete(breaks = breaks)+
theme_classic()
box_plot2
Created on 2020-03-30 by the reprex package (v0.3.0)
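A variation on this, closer to the manually blanked labels in the question, is to pass a function to the labels argument of scale_x_discrete. This is only a sketch: every_nth_label is a hypothetical helper, and unlike the breaks approach above it keeps every tick mark and blanks the unwanted labels, showing the first label and every n-th one after it.

```r
library(ggplot2)

# Hypothetical helper: keep the 1st, (n+1)-th, (2n+1)-th, ... label
# and replace the others with empty strings.
every_nth_label <- function(n) {
  function(x) {
    keep <- (seq_along(x) - 1) %% n == 0
    ifelse(keep, x, "")
  }
}

Lat <- c(rep(50.70, 3), rep(51.82, 3), rep(52.78, 3), rep(56.51, 3))
y <- c(seq(1, 2, by = 0.5), seq(1, 3, by = 1), seq(2, 6, by = 2), seq(1, 5, by = 2))
df <- data.frame(Lat = factor(Lat), y = y)

# Quick check on the factor levels
demo_labels <- every_nth_label(3)(levels(df$Lat))
demo_labels
# "50.7" "" "" "56.51"

box_plot3 <- ggplot(df, aes(x = Lat, y = y)) +
  geom_boxplot() +
  labs(x = "Latitude") +
  scale_x_discrete(labels = every_nth_label(3)) +
  theme_classic()
box_plot3
```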

Align multiple ggplot graphs with and without legends [duplicate]

I'm trying to use ggplot to draw a graph comparing the absolute values of two variables, and also show the ratio between them. Since the ratio is unitless and the values are not, I can't show them on the same y-axis, so I'd like to stack vertically as two separate graphs with aligned x-axes.
Here's what I've got so far:
library(ggplot2)
library(dplyr)
library(gridExtra)
# Prepare some sample data.
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# Plot absolute values
plot_values <- ggplot(results, aes(x=index)) +
geom_point(aes(y=value, color="value")) +
geom_point(aes(y=control, color="control"))
# Plot ratios between values
plot_ratios <- ggplot(results, aes(x=index, y=ratio)) +
geom_point()
# Arrange the two plots above each other
grid.arrange(plot_values, plot_ratios, ncol=1, nrow=2)
The big problem is that the legend on the right of the first plot makes it a different size. A minor problem is that I'd rather not show the x-axis name and tick marks on the top plot, to avoid clutter and make it clear that they share the same axis.
I've looked at this question and its answers:
Align plot areas in ggplot
Unfortunately, neither answer there works well for me. Faceting doesn't seem a good fit, since I want to have completely different y scales for my two graphs. Manipulating the dimensions returned by ggplot_gtable seems more promising, but I don't know how to get around the fact that the two graphs have a different number of cells. Naively copying that code doesn't seem to change the resulting graph dimensions for my case.
Here's another similar question:
The perils of aligning plots in ggplot
The question itself seems to suggest a good option, but rbind.gtable complains if the tables have different numbers of columns, which is the case here due to the legend. Perhaps there's a way to slot in an extra empty column in the second table? Or a way to suppress the legend in the first graph and then re-add it to the combined graph?
Here's a solution that doesn't require explicit use of grid graphics. It uses facets, and hides the legend entry for "ratio" (using a technique from https://stackoverflow.com/a/21802022).
library(reshape2)
results_long <- melt(results, id.vars="index")
results_long$facet <- ifelse(results_long$variable=="ratio", "ratio", "values")
results_long$facet <- factor(results_long$facet, levels=c("values", "ratio"))
ggplot(results_long, aes(x=index, y=value, colour=variable)) +
geom_point() +
facet_grid(facet ~ ., scales="free_y") +
scale_colour_manual(breaks=c("control","value"),
values=c("#1B9E77", "#D95F02", "#7570B3")) +
theme(legend.justification=c(0,1), legend.position=c(0,1)) +
guides(colour=guide_legend(title=NULL)) +
theme(axis.title.y = element_blank())
Try this:
library(ggplot2)
library(gtable)
library(gridExtra)
AlignPlots <- function(...) {
LegendWidth <- function(x) x$grobs[[8]]$grobs[[1]]$widths[[4]]
plots.grobs <- lapply(list(...), ggplotGrob)
max.widths <- do.call(unit.pmax, lapply(plots.grobs, "[[", "widths"))
plots.grobs.eq.widths <- lapply(plots.grobs, function(x) {
x$widths <- max.widths
x
})
legends.widths <- lapply(plots.grobs, LegendWidth)
max.legends.width <- do.call(max, legends.widths)
plots.grobs.eq.widths.aligned <- lapply(plots.grobs.eq.widths, function(x) {
if (is.gtable(x$grobs[[8]])) {
x$grobs[[8]] <- gtable_add_cols(x$grobs[[8]],
unit(abs(diff(c(LegendWidth(x),
max.legends.width))),
"mm"))
}
x
})
plots.grobs.eq.widths.aligned
}
df <- data.frame(x = c(1:5, 1:5),
y = c(1:5, seq.int(5,1)),
type = factor(c(rep_len("t1", 5), rep_len("t2", 5))))
p1.1 <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
p1.2 <- ggplot(df, aes(x = x, y = y, colour = type)) + geom_line()
plots1 <- AlignPlots(p1.1, p1.2)
do.call(grid.arrange, plots1)
p2.1 <- ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
p2.2 <- ggplot(df, aes(x = x, y = y)) + geom_line()
plots2 <- AlignPlots(p2.1, p2.2)
do.call(grid.arrange, plots2)
This is based on several of baptiste's answers.
Encouraged by baptiste's comment, here's what I did in the end:
library(ggplot2)
library(dplyr)
library(grid)      # unit(), grid.draw()
library(gtable)    # gtable_add_cols(), gtable_add_rows()
library(gridExtra)
# Prepare some sample data.
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# Plot ratios between values
plot_ratios <- ggplot(results, aes(x=index, y=ratio)) +
geom_point()
# Plot absolute values
remove_x_axis =
theme(
axis.ticks.x = element_blank(),
axis.text.x = element_blank(),
axis.title.x = element_blank())
plot_values <- ggplot(results, aes(x=index)) +
geom_point(aes(y=value, color="value")) +
geom_point(aes(y=control, color="control")) +
remove_x_axis
# Arrange the two plots above each other
grob_ratios <- ggplotGrob(plot_ratios)
grob_values <- ggplotGrob(plot_values)
legend_column <- 5
legend_width <- grob_values$widths[legend_column]
grob_ratios <- gtable_add_cols(grob_ratios, legend_width, legend_column-1)
grob_combined <- gtable:::rbind_gtable(grob_values, grob_ratios, "first")
grob_combined <- gtable_add_rows(
grob_combined,unit(-1.2,"cm"), pos=nrow(grob_values))
grid.draw(grob_combined)
(I later realised I didn't even need to extract the legend width, since the size="first" argument to rbind tells it just to have that one override the other.)
It feels a bit messy, but it is exactly the layout I was hoping for.
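As an alternative that is not among the answers above, the patchwork package (which appears in another question on this page) aligns the panel areas of stacked plots even when only one of them has a legend. The following is a minimal sketch that assumes patchwork is installed. It reuses the sample data from the question, and the name combined is introduced here for illustration.

```r
library(ggplot2)
library(patchwork)

# Sample data from the question
results <- data.frame(index = 1:20)
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5 * results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control

# Top plot: absolute values, with the x axis hidden to avoid clutter
plot_values <- ggplot(results, aes(x = index)) +
  geom_point(aes(y = value, color = "value")) +
  geom_point(aes(y = control, color = "control")) +
  theme(axis.title.x = element_blank(),
        axis.text.x  = element_blank(),
        axis.ticks.x = element_blank())

# Bottom plot: ratios, no legend
plot_ratios <- ggplot(results, aes(x = index, y = ratio)) +
  geom_point()

# `/` stacks the plots vertically with aligned panels
combined <- plot_values / plot_ratios
combined
```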
An alternative & quite easy solution is as follows:
# loading needed packages
library(ggplot2)
library(dplyr)
library(tidyr)
# Prepare some sample data
results <- data.frame(index=(1:20))
results$control <- 50 * results$index
results$value <- results$index * 50 + 2.5*results$index^2 - results$index^3 / 8
results$ratio <- results$value / results$control
# reshape into long format
long <- results %>%
gather(variable, value, -index) %>%
mutate(facet = ifelse(variable=="ratio", "ratio", "values"))
long$facet <- factor(long$facet, levels=c("values", "ratio"))
# create the plot & remove facet labels with theme() elements
ggplot(long, aes(x=index, y=value, colour=variable)) +
geom_point() +
facet_grid(facet ~ ., scales="free_y") +
scale_colour_manual(breaks=c("control","value"), values=c("green", "red", "blue")) +
theme(axis.title.y=element_blank(), strip.text=element_blank(), strip.background=element_blank())

Create part-fixed, part-free axis limits on facets with ggplot?

I'd like to create a faceted plot using ggplot2 in which the minimum limit of the y axis will be fixed (say at 0) and the maximum limit will be determined by the data in the facet (as it is when scales="free_y"). I was hoping that something like the following would work, but no such luck:
library(plyr)
library(ggplot2)
#Create the underlying data
l <- gl(2, 10, 20, labels=letters[1:2])
x <- rep(1:10, 2)
y <- c(runif(10), runif(10)*100)
df <- data.frame(l=l, x=x, y=y)
#Create a separate data frame to define axis limits
dfLim <- ddply(df, .(l), function(y) max(y$y))
names(dfLim)[2] <- "yMax"
dfLim$yMin <- 0
#Create a plot that works, but has totally free scales
p <- ggplot(df, aes(x=x, y=y)) + geom_point() + facet_wrap(~l, scales="free_y")
#Add y limits defined by the limits dataframe
p + ylim(dfLim$yMin, dfLim$yMax)
It's not too surprising to me that this throws an error (length(lims) == 2 is not TRUE) but I can't think of a strategy to get started on this problem.
In your case, either of the following will work:
p + expand_limits(y=0)
p + aes(ymin=0)
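expand_limits(y = 0) works here because it makes every facet's scale include 0, while scales = "free_y" still lets each facet choose its own upper limit. The sketch below rebuilds the example without plyr. The seed and the name p_zero are assumptions added for illustration, and the optional expansion() call removes the default padding below zero so that the axis starts exactly at 0.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(
  l = gl(2, 10, labels = letters[1:2]),
  x = rep(1:10, 2),
  y = c(runif(10), runif(10) * 100)
)

# Free y scales per facet
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~l, scales = "free_y")

# Force 0 into every facet's range; the upper limit stays data-driven.
# expansion(mult = c(0, 0.05)) drops the padding at the lower end only.
p_zero <- p +
  expand_limits(y = 0) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
p_zero
```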

Plotting two variables using ggplot2 - same x axis

I have two graphs with the same x axis - the range of x is 0-5 in both of them.
I would like to combine both of them to one graph and I didn't find a previous example.
Here is what I got:
c <- ggplot(survey, aes(often_post,often_privacy)) + stat_smooth(method="loess")
c <- ggplot(survey, aes(frequent_read,often_privacy)) + stat_smooth(method="loess")
How can I combine them?
The y axis is "often privacy" and in each graph the x axis is "often post" or "frequent read".
I thought I can combine them easily (somehow) because the range is 0-5 in both of them.
Many thanks!
Example code for Ben's solution.
library(ggplot2)
library(reshape2)  # melt()
#Sample data
survey <- data.frame(
often_post = runif(10, 0, 5),
frequent_read = 5 * rbeta(10, 1, 1),
often_privacy = sample(10, replace = TRUE)
)
#Reshape the data frame
survey2 <- melt(survey, measure.vars = c("often_post", "frequent_read"))
#Plot using colour as an aesthetic to distinguish lines
(p <- ggplot(survey2, aes(value, often_privacy, colour = variable)) +
geom_point() +
geom_smooth()
)
You can use + to add further layers to the same ggplot object, each with its own aesthetics. For example, to plot points and smoothed lines for both pairs of columns:
ggplot(survey, aes(often_post,often_privacy)) +
geom_point() +
geom_smooth() +
geom_point(aes(frequent_read,often_privacy)) +
geom_smooth(aes(frequent_read,often_privacy))
Try this, where x_var, y1_var and y2_var stand for your own vectors:
df <- data.frame(x=x_var, y=y1_var, type='y1')
df <- rbind(df, data.frame(x=x_var, y=y2_var, type='y2'))
ggplot(df, aes(x, y, group=type, col=type)) + geom_line()
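To make the snippet above runnable, here is a minimal sketch with made-up inputs. The values assigned to x_var, y1_var and y2_var are purely hypothetical and only stand in for real data.

```r
library(ggplot2)

# Hypothetical inputs standing in for the real vectors
x_var  <- seq(0, 5, by = 0.5)
y1_var <- sin(x_var)
y2_var <- cos(x_var)

# Stack the two series into one long data frame with a `type` column
df <- rbind(
  data.frame(x = x_var, y = y1_var, type = "y1"),
  data.frame(x = x_var, y = y2_var, type = "y2")
)

# One line per series, distinguished by colour
p <- ggplot(df, aes(x, y, group = type, colour = type)) +
  geom_line()
p
```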
