ggplot2 integer multiple of minor breaks per major break - r

This answer shows how you can specify where the minor breaks should go. In the documentation it says that minor_breaks can be a function. This, however, takes as input the plot limits not, as I expected, the location of the major gridlines below and above.
It doesn't seem very simple to make a script that will return me, say, 4 minors per major. This is something I would like to do since I have a script that I want to use on multiple different datasets. I don't know the limits beforehand, so I can't hard code them in. I can of course create a function that gets the values I need from the dataset before plotting, but it seems overkill.
Is there a general way to state the number of minor breaks per major break?

You can extract the majors from the plot, and from there calculate what minors you want and set it for your plot.
df <- data.frame(x = 0:10,
y = 0:10)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$panel$ranges[[1]]$x.major_source
multiplier <- 4
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1)
p + scale_x_continuous(minor_breaks = minors)
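The `seq()` call above spaces the minors evenly between the first and the last major, which lines up with every major only when the majors themselves are evenly spaced. A minimal sketch of a helper that subdivides each major interval separately is shown here; `minors_per_major` is a hypothetical name, not part of ggplot2.

```r
# Hypothetical helper (not part of ggplot2): split every interval between
# consecutive major breaks into `n` equal parts.
minors_per_major <- function(majors, n = 4) {
  majors <- sort(unique(majors[is.finite(majors)]))  # drop NA/Inf, order the majors
  if (length(majors) < 2) return(majors)             # nothing to subdivide
  starts <- majors[-length(majors)]                  # left end of each interval
  widths <- diff(majors)                             # width of each interval
  # offsets 0, 1/n, ..., (n - 1)/n inside every interval
  inner <- as.vector(outer((0:(n - 1)) / n, widths) + rep(starts, each = n))
  c(inner, majors[length(majors)])                   # close with the last major
}

minors <- minors_per_major(c(0, 2.5, 5, 7.5, 10), n = 4)
print(minors)
```

The resulting vector can be passed to `scale_x_continuous(minor_breaks = ...)` in the same way as the `minors` vector in the answer.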

I think scales::extended_breaks is the default function for a continuous scale. You can set the number of breaks in this function, and make the number of minor_breaks an integer multiple of the number of breaks.
library(ggplot2)
library(scales)
nminor <- 7
nmajor <- 5
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_point() +
scale_y_continuous(breaks = extended_breaks(n = nmajor), minor_breaks = extended_breaks(n = nmajor * nminor) )

Using ggplot2 version 3, I had to modify Eric Watt's code above slightly to get it to work (I can't comment on that answer directly, since I don't have 50 reputation yet):
library(ggplot2)
df <- data.frame(x = 0:10,
y = 10:20)
p <- ggplot(df, aes(x,y)) + geom_point()
majors <- ggplot_build(p)$layout$panel_params[[1]]$x.major_source;majors
multiplier <- 10
minors <- seq(from = min(majors),
to = max(majors),
length.out = ((length(majors) - 1) * multiplier) + 1);minors
p + scale_x_continuous(minor_breaks = minors)
If I copy paste the same code in my editor, it doesn't create majors (NULL), and so the next line gives an error.
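A `NULL` here is what one would expect if the installed ggplot2 is newer still, since the internals returned by `ggplot_build()` have changed between releases. The sketch below assumes ggplot2 3.3.0 or later, where, to my knowledge, the major breaks of the first panel are stored in `panel_params[[1]]$x$breaks`. These are internals rather than a documented API, so treat the accessor as an assumption and check it against your installed version.

```r
library(ggplot2)

df <- data.frame(x = 0:10, y = 10:20)
p <- ggplot(df, aes(x, y)) + geom_point()

# Assumption: ggplot2 >= 3.3.0 keeps the x breaks of the first panel here.
majors <- ggplot_build(p)$layout$panel_params[[1]]$x$breaks
majors <- majors[!is.na(majors)]   # breaks outside the limits can come back as NA

multiplier <- 10
minors <- seq(from = min(majors),
              to = max(majors),
              length.out = (length(majors) - 1) * multiplier + 1)

p2 <- p + scale_x_continuous(minor_breaks = minors)
```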

Related

How to write the abscissa of the maximum under x'x

I'm plotting time series data, say 'data1', with plot.ts(data1),
then I mark the maximum with abline(v = which.max(data1)).
Now I want to add the abscissa of the maximum point, say x = 19, to the x-axis, but sometimes it overlaps the numbers that already exist on the axis.
My question: how can I write the abscissa of the maximum below the numbers that already exist on the x-axis (x'x)?
s=c(1,1.5,2,4,1,1,5,3,5,2,3,5,2,5,2,2,4,2,7,5,2,3,5,2,3,5,2,3,5,2,3,5)
plot.ts(s)
abline(v=which.max(s), col= "red", lty=2, lwd=1)
axis(1,which.max(s))
Does this work for you Salman?
# axis(1,which.max(s))
library(glue)
label <- glue("Max is {max(s)}")
text(which.max(s), 0.2*max(s), label)
Ah, OK, like this? Not really sure what x'x means.
(I swapped to tidyverse from base R, which is much easier to use, and gives beautiful plots)
library(glue)
library(tibble)
library(ggplot2)
s <- c(1,1.5,2,4,1,1,5,3,5,2,3,5,2,5,2,2,4,2,7,5,2,3,5,2,3,5,2,3,5,2,3,5)
s <- tibble(x = 1:32, y = s)
label <- glue("Max is {max(s$y)}")
ref_line <- which.max(s$y)
ggplot(s, aes(x, y)) +
geom_line() +
labs(tag = label) +
theme(plot.tag.position = c(.65, 0.02)) +
geom_vline(xintercept = ref_line, col = "red")
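If staying in base graphics is preferable, one possible approach is to write the value into the bottom margin with `mtext()`, one line below the regular tick labels, so that it cannot overlap them. This is only a sketch; the margin line (`line = 2`) and the colour are arbitrary choices.

```r
s <- c(1,1.5,2,4,1,1,5,3,5,2,3,5,2,5,2,2,4,2,7,5,2,3,5,2,3,5,2,3,5,2,3,5)
x_max <- which.max(s)   # index (abscissa) of the maximum

plot.ts(s)
abline(v = x_max, col = "red", lty = 2, lwd = 1)
# side = 1 is the bottom margin; with default settings the tick labels sit on
# margin line 1, so line = 2 places the extra label just below them
mtext(x_max, side = 1, line = 2, at = x_max, col = "red")
```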

ggplot2: space axis ticks unevenly between equidistant values

I have searched SO and other online sources to no avail.
Is there a way to scale an axis such that z-scores will better reflect the actual difference from 0 to 1 and from 1 to 2 (or any other equally spaced score)?
If I have an x-axis with z-scores ranging from -3 to 3 and axis ticks at every integer between, is there a way to have those axis ticks which are closer to 0 be spaced smaller than those that are farther?
Example:
-3 -2 -1 0 1 2 3
|----------|------|--|--|------|----------|
Am I missing some axis scaling method which accepts both the breaks as values but also the position of the breaks relative to the entire scale?
EDIT:
Maybe not quite a reprex, but this is the structure of the data and basic method of visualization:
df <-
data.frame(
metric = c('metric1', 'metric2', 'metric3'),
z_score = c(2, -1.5, 2.8)
)
df %>%
ggplot(aes(x = metric, y = z_score)) +
geom_col() +
coord_flip() +
ylim(-4,4)
The code above produces a plot where the z_score axis has evenly spaced breaks, whereas I would like the breaks to be "pulled" toward zero like I attempted to draw above.
What you describe seems to correspond to a modulus transformation, but I don't know how to choose the correct parameters to get the exact transformation that you want.
Here is an example:
library(ggplot2)
library(scales)
df <- data.frame(
metric = c('metric1', 'metric2', 'metric3'),
z_score = c(2, -1.5, 2.8)
)
ggplot(df, aes(x = metric, y = z_score)) +
geom_col() +
coord_flip() +
scale_y_continuous(trans = modulus_trans(2),
limits = c(-4, 4),
breaks = c(-3:3))
Created on 2020-05-28 by the reprex package (v0.3.0)
The trick to this is to use a new transformation object. There are several already defined in scales::, and the closest I found (though it is opposite, in a sense) is:
ggplot(df, aes(x = metric, y = z_score)) +
geom_col() +
coord_flip() +
scale_y_continuous(trans=scales::pseudo_log_trans(0.2, 2),
limits = c(-3, 3), breaks = -3:3)
But that has the opposite expansion I think you want. Since one way to see the opposite of pseudo_log would be pseudo_exp, and I didn't find one, here's an attempt:
pseudo_exp_trans <- function(pow = 2) {
scales::trans_new(
"pseudo_exp",
function(x) sign(x) * abs(x^pow),
function(x) sign(x) * abs(x)^(1/pow))
}
ggplot(df, aes(x = metric, y = z_score)) +
geom_col() +
coord_flip() +
scale_y_continuous(trans=pseudo_exp_trans(),
limits = c(-3, 3), breaks = -3:3)
Just play with the pow= argument to find the growth-rate you want in the axis.
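To see what the transformation does to the break positions without drawing anything, the two functions passed to `trans_new()` can be applied to the breaks directly. The sketch below uses base R only; `fwd` and `inv` are illustrative names for the transform and its inverse with `pow = 2`.

```r
pow <- 2
fwd <- function(x) sign(x) * abs(x)^pow        # transform used for the axis
inv <- function(x) sign(x) * abs(x)^(1 / pow)  # its inverse

breaks <- -3:3
pos <- fwd(breaks)   # positions of the breaks on the transformed axis
print(pos)           # -9 -4 -1 0 1 4 9
print(diff(pos))     # 5 3 1 1 3 5: the gaps shrink towards zero
```

A larger `pow` pulls the breaks near zero closer together, while values between 0 and 1 do the opposite.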

Remove outliers and reduce yLim appropriately for each facet in ggplot2

I am currently making a facet multi box plot using ggplot2, where I have cleared the outliers and set the yLim to 5000.
However, not all of the boxplots (the ones at the beginning of the image below) go anywhere near 5000. How can I reduce the y axis for only a select few of these boxplots in the image? I've tried multiple answers from the community, but they seem to be outdated.
Here is the code I am using:
require(reshape2)
require(ggplot2)
data_frame <- read.csv("results.csv", header=T)
p <- ggplot(data=data_frame, aes(x='', y=value)) + geom_boxplot(outlier.shape=NA, aes(fill=policy))
p <- p + facet_wrap( ~ level, scales="free") + coord_cartesian(ylim = c(0, 5000))
p <- p + xlab("") + ylab("Authorisation Time (ms)") + ggtitle("Title")
ggsave("bplots.png", plot = p, device = "png") # pass the device by name; png() would open a stray graphics device
As noted above, you pretty much have to filter before plotting, but this doesn't need to be done by editing any files, or even by creating new dataframes. Using dplyr you can just chain this into the processing of your data. I've done a hopefully reproducible example below with some made-up data (as I don't have yours). I created a function to filter by the same procedures as the boxplot is using. It's a bit hacky, but hopefully works as one potential solution:
require(ggplot2)
require(dplyr)
data_frame <- data.frame(value = c(rnorm(2000, mean = 100, sd = 20), rnorm(2000, mean = 1000, sd = 500)),
level = c(rep(1,2000), rep(2, 2000)),
policy = factor(c(rep(c(rep(1, 500), rep(2, 500), rep(3, 500), rep(4, 500)), 2))))
# filtering function - turns outliers into NAs to be removed
filter_lims <- function(x){
stats <- boxplot.stats(x)$stats
l <- stats[1] # lower whisker end
u <- stats[5] # upper whisker end
# vectorised; values equal to a whisker end are not outliers, so keep them
ifelse(x >= l & x <= u, x, NA)
}
data_frame %>%
group_by(level, policy) %>% # do the same calcs for each box
mutate(value2 = filter_lims(value)) %>% # new variable (value2) so as not to displace the first one
ggplot(aes(x='', y=value2, fill = policy)) +
geom_boxplot(na.rm = TRUE, coef = 5) + # remove NAs, and set the whisker length to all included points
facet_wrap( ~ level, scales="free") +
xlab("") + ylab("Authorisation Time (ms)") + ggtitle("Title")
Resulting in the following (simplified) plot:
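To check what such a filter removes, it helps to look at `boxplot.stats()` on a small vector: `$stats[1]` and `$stats[5]` are the whisker ends, and `$out` holds the points beyond them. The sketch below uses a made-up vector and a hypothetical `drop_outliers()` helper that relies on `$out` rather than on a comparison with the whisker ends.

```r
x <- c(1, 2, 3, 4, 5, 100)   # made-up data with one obvious outlier

bs <- boxplot.stats(x)
print(bs$stats)              # whisker ends are the first and last entries
print(bs$out)                # points beyond the whiskers

# Hypothetical helper: replace the flagged outliers with NA
drop_outliers <- function(x) {
  x[x %in% boxplot.stats(x)$out] <- NA
  x
}

print(drop_outliers(x))
```

Inside the dplyr chain, a helper like this could take the place of `filter_lims()` in the `mutate()` call.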

How to expand ggplot y axis limits to include maximum value

Often in plots the Y axis value label is chopped off below the max value being plotted.
For example:
library(tidyverse)
mtcars %>% ggplot(aes(x=mpg, y = hp))+geom_point()
I know of scale_y_continuous - but I can't figure out a smart way to do this. Maybe I'm just overthinking things. I don't wish to mess up the 'smart' breaks that are generated automatically.
I might try to go about this manually...
mtcars %>% ggplot(aes(x=mpg, y=hp, color=as.factor(carb)))+geom_point() + scale_y_continuous(limits = c(0,375))
But this doesn't work like I mentioned above because of the 'smart breaks'. Is there any way for me to extend the default break interval by one more step, so that in this case the axis would run to 400? Of course I would want this to be flexible for whatever dataset I am working with.
You can use expand_limits() to increase the maximum y-axis value. You can also ensure that the maximum y-axis value is rounded up to the next highest value on the scale of the data, e.g., next highest tens value, next highest hundreds value, etc., depending on whether the highest value in the data is in the tens, hundreds, etc.
For example, the function below finds the base 10 log of the maximum y value and rounds it down. This gives us the base ten scale of the maximum y value (e.g., tens, hundreds, thousands, etc.). It then rounds the maximum y-axis value up to the nearest ten, hundred, etc., that is higher than the maximum y value.
expandy = function(vec, ymin=NULL) {
max.val = max(vec, na.rm=TRUE)
min.log = floor(log10(max.val))
expand_limits(y=c(ymin, ceiling(max.val/10^min.log)*10^min.log))
}
p = mtcars %>% ggplot(aes(x=mpg, y = hp)) +
geom_point()
p + expandy(mtcars$hp)
p + expandy(mtcars$hp, 0)
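As a worked example of the rounding, take `mtcars$hp`, whose maximum is 335. The arithmetic inside `expandy()` can be traced step by step:

```r
max.val <- max(mtcars$hp, na.rm = TRUE)   # 335
min.log <- floor(log10(max.val))          # 2, i.e. the data are in the hundreds
upper <- ceiling(max.val / 10^min.log) * 10^min.log
print(upper)                              # 400
```

So the upper limit handed to `expand_limits()` is 400, the next full hundred above the largest value.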
Or, to make things a bit easier, you could set up the function so that the y-range data is collected directly from the plot:
library(gridExtra)
expandy = function(plot, ymin=0) {
max.y = max(layer_data(plot)$y, na.rm=TRUE)
min.log = floor(log10(max.y))
expand_limits(y=c(ymin, ceiling(max.y/10^min.log)*10^min.log))
}
p = mtcars %>% ggplot(aes(x=mpg, y = hp)) +
geom_point()
grid.arrange(p, p + expandy(p), ncol=2)
p = iris %>% ggplot(aes(x=Sepal.Width, y=Petal.Width)) +
geom_point()
grid.arrange(p, p + expandy(p), ncol=2)
To choose the step between the y-axis breaks yourself, you can use the ceiling() function:
library(gridExtra)
p1 <- mtcars %>% ggplot(aes(x=mpg, y = hp)) + geom_point()
p2 <- p1 +
scale_y_continuous(
limits = c(0, ceiling(max(mtcars$hp)/50)*50),
breaks = seq(0, ceiling(max(mtcars$hp)/50)*50, 50)
)
p3 <- p1 + scale_y_continuous(
limits = c(0, ceiling(max(mtcars$hp)/100)*100),
breaks = seq(0, ceiling(max(mtcars$hp)/100)*100, 100)
)
grid.arrange(p1, p2, p3, ncol=3)
For p2 the step is 50, while for p3 the step is 100.
Here is a solution that allows any kind of numeric scale:
expandy <- function(y, base, v_min = NULL) {
max.val <- max(y, na.rm = TRUE)
expand_limits(
y = c(
v_min,
base * (max.val %/% base + as.logical(max.val %% base))
)
)
}
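The expression `base * (max.val %/% base + as.logical(max.val %% base))` rounds `max.val` up to the next multiple of `base`: the integer division counts the whole multiples, and the remainder, coerced to `TRUE` or `FALSE`, adds one more whenever the value is not already a multiple. A small sketch with an illustrative helper name (`round_up_to`) shows that it agrees with `ceiling()`:

```r
# Illustrative helper: the same arithmetic as inside expandy()
round_up_to <- function(val, base) base * (val %/% base + as.logical(val %% base))

print(round_up_to(335, 50))    # 350
print(round_up_to(335, 100))   # 400
print(round_up_to(300, 100))   # 300, already a multiple
print(round_up_to(335, 50) == ceiling(335 / 50) * 50)   # TRUE
```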
Here is a rather simple answer: just set one limit to NA:
mtcars %>%
ggplot(aes(x=mpg, y=hp, color=as.factor(carb))) +
geom_point() +
scale_y_continuous(limits = c(0, NA))

ggplot2: minor breaks in scale_x_datetime

This code:
library(lubridate)
library(ggplot2)
library(scales)
.months <- 3
.minor.intervals <- 4
.minor.intervals.num <- .months * .minor.intervals
sdate <- as.POSIXct("2015-01-01")
edate <- sdate + months(.months)
df <- data.frame(x = seq(from = sdate, to = edate,
length.out = .minor.intervals.num * 2),
y = 1:(.minor.intervals.num * 2))
p <- ggplot(df, aes(x = x, y = y))
xbm <- seq(from = sdate, to = edate, length.out = .minor.intervals.num)
p <- p + scale_x_datetime(limits = c(sdate, edate),
breaks = date_breaks("month"),
minor_breaks = xbm)
p <- p + geom_line() + geom_point()
plot(p)
gives me the error: Error in Ops.POSIXt((x - from[1]), diff(from)) : '/' not defined for "POSIXt" objects
If I comment out the minor_breaks part, everything works.
If I change the minor_breaks part to minor_breaks = date_breaks("week"), everything works too.
But I want to split each month into exactly 4 parts...
How can I fix it?
I have found a way to solve the problem, but I must admit that I am not sure why it has to be done this way. It seems that minor_breaks expects numeric values and not dates as input.
I created the breaks with the following code:
maj.breaks <- sdate + months(0:.months)
min.breaks <- do.call(c,
lapply(1:.months,function(m) {
seq(maj.breaks[m],maj.breaks[m+1],length.out = .minor.intervals+1)
})
)
which relies on the variables as defined in your example. Note the difference to your way of defining the minor breaks: since each month has different length, it is not enough to simply split the range between the start and end dates into the appropriate number of segments. You have to split each month by itself.
As mentioned above, you then have to convert min.breaks to numeric before you pass it to minor_breaks. I produce the plot as follows:
p <- ggplot(df, aes(x = x, y = y)) +
scale_x_datetime(limits = c(sdate, edate),
breaks = maj.breaks,
minor_breaks = as.numeric(min.breaks)) +
geom_line() + geom_point()
plot(p)
This is identical to your code up to the inputs for breaks and minor_breaks. There is no need to use the vector maj.breaks since your version works just as well. But I think it is interesting to note that breaks works with input of class POSIXct, while minor_breaks expects numeric values. Unfortunately, I don't know the reason for this.
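The per-month splitting can also be written with base R only. The sketch below is an illustration under the assumption of a fixed `UTC` time zone (to sidestep daylight-saving shifts); it uses `seq()` on `POSIXct` values instead of lubridate, and the variable names are made up.

```r
n_months <- 3
n_minor  <- 4
sdate <- as.POSIXct("2015-01-01", tz = "UTC")   # assumption: fixed UTC time zone

# Major breaks: the first of each month
maj_breaks <- seq(sdate, by = "month", length.out = n_months + 1)

# Minor breaks: split every month into n_minor equal parts
min_breaks <- do.call(c, lapply(seq_len(n_months), function(m) {
  seq(maj_breaks[m], maj_breaks[m + 1], length.out = n_minor + 1)
}))
min_breaks <- unique(min_breaks)   # month boundaries appear twice otherwise

print(min_breaks)
print(as.numeric(min_breaks))      # numeric form, as the answer passes to minor_breaks
```

Because January has 31 days and February 28, the minor intervals differ in length from month to month (7.75 days versus 7 days), which is the reason for splitting each month on its own.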
