How to expand ggplot y axis limits to include maximum value

Often the highest y-axis break (and its label) falls below the maximum value being plotted.
For example:
library(tidyverse)
mtcars %>% ggplot(aes(x=mpg, y = hp))+geom_point()
I know of scale_y_continuous, but I can't figure out a smart way to do this. Maybe I'm just overthinking things. I don't want to mess up the 'smart' breaks that are generated automatically.
I might try to go about this manually...
mtcars %>% ggplot(aes(x=mpg, y=hp, color=as.factor(carb)))+geom_point() + scale_y_continuous(limits = c(0,375))
But, as I mentioned above, this doesn't work because of the 'smart' breaks. Is there any way to extend the default breaks by one more interval, so that in this case the top break would be 400? Of course, I would want this to work for whatever dataset I am working with.

You can use expand_limits() to increase the maximum y-axis value. You can also make sure that the maximum is rounded up to the next value on the scale of the data (the next ten, hundred, etc.), depending on whether the highest value in the data is in the tens, hundreds, and so on.
For example, the function below takes the base-10 log of the maximum y value and rounds it down, which gives the order of magnitude of the maximum (tens, hundreds, thousands, etc.). It then rounds the maximum y value up to the next multiple of that power of ten; for hp, whose maximum is 335, this gives 400.
expandy = function(vec, ymin=NULL) {
  max.val = max(vec, na.rm=TRUE)
  min.log = floor(log10(max.val))
  expand_limits(y=c(ymin, ceiling(max.val/10^min.log)*10^min.log))
}
p = mtcars %>% ggplot(aes(x=mpg, y = hp)) +
  geom_point()
p + expandy(mtcars$hp)
p + expandy(mtcars$hp, 0)
Or, to make things a bit easier, you could set up the function so that the y-range data is collected directly from the plot:
library(gridExtra)
expandy = function(plot, ymin=0) {
  max.y = max(layer_data(plot)$y, na.rm=TRUE)
  min.log = floor(log10(max.y))
  expand_limits(y=c(ymin, ceiling(max.y/10^min.log)*10^min.log))
}
p = mtcars %>% ggplot(aes(x=mpg, y = hp)) +
  geom_point()
grid.arrange(p, p + expandy(p), ncol=2)
p = iris %>% ggplot(aes(x=Sepal.Width, y=Petal.Width)) +
  geom_point()
grid.arrange(p, p + expandy(p), ncol=2)
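Both versions of expandy() above rely on the same rounding rule. The minimal base-R sketch below isolates that arithmetic so it can be checked without drawing a plot. round_up_1sf() is a hypothetical name, not part of ggplot2, and it assumes the maximum is positive (log10() of zero or a negative number is not finite).

```r
# Hypothetical helper (not part of ggplot2): round a positive number up to
# one significant figure, using the floor(log10()) rule from expandy().
round_up_1sf <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1, x > 0)
  magnitude <- 10^floor(log10(x))  # 100 for 335, 1 for 2.5, 0.1 for 0.27
  ceiling(x / magnitude) * magnitude
}

round_up_1sf(max(mtcars$hp))        # 400 (maximum hp is 335)
round_up_1sf(max(iris$Petal.Width)) # 3   (maximum Petal.Width is 2.5)
round_up_1sf(110)                   # 200
```

Because the rule keeps only one significant figure, a maximum just above a round number can nearly double the range shown (110 becomes 200); choosing a fixed step, as in the next answer, avoids that.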

If you choose the break step yourself, you can use ceiling() to round the upper limit up to the next multiple of that step:
library(gridExtra)
p1 <- mtcars %>% ggplot(aes(x=mpg, y = hp)) + geom_point()
p2 <- p1 +
  scale_y_continuous(
    limits = c(0, ceiling(max(mtcars$hp)/50)*50),
    breaks = seq(0, ceiling(max(mtcars$hp)/50)*50, 50)
  )
p3 <- p1 + scale_y_continuous(
  limits = c(0, ceiling(max(mtcars$hp)/100)*100),
  breaks = seq(0, ceiling(max(mtcars$hp)/100)*100, 100)
)
grid.arrange(p1, p2, p3, ncol=3)
For p2 the step is 50, while for p3 it is 100.
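The question asks for the default breaks to be extended by one interval for any dataset. As a sketch of that idea, the hypothetical nice_limits() below lets base R's pretty() pick a step and returns limits and breaks that cover the data. Note the swap: ggplot2's default breaks come from scales::extended_breaks(), not pretty(), so the step chosen here may differ from the one ggplot2 would draw by default.

```r
# Hypothetical helper: breaks and limits that start at `lower` and end at the
# first "nice" break at or above the data maximum.
# Assumption: base R's pretty() stands in for ggplot2's own break algorithm.
nice_limits <- function(y, lower = 0, n = 5) {
  y_max <- max(y, na.rm = TRUE)
  br <- pretty(c(lower, y_max), n = n)  # pretty() returns values covering the input range
  list(limits = range(br), breaks = br)
}

yl <- nice_limits(mtcars$hp)
yl$breaks

# Optional: use the result in a plot if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(mpg, hp)) +
    geom_point() +
    scale_y_continuous(limits = yl$limits, breaks = yl$breaks)
  # print(p) draws the plot
}
```

Changing n asks pretty() for roughly more or fewer intervals, so the step adapts to the data instead of being fixed in advance.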

Here is a solution that works with any step size (base), whatever the scale of the data:
expandy <- function(y, base, v_min = NULL) {
  max.val <- max(y, na.rm = TRUE)
  expand_limits(
    y = c(
      v_min,
      base * (max.val %/% base + as.logical(max.val %% base))
    )
  )
}
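The expression inside this expandy() rounds the maximum up to the next multiple of base and leaves exact multiples unchanged. A quick base-R check of that rule; next_multiple() is a hypothetical name used only for illustration.

```r
# Rounding rule used inside expandy(y, base): next multiple of `base`
# at or above `x` (assumes x > 0; exact multiples are left unchanged).
next_multiple <- function(x, base) {
  base * (x %/% base + as.logical(x %% base))
}

next_multiple(335, 50)   # 350
next_multiple(335, 100)  # 400
next_multiple(300, 100)  # 300 (already a multiple)

# For positive whole numbers this agrees with the ceiling() form used earlier
x <- c(1, 49, 50, 51, 335)
all(next_multiple(x, 50) == 50 * ceiling(x / 50))  # TRUE
```

With the answer's function, a call would then look like p + expandy(mtcars$hp, base = 50, v_min = 0), where p plots hp as in the earlier answers.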

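Here is a rather simple answer: just set one limit to NA. The NA tells ggplot2 to keep its automatically computed value for that end, so the axis starts at 0 while the upper limit still follows the data (it is not rounded up to the next break).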
mtcars %>%
  ggplot(aes(x=mpg, y=hp, color=as.factor(carb))) +
  geom_point() +
  scale_y_continuous(limits = c(0, NA))

Related

How to change the limits from scale_y_continuous depending on the plot in R?

I want to draw boxplots with the number of observations on top. The problem is that, depending on the data and the outliers, the y-axis range changes. For that reason, I want the limits of scale_y_continuous to change automatically. Is that possible?
This is a reproducible example:
library(dplyr)
library(ggplot2)
myFreqs <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(Freq = n())
myFreqs
p <- ggplot(mtcars, aes(factor(cyl), drat, fill=factor(am))) +
  stat_boxplot(geom = "errorbar") +
  geom_boxplot() +
  stat_summary(geom = 'text', label = paste("n = ", myFreqs$Freq), fun = max,
               position = position_dodge(width = 0.77), vjust=-1)
p
The idea is to add at least 1 to the maximum y value of the boxplot that reaches highest (in the example above, the second boxplot, with n = 8), so that its label fits.
I have tried to change the y-axis with scale_y_continuous like this:
p <- p + scale_y_continuous(limits = c(0, 5.3))
p
However, I don't want to set the limits by hand; I want them to adapt to whatever plot I have (what if the data change?).
Is there a way to do something like this, using min and max? scale_y_continuous(limits = c(min(x), max(x)))
Thanks very much in advance
Thanks to @teunbrand and @caldwellst I got the solution I needed.
There are 3 solutions that work perfectly:
1-
p + scale_y_continuous(limits = function(x){
  c(min(x), (max(x)+0.1))
})
2-
library(tidyverse)
p + scale_y_continuous(limits = ~ c(min(.x), max(.x) + 0.1))
3-
p + scale_y_continuous(limits = function(x){
  c(min(x), ceiling(max(x) * 1.1))
})
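All three solutions rely on limits accepting a function that receives the automatically computed range and returns new limits. The sketch below applies the functions from solutions 1 and 3 by hand, assuming the range ggplot2 passes in for y = drat is simply range(mtcars$drat); lim1() and lim3() are illustrative names.

```r
# The limits functions from solutions 1 and 3, applied by hand.
# Assumption: ggplot2 passes in the trained y range, here range(mtcars$drat).
lim1 <- function(x) c(min(x), max(x) + 0.1)
lim3 <- function(x) c(min(x), ceiling(max(x) * 1.1))

rng <- range(mtcars$drat)  # 2.76 4.93
lim1(rng)                  # 2.76 5.03
lim3(rng)                  # 2.76 6.00
```

Solution 1 adds a fixed amount in data units, while solution 3 scales with the data and then rounds up, which here adds about 1.07 units of headroom. If you only need extra room for the labels, ggplot2 (3.3.0 or later) also lets you widen the padding instead of the limits, for example scale_y_continuous(expand = expansion(mult = c(0.05, 0.15))).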

Additional x axis on ggplot

I'm aware there are similar posts but I could not get those answers to work in my case.
e.g. Here and here.
Example:
diamonds %>%
  ggplot(aes(scale(price) %>% as.vector)) +
  geom_density() +
  xlim(-3, 3) +
  facet_wrap(vars(cut))
Returns a plot:
Since I used scale(), the axis numbers are z-scores, i.e. how many standard deviations each break lies from the mean.
I would like to add, as a row underneath, the equivalent unscaled raw price for each break.
Tried:
diamonds %>%
  ggplot(aes(scale(price) %>% as.vector)) +
  geom_density() +
  xlim(-3, 3) +
  facet_wrap(vars(cut)) +
  geom_text(aes(label = price))
Gives:
Error: geom_text requires the following missing aesthetics: y
My primary question is how I can add the raw values underneath each break from -3 to 3. I don't want to change those breaks; I still want a break at each integer from -3 to 3.
Secondary question: how can I get -3 and 3 to actually show up on the chart? They have been trimmed.
[edit]
I've been trying to make it work with geom_text but keep hitting errors:
diamonds %>%
  ggplot(aes(x = scale(price) %>% as.vector)) +
  geom_density() +
  xlim(-3, 3) +
  facet_wrap(vars(cut)) +
  geom_text(label = price)
Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomText, :
object 'price' not found
I then tried changing my call to geom_text()
geom_text(data = diamonds, aes(price), label = price)
This results in the same error message.
You can make a custom labelling function for your axis. It receives the break values on the axis and returns the text to show for each one. In your case you can paste together the z-score, a line break, and the z-score times the standard deviation plus the mean. Because of the distribution of prices in the diamonds data set, z-scores below about -1 correspond to negative prices; this may not be a problem with your own data. For clarity, I have drawn a dashed vertical line at a price of $0.
library(tidyverse)
labeller <- function(x) {
  paste0(x, "\n", scales::dollar(sd(diamonds$price) * x + mean(diamonds$price)))
}
diamonds %>%
  ggplot(aes(scale(price) %>% as.vector)) +
  geom_density() +
  # z-score at which price is $0 (about -0.986 for diamonds)
  geom_vline(xintercept = -mean(diamonds$price) / sd(diamonds$price), linetype = 2) +
  facet_wrap(vars(cut)) +
  scale_x_continuous(labels = labeller, limits = c(-3, 3)) +
  xlab("price")
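The z-to-raw conversion behind this labeller (raw = mean + sd * z) is easy to check outside ggplot2. The sketch below is hypothetical: make_z_labeller() and z_of_zero() are illustrative names, and formatC() replaces scales::dollar() so that only base R is needed, which means no currency symbol is shown.

```r
# Hypothetical helper: build a two-line axis labeller that shows each z-score
# and the raw value it corresponds to (raw = mean + sd * z).
make_z_labeller <- function(x) {
  mu <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  function(z) {
    raw <- mu + s * z
    # formatC() with big.mark stands in for scales::dollar() (no "$" sign)
    paste0(z, "\n", formatC(round(raw), format = "d", big.mark = ","))
  }
}

# z-score at which the raw value is 0 (the dashed line in the answer)
z_of_zero <- function(x) -mean(x, na.rm = TRUE) / sd(x, na.rm = TRUE)

lab <- make_z_labeller(c(10, 20, 30))  # mean 20, sd 10
lab(c(-1, 0, 1))                       # "-1\n10" "0\n20" "1\n30"
z_of_zero(c(10, 20, 30))               # -2
```

With ggplot2 and the diamonds data, these would slot into the answer's code as scale_x_continuous(labels = make_z_labeller(diamonds$price), limits = c(-3, 3)) and geom_vline(xintercept = z_of_zero(diamonds$price), linetype = 2).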
We can use the sec_axis functionality in scale_x_continuous. To use it, we need to scale the data manually. This adds a secondary axis at the top of the plot rather than underneath, so it is not exactly what you're looking for.
library(tidyverse)
# manually scale the data
mean_price <- mean(diamonds$price)
sd_price <- sd(diamonds$price)
diamonds$price_scaled <- (diamonds$price - mean_price) / sd_price
# make the plot
ggplot(diamonds, aes(price_scaled)) +
  geom_density() +
  facet_wrap(~cut) +
  scale_x_continuous(sec.axis = sec_axis(~ mean_price + (sd_price * .)),
                     limits = c(-3, 4), breaks = -3:3)
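You could cheat a bit by passing some dummy data to geom_text(). Added to the plot above, this places the raw prices at y = -0.25 in every panel:
# last_plot() returns the most recently created ggplot (the sec_axis plot above)
last_plot() +
  geom_text(data = tibble(label = round(((-3:3) * sd_price) + mean_price),
                          y = -0.25,
                          x = -3:3),
            aes(x, y, label = label))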

Facet_wrap and scale="free" unexpectedly centers y-axis at zero in ggplot2

From this data frame:
df <- data.frame(cat=c(rep("X", 20),rep("Y", 20), rep("Z",20)),
                 value=c(runif(20),runif(20)*100, rep(0, 20)),
                 var=rep(LETTERS[1:5],12))
I want to create faceted boxplots.
library(ggplot2)
p1 <- ggplot(df, aes(var,value)) + geom_boxplot() + facet_wrap(~cat, scales="free")
p1
The result is aesthetically unsatisfactory because it centers the y-axis of the all-zero panel on zero. I want all y-scales to start at zero. I tried several answers from this earlier question:
p1 + scale_y_continuous(expand = c(0, 0)) # not working
p1 + expand_limits(y = 0) #not working
p1 + scale_y_continuous(limits=c(0,NA)) ## not working
p1 + scale_y_continuous(limits=c(0,100)) ## partially working, but defeats scale="free"
p1 + scale_y_continuous(limits=c(0,max(df$value))) ## partially working, see above
p1 + scale_y_continuous(limits=c(0,max(df$value))) + expand_limits(y = 0)## partially working, see above
One solution would be to replace the zeros with very tiny values, but maybe you can find a more straightforward solution. Thank you.
A simpler solution is to pass a function as the limits argument:
p1 <- ggplot(df, aes(var,value)) + geom_boxplot() + facet_wrap(~cat, scales="free") +
  scale_y_continuous(limits = function(x){c(0, max(0.1, x))})
For each facet, the function receives the automatically calculated limits as its argument x, and you can transform them however you like, for example by taking the larger of 0.1 and the true maximum.
The result is still subject to scale expansion, though, so the axis extends slightly below 0.
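To see what the limits function does per panel, it can be called on a few example ranges. The values below are hypothetical stand-ins for what ggplot2 computes for each facet; zero_based() is an illustrative name for the anonymous function in the answer.

```r
# The limits function from the answer, called on example per-facet ranges
zero_based <- function(x) c(0, max(0.1, x))

zero_based(c(0, 0))        # 0.0 0.1   (all-zero panel, like Z)
zero_based(c(0.02, 0.97))  # 0.00 0.97
zero_based(c(1.5, 99.2))   # 0.0 99.2
```

If the padding below zero is unwanted, ggplot2 (3.3.0 or later) lets you drop it on the lower side only, e.g. scale_y_continuous(limits = zero_based, expand = expansion(mult = c(0, 0.05))).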
This might be a bit of a work around, but you could use geom_blank() to help set your axis dimension. For example:
df <- data.frame(cat=c(rep("X", 20),rep("Y", 20), rep("Z",20)),
                 value=c(runif(20),runif(20)*100, rep(0, 20)),
                 var=rep(LETTERS[1:5],12))
# Use this data frame to set the min and max for each category
# NOTE: If a value here is smaller than that category's maximum in df,
# the data maximum takes precedence
axisData <- data.frame(cat = c("X", "X", "Y", "Y", "Z", "Z"),
                       x = 'A', y = c(0, 1, 0, 100, 0, 1))
p1 <- ggplot(df, aes(var,value)) +
  geom_boxplot() +
  geom_blank(data = axisData, aes(x = x, y = y)) +
  facet_wrap(~cat, scales="free")
p1
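Writing axisData by hand means updating it whenever the categories or values change. A minimal sketch that builds it from df instead; make_axis_data() and its floor_max argument are hypothetical, and floor_max plays the role of the y = 1 rows above, giving an all-zero panel a non-zero range.

```r
# Hypothetical helper: build the axisData frame for geom_blank() from df.
# Each category gets two rows: y = 0 and y = max(floor_max, category maximum).
make_axis_data <- function(df, floor_max = 1) {
  cats <- as.character(unique(df$cat))
  cat_max <- tapply(df$value, df$cat, max, na.rm = TRUE)[cats]
  upper <- pmax(floor_max, as.vector(cat_max))
  data.frame(
    cat = rep(cats, each = 2),
    x = df$var[1],                  # any existing x value works for geom_blank()
    y = as.vector(rbind(0, upper))  # 0, max1, 0, max2, ...
  )
}

make_axis_data(data.frame(cat = c("X", "Y", "Z"), value = c(0.5, 50, 0), var = "A"))
```

geom_blank(data = make_axis_data(df), aes(x = x, y = y)) would then replace the hand-written axisData in the code above.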

showing count on x-axis for dot plot

I'd like to have a dot plot that shows the count on the x-axis. How can I get the dot plot below to show the count on the x-axis?
Thank you.
date = seq(as.Date("2016/1/5"), as.Date("2016/1/11"), "day")
value = c(11,11,12,12,13,14,14)
dat =data.frame(date = date, value = value)
dat
library(ggplot2)
ggplot(dat, aes(x = value)) + geom_dotplot(binwidth = .8) +
  scale_y_discrete(breaks = seq(1, max(table(dat$value)) + 2, 1),
                   labels = seq(1, max(table(dat$value)) + 2, 1)) # tried using scale_y_discrete but it does nothing
ylim(0, A) gives what you want, where A is the number of stacked dots that corresponds to a density of 1.00. The exact value of A can be calculated, although it is a little complicated; an interactive, trial-and-error approach gives an approximate value.
(I referred to post1, post2, and post3.)
library(ggplot2); library(grid)
date = seq(as.Date("2016/1/5"), as.Date("2016/1/12"), "day")
value = c(11,11,12,12,13,14,14,14)
dat = data.frame(date = date, value = value)
### base plot
g <- ggplot(dat, aes(x = value)) + geom_dotplot(binwidth = 0.8) + coord_flip()
g # draw the plot so that its viewports can be read
### calculation of width and height of panel
grid.ls(view=TRUE,grob=FALSE) # lists the viewport names; the panel name below may differ
seekViewport('panel.3-4-3-4')
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)
### calculation of other values
# NOTE: $panel$ranges comes from older ggplot2 releases; newer releases keep
# panel ranges under ggplot_build(g)$layout instead
height_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$y.range)
real_binwidth <- real_height / height_coordinate_range * 0.8 # 0.8 is the binwidth argument
num_balls <- real_width / 1.1 / real_binwidth # number of stacked balls; 1.1 is the expansion factor
g + ylim(0, num_balls)
# The dirty balls border probably comes from my environment.
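The last three lines of that calculation are plain arithmetic. The dot diameter in inches is the panel height (the value axis is vertical after coord_flip()) divided by the coordinate range, times binwidth. The count axis then has to span the panel width measured in dot diameters, divided by 1.1 because ggplot2 by default pads a continuous scale by 5% on each side. A hypothetical helper that isolates this, taking the panel sizes read from the viewport as inputs:

```r
# Hypothetical helper mirroring the answer's arithmetic.
# panel_width_in, panel_height_in: panel size in inches (read from the viewport)
# value_range: length of the coordinate range along the value axis (data units)
# expansion: 1.1 assumes ggplot2's default 5% padding on each side
dots_that_fit <- function(panel_width_in, panel_height_in, value_range,
                          binwidth, expansion = 1.1) {
  dot_diameter_in <- panel_height_in / value_range * binwidth
  panel_width_in / expansion / dot_diameter_in
}

# Illustrative panel sizes, not measured values
dots_that_fit(panel_width_in = 6, panel_height_in = 4, value_range = 4, binwidth = 0.8)
```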
You can add coord_flip() to switch the x and y axes in ggplot. Here's an example with your script:
date = seq(as.Date("2016/1/5"), as.Date("2016/1/11"), "day")
value = c(11,11,12,12,13,14,14)
dat =data.frame(date = date, value = value)
dat
Edit, count on x-axis:
This will give a dotplot with simplified commands, and the counts as labels on the x-axis. Note: The binwidth has been changed from 0.8 to 1 to accommodate the use of ylim rather than scales.
library(ggplot2)
ggplot(dat, aes(x = value)) +
  geom_dotplot(binwidth = 1) +
  coord_flip() +
  ylim(0,max(table(dat$value))+2)
Edit, count on y-axis:
library(ggplot2)
ggplot(dat, aes(x = value)) +
  geom_dotplot(binwidth = 1) +
  ylim(0,max(table(dat$value))+2)

Combining scale_y_sqrt() and limits drops first y-axis break

I want to combine a square-root y-axis scale with fixed y-axis limits.
The problem is that scale_y_sqrt(limits = c(0,10)) makes the y-axis lose its first break (0).
How can I rewrite this to get the desired result?
Minimal R example:
library(ggplot2)
library(grid)
library(gridExtra)
N <- 10
# use = (not <-) inside data.frame() so the columns are named idx and vals
test_data <- data.frame(
  idx = 1:N,
  vals = runif(N, min = 0, max = 10)
)
grid.arrange(
  ggplot(test_data, aes(x = idx)) +
    geom_line(aes(y = vals)) +
    scale_y_continuous(limits = c(0,10)),
  ggplot(test_data, aes(x = idx)) +
    geom_line(aes(y = vals)) +
    scale_y_sqrt(limits = c(0,10)),
  ncol = 2
)
Plot output:
left plot: correct axis breaks, but without the sqrt scale
right plot: correct scaling, but the 0 break is missing
This appears to be a known issue; see the GitHub discussion, which also provides some workarounds.
