Combining scale_y_sqrt() and limits drops first y-axis break - r

I want to combine a square-root y-axis scale with fixed y-axis limits.
The problem is that scale_y_sqrt(limits = c(0, 10)) makes the y-axis lose its first break (0).
How can I rewrite this to get the desired result?
R code of a minimal example:
library(ggplot2)
library(grid)
library(gridExtra)
N <- 10
test_data <- data.frame(
# use "=" so the columns are actually named idx and vals
idx = 1:N,
vals = runif( N, min = 0, max = 10)
)
grid.arrange(
ggplot( test_data, aes(x = idx)) +
geom_line( aes(y = vals)) +
scale_y_continuous( limits = c(0,10)),
ggplot( test_data, aes(x = idx)) +
geom_line( aes(y = vals)) +
scale_y_sqrt( limits = c(0,10)),
ncol = 2
)
plot output:
left plot has correct axis breaks, but without sqrt scale
right plot has correct scaling, but misses the '0'-break

This appears to be a known issue. See the GitHub discussion, which also provides some workarounds.
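One commonly suggested workaround (not necessarily the one from the linked discussion) is a custom square-root transformation whose inverse clamps negative values to 0, so that the expanded lower limit does not end up above zero. The sketch below assumes the scales package is installed; sqrt_zero_trans is a name made up for this example, and whether the 0 break was missing in the first place may depend on your ggplot2 and scales versions.

```r
library(ggplot2)
library(scales)

# Hypothetical helper (not part of ggplot2 or scales): a sqrt transformation
# whose inverse maps negative values to 0 instead of squaring them.
sqrt_zero_trans <- function() {
  trans_new(
    name = "sqrt_zero",
    transform = base::sqrt,
    inverse = function(x) ifelse(x < 0, 0, x^2),
    domain = c(0, Inf)
  )
}

set.seed(1)
test_data <- data.frame(idx = 1:10, vals = runif(10, min = 0, max = 10))

# Note: newer ggplot2 releases call this argument `transform` instead of `trans`.
p <- ggplot(test_data, aes(x = idx, y = vals)) +
  geom_line() +
  scale_y_continuous(trans = sqrt_zero_trans(), limits = c(0, 10))
```

Printing p should show the square-root spacing with the axis starting at 0. If the break is still missing, setting breaks explicitly or using expand = c(0, 0) are other options to try.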

Related

Facet_wrap and scale="free" unexpectedly centers y-axis at zero in ggplot2

From this dataframe
df <- data.frame(cat=c(rep("X", 20),rep("Y", 20), rep("Z",20)),
value=c(runif(20),runif(20)*100, rep(0, 20)),
var=rep(LETTERS[1:5],12))
I want to create faceted boxplots.
library(ggplot2)
p1 <- ggplot(df, aes(var,value)) + geom_boxplot() + facet_wrap(~cat, scale="free")
p1
The result is aesthetically unsatisfactory because it centers the y-axis of the empty panel at zero. I want all y-scales to start at zero. I tried several answers from this earlier question:
p1 + scale_y_continuous(expand = c(0, 0)) # not working
p1 + expand_limits(y = 0) #not working
p1 + scale_y_continuous(limits=c(0,NA)) ## not working
p1 + scale_y_continuous(limits=c(0,100)) ## partially working, but defeats scale="free"
p1 + scale_y_continuous(limits=c(0,max(df$value))) ## partially working, see above
p1 + scale_y_continuous(limits=c(0,max(df$value))) + expand_limits(y = 0)## partially working, see above
One solution might be to replace the zeros with very tiny values, but maybe you can find a more straightforward solution. Thank you.
A simpler solution would be to pass a function as the limits argument:
p1 <- ggplot(df, aes(var,value)) + geom_boxplot() + facet_wrap(~cat, scale="free") +
scale_y_continuous(limits = function(x){c(0, max(0.1, x))})
The function receives the automatically calculated limits of each facet as its argument x, so you can transform them as needed, for example by taking the maximum of 0.1 and the true maximum.
The result is still subject to scale expansion though.
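To see what the limits function does per facet, it can be called directly on a pair of limits. The sketch below names it zero_based_limits purely for illustration. It also passes expand = expansion(mult = c(0, 0.05)), which assumes ggplot2 3.3.0 or later, as one possible way to drop the padding below zero.

```r
library(ggplot2)

# Same function as in the answer above, given a name only for this example
zero_based_limits <- function(x) c(0, max(0.1, x))

zero_based_limits(c(0, 0))        # all-zero facet: c(0, 0.1)
zero_based_limits(c(0.02, 98.7))  # ordinary facet: c(0, 98.7)

set.seed(1)
df <- data.frame(cat = rep(c("X", "Y", "Z"), each = 20),
                 value = c(runif(20), runif(20) * 100, rep(0, 20)),
                 var = rep(LETTERS[1:5], 12))

# No expansion below the lower limit, 5% above the upper limit
p <- ggplot(df, aes(var, value)) +
  geom_boxplot() +
  facet_wrap(~cat, scales = "free") +
  scale_y_continuous(limits = zero_based_limits,
                     expand = expansion(mult = c(0, 0.05)))
```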
This might be a bit of a workaround, but you could use geom_blank() to help set the axis range. For example:
df <- data.frame(cat=c(rep("X", 20),rep("Y", 20), rep("Z",20)),
value=c(runif(20),runif(20)*100, rep(0, 20)),
var=rep(LETTERS[1:5],12))
# Use this data frame to set min and max for each category
# NOTE: If the value in this DF is smaller than the max in df it will be overridden
# by the max(df$value)
axisData <- data.frame(cat = c("X", "X", "Y", "Y", "Z", "Z"),
x = 'A', y = c(0, 1, 0, 100, 0, 1))
p1 <- ggplot(df, aes(var,value)) +
geom_boxplot() +
geom_blank(data = axisData, aes(x = x, y = y)) +
facet_wrap(~cat, scale="free")
p1

How to expand ggplot y axis limits to include maximum value

Often in plots the Y axis value label is chopped off below the max value being plotted.
For example:
library(tidyverse)
mtcars %>% ggplot(aes(x=mpg, y = hp))+geom_point()
I know of scale_y_continuous, but I can't figure out a smart way to do this. Maybe I'm just overthinking things. I don't wish to mess up the 'smart' breaks that are generated automatically.
I might try to go about this manually...
mtcars %>% ggplot(aes(x=mpg, y=hp, color=as.factor(carb)))+geom_point() + scale_y_continuous(limits = c(0,375))
But this doesn't work the way I want because of the 'smart breaks' I mentioned above. Is there any way for me to extend the default breaks by one more interval, so that in this case the axis would reach 400? Of course I would want this to be flexible for whatever dataset I am working with.
You can use expand_limits() to increase the maximum y-axis value. You can also ensure that the maximum y-axis value is rounded up to the next highest value on the scale of the data, e.g., the next highest ten, the next highest hundred, etc., depending on whether the highest value in the data is in the tens, hundreds, etc.
For example, the function below finds the base 10 log of the maximum y value and rounds it down. This gives us the base ten scale of the maximum y value (e.g., tens, hundreds, thousands, etc.). It then rounds the maximum y-axis value up to the nearest ten, hundred, etc., that is higher than the maximum y value.
expandy = function(vec, ymin=NULL) {
max.val = max(vec, na.rm=TRUE)
min.log = floor(log10(max.val))
expand_limits(y=c(ymin, ceiling(max.val/10^min.log)*10^min.log))
}
p = mtcars %>% ggplot(aes(x=mpg, y = hp)) +
geom_point()
p + expandy(mtcars$hp)
p + expandy(mtcars$hp, 0)
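As a quick check of the rounding step, the arithmetic inside expandy() can be pulled out into a standalone helper and run on plain numbers. The name round_up_to_magnitude is made up here and is not part of ggplot2.

```r
# Hypothetical helper: round a maximum up to the next multiple of its
# order of magnitude (tens, hundreds, ...), as expandy() does internally.
round_up_to_magnitude <- function(max_val) {
  magnitude <- 10^floor(log10(max_val))
  ceiling(max_val / magnitude) * magnitude
}

# max(mtcars$hp) is 335: magnitude 100, ceiling(3.35) = 4, so 400
round_up_to_magnitude(max(mtcars$hp))

# max(iris$Petal.Width) is 2.5: magnitude 1, ceiling(2.5) = 3, so 3
round_up_to_magnitude(max(iris$Petal.Width))
```

Note that this assumes a strictly positive maximum, since log10() is not defined for zero or negative values.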
Or, to make things a bit easier, you could set up the function so that the y-range data is collected directly from the plot:
library(gridExtra)
expandy = function(plot, ymin=0) {
max.y = max(layer_data(plot)$y, na.rm=TRUE)
min.log = floor(log10(max.y))
expand_limits(y=c(ymin, ceiling(max.y/10^min.log)*10^min.log))
}
p = mtcars %>% ggplot(aes(x=mpg, y = hp)) +
geom_point()
grid.arrange(p, p + expandy(p), ncol=2)
p = iris %>% ggplot(aes(x=Sepal.Width, y=Petal.Width)) +
geom_point()
grid.arrange(p, p + expandy(p), ncol=2)
To choose the step between y-axis breaks yourself, you can use the ceiling() function:
library(gridExtra)
p1 <- mtcars %>% ggplot(aes(x=mpg, y = hp)) + geom_point()
p2 <- p1 +
scale_y_continuous(
limits = c(0, ceiling(max(mtcars$hp)/50)*50),
breaks = seq(0, ceiling(max(mtcars$hp)/50)*50, 50)
)
p3 <- p1 + scale_y_continuous(
limits = c(0, ceiling(max(mtcars$hp)/100)*100),
breaks = seq(0, ceiling(max(mtcars$hp)/100)*100, 100)
)
grid.arrange(p1, p2, p3, ncol=3)
For p2 the step is 50, while for p3 the step is 100.
Here is a solution that allows any kind of numeric scale:
expandy <- function(y, base, v_min = NULL) {
max.val <- max(y, na.rm = TRUE)
expand_limits(
y = c(
v_min,
base * (max.val %/% base + as.logical(max.val %% base))
)
)
}
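The expression base * (max.val %/% base + as.logical(max.val %% base)) rounds the maximum up to the next multiple of base: the integer division counts the whole multiples, and as.logical() adds one more only when there is a remainder. A small sketch with the same arithmetic, using a helper name (round_up_to_base) invented for this example:

```r
# Hypothetical helper isolating the rounding used in the expandy() above
round_up_to_base <- function(max_val, base) {
  base * (max_val %/% base + as.logical(max_val %% base))
}

round_up_to_base(335, 50)   # 6 whole multiples plus a remainder: 350
round_up_to_base(300, 100)  # exact multiple, no remainder: 300
round_up_to_base(2.5, 1)    # 2 whole multiples plus a remainder: 3
```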
Here is a rather simple answer: just set one limit to NA:
mtcars %>%
ggplot(aes(x=mpg, y=hp, color=as.factor(carb))) +
geom_point() +
scale_y_continuous(limits = c(0, NA))

Different size facets at x-axis

Length of x-axis is important for my plot because it allows one to compare between facets, therefore I want facets to have different x-axis sizes. Here is my example data:
group1 <- seq(1, 10, 2)
group2 <- seq(1, 20, 3)
x = c(group1, group2)
mydf <- data.frame (X =x , Y = rnorm (length (x),5,1),
groups = c(rep(1, length (group1)), rep(2, length(group2))))
And my code:
library(ggplot2)
library(scales) # provides comma() for the axis labels
p1 = ggplot(data=mydf,aes(x=X,y=Y,color=factor(groups)) )+
geom_point(size=2)+
scale_x_continuous(labels=comma)+
theme_bw()
p1+facet_grid(groups ~ .,scales = "fixed",space="free_x")
And the resulting figure:
Panel 1 has x-axis values less than 10, whereas panel 2 has x-axis values extending to 20. Still, both panels have the same size along the x-axis. Is there any way to make the x-axis panel size differ between panels, so that it corresponds to their x-axis values?
I found an example from some different package that shows what I am trying to do, here is the figure:
Maybe something like this can get you started. There's still some formatting to do, though.
library(grid)
library(gridExtra)
library(dplyr)
library(ggplot2)
p1 <- ggplot(data=mydf[mydf$groups==1,],aes(x=X,y=Y))+
geom_point(size=2)+
theme_bw()
p2 <- ggplot(data=mydf[mydf$groups==2,],aes(x=X,y=Y))+
geom_point(size=2)+
theme_bw()
summ <- mydf %>% group_by(groups) %>% summarize(len=diff(range(X)))
summ$p <- summ$len/max(summ$len)
summ$q <- 1-summ$p
ng <- nullGrob()
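# unlist() turns the one-row summary into the numeric vector that widths expects
grid.arrange(arrangeGrob(p1,ng,widths=unlist(summ[1,c("p","q")])),
arrangeGrob(p2,ng,widths=unlist(summ[2,c("p","q")])))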
I'm sure there's a way to make this more general, and the axes don't line up perfectly yet, but it's a beginning.
Here is a solution following OP's clarifying comment ("I guess axis will be same but the boxes will be of variable size. Is it possible by plotting them separately and aligning in grid?").
library(plyr); library(ggplot2)
buffer <- 0.5 # Extra space around the box
#Calculate box parameters
mydf.box <- ddply(mydf, .(groups), summarise,
max.X = max(X) + buffer,
min.X = 0,
max.Y = max(Y) + buffer,
min.Y = 0,
X = mean(X), Y = mean(Y)) #Dummy values for X and Y needed for geom_rect
p2 <- ggplot(data=mydf,aes(x=X, y=Y) )+
geom_rect(data = mydf.box, aes( xmax = max.X, xmin = min.X,
ymax = max.Y, ymin = min.Y),
fill = "white", colour = "black") + # fill was given twice; keep the white box
geom_point(size=2) + facet_grid(groups ~ .,scales = "free_y") +
theme_classic() +
#Extra formatting to make your plot like the example
theme(panel.background = element_rect(fill = "grey85"),
strip.text.y = element_text(angle = 0),
strip.background = element_rect(colour = NA, fill = "grey65"))
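If placing the panels side by side is acceptable, facet_grid() itself can size panels by their x-range: as far as I understand, space = "free_x" only has an effect when the facets are laid out in columns and scales is "free_x" (or "free"), which would explain why it does nothing in the question's row layout. A minimal sketch under that assumption, rebuilding the example data with a fixed seed:

```r
library(ggplot2)

set.seed(1)
group1 <- seq(1, 10, 2)
group2 <- seq(1, 20, 3)
mydf <- data.frame(
  X = c(group1, group2),
  Y = rnorm(length(group1) + length(group2), 5, 1),
  groups = rep(c(1, 2), times = c(length(group1), length(group2)))
)

# Facets in columns: each panel's width is proportional to its x-range
p <- ggplot(mydf, aes(x = X, y = Y, colour = factor(groups))) +
  geom_point(size = 2) +
  facet_grid(. ~ groups, scales = "free_x", space = "free_x") +
  theme_bw()
```

This does not reproduce the stacked layout from the question, so the gridExtra and geom_rect approaches above remain the options for panels arranged in rows.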

Conditional colouring of a geom_smooth

I'm analyzing a series that varies around zero. And to see where there are parts of the series with a tendency to be mostly positive or mostly negative I'm plotting a geom_smooth. I was wondering if it is possible to have the color of the smooth line be dependent on whether or not it is above or below 0. Below is some code that produces a graph much like what I am trying to create.
set.seed(5)
r <- runif(22, max = 5, min = -5)
t <- rep(-5:5,2)
df <- data.frame(r+t,1:22)
colnames(df) <- c("x1","x2")
ggplot(df, aes(x = x2, y = x1)) + geom_hline(yintercept = 0) + geom_line() + geom_smooth()
I considered calculating the smoothed values, adding them to the df and then using a scale_color_gradient, but I was wondering if there is a way to achieve this in ggplot directly.
You can use the n argument of geom_smooth to increase the "number of points to evaluate smoother at", which creates more y values close to zero. Then use ggplot_build to grab the smoothed values from the ggplot object. These values are drawn with a geom_line that is added on top of the original plot. Finally, overplot the y = 0 line with geom_hline.
# basic plot with a larger number of smoothed values
p <- ggplot(df, aes(x = x2, y = x1)) +
geom_line() +
geom_smooth(linetype = "blank", n = 10000)
# grab smoothed values
df2 <- ggplot_build(p)$data[[2]][ , c("x", "y")] # data of the second layer (the smooth)
# add smoothed values with conditional color
p +
geom_line(data = df2, aes(x = x, y = y, color = y > 0)) +
geom_hline(yintercept = 0)
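A possible variation, sketched under the assumption that a plain loess() fit is close enough to what geom_smooth() draws for a small data set like this one: compute the smoothed values directly and add group = 1 to the line. Without it, mapping colour to a sign variable splits the data into two groups, and each group is joined across the stretches where the other one applies. The column names x, y and sign are chosen here for illustration.

```r
library(ggplot2)

set.seed(5)
df <- data.frame(x1 = runif(22, max = 5, min = -5) + rep(-5:5, 2),
                 x2 = 1:22)

# Fit the smoother ourselves and evaluate it on a fine grid
fit <- loess(x1 ~ x2, data = df)
smoothed <- data.frame(x = seq(1, 22, length.out = 500))
smoothed$y <- predict(fit, newdata = data.frame(x2 = smoothed$x))
smoothed$sign <- ifelse(smoothed$y > 0, "positive", "negative")

# group = 1 keeps one continuous path whose colour changes along the way
p <- ggplot(df, aes(x = x2, y = x1)) +
  geom_hline(yintercept = 0) +
  geom_line() +
  geom_line(data = smoothed, aes(x = x, y = y, colour = sign, group = 1))
```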
Something like this:
# loess data
res <- loess.smooth(df$x2, df$x1)
res <- data.frame(do.call(cbind, res))
res$posY <- ifelse(res$y >= 0, res$y, NA)
res$negY <- ifelse(res$y < 0, res$y, NA)
# plot
ggplot(df, aes(x = x2, y = x1)) +
geom_hline(yintercept = 0) +
geom_line() +
geom_line(data=res, aes(x = x, y = posY, col = "green")) +
geom_line(data=res, aes(x = x, y = negY, col = "red")) +
scale_color_identity()

Using ggplot2, how can I create a histogram or bar plot where the last bar is the count of all values greater than some number?

I would like to plot a histogram of my data to show its distribution, but I have a few outliers that are really high compared to most of the values, which are < 1.00. Rather than having one or two bars scrunched up at the far left and then nothing until the very far right side of the graph, I'd like to have a histogram with everything except the outliers and then add a bar at the end where the label underneath it is ">100%". I can do that with ggplot2 using geom_bar() like this:
X <- c(rnorm(1000, mean = 0.5, sd = 0.2),
rnorm(10, mean = 10, sd = 0.5))
Data <- data.frame(table(cut(X, breaks=c(seq(0,1, by=0.05), max(X)))))
library(ggplot2)
ggplot(Data, aes(x = Var1, y = Freq)) + geom_bar(stat = "identity") +
scale_x_discrete(labels = paste0(c(seq(5,100, by = 5), ">100"), "%"))
The problem is that, for the size I need this to be, the labels end up overlapping or needing to be plotted at an angle for readability. I don't really need all of the bars labeled. Is there some way to either
A) plot this in a different manner other than geom_bar() so that I don't need to manually add that last bar or
B) only label some of the bars?
I will try to answer B.
I don't know if there is a parameter that would let you do B), but you can define a function that does it for you. For example:
library(ggplot2)
X <- c(rnorm(1000, mean = 0.5, sd = 0.2),
rnorm(10, mean = 10, sd = 0.5))
Data <- data.frame(table(cut(X, breaks=c(seq(0,1, by=0.05), max(X)))))
# blank out every n-th label (vectorised, so it also works for empty input)
remove_elem <- function(x, n) {
x[seq_along(x) %% n == 0] <- ''
x
}
# make the initial labels outside ggplot (same way as before)
labels <-paste0(c(seq(5,100, by = 5),'>100'),'%')
Now using that function inside the ggplot function:
ggplot(Data, aes(x = Var1, y = Freq)) + geom_bar(stat = "identity") +
scale_x_discrete(labels = remove_elem(labels,2))
outputs:
I don't know if this is what you are looking for but it does the trick!
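Another option for B), assuming ggplot2 3.3.0 or later, might be to keep all labels and let the axis guide skip the ones that would overlap, via guide_axis(check.overlap = TRUE). This is only a sketch: which labels survive depends on the plot width, so it gives less control than blanking labels by hand.

```r
library(ggplot2)

set.seed(42)
X <- c(rnorm(1000, mean = 0.5, sd = 0.2),
       rnorm(10, mean = 10, sd = 0.5))
Data <- data.frame(table(cut(X, breaks = c(seq(0, 1, by = 0.05), max(X)))))
labels <- paste0(c(seq(5, 100, by = 5), ">100"), "%")

# check.overlap drops labels that would collide with ones already drawn
p <- ggplot(Data, aes(x = Var1, y = Freq)) +
  geom_col() +
  scale_x_discrete(labels = labels,
                   guide = guide_axis(check.overlap = TRUE))
```

guide_axis(n.dodge = 2) is a related setting that staggers the labels over two rows instead of hiding any.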
