ggplot2 missing labels after custom scaling of axis - r

I am attempting to apply a custom scaling to my x-axis using ggplot2 and scales::trans_new(). However, when I do, some of the axis labels go missing. Can someone help me figure out why?
Setup:
library(tidyverse)
# the data
ds <- tibble(
myx = c(1, .5, .1, .01, .001, 0),
myy = 1:6
)
# the custom transformation
forth_root_trans_rev <- scales::trans_new(
name = "sign_fourth_root_rev",
transform = function (x) { - abs(x)^(1/4) },
inverse = function (x) { x^4 }
)
Plot 1:
When I try to plot this, the label for x = 0 gets lost.
# plot - missing x-label at `0`
ggplot(ds, aes(x = myx, y = myy)) +
geom_line() +
geom_point() +
scale_x_continuous(
trans = forth_root_trans_rev,
breaks = sort(unique(ds$myx)),
)
Plot 2
When I add some space on both sides of the graph, even more x-labels get lost.
# plot - missing x-labels below 0.5
ggplot(ds, aes(x = myx, y = myy)) +
geom_line() +
geom_point() +
scale_x_continuous(
trans = forth_root_trans_rev,
breaks = sort(unique(ds$myx)),
expand = expand_scale(mult = c(.1, .6))
)
I presume this is related to this old issue: https://github.com/tidyverse/ggplot2/issues/980. Nevertheless, I can't figure out how to apply this transformation and retain all x-labels.
Where am I going wrong?

The problem here is due to the combination of two factors:
Your x-axis values (after transformation) fall in the [-1, 0] range, so any expansion (whether additive or multiplicative) will nudge the final range to cover both positive and negative values.
Your custom transformation is not one-to-one in the [<some negative number>, <some positive number>] region.
How it occurred
Somewhere deep inside all the code used to build the ggplot object (you can run ggplot2:::ggplot_build.ggplot before printing the plot and step into layout$setup_panel_params(), but I don't recommend this for casual users, since the rabbit hole goes really deep down there), x-axis breaks are calculated in the following manner:
Obtain limits for the transformed values (for c(1, .5, .1, .01, .001, 0) in the question, this will be (-1, 0)).
Add expansion to the limits, if applicable (default expansion for a continuous axis is 5% on either side, so the limits become (-1.05, 0.05)).
Apply the inverse transformation on the limits (taking x^4 on the limits yields (1.215506, 0.000006)).
Apply the transformation on both user-inputted breaks & limits (for breaks, c(1, .5, .1, .01, .001, 0) becomes (-1.0000000, ..., 0.0000000), but for limits, (1.215506, 0.000006) now becomes (-1.05, -0.05), which is narrower than (-1.05, 0.05)).
Breaks beyond the limit's range are dropped (since the limits now stop at -0.05, the break at 0 is dropped).
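The steps above can be reproduced numerically. The following is a minimal sketch in base R, not ggplot2's actual internals: the names transform_fn and inverse_fn, and the way the 5% expansion is computed, are assumptions made for illustration only.

```r
# Transformation and inverse copied from the question (base R only).
transform_fn <- function(x) -abs(x)^(1/4)
inverse_fn   <- function(x) x^4

x <- c(1, .5, .1, .01, .001, 0)

# Step 1: limits of the transformed data.
lims <- range(transform_fn(x))

# Step 2: default 5% expansion on either side (simplified arithmetic).
expanded <- lims + c(-1, 1) * 0.05 * diff(lims)

# Step 3: back to data space; x^4 discards the sign of the upper limit.
data_lims <- inverse_fn(expanded)

# Step 4: forward again; the upper limit comes back negative.
final_lims <- range(transform_fn(data_lims))

# Step 5: only breaks inside the final limits survive.
tx   <- transform_fn(x)
kept <- x[tx >= final_lims[1] & tx <= final_lims[2]]

print(expanded)    # limits before the round trip
print(final_lims)  # limits after the round trip
print(kept)        # the break at 0 is no longer included
```

Comparing expanded with final_lims shows where the upper limit changes sign, which is why the break at 0 falls outside the range.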
How to get around this
You can modify your transformation with the use of sign() to preserve positive / negative values, such that the transformation is one-to-one in the full range, as suggested by Hadley in the discussion on the GH issue you linked. For example:
# original
forth_root_trans_rev <- scales::trans_new(
name = "sign_fourth_root_rev",
transform = function (x) { - abs(x)^(1/4) },
inverse = function (x) { x^4 }
)
# new
forth_root_trans_rev2 <- scales::trans_new(
name = "sign_fourth_root_rev",
transform = function (x) { -sign(x) * abs(x)^(1/4) },
inverse = function (x) { -sign(x) * abs(x)^4 }
)
library(dplyr)
library(tidyr)
# comparison of two transformations
# y1 shows a one-to-one mapping in either (-Inf, 0] or [0, Inf) but not both;
# y2 shows a one-to-one mapping in (-Inf, Inf)
data.frame(x = seq(-1, 1, 0.01)) %>%
mutate(y1 = x %>% forth_root_trans_rev$transform() %>% forth_root_trans_rev$inverse(),
y2 = x %>% forth_root_trans_rev2$transform() %>% forth_root_trans_rev2$inverse()) %>%
gather(trans, y, -x) %>%
ggplot(aes(x, y, colour = trans)) +
geom_line() +
geom_vline(xintercept = 0, linetype = "dashed") +
facet_wrap(~trans)
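The same comparison can be made numerically on the expanded limits alone. This is a small sketch in base R; the helper names t1, i1, t2 and i2 are made up here and simply copy the function bodies of the two transformation objects above.

```r
# Original pair: the inverse loses the sign.
t1 <- function(x) -abs(x)^(1/4)
i1 <- function(x) x^4

# Sign-preserving pair from the modified transformation.
t2 <- function(x) -sign(x) * abs(x)^(1/4)
i2 <- function(x) -sign(x) * abs(x)^4

# Expanded limits in transformed space, as in the walkthrough above.
expanded <- c(-1.05, 0.05)

old_round_trip <- t1(i1(expanded))  # upper limit does not come back as 0.05
new_round_trip <- t2(i2(expanded))  # both limits are recovered

print(old_round_trip)
print(new_round_trip)
```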
Usage
p <- ggplot(ds, aes(x = myx, y = myy)) +
geom_line() +
geom_point() +
theme(panel.grid.minor = element_blank())
p +
scale_x_continuous(
trans = forth_root_trans_rev2,
breaks = sort(unique(ds$myx))
)
p +
scale_x_continuous(
trans = forth_root_trans_rev2,
breaks = sort(unique(ds$myx)),
expand = expansion(mult = c(.1, .6)) # different expansion factor, if desired; expansion() replaces the deprecated expand_scale() in ggplot2 >= 3.3.0
)

Related

Different objects are not showing up on my ggplot2

I'm studying the returns to college admission for marginal students, and I'm trying to make a ggplot2 plot of the following data: the average salaries of students who finished or didn't finish their master's in medicine, and the average 'GPA' (foreign equivalent) distance to the 'acceptance score':
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
I have to do a Regression Discontinuity Design (RDD), so to do the regression, as far as I understand it, I have to rewrite DistanceGrades as numeric. I therefore created a variable z
z <- -5:4
where 0 is the cutoff (ie. 0 is equal to "0.0" in DistanceGrades).
I then make a dataframe
df <- data.frame(z,SalaryAfter)
Now my attempt to create the plot gets a bit messy (I use the package 'fpp3', but I suppose that only the ggplot2 and maybe dplyr packages matter here)
df %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0))) %>%
ggplot(aes(x = z, y = SalaryAfter, color = D)) +
geom_point(stat = "identity") +
geom_smooth(method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
xlim(-6,5) +
xlab("Distance to acceptance score") +
labs(title = "Figur 1.1", subtitle = "Salary for every distance to the acceptance score")
Which plots:
What I'm trying to do is, first, split the data with a dummy variable, D = 1 if z >= 0 and D = 0 if z < 0. Then I plot it with a linear regression and a vertical line at z = 0. Lastly, I write the title and subtitle. Now I have two problems:
The x-axis displays -5, -2.5, ..., but I would like it to show all the integers; the non-integer values have no relation to the z variable, which is discrete. I have tried to fix this with several different methods, but none of them have worked. I can't remember everything I have tried (theme(panel.grid...), scale_x_discrete and many more), but the outcomes have all been pretty similar: the x-axis is removed completely, so that there are no numbers, and sometimes even the axis title disappears.
I would like the regression channel for the first part of the data to extend to z = 0.
When I try to solve both of these problems I again get similar results. Most of the things I try do not produce an error message when I run the code, but they either do nothing to my plot or remove some of the existing elements, which leaves me full of questions. I suppose the problem is caused by some of the elements not working together, but I have no idea which.
Try this:
library(tidyverse)
SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
372.682,388.939,386.994)
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
"0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")
z <- -5:4
df <- data.frame(z,SalaryAfter) %>%
select(z, SalaryAfter) %>%
mutate(D = as.factor(ifelse(z >= -0.1, 1, 0)))
# Fit a lm model for the left part of the panel
fit_data <- lm(SalaryAfter~z, data = filter(df, z <= -0.1)) %>%
predict(., newdata = data.frame(z = seq(-5, 0, 0.1)), interval = "confidence") %>%
as.data.frame() %>%
mutate(z = seq(-5, 0, 0.1), D = factor(0, levels = c(0, 1)))
# Plot
ggplot(mapping = aes(color = D)) +
geom_ribbon(data = filter(fit_data, z <= 0 & -1 <= z),
aes(x = z, ymin = lwr, ymax = upr),
fill = "grey70", color = "transparent", alpha = 0.5) +
geom_line(data = fit_data, aes(x = z, y = fit), size = 1) +
geom_point(data = df, aes(x = z, y = SalaryAfter), stat = "identity") +
geom_smooth(data = df, aes(x = z, y = SalaryAfter), method = "lm") +
geom_vline(xintercept = 0) +
theme(panel.grid = element_line(color = "white",
size = 0.75,
linetype = 1)) +
scale_x_continuous(limits = c(-6, 5), breaks = -6:5) +
xlab("Distance to acceptance score") +
labs(title = "Figure 1.1", subtitle = "Salary for every distance to the acceptance score")
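The key step in the code above is predicting from the left-hand model on a grid that runs all the way to z = 0, which is what lets the line and ribbon reach the cutoff. Below is a minimal sketch of just that step in base R; the names left_fit, grid and pred are illustrative and not part of the answer.

```r
# Data from the question.
SalaryAfter <- c(287.780, 305.181, 323.468, 339.082, 344.738, 370.475, 373.257,
                 372.682, 388.939, 386.994)
z  <- -5:4
df <- data.frame(z, SalaryAfter)

# Fit only the observations left of the cutoff.
left_fit <- lm(SalaryAfter ~ z, data = df[df$z < 0, ])

# Predict on a grid that extends up to the cutoff at z = 0.
grid <- data.frame(z = seq(-5, 0, 0.1))
pred <- as.data.frame(predict(left_fit, newdata = grid, interval = "confidence"))

# predict() returns the fitted value plus the confidence bounds,
# which are the columns mapped to y, ymin and ymax in the plot.
print(names(pred))
print(nrow(pred))
```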

When using a color transformation in ggplot2, change the legend gradient instead of the legend break positions

Suppose I have a raster plot where the fill color gradient isn't used very efficiently because the values are skewed, like this:
library(ggplot2)
set.seed(20)
d = expand.grid(x = seq(0, 10, len = 100), y = seq(0, 10, len = 100))
d = transform(d, z =
1e-4 * ((x - 2)^2 + (2*y - 4)^2 + 10*rnorm(nrow(d)))^2)
ggplot(d) +
geom_raster(aes(x, y, fill = z)) +
scale_fill_distiller(palette = "Spectral",
limits = c(0, 12), breaks = 0 : 12) +
theme(legend.key.height = unit(20, "mm"))
I can quantile-transform the color scale like this:
ggplot(d) +
geom_raster(aes(x, y, fill = z)) +
scale_fill_distiller(palette = "Spectral",
limits = c(0, 12), breaks = 0 : 12,
trans = scales::trans_new("q",
function(x) ecdf(d$z)(x),
function(x) unname(quantile(d$z, x)))) +
theme(legend.key.height = unit(20, "mm"))
I like what this does for the main part of the plot, but not the legend. The legend uses the same gradient as the original, while moving the breaks according to the transformation. I'd prefer to keep the breaks where they are, while transforming the gradient instead. Also, I'd like to avoid the floating-point noise that's been added to the break labels. How can I accomplish these changes?
I had a very similar idea to chemdork123, but wanted to stay a bit closer to the quantile idea. The idea is to set an exact palette of colours (i.e., one colour for every value) and space this out such that it follows the data.
library(ggplot2)
library(scales)
set.seed(20)
d = expand.grid(x = seq(0, 10, len = 100), y = seq(0, 10, len = 100))
d = transform(d, z =
1e-4 * ((x - 2)^2 + (2*y - 4)^2 + 10*rnorm(nrow(d)))^2)
# The 'distiller' palette outside of the scale,
# we need this to generate `length(d$z)` number of colours.
pal <- gradient_n_pal(brewer_pal(palette = "Spectral", direction = -1)(7))
ggplot(d) +
geom_raster(aes(x, y, fill = z)) +
scale_fill_gradientn(
colours = pal(c(0, rescale(seq_along(d$z)), 1)), # <- extra 0, 1 for out-of-bounds
limits = c(0, 12), breaks = 0:12,
values = c(0, rescale(sort(d$z), from = c(0, 12)), 1) # <- extra 0, 1 again
) +
theme(legend.key.height = unit(10, "mm"))
Created on 2021-03-31 by the reprex package (v1.0.0)
You can use the values argument for the scale_fill_distiller() function. The distiller scales extend brewer to continuous scales by interpolating 7 colors from any palette. By default, the scaling is linearly applied from 0 (lowest value on the scale) to 1 (highest value on the scale). You can recreate this mapping via: scales::rescale(1:7). If you supply a new vector to the values argument, you can remap each of the 7 colors to a new location. You do not have to supply 7 values - the rest are interpolated linearly - just as long as you specify the max at 1 (or you'll cut the scale).
Best way is to play around with it - I've tried mapping to specific functions before, but honestly it tends to work for me the best when I just mess with the numbers until I get something I like:
ggplot(d) +
geom_raster(aes(x, y, fill = z)) +
scale_fill_distiller(palette = "Spectral", values = c(0,0.05,0.1, 0.5,1)) +
theme(legend.key.height = unit(20, "mm"))
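To make the remapping concrete, here is a small sketch in base R. It assumes, as the description above says, that positions between the supplied values are interpolated linearly; rescale01 is a hypothetical stand-in for calling scales::rescale() with default arguments, and position_on_palette is a name invented for this illustration.

```r
# Hypothetical helper: rescale a vector linearly onto [0, 1].
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Default placement of 7 colours: evenly spaced between 0 and 1.
default_values <- rescale01(1:7)

# Custom placement from the example above (five anchor points).
custom_values <- c(0, 0.05, 0.1, 0.5, 1)

# Assumption: each anchor is matched to an evenly spaced position along
# the palette, with linear interpolation in between.
position_on_palette <- approxfun(custom_values,
                                 seq(0, 1, length.out = length(custom_values)))

print(default_values)
print(position_on_palette(c(0.05, 0.3, 1)))
```

With these values, data only 5% of the way along the fill range is already a quarter of the way through the palette, which is how the low end of a skewed variable gets more colour resolution.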

Divide the y axis so that the part with a score <25 occupies the majority of the plot in ggplot

I want to divide the y axis of the attached figure so that the part with a score <25 occupies the majority of the figure, while the remaining scores take up a minor upper part.
I searched for this and gather that I should use something like scale_y_discrete(limits ...). I tried p <- p + scale_y_continuous(breaks = 1:20, labels = c(1:20, "//", 40:100)), but it doesn't work yet.
I used the attached data and this is my code
Code
library(ggpubr) # provides ggscatter()
p <- ggscatter(data, x = "Year", y = "Score",
color = "grey", shape = 21, size = 3, # Points color, shape and size
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
add = "loess", # Add a loess smoother
conf.int = TRUE,
cor.coef = FALSE, cor.method = "pearson",
xlab = "Year", ylab = "Score")
p <- p + coord_cartesian(xlim = c(1980, 2020))
p
Here is as close as I could get to a fake axis break with a resized upper area of the plot. I still think it's a bad idea, and if this were my plot I'd much prefer a more straightforward axis transform.
First, we'd need a function that generates a transform that squeezes all values above some threshold:
library(ggplot2)
library(scales)
# Define new transform
my_transform <- function(threshold = 25, squeeze_factor = 10) {
force(threshold)
force(squeeze_factor)
my_transform <- trans_new(
name = "trans_squeeze",
transform = function(x) {
ifelse(x > threshold,
((x - threshold) * (1 / squeeze_factor)) + threshold,
x)
},
inverse = function(x) {
ifelse(x > threshold,
((x - threshold) * squeeze_factor) + threshold,
x)
}
)
return(my_transform)
}
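Before using such a transform, it may be worth checking that the inverse really undoes it. The sketch below does this in base R with the same formulas; squeeze and unsqueeze are illustrative names for the two inner functions, and the score values are made up.

```r
threshold      <- 25
squeeze_factor <- 10

# Same formulas as the transform and inverse inside my_transform().
squeeze <- function(x) {
  ifelse(x > threshold, (x - threshold) / squeeze_factor + threshold, x)
}
unsqueeze <- function(x) {
  ifelse(x > threshold, (x - threshold) * squeeze_factor + threshold, x)
}

scores   <- c(0, 10, 25, 40, 100)  # made-up example values
squeezed <- squeeze(scores)

print(squeezed)             # values above 25 are compressed tenfold
print(unsqueeze(squeezed))  # should reproduce the original scores
```

Values at or below the threshold pass through unchanged, so the lower part of the axis keeps its natural spacing.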
Next we apply that transformation to the y-axis and add a fake axis break. I've used vanilla ggplot2 code as I find the ggscatter() approach confusing.
ggplot(data, aes(Year, Score)) +
geom_point(color = "grey", shape = 21, size = 3) +
geom_smooth(method = "loess", fill = "lightgray") +
# Add fake axis lines
annotate("segment", x = -Inf, xend = -Inf,
y = c(-Inf, Inf), yend = c(24.5, 25.5)) +
# Apply transform to y-axis
scale_y_continuous(trans = my_transform(25, 10),
breaks = seq(0, 80, by = 10)) +
scale_x_continuous(limits = c(1980, 2020), oob = oob_keep) +
theme_classic() +
# Turn real y-axis line off
theme(axis.line.y = element_blank())
You might find it informative to read Hadley Wickham's view on discontinuous axes. People sometimes mock weird y-axes.

ggpairs in r: How to (1) adjust axis values, and (2) split long variable names over two (or more) lines

I have two questions about plotting with ggpairs in r:
(1) I have some unavoidably long variable names that are not shown in full in the default output of ggpairs. How can I adjust ggpairs so that the whole name is visible (e.g. can labels be split over multiple lines, or displayed at 45 degrees, etc.)?
and (2), How do I set a custom range for axis limits for individual variables?
For example, the following code gives us the plot below:
library(GGally)
set.seed(99)
really_long_variable_name_1 <- round(runif(50, 0, 1), 2)
really_long_variable_name_2 <- round(runif(50, 0, 0.8), 2)
really_long_variable_name_3 <- round(runif(50, 0, 0.6), 2)
really_long_variable_name_4 <- round(runif(50, 0, 100), 2)
df <- data.frame(really_long_variable_name_1,
really_long_variable_name_2,
really_long_variable_name_3,
really_long_variable_name_4)
ggpairs(df)
(1) How do I adjust the plot so that full variable names are visible (in this case, the labels on the Y axis)?
and (2) How would I set the axes limits at 0 to 1 for the first three variables, and 0 to 100 for the fourth?
I can set all axes limits to the same values using a function like the one below:
custom_range <- function(data, mapping, ...) {
ggplot(data = data, mapping = mapping, ...) +
geom_point(...) +
scale_x_continuous(limits = c(0, 1)) +
scale_y_continuous(limits = c(0, 1))
}
ggpairs(df,
lower = list(continuous = custom_range))
but how would I set axis limits for the fourth variable, really_long_variable_name_4, so that X ranges from 0 to 100?
Many thanks.
First you can modify your column names to recognize _ as separation points:
cleanname = function(x,lab="\n"){
sapply(x, function(c){
paste(unlist(strsplit(as.character(c) , split="_")),collapse=lab)
})
}
colnames(df) = cleanname(colnames(df))
#Using your function
custom_range <- function(data, mapping, ...) {
ggplot(data = data, mapping = mapping, ...) +
geom_point(...) +
scale_x_continuous(limits = c(0, 1)) +
scale_y_continuous(limits = c(0, 1))
}
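For names like the ones in the question, the renaming step can probably be written more compactly with gsub(). This is a sketch of an alternative, not the approach used above; it assumes the names have no leading or trailing underscores, where the two versions could differ.

```r
# cleanname() as defined above, repeated so the example is self-contained.
cleanname <- function(x, lab = "\n") {
  sapply(x, function(c) {
    paste(unlist(strsplit(as.character(c), split = "_")), collapse = lab)
  })
}

long_names <- c("really_long_variable_name_1", "really_long_variable_name_4")

# Alternative: replace every underscore with a newline in one call.
via_gsub <- gsub("_", "\n", long_names, fixed = TRUE)

# Both approaches should give the same labels for these names.
print(identical(unname(cleanname(long_names)), via_gsub))
cat(via_gsub[1], "\n")
```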
Add theme element to specify if you'd like text at an angle, size, color, etc. Note that this is just modifying your X names, but the Y names also have new lines at each _
myplot = ggpairs(df,
lower = list(continuous = custom_range)) +
theme(strip.text.x = element_text(size = 6, angle = 45))
Next, index each plot and overwrite the existing scale. I'm sure there is a cleaner way of coding this
myplot[4,1]= myplot[4,1] + scale_y_continuous(limits = c(0,100))
myplot[4,2]= myplot[4,2] + scale_y_continuous(limits = c(0,100))
myplot[4,3]= myplot[4,3] + scale_y_continuous(limits = c(0,100))
myplot

R/ggplot2: Collapse or remove segment of y-axis from scatter-plot

I'm trying to make a scatter plot in R with ggplot2, where the middle of the y-axis is collapsed or removed, because there is no data there. I did it in photoshop below, but is there a way to create a similar plot with ggplot?
This is the data with a continuous scale:
But I'm trying to make something like this:
Here is the code:
ggplot(data=distance_data) +
geom_point(
aes(
x = mdistance,
y = maxZ,
shape = factor(subj),
color = factor(side),
size = (cSA)
)
) +
scale_size_continuous(range = c(4, 10)) +
theme(
axis.text.x = element_text(colour = "black", size = 15),
axis.text.y = element_text(colour = "black", size = 15),
axis.title.x = element_text(colour = "black", size= 20, vjust = 0),
axis.title.y = element_text(colour = "black", size= 20),
legend.position = "none"
) +
ylab("Z-score") +
xlab("Distance")
You could do this by defining a coordinate transformation. A standard example is logarithmic coordinates, which can be achieved in ggplot by using scale_y_log10().
But you can also define custom transformation functions by supplying the trans argument to scale_y_continuous() (and similarly for scale_x_continuous()). To this end, you use the function trans_new() from the scales package. It takes as arguments the transformation function and its inverse.
I discuss first a special solution for the OP's example and then also show how this can be generalised.
OP's example
The OP wants to shrink the interval between -2 and 2. The following defines a function (and its inverse) that shrinks this interval by a factor 4:
library(scales)
trans <- function(x) {
ifelse(x > 2, x - 1.5, ifelse(x < -2, x + 1.5, x/4))
}
inv <- function(x) {
ifelse(x > 0.5, x + 1.5, ifelse(x < -0.5, x - 1.5, x*4))
}
my_trans <- trans_new("my_trans", trans, inv)
This defines the transformation. To see it in action, I define some sample data:
x_val <- 0:250
y_val <- c(-6:-2, 2:6)
set.seed(1234)
data <- data.frame(x = sample(x_val, 30, replace = TRUE),
y = sample(y_val, 30, replace = TRUE))
I first plot it without transformation:
p <- ggplot(data, aes(x, y)) + geom_point()
p + scale_y_continuous(breaks = seq(-6, 6, by = 2))
Now I use scale_y_continuous() with the transformation:
p + scale_y_continuous(trans = my_trans,
breaks = seq(-6, 6, by = 2))
If you want another transformation, you have to change the definitions of trans() and inv() and run trans_new() again. You have to make sure that inv() is indeed the inverse of trans(). I checked this as follows:
x <- runif(100, -100, 100)
identical(x, trans(inv(x)))
## [1] TRUE
General solution
The function below defines a transformation where you can choose the lower and upper end of the region to be squished, as well as the factor to be used. It directly returns the trans object that can be used inside scale_y_continuous:
library(scales)
squish_trans <- function(from, to, factor) {
trans <- function(x) {
if (any(is.na(x))) return(x)
# get indices for the relevant regions
isq <- x > from & x < to
ito <- x >= to
# apply transformation
x[isq] <- from + (x[isq] - from)/factor
x[ito] <- from + (to - from)/factor + (x[ito] - to)
return(x)
}
inv <- function(x) {
if (any(is.na(x))) return(x)
# get indices for the relevant regions
isq <- x > from & x < from + (to - from)/factor
ito <- x >= from + (to - from)/factor
# apply transformation
x[isq] <- from + (x[isq] - from) * factor
x[ito] <- to + (x[ito] - (from + (to - from)/factor))
return(x)
}
# return the transformation
return(trans_new("squished", trans, inv))
}
The first line in trans() and inv() handles the case when the transformation is called with x = c(NA, NA). (It seems that this did not happen with the version of ggplot2 I used when I originally wrote this answer. Unfortunately, I don't know in which version this started.)
This function can now be used to conveniently redo the plot from the first section:
p + scale_y_continuous(trans = squish_trans(-2, 2, 4),
breaks = seq(-6, 6, by = 2))
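The general version can be checked in the same way as the special case. The sketch below copies the two inner functions into standalone base R helpers (squish and unsquish are names chosen here, with from, to and factor passed as arguments) and runs them on a few hand-picked values.

```r
# Forward transformation: compress (from, to) by `factor`, shift the rest.
squish <- function(x, from, to, factor) {
  if (any(is.na(x))) return(x)
  isq <- x > from & x < to
  ito <- x >= to
  x[isq] <- from + (x[isq] - from) / factor
  x[ito] <- from + (to - from) / factor + (x[ito] - to)
  x
}

# Inverse transformation: stretch the compressed region back out.
unsquish <- function(x, from, to, factor) {
  if (any(is.na(x))) return(x)
  upper <- from + (to - from) / factor  # where `to` lands after squishing
  isq <- x > from & x < upper
  ito <- x >= upper
  x[isq] <- from + (x[isq] - from) * factor
  x[ito] <- to + (x[ito] - upper)
  x
}

y        <- c(-6, -2, 0, 2, 6)
squished <- squish(y, from = -2, to = 2, factor = 4)

print(squished)                      # the interval (-2, 2) now spans one unit
print(unsquish(squished, -2, 2, 4))  # should reproduce y
print(squish(c(NA, NA), -2, 2, 4))   # NA input is returned unchanged
```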
The following example shows that you can squish the scale at an arbitrary position and that this also works for other geoms than points:
df <- data.frame(class = LETTERS[1:4],
val = c(1, 2, 101, 102))
ggplot(df, aes(x = class, y = val)) + geom_bar(stat = "identity") +
scale_y_continuous(trans = squish_trans(3, 100, 50),
breaks = c(0, 1, 2, 3, 50, 100, 101, 102))
Let me close by stressing what others have already mentioned in the comments: this kind of plot could be misleading and should be used with care!

Resources