Understanding trans_new in ggplot (R)

Starting from this example (pasted here):
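library(ggplot2)
library(scales)
# helper functions the example relies on (assumed; not included in the paste)
cube_root <- function(x) x^(1/3) # real cube root for x >= 0 only
cube <- function(x) x^3
trans_cube <- trans_new(name = "cube root",
transform = cube_root,
inverse = cube)
# dummy data
plot_data <- data.frame(x = 1:10,
y = cube(1:10))
# without applying a transform
ggplot(plot_data, aes(x = x, y = y)) +
geom_point()
# applying a transform
ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = trans_cube)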
Why doesn't a simple transformation that just adds a constant (using scales::trans_new) work?
trans_add <- trans_new(name = "add",
transform = function(x) x + 200,
inverse = function(x) x -200)
ggplot(plot_data, aes(x = x, y = y)) + geom_point() +
coord_trans(y = trans_add )
I'm asking because I would like to back-transform my y-axis, and I need to do it with coord_trans, scale_y_continuous, or something similar.
UPDATE
I managed to show the back-transformation with custom breaks using the following; however, I would like to know why trans_new doesn't work.
fun_sc_back <- function(x){x*187+266} # transform
fun_sc <- function(x){(x-266)/187} # inverse
p_tr +
scale_y_continuous(breaks =fun_sc(c(250, 500, 750)), labels=fun_sc_back)
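The UPDATE's approach can be wrapped so that the break positions and their labels always come from the same pair of functions. This is a minimal sketch, assuming the scaling is linear as in fun_sc/fun_sc_back; back_axis() is a hypothetical helper (not part of ggplot2 or scales) and the data frame is made up for illustration.

```r
library(ggplot2)

fun_sc_back <- function(x) x * 187 + 266   # scaled value -> original units
fun_sc      <- function(x) (x - 266) / 187 # original units -> scaled value

# Hypothetical helper: pick breaks in original units, place them on the
# scaled axis, and label each position with its back-transformed value.
back_axis <- function(orig_breaks, to_scaled, to_orig) {
  list(breaks = to_scaled(orig_breaks),
       labels = function(b) to_orig(b))
}

ax <- back_axis(c(250, 500, 750), fun_sc, fun_sc_back)

# Hypothetical data that is already on the scaled axis
scaled_df <- data.frame(x = 1:10, y = fun_sc(seq(200, 800, length.out = 10)))

p <- ggplot(scaled_df, aes(x, y)) +
  geom_point() +
  scale_y_continuous(breaks = ax$breaks, labels = ax$labels)
```

Because both pieces are derived from one call, changing the requested original-unit breaks cannot leave the labels out of sync with the gridlines.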

Related

Define custom transformation of ggplot axis labels with trans_new function

I am working on percentage changes between periods and struggling with the logarithmic transformation of labels. Here is an example based on the storms dataset:
library(dplyr)
library(ggplot2)
library(scales)
df <- storms |>
group_by(year) |>
summarise(wind = mean(wind)) |>
mutate(lag = lag(wind, n = 1)) |>
mutate(perc = (wind / lag) - 1) |>
tidyr::drop_na()
I want to visualize the distribution of percentages, using log1p to make the percentage changes symmetric (a log difference).
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
[Figure: histogram with the x-axis in log1p units]
At this point I wanted to transform the x-axis labels back to the original percentage values.
I tried to create my own transformation with trans_new, and applied it to the labels in scale_x_continuous, but I can't make it work.
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p_trans(),
inverse = function(x)
expm1(x),
breaks = breaks_log(),
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(labels = trans_perc)
Currently, the result is:
Error in get_labels():
! breaks and labels are different lengths
Run rlang::last_error() to see where the error occurred.
Thanks!
EDIT
I am adding details on the different output I am getting from Alan's first answer:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
breaks = pretty_breaks(5),
format = percent_format(),
domain = c(-Inf, Inf)
)
library(ggpubr)
a <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5)
b <- ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
c <- ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
ggarrange(a, b, c,
ncol = 3,
labels = c("Log on Value only",
"Log on Value and X",
"Log on X only"))
Different outcomes (image): https://i.stack.imgur.com/dCW2m.png
If I understand you correctly, you want to keep the shape of the histogram but change the labels so that they reflect the values of the perc column rather than the transformed log1p(perc) values. If that is the case, there is no need for a transformer object: you can simply put the reverse transformation (plus formatting) as a function into the labels argument of scale_x_continuous.
ggplot(df, aes(x = log1p(perc))) +
geom_histogram(bins = 5) +
scale_x_continuous("Percentage Change",
breaks = log1p(pretty(df$perc, 5)),
labels = ~ percent(expm1(.x)))
Note that although the histogram remains symmetrical in shape, the axis labels represent the back-transformed values of the original axis labels.
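To see which numbers actually end up on the axis, the breaks and labels from the answer can be computed without drawing anything. A minimal sketch on a hypothetical perc vector (the storms data is not needed for this):

```r
library(scales)

# Hypothetical percentage changes as fractions (0.1 means +10%)
perc <- c(-0.25, -0.1, 0, 0.05, 0.2, 0.4)

orig_breaks <- pretty(perc, 5)            # "nice" values in original units
positions   <- log1p(orig_breaks)         # where they sit on the log1p axis
labels      <- percent(expm1(positions))  # back-transform, then format

data.frame(orig_breaks, positions, labels)
```

The positions are unevenly spaced (log1p compresses large values), while the labels read as ordinary percentages.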
The point of a transformer object is to do all this for you without having to pass a transformed data set (i.e. without having to pass log1p(perc)). So in your case, you could do:
trans_perc <- trans_new(
name = "trans_perc",
transform = log1p,
inverse = expm1,
format = percent_format(),
domain = c(-Inf, Inf)
)
ggplot(df, aes(x = perc)) +
geom_histogram(bins = 5) +
scale_x_continuous(trans = trans_perc)
This gives essentially the same result.
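Roughly speaking, a continuous scale with trans = trans_perc computes breaks in the original units, places them with the object's transform() and labels them with its format(). The sketch below calls those pieces by hand; the limits are hypothetical and this only approximates ggplot2's internal bookkeeping (it ignores axis expansion, for example). It probably also explains the original error: labels expects a vector or a function, whereas a transformation object is a list of functions, so its length need not match the number of breaks.

```r
library(scales)

trans_perc <- trans_new(
  name = "trans_perc",
  transform = log1p,
  inverse = expm1,
  format = percent_format(),
  domain = c(-Inf, Inf)
)

# Hypothetical data range, expressed on the transformed (log1p) axis
limits_t <- log1p(c(-0.2, 0.35))

breaks_orig <- trans_perc$breaks(trans_perc$inverse(limits_t)) # nice values, original units
positions   <- trans_perc$transform(breaks_orig)                # where the gridlines go
labels      <- trans_perc$format(breaks_orig)                   # text shown at each position

data.frame(breaks_orig, positions, labels)
```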

Use of inverse parameter in trans_new scales package

I've been trying to use the trans_new function from the scales package, but I can't get it to display the labels correctly.
library(ggplot2)
library(scales)
# percent to fold change
fun1 <- function(x) (x/100) + 1
# fold change to percent
inv_fun1 <- function(x) (x - 1) * 100
percent_to_fold_change_trans <- trans_new(name = "transform", transform = fun1, inverse = inv_fun1)
plot_data <- data.frame(x = 1:10,
y = inv_fun1(1:10))
# Plot raw data
p1 <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point()
# This doesn't really change the plot
p2 <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = percent_to_fold_change_trans)
p1 and p2 are identical, whereas I expected p2 to be a diagonal line, since we are reversing the inverting function. If I replace the inverse parameter in trans_new with another function (like function(x) x), I can see the correct transformation, but the labels are completely off. Any ideas on how to define the inverse parameter to get the right label positions?
You wouldn't expect a linear function like fun1 to change the appearance of the y axis. Remember, you are not transforming the data, you are transforming the y axis. This means that you are effectively changing the positions of the horizontal gridlines, but not the values they represent.
Any function that produces a linear transformation will result in fixed spacing between the horizontal grid lines, which is what you have already. The plot therefore won't change.
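This can be checked numerically: the gridlines are drawn at transform(break) and that range is then stretched to fill the panel, so an increasing linear transform leaves every relative position unchanged. A minimal sketch using fun1 from the question, with scales::rescale() standing in for the panel rescaling (an approximation of what the coordinate system does, not its actual code):

```r
library(scales)

fun1 <- function(x) (x / 100) + 1  # linear, increasing
sq   <- function(x) x^2            # non-linear, for contrast

breaks <- c(0, 200, 400, 600, 800) # evenly spaced y values

# Relative (0 to 1) panel positions of the gridlines under each transform
pos_identity <- rescale(breaks)
pos_linear   <- rescale(fun1(breaks))
pos_square   <- rescale(sq(breaks))

round(rbind(pos_identity, pos_linear, pos_square), 3)
```

The first two rows are identical (0, 0.25, 0.5, 0.75, 1); only the non-linear transform moves the gridlines.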
Let's take a simple example:
plot_data <- data.frame(x = 1:10, y = 1:10)
p <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(breaks = 1:10)
p
Now let's create a straightforward non-linear transformation:
little_trans <- trans_new(name = "transform",
transform = function(x) x^2,
inverse = function(x) sqrt(x))
p + coord_trans(y = little_trans)
Note that the values on the y axis are the same, but because we applied a non-linear transformation, the distances between the gridlines now vary.
In fact, if we plot a transformed version of our data, we would get the same shape:
ggplot(plot_data, aes(x = x, y = y^2)) +
geom_point() +
scale_y_continuous(breaks = (1:10)^2)
In a sense, this is all that the transform does, except it applies the inverse transform to the axis labels. We could do that manually here:
ggplot(plot_data, aes(x = x, y = y^2)) +
geom_point() +
scale_y_continuous(breaks = (1:10)^2, labels = sqrt((1:10)^2))
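The same two steps can be read straight off the transformation object: transform() gives the gridline positions and inverse() recovers the values printed at them. A short sketch with little_trans from above:

```r
library(scales)

little_trans <- trans_new(name = "transform",
                          transform = function(x) x^2,
                          inverse = function(x) sqrt(x))

breaks    <- 1:10
positions <- little_trans$transform(breaks)  # 1, 4, 9, ..., 100: uneven spacing
labels    <- little_trans$inverse(positions) # back to 1:10 for the axis text

diff(positions) # gaps grow: 3, 5, 7, ..., 19
```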
Now, suppose I instead do a more complicated but linear function of x:
little_trans <- trans_new(name = "transform",
transform = function(x) (0.1 * x + 20) / 3,
inverse = function(x) (x * 3 - 20) / 0.1)
ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = little_trans)
It's unchanged from before. We can see why if we again apply our transform directly:
ggplot(plot_data, aes(x = x, y = (0.1 * y + 20) / 3)) +
geom_point() +
scale_y_continuous(breaks = (0.1 * (1:10) + 20) / 3)
Obviously, if we do the inverse transform on the axis labels we will have 1:10, which means we will just have the original plot back.
The same holds true for any linear transform, so the results you are getting are exactly what is to be expected.

How to underline text in a plot title or label? (ggplot2)

Please pardon my ignorance if this is a simple question, but I can't seem to figure out how to underline any part of a plot title. I'm using ggplot2.
The best I could find was drawing the line by hand with annotate("segment"), and I have created a toy plot to illustrate the method.
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10)
rngx <- 0.5 * range(df$x)[2] # store mid-point of plot based on x-axis value
rngy <- 0.5 * range(df$y)[2] # stores mid-point of y-axis for use in ggplot
ggplot(df, aes(x = x, y = y)) +
geom_point() +
ggtitle("Oh how I wish for ..." ) +
ggplot2::annotate("text", x = rngx, y = max(df$y) + 1, label = "underlining!", color = "red") +
# create underline:
ggplot2::annotate("segment", x = rngx-0.8, xend = rngx + 0.8, y= 10.1, yend=10.1)
Related questions I found didn't solve it: one uses bquote(underline()) with base R, another pertains to lines over and under nodes on a graph, and a third uses plotmath and offers a workaround, but it didn't help.
Try this:
ggplot(df, aes(x = x, y = y)) + geom_point() +
ggtitle(expression(paste("Oh how I wish for ", underline(underlining))))
Alternatively, as BondedDust points out in the comments, you can avoid the paste() call entirely, but watch out for the word for: it is a reserved word in R, so it has to be quoted:
ggplot(df, aes(x = x, y = y)) + geom_point() +
ggtitle(expression(Oh~how~I~wish~'for'~underline(underlining)))
Or another, even shorter approach suggested by baptiste that doesn't use expression, paste(), or the many tildes:
ggplot(df, aes(x = x, y = y)) + geom_point() +
ggtitle(~"Oh how I wish for "*underline(underlining))
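The caveat about for comes from R's parser rather than from ggplot2: for is a reserved word, so it cannot appear bare inside a plotmath expression. A quick check with parse(), no plotting needed:

```r
# Unquoted `for` is a syntax error because it is a reserved word
bare <- tryCatch(parse(text = "Oh~how~I~wish~for~underline(underlining)"),
                 error = function(e) e)

# Quoting it turns it into an ordinary string inside the expression
quoted <- parse(text = "Oh~how~I~wish~'for'~underline(underlining)")

inherits(bare, "error") # TRUE
is.expression(quoted)   # TRUE
```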

Most succinct way to label/annotate extreme values with ggplot?

I'd like to annotate all y-values greater than a y-threshold using ggplot2.
When you plot(lm(y~x)), using the base package, the second graph that pops up automatically is Residuals vs Fitted, the third is qqplot, and the fourth is Scale-location. Each of these automatically label your extreme Y values by listing their corresponding X value as an adjacent annotation. I'm looking for something like this.
What's the best way to achieve this base-default behavior using ggplot2?
Update: uses scale_size_area() in place of the deprecated scale_area().
You might be able to take something from this to suit your needs.
library(ggplot2)
# Some random data (seeded so the example is reproducible)
set.seed(42)
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)
df.fortified = fortify(m1)
names(df.fortified) # Names for the variables containing residuals and derived quantities
# Select extreme values
df.fortified$extreme = ifelse(abs(df.fortified$`.stdresid`) > 1.5, 1, 0)
# Based on examples on page 173 in Wickham's ggplot2 book
plot = ggplot(data = df.fortified, aes(x = x, y = .stdresid)) +
geom_point() +
geom_text(data = df.fortified[df.fortified$extreme == 1, ],
aes(label = x, x = x, y = .stdresid), size = 3, hjust = -.3)
plot
plot1 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid)) +
geom_point() + geom_smooth(se = FALSE)
# map size only in geom_point() so the smoother is not affected by .cooksd
plot2 = ggplot(data = df.fortified, aes(x = .fitted, y = .resid)) +
geom_point(aes(size = .cooksd)) + scale_size_area("Cook's distance") + geom_smooth(se = FALSE)
library(gridExtra)
grid.arrange(plot1, plot2)
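For comparison, base R's plot.lm() does not use a fixed threshold: by default (id.n = 3) it labels the three points with the largest absolute residuals, using the observation names. A minimal ggplot2 sketch of that rule, computing standardized residuals with rstandard() instead of fortify(); the data are random, so which rows get labelled depends on the seed.

```r
library(ggplot2)

set.seed(1) # hypothetical random data, as in the answer above
df <- data.frame(x = round(runif(100), 2), y = round(runif(100), 2))
m1 <- lm(y ~ x, data = df)

# Standardized residuals straight from base R
diag_df <- data.frame(
  id = rownames(df),
  fitted = fitted(m1),
  stdresid = rstandard(m1)
)

# Mimic plot.lm(): flag the id.n = 3 largest absolute residuals
id_n <- 3
top <- order(abs(diag_df$stdresid), decreasing = TRUE)[seq_len(id_n)]
diag_df$extreme <- seq_len(nrow(diag_df)) %in% top

p <- ggplot(diag_df, aes(x = fitted, y = stdresid)) +
  geom_point() +
  geom_text(data = diag_df[diag_df$extreme, ],
            aes(label = id), hjust = -0.3, size = 3)
p
```

Swapping the order() line for a condition such as abs(stdresid) > 1.5 gives the threshold rule used in the answer instead.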

Using ggplot2 how can I plot points with an aes() after plotting lines?

I'm using ggplot2 to show lines and points on a plot. What I am trying to do is to have the lines all the same color, and then to show the points colored by an attribute. My code is as follows:
# Data frame
dfDemo <- structure(list(Y = c(0.906231077471568, 0.569073561538186,
0.0783433165521566, 0.724580209473378, 0.359136092118470, 0.871301974471722,
0.400628333618918, 1.41778205350433, 0.932081770977729, 0.198188442350644
), X = c(0.208755495088456, 0.147750173706688, 0.0205864576474412,
0.162635017485883, 0.118877260137735, 0.186538613831806, 0.137831912094464,
0.293293029083812, 0.219247919537514, 0.0323148791663826), Z = c(11112951L,
11713300L, 14331476L, 11539301L, 12233602L, 15764099L, 10191778L,
12070774L, 11836422L, 15148685L)), .Names = c("Y", "X", "Z"
), row.names = c(NA, 10L), class = "data.frame")
# Variables
X = array(0.1,100)
Y = seq(length=100, from=0, by=0.01)
# make data frame
dfAll <- data.frame()
# make data frames using loop
for (x in c(1:10)){
# spacemate calc
Floors = array(x,100)
# include label
Label = paste(' ', toString(x), sep="")
df1 <- data.frame(X = X * x, Y = Y, Label)
# merge df1 to cumulative df, dfAll
dfAll <- rbind(dfAll, df1)
}
# plot
pl <- ggplot(dfAll, aes(x = X, y = Y, group = Label, colour = 'Measures')) + geom_line()
# add points to plot
pl + geom_point(data=dfDemo, aes(x = X, y = Y)) + theme(legend.position = "none")
This almost works, but I am unable to color the points by Z when I do this. I can plot the points separately, colored by Z using the following code:
ggplot(dfDemo, aes(x = X, y = Y, colour = Z)) + geom_point()
However, if I use the similar code after plotting the lines:
pl + geom_point(data=dfDemo, aes(x = X, y = Y, colour = Z)) + theme(legend.position = "none")
I get the following error:
Error: Continuous variable () supplied to discrete scale_hue.
I don't understand how to add the points to the chart so that I can colour them by a value. I'd appreciate any suggestions on how to solve this.
The issue is that two colour scales collide: mapping the constant 'Measures' in the ggplot() call creates a discrete colour scale, and geom_point() then tries to put the continuous Z on that same scale. If you want the lines in one colour and the points in different colours, remove the colour mapping from the ggplot() call and set the colour in geom_line() outside aes(), so it is a fixed setting rather than a mapping. Wrapping it in I() is not needed once it is outside aes().
pl <- ggplot(dfAll, aes(x = X, y = Y)) +
geom_line(aes(group = Label), colour = "red") # fixed colour; group only the lines
pl + geom_point(data=dfDemo, aes(x = X, y = Y, colour = Z)) + # Z gets a continuous scale
theme(legend.position = "none") # theme() replaces the old opts()
HTH
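A compact, self-contained version of the fix on hypothetical data, checked with layer_data(): the lines keep one fixed colour and only the points receive a continuous colour scale. Here group is mapped only inside geom_line(), so the points layer does not need a Label column.

```r
library(ggplot2)

# Hypothetical data: ten vertical line groups plus a few points with numeric Z
lines_df <- data.frame(
  X = rep(0.1 * (1:10), each = 2),
  Y = rep(c(0, 1), times = 10),
  Label = rep(as.character(1:10), each = 2)
)
points_df <- data.frame(X = c(0.2, 0.5, 0.8), Y = c(0.3, 0.6, 0.9), Z = c(1, 5, 10))

p <- ggplot(lines_df, aes(x = X, y = Y)) +
  geom_line(aes(group = Label), colour = "grey40") + # set, not mapped: no scale
  geom_point(data = points_df, aes(colour = Z)) +    # only Z is mapped to colour
  theme(legend.position = "none")

line_cols  <- layer_data(p, 1)$colour # one fixed colour for every line
point_cols <- layer_data(p, 2)$colour # gradient colours from the Z scale
```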

Resources