I've been trying to use the function trans_new from the scales package, but I can't get it to display labels correctly.
library(ggplot2)
library(scales)
# percent to fold change
fun1 <- function(x) (x/100) + 1
# fold change to percent
inv_fun1 <- function(x) (x - 1) * 100
percent_to_fold_change_trans <- trans_new(name = "transform", transform = fun1, inverse = inv_fun1)
plot_data <- data.frame(x = 1:10,
y = inv_fun1(1:10))
# Plot raw data
p1 <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point()
# This doesn't really change the plot
p2 <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = percent_to_fold_change_trans)
p1 and p2 are identical, whereas I'm expecting p2 to be a diagonal line, since we are reversing the inverse function. If I replace the inverse parameter in trans_new with another function (like function(x) x), I can see the correct transformation, but the labels are completely off. Any ideas on how to define the inverse parameter to get the right label positions?
You wouldn't expect a linear function like fun1 to change the appearance of the y axis. Remember, you are not transforming the data, you are transforming the y axis. This means that you are effectively changing the positions of the horizontal gridlines, but not the values they represent.
Any function that produces a linear transformation will result in fixed spacing between the horizontal grid lines, which is what you have already. The plot therefore won't change.
Let's take a simple example:
plot_data <- data.frame(x = 1:10, y = 1:10)
p <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(breaks = 1:10)
p
Now let's create a straightforward non-linear transformation:
little_trans <- trans_new(name = "transform",
transform = function(x) x^2,
inverse = function(x) sqrt(x))
p + coord_trans(y = little_trans)
Note that the values on the y axis are the same, but because we applied a non-linear transformation, the distances between the gridlines now vary.
In fact, if we plot a transformed version of our data, we would get the same shape:
ggplot(plot_data, aes(x = x, y = y^2)) +
geom_point() +
scale_y_continuous(breaks = (1:10)^2)
In a sense, this is all that the transform does, except it applies the inverse transform to the axis labels. We could do that manually here:
ggplot(plot_data, aes(x = x, y = y^2)) +
geom_point() +
scale_y_continuous(breaks = (1:10)^2, labels = sqrt((1:10)^2))
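The bookkeeping described here can be mimicked in base R without any plotting. This is only a sketch of the idea, not of ggplot2's internals; fwd, inv, positions and labels are illustrative names.

```r
# Sketch: where the gridlines land vs. what they are labelled with
fwd <- function(x) x^2      # same transform as little_trans
inv <- function(x) sqrt(x)  # its inverse

breaks <- 1:10              # values in data units
positions <- fwd(breaks)    # where the gridlines are drawn
labels <- inv(positions)    # what is printed next to them

print(data.frame(positions, labels))
```

The positions are unevenly spaced while the labels are the original values, which is the behaviour seen with coord_trans.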
Now, suppose I instead do a more complicated but linear function of x:
little_trans <- trans_new(name = "transform",
transform = function(x) (0.1 * x + 20) / 3,
inverse = function(x) (x * 3 - 20) / 0.1)
ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = little_trans)
It's unchanged from before. We can see why if we again apply our transform directly:
ggplot(plot_data, aes(x = x, y = (0.1 * y + 20) / 3)) +
geom_point() +
scale_y_continuous(breaks = (0.1 * (1:10) + 20) / 3)
Obviously, if we apply the inverse transform to the axis labels, we get 1:10 back, which means we simply have the original plot again.
The same holds for any linear transform, so the results you are getting are exactly what is to be expected.
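To see this numerically, one can compare the relative positions of the breaks after each transform. The helper rescale01 below is a hypothetical stand-in for mapping transformed values onto the axis; it is not a ggplot2 function.

```r
# Hypothetical helper: map values onto a 0-1 axis
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

breaks <- 1:10
linear <- function(x) (0.1 * x + 20) / 3
square <- function(x) x^2

pos_identity <- rescale01(breaks)
pos_linear <- rescale01(linear(breaks))
pos_square <- rescale01(square(breaks))

# the linear transform leaves the relative positions unchanged
print(all.equal(pos_identity, pos_linear))
# the non-linear transform does not
print(round(pos_square, 3))
```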
Related
Coming from this example (pasted here)
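# assumed definitions of the helpers used below (they are not shown in the paste)
cube_root <- function(x) x^(1/3)
cube <- function(x) x^3
trans_cube <- trans_new(name = "cube root",
                        transform = cube_root,
                        inverse = cube)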
# dummy data
plot_data <- data.frame(x = 1:10,
y = cube(1:10))
# without applying a transform
ggplot(plot_data, aes(x = x, y = y)) +
geom_point()
# applying a transform
ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = trans_cube)
Why won't the simple transformation of just adding a constant (using scales::trans_new) work?
trans_add <- trans_new(name = "add",
                       transform = function(x) x + 200,
                       inverse = function(x) x - 200)
ggplot(plot_data, aes(x = x, y = y)) +
  geom_point() +
  coord_trans(y = trans_add)
The reason I'm asking is because I would like to back transform my y-axis and I need to do it using coord_trans or scale_y_continuous or similar.
UPDATE
I managed to show the back transformation with custom breaks by using the following, however I would like to know why the trans_new doesn't work.
fun_sc_back <- function(x) {x * 187 + 266}  # transform
fun_sc <- function(x) {(x - 266) / 187}     # inverse
p_tr +
  scale_y_continuous(breaks = fun_sc(c(250, 500, 750)), labels = fun_sc_back)
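As a quick sanity check on this workaround, the two helpers should undo each other, so the labels printed at the chosen breaks come back as the original values. A minimal base R check, reusing the functions from the update:

```r
fun_sc_back <- function(x) x * 187 + 266  # transform
fun_sc <- function(x) (x - 266) / 187     # inverse

original <- c(250, 500, 750)
breaks <- fun_sc(original)     # break positions on the scaled axis
labels <- fun_sc_back(breaks)  # labels computed from those positions

print(all.equal(labels, original))
```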
I'm trying to plot isoclines under a scatterplot using ggplot, but I can't figure out how to use stat_function correctly.
The isoclines are based on the distance formula:
sqrt((x1-x2)^2 + (y1-y2)^2)
and would look like concentric circles, except the center would be the origin of the plot.
What I've tried so far is calling the distance function within ggplot like so (Note: I use x1=1 and y1=1 because in my real problem I also have fixed values)
distance <- function(x, y) {sqrt((x - 1)^2 + (y - 1)^2)}
ggplot(my_data, aes(x, y))+
geom_point()+
stat_function(fun=distance)
but R returns the error:
Computation failed in 'stat_function()': argument "y" is missing, with
no default
How do I correctly feed x and y values to stat_function so that it plots a generic plot of the distance formula, with the center at the origin?
For anything a bit complicated, I avoid the stat functions. They are mostly aimed at quick calculations and are usually limited to calculating y based on x. I would just pre-calculate the data and then plot it with stat_contour instead:
library(ggplot2)
distance <- function(x, y) {sqrt((x - 1)^2 + (y - 1)^2)}
d <- expand.grid(x = seq(0, 2, 0.02), y = seq(0, 2, 0.02))
# distance() is vectorised, so it can be applied to the columns directly
d$dist <- distance(d$x, d$y)
ggplot(d, aes(x, y)) +
  geom_raster(aes(fill = dist), interpolate = TRUE) +
  stat_contour(aes(z = dist), col = 'white') +
  coord_fixed() +
  viridis::scale_fill_viridis(direction = -1)
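An alternative sketch, assuming only a few specific distances are needed: an isocline of the distance function is a circle around the fixed point, so its coordinates can be generated directly and drawn with geom_path. The helper circle_path and the chosen radii are illustrative and not part of the answer above.

```r
# Hypothetical helper: points on a circle of radius r around (cx, cy)
circle_path <- function(r, cx = 1, cy = 1, n = 200) {
  theta <- seq(0, 2 * pi, length.out = n)
  data.frame(r = r, x = cx + r * cos(theta), y = cy + r * sin(theta))
}
circles <- do.call(rbind, lapply(c(0.25, 0.5, 0.75), circle_path))

# every generated point lies at distance r from (1, 1)
distance <- function(x, y) sqrt((x - 1)^2 + (y - 1)^2)
print(all.equal(distance(circles$x, circles$y), circles$r))

# plotting is optional and only runs if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(circles, aes(x, y, group = r)) +
    geom_path() +
    coord_fixed()
}
```

To put the circles under an existing scatterplot, the same data frame could be passed to a geom_path layer via its data argument.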
I have a very simple question, but so far I couldn't find an easy solution. Let's say I have some data that I want to fit, and I want to show the x value at which y takes a particular value, for example the x value where y = 0. The model for fitting is very simple (y ~ x), but I don't know how to estimate the x value from it.
Here is some sample data:
library(ggplot2)
library(scales)
df = data.frame(x= sort(10^runif(8,-6,1),decreasing=TRUE), y = seq(-4,4,length.out = 8))
ggplot(df, aes(x = x, y = y)) +
geom_point() +
#geom_smooth(method = "lm", formula = y ~ x, size = 1,linetype="dashed", col="black",se=FALSE, fullrange = TRUE)+
geom_smooth(se=FALSE)+
labs(title = "Made-up data") +
scale_x_log10(breaks = c(1e-6,1e-4,1e-2,1),
labels = trans_format("log10", math_format(10^.x)),limits = c(1e-6,1))+
geom_hline(yintercept=0,linetype="dashed",colour="red",size=0.6)
I would like to convert input such as 1e-10 to the 10^-10 format and annotate it on the plot, as I indicated in the plot.
Thanks in advance!
Because geom_smooth() uses R functions to calculate the smooth line, you can obtain the predicted values outside the ggplot() environment. One option is then to use approx() to get a linear approximation of the x-value, given the predicted y-value 0.
# Fit the smoother
fit <- loess(y ~ x, df)
# Approximate the x value at which the fitted y would be 0
xval <- approx(x = fit$fitted, y = fit$x, xout = 0)$y
# Add to plot
ggplot(...) + annotate("text", x = xval, y = 0, label = xval)
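A self-contained sketch of the same idea on made-up, exactly linear data, where the crossing point is easy to verify by hand. The label helper at the end is one possible way to get the 10^-10 style asked for, assuming the label is later drawn with annotate(..., parse = TRUE); toy and pow10_label are hypothetical names.

```r
# toy data: y is exactly linear in x and crosses zero between x = 4 and x = 5
toy <- data.frame(x = 1:8, y = seq(-4, 4, length.out = 8))
fit <- loess(y ~ x, toy)

# inverse interpolation: the fitted y values play the role of "x" in approx()
x_at_zero <- approx(x = fit$fitted, y = toy$x, xout = 0)$y
print(x_at_zero)

# hypothetical helper: write a value as a plotmath-style power of ten
pow10_label <- function(v) paste0("10^", signif(log10(v), 3))
print(pow10_label(1e-10))
```

Note that approx() only works this way when the fitted values are monotone around the target; otherwise there may be several crossings.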
I'm analyzing a series that varies around zero. To see where parts of the series tend to be mostly positive or mostly negative, I'm plotting a geom_smooth. I was wondering if it is possible to make the color of the smooth line depend on whether it is above or below 0. Below is some code that produces a graph much like what I am trying to create.
set.seed(5)
r <- runif(22, max = 5, min = -5)
t <- rep(-5:5,2)
df <- data.frame(r+t,1:22)
colnames(df) <- c("x1","x2")
ggplot(df, aes(x = x2, y = x1)) + geom_hline(yintercept = 0) + geom_line() + geom_smooth()
I considered calculating the smoothed values, adding them to the df and then using a scale_color_gradient, but I was wondering if there is a way to achieve this in ggplot directly.
You may use the n argument in geom_smooth to increase the "number of points to evaluate smoother at" in order to create more y values close to zero. Then use ggplot_build to grab the smoothed values from the ggplot object. These values are used in a geom_line, which is added on top of the original plot. Finally, we overplot the y = 0 values with geom_hline.
# basic plot with a larger number of smoothed values
p <- ggplot(df, aes(x = x2, y = x1)) +
geom_line() +
geom_smooth(linetype = "blank", n = 10000)
# grab smoothed values
df2 <- ggplot_build(p)$data[[2]][, c("x", "y")]
# add smoothed values with conditional color
p +
geom_line(data = df2, aes(x = x, y = y, color = y > 0)) +
geom_hline(yintercept = 0)
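One caveat when mapping color to y > 0 is that geom_line groups the points by color, so separate runs of the same sign are joined to each other across the gaps. A possible refinement, sketched here in base R on stand-in values, is to number each run between sign changes and pass that number as the group aesthetic. The names smoothed and run are illustrative.

```r
# stand-in for the smoothed values grabbed from the plot
smoothed <- data.frame(x = 1:8, y = c(2, 1, -1, -2, -1, 1, 2, -1))
smoothed$positive <- smoothed$y > 0

# start a new run whenever the sign flips
smoothed$run <- cumsum(c(TRUE, diff(smoothed$positive) != 0))
print(smoothed)

# in the plot, something like:
# geom_line(data = smoothed, aes(x, y, color = positive, group = run))
```

With a dense grid such as n = 10000, the small gaps between consecutive runs are not visible.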
Something like this:
# loess data
res <- loess.smooth(df$x2, df$x1)
res <- data.frame(do.call(cbind, res))
res$posY <- ifelse(res$y >= 0, res$y, NA)
res$negY <- ifelse(res$y < 0, res$y, NA)
# plot
ggplot(df, aes(x = x2, y = x1)) +
geom_hline(yintercept = 0) +
geom_line() +
geom_line(data=res, aes(x = x, y = posY, col = "green")) +
geom_line(data=res, aes(x = x, y = negY, col = "red")) +
scale_color_identity()
I'm having serious problems trying to get my head around stat_function in R's ggplot2. I started off with this trivial example:
ggplot(data.frame(x = c(1, 1e4)), aes(x)) + stat_function(fun = function(x) x)
which works as expected. Unfortunately, when I add log scales for both x and y axes so:
ggplot(data.frame(x = 1:1e4), aes(x)) +
scale_x_log10() +
scale_y_log10() +
stat_function(fun = function(x) x)
I get the following result, which is a pretty nasty violation of the identity function.
Is there something very basic that I'm missing? What is then the correct and least hacky way to plot a function on log scale?
EDIT:
Inspired by the answers I went on and experimented with scales and the aesthetics parameter. I was even more puzzled to find out that I got what I expected using the code below:
ggplot(data.frame(x = 1:1e4, y = 1:1e4), aes(x, y)) +
scale_x_log10() +
scale_y_log10() +
stat_function(fun = function(x) x)
with an apparently unused vector of y values (unused by stat_function that is). Do the axis transformations depend on the availability of data?
When you use scale_x_log10(), the x values are log-transformed first and only then used to calculate the y values with stat_function(). The x values are then back-transformed to the original values to draw the scale, while the y values remain as calculated from the log-transformed x. You can check this by plotting the values without scale_y_log10(): the plot shows a straight line.
ggplot(data.frame(x=1:1e4), aes(x)) +
stat_function(fun = function(x) x) +
scale_x_log10()
If you apply scale_y_log10(), you log-transform the already calculated y values, so a curve is plotted.
In ggplot2, the rule is that scale transformation precedes statistical transformation which in turn precedes coordinate transformation. In this context, the function (via stat_function()) is the statistical transformation.
If you use a scale_x/y_*() function in a ggplot2 call, it will apply the scale transformation(s) first before computing the function.
Case 0: Plot in the original scales of x and y.
ggplot(data.frame(x = 1:1e4, y = 1:1e4), aes(x, y)) +
stat_function(fun = function(x) x)
Case 1a: Both x and y are log transformed before the function is computed because of the presence of scale_x/y_log10(). You can see this from the values on their respective scales (compare to Case 0).
ggplot(data.frame(x = 1:1e4, y = 1:1e4), aes(x, y)) +
stat_function(fun = function(x) x) +
scale_x_log10() +
scale_y_log10()
Case 1b: x is log transformed in the original data frame. Consequently, the function actually operates on the log10(x) values, so will still be a straight line, but on the log10 scale in both x and y.
ggplot(data.frame(x = log10(seq(1e4)), y = seq(1e4)), aes(x, y)) +
stat_function(fun = function(x) x)
Case 1c: The same as 1b, with one exception: the x-scale is in the original units but the y-scale is in log10(x) units, because the scale transformation on x occurs before the statistical transformation f(y) = y is computed, where y = log10(x).
ggplot(data.frame(x = seq(1e4), y = seq(1e4)), aes(x, y)) +
stat_function(fun = function(x) x) +
scale_x_log10()
Case 2: By contrast, coordinate transformations take place after statistical transformation; i.e., the function is computed in the original units first and then the coordinate transformation on x takes place, which warps the function:
ggplot(data.frame(x = seq(1e4), y = seq(1e4)), aes(x, y)) +
stat_function(fun = function(x) x) +
coord_trans(x = "log10")
...unless, of course, you apply the same transformation to both x and y:
ggplot(data.frame(x = seq(1e4), y = seq(1e4)), aes(x, y)) +
stat_function(fun = function(x) x) +
coord_trans(x = "log10", y = "log10")
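The difference between Case 1c and Case 2 comes down to which values the function receives. The base R sketch below mimics by hand the order of operations described in this answer; it does not call ggplot2, and the behaviour of stat_function itself may differ between ggplot2 versions.

```r
f <- function(x) x
x <- c(1, 10, 100, 1000, 10000)

# scale transformation first: the function is evaluated on log10(x)
y_scale <- f(log10(x))

# coordinate transformation last: the function is evaluated on x itself,
# and only the drawing positions are log-transformed afterwards
y_coord <- f(x)
x_pos_coord <- log10(x)

print(data.frame(x, y_scale, y_coord, x_pos_coord))
```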