I would like to plot a data frame's (X, Y) data together with a fitted function and the derivative of that fitted function.
fit <- lm(Y ~ poly(X, 32, raw=TRUE), data=data)
data$fitted_values <- predict(fit)
As far as I understand, this gives me a 32nd-degree polynomial function, fit, which I use to calculate the function values and store them in data$fitted_values. Plotting these series works like a charm with ggplot2:
ggplot(data, aes(x=X)) +
geom_line(aes(y = Y), colour="red") +
geom_line(aes(y = fitted_values), colour="blue")
So far so good. But I would also like to plot the first derivative of the fitted function fit; what I'm interested in is its gradient.
My Question: How can I get the derivative function of fit?
I assume I can then "predict" the absolute values for plotting afterwards. Correct?
First, I'll create some test data that "kind of" looks like yours:
set.seed(15)
rr<-density(faithful$eruptions)
dd<-data.frame(x=rr$x)
dd$y <- rr$y + runif(8, 0, .05)
fit <- lm(y ~ poly(x,32,raw=TRUE), dd)
dd$fitted <- fitted(fit)
ggplot(dd, aes(x=x)) +
geom_line(aes(y = y), colour="red") +
geom_line(aes(y = fitted), colour="blue")
Then, because you have a raw polynomial, we can easily calculate the derivative by multiplying each coefficient by its power and shifting all the terms down one degree. Here's a helper function to calculate the new coefficients:
deriv_coef <- function(x) {
  x <- coef(x)
  stopifnot(names(x)[1] == "(Intercept)")
  y <- x[-1]                      # drop the intercept; it vanishes on differentiation
  stopifnot(all(grepl("^poly", names(y))))
  # extract each term's power from names like "poly(x, 32, raw = TRUE)5"
  px <- as.numeric(gsub("poly\\(.*\\)", "", names(y)))
  rr <- setNames(c(y * px, 0), names(x))   # multiply by the power, shift terms down
  rr[is.na(rr)] <- 0              # rank-deficient fits produce NA coefficients
  rr
}
which we can use like this (model.matrix(fit) returns the raw polynomial basis 1, x, x^2, ..., so the matrix product evaluates the derivative polynomial at each x):
dd$slope <- model.matrix(fit) %*% matrix(deriv_coef(fit), ncol=1)
And now I can plot
ggplot(dd, aes(x=x)) +
geom_line(aes(y = y), colour="red") +
geom_line(aes(y = fitted), colour="blue") +
geom_line(aes(y = slope), colour="green")
and we can see that the local maxima and minima of the fitted curve correspond to places where its derivative crosses zero.
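If you want those locations numerically rather than just visually, here is a minimal sketch (my addition, using the dd$slope column computed above) that finds where the derivative changes sign:
# Indices where the derivative changes sign, i.e. the local extrema of the fit
zero_cross <- which(diff(sign(dd$slope)) != 0)
dd$x[zero_cross]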
You can approximate the derivative by first sorting the data with respect to X, then finding the differences between each pair of consecutive values.
data <- d[order(d$X), ]
data$derivative <- c(diff(data$fitted_values) / diff(data$X), NA)
(Note how I added an NA at the end, since taking differences makes the vector one element shorter.) Afterwards you can plot this:
ggplot(data, aes(X, derivative)) + geom_line()
Allegedly the quantchem package can do it with the derivative function.
Description
Calculate derivative of polynomial for given x.
Usage
derivative(obj, x)
Arguments
obj: an object of class 'lm', fitted as y ~ x + I(x^2) + I(x^3) + ...
x: a vector of x values
Examples
x = 1:10
y = jitter(x + x^2)
fit = lm(y~x+I(x^2))
derivative(fit,1:10)
Note: All of this said, it didn't work for me and my data.
> DF
  x.values  y.values
1        0 1.0000000
2        2 0.5443318
3        4 0.4098546
4        6 0.3499007
ggplot(DF, aes(x=x.values, y=y.values)) +
geom_point() +
geom_smooth(se=FALSE, method = "glm", formula= y ~ poly(x,2))
This gives me a polynomial fit to the data.
From the plot I can visually estimate the interpolated x.value for y.value = 0.5 to be ~2.5-2.6. However, when I estimate the interpolated x.value outside of ggplot, I get a value of 2.78.
M <- glm(formula = y.values ~ poly(x.values,2), data = DF)
t0.5 <- approx(x = M$fitted, y = DF$x.values, xout=0.50)$y
t0.5
[1] 2.780246
Can anyone please explain this discrepancy?
The model is predicting y.values from x.values, so the fitted values of the model are y.values, not x.values. Thus, the code should be t0.5 <- approx(x = DF$x.values, y = fitted(M), xout=0.50)$y. After making this change, you can see that linear interpolation and model prediction are what one would expect by visual inspection of the plot.
p = ggplot(DF, aes(x=x.values, y=y.values)) +
geom_point() +
geom_smooth(se=FALSE, method = "glm", formula= y ~ poly(x,2))
M <- glm(formula = y.values ~ poly(x.values,2), data = DF)
# linear interpolation of fitted values at x.values=0.5
t0.5 <- approx(x = DF$x.values, y = fitted(M), xout=0.50)$y
# glm model prediction at x.values=0.5
predy = predict(M, newdata=data.frame(x.values=0.5))
# Data frame with linear interpolation of predictions along the full range of x.values
interp.fit = as.data.frame(approx(x=DF$x.values, y=fitted(M),
xout=seq(min(DF$x.values), max(DF$x.values),length=100)))
p +
geom_line(data=interp.fit, aes(x,y), colour="red", size=0.7) +
annotate(x=0.5, y=t0.5, geom="point", shape=3, colour="red", size=4) +
annotate(x=0.5, y=predy, geom="point", shape=16, colour="purple", size=4)
In response to the comment: To calculate x at any given y, you could use the quadratic formula. The regression equation is:
y = a*x^2 + b*x + c
where a, b, and c are the regression coefficients, in reverse order relative to coef(M). (Note that coef(M) returns these raw coefficients only if the model is fit with poly(x, 2, raw = TRUE); by default poly() uses orthogonal polynomials, whose coefficients are not a, b, and c.) Rearranging for a given y:
0 = a*x^2 + b*x + (c - y)
Now just apply the quadratic formula to get the two values of x for any given value of y (where y is constrained to be in the range of the regression function), noting that the c coefficient in the standard quadratic formula is here replaced by c - y.
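Here is a minimal sketch of that calculation (the raw-coefficient refit and the solve_x helper are my additions, not part of the original answer):
# Refit with raw polynomial terms so that coef() returns c, b, a directly
M_raw <- glm(y.values ~ poly(x.values, 2, raw = TRUE), data = DF)
# Solve a*x^2 + b*x + (c - y0) = 0 for x via the quadratic formula
solve_x <- function(model, y0) {
  cc <- coef(model)                            # order: c (intercept), b, a
  a <- cc[3]; b <- cc[2]; c0 <- cc[1] - y0
  disc <- b^2 - 4 * a * c0
  if (disc < 0) return(c(NA_real_, NA_real_))  # y0 is never attained by the parabola
  (-b + c(1, -1) * sqrt(disc)) / (2 * a)
}
solve_x(M_raw, 0.5)   # the two x values where the fitted curve equals 0.5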
I'm trying to plot isoclines under a scatterplot using ggplot, but I can't figure out how to use stat_function correctly.
The isoclines are based on the distance formula:
sqrt((x1-x2)^2 + (y1-y2)^2)
and would look like concentric circles, except the center would be the origin of the plot.
What I've tried so far is calling the distance function within ggplot like so (note: I use x1 = 1 and y1 = 1 because in my real problem I also have fixed values):
distance <- function(x, y) {sqrt((x - 1)^2 + (y - 1)^2)}
ggplot(my_data, aes(x, y)) +
  geom_point() +
  stat_function(fun = distance)
but R returns the error:
Computation failed in 'stat_function()': argument "y" is missing, with no default
How do I correctly feed x and y values to stat_function so that it plots a generic plot of the distance formula, with the center at the origin?
For anything a bit complicated, I avoid the stat_* functions; they are mostly aimed at quick calculations and are usually limited to calculating y based on x. I would just pre-calculate the data and then plot it with stat_contour instead:
distance <- function(x, y) {sqrt((x - 1)^2 + (y - 1)^2)}
d <- expand.grid(x = seq(0, 2, 0.02), y = seq(0, 2, 0.02))
d$dist <- mapply(distance, x = d$x, y = d$y)
ggplot(d, aes(x, y)) +
geom_raster(aes(fill = dist), interpolate = TRUE) +
stat_contour(aes(z = dist), col = 'white') +
coord_fixed() +
viridis::scale_fill_viridis(direction = -1)
I have a very simple question, but so far I couldn't find an easy solution. Let's say I have some data that I want to fit, and I want to show the x-axis value at which y takes a particular value; in this case, when y = 0, what is the x value? The model for the fitting is simply y ~ x, but I don't know how to estimate the x value from it. Anyway, here is some sample data:
library(ggplot2)
library(scales)
df = data.frame(x= sort(10^runif(8,-6,1),decreasing=TRUE), y = seq(-4,4,length.out = 8))
ggplot(df, aes(x = x, y = y)) +
geom_point() +
#geom_smooth(method = "lm", formula = y ~ x, size = 1,linetype="dashed", col="black",se=FALSE, fullrange = TRUE)+
geom_smooth(se=FALSE)+
labs(title = "Made-up data") +
scale_x_log10(breaks = c(1e-6,1e-4,1e-2,1),
labels = trans_format("log10", math_format(10^.x)),limits = c(1e-6,1))+
geom_hline(yintercept=0,linetype="dashed",colour="red",size=0.6)
I would also like to convert a value like 1e-10 into 10^-10 format and annotate it on the plot, as I indicated in the plot. Thanks in advance!
Because geom_smooth() uses ordinary R functions to calculate the smooth line, you can obtain the predicted values outside the ggplot() environment. One option is then to use approx() to get a linear approximation of the x-value at which the predicted y-value is 0.
# Fit the same loess smoother that geom_smooth() uses by default
fit <- loess(y ~ x, df)
# Linearly interpolate the x value at which the fitted y crosses 0
xval <- approx(x = fit$fitted, y = fit$x, xout = 0)$y
# Add to the plot; parse = TRUE renders the label in 10^k form
ggplot(...) + annotate("text", x = xval, y = 0,
  label = paste0("10^", round(log10(xval), 2)), parse = TRUE)
I'm having serious problems trying to get my head around stat_function in R's ggplot2. I started off with this trivial example:
ggplot(data.frame(x = c(1, 1e4)), aes(x)) + stat_function(fun = function(x) x)
which works as expected. Unfortunately, when I add log scales for both x and y axes so:
ggplot(data.frame(x = 1:1e4), aes(x)) +
scale_x_log10() +
scale_y_log10() +
stat_function(fun = function(x) x)
I get a curved line instead, which is a pretty nasty violation of the identity function.
Is there something very basic that I'm missing? What is then the correct and least hacky way to plot a function on log scale?
EDIT:
Inspired by the answers I went on and experimented with scales and the aesthetics parameter. I was even more puzzled to find out that I got what I expected using the code below:
ggplot(data.frame(x = 1:1e4, y = 1:1e4), aes(x, y)) +
scale_x_log10() +
scale_y_log10() +
stat_function(fun = function(x) x)
with an apparently unused vector of y values (unused by stat_function that is). Do the axis transformations depend on the availability of data?
When you use scale_x_log10(), the x values are log-transformed first and only then used to calculate the y values with stat_function(). The x values are then back-transformed to the original units to draw the scale, while the y values remain as calculated from the log-transformed x. You can check this by plotting without scale_y_log10(): the plot shows a straight line.
ggplot(data.frame(x=1:1e4), aes(x)) +
stat_function(fun = function(x) x) +
scale_x_log10()
If you apply scale_y_log10() as well, you log-transform the already calculated y values, so a curve is plotted.
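To see this concretely, here is a small sketch (my addition, not from the original answer) that inspects the values the stat actually computes, via ggplot_build():
library(ggplot2)
p <- ggplot(data.frame(x = 1:1e4), aes(x)) +
  stat_function(fun = function(x) x) +
  scale_x_log10()
# In the built layer data, x runs in log10 units (0 to 4) and y equals x,
# confirming that the function was evaluated on the transformed values
head(ggplot_build(p)$data[[1]][, c("x", "y")])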
In ggplot2, the rule is that scale transformation precedes statistical transformation which in turn precedes coordinate transformation. In this context, the function (via stat_function()) is the statistical transformation.
If you use a scale_x/y_*() function in a ggplot2 call, it will apply the scale transformation(s) first before computing the function.
Case 0: Plot in the original scales of x and y.
ggplot(data.frame(x = 1:1e4, y = 1:1e4), aes(x, y)) +
stat_function(fun = function(x) x)
Case 1a: Both x and y are log transformed before the function is computed because of the presence of scale_x/y_log10(). You can see this from the values on their respective scales (compare to Case 0).
ggplot(data.frame(x = 1:1e4, y = 1:1e4), aes(x, y)) +
stat_function(fun = function(x) x) +
scale_x_log10() +
scale_y_log10()
Case 1b: x is log transformed in the original data frame. Consequently, the function actually operates on the log10(x) values, so will still be a straight line, but on the log10 scale in both x and y.
ggplot(data.frame(x = log10(seq(1e4)), y = seq(1e4)), aes(x, y)) +
stat_function(fun = function(x) x)
Case 1c: The same as 1b, with one exception: the x-scale is in the original units but the y-scale is in log10(x) units, because the scale transformation on x occurs before the statistical transformation f(y) = y is computed, where y = log10(x).
ggplot(data.frame(x = seq(1e4), y = seq(1e4)), aes(x, y)) +
stat_function(fun = function(x) x) +
scale_x_log10()
Case 2: By contrast, coordinate transformations take place after statistical transformation; i.e., the function is computed in the original units first and then the coordinate transformation on x takes place, which warps the function:
ggplot(data.frame(x = seq(1e4), y = seq(1e4)), aes(x, y)) +
stat_function(fun = function(x) x) +
coord_trans(x = "log10")
...unless, of course, you apply the same transformation to both x and y:
ggplot(data.frame(x = seq(1e4), y = seq(1e4)), aes(x, y)) +
stat_function(fun = function(x) x) +
coord_trans(x = "log10", y = "log10")
I have a bivariate data set:
set.seed(45)
require(mvtnorm)
sigma <- matrix(c(3,2,2,3), ncol=2)
df <- as.data.frame(rmvnorm(100, sigma=sigma))
names(df) <- c("u", "v")
Setting up v as the dependent variable, with ggplot I can easily show the "usual" least-squares regression of v on u:
require(ggplot2)
qplot(u, v, data=df) + geom_smooth(aes(u, v), method="lm", se=FALSE)
... but I'd also like to show the least-squares regression of u on v (at the same time).
This is how I naively tried to do it, by passing a different aes to geom_smooth:
last_plot() + geom_smooth(aes(v, u), method="lm", color="red", se=FALSE)
Of course, that doesn't quite work. The second geom_smooth shows the inverse of the proper line (I think). I'm expecting it to have a steeper slope than the first line.
Moreover, the confidence intervals are wrongly shaped. I don't particularly care about those, but I do think they might be a clue.
Am I asking for something that can't easily be done with ggplot2?
EDIT: Here is a bit more, showing the lines I expect:
# (1) Least-squares regression of v on u
mod <- lm(v ~ u, data=df)
v_intercept <- coef(mod)[1]
v_slope <- coef(mod)[2]
last_plot() + geom_abline(
intercept = v_intercept,
slope = v_slope,
color = "blue",
linetype = 2
)
# (2) Least-squares regression of u on v
mod2 <- lm(u ~ v, data=df)
u_intercept <- coef(mod2)[1]
u_slope <- coef(mod2)[2]
# NOTE: we have to solve for the v-intercept and invert the slope
# because we're still in the original (u, v) coordinate frame
last_plot() + geom_abline(
intercept = - u_intercept / u_slope,
slope = 1 / u_slope,
color = "red",
linetype = 2
)
ggplot(df) +
  geom_smooth(aes(u, v), method='lm') +
  geom_smooth(aes(v, u), method='lm', colour="red")
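Note that this last snippet repeats the swapped-aesthetics approach from the question, so the red line is drawn in the (v, u) frame rather than the (u, v) frame. As an alternative sketch, assuming a newer ggplot2 (>= 3.3.0, where geom_smooth() gained an orientation argument), orientation = "y" regresses x on y and draws the u-on-v line in the original frame:
library(ggplot2)
ggplot(df, aes(u, v)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +                 # v on u (blue)
  geom_smooth(method = "lm", se = FALSE, colour = "red",
              orientation = "y")                           # u on v, in (u, v) space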