Smooth interpolation of my data - r

I'm trying to create smooth lines on a plot that include the maxima of the data points. I've searched around a lot and have tinkered with loess() and ksmooth(), but I have yet to make it work.
My best attempt so far has been with ksmooth(), but the line doesn't pass through the maximum data point.
I'm a chemist, not a statistician, so the methods/descriptions of various smoothing techniques often go over my head. Any suggestions would be really appreciated.
Edit: Just wanted to make a few things clearer. Basically what I'm after is a smoothed version of the following plot with the line passing through the maximum y value.
To generate the plot in the first picture I used the following code:
plot(ChiM~Temp, xlim=c(2,6), ylim=c(0,0.225), lwd=2, pch=16, col='red',subset=(v=='20'), main='Out-of-Phase AC Susceptibility Plot', xlab='Temperature (K)', ylab=expression(chi[M]*'" (cm'^3*~'mol'^-1*')'))
setone <- subset(DSM32ac, v=='20') #v=20 is the subset of the data I have provided
attach(setone)
lines(ksmooth(Temp, ChiM, 'normal', bandwidth=0.5), col='red',lwd=2)
I hope this makes things a little clearer. If you need any more information to answer this question just let me know.
Edit 2: I've removed the data since I can't make a neat table. If it's really important I'll try and put it back in.

Try this:
y <- c(.07, .12, .17, .11, .04, .02, .01)
x <- seq_along(y)
s <- spline(x, y)
plot(y ~ x)
lines(s)
This gives a curve that passes through every data point, including the maximum.
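To check that an interpolating spline really does hit the largest observation, here is a minimal sketch that reuses the example data above. It swaps spline() for splinefun(), which returns the same kind of cubic interpolant as a function; the names f, i_max and xx are just illustrative.

```r
# Example data from the answer above
y <- c(.07, .12, .17, .11, .04, .02, .01)
x <- seq_along(y)

# splinefun() returns a function that evaluates the interpolating cubic spline
f <- splinefun(x, y)

# The interpolant reproduces every observation, including the largest y value
i_max <- which.max(y)
f(x[i_max]) - max(y)   # zero up to floating-point error

# Evaluate on a fine grid to draw a smooth-looking line
xx <- seq(min(x), max(x), length.out = 200)
plot(x, y, pch = 16, col = "red")
lines(xx, f(xx), col = "red", lwd = 2)
```

Note that an interpolating spline can overshoot between points, so the highest point of the drawn curve may sit slightly beside the largest observation rather than exactly on it.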

You can try this:
n <- 10 # generate 10 data points
d <- data.frame(x = 1:n, y = rnorm(n))
# with loess smoothing (span parameter controls the degree of smoothing)
library(ggplot2)
ggplot() + geom_point(data=d,aes(x,y), size=5) +
geom_smooth(data=d,aes(x,y, colour='span=0.5'), span=0.5, se=FALSE) +
geom_smooth(data=d,aes(x,y, colour='span=0.6'), span=0.6, se=FALSE) +
geom_smooth(data=d,aes(x,y, colour='span=0.7'), span=0.7, se=FALSE) +
geom_smooth(data=d,aes(x,y, colour='span=0.8'), span=0.8, se=FALSE)
# with B-spline curves using lm (the second argument of bs() is df, the degrees of freedom of the cubic basis; larger df gives a more flexible curve)
ggplot() +
  geom_point(data=d, aes(x, y), size=5) +
  geom_smooth(data=d, aes(x, y,col='df=3'), method = "lm", formula = y ~ splines::bs(x, 3), se = FALSE) +
  geom_smooth(data=d, aes(x, y,col='df=4'), method = "lm", formula = y ~ splines::bs(x, 4), se = FALSE) +
  geom_smooth(data=d, aes(x, y,col='df=5'), method = "lm", formula = y ~ splines::bs(x, 5), se = FALSE) +
  geom_smooth(data=d, aes(x, y,col='df=6'), method = "lm", formula = y ~ splines::bs(x, 6), se = FALSE) +
  geom_smooth(data=d, aes(x, y,col='df=7'), method = "lm", formula = y ~ splines::bs(x, 7), se = FALSE)
# with smooth.spline (the spar parameter controls the smoothness of the fitted curve)
colors <- rainbow(100)
plot(d$x, d$y, pch=19, xlab='x', ylab='y')
i <- 1
for (spar in seq(0.001,1,length=100)) {
  # pass the fitted object itself so the curve is drawn against its own x values
  lines(smooth.spline(d$x, d$y, spar=spar, all.knots=TRUE), col=colors[i])
  i <- i + 1
}
points(d$x, d$y, pch=19)
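The smoothers above trade closeness to the data for smoothness, which is why they generally miss the maximum. A rough sketch of that trade-off for smooth.spline(), using the seven-point example data from the first answer (the helper names rss and gap are made up for this illustration):

```r
y <- c(.07, .12, .17, .11, .04, .02, .01)
x <- seq_along(y)

# Residual sum of squares of a smoothing spline for a given spar
rss <- function(spar) {
  fit <- smooth.spline(x, y, spar = spar, all.knots = TRUE)
  sum((predict(fit, x)$y - y)^2)
}

# Distance between the largest observation and the fitted value at that x
gap <- function(spar) {
  fit <- smooth.spline(x, y, spar = spar, all.knots = TRUE)
  max(y) - predict(fit, x[which.max(y)])$y
}

rss_tight <- rss(0.001)  # very little smoothing: close to interpolation
rss_loose <- rss(1)      # heavy smoothing: approaches a straight line
c(tight = rss_tight, loose = rss_loose)
c(tight = gap(0.001), loose = gap(1))
```

If the line has to go through the maximum, a small spar (or an interpolating spline) is the safer choice; larger values smooth the peak away.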

Related

How to fit non-linear function to data in ggplot2 using maximum likelihood model in R?

The data set (x.test, y.test) is an exponential fit. I'm trying to fit a custom non-linear function and attached is the code. The regular points plot just fine but I'm unable to get the fit line to work. Any suggestions?
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
library(ggpmisc)
my.formula <- y ~ lambda/ (1 + aii*x)
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",formula = y.test ~ lambda/ (1 + aii*x.test), method.args=list(start=c(lambda=1000,aii=-816.39)),se=F,color="red") +
geom_smooth(method="lm", formula = my.formula , col = "red") + stat_poly_eq(formula = my.formula, aes(label = stringr::str_wrap(paste(..eq.label.., ..rr.label.., sep = "~~~"))), parse = TRUE, size = 2.5, col = "red") + stat_function(fun=function (x.test){
y.test ~ lambda/ (1 + aii*x.test)}, color = "blue")
A few things:
you need to use y and x as the variable names in the formula argument to geom_smooth, regardless of what the names are in your data set
you need better starting values (see below)
there's a GLM trick you can use to fit this model; doesn't always work (can be numerically unstable), but it doesn't need starting values and will work more often than nls()
I don't think lm() and stat_poly_eq() are going to work as expected (or maybe at all) with a nonlinear formula ...
simulate data
(same as your code but using set.seed() - probably not important here but good practice)
set.seed(101)
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
attempt nls fit with your starting values
It's usually a good idea to troubleshoot by fitting any smoothing terms outside of ggplot2, so you have fewer layers to dig through to find the problems:
nls(y.test ~ lambda/(1+ aii*x.test),
start = list(lambda=1000,aii=-816.39),
data = df)
Error in nls(y.test ~ lambda/(1 + aii * x.test), start = list(lambda = 1000, :
singular gradient
OK, still doesn't work. Let's use glm() to get better starting values: we use an inverse-link GLM:
1/y = b0 + b1*x
y = 1/(b0 + b1*x)
= (1/b0)/(1 + (b1/b0)*x)
So:
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"))
s0 <- with(as.list(coef(g1)), list(lambda = 1/`(Intercept)`, aii = x.test/`(Intercept)`))
This gives lambda = -0.09, aii = -0.638 (with a little bit more work we could probably also figure out how to eyeball these by looking at the starting point and scale of the curve).
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",
formula = y ~ lambda/ (1 + aii*x),
method.args=list(start=s0),
se=FALSE,color="red") +
stat_smooth(method = "glm",
formula = y ~ x,
method.args = list(family = gaussian(link = "inverse")),
color = "blue", linetype = 2)
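The reparameterisation used above (lambda = 1/b0, aii = b1/b0) can be checked outside ggplot2. The sketch below uses synthetic data that actually follow the model, with assumed true values lambda = 2 and aii = 0.5; the names dd, g, b and fit are illustrative only.

```r
set.seed(101)
# Synthetic data following y = lambda / (1 + aii * x) with lambda = 2, aii = 0.5
x <- runif(50, 2, 8)
y <- 2 / (1 + 0.5 * x) + rnorm(50, sd = 0.002)
dd <- data.frame(x, y)

# Inverse-link Gaussian GLM: 1 / E[y] = b0 + b1 * x
g <- glm(y ~ x, family = gaussian(link = "inverse"), data = dd)
b <- coef(g)

# Convert to the nls parameterisation: lambda = 1/b0, aii = b1/b0
s0 <- list(lambda = 1 / b[["(Intercept)"]],
           aii    = b[["x"]] / b[["(Intercept)"]])

# Both parameterisations describe the same fitted curve
glm_curve <- unname(fitted(g))
nls_curve <- s0$lambda / (1 + s0$aii * dd$x)
all.equal(glm_curve, nls_curve)

# Use the converted values as starting values for nls()
fit <- nls(y ~ lambda / (1 + aii * x), data = dd, start = s0)
coef(fit)
```

Because the Gaussian GLM and nls() both minimise the same sum of squares on the y scale, nls() should barely move from these starting values when the GLM has converged.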

How can I add confidence intervals to a scatterplot for a regression on two variables?

I need to create an insightful graphic with a regression line, data points, and confidence intervals. I am not looking for smoothed lines. I have tried multiple codes, but I just can't get it right.
I am looking for something like this:
Some codes I have tried:
p <- scatterplot(df.regsoft$w ~ df.regsoft$b,
data = df.regsoft,
boxplots = FALSE,
regLine = list(method=lm, col="red"),
pch = 16,
cex = 0.7,
xlab = "Fitted Values",
ylab = "Residuals",
legend = TRUE,
smooth = FALSE)
abline(coef = confint.lm(result.rs))
But this doesn't create what I want to create, however it is closest to what I intended. Notice that I took out "smooth" since this is not really what I am looking for.
How can I make this plot interactive?
If you don't mind switching to ggplot and the tidyverse, then this is simply a geom_smooth(method = "lm"):
library(tidyverse)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
d %>%
ggplot() +
aes(x, y) + #what to plot
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
Without method = "lm", it draws a smoothed line.
As for the Conf. interval (Obs 95%) lines, it seems to me that's simply a quantile regression. In that case, you can use the quantreg package.
If you want to make it interactive, you can use the plotly package:
library(plotly)
p <- d %>%
ggplot() +
aes(x, y) +
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
ggplotly(p)
================================================
P.S.
I am not completely sure this is what the figure you posted is showing (I guess so), but to add the quantile lines, I would just perform two quantile regressions (upper and lower) and then calculate the values of the quantile lines for your data:
library(tidyverse)
library(quantreg)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
# 95% quantile, two tailed
rq_low <- rq(y ~ x, data = d, tau = 0.025) #lower quantile
rq_high <- rq(y ~ x, data = d, tau = 0.975) #upper quantile
d %>%
mutate(low = rq_low$coefficients[1] + x * rq_low$coefficients[2],
high = rq_high$coefficients[1] + x * rq_high$coefficients[2]) %>%
ggplot() +
geom_point(aes(x, y)) +
geom_smooth(aes(x, y), method = "lm") +
geom_line(aes(x, low), linetype = "dashed") +
geom_line(aes(x, high), linetype = "dashed") +
theme_bw()
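Another possible reading of the "Obs 95%" lines is a prediction interval for new observations rather than a quantile regression. That is an assumption about the original figure, not something it confirms. A minimal base R sketch with predict() on an lm fit (grid, conf_int and pred_int are illustrative names):

```r
set.seed(1)
d <- data.frame(x = rnorm(100, 0, 1))
d$y <- 0.25 * d$x + rnorm(100, 0, 0.25)
m <- lm(y ~ x, data = d)

# Evaluate both intervals on a regular grid of x values
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 50))
conf_int <- predict(m, newdata = grid, interval = "confidence", level = 0.95)
pred_int <- predict(m, newdata = grid, interval = "prediction", level = 0.95)

# Points, regression line, confidence band (dashed), prediction band (dotted)
plot(d$x, d$y, pch = 16, cex = 0.7, xlab = "x", ylab = "y")
lines(grid$x, conf_int[, "fit"], col = "red")
lines(grid$x, conf_int[, "lwr"], lty = 2)
lines(grid$x, conf_int[, "upr"], lty = 2)
lines(grid$x, pred_int[, "lwr"], lty = 3)
lines(grid$x, pred_int[, "upr"], lty = 3)
```

The confidence band describes uncertainty in the fitted mean, while the prediction band also includes the residual scatter, so it is always the wider of the two.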

Geom_smooth - linear regression through x-axis intercept

I would like to force a linear regression through a specific x-axis crossing point using "geom_smooth" in ggplot2:
geom_smooth(aes(x = x, y = y), method = "lm", formula = y ~ x)
Intuitively, choosing an x-axis intercept, one would use the formula y = a * (x - b) + c.
Implementing this in the "formula" code as e.g. :
geom_smooth(aes(x = x, y = y), method = "lm", formula = y ~ x - 5)
Does not work.
I am not sure it is possible to do this just using geom_smooth. However, you could predict the regression outside of your ggplot2 call, using an offset to set the intercept required and plot it subsequently.
For example:
set.seed(1)
# Generate some data
x <- 1:10
y <- 3 + 2*x + rnorm(length(x), 0, 2)
# Simple regression
z_1 <- lm(y ~ x)
# Regression with no intercept
z_2 <- lm(y ~ x + 0)
# Regression with intercept at (0,3) - the 'true' intercept
z_3 <- lm(y ~ x + 0, offset=rep(3, length(x)))
# See the coefficients
coef(z_1)
#(Intercept) x
# 2.662353 2.109464
coef(z_2)
# x
#2.4898
coef(z_3)
# x
#2.06123
# Combine into one dataframe
df <- cbind.data.frame(x,y,predict(z_1),predict(z_2), predict(z_3))
# Plot the three regression lines
library(ggplot2)
ggplot(df) + geom_point(aes(x,y)) +
geom_line(aes(x,predict(z_1)), color = "red") +
geom_line(aes(x,predict(z_2)), color = "blue") +
geom_line(aes(x,predict(z_3)), color = "green") +
scale_x_continuous(limits = c(0,10)) +
scale_y_continuous(limits = c(0,30))
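As for why formula = y ~ x - 5 fails: inside a model formula the minus sign operates on terms rather than doing arithmetic, so the shift has to be wrapped in I(). A small sketch of a fit forced through an assumed x-axis crossing at b = 5 (the data here are synthetic and b is a hypothetical value):

```r
set.seed(1)
x <- 1:10
b <- 5                                 # assumed x-axis crossing point
y <- 2 * (x - b) + rnorm(length(x))    # synthetic data crossing zero near x = b

# I() protects the arithmetic; "0 +" drops the intercept,
# so the model is y = a * (x - b)
fit <- lm(y ~ 0 + I(x - b))
coef(fit)

# The fitted line is zero at x = b
at_b <- predict(fit, newdata = data.frame(x = b))
at_b
```

The same formula, written with the number in place (for example formula = y ~ 0 + I(x - 5)), should also be usable directly in geom_smooth(method = "lm").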
You'll need to pass an offset for the fixed intercept. It goes in through the method.args argument of geom_smooth, since not all smoothing methods accept an offset.
You'll also need to set the orientation argument so that the fixed intercept is on the x-axis rather than the y-axis.
I also set the number of points at which the smoother is evaluated (n) and repeated the offset to match; I'm not sure that's strictly necessary.
Some gymnastics, to be sure, but hopefully it helps.
library("tidyverse")
mtcars %>%
ggplot(aes(disp, hp)) +
geom_point() +
geom_smooth(method = "lm",
orientation = "y",
formula = y ~ x + 0,
color= "blue",
se = FALSE,
n = nrow(mtcars),
method.args=list(offset=rep(100, nrow(mtcars))),
fullrange = TRUE) +
scale_x_continuous(limits =c(0, 600))
#> Warning: Removed 5 rows containing missing values (geom_smooth).
Created on 2020-07-08 by the reprex package (v0.3.0)

How can I add a different legend for different quantiles in the same graph in ggplot2?

ggplotRegression <- function (fit) {
require(ggplot2)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y =
names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared,
5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))
}
taus <-c(0.05 , 0.25, 0.50 , 0.75, 0.90 , 0.95)
m <- ggplotRegression( lm(formula = BMI ~ height_in_m
+weight_in_kg+ Highest_Education_level +
wealth_index + age_in_year_groups, data = dat_new))
m+geom_quantile(quantiles=taus, lwd=1.5 , col="green4",
fill=taus)
Now I want to add a specific colour for each quantile and also a specific legend entry for each quantile.
Many ggplot statistics let you use the results of the calculation enclosed in .., for example with geom_density you can use ..count.. in the aes.
With geom_quantile you can use ..quantile..
library(ggplot2)
df <- data.frame(x = rnorm(100), y = rnorm(100))
ggplot(df, aes(x, y)) +
geom_point() +
geom_quantile(aes(colour = as.factor(..quantile..)))
The trick is to find out what these variables are called. Geoms that need to calculate statistics, such as geom_quantile and geom_density, have an associated ggproto object such as StatQuantile and StatDensity which has the code for the calculations in an element called compute_group.
The last command of StatQuantile$compute_group is
plyr::ldply(quantiles, quant_pred, data = data, method = method,
formula = formula, weight = weight, grid = grid, method.args = method.args)
The function here, quant_pred - which you can see with ggplot2:::quant_pred, returns a list. The components of this list, including quantile, can be used in the aes.
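In ggplot2 3.3.0 and later, the ..quantile.. notation is superseded by after_stat(quantile). A sketch of the same idea with one colour and one legend entry per quantile; the taus vector is copied from the question, while the simulated data, the legend title and the palette are arbitrary choices for illustration:

```r
library(ggplot2)   # geom_quantile() also needs the quantreg package installed

set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100))
taus <- c(0.05, 0.25, 0.50, 0.75, 0.90, 0.95)

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  # map the computed quantile to colour, as a factor so each gets its own key
  geom_quantile(aes(colour = factor(after_stat(quantile))),
                quantiles = taus, formula = y ~ x) +
  scale_colour_brewer(name = "Quantile", palette = "Dark2")
p
```

Swapping scale_colour_brewer() for scale_colour_manual(values = ...) would let you pick the six colours by hand.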

Use ggplot to plot partial effects obtained with effects library

I would like to use ggplot to replicate the partial-effects plots (with partial residuals) produced by the "effects" package. To do this I need to retrieve some information.
This is the plot I want to replicate with ggplot.
library(effects)
mod <- lm(log(prestige) ~ income:type + education, data=Prestige)
eff = effect("education", mod, partial.residuals=T)
plot(eff)
From the eff object I am able to retrieve the partial residuals, as eff$residuals, but they are not sufficient to replicate the plot. I think that what I need is both the residuals and the marginal predicted effect. However, I was not able to retrieve the latter from my eff object.
Otherwise I only have the residuals scores that cannot be plotted against the line of the marginal effect.
Any hint on how to retrieve this information?
You have almost all the information available. It would take more time to generalize, but here's some code that produces a figure approximately like the one from the effects package. Note that the smoother is off, but I didn't bother to dig into why.
The code should be self explanatory. I only copied function closest from the package.
mod <- lm(log(prestige) ~ income:type + education, data=Prestige)
eff = effect("education", mod, partial.residuals=T)
library(ggplot2)
library(gridExtra)
closest <- function(x, x0) apply(outer(x, x0, FUN=function(x, x0) abs(x - x0)), 1, which.min)
x.fit <- unlist(eff$x.all)
trans <- I
x <- data.frame(lower = eff$lower, upper = eff$upper, fit = eff$fit, education = eff$x$education)
xy <- data.frame(x = x.fit, y = x$fit[closest(trans(x.fit), x$education)] + eff$residuals)
g <- ggplot(x, aes(x = education, y = fit)) +
theme_bw() +
geom_line(size = 1) +
geom_point(data = xy, aes(x = x, y = y), shape = 1, col = "blue", size = 2) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.5) +
geom_smooth(data = xy, aes(x = trans(x), y = y),
method = "loess", span = 2/3, linetype = "dashed", se = FALSE)
grid.arrange(plot(eff), g, ncol = 2)

Resources