I am struggling with "out-of-sample" prediction using loess. I get NA values for new x that are outside the original sample. Can I get these predictions?
x <- c(24,36,48,60,84,120,180)
y <- c(3.94,4.03,4.29,4.30,4.63,4.86,5.02)
lo <- loess(y~x)
x.all <- seq(3, 200, 3)
predict(object = lo, newdata = x.all)
I need to model the full yield curve, i.e. interest rates for different maturities.
From the manual page of predict.loess:
When the fit was made using surface = "interpolate" (the default), predict.loess will not extrapolate – so points outside an axis-aligned hypercube enclosing the original data will have missing (NA) predictions and standard errors
If you change the surface parameter to "direct" you can extrapolate values.
For instance, this will work (on a side note: after plotting the prediction, my feeling is that you should increase the span parameter in the loess call a little bit; see the sketch after the code below):
lo <- loess(y~x, control=loess.control(surface="direct"))
predict(lo, newdata=x.all)
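For example, a sketch of what increasing the span might look like (the name lo2 and the span value of 1 are purely illustrative, not a tuned choice):
## compare the default span with a larger, smoother one
lo2 <- loess(y ~ x, span = 1, control = loess.control(surface = "direct"))
plot(x.all, predict(lo, newdata = x.all), type = "l", col = "blue")
lines(x.all, predict(lo2, newdata = x.all), col = "red")   # smoother fit from the larger span
points(x, y)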
In addition to nico's answer: I would suggest fitting a gam (which uses penalized regression splines) instead. However, extrapolation is not advisable if you don't have a model grounded in subject-matter knowledge.
x <- c(24,36,48,60,84,120,180)
y <- c(3.94,4.03,4.29,4.30,4.63,4.86,5.02)
lo <- loess(y ~ x, control = loess.control(surface = "direct"))  # direct surface allows extrapolation
plot(x.all <- seq(3, 200, 3),
     predict(object = lo, newdata = x.all),
     type = "l", col = "blue")
points(x, y)
library(mgcv)
# penalized cubic regression spline with 7 basis functions
fit <- gam(y ~ s(x, bs = "cr", k = 7, fx = FALSE), data = data.frame(x, y))
summary(fit)
lines(x.all, predict(fit, newdata = data.frame(x = x.all)), col = "green")
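If you do extrapolate with the gam fit, it may help to look at the pointwise standard errors from predict.gam; a rough sketch (the ±2·SE band is only an approximate indication, and it will typically widen quickly outside the data range):
pr <- predict(fit, newdata = data.frame(x = x.all), se.fit = TRUE)
lines(x.all, pr$fit + 2 * pr$se.fit, col = "green", lty = 2)   # approximate upper band
lines(x.all, pr$fit - 2 * pr$se.fit, col = "green", lty = 2)   # approximate lower band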
I want to fit a linear regression line with a specified slope to a data set. I read this thread about doing the same with an explicit intercept.
There, 0 + in the formula suppresses the fitting of the intercept; what is the corresponding trick for the slope?
For example, to fit a line with slope 1.5, I tried the following
set.seed(6)
x <- runif(100, -3, 3)
y <- 2 + x + rnorm(100)
model1 <- lm(y ~ x)
plot(x, y)
abline(model1, col = "red")
abline(coef(model1), 1.5, col = "darkgreen")
but the second abline call just takes the intercept from model1 and the slope 1.5. What I want instead is a regression line with slope fixed at 1.5 that best fits the data points, with the intercept then computed from that fit.
To find the value of the intercept, you don't actually need a regression. Since Y = a + b * X + ε, taking expectations gives E[Y - b * X] = a + E[ε] = a, because a is a constant and E[ε] = 0 by assumption (E[] is the expectation operator). Therefore, a = E[Y - b * X], which the sample mean estimates.
Translated into R, this means the intercept a is:
b1 <- 1.5
a <- mean(y - b1 * x)
This is inspired by the comments to this question.
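Plotting the resulting line is then straightforward (the colour choice is arbitrary):
plot(x, y)
abline(a, b1, col = "blue")   # line with fixed slope b1 and the intercept computed above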
I suppose one approach would be to subtract 1.5*x from y and then fit the result using only an intercept term:
mod2 <- lm(I(y - 1.5 * x) ~ 1)
plot(x, y)
abline(coef(mod2), 1.5)
This represents the best linear fit with fixed slope 1.5. Of course, this fit is not very visually appealing because the simulated slope is 1 while the fixed slope is 1.5.
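Another way to express the same fixed-slope fit, staying entirely inside lm, is an offset term; a sketch (mod3 is just an illustrative name):
mod3 <- lm(y ~ 1 + offset(1.5 * x))   # slope enters as a fixed offset, only the intercept is estimated
coef(mod3)                            # should match mean(y - 1.5 * x)
abline(coef(mod3), 1.5, col = "purple")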
I'm stuck at a very specific problem where I have to find a function describing the (normalized) leaf shape of a plant. The problem is not just to find the polynomial that best describes the data, but also that it must start at (0,0), end at (1,0), and pass through the point of maximum width (x_ymax, 1) without ever going wider.
An alternative option I tried is Hermite interpolation, using those three specific points as control points, but the function it produces is way off the actual shape of the leaf unless I provide more control points.
Is there a specific function for this or do I need to make some manual conversion? Or would there be better or alternate options to tackling this problem?
Thanks in advance!
I'm not sure if this would always work, but here is an example of a generalized additive model (GAM) that uses a cyclic spline. When you specify that the model should not have an intercept (i.e. include -1 in the formula), it should pass through y = 0. You will have to scale your predictor variable to be between 0 and 1 in order for the ends to pass through the points you mentioned (see here for more info).
Example
# required package
library(mgcv)
# make data
n <- 200
tmp <- seq(0, 20*pi, length.out = n)
x <- tmp / (2*pi)
mon <- x %% 1             # fractional part of x, lies in [0, 1)
err <- rnorm(n, sd = 0.5)
y <- sin(tmp) + err + 1
plot(x, y, type = "l")
df <- data.frame(x, y, mon)
# GAM with intercept
fit1 <- gam(y ~ s(mon, bs = "cc", k = 12), data=df)
summary(fit1)
plot(fit1)
# GAM without intercept
fit2 <- gam(y ~ s(mon, bs = "cc", k = 12) - 1, data=df) # note "-1" for no intercept
summary(fit2)
plot(fit2)
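To check whether the no-intercept fit really goes through y = 0 at the ends, you can predict over a fine grid of mon values; a minimal sketch:
mon.grid <- data.frame(mon = seq(0, 1, length.out = 101))
pr2 <- predict(fit2, newdata = mon.grid)
plot(mon.grid$mon, pr2, type = "l")
abline(h = 0, lty = 2)
pr2[c(1, 101)]   # fitted values at mon = 0 and mon = 1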
Graphically:
The red curve is the original curve, the result of the regression. The blue curve is the monotonic version of the red curve. The problem is that it is increasing instead of decreasing! How can I "turn" this blue curve around so that it fits the red one?
## data
x <- c(1.009648,1.017896,1.021773,1.043659,1.060277,1.074578,1.075495,1.097086,1.106268,1.110550,1.117795,1.143573,1.166305,1.177850,1.188795,1.198032,1.200526,1.223329,1.235814,1.239068,1.243189,1.260003,1.262732,1.266907,1.269932,1.284472,1.307483,1.323714,1.326705,1.328625,1.372419,1.398703,1.404474,1.414360,1.415909,1.418254,1.430865,1.431476,1.437642,1.438682,1.447056,1.456152,1.457934,1.457993,1.465968,1.478041,1.478076,1.485995,1.486357,1.490379,1.490719)
y <- c(0.5102649,0.0000000,0.6360097,0.0000000,0.8692671,0.0000000,1.0000000,0.0000000,0.4183691,0.8953987,0.3442624,0.0000000,0.7513169,0.0000000,0.0000000,0.0000000,0.0000000,0.1291901,0.4936121,0.7565551,1.0085108,0.0000000,0.0000000,0.1655482,0.0000000,0.1473168,0.0000000,0.0000000,0.0000000,0.1875293,0.4918018,0.0000000,0.0000000,0.8101771,0.6853480,0.0000000,0.0000000,0.0000000,0.0000000,0.4068802,1.1061434,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.6391678)
fit1 <- c(0.5102649100,0.5153380934,0.5177234836,0.5255544980,0.5307668662,0.5068087080,0.5071001179,0.4825657520,0.4832969250,0.4836378194,0.4842147729,0.5004039310,0.4987301366,0.4978800742,0.4978042478,0.4969807064,0.5086987191,0.4989497612,0.4936121200,0.4922210302,0.4904593166,0.4775197108,0.4757040857,0.4729265271,0.4709141776,0.4612406896,0.4459316517,0.4351338346,0.4331439717,0.4318664278,0.3235179189,0.2907908968,0.1665721429,0.1474035158,0.1443999345,0.1398517097,0.1153991839,0.1142140393,0.1022584672,0.1002410843,0.0840033244,0.0663669309,0.0629119398,0.0627979240,0.0473336492,0.0239237481,0.0238556876,0.0084990298,0.0077970954,0.0000000000,-0.0006598571)
fit2 <- c(-0.0006598571,0.0153328298,0.0228511733,0.0652889427,0.0975108758,0.1252414661,0.1270195143,0.1922510501,0.2965234797,0.3018551305,0.3108761043,0.3621749370,0.4184150225,0.4359301495,0.4432114081,0.4493565757,0.4510158144,0.4661865431,0.4744926045,0.4766574718,0.4796937554,0.4834718810,0.4836125426,0.4839450098,0.4841092849,0.4877317306,0.4930561638,0.4964939389,0.4970089201,0.4971376528,0.4990394601,0.5005881678,0.5023814257,0.5052125977,0.5056691690,0.5064254338,0.5115481820,0.5117259449,0.5146054557,0.5149729419,0.5184178197,0.5211542908,0.5216215426,0.5216426533,0.5239797875,0.5273573222,0.5273683002,0.5293994824,0.5295130266,0.5306236672,0.5307303109)
## picture
plot(x, y)
lines(x, fit1, col=2) # red curve
lines(x, fit2, col=4) # blue curve
lines(x, fit2[length(fit2):1])
This, of course, does not work due to the structure of the x values.
Methodologically:
The object "fit2" is the output of the function rearrangement(). It is always monotonically increasing. So in the other words, I am not sure how to match values x to y.
library(Rearrangement)
fit2 <- rearrangement(x=as.data.frame(x), y=fit1)
The reason why we can't just reverse y is that the intervals along your x aren't constant.
Essentially what you need to do, then, is to reverse not only y, but also the vector of gap widths between successive pairs of x values. We can do the latter with:
rev(diff(x))
We then just need to get their cumulative sum and add the minimum x value, so that we have not the gap widths but the x values themselves:
min(x) + cumsum(c(0, rev(diff(x))))
These are then your new x values, which you can plot up:
lines(min(x) + cumsum(c(0, rev(diff(x)))), rev(fit2))
Edit: here is a better way to deal with your problem. Since your curve is monotonically decreasing and rearrangement only returns monotonically increasing curves, you can rearrange the negated fit and then negate the result back:
## rearrange the negative fit1
fit3 <- rearrangement(x=as.data.frame(x), y = - fit1)
## plot the negative rearranged fit3
plot(x, y)
lines(x, -fit3); points(x, -fit3, col=2)
lines(x, fit2); points(x, fit2, col=3)
So no fancy diff rearrangements are needed for plotting. The x values that go with fit3 are the same ones from your data and fit1.
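A quick sanity check that the negation trick did what we wanted (a one-line sketch):
all(diff(-fit3) <= 0)   # TRUE if -fit3 is monotonically non-increasing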
Another approach, under the assumption that you have an object fit that can be used with predict (if, for example, you used something like glm to do the regression):
## New x data, equidistant
newx <- data.frame(x = seq(1, 1.5, 0.01))
## Predict using the fitted model
pr <- predict(fit, type = "response", newdata = newx)
## Make the result monotonic
re <- rearrangement(x = newx, y = pr)
## Plot reversing the order of `newx`
lines(rev(newx$x), re)
Hope it helps,
alex