R language, nonlinear model formula predict

I fit an exponential formula to a set of data (x, y). Then I want to calculate y values from the formula for x values beyond the actual data set. It doesn't work: it always prints the y values for the actual x values. Here is the code. What have I done wrong, and what is the solution for my task in R?
data <- data.frame(x = seq(1, 69), y = othertable[1:69, 2])
nlsxypw <- nls(data$y ~ a * data$x^b, col2_60, start = list(a = 2200000, b = 0))
predict(nlsxypw)
# here I want to calculate the y values for x = 70-80
xnew <- seq(70, 80, 1)
predict(nlsxypw, xnew)
# it doesn't print these values, still the actual values for x = 1~69.

This is kind of a strange feature of predict.nls (and possibly other predict methods as well), but you have to supply the new data, as a named list or data frame, using the same variable names your model formula was defined in terms of:
set.seed(123)
Data <- data.frame(
  x = 1:69,
  y = (1:69)^2 + rnorm(69, 0, 5))
nlsxypw <- nls(y ~ a * (x^b),
               data = Data,
               start = list(a = 2.5, b = 1))
##
xnew <- 70:80
## note how newdata is specified
y.pred <- predict(nlsxypw, newdata=list(x=xnew))
> y.pred
[1] 4900.355 5041.359 5184.364 5329.368 5476.373 5625.377 5776.381 5929.386 6084.390 6241.393 6400.397
##
with(Data,
     plot(x, y, pch = 20,
          xlim = c(0, 90),
          ylim = c(0, 6700)))
lines(fitted(nlsxypw), col = "red")  # x is 1:69 here, so plotting against the index works
points(x = xnew,
       y = y.pred,
       pch = 20,
       col = "blue")
##
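For the record, a named data frame works just as well as a named list for newdata here:
y.pred2 <- predict(nlsxypw, newdata = data.frame(x = xnew))
all.equal(y.pred, y.pred2)  # TRUE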

Related

How to extrapolate loess result?

I am struggling with "out-of-sample" prediction using loess: I get NA values for new x values that lie outside the original sample. Is there a way to get these predictions?
x <- c(24,36,48,60,84,120,180)
y <- c(3.94,4.03,4.29,4.30,4.63,4.86,5.02)
lo <- loess(y~x)
x.all <- seq(3, 200, 3)
predict(object = lo, newdata = x.all)
I need to model the full yield curve, i.e. interest rates for different maturities.
From the manual page of predict.loess:
When the fit was made using surface = "interpolate" (the default), predict.loess will not extrapolate – so points outside an axis-aligned hypercube enclosing the original data will have missing (NA) predictions and standard errors
If you change the surface parameter to "direct" you can extrapolate values.
For instance, this will work (on a side note: after plotting the prediction, my feeling is that you should increase the span parameter in the loess call a little bit):
lo <- loess(y~x, control=loess.control(surface="direct"))
predict(lo, newdata=x.all)
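Regarding the side note about span, a quick sketch (span = 1 is an arbitrary choice for illustration; the default is 0.75):
# smoother fit with a larger span (value chosen only for illustration)
lo.smooth <- loess(y ~ x, span = 1,
                   control = loess.control(surface = "direct"))
predict(lo.smooth, newdata = x.all)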
In addition to nico's answer: I would suggest fitting a gam (which uses penalized regression splines) instead. That said, extrapolation is not advisable if you don't have a model based on science.
x <- c(24, 36, 48, 60, 84, 120, 180)
y <- c(3.94, 4.03, 4.29, 4.30, 4.63, 4.86, 5.02)
lo <- loess(y ~ x, control = loess.control(surface = "direct"))
x.all <- seq(3, 200, 3)
plot(x.all, predict(lo, newdata = x.all),
     type = "l", col = "blue")
points(x, y)
library(mgcv)
fit <- gam(y ~ s(x, bs = "cr", k = 7, fx = FALSE), data = data.frame(x, y))
summary(fit)
lines(x.all, predict(fit, newdata = data.frame(x = x.all)), col = "green")

How to predict gam model with random effect in R?

I am working on predicting from a gam model with a random effect, to produce a 3D surface plot with plot_ly.
Here is my code:
library(mgcv)
x <- runif(100)
y <- runif(100)
z <- x^2 + y + rnorm(100)
r <- rep(1, times = 100)  # random effect
r[51:100] <- 2            # replace 1 with 2, making two groups
df <- data.frame(x, y, z, r)
gam_fit <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df)  # fit
# create matrix data for the `add_surface` function in `plot_ly`
newx <- seq(0, 1, len = 20)
newy <- seq(0, 1, len = 30)
newxy <- expand.grid(x = newx, y = newy)
z <- matrix(predict(gam_fit, newdata = newxy), 20, 30)  # predict data as a matrix
However, the last line results in an error:
Error in model.frame.default(ff, data = newdata, na.action = na.act) :
variable lengths differ (found for 'r')
In addition: Warning message:
In predict.gam(gam_fit, newdata = newxy) :
not all required variables have been supplied in newdata!
Thanks to a previous answer, I know that the code above works without the random effect.
How can I predict from gam models with a random effect?
Assuming you want the surface conditional upon the random effects (but not for a specific level of the random effect), there are two ways.
The first is to provide a level for the random effect but exclude that term from the predicted values using the exclude argument to predict.gam(). The second is to again use exclude, but this time provide no data for the random effect and stop predict.gam() from checking the newdata via the argument newdata.guaranteed = TRUE.
Option 1:
newxy1 <- with(df, expand.grid(x = newx, y = newy, r = 2))
z1 <- predict(gam_fit, newdata = newxy1, exclude = 's(r)')
z1 <- matrix(z1, 20, 30)
Option 2:
z2 <- predict(gam_fit, newdata = newxy, exclude = 's(r)',
              newdata.guaranteed = TRUE)
z2 <- matrix(z2, 20, 30)
These produce the same result:
> all.equal(z1, z2)
[1] TRUE
A couple of notes:
Which one you use will depend on how complex the rest of your model is. I would generally use the first option, as it provides an extra check against my doing something stupid when creating the data. But in this instance, with a simple model and set of covariates, it seems safe enough to trust that newdata is OK.
Your example uses a random slope (was that intended?), not a random intercept, as r is not a factor. If your real example uses a factor random effect, you'll need to be a little more careful when creating the newdata, as you need to get the levels of the factor right. For example:
expand.grid(x = newx, y = newy,
            r = with(df, factor(2, levels = levels(r))))
should get the right set-up for a factor r.
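For completeness, a minimal sketch of the factor (random intercept) version, reusing the question's objects; gam_fit2, newxy2 and z2f are illustrative names, not from the original thread:
df$r <- factor(df$r)                     # random intercept requires a factor
gam_fit2 <- gam(z ~ s(x) + s(y) + s(r, bs = "re"), data = df)
newxy2 <- expand.grid(x = newx, y = newy,
                      r = factor(2, levels = levels(df$r)))
z2f <- matrix(predict(gam_fit2, newdata = newxy2, exclude = "s(r)"), 20, 30)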

Associating 3 Attributes using linear regression or non linear regression in R

I have 3 variables x, y, z, each containing an equal amount of data, say 40 numbers:
x <- 1:40
y <- 1:40/2
z <- 41:80
model <- lm(x~y)
Using linear regression, I can associate the x and y values and create a model to predict x values:
a <- data.frame(x = 52)
res <- predict(model, a)
It will predict the y value based on the association. Now I can plot the prediction line using the following code:
plot(x, y)
plotdata <- cbind(x, predict(model))
lines(plotdata[order(x), ], col = "red")
So my question is: if I have three variables x, y, z, how do I associate them and predict?
lm(x~y~z)
is not working. Plotting can be done by using:
library(rgl)
plot3d(x, y, z)
model1 <- (x ~ y + z)
plotdata <- cbind(y, z, predict(model))
lines3d(plotdata[order(y, z), ], col = "red")
Thanks in advance.
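No answer was recorded for this question, but a minimal sketch of the multiple-regression pattern the code above seems to be reaching for (not from the original thread; model2, new, and plotdata2 are illustrative names) would be:
model2 <- lm(x ~ y + z)            # regress x on both y and z
# note: with this toy data z = 2*y + 40, so the predictors are perfectly
# collinear and lm() reports z's coefficient as NA; with real data both
# predictors would contribute
new <- data.frame(y = 26, z = 52)  # hypothetical new values
predict(model2, newdata = new)
library(rgl)
plot3d(y, z, x)
plotdata2 <- cbind(y, z, predict(model2))
lines3d(plotdata2[order(y, z), ], col = "red")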

Why is leave-one-out cross-validation of GLM model (package=boot) failing when data contains NaN's?

This is a fairly simple procedure: refitting a GLM model with a subset of the data (the training set) and calculating the accuracy of the prediction on the remaining data. I am trying to run a "leave-one-out" strategy on a data set (i.e. the training subset has length n-1) using the cv.glm function of the package boot.
Am I doing something wrong, or is it really the case that the function doesn't handle NAs? I'm guessing that this is fairly easy to program on my own, but I would appreciate any advice if there is some other mistake that I am making. Cheers.
Example:
require(boot)
#create data
n <- 100
x <- runif(n)
e <- rnorm(n, sd=100)
a <- 5
b <- 3
y <- exp(a + b*x) + e
plot(y ~ x)
plot(y ~ x, log="y")
#make some y's NaN
set.seed(1)
y[sample(n, 0.1*n)] <- NaN
#fit glm model
df <- data.frame(y=y, x=x)
glm.fit <- glm(y ~ x, data=df, family=gaussian(link="log"))
summary(glm.fit)
#calculate mean error of prediction (leave-one-out cross-validation)
cv.res <- cv.glm(df, glm.fit)
cv.res$delta
[1] NA NA
You're right, the function is not set up to handle NAs. The various options for the na.action argument of glm() don't really help either. The easiest way to deal with it is to remove the NAs from the data frame at the outset:
sub <- df[!is.na(df$y), ]
glm.fit <- glm(y ~ x, data=sub, family=gaussian(link="log"))
summary(glm.fit)
# calculate mean error of prediction (leave-one-out cross-validation)
cv.res <- cv.glm(sub, glm.fit)
cv.res$delta
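As for the "fairly easy to program on my own" route, a minimal leave-one-out loop over the NA-free data might look like this (a sketch, not the internals of cv.glm; loo.sqerr is an illustrative name):
loo.sqerr <- sapply(seq_len(nrow(sub)), function(i) {
  # refit on all rows except i, then measure the squared prediction error on row i
  fit <- glm(y ~ x, data = sub[-i, ], family = gaussian(link = "log"))
  (sub$y[i] - predict(fit, newdata = sub[i, ], type = "response"))^2
})
mean(loo.sqerr)  # comparable to cv.res$delta[1]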

How to generate random Y at specific X from a linear model in R?

Say we have a linear model f1 that was fit to some x and y data points:
f1 <- lm(y ~ x, data = d)
How can I generate new y values at new x values (that are different from the old x values but are within the range of the old x values) using this f1 fit in R?
stats:::simulate.lm allows you to sample from a linear model fitted with lm. (In contrast to the approach of @Bulat, this uses unbiased estimates of the residual variance.) To simulate at different values of the independent variable, you could hack around like this:
# simulate example data
x <- runif(20, 0, 100)
y <- 5*x + rnorm(20, 0, 10)
df <- data.frame(x, y)
# fit linear model
mod <- lm(y ~ x, data = df)
# new values of the independent variable
x_new <- 1:100
# replace the fitted values of the model object with predictions for new data (the "hack")
mod$fitted.values <- predict(mod, data.frame(x = x_new))
# simulate() samples appropriate noise and adds it to the model's fitted.values
y_new <- simulate(mod)[, 1]  # simulate() can return multiple samples (as columns); we only need one
# visualize original data ...
plot(df)
# ... alongside simulated data at new values of the independent variable (x)
points(x_new, y_new, col="red")
(original data in black, simulated in red)
I am looking at the same problem.
In simple terms, it can be done by sampling from the residuals:
mod <- lm(y ~ x, data = df)
x_new <- c(5) # value that you need to simulate for.
pred <- predict(mod, newdata=data.frame(x = x_new))
err <- sample(mod$residuals, 1)
y <- pred + err
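For several new x values at once, one could sample residuals with replacement (assuming roughly i.i.d. errors; a sketch, not part of the original answer):
x_new <- 1:100
pred <- predict(mod, newdata = data.frame(x = x_new))
y_sim <- pred + sample(mod$residuals, length(x_new), replace = TRUE)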
There is also a simulate(fit, nsim = 10) function; note, however, that it simulates at the model's original predictor values, not at new ones (hence the fitted-values hack in the answer above).
You can use predict for this (note that it returns the fitted mean at the new x values, without random noise):
x <- runif(20, 0, 100)
y <- 5*x + rnorm(20, 0, 10)
df <- data.frame(x, y)
df
plot(df)
mod <- lm(y ~ x, data = df)
x_new <- 1:100
pred <- predict(mod, newdata=data.frame(x = x_new))
plot(df)
points(x_new, pred)
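Since predict only gives the conditional mean, a quick way to get random Y values under the fitted model is to add Gaussian noise with the residual standard error (a sketch, not part of the original answer):
y_rand <- pred + rnorm(length(pred), mean = 0, sd = sigma(mod))
points(x_new, y_rand, col = "red")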
