Example data:
x <- 1:10
y <- x + c(-0.5,0.5)
a <- rnorm(10)
b <- rnorm(10)
c <- rnorm(10)
data <- data.frame(x, y, a, b, c)
Here I wrote a function to fit several different models at the same time on a single independent variable (z), as below.
library(splines)
func <- function(z) {
  fit1 <- lm(y ~ x + z)
  fit2 <- lm(y ~ x + I(z^2))
  fit3 <- lm(y ~ poly(x, 3) + z)
  fit4 <- lm(y ~ ns(x, 3) + z)
  fit5 <- lm(y ~ ns(x, 9) + z)
  return(list(fit1, fit2, fit3, fit4, fit5))
}
mod1 <- func(data$a)
mod2 <- func(data$b)
mod1 and mod2 each contain a list of 5 models. First, I want to select the 2nd and 4th models of mod1 and the 3rd, 4th, and 5th models of mod2, and then plot the predictions of these selected models over the observed values of a and b, respectively. Second, I want to check their residual plots.
sel1 <- mod1[c(2,4)]
sel2 <- mod2[c(3,4,5)]
I am trying to use the predict function, but it gives me the error "no applicable method for 'predict' applied to an object of class "list"".
My question is how to plot predicted lines over observed values and check residual plots for the selected models from a list.
pred <- predict(sel1)
plot(resid(sel1))
Thanks in advance!
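One possible approach (a minimal sketch, assuming sel1 holds the lm fits selected above) is to apply predict() and resid() to each list element with lapply():
pred1 <- lapply(sel1, predict)   # fitted values for each selected model
res1  <- lapply(sel1, resid)     # residuals for each selected model
# overlay the fitted values of each selected mod1 model on the observed points
plot(data$a, data$y, pch = 19)
for (p in pred1) points(data$a, p, col = "red")
# one residual plot per selected model
par(mfrow = c(1, length(res1)))
for (r in res1) plot(r, ylab = "residuals")
The same pattern works for sel2, substituting data$b for data$a.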
I am building a quadratic model with lm in R:
y <- data[[1]]
x <- data[[2]]
x2 <- x^2
quadratic.model <- lm(y ~ x + x2)
Now I want to display both the predicted values and the actual values on a plot. I tried this:
par(las=1,bty="l")
plot(y~x)
P <- predict(quadratic.model)
lines(x, P)
but the line comes up all squiggly. Maybe it has to do with the fact that it's quadratic? Thanks for any help.
You need order(): lines() connects the points in the order they are given, so if x is not sorted the segments double back on themselves and the curve looks squiggly.
P <- predict(quadratic.model)
plot(y~x)
reorder <- order(x)
lines(x[reorder], P[reorder])
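A related alternative (just a sketch, using the quadratic.model and x2 column from the question) is to predict on an evenly spaced grid of x values, which also draws the curve smoothly:
x_grid <- seq(min(x), max(x), length.out = 200)
P_grid <- predict(quadratic.model, newdata = data.frame(x = x_grid, x2 = x_grid^2))
plot(y ~ x)
lines(x_grid, P_grid)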
My answer here is related: Problems displaying LOESS regression line and confidence interval
I fit an exponential formula to a set of data (x, y). Then I want to calculate the y values from the formula for x values beyond the actual data set. It doesn't work; it always prints the y values for the actual x values. Here is the code. What have I done wrong, and what's the solution for my task in R?
data <- data.frame(x=seq(1,69), y=othertable[1:69, 2])
nlsxypw <- nls(data$y ~ a*data$x^b, col2_60, start=list(a=2200000, b=0))
predict(nlsxypw)
#here I want to calculate the y values for x = 70-80
xnew <- seq(70, 80, 1)
predict(nlsxypw, xnew)
#it doesn't print these values, still the actual values for x=1~69.
This is kind of a strange feature of predict.nls (and possibly of other predict methods as well): you have to supply the new data through the newdata argument, as a data frame or list whose element has the same name as the predictor your model was defined in terms of:
set.seed(123)
Data <- data.frame(
  x = 1:69,
  y = (1:69)^2 + rnorm(69, 0, 5))
nlsxypw <- nls(y ~ a * x^b,
               data = Data,
               start = list(a = 2.5, b = 1))
##
xnew <- 70:80
## note how newdata is specified
y.pred <- predict(nlsxypw, newdata=list(x=xnew))
> y.pred
[1] 4900.355 5041.359 5184.364 5329.368 5476.373 5625.377 5776.381 5929.386 6084.390 6241.393 6400.397
##
with(Data,
     plot(x, y, pch = 20,
          xlim = c(0, 90),
          ylim = c(0, 6700)))
lines(fitted(nlsxypw), col = "red")
points(x = xnew,
       y = y.pred,
       pch = 20,
       col = "blue")
##
Say we have a linear model f1 that was fit to some x and y data points:
f1 <- lm(y ~ x,data=d)
How can I generate new y values at new x values (that are different from the old x values but are within the range of the old x values) using this f1 fit in R?
stats:::simulate.lm allows you to sample from a linear model fitted with lm. (In contrast to the approach of @Bulat, this uses unbiased estimates of the residual variance.) To simulate at different values of the independent variable, you could hack around like this:
# simulate example data
x <- runif(20, 0, 100)
y <- 5*x + rnorm(20, 0, 10)
df <- data.frame(x, y)
# fit linear model
mod <- lm(y ~ x, data = df)
# new values of the independent variable
x_new <- 1:100
# replace the fitted values of the model object with predictions for the new data
mod$fitted.values <- predict(mod, data.frame(x=x_new)) # "hack"
# simulate() samples appropriate noise and adds it to the model's `fitted.values`
y_new <- simulate(mod)[, 1] # simulate can return multiple samples (as columns), we only need one
# visualize original data ...
plot(df)
# ... alongside simulated data at new values of the independent variable (x)
points(x_new, y_new, col="red")
(original data in black, simulated in red)
I am looking at the same problem.
In simple terms it can be done by using sample from residuals:
mod <- lm(y ~ x, data = df)
x_new <- c(5) # value that you need to simulate for.
pred <- predict(mod, newdata=data.frame(x = x_new))
err <- sample(mod$residuals, 1)
y <- pred + err
There is also a simulate(fit, nsim = 10) function, but note that it simulates at the predictor values the model was fitted with rather than at new x values (hence the fitted-values hack in the answer above).
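A minimal sketch, using the mod fit from above (simulate() returns one column per simulated sample, at the original x values):
y_sim <- simulate(mod, nsim = 10)   # data frame with 10 columns of simulated y
head(y_sim)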
You can use predict for this:
x <- runif(20, 0, 100)
y <- 5*x + rnorm(20, 0, 10)
df <- data.frame(x, y)
df
plot(df)
mod <- lm(y ~ x, data = df)
x_new <- 1:100
pred <- predict(mod, newdata=data.frame(x = x_new))
plot(df)
points(x_new, pred)
I've read the answers to this question and they are quite helpful, but I still need some help.
I have an example data set in R as follows:
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
I want to fit a model to these data so that y = f(x). I want it to be a 3rd order polynomial model.
How can I do that in R?
Additionally, can R help me to find the best fitting model?
To get a third order polynomial in x (x^3), you can do
lm(y ~ x + I(x^2) + I(x^3))
or
lm(y ~ poly(x, 3, raw=TRUE))
You could fit a 10th order polynomial and get a near-perfect fit, but should you?
EDIT:
poly(x, 3) (orthogonal polynomials) is probably a better choice than the raw powers, since the orthogonal columns are uncorrelated and numerically better behaved (see @hadley below).
Which model is the "best fitting model" depends on what you mean by "best". R has tools to help, but you need to provide the definition for "best" to choose between them. Consider the following example data and code:
x <- 1:10
y <- x + c(-0.5,0.5)
plot(x,y, xlim=c(0,11), ylim=c(-1,12))
fit1 <- lm(y ~ offset(x) - 1)
fit2 <- lm(y ~ x)
fit3 <- lm(y ~ poly(x, 3))
fit4 <- lm(y ~ poly(x, 9))
library(splines)
fit5 <- lm(y ~ ns(x, 3))
fit6 <- lm(y ~ ns(x, 9))
fit7 <- lm(y ~ x + cos(x*pi))
xx <- seq(0,11, length.out=250)
lines(xx, predict(fit1, data.frame(x=xx)), col='blue')
lines(xx, predict(fit2, data.frame(x=xx)), col='green')
lines(xx, predict(fit3, data.frame(x=xx)), col='red')
lines(xx, predict(fit4, data.frame(x=xx)), col='purple')
lines(xx, predict(fit5, data.frame(x=xx)), col='orange')
lines(xx, predict(fit6, data.frame(x=xx)), col='grey')
lines(xx, predict(fit7, data.frame(x=xx)), col='black')
Which of those models is the best? Arguments could be made for any of them (but I, for one, would not want to use the purple one for interpolation).
Regarding the question 'can R help me find the best fitting model?': there is probably a function to do this, assuming you can state the set of models to test, but the following would be a good first approach for the set of polynomials of degree 1 through n-1:
polyfit <- function(i) AIC(lm(y ~ poly(x, i)))
as.integer(optimize(polyfit, interval = c(1, length(x) - 1))$minimum)
Notes
The validity of this approach will depend on your objectives, the assumptions of optimize() and AIC(), and whether AIC is the criterion that you want to use.
polyfit() may not have a single minimum. Check this with something like:
for (i in 1:(length(x) - 1)) print(polyfit(i))
I used the as.integer() function because it is not clear to me how I would interpret a non-integer polynomial.
For testing an arbitrary set of mathematical equations, consider the 'Eureqa' program reviewed by Andrew Gelman here.
Update
Also see the stepAIC function (in the MASS package) to automate model selection.
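A minimal sketch of stepAIC(), assuming the candidate polynomial terms are entered individually so that backward elimination can drop them one at a time:
library(MASS)
fit.full <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))   # deliberately over-specified
fit.best <- stepAIC(fit.full, direction = "backward", trace = FALSE)
summary(fit.best)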
The easiest way to find the best fit in R is to code the model as:
lm.1 <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4) + ...)
Then apply step-down AIC regression:
lm.s <- step(lm.1)
For example, if we want to fit a polynomial of degree 2, we can directly do it by solving a system of linear equations in the following way:
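For a parabola y = ax^2 + bx + c, the least-squares normal equations (the system that the matrix A and vector B in the code below encode) are:
\begin{aligned}
a\sum x_i^4 + b\sum x_i^3 + c\sum x_i^2 &= \sum y_i x_i^2 \\
a\sum x_i^3 + b\sum x_i^2 + c\sum x_i   &= \sum y_i x_i \\
a\sum x_i^2 + b\sum x_i   + c\,n        &= \sum y_i
\end{aligned}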
The following example shows how to fit a parabola y = ax^2 + bx + c using the above equations and compares it with the lm() polynomial regression solution. Hopefully this helps someone's understanding:
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
x4 <- sum(x^4)
x3 <- sum(x^3)
x2 <- sum(x^2)
x1 <- sum(x)
yx1 <- sum(y*x)
yx2 <- sum(y*x^2)
y1 <- sum(y)
A <- matrix(c(x4, x3, x2,
              x3, x2, x1,
              x2, x1, length(x)), nrow=3, byrow=TRUE)
B <- c(yx2,
       yx1,
       y1)
coef <- solve(A, B) # solve the linear system of equations, assuming A is not singular
coef1 <- lm(y ~ x + I(x^2))$coef # solution with lm
coef
# [1] -0.01345808 2.01570523 42.51491582
rev(coef1)
# I(x^2) x (Intercept)
# -0.01345808 2.01570523 42.51491582
plot(x, y, xlim=c(min(x), max(x)), ylim=c(min(y), max(y)+10), pch=19)
xx <- seq(min(x), max(x), 0.01)
lines(xx, coef[1]*xx^2+coef[2]*xx+coef[3], col='red', lwd=3, lty=5)
lines(xx, coef1[3]*xx^2+ coef1[2]*xx+ coef1[1], col='blue')
legend('topright', legend=c("solve", "lm"),
       col=c("red", "blue"), lty=c(5,1), lwd=c(3,1), cex=0.8,
       title="quadratic fit", text.font=4)