Issue with plotting logistic regression in R [duplicate]

I am building a quadratic model with lm in R:
y <- data[[1]]
x <- data[[2]]
x2 <- x^2
quadratic.model = lm(y ~ x + x2)
Now I want to display both the predicted values and the actual values on a plot. I tried this:
par(las=1,bty="l")
plot(y~x)
P <- predict(quadratic.model)
lines(x, P)
but the line comes up all squiggly. Maybe it has to do with the fact that it's quadratic? Thanks for any help.

You need order(): lines() joins the points in the order they are given, so if x is not sorted the line doubles back on itself. Sort both x and the predictions first:
P <- predict(quadratic.model)
plot(y~x)
reorder <- order(x)
lines(x[reorder], P[reorder])
My answer here is related: Problems displaying LOESS regression line and confidence interval
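Equivalently, you can predict on a sorted grid of x values instead of the original (unsorted) data points; a minimal sketch, assuming the same quadratic.model and the numeric vectors x and y from the question:
xx <- seq(min(x), max(x), length.out = 200)   # evenly spaced grid, already sorted
plot(y ~ x)
lines(xx, predict(quadratic.model, newdata = data.frame(x = xx, x2 = xx^2)), col = "red")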

Related

Confidence interval for linear regression

I need to create a confidence interval for a linear regression using R. I followed a few tutorials, yet my result is quite different. As far as I understand, I should get two lines, one above and one below the main line, as shown here.
Unfortunately what I got is a few stacked lines, as shown here.
Could anyone help me understand what I am doing wrong?
Here's sample of my code:
speed <- c(61,225,110,51,114,68,24,24,133,83,83,92,93,37,111,172,142,105,143,77,154,108,98,164,124,97,90,87,137,71,73,74,62,88,100,101,126,113,49)
length <- c(58,149,90,55,91,69,31,35,109,77,78,82,86,44,89,121,106,98,116,65,111,88,86,122,104,85,72,80,105,74,71,66,73,72,72,90,91,98,59)
cars <- data.frame(speed, length)
modelReg <- lm(length ~ speed, data = cars)
x <- cars$speed
conf_interval <- predict(modelReg, newdata = data.frame(seq(from=min(x),to=max(x),by = 1)),interval = 'confidence')
lines(x,conf_interval[,2],lty=2)
lines(x,conf_interval[,3],lty=2)
After the first four lines of your code above, use Gosink's plot.add.ci function:
# John Gosink's Intervals Plotter (from http://gosink.org/?page_id=120)
plot.add.ci <- function(x, y, interval='prediction', level=0.9,
                        regressionColor='red', ...) {
  xOrder  <- order(x)
  x       <- x[xOrder]
  y       <- y[xOrder]
  fit     <- lm(y ~ x, data=data.frame(x=x, y=y))
  newX    <- data.frame(x=jitter(x))
  fitPred <- predict.lm(fit, newdata=newX, interval=interval, level=level, ...)
  abline(lm(y ~ x), col=regressionColor)
  lines(newX$x, fitPred[,2], lty=2, ...)
  lines(newX$x, fitPred[,3], lty=2, ...)
}
plot(cars$speed,cars$length)
abline(modelReg,col="red")
plot.add.ci(speed, length, level=0.95, interval="confidence", lwd=3)
This produces the red regression line with dashed confidence bands above and below it (change level for a different confidence level, or drop interval= for a prediction interval).
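Alternatively, the stacked lines in the question can be traced to the newdata argument: the data frame passed to predict() has no column named speed, so predict() ends up using the original (unsorted) speed values instead (with a warning), and lines() then zigzags between them. A minimal sketch of that direct fix, reusing modelReg, cars, and x from above:
plot(cars$speed, cars$length)
abline(modelReg, col = "red")
newSpeed <- data.frame(speed = seq(from = min(x), to = max(x), by = 1))  # column name must match the predictor
conf_interval <- predict(modelReg, newdata = newSpeed, interval = "confidence")
lines(newSpeed$speed, conf_interval[, 2], lty = 2)  # lower confidence bound
lines(newSpeed$speed, conf_interval[, 3], lty = 2)  # upper confidence bound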

Calculating coefficients of bivariate linear regression

Question to be answered
Does anyone know how to solve the attached problem in two lines of code? I believe an as.matrix would work to create a matrix, X, and then use X %*% X, t(X), and solve(X) to get the answer. However, it does not seem to be working. Any answers will help, thanks.
I would recommend using read.csv instead of read.table
It would be useful to go over the differences between the two functions in this thread: read.csv vs. read.table
df <- read.csv("http://pengstats.macssa.com/download/rcc/lmdata.csv")
model1 <- lm(y ~ x1 + x2, data = df)
coefficients(model1) # get the coefficients of your regression model1
summary(model1) # get the summary of model1
Based on the answer of @kon_u, here is an example of how to do it by hand:
df <- read.csv("http://pengstats.macssa.com/download/rcc/lmdata.csv")
model1 <- lm(y ~ x1 + x2, data = df)
coefficients(model1) # get the coefficients of your regression model1
summary(model1) # get the summary of model1
### Based on the formula
X <- cbind(1, df$x1, df$x2) # the column of 1 is to consider the intercept
Y <- df$y
bhat <- solve(t(X) %*% X) %*% t(X) %*% Y # coefficients
bhat # Note that we got the same coefficients with the lm function
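As a side note (not part of the original answer), the same system can be solved without forming the explicit inverse, using crossprod(), which is numerically a bit cleaner:
bhat2 <- solve(crossprod(X), crossprod(X, Y))             # solves (X'X) b = X'y directly
all.equal(as.vector(bhat), as.vector(bhat2))              # TRUE
all.equal(as.vector(bhat), unname(coefficients(model1)))  # matches lm()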

Fitting polynomial model to data in R

I've read the answers to this question and they are quite helpful, but I need help.
I have an example data set in R as follows:
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
I want to fit a model to these data so that y = f(x). I want it to be a 3rd order polynomial model.
How can I do that in R?
Additionally, can R help me to find the best fitting model?
To get a third order polynomial in x (x^3), you can do
lm(y ~ x + I(x^2) + I(x^3))
or
lm(y ~ poly(x, 3, raw=TRUE))
You could fit a 10th order polynomial and get a near-perfect fit, but should you?
EDIT:
poly(x, 3) is probably a better choice (see @hadley below).
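For the data in the question, a minimal sketch of fitting the cubic and drawing the fitted curve (the predicted values are the same whether you use raw or orthogonal polynomials):
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
fit3 <- lm(y ~ poly(x, 3))                    # third-order polynomial fit
summary(fit3)                                 # coefficients and fit statistics
plot(x, y, pch = 19)
xx <- seq(min(x), max(x), length.out = 200)   # fine grid for a smooth curve
lines(xx, predict(fit3, newdata = data.frame(x = xx)), col = "red")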
Which model is the "best fitting model" depends on what you mean by "best". R has tools to help, but you need to provide the definition for "best" to choose between them. Consider the following example data and code:
x <- 1:10
y <- x + c(-0.5,0.5)
plot(x,y, xlim=c(0,11), ylim=c(-1,12))
fit1 <- lm( y~offset(x) -1 )
fit2 <- lm( y~x )
fit3 <- lm( y~poly(x,3) )
fit4 <- lm( y~poly(x,9) )
library(splines)
fit5 <- lm( y~ns(x, 3) )
fit6 <- lm( y~ns(x, 9) )
fit7 <- lm( y ~ x + cos(x*pi) )
xx <- seq(0,11, length.out=250)
lines(xx, predict(fit1, data.frame(x=xx)), col='blue')
lines(xx, predict(fit2, data.frame(x=xx)), col='green')
lines(xx, predict(fit3, data.frame(x=xx)), col='red')
lines(xx, predict(fit4, data.frame(x=xx)), col='purple')
lines(xx, predict(fit5, data.frame(x=xx)), col='orange')
lines(xx, predict(fit6, data.frame(x=xx)), col='grey')
lines(xx, predict(fit7, data.frame(x=xx)), col='black')
Which of those models is the best? Arguments could be made for any of them, but I for one would not want to use the purple one for interpolation.
Regarding the question 'can R help me find the best fitting model': there is probably a function to do this, assuming you can state the set of models to test, but here is a reasonable first approach for the set of polynomials of degree 1 to n-1 (with AIC as the criterion):
polyfit <- function(i) AIC(lm(y ~ poly(x, i)))
as.integer(optimize(polyfit, interval = c(1, length(x) - 1))$minimum)
Notes
The validity of this approach will depend on your objectives, on the assumptions of optimize() and AIC(), and on whether AIC is the criterion you want to use.
polyfit() may not have a single minimum. Check this with something like:
for (i in 1:(length(x) - 1)) print(polyfit(i))
I used the as.integer() function because it is not clear to me how I would interpret a non-integer polynomial.
For testing an arbitrary set of mathematical equations, consider the 'Eureqa' program reviewed by Andrew Gelman here.
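Since the degree must be an integer anyway, a plain grid search (a sketch, not part of the original answer) sidesteps both the as.integer() step and the assumptions of optimize():
degrees <- 1:5                                       # candidate degrees; keep well below length(x) to avoid a near-saturated fit
aics <- sapply(degrees, function(i) AIC(lm(y ~ poly(x, i))))
degrees[which.min(aics)]                             # degree with the smallest AIC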
Update
Also see the stepAIC function (in the MASS package) to automate model selection.
The easiest way to find the best fit in R is to code the model as:
lm.1 <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4) + ...)
and then use step-down AIC regression:
lm.s <- step(lm.1)
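A concrete sketch with the data from the question, assuming a fourth-degree starting model (step() then drops terms whose removal lowers the AIC):
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
lm.1 <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))  # full starting model
lm.s <- step(lm.1)                            # backward selection by AIC
summary(lm.s)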
For example, if we want to fit a polynomial of degree 2, we can directly do it by solving a system of linear equations in the following way:
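For a parabola y = a*x^2 + b*x + c, setting the derivatives of the summed squared error to zero gives the normal equations (n is the number of data points); the code below builds exactly this 3x3 system and solves it with solve():
sum(x^4)*a + sum(x^3)*b + sum(x^2)*c = sum(y*x^2)
sum(x^3)*a + sum(x^2)*b + sum(x)*c   = sum(y*x)
sum(x^2)*a + sum(x)*b   + n*c        = sum(y)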
The following example shows how to fit a parabola y = a*x^2 + b*x + c using the above equations and compares it with the lm() polynomial regression solution. Hope this helps someone's understanding.
x <- c(32,64,96,118,126,144,152.5,158)
y <- c(99.5,104.8,108.5,100,86,64,35.3,15)
x4 <- sum(x^4)
x3 <- sum(x^3)
x2 <- sum(x^2)
x1 <- sum(x)
yx1 <- sum(y*x)
yx2 <- sum(y*x^2)
y1 <- sum(y)
A <- matrix(c(x4, x3, x2,
              x3, x2, x1,
              x2, x1, length(x)), nrow=3, byrow=TRUE)
B <- c(yx2, yx1, y1)
coef <- solve(A, B) # solve the linear system of equations, assuming A is not singular
coef1 <- lm(y ~ x + I(x^2))$coef # solution with lm
coef
# [1] -0.01345808 2.01570523 42.51491582
rev(coef1)
# I(x^2) x (Intercept)
# -0.01345808 2.01570523 42.51491582
plot(x, y, xlim=c(min(x), max(x)), ylim=c(min(y), max(y)+10), pch=19)
xx <- seq(min(x), max(x), 0.01)
lines(xx, coef[1]*xx^2+coef[2]*xx+coef[3], col='red', lwd=3, lty=5)
lines(xx, coef1[3]*xx^2+ coef1[2]*xx+ coef1[1], col='blue')
legend('topright', legend=c("solve", "lm"),
       col=c("red", "blue"), lty=c(5,1), lwd=c(3,1), cex=0.8,
       title="quadratic fit", text.font=4)
