Dividing data and fitting in R
I have the following rainfall data for the year 1951:
dat.1951=c(122,122,122,122,122,122,122,122,122,122,122,121,121,121,121,120,119,119,117,117,117,115,115,115,114,112,112,111,110,109,106,105,104,103,102,99,97,95,91,89,88,86,84,83,83,82,82,79,77,77,76,74,74,72,72,71,70,69,66,65,64,61,61,58,56,56,54,53,51,49,48,47,46,46,46,45,42,40,39,38,37,36,36,35,34,33,33,32,30,30,29,28,28,27,25,25,23,22,21,20,20,20,20,20,19,19,18,18,18,16,16,15,15,15,15,15,14,14,14,14,14,14,14,14,14,14,14,13,13,12,12,11,11,11,11,11,11,11,11,11,11,11,11,11,11,10,10,10,9,8,8,8,8,8,8,8,8,8,8,8,8,7,7,6,6,6,6,5,5,5,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,2,2,2,2,2,2,1,1)
I want to fit a model to these data. I split them into two regions: a head, where values are less than 100, and a tail, where values are 100 or greater. I can fit an exponential to the head portion (see the code below). For the tail, I want to fit a curve as well, and then plot both fitted portions together with the data in a single plot. Can anyone help?
dat.1951 <- dat.1951[dat.1951 > 0]           # drop zero-rainfall records
dat.1951.tail <- dat.1951[dat.1951 >= 100]   # tail: values of 100 or more
dat.1951.head <- dat.1951[dat.1951 < 100]    # head: values below 100

# Index the head portion and log-transform it for an exponential fit
x.head <- seq_along(dat.1951.head)
idf.head <- data.frame(x.head, dat.1951.head)
idf.head$dat.1951.head <- log(idf.head$dat.1951.head)

# Linear model on the log scale, i.e. an exponential on the original scale
model <- lm(dat.1951.head ~ x.head, data = idf.head)
summary(model)

plot(dat.1951.head)
lines(idf.head$x.head, exp(fitted(model)), col = "blue")
I'm not sure why you want to (1) break the data into two regions, (2) eliminate records where there was no rainfall, and (3) fit the model you describe. You may want to consult with a statistician on these matters.
To answer your question, though, here is an example of a second model for the tail, with the fits from both models shown on the same plot.
x <- seq_along(dat.1951)
sel <- dat.1951 >= 100   # tail region

# Tail: quadratic polynomial on the original scale
model1 <- lm(dat.1951[sel] ~ poly(x[sel], 2))

# Head: linear model on the log scale (an exponential on the original scale)
model2 <- lm(log(dat.1951[!sel]) ~ x[!sel])

plot(dat.1951, cex = 1.5)
lines(x[sel], fitted(model1), col = "blue", lwd = 3)
lines(x[!sel], exp(fitted(model2)), col = "navy", lwd = 3)  # back-transform
Just for grins, I added a third model that fits all of the data with a generalized additive model using the function gam() from the package mgcv.
library(mgcv)
model3 <- gam(dat.1951 ~ s(x))   # one smooth over the whole series
lines(x, fitted(model3), col = "orange", lwd = 3, lty = 2)
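If you also want a sense of the uncertainty around that curve, one option (a sketch, using predict.gam's se.fit argument; the 2-standard-error band is only a rough 95% interval) is:

# Sketch: approximate 95% band around the gam fit (assumes x and model3 above)
pr <- predict(model3, se.fit = TRUE)
lines(x, pr$fit + 2 * pr$se.fit, col = "orange", lty = 3)
lines(x, pr$fit - 2 * pr$se.fit, col = "orange", lty = 3)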
Related
Is there a nice way to plot two regression lines on one plot?
I need to plot two regression lines, y(x) and x(y), on one plot. Is there any better way than doing the following (and if not, is it the right way to do it)?

1. Compute the regression parameters for y ~ x.
2. Compute the regression parameters for x ~ y.
3. Transform the parameters from step 2.
4. Plot the two lines with abline().

Any graphical package is fine with me.
Using the builtin BOD data set as an example:

plot(demand ~ Time, BOD)
fm1 <- lm(demand ~ Time, BOD)   # y(x): demand as a function of Time
fm2 <- lm(Time ~ demand, BOD)   # x(y): Time as a function of demand
abline(fm1)
lines(demand ~ fitted(fm2), BOD, col = "red")

The last line works because fm2 predicts Time from demand, so plotting demand against its fitted Time values traces the inverse regression line on the original axes.
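Alternatively, step 3 of your list can be made explicit (a sketch, inverting fm2's coefficients so abline() can draw the same line):

# Sketch: if Time = a + b * demand (fm2), then demand = -a/b + (1/b) * Time
cf <- coef(fm2)
abline(-cf[1]/cf[2], 1/cf[2], col = "red", lty = 2)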
Plotting a 95% confidence interval for a lm object
How can I calculate and plot a confidence interval for my regression in R? So far I have two numeric vectors of equal length (x, y) and a regression object (lm.out). I have made a scatterplot of y against x and added the regression line to this plot. I am looking for a way to add a 95% confidence band for lm.out to the plot. I've tried using the predict function, but I don't even know where to start with that. Here is my code at the moment:

x = c(1,2,3,4,5,6,7,8,9,0)
y = c(13,28,43,35,96,84,101,110,108,13)
lm.out <- lm(y ~ x)
plot(x, y)
regression.data = summary(lm.out)  # save regression summary as a variable
names(regression.data)             # get names so we can index this data
a = regression.data$coefficients["(Intercept)", "Estimate"]  # grab values
b = regression.data$coefficients["x", "Estimate"]
abline(a, b)  # add the regression line

Thank you!

Edit: I've taken a look at the proposed duplicate and can't quite get to the bottom of it.
You have to use predict with a new vector of data, here newx:

x = c(1,2,3,4,5,6,7,8,9,0)
y = c(13,28,43,35,96,84,101,110,108,13)
lm.out <- lm(y ~ x)
newx = seq(min(x), max(x), by = 0.05)
conf_interval <- predict(lm.out, newdata = data.frame(x = newx),
                         interval = "confidence", level = 0.95)
plot(x, y, xlab = "x", ylab = "y", main = "Regression")
abline(lm.out, col = "lightblue")
lines(newx, conf_interval[, 2], col = "blue", lty = 2)
lines(newx, conf_interval[, 3], col = "blue", lty = 2)

EDIT: As mentioned in the comments by Ben, this can also be done with matlines, as follows:

plot(x, y, xlab = "x", ylab = "y", main = "Regression")
abline(lm.out, col = "lightblue")
matlines(newx, conf_interval[, 2:3], col = "blue", lty = 2)
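Since the question says "prediction confidence band": if what you actually want is a prediction band for new observations rather than a confidence band for the mean, a sketch of the change is to swap the interval type:

# Sketch: a prediction band (wider than the confidence band)
pred_interval <- predict(lm.out, newdata = data.frame(x = newx),
                         interval = "prediction", level = 0.95)
matlines(newx, pred_interval[, 2:3], col = "red", lty = 3)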
I'm going to add a tip that would have saved me a lot of frustration when trying the method given by @Alejandro Andrade: if your data are in a data frame, then when you build your model with lm(), use the data = argument rather than $ notation. E.g., use

lm.out <- lm(y ~ x, data = mydata)

rather than

lm.out <- lm(mydata$y ~ mydata$x)

If you do the latter, then the statement

predict(lm.out, newdata = data.frame(x = newx), interval = "confidence", level = 0.95)

silently ignores the values passed via newdata =, because the model's term is named mydata$x and nothing in the new data frame matches it; the output is then the predictions for the original data, not the new data. Also, be sure your x variable gets the same name in the new data frame that it had in the original. That one is easier to figure out, because you do get an error, but knowing it ahead of time might save you a round of debugging.
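A minimal sketch of that failure mode (mydata and newx are made-up names for illustration):

mydata <- data.frame(x = 1:10, y = 3 * (1:10) + rnorm(10))
bad  <- lm(mydata$y ~ mydata$x)   # $ notation baked into the formula
good <- lm(y ~ x, data = mydata)  # column names stay clean
newx <- data.frame(x = c(11, 12))
predict(bad,  newdata = newx)   # 10 values: just the original fitted values
predict(good, newdata = newx)   # 2 values: predictions at x = 11 and 12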
get equation of linear SVM regression line
I'm struggling to find a way to get the equation for a linear SVM model in the regression case, since most of the questions deal with classification... I have fit it with the caret package.

1. Univariate case:

set.seed(1)
fit = train(mpg ~ hp, data = mtcars, method = "svmLinear")
plot(x = mtcars$hp, y = predict(fit, mtcars), pch = 15)
points(x = mtcars$hp, y = mtcars$mpg, col = "red")
abline(lm(mpg ~ hp, mtcars), col = "blue")

This gives a plot with red = actual, black = fitted, and the blue line is the classic regression. In this case I know I could manually calculate the SVM prediction line from 2 points, but is there a way to get the equation directly from the model structure? I actually need the equation in the form y = a + b*x (here mpg = ? + ?*hp) with values on the original scale.

2. Multivariate case: the same question, but with two predictors (mpg ~ hp + wt).

Thanks,
Yes, I believe there is. Take a look at this answer, which is similar but does not use the caret library. If you add svp = fit$finalModel to the example, you should be able to follow it almost exactly. I applied a similar technique to your data below. I scaled the data to fit nicely on the plot of the vectors, since the library scales the data at runtime.

require(caret)
set.seed(1)
x = model.matrix(data = mtcars, mpg ~ scale(hp))  # set up data
y = mtcars$mpg
fit = train(x, y, method = "svmLinear")           # train
svp = fit$finalModel                              # extract S4 model object
plot(x, xlab = "", ylab = "")
w <- colSums(coef(svp)[[1]] * x[unlist(alphaindex(svp)), ])
b <- b(svp)
abline(b/w[1], -w[2]/w[1], col = "red")
abline((b+1)/w[1], -w[2]/w[1], lty = 2, col = "red")
abline((b-1)/w[1], -w[2]/w[1], lty = 2, col = "red")

And your second question:

x = model.matrix(data = mtcars, mpg ~ scale(hp) + scale(wt) - 1)  # set up data
fit = train(x, y, method = "svmLinear")   # train
svp = fit$finalModel                      # extract S4 model object
plot(x, xlab = "", ylab = "")
w <- colSums(coef(svp)[[1]] * x[unlist(alphaindex(svp)), ])
b <- b(svp)
abline(b/w[1], -w[2]/w[1], col = "red")
abline((b+1)/w[1], -w[2]/w[1], lty = 2, col = "red")
abline((b-1)/w[1], -w[2]/w[1], lty = 2, col = "red")

Edit: The above concerns plotting a boundary, not the linear SVM regression line. To answer the question, one easy way to get the line is to extract the predicted values and fit a regression through them. You actually only need a couple of points to get the line, but for simplicity I used the following code:

abline(lm(predict(fit, newdata = mtcars) ~ mtcars$hp), col = "green")

or

abline(lm(predict(fit) ~ mtcars$hp), col = "green")
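If you want the a and b of mpg = a + b*hp without the extra lm() call, a sketch (assuming fit is the univariate model trained with the formula interface from the question) is to read them off two predictions, which is valid because the fitted model is linear in hp:

# Sketch: recover intercept and slope directly from two predictions
p <- predict(fit, data.frame(hp = c(0, 1)))
a <- p[1]          # prediction at hp = 0 is the intercept
b <- p[2] - p[1]   # change per unit of hp is the slope
c(intercept = a, slope = b)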
`gam` package: extra shift spotted when sketching data on `plot.gam`
I am trying to fit a GAM using the gam package (I know mgcv is more flexible, but I need to use gam here). The problem is that the model looks good, but in comparison with the original data it seems to be offset along the y-axis by a constant value, and I cannot figure out where this comes from. This code reproduces the problem:

library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x, 6))
fit$coefficients
#(Intercept)     s(x, 6)
#   1.921819   -2.318771
plot(fit, ylim = range(y))
points(x, y)
points(x, y - 1.921819, col = 2)
legend("topright", pch = 1, col = 1:2, legend = c("Original", "Minus intercept"))

Chambers, J. M. and Hastie, T. J. (1993), Statistical Models in S (Chapman & Hall), shows that there should not be an offset, and this is also intuitively correct (the smooth should describe the data). I noticed something comparable in mgcv, which can be solved by providing the shift parameter with the intercept value of the model (because the smooth is apparently centred). I thought the same could be true here, so I subtracted the intercept from the original data points. However, the plot above shows this idea is wrong. I don't know where the extra shift comes from; I hope someone here may be able to help me. (R version 3.3.1; gam version 1.12)
I think I should first explain the various outputs in the fitted GAM model:

library(gam)
data(gam.data)
x <- gam.data$x
y <- gam.data$y
fit <- gam(y ~ s(x, 6), model = FALSE)

## coefficients for the parametric part
## this includes the intercept and the null space of the spline
beta <- coef(fit)

## null space of the spline smooth (a linear term, just `x`)
nullspace <- fit$smooth.frame[, 1]
nullspace - x  ## all 0

## smooth space that is penalized
## note, the backfitting procedure guarantees that this is centred
pensmooth <- fit$smooth[, 1]
sum(pensmooth)  ## centred
# [1] 5.89806e-17

## estimated smooth function (null space + penalized space)
smooth <- nullspace * beta[2] + pensmooth

## centred smooth function (this is what `plot.gam` is going to plot)
c0 <- mean(smooth)
censmooth <- smooth - c0

## additive predictors (these are just the fitted values in the Gaussian case)
addpred <- beta[1] + smooth

You can first verify that addpred is what fit$additive.predictors gives, and since we are fitting an additive model with Gaussian response, it is also the same as fit$fitted.values.

What plot.gam does is plot censmooth:

plot.gam(fit, col = 4, ylim = c(-1.5, 1.5))
points(x, censmooth, col = "gray")

Remember, there is

addpred = beta[1] + censmooth + c0

so if you want to shift the original data y to match this plot, you not only need to subtract the intercept (beta[1]) but also c0 from y:

points(x, y - beta[1] - c0)
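To make that verification step concrete, a quick sketch (assuming the objects from the code above):

# Sketch: the decomposition reproduces the model's own quantities
all.equal(as.numeric(addpred), as.numeric(fit$additive.predictors))  # TRUE
all.equal(as.numeric(addpred), as.numeric(fit$fitted.values))        # TRUE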
Predict Future values using polynomial regression in R
I was trying to predict the future values of a sample using polynomial regression in R. The y values within the sample form a wave pattern. For example:

x = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
y = 1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4

But when the graph is plotted for future values, the resulting y values are completely different from what was expected. Instead of a wave pattern, I get a graph where the y values keep increasing.

futurY = 17,18,19,20,21,22

I tried different degrees of polynomial regression, but the predicted results for futurY were drastically different from what was expected. The following is the sample R code which was used to get the results:

dfram <- data.frame('x' = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
dfram$y <- c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4)
plot(dfram$x, dfram$y, type = "l", lwd = 3)
pred <- data.frame('x' = c(17,18,19,20,21,22))
myFit <- lm(y ~ poly(x, 5), data = dfram)
newdata <- predict(myFit, pred)
print(newdata)
plot(pred[, 1], data.frame(newdata)[, 1], type = "l", col = "red", lwd = 3)

Is this the correct technique for predicting the unknown future y values, or should I be using other techniques like forecasting?
# Reproducing your data frame
dfram <- data.frame("x" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),
                    "y" = c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4))

From your graph I've got the phase and period of the signal. There are better ways of calculating those automatically.

# Phase and period
fase = 1
per = 10

In the linear model function I've put the triangular-signal equations:

fit <- lm(y ~ I((((trunc((x-fase)/(per/2))%%2)*2)-1) * (x-fase)%%(per/2))
            + I((((trunc((x-fase)/(per/2))%%2)*2)-1) * ((per/2)-((x-fase)%%(per/2)))),
          data = dfram)

# Predict the old data
p_olddata <- predict(fit, type = "response")

# Predict the new data
newdata <- data.frame('x' = c(17,18,19,20,21,22))
p_newdata <- predict(fit, newdata, type = "response")

# Plotting old and new data
plot(x = c(dfram$x, newdata$x),
     y = c(p_olddata, p_newdata),
     col = c(rep("blue", length(p_olddata)), rep("green", length(p_newdata))),
     xlab = "x", ylab = "y")
lines(dfram)

Here the black line is the original signal, the blue circles are the predictions for the original points, and the green circles are the predictions for the new data. The graph shows a perfect fit for the model because there is no noise in the data. In a real dataset there will be noise, so the fit will not look as clean.
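On the asker's closing question: once the period is known, a harmonic (sinusoidal) basis is another common option. A sketch, reusing per, dfram, and newdata from above; note it approximates the triangular wave smoothly rather than reproducing its corners:

# Sketch: harmonic regression with a known period (first harmonic only)
fit2 <- lm(y ~ sin(2*pi*x/per) + cos(2*pi*x/per), data = dfram)
predict(fit2, newdata)   # extrapolates the periodic pattern to x = 17..22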