What exactly happens when predict() is used inside lines()?

When I am doing linear modeling, I can just use predict() within lines() and get a nice plot. For example:
Year <- 1:15
Sales <- c(301,320,372,423,500,608,721,826,978,1135,1315,1530,1800,2152,2491)
YearSales <- data.frame(Year,Sales)
logYearSales.fit <- lm(log(Sales)~Year)
plot(Year,log(Sales))
lines(Year,predict(logYearSales.fit),col="red",lwd=2)
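(Here predict(logYearSales.fit) with no newdata argument just returns the fitted values, in the same row order as the data used to fit the model; since Year is already sorted, the line comes out clean. A quick check:)
all.equal(predict(logYearSales.fit), fitted(logYearSales.fit))  # TRUE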
However, when I combine nls() and lines(), something odd happens:
library(MASS)
survey1<-survey[!is.na(survey$Pulse)&!is.na(survey$Height),c("Pulse","Height")]
expn <- function(b0, b1, x){
  model.func <- b0 + b1*log(x)
  Z <- cbind(1, log(x))
  dimnames(Z) <- list(NULL, c("b0","b1"))
  attr(model.func, "gradient") <- Z
  model.func
}
survey1<-as.data.frame(survey1)
aa=nls(Height~expn(b0,b1,Pulse), data=survey1,start=c(b0=180,b1=2),trace=TRUE)
plot(survey1)
lines(survey1[,1],predict(aa),col="red",lwd=2)
You can see that the red curve is so thick that it looks as if it contains many lines. I just cannot understand this.

The problem is that the Pulse values are not in order, and lines() connects the points in the order they are given, so the line doubles back on itself many times. Sort the data first:
survey1 <- survey1[order(survey1$Pulse), ]
and repeat.
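Equivalently, without reordering the data frame or refitting, you can sort just for plotting; a minimal sketch:
ord <- order(survey1$Pulse)
plot(survey1)
lines(survey1$Pulse[ord], predict(aa)[ord], col="red", lwd=2)  # now a single curve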

Related

Regression of two vectors by Kalman filter example

Here is an example of how I do a rolling linear regression of two vectors:
x <- rnorm(100)
y <- rnorm(100)
res <- rep(NA, length(x))
# residual of the last point in each 5-point window
for (i in 5:length(x)) {
  ii <- (i-4):i
  LR <- lm(y[ii] ~ x[ii])$residuals
  res[i] <- tail(LR, 1)
}
I would like to see an example of how this can be done with an adaptive Kalman filter.
I'm interested in this article, but its code examples are too hard for me to follow; I would like a simpler, more understandable use case.
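A minimal sketch of one standard approach (not necessarily what the article does): treat the intercept and slope as latent states following a random walk and update them with a Kalman filter at each observation. The function name kalman_reg and the noise settings Q and R below are assumptions you would tune.
# State: beta_t = (intercept_t, slope_t), random walk: beta_t = beta_{t-1} + w_t
# Observation: y_t = c(1, x_t) %*% beta_t + v_t
kalman_reg <- function(x, y, Q = diag(1e-4, 2), R = 1) {
  n <- length(x)
  beta <- c(0, 0)          # state estimate
  P <- diag(1e3, 2)        # state covariance (diffuse prior)
  res <- rep(NA, n)        # one-step-ahead prediction errors
  betas <- matrix(NA, n, 2)
  for (t in 1:n) {
    P <- P + Q                         # predict step (random-walk state)
    H <- c(1, x[t])                    # observation vector
    e <- y[t] - sum(H * beta)          # innovation (prediction error)
    S <- drop(t(H) %*% P %*% H) + R    # innovation variance
    K <- P %*% H / S                   # Kalman gain
    beta <- beta + drop(K) * e         # update state
    P <- P - K %*% t(H) %*% P          # update covariance
    res[t] <- e
    betas[t, ] <- beta
  }
  list(res = res, betas = betas)
}
out <- kalman_reg(x, y)
out$res          # plays the role of the rolling residuals above
tail(out$betas)  # the adaptive intercept/slope estimates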

R: How to plot custom range of polynomial produced by lm poly fit

I'm confused by the coefficients produced by the output of lm
Here's a copy of the data I'm working with
(postprocessed.csv)
"","time","value"
"1",1,2.61066016308988
"2",2,3.41246054742996
"3",3,3.8608767964033
"4",4,4.28686048552237
"5",5,4.4923132964825
"6",6,4.50557049744317
"7",7,4.50944447661246
"8",8,4.51097373134893
"9",9,4.48788748823809
"10",10,4.34603985656981
"11",11,4.28677073671406
"12",12,4.20065901625172
"13",13,4.02514194962519
"14",14,3.91360194972916
"15",15,3.85865748409081
"16",16,3.81318053258601
"17",17,3.70380706527433
"18",18,3.61552922363713
"19",19,3.61405310598722
"20",20,3.64591327503384
"21",21,3.70234435835577
"22",22,3.73503970503372
"23",23,3.81003078640584
"24",24,3.88201196162666
"25",25,3.89872518158949
"26",26,3.97432743542362
"27",27,4.2523675144599
"28",28,4.34654855854847
"29",29,4.49276038902684
"30",30,4.67830892029687
"31",31,4.91896819673664
"32",32,5.04350767355202
"33",33,5.09073406942046
"34",34,5.18510849382162
"35",35,5.18353176529036
"36",36,5.2210776270173
"37",37,5.22643491929207
"38",38,5.11137006553725
"39",39,5.01052467981257
"40",40,5.0361056705898
"41",41,5.18149486951409
"42",42,5.36334869132276
"43",43,5.43053620818444
"44",44,5.60001072279525
I have fitted a 4th order polynomial to this data using the following script:
library(ggplot2)
library(matrixStats)
library(forecast)
df_input <- read.csv("postprocessed.csv")
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
poly4model <- lm(y~poly(x, degree=4), data=df)
v <- seq(30, 40)
vv <- poly4model$coefficients[1] +
  poly4model$coefficients[2] * v +
  poly4model$coefficients[3] * (v ^ 2) +
  poly4model$coefficients[4] * (v ^ 3) +
  poly4model$coefficients[5] * (v ^ 4)
pdf("postprocessed.pdf")
plot(df)
lines(v, vv, col="red", lwd=3)
dev.off()
I initially tried using the predict function to do this but couldn't get it to work, so I resorted to this "workaround", using two new vectors v and vv to store the data for the line in the region I am trying to plot.
Ultimately, I am trying to do this:
Fit a 4th order polynomial to the data
Plot the 4th order polynomial over the range of data in one color
Plot the 4th order polynomial over the range from the last value to the last value + 10 (prediction) in a different color
At the moment I am fairly sure that using v and vv like this is not the best way, but I would have thought it should still work. What actually happens is that I get very large values.
Here is a screenshot from Desmos. I copied and pasted the coefficients shown by typing poly4model$coefficients into the console, but something must have gone wrong, because the resulting function looks nothing like the data.
I think I've provided enough info to run this short script, but I will add the pdf as well.
It is easiest to use the predict function to create your line. To do that, you pass the model and a data frame with the desired independent variables to the predict function.
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
poly4model <- lm(y~poly(x, degree=4), data=df)
v <- seq(30, 40)
#Notice the column in the dataframe is the same variable name
# as the variable in the model!
predict(poly4model, data.frame(x=v))
plot(df)
lines(v, predict(poly4model, data.frame(x=v)), col="red", lwd=3)
NOTE
The function poly "returns or evaluates orthogonal polynomials of degree 1 to degree over the specified set of points x: these are all orthogonal to the constant polynomial of degree 0." To get back the "normal" polynomial coefficients, you need to pass raw=TRUE to poly.
poly4model <- lm(y~poly(x, degree=4, raw=TRUE), data=df)
Now your equation above will work.
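For the asker's full goal (the fit in one color over the observed range, the extrapolation in another), a sketch along these lines should work; the cutoff at the last observed x and the extra 10 units come from the question, the rest is an assumption:
x_fit  <- seq(min(x), max(x), length.out=200)       # observed range
x_pred <- seq(max(x), max(x) + 10, length.out=50)   # prediction range
y_fit  <- predict(poly4model, data.frame(x=x_fit))
y_pred <- predict(poly4model, data.frame(x=x_pred))
plot(df, xlim=range(x_fit, x_pred), ylim=range(y, y_fit, y_pred))
lines(x_fit,  y_fit,  col="red",  lwd=3)   # fit over the data
lines(x_pred, y_pred, col="blue", lwd=3)   # extrapolation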

Remove linear trend from raster stack R

I am trying to remove the linear trend (detrend) from a monthly precipitation raster stack for the US from 1979-2015 (https://www.northwestknowledge.net/metdata/data/monthly/pr_gridMET.nc). These data are large enough that using them as an example would be a bit unruly here, so for the sake of efficiency I am going to use the data from the raster package. My current working model is to use raster::calc with a linear model and pull the residuals. My understanding is that those residuals are the detrended series, but I am not 100% sure that is correct. The code I am using is as follows:
library(raster)
fn <- raster(system.file("external/test.grd", package="raster"))
fn2 <- fn + 1000
fn3 <- fn + 500
fn4 <- fn + 750
fn5 <- fn + 100
fns <- stack(fn, fn2, fn3, fn4, fn5)
time <- 1:nlayers(fns)
# Get residuals to detrend the raw data
get_residuals <- function(x) {
  if (is.na(x[1])) {
    rep(NA, length(x))
  } else {
    m <- lm(x ~ time)
    q <- residuals(m)
    return(q)
  }
}
detrended_fns <- calc(fns, get_residuals) # create the residual (detrended) time series stack
I feel like I'm missing something here. Can anyone confirm that I'm on the right track? If not, any suggestions on how to properly detrend these data would be helpful. Thanks!
The residuals remove both the slope and the intercept, so you get anomalies. Perhaps you only want to remove the slope? In that case you could add the intercept back to the residuals in get_residuals:
q <- residuals(m) + coefficients(m)[1]
Or better:
q <- residuals(m) + predict(m)[1]
so that you use year 1 (and not year 0) as the base; this also works if time is, say, 2000:2004.
You could also take the last year, the middle year, or the average as the base.
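In full, the level-preserving variant of the function would look like this (a sketch of the suggestion above; get_residuals_level is a made-up name, and the first time step is used as the base):
get_residuals_level <- function(x) {
  if (is.na(x[1])) {
    rep(NA, length(x))
  } else {
    m <- lm(x ~ time)
    # residuals plus the fitted value at the first time step:
    # the slope is removed but the year-1 level is kept
    residuals(m) + predict(m)[1]
  }
}
detrended_fns <- calc(fns, get_residuals_level)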

Linear Regression in R for Date and some dependent output

I need to calculate the parameters theta0 and theta1 using linear regression. My data frame (data.1) consists of two columns: the first is a date-time and the second is a result which depends on this date. Like this:
data.1[[1]] data.1[[2]]
2004-07-08 14:30:00 12.41
Now, I have this code, which iterates a number of times to calculate the parameters theta0 and theta1:
x=as.vector(data.1[[1]])
y=as.vector(data.1[[2]])
plot(x,y)
theta0=10
theta1=10
alpha=0.0001
initialJ=100000
learningIterations=200000
# cost function: mean squared error / 2
J=function(x,y,theta0,theta1){
  m=length(x)
  sum=0
  for(i in 1:m){
    sum=sum+((theta0+theta1*x[i]-y[i])^2)
  }
  sum=sum/(2*m)
  return(sum)
}
# one gradient-descent step for theta0 and theta1
updateTheta=function(x,y,theta0,theta1){
  sum0=0
  sum1=0
  m=length(x)
  for(i in 1:m){
    sum0=sum0+(theta0+theta1*x[i]-y[i])
    sum1=sum1+((theta0+theta1*x[i]-y[i])*x[i])
  }
  sum0=sum0/m
  sum1=sum1/m
  theta0=theta0-(alpha*sum0)
  theta1=theta1-(alpha*sum1)
  return(c(theta0,theta1))
}
for(i in 1:learningIterations){
  thetas=updateTheta(x,y,theta0,theta1)
  tempSoln=J(x,y,theta0,theta1)
  if(tempSoln<initialJ){
    initialJ=tempSoln
  }
  if(tempSoln>initialJ){
    break
  }
  theta0=thetas[1]
  theta1=thetas[2]
  #print(thetas)
  #print(initialJ)
  plot(x,y)
  lines(x,(theta0+theta1*x), col="red")
}
lines(x,(theta0+theta1*x), col="green")
Now I want to calculate theta0 and theta1 for the following scenarios:
y = data.1[[2]] and x = the dates, pooled irrespective of the year
y = data.1[[2]] and x = the months, pooled irrespective of the year
Please suggest.
As #Nicola said, you need to use the lm function for linear regression in R.
If you'd like to learn more about linear regression, check out this or follow this tutorial.
First you would have to determine your formula. You want to calculate Theta0 and Theta1 using data.1[[2]] and dates/months.
Your first formula would be something along the lines of:
formula <- Theta0 ~ data.1[[2]] + dates
Then you would create the linear model
variablename <- lm(formula, dataset)
After this you can use the output for various calculations.
For example you can calculate anova, or just print the summary:
anova(variablename)
summary(variablename)
Sidenote:
I noticed you're assigning variables using =. This is not recommended. For more information check out Google's R Style Guide.
In R it is preferred to use <- to assign variables.
Taking the first bit of your code, it would become:
x <- as.vector(data.1[[1]])
y <- as.vector(data.1[[2]])
plot(x,y)
theta0 <- 10
theta1 <- 10
alpha <- 0.0001
initialJ <- 100000
learningIterations <- 200000
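For the two scenarios themselves (pooling dates or months across years), a sketch of the lm approach would be to derive day-of-year and month predictors from the date-time column first; the column handling below is an assumption about data.1's layout:
d <- data.frame(when = data.1[[1]], value = data.1[[2]])
d$doy   <- as.numeric(format(d$when, "%j"))   # day of year, ignores the year
d$month <- as.numeric(format(d$when, "%m"))   # month, ignores the year
fit.doy   <- lm(value ~ doy, data = d)        # scenario 1
fit.month <- lm(value ~ month, data = d)      # scenario 2
coef(fit.doy)   # (Intercept) is theta0, doy is theta1
summary(fit.month)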

Problems with points and apply in R for linear discriminant analysis

I have some coding questions, which arose while doing exercises on linear discriminant analysis. We are using the iris data:
## Read in dataset, set seed, load package
Iris <- iris[,-(1:2)]
grIris <- as.integer(iris[,"Species"])
set.seed(16)
library(MASS)
## Number of observations
n <- nrow(Iris)
As you can see, we delete the first and second columns of iris. What I want to do is a bootstrap for this data using linear discriminant analysis; here is my code:
ind <- replicate(B,sample(seq(1:n),n,replace=TRUE))
This generates the indices I want to use (note that B is some large number, e.g. 1000). Now I want to use apply, but why doesn't the following code work?
bst.sample <- apply(ind,2,lda(Species~Petal.Length+Petal.Width,data=Iris[ind,]))
where Species, Petal.Length, etc. are the data from iris. If I use a for loop everything works fine, but of course I would like to implement this in a more elegant way.
My second question is about points. I also wanted to calculate the estimated means, which I've done with the following code:
est.lda <- vector("list", B)
est.qda <- vector("list", B)
mu_hat_1 <- mu_hat_2 <- mu_hat_3 <- matrix(0, ncol=B, nrow=2)
for (i in 1:B){
  est.lda[[i]] <- lda(Species~Petal.Length+Petal.Width, data=Iris[ind[,i],])
  mu_hat_1[,i] <- est.lda[[i]]$means[1,]
  mu_hat_2[,i] <- est.lda[[i]]$means[2,]
  mu_hat_3[,i] <- est.lda[[i]]$means[3,]
  est.qda[[i]] <- qda(Species~Petal.Length+Petal.Width, data=Iris[ind[,i],])
}
plot(mu_hat_1[1,],mu_hat_1[2,],pch=4)
points(mu_hat_2[1,],mu_hat_2[2,],pch=4,col=2)
points(mu_hat_3[1,],mu_hat_3[2,],pch=4,col=3)
The plot at the end should show three regions with the estimated means of the three classes. However, only the first group of points is shown.
Thank you for your help.
B <- 10
ind <- replicate(B, sample(seq(1:n), n, replace=TRUE))
# you need to pass a function to apply
bst.sample <- apply(ind, 2,
                    function(i) lda(Species~Petal.Length+Petal.Width, data=Iris[i,]))
# extract the group means
bst.means <- lapply(bst.sample, function(x) x$means)
# bind the means into a 3 x 2 x B array
library(abind)
bst.means <- do.call(function(...) abind(..., along=3), bst.means)
# you need to make sure that all points are inside the axis limits
plot(bst.means[1,1,], bst.means[1,2,],
     xlim=range(bst.means[,1,]), ylim=range(bst.means[,2,]),
     xlab=dimnames(bst.means)[[2]][1], ylab=dimnames(bst.means)[[2]][2],
     col=1)
points(bst.means[2,1,], bst.means[2,2,], col=2)
points(bst.means[3,1,], bst.means[3,2,], col=3)
legend("topleft", legend=dimnames(bst.means)[[1]], col=1:3, pch=1)
