Doing linear prediction with R: How to access the predicted parameter(s)?

I am new to R and I am trying to do linear prediction. Here is some simple data:
test.frame<-data.frame(year=8:11, value= c(12050,15292,23907,33991))
Say I want to predict the value for year=12. This is what I am doing (experimenting with different commands):
lma=lm(test.frame$value~test.frame$year) # let's get a linear fit
summary(lma) # let's see some parameters
attributes(lma) # let's see what parameters we can call
lma$coefficients # I get the intercept and gradient
predict(lm(test.frame$value~test.frame$year))
newyear <- 12 # new value for year
predict.lm(lma, newyear) # predicted value for the new year
Some queries:
If I issue the command lma$coefficients, for instance, a vector of two values is returned. How do I pick out only the intercept value?
I get lots of output from predict.lm(lma, newyear) but cannot see where the predicted value is. Can someone please clarify?
Thanks a lot...

Intercept:
lma$coefficients[1]   # or, equivalently, coef(lma)["(Intercept)"]
Prediction: for predict() to use new data, the model should be fitted with a formula that refers to column names plus a data argument; with lm(test.frame$value ~ test.frame$year) the newdata argument is effectively ignored. Try this:
lma <- lm(value ~ year, data=test.frame)
predict(lma, newdata=data.frame(year=12))
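As a quick check of where the predicted value comes from (assuming lma has been refitted with the data argument as above), the prediction is simply intercept + gradient * 12, and predict() returns that single number as a named numeric vector:
b <- coef(lma)
unname(b[1] + b[2]*12)                       # intercept + gradient * new year
predict(lma, newdata=data.frame(year=12))    # same value, returned with the name "1"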

Related

AR(2) model simulation with initial points

I have an AR(2) model, r_t = 0.03 + 0.2*r_{t-2} + a_t with var(a) = 0.1. I want to simulate 1000 observations with r_0 = -0.02 and r_{-1} = 0.01. This is what I wrote, but I'm pretty sure it's not correct, because the first two values it produces are not the ones I gave it. How do I do this?
mod1 <- arima.sim(list(ar=c(0,0.2), order=c(2,0,0)), n=1000, n.start=2, start.innov= c(0.01,-0.02), sd=sqrt(0.1))+0.03
> head(mod1)
Time Series:
Start = 1
End = 6
Frequency = 1
[1] 0.2629583 -0.6263199 -0.2020755 -0.2865246 -0.3953399 -0.7285421
Thanks in advance!
I figured out that what this function outputs starts from the point after the values given to it. For example, if you know r_5 = 0.01 and r_6 = -0.02 and want to forecast 1-step and 2-step ahead by simulation, then set n=2 and start.innov=c(0.01,-0.02). If you set innov=c(0,0), what you get will be the same as the analytical solution.
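If the aim is for the fixed starting values r_{-1}=0.01 and r_0=-0.02 to actually be honoured in the simulated series, another option is to simulate the recursion directly with a loop. A minimal sketch (variable names are illustrative, and the innovations are assumed normal with variance 0.1):
set.seed(1)
n <- 1000
a <- rnorm(n, mean=0, sd=sqrt(0.1))  # innovations a_t with var(a) = 0.1
r <- numeric(n + 2)
r[1] <- 0.01   # r_{-1}
r[2] <- -0.02  # r_0
for (t in 3:(n + 2)) {
  r[t] <- 0.03 + 0.2*r[t - 2] + a[t - 2]
}
r <- r[-(1:2)]  # drop the two starting values; r is now r_1, ..., r_1000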

depmixS4 list of state estimates

I have fitted a time series of data to a hidden Markov model using the fit(depmix()) functions in the package depmixS4. I want to obtain a time series of estimated states, where the estimate for state x is the mean value assigned to state x. Currently, I'm only getting a time series of values giving the index of the state (e.g. 1, 3, 5, 2, 5, ...).
This is my current code:
set.seed(9)
hmm9 <- depmix(volume ~ 1, data=data.frame(volume), nstates=9)
fitted_hmm9 <- fit(hmm9)
summary(fitted_hmm9)
state_ests9 <- posterior(fitted_hmm9)
state_ests9[,1]
The last part, state_ests9[,1], is the time series of state indices, while the actual expected values for states are stored somewhere in summary(fitted_hmm9).
By "estimated state" I assume you want the predicted response (i.e. "volume") according to each state. To get state-wise predicted values for responses, you can use the predict method, which however needs to be called on the response sub-model, which is somewhat hidden in the fitted depmix object.
In a fitted depmix object, there is a slot called "response", which is a list of lists, structured as
fitted_model@response[[state_id]][[response_id]]
In your case, I think "volume" is univariate, so that there is a single response and response_id is always 1. For a model with 9 states, state_id would vary from 1 to 9.
The following code (including randomly generated values for "volume" to make it reproducible) gives you what you want (I think):
library(depmixS4)
set.seed(123)
volume <- rnorm(10000)
hmm9 <- depmix(volume ~ 1, data=data.frame(volume), nstates=9)
fitted_hmm9 <- fit(hmm9)
summary(fitted_hmm9)
state_ests9 <- posterior(fitted_hmm9)
state_ests9[,1]
# construct matrix for state-dependent predictions
pred_resp9 <- matrix(0.0,ncol=9,nrow=nrow(state_ests9))
# fill matrix column-wise by using the "predict" method of
# corresponding response model
for(i in 1:9) {
  pred_resp9[, i] <- predict(fitted_hmm9@response[[i]][[1]])
}
## use MAP state estimates to extract a single time-series
## with predictions
pred_resp9[cbind(1:nrow(pred_resp9),state_ests9[,1])]
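As an optional check (a sketch using the objects above), each column of pred_resp9 should be constant over time and equal to the corresponding state mean reported by summary(fitted_hmm9), since each state's response model here is an intercept-only Gaussian:
apply(pred_resp9, 2, unique)   # one value per state: the state-specific means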

k-Fold Cross Validation in R - Negative Value Predictions

My predicted values are all negative. I would have expected 0's or 1's. Can anyone see where I am going wrong?
library(MASS)   # for the birthwt data
fold = 10
end = nrow(birthwt)
fold_2 = floor(end/fold)
df_i = birthwt[sample(nrow(birthwt)),] # random sort the dataframe birthwt
tester = df_i[1:fold_2,] # remove first tenth of rows - USE PREDICT ON THIS DATA
trainer = df_i[-c(1:fold_2),] # all other than the first tenth of rows - USE GLM ON THIS DATA
mod = glm(low~lwt,family=binomial,data=trainer)
ypred = predict(mod,data=tester) # predicted values
The default for predict.glm is to give you the prediction on the scale of the linear predictors (the link scale), before the inverse-link transformation. If you want predictions on the response scale, use
ypred <- predict(mod, newdata=tester, type="response")
Note also that the argument for supplying new data is newdata, not data; with data=tester the new data are silently ignored and the predictions are for the training set. It may be helpful to read the ?predict.glm help file.
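As a quick sanity check (assuming mod and tester from above), the response-scale predictions of a binomial GLM are just the inverse logit of the link-scale predictions, so they lie between 0 and 1:
link_pred <- predict(mod, newdata=tester, type="link")
resp_pred <- predict(mod, newdata=tester, type="response")
all.equal(resp_pred, plogis(link_pred))   # should be TRUE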

How to use previous observations to forecast the next period using for loops in R?

I have simulated 1000 observations of x_t = γ_1*x_{t-1} + γ_2*x_{t-2} + ε_t [an AR(2) process].
What I would like to do is to use the first 900 observations to estimate the model, and use the remaining 100 observations to predict one-step ahead.
This is what I have done so far:
data2=arima.sim(n=1000, list(ar=c(0.5, -0.7))) #1000 observations simulated, (AR (2))
arima(data2, order = c(2,0,0), method= "ML") #estimated parameters of the model with ML
fit2<-arima(data2[1:900], c(2,0,0), method="ML") #first 900 observations used to estimate the model
predict(fit2, 100)
But the problem with my code right now is that this uses n.ahead=100, whereas I would like to use n.ahead=1 and make 100 predictions in total.
I think I need to use a for loop for this, but since I am a very new R user I haven't been able to figure out how to use for loops to make predictions. Can anyone help me with this?
If I've understood you correctly, you want one-step predictions on the test set. This should do what you want without loops:
library(forecast)
data2 <- arima.sim(n=1000, list(ar=c(0.5, -0.7)))
fit2 <- Arima(data2[1:900], c(2,0,0), method="ML")
fit2a <- Arima(data2[901:1000], model=fit2)
fc <- fitted(fit2a)
The Arima command allows a model to be applied to a new data set without the parameters being re-estimated. Then fitted gives one-step in-sample forecasts.
If you want multi-step forecasts on the test data, you will need to use a loop. Here is an example for two-step ahead forecasts:
fcloop <- numeric(100)
h <- 2
for(i in 1:100) {
  fit2a <- Arima(data2[1:(899+i)], model=fit2)
  fcloop[i] <- forecast(fit2a, h=h)$mean[h]
}
If you set h <- 1 above you will get almost the same results as using fitted in the previous block of code. The first two values will be different because the approach using fitted does not take account of the data at the end of the training set, while the approach using the loop uses the end of the training set when making the forecasts.
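For example, a rough way to see this (assuming data2, fit2 and fc from the code above; fc1 and refit are just illustrative names) is to run the loop with h = 1 and compare it to the fitted() approach; the differences after the first two values should be negligible:
fc1 <- numeric(100)
for(i in 1:100) {
  refit <- Arima(data2[1:(899+i)], model=fit2)
  fc1[i] <- forecast(refit, h=1)$mean[1]
}
max(abs(fc1[-(1:2)] - as.numeric(fc)[-(1:2)]))   # should be close to zero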

Extracting the Model Object in R from str()

I have a logit model object fitted using glm2. The predictors are continuous and time-varying, so I am using basis splines. When I call predict(FHlogit, foo..,) on the model object, it provides a prediction. All is well.
Now, what I would like to do is extract the part of FHLogit and the basis matrix that provides the prediction. I do not want to extract information about the model from str(FHLogit); I am trying to extract the part that says Beta * Predictor = 2, so that I can manipulate the basis matrix for each predictor.
I don't think using basis splines will affect this. If it does, please provide a reproducible example.
Here's a simple case:
df1 <- data.frame(y=c(0,1,0,1),
                  x1=seq(4),
                  x2=c(1,3,2,6))
library(glm2)
g1 <- glm2(y ~ x1 + x2, data=df1)
### default for type is "link"
> stats::predict.glm(g1, type="link")
1 2 3 4
0.23809524 0.66666667 -0.04761905 1.14285714
Now, to see how these numbers were arrived at, we can look at the source of predict.glm. We can see that type="link" with no newdata is the simplest case, returning
pred <- object$linear.predictors # object is g1 in this case
(for the default gaussian family used here, with its identity link, these equal object$fitted.values). These values are the predictions resulting from the original data multiplied by the coefficients, which we can verify with e.g.
all.equal(unname(predict.glm(g1, type="link")[1]),
          unname(coef(g1)[1] + coef(g1)[2]*df1[1, 2] + coef(g1)[3]*df1[1, 3]))
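More generally, the full vector of link-scale predictions is just the model matrix multiplied by the coefficient vector, which is probably the closest thing to the "Beta * Predictor" part the question asks about. A sketch using the g1 example above:
X <- model.matrix(g1)   # one row per observation, including the intercept column
all.equal(unname(drop(X %*% coef(g1))),
          unname(predict.glm(g1, type="link")))   # should be TRUE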
