I have created a model in R using the forecast package.
I learned this approach from here:
https://robjhyndman.com/hyndsight/dailydata/
I am using the last section, which adds Fourier terms as regressors, like this:
y <- ts(x, frequency=7)                             # weekly period
z <- fourier(ts(x, frequency=365.25), K=5)          # annual Fourier terms
zf <- fourier(ts(x, frequency=365.25), K=5, h=100)  # future Fourier terms for the forecast
fit <- auto.arima(y, xreg=cbind(z,holiday), seasonal=FALSE)
fc <- forecast(fit, xreg=cbind(zf,holidayf), h=100)
After I create this model, is there a way I can do k-fold cross-validation to determine the error and adjusted error?
I know how to do it with a generalized linear model, like this:
library(boot)
lm1 <- glm(ValuePerSqFt ~ Units + SqFt + Boro, data = housing)
lm1cv <- cv.glm(housing, lm1, K=5)
lm1cv$delta
[1] 1870.31 1869.352
This shows the cross-validation estimate of prediction error and the adjusted (bias-corrected) estimate.
Is there a function in the forecast package that can do this, so I can compare the accuracy of this model with the glm model?
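For reference, recent versions of the forecast package provide tsCV() for rolling-origin time-series cross-validation (plain k-fold splits break the temporal ordering, so this is the usual substitute). A minimal sketch, assuming the y and z objects defined above; the wrapper signature with xreg/newxreg follows the tsCV help page, and refitting auto.arima at every origin is slow, so treat this as a starting point:
library(forecast)
# tsCV passes the matching rows of xreg for each training window,
# and the future rows as newxreg
far <- function(x, h, xreg, newxreg) {
  forecast(auto.arima(x, xreg = xreg, seasonal = FALSE), xreg = newxreg)
}
e <- tsCV(y, far, h = 1, xreg = z)
sqrt(mean(e^2, na.rm = TRUE))  # RMSE of the one-step-ahead errors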
I want to evaluate the out-of-sample one-step-ahead performance of a nnetar time series forecasting model in R. I'm looking for something analogous to the following code, but using nnetar.
#prepare train and test set
train <- lynx[1:100]
test <- lynx[101:length(lynx)]
# fitting out of sample
train.fit <- auto.arima(train)
plot(forecast(train.fit, h = 20))
test.fit <- Arima(test, model = train.fit)
one.step <- fitted(test.fit)
The following attempt with nnetar doesn't raise any errors, but the one.step result has 8 NAs.
fit <- nnetar(train)
plot(forecast(fit, h = 20))
fit2 <- nnetar(test, model = fit)
one.step <- fitted(fit2)
I don't even get that far with my real data, which holds 700 training points of weekday-only data at frequency 5. The test.series holds 28 days.
fit <- nnetar(train.series)
fit2 <- nnetar(test.series, model = fit)
On the fit2 line above, I get the error:
Error in nnet.default(x = c(0.223628229573182, -0.783157335744087, -0.560497369997497, :
weights vector of incorrect length
In addition: Warning message:
In nnetar(test.series, model = fit) :
Reducing number of lagged inputs due to short series
Any help/examples would be appreciated.
The following works using v8.3 of the forecast package:
library(forecast)
series <- ts(rnorm(728), freq=5)
train.series <- subset(series, end=700)
test.series <- subset(series, start=701)
fit <- nnetar(train.series)
fit2 <- nnetar(test.series, model = fit)
one.step <- fitted(fit2)
However, note that you can't get the first few fitted values that way, because the second call to nnetar knows nothing about the earlier data.
The following is better:
fit2 <- nnetar(series, model = fit)
one.step <- subset(fitted(fit2), start=701)
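From there, forecast's accuracy() gives the usual error summaries of the one-step predictions against the held-out data (a small sketch, assuming the objects above):
accuracy(one.step, test.series)  # ME, RMSE, MAE, ... on the test window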
I would like to create confusion matrices for a multinomial logistic regression as well as a proportional odds model, but I am stuck with the implementation in R. My attempt below does not seem to give the desired output.
This is my code so far:
library(nnet)  # for multinom()
library(MASS)  # for polr()
CH <- read.table("http://data.princeton.edu/wws509/datasets/copen.dat", header=TRUE)
CH$housing <- factor(CH$housing)
CH$influence <- factor(CH$influence)
CH$satisfaction <- factor(CH$satisfaction)
CH$contact <- factor(CH$contact)
CH$satisfaction <- factor(CH$satisfaction,levels=c("low","medium","high"))
CH$housing <- factor(CH$housing,levels=c("tower","apartments","atrium","terraced"))
CH$influence <- factor(CH$influence,levels=c("low","medium","high"))
CH$contact <- relevel(CH$contact,ref=2)
model <- multinom(satisfaction ~ housing + influence + contact, weights=n, data=CH)
summary(model)
preds <- predict(model)
table(preds,CH$satisfaction)
omodel <- polr(satisfaction ~ housing + influence + contact, weights=n, data=CH, Hess=TRUE)
preds2 <- predict(omodel)
table(preds2,CH$satisfaction)
I would really appreciate some advice on how to correctly produce confusion matrices for my 2 models!
You can refer to this related question:
Predict() - Maybe I'm not understanding it
Note that with predict() you need to pass unseen data (via newdata) to get out-of-sample predictions; without it, you get predictions on the training data.
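One detail worth checking, as an assumption about the copen.dat layout: each row is an aggregated group of n households, so an unweighted table() of predictions counts covariate patterns rather than households. A weighted cross-tabulation is one way to get a household-level confusion matrix:
preds <- predict(model)
# weight each cell by the group size n instead of counting rows
xtabs(CH$n ~ preds + CH$satisfaction)
The same pattern works for the proportional odds model using preds2.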
I am using following dataset: http://www.sgi.com/tech/mlc/db/churn.data
And the variable description: http://www.sgi.com/tech/mlc/db/churn.names
I did some preliminary coding, but I am really not able to work out how to apply logistic regression and random forest techniques to this data to predict the importance of variables and the churn rate.
nm <- read.csv("http://www.sgi.com/tech/mlc/db/churn.names",
skip=4, colClasses=c("character", "NULL"), header=FALSE, sep=":")[[1]]
nm
dat <- read.csv("http://www.sgi.com/tech/mlc/db/churn.data", header=FALSE, col.names=c(nm, "Churn"))
dat
View(dat)
library(survival)
# preliminary attempt: account length as survival time, churn as the event
s <- with(dat, Surv(account.length, as.numeric(Churn)))
model <- coxph(s ~ total.day.charge + number.customer.service.calls, data=dat[, -4])
summary(model)
plot(survfit(model))
Also, I am not able to figure out how to use the model that I built in my further analysis.
Please help me.
Do you have any example code of what you're trying to do? What further analysis do you have planned? If you're just trying to run a logistic regression on the data, the general format is:
lr <- glm(Churn ~ international.plan + voice.mail.plan + number.vmail.messages
+ account.length, family = "binomial", data = dat)
Try help(glm) and help(randomForest).
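For the variable-importance part of the question, a minimal random forest sketch (randomForest needs the response as a factor and cannot handle factors with more than 53 levels, so identifier-like columns such as the phone number must be dropped; the column name below is assumed from the names file):
library(randomForest)
dat$Churn <- factor(dat$Churn)
dat$phone.number <- NULL  # assumed column name; high-cardinality identifier
rf <- randomForest(Churn ~ ., data = dat, importance = TRUE)
importance(rf)   # per-variable importance measures
varImpPlot(rf)   # quick visual ranking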
Is there a function within caret (or another package) that can perform a Breusch-Pagan / Cook-Weisberg test for heteroskedasticity on an 'nnet' model trained using caret?
E.g. something similar to ncvTest from the car package or bptest from lmtest, which work on lm objects, but for nnet objects created by caret?
Example data
library(caret)
set.seed(4)
n <- 100
x1i <- rnorm(n)
x2i <- rnorm(n)
yi <- rnorm(n)
dat <- data.frame(yi, x1i, x2i)
mod <- train(yi ~., data=dat, method="nnet", trace=FALSE, linout=TRUE)
This produces the plot of fitted vs. residuals (plot omitted).
No, there is nothing like that in the package right now.
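That said, the statistic itself is easy to compute by hand: Breusch-Pagan regresses the squared residuals on the predictors and compares n * R^2 to a chi-squared distribution. A rough sketch against the caret model above; note the chi-squared approximation is derived for linear models, so for an nnet fit this is only an informal diagnostic:
res <- dat$yi - predict(mod, dat)           # residuals of the caret/nnet fit
aux <- lm(I(res^2) ~ x1i + x2i, data = dat) # auxiliary regression
bp  <- nrow(dat) * summary(aux)$r.squared   # LM statistic
pchisq(bp, df = 2, lower.tail = FALSE)      # p-value; df = number of predictors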
I would like to fit a SARIMAX model with temperature as an exogenous variable in R. Can I do that with the xreg argument, as in the TSA package?
I thought of fitting the model as:
fit1 = arima(x, order=c(p,d,q), seasonal=list(order=c(P,D,Q), period=S), xreg=temp)
Is that correct, or do I have to use another R function?
If it isn't correct, which steps should I follow?
Thanks.
Check out the forecast package; it's great:
# some random data
x <- ts(rnorm(120,0,3) + 1:120 + 20*sin(2*pi*(1:120)/12), frequency=12)
temp = rnorm(length(x), 20, 30)
require(forecast)
# build the model (check ?auto.arima); xreg must be numeric, so pass a matrix
model = auto.arima(x, xreg = cbind(temp = temp))
# some random future values of the predictor
temp.reg = cbind(temp = rnorm(10, 20, 30))
# forecasting
forec = forecast(model, xreg = temp.reg)
# quick way to visualize things
plot(forec)
# model diagnosis
tsdiag(model)
# model info
summary(forec)
I wouldn't suggest using auto.arima(). Depending on the model you want to fit, it may return poor results: for example, when working with some complex SARIMA models, the differences between models fitted manually and with auto.arima() were noticeable, and auto.arima() did not even return white-noise innovations (as expected), while the manual fits, of course, did.
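For a manually specified fit, forecast's Arima() takes the same xreg argument; a sketch with illustrative orders (the (1,0,1)(0,1,1)[12] specification here is only an example, not a recommendation for any particular series):
fit.manual <- Arima(x, order = c(1,0,1), seasonal = c(0,1,1),
                    xreg = cbind(temp = temp))
summary(fit.manual)
forecast(fit.manual, xreg = cbind(temp = rnorm(10, 20, 30)))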