Error in rep(1, n.ahead) : invalid 'times' argument in R - r

I'm working on a dataset that I want to forecast with ARIMA. I'm close to the last step, but I'm getting an error and couldn't find any reference that explains what I'm missing.
I keep getting this error message when I run the following command:
ForcastData<-forecast(fitModel,testData)
Error in rep(1, n.ahead) : invalid 'times' argument
Here is a brief overview of what I did: I converted my dataset from a data frame to a time series and ran the tests to check volatility and to detect whether the data is stationary.
That gave me DataAsStationary as clean data to apply ARIMA to. Since I want to train the model on one part of the data and test it on the other, I split the dataset into 70% training and 30% testing:
ind <-sample(2, nrow(DataAsStationary), replace = TRUE, prob = c(0.7,0.3))
traingData<- DataStationary1[ind==1,]
testData<- DataStationary1[ind==2,]
I used Automatic Selection Algorithm and found that Arima(2,0,3) is the best.
autoARIMAFastTrain1<- auto.arima(traingData, trace= TRUE, ic ="aicc", approximation = FALSE, stepwise = FALSE)
I should mention that I checked whether the residuals are uncorrelated (white noise) and dealt with it.
library(tseries)
library(astsa)
library(forecast)
After that I used the training dataset to fit the model:
fitModel <- Arima(traingData, order=c(2,0,3))
fitted(fitModel)
ForcastData<-forecast(fitModel,testData)
output <- cbind(testData, ForcastData)
accuracy(testData, ForcastData)
plot(outp)
Couldn't find any resource about the error:
Error in rep(1, n.ahead) : invalid 'times' argument
Any suggestions!! Really
I tried
ForcastData<-forecast.Arima(fitModel,testData)
but I get the error:
forecast.Arima not found !
Any idea why I get the error?

You need to specify the arguments to forecast() a little differently; since you didn't post example data, I'll demonstrate with the gold dataset in the forecast package:
library(forecast)
data(gold)
trainingData <- gold[1:554]
testData <- gold[555:1108]
fitModel <- Arima(trainingData, order=c(2, 0, 3))
ForcastData <- forecast(fitModel, testData)
# Error in rep(1, n.ahead) : invalid 'times' argument
ForcastData <- forecast(object=testData, model=fitModel) # no error
accuracy(f=ForcastData) # you only need to give ForcastData; see help(accuracy)
                    ME     RMSE      MAE        MPE      MAPE     MASE       ACF1
Training set 0.4751156 6.951257 3.286692 0.09488746 0.7316996 1.000819 -0.2386402
You may want to spend some time with the forecast package documentation to see what the arguments for the various functions are named and in what order they appear.
Regarding your forecast.Arima not found error, you can see this answer to a different question regarding the forecast package -- essentially that function isn't meant to be called by the user, but rather called by the forecast function.
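The argument-order point can be sketched without the forecast package. In forecast(fitModel, testData) the second positional argument appears to be matched to the horizon, which is why a whole series ends up in rep(1, n.ahead). The following is a minimal base-R illustration, assuming the built-in LakeHuron series as a stand-in for the asker's data and an AR(2) order chosen only for the example. The horizon is passed as a single number, the length of the held-out part:

```r
# Base-R sketch (stats only); LakeHuron stands in for the asker's data.
y <- LakeHuron
n_train <- floor(0.7 * length(y))

# Chronological split: first 70% for training, the rest for testing.
train <- window(y, end = time(y)[n_train])
test  <- window(y, start = time(y)[n_train + 1])

# Illustrative order, not a recommendation for any particular dataset.
fit <- arima(train, order = c(2, 0, 0))

# The horizon must be a single number, not the test series itself.
fc <- predict(fit, n.ahead = length(test))

# Out-of-sample error on the held-out part.
rmse <- sqrt(mean((test - fc$pred)^2))
rmse
```

Note that the split here is chronological. Sampling rows at random, as in the question, breaks the time ordering that ARIMA relies on.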
EDIT:
After receiving your comment, it seems the following might help:
library(forecast)
# Read in the data
full_data <- read.csv('~/Downloads/onevalue1.csv')
full_data$UnixHour <- as.Date(full_data$UnixHour)
# Split the sample
training_indices <- 1:floor(0.7 * nrow(full_data))
training_data <- full_data$Lane1Flow[training_indices]
test_data <- full_data$Lane1Flow[-training_indices]
# Use automatic model selection:
autoARIMAFastTrain1 <- auto.arima(training_data, trace=TRUE, ic="aicc",
                                  approximation=FALSE, stepwise=FALSE)
# Fit the model on the training data:
fit_model <- Arima(training_data, order=c(2, 0, 3))
# Do forecasting
forecast_data <- forecast(object=test_data, model=fit_model)
# And plot the forecasted values vs. the actual test data:
plot(x=test_data, y=forecast_data$fitted, xlab='Actual', ylab='Predicted')
# It could help more to look at the following plot:
plot(test_data, type='l', col=rgb(0, 0, 1, alpha=0.7),
xlab='Time', ylab='Value', xaxt='n', ylim=c(0, max(forecast_data$fitted)))
ticks <- seq(from=1, to=length(test_data), by=floor(length(test_data)/4))
times <- full_data$UnixHour[-training_indices]
axis(1, lwd=0, lwd.ticks=1, at=ticks, labels=times[ticks])
lines(forecast_data$fitted, col=rgb(1, 0, 0, alpha=0.7))
legend('topright', legend=c('Actual', 'Predicted'), col=c('blue', 'red'),
lty=1, bty='n')

I was able to run
ForcastData <- forecast(object=testData, model=fitModel)
without any error.
Now I want to plot the testData and the forecast data to check whether my model is accurate,
so I did:
output <- cbind(testData, ForcastData)
plot(output)
and gave me the error:
Error in error(x, ...) :
improper length of one or more arguments to merge.xts
So when I checked ForcastData, it gave the output:
> ForcastData
        Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
2293201    -20.2831770 -308.7474 268.1810 -461.4511 420.8847
2296801    -20.1765782 -346.6400 306.2868 -519.4593 479.1061
2300401    -18.3975657 -348.8556 312.0605 -523.7896 486.9945
2304001     -2.2829565 -332.7483 328.1824 -507.6860 503.1201
2307601      2.7023277 -327.8611 333.2658 -502.8509 508.2555
2311201      4.5777316 -328.6756 337.8311 -505.0893 514.2447
2314801      4.3198927 -331.4470 340.0868 -509.1913 517.8310
2318401      3.8277285 -332.7898 340.4453 -510.9844 518.6398
2322001      1.4364973 -335.2403 338.1133 -513.4662 516.3392
2325601     -0.4013561 -337.0807 336.2780 -515.3080 514.5053
I thought I would get a list of results the same length as my testData. I need a chart that shows two lines: the actual data (testData) and the expected data (ForcastData).
I have gone through a lot of documentation about forecast, but I can't find anything that explains what I want to do.
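One likely reason cbind(testData, ForcastData) fails is that a forecast object is a list-like structure rather than a series; in the forecast package the point forecasts are stored in its mean component. The two-line chart can be sketched in base R as follows, assuming the built-in Nile series as a stand-in for the asker's data and an AR(1) order picked only for illustration:

```r
# Base-R sketch; Nile stands in for the asker's series.
y <- Nile
train <- window(y, end = 1940)
test  <- window(y, start = 1941)

fit <- arima(train, order = c(1, 0, 0))   # illustrative order
fc  <- predict(fit, n.ahead = length(test))

# Both series are plain ts objects over the same dates, so cbind() works.
both <- cbind(actual = test, forecast = fc$pred)
head(both)

# Two lines on one chart: actual in blue, forecast in red.
ts.plot(both, col = c("blue", "red"), lty = 1)
legend("topleft", legend = c("Actual", "Forecast"),
       col = c("blue", "red"), lty = 1, bty = "n")
```

With the forecast package, the same idea would use the point forecasts from the forecast object in place of fc$pred, with the horizon set to the length of the test data.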

Related

Holt Winters For Weekly Volume And Errors In R

I'm trying to use Holt-Winters and the prediction function on a stock index's weekly volume from the last 10 years, but I'm still getting an error. Can you help me please?
This is what I'm trying to do now:
volumen<-read.csv(file.choose(), header = TRUE, sep = ";")
lines(volumen[,6])
HoltWinters(volumen)
This is the error I'm getting on the third line:
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) :
the time series has no periods or has less than 2
For prediction I have the code below, but it does not seem to work given the previous error:
lines(predict(volumen.hw,n.ahead=12),col=2)
The data in RStudio looks correct. I decided to use file.choose() to make this code more universal. I am using a *.csv file. Could someone guide me or advise what the code should look like to apply the Holt-Winters method and prediction?
It's hard to be 100% sure, but
HoltWinters(lynx)
generates the same message as you are getting, whereas
HoltWinters(lynx, gamma = FALSE)
generates
Holt-Winters exponential smoothing with trend and without seasonal component.

Call:
HoltWinters(x = lynx, gamma = FALSE)

Smoothing parameters:
 alpha: 1
 beta : 0
 gamma: FALSE

Coefficients:
  [,1]
a 3396
b   52
Which I learned from reading the examples in the HoltWinters documentation.
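The role of the series frequency can be sketched with built-in datasets. This is a minimal illustration, assuming the error comes from a series with no seasonal period: lynx is annual, so the seasonal decomposition has nothing to work with, while co2 is monthly and fits with the defaults. The weekly series at the end is constructed only for the example.

```r
# lynx is annual data, so there is no seasonal period to estimate.
frequency(lynx)   # 1

# With frequency 1, drop the seasonal component explicitly.
hw_trend <- HoltWinters(lynx, gamma = FALSE)

# co2 is monthly (frequency 12), so the default seasonal fit works.
frequency(co2)    # 12
hw_seasonal <- HoltWinters(co2)
pred <- predict(hw_seasonal, n.ahead = 12)

# A plain numeric vector only becomes seasonal once ts() is told the period;
# weekly data would typically use frequency = 52.
weekly <- ts(as.numeric(co2)[1:208], frequency = 52)
frequency(weekly)
```

A data frame read with read.csv() carries no frequency at all, so a single column has to be selected and wrapped in ts() before HoltWinters() can look for a seasonal pattern.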
First of all, it would be nice if you posted your data here (if it is not private).
Secondly, as far as I know, you can only apply HoltWinters() or the methods in the forecast package to a vector or a time series, so passing the entire dataset (volumen) without selecting a column could cause a problem.
Finally, I recommend applying HW to an auxiliary vector containing the data you want to study, and also specifying the frequency of the time series:
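# your_frequency and n_ahead are placeholders: set them for your data
n_train <- floor(0.9 * nrow(volumen))
aux_train <- ts(volumen$variable[1:n_train], frequency = your_frequency)
aux_test <- volumen$variable[(n_train + 1):nrow(volumen)]
prediction <- hw(aux_train, h = n_ahead)
accuracy(prediction, aux_test[1:n_ahead])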
I have finally won this battle: I deleted my code and started from scratch. Here is what I came up with:
dane2<-read.csv2(file.choose(), header = TRUE, sep = ";", dec=",")
dane2 <-ts(dane2[,5], start=c(2008,1),frequency=52)
past <- window(dane2, end = 2017)
future <- window(dane2, start = 2017)
model <- HoltWinters(past, seasonal = "additive")
model2 <- HoltWinters(past, seasonal = "multiplicative")
pred <- predict(model, n.ahead = 52)
pred2 <- predict(model2, n.ahead = 52)
dane2.hw<-HoltWinters(dane2)
predict(dane2.hw,n.ahead=52)
par(mfrow = c(2,1))
plot(model, predicted.values = pred)
lines(future, col="blue")
plot(model2, predicted.values = pred2)
lines(future, col="blue")
Now it works, so thank you for your answers.

Error in predict.randomForest

I was hoping someone would be able to help me out with an issue I am having with the prediction function of the randomForest package in R. I keep getting the same error when I try to predict my test data:
Here's my code so far:
extractFeatures <- function(RCdata) {
features <- c(4, 9:13, 17:20)
fea <- RCdata[, features]
fea$Week <- as.factor(fea$Week)
fea$Age_Range <- as.factor(fea$Age_Range)
fea$Race <- as.factor(fea$Race)
fea$Referral_Source <- as.factor(fea$Referral_Source)
fea$Referral_Source_Category <- as.factor(fea$Referral_Source_Category)
fea$Rehire <- as.factor(fea$Rehire)
fea$CLFPR_.HS <- as.factor(fea$CLFPR_.HS)
fea$CLFPR_HS <- as.factor(fea$CLFPR_HS)
fea$Job_Openings <- as.factor(fea$Job_Openings)
fea$Turnover <- as.factor(fea$Turnover)
return(fea)
}
gp <- runif(nrow(RCdata))
RCdata <- RCdata[order(gp), ]
train <- RCdata[1:4600, ]
test <- RCdata[4601:6149, ]
rf <- randomForest(extractFeatures(train), suppressWarnings(as.factor(train$disposition_category)), ntree=100, importance=TRUE)
testpredict <- predict(rf, extractFeatures(test))
"Error in predict.randomForest(rf, extractFeatures(test)) :
Type of predictors in new data do not match that of the training data."
I have tried adding in the following line to the code, and still receive the same error:
testpredict <- predict(rf, extractFeatures(test), type="prob")
I found that the source of the error is that the training data has a level or two that is not found in the test data. When I tried a suggestion I found online to set the levels of the test data to those of the training data, I kept getting NULL for the fields I am using in both the training and test sets.
levels(test$Referral)
NULL
I can see the levels when I use the function, however.
levels(as.factor(test$Referral))
So then I tried the same suggestion I found online with adjusting the levels of the test to equal that of the training data using the following function and received an error:
levels(as.factor(test$Referral)) -> levels(as.factor(train$Referral))
Error in `levels<-.factor`(`*tmp*`, value = c(... :
number of levels differs
I am sure there is something simple I am missing (I am still very new to R), so any insight you can provide would be unbelievably helpful. Thanks!
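One common way to line up factor levels between training and test data can be sketched in base R. This is a minimal illustration under stated assumptions: the column name Referral and its values are made up, and the column is assumed to be stored as character, which is one way levels() can return NULL.

```r
# Toy data; the column name Referral and its values are made up for illustration.
train <- data.frame(Referral = c("web", "friend", "agency", "web"),
                    stringsAsFactors = FALSE)
test  <- data.frame(Referral = c("friend", "web"),
                    stringsAsFactors = FALSE)

# A character column has no levels, which is one way to get NULL here.
levels(test$Referral)

# Convert the training column, then build the test factor on the same levels.
train$Referral <- factor(train$Referral)
test$Referral  <- factor(test$Referral, levels = levels(train$Referral))

identical(levels(train$Referral), levels(test$Referral))

# Values that never occurred in training become NA under this scheme.
sum(is.na(test$Referral))
```

Building the factor inside the data frame matters: as.factor(test$Referral) on its own creates a temporary copy, so assigning levels to it leaves the column in test unchanged.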

Create Forecast and Check Accuracy

I have data of the form SaleDateTime = '2015-01-02 23:00:00.000' SaleCount=4.
I'm trying to create an hourly forecast for the next 12 hours, using the code below.
I'm new to forecasting and could definitely appreciate some advice.
I'm trying to partition the data, train a model, plot the forecast with x axis of the form '2015-01-02 23:00:00.000', and test the accuracy of the model on a test time series.
I'm getting the error message below, when I try to run the accuracy as shown. Does anyone know why I'm getting the error message below?
When I run the plot as shown below it has an x axis from 0 to 400, does anyone know how to show that as something like '2015-01-02 23:00:00.000'? I would also like to narrow the plot to the last say 3 months.
My understanding is that if you don't specify a model for forecast, then it tries to fit the best model it can to the data for the forecast. Is that correct?
How do I filter for the same timeseries range with the forecast as the ts1Test that I'm trying to run accuracy on, is it something like ts(fcast2, start=2001, end = 8567) ?
Since I'm using the zoo package is the as.POSIXct step unnecessary, could I just do eventdata <- zoo(Value, order.by = SaleDateTime) instead?
library("forecast")
library("zoo")
SampleData<-SampleData
Value<-SampleData[,c("SaleDateTime","SaleCount")]
rDateTime<-as.POSIXct(SampleData$SaleDateTime, format="%Y-%m-%d %H:%M:%S")
eventdata <- zoo(Value, order.by = rDateTime)
##Partitioning data Training/Testing
ts1SampleTrain<-eventdata[1:2000,]
ts1Train<-ts(ts1SampleTrain$SaleCount, frequency=24)
ts1SampleTest<-eventdata[2001:28567,]
ts1Test<-ts(ts1SampleTest$SaleCount, frequency=24)
#Training Model
fcast2<-forecast(ts1Train,h=8567)
plot(fcast2)
accuracy(fcast2,ts1Test)
New Error:
Error in -.default(xx, ff[1:n]) : non-numeric argument to binary operator
To make your accuracy test run, you should ensure that your test data ts1Test and your forecasting horizon, h in fcast2<-forecast(ts1Train,h=8567), have the same length. Right now you have 26567 data points vs. 8567.
Following your approach, the following toy example will work:
library(forecast)
library(zoo)
Value <- rnorm(1100)
rDateTime <- seq(as.POSIXct('2012-01-01 00:00:00'), along.with=Value, by='hour')
eventDate <- ts(zoo(Value, order.by=rDateTime), frequency = 24)
tsTrain <-eventDate[1:1000]
tsTest <- eventDate[1001:1100]
fcast<-forecast(tsTrain,h=100)
accuracy(fcast, tsTest)
                        ME         RMSE          MAE          MPE         MAPE         MASE         ACF1
Training set -2.821378e-04 9.932745e-01 7.990188e-01 1.003861e+02 1.007542e+02 7.230356e-01 4.638487e-02
Test set        0.02515008   1.02271839   0.86072703  99.79208174 100.14023919           NA           NA
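The length requirement can also be checked by hand. The following base-R sketch, assuming simulated data and an AR(1) model chosen only for illustration, ties the horizon to the length of the test set and computes a few of the same error measures directly:

```r
# Simulated hourly-style data; values and model order are illustrative only.
set.seed(1)
value <- rnorm(1100)
ts_train <- ts(value[1:1000], frequency = 24)
ts_test  <- value[1001:1100]

# Derive the horizon from the test set instead of hard-coding it.
h <- length(ts_test)

fit <- arima(ts_train, order = c(1, 0, 0))
fc  <- predict(fit, n.ahead = h)$pred

# The forecast and the test data must line up one to one.
stopifnot(length(fc) == length(ts_test))

# Forecast errors and a few summary measures on the test set.
err <- ts_test - as.numeric(fc)
c(ME = mean(err), RMSE = sqrt(mean(err^2)), MAE = mean(abs(err)))
```

Deriving h from length(ts_test) keeps the two lengths in step if the split point changes later.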
Concerning your other two questions:
Use of POSIX timestamps and the zoo package. You don't need them to use forecast. ts(Value, frequency) would suffice.
Plotting a time series object with datetimes as your labels. The following code snippet should get you started in this direction. Look at the axis function, which provides the desired behavior:
par(mar=c(6,2,1,1)) # bottom, left, top, right margins
plot(tsTrain, type="l", xlab="", xaxt="n")
axis(side=1, at=seq(1,1000,100), label=format(rDateTime[seq(1,1000,100)], "%Y-%m-%d"), las=2)

Duplicate data when using gstat or automap package in R

I am trying to use ordinary kriging to spatially predict where an animal will occur based on predictor variables, using the gstat or automap package in R. I have many (over 100) duplicate coordinate points, which I cannot throw out since those stations were sampled multiple times over many years. Every time I run the code below for ordinary kriging, I get an LDL error, which is due to the duplicate points. Does anyone know how to fix this problem without throwing out data? I have tried the code from the automap package that is supposed to correct for duplicates, but I can't get that to work. Thank you for the help!
coordinates(fish) <- ~ LONGITUDE+LATITUDE
x.range <- range(fish@coords[,1])
y.range <- range(fish@coords[,2])
grd <- expand.grid(x=seq(from=x.range[1], to=x.range[2], by=3), y=seq(from=y.range[1], to=y.range[2], by=3))
coordinates(grd) <- ~ x+y
plot(grd, pch=16, cex=.5)
gridded(grd) <- TRUE
library(gstat)
zerodist(fish) ###146 duplicate points
v <- variogram(log(WATER_TEMP) ~1, fish, na.rm=TRUE)
plot(v)
vgm()
f <- vgm(1, "Sph", 300, 0.5)
print(f)
v.fit <- fit.variogram(v,f)
plot(v, model=v.fit) ####In fit.variogram(v, d) : Warning: singular model in variogram fit
krg <- krige(log(WATER_TEMP) ~ 1, fish, grd, v.fit)
## [using ordinary kriging]
##"chfactor.c", line 131: singular matrix in function LDLfactor()Error in predict.gstat(g, newdata = newdata, block = block, nsim = nsim,: LDLfactor
##automap code for correcting for duplicates
fish.dup = rbind(fish, fish[1,]) # Create duplicate
coordinates(fish.dup) = ~LONGITUDE + LATITUDE
kr = autoKrige(WATER_TEMP, fish.dup, grd)
###Error in inherits(formula, "SpatialPointsDataFrame"):object 'WATER_TEMP' not found
###somehow my predictor variables are no longer available when in a Spatial Points Data Frame??
automap::autoKrige expects a formula as first argument, try
kr = autoKrige(WATER_TEMP~1, fish.dup, grd)
automap has a very simple fix for duplicate observations, and that is to discard them. So automap does not really solve the issue you have. I see some options:
Discard the duplicates.
Slightly perturb the coordinates of the duplicates so that they are not on exactly the same location anymore.
Perform space-time kriging using gstat.
In regard to your specific issue, please make your example reproducible. What I can guess is that rbind of your fish object is not doing what you expect...
Alternatively you can use the function jitterDupCoords of geoR package.
https://cran.r-project.org/web/packages/geoR/geoR.pdf
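The second option, perturbing the duplicated coordinates, can be sketched in base R. This is a hand-rolled stand-in for illustration, not the geoR function itself; the toy data, the column names copied from the question, and the offset size eps are all assumptions.

```r
# Toy coordinates with one exact duplicate; column names mirror the question.
set.seed(42)
fish <- data.frame(LONGITUDE  = c(10, 10, 13, 16),
                   LATITUDE   = c(50, 50, 53, 56),
                   WATER_TEMP = c(12.1, 12.9, 11.4, 10.8))

# Flag every repeat of an earlier coordinate pair.
dup <- duplicated(fish[, c("LONGITUDE", "LATITUDE")])

# Shift only the repeats by a tiny random offset (the scale is an assumption;
# pick something far below the sampling distance of the real data).
eps <- 1e-4
fish$LONGITUDE[dup] <- fish$LONGITUDE[dup] + runif(sum(dup), -eps, eps)
fish$LATITUDE[dup]  <- fish$LATITUDE[dup]  + runif(sum(dup), -eps, eps)

# 0 means no duplicated coordinate pairs remain.
anyDuplicated(fish[, c("LONGITUDE", "LATITUDE")])
```

This has to run on the plain data frame, before coordinates(fish) turns it into a spatial object. All observations are kept; only the repeated locations move slightly.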

Error in approxfun(x.values.1, y.values.1, method = "constant", f = 1, : zero non-NA points

I am using R's ROCR package to calculate the area under the curve for large datasets. However, the code fails on a few of the datasets.
The code I have used:
pred <- prediction(mydata$Total.Regexes, mydata$actual)
perf <- performance(pred, "tpr", "fpr")
I checked the dataset and found no NA values in it. However, since the dataset is huge, some may have escaped my notice. So, is there any other way to clean the dataset (of NA values, if there are any) without disturbing the remaining values?
And this is the error it shows for a few datasets:
Error in approxfun(x.values.1, y.values.1, method = "constant", f = 1, :
zero non-NA points
I checked using:
is.na(dataset)
dataset <- na.omit(dataset)
but it still doesn't work. There are no NA values present in the dataset. I can't reproduce the error with a simple dataset, so I've posted the problem dataset in my dropbox.
https://www.dropbox.com/s/pjko6o6h23m43le/DC4.csv
Please Help!
I had a similar problem.
By "casting" the arguments to prediction I managed to get everything working properly.
Try:
pred <- prediction(as.numeric(mydata$Total.Regexes), as.numeric(mydata$actual))
perf <- performance(pred, "tpr", "fpr")
I had a similar problem. This is a 'bad' way to solve it:
MODEL <- glm(y ~ x + z, my_data, family = "binomial")
pred_probab <- predict(MODEL, type = "response")
type = "response" is specified so that the predictions are returned as probabilities
pr <- prediction(pred_probab, Two_levels_factor)
Error in approxfun(x.values.1, y.values.1, method = "constant", f = 1, :
zero non-NA points
The sample "Two_levels_factor" with n = 2000 has only one level, "negative_result". For logistic regression it must have two levels.
levels(Two_levels_factor)
[1] "negative_result"
Two_levels_factor[1] <- "positive_result"
levels(Two_levels_factor)
[1] "positive_result" "negative_result"
pr <- prediction(pred_probab, Two_levels_factor)
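Both answers point at the inputs rather than at ROCR itself. A quick pre-check can be sketched in base R, on the assumption that the error comes from missing values or from labels with only one class. check_labels is a hypothetical helper written for this example, not part of ROCR.

```r
# check_labels is a hypothetical helper, not part of ROCR.
check_labels <- function(scores, labels) {
  # Keep only pairs where both the score and the label are present.
  keep <- !is.na(scores) & !is.na(labels)
  list(
    n_dropped = sum(!keep),                     # pairs removed for NA
    n_classes = length(unique(labels[keep])),   # an ROC curve needs 2
    scores    = as.numeric(scores[keep]),
    labels    = labels[keep]
  )
}

# Toy input with one missing score and a single class among the rest.
res <- check_labels(scores = c(0.2, NA, 0.9), labels = c(0, 1, 0))
res$n_dropped
res$n_classes
```

If n_classes is below 2 for a dataset, there are no positives or no negatives left to build the curve from, so collecting more data for that dataset is a safer fix than editing a label by hand.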

Resources