How to fit an AR process (with nonconsecutive lags) to a time series? - r

I want to estimate the coefficients for an AR process based on weekly data where the lags occur at t-1, t-52, and t-53. I will naturally lose a year of data to do this.
I tried the following:
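lags <- rep(0, 54)  # 53 AR coefficients + intercept; 0 = fixed at zero
lags[1] <- NA       # NA = estimated
lags[52] <- NA
lags[53] <- NA      # lags[54] (the intercept) stays fixed at 0
testResults <- arima(data, order = c(53, 0, 0), fixed = lags)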
Basically, I tried using an ARIMA model with the MA and differencing parts switched off. I used 0s for the terms I wanted to exclude (plus the intercept) and NAs for the terms I wanted to estimate.
I get the following error:
Error in optim(init[mask], armafn, method = optim.method, hessian =TRUE, :
non-finite finite-difference value [1]
In addition: Warning message:
In arima(data, order = c(53, 0, 0), fixed = lags) :
some AR parameters were fixed: setting transform.pars = FALSE
I'm hoping there is an easier method or potential solution to this error. I want to avoid creating columns with the lagged variables and simply running a regression. Thanks!
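A minimal sketch of one option worth trying, assuming simulated weekly data (the coefficients 0.5, 0.3 and -0.15 are made up, and `y` stands in for the question's `data`). It keeps the `fixed` approach but passes `transform.pars = FALSE` explicitly and uses `method = "CSS"`, which conditions on the first 53 observations (the lost year) instead of evaluating the exact likelihood. Unlike the original call, the intercept (position 54) is left free; in `arima()` the intercept is the series mean, so fixing it at 0 only makes sense for demeaned data. Whether this avoids the `optim` error on the real data is untested. The `lm()` fit on `embed()` solves the same conditional least-squares problem and is included only as a cross-check.

```r
# Simulated weekly series with AR terms at lags 1, 52 and 53 (assumed values):
# (1 - 0.5B)(1 - 0.3B^52) gives phi1 = 0.5, phi52 = 0.3, phi53 = -0.15
set.seed(123)
phi <- numeric(53)
phi[c(1, 52, 53)] <- c(0.5, 0.3, -0.15)
burn <- 200
e <- rnorm(520 + burn)
y <- stats::filter(e, filter = phi, method = "recursive")
y <- ts(10 + as.numeric(y)[-(1:burn)], frequency = 52)

# Free (NA) parameters: ar1, ar52, ar53 and the intercept; other AR terms fixed at 0
fixed <- rep(0, 54)                  # 53 AR coefficients + intercept
fixed[c(1, 52, 53, 54)] <- NA
fit_css <- arima(y, order = c(53, 0, 0), fixed = fixed,
                 transform.pars = FALSE, method = "CSS")
coef(fit_css)[c("ar1", "ar52", "ar53", "intercept")]

# Cross-check: the same conditional least-squares problem as a regression
X <- embed(as.numeric(y), 54)        # column k + 1 holds y[t - k]
d <- data.frame(y = X[, 1], lag1 = X[, 2], lag52 = X[, 53], lag53 = X[, 54])
fit_lm <- lm(y ~ lag1 + lag52 + lag53, data = d)
coef(fit_lm)
```

The AR estimates from the two fits should agree closely; `ar2` to `ar51` are reported as 0 because they were fixed.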

Related

convergence issues with dynamic factor analysis

I am using the R package MARSS to run a dynamic factor analysis (DFA). I have 8 time series, and every series has at least 1 NA value (between 1 and 20 of the 50 years per series).
When I ran my model with just 23 years of data (the years where all timeseries had no NA values), it had both Abstol and log-log convergence after 293,368 iterations (maxit was set to 1,000,000). However, after trying it again with the full time series, I only have Abstol convergence after 1,000,000 iterations and this took 2 days to run.
I can't find any guidance on how many NA values a DFA can handle, or on what value is typically used for maxit. Are there any tools to determine whether a time series has too many NA values for a DFA?
Here is how I have specified the model. Note: I haven't provided any data because I don't think anyone wants to run this model given how long it currently takes.
library(MARSS)
listMod <- list(m = mm, R = "diagonal and unequal")  # mm = number of hidden trends (value not given in the question)
listInit <- list(x0 = matrix(rep(0, mm), mm, 1))      # start all trends at 0
listCont <- list(maxit = 1000000, allow.degen = TRUE)
dfa1 <- MARSS(y = data,        # matrix with 8 rows (one per time series) and 50 columns (years); 84 NA values
              form = "dfa",
              z.score = FALSE,  # series were individually centred and scaled beforehand (mean = 0, sd = 1)
              model = listMod,
              inits = listInit,
              control = listCont)
Results:
Warning! Abstol convergence only. Maxit (=1e+06) reached before log-log convergence.
Alert: Numerical warnings were generated. Print the $errors element of output to see the warnings.
MARSS fit is
Estimation method: kem
Convergence test: conv.test.slope.tol = 0.5, abstol = 0.001
WARNING: Abstol convergence only no log-log convergence.
maxit (=1e+06) reached before log-log convergence.
The likelihood and params might not be at the ML values.
Try setting control$maxit higher.
Convergence warnings
2998019 warnings. First 10 shown. Type cat(object$errors) to see the full list.
Warning: the R.(Y1,Y1) parameter value has not converged.
Warning: the R.(Y2,Y2) parameter value has not converged.
Warning: the R.(Y7,Y7) parameter value has not converged.
Warning: the logLik parameter value has not converged.
Type MARSSinfo("convergence") for more info on this warning.
MARSSkem warnings. Type MARSSinfo() for help.
iter=412 Setting element of R to 0, blocked. See MARSSinfo("R0blocked"). The error is due to the following MARSSkemcheck errors.
MARSSkemcheck error: t=1: For method=kem (EM), if an element of the diagonal of R is 0, the corresponding row of Z must be fixed. See MARSSinfo('AZR0').
iter=413 Setting element of R to 0, blocked. See MARSSinfo("R0blocked"). The error is due to the following MARSSkemcheck errors.
MARSSkemcheck error: t=1: For method=kem (EM), if an element of the diagonal of R is 0, the corresponding row of Z must be fixed. See MARSSinfo('AZR0').
iter=414 Setting element of R to 0, blocked. See MARSSinfo("R0blocked"). The error is due to the following MARSSkemcheck errors.
MARSSkemcheck error: t=1: For method=kem (EM), if an element of the diagonal of R is 0, the corresponding row of Z must be fixed. See MARSSinfo('AZR0').
iter=415 Setting element of R to 0, blocked. See MARSSinfo("R0blocked"). The error is due to the following MARSSkemcheck errors.
MARSSkemcheck error: t=1: For method=kem (EM), if an element of the diagonal of R is 0, the corresponding row of Z must be fixed. See MARSSinfo('AZR0').
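A quick way to quantify the missingness the question asks about is to count NAs per series, the longest run of consecutive NAs, and the number of years observed in every series. This is only a sketch on a simulated 8 x 50 matrix with 84 randomly placed NAs (an assumption standing in for the real data). It does not say how many NAs MARSS can tolerate; it only makes the comparison with the 23 complete years explicit.

```r
# Toy 8 x 50 matrix (rows = series, columns = years) standing in for the real data
set.seed(42)
y <- matrix(rnorm(8 * 50), nrow = 8, ncol = 50,
            dimnames = list(paste0("Y", 1:8), NULL))
y[sample(length(y), 84)] <- NA       # 84 missing values, as in the question

na_per_series  <- rowSums(is.na(y))             # missing years per series
frac_missing   <- round(na_per_series / ncol(y), 2)
complete_years <- sum(colSums(is.na(y)) == 0)   # years with no NA in any series

# Longest run of consecutive missing years in each series
max_gap <- apply(y, 1, function(v) {
  r <- rle(is.na(v))
  max(c(0, r$lengths[r$values]))
})

data.frame(n_missing = na_per_series, frac = frac_missing, longest_gap = max_gap)
complete_years
```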

error in shape() function in evir library

I have a dataframe where a column is a mix of positive and negative numbers and the first entry is NA. I'm trying to run the shape function as
shape(data$col, models = 30, start = 30, end = 400, ci=.90,reverse = TRUE,auto.scale = TRUE)
where the data in 'col' is [NA, -0.2663194135, -3.7665034719, -0.2072122334, 1.5721742718, -9.142419, -8.954330, -5.167314, 11.805930, 9.533830, 7.065835]
but I get an error that says
Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
non-finite value supplied by optim
Can someone help me figure out what it means? I've googled it but haven't found anything concrete.
It's not clear what you are trying to do here. Calling shape allows you to see how altering the threshold or nextremes parameters in the gpd function will alter the xi parameter of the resulting generalised Pareto distribution model.
There are a few reasons why the example you supplied doesn't work. Let's first of all show an example of what does work. The exponential distribution is a special case of a GPD with mu = 0 and xi = 0, so a sample drawn from the exponential distribution should do the trick:
library(evir) # For the shape() function
set.seed(69) # Makes this example reproducible
x <- rexp(300) # Random sample of 300 elements drawn from exponential distribution
shape(x)
Fine.
However, your sample contains an NA. What happens if we make a single value NA in our sample?
x[1] <- NA
shape(x)
#> Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
#> non-finite value supplied by optim
So, no NAs allowed.
Unfortunately, you will find that you still get the same error if you remove your NA value. There are two reasons for this. First, you have only 10 non-NA samples. What happens if we try an exponential sample of similar size, say of length 9?
shape(rexp(9))
#> Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
#> non-finite finite-difference value [1]
You will find that the model fails to fit with fewer than about 16 data points.
But that's not the only problem. What if we try to get a plot for data that can't be drawn from a generalized Pareto distribution?
# Maybe a uniform distribution?
shape(runif(300, 1, 10))
#> Error in optim(theta, negloglik, hessian = TRUE, ..., tmp = excess) :
#> non-finite finite-difference value [1]
#> In addition: Warning message:
#> In sqrt(diag(varcov)) : NaNs produced
#>
So in effect, you need a bigger sample with no NAs, and it needs to conform approximately to a GPD, otherwise the gpd function will throw an error.
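Based on the failure modes above, a small guard can catch the NA and sample-size problems before `shape()` is called. `prepare_for_shape()` is a hypothetical helper, not part of evir, and the minimum of 16 is the rough figure from this answer, not a documented limit. It cannot check whether the data resemble a GPD.

```r
# Hypothetical helper (not part of evir): drop non-finite values and refuse
# samples below a rough minimum size before calling evir::shape()
prepare_for_shape <- function(x, min_n = 16) {
  x <- x[is.finite(x)]   # removes NA, NaN, Inf and -Inf
  if (length(x) < min_n) {
    stop(sprintf("only %d finite values; shape() is unlikely to fit with fewer than about %d",
                 length(x), min_n))
  }
  x
}

# The column from the question
col <- c(NA, -0.2663194135, -3.7665034719, -0.2072122334, 1.5721742718,
         -9.142419, -8.954330, -5.167314, 11.805930, 9.533830, 7.065835)
res <- tryCatch(prepare_for_shape(col), error = conditionMessage)
res
```

On the question's column this stops with a message that only 10 finite values remain. With a larger clean sample you would call `shape(prepare_for_shape(data$col), ...)` with the original arguments.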
I might be able to help if you let us know the bigger picture of what you are trying to do.

Error in panel spatial model in R using spml

I am trying to fit a panel spatial model in R using the spml function from the splm package. I first define the N x N weighting matrix as follows:
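library(sp)     # coordinates()
library(spdep)  # dnearneigh(), nbdists(), nb2listw()
neib <- dnearneigh(coordinates(coord), 0, 50, longlat = TRUE)  # neighbours within 50 km (great-circle distance)
dlist <- nbdists(neib, coordinates(coord))                     # distance to each neighbour
idlist <- lapply(dlist, function(x) 1 / x)                     # inverse-distance weights
w50 <- nb2listw(neib, zero.policy = TRUE, glist = idlist, style = "W")  # row-standardised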
Thus two observations are neighbours if they are at most 50 km apart. The weight attached to each pair of neighbouring observations is the inverse of their distance, so that closer neighbours receive higher weights. I also use the option zero.policy=TRUE so that observations without neighbours are associated with a vector of zero weights.
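The weighting scheme described above can be sketched on a dense matrix in base R. This is an illustration only, assuming toy planar coordinates in km and Euclidean distance instead of the great-circle distances that `longlat = TRUE` gives. It mirrors what the `nb2listw(..., glist = idlist, style = "W")` call is intended to produce, including an all-zero row for an observation with no neighbour.

```r
# Toy planar coordinates in km (assumption; the question uses longitude/latitude)
coords <- cbind(x = c(0, 10, 30, 200), y = c(0, 0, 0, 0))
d <- as.matrix(dist(coords))

# Inverse-distance weights for pairs at most 50 km apart; 0 otherwise (incl. the diagonal)
W <- ifelse(d > 0 & d <= 50, 1 / d, 0)

# Row-standardise (style "W"); the isolated 4th point keeps an all-zero row,
# which is the situation zero.policy = TRUE is meant to allow
rs <- rowSums(W)
W <- W / ifelse(rs > 0, rs, 1)
round(W, 3)
```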
Once I do this I try to fit the panel spatial model in the following way
library(splm)  # spml()
mod <- spml(y ~ x, data = data_p, listw = w50, na.action = na.fail, lag = FALSE,
            spatial.error = "b", model = "within", effect = "twoways", zero.policy = TRUE)
but I get the following error and warning messages
Error in lag.listw(listw, u) : Variable contains non-finite values
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
...
50: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA
I believe this is related to the observations without neighbours. Can anyone please help me with this? Is there any way to deal with such observations besides the zero.policy option?
Many many thanks for helping me.
You should check two things:
1) Make sure that the weight matrix is row-normalized.
2) Handle any NA values properly, both in the dataset and in the W matrix.
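The two checks above can be sketched as a small diagnostic on a data frame and a dense weights matrix. `check_spatial_inputs()` is hypothetical, not part of splm or spdep. Because the warnings in the question come from `mean.default()`, which returns NA for non-numeric input, the sketch also flags non-numeric columns as something that may be worth ruling out.

```r
# Hypothetical helper (not part of splm or spdep)
check_spatial_inputs <- function(df, W, tol = 1e-8) {
  is_num  <- vapply(df, is.numeric, logical(1))
  has_bad <- vapply(df[is_num], function(v) any(!is.finite(v)), logical(1))
  rs <- rowSums(W)
  list(
    non_numeric_cols = names(df)[!is_num],
    non_finite_cols  = names(df)[is_num][has_bad],            # NA, NaN or Inf present
    na_in_W          = anyNA(W),
    zero_rows        = which(abs(rs) < tol),                  # observations with no neighbours
    not_row_std      = which(abs(rs) >= tol & abs(rs - 1) > tol)  # rows not summing to 1
  )
}

# Toy inputs: a missing y value, a character column, and one isolated observation
panel <- data.frame(y = c(1.2, NA, 0.7), x = c(3, 4, 5), region = c("a", "b", "c"))
W <- rbind(c(0, 1, 0),
           c(0.5, 0, 0.5),
           c(0, 0, 0))
res <- check_spatial_inputs(panel, W)
str(res)
```

With the question's objects, `W` could be obtained with spdep's `listw2mat(w50)` and `df` would be `data_p`.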

H2O.GeneralizedLowRankModel objective is NA when passing loss by column

I am working with the h2o glrm function. When I pass the loss_by_col argument to specify a different loss function for each column of my data frame (I have normal, Poisson and binomial variables, so I am passing "Quadratic", "Poisson" and "Logistic" losses), the objective is not computed: testmodel@model$objective returns NaN. At the same time, summary() shows that a few iterations were run and that the objective was NA for all of them. The quality of the model is very bad, yet the archetypes are somehow computed, so I am confused. How should I pass a different loss for each variable in my dataset? Here is a (hopefully) reproducible example:
library(h2o)
h2o.init()  # start or connect to a local H2O cluster
df <- data.frame(p1 = rpois(100, 5), n1 = rnorm(100), b1 = rbinom(100, 1, 0.5))
df$b1 <- factor(df$b1)
h2df <- as.h2o(df)
testmodel <- h2o.glrm(h2df,
                      k = 3,
                      loss_by_col = c("Poisson", "Quadratic", "Logistic"),  # one loss per column: p1, n1, b1
                      transform = "STANDARDIZE")
testmodel@model$objective
summary(testmodel)
plot(testmodel)
Please note that there is a jira ticket for this here
It's interesting that you don't get an error when you run your code snippet. When I run your code snippet I get the following error:
Error: DistributedException from localhost/127.0.0.1:54321: 'Poisson loss L(u,a) requires variable a >= 0', caused by java.lang.AssertionError: Poisson loss L(u,a) requires variable a >= 0
I can resolve this error by removing transform="STANDARDIZE", because standardization can produce negative values. For more information on what the transformations do, take a look at the user guide here. For convenience, here is its definition of STANDARDIZE: "Standardizing subtracts the mean and then divides each variable by its standard deviation."
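To see why the Poisson assertion fires after standardization: standardizing any non-constant vector gives values with mean 0, so some of them must be negative. A base R sketch on counts like the question's `p1` column (it does not call h2o; it only applies the STANDARDIZE definition quoted above):

```r
# Poisson counts are non-negative...
set.seed(1)
p1 <- rpois(100, 5)

# ...but subtracting the mean and dividing by the standard deviation
# centres them at 0, so some standardized values fall below 0
z <- (p1 - mean(p1)) / sd(p1)
c(min_raw = min(p1), min_standardized = min(z))
```

In the h2o call itself, the fix suggested here is simply to drop `transform = "STANDARDIZE"` from `h2o.glrm()`.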

passing sparse xreg to stlf in R causes optimisation error

I am trying to forecast a time series and regress on temperature. The residuals behave differently at low and high temperatures, so I want to use a piecewise linear approach, that is, learn different coefficients for temperatures above and below 35 degrees.
The data are in a data frame with columns data$x, data$Season and data$Temp.
library(forecast)  # stlf()
# Create data frame
len <- 365*3 + 1 + 31
x <- rnorm(len, mean = 4000000, sd = 100000)
Season <- c(rep(3,62), rep(4,91), rep(1,90), rep(2,92), rep(3,92), rep(4,91), rep(1,90), rep(2,92), rep(3,92), rep(4,91), rep(1,91), rep(2,92), rep(3,61))
Temp <- rnorm(len, mean = 20, sd = 5)
data <- data.frame(x, Season, Temp)
# Create model matrix
season_dummy <- model.matrix(~as.factor(data$Season) + 0)  # one 0/1 column per season
Temp_max <- pmax(0, data$Temp - 35)  # 0, or the excess over 35
Temp_restore <- restore_temp_up(Temp_max, data$Temp, 35)  # restores the excess to the original temperature (restore_temp_up() is defined below)
Temp_season_matrix_max <- Temp_restore * season_dummy  # temperature (>= 35) in its season's column, 0 elsewhere
# Create time series and forecast
data_ts <- ts(data$x[1:1000], frequency = 365, start = c(2009, 182))
len_train <- length(data_ts)
xreg1 <- Temp_season_matrix_max[1:len_train, ]
newxreg1 <- Temp_season_matrix_max[(len_train+1):(len_train+30), ]
stlf(data_ts, method = "arima", h = 30, xreg = xreg1, newxreg = newxreg1, s.window = "periodic")
Error in optim(init[mask], armaCSS, method = optim.method, hessian = FALSE, :
non-finite value supplied by optim
Error in auto.arima(x, xreg = xreg, seasonal = FALSE, ...) :
No suitable ARIMA model found
In addition: Warning message:
In auto.arima(x, xreg = xreg, seasonal = FALSE, ...) :
Unable to calculate AIC offset
Other threads suggest changing the estimation method from CSS to ML, but I can't set this parameter through stlf. The help file shows an optional parameter, forecastfunction, but there are no examples or real explanation of how to use it.
Note: when I set the threshold temperature to, say, 20 instead of 35, this works fine. I am sure this is because the xreg matrix containing temperatures above 35 degrees is sparse (most temperatures are below this value), but I am not sure how to get around this.
(I have included code for restore_temp_up - possibly inefficient, but included here for question completion.)
restore_temp_up <- function(x, original, k) {
  if (!is.vector(x))
    stop('x must be a vector')
  for (i in seq_along(x)) {
    if (!is.na(x[i])) {
      if (x[i] > 0) {
        x[i] <- x[i] + k  # add the threshold back: x[i] is now the original temperature
      }
      if (original[i] == k) {
        x[i] <- original[i]  ## original was exactly k, so pmax() gave 0; restore the original value
      }
    }
  }
  return(x)
}
Your design matrix is rank deficient so the regression is singular. To see this:
> eigen(t(xreg1) %*% xreg1)$val
[1] 1321.223 0.000 0.000 0.000
You cannot fit a regression model with a rank deficient design matrix.
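The eigenvalue check above can also be done with `qr()`, and if the deficiency comes from columns that are entirely zero in the training window (for example, seasons in which no temperature exceeds 35), those columns can be dropped. A sketch on a toy matrix, assuming that situation; dropping all-zero columns does not fix other kinds of collinearity, and `newxreg` must then drop the same columns.

```r
# Toy design matrix (assumption): only the first season column ever
# contains temperatures above the threshold; the others are all zero
xreg <- cbind(s1 = c(0, 36.2, 0, 37.5, 0), s2 = 0, s3 = 0, s4 = 0)
qr(xreg)$rank                      # 1, fewer than the 4 columns => rank deficient

keep <- colSums(xreg != 0) > 0     # columns with at least one non-zero entry
xreg_reduced <- xreg[, keep, drop = FALSE]
qr(xreg_reduced)$rank == ncol(xreg_reduced)
```

With the question's objects, the same `keep` vector would be computed on `xreg1` and applied to both `xreg1` and `newxreg1`.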
