Error when adding fourth regressor to VAR - R - r

I am building a vector autoregressive (VAR) model and have run into a problem.
My regressors are sentiment and financial variables. To test robustness, I wanted to add several other economic variables to the model.
The problem: as soon as I add a fourth regressor, R returns only an error message.
Any combination of three regressors works, but adding a fourth fails (see the error message below).
My code:
library(dplyr)
library(readr)
library(tidyverse)
library(urca)
library(vars)
library(tseries)
library(forecast)
library(stargazer)
tr <- ts(TR$tr, start = c(2011, 1), frequency = 4) #4 because quarterly
Index1 <- ts(Index1$Value, start = c(2011, 1), frequency = 4)
Index2 <- ts(Index2$Value, start = c(2011, 1), frequency = 4)
Control1 <- ts(CPI$Value, start = c(2011, 1), frequency = 4)
Control2 <- ts(Spread$Value, start = c(2011, 1), frequency = 4)
# for finding optimal lags
tr.bv <- cbind(TR$tr, Index1$Value, Index2$Value, CPI$Value, Spread$Value)
colnames(tr.bv) <- c("Total Return", "Index1", "Index2", "CPI", "Spread")
lagselect <- VARselect(tr.bv, lag.max = 10, type = "const")
lagselect$selection
# Building the model
Model <- VAR(tr.bv, p = 10, type = "const", season = NULL, exog = NULL)
summary(Model)
The error message I get:
Error in solve.default(Sigma) :
Lapack routine dgesv: system is exactly singular: U[1,1] = 0
In addition: Warning message:
In cor(resids) : the standard deviation is zero
I also built the same model in Python using the statsmodels VAR class; there I only get 0's or NaN's as p-values.
Hopefully someone can help?

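The problem most likely lies in your data combined with the extra variable you added: either multicollinearity or overfitting. With p = 10, every additional variable adds ten coefficients to each equation, so a short quarterly sample can run out of degrees of freedom. The equations then fit the sample exactly and the residuals are zero, which matches the warning that their standard deviation is zero. A reproducible example would be helpful here.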
See: https://stats.stackexchange.com/questions/446707/var-model-error-in-solve-defaultsigma-system-is-computationally-singular-r
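A quick way to check the overfitting explanation is to count coefficients against observations. The sketch below is only an illustration: `var_df_check` is a hypothetical helper, not part of the vars package, and the 44 observations are an assumption (2011 Q1 to 2021 Q4), since the question does not state the sample length.

```r
# Hypothetical helper (not from the vars package): counts the coefficients each
# VAR equation must estimate and compares them with the usable sample size.
var_df_check <- function(n_obs, k, p, n_det = 1) {
  n_coef <- k * p + n_det  # k variables times p lags, plus deterministic terms
  n_used <- n_obs - p      # the first p observations are lost to lagging
  c(n_coef = n_coef, n_used = n_used, resid_df = n_used - n_coef)
}

# n_obs = 44 is an assumed sample length, not taken from the question
res3 <- var_df_check(n_obs = 44, k = 3, p = 10)
res4 <- var_df_check(n_obs = 44, k = 4, p = 10)
res5 <- var_df_check(n_obs = 44, k = 5, p = 10)
print(rbind(k3 = res3, k4 = res4, k5 = res5))
```

If `resid_df` is zero or negative, each equation can fit the sample exactly, the residuals vanish, and `summary()` cannot invert their covariance matrix. Under the assumed sample length, three variables still leave a few degrees of freedom while four do not, which would be consistent with the behaviour described. Lowering `p` or using a longer sample would be the first things to try.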

Related

XGBoost in R: an error in the lengths of my matrix and Vector components

I am using (or trying to use) xgboost in R for the first time, as I'd like to compare data science techniques. I have a dataset I used to build a scorecard / PD model a few years ago and want to see how XGBoost performs on the same data.
I use this code:
keeps1 <- c("gb_estassval_woe","gb_prof_occ_woe","biztype_woe", "gb_educ_woe", "gb_region_woe", "gb_resstat_woe", "gb_leg_tel_woe", "gb_age_woe", "gb_TimeJob_woe", "gb_gender_woe", "gb_marstat_woe", "gb")
cc_dev1a <-subset(cc_dev_fin,,keeps1)
sparse_matrix <- sparse.model.matrix(gb ~ ., data =cc_dev1a)[,-1]
#gb is my 1,0 target variable
output_vector = cc_dev1a[,gb ] == "Marked"
str(cc_dev1a)
str(sparse_matrix)
str(output_vector)
head(output_vector)
# Checking the number of records: output_vector has 57483, whereas sparse_matrix has 57157
# I thought the reduction might be due to NA / missing values but can't find any (my variables are all processed to give a Weight of Evidence value, i.e. an integer)
bst <- xgboost(data = sparse_matrix, label = output_vector, max_depth = 4,
eta = 1, nthread = 2, nrounds = 10,objective = "binary:logistic")
and get this annoying wee beastie of an error:-
Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) :
The length of labels must equal to the number of rows in the input data
Please, what is causing it and how do I solve it?
Thank you in advance.
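One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: model-matrix functions build a model frame first, and with R's default `na.action` (`na.omit`) any row containing an `NA` is silently dropped. The sketch below uses base R's `model.matrix` on made-up toy data; `sparse.model.matrix` is assumed to behave the same way in this respect.

```r
# Toy data standing in for cc_dev1a (values are made up for illustration)
toy <- data.frame(gb = c(1, 0, 1, 0, 1),
                  x1 = c(0.2, NA, 0.5, 0.1, 0.9),
                  x2 = c(1.5, 2.0, NA, 0.3, 0.7))

# Rows 2 and 3 contain an NA and are dropped while the model frame is built
mm <- model.matrix(gb ~ ., data = toy)[, -1]
print(c(rows_in_data = nrow(toy), rows_in_matrix = nrow(mm)))

# Count missing values per column to locate the culprits
print(colSums(is.na(toy)))

# Build the label from the same complete rows so the lengths agree
keep <- complete.cases(toy)
label <- toy$gb[keep]
stopifnot(length(label) == nrow(mm))
```

If `colSums(is.na(...))` on the real data shows no missing values at all, this explanation does not apply and the row difference must come from somewhere else.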

Prediction of VAR model with exogenous variables in R

I'm building a very simple VAR model of inflation and a fuel price index. My data run from 1998 to Sep 2021. When running the simple VAR model I get a prediction for inflation with no problems whatsoever; however, I can't get a prediction when I incorporate a "crisis dummy" as an exogenous variable. This variable is 0 for the most part and 1 where I want to impose a crisis assumption on the data.
library(urca)
library(vars)
library(mFilter)
library(tseries)
library(TSstudio)
library(forecast)
library(tidyverse)
#-----Loading the Dataset (283 rows x 4 columns[date, inflation, fuel price index & crisis dummy])
data <- read_csv2(file.choose())
head(data)
nrow(data)
#-----Turning the dataset's variables into time series in order to apply the model
inf <- ts(data$inflation, start = c(1998, 3), frequency = 12)
fuel_index <- ts(data$`fuel_ndex_lag2`, start = c(1998, 3), frequency = 12)
crisis <- ts(data$`crisis`, start = c(1998, 3), frequency = 12)
#Building the Model
v <- cbind(inf, fuel_index)
lagselect <- VARselect(v, lag.max = 12, type ="both" )
lagselect$selection
lagselect$criteria
#Building the VAR using the (283x1) timeseries object "crisis" as my exogenous variable
var <- VAR(v, p=5, type = "const", season = NULL, exog = crisis)
summary(var)
#-----Loading a CSV file that I've already prepared and turning it into a time series object in order to use it
#as the new exogenous variable in the prediction block of the model. Since I will predict for only 1 period ahead,
#this new object is exactly the same as "crisis" but with an additional row (284x1). I fill that row with a 1
#(I also tried leaving the extra row blank)
new_crisis <- read_csv2(file.choose())
new_crisis <- ts(new_crisis$crisis, start = c(1998, 3), frequency = 12)
#VAR forecasting
var_forecast <- predict(var, dumvar = new_crisis, n.ahead = 1, ci = 0.95)
When I run the last command I get the following error message:
"Error in predict.varest(var, dumvar = new_crisis, n.ahead = 1, ci = 0.95) : Row number of dumvar is unequal to n.ahead."
Since I am predicting for 1 additional period, shouldn't my dumvar value be just one extra row longer than my initial exogenous variable? What am I missing? Also, should this extra row be blank or should I make a decision of whether to fill it with a 1 or 0?
I'll appreciate any help on this.
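Going by the error message, `dumvar` is expected to have exactly `n.ahead` rows, which suggests it should hold only the future values of the exogenous variable rather than the whole history plus one row. A minimal sketch under that reading follows; the column name `crisis` and the scenario value 1 are assumptions, and the calls to `VAR` and `predict` are left commented out because they need the data and fitted model from the question.

```r
n_ahead <- 1

# One row per forecast step. The value is a scenario choice
# (1 = assume a crisis, 0 = assume none); a blank (NA) entry
# would leave the forecast undefined.
future_crisis <- matrix(1, nrow = n_ahead, ncol = 1,
                        dimnames = list(NULL, "crisis"))
print(dim(future_crisis))

# Assumed usage with the objects from the question. Passing the exogenous
# variable as a named one-column matrix at fit time keeps the names aligned:
# var <- VAR(v, p = 5, type = "const", exogen = cbind(crisis = crisis))
# var_forecast <- predict(var, dumvar = future_crisis, n.ahead = n_ahead, ci = 0.95)
```

For a longer horizon the same matrix would simply get more rows, one assumed value of the dummy per forecast period.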

Using svychisq with subset -- error "object of type 'symbol' is not subsettable"

I'm trying to find the adjusted chi-square p-value for differences in the prevalence of an ordinal variable with 4 levels, y_4. The data are in a survey design, svydes, built from the dataset ds, and I want to restrict the test to a subset defined by the three-valued categorical variable x_3: specifically, the p-value for the prevalence of the different levels of y_4 among respondents with x_3 == 1. I've tried
library(survey)
y_4 = sample(0:3,100, TRUE)
x_3 = sample(1:3, 100, TRUE)
finalwt = runif(n = 100,min = 0, max = 1)
ds = as.data.frame(cbind(y_4,x_3,finalwt))
svydes = svydesign(ids = ~0, weights = ~finalwt, data = ds)
x1 = subset(svydes,x_3 == 1)
svychisq(~y_4 , x1, statistic ="F",na.rm = TRUE)
I get the following error: "Error in formula[[2]][[2]] : object of type 'symbol' is not subsettable".
Does anyone know what's going on or how to fix it?
Trying something similar, also with the survey package, gives the same error:
summary(svytable(~y_4,x1),statistic = "F")
I've tried to make a reproducible example, however my actual survey sample has clusters and strata, so I'm not sure how to reproduce that in my example. Suggestions welcome! The actual surveydesign code looks like the following:
svydes= svydesign(id =~`_PSU`,strata =~ `_STSTR`, weights =~finalwt, data = ds, nest = TRUE )
The help page for svychisq says
At the moment, svychisq works only for 2-dimensional tables.
If it did work for one-way tables (which it doesn't) the syntax you have would be correct.

R: IRFs in a SVAR model, can't display specified model

I am doing a SVAR (structural vector auto regression) analysis in which I want to plot IRFs (impulse response functions). My time series have length 137 and I only use 3 variables, furthermore I select 1 lag when specifying the VAR model.
Specifying the VAR model works fine, but when I want to summarize it I get the following error message
VAR_reduced <- VAR(VAR_data_1, p = 1, type = "both")
summary(VAR_reduced)
Error in solve.default(Sigma) :
system is computationally singular: reciprocal condition number = 1.03353e-16
From what I read in another question, this error usually comes up when there are too few observations, leading to overfitting, but that should not be a problem in my example, as I have enough observations.
As R does not display an error message if I don't run the summary command it is still possible to calculate the IRFs using:
plot(irf(VAR_reduced, n.ahead = 40))
But, the plot seems rather counter-intuitive, as there is no reaction from any variable other than assets. Therefore, my guess is that the error message hints at something I got wrong but haven't realised yet.
Is this correct? That is, do I need to resolve that error, or do my IRFs have nothing to do with it?
For completeness here is all the code:
library(quantmod)
library(urca)
library(vars)
library(tseries)
getSymbols('CPILFESL',src='FRED')
getSymbols('INDPRO',src='FRED')
getSymbols('WALCL',src='FRED')
CPI <- ts(CPILFESL, frequency = 12, start = c(1957,1))
output <- ts(INDPRO, frequency = 12, start = c(1919,1))
assets <- as.xts(WALCL)
assets <- to.monthly(assets, indexAt='yearmon', drop.time = TRUE)
assets <- ts(assets[,4], frequency = 12, start = c(2002,12))
assets <- window(assets, start = c(2008,9), end = c(2020,1))
CPI <- window(CPI, start = c(2008,9), end = c(2020,1))
output <- window(output, start = c(2008,9), end = c(2020,1))
loutput <- log(output)
lCPI <- log(CPI)
data_0 <- cbind(loutput, lCPI, assets)
plot(data_0)
VAR_data_1 <- ts.intersect(diff(loutput), diff(lCPI), diff(assets, differences = 2))
VAR_reduced <- VAR(VAR_data_1, p = 1, type = "both")
summary(VAR_reduced)
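A possible explanation, stated as a hypothesis rather than a verified diagnosis: `solve()` refuses to invert a matrix whose reciprocal condition number falls below machine precision, and a residual covariance matrix can end up there simply because the series live on very different scales (log differences next to second differences of a balance sheet in raw units). The sketch below uses simulated numbers, not the FRED data, to show how rescaling one series changes `rcond()`.

```r
set.seed(1)
small <- rnorm(100, sd = 0.001)  # stands in for a log-difference series
big   <- rnorm(100, sd = 1e6)    # stands in for a differenced series in raw units

# Covariance matrix of two series on wildly different scales
Sigma <- cov(cbind(small, big))
print(rcond(Sigma))              # far below .Machine$double.eps
print(.Machine$double.eps)

# Same series, with the large one expressed in different units
Sigma_scaled <- cov(cbind(small, big / 1e6))
print(rcond(Sigma_scaled))       # many orders of magnitude larger
```

If this is what is happening, rescaling `assets` before differencing (for example working in logs or in larger units) would be worth trying, and the IRFs should be re-checked afterwards, since a counter-intuitive plot may share the same cause.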

object of type 'closure' is not subsettable with pglm::pglm function in R to estimate panel logit

I'm trying to use the pglm function from pglm package to obtain a random effects panel estimation of an ordered logistic model.
When I call pglm on its own, it gives me the desired result. Here is my specification:
pglm::pglm(as.numeric(y)~x1+x2+x3, df,family = pglm::ordinal('logit'),
model = "random", method = "bfgs", print.level = 3, R = 5, index = 'Year')
where:
1. all explanatory variables {x1,x2,x3} are numerical doubles
2. y is an ordered categorical variable ranging from 1 to 22
The table also includes a "Year" variable ranging from 1996 to 2014, that will be used to build the panel data.
When trying to use the pglm function in another function:
pglm_fun <- function(df){
df <- data.frame(df)
pglm::pglm(as.numeric(y)~x1+x2+x3, data = df,family = pglm::ordinal('logit'),
model = "random", method = "bfgs", print.level = 3, R = 5, index = 'Year')
}
I get an error message occurring when calculating
pdata.frame(data, index)
Error in x[, !na.check] : object of type 'closure' is not subsettable.
When trying to run the code in the console, I do not have such an error and the pdata.frame() function works.
Example of data frame:
df = data.frame(y = sort(rep(1:4,20)),
x1 = rnorm(80),
x2 = rnorm(80),
x3 = rnorm(80),
Year = rep(sample(1995:1998, replace = FALSE),20))
OK, I figured it out. This is an environment problem in R: the function I was using was working on copies of the variables instead of using the global variables directly.
The problem is solved by declaring individual names in the function and leaving the data frame as a global variable:
pglm_fun1 <- function(y, x1, x2, x3, Year){
# df is not an argument here; it is looked up in the global environment
require("pglm")
pglm(as.numeric(y)~x1+x2+x3, data = df, family = ordinal('logit'),
model = "random", method = "bfgs", print.level = 3, R = 5, index = 'Year')
}
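A note on the error message itself: in R, `df` is also the name of a function in the stats package (the F density), and functions are "closures". If a lookup for `df` misses the data frame and falls through to that function, subsetting it produces exactly this message. Whether that is what happens inside `pglm` is an assumption; the base R sketch below only reproduces the message.

```r
# Subsetting the stats function df() instead of a data frame
msg <- tryCatch(stats::df[, 1], error = function(e) conditionMessage(e))
print(msg)

# A less ambiguous name avoids the clash (panel_data is a hypothetical name)
panel_data <- data.frame(y = rep(1:4, each = 20),
                         Year = rep(1995:1998, 20))
print(is.data.frame(panel_data))
```

Under that assumption, giving the data frame a name that does not collide with a base function may be a simpler workaround than relying on a global variable.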
