How to solve an error in VAR model estimation in R

I want to do impulse response analysis using a Vector Error Correction Model (VECM). Then I got the error message below.
library(urca)
library(vars)
log_mort = as.data.frame(log1p(morts)) # morts contains the five mortality series
log_mort
colnames(log_mort) = c("IandP","Cancer","Respiratory","Circulatory","Covid")
jo.mort <- ca.jo(log_mort,ecdet = "none",type = "eigen",K =2 ,spec = "transitory")
summary(jo.mort)
vec2var.mort = vec2var(jo.mort, r = 2)
irf(vec2var.mort,ortho = TRUE, cumulative = FALSE, boot = TRUE, ci = 0.95,
runs = 100, seed = NULL)
Error message:
Error in chol.default(sigma.u) : the leading minor of order 5 is not positive definite
Could someone explain what happened and how to solve it?
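One diagnostic sketch (my own suggestion, not a confirmed fix): irf() with ortho = TRUE Cholesky-decomposes the residual covariance matrix sigma.u, and the error says that matrix is not positive definite, which typically happens when the series are nearly collinear or there are too few observations for five equations. You can inspect the eigenvalues of the residual covariance directly:
sigma.u <- cov(residuals(vec2var.mort))    # roughly the covariance matrix that chol() is failing on
eigen(sigma.u, only.values = TRUE)$values  # an eigenvalue <= 0 (or ~0) explains the error
If some eigenvalues are essentially zero, dropping a redundant series or using more observations usually restores positive definiteness.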

Related

Why did the MH sampler fail?

I am new to mirt models and tried to estimate a graded response model with the mirt package in RStudio using the mirt() command:
library(mirt)
x <- pid[,c(items.25,items.1,items.13)]
# x = data frame with 1228 observations of 24 variables
# define bifactor model
#model syntax
mirtsyn <- paste0("G = ",paste0(colnames(x), collapse = ","),"\n",
"F1 = ",paste0(items.25,collapse = ","),"\n",
"F2 = ",paste0(items.1,collapse = ","),"\n",
"F3 = ",paste0(items.13,collapse = ","))
mirtmodel <- mirt.model(mirtsyn, itemnames=colnames(x))
# estimate model
fit <- mirt(x,model = mirtmodel,itemtype = "graded", SE = FALSE, method = "MHRM")
#itemtype = "graded" means it fits a graded response model,
#method = "MHRM" (Metropolis-Hastings Robbins-Monro) means it uses stochastic methods for estimation
I got this error message:
Error in draw.thetas(theta0 = gtheta0[[g]], pars = pars[[g]], fulldata = Data$fulldata[[g]], :
MH sampler failed. Model is likely unstable or may need better starting valuesFALSE
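A hedged sketch (my own suggestion based on mirt's documented GenRandomPars argument, not a confirmed fix): the MH-RM sampler often fails because of poor starting values, so letting mirt() draw random starting parameters is a cheap first thing to try.
fit <- mirt(x, model = mirtmodel, itemtype = "graded", SE = FALSE,
            method = "MHRM", GenRandomPars = TRUE)  # random starting values instead of the defaults
If that still fails, check that every item referenced in the model syntax has usable responses in x (no all-NA or constant columns) before blaming the estimator.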

ParBayesianOptimization suddenly fails while logging epoch results

I am currently using the R package ParBayesianOptimization to tune parameters for ML methods. While searching for an optimal cost parameter for the svmLinear2 model (contained in caret), the optimization stopped with a sudden error after successfully completing 15 iterations.
Here is the error traceback:
Error in rbindlist(l, use.names, fill, idcol) :
Item 2 has 9 columns, inconsistent with item 1 which has 10 columns. To fill missing columns use fill=TRUE.
7.
rbindlist(l, use.names, fill, idcol)
6.
rbind(deparse.level, ...)
5.
rbind(scoreSummary, data.table(Epoch = rep(Epoch, nrow(NewResults)),
Iteration = 1:nrow(NewResults) + nrow(scoreSummary), inBounds = rep(TRUE,
nrow(NewResults)), NewResults))
4.
addIterations(optObj, otherHalting = otherHalting, iters.n = iters.n,
iters.k = iters.k, parallel = parallel, plotProgress = plotProgress,
errorHandling = errorHandling, saveFile = saveFile, verbose = verbose,
...)
3.
ParBayesianOptimization::bayesOpt(FUN = ...
So somehow the data tables storing the summary information for each iteration suddenly differ in the number of columns present. Is this a common bug with the ParBayesianOptimization package? Has anyone else encountered a similar problem? Did you find a fix, other than rewriting the addIterations function to fill the missing columns?
EDIT: I don't have an explanation for why the error suddenly occurs after a number of successful iterations. However, the issue has recurred when using svmLinear and svmRadial. I was able to reproduce a similar case with the same error on the iris dataset:
library(data.table)
library(caret)
library(ParBayesianOptimization)
set.seed(1234)
bayes.opt.bounds = list()
bayes.opt.bounds[["svmRadial"]] = list(C = c(0, 1000),
                                       sigma = c(0, 500))
svmRadScore = function(...){
  grid = data.frame(...)
  mod = caret::train(Species ~ ., data = iris, method = "svmRadial",
                     trControl = trainControl(method = "repeatedcv",
                                              number = 7, repeats = 5),
                     tuneGrid = grid)
  return(list(Score = caret::getTrainPerf(mod)[, "TrainAccuracy"], Pred = 0))
}
bayes.create.grid.par = function(bounds, n = 10){
  grid = data.table()
  params = names(bounds)
  grid[, c(params) := lapply(bounds, FUN = function(minMax){
    return(runif(n, minMax[1], minMax[2]))}
  )]
  return(grid)
}
prior.grid.rad = bayes.create.grid.par(bayes.opt.bounds[["svmRadial"]])
svmRadOpt = ParBayesianOptimization::bayesOpt(FUN = svmRadScore,
                                              bounds = bayes.opt.bounds[["svmRadial"]],
                                              initGrid = prior.grid.rad,
                                              iters.n = 100,
                                              acq = "ucb", kappa = 1,
                                              parallel = FALSE, plotProgress = TRUE)
Using this example, the error occurred on the 9th epoch.
Thanks!
It appears that the scoring function returned NAs in place of accuracy measures, leading to the error further downstream. This has been described by the library's creator at
https://github.com/AnotherSamWilson/ParBayesianOptimization/issues/33.
It looks like the SVM is trying a cost of 0 during the 9th iteration. Given the problem statement the SVM is solving, the cost parameter should probably be positive.
According to AnotherSamWilson, this error may commonly occur when the scoring function "returns something unexpected".
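Following that reasoning, a small change to the bounds in the reproducible example above should avoid the failure (a sketch, not a tested fix): keep the lower bounds strictly positive so caret never receives C = 0 or sigma = 0.
bayes.opt.bounds[["svmRadial"]] = list(C = c(1e-3, 1000),     # strictly positive lower bound for cost
                                       sigma = c(1e-3, 500))  # and for the kernel width
With invalid parameter values ruled out, getTrainPerf() should always return a numeric accuracy, so the score summary keeps a consistent number of columns.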

How to use cvts on an hybridModel with XREG?

I am trying to do cross-validation using the cvts function from the forecastHybrid package with an "an" model (ARIMA + NNETAR) and external regressors.
I have two variables with 100 observations: Y and X.
Note that:
length(Y) == length(X)
TRUE
I did this:
crossv = cvts(Y,
              FUN = hybridModel, models = "an",
              a.args = list(xreg = X), n.args = list(xreg = X),
              rolling = TRUE, windowSize = 84, maxHorizon = 1, horizonAverage = FALSE)
and got this error
Error in { :
task 1 failed - "variable lengths differ (found for 'xregg')"
If I try to pass it as a function instead:
CUSTOM=function(x){hybridModel(x, models="an", a.args=list(xreg=X),n.args=list(xreg=X))}
crossv2 = cvts(Y,
FUN=CUSTOM,
rolling = TRUE,
windowSize = 84,
maxHorizon = 1,
horizonAverage = FALSE)
I get:
Error in { : task 1 failed - "object 'X' not found"
I can, of course, do the cross-validation separately for nnetar and arima and then average the forecasts, but any ideas why it's not working via cvts + hybridModel?
Thanks a lot.
I finally found the answer. We should add another xreg argument to the FUN function. Then a second xreg argument (for the cvts function itself) should be supplied outside the function. The value of the second xreg is then passed to the first xreg.
library(forecastHybrid)
y = ts(rnorm(100), start = c(1999,1), frequency = 12)
x = ts(rnorm(100), start = c(1999,1), frequency = 12)
cv = cvts(y, FUN = function(z, xreg = xreg){
            forecastHybrid::hybridModel(z, models = "an")},
          xreg = as.matrix(x), rolling = TRUE, windowSize = 84,
          maxHorizon = 1, horizonAverage = FALSE)
It worked!
accuracy(cv)
ME RMSE MAE
Forecast Horizon 1 -0.2173456 0.9328878 0.7368945

Random Forest using R

I'm working on building a predictive model for breast cancer data using R. After performing gcrma normalization, I generated the potential predictor variables. Now when I run the RF algorithm I encounter the following error:
rf_output=randomForest(x=pred.data, y=target, importance = TRUE, ntree = 25001, proximity=TRUE, sampsize=sampsizes)
Error: Error in randomForest.default(x = pred.data, y = target, importance = TRUE, : Can not handle categorical predictors with more than 53 categories.
code:
library(randomForest)
library(ROCR)
library(Hmisc)
library(genefilter)
setwd("E:/kavya's project_work/final")
datafile<-"trainset_gcrma.txt"
clindatafile<-read.csv("mod clinical_details.csv")
outfile="trainset_RFoutput.txt"
varimp_pdffile="trainset_varImps.pdf"
MDS_pdffile="trainset_MDS.pdf"
ROC_pdffile="trainset_ROC.pdf"
case_pred_outfile="trainset_CasePredictions.txt"
vote_dist_pdffile="trainset_vote_dist.pdf"
data_import=read.table(datafile, header = TRUE, na.strings = "NA", sep="\t")
clin_data_import=clindatafile
clincaldata_order=order(clin_data_import[,"GEO.asscession.number"])
clindata=clin_data_import[clincaldata_order,]
data_order=order(colnames(data_import)[4:length(colnames(data_import))])+3 #Order data without first three columns, then add 3 to get correct index in original file
rawdata=data_import[,c(1:3,data_order)] #grab first three columns, and then remaining columns in order determined above
header=colnames(rawdata)
X=rawdata[,4:length(header)]
ffun=filterfun(pOverA(p = 0.2, A = 100), cv(a = 0.7, b = 10))
filt=genefilter(2^X,ffun)
filt_Data=rawdata[filt,]
#Get potential predictor variables
predictor_data=t(filt_Data[,4:length(header)])
predictor_names=c(as.vector(filt_Data[,3])) #gene symbol
colnames(predictor_data)=predictor_names
target= clindata[,"relapse"]
target[target==0]="NoRelapse"
target[target==1]="Relapse"
target=as.factor(target)
tmp = as.vector(table(target))
num_classes = length(tmp)
min_size = tmp[order(tmp,decreasing=FALSE)[1]]
sampsizes = rep(min_size,num_classes)
rf_output=randomForest(x=pred.data, y=target, importance = TRUE, ntree = 25001, proximity=TRUE, sampsize=sampsizes)
error:"Error in randomForest.default(x = pred.data, y = target, importance = TRUE, : Can not handle categorical predictors with more than 53 categories."
As I'm new to machine learning, I'm unable to proceed; any guidance would be appreciated.
Thanks in advance.
It is hard to say without knowing the data. Run class() or summary() on all your predictor variables to ensure that they are not accidentally interpreted as characters or factors. If you really do have more than 53 levels, you will have to convert them to binary variables. Example:
mtcars$automatic <- mtcars$am == 0
mtcars$manual <- mtcars$am == 1
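For the data in the question, a quick check along those lines might look like this (a sketch that assumes predictor_data was built as in the code above):
predictor_df <- as.data.frame(predictor_data)
table(sapply(predictor_df, class))  # expression predictors should all be numeric
# any character/factor column with more than 53 levels would trigger this randomForest error
If a column really is categorical with that many levels, recode it into several binary columns as in the mtcars example.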

R: Holt-Winters with daily data (forecast package)

In the following example, I am trying to use Holt-Winters smoothing on daily data, but I run into a couple of issues:
library(zoo)
library(forecast)
# generate some dummy daily data
mData = cbind(seq.Date(from = as.Date('2011-12-01'),
to = as.Date('2013-11-30'), by = 'day'), rnorm(731))
# convert to a zoo object
zooData = as.zoo(mData[, 2, drop = FALSE],
order.by = as.Date(mData[, 1, drop = FALSE], format = '%Y-%m-%d'),
frequency = 7)
# attempt Holt-Winters smoothing
hw(x = zooData, h = 10, seasonal = 'additive', damped = FALSE,
initial = 'optimal', exponential = FALSE, fan = FALSE)
# no missing values in the data
sum(is.na(zooData))
This leads to the following error:
Error in ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma, damped = damped, : You've got to be joking. I need more data!
In addition: Warning message:
In ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma, damped = damped, : Missing values encountered. Using longest contiguous portion of time series
Couple of questions:
1. Where are the missing values coming from?
2. I am assuming that the "need more data" arises from attempting to estimate 365 seasonal parameters?
Update 1:
Based on Gabor's suggestion, I have recreated a fractional index for the data where whole numbers are weeks.
I have a couple of questions.
1. Is this an appropriate way of handling daily data when the periodicity is assumed to be weekly?
2. Is there a more elegant way of handling the dates when working with daily data?
library(zoo)
library(forecast)
# generate some dummy daily data
mData = cbind(seq.Date(from = as.Date('2011-12-01'),
to = as.Date('2013-11-30'), by = 'day'), rnorm(731))
# convert to a zoo object with weekly frequency
zooDataWeekly = as.zoo(mData[, 2, drop = FALSE],
order.by = seq(from = 0, by = 1/7, length.out = 731))
# attempt Holt-Winters smoothing
hwData = hw(x = zooDataWeekly, h = 10, seasonal = 'additive', damped = FALSE,
initial = 'optimal', exponential = FALSE, fan = FALSE)
plot(zooDataWeekly, col = 'red')
lines(fitted(hwData))
hw() requires a ts object, not a zoo object. Use:
zooDataWeekly <- ts(mData[,2], frequency=7)
Unless there is a good reason for specifying the model exactly, it is usually better to let R select the best model for you:
fit <- ets(zooDataWeekly)
fc <- forecast(fit)
plot(fc)
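Putting those two points together, a minimal sketch with the same dummy data as above (hw() called on a ts object rather than a zoo object):
tsDataWeekly <- ts(mData[, 2], frequency = 7)            # daily data with an assumed weekly seasonality
fc <- hw(tsDataWeekly, h = 10, seasonal = 'additive')    # hw() returns a forecast object directly
plot(fc)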
