Automate Selection of Optimum GARCH model in R

I've created 7 GARCH models in R (CIGarch1, CIGarch2, etc.) using the functions ugarchspec and ugarchfit. I would like to automatically select the GARCH model with the lowest Akaike score and have that model passed to my ugarchboot call. I tried the following code:
# Create Akaike value table to find model with minimum score ----
CIGarch_Akaike_tbl <- data.frame(model = c("CIGarch1","CIGarch2","CIGarch3","CIGarch4","CIGarch5","CIGarch6","CIGarch7"),akaike_score = c(CIGarch1_Akaike, CIGarch2_Akaike, CIGarch3_Akaike,
CIGarch4_Akaike, CIGarch5_Akaike, CIGarch6_Akaike,
CIGarch7_Akaike))
# Looks for minimum Akaike value and returns the model name to be used in the following predict function
min_Akaike_Garch_model <- as.character(CIGarch_Akaike_tbl[which(CIGarch_Akaike_tbl$akaike_score==min(CIGarch_Akaike_tbl$akaike_score)),1])
# Predicting stock price with ugarchboot ----
# Update model based on min_Akaike_Garch_model
CIpredict <- ugarchboot(min_Akaike_Garch_model, n.ahead = 10,
method = c("Partial", "Full")[1])
I get the following error message:
Error in UseMethod("ugarchboot") :
no applicable method for 'ugarchboot' applied to an object of class "character"
I've tried other as.#### functions, but to no avail. Is there a way to do what I'm trying to do?
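One way to read the error is that ugarchboot receives the model's name (a character string) rather than the fitted object itself. Below is a minimal sketch of one possible workaround: keep the fits in a named list and index it by the winning name. The stand-in list objects and the Akaike values are made up so that the sketch runs without rugarch; in the real script they would be the objects returned by ugarchfit and the CIGarch*_Akaike scores from the question.

```r
# Stand-ins for the fitted models (hypothetical placeholders; in the real
# script these are the objects returned by ugarchfit()).
CIGarch1 <- list(label = "stand-in fit 1")
CIGarch2 <- list(label = "stand-in fit 2")
CIGarch3 <- list(label = "stand-in fit 3")

# Keep the fitted objects in a named list so that a name can be mapped
# back to the object itself.
garch_fits <- list(CIGarch1 = CIGarch1, CIGarch2 = CIGarch2, CIGarch3 = CIGarch3)

# Illustrative (made-up) Akaike scores, named after the models.
akaike_scores <- c(CIGarch1 = 3.21, CIGarch2 = 3.05, CIGarch3 = 3.40)

# which.min() gives the position of the smallest score; names() gives its name.
best_name <- names(which.min(akaike_scores))
best_fit  <- garch_fits[[best_name]]

print(best_name)

# With rugarch loaded, the object (not its name) would then be passed on:
# CIpredict <- ugarchboot(best_fit, n.ahead = 10, method = "Partial")
```

If the models only exist as separate variables, get(min_Akaike_Garch_model) would look the object up by its name, though a named list is usually easier to maintain.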

Related

Unable to conduct variable importance in r

I'm trying to test the variable importance before running the actual regression. But, when I attempt to do so, I get this error:
Error in varImp(regressor, scale = FALSE) :
trying to get slot "responses" from an object (class "randomForest.formula") that is not an S4 object
I've tried looking up the error, but there wasn't much information available online. What can I do to fix this?
all = read.csv('https://raw.githubusercontent.com/bandcar/massShootings/main/all.csv')
# Check Variable importance with randomForest
regressor <- randomForest::randomForest(total_victims ~ . , data = all, importance=TRUE) # fit the random forest with default parameter
caret::varImp(regressor, scale = FALSE) # get variable importance, based on mean decrease in accuracy

Cannot use predict.coxph for type="expected" with newdata in Competing Risks context

I'm using a Cox Proportional Hazards (survival::coxph) model in a competing risks context, i.e. multiple event types with one endpoint for each observation. I'm having a hard time using the predict.coxph function to show an estimate of the expected number of events given a supplied set of covariates and follow-up time.
Here is an example using the mgus2 dataset in the survival package:
library(survival)
#Modify data so each subject transitions only once to a state.
crdata <- mgus2
crdata$etime <- pmin(crdata$ptime, crdata$futime)
crdata$event <- ifelse(crdata$pstat==1, 1, 2*crdata$death)
crdata$event <- factor(crdata$event, 0:2, c("censor", "PCM", "death"))
cfit <- coxph(Surv(etime, event) ~ I(age/10) + sex + mspike,
id = id, crdata)
Once I fit a model, and create a "newdata" data frame, R throws an error.
I tried using a from-scratch dataframe but this results in an error suggesting that the column size or the number of rows does not mesh:
#providing both follow-up time and covariates
nd=data.frame(etime=81 ,sex= "M", age=60, mspike=1.2)
predict(cfit, newdata=nd ,type="expected")
> Data is not the same size as it was in the original fit
I get the same issue when using model.frame to extract the same data frame that was used to fit the model.
nd=model.frame(cfit)
predict(cfit,newdata=nd,type="expected")
> Data is not the same size as it was in the original fit
This results in the same error. Trying to use the original data frame to make predictions doesn't work either:
nd=crdata[1,]
predict(cfit,newdata=nd,type="expected")
> Data is not the same size as it was in the original fit
I'm wondering what I'm missing here. Thanks in advance!
I've updated my survival package from 2.7 to 3.1 and the error thrown states that "expected" predict type is not available for multistate coxph.
> predict(fit,type="expected",newdata=newdat)
Error in predict.coxphms(fit, type = "expected", newdata = newdat) :
predict method not yet available for multistate coxph

Error when calculating variable importance with categorical variables using the caret package (varImp)

I've been trying to compute the variable importance for a model with mixed scale features using the varImp function in the caret package. I've tried a number of approaches, including renaming and coding my levels numerically. In each case, I am getting the following error:
Error in auc3_(actual, predicted, ranks) :
Not compatible with requested type: [type=character; target=double].
The following dummy example should illustrate my point (edited to reflect @StupidWolf's correction):
library(caret)
#create small dummy dataset
set.seed(124)
dummy_data = data.frame(Label = factor(sample(c("a","b"),40, replace = TRUE)))
dummy_data$pred1 = ifelse(dummy_data$Label=="a",rnorm(40,-.5,2),rnorm(40,.5,2))
dummy_data$pred2 = factor(ifelse(dummy_data$Label=="a",rbinom(40,1,0.3),rbinom(40,1,0.7)))
# check varImp
control.lvq <- caret::trainControl(method="repeatedcv", number=10, repeats=3)
model.lvq <- caret::train(Label~., data=dummy_data,
method="lvq", preProcess="scale", trControl=control.lvq)
varImp.lvq <- caret::varImp(model.lvq, scale=FALSE)
The issue persists when using different models (like randomForest and SVM).
If anyone knows a solution or can tell me what is going wrong, I would highly appreciate that.
Thanks!
When you call varImp on an lvq model, it defaults to filterVarImp() because there is no model-specific variable importance for lvq. Now if you check the help page:
For two class problems, a series of cutoffs is applied to the
predictor data to predict the class. The sensitivity and specificity
are computed for each cutoff and the ROC curve is computed.
If you read the source code of varImp.train(), the data it feeds into filterVarImp() is the original data frame, not whatever comes out of the preprocessing.
This means that if the original data contains a factor variable, filterVarImp() cannot apply cutoffs to it and throws an error like this:
filterVarImp(data.frame(dummy_data$pred2),dummy_data$Label)
Error in auc3_(actual, predicted, ranks) :
Not compatible with requested type: [type=character; target=double].
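To make the quoted help text concrete, here is a minimal base-R sketch of an ROC-based score for a single numeric predictor. It is not caret's implementation: the helper names auc_by_rank and roc_importance are hypothetical, and using max(AUC, 1 - AUC) as the importance is an assumption about how a direction-free score can be formed. The data are made up.

```r
# Rank-based AUC for a numeric predictor and a two-level class label.
# `positive` names the level treated as the positive class.
auc_by_rank <- function(x, label, positive) {
  is_pos <- label == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(x)  # average ranks are used for ties
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Direction-free score: a predictor that separates the classes in either
# direction ends up above 0.5.
roc_importance <- function(x, label, positive) {
  a <- auc_by_rank(x, label, positive)
  max(a, 1 - a)
}

# Made-up example: four observations, two per class.
label <- factor(c("a", "a", "b", "b"))
x     <- c(1, 3, 2, 4)
print(roc_importance(x, label, positive = "b"))
```

The calculation relies on the predictor having a numeric ordering, which is why a factor column cannot be scored this way without first being encoded as numbers.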
So, using my example and as you have pointed out, you need to one-hot encode it:
set.seed(111)
dummy_data = data.frame(Label = factor(rep(c("a","b"),each=20)))
dummy_data$pred1 = rnorm(40,rep(c(-0.5,0.5),each=20),2)
dummy_data$pred2 = rbinom(40,1,rep(c(0.3,0.7),each=20))
dummy_data$pred2 = factor(dummy_data$pred2)
control.lvq <- caret::trainControl(method="repeatedcv", number=10, repeats=3)
ohe_data = data.frame(
Label = dummy_data$Label,
model.matrix(Label ~ 0+.,data=dummy_data))
model.lvq <- caret::train(Label~., data=ohe_data,
method="lvq", preProcess="scale",
trControl=control.lvq)
caret::varImp(model.lvq, scale=FALSE)
ROC curve variable importance
Importance
pred1 0.6575
pred20 0.6000
pred21 0.6000
If you use a model that doesn't have a specific variable importance method, one option is to calculate the variable importance first (for example with filterVarImp()) and fit the model afterwards.
Note that this problem can be circumvented by replacing each ordinal feature (with d levels) by its (d-1)-dimensional indicator encoding:
model.matrix(~dummy_data$pred2-1)[, 1:(length(levels(dummy_data$pred2))-1), drop=FALSE]
However, why does varImp not handle this automatically? Further, this has the drawback that it yields an importance score for each of the d-1 indicators, not one unified importance score for the original feature.
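As a small illustration of the indicator encoding mentioned above, here is a base-R sketch on a made-up three-level factor (the variable names are hypothetical). It also shows why drop = FALSE is worth adding: without it, a two-level factor would collapse to a plain vector.

```r
# Small made-up factor with d = 3 levels.
pred <- factor(c("low", "mid", "high", "mid", "low"),
               levels = c("low", "mid", "high"))

# Full indicator matrix: one column per level (no intercept).
full <- model.matrix(~ pred - 1)

# Keep the first d - 1 columns; drop = FALSE preserves the matrix shape
# even when only one column remains (d = 2).
d <- nlevels(pred)
reduced <- full[, seq_len(d - 1), drop = FALSE]

print(colnames(reduced))
print(dim(reduced))
```

Each retained column would still receive its own importance score, which is the drawback the paragraph above points out.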

Anova test regression vs. knn in R

I'm trying to run an ANOVA test on two different models in R: an lm model vs. a knn model. The problem is that this error appears:
Error in anova.lmlist(object, ...) : models were not all fitted to the same size of dataset
I think this comparison makes sense because I want to know whether there is statistical evidence of a difference between the models. Here is a reproducible example:
#Getting dataset
xtra <- read.csv("california.dat", comment.char="#")
names(xtra) <- c("Longitude", "Latitude", "HousingMedianAge",
"TotalRooms", "TotalBedrooms", "Population", "Households",
"MedianIncome", "MedianHouseValue")
n <- length(names(xtra)) - 1
names(xtra)[1:n] <- paste ("X", 1:n, sep="")
names(xtra)[n+1] <- "Y"
#Regression model
reg.model<-lm(Y~.,data=xtra)
#Knn-model
knn.model<-kknn::kknn(Y~.,train=xtra,test=xtra,kernel = "optimal")
anova(reg.model,knn.model)
What am I doing wrong?
Thanks in advance.
My guess would be that the two models aren't comparable with anova() and this error is being thrown because one of the models will be deemed empty.
From the documentation for anova(object,...):
object - an object containing the results returned by a model fitting
function (e.g., lm or glm).
... - additional objects of the same type.
When you look to see if the models can be compared you can see they're of different types:
> class(knn.model)
[1] "kknn"
> class(reg.model)
[1] "lm"
More importantly, if you try to run anova() on knn.model alone, you can see that the function cannot be applied to a kknn object:
> anova(knn.model)
Error in UseMethod("anova") :
no applicable method for 'anova' applied to an object of class "kknn"
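The point about "objects of the same type" can be sketched with base R alone. The example below uses the built-in mtcars data instead of the question's dataset, and a hypothetical empty object with class "kknn" stands in for a real kknn fit, so that the dispatch failure can be reproduced without the kknn package.

```r
# Two nested lm fits on a built-in dataset: same class, same rows.
small_fit <- lm(mpg ~ wt, data = mtcars)
large_fit <- lm(mpg ~ wt + hp, data = mtcars)
nested_test <- anova(small_fit, large_fit)
print(nested_test)

# An object whose class has no anova() method (a hypothetical stand-in
# for a kknn fit) fails at method dispatch.
fake_knn <- structure(list(), class = "kknn")
dispatch_msg <- tryCatch(
  {
    anova(fake_knn)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(dispatch_msg)
```

anova() works in the first call because both arguments are lm fits to the same data; in the second call there is simply no method registered for the class.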

How to calculate probability for CNN model in R?

I have built and trained a CNN model for image classification using the MXNet package, and I predicted the test results from the model using the snippet below.
pred_test <- predict(model,test_array)
pred_test_label <- max.col(t(pred_test))-1
print(pred_test_label)
Along with this, I would like to know the probability that the test result matches the model data. Is there any way I can check this?
You can do something like this:
# Prediction of test set
preds <- predict(model, test.array)
pred.label = max.col(t(preds))-1
accuracy <- function(label, pred) {
ypred = max.col(t(as.array(pred)))
return(sum((as.array(label) + 1) == ypred) / length(label))
}
print(paste0("Finish prediction...accuracy=", accuracy(test.y, preds)))
Add up all the elements of a pred_test column to get, say, out_sum, and then divide every element of that column by out_sum. The outputs then sum to one and can be taken as the probability of each output node of the CNN.
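The normalisation described above can be sketched in base R as follows. The score matrix is made up and stands in for pred_test, with classes in rows and test samples in columns (the layout implied by max.col(t(pred_test))). The softmax variant and the helper name softmax_cols are additions for comparison and are not part of the original answer.

```r
# Made-up scores: 3 classes (rows) x 2 test samples (columns).
scores <- matrix(c(2, 1, 1,
                   0.5, 0.5, 3), nrow = 3)

# Sum-normalisation: divide each column by its own sum.
# Only meaningful when all scores are non-negative.
prob_sum <- sweep(scores, 2, colSums(scores), "/")

# Softmax: exponentiate first, so it also works for negative scores.
# Subtracting the column maximum is for numerical stability only.
softmax_cols <- function(m) {
  shifted <- sweep(m, 2, apply(m, 2, max), "-")
  e <- exp(shifted)
  sweep(e, 2, colSums(e), "/")
}
prob_softmax <- softmax_cols(scores)

print(colSums(prob_sum))
print(colSums(prob_softmax))

# Predicted class (0-based, as in the question) and its probability.
pred_label <- max.col(t(prob_sum)) - 1
pred_prob  <- apply(prob_sum, 2, max)
```

Here pred_prob holds, for each test sample, the normalised score of the class that was predicted, which is one way to attach a probability to each predicted label.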
Alternatively, you can get probabilities directly by configuring the model as below at model initialization (note the use of out_activation="softmax"):
model <- mx.mlp(train.x, train.y, hidden_node=10, out_node=5, out_activation="softmax")
With this configuration, the model's outputs are bound to sum to 1, so each output node can be taken as the probability of the class corresponding to that node.
