From other posts on this platform, I found that the Li-Mak test on the standardised residuals is more appropriate for checking a fitted GARCH model than the Ljung-Box test. The Weighted.LM.test() function from the WeightedPortTest package in R implements it.
I am trying the code below but I am getting an error. Since it is a univariate test, I have extracted the standardised residuals and conditional variances (cvar) from the mfit slot:
std.resid1 <- dccfit@mfit$stdresid[,1]
cvar1 <- dccfit@mfit$cvar[,1]
Weighted.LM.test(std.resid1, cvar1, lag=10)
Error in Weighted.LM.test(std.resid1, cvar1, lag = 10) : Length of x and h.t must match
How do I get this to work? Any help is very much appreciated.
Firstly, you should not use the standardized residuals, so
instead of dccfit@mfit$stdresid[,1], take dccfit@model$residuals[,1].
Then, the documentation of Weighted.LM.test says that h.t should be a numeric vector of the conditional variances, so instead take:
dccfit@model$sigma[,1]^2
Then run the test:
Weighted.LM.test(dccfit@model$residuals[,1], dccfit@model$sigma[,1]^2, lag = 2, type = c("correlation", "partial"), fitdf = 1, weighted = FALSE)
Please correct me if I am wrong.
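For reference, here is a minimal end-to-end sketch of that check, assuming a fitted DCCfit object called dccfit and the slot names used above (if those slots differ in your rmgarch version, the residuals() and sigma() extractor methods should return the same series):
library(WeightedPortTest)
# Raw (not standardized) residuals and conditional variance of the first series,
# following the slots used in the answer above
resid1 <- dccfit@model$residuals[, 1]
h1 <- dccfit@model$sigma[, 1]^2
# Weighted Li-Mak test on the first univariate margin
Weighted.LM.test(resid1, h1, lag = 10)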
I am trying to understand bootstrapping in R using the boot package. I am trying to do a simple Spearman rank correlation. I have some code based on a tutorial I found online, but I am having some issues interpreting the output. The code is below:
*Note: these data are just random numbers I used to help me learn how to run the boot function. They do not represent anything.
library(boot)
test_a = data.frame(v1 = c(1,5,8,3,2,9,5,10,3,5), v2 = c(3,4,7,2,1,10,3,8,8,2))
attach(test_a)
cor.test(v1, v2, method = "spearman")
function_2 = function(test_a, i) {
  d2 = test_a[i, ]
  return(cor(d2$v1, d2$v2, method = "spearman"))
}
set.seed(1)
test_boot = boot(test_a, function_2, R = 1000)
test_boot
I get the following output:
boot(data = test_a, statistic = function_2, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 0.6397639 -0.04253283 0.2547429
This all makes sense to me. But I guess my confusion is with the boot.ci function:
ci = boot.ci(test_boot, conf=0.95)
I get the following output:
> ci
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = test_boot, conf = 0.95)
Intervals :
Level Normal Basic
95% ( 0.1830, 1.1816 ) ( 0.3173, 1.2987 )
Level Percentile BCa
95% (-0.0192, 0.9622 ) (-0.1440, 0.9497 )
Calculations and Intervals on Original Scale
And this is where I am a bit lost. I can't really find a source that explains these intervals in layman's terms in the context of a correlation coefficient: obviously you cannot have a correlation > 1.0, yet this spits out confidence intervals (at least with two of the methods) that go above 1. The sources that discuss these different confidence intervals have frankly been a bit confusing. Is any one of these better for certain parameters than others? It is also possible that I am completely misinterpreting what I am doing.
I also include the results of plot(test_boot) for your reference.
The eventual goal (with actual data), once I am more confident in running and interpreting the results of bootstrapping, would be to run tests for trends over time (the Mann-Kendall test for trend and the Theil-Sen slope estimator; I cannot use parametric statistics with my data :/) and compare my observed dataset with bootstrapped samples.
Any help would really be appreciated. Thank you in advance!
The two top intervals are normal theory intervals. They use the bootstrap to calculate the standard error and then make symmetric intervals that may or may not respect the bounds of the statistic. The bottom two intervals are percentile intervals (the first is a raw percentile interval and the second is a bias-corrected, accelerated interval). These identify particular values of the bootstrap statistics that define the CI. As such, they will always respect the theoretical bounds of the statistic being bootstrapped.
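If you only want intervals that respect the natural bounds of a correlation coefficient, you can ask boot.ci for just those types. A minimal sketch, using the test_boot object from the question:
library(boot)
# Percentile and BCa intervals only; both are based on quantiles of the
# bootstrap replicates, so they cannot exceed the range of the statistic
boot.ci(test_boot, conf = 0.95, type = c("perc", "bca"))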
I am having trouble getting the Brier Score for my Machine Learning Predictive models. The outcome "y" was categorical (1 or 0). Predictors are a mix of continuous and categorical variables.
I have created four models with different predictors; I will call them "model_1" to "model_4" here (apart from the predictors, the other parameters are the same). Example code for my model is:
Model_1=rfsrc(y~ ., data=TrainTest, ntree=1000,
mtry=30, nodesize=1, nsplit=1,
na.action="na.impute", nimpute=3,seed=10,
importance=T)
When I run "Model_1" in R, I get the fitted model results.
My question is: how can I get the predicted probability for those 412 people? And how do I find the observed probability for each person? Do I need to calculate it by hand? I found the function BrierScore() in the "DescTools" package.
But when I tried "BrierScore(Model_1)", it gave me no results.
Code I added:
library(scoring)
library(DescTools)
BrierScore(Raw_SB)
class(TrainTest$VL_supress03)
TrainTest$VL_supress03_nu<-as.numeric(as.character(TrainTest$VL_supress03))
class(TrainTest$VL_supress03_nu)
prediction_Raw_SB = predict(Raw_SB, TrainTest)
BrierScore(prediction_Raw_SB, as.numeric(TrainTest$VL_supress03) - 1)
BrierScore(prediction_Raw_SB, as.numeric(as.character(TrainTest$VL_supress03)) - 1)
BrierScore(prediction_Raw_SB, TrainTest$VL_supress03_nu - 1)
I tried these, but got many error messages.
One assumption I am making about your approach is that you want to compute the Brier score on the data you trained your model on, which is usually not the correct approach (google train-test split if you need more info).
In general, therefore, you should reflect on whether your approach is correct there.
The BrierScore function in DescTools only has a dedicated method for glm models; otherwise, it expects as input a vector of predicted probabilities and a vector of true values (see ?BrierScore).
What you need to do, though, is predict on your data using:
prediction = predict(model_1, TrainTest, na.action="na.impute")
and then compute the Brier score using:
BrierScore(as.numeric(TrainTest$y) - 1, prediction$predicted[, 1L])
(Note, that we transform TrainTest$y into a numeric vector of 0's and 1's in order to compute the brier score.)
Note: the randomForestSRC package also prints a normalized Brier score when you call print(prediction).
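Putting the pieces together, here is a rough sketch assuming the Model_1 and TrainTest objects from above, with the outcome stored in TrainTest$y as a factor coded "0"/"1"; the column name "1" for the event class is an assumption, and the Brier score is computed by hand as a mean squared difference:
# Predict on the training data (see the caveat about train-test splits above)
pred_tr <- predict(Model_1, TrainTest, na.action = "na.impute")
# Predicted probability of the event class, assuming the probability columns
# are named after the factor levels of y
p_hat <- pred_tr$predicted[, "1"]
# Observed outcome recoded as a numeric 0/1 vector
y_obs <- as.numeric(as.character(TrainTest$y))
# Brier score: mean squared difference between predicted probability and outcome
mean((p_hat - y_obs)^2)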
In general, using one of the available machine-learning workbenches in R (mlr3, tidymodels, caret) might simplify this approach and prevent a lot of errors, especially if you are less experienced in ML.
See e.g. this chapter in the mlr3 book for more information.
For reference, here is some very similar code using the mlr3 package, which also automatically takes care of train-test splits.
data(breast, package = "randomForestSRC") # with target variable "status"
library(mlr3)
library(mlr3extralearners)
task = TaskClassif$new(id = "breast", backend = breast, target = "status")
algo = lrn("classif.rfsrc", na.action = "na.impute", predict_type = "prob")
resample(task, algo, rsmp("holdout", ratio = 0.8))$score(msr("classif.bbrier"))
I am currently trying to fit an AdaBoost model in R using gbm.fit from the gbm package. I have tried everything I could, but my model keeps giving me prediction values outside of [0,1]. I understand that type = "response" only works for bernoulli, but I keep getting values just outside of [0,1]. Any thoughts? Thanks!
GBMODEL <- gbm.fit(
  x = training.set,
  y = training.responses,
  distribution = "adaboost",
  n.trees = 5000,
  interaction.depth = 1,
  shrinkage = 0.005,
  train.fraction = 1
)
predictionvalues = predict(GBMODEL,
newdata=test.predictors,
n.trees=5000,
type="response")
It is expected to obtain predictions outside [0,1] from the gbm package when you choose "adaboost" as your loss function.
After training, AdaBoost predicts the category from the sign of the output.
For instance, for a binary class problem with y in {-1, 1}, the class label is assigned according to the sign of the output. So y = 0.9 and y = 1.9 give the same result: the observation belongs to the y = 1 class. However, y = 1.9 simply suggests a more confident conclusion than y = 0.9. (If you want to know why, I would suggest reading the margin-based explanation of AdaBoost; you will find a very similar result with SVMs.)
Hope this can help you.
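If you want class labels or something on a probability scale, here is a minimal sketch under the standard exponential-loss derivation of AdaBoost, where the population minimizer is F(x) = 0.5 * log(p / (1 - p)); whether this matches exactly how gbm scales its adaboost output may depend on the package version, so treat it as an approximation:
# Class label from the sign of the AdaBoost score (mapped here to 0/1)
pred_class <- ifelse(predictionvalues > 0, 1, 0)
# Map the score to a probability via p = 1 / (1 + exp(-2 * F(x)))
pred_prob <- 1 / (1 + exp(-2 * predictionvalues))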
This may not be completely accurate mathematically, but I just did pnorm(predicted values) and got values from 0 to 1, because the adaboost predicted values appear to be scaled roughly like a Normal(0,1).
I am attempting to use boot.ci from R's boot package to calculate bias- and skew-corrected bootstrap confidence intervals from a parametric bootstrap. From my reading of the man pages and experimentation, I've concluded that I have to compute the jackknife estimates myself and feed them into boot.ci, but this isn't stated explicitly anywhere. I haven't been able to find other documentation, although to be fair I haven't looked at the original Davison and Hinkley book on which the code is based ...
If I naively run b1 <- boot(...,sim="parametric") and then boot.ci(b1), I get the error "influence values cannot be found from a parametric bootstrap". This error occurs if and only if I specify type="all" or type="bca"; boot.ci(b1,type="bca") gives the same error. So does empinf(b1). The only way I can get things to work is to explicitly compute jackknife estimates (using empinf() with the data argument) and feed them into boot.ci.
Construct data:
set.seed(101)
d <- data.frame(x=1:20,y=runif(20))
m1 <- lm(y~x,data=d)
Bootstrap:
b1 <- boot(d$y,
           statistic = function(yb, ...) {
             coef(update(m1, data = transform(d, y = yb)))
           },
           R = 1000,
           ran.gen = function(d, m) {
             unlist(simulate(m))
           },
           mle = m1,
           sim = "parametric")
Fine so far.
boot.ci(b1)
boot.ci(b1,type="bca")
empinf(b1)
all give the error described above.
This works:
L <- empinf(data=d$y,type="jack",
stype="i",
statistic=function(y,f) {
coef(update(m1,data=d[f,]))
})
boot.ci(b1,type="bca",L=L)
Does anyone know if this is the way I'm supposed to be doing it?
Update: the original author of the boot package responded to an e-mail:
... you are correct that the issue is that you are doing a
parametric bootstrap. The bca intervals implemented in boot are
non-parametric intervals and this should have been stated
explicitly somewhere. The formulae for parametric bca intervals
are not the same and depend on derivatives of the least favourable
family likelihood when there are nuisance parameters as in your
case. (See pp 206-207 in Davison & Hinkley) empinf assumes that the
statistic is in one of forms used for non-parametric bootstrapping
(which you did in your example call to empinf) but your original
call to boot (correctly) had the statistic in a different form
appropriate for parametric resampling.
You can certainly do what you're doing but I am not sure of the
theoretical properties of mixing parametric resampling with
non-parametric interval estimation.
After looking at the boot.ci help page, I decided to use a boot object constructed along the lines of an example in Ch. 6 of Davison and Hinkley and see whether it generated the errors you observed. I get a warning but no errors:
require(boot)
lmcoef <- function(data, i){
d <- data[i, ]
d.reg <- lm(y~x, d)
c(coef(d.reg)) }
lmboot <- boot(d, lmcoef, R=999)
m1
boot.ci(lmboot, index=2) # I am presuming that the interest is in the x-coefficient
#----------------------------------
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates
CALL :
boot.ci(boot.out = lmboot, index = 2)
Intervals :
Level Normal Basic
95% (-0.0210, 0.0261 ) (-0.0236, 0.0245 )
Level Percentile BCa
95% (-0.0171, 0.0309 ) (-0.0189, 0.0278 )
Calculations and Intervals on Original Scale
Warning message:
In boot.ci(lmboot, index = 2) :
bootstrap variances needed for studentized intervals
Q1:
I have been trying to get the AUC value for a classification problem using the e1071 and ROCR packages in R. ROCR has a nice example dataset, "ROCR.simple", which has prediction values and label values.
library(ROCR)
data(ROCR.simple)
pred<-prediction(ROCR.simple$predictions, ROCR.simple$labels)
auc<-performance(pred,"auc")
This gives the AUC value, no problem.
MY PROBLEM is: How do I get the type of data given by ROCR.simple$predictions in the above example?
I run my analysis like
library(e1071)
data(iris)
y<-iris$Species
x<-iris[,1:2]
model<-svm(x,y)
pred<-predict(model,x)
Up to here I'm OK.
Then how do I get the kind of predictions that ROCR.simple$predictions gives?
Q2:
There is a nice example involving ROCR.xval. This is a problem with 10 cross-validation runs.
They run
pred<-prediction(ROCR.xval$predictions,ROCR.xval$labels)
auc<-performance(pred,"auc")
This gives results for all 10 cross validations.
My problem is:
How do I use
model<-svm(x,y,cross=10) # where x and y are as given in Q1
and get all 10 results of predictions and labels into a list like the one in ROCR.xval?
Q1. You could use
pred<-prediction(as.numeric(pred), as.numeric(iris$Species))
auc<-performance(pred,"auc")
BUT the number of classes is not equal to 2:
ROCR currently supports only evaluation of binary classification tasks (according to the error I got)
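To see the Q1 workflow end to end, here is a minimal sketch on a two-class subset of iris (setosa vs. versicolor only), using the SVM decision values as the continuous scores ROCR expects; the subset and object names are only for illustration:
library(e1071)
library(ROCR)
# Keep two species so the task is binary
iris2 <- droplevels(iris[iris$Species %in% c("setosa", "versicolor"), ])
x2 <- iris2[, 1:2]
y2 <- iris2$Species
# Fit the SVM and ask for decision values (continuous scores)
model2 <- svm(x2, y2)
pred2 <- predict(model2, x2, decision.values = TRUE)
scores <- attr(pred2, "decision.values")[, 1]
# ROCR: continuous scores plus the true labels
# (the orientation of the decision values depends on the factor level order,
# so an AUC near 0 just means the scores are flipped)
rocr_pred <- prediction(scores, y2)
performance(rocr_pred, "auc")@y.values[[1]]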
Q2. I don't think the second one can be done the way you want. I can only think of performing the cross-validation manually, i.e.:
Get resampling indices with resample.indices() from the peperr package:
cv.ind <- resample.indices(nrow(iris), sample.n = 10, method = c("cv"))
x <- lapply(cv.ind$sample.index,function(x){iris[x,1:2]})
y <- lapply(cv.ind$sample.index,function(x){iris[x,5]})
Then generate models and predictions for each CV sample:
model1<-svm(x[[1]],y[[1]])
pred1<-predict(model1,x[[1]])
etc.
Then you could manually construct a list like ROCR.xval; a rough sketch follows.
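As a rough sketch of that manual construction, reusing the fold indices from resample.indices above and the same simplistic as.numeric(pred) scores as in Q1 (the object names are only for illustration, and as noted in Q1 this only makes sense once the task has been reduced to two classes):
library(peperr)
library(e1071)
library(ROCR)
cv.ind <- resample.indices(nrow(iris), sample.n = 10, method = "cv")
x <- lapply(cv.ind$sample.index, function(i) iris[i, 1:2])
y <- lapply(cv.ind$sample.index, function(i) iris[i, 5])
# One model and one prediction vector per fold, collected into lists
# shaped like ROCR.xval$predictions and ROCR.xval$labels
preds <- vector("list", length(x))
labels <- vector("list", length(x))
for (k in seq_along(x)) {
  model_k <- svm(x[[k]], y[[k]])
  preds[[k]] <- as.numeric(predict(model_k, x[[k]]))
  labels[[k]] <- as.numeric(y[[k]])
}
# For a two-class task you could then feed the lists to ROCR directly:
# pred_cv <- prediction(preds, labels)
# performance(pred_cv, "auc")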