Fitted GEV distribution parameters: different results from different packages (lmomRFA, lmom, extRemes, nsRFA) in R

I am trying to fit a GEV distribution to annual maxima values using the L-moment method, but different packages return different parameters (location, scale and shape) for the fitted GEV distribution.
I used several packages and the odd one out was lmomRFA; could anyone help me identify the problem?
Here is a sample of what I am doing:
install.packages("extRemes")
install.packages("lmom")
install.packages("lmomRFA")
install.packages("nsRFA")
library (nsRFA)
library (extRemes)
library (lmomRFA)
library (lmom)
amax_standard <- c(0.6274510, 0.7545455, 0.6521739, 1.5102041, 2.0937500, 1.0000000, 0.7094017, 1.0315789, 1.3207547)
amax_standard_matrix <- as.matrix(amax_standard)  ## the AMAX as a matrix
amax_standard_vector <- as.vector(amax_standard_matrix)  ## the AMAX as a vector
amax_standard_vector_order <- amax_standard_vector[order(amax_standard_vector)]  ## the AMAX as an ordered vector
fit1 <- fevd(amax_standard_vector_order, type = "GEV", method = "Lmoments")  # fitting the GEV using the "extRemes" package
lmom <- samlmu(amax_standard_vector_order)  # the sample L-moments; this function treats all the observations as one vector
fit2 <- pelgev(lmom)  # fitting the GEV using the "lmom" package
regdata <- regsamlmu(amax_standard_matrix, nmom = 4, sort.data = TRUE, lcv = FALSE)
colnames(regdata)[4] <- "t"  # to make the column name acceptable to the "regfit" function
fit3 <- regfit(regdata, "gev")  # fitting the GEV using the "lmomRFA" package
The first two functions, from the extRemes and lmom packages, give similar estimates of the GEV parameters, while the third function, from the lmomRFA package, gives a different answer.
Moreover, I tried another package, nsRFA, just to estimate the parameters. I supplied its par.GEV function with the estimated L-moments, which are identical across all the packages (they can be found in the variables lmom and regdata), and the results matched the extRemes and lmom packages. So what is the problem with the function in lmomRFA?
check <- par.GEV(1.0777622, 0.2760841, 0.3553799)
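A possible explanation, hedged, resting on two assumptions about lmomRFA: with lcv = FALSE, regsamlmu() puts the L-scale l_2 in column 4 rather than the L-CV t = l_2/l_1 that regfit() expects there, so renaming that column to "t" feeds regfit() the wrong ratio; and regfit() fits the regional growth curve, whose mean is rescaled to 1, so its location and scale come out divided by l_1. If both hold, the sketch below should reconcile the results:
regdata2 <- regsamlmu(amax_standard_matrix)  # default lcv = TRUE, so column 4 really is t = l_2/l_1
fit3b <- regfit(regdata2, "gev")             # fits the growth curve (mean rescaled to 1)
l1 <- unname(lmom["l_1"])                    # the site mean from samlmu() above
fit3b$para * c(l1, l1, 1)                    # location and scale rescale with the mean; the shape k does not
If the assumptions hold, the last line should be close to the parameters from fit2 (pelgev).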

Related

How to implement shapper::shap for a whole dataset?

I have created a random forest model using the randomForest package:
model_rf <- randomForest(y ~ ., data = data_train, ntree = 1000, keep.forest = TRUE, importance = TRUE)
To calculate Shapley values for the different features based on this RF model, I first create an "explainer" object and then use the "shapper" package:
exp_rf <- DALEX::explain(model_rf, data = data_test[,-1], y = data_test[,1])
ive_rf <- shap(exp_rf, new_observation = data_test[1,-1])
To my knowledge, I can only apply the shap function to one observation (the "new_observation" argument).
But I am looking for a way to calculate the Shapley values for all of the respondents in my data file.
I know this is possible with the "shap" package in Python, but is it also possible with the "shapper" package in R?
At the moment I use a loop to calculate the Shapley values for all respondents, but this will take days to run on my entire data file.
shapruns <- NULL  # will collect one column of attributions per respondent
for (i in 1:nrow(data_test)) {
  ive_rf <- shap(exp_rf, new_observation = data_test[i, -1])
  shapruns <- cbind(shapruns, ive_rf[, "_attribution_"])
}
Any help would be much appreciated.
I recently published two R packages that are optimized for this kind of task: "kernelshap" (calculates SHAP values fast) and "shapviz" (plots SHAP values from any source). In your case, a working example would be:
library(randomForest)
library(kernelshap)
library(shapviz)
set.seed(1)
fit <- randomForest(Sepal.Length ~ ., data = iris)
# Step 1: Calculate Kernel SHAP values
# bg_X is usually a small (50-200 rows) subset of the data
s <- kernelshap(fit, iris[-1], bg_X = iris)
# Step 2: Turn them into a shapviz object
sv <- shapviz(s)
# Step 3: Gain insights...
sv_importance(sv, kind = "bee")
sv_dependence(sv, v = "Petal.Length", color_var = "auto")
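If you need the raw values rather than plots (as in your cbind() loop), the matrix of SHAP values can also be pulled out of the result; a small hedged note, assuming the component is called S as in current kernelshap versions:
head(s$S)  # one row of SHAP values per observation, one column per feature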

Summary statistics for weighted values using the ANESRAKE package in R

I have created weighted data for my survey using the anesrake and weights packages. However, I am not sure how to use the weights afterwards, besides the wpct function in the package. How can I compute, say, descriptive stats and integrate the weighted data with other functions/packages?
Reproducible data from the anesrake package:
data("anes04")
anes04$caseid <- 1:length(anes04$age)
anes04$agecats <- cut(anes04$age, c(0, 25,35,45,55,65,99))
levels(anes04$agecats) <- c("age1824", "age2534", "age3544",
"age4554", "age5564", "age6599")
marriedtarget <- c(.4, .6)
agetarg <- c(.10, .15, .17, .23, .22, .13)
names(agetarg) <- c("age1824", "age2534", "age3544",
"age4554", "age5564", "age6599")
targets <- list(marriedtarget, agetarg)
names(targets) <- c("married", "agecats")
outsave <- anesrake(targets, anes04, caseid=anes04$caseid,
verbose=TRUE)
caseweights <- data.frame(cases=outsave$caseid, weights=outsave$weightvec)
This will give me a new vector with weights for the data frame. So, my question is: how can I analyze the data now? How can I incorporate these weights into summary statistics?
You could supply the weights as the weights= argument to survey::svydesign. Ideally, you'd do the raking in the survey package so that you could take account of the variance reductions from raking, but it's pretty standard (at least in public-use data) to analyse raked weights as if they were just sampling weights.
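A minimal sketch of that route (assuming the weights line up with the rows of anes04, as in the caseweights data frame above):
library(survey)
anes04$weights <- caseweights$weights[match(anes04$caseid, caseweights$cases)]
des <- svydesign(ids = ~1, weights = ~weights, data = anes04)
svymean(~age, des)       # weighted mean and standard error
svytable(~married, des)  # weighted frequency table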
Or, if the raking specification you ended up with was simple enough to reproduce in survey::rake or survey::calibrate, you could redo the raking in the survey package.
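A hedged sketch of that second route with survey::rake, building the population margins from the targets above (the factor levels are assumed to match the order of the target vectors, so adjust the labels if they don't):
des0 <- svydesign(ids = ~1, weights = rep(1, nrow(anes04)), data = anes04)
pop.married <- data.frame(married = levels(factor(anes04$married)),
                          Freq = nrow(anes04) * marriedtarget)
pop.age <- data.frame(agecats = levels(anes04$agecats),
                      Freq = nrow(anes04) * agetarg)
des.raked <- rake(des0, sample.margins = list(~married, ~agecats),
                  population.margins = list(pop.married, pop.age))
svymean(~age, des.raked)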
The reason for using the survey package is the very wide range of other analyses it allows (and even more with svyVGAM).

How to obtain coefficients' p-values from a nested random effect model using lmeresampler

I estimated a mixed-effects model with a nested random-effects structure (participants were nested within groups) using the lmer command of the lme4 package.
mixed.model <- lmer(ln.v ~ treatment*level + age + income + (1 | group/participant), data = data)
Then, because of the nested structure, I bootstrapped the model with the bootstrap command from the lmeresampler package, using the semi-parametric bootstrap.
boot.mixed.model <- bootstrap(model = mixed.model, type = "cgr", fn = extractor, B = 10000, resample=c(data$group,data$participant))
I can obtain bootstrapped confidence intervals via boot.ci (package boot), but in addition I want to report the coefficients' p-values. The output of the bootstrapped model boot.mixed.model provides only the bias and the standard error:
Bootstrap Statistics :
original bias std. error
t1* 0.658442415 -7.060056e-02 2.34685668
t2* -0.452128438 -2.755208e-03 0.17041300
…
What is the best way to calculate the p-values based on these values?
I am not familiar with the lmeresampler package, and it seems to have been removed from CRAN due to compatibility issues (failed CRAN checks).
Also, the question does not include data, and extractor is not defined, so the example is not reproducible. However, the output is the same as you would get from the bootMer function in lme4, so I will produce an example using that built-in function.
Basically this follows the example from the help(bootMer) page, but expanded for the specific problem. If the object returned by the lmeresampler package is similar, it will contain the same components used here.
Reproducible example
library(lme4)
data(Dyestuff, package = "lme4")
fm01ML <- lmer(Yield ~ 1|Batch, Dyestuff, REML = FALSE)
Now the bootMer function simply requires a function that outputs a vector of the parameters of interest.
StatFun <- function(merMod) {
  pars <- getME(merMod, c("fixef", "theta", "sigma"))
  c(beta = pars$fixef, theta = unname(pars$theta * pars$sigma), sigma = pars$sigma)  ### <<== Error corrected
}
We can perform our bootstrap with bootMer, which also offers parametric options via its type argument (I suggest reading the details on the help(bootMer) page for more information):
boo01 <- bootMer(fm01ML, StatFun, nsim = 100, seed = 101)
For more precise p-values, I'd advise a number of simulations closer to 1000, but for time reasons that might not be feasible in every circumstance.
Regardless, the output is stored in a matrix t, which we can use to perform a simple Kolmogorov-supremum test:
H0 <- c(0, 0, 0)
Test <- sweep(abs(boo01$t), 2, H0, "-") <= H0 ###<<=== Error corrected
pVals <- colSums(Test)/nrow(Test)
print(pVals)
#output#
beta.(Intercept) theta sigma
0.00 0.12 0.00
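As an aside, since bootMer returns an object inheriting from class "boot", the percentile confidence intervals you already compute with boot.ci carry over unchanged, for example:
library(boot)
boot.ci(boo01, index = 1, type = "perc")  # percentile CI for the first statistic (the fixed effect)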

Plot one of 500 trees in the randomForest package

How can I plot the trees in the output of the randomForest function from the package of the same name in R? For example, I use the iris data and want to plot the first of the 500 output trees. My code is
model <- randomForest(Species ~ ., data = iris, ntree = 500)
You can use the getTree() function in the randomForest package (official guide: https://cran.r-project.org/web/packages/randomForest/randomForest.pdf)
On the iris dataset:
require(randomForest)
data(iris)
## we have a look at the k-th tree in the forest
k <- 10
getTree(randomForest(iris[, -5], iris[, 5], ntree = 10), k, labelVar = TRUE)
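Note that getTree() returns a tabular description of the tree (one row per node) rather than a graphic; with labelVar = TRUE the split variables are shown by name. A quick look, using the model from the question:
tr <- getTree(model, k = 1, labelVar = TRUE)  # the first of the 500 trees
head(tr)  # columns: left/right daughter, split var, split point, status, prediction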
You may use cforest from the party package to plot trees as below. I have hardcoded the number of trees to 5; you may change it as per your requirement.
ntree <- 5
library("party")
cf <- cforest(Species ~ ., data = iris, controls = cforest_control(ntree = ntree))
for (i in 1:ntree) {
  pt <- prettytree(cf@ensemble[[i]], names(cf@data@get("input")))
  nt <- new("Random Forest BinaryTree")
  nt@tree <- pt
  nt@data <- cf@data
  nt@responses <- cf@responses
  pdf(file = paste0("filex", i, ".pdf"))
  plot(nt, type = "simple")
  dev.off()
}
cforest is another implementation of random forests; it can't be said which is better, but in general there are a few differences that we can see. cforest uses conditional inference trees and puts more weight on the terminal nodes, whereas the implementation in the randomForest package gives equal weight to terminal nodes.
In other words, cforest uses a weighted mean while randomForest uses a plain average. You may want to check this.

Weighted Portmanteau Test for Fitted GARCH process

I have fitted a GARCH process to a time series and analyzed the ACF of the squared and absolute residuals to check the model's goodness of fit. But I also want to run a formal test, and after searching the internet, the Weighted Portmanteau Test (originally by Li and Mak) seems to be the one.
It's from the WeightedPortTest package and is one of the few (perhaps the only one?) that properly tests GARCH residuals.
While going through the instructions in various documents, I can't wrap my head around what the "h.t" argument wants. The help in R says I need to supply "a numeric vector of the conditional variances". This may be simple for an experienced user, but I'm struggling to understand. What do I need to do, and preferably, how would I code it in R?
Thankful for any kind of help.
Taken directly from the documentation:
h.t: a numeric vector of the conditional variances
A little toy example using the fGarch package follows; the h.t slot of the fitted object holds exactly those conditional variances, one fitted variance per observation:
library(fGarch)
library(WeightedPortTest)
spec <- garchSpec(model = list(alpha = 0.6, beta = 0))
simGarch11 <- garchSim(spec, n = 300)
fit <- garchFit(formula = ~ garch(1, 0), data = simGarch11)
Weighted.LM.test(fit@residuals, fit@h.t, lag = 10)
And using garch() from the tseries package:
library(tseries)
fit2 <- garch(as.numeric(simGarch11), order = c(0, 1))
summary(fit2)
# comparison of fitted values:
tail(fit2$fitted.values[,1]^2)
tail(fit@h.t)
# comparison of residuals after unstandardizing:
unstd <- fit2$residuals*fit2$fitted.values[,1]
tail(unstd)
tail(fit@residuals)
Weighted.LM.test(unstd, fit2$fitted.values[,1]^2, lag = 10)
