I am building a predictive model with the randomForest package in R. However, I would like the function to report an OOB error measure other than accuracy: specifically, the Gini coefficient (some call it Powerstat). I know how to calculate the Gini coefficient, but the problem is implementing it as the error measure.
Thanks
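As far as I know, randomForest has no hook for a user-defined OOB error function. For a binary classification forest, however, the OOB vote fractions per class are stored in rf$votes, so the Gini coefficient can be computed after fitting. The following is a minimal sketch that assumes a binary response and the common definition Gini = 2 * AUC - 1. gini_coef is a hypothetical helper, not part of randomForest:

```r
# Gini coefficient (Powerstat / accuracy ratio) from scores, via the
# Mann-Whitney AUC; average ranks make positive/negative ties count 1/2
gini_coef <- function(scores, labels, positive) {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(scores)  # average ranks for tied scores
  auc <- (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  2 * auc - 1
}

# Usage with a fitted binary classification forest (not run here):
# library(randomForest)
# rf <- randomForest(y ~ ., data = train)
# gini_coef(rf$votes[, "1"], train$y, positive = "1")  # OOB-based Gini

# Small self-contained check
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
labels <- c(1, 1, 0, 1, 0, 0)
gini_coef(scores, labels, positive = 1)
```

Because the votes are OOB fractions, the resulting Gini is an OOB estimate. It is computed once on the final forest rather than per tree like err.rate.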
I am running an analysis of hospital length of stay based on a number of parameters in R, with one primary exposure. Many of the covariates are missing, most commonly lab values, because these aren't checked for all patients. I've worked out what seems to be a good multiple imputation schema using MICE. Because of imbalance between exposed and unexposed groups, I'm also weighting using propensity scores.
I've managed to run a weighted Poisson model with MICE and WeightThem. However, when I checked the models for overdispersion, the variance appeared to be greater than the mean, which suggests I should be using a quasi-Poisson or negative binomial model instead. I can't find any documentation on negative binomial models with WeightThem or WeightIt in R.
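One common overdispersion check for a Poisson GLM is the Pearson dispersion statistic. This is the sum of squared Pearson residuals divided by the residual degrees of freedom, and values well above 1 suggest overdispersion. The sketch below uses simulated, unweighted data. The variable names mirror the question, but the data and settings are invented for illustration:

```r
set.seed(1)
n <- 500
exposure <- rbinom(n, 1, 0.5)
# Negative binomial counts (size = 2) are overdispersed relative to Poisson
LOS <- rnbinom(n, mu = exp(1.5 + 0.3 * exposure), size = 2)

fit_pois <- glm(LOS ~ exposure, family = poisson())

# Pearson dispersion statistic: roughly 1 under a well-specified Poisson model
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
dispersion

# A quasi-Poisson fit estimates this same factor and scales the
# Poisson standard errors by its square root
fit_qp <- glm(LOS ~ exposure, family = quasipoisson())
summary(fit_qp)$dispersion
```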
Does anyone have experience with this? To run a negative binomial model, I can just use the following code:
results <- with(models, MASS::glm.nb(LOS ~ exposure + covariate1 + covariate2))
in which "models" is the multiply-imputed WeightIt object.
However, according to the WeightIt documentation, when using any glm model you need to run it as a svyglm to get proper standard errors:
results <- with(models, svyglm(LOS ~ exposure + covariate1 + covariate2,
family = poisson()))
There is a function in the sjstats package called svyglm.nb, but it requires a survey design object (its design argument, e.g. one created with survey::svydesign()), or the model won't run. I don't know how to set this up or whether it is even necessary. Is the first version (plain glm.nb) sufficient? Am I thinking about this entirely wrong?
Thanks so much, advice is much appreciated.
This question already has an answer here: How to calculate the OOB of random forest? (1 answer). Closed 4 years ago.
I am new to random forest models and am trying to interpret the outputs of several RF models. The datasets are fairly large (approx. 5,000 rows or more, five predictor variables, all numeric). I fit the models with the R packages randomForest and randomForestSRC, for comparison and better plotting. They seem to run fine and report around 40% of variance explained, but for some reason I cannot get the OOB error. It should appear together with the confusion matrix in the RF summary, but all I get is output like this (output not shown):
The code I am currently running using randomForest package is:
library(randomForest)
rf3 <- randomForest(fishing_hours ~ ., data = data_fish, ntree = 1000, importance = TRUE, do.trace = 100)
When I try to access the OOB error rates with rf3$err.rate[,1], I get NULL or a list of NAs. The plot of rf3 looks like this (plot not shown):
I am doing a regression. Is there any way to obtain error rates, or can you suggest other useful model performance indicators?
Any help much appreciated - happy to share a sample dataset if needed.
The randomForest package only calculates the OOB error rate (err.rate) and a confusion matrix for classification.
For regression problems, the mean squared error is normally used instead, and its OOB estimate is stored in the mse component of the fitted model (e.g. rf3$mse).
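For a regression forest, the OOB summaries are stored in the mse and rsq components, with one value per number of trees grown. No err.rate component exists, which is why rf3$err.rate returns NULL. The sketch below uses the built-in airquality data because the poster's data_fish is not available:

```r
library(randomForest)

set.seed(42)
aq <- na.omit(airquality)
rf <- randomForest(Ozone ~ ., data = aq, ntree = 300)

# err.rate is only created for classification forests
is.null(rf$err.rate)

# OOB mean squared error after all 300 trees
tail(rf$mse, 1)

# OOB pseudo R-squared; print(rf) reports 100 times this as "% Var explained"
tail(rf$rsq, 1)
```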
This answer on CrossValidated might also be helpful:
https://stats.stackexchange.com/questions/305046/best-way-to-evaluate-a-random-forest-model-accuracy-on-continuous-data
I would like to fit a linear discriminant analysis (LDA) model to my weighted data. My data set has one column of weights that are not integers, so I can't just replicate rows. The lda function from the MASS package does not let me pass a vector of observation weights. Do you know how to deal with this? I have also tried the mlr package, but the learner classif.lda still uses the lda implementation from MASS, so I get this error:
Error in checkLearnerBeforeTrain(task, learner, weights) :
Weights vector passed to train, but learner 'classif.lda' does not support that!
Do you know how to solve this problem?
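With real-valued weights, the LDA estimates can be computed directly. You need weighted class means, a weighted pooled within-class covariance and weighted class priors, which are then plugged into the usual linear discriminant rule. The following is a minimal base-R sketch under those assumptions. weighted_lda and predict_weighted_lda are hypothetical helpers, not part of MASS or mlr. The covariance is normalised by the total weight, which is one of several possible conventions:

```r
# Fit LDA with non-negative, possibly non-integer observation weights w
weighted_lda <- function(X, y, w) {
  X <- as.matrix(X)
  y <- factor(y)
  classes <- levels(y)
  # Weighted class means (K x p)
  means <- do.call(rbind, lapply(classes, function(k) {
    idx <- y == k
    colSums(X[idx, , drop = FALSE] * w[idx]) / sum(w[idx])
  }))
  # Weighted pooled within-class covariance (p x p)
  S <- matrix(0, ncol(X), ncol(X))
  for (k in seq_along(classes)) {
    idx <- y == classes[k]
    centred <- sweep(X[idx, , drop = FALSE], 2, means[k, ])
    S <- S + crossprod(centred * sqrt(w[idx]))
  }
  S <- S / sum(w)
  # Weighted class priors
  prior <- as.numeric(tapply(w, y, sum)) / sum(w)
  list(classes = classes, means = means, S = S, prior = prior)
}

# Assign each row to the class with the largest linear discriminant score
predict_weighted_lda <- function(fit, newX) {
  newX <- as.matrix(newX)
  Sinv_mu <- solve(fit$S, t(fit$means))                          # p x K
  const <- -0.5 * colSums(t(fit$means) * Sinv_mu) + log(fit$prior)
  scores <- sweep(newX %*% Sinv_mu, 2, const, "+")               # n x K
  factor(fit$classes[max.col(scores, ties.method = "first")],
         levels = fit$classes)
}

# Usage on iris with arbitrary non-integer weights
set.seed(1)
w <- runif(nrow(iris), 0.5, 2)
fit <- weighted_lda(iris[, 1:4], iris$Species, w)
pred <- predict_weighted_lda(fit, iris[, 1:4])
mean(pred == iris$Species)
```

With integer weights, this gives the same estimates as replicating each row w times. That equivalence is the property you lose with MASS::lda when the weights are not integers.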
I have received AUCs and predictions from a collaborator, generated in Weka. The statistical model behind them was cross-validated, so my dataset with the predictions includes columns for fold, predicted probability and true class. Using this data, I was unable to replicate the given AUCs from the predicted probabilities in R. The values always differ slightly.
Additional details:
Weka was used via GUI, not command line
I checked the AUC in R with packages pROC and ROCR
I first tried calculating the AUC over the collected predictions (without regard to fold) and I got different AUCs
Then I tried calculating the AUCs per fold and averaging. This did also not match.
The model was ridge logistic regression and there is a single tie in the predictions
The first fold has one sample more than the others. I have tried taking a weighted average, but this did not work out either
I have even tested averaging the AUCs after logit-transformation (for normality)
Taking the median instead of the mean did not help either
I am familiar with how the AUC is calculated in R, but I don't see what Weka could do differently.
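When comparing against Weka, it can help to compute both aggregation choices from the same prediction table with one transparent formula. The following base-R sketch uses made-up data. The column names fold, prob and class are assumptions about the layout of the collaborator's file. Ties between a positive and a negative case count one half, which is the usual Mann-Whitney convention. The sketch does not establish how Weka itself handles ties or fold averaging:

```r
# Mann-Whitney AUC; average ranks make positive/negative ties count 1/2
auc_mw <- function(prob, is_pos) {
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(prob)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical prediction table; fold 1 gets one extra row (21 vs 20)
set.seed(7)
preds <- data.frame(
  fold  = rep(1:5, length.out = 101),
  class = rbinom(101, 1, 0.4)
)
preds$prob <- plogis(rnorm(101, mean = 1.5 * preds$class - 0.7))

# (a) pooled AUC over all out-of-fold predictions
auc_pooled <- auc_mw(preds$prob, preds$class == 1)

# (b) per-fold AUCs, then a plain and a fold-size-weighted mean
per_fold <- sapply(split(preds, preds$fold),
                   function(d) auc_mw(d$prob, d$class == 1))
fold_n <- as.numeric(table(preds$fold))
auc_mean <- mean(per_fold)
auc_wmean <- weighted.mean(per_fold, fold_n)

c(pooled = auc_pooled, mean = auc_mean, weighted = auc_wmean)
```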
I'm using the randomForest package in R for prediction. I want to plot the out-of-bag (OOB) errors to see whether I have enough trees, and to tune mtry (the number of variables randomly sampled as candidates at each split). The package seems to compute the OOB errors automatically for classification tasks, but not for regression tasks. Does anyone know whether there is a way to look at the OOB errors for regression tasks?
You can also look directly at the out of bag predictions:
library(randomForest)
data(airquality)
set.seed(131)
ozone.rf <- randomForest(Ozone ~ ., data=airquality, mtry=3,
importance=TRUE, na.action=na.omit)
ozone.rf$predicted
From these, you can also calculate other measures, e.g. the median absolute error.
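For example, OOB residuals can be formed from ozone.rf$predicted and the response stored in ozone.rf$y, which is the response left after na.omit. The following is a short, self-contained sketch that refits the same model:

```r
library(randomForest)

data(airquality)
set.seed(131)
ozone.rf <- randomForest(Ozone ~ ., data = airquality, mtry = 3,
                         importance = TRUE, na.action = na.omit)

# OOB residuals: observed response minus out-of-bag prediction
oob_res <- ozone.rf$y - ozone.rf$predicted

median(abs(oob_res))  # OOB median absolute error
mean(abs(oob_res))    # OOB mean absolute error
mean(oob_res^2)       # OOB MSE; the last element of ozone.rf$mse is the same quantity
tail(ozone.rf$mse, 1)
```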
As said in the comments, the mse component is computed from the OOB predictions; see page 20 of
https://datajobs.com/data-science-repo/Random-Forest-%5bLiaw-and-Weiner%5d.pdf
Hence, the mse object is already an estimate of the OOB mean squared error.
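Element i of mse is the OOB MSE of the forest built from the first i trees. It can therefore be plotted to judge whether enough trees were grown, and its final value can be compared across mtry values in a simple tuning loop. The following sketch uses airquality. The candidate mtry values 1 to 5 are an arbitrary choice, and randomForest also provides tuneRF() for a stepwise OOB-based search:

```r
library(randomForest)

aq <- na.omit(airquality)
x <- aq[, setdiff(names(aq), "Ozone")]  # 5 predictors
y <- aq$Ozone

# Final OOB MSE for each candidate mtry
set.seed(131)
oob_mse <- sapply(1:5, function(m) {
  fit <- randomForest(x, y, mtry = m, ntree = 500)
  tail(fit$mse, 1)
})
names(oob_mse) <- 1:5
oob_mse
best_mtry <- as.integer(names(which.min(oob_mse)))

# OOB MSE against number of trees for the chosen mtry
fit_best <- randomForest(x, y, mtry = best_mtry, ntree = 500)
plot(fit_best$mse, type = "l", xlab = "Number of trees", ylab = "OOB MSE")
```

If the curve has flattened well before the last tree, more trees are unlikely to help. The loop differences between mtry values are themselves noisy, so a different seed may pick a neighbouring value.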