Use plot_model to produce confidence interval plots for multiple models - r

I am using the function plot_model from the sjPlot package to generate a confidence interval plot for a fixed effect linear model (felm). For the individual base model, I encountered no issues, and generated the confidence interval plot effectively. Now, however, I am attempting to do this for multiple similarly-constructed models, but cannot make it work.
An individual model's code is as simple as the following:
plot <- plot_model(model1, show.values = TRUE)
Using the lapply function I generated the multiple models, which are now stored in a list object. However, I cannot find a way to pass this list object (or its individual models) to the plot_model (or plot_models) function. I have received various errors, including this one:
Warning: Could not access model information.
Error in if (fam.info$is_linear) transform <- NULL else transform <- "exp" : argument is of length zero
Is there a way to place multiple similar models from a list into the plot_model function, so that the resulting confidence interval plots can be readily compared?
Update, with sample code and the resulting error:
plots <- plot_models(modellist[[1]], modellist[[2]], show.values = TRUE)
Warning: Could not access model information.
Error: Sorry, `model_parameters()` failed with the following error (possible class 'numeric' not supported): $ operator is invalid for atomic vectors
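One pattern that may be worth trying (a sketch, not a confirmed fix, since felm objects may not be fully supported by sjPlot's model-info backend): splice the list into plot_models() with do.call(), so each list element is passed as its own model argument rather than as a single list object.
# Sketch: pass every model in `modellist` as a separate argument to
# plot_models(), together with the shared options. Assumes each model
# plots correctly on its own with plot_model().
library(sjPlot)
p <- do.call(plot_models, c(modellist, list(show.values = TRUE)))
p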

Issue with including p-value stars using sjPlot tab_model

I am presenting the results of multiple regressions in R using tab_model from sjPlot. The regression tables are produced correctly; however, I am now trying to display the numeric p-values and their corresponding significance stars at the same time. By default the table renders only the numeric values, so I added the argument p.style = "scientific_stars". This generated the following error:
Warning: Could not access model information.
Error in fam.info$is_linear || identical(fam.info$link_function, "identity") : invalid 'x' type in 'x || y'
Unfortunately, due to the nature of the data, I cannot provide the exact data I am working with. However, the models are both basic OLS models produced using the lm function. The two models have the exact same variables, and number of observations.
The code for the function is the following:
tab_model(model_1, model_2, auto.label = TRUE, show.se = TRUE, show.ci = FALSE, p.style = "scientific_stars")
I have tried plotting a single model, and changing the other labels, but the same error is generated.
Is there any way to fix this problem and include both the numeric values and the stars in the rendering?
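One thing that may be worth trying (an assumption based on the sjPlot documentation, not a confirmed fix for this particular error): p.style = "numeric_stars" is the documented option for rendering numeric p-values together with significance stars, and "Could not access model information" errors are often tied to outdated versions of the helper packages sjPlot relies on.
# Sketch: update sjPlot and the packages it uses to read model
# information, then request numeric p-values plus stars
# (see ?tab_model for the available p.style values).
install.packages(c("sjPlot", "insight", "performance"))
library(sjPlot)
tab_model(model_1, model_2,
          auto.label = TRUE,
          show.se = TRUE,
          show.ci = FALSE,
          p.style = "numeric_stars")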

Prediction Intervals for R tidymodels Stacked model from stacks()

Is it possible to calculate prediction intervals from a tidymodels stacked model?
Working through the example from the stacks() package here yields the stacked frog model (which can be downloaded here for reprex) and the testing data:
data("tree_frogs")
tree_frogs <- tree_frogs %>%
  filter(!is.na(latency)) %>%
  select(-c(clutch, hatched))
set.seed(1)
tree_frogs_split <- initial_split(tree_frogs)
tree_frogs_train <- training(tree_frogs_split)
tree_frogs_test <- testing(tree_frogs_split)
I tried to run something like this:
pi <- predict(tree_frogs_model_st, tree_frogs_test, type = "pred_int")
but this gives an error:
Error in UseMethod("stack_predict") : no applicable method for 'stack_predict' applied to an object of class "NULL"
Reading the documentation of stacks() I also tried passing "pred_int" in the opts list:
pi <- predict(tree_frogs_model_st, tree_frogs_test, opts = list(type = "pred_int"))
but this just gives: opts is only used with type = raw and was ignored.
For reference, I am trying to do something similar to what is done in Ch. 19 of the Tidy Modeling with R book:
lm_fit <- fit(lm_wflow, data = Chicago_train)
predict(lm_fit, Chicago_test, type = "pred_int")
which seems to work fine for a single model fit like lm_fit, but apparently not for a stacked model?
Am I missing something? Is it not possible to calculate prediction intervals for stacked models for some reason?
This is very difficult to do.
Even if glmnet produced a prediction interval, it would be a significant underestimate since it doesn’t know anything about the error in each of the ensemble members.
We would have to get the standard error of prediction from all of the models to compute it for the stacking model. A lot of these models don’t/can’t generate that standard error.
The alternative is to use bootstrapping to get the interval, but you would have to bootstrap each model a large number of times to get the overall prediction interval.
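For illustration, here is a rough sketch of the bootstrap route (an outline, not a definitive method): refit the entire stack on bootstrap resamples of the training data and take quantiles of the resulting test-set predictions. fit_stack() is a hypothetical helper that would wrap the full stacking workflow (fit candidates, blend_predictions(), fit_members()), and note this captures only model uncertainty, not the residual noise a true prediction interval also needs.
library(rsample)
library(purrr)
set.seed(1)
boots <- bootstraps(tree_frogs_train, times = 100)
# fit_stack() is hypothetical: rerun the whole stacking pipeline on
# one bootstrap resample and return the fitted stack.
boot_preds <- map(boots$splits, function(split) {
  st <- fit_stack(analysis(split))
  predict(st, tree_frogs_test)$.pred
})
# Percentile interval across the bootstrap refits (model uncertainty
# only; it ignores the irreducible error discussed above).
pred_mat <- do.call(cbind, boot_preds)
pi_lower <- apply(pred_mat, 1, quantile, probs = 0.025)
pi_upper <- apply(pred_mat, 1, quantile, probs = 0.975)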

How to create partial dependence plots for multinomial gbm?

I am trying to create partial dependence plots for my multinomial gbm predictions, but I haven't been able to figure out how to produce the correct plots: the ones I am getting have a single line instead of a line for every level of my response variable (in my case, 3 different species names). I have seen several examples, but they require objects created with other packages (not gbm objects), and most of the examples don't include multinomial variables.
Loading the gbm fit:
gbm.fit.final <- readRDS(file = "gbm_fit_final1_organism.rds")
Getting the variable importance table:
summary.gbm <- summary(
  gbm.fit.final,
  cBars = 10,
  method = relative.influence,
  las = 2
)
The table looks like this:
var rel.inf
MA0356.1 22.641689
MA1071.1 21.707397
MA0311.1 16.010605
MA0210.1 7.249431
MA0271.1 4.958186
I used the following code to generate the partial dependence plot for the most important predictor variable:
gbm.fit.final %>%
  partial(pred.var = "MA0356.1", n.trees = gbm.fit.final$n.trees, grid.resolution = 100, prob = TRUE) %>%
  autoplot(rug = TRUE, train = motifs_train.100) +
  scale_y_continuous()
motifs_train.100 is the training data that I used to create the gbm fit (gbm.fit.final); I am not sure if it is necessary to pass the training data.
I got the following plot:
[plot with a single line]
I would like to get a plot like this one (I think I need to get marginal probabilities):
[plot with a line for each level of the response variable]
I am very new to the gbm package. I don't know if there is an argument of partial that I am omitting, or if there is a better function to do this. Any help would be appreciated. Thanks!
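One approach that may work (an assumption based on the pdp documentation, untested against this model): pdp::partial() accepts a which.class argument for multiclass models, so you can compute the partial dependence once per response level and combine the results into a single plot. The species names below are hypothetical placeholders.
library(pdp)
library(purrr)
library(dplyr)
library(ggplot2)
classes <- c("species_A", "species_B", "species_C")  # hypothetical level names
pd_all <- map_dfr(seq_along(classes), function(i) {
  # which.class selects the response level; prob = TRUE returns
  # the probability scale rather than the centered logit
  partial(gbm.fit.final,
          pred.var = "MA0356.1",
          n.trees = gbm.fit.final$n.trees,
          grid.resolution = 100,
          prob = TRUE,
          which.class = i,
          train = motifs_train.100) %>%
    mutate(class = classes[i])
})
ggplot(pd_all, aes(MA0356.1, yhat, colour = class)) +
  geom_line()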

PCA in R using the caret package vs prcomp PCA

I have a dataframe data with more than 50 variables and I am trying to do a PCA in R using the caret package.
library(caret)
library(e1071)
trans <- preProcess(data, method = c("YeoJohnson", "center", "scale", "pca"))
If I understand this code correctly, it applies a Yeo-Johnson transformation (because data has zeros in it), standardises the data, and then applies PCA (by default, the function keeps only the PCs that are necessary to explain at least 95% of the variability in the data).
However, when I use the prcomp command,
model <- prcomp(data, scale = TRUE)
I can get more outputs, like printing the summary or doing plot(model, type = "l"), which I am not able to do with trans. Does anyone know if there are any functions in the caret package producing the same outputs as prcomp?
You can access the principal components themselves with the predict function.
df <- predict(trans, data)
summary(df)
You won't have exactly the same output as with prcomp: while caret uses prcomp(), it discards the original prcomp class object and does not return it.
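If you want prcomp-style output while keeping caret's Yeo-Johnson step, one simple sketch is to run the transformation and scaling through preProcess() first, then call prcomp() on the transformed data yourself:
# Sketch: transform with caret, then do the PCA with prcomp() so that
# summary() and plot() work as usual.
trans_yj <- preProcess(data, method = c("YeoJohnson", "center", "scale"))
data_yj <- predict(trans_yj, data)
pca_fit <- prcomp(data_yj)   # data already centered and scaled above
summary(pca_fit)             # variance explained per component
plot(pca_fit, type = "l")    # scree plot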

How to call randomForest predict for use with ROCR?

I am having a hard time understanding how to build a ROC curve, and I have now come to the conclusion that maybe I am not creating the model correctly. I am running a randomForest model on a dataset where the class attribute "y_n" is 0 or 1. I have divided the data into bank_training and bank_testing for prediction.
Here are the steps I take:
bankrf <- randomForest(y_n ~ ., data = bank_training, mtry = 4, ntree = 2,
                       keep.forest = TRUE, importance = TRUE)
bankrf.pred <- predict(bankrf, bank_testing, type = "response",
                       predict.all = TRUE, norm.votes = TRUE)
Is what I have done so far correct? The bankrf.pred object that is created is a list with two components named aggregate and individual. I don't understand where these two names came from. Moreover, when I run:
summary(bankrf.pred)
           Length Class  Mode
aggregate  22606  factor numeric
individual 45212  -none- character
What does this summary mean? The datasets (training & testing) are 22605 and 22606 rows long, respectively. If someone can explain to me what is happening, I would be very grateful. I think there is something wrong in all this.
When I try to create the ROC curve with ROCR, I use the following code:
library(ROCR)
pred <- prediction(bank_testing$y_n, bankrf.pred$c(0,1))
Error in is.data.frame(labels) : attempt to apply non-function
Is it just a mistake in the way I try to create the ROC curve, or does the problem start with randomForest?
The documentation for the function you are attempting to use includes this description of its two main arguments:
predictions: A vector, matrix, list, or data frame containing the predictions.
labels: A vector, matrix, list, or data frame containing the true class labels. Must have the same dimensions as 'predictions'.
You are currently passing the variable y_n to the predictions argument, and what looks to me like nonsense to the labels argument.
The predictions will be stored in the output of the random forest model. As documented at ?predict.randomForest, it will be a list with two components. aggregate will contain the predicted values for the entire forest, while individual will contain the predicted values for each individual tree.
So you probably want to do something like this:
pred <- prediction(bankrf.pred$aggregate, bank_testing$y_n)
See how that works? The predicted values are passed to the predictions argument, while the "labels", or true values, are passed to the labels argument.
You should drop the predict.all=TRUE argument from predict if you simply want the predicted classes. By using predict.all=TRUE you are telling the function to keep the predictions of all trees rather than only the aggregated prediction from the forest.
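Putting the pieces together, a minimal sketch of the full ROC workflow might look like this. It assumes y_n is a factor with levels "0" and "1" (so the probability column is named "1"), and it uses a more realistic number of trees than ntree = 2, since ROC curves need reasonably smooth scores.
library(randomForest)
library(ROCR)
bankrf <- randomForest(y_n ~ ., data = bank_training,
                       mtry = 4, ntree = 500, importance = TRUE)
# type = "prob" returns a matrix of class probabilities; column "1"
# is assumed to be the positive class here
probs <- predict(bankrf, bank_testing, type = "prob")[, "1"]
pred <- prediction(probs, bank_testing$y_n)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf)  # ROC curve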
