How to create partial dependence plots for multinomial gbm? - r

I am trying to create partial dependence plots for my multinomial gbm predictions, but I haven't been able to figure out how to produce the correct plots. The ones I am getting have a single line instead of one line for every level of my response variable (in my case, 3 different species names). I have seen several examples, but they require objects created with other packages (not gbm objects), and most of them don't include multinomial responses.
gbm fit
gbm.fit.final<-readRDS(file = "gbm_fit_final1_organism.rds")
getting table with variable importance
summary.gbm<-summary(
gbm.fit.final,
cBars = 10,
method = relative.influence,
las = 2)
The table looks like this:
var rel.inf
MA0356.1 22.641689
MA1071.1 21.707397
MA0311.1 16.010605
MA0210.1 7.249431
MA0271.1 4.958186
I used the following code to generate the partial dependence plot for the most important predictor variable:
gbm.fit.final %>%
partial(pred.var = "MA0356.1", n.trees = gbm.fit.final$n.trees, grid.resolution = 100, prob=T) %>%
autoplot(rug = TRUE, train = motifs_train.100) +
scale_y_continuous()
motifs_train.100 is the training data that I used to create the gbm fit (gbm.fit.final). I am not sure whether it is necessary to supply the training data.
I got the following plot:
plot with single line
I would like to get a plot like this one (I think I need to get marginal probabilities):
plot with a line for each level of response variable
I am very new to the gbm package. I don't know if there is an argument of the partial function that I am omitting, or if there is a better function to do this. Any help would be appreciated. Thanks!
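One thing that may be worth checking is the which.class argument of pdp::partial(): for a classification model it selects the class whose prediction is averaged, so a single call gives a single line. Computing one curve per class and stacking the results should give one line per level. The following is only a minimal sketch under assumptions: iris stands in for motifs_train.100, a freshly fitted model stands in for gbm.fit.final, and Petal.Length stands in for MA0356.1. Recent gbm releases may also emit a warning about the multinomial distribution.

```r
library(gbm)
library(pdp)
library(ggplot2)

set.seed(1)
# Stand-in model (assumption): iris replaces motifs_train.100,
# Species is the 3-level response
fit <- gbm(Species ~ ., data = iris, distribution = "multinomial",
           n.trees = 100, interaction.depth = 2, shrinkage = 0.1)

classes <- levels(iris$Species)

# One partial dependence curve per class, on the probability scale
pd <- do.call(rbind, lapply(seq_along(classes), function(k) {
  out <- partial(fit, pred.var = "Petal.Length", n.trees = fit$n.trees,
                 which.class = k, prob = TRUE, train = iris,
                 grid.resolution = 50)
  data.frame(Petal.Length = out$Petal.Length,
             yhat = out$yhat,
             class = classes[k])
}))

# One line per level of the response
p <- ggplot(pd, aes(x = Petal.Length, y = yhat, colour = class)) +
  geom_line() +
  labs(y = "Partial dependence (probability)")
print(p)
```

With the objects from the question, the same loop would use gbm.fit.final, pred.var = "MA0356.1" and train = motifs_train.100.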

Related

How can I display all my model predicted values using whisker plots?

I'm working with a linear mixed model with sex and diel (day/night) as my predictors and depth displacement as my response in R. Here is the model:
displacement_lmm_hour <- lmer(Displacement~sex*Light + (1|Hour), data = avg_depth_df_hour)
I want to create a whisker plot displaying each predicted value for each of my predictors from the model. So, I tried using dwplot from the dotwhisker package in R.
dwplot(displacement_lmm_hour, effects = "fixed")
This is what it came out with:
As you can see, it only shows the first 'sets' (if you will) of predicted values, i.e. there are no male or daytime values shown. I realize this comes from the model itself, and the summary() table of the model only shows those as well. But how can I show the 'hidden' predicted values that also come from the model?
I also tried using plot_model, which allowed me to separate my predicted values, but I don't think the error bars are correct (which is why I tried the whisker plots instead):
plot_model(displacement_lmm_hour, type = "pred", terms = c("sex","Light"), axis.title = c("Sex", "Displacement"))
Do you have an idea how to accomplish this using the dwplot function? Or another way to accomplish this in general?
Thanks!

GAM residuals missing in plot

I am applying a GAM model to my data: cell abundance over time.
The model works just fine (although I am aware of a pattern in my residuals, but that is a different issue not relevant here).
It just fails to display the partial residuals in the final plot, although I set residuals = TRUE. Here is my output:
https://i.stack.imgur.com/C1MlY.png
I used the mgcv package.
Previously this code worked as I wanted, but on different data. Any ideas on why it is not working are welcome!
GAM_EA <- mgcv::gam(EUB_FISH ~ s(Day, by = Heatwave), data = HnH, method = "REML")
gam.check(GAM_EA) #Checking the model
mgcv::anova.gam(GAM_EA) #Retrieving the statistical results. See ?anova.gam
summary(GAM_EA) #Dispatches to summary.gam
plot(GAM_EA, shift = coef(GAM_EA)[1], residuals = TRUE)
See the by.resids argument in ?plot.gam. The way partial residuals are used in plot.gam would be meaningless for factor by terms unless you were to subset the partial residuals and plot only the residuals for observations in the specific level of the by factor.
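As a rough illustration of that argument, here is a minimal sketch on simulated data. The data frame, its column names, and the model below are stand-ins for HnH and GAM_EA, not the original data. Setting by.resids = TRUE asks plot.gam to draw partial residuals for smooths with by variables, with the caveat above that every panel then shows residuals for all observations rather than only those in its own factor level.

```r
library(mgcv)

set.seed(1)
n <- 200
# Simulated stand-in for HnH (assumption): one smooth of Day per Heatwave level
dat <- data.frame(
  Day = runif(n, 0, 30),
  Heatwave = factor(sample(c("control", "heatwave"), n, replace = TRUE))
)
dat$y <- ifelse(dat$Heatwave == "heatwave",
                sin(dat$Day / 5), cos(dat$Day / 5)) + rnorm(n, sd = 0.3)

fit <- gam(y ~ Heatwave + s(Day, by = Heatwave), data = dat, method = "REML")

# Partial residuals are skipped for factor-by smooths unless by.resids = TRUE
plot(fit, residuals = TRUE, by.resids = TRUE, pages = 1)
```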

Prediction Intervals for R tidymodels Stacked model from stacks()

Is it possible to calculate prediction intervals from a tidymodels stacked model?
Working through the example from the stacks package here yields the stacked frog model (which can be downloaded here for reprex) and the testing data:
data("tree_frogs")
tree_frogs <- tree_frogs %>%
filter(!is.na(latency)) %>%
select(-c(clutch, hatched))
set.seed(1)
tree_frogs_split <- initial_split(tree_frogs)
tree_frogs_train <- training(tree_frogs_split)
tree_frogs_test <- testing(tree_frogs_split)
I tried to run something like this:
pi <- predict(tree_frogs_model_st, tree_frogs_test, type = "pred_int")
but this gives an error:
Error in UseMethod("stack_predict") : no applicable method for 'stack_predict' applied to an object of class "NULL"
Reading the documentation of the stacks package, I also tried passing "pred_int" in the opts list:
pi <- predict(tree_frogs_model_st, tree_frogs_test, opts = list(type = "pred_int"))
but this just gives: opts is only used with type = raw and was ignored.
For reference, I am trying to do something similar to what is done in Ch. 19 of the Tidy Modeling with R book:
lm_fit <- fit(lm_wflow, data = Chicago_train)
predict(lm_fit, Chicago_test, type = "pred_int")
which seems to work fine for a single model fit like lm_fit, but apparently not for a stacked model?
Am I missing something? Is it not possible to calculate prediction intervals for stacked models for some reason?
This is very difficult to do.
Even if glmnet produced a prediction interval, it would be a significant underestimate since it doesn’t know anything about the error in each of the ensemble members.
We would have to get the standard error of prediction from all of the models to compute it for the stacking model. A lot of these models don’t/can’t generate that standard error.
The alternative is to use bootstrapping to get the interval, but you would have to bootstrap each model a large number of times to get the overall prediction interval.
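To make the bootstrap idea concrete, here is a generic sketch in base R. It is not a stacks feature: fit_fun is a hypothetical placeholder for refitting the whole ensemble (every member plus the stacking step) on a resample, and a plain lm on mtcars stands in for it so the code runs quickly. Each replicate adds a resampled residual to the prediction so that the interval reflects observation noise as well as model uncertainty.

```r
set.seed(1)
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Hypothetical stand-in for refitting the full stacked ensemble
fit_fun <- function(d) lm(mpg ~ wt + hp, data = d)

B <- 500
sims <- replicate(B, {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  fit <- fit_fun(boot)
  # Bootstrap prediction plus a resampled residual for observation noise
  predict(fit, newdata = test) +
    sample(residuals(fit), nrow(test), replace = TRUE)
})

# sims has one row per test observation and one column per replicate
pred_int <- data.frame(
  .pred_lower = apply(sims, 1, quantile, probs = 0.025),
  .pred_upper = apply(sims, 1, quantile, probs = 0.975)
)
pred_int
```

For a real stack, every call to fit_fun would mean retraining all candidate members, which is why this approach is expensive.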

Extract values used to make plot for parametric component of GAM in R

I have fitted a GAM that includes both continuous smooth terms and a categorical variable. I have plotted the model (mod) using plot(mod,residuals=T,all.terms=T,pages=1). This produces plots of the two smooth terms as well as the parametric term. I want to extract the values used to make these plots so I can redo them and make them look nicer. If I save the plot in an object, this gives me everything I need for the smooth terms, but doesn't contain any information about the parametric component: plot.mod=plot(mod,residuals=T,all.terms=T,select=0). But I can't see where the numbers are coming from for the default plotting of the parametric component. Is there a way to extract these as well?
Here is a reproducible example of what I have done so far
library(mgcv)
# create some data
data=data.frame(response=c(10,12,8,9,3,4,5,5,4,5,4,5,4,1),pred1=c(9,8,8,9,6,7,6,4,3,4,2,3,3,1),pred2=as.factor(c("A","C","B","B","A","A","C","B","C","A","C","B","A","B")),pred3=c(1,6,3,4,8,6,4,5,7,10,11,3,12,1))
# run the GAM
mod <- gam(response ~ s(pred1,k=8) + pred2 + s(pred3,k=5), data=data, family=gaussian(), method="REML")
# the default plot
plot(mod,residuals=T,all.terms=T,pages=1)
# save values in an object. But this only saves the smooth terms.
plot.mod=plot(mod,residuals=T,all.terms=T,select=0)
# How can I extract the values used to plot the parametric term?
The plot I'm trying to extract the data to make:
From the plot.gam documentation, termplot is used for the parametric terms, so
plot.para <- termplot(mod, se = TRUE, plot = FALSE)
saves that plot to a list.
The format is different than the others, but the data is there.
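A related route, since termplot is itself built on predict(type = "terms"), is to pull the per-level contribution and its standard error straight from predict. This is only a sketch of that alternative, reusing the data and model from the question; the para data frame and its column names are made up for illustration.

```r
library(mgcv)

data <- data.frame(
  response = c(10, 12, 8, 9, 3, 4, 5, 5, 4, 5, 4, 5, 4, 1),
  pred1 = c(9, 8, 8, 9, 6, 7, 6, 4, 3, 4, 2, 3, 3, 1),
  pred2 = as.factor(c("A", "C", "B", "B", "A", "A", "C",
                      "B", "C", "A", "C", "B", "A", "B")),
  pred3 = c(1, 6, 3, 4, 8, 6, 4, 5, 7, 10, 11, 3, 12, 1)
)

mod <- gam(response ~ s(pred1, k = 8) + pred2 + s(pred3, k = 5),
           data = data, family = gaussian(), method = "REML")

# Term-wise contributions and standard errors for every observation
pt <- predict(mod, type = "terms", se.fit = TRUE)

# Keep one row per level of the factor
keep <- !duplicated(data$pred2)
para <- data.frame(
  level  = data$pred2[keep],
  effect = pt$fit[keep, "pred2"],
  se     = pt$se.fit[keep, "pred2"]
)
para[order(para$level), ]
```

From here the levels can be redrawn with any plotting package, for example as points with error bars at effect plus or minus a multiple of se.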

Trying to plot a model line in R but it has too many dimensions for abline

I have a binomial GLM fitted with proportions as the response variable and multiple predictor variables:
glm(formula = GalFailNo3 ~ treatment * diameter1 * foundress,
family = quasibinomial, data = FigDatNo3)
I want to be able to plot the regression line, but I am not sure how, as I have only ever plotted simple lm models using abline. The data is available below as a CSV:
https://drive.google.com/file/d/0B4KXwQhH5kwQZVNtc0tieXBaSE0/view?usp=sharing
The problem is that this model has 8 dimensions and I haven't got a clue where to start, e.g. what function or package would be useful.
If there is any other info I need to present, please let me know.
I have tried to use the predict function with some dummy values, but it doesn't appear to work.
Or maybe there is another way to visualise models that I don't know about?
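One common way to draw fitted lines for a model like this is to predict over a grid of predictor values and plot one curve per combination of the categorical predictors. The sketch below uses simulated data because the CSV is not reproduced here. It assumes that treatment and foundress are factors and diameter1 is numeric, and the prop column and the 20 trials per observation are made up.

```r
set.seed(1)
n <- 120
# Simulated stand-in for FigDatNo3 (assumption)
dat <- data.frame(
  treatment = factor(sample(c("A", "B"), n, replace = TRUE)),
  diameter1 = runif(n, 10, 30),
  foundress = factor(sample(1:2, n, replace = TRUE))
)
eta <- -2 + 0.1 * dat$diameter1 + 0.5 * (dat$treatment == "B")
dat$prop <- rbinom(n, size = 20, prob = plogis(eta)) / 20

fit <- glm(prop ~ treatment * diameter1 * foundress,
           family = quasibinomial, weights = rep(20, n), data = dat)

# Grid covering every factor combination across the diameter range
grid <- expand.grid(
  treatment = levels(dat$treatment),
  diameter1 = seq(10, 30, length.out = 50),
  foundress = levels(dat$foundress)
)
grid$fit <- predict(fit, newdata = grid, type = "response")

# Raw proportions, then one fitted curve per treatment x foundress
plot(prop ~ diameter1, data = dat,
     col = as.integer(dat$treatment), pch = as.integer(dat$foundress))
for (tr in levels(dat$treatment)) {
  for (fo in levels(dat$foundress)) {
    g <- grid[grid$treatment == tr & grid$foundress == fo, ]
    lines(g$diameter1, g$fit,
          col = match(tr, levels(dat$treatment)),
          lty = match(fo, levels(dat$foundress)))
  }
}
```

Using type = "response" returns predictions on the proportion scale, so the curves can be overlaid directly on the observed proportions.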
