How to back-transform scaled continuous predictors in glmmTMB with family = nbinom2(link = "log")?

I would like to be able to plot and interpret the following model
mod2Lano <- glmmTMB(Npasses ~ MoonSc + WindSpeedSc + LightTreatment + DistanceParkSc +
                      DistanceWaterSc + hundm_TreeSc + Hund_LightSc + (1 | Location),
                    data = LanoData, family = nbinom2(link = "log"))
after applying the following transformation to each continuous predictor (note scale = FALSE, so the predictors are centered but not divided by their standard deviations):
LanoData$hundm_TreeSc <- scale(LanoData$hundm_Tree, center = TRUE, scale = FALSE)
I did not find any information online on how to proceed. Do you have any advice?
Thank you,
M
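A minimal sketch of one way to do this, assuming the predictors were only centered (scale = FALSE) as above: scale() stores the centering constant as an attribute of the column, so it can be added back when constructing the x-axis of a prediction plot. The grid below is illustrative, not code from the question.
# scale() keeps the centering constant as an attribute of the column
ctr <- attr(LanoData$hundm_TreeSc, "scaled:center")
# an illustrative grid on the centered scale
grid_sc <- seq(min(LanoData$hundm_TreeSc), max(LanoData$hundm_TreeSc),
               length.out = 100)
# back-transform to the original units for the plot axis
grid_orig <- grid_sc + ctr
# predictions on the count scale can then be drawn against grid_orig, e.g. via
# predict(mod2Lano, newdata = <grid data frame>, type = "response")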

Related

Can't pull out any simulated p-value in "impacts" function of spatialreg package

I have a problem: I cannot pull out the simulated p-values even though I have supplied the arguments below. According to the related [R documentation][1] and [RPubs tutorial][2], setting the zstats argument to TRUE should show the simulated SEs, z-statistics, and p-values. However, when I run my code, no simulated SE, z-statistics, or p-values appear in my results, even though there is no error message. Below is my code:
test2 <- spml(default_risk ~ wui + islamic + wui_islamic + wui_3ma + wui_3ma_islamic +
                wui_2ma + wui_2ma_islamic + asset_growth + loan_share + liquidity +
                cost_to_income + inflation + gdp_growth,
              panelsar, listw = listmatrix_spatreg, model = "pooling",
              effect = "time", lag = TRUE, spatial.error = "none")
summary(test2)
spillover <- impacts(test2, listmatrix_spatreg, time = 8, zstats = TRUE)
summary(spillover)
Thank you for your attention
[1]: https://r-spatial.github.io/spatialreg/reference/impacts.html
[2]: https://rpubs.com/quarcs-lab/tutorial-spatial-regression
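A hedged note on the likely cause (not tested against this exact model): in spatialreg, and in the impacts methods that splm borrows from it, zstats is an argument of summary() on the impacts object rather than of impacts() itself, so in the call above it is silently swallowed. A sketch of the adjusted calls:
# compute the impacts first, then request simulated z-stats/p-values in summary()
spillover <- impacts(test2, listmatrix_spatreg, time = 8)
summary(spillover, zstats = TRUE, short = TRUE)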

mlr3: obtaining response (predicted survival time) from surv.gbm

surv.gbm in the mlr3 framework outputs linear predictors; however, what I'm really interested in are predicted survival times per case, which I want to compare with the actual survival times. Is there a way to obtain predicted survival times?
In the mlr3 book, there is an example of a transformation between linear predictors and a distribution.
pod = po("distrcompose", param_vals = list(form = "ph", overwrite = FALSE))
prediction = pod$predict(list(base = prediction_distr, pred = prediction_lp))$output
Is there a way to change this pipeline so that it converts "lp" to "response"?
Any help would be appreciated.
Yes, this is definitely possible; it just requires another transformation. Your first step is correct: compose a distribution from the linear predictor. As you're using surv.gbm, Cox PH is the only possible underlying model, so the default for distrcompose works here.
Now you need to use crankcompose to create a survival-time prediction from that distribution. You could use the mean, median, or mode of the distribution; people usually pick the mean or median, but that's your choice! Just make sure to include response = TRUE, overwrite = FALSE. Example code is below, including creating predictions and scoring with RMSE (surprisingly quite good!). I think the book may need updating...
Thanks,
Raphael
library(mlr3extralearners)
library(mlr3proba)
library(mlr3pipelines)
library(mlr3)
learn = ppl("crankcompositor", ppl("distrcompositor", lrn("surv.gbm")),
response = TRUE, overwrite = FALSE, method = "mean",
graph_learner = TRUE)
set.seed(1)
task = tgen("simsurv")$generate(50)
learn$train(task)
p = learn$predict(task)
p$score(msr("surv.rmse"))

R One-Class SVM - Get Probabilistic outputs

I am trying to find a way to derive probabilistic outputs when predicting from a one-class SVM in R. I know this is not supported in libsvm, and I also know this question has been asked before, here on SO a couple of years ago, but packages were not available at that time. I'm hoping things have changed now! This question is also still valid, as no approach implemented in R was given as a solution.
I could not find a package to do this, so I tried two approaches myself:
Get the decision values and transform them using the sigmoid function. This is described in this paper. Note the paragraph:
"Furthermore, SVMs can also produce class probabilities as output instead of class labels. This can be done by an improved implementation (Lin, Lin, and Weng 2001) of Platt's a posteriori probabilities (Platt 2000), where a sigmoid function is fitted to the decision values f of the binary SVM classifiers, A and B being estimated by minimizing the negative log-likelihood function."
Use a logistic regression function on the predicted output and derive the probabilities from it. This approach was first described by Platt, and an approach is outlined here.
My problem is checking whether either of my two solutions is plausible. I tested both approaches on a two-class SVM problem, since e1071 (using libsvm) gives probabilities for two-class problems, so these were taken as the 'truth'. I found that neither of my approaches aligned closely with libsvm.
Here are three graphs showing the resulting probabilities versus the known decision values.
[Image: probabilities vs. decision values for the libsvm probabilities ("Actual"), the Platt implementation, and the sigmoid transformation]
I think my Platt approach is theoretically more sound, but as can be seen from the graph, the logistic regression was somehow too good: the probabilities associated with either classification are extremely close to 1 for positives and 0 for negatives.
My code for the Platt implementation is
platt_scale <- function(oc_svm, X){
  # Get SVM predictions
  y_pred <- predict(oc_svm$best.model, X)
  #y_pred <- as.factor(ifelse(y_pred==T,"pos","neg"))
  # Train using logistic regression with cross-validation
  require(caret)
  model <- train(x = X,
                 y = y_pred,
                 method = "glm",
                 family = binomial(),
                 trControl = trainControl(method = "cv", number = 5),
                 control = list(maxit = 50))  # brought in to stop warning messages
  return(predict(model, newdata = X, type = "prob")[, 1])
}
I get the following warning when this runs
glm.fit: fitted probabilities numerically 0 or 1 occurred
So I am clearly doing something wrong! I feel like fixing this function is probably the best approach, but I don't see where I have gone wrong. I am following the approach I mentioned earlier, here.
I get the sigmoid of the decision values as follows
sig_mult <- e1071::sigmoid(decision_values)
The examples were done using the iris dataset; the full code is here:
data(iris)
two_class <- iris[iris$Species %in% c("setosa", "versicolor"), ]

# Fit a two-class SVM
svm_mult <- e1071::tune(svm,
                        train.x = two_class[, 1:4],
                        train.y = factor(two_class[, 5], levels = c("setosa", "versicolor")),
                        type = "C-classification",
                        kernel = "radial",
                        gamma = 0.05,
                        cost = 1,
                        probability = TRUE,
                        tunecontrol = tune.control(cross = 5))

# Get the decision values
dec_vals_mult <- attr(predict(svm_mult$best.model,
                              two_class[, 1:4],
                              decision.values = TRUE),
                      "decision.values")

# Get the corresponding libsvm probabilities
prob_mult <- attr(predict(svm_mult$best.model,
                          two_class[, 1:4],
                          probability = TRUE),
                  "probabilities")[, 1]

# Transform the decision values using the sigmoid
sig_mult <- e1071::sigmoid(dec_vals_mult)

# Use the Platt implementation function to derive probabilities
platt_imp <- platt_scale(svm_mult, two_class[, 1:4])
require(ggplot2)
data2 <- as.data.frame(cbind(dec_vals_mult, sig_mult))
names(data2) <- c("Decision.Values", "Sigmoid.Decision.Values(Prob)")
sig <- ggplot(data = data2, aes(x = Decision.Values,
                                y = `Sigmoid.Decision.Values(Prob)`,
                                colour = ifelse(Decision.Values < 0, "neg", "pos"))) +
  geom_point() +
  ylim(0, 1) +
  theme(legend.position = "none")

data3 <- as.data.frame(cbind(dec_vals_mult, prob_mult))
names(data3) <- c("Decision.Values", "Probabilities")
actual <- ggplot(data = data3, aes(x = Decision.Values,
                                   y = Probabilities,
                                   colour = ifelse(Decision.Values < 0, "neg", "pos"))) +
  geom_point() +
  ylim(0, 1) +
  theme(legend.position = "none")

data4 <- as.data.frame(cbind(dec_vals_mult, platt_imp))
names(data4) <- c("Decision.Values", "Platt")
plat_imp <- ggplot(data = data4, aes(x = Decision.Values,
                                     y = Platt,
                                     colour = ifelse(Decision.Values < 0, "neg", "pos"))) +
  geom_point() +
  ylim(0, 1)

require(ggpubr)
ggarrange(actual, plat_imp, sig,
          labels = c("Actual", "Platt Implementation", "Sigmoid Transformation"),
          ncol = 3,
          label.x = -.05,
          label.y = 1.001,
          font.label = list(size = 8.5, color = "black", face = "bold", family = NULL),
          common.legend = TRUE, legend = "bottom")
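For reference, textbook Platt scaling fits the sigmoid to the decision values against the true labels, not to the SVM's own predicted labels; training the logistic regression on y_pred, as platt_scale() above does, lets the model reproduce its own labels almost perfectly, which is what drives the probabilities toward 0 and 1. A minimal sketch of the label-based fit, using the objects above (variable names are mine; note that on a perfectly separable toy set like setosa vs. versicolor the separation warning will still appear):
# Platt scaling sketch: regress the TRUE labels on the decision values
platt_df <- data.frame(dec = as.numeric(dec_vals_mult),
                       y = droplevels(two_class$Species))
platt_glm <- glm(y ~ dec, family = binomial(), data = platt_df)
platt_probs <- predict(platt_glm, type = "response")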

How to use covariates in rddtools rdd_reg_lm function?

I am trying to run a parametric RD regression using the rddtools R package. However, the package documentation is not very clear to me.
First: the function to define an RD object is:
rdd_data(y, x, covar, cutpoint, z, labels, data)
where covar, in the help file, means only "Exogeneous variables". But what type? A data frame? A list?
Second: The function rdd_reg_lm again demands informing covariates in this way:
rdd_reg_lm(rdd_object, covariates = NULL, order = 1, bw = NULL,
           slope = c("separate", "same"),
           covar.opt = list(strategy = c("include", "residual"),
                            slope = c("same", "separate"), bw = NULL),
           covar.strat = c("include", "residual"), weights)
where, according to the help file, the covariates argument is simply a "Formula to include covariates". Again, it is not clear to me what exactly the correct way of supplying these covariates is.
Moreover, is it possible to include multiple covariates in rdd_data() and rdd_reg_lm()?
I would appreciate some help here. I have read the help and vignette files again and again and searched many blogs, and still nothing.
I have already checked this topic below
How to include a linear trend in a regression discontinuity design using rddtools
which showed me the following example:
rd.medic <- rdd_data(y = er, x = ageyrs, covar = ageyrs, cutpoint = 65, data = medicare)
rd.reg <- rdd_reg_lm(rdd_object = rd.medic, covariates = 'ageyrs', slope = "same",
                     covar.opt = list("include"))
Even so, the syntax is still not clear to me, as I am trying to add multiple covariates without success.
Thanks!
You can create a data frame with your covariates and then pass it to rdd_data():
covariates <- data.frame(z1 = ageyrs, z2 = ageyrs2)
rd.medic <- rdd_data(y = er, x = ageyrs, covar = covariates, cutpoint = 65, data = medicare)
rd.reg <- rdd_reg_lm(rdd_object = rd.medic, covariates = TRUE, slope = "same")
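If you only want a subset of the covariates in the regression step, the help file describes covariates as a "Formula to include covariates", so a formula string over the columns of the covar data frame may be accepted; a sketch of that untested assumption:
# assumption: covariates accepts a formula string over the covar columns
rd.reg2 <- rdd_reg_lm(rdd_object = rd.medic, covariates = 'z1 + z2',
                      slope = "same", covar.opt = list(strategy = "include"))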

Set G in prior using MCMCglmm, with categorical response and phylogeny

I am new to the MCMCglmm package in R, and rather new to glm models in general. I have a dataset of species traits and whether or not they have been introduced outside of their native range.
I would like to test whether being introduced (as a binary 0/1 response variable) can be explained by any of the species traits. I would also like to correct for phylogeny between species.
I was told that for a binary response I could use family = "threshold" and that I should fix the residual variance at 1. But I am having some trouble with the other parameters needed for the prior.
I've specified the R value (the residual variance), but if I specify R I must also specify G, and it is not clear to me how to decide the values for this parameter. I've tried putting in default values, but I get error messages:
Error in MCMCglmm(fixed, random = ~species, data = data2, family = "threshold", :
prior$G has the wrong number of structures
I have read the help files, vignettes, and course notes but have not found an example with a binary response, and it is not clear to me how to decide the values for the priors. This is what I have so far:
fixed = Intro_binary ~ Trait1 + Trait2 + Trait3
Ainv = inverseA(redTree1)$Ainv
binary_model = MCMCglmm(fixed, random = ~species, data = data, family = "threshold",
                        ginverse = list(species = Ainv),
                        prior = list(
                          G = list(),              # not sure about the parameters for the random effects
                          R = list(V = 1, fix = 1)),  # fix the residual variance at one
                        nitt = 60000, burnin = 10000)
Any help or feedback would be greatly appreciated!
This one is a bit tricky with the information you provide. I'd say you can define G as a "weak" prior using the following (note that prior$G needs one named element per random-effect structure, hence the G1 nesting; that is what the "wrong number of structures" error is about):
priors <- list(R = list(V = 1, nu = 0.002),
               G = list(G1 = list(V = 1, fix = 1)))
binary_model <- MCMCglmm(fixed, random = ~species, data = data,
                         family = "threshold",
                         ginverse = list(species = Ainv),
                         prior = priors,
                         nitt = 60000, burnin = 10000)
However, without more information on your analysis, I strongly suggest you plot your posteriors to have a look at the results and see if anything looks wrong. Have a look at the MCMCglmm package Course Notes for more info on how to set these priors (especially on what not to do in section 1.5 - you can also find more specific info on how to tune it to your model if it fits in the categories of the tutorial).
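As an aside, and only as a sketch to check against your own model: the course notes' usual recipe for binary/threshold responses keeps the residual variance fixed at 1 (as the question intended) and puts a parameter-expanded prior on the random-effect variance, e.g.:
# sketch of a parameter-expanded prior for one random effect, with the
# residual variance fixed at 1 as is usual for family = "threshold"
priors_px <- list(R = list(V = 1, fix = 1),
                  G = list(G1 = list(V = 1, nu = 1000,
                                     alpha.mu = 0, alpha.V = 1)))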
