I am using a glm model to predict my dependent variable. To do that I need to choose a family for the distribution of my variable. Unfortunately, the exponential distribution is not among the available options for the "family" argument.
Now I don't know how to proceed with my research.
This is my model. Does anyone have an idea what I can do?
model <- train(duration ~ ., data = data, method = 'glm', family = ???, trControl = trainControl(method = "repeatedcv", repeats = 10))
The exponential distribution is the gamma distribution with the dispersion parameter fixed at 1, so family = Gamma(link="log") should provide what you need. When interpreting the significance or standard errors of the fitted coefficients under the exponential assumption, you specify the dispersion explicitly.
Since your example wasn't reproducible, an example using glm and summary is:
mdl <- glm(formula = duration ~ ., family = Gamma(link = "log"), data = your_data)
summary(mdl, dispersion = 1)
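To see why this works, here is a self-contained sketch (the data and coefficients are simulated, not from the original post): exponentially distributed durations are generated with a log-linear mean, then recovered with a Gamma GLM and a log link, fixing the dispersion at 1 in the summary.

```r
## Simulated illustration: exponential responses fit as Gamma(link = "log")
set.seed(42)
x <- rnorm(200)
## mean of duration is exp(0.5 + 0.3 * x), so rate = 1 / mean
duration <- rexp(200, rate = exp(-(0.5 + 0.3 * x)))
sim_data <- data.frame(duration, x)

mdl <- glm(duration ~ x, family = Gamma(link = "log"), data = sim_data)

## dispersion = 1 makes the standard errors consistent with the
## exponential-distribution assumption
summary(mdl, dispersion = 1)
```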
Related
I am using the predictInterval() function from the merTools package. My model is fit with a Poisson family specification, like the following:
glmer(y ~ (1|key) + x, data = dat, family = poisson())
When I use predictInterval() to calculate the prediction interval associated with my model, I get the following warning message:
Warning message:
Prediction for NLMMs or GLMMs that are not mixed binomial regressions is not tested. Sigma set at 1.
I take this to mean that predictInterval() doesn't have a tested implementation for models fit with a Poisson distribution, so I don't trust the resulting interval.
Is my interpretation correct? I have searched around for similar issues but haven't found anything.
Any help would be greatly appreciated.
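One way to sanity-check such an interval is a parametric bootstrap with lme4::bootMer, which resamples from the fitted model rather than relying on merTools' untested path. This is a hedged sketch reusing the variable names from the question (y, key, x, dat are assumed to exist):

```r
## Hypothetical cross-check (not from the original post): parametric
## bootstrap of the fitted means for a Poisson GLMM via lme4::bootMer
library(lme4)
mdl <- glmer(y ~ (1 | key) + x, data = dat, family = poisson())

boot_fit <- bootMer(mdl,
                    FUN = function(m) predict(m, type = "response"),
                    nsim = 200)

## 95% bootstrap interval for the fitted mean of each observation
apply(boot_fit$t, 2, quantile, probs = c(0.025, 0.975))
```

Note this gives an interval for the conditional mean, not a full prediction interval for new counts, but large disagreement with predictInterval() would confirm the suspicion.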
I use cforest from the party package in R to fit forests of conditional inference trees. As with Random Forest, I would like to retrieve the variance explained and the variable importance based on the OOB data (I read that Random Forest returns variance explained and variable importance based on OOB data). To do so with cforest I used the following code:
model <- party::cforest(y ~ x1 + x2 + x3 + x4, data = trainings_set,
                        control = cforest_unbiased(ntree = 1000, minsplit = 25,
                                                   minbucket = 8, mtry = 4))
model.pred <- predict(model, type = "response", OOB = TRUE)
R2 <- 1 - sum((trainings_set$y - model.pred)^2) / sum((trainings_set$y - mean(trainings_set$y))^2)
varimp_model <- party::varimp(model, conditional = TRUE, threshold = 0.2, OOB = TRUE)
My question: does setting OOB = TRUE lead to the predictions and the variable importance being computed on the OOB data of trainings_set?
I posted this question before under a different title; I am posting it again, slightly redrafted, in the hope that someone can provide an answer.
The OOB argument of the predict method for cforest objects is a logical controlling out-of-bag predictions.
Per the party documentation, it applies only when newdata is NULL. In that case, as in your code, each training observation is predicted using only the trees for which it was out of bag, so yes, your predictions (and the varimp call) are based on the OOB data of trainings_set. If you pass a newdata data frame (e.g. a test set), OOB = TRUE has no effect.
I hope this clarifies your doubt.
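A quick way to convince yourself is to compare out-of-bag and in-sample predictions on the training data; they differ, and the OOB version gives the more honest R². This sketch reuses the (assumed) variable names from the question:

```r
## Sketch (assumed data/variable names from the question): OOB vs.
## in-sample predictions on the training set
library(party)
model <- cforest(y ~ x1 + x2 + x3 + x4, data = trainings_set,
                 control = cforest_unbiased(ntree = 1000, mtry = 4))

pred_oob <- predict(model, OOB = TRUE)   # out-of-bag predictions
pred_in  <- predict(model, OOB = FALSE)  # in-sample predictions

## OOB R^2, computed as in the question; expect it to be lower than
## the R^2 from pred_in, which is optimistically biased
1 - sum((trainings_set$y - pred_oob)^2) /
    sum((trainings_set$y - mean(trainings_set$y))^2)
```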
I am running a glmer with a random effect for count data (x) and two categorical variables (y and z):
fullmodel<-glmer(x~y*z + (1|Replicate), family = poisson, data = Data)
However, when I look at the dispersion parameter:
> dispersion_glmer(fullmodel)
[1] 2.338742
It is way higher than 1. Does this mean my model is overdispersed? How do I correct for it? I want to keep my random effect, but when I tried to swap the family to quasipoisson it said that it cannot be used with glmer.
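For context, a dispersion statistic like this is typically the Pearson chi-square divided by the residual degrees of freedom, which can be computed by hand; values well above 1 do indicate overdispersion. A minimal sketch, reusing the (assumed) model name from the question:

```r
## Sketch: a hand-rolled overdispersion ratio for a fitted glmer model
## (Pearson chi-square over residual degrees of freedom)
rp    <- residuals(fullmodel, type = "pearson")
ratio <- sum(rp^2) / df.residual(fullmodel)
ratio  # values well above 1 suggest overdispersion
```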
How can I specify weight decay in a model fit by mlogit?
The multinom() function in nnet lets you specify weight decay for the model being fit, and mlogit uses this function behind the scenes to fit its models, so I imagine it should be possible to pass the decay argument through to multinom, but I have not so far found a way to do this.
So far I have attempted simply passing a value in the model call, like this:
library(mlogit)
set.seed(1)
data("Fishing", package = "mlogit")
Fishing$wts <- runif(nrow(Fishing)) #for some weights
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
fit1 <- mlogit(mode ~ 0 | income, data = Fish, weights = wts, decay = .01)
fit2 <- mlogit(mode ~ 0 | income, data = Fish, weights = wts)
But the output is exactly the same:
identical(logLik(fit1), logLik(fit2))
[1] TRUE
mlogit() and nnet::multinom() both fit multinomial logistic models (predicting the probability of class membership for multiple classes), but they use different fitting algorithms: nnet::multinom() fits the model as a neural network (with no hidden layer), while mlogit() uses direct maximum likelihood estimation.
Weight decay is a parameter of the neural-network fitting procedure and is not applicable to mlogit's maximum likelihood routine, which is why your decay argument is silently ignored.
The effect of weight decay is to keep the weights in the neural network from getting too large, by penalizing larger weights during the weight-update step of the fitting algorithm. This helps to prevent over-fitting and hopefully yields a more general model.
Consider using the pmlr function in the pmlr package. This function implements a "Penalized maximum likelihood estimation for multinomial logistic regression" when called with the default function parameter penalized = TRUE.
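Alternatively, since nnet::multinom() does accept a decay argument directly, you could fit the penalized model with multinom on the wide-format data instead of going through mlogit. A sketch, reusing the Fishing data from the question (the exact formula is an assumption; multinom does not use mlogit's two-part formula syntax):

```r
## Sketch: weight decay via nnet::multinom directly, on wide-format data
library(nnet)
data("Fishing", package = "mlogit")
set.seed(1)
Fishing$wts <- runif(nrow(Fishing))  # some weights, as in the question

fit_decay <- multinom(mode ~ income, data = Fishing,
                      weights = wts, decay = 0.01)
fit_plain <- multinom(mode ~ income, data = Fishing, weights = wts)

## Unlike the mlogit calls, these two fits should differ,
## because decay actually reaches the optimizer here
logLik(fit_decay)
logLik(fit_plain)
```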
When using caret's train function to fit GBM classification models, the function predictionFunction converts probabilistic predictions into factors based on a probability threshold of 0.5.
out <- ifelse(gbmProb >= .5, modelFit$obsLevels[1], modelFit$obsLevels[2])
## to correspond to gbmClasses definition above
This conversion seems premature if a user is trying to maximize the area under the ROC curve (AUROC). While sensitivity and specificity correspond to a single probability threshold (and therefore require factor predictions), I'd prefer AUROC be calculated using the raw probability output from gbmPredict. In my experience, I've rarely cared about the calibration of a classification model; I want the most informative model possible, regardless of the probability threshold over which the model predicts a '1' vs. '0'. Is it possible to force raw probabilities into the AUROC calculation? This seems tricky, since whatever summary function is used gets passed predictions that are already binary.
"since whatever summary function is used gets passed predictions that are already binary"
That's definitely not the case.
train() cannot use the hard class predictions to compute the ROC curve (unless you go out of your way to make it do so). See the note below.
train can predict the classes as factors (using the internal code that you show) and/or the class probabilities.
For example, this code will compute the class probabilities and use them to get the area under the ROC curve:
library(caret)
library(mlbench)
data(Sonar)
ctrl <- trainControl(method = "cv",
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)
set.seed(1)
gbmTune <- train(Class ~ ., data = Sonar,
                 method = "gbm",
                 metric = "ROC",
                 verbose = FALSE,
                 trControl = ctrl)
In fact, if you omit the classProbs = TRUE bit, you will get the error:
train()'s use of ROC codes requires class probabilities. See the classProbs option of trainControl()
Max