Predict fitted probabilities for gmnl regression in R

I'm using the gmnl function to fit a mixed multinomial logit model.
Since I'm also interested in the predicted probabilities from that model, I want to obtain them by applying something like the predict function.
m4 <- gmnl(int_choice ~ 1 + fico + annual_inc + int_emp_length | time + grade + last_fico | 0, data = mldata, model = "mixl", R = 50, panel = TRUE, correlation = TRUE, ranp = c(annual_inc = "n", int_emp_length = "n"))
## how to mimic predict??
p_hat <- predict(m4, type = "probs")
Any suggestions?

What you are looking for is a simple conversion rule like this:
Convert the logit-scale coefficients (the glm output) to odds by taking exp() of them.
Convert the odds into probability using this formula: prob = odds / (1 + odds).
A very good explanation with examples can be found here: https://sebastiansauer.github.io/convert_logit2prob/
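For instance, a minimal sketch of the two steps with a stand-in glm fit (mtcars is used here purely for illustration):
fit <- glm(am ~ mpg, data = mtcars, family = binomial)  # stand-in logistic regression
odds <- exp(coef(fit))     # step 1: coefficients on the odds scale
prob <- odds / (1 + odds)  # step 2: odds converted to probabilities
prob
predict(fit, type = "response")  # per-observation fitted probabilities, for comparison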

The solution can't be found in the vignette, but it is documented in the help file. Typing help("fitted.gmnl") yields the following:
fitted(object, outcome = TRUE, ...)
outcome: if TRUE, the fitted and residuals methods return a vector that corresponds to the chosen alternative; otherwise they return a matrix in which each column corresponds to one alternative.
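Applied to the model above, something along these lines should therefore give the full probability matrix:
p_hat <- fitted(m4, outcome = FALSE)  # one column of probabilities per alternative
head(p_hat)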

I typed str() over the gmnl model object and found an internal component called prob.alt that stores the per-alternative probabilities (choice minus residuals).
So in your case m4$prob.alt gives some useful values; taking the maximum by row gives the predicted choice.
(In my case, a latent class MNL, this does not help, since it holds the predicted latent class probabilities instead.)
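A sketch of that idea, assuming prob.alt is an n-by-J matrix of alternative probabilities as described above:
probs <- m4$prob.alt                       # per-alternative probabilities stored on the fit
pred_choice <- apply(probs, 1, which.max)  # row-wise maximum = index of the predicted alternative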

Related

Calculating VIF for ordinal logistic regression & multicollinearity in R

I am running an ordinal regression model. I have 8 explanatory variables, 4 of them categorical ('0' or '1') and 4 of them continuous. Beforehand I want to be sure there's no multicollinearity, so I use the variance inflation factor (the vif function from the car package):
mod1 <- polr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, Hess = TRUE, data = df)
vif(mod1)
but I get a VIF value of 125 for one of the variables, as well as the following warning:
Warning message: In vif.default(mod1) : No intercept: vifs may not be sensible.
However, when I convert my dependent variable to numeric (instead of a factor) and do the same thing with a linear model:
mod2 <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = df)
vif(mod2)
This time all the VIF values are below 3, suggesting that there's no multicollinearity.
I am confused about the vif function. How can it return VIFs above 100 for one model and low VIFs for another? Should I stick with the second result and still fit the ordinal model anyway?
The vif() function uses determinants of the correlation matrix of the parameters (and subsets thereof) to calculate the VIF. In a linear model, this includes just the regression coefficients, excluding the intercept.

The vif() function wasn't intended to be used with ordered logit models. So, when it finds the variance-covariance matrix of the parameters, it includes the threshold parameters (i.e., the intercepts), which would normally be excluded by the function in a linear model. This is why you get the warning you get: it doesn't know to look for threshold parameters and remove them.

Since the VIF is really a function of the inter-correlations in the design matrix (which doesn't depend on the dependent variable or on the non-linear mapping from the linear predictor into the space of the response variable, i.e., the link function in a glm), you should get the right answer with your second solution above, using lm() with a numeric version of your dependent variable.
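Because a predictor's VIF depends only on how well the other predictors explain it, you can also check this by hand; a sketch using the (hypothetical) variable names from the question:
# VIF of X1 = 1 / (1 - R^2) from regressing X1 on the remaining predictors
r2_x1 <- summary(lm(X1 ~ X2 + X3 + X4 + X5 + X6 + X7 + X8, data = df))$r.squared
1 / (1 - r2_x1)  # should match vif(mod2)["X1"]
Note that Y never enters this calculation, which is why the lm() workaround gives the right VIFs.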

Using lambda.min to extract coefficients from a model trained with glmnet

I am using glmnet to train a logistic regression model, and then I try to obtain the coefficients at a specific lambda. I used the simple example here:
load("BinomialExample.RData")
fit = glmnet(x, y, family = "binomial")
coef(fit, s = c(0.05,0.01))
I have checked the values of fit$lambda, but I could not find the specific values 0.05 or 0.01 in fit$lambda. So how can coef return coefficients for a lambda that is not in the fit$lambda vector?
This is explained in the help for coef.glmnet, specifically the exact argument:
exact
This argument is relevant only when predictions are made at values of s (lambda) different from those used in the fitting of the original model. If exact=FALSE (default), then the predict function uses linear interpolation to make predictions for values of s (lambda) that do not coincide with those used in the fitting algorithm. While this is often a good approximation, it can sometimes be a bit coarse. With exact=TRUE, these different values of s are merged (and sorted) with object$lambda, and the model is refit before predictions are made.
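A hedged sketch of both behaviours (note that in recent glmnet versions, exact = TRUE also requires re-supplying the original data so the model can be refit):
coef(fit, s = c(0.05, 0.01))                              # default: linear interpolation between fitted lambdas
coef(fit, s = c(0.05, 0.01), exact = TRUE, x = x, y = y)  # refit at exactly these lambda values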

Obtaining predicted (i.e. expected) values from the orm function (Ordinal Regression Model) from rms package in R

I've run a simple model using orm (i.e. reg <- orm(formula = y ~ x)) and I'm having trouble understanding how to get predicted values for Y. I've never worked with models that use multiple intercepts. I want to know for each and every value of Y in my dataset what the predicted value from the model would be. I tried predict(reg, type="mean") and this produced values that are close to the predicted values from an OLS regression, but I'm not sure if this is what I want. I really just want something analogous to OLS where you can obtain the E(Y) given a set of predictors. If possible, please provide code I can run to do this with a very short explanation.
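For what it's worth, a hedged sketch of the usual rms idiom, assuming reg is the orm fit: predict(reg, type = "mean") is intended to return that conditional expectation E(Y | x), and Mean() builds the same mapping explicitly.
library(rms)
M <- Mean(reg)                      # builds a function mapping the linear predictor to E(Y)
ey <- M(predict(reg, type = "lp"))  # should agree with predict(reg, type = "mean")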

R, Multinomial Regression: How to Find Conditional Probabilities?

In R, given a multinomial logit regression, I need to obtain the conditional probabilities of the response classes given some values of the predictors.
For example, using the function multinom from the package nnet, imagine having computed fit <- multinom(response ~ predictor). From fit, how can I obtain the probability weights of the different response classes, given a certain value of the predictor?
I thought of using something like predict(fit, newdata, type = ???), but I have no idea how to continue.
I found a possible solution: predict(fit, newdata = predictor, type = "probs"). In this way, I was able to find the probability weights for all the values of the predictor: every row corresponds to one value.
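A self-contained sketch of that pattern (iris is used as a stand-in data set; newdata must be a data frame whose columns match the predictor names):
library(nnet)
fit <- multinom(Species ~ Sepal.Length, data = iris)
newdata <- data.frame(Sepal.Length = c(5.0, 6.5))
predict(fit, newdata = newdata, type = "probs")  # one row of class probabilities per new value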

How to fit a model I built to another data set and get residuals?

I fitted a mixed model to Data A as follows:
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=A)
Next, I want to see how the model fits Data B and also get the estimated residuals. Is there a function in R that I can use to do so?
(I tried the following method but got all new coefficients.)
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=B)
The reason you are getting new coefficients in your second attempt with data = B is that lme refits the model to whatever data set you pass it, using the formula you provide, and stores that new fit in the variable model.
To get more information about a model you can type summary(model_name). The nlme package includes a method called predict.lme which allows you to make predictions from a fitted model. You can type predict(my_model) to get predictions for the original data set, or predict(my_model, some_other_data) to generate predictions from the same model on a different data set.
In your case, to get the residuals you just need to subtract the predicted values from the observed values. So use some_other_data$dependent_var - predict(my_model, some_other_data), or in your case B$Y - predict(model, B).
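Putting that together, a minimal sketch with the names from the question (if B contains Class groups not seen in A, add level = 0 for population-level predictions):
pred_B <- predict(model, newdata = B)  # scores B with the coefficients estimated from A
resid_B <- B$Y - pred_B                # residuals = observed minus predicted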
Your model:
model <- lme(Y~1+X1+X2+X3, random=~1|Class, method="ML", data=A)
Two predictions based on your model:
pred1 <- predict(model, newdata = A)
pred2 <- predict(model, newdata = B)
missed: a function that calculates the misclassification rate, with the cut-off set to 0.5 (it counts every observation whose thresholded prediction disagrees with the observed binary value, false positives and false negatives alike):
missed <- function(values, prediction) {
  sum(((prediction > 0.5) * 1) != values) / length(values)
}
missed(A$Y, pred1)
missed(B$Y, pred2)
