In R, given a fitted multinomial logit regression, I need to obtain the conditional class probabilities at given values of the predictors.
For example, using the function multinom from the package nnet, suppose I have computed fit <- multinom(response ~ predictor). From fit, how can I obtain the probability weights of the different response classes, given a certain value of the predictor?
I thought of using something like predict(fit,newdata,type=???), but I have no idea about how to continue.
I found a possible solution: predict(fit, newdata = predictor, type = "probs"). This returns the probability weights for all values of the predictor, with each row corresponding to one value.
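A minimal sketch of that call, assuming the built-in iris data stands in for the real response and predictor (the variable names here are illustrative, not from the question). Passing newdata as a data frame whose column name matches the predictor in the formula keeps the lookup unambiguous:

```r
library(nnet)  # recommended package shipped with most R installations

# Illustrative fit: Species plays the role of `response`,
# Sepal.Length the role of `predictor`.
fit <- multinom(Species ~ Sepal.Length, data = iris, trace = FALSE)

# New predictor values, in a data frame with the same column name as in the formula.
newdata <- data.frame(Sepal.Length = c(5, 6, 7))

# One row per new observation, one column per response class.
probs <- predict(fit, newdata = newdata, type = "probs")
probs
```

Each row of probs should sum to 1; type = "class" would instead return the most probable class for each row.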
Can someone please help me with the equivalent of the mnrval function in R? I have not been able to find one where predicted probabilities are returned based on arguments, coefficient estimates and predictor values. I tried to rewrite the Matlab function in R but was unable to because one of the inner functions that was used was private. I would highly appreciate your help on this.
The documentation page on mnrval() states
MNRVAL Predict values for a nominal or ordinal multinomial regression model.
PHAT = MNRVAL(B,X) computes predicted probabilities for the nominal
multinomial logistic regression model with predictor values X. B is the
intercept and coefficient estimates as returned by the MNRFIT function. X
is an N-by-P design matrix with N observations on P predictor variables.
MNRVAL automatically includes intercept (constant) terms in the model; do
not enter a column of ones directly into X. PHAT is an N-by-K matrix of
predicted probabilities for each multinomial category.
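The nominal case described in the quoted help text can be sketched in base R as a softmax over the linear predictors. This is a hypothetical helper (mnrval_nominal is not an existing R function), and it assumes the MATLAB convention that B is a (P+1)-by-(K-1) matrix with the intercept in the first row and the last category as the reference; note that nnet::multinom uses the first factor level as the reference instead.

```r
# Hypothetical helper: predicted probabilities for a nominal multinomial logit.
# B: (P+1) x (K-1) coefficients, first row = intercepts, last category = reference (assumed).
# X: N x P matrix of predictor values, without a column of ones.
mnrval_nominal <- function(B, X) {
  X <- as.matrix(X)
  eta <- cbind(1, X) %*% B           # N x (K-1) linear predictors
  eta <- cbind(eta, 0)               # reference category has linear predictor 0
  eta <- eta - apply(eta, 1, max)    # subtract row maxima for numerical stability
  expeta <- exp(eta)
  expeta / rowSums(expeta)           # N x K matrix of probabilities
}

# Illustrative coefficients for P = 1 predictor and K = 3 categories.
B <- matrix(c(0.5, -1.0,
              -0.2, 0.3), nrow = 2)
X <- matrix(c(1, 2, 3), ncol = 1)
phat <- mnrval_nominal(B, X)
phat
```

The ordinal models supported by mnrval use cumulative probabilities and would need a different formula.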
I am running an ordinal regression model. I have 8 explanatory variables, 4 of them categorical ('0' or '1') and 4 of them continuous. Beforehand I want to be sure there's no multicollinearity, so I use the variance inflation factor (the vif function from the car package):
mod1 <- polr(Y ~ X1+X2+X3+X4+X5+X6+X7+X8, Hess = TRUE, data = df)
vif(mod1)
but I get a VIF value of 125 for one of the variables, as well as the following warning:
Warning message: In vif.default(mod1) : No intercept: vifs may not be sensible.
However, when I convert my dependent variable to numeric (instead of a factor), and do the same thing with a linear model:
mod2 <- lm(Y ~ X1+X2+X3+X4+X5+X6+X7+X8, data = df)
vif(mod2)
This time all the VIF values are below 3, suggesting that there's no multicollinearity.
I am confused about the vif function. How can it return VIFs > 100 for one model and low VIFs for another? Should I stick with the second result and still fit an ordinal model anyway?
The vif() function uses determinants of the correlation matrix of the parameter estimates (and subsets thereof) to calculate the VIF. In the linear model, this includes only the regression coefficients (excluding the intercept). vif() was not designed for ordered logit models, so when it extracts the variance-covariance matrix of the parameters, it also includes the threshold parameters (i.e., intercepts), which it would normally exclude in a linear model. This is why you get the warning: the function does not know to look for the threshold parameters and remove them. Since the VIF is really a function of the inter-correlations in the design matrix (which does not depend on the dependent variable or on the non-linear mapping from the linear predictor into the space of the response variable [i.e., the link function in a glm]), you should get the right answer with your second approach, using lm() with a numeric version of your dependent variable.
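To illustrate the point that the VIF depends only on the design matrix, here is a small base-R sketch using the built-in mtcars data (an assumption for illustration; the variables are not from the question). For numeric predictors with one coefficient each, the VIFs are the diagonal of the inverse correlation matrix of the predictors, which equals 1 / (1 - R^2) from regressing each predictor on the others. No response variable enters the calculation.

```r
# Design matrix of the predictors only (intercept column dropped).
X <- model.matrix(~ wt + hp + disp, data = mtcars)[, -1]

# VIFs as the diagonal of the inverse predictor correlation matrix.
vif_manual <- diag(solve(cor(X)))
vif_manual

# Cross-check for one predictor: 1 / (1 - R^2) from regressing wt on the others.
r2_wt <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
vif_wt <- 1 / (1 - r2_wt)
vif_wt
```

Because the response never appears, the same values apply whether the outcome is later modelled with lm() or polr().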
I have a question similar to the one here: Testing the difference between marginal effects calculated across factors. I used the same code to generate average marginal effects for two groups. The difference is that I am running a logistic rather than linear regression model. My average marginal effects are on the probability scale, so emmeans will not provide the correct contrast. Does anyone have any suggestions for how to test whether there is a significant difference in the average marginal effects between group 1 and group 2?
Thank you so much,
Ilana
It is a bit unclear what the issue really is, but I'll try. I'm supposing your logistic regression model was fitted using, say, glm:
mod <- glm(cbind(heads, tails) ~ treat, data = mydata, family = binomial())
If you then do
emm <- emmeans(mod, "treat")
emm ### marginal means
pairs(emm) ### differences
Your results will be presented on the logit scale.
If you want them on the probability scale, you can do
summary(emm, type = "response")
summary(pairs(emm), type = "response")
However, the latter will back-transform the differences of logits, thereby producing odds ratios.
If you actually want differences of probabilities rather than ratios of odds, use regrid(), which will construct a new grid of values after back-transforming (and hence it will forget the logit transformation):
pairs(regrid(emm))
It seems possible that two or more factors are present and you want contrasts of contrasts on the probability scale. In that case, extend this idea by calling regrid() on the table of EMMs to put everything on the probability scale, then follow the analogous procedure used in the linked article.
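The distinction between the two back-transformed contrasts can be checked by hand in base R. This sketch uses a made-up two-group data set (the counts are hypothetical) with the same model form as above; it does not call emmeans, it only shows what each scale means:

```r
# Hypothetical data: two treatments with binomial counts.
mydata <- data.frame(treat = factor(c("A", "B")),
                     heads = c(30, 45),
                     tails = c(70, 55))
mod <- glm(cbind(heads, tails) ~ treat, data = mydata, family = binomial())

# Predictions on the logit (link) scale and on the probability scale.
logits <- unname(predict(mod, newdata = mydata["treat"], type = "link"))
probs  <- plogis(logits)

# Back-transforming a difference of logits gives an odds ratio ...
odds_ratio <- exp(logits[2] - logits[1])
# ... whereas differencing after back-transforming gives a difference of probabilities.
prob_diff <- probs[2] - probs[1]

c(odds_ratio = odds_ratio, prob_diff = prob_diff)
```

The first quantity corresponds to what summary(pairs(emm), type = "response") reports, and the second to what pairs(regrid(emm)) reports.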
I'm using the gmnl function to fit a mixed multinomial logit model.
Since I'm also interested in the predicted probabilities of that model, I want to obtain them by applying something like the predict function.
m4 <- gmnl(int_choice ~ 1 + fico + annual_inc + int_emp_length | time + grade + last_fico | 0, data = mldata, model = "mixl", R = 50, panel = TRUE, correlation = TRUE, ranp = c(annual_inc = "n", int_emp_length = "n"))
## how to mimic predict??
p_hat=predict(m4,type="probs")
Any suggestions?
What you are looking for is a simple conversion rule like this:
Convert a logit (glm output on the link scale) to odds by computing exp() of it.
Convert the odds into probability using this formula: prob = odds / (1 + odds).
Very good explanation with examples can be found here: https://sebastiansauer.github.io/convert_logit2prob/
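The two steps above can be sketched in base R as follows. The coefficient values are made up for illustration. Note that the rule applies to the full linear predictor (intercept plus coefficients times predictor values); exponentiating a single slope coefficient gives an odds ratio, not a probability. It is also the binary-logit rule, so for a multinomial model the analogous step is a softmax over all alternatives.

```r
# Hypothetical binary-logit coefficients and a predictor value.
b0 <- -1.2
b1 <- 0.8
x  <- 2

logit <- b0 + b1 * x          # linear predictor on the logit scale
odds  <- exp(logit)           # step 1: logit -> odds
prob  <- odds / (1 + odds)    # step 2: odds -> probability

# Base R's plogis() performs both steps at once.
prob_builtin <- plogis(logit)
c(prob = prob, prob_builtin = prob_builtin)
```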
The solution can't be found in the vignette but is documented in the help-file. Typing help("fitted.gmnl") yields the following:
fitted(object, outcome = TRUE, ...)
outcome: if TRUE, then the fitted and residuals methods return a vector that corresponds to the chosen alternative, otherwise it returns a matrix where each column corresponds to each alternative.
I ran str() on the fitted gmnl model
and found an internal component called prob.alt that gives the choice minus the residuals.
So in your case m4$prob.alt gives some useful values (taking the maximum by row gives the predicted choice).
(In my case, a latent class MNL, this does not help, since it contains the predicted latent class probabilities.)
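The "maximum by row" step can be sketched in base R. The matrix below is a made-up stand-in for m4$prob.alt, assuming it has one row per choice situation and one column per alternative; the real object's layout should be checked with str() first.

```r
# Hypothetical probability matrix: rows = choice situations, columns = alternatives.
prob_alt <- matrix(c(0.2, 0.5, 0.3,
                     0.6, 0.1, 0.3),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(NULL, c("alt1", "alt2", "alt3")))

# Column index of the largest probability in each row.
pred_idx <- max.col(prob_alt, ties.method = "first")

# Map the indices back to alternative names.
pred_choice <- colnames(prob_alt)[pred_idx]
pred_choice
```

Using ties.method = "first" makes the result deterministic; the default for max.col() breaks ties at random.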
I've run a simple model using orm (i.e. reg <- orm(formula = y ~ x)) and I'm having trouble understanding how to get predicted values for Y. I've never worked with models that use multiple intercepts. I want to know for each and every value of Y in my dataset what the predicted value from the model would be. I tried predict(reg, type="mean") and this produced values that are close to the predicted values from an OLS regression, but I'm not sure if this is what I want. I really just want something analogous to OLS where you can obtain the E(Y) given a set of predictors. If possible, please provide code I can run to do this with a very short explanation.