mlogit package in R: intercept and alternative specific individual variables - r

I'm trying to use the mlogit package in R to build a transportation-mode choice model. I searched for similar problems but haven't found anything.
I have a set of 3 alternatives (walk, auto, transit) in a logit model, with generic variables (same parameters across alternatives) and individual-specific variables that should enter only one alternative (e.g. a 0 (if no) / 1 (if yes) home-destination-trip dummy, just for the walk mode).
I'd like to have an intercept in only one of the alternatives (auto), but I can't manage to do this. Using reflevel, which only sets the reference alternative, I still get two intercepts.
ml.data <- mlogit(choice ~ t + cost | dhome, mode, reflevel = "transit")
This is not working as I wish.
Moreover, I'd like to specify the individual-specific variables as described above. Putting them in the second part of the mlogit formula gives me two parameter estimates, but I'd like just one parameter, for the mentioned alternative only.
Could anyone help me?

You cannot do what you want. It's not a limitation of mlogit in particular; it's how multinomial logistic regression works. If your dependent variable has 3 levels, you will have 2 intercepts, and the same set of independent variables applies to the whole model (that's true for regression methods in general).

However, regarding the second part of the question ("individual-specific variables (ex: 0 (if no) / 1 (if yes) home-destination trip, just for walk mode"), I modified the dataset by inserting 3 columns (dhome.auto [all zeros], dhome.transit [all zeros], and dhome.walk [0 if no / 1 if yes it's a home-destination trip]) so that the variable is effective only for the walk mode, even though it is now treated as an alternative-specific variable. Then
ml.data <- mlogit(choice ~ t + cost + dhome, mode, reflevel = "transit")
It's a kind of trick, but it seems to work.
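The data-preparation side of this trick can be sketched as follows (column and alternative names here are assumed for illustration, not taken from the original data): in the long-format data that mlogit uses, zero out the individual-level dummy on every row except the "walk" rows, so the single estimated coefficient applies to walk alone.

```r
# Toy long-format data: one row per individual x alternative.
long <- data.frame(
  id    = rep(1:3, each = 3),
  alt   = rep(c("auto", "transit", "walk"), times = 3),
  dhome = rep(c(1, 0, 1), each = 3)  # individual-level dummy, repeated per alternative
)

# Keep dhome only on the "walk" rows; auto/transit rows become 0,
# so the single coefficient estimated for dhome belongs to walk alone.
long$dhome <- ifelse(long$alt == "walk", long$dhome, 0)
```

After this, a formula such as choice ~ t + cost + dhome estimates one parameter for dhome that is effective only for the walk alternative.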

Related

What is the R equivalent to the e(sample) command in Stata?

I'm running conditional logistic regression models in R as part of a discordant sibling pair analysis and I need to isolate the total n for each model. Also, I need to isolate the number and % of cases of the disease in the exposed and unexposed groups.
In Stata the e(sample) == 1 command gives this info. Is there an equivalent function for accomplishing this in R?
In R, if you run a regression you create a regression object.
RegOb <- lm(y ~ x1 + x2, data)
Often people call "RegOb" directly, which invokes the print method for this type of object; "summary(RegOb)" is also popular (and is often assigned to a variable).
However, RegOb contains a lot of information about the regression. In Stata you could use -ereturn list- to see what is saved; in R, use "str(RegOb)" or "View(RegOb)" to see everything that is stored. The model frame actually used in the fit is available as:
RegOb$model
(equivalently, model.frame(RegOb)). And since you have the original data, a logical comparison of the original rows with the rows that were used gives you the estimation sample.
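A minimal sketch of that logical comparison, using simulated data with missing values (the variable names are invented for illustration): rows dropped by na.omit do not appear in the model frame, so checking row names against it reproduces what Stata's e(sample) flags.

```r
# Simulated data with NAs, so lm() drops some rows.
dat <- data.frame(y = c(1, 2, NA, 4, 5),
                  x = c(1, NA, 3, 4, 5))

fit <- lm(y ~ x, data = dat)

# Row names of the model frame identify the rows that entered the fit.
used <- rownames(model.frame(fit))
dat$esample <- rownames(dat) %in% used  # TRUE = in estimation sample

nobs(fit)  # number of observations actually used
```

From dat$esample you can then tabulate cases of disease by exposure within the estimation sample only.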

How to account for two covariates in differential gene expression of single cell RNA seq data?

I have human data from different ages and gender. After using integration with seurat, how would I best control for these confounding factors during differential gene expression analysis. I see the option of latent.vars in FindMarkers function. Can I give latent.vars = c("Age", "gender") to account for both together? or can I only use one at a time?
Is there alternative package to do the test better?
Thanks in advance!
You can use that argument, but it means you are switching to a GLM-based test rather than the default Wilcoxon test. You can also see this in the help page (?FindMarkers):
latent.vars: Variables to test, used only when ‘test.use’ is one of
'LR', 'negbinom', 'poisson', or 'MAST'
You can see how the GLM is called in the source code, under GLMDETest. Basically, these two covariates are included in the GLM to account for their effects on the dependent variable. What also matters is how you treat the covariate age in this case: as categorical or continuous? That choice can affect your results.
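Conceptually, what including latent variables in the GLM does is comparable to the following base-R sketch on simulated data (Seurat's actual implementation differs in detail; the variable names here are assumptions): the covariates simply enter the model formula alongside the group indicator, so the group effect is estimated while adjusting for both.

```r
set.seed(1)
n <- 200
cells <- data.frame(
  group  = rep(c("A", "B"), each = n / 2),  # cluster identity being tested
  Age    = runif(n, 20, 80),                # continuous covariate
  gender = sample(c("F", "M"), n, replace = TRUE)  # categorical covariate
)
cells$expr <- rpois(n, lambda = exp(0.5 + 0.3 * (cells$group == "B") +
                                      0.01 * cells$Age))

# Poisson GLM of expression on group, adjusting for both covariates at once,
# analogous to latent.vars = c("Age", "gender") with a GLM-based test.
fit <- glm(expr ~ group + Age + gender, family = poisson, data = cells)
coef(fit)["groupB"]  # group effect adjusted for Age and gender
```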

I want to use a fixed effects model on a regression with one variable being the group variable

I am using felm() and the code runs on the whole model, but I need the fixed effects at the state level only. The problem asks to "Estimate the model using fixed effects (FE) at the state level". felm() is not giving me the correct results, because I don't know whether I need to include state as a regressor (that doesn't give me the correct answers) or how to specify that one variable is the grouping variable (I'm assuming that is how to get accurate results).
I have tried using
plm(ind~depvar+state,data=data, model='within')
I have tried using
felm(ind~depvar+state,data=data)
FELinMod3<-felm(DRIVING$totfatrte~DRIVING$D81+DRIVING$state, data=DRIVING)
FELinMod3<-plm(DRIVING$totfatrte~DRIVING$D81+DRIVING$state, data=DRIVING, model='within')
The output gives me coefficients different from the ones I know are correct in Stata.
It looks like felm() is designed for when you have multiple grouping variables, but it sounds like you're using only one grouping variable for fixed effects (i.e., state).
You should get the same correct result with
mod3 <- lm(totfatrte ~ D81 + state, data = DRIVING)
Also, if the coefficients or standard errors disagree between stata and R, that doesn't necessarily mean that R is wrong.
Reading the documentation for felm() indicates that your code should look more like this:
model3<-felm(totfatrte ~ D81 | state, data = DRIVING)
but the code specifications for it are pretty complex based on whether you want to cluster your standard errors and so on.
Hope this helps.
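To see why lm() with state dummies and a within (fixed-effects) estimator agree, here is a base-R sketch on simulated panel data (no lfe dependency; all names are invented for illustration): demeaning y and x by state and regressing gives the same slope as including factor(state) dummies.

```r
set.seed(42)
d <- data.frame(state = rep(letters[1:5], each = 20))
d$x <- rnorm(100) + as.integer(factor(d$state))  # x correlated with state
d$y <- 2 * d$x + 3 * as.integer(factor(d$state)) + rnorm(100)

# Dummy-variable (LSDV) version:
b_lsdv <- coef(lm(y ~ x + factor(state), data = d))["x"]

# Within transformation: demean y and x by state, then regress.
d$y_dm <- d$y - ave(d$y, d$state)
d$x_dm <- d$x - ave(d$x, d$state)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = d))["x_dm"]

all.equal(unname(b_lsdv), unname(b_within))  # both estimate the same slope
```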

Most straightforward R package for setting subject as random effect in mixed logit model

I have a dataset in which individuals, each belonging to a particular group, repeatedly chose between multiple discrete outcomes.
subID group choice
1 Big A
1 Big B
2 Small B
2 Small B
2 Small C
3 Big A
3 Big B
. . .
. . .
I want to test how group membership influences choice, and want to account for non-independence of observations due to repeated choices being made by the same individuals. In turn, I planned to implement a mixed multinomial regression treating group as a fixed effect and subID as a random effect. It seems that there are a few options for multinomial logits in R, and I'm hoping for some guidance on which may be most easily implemented for this mixed model:
1) multinom - GLM, via nnet, allows the usage of the multinom function. This appears to be a nice, clear, straightforward option... for fixed effect models. However is there a manner to implement random effects with multinom? A previous CV post suggests that multinom is able to handle mixed-effects GLM with poisson distribution and a log link. However, I don't understand (a) why this is the case or (b) the required syntax. Can anyone clarify?
2) mlogit - A fantastic package, with incredibly helpful vignettes. However, the "mixed logit" documentation refers to models that have random effects related to alternative specific covariates (implemented via the rpar argument). My model has no alternative specific variables; I simply want to account for the random intercepts of the participants. Is this possible with mlogit? Is that variance automatically accounted for by setting subID as the id.var when shaping the data to long form with mlogit.data? EDIT: I just found an example of "tricking" mlogit to provide random coefficients for variables that vary across individuals (very bottom here), but I don't quite understand the syntax involved.
3) MCMCglmm is evidently another option. However, as a relative novice with R and someone completely unfamiliar with Bayesian stats, I'm not personally comfortable parsing example syntax of mixed logits with this package, or, even following the syntax, making guesses at priors or other needed arguments.
Any guidance toward the most straightforward approach and its syntax implementation would be thoroughly appreciated. I'm also wondering if the random effect of subID needs to be nested within group (as individuals are members of groups), but that may be a question for CV instead. In any case, many thanks for any insights.
I would recommend the Apollo package by Hess & Palma. It comes with great documentation and a quite helpful user group.

How to fit a multiple linear regression model with 1664 explanatory variables in R

I have one response variable, and I'm trying to find a way of fitting a multiple linear regression model using 1664 different explanatory variables. I'm quite new to R and was taught the way of doing this by stating the formula using each of the explanatory variables in the formula. However as I have 1664 variables, it would take too long to do. Is there a quicker way of doing this?
Thank you!
I think you want to select from the 1664 variables a valid model, i.e. a model that predicts as much of the variability in the data with as few explanatory variables. There are several ways of doing this:
Using expert knowledge to select variables that are known to be relevant, either because other studies have found this or because some underlying process you know of makes that variable relevant.
Using some kind of stepwise regression approach, which selects variables based on how well they explain the data. Do note that this method has some serious downsides. Have a look at stepAIC (in the MASS package) for a way of doing this using the Akaike Information Criterion.
Correlating 1664 variables with the data will yield around 83 significant correlations purely by chance if you test at the 0.05 level (0.05 * 1664). So tread carefully with automatic variable selection. Cutting down the number of variables with expert knowledge or a decorrelation technique (e.g. principal component analysis) would help.
For a code example, you first need to include an example of your own (data + code) on which I can build.
I'll answer the programming question, but note that a regression with that many variables could often use some sort of variable selection procedure (e.g. @PaulHiemstra's suggestions).
You can construct a data.frame with only the variables you want to run, then use the formula shortcut: form <- y ~ ., where the dot indicates all variables not yet mentioned.
You could instead construct the formula manually, for instance: form <- as.formula( paste( "y ~", paste(myVars, collapse = " + ") ) ) (note collapse =, not sep =, to join the variable names).
Then run your regression:
lm( form, data=dat )
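Both approaches can be sketched on simulated data (names invented for illustration); they produce identical fits, so the dot shortcut is usually the simpler choice.

```r
set.seed(7)
dat <- as.data.frame(matrix(rnorm(50 * 5), ncol = 5))
names(dat) <- c("y", paste0("x", 1:4))

# 1) The dot shortcut: all columns except y become predictors.
fit1 <- lm(y ~ ., data = dat)

# 2) Building the formula string manually (note collapse =, not sep =):
myVars <- setdiff(names(dat), "y")
form   <- as.formula(paste("y ~", paste(myVars, collapse = " + ")))
fit2   <- lm(form, data = dat)

length(coef(fit1))  # intercept + 4 slopes = 5 coefficients
```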
