I have asked this question on Cross Validated, but I think I might not get help there, as this is more of a programming question than one about the theory/interpretation of the statistics.
I am trying to use the mlogit package in R and have been following the vignette trying to figure out how to get the marginal effects for my data. The example provided uses continuous variables, but I am wondering how to do this with categorical explanatory variables.
I have a continuous covariate, risk, but I also have age, class, and gender as covariates. I want to see the marginal effects for females only, or for young females, with respect to risk. How would I do this?
The help documents say:
z <- with(Fish, data.frame(price = tapply(price, index(m)$alt, mean),
                           catch = tapply(catch, index(m)$alt, mean),
                           income = mean(income)))
# compute the marginal effects (the second one is an elasticity)
effects(m, covariate = "income", data = z)
effects(m, covariate = "price", type = "rr", data = z)
effects(m, covariate = "catch", type = "ar", data = z)
I'm not sure how to manipulate the z data frame to get the mean risk for females, or for young females, so that I can then calculate the marginal effects. Would I do them all separately? Do I somehow split the data frame by age class (say I have just 2 age classes: young and old) so that I have one data frame for the young and a separate one for the old, and then calculate the mean risk in each?
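For instance, what I have in mind is something like the following (just a sketch with placeholder names: m is my fitted mlogit model, dat is my data, and risk, gender and ageclass are my covariates; I assume any other covariates in the model would also need columns in the z data frame, and I am not sure this is how effects() is meant to be used):

# split the data by the group of interest and compute the mean risk there
dat_yf <- subset(dat, gender == "female" & ageclass == "young")
z_yf <- with(dat_yf, data.frame(risk = mean(risk)))
# then evaluate the marginal effect of risk at those group-specific values
effects(m, covariate = "risk", data = z_yf)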
What I am hoping to get from my own data is a way to interpret the magnitude of the effect on the likelihood of producing each of my categories of offspring. For example, I want to be able to say that for a 1 unit increase in risk, older females are 10% more likely to produce 2 offspring, while for the same 1 unit increase in risk, younger females are 15% more likely to produce 2 offspring.
I am not sure how to calculate the marginal effects by hand, and am therefore confused about how to get a package to do it for me. I've also tried the nnet and VGAM packages, but neither of these seems to be a great deal of help either.
I sort of got an answer; I'm not sure if it's the best one, but it worked. The covariate I was interested in happened to have just 2 classes, which meant I could turn it into a binary 0/1 numeric variable. When I reran the code, I could then calculate a mean for this "categorical" variable.
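In code, the workaround was roughly this (again a sketch with placeholder names; gender was my two-level covariate):

# recode the two-level factor as 0/1 so that a mean can be computed for it
dat$female <- as.numeric(dat$gender == "female")
z <- with(dat, data.frame(risk = mean(risk), female = mean(female)))
effects(m, covariate = "risk", data = z)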
However, I think that I am missing the point with this marginal effect and am likely using it or trying to interpret it incorrectly.
I am a novice with R and this is a very basic question. I am using lmer to fit a mixed model to a data frame, as follows:
library(lme4)
model1 <- lmer(Mass ~ Season + Area + Month + (1 | Season:Month), data = Transdata)
and then using ggplot2 to plot the fitted data and various diagnostics. For example, for fitted values:
ggplot(model1, aes(x = Season, y = Mass)) + geom_point()
gives me a plot of Mass per Season, shown separately for each of the 3 different areas and 4 different months. Is there a way in which I can get a single estimate of the mean Mass per Season integrated across the different Areas and Months (i.e. from the fixed effects), and e.g. the SEs for each estimate?
Probably the easiest way to do this is
library(emmeans)
emmeans(model1, ~ Season)
I think you could reparameterize/modify the formula so that the parameters corresponded to means per season (rather than the default, which is to fit an intercept corresponding to the first season and then parameterize the model in terms of differences between seasons), but using emmeans is easier.
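For completeness, the reparameterization I have in mind is roughly this (a sketch; note that with the default treatment contrasts for Area and Month, the Season coefficients are estimates at the baseline Area and Month rather than averages over them, which is part of why emmeans is the easier route):

library(lme4)
# drop the intercept so each Season gets its own coefficient
model1b <- lmer(Mass ~ 0 + Season + Area + Month + (1 | Season:Month),
                data = Transdata)
fixef(model1b)  # one coefficient per Season instead of differences from a baseline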
Is there a way in which I can get a single estimate of the mean Mass per Season integrated across the different Areas and Months (i.e. from the fixed effects), and e.g. the SEs for each estimate?
If I have understood the question properly, surely the output from summary(model1) will provide this. It gives a separate estimate for each level of Season apart from the reference level, and each estimate is the expected difference in Mass for that Season relative to the reference level, holding the other fixed effects constant, which would seem to answer your question.
Edit: After re-reading the question, the title seems to ask a different question from the body. As for the title:
With a lmer model, can I extract the fitted values of 'y' for the whole model
Yes, you can simply run
fitted(model1)
I have a response Y that is a proportion ranging between 0 and 1. My data are nested by taxonomy (evolutionary relationship), say phylum/genus/family/species, and I have one continuous covariate temp and one categorical covariate fac with levels fac1 and fac2.
I am interested in estimating:
is there a difference in Y between fac1 and fac2 (intercept), and how much variance is explained by that?
does each level of fac respond differently with regard to temp (linearly, so slope)?
is there a difference in Y for each level of my taxonomy, and how much variance is explained by those (see varcomp)?
does each level of my taxonomy respond differently with regard to temp (linearly, so slope)?
A brute-force idea would be to split my data at the lowest taxonomic level (here species) and run a separate beta regression for each species i, as betareg(Y(i) ~ temp). I would then extract the slope and intercept for each species, group them at a higher taxonomic level per fac, and compare the distribution of slopes (or intercepts) to the distribution I get when bootstrapping my Y values, say via Kullback-Leibler divergence. Alternatively, I could compare the distributions of slopes (or intercepts) between taxonomic levels or between the levels of my factor fac, or just compare mean slopes and intercepts between taxonomic levels or factor levels.
I am not sure if this is a good idea. I am also not sure how to answer the question of how much variance is explained by each taxonomic level, as in nested random effects in mixed models.
Another option may be mixed models themselves, but how can I include all the aspects I want to test in one model? Say I could use the "gamlss" package to do:
library(gamlss)
model <- gamlss(Y ~ temp*fac + re(random = ~1 | phylum/genus/family/species),
                family = BE)
But here I see no way to incorporate a random slope. Or can I do:
model <- gamlss(Y ~ re(random = ~temp*fac | phylum/genus/family/species),
                family = BE)
but the internal call to lme has some trouble with that, and I guess this is not the right notation anyway.
Is there any way to achieve what I want to test, not necessarily with gamlss but with any other package that supports nested structures and beta regression?
Thanks!
In glmmTMB, if you have no exact 0 or 1 values in your response, something like this should work:
library(glmmTMB)
glmmTMB(Y ~ temp*fac + (1 + temp | phylum/genus/family/species),
        data = ...,
        family = beta_family)
If you have zero values, you will need to do something about them. For example, you can add a zero-inflation term in glmmTMB; brms can handle zero-one-inflated Beta responses; or you can "squeeze" the 0/1 values in a little bit (see the appendix of Smithson and Verkuilen's paper on Beta regression). If you have only a few 0/1 values it won't matter very much what you do. If you have a lot, you'll need to spend some serious time thinking about what they mean, which will influence how you handle them. Do they represent censoring (i.e. values that aren't exactly 0/1 but are too close to the borders to measure the difference)? Are they a qualitatively different response? etc.
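The "squeeze" option is easy to do by hand; here is a minimal sketch of the transformation from the appendix of Smithson and Verkuilen (the data frame and variable names are placeholders for yours):

# (y * (n - 1) + 0.5) / n pulls exact 0s and 1s slightly into (0, 1)
squeeze01 <- function(y) {
  n <- sum(!is.na(y))
  (y * (n - 1) + 0.5) / n
}
dat$Y_squeezed <- squeeze01(dat$Y)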
As I said in my comment, computing variance components for GLMMs is pretty tricky - there's not necessarily an easy decomposition, e.g. see here. However, you can compute the variances of intercept and slope at each taxonomic level and compare them (and you can use the standard deviations to compare with the magnitudes of the fixed effects ...)
The model given here might be pretty demanding, depending on the size of your phylogeny - for example, you might not have enough replication at the phylum level (in which case you could fit the model ~ temp*(fac + phylum) + (1 + temp | phylum:(genus/family/species)), i.e. pull out the phylum effects as fixed effects).
This is assuming that you're willing to assume that the effects of fac, and its interaction with temp, do not vary across the phylogeny ...
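To make the last two suggestions a bit more concrete, here is a rough sketch (dat is a placeholder for your data frame; the reduced model is the one written out above, and VarCorr() reports the estimated intercept and slope standard deviations at each taxonomic level so you can compare them):

library(glmmTMB)
fit <- glmmTMB(Y ~ temp * (fac + phylum) +
                 (1 + temp | phylum:(genus/family/species)),
               data = dat, family = beta_family)
VarCorr(fit)  # random-intercept and random-slope SDs, one block per taxonomic level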
I have constructed a mixed effect model using lmer() with the aim of comparing the growth in reading scores for four different groups of children as they age.
I would like to plot a graph of the 4 different slopes with confidence intervals in R in order to visualize this relationship but I keep getting stuck.
I have tried the plot function and some ggplot approaches, as I have done for previous lm() models, but it isn't working so far. Here is my attempted model, which I hope looks at how the change in reading scores over time (age) interacts with a child's SESDLD grouping (this indicates whether a child has a language problem and whether they are high or low income).
AgeSES.model <- lmer(ReadingMeasure ~ Age.c*SESDLD1 + (1|childid), data = reshapedomit, REML = FALSE)
ReadingMeasure is a continuous score, Age.c is centered age measured in months, and SESDLD1 is a categorical measure with 4 levels. I would expect four positive growth trajectories in ReadingMeasure, with different intercepts and probably different slopes.
I would really appreciate any pointers on how to do this!
Thank you so much!!
The type of plot I would like to achieve - this was done in Stata
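One possible route, sketched here with the emmeans package (this is only a suggestion, the Age.c grid values are made up, and ggeffects::ggpredict would be an alternative):

library(emmeans)
# predicted ReadingMeasure for each SESDLD1 group across a grid of centered
# ages, with confidence bands; adjust the grid to the observed range of Age.c
emmip(AgeSES.model, SESDLD1 ~ Age.c,
      at = list(Age.c = seq(-24, 24, by = 6)), CIs = TRUE)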
I have a data frame based on a national survey conducted every two years; the time period is 2010-14, and I filtered the data frame so that it only contains persons who appear at least twice. In this way I have a panel data frame, but an unbalanced one.
I ran a regression to study which variables influence participation in a complementary pension scheme (it is voluntary in my country). I ran a one-way fixed effects regression, and now I want to run a two-way fixed effects regression (with both individual and time effects).
The individual variable is uid and the time variable is year. I used the plm package in R:
df.p <- plm.data(df, c("uid", "year"))
and ran the regression:
reg1 <- plm(pens ~ woman + age + I(age^2/100) + high + medium + nord + centre,
            model = "within", effect = "twoways", data = df.p)
where high and medium are dummies for education level, and nord and centre are dummies for geographic location. For the sake of simplicity I have omitted other variables that are present in the original model (20 variables).
After at least an hour of computation I ran the summary command:
summary(reg1)
and after another hour I got the error:
Error in crossprod(t(X), beta) : non-conformable arguments
so I supposed there was a multicollinearity problem and checked for it with the correlation matrix:
p1 <- with(df, data.frame(woman = woman, age = age, high = high,
                          medium = medium, nord = nord, centre = centre))
round(cor(p1), 3)
Note that I created the matrix using all the variables (omitted here for the sake of simplicity, as I wrote). I didn't find any relevant value. I also checked the variance inflation factors:
vif(p1)
and I got:
No variable from the 20 input variables has collinearity problem.
At this point I suppose the collinearity problem could be caused by the fact that I ran a two-way regression, but I don't know how to deal with it.
Thanks in advance.
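One thing that may be worth checking (a sketch, assuming a reasonably recent version of plm; detect.lindep() and the aliased component may not exist in older releases): regressors that are constant within individuals, or common to all individuals in a year, can become collinear after the two-way within transformation, and plm drops the corresponding coefficients, which can produce this kind of error.

library(plm)
detect.lindep(model.matrix(reg1))  # which transformed columns are linearly dependent
reg1$aliased                       # TRUE for coefficients that plm dropped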
In the afex package we can find this example of an ANOVA analysis:
data(obk.long, package = "afex")
# estimate mixed ANOVA on the full design:
# can be written in any of these ways:
aov_car(value ~ treatment * gender + Error(id/(phase*hour)), data = obk.long,
        observed = "gender")
aov_4(value ~ treatment * gender + (phase*hour|id), data = obk.long,
      observed = "gender")
aov_ez("id", "value", obk.long, between = c("treatment", "gender"),
       within = c("phase", "hour"), observed = "gender")
My question is: how can I write the same model in lme4? In particular, I don't know how to include the "observed" term.
If I just write
lmer(value ~ treatment * gender + (phase*hour|id), data = obk.long,
     observed = "gender")
I get an error saying that observed is not a valid option.
Furthermore, if I just remove the observed option, lmer produces the error:
Error: number of observations (=240) <= number of random effects (=240) for term (phase * hour | id); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable.
Where in the lmer syntax do I specify the "between" or "within" variables? As far as I know, you just write the dependent variable on the left side and all other variables on the right side, and the error term as (1|id).
The package "car" uses the idata for the intra-subject variable.
I might not know enough about classical ANOVA theory to answer this question completely, but I'll take a crack. First, a couple of points:
the observed argument appears only to be relevant for the computation of effect size.
observed: ‘character’ vector indicating which of the variables are
observed (i.e, measured) as compared to experimentally
manipulated. The default effect size reported (generalized
eta-squared) requires correct specification of the obsered [sic]
(in contrast to manipulated) variables.
... so I think you'd be safe leaving it out.
if you want to override the error you can use
control=lmerControl(check.nobs.vs.nRE="ignore")
... but this probably isn't the right way forward.
I think but am not sure that this is the right way:
m1 <- lmer(value ~ treatment * gender + (1 | id/phase:hour), data = obk.long,
           control = lmerControl(check.nobs.vs.nRE = "ignore",
                                 check.nobs.vs.nlev = "ignore"),
           contrasts = list(treatment = contr.sum, gender = contr.sum))
This specifies that the interaction of phase and hour varies within id. The residual variance and the (phase by hour within id) variance are confounded (which is why we need the overriding lmerControl() specification), so don't trust those particular variance estimates. However, the main effects of treatment and gender should be handled just the same. If you load lmerTest instead of lme4 and run summary(m1) or anova(m1), it gives you the same degrees of freedom (10) for the fixed (gender and treatment) effects that afex computes.
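In code, that check looks something like this (a sketch; the model has to be refit after loading lmerTest so that its anova/summary methods are used):

library(lmerTest)
m1t <- lmer(value ~ treatment * gender + (1 | id/phase:hour), data = obk.long,
            control = lmerControl(check.nobs.vs.nRE = "ignore",
                                  check.nobs.vs.nlev = "ignore"),
            contrasts = list(treatment = contr.sum, gender = contr.sum))
anova(m1t)  # denominator df for treatment and gender should match afex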
lme gives comparable answers, but needs to have the phase-by-hour interaction constructed beforehand:
library(nlme)
obk.long$ph <- with(obk.long, interaction(phase, hour))
m2 <- lme(value ~ treatment * gender,
          random = ~1 | id/ph, data = obk.long,
          contrasts = list(treatment = contr.sum, gender = contr.sum))
anova(m2, type = "marginal")
I don't know how to reconstruct afex's tests of the random effects.
As Ben Bolker correctly says, simply leave observed out.
Furthermore, I would not recommend doing what you want to do. Using a mixed model for a data set without replications within each cell of the design per participant is somewhat questionable, as it is not really clear how to specify the random effects structure. Importantly, the Barr et al. maxim of "keep it maximal" does not work here, as you realized. The problem is that the model is overparametrized (hence the error from lmer).
I recommend using the ANOVA. More discussion of exactly this question can be found in a Cross Validated thread where Ben and I discussed this more thoroughly.