I am a novice with R and this is a very basic question. I am using lmer to fit a mixed model to a data frame, as follows:
model1=lmer(Mass~Season + Area + Month + (1|Season:Month), data=Transdata)
and then using ggplot2 to plot the fitted data and various diagnostics. For example, for fitted values:
ggplot(model1, aes(x = Season, y = Mass)) + geom_point()
gives me a plot of Mass per Season, shown separately for each of the 3 different areas and 4 different months. Is there a way in which I can get a single estimate of the mean Mass per Season integrated across the different Areas and Months (i.e. from the fixed effects), and e.g. the SEs for each estimate?
Probably the easiest way to do this is
library(emmeans)
emmeans(model1, ~ Season)
I think you could reparameterize/modify the formula so that the parameters corresponded to means per season (rather than the default, which is to fit an intercept corresponding to the first season and then parameterize the model in terms of differences between seasons), but using emmeans is easier.
Is there a way in which I can get a single estimate of the mean Mass per Season integrated across the different Areas and Months (i.e. from the fixed effects), and e.g. the SEs for each estimate?
If I have understood the question properly, surely just the output from summary(model1) will provide this. It will give a separate estimate for each level of Season, apart from the reference level, and each estimate is then the expected difference in Mass for each Season relative to the reference level, keeping the other fixed effects constant, which would seem to answer your question.
Edit: After re-reading the question, the title seems to ask a different question to the body. As for the title:
With a lmer model, can I extract the fitted values of 'y' for the whole model
Yes, you can simply run
fitted(model1)
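Note that fitted() includes the estimated random effects. If what is wanted is the prediction from the fixed effects alone, predict() with re.form = NA gives that. A minimal sketch, assuming lme4's built-in sleepstudy data as a stand-in for Transdata (which is not shown in the question):

```r
library(lme4)

# sleepstudy ships with lme4; it only stands in for the asker's data
m <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

cond <- fitted(m)                 # fixed effects + estimated random intercepts
pop  <- predict(m, re.form = NA)  # fixed effects only

head(cbind(conditional = cond, population = pop))
```

The same call on model1 would be predict(model1, re.form = NA).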
I have constructed a mixed effect model using lmer() with the aim of comparing the growth in reading scores for four different groups of children as they age.
I would like to plot a graph of the 4 different slopes with confidence intervals in R in order to visualize this relationship but I keep getting stuck.
I have tried to use the plot function and some versions of ggplot as I have done for previous lm() models, but it isn't working so far. Here is my attempted model, which I hope looks at how the change in reading scores over time (age) interacts with a child's SESDLD grouping (this indicates whether a child has a language problem and whether they are high or low income).
AgeSES.model <- lmer(ReadingMeasure ~ Age.c*SESDLD1 + (1|childid), data = reshapedomit, REML = FALSE)
ReadingMeasure is a continuous score, and Age.c is centered age measured in months. SESDLD1 is a categorical measure with 4 levels. I would expect four positive slopes of ReadingMeasure growth with different intercepts and probably differing slopes.
I would really appreciate any pointers on how to do this!
Thank you so much!!
The type of plot I would like to achieve - this was done in Stata
I have panel data that includes income for individuals over years, and I am interested in the income trends of individuals, i.e. individual coefficients for income over years, and residuals for each individual for each year (the unexpected changes in income according to my model). However, many individuals have missing income data for one or more years, so with a linear regression I lose the majority of my observations. The data structure is like this:
caseid<-c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4)
years<-c(1998,2000,2002,2004,2006,2008,1998,2000,2002,2004,2006,2008,
1998,2000,2002,2004,2006,2008,1998,2000,2002,2004,2006,2008)
income<-c(1100,NA,NA,NA,NA,1300,1500,1900,2000,NA,2200,NA,
NA,NA,NA,NA,NA,NA, 2300,2500,2000,1800,NA, 1900)
df<-data.frame(caseid, years, income)
I first decided to use a random effects model, which I think will still predict income for missing years by using a maximum likelihood approach. However, since the Hausman test gives a significant result, I decided to use a fixed effects model instead. I ran the code below, using the plm package:
inc.fe<-plm(income~years, data=df, model="within", effect="individual")
However, I get coefficients only for years and not for individuals; and I cannot get residuals.
To maybe give an idea, the code in Stata should be
xtest caseid
xtest income year
predict resid, resid
Then I tried to run the pvcm function from the same library, which is a function for variable coefficients.
inc.wi<-pvcm(Income~Year, data=ldf, model="within", effect="individual")
However, I get the following error message:
"Error in FUN(X[[i]], ...) : insufficient number of observations".
How can I get individual coefficients and residuals with pvcm by resolving this error or by using some other function?
My original long form data has 202976 observations and 15 years.
Does the fixef function from package plm give you what you are looking for?
Continuing your example:
fixef(inc.fe)
Residuals are extracted by:
residuals(inc.fe)
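As a cross-check, the same within estimator can be obtained from base R's lm() with one dummy per individual (least squares dummy variables). This is only a sketch on the example data from the question, not something tested on the full 202976-row data. na.action = na.exclude is used so that the residuals line up with the original rows, with NA where income is missing:

```r
caseid <- rep(1:4, each = 6)
years  <- rep(seq(1998, 2008, by = 2), times = 4)
income <- c(1100, NA, NA, NA, NA, 1300, 1500, 1900, 2000, NA, 2200, NA,
            NA, NA, NA, NA, NA, NA, 2300, 2500, 2000, 1800, NA, 1900)
df <- data.frame(caseid, years, income)

# One intercept per individual, common slope for years
lsdv <- lm(income ~ years + factor(caseid), data = df, na.action = na.exclude)

coef(lsdv)                   # slope for years plus individual intercept shifts
df$resid <- residuals(lsdv)  # padded with NA for rows with missing income
df
```

Individual 3 has no observed income at all, so it drops out of the fit and gets neither an intercept nor residuals.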
You have a random effects model with random slopes and intercepts. This is also known as a random coefficients regression model. The missingness is the tricky part, which (I'm guessing) you'll have to write custom code to solve after you choose how you wish to do so.
But you haven't clearly/properly specified your model (at least in your question) as far as I can tell. Let's define some terms:
Let Y_it = income for individual i (i = 1,..., N) in year t (t = 1,..., T). As I read your question, you have not specified which of the two models below you wish to have:
M1: random intercepts, global slope, random slopes
Y_it ~ N(\mu_i + B T + \gamma_i I T, \sigma^2)
\mu_i ~ N(\phi_0, \tau_0^2)
\gamma_i ~ N(\phi_1, \tau_1^2)
M2: random intercepts, random slopes
Y_it ~ N(\mu_i + \gamma_i I T, \sigma^2)
\mu_i ~ N(\phi_0, \tau_0^2)
\gamma_i ~ N(\phi_1, \tau_1^2)
Also, your example data is nonsensical (see below). As you can see, you don't have enough observations to estimate all parameters. I'm not familiar with library(plm) but the above models (without missingness) can be estimated in lme4 easily. Without a realistic example dataset, I won't bother providing code.
R> table(df$caseid, is.na(df$income))
    FALSE TRUE
  1     2    4
  2     4    2
  3     0    6
  4     5    1
Given that you do have missingness, you should be able to produce estimates for either hierarchical model via the typical methods, such as EM. But I do think you'll have to write the code to do the estimation yourself.
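For reference, a rough sketch of what an M2-style fit could look like in lme4. The data here are simulated (all names and numbers are made up for illustration, not taken from the question). By default lmer() drops rows with a missing response, which is one way, though not the only one, to deal with the gaps:

```r
library(lme4)

set.seed(42)
n_id <- 50; n_wave <- 6
d <- data.frame(caseid = factor(rep(seq_len(n_id), each = n_wave)),
                wave   = rep(0:(n_wave - 1), times = n_id))
u0 <- rnorm(n_id, sd = 2)    # individual intercept deviations
u1 <- rnorm(n_id, sd = 0.5)  # individual slope deviations
d$income <- 15 + u0[d$caseid] + (1 + u1[d$caseid]) * d$wave + rnorm(nrow(d))
d$income[sample(nrow(d), 90)] <- NA  # knock out 30% of incomes at random

# Random intercept and random slope per individual; this also estimates
# their correlation (use (1 + wave || caseid) to leave it out)
m2 <- lmer(income ~ wave + (1 + wave | caseid), data = d)

head(coef(m2)$caseid)  # per-individual intercept and slope
head(residuals(m2))    # residuals for the rows actually used
```

Individuals with no observed income at all contribute nothing and get no row in coef(m2)$caseid.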
I have asked this question on Cross Validated, but think I might not get help there as this is more of a programming question than one about theory or interpretation of the statistics.
I am trying to use the mlogit package in R and have been following the vignette trying to figure out how to get the marginal effects for my data. The example provided uses continuous variables, but I am wondering how to do this with categorical explanatory variables.
I have a value of risk which is continuous as a covariate, but I also have age, class, and gender as covariates. I want to see the marginal effects of "females" only or of "Young - females" in regard to risk. How would I do this?
The help documents say:
z <- with(Fish, data.frame(price = tapply(price, index(m)$alt, mean),
                           catch = tapply(catch, index(m)$alt, mean),
                           income = mean(income)))
# compute the marginal effects (the second one is an elasticity)
effects(m, covariate = "income", data = z)
effects(m, covariate = "price", type = "rr", data = z)
effects(m, covariate = "catch", type = "ar", data = z)
I'm not sure how to manipulate the z data frame to get the mean risk for females or young females to then be able to calculate the marginal effects. Would I do them all separately? Do I somehow divide the data frame by age class (say I have just 2 age classes: young and old) so that I have 1 data frame for the young, and a separate new data frame for old, then calculate mean risk?
What I am hoping to get from my own data is to be able to interpret the magnitude of the likelihood in producing my categories of offspring. As an example, what I want to say is that if there is a 1 unit increase in risk, it is 10% more likely for older females to produce 2 offspring. As there is a 1 unit increase in risk, younger females are 15% more likely to produce 2 offspring.
I am not sure how to calculate the marginal effects by hand, and therefore am confused as to how to get a package to do it for me. I've also been trying the nnet and VGAM packages, but neither of these seems to give a great deal of help either.
I sort of got an answer. I'm not sure if it's the best, but it worked. The covariate I was interested in happened to have just 2 classes, which means I could turn it into a binary 0/1 numeric variable. When I reran the code, I could then calculate the mean for this "categorical" variable.
However, I think that I am missing the point with this marginal effect and am likely using it or trying to interpret it incorrectly.
So I've looked at a number of similarly themed posts but none of them seem to be exactly what I need, or I simply don't really understand the solutions they offered... So here it goes...
I ran a mixed-effects model with lme4 to look at some chimpanzee data. I have two factors (aggression rate; copulation rate) which affect my dependent (feeding time).
I would like to produce two scatter plots which show the relationship between each of the predictors and the outcome variable but I would like to draw a line, which is derived from the model estimates (and not an abline of the (lm(y ~ x)) type, which only gives a simple regression line, not one based on the full LMM).
I have a sense that this is only possible with ggplot2, but I have not been able to figure out how to do it. Having spent most of the day looking through books and forums, I was hoping this is something that may have a fairly straightforward answer, if one knows what they are doing.
Thanks for any tips in advance!
Alex
To begin with I had the following model:
M3reml
Linear mixed model fit by REML ['lmerMod']
Formula: z.feeding_time ~ z.copul_rate + z.agro_given + z.agro_recd + (1 | Male) + ac_term
Data: N85
where the variables are the z-transformed values of: male chimpanzee feeding time (z.feeding_time); daily copulation rates with females (acts/hr; z.copul_rate); daily rate of aggression given (z.agro_given); and daily rate of aggression received (z.agro_recd). The random effect is male ID for the 12 males in my study, and ac_term is a temporal autocorrelation term.
I wanted to produce a regression line based on the model estimates for male feeding time.
Getting the estimates:
p1<-predict(M3reml)
Plotting the estimates against male rates of aggression (z-transformed values):
plot(p1~z.agro_given, data=N85)
Adding a regression line:
abline(lm(p1~z.agro_given, data=N85))
I would post an image of the plot here but apparently I am not allowed to yet.
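One caveat with the approach above: predict(M3reml) includes the random effects and the contributions of the other predictors, so lm() on those predictions is not quite the model's own line. A possible alternative is to predict over a grid of the predictor of interest with the other predictors held fixed and the random effects switched off (re.form = NA). The sketch below uses simulated data with made-up names (d, x1, x2, y) standing in for N85, since the real data are not shown:

```r
library(lme4)

set.seed(1)
# Simulated stand-in for N85: 12 males, two z-scored predictors
d <- data.frame(Male = factor(rep(1:12, each = 10)),
                x1 = rnorm(120), x2 = rnorm(120))
d$y <- 0.5 * d$x1 - 0.3 * d$x2 + rnorm(12)[d$Male] + rnorm(120)

m <- lmer(y ~ x1 + x2 + (1 | Male), data = d)

# Grid over x1, with x2 held at its mean (0 for z-scores)
grid <- data.frame(x1 = seq(min(d$x1), max(d$x1), length.out = 50), x2 = 0)
grid$fit <- predict(m, newdata = grid, re.form = NA)  # fixed effects only

plot(y ~ x1, data = d)        # raw data
lines(fit ~ x1, data = grid)  # line implied by the mixed model
```

For M3reml the grid would need a column for every fixed-effect term in the formula, including ac_term, each set to some reference value.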