I need to fit an LMM with a random effect for the interaction but without the marginal random effect, using the lme function. That is, I want to fit the model in oats.lmer (see below), but using lme from the nlme package.
The code is
require("nlme")
require("lme4")
oats.lmer <- lmer(yield ~ nitro + (1 | Block:Variety), data = Oats)
summary(oats.lmer)
#Linear mixed model fit by REML ['lmerMod']
#Formula: yield ~ nitro + (1 | Block:Variety)
# Data: Oats
#
#REML criterion at convergence: 598.1
#
#Scaled residuals:
# Min 1Q Median 3Q Max
#-1.66482 -0.72807 -0.00079 0.56416 1.85467
#
#Random effects:
# Groups Name Variance Std.Dev.
# Block:Variety (Intercept) 306.8 17.51
# Residual 165.6 12.87
#Number of obs: 72, groups: Block:Variety, 18
#
#Fixed effects:
# Estimate Std. Error t value
#(Intercept) 81.872 4.846 16.90
#nitro 73.667 6.781 10.86
#
#Correlation of Fixed Effects:
# (Intr)
#nitro -0.420
I started by trying this:
oats.lme <- lme(yield ~ nitro, data = Oats, random = ~ 1 | Block/Variety)
summary(oats.lme)
#Linear mixed-effects model fit by REML
# Data: Oats
# AIC BIC logLik
# 603.0418 614.2842 -296.5209
#
#Random effects:
# Formula: ~1 | Block
# (Intercept)
#StdDev: 14.50596
#
# Formula: ~1 | Variety %in% Block
# (Intercept) Residual
#StdDev: 11.00468 12.86696
#
#Fixed effects: yield ~ nitro
# Value Std.Error DF t-value p-value
#(Intercept) 81.87222 6.945273 53 11.78819 0
#nitro 73.66667 6.781483 53 10.86291 0
# Correlation:
# (Intr)
#nitro -0.293
#
#Standardized Within-Group Residuals:
# Min Q1 Med Q3 Max
#-1.74380770 -0.66475227 0.01710423 0.54298809 1.80298890
#
#Number of Observations: 72
#Number of Groups:
# Block Variety %in% Block
# 6 18
but the problem is that this also includes a marginal random effect for Block (the ~1 | Block term in the output above), which I want to omit.
The question is: how do I specify the random effects in oats.lme so that it is (at least structurally) identical to oats.lmer?
It can be as simple as the following:
library(nlme)
data(Oats)
## construct an auxiliary factor `f` for interaction / nesting effect
Oats$f <- with(Oats, Block:Variety)
## use `random = ~ 1 | f`
lme(yield ~ nitro, data = Oats, random = ~ 1 | f)
#Linear mixed-effects model fit by REML
# Data: Oats
# Log-restricted-likelihood: -299.0328
# Fixed: yield ~ nitro
#(Intercept) nitro
# 81.87222 73.66667
#
#Random effects:
# Formula: ~1 | f
# (Intercept) Residual
#StdDev: 17.51489 12.86695
#
#Number of Observations: 72
#Number of Groups: 18
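As a quick sanity check (assigning the fit to oats.lme2, a name not used above), the two fits agree:
oats.lme2 <- lme(yield ~ nitro, data = Oats, random = ~ 1 | f)
## -2 * logLik reproduces lmer's "REML criterion at convergence: 598.1"
-2 * as.numeric(logLik(oats.lme2))  # 598.0656
## and the standard deviations 17.51 / 12.87 match the lmer random-effects table
VarCorr(oats.lme2)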
Here is a simple logistic regression example.
set.seed(1)
df <- data.frame(out = c(0, 1, 0, 1, 0, 1, 0, 1, 0),
                 y = rep(c('A', 'B', 'C'), 3))
result <- glm(out ~ factor(y), family = 'binomial', data = df)
summary(result)
#Call:
#glm(formula = out ~ factor(y), family = "binomial", data = df)
#Deviance Residuals:
# Min 1Q Median 3Q Max
#-1.4823 -0.9005 -0.9005 0.9005 1.4823
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -6.931e-01 1.225e+00 -0.566 0.571
#factor(y)B 1.386e+00 1.732e+00 0.800 0.423
#factor(y)C 3.950e-16 1.732e+00 0.000 1.000
#(Dispersion parameter for binomial family taken to be 1)
# Null deviance: 12.365 on 8 degrees of freedom
#Residual deviance: 11.457 on 6 degrees of freedom
#AIC: 17.457
#Number of Fisher Scoring iterations: 4
My reference category is now A; the results for B and C are given relative to A. I would also like to get the results with B and with C as the reference. One can change the reference level manually via the levels argument of factor(), but this would require fitting three models. Is it possible to do this in one go, or what would be a more efficient approach?
If you want to do all pairwise comparisons, you should usually also correct for alpha-error inflation due to multiple testing. You can easily run a Tukey test with the multcomp package.
set.seed(1)
df <- data.frame(out = c(0, 1, 0, 1, 0, 1, 0, 1, 0),
                 y = rep(c('A', 'B', 'C'), 3))
# y is already a factor; if not, coerce it before fitting the model
result <- glm(out ~ y, family = 'binomial', data = df)
summary(result)
library(multcomp)
comps <- glht(result, linfct = mcp(y = "Tukey"))
summary(comps)
#Simultaneous Tests for General Linear Hypotheses
#
#Multiple Comparisons of Means: Tukey Contrasts
#
#
#Fit: glm(formula = out ~ y, family = "binomial", data = df)
#
#Linear Hypotheses:
# Estimate Std. Error z value Pr(>|z|)
#B - A == 0 1.386e+00 1.732e+00 0.8 0.703
#C - A == 0 1.923e-16 1.732e+00 0.0 1.000
#C - B == 0 -1.386e+00 1.732e+00 -0.8 0.703
#(Adjusted p values reported -- single-step method)
# compact letter display, often used in graphs and tables
cld(comps)
# A B C
#"a" "a" "a"
I am running multiple OLS regressions. I have used lm as follows:
GroupNetReturnsStockPickers <- read.csv("GroupNetReturnsStockPickers.csv",
                                        header = TRUE, sep = ",", dec = ".")
ModelGroupNetReturnsStockPickers <- lm(StockPickersNet ~ Mkt.RF + SMB + HML + WML,
                                       data = GroupNetReturnsStockPickers)
names(GroupNetReturnsStockPickers)
summary(ModelGroupNetReturnsStockPickers)
This gives me the following summary output:
Call:
lm(formula = StockPickersNet ~ Mkt.RF + SMB + HML + WML, data = GroupNetReturnsStockPickers)
Residuals:
Min 1Q Median 3Q Max
-0.029698 -0.005069 -0.000328 0.004546 0.041948
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.655e-05 5.981e-04 0.078 0.938
Mkt.RF -1.713e-03 1.202e-02 -0.142 0.887
SMB 3.006e-02 2.545e-02 1.181 0.239
HML 1.970e-02 2.350e-02 0.838 0.403
WML 1.107e-02 1.444e-02 0.766 0.444
Residual standard error: 0.009029 on 251 degrees of freedom
Multiple R-squared: 0.01033, Adjusted R-squared: -0.005445
F-statistic: 0.6548 on 4 and 251 DF, p-value: 0.624
This is perfect. However, I am running a total of 10 multiple OLS regressions, and I wish to create my own summary output as a data frame, extracting the intercept estimate, t-value, and p-value for each of the 10 analyses individually. Hence it would be a 3 × 10 table, with column names Model1, Model2, ..., Model10 and row names Estimate, t-value, and p-value.
I appreciate any help.
There are a few packages that do this (stargazer and texreg), as well as this code for outreg.
In any case, if you are only interested in the intercept, here is one approach:
# Estimate a bunch of different models, stored in a list
fits <- list() # Create empty list to store models
fits$model1 <- lm(Ozone ~ Solar.R, data = airquality)
fits$model2 <- lm(Ozone ~ Solar.R + Wind, data = airquality)
fits$model3 <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
# Combine the results for the intercept
do.call(cbind, lapply(fits, function(z) summary(z)$coefficients["(Intercept)", ]))
# RESULT:
# model1 model2 model3
# Estimate 18.598727772 7.724604e+01 -64.342078929
# Std. Error 6.747904163 9.067507e+00 23.054724347
# t value 2.756222869 8.518995e+00 -2.790841389
# Pr(>|t|) 0.006856021 1.052118e-13 0.006226638
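If you want exactly the three quantities the question asks for, subset the intercept row by column name (this reuses the fits list defined above):
do.call(cbind, lapply(fits, function(z)
  summary(z)$coefficients["(Intercept)", c("Estimate", "t value", "Pr(>|t|)")]))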
Look at the broom package, which was created to do exactly what you are asking for. The only difference is that it puts the models in rows and the different statistics in columns; I understand you would prefer the opposite, but you can work around that afterwards if necessary.
To give you an example, the function tidy() converts a model output into a data frame.
model <- lm(mpg ~ cyl, data=mtcars)
summary(model)
Call:
lm(formula = mpg ~ cyl, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.9814 -2.1185 0.2217 1.0717 7.5186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.8846 2.0738 18.27 < 2e-16 ***
cyl -2.8758 0.3224 -8.92 6.11e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
And
library(broom)
tidy(model)
yields the following data frame:
term estimate std.error statistic p.value
1 (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
2 cyl -2.87579 0.3224089 -8.919699 6.112687e-10
Look at ?tidy.lm to see more options, for instance for confidence intervals.
To combine the output of your ten models into one data frame, you could use
library(dplyr)
bind_rows(one, two, three, ..., .id = "models")
Or, if your different models are regressions on the same data frame, you can combine the two steps with dplyr:
models <- mtcars %>%
  group_by(gear) %>%
  do(data.frame(tidy(lm(mpg ~ cyl, data = .), conf.int = TRUE)))
Source: local data frame [6 x 8]
Groups: gear
gear term estimate std.error statistic p.value conf.low conf.high
1 3 (Intercept) 29.783784 4.5468925 6.550360 1.852532e-05 19.960820 39.6067478
2 3 cyl -1.831757 0.6018987 -3.043297 9.420695e-03 -3.132080 -0.5314336
3 4 (Intercept) 41.275000 5.9927925 6.887440 4.259099e-05 27.922226 54.6277739
4 4 cyl -3.587500 1.2587382 -2.850076 1.724783e-02 -6.392144 -0.7828565
5 5 (Intercept) 40.580000 3.3238331 12.208796 1.183209e-03 30.002080 51.1579205
6 5 cyl -3.200000 0.5308798 -6.027730 9.153118e-03 -4.889496 -1.5105036
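To get the intercept-only layout the question describes, the same idea works on a named list of models; a sketch reusing the airquality fits from the previous answer:
library(broom)
library(dplyr)
fits <- list(model1 = lm(Ozone ~ Solar.R, data = airquality),
             model2 = lm(Ozone ~ Solar.R + Wind, data = airquality),
             model3 = lm(Ozone ~ Solar.R + Wind + Temp, data = airquality))
bind_rows(lapply(fits, tidy), .id = "model") %>%
  filter(term == "(Intercept)") %>%
  select(model, estimate, statistic, p.value)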
I'm trying the following model with the lme4 package:
library(nlme) # for the data
data("Machines") # the data
library(lme4)
# the model:
fit1 <- lmer(score ~ -1 + Machine + (1|Worker), data=Machines)
summary(fit1)
Linear mixed model fit by REML ['lmerMod']
Formula: score ~ -1 + Machine + (1 | Worker)
Data: Machines
REML criterion at convergence: 286.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.7249 -0.5233 0.1328 0.6513 1.7559
Random effects:
Groups Name Variance Std.Dev.
Worker (Intercept) 26.487 5.147
Residual 9.996 3.162
Number of obs: 54, groups: Worker, 6
Fixed effects:
Estimate Std. Error t value
MachineA 52.356 2.229 23.48
MachineB 60.322 2.229 27.06
MachineC 66.272 2.229 29.73
Correlation of Fixed Effects:
MachnA MachnB
MachineB 0.888
MachineC 0.888 0.888
I now try to fit the same model using rstan through the glmer2stan package:
library(glmer2stan)
Machines$Machine_idx <- as.numeric(Machines$Machine)
Machines$Worker_idx <- as.numeric(as.character(Machines$Worker))
fit3 <- lmer2stan(score ~ -1 + Machine_idx + (1|Worker_idx), data=Machines)
This is the result:
> stanmer(fit3)
glmer2stan model: score ~ -1 + Machine_idx + (1 | Worker_idx) [gaussian]
Level 1 estimates:
Expectation StdDev 2.5% 97.5%
Machine_idx 7.04 0.55 5.95 8.08
sigma 3.26 0.35 2.66 4.02
Level 2 estimates:
(Std.dev. and correlations)
Group: Worker_idx (6 groups / imbalance: 0)
(Intercept) 55.09 (SE 15.82)
DIC: 287 pDIC: 7.9 Deviance: 271.3
I don't think that's the same model. Is my glmer2stan specification wrong?
I know that glmer2stan is no longer actively developed, but it should handle this simple model, shouldn't it?
UPDATE:
Thanks to Roland's tip I replaced the Machine factor with explicit dummy variables, and it now works. The original specification failed because as.numeric(Machine) collapses the three-level factor into a single numeric covariate, so the model estimates one slope (the Machine_idx estimate of 7.04 above) instead of three machine means:
Machines$Worker <- as.numeric(as.character(Machines$Worker))
m <- model.matrix(~ 0 + ., Machines)
m <- as.data.frame(m)
fit3 <- lmer2stan(score ~ -1 + (1|Worker) + MachineA + MachineB + MachineC, data=m, chains=2)
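The difference between the two codings is easy to see in the design matrices (a quick illustration using the columns created above):
## factor coding: three indicator columns, one mean per machine
head(model.matrix(~ 0 + Machine, data = Machines))
## numeric coding: a single column, a single slope
head(model.matrix(~ 0 + Machine_idx, data = Machines))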
I would like to fit a mixed-effects model that allows me to account for unequal variances across different geos. Specifically, I would like to predict response as a function of a fixed effect X, with geo as the random effect.
Here are what the data look like:
X response geo
1 4 5.521461 other
2 4 5.164786 other
3 4 5.164786 other
4 6 3.401197 other
5 5 4.867534 other
6 4 5.010635 other
Unique values for the geo column:
[1] "other" "Atlanta-Sandy Springs-Marietta, GA" "Chicago-Naperville-Joliet, IL-IN-WI" "Dallas-Fort Worth-Arlington, TX"
[5] "Houston-Sugar Land-Baytown, TX" "Los Angeles-Long Beach-Santa Ana, CA" "Miami-Fort Lauderdale-Pompano Beach, FL" "Phoenix-Mesa-Glendale, AZ"
Here is the model that I've attempted:
> lme0 <- lme(response ~ factor(predictor), random = ~ 1 | factor(geo), data = HC_hired)
> summary(lme0)
Linear mixed-effects model fit by REML
Data: HC_hired
AIC BIC logLik
54770.69 54836.3 -27377.34
Random effects:
Formula: ~1 | factor(geo)
(Intercept) Residual
StdDev: 0.08689381 0.66802
Fixed effects: response ~ factor(predictor)
Value Std.Error DF t-value p-value
(Intercept) 4.255531 0.04410213 26918 96.49264 0.0000
factor(predictor)2 0.022986 0.03336742 26918 0.68889 0.4909
factor(predictor)3 0.166341 0.03221410 26918 5.16361 0.0000
factor(predictor)4 0.299172 0.03194177 26918 9.36618 0.0000
factor(predictor)5 0.378645 0.03249053 26918 11.65402 0.0000
factor(predictor)6 0.472583 0.03664732 26918 12.89543 0.0000
Correlation:
(Intr) fct()2 fct()3 fct()4 fct()5
factor(predictor)2 -0.660
factor(predictor)3 -0.683 0.903
factor(predictor)4 -0.689 0.912 0.945
factor(predictor)5 -0.679 0.897 0.930 0.940
factor(predictor)6 -0.603 0.795 0.824 0.832 0.819
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-4.7047458 -0.3424262 0.1883132 0.7045260 2.1949313
Number of Observations: 26931
Number of Groups: 8
My issue is that the output does not show a random effect for each level of geo. What is the correct model specification to do this? I've tried many permutations of the formula without luck. Any comments on the overall process are also welcome. Many thanks in advance!
RESPONSE TO COMMENT (coercing geo to factor does not change output):
HC_hired$geo <- as.factor(HC_hired$geo)
lme0 <- lme(response ~ factor(predictor), random = ~ 1 | factor(geo), data = HC_hired)
summary(lme0)
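Two suggestions, hedged since the data aren't reproducible here. First, the fit already contains one random intercept per geo level; summary() just doesn't print them, but ranef() does. Second, to actually allow unequal residual variances across geos (the stated goal), add nlme's varIdent weights structure:
ranef(lme0)  # the estimated random intercept (BLUP) for each of the 8 geo levels
## separate residual variance per geo via varIdent (lme1 is a hypothetical name)
lme1 <- lme(response ~ factor(predictor), random = ~ 1 | geo,
            weights = varIdent(form = ~ 1 | geo), data = HC_hired)
summary(lme1)  # now reports one relative standard deviation per geo stratum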
I'm trying to fit an inflated beta regression model to proportion data, using the gamlss package with family BEINF. I'm wondering how I can extract the p-values for the mu coefficients. When I type fit.3$mu.coefficients (shown at the bottom of my R code), it gives me only the estimates of the mu coefficients. The following is an example of my data.
mydata = data.frame(y = c(0.014931087, 0.003880983, 0.006048048, 0.014931087,
                          0.016469269, 0.013111447, 0.012715517, 0.007981377),
                    index = c(1, 1, 2, 2, 3, 3, 4, 4))
mydata
y index
1 0.004517611 1
2 0.004351405 1
3 0.007952064 2
4 0.004517611 2
5 0.003434018 3
6 0.003602046 4
7 0.002370690 4
8 0.002993016 4
> library(gamlss)
> fit.3 = gamlss(y ~ factor(index), family = BEINF, data = mydata)
> summary(fit.3)
*******************************************************************
Family: c("BEINF", "Beta Inflated")
Call:
gamlss(formula = y ~ factor(index), family = BEINF, data = mydata)
Fitting method: RS()
-------------------------------------------------------------------
Mu link function: logit
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.3994 0.1204 -44.858 1.477e-06
factor(index)2 0.2995 0.1591 1.883 1.329e-01
factor(index)3 -0.2288 0.1805 -1.267 2.739e-01
factor(index)4 -0.5017 0.1952 -2.570 6.197e-02
-------------------------------------------------------------------
Sigma link function: logit
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.456 0.2514 -17.72 4.492e-07
-------------------------------------------------------------------
Nu link function: log
Nu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -21.54 10194 -0.002113 0.9984
-------------------------------------------------------------------
Tau link function: log
Tau Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -21.63 10666 -0.002028 0.9984
-------------------------------------------------------------------
No. of observations in the fit: 8
Degrees of Freedom for the fit: 7
Residual Deg. of Freedom: 1
at cycle: 12
Global Deviance: -93.08548
AIC: -79.08548
SBC: -78.52938
*******************************************************************
fit.3$mu.coefficients
(Intercept) factor(index)2 factor(index)3 factor(index)4
-5.3994238 0.2994738 -0.2287571 -0.5016511
I really appreciate all your help.
Use the save option in summary.gamlss, like this for your model above:
fit.3 <- gamlss(y ~ factor(index), family = BEINF, data = mydata)
sfit.3 <- summary(fit.3, save = TRUE)
sfit.3$mu.coef.table
sfit.3$sigma.coef.table
# to get a list of all the components in the object
str(sfit.3)
fit.3 <- gamlss(y ~ factor(index), family = BEINF, data = mydata)
sfit.3 <- summary(fit.3, save = TRUE)
fit.3$mu.coefficients
sfit.3$coef.table  # index this table with brackets [row, column]
estimate.pval <- data.frame(
  Intercept = sfit.3$coef.table[1, 1], pvalue1 = sfit.3$coef.table[1, 4],
  "factor(index)2" = sfit.3$coef.table[2, 1], pvalue2 = sfit.3$coef.table[2, 4],
  "factor(index)3" = sfit.3$coef.table[3, 1], pvalue3 = sfit.3$coef.table[3, 4],
  "factor(index)4" = sfit.3$coef.table[4, 1], pvalue4 = sfit.3$coef.table[4, 4])