I'm trying to fit an inflated beta regression model to proportion data. I'm using the gamlss package and specifying the family BEINF. I'm wondering how I can extract the p-values of the $mu.coefficients. When I typed the command fit.3$mu.coefficients (as shown at the bottom of my R code), it gave me only the estimates of the mu coefficients. The following is an example of my data.
mydata = data.frame(y = c(0.014931087, 0.003880983, 0.006048048, 0.014931087,
                          0.016469269, 0.013111447, 0.012715517, 0.007981377),
                    index = c(1, 1, 2, 2, 3, 3, 4, 4))
mydata
            y index
1 0.014931087     1
2 0.003880983     1
3 0.006048048     2
4 0.014931087     2
5 0.016469269     3
6 0.013111447     3
7 0.012715517     4
8 0.007981377     4
> library(gamlss)
> fit.3 = gamlss(y ~ factor(index), family = BEINF, data = mydata)
> summary(fit.3)
*******************************************************************
Family: c("BEINF", "Beta Inflated")
Call:
gamlss(formula = y ~ factor(index), family = BEINF, data = mydata)
Fitting method: RS()
-------------------------------------------------------------------
Mu link function: logit
Mu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.3994 0.1204 -44.858 1.477e-06
factor(index)2 0.2995 0.1591 1.883 1.329e-01
factor(index)3 -0.2288 0.1805 -1.267 2.739e-01
factor(index)4 -0.5017 0.1952 -2.570 6.197e-02
-------------------------------------------------------------------
Sigma link function: logit
Sigma Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.456 0.2514 -17.72 4.492e-07
-------------------------------------------------------------------
Nu link function: log
Nu Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -21.54 10194 -0.002113 0.9984
-------------------------------------------------------------------
Tau link function: log
Tau Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -21.63 10666 -0.002028 0.9984
-------------------------------------------------------------------
No. of observations in the fit: 8
Degrees of Freedom for the fit: 7
Residual Deg. of Freedom: 1
at cycle: 12
Global Deviance: -93.08548
AIC: -79.08548
SBC: -78.52938
*******************************************************************
fit.3$mu.coefficients
(Intercept) factor(index)2 factor(index)3 factor(index)4
-5.3994238 0.2994738 -0.2287571 -0.5016511
I really appreciate all your help.
Use the save option in summary.gamlss, like this for your model above:
fit.3 <- gamlss(y ~ factor(index), family = BEINF, data = mydata)
sfit.3 <- summary(fit.3, save = TRUE)
sfit.3$mu.coef.table
sfit.3$sigma.coef.table
# to get a list of all the slots in the object
str(sfit.3)
fit.3 <- gamlss(y ~ factor(index), family = BEINF, data = mydata)
sfit.3 <- summary(fit.3, save = TRUE)
fit.3$mu.coefficients
sfit.3$coef.table  # index individual cells with brackets []
estimate.pval <- data.frame(Intercept = sfit.3$coef.table[1, 1],
                            pvalue = sfit.3$coef.table[1, 4],
                            "factor(index)2" = sfit.3$coef.table[2, 1],
                            pvalue = sfit.3$coef.table[2, 4],
                            "factor(index)3" = sfit.3$coef.table[3, 1],
                            pvalue = sfit.3$coef.table[3, 4],
                            "factor(index)4" = sfit.3$coef.table[4, 1],
                            pvalue = sfit.3$coef.table[4, 4])
Simple logistic regression example.
set.seed(1)
df <- data.frame(out = c(0, 1, 0, 1, 0, 1, 0, 1, 0),
                 y = rep(c('A', 'B', 'C'), 3))
result <- glm(out ~ factor(y), family = 'binomial', data = df)
summary(result)
#Call:
#glm(formula = out ~ factor(y), family = "binomial", data = df)
#Deviance Residuals:
# Min 1Q Median 3Q Max
#-1.4823 -0.9005 -0.9005 0.9005 1.4823
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -6.931e-01 1.225e+00 -0.566 0.571
#factor(y)B 1.386e+00 1.732e+00 0.800 0.423
#factor(y)C 3.950e-16 1.732e+00 0.000 1.000
#(Dispersion parameter for binomial family taken to be 1)
# Null deviance: 12.365 on 8 degrees of freedom
#Residual deviance: 11.457 on 6 degrees of freedom
#AIC: 17.457
#Number of Fisher Scoring iterations: 4
My reference category is now A; results for B and C relative to A are given. I would also like to get the results when B and when C are the reference. One can change the reference manually via the levels argument of factor(), but this would require fitting three models. Is it possible to do this in one go, or what would be a more efficient approach?
If you want to do all pairwise comparisons, you should usually also correct for alpha-error inflation due to multiple testing. You can easily do a Tukey test with the multcomp package.
set.seed(1)
df <- data.frame(out = c(0, 1, 0, 1, 0, 1, 0, 1, 0),
                 y = rep(c('A', 'B', 'C'), 3))
# y must be a factor for mcp(); since R 4.0, data.frame() no longer
# converts strings to factors, so coerce explicitly
df$y <- factor(df$y)
result <- glm(out ~ y, family = 'binomial', data = df)
summary(result)
library(multcomp)
comps <- glht(result, linfct = mcp(y = "Tukey"))
summary(comps)
#Simultaneous Tests for General Linear Hypotheses
#
#Multiple Comparisons of Means: Tukey Contrasts
#
#
#Fit: glm(formula = out ~ y, family = "binomial", data = df)
#
#Linear Hypotheses:
# Estimate Std. Error z value Pr(>|z|)
#B - A == 0 1.386e+00 1.732e+00 0.8 0.703
#C - A == 0 1.923e-16 1.732e+00 0.0 1.000
#C - B == 0 -1.386e+00 1.732e+00 -0.8 0.703
#(Adjusted p values reported -- single-step method)
#letter notation often used in graphs and tables
cld(comps)
# A B C
#"a" "a" "a"
I have the (sample) dataset below:
round<-c( 0.125150, 0.045800, -0.955299, -0.232007, 0.120880, -0.041525, 0.290473, -0.648752, 0.113264, -0.403685)
square<-c(-0.634753, 0.000492, -0.178591, -0.202462, -0.592054, -0.583173, -0.632375, -0.176673, -0.680557, -0.062127)
ideo<-c(0,1,0,1,0,1,0,0,1,1)
ex<-data.frame(round,square,ideo)
When I ran the GEE regression in SPSS, I got a results table that also reports Wald confidence intervals and exponentiated coefficients. I used the gee and geepack packages in R to run the same analysis and got these results:
#gee
summary(gee(ideo ~ square + round,data = ex, id = ideo,
corstr = "independence"))
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) 1.0541 0.4099 2.572 0.1328 7.937
square 1.1811 0.8321 1.419 0.4095 2.884
round 0.7072 0.5670 1.247 0.1593 4.439
#geepack
summary(geeglm(ideo ~ square + round,data = ex, id = ideo,
corstr = "independence"))
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 1.054 0.133 63.00 2.1e-15 ***
square 1.181 0.410 8.32 0.0039 **
round 0.707 0.159 19.70 9.0e-06 ***
---
I would like to recreate exactly the SPSS table (not the results themselves, as I use a subset of the original dataset), but I do not know how to obtain all of these quantities.
A tiny bit of tidyverse magic can get the same results - more or less.
Get the information from coef(summary(geeglm())) and compute the necessary columns:
library("tidyverse")
library("geepack")
coef(summary(geeglm(ideo ~ square + round,data = ex, id = ideo,
corstr = "independence"))) %>%
mutate(lowerWald = Estimate-1.96*Std.err, # Lower Wald CI
upperWald=Estimate+1.96*Std.err, # Upper Wald CI
df=1,
ExpBeta = exp(Estimate)) %>% # Transformed estimate
mutate(lWald = exp(lowerWald),  # Lower transformed limit
       uWald = exp(upperWald))  # Upper transformed limit
This produces the following (with the data you provided). The order and names of the columns can be modified to suit your needs.
Estimate Std.err Wald Pr(>|W|) lowerWald upperWald df ExpBeta lWald uWald
1 1.0541 0.1328 62.997 2.109e-15 0.7938 1.314 1 2.869 2.212 3.723
2 1.1811 0.4095 8.318 3.925e-03 0.3784 1.984 1 3.258 1.460 7.270
3 0.7072 0.1593 19.704 9.042e-06 0.3949 1.019 1 2.028 1.484 2.772
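If you would rather avoid the tidyverse, the same columns can be added in base R, since coef(summary()) on a geeglm fit already returns a data frame. A sketch reusing the fit from the question:
fit <- geeglm(ideo ~ square + round, data = ex, id = ideo,
              corstr = "independence")
tab <- coef(summary(fit))
tab$lowerWald <- tab$Estimate - 1.96 * tab$Std.err  # lower Wald CI
tab$upperWald <- tab$Estimate + 1.96 * tab$Std.err  # upper Wald CI
tab$df <- 1
tab$ExpBeta <- exp(tab$Estimate)                    # exponentiated estimate
tab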
I have a balanced panel data set, df, that essentially consists of three variables, A, B and Y, which vary over time for a set of uniquely identified regions. I would like to run a regression that includes both regional (region in the equation below) and time (year) fixed effects. If I'm not mistaken, I can achieve this in different ways:
lm(Y ~ A + B + factor(region) + factor(year), data = df)
or
library(plm)
plm(Y ~ A + B,
data = df, index = c('region', 'year'), model = 'within',
effect = 'twoways')
In the second call I specify the indices (region and year), the model type ('within', i.e. FE), and the nature of the FE ('twoways', meaning that I'm including both region and time FE).
Although I seem to be doing things correctly, I get extremely different results. The problem disappears when I do not include time fixed effects and use the argument effect = 'individual' instead.
What's the deal here? Am I missing something? Are there any other R packages that allow to run the same analysis?
Perhaps posting an example of your data would help answer the question. I am getting the same coefficients for some made-up data. You can also use felm from the lfe package to do the same thing:
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)
model.a <- lm(y ~ a + b + factor(year) + factor(region), data = df)
summary(model.a)
# (Intercept) -0.0522691 0.1422052 -0.368 0.7132
# a 1.9982165 0.0101501 196.866 <2e-16 ***
# b -1.4787359 0.0101666 -145.450 <2e-16 ***
library(plm)
pdf <- pdata.frame(df, index = c("region", "year"))
model.b <- plm(y ~ a + b, data = pdf, model = "within", effect = "twoways")
summary(model.b)
# Coefficients :
# Estimate Std. Error t-value Pr(>|t|)
# a 1.998217 0.010150 196.87 < 2.2e-16 ***
# b -1.478736 0.010167 -145.45 < 2.2e-16 ***
library(lfe)
model.c <- felm(y ~ a + b | factor(region) + factor(year), data = df)
summary(model.c)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# a 1.99822 0.01015 196.9 <2e-16 ***
# b -1.47874 0.01017 -145.4 <2e-16 ***
This does not seem to be a data issue.
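Two quick diagnostics worth running on the real data, since one classic cause of this discrepancy is an unbalanced panel or duplicated region-year pairs. A sketch using the question's variable names (Y, A, B, region, year):
library(plm)
pdf <- pdata.frame(df, index = c("region", "year"))
pdim(pdf)                    # reports whether the panel is balanced
anyNA(df[c("Y", "A", "B")])  # lm() and plm() both drop NA rows, but check anyway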
I'm doing computer exercises in R from Wooldridge (2012) Introductory Econometrics. Specifically Chapter 14 CE.1 (data is the rental file at: https://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9781111531041)
I computed the model in differences (in Python)
model_diff = smf.ols(formula='diff_lrent ~ diff_lpop + diff_lavginc + diff_pctstu', data=rental).fit()
OLS Regression Results
==============================================================================
Dep. Variable: diff_lrent R-squared: 0.322
Model: OLS Adj. R-squared: 0.288
Method: Least Squares F-statistic: 9.510
Date: Sun, 05 Nov 2017 Prob (F-statistic): 3.14e-05
Time: 00:46:55 Log-Likelihood: 65.272
No. Observations: 64 AIC: -122.5
Df Residuals: 60 BIC: -113.9
Df Model: 3
Covariance Type: nonrobust
================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------
Intercept 0.3855 0.037 10.469 0.000 0.312 0.459
diff_lpop 0.0722 0.088 0.818 0.417 -0.104 0.249
diff_lavginc 0.3100 0.066 4.663 0.000 0.177 0.443
diff_pctstu 0.0112 0.004 2.711 0.009 0.003 0.019
==============================================================================
Omnibus: 2.653 Durbin-Watson: 1.655
Prob(Omnibus): 0.265 Jarque-Bera (JB): 2.335
Skew: 0.467 Prob(JB): 0.311
Kurtosis: 2.934 Cond. No. 23.0
==============================================================================
Now, the plm package in R gives the same results for the first-difference model:
library(plm)
modelfd <- plm(lrent ~ lpop + lavginc + pctstu,
               data = data, model = "fd")
No problem so far. However, the fixed-effects estimation reports different estimates.
modelfx <- plm(lrent ~ lpop + lavginc + pctstu,
               data = data, model = "within", effect = "time")
summary(modelfx)
The FE results should not be any different. In fact, the computer exercise question is:
(iv) Estimate the model by fixed effects to verify that you get identical estimates and standard errors to those in part (iii).
My best guess is that I am misunderstanding something about the R package.
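Without the fitted objects it is hard to be sure, but one hedged guess: effect = "time" puts the fixed effects on years, not on cities. Part (iii)'s differenced equation corresponds to the within (individual, i.e. city) estimator with the year dummy kept as a regressor, and with T = 2 that estimator is numerically identical to first differencing. A sketch, assuming the rental data carry the city, year and y90 variables from the textbook file:
pdata <- pdata.frame(rental, index = c("city", "year"))
modelfe <- plm(lrent ~ y90 + lpop + lavginc + pctstu,
               data = pdata, model = "within", effect = "individual")
summary(modelfe)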
I have to fit an LMM with an interaction random effect but without the marginal random effect, using the lme command. That is, I want to fit the model in oats.lmer (see below) but using the function lme from the nlme package.
The code is
require("nlme")
require("lme4")
oats.lmer <- lmer(yield~nitro + (1|Block:Variety), data = Oats)
summary(oats.lmer)
#Linear mixed model fit by REML ['lmerMod']
#Formula: yield ~ nitro + (1 | Block:Variety)
# Data: Oats
#
#REML criterion at convergence: 598.1
#
#Scaled residuals:
# Min 1Q Median 3Q Max
#-1.66482 -0.72807 -0.00079 0.56416 1.85467
#
#Random effects:
# Groups Name Variance Std.Dev.
# Block:Variety (Intercept) 306.8 17.51
# Residual 165.6 12.87
#Number of obs: 72, groups: Block:Variety, 18
#
#Fixed effects:
# Estimate Std. Error t value
#(Intercept) 81.872 4.846 16.90
#nitro 73.667 6.781 10.86
#
#Correlation of Fixed Effects:
# (Intr)
#nitro -0.420
I started playing with this
oats.lme <- lme(yield~nitro, data = Oats, random = (~1|Block/Variety))
summary(oats.lme)
#Linear mixed-effects model fit by REML
# Data: Oats
# AIC BIC logLik
# 603.0418 614.2842 -296.5209
#
#Random effects:
# Formula: ~1 | Block
# (Intercept)
#StdDev: 14.50596
#
# Formula: ~1 | Variety %in% Block
# (Intercept) Residual
#StdDev: 11.00468 12.86696
#
#Fixed effects: yield ~ nitro
# Value Std.Error DF t-value p-value
#(Intercept) 81.87222 6.945273 53 11.78819 0
#nitro 73.66667 6.781483 53 10.86291 0
# Correlation:
# (Intr)
#nitro -0.293
#
#Standardized Within-Group Residuals:
# Min Q1 Med Q3 Max
#-1.74380770 -0.66475227 0.01710423 0.54298809 1.80298890
#
#Number of Observations: 72
#Number of Groups:
# Block Variety %in% Block
# 6 18
but the problem is that it also includes a marginal random effect for Block, which I want to omit.
The question is: how do I specify the random effects in oats.lme so that it is (at least structurally) identical to oats.lmer?
It can be as simple as the following:
library(nlme)
data(Oats)
## construct an auxiliary factor `f` for interaction / nesting effect
Oats$f <- with(Oats, Block:Variety)
## use `random = ~ 1 | f`
lme(yield ~ nitro, data = Oats, random = ~ 1 | f)
#Linear mixed-effects model fit by REML
# Data: Oats
# Log-restricted-likelihood: -299.0328
# Fixed: yield ~ nitro
#(Intercept) nitro
# 81.87222 73.66667
#
#Random effects:
# Formula: ~1 | f
# (Intercept) Residual
#StdDev: 17.51489 12.86695
#
#Number of Observations: 72
#Number of Groups: 18
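The variance components (StdDev 17.51 for the interaction, 12.87 residual) match oats.lmer, so the structure is now the same. If the Block-Variety design were incomplete, interaction() with drop = TRUE would be a safer way to build the grouping factor, since it drops empty cells:
## equivalent construction; drop = TRUE removes unused level combinations
Oats$f <- interaction(Oats$Block, Oats$Variety, drop = TRUE)
lme(yield ~ nitro, data = Oats, random = ~ 1 | f)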
I'm working with mixed-effects logistic regression models with a single random effect (fitted with glmer), and I am struggling to find a way to produce predicted probabilities and the corresponding 95% CIs. I have been able to do this for fixed-effects models using the following type of code:
Call:
glm(formula = survive/trials ~ class, family = binomial(logexp(vespdata$expos)),
data = vespdata)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.6823 0.2621 0.4028 0.4540 0.6935
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.6774 0.5796 8.069 7.07e-16 ***
class2 -1.3236 0.6957 -1.903 0.0571 .
class3 -0.5751 0.9170 -0.627 0.5306
class4 -1.0806 0.9217 -1.172 0.2411
class5 -1.2889 0.6564 -1.964 0.0496 *
class6 -1.5379 0.6508 -2.363 0.0181 *
class8 -1.2078 0.6957 -1.736 0.0825 .
vesppredict2 = with(vespdata, data.frame(class = gl(7, 1)))
vesppredict2 = cbind(vesppredict2, predict(vespclass.exp, newdata = vesppredict2,
type = "link", se = TRUE))
vesppredict2 = within(vesppredict2,
{PredictedProb = (plogis(fit))^23
LL = (plogis(fit - (1.96 * se.fit)))^23
UL = (plogis(fit + (1.96 * se.fit)))^23
ErrorBar = (UL-PredictedProb)
})
The problem I'm having is that predict() cannot use the argument se = TRUE for mixed-effects models. I tried adding the argument re.form = NA, but to no avail. I'd be grateful for any tips!
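One common workaround is the recipe from Ben Bolker's GLMM FAQ: build the fixed-effects design matrix for the new data and propagate the fixed-effects covariance by hand, which gives the population-level (re.form = NA) predictions with Wald CIs. A sketch, where m is a hypothetical glmer fit with a plain logit link and nd the new-data frame (both placeholders), and the ^23 exposure scaling is carried over from the question; note these intervals ignore the uncertainty in the random effects (bootMer() is the heavier alternative if that matters):
library(lme4)
X <- model.matrix(delete.response(terms(m)), nd)     # fixed-effects design matrix
eta <- drop(X %*% fixef(m))                          # linear predictor
se <- sqrt(diag(X %*% as.matrix(vcov(m)) %*% t(X)))  # SEs, fixed effects only
nd$PredictedProb <- plogis(eta)^23
nd$LL <- plogis(eta - 1.96 * se)^23
nd$UL <- plogis(eta + 1.96 * se)^23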