I am wondering how to easily find the variation within the fixed effects used in a feols regression. In the code below, the only fixed effect is Type (CO2 is the data set, not a fixed effect). How do I obtain the variance within the fixed effect?
library(fixest)
library(car)
library(pander)
## Using the built-in CO2 data frame, run the regression
i <- feols(conc ~ uptake + Treatment | Type, CO2, vcov = "hetero")
summary(i)
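One reading of "variation within the fixed effects" is the spread of the estimated fixed-effects coefficients. A minimal sketch under that reading, using fixest's fixef(); summarizing with var() is my assumption, not something stated in the original post:
# fixef() returns a list with one named vector per fixed-effect dimension
fe <- fixef(i)
# variance of the estimated effects within each dimension
sapply(fe, var)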
I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm (for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight_Pre)
svyglm(I(outcome3_Post - outcome3_Pre) ~ group_Pre, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
However, for categorical outcomes, I don't know how to do the analyses. WeMix::mix() has a weights parameter, but I'm not sure whether it treats them as sampling weights. Besides, that function doesn't support the multinomial family.
So, to sum up: can you enlighten me on how to do a pre-post analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.
Below are some simulated data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
  id = rep(1:5, 2),
  time = c(rep("Pre", 5), rep("Post", 5)),
  outcome1 = sample(c("Yes", "No"), 10, replace = TRUE),
  outcome2 = sample(c("Low", "Medium", "High"), 10, replace = TRUE),
  outcome3 = rnorm(10),
  group = rep(sample(c("Man", "Woman"), 5, replace = TRUE), 2),
  weight = rep(c(1, 0.5, 1.5, 0.75, 1.25), 2)
)
data_wide <- dcast(data_long, id ~ time,
                   value.var = c('outcome1', 'outcome2', 'outcome3', 'group', 'weight')
                   )[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer and glmer with the variables used to calculate the weights as predictors. It turns out that glmer returns a lot of problems (convergence, high eigenvalues...), so I took another look at @ThomasLumley's answer in this post and elsewhere (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r).
So, my question is now whether I can use participants' ids as clusters in svydesign
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.875e+01 1.000e+00 18.746 0.0339 *
groupWoman -1.903e+01 1.536e+00 -12.394 0.0513 .
timePre 5.443e-09 5.443e-09 1.000 0.5000
groupWoman:timePre 2.877e-01 1.143e+00 0.252 0.8431
and still interpret groupWoman:timePre as the difference in the average rate of change in the outcome over time between sex groups, as if I were using a mixed model with participants as random effects.
Thank you once again!
A linear model with svyglm does not give the same parameter estimates as lme4::lmer. It does estimate the same parameters as lme4::lmer if the model is correctly specified, though.
Generalised linear models with svyglm or svy_vglm don't estimate the same parameters as lme4::glmer, as you note. However, they do estimate perfectly good regression parameters, and if you aren't specifically interested in the variance components or in estimating the realised random effects (BLUPs), I would recommend just using svy_vglm.
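A minimal sketch of that route for the multinomial outcome in the question's data, assuming svyVGAM::svy_vglm takes a formula, a design, and a VGAM family, as in the package vignette:
library(survey)
library(svyVGAM)
# cluster on participant id, as in the EDIT above
w_long <- svydesign(ids = ~id, weights = ~weight, data = data_long)
# 3-level outcome; multinomial is VGAM's multinomial-logit family
m_mult <- svy_vglm(factor(outcome2) ~ group * time, design = w_long,
                   family = multinomial)
summary(m_mult)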
Another option, if you have non-survey software for random-effects versions of the models, is to use that. If you scale the weights to sum to the sample size and if all the clustering in the design is modelled by random effects in the model, you will get at least a reasonable approximation to valid inference. That's what I've seen recommended for Bayesian survey modelling, for example.
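A minimal sketch of that last option with the question's simulated data and lme4; the scaling rule is the one just described, and whether lmer's weights argument is an adequate stand-in for sampling weights is exactly the approximation being discussed:
library(lme4)
# rescale the sampling weights so they sum to the sample size
data_long$w_scaled <- data_long$weight * nrow(data_long) / sum(data_long$weight)
m_scaled <- lmer(outcome3 ~ group * time + (1 | id), data = data_long,
                 weights = w_scaled)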
I started using RStudio a few days ago and I am struggling a bit to compute a VIF. Here is the situation:
I have panel data and ran fixed-effects and random-effects regressions. I have one dependent variable (New_biz_density) and 2 independent variables (Cost_to_start, Capital_requirements). I would like to check whether my two independent variables present multicollinearity by computing their Variance Inflation Factor, both for the fixed- and random-effects models.
I already installed some packages to compute the VIF (faraway, car) but did not manage to do it. Does anybody know how to do it?
Here is my script:
# install.packages("plm")
library(plm)
mydata <- read.csv("/Users/juliantabone/Downloads/DATAweakoutliers.csv")
Y <- cbind(new_biz_density = mydata$new_biz_density)
X <- cbind(capital_requirements = mydata$capital_requirements,
           cost_to_start = mydata$cost_to_start)
# Set data as panel data (plm.data is deprecated; use pdata.frame instead)
pdata <- pdata.frame(mydata, index = c("country_code", "year"))
# Descriptive statistics
summary(Y)
summary(X)
# Pooled OLS estimator
pooling <- plm(Y ~ X, data=pdata, model= "pooling")
summary(pooling)
# Between estimator
between <- plm(Y ~ X, data=pdata, model= "between")
summary(between)
# First differences estimator
firstdiff <- plm(Y ~ X, data=pdata, model= "fd")
summary(firstdiff)
# Fixed effects or within estimator
fixed <- plm(Y ~ X, data=pdata, model= "within")
summary(fixed)
# Random effects estimator
random <- plm(Y ~ X, data=pdata, model= "random")
summary(random)
# LM test for random effects versus OLS
plmtest(pooling)
# LM test for fixed effects versus OLS
pFtest(fixed, pooling)
# Hausman test for fixed versus random effects model
phtest(random, fixed)
There seem to be two popular ways of calculating VIFs (Variance Inflation Factors, used to detect collinearity among variables in a regression) in R:
The vif() function in the car package, where the input is the fitted model. This requires you to fit the model before you can check the VIFs of its variables.
The corvif() function, where the inputs are the actual candidate explanatory variables (i.e. a list of variables, before the model is even fitted). This function is part of the AED package (Zuur et al. 2009), which has been discontinued. It works only on a list of variables, not on a fitted regression model.
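For the car route with the panel setup above, one workable sketch exploits the fact that VIFs depend only on the regressors, so an auxiliary pooled lm() fit is enough (column names are taken from the question's script; this is an illustration, not the only way):
library(car)
# VIFs are computed from the design matrix alone, so a plain pooled
# lm() fit is sufficient for the collinearity diagnostic
aux <- lm(new_biz_density ~ capital_requirements + cost_to_start, data = mydata)
vif(aux)
With exactly two regressors, both VIFs equal 1 / (1 - r^2), where r is the correlation between them.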
To compute VIFs, see the non_collinear_vars() function from the metan package.
To compute the VIF between the two variables Cost_to_start and Capital_requirements, use:
library(metan)
non_collinear_vars(X)
I'm trying to standardize regression coefficients for a linear regression model which has interaction terms. Currently, I'm using lm.beta from the QuantPsyc package, but the help file states:
Warning: This function does not produce 'correct' standardized
coefficients when interaction terms are present
Since my regression has interaction terms, this is worrying. Is there an alternative to QuantPsyc's lm.beta which standardizes regression coefficients and works with regression models with interactions?
You can use the scale function to scale your data before passing it to lm. This does not require any extra packages and gives the standardized regression. Here is a simple example:
iris2 <- iris
# standardize the two predictors (mean 0, sd 1) before fitting
iris2[ , c('Sepal.Length', 'Sepal.Width')] <- scale(iris2[ , c('Sepal.Length', 'Sepal.Width')])
fit <- lm(Petal.Width ~ Sepal.Length * Sepal.Width, data = iris2)
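If you also want the response in standard-deviation units (fully standardized coefficients), you can scale the outcome the same way; a small extension of the example above:
# also standardize the response, so all coefficients are in SD units
iris2$Petal.Width <- scale(iris2$Petal.Width)
fit_std <- lm(Petal.Width ~ Sepal.Length * Sepal.Width, data = iris2)
coef(fit_std)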
Stata allows for fixed-effects and random-effects specifications of the logistic regression through the xtlogit, fe and xtlogit, re commands respectively. I was wondering what the equivalent commands for these specifications in R are.
The only similar specification I am aware of is the mixed effects logistic regression
mymixedlogit <- glmer(y ~ x1 + x2 + x3 + (1 | x4), data = d, family = binomial)
but I am not sure whether this maps to any of the aforementioned commands.
The glmer function is used to quickly fit logistic regression models with varying intercepts and varying slopes (or, equivalently, mixed models with fixed and random effects).
To fit a varying intercept multilevel logistic regression model in R (that is, a random effects logistic regression model), you can run the following using the in-built "mtcars" data set:
library(lme4)
data(mtcars)
head(mtcars)
m <- glmer(mtcars$am ~ 1 + mtcars$wt + (1|mtcars$gear), family="binomial")
summary(m)
# and you can examine the fixed and random effects
fixef(m); ranef(m)
To fit the analogous varying-intercept model in Stata, you would of course use the xtlogit command (using the similar, but not identical, in-built "auto" data set in Stata):
sysuse auto
xtset gear_ratio
xtlogit foreign weight, re
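For the xtlogit, fe side of the question, the usual R analogue is a conditional (fixed-effects) logit. A minimal sketch with survival::clogit on the mtcars variables used above; pairing clogit with xtlogit, fe is my gloss, worth verifying for your application:
library(survival)
# conditional logit: strata() conditions out a separate intercept per gear group
m_fe <- clogit(am ~ wt + strata(gear), data = mtcars)
summary(m_fe)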
I'll add that I find the entire reference to "fixed" versus "random" effects ambiguous, and I prefer to refer to the structure of the model itself (e.g., are the intercepts varying? which slopes are varying, if any? is the model nested in 2 levels or more? are the levels cross-classified or not?). For a similar view, see Andrew Gelman's thoughts on "fixed" versus "random" effects.
Update: Ben Bolker's excellent comment below points out that in R it's more informative when using predict commands to use the data=mtcars option instead of, say, the dollar notation:
data(mtcars)
m1 <- glmer(mtcars$am ~ 1 + mtcars$wt + (1|mtcars$gear), family="binomial")
m2 <- glmer(am ~ 1 + wt + (1|gear), family="binomial", data=mtcars)
p1 <- predict(m1); p2 <- predict(m2)
names(p1) # not that informative...
names(p2) # very informative!
I am using lmer to fit a multilevel polynomial regression model with several fixed effects (including subject-specific variables like age, short-term memory span, etc.) and two sets of random effects (Subject and Subject:Condition). Now I would like to predict data for a hypothetical subject with particular properties (age, short-term memory span, etc.). I fit the model (m) and created a new data frame (pred) that contains my hypothetical subject, but when I tried predict(m, pred) I got an error:
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "mer"
I know I could use the brute-force method of extracting fixed effects from my model and multiplying it all out, but is there a more elegant solution?
You can do this type of extrapolated prediction easily with the merTools package for R: http://www.github.com/jknowles/merTools
merTools includes a function called predictInterval which provides robust prediction capabilities for lmer and glmer fits. Specifically, you can use this function to predict extrapolated data, and to obtain prediction intervals that account for the variance in both the fixed and random effects, as well as the residual error of the model.
Here's a quick code example:
library(lme4)
library(merTools)
# fit a varying-intercept model to lme4's built-in sleepstudy data
m1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)
predOut <- predictInterval(m1, newdata = sleepstudy, n.sims = 100)
# extrapolated data
extrapData <- sleepstudy[1:10, ]
extrapData$Days <- 20   # well beyond the 0-9 range observed in sleepstudy
extrapPred <- predictInterval(m1, newdata = extrapData)
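As an aside, current lme4 versions return fits of class merMod rather than mer, and those do have a predict() method; a minimal sketch for population-level predictions that ignore the random effects:
# re.form = NA drops the random effects, which is what you want for a
# hypothetical new subject that was not in the training data
p <- predict(m1, newdata = extrapData, re.form = NA)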