In my thesis I estimated different "within" and "pooling" models using plm() from the plm package. Additionally, I modified some models by adding a time lag. All the models work well and I have my results. Now I would like to visualize the models by showing their equations. So my question is:
Is there a way to extract the equation from the model?
I would need it in its most basic form, before any calculation is done ... more or less like this, because it is not about showing my results but the math I use.
For my models I use a panel dataset, and my models look more or less like this (just with more control variables):
model1 <- plm(a ~ b + c, data = data, model = "within")
Thank you
Yeah. The formula is embedded in the model object.
mod <- plm(mpg ~ hp, data = mtcars)
mod$call  # the stored call, including the formula
data("Grunfeld", package="plm")
grun.fe <- plm(inv~value+capital, data = Grunfeld, model = "within")
grun.fe$call
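The bare formula (before any estimation) is also stored in the fit. A small sketch continuing the Grunfeld example above (plm fits keep a formula component, and the formula() extractor should return it):
grun.fe$formula            # inv ~ value + capital
formula(grun.fe)           # the same, via the generic extractor
deparse(formula(grun.fe))  # as a character string, e.g. for pasting into text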
Also, check out the equatiomatic package to render the formula in LaTeX:
https://datalorax.github.io/equatiomatic/index.html
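For example, a sketch: extract_eq() is equatiomatic's main function, shown here on a plain lm fit because I am not certain plm objects are supported directly.
library(equatiomatic)
fit <- lm(mpg ~ hp, data = mtcars)
extract_eq(fit)                    # LaTeX for the theoretical model
extract_eq(fit, use_coefs = TRUE)  # LaTeX with the estimated coefficients filled in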
Good luck!
I'm trying to find out how well my mixed model with a family effect fits the data. Is it possible to extract R-squared values from lmekin fits? And if so, is it possible to extract partial R-squared values for each of the covariates?
Example:
model <- lmekin(formula = height ~ score + sex + age + (1 | IID),
                data = phenotype_df, varlist = kinship_matrix)
I have tried the MuMIn package, but it doesn't seem to work with lmekin models. Thanks.
I am able to use the r.squaredLR() function:
library(coxme)  # provides lmekin()
library(MuMIn)  # provides r.squaredLR()
data(ergoStool, package = "nlme")  # use a data set from nlme
fit1 <- lmekin(effort ~ Type + (1 | Subject), data = ergoStool)
r.squaredLR(fit1)  # likelihood-ratio-based R-squared
(I am pretty sure that works, but one thing that is great to do is to create a reproducible example so others can run your code to double-check. For example, I am not exactly sure what phenotype_df looks like, so I am not able to run your code as it is. A great resource for this is the reprex package.)
I have two responses which conform to beta (also known as betar) and Poisson families, and I am looking into fitting additive mixed models with beta and quasi-families (the count data are over-dispersed), respectively.
I am aware that I could use the gamm function from the mgcv package, which accepts both beta and quasi-families; however, it uses PQL, and the AIC it reports is not useful for comparing models, which is the primary objective of my analyses.
In the case of the count response, I am aware that QAIC has been used for ranking/comparing overdispersed mixed models, but I cannot find anything that says it is appropriate for overdispersed GAMMs.
I understand these are potentially two questions in one but they both have a common theme of model selection with extended families and potentially have different solutions. Below I provide reproducible examples for each case.
## generate data
library(gamm4)
library(mgcv)
dat <- gamSim(1, n = 400, scale = 2)
dat <- subset(dat, select = c(x0, x1, x2, x3, f))
dat$g <- as.factor(sample(1:20, 400, replace = TRUE))  # random factor
dat$yb <- runif(400)  # yb ranges between 0 and 1, hence fitted with a beta family
dat$f <- dat$f + model.matrix(~ g - 1, dat) %*% rnorm(20) * 2  # add group-level shifts
dat$yp <- rpois(400, exp(dat$f / 7))  # yp is a count, hence Poisson family
# beta family example with the gamm function (this runs, but I am not sure
# whether the subsequent model comparisons are valid!)
m1b <- gamm(yb ~ s(x0) + s(x1) + s(x2) + s(x3), family = betar(link = "logit"),
            data = dat, random = list(g = ~1))
m2b <- gamm(yb ~ s(x1) + s(x2) + s(x3), family = betar(link = "logit"),
            data = dat, random = list(g = ~1))
m3b <- gamm(yb ~ s(x0) + s(x2) + s(x3), family = betar(link = "logit"),
            data = dat, random = list(g = ~1))
# AIC to compare models (gamm returns a list; use the $lme component)
AIC(m1b$lme, m2b$lme, m3b$lme)
# try the same using gamm4 (ideally) -- it fails with the beta family
m <- gamm4(yb ~ s(x0) + s(x1) + s(x2) + s(x3), family = betar(link = "logit"),
           data = dat, random = ~ (1 | g))
## Example with a quasi-family -- yp is overdispersed count data
## (it may not actually be overdispersed in this simulated example)
# example using the gamm function
m1p <- gamm(yp ~ s(x0) + s(x1) + s(x2) + s(x3), family = quasipoisson,
            data = dat, random = list(g = ~1))
m2p <- gamm(yp ~ s(x1) + s(x2) + s(x3), family = quasipoisson,
            data = dat, random = list(g = ~1))
m3p <- gamm(yp ~ s(x0) + s(x2) + s(x3), family = quasipoisson,
            data = dat, random = list(g = ~1))
# AIC to compare models (again via the $lme component)
AIC(m1p$lme, m2p$lme, m3p$lme)
# the gamm4 version will not work, as gamm4 does not accept quasi-families
m <- gamm4(yp ~ s(x0) + s(x1) + s(x2) + s(x3), family = quasipoisson,
           data = dat, random = ~ (1 | g))
You have a bunch of questions here, but I'll try to tackle them. Basically, you want to fit parametric statistical models with
random effects (nlme, lme4)
distributions from the exponential family ... (MASS::glmmPQL, lme4::glmer)
... with overdispersion ...
... or distributions beyond the exponential family such as the Beta distribution (VGAM, betareg)
additive models/splines (splines) ...
... or penalized regression splines, which automatically adjust the complexity of the smooth terms
... with a real likelihood model rather than a marginal or quasi-likelihood model (e.g. GEEs, PQL), so you can do classic inference
Each of the issues above adds one or more "difficulty points" to a model-fitting exercise ... usually once your score goes beyond about +3 or so, you have to find a way to compromise or take shortcuts on some of the things you want. You have correctly identified gamm and gamm4 as doing some of the stuff you want, but you can't get everything. Some suggestions:
Overdispersion
One way to handle overdispersion is with an observation-level random effect, e.g.
dat$obs <- factor(seq(nrow(dat)))  # one factor level per observation
m <- gamm4(yp ~ s(x0) + s(x1) + s(x2) + s(x3),
           family = poisson, data = dat, random = ~ (1 | g) + (1 | obs))
Another alternative is to adjust for the overdispersion yourself, if you think that makes sense, e.g.:
m0 <- gamm4(yp ~ s(x0) + s(x1) + s(x2) + s(x3), family = poisson,
            data = dat, random = ~ (1 | g))
First compute the overdispersion factor:
(phi <- sum(residuals(m0$gam, type = "pearson")^2) / df.residual(m0$gam))
## [1] 1.003436
(if we repeat this exercise with m0$mer instead, we get 0.9939696: the result is almost exactly equal to 1 because we generated the data from a Poisson distribution in the first place ...)
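Spelled out, that check is (a sketch using the merMod component of the gamm4 fit):
(phi_mer <- sum(residuals(m0$mer, type = "pearson")^2) / df.residual(m0$mer))
## [1] 0.9939696
Then compute a quasi-AIC, scaling the log-likelihood by the overdispersion factor: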
(qaic <- -2 * logLik(m0$mer) / phi + 2 * lme4:::npar.merMod(m0$mer))
N.B. I am guessing that it makes sense to construct the likelihoods, etc. from the individual components of a gamm4 fit in this way; use at your own risk
Alternative distributions
The glmmADMB and glmmTMB packages (both off-CRAN but findable via Google ...) can both handle mixed beta models. They can't do penalized regression splines, but you can use regular splines via splines::ns() or splines::bs() (though you do have to decide on the appropriate level of complexity; maybe you can guess from preliminary gamm or mgcv fits ...)
library(glmmADMB)
library(splines)
m3b <- glmmadmb(yb ~ ns(x0, 2) + ns(x1, 2) + ns(x2, 5) + ns(x3, 2) + (1 | g),
                family = "beta", link = "logit", data = dat)
The glmmTMB package can in principle do this:
library(glmmTMB)
m2b <- glmmTMB(yb ~ ns(x0, 2) + ns(x1, 2) + ns(x2, 5) + ns(x3, 2) + (1 | g),
               family = list(family = "beta", link = "logit"), data = dat)
# note: newer glmmTMB releases use family = beta_family(link = "logit") instead
but the package is in development and the current set of results doesn't make sense -- so I might hesitate to use it at this point.
Possible Duplicate:
short formula call for many variables when building a model
I have a biggish data frame (112 variables) that I'd like to run a stepwise logistic regression on using R. I know how to set up the glm model and the stepAIC model, but I'd rather not type in all the headings to input the independent variables. Is there a fast way to give the glm model an entire data frame as the independent variables, such that it will recognize each column as an x variable to be included in the model? I tried:
ft <- glm(MFDUdep ~ MFDUind, family = binomial)
But it didn't work (wrong data types). MFDUdep and MFDUind are both data frames, with MFDUind containing 111 'x' variables and MFDUdep containing a single 'y'.
You want the . special symbol in the formula notation. Also, it is probably better to have the response and predictors in a single data frame.
Try:
MFDU <- cbind(MFDUdep, MFDUind)  # combine response and predictors
ft <- glm(y ~ ., data = MFDU, family = binomial)  # "." expands to all other columns
Now that I have given you the rope, I am obliged to at least warn you about the potential for hanging...
The approach you are taking is usually not the recommended one, unless perhaps prediction is the purpose of the model. Regression coefficients for selected variables may be strongly biased, so if you are using this for enlightenment, rethink your approach.
You will also need a lot of observations to allow 100+ terms in a model.
Better alternatives exist; e.g., see the glmnet package for one such approach, which allows for ridge, lasso, or both (elastic net) constraints on the set of coefficients and lets you minimise model error at the expense of a small amount of additional bias.
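For illustration, a minimal elastic-net sketch with glmnet (assuming MFDU is the combined data frame from above, with response column y; cv.glmnet picks the penalty by cross-validation):
library(glmnet)
x <- model.matrix(y ~ ., data = MFDU)[, -1]  # predictor matrix, intercept column dropped
y <- MFDU$y
# alpha = 0.5 mixes the ridge (alpha = 0) and lasso (alpha = 1) penalties
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)
coef(cvfit, s = "lambda.min")  # coefficients at the cross-validated penalty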
I have fit my discrete count data using a variety of functions for comparison. I fit a GEE model using geepack, a linear mixed effect model on the log(count) using lme (nlme), a GLMM using glmer (lme4), and a GAMM using gamm4 (gamm4) in R.
I am interested in comparing these models and would like to plot the expected (predicted) values for a new set of data (predictor variables). My goal is to compare the predicted effects for each model under particular conditions (x variables). Of particular interest is the comparison between marginal (GEE) and conditional estimates.
I think my main problem might be getting the new data in the correct form with the correct labels and attributes and such. I am still very much an R novice and struggle with this stuff (no course on this at my university unfortunately).
I currently have fitted models
gee1, lme1, lmer1, gamm1
and can extract their fixed effect coefficients and standard errors without a problem. I also don't have a problem converting them from the log scale or estimating confidence intervals accounting for the random effects.
I also have my new dataframe newdat which has 365 observations of 23 variables (average environmental data for each day of the year).
I am stuck on how to predict new count estimates from this. I played around with the model.matrix function but couldn't get it to work. For example, I tried:
mm <- model.matrix(terms(glmm1), newdat)
# Error in model.frame.default(object, data, xlev = xlev) :
#   object is not a matrix
newdat$pcount <- mm %*% fixef(glmm1)
Any suggestions or good references would be greatly appreciated. Can anyone help with the error above?
Getting predictions for lme() and lmer() is documented on http://glmm.wikidot.com/faq
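As a sketch of the approach described there (assuming lmer1 is the glmer fit from the question and that newdat contains every predictor used in that model):
library(lme4)
# fixed-effects part of the formula, with the (1 | ...) terms dropped
form <- formula(lmer1, fixed.only = TRUE)
mm <- model.matrix(delete.response(terms(form)), newdat)
# predictions on the link (log) scale, with the random effects set to zero
eta <- drop(mm %*% fixef(lmer1))
newdat$pcount <- exp(eta)  # back-transform for a Poisson / log-link model
# recent lme4 versions can also do this directly:
# predict(lmer1, newdata = newdat, re.form = NA, type = "response")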
I am trying to learn R after using Stata and I must say that I love it. But now I am having some trouble. I am about to do some multiple regressions with Panel Data so I am using the plm package.
Now I want plm in R to produce the same results as the lm function and as Stata when I perform a heteroskedasticity-robust, entity fixed-effects regression.
Let's say that I have a panel dataset with the variables Y, ENTITY, TIME, V1.
I get the same standard errors in R with this code
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC() for lm objects
lm.model <- lm(Y ~ V1 + factor(ENTITY), data = data)
coeftest(lm.model, vcov. = vcovHC(lm.model, type = "HC1"))
as when I perform this regression in Stata
xi: reg Y V1 i.ENTITY, robust
But when I perform this regression with the plm package, I get different standard errors:
plm.model <- plm(Y ~ V1, index = c("ENTITY", "YEAR"), model = "within",
                 effect = "individual", data = data)
coeftest(plm.model, vcov. = vcovHC(plm.model, type = "HC1"))
Have I missed setting some options?
Does plm use some other kind of estimation, and if so, how?
Can I somehow get the same standard errors with plm as in Stata with , robust?
By default the plm package does not use the exact same small-sample correction for panel data as Stata. However, in version 1.5 of plm (on CRAN) there is an option that will emulate what Stata is doing.
plm.model <- plm(Y ~ V1, index = c("ENTITY", "YEAR"), model = "within",
                 effect = "individual", data = data)
coeftest(plm.model, vcov. = function(x) vcovHC(x, type = "sss"))  # "sss" = Stata-style small-sample correction
This should yield the same clustered-by-group standard errors as in Stata (but, as mentioned in the comments, without a reproducible example and the results you expect, it is harder to answer the question).
For more discussion on this and some benchmarks of R and Stata robust SEs see Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R.
See also:
Clustered standard errors in R using plm (with fixed effects)
Is it possible that your Stata code is different from what you are doing with plm?
plm's "within" option with "individual" effects means a model of the form:
yit = a + Xit*B + eit + ci
What plm does is to demean the coefficients so that ci drops from the equation.
yit_bar = Xit_bar*B + eit_bar
Such that the "bar" suffix means that each variable had its mean subtracted. The mean is calculated over time and that is why the effect is for the individual. You could also have a fixed time effect that would be common to all individuals in which case the effect would be through time as well (that is irrelevant in this case though).
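A quick way to see this equivalence in R (a sketch using the hypothetical variable names from the question; demeaning by ENTITY and running OLS reproduces the within slope):
library(plm)
fe <- plm(Y ~ V1, index = c("ENTITY", "YEAR"), model = "within", data = data)
# manual demeaning: subtract each entity's time mean
data$Y_dm <- with(data, Y - ave(Y, ENTITY))
data$V1_dm <- with(data, V1 - ave(V1, ENTITY))
ols <- lm(Y_dm ~ V1_dm - 1, data = data)
coef(fe)["V1"]      # within estimate of B
coef(ols)["V1_dm"]  # the same slope from OLS on the demeaned data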
I am not sure what the xi command does in Stata, but I think it expands factor and interaction terms, right? Then it seems to me that you are trying to use a dummy variable per ENTITY, as was highlighted by @richardh.
For your Stata and plm codes to match you must be using the same model.
You have two options: (1) xtset your data in Stata and use xtreg with the fe option, or (2) use plm with the pooling model and one dummy per ENTITY.
Matching Stata to R:
xtset entity year
xtreg y v1, fe robust
Matching plm to Stata:
plm(Y ~ V1 + as.factor(ENTITY), index = c("ENTITY", "YEAR"),
    model = "pooling", effect = "individual", data = data)
Then use vcovHC with one of the modifiers. Make sure to check this paper, which has a nice review of all the mechanics behind the "HC" options and the way they affect the variance-covariance matrix.
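For instance, a sketch continuing the pooling model above (assigned to a name here; vcovHC.plm accepts a cluster argument):
plm.pool <- plm(Y ~ V1 + as.factor(ENTITY), index = c("ENTITY", "YEAR"),
                model = "pooling", data = data)
coeftest(plm.pool, vcov. = vcovHC(plm.pool, type = "HC1", cluster = "group"))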
Hope this helps.