I built a fixed-effects model using the lm() function. I had previously created price-cluster independent variables (cluster indicator × price) with the end goal of estimating demand elasticities. Am I correct that the factor term (+ factor(Tulsa$Cluster) - 1) at the end provides a different intercept (b0) for each of the pricing clusters?
model.fixed = lm(Tulsa$ln_volume ~ Tulsa$Price_Cluster_1
+ Tulsa$Price_Cluster_2
+ Tulsa$Price_Cluster_3
+ Tulsa$Price_Cluster_4
+ Tulsa$Price_Cluster_5
+ Tulsa$PC1
+ Tulsa$PC2
+ Tulsa$PC3
+ Tulsa$PC4
+ factor(Tulsa$Cluster)-1,data=Tulsa)
The coefficients for the price clusters have the right sign and make intuitive sense. Thanks in advance.
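For reference, a minimal sketch with toy data (hypothetical names, not the Tulsa data) of what factor(Cluster) - 1 does in a formula: the global intercept is dropped and each cluster level gets its own indicator column, i.e. its own intercept.
# Toy data; column names are made up for illustration only.
toy <- data.frame(
  ln_volume = rnorm(6),
  price     = c(1.0, 1.2, 0.9, 1.1, 1.3, 0.8),
  Cluster   = c(1, 1, 2, 2, 3, 3)
)
# One indicator column per cluster level, no "(Intercept)" column:
model.matrix(~ price + factor(Cluster) - 1, data = toy)
# The fit then reports a separate intercept for each cluster level:
coef(lm(ln_volume ~ price + factor(Cluster) - 1, data = toy))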
I'm analyzing some longitudinal data using the lme4 package (lmer function) with 3 levels: measurement points nested in individuals nested in households. I'm interested in linear and non-linear change curves surrounding a specific life event. My model has several time predictors indicating linear and non-linear (i.e., squared) change before and after the event occurs. Additionally, I have several Level-2 predictors that do not vary with time (i.e., personality traits) and some control variables (e.g., age, gender). So far I have not included any random slopes or cross-level interactions.
This is my model code:
model.RI <- lmer(outcome ~ time + female_c + age_c + age_c2 +
                 preLin + preLin.sq + postLin + postLin.sq +
                 per1.c + per2.c + per3.c + per4.c + per5.c +
                 (1 | ID) + (1 | House))
outcome = my dependent variable
time = year 1, year 2, year 3 ... (until year 9); this variable symbolizes something like a testing effect
female_c = gender centered
age_c = age centered
age_c2 = age squared centered
preLin = time variable indicating time until the event (0 from the event onwards; -1 one year before the event, -2 two years before, etc.)
preLin.sq = squared values of preLin
postLin = time variable indicating time since the event (0 before the event; +1 one year after the event, +2 two years after, etc.)
postLin.sq = squared values of postLin (see the sketch after this list for how preLin/postLin encode the piecewise trend)
per1.c until per5.c = personality traits on Level 2 (centered)
ID = indicating the individual
House = indicating the household
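To make those codings concrete, preLin and postLin can be derived from calendar time relative to the event like this (sketch with a hypothetical data frame dat and hypothetical columns year and event_year):
# Hypothetical columns: `year` = survey wave, `event_year` = year the event
# occurred for that person. preLin counts years before the event (<= 0),
# postLin counts years after it (>= 0), matching the definitions above.
rel_time       <- dat$year - dat$event_year
dat$preLin     <- pmin(rel_time, 0)   # ..., -2, -1, 0, 0, 0, ...
dat$postLin    <- pmax(rel_time, 0)   # ...,  0,  0, 0, 1, 2, ...
dat$preLin.sq  <- dat$preLin^2
dat$postLin.sq <- dat$postLin^2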
I was wondering how I could plot the predicted values of this lmer model (e.g., using ggplot2). I've already plotted change curves using method = "gam", which is a rather data-driven way to inspect the data without pre-defining whether the curve is linear, quadratic, or something else. I would now like to check whether my parametric lmer model is comparable to that data-driven gam plot. Do you have any advice on how to do this?
I would be more than happy to get some help on this! Please also feel free to ask if I was not precise enough on my explanation of what I would like to do!
Thanks a lot!
Follow this link: this is how my gam plot looks, and I hope to get something similar when plotting the predicted values of my lmer model!
You can use the ggpredict()-function from the ggeffects-package. If you want to plot predicted values of time (preLin), you would simply write:
ggpredict(model.RI, "preLin")
The function returns a data frame (see the package articles/vignettes), which you can use in ggplot, but you can also plot the results directly:
ggpredict(model.RI, "preLin") %>% plot()
or
p <- ggpredict(model.RI, "preLin")
plot(p)
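The returned object is a plain data frame (with columns like x, predicted, conf.low and conf.high), so if you prefer to build the plot yourself, e.g. to overlay it on your existing gam smooth, a sketch would be (dat is a placeholder for your raw data frame):
library(ggplot2)

p <- ggpredict(model.RI, "preLin")  # data frame: x, predicted, conf.low, conf.high, ...

ggplot(p, aes(x = x, y = predicted)) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) +
  geom_line() +
  # optionally overlay the data-driven gam smooth of the raw data for comparison
  # (`dat` is a placeholder; method = "gam" requires the mgcv package)
  geom_smooth(data = dat, aes(x = preLin, y = outcome),
              method = "gam", inherit.aes = FALSE)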
You could also use the sjPlot-package, however, for marginal effects / predicted values, the sjPlot::plot_model()-function internally just calls ggeffects::ggpredict(), so the results would basically be identical.
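For completeness, the sjPlot call would be roughly this (assuming the current plot_model() interface):
library(sjPlot)
plot_model(model.RI, type = "pred", terms = "preLin")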
Another note on your model: with longitudinal data, you should also include your time variable as a random slope. I'm not sure how postLin actually relates to preLin, but if preLin covers all your measurement occasions, you should at least write your model like this:
library(lme4)

model.RI <- lmer(
  outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq +
    postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
    (1 + preLin | ID) + (1 + preLin | House),
  data = dat  # placeholder for your data frame
)
If you also assume a quadratic trend for each person (ID), you could even add the squared term as random slope.
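That could look roughly like this (sketch; `dat` is again a placeholder for your data frame):
model.RI.q <- lmer(
  outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq +
    postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
    (1 + preLin + preLin.sq | ID) + (1 + preLin | House),  # quadratic random slope for ID
  data = dat  # placeholder
)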
As your figure example suggests using splines, you could also try this:
library(splines)

model.RI <- lmer(
  outcome ~ time + female_c + age_c + age_c2 + bs(preLin) +
    postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
    (1 + preLin | ID) + (1 + preLin | House),
  data = dat  # placeholder for your data frame
)
p <- ggpredict(model.RI, "preLin")
plot(p)
Examples for splines are also demonstrated on the website I mentioned above.
Edit:
Another note relates to nesting: you're currently modelling a fully crossed (cross-classified) model. If individuals are completely nested within households, the random part would look like this:
... + (1 + preLin | House / ID)
(see also this small code-example).
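The / is just shorthand; lme4 expands it into an explicit household term plus an individual-within-household term, so these two calls fit the same model (fixed part shortened here for readability, `dat` is a placeholder):
m1 <- lmer(outcome ~ preLin + (1 + preLin | House / ID), data = dat)
m2 <- lmer(outcome ~ preLin + (1 + preLin | House) + (1 + preLin | House:ID), data = dat)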
I am using binom.krige() function of the R package geoRglm for determining the spatial predictions of a binary (0, 1) response variable with several continuous as well as discrete covariates.
Using glm() with a binomial logit link, I found that the response variable shows significant dependence on several covariates.
I included the trend into binom.krige() using krige.glm.control() where I specified the two trend models as
> trend.d=trend.spatial(~ rivers + roads + annual_pre + annual_tem + elevation_ + host_densi + lulc + moist_dq + moist_in + moist_wq, data_points)
> trend.l=trend.spatial(~ rivers1 + roads + annual_pre + annual_tem + elevation_ + host_densi + lulc + moist_dq + moist_in + moist_wq, pred_grid)
The question that is confusing me: when trend.d and trend.l go into krige.glm.control() and eventually into binom.krige(), does it actually fit a GLM with a binomial logit link, or just a linear model (the formulas above look like linear models)?
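For reference, the non-spatial fit mentioned above looks like this (sketch; `presence` and `covars_df` are placeholders for the binary 0/1 response and a plain data frame holding the covariates):
# Non-spatial logistic regression; placeholder names, not the geodata object
# used for kriging.
fit.glm <- glm(presence ~ rivers + roads + annual_pre + annual_tem + elevation_ +
                 host_densi + lulc + moist_dq + moist_in + moist_wq,
               family = binomial(link = "logit"),
               data = covars_df)
summary(fit.glm)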
I'm trying to include time fixed effects (dummies for years generated with model.matrix) into a PPML regression in R.
Without time fixed effect the regression is:
require(gravity)
my_model <- PPML(y="v", dist="dist",
x=c("land","contig","comlang_ethno",
"smctry","tech","exrate"),
vce_robust=T, data=database)
I've tried adding the argument fe = c("year") within the PPML() function, but it doesn't work.
I'd appreciate any help on this.
I would comment on the previous answer but don't have enough reputation. The gravity model in your PPML command specifies
v = dist^b1 * exp(b2*land + b3*contig + b4*comlang_ethno + b5*smctry + b6*tech + b7*exrate + TimeFE)
  = exp(b1*log(dist) + b2*land + b3*contig + b4*comlang_ethno + b5*smctry + b6*tech + b7*exrate + TimeFE).
The formula inside glm() should have as its RHS the variables inside the exponential, because the RHS represents the linear predictor on the link scale (the default Poisson link being the natural log). So, in sum, your command should be
glm(v ~ log(dist) + land + contig + comlang_ethno + smctry + tech + exrate + factor(year),
    family = "quasipoisson", data = database)
and in particular, you need to have distance in logs on the RHS (unlike the previous answer).
Just make sure that year is a factor; then you can use the plain-and-simple glm() function as
EstimationResults.PPML <- glm(y ~ dist + year, family = "quasipoisson")
which gives you the results with year as dummies/fixed effects. The robust SEs are then calculated with
lmtest::coeftest(EstimationResults.PPML, vcov = sandwich::vcovHC(EstimationResults.PPML, "HC1"))
The PPML() function does nothing more; it just isn't very flexible.
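Putting the pieces together for the full specification from the question, a sketch (same variable and data names as in the question, with distance in logs as recommended in the other answer; year is assumed to be a column in database):
library(lmtest)
library(sandwich)

database$year <- factor(database$year)   # year dummies as fixed effects
ppml.fit <- glm(v ~ log(dist) + land + contig + comlang_ethno + smctry + tech +
                  exrate + year,
                family = "quasipoisson", data = database)

# Heteroskedasticity-robust (HC1) standard errors:
coeftest(ppml.fit, vcov = vcovHC(ppml.fit, type = "HC1"))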
Alternatively to PPML and glm, you can also solve your problem using the function femlm (from package FENmlm) which deals with fixed-effect estimation for maximum likelihood models.
The two main advantages of function femlm are:
you can add as many fixed-effects as you want, and they are dealt with separately, leading to computation times far below those of glm (especially when the fixed-effects contain many categories)
standard-errors can be clustered with intuitive commands
Here's an example regarding your problem (with just two variables and the year fixed-effects):
library(FENmlm)
# (default family is Poisson, 'pipe' separates variables from fixed-effects)
res = femlm(v ~ log(dist) + land | year, base)
summary(res, se = "cluster")
This code estimates the coefficients of variables log(dist) and land with year fixed-effects; then it displays the coefficients table with clustered standard-errors (w.r.t. year) for the two variables.
Going beyond your initial question, now assume you have a more complex case with three fixed-effects: country_i, country_j and year. You'd write:
res = femlm(v ~ log(dist) + land | country_i + country_j + year, base)
You can then easily play around with clustered standard-errors:
# Cluster w.r.t. country_i (default is first cluster encountered):
summary(res, se = "cluster")
summary(res, se = "cluster", cluster = "year") # cluster w.r.t. year
# Two-way clustering:
summary(res, se = "twoway") # two-way clustering w.r.t. country_i & country_j
# two way clustering w.r.t. country_i & year:
summary(res, se = "twoway", cluster = c("country_i", "year"))
For more information on the package, the vignette can be found at https://cran.r-project.org/web/packages/FENmlm/vignettes/FENmlm.html.
First of all, I am relatively new in using R and haven't used lavaan (or growth models) before so please excuse my ignorance.
I am doing my thesis and analyzing the U.S. financial industry during the financial crisis of 2007. I therefore have individual banks and several variables for each bank across time (from 2007-2013), some are time-variant (such as ROA or capital adequacy) and some are time-invariant (such as size or age). Some variables are also time-variant but not multi-level since they apply to all firms (such as the average ROA of the U.S. financial industry).
First of all, can I use lavaan's growth curve model ("growth") in this instance? The example given in the tutorial is for either time-varying variables (c) that influence the outcome (DV) or time-invariant variables (x1 & x2) which influence the slope (s) and intercept (i). What about time-varying variables that influence the slope and intercept? I couldn't find an example of this syntax.
Also, how do I specify the "groups" (i.e. different banks) in my analysis? Is it actually possible to do a multi-level growth curve model in lavaan (or R, for that matter)?
Last but not least, I couldn't find how to import a multilevel dataset into R. My dataset is basically a 3-dimensional matrix (different variables for different firms across time), so how do I input that via SPSS (or Notepad)?
Any help is much appreciated, I am basically lost on how to implement this model and sincerely need some assistance...
Thank you all in advance for your time!
Harry
Edit: Here is the syntax that I have come up with so far. Do you think it makes sense?
ETHthesismodel <- '
# intercept and slope with fixed coefficients
i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
#regressions (independent variables that influence the slope & intercept)
i ~ high_constr_2007 + high_constr_2008 + ... + low_constr_2007 + low_constr_2008 + ... + ... diff_2013
s ~ high_constr_2007 + high_constr_2008 + ... + low_constr_2007 + low_constr_2008 + ... + ... diff_2013
# time-varying covariates (control variables)
t1 ~ size_2007 + cap_adeq_2007 + brand_2007 +... + acquisitions_2007
t2 ~ size_2008 + cap_adeq_2008 + brand_2008 + ... + acquisitions_2008
...
t7 ~ size_2013 + cap_adeq_2013 + brand_2013 + ... + acquisitions_2013
'
fit <- growth(ETHthesismodel, data = inputdata,
group = "bank")
summary(fit)
I'm attempting to "translate" a model run in HLM7 software to R lmer syntax.
This is from the now-ubiquitous "Math achievement" dataset. The outcome is math achievement score, and in the dataset there are various student-level predictors (such as minority status, SES, and whether or not the student is female) and various school level predictors (such as Catholic vs. Public).
The only predictors in the model I want to fit are student-level predictors, which have all been group-mean centered to deal with dummy variables (aside: contrast codes are better). The students are nested in schools, so we should (I think) have random effects specified for all of the components of the model.
Here is the HLM model:
Level-1 Model
(note: all predictors at level one are group mean centered)
MATHACH_ij = β_0j + β_1j*(MINORITY_ij) + β_2j*(FEMALE_ij) + β_3j*(SES_ij) + r_ij
Level-2 Models
β_0j = γ_00 + u_0j
β_1j = γ_10 + u_1j
β_2j = γ_20 + u_2j
β_3j = γ_30 + u_3j
Mixed Model
MATHACH_ij = γ_00 + γ_10*MINORITY_ij + γ_20*FEMALE_ij + γ_30*SES_ij + u_0j + u_1j*MINORITY_ij + u_2j*FEMALE_ij + u_3j*SES_ij + r_ij
Translating it to lmer syntax, I try:
(note: _gmc means the variable has been group mean centered, the grouping factor is "school_id")
model1 <- lmer(mathach ~ minority_gmc + female_gmc + ses_gmc +
               (minority_gmc | school_id) + (female_gmc | school_id) + (ses_gmc | school_id),
               data = data, REML = F)
When I run this model I get results that don't mesh with the HLM results. Am I specifying the random effects incorrectly?
Thanks!
When you specify your random-effects structure, you can include all of the random effects in a single set of parentheses. While this may not fully resolve the discrepancies in your results, I believe the appropriate random-effects syntax for your model is this:
lmer(mathach~minority_gmc + female_gmc + ses_gmc + (1 + minority_gmc + female_gmc + ses_gmc |school_id), data=data, REML=F)
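For what it's worth, the reason the two forms differ: separate terms like (minority_gmc | school_id) + (female_gmc | school_id) each add their own random intercept and force the random effects of different terms to be uncorrelated, while the single-term version estimates one intercept, three slopes, and their full covariance matrix, which corresponds to the unstructured tau matrix HLM estimates by default. A sketch of how you could inspect what was actually estimated:
library(lme4)
fit <- lmer(mathach ~ minority_gmc + female_gmc + ses_gmc +
            (1 + minority_gmc + female_gmc + ses_gmc | school_id),
            data = data, REML = F)
VarCorr(fit)   # estimated random-effect SDs and their correlations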