Multiplication of different FLAGs - fixed effects model - r

I want to perform a regression in a fixed-effects model. To construct such a model, I have multiple FLAGs, like the following:
y ~ x + z + FlagYear1 + FlagYear2 + FlagYear3 + FlagCountry1 + FlagCountry2
I want to perform another regression in which I have fixed effects for Year * Country, so that the model will be equivalent to this:
y ~ x + z + FlagYear1Country1 + FlagYear1Country2 + FlagYear2Country1 + FlagYear2Country2 + FlagYear3Country1 + FlagYear3Country2
Since I have 26 countries and 8 years in my model, it would be very time-consuming to construct all the FLAGs manually. I know there is a command to do this automatically in Stata; how can I do the same in R?

If by 'FLAG' you are referring to 0/1 coded indicator variables (or dummy variables), then R has an easy way to enter all of these interactions into a formula.
If you have factor variables country with 26 levels and year with 8 levels then you can use
y ~ x + z + country*year
and this will expand the factors into every combination of country and year.
Look at the documentation for formula to understand how this works.
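For example, a minimal sketch (the data frame name df and the variable names are illustrative), assuming the country and year codes are not yet stored as factors:

# convert the codes to factors so the formula expands them into dummies
df$country <- factor(df$country)
df$year <- factor(df$year)

# country*year expands to country + year + country:year, i.e. all main
# effects plus an indicator for every country-by-year combination
fit <- lm(y ~ x + z + country*year, data = df)
summary(fit)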
If you already have the indicator variables then you could use
y ~ x + z + (FlagYear1 + FlagYear2 + FlagYear3) * (FlagCountry1 + FlagCountry2)
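To see exactly which interaction columns such a formula generates, you can inspect the design matrix; a sketch, assuming the flags live in a data frame df:

# each FlagYear*:FlagCountry* column corresponds to one Year-Country dummy
head(model.matrix(~ (FlagYear1 + FlagYear2 + FlagYear3) *
                    (FlagCountry1 + FlagCountry2), data = df))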

interaction term in multilevel analysis using lmer() function

Level 1 variable:
income - continuous
Level 2 variables:
state's general weather - a three-level categorical variable (hot/moderate/cool);
effect coded, which generates two variables because it has three levels
(weather_ef1, weather_ef2)
enrolled in university - binary: yes/no (effect coded: yes = -1, no = 1)
DV:
math score
grouping variable: household
Model 1 (fixed slope):
DV is predicted by income, enrollment, and the interaction between enrollment and income.
In this case, which of these is correct?
lmer(y ~ 1 + income + enrollment + income*enrollment + (1 | householdID), data = data)
lmer(y ~ 1 + income + enrollment + income:enrollment + (1 | householdID), data = data)
Is : for the interaction, or is * for the interaction?
Further, do I have to use factor(enrollment), or is that unnecessary because the variable is already effect coded?
Model 2 (fixed slope):
DV is predicted by income, weather, and the interaction between income and weather.
lmer(y ~ 1 + income + weather_ef1 + weather_ef2 + weather_ef1*income
     + weather_ef2*income + (1 | houshold_id), data)
lmer(y ~ 1 + income + weather_ef1 + weather_ef2 + weather_ef1:income
     + weather_ef2:income + (1 | houshold_id), data)
I am still unsure whether * or : is right.
I think the weather_ef variables are already effect coded, so I shouldn't have to use factor(weather_ef1).
From the documentation (use ?formula):
The * operator denotes factor crossing: a*b interpreted as a+b+a:b.
In other words, a*b adds the main effects of a and b plus their interaction. So when you write income*enrollment in your model, it is the same as income + enrollment + income:enrollment. The two versions you described for each model should give identical results. You could just have used:
lmer(y ~ 1 + income*enrollment + (1 | householdID), data = data)
which also describes the same model.
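As a quick check, here is a sketch (reusing the names from the question) showing that all three formulas fit the same model:

library(lme4)
m1 <- lmer(y ~ 1 + income + enrollment + income*enrollment + (1 | householdID), data = data)
m2 <- lmer(y ~ 1 + income + enrollment + income:enrollment + (1 | householdID), data = data)
m3 <- lmer(y ~ 1 + income*enrollment + (1 | householdID), data = data)
all.equal(fixef(m1), fixef(m2))  # TRUE: identical fixed effects
all.equal(fixef(m1), fixef(m3))  # TRUE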
If your variables are effect coded, then you don't need to use factor(), but be careful about the interpretation of the effects.

Significant variables for Logistic regression in R

I am still new to R and still struggling. I am trying to do a logistic regression using categorical and continuous variables, and I am supposed to select the right variables for my model. There are 27 variables and 8,000 observations.
I have gone through a couple of articles online, including stepwise regression by AIC, and all I do is confuse myself more. I was also told to select my variables from the correlation matrix, but when I compute the correlations I don't seem to find them, especially for the categorical variables. I also tried to fit the full model, and I get some variables with p-values less than 0.05. This is the code:
d4 <- d3[, c('SW','MOI','YOI','DOI_CMC','RMOB','RYOB','RDOB_CMC',
             'RCA','Region','TPR','DPR','NV','HEL','Has_Radio','Has_TV',
             'Religion','WI','MOFB','YOB','DOB_CMC','DOFB_CMC','AOR','MTFBI',
             'DSOUOM_CMC','RW','RH','RBMI')]
d5 <- cor(d4)
round(d5, 2)
When I select the significant variables and try to apply logistic regression, all the p-values are between 0.9 and 1. See the code:
# use glm(), not lm(), for logistic regression; store the fit in a new
# object so the data frame d3 is not overwritten, and drop the response
# TPR from the right-hand side
fit <- glm(TPR ~ SW + MOI + RMOB + RYOB + RCA + Region + DPR +
             NV + HEL + Has_Radio + Has_TV + Religion + WI + MOFB +
             YOB + DOB_CMC + DOFB_CMC + AOR + MTFBI + DSOUOM_CMC +
             RW + RH + RBMI,
           data = d3, family = "binomial")
summary(fit)
I need help with this please!!
Here is a sample of d3.

Plot predicted values from lmer longitudinal analysis

I'm analyzing some longitudinal data using the lme4 package (lmer function) with 3 levels: measurement points nested in individuals nested in households. I'm interested in linear and non-linear change curves surrounding a specific life event. My model has several time predictors, indicating linear change before and after the event occurs as well as non-linear change (i.e., squared time variables) before and after the event. Additionally, I have several Level-2 predictors that do not vary with time (i.e., personality traits) and some control variables (e.g., age, gender). So far I have not included any random slopes or cross-level interactions.
This is my model code:
model.RI <- lmer(outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq + postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c + (1 | ID) + (1 | House))
outcome = my dependent variable
time = year 1, year 2, year 3 ... (until year 9); this variable symbolizes something like a testing effect
female_c = gender centered
age_c = age centered
age_c2 = age squared centered
preLin = time variable indicating time to the event (this variable is 0 after the event has occurred and is -1 e.g. one year ahead of the event, -2 two years ahead of the event etc.)
preLin.sq = squared values of preLin
postLin = time variable indicating time after the event (this variable is 0 before the event and increases after the event has occurred; e.g. is +1 one year after the event)
postLin.sq = squared values of postLin
per1.c until per5.c = personality traits on Level 2 (centered)
ID = indicating the individual
House = indicating the household
I was wondering how I could plot the predicted values of this lmer model (e.g., using ggplot2). I've plotted change curves using method = "gam" in R, which is a rather data-driven way to inspect the data without pre-defining whether the curve is linear, quadratic, or whatever. I would now like to check whether my parametric lmer model is comparable to that data-driven gam plot I already have. Do you have any advice on how to do this?
I would be more than happy to get some help on this! Please also feel free to ask if I was not precise enough on my explanation of what I would like to do!
Thanks a lot!
Follow this link: this is how my gam plot looks, and I hope to get something similar when plotting the predicted values of my lmer model!
You can use the ggpredict() function from the ggeffects package. If you want to plot predicted values of time (preLin), you would simply write:
ggpredict(model.RI, "preLin")
The function returns a data frame (see articles), which you can use in ggplot, but you can also directly plot the results:
ggpredict(model.RI, "preLin") %>% plot()
or
p <- ggpredict(model.RI, "preLin")
plot(p)
You could also use the sjPlot package; however, for marginal effects / predicted values, the sjPlot::plot_model() function internally just calls ggeffects::ggpredict(), so the results would be basically identical.
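For example, this call should give essentially the same plot as ggpredict() above:

library(sjPlot)
plot_model(model.RI, type = "pred", terms = "preLin")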
Another note on your model: if you have longitudinal data, you should also include your time variable as a random slope. I'm not sure how postLin actually relates to preLin, but if preLin captures all your measurements, you should at least write your model like this:
model.RI <- lmer(
  outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq +
    postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
    (1 + preLin | ID) + (1 + preLin | House)
)
If you also assume a quadratic trend for each person (ID), you could even add the squared term as a random slope, as sketched below.
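A sketch of that model (same fixed effects as above; only the random part for ID changes):

model.RI <- lmer(
  outcome ~ time + female_c + age_c + age_c2 + preLin + preLin.sq +
    postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
    (1 + preLin + preLin.sq | ID) + (1 + preLin | House)
)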
As your figure example suggests using splines, you could also try this:
library(splines)
model.RI <- lmer(
  outcome ~ time + female_c + age_c + age_c2 + bs(preLin) +
    postLin + postLin.sq + per1.c + per2.c + per3.c + per4.c + per5.c +
    (1 + preLin | ID) + (1 + preLin | House)
)
p <- ggpredict(model.RI, "preLin")
plot(p)
Examples for splines are also demonstrated on the website I mentioned above.
Edit:
Another note is related to nesting: you're currently modelling a fully crossed or cross-classified model. If it's completely nested, the random parts would look like this:
... + (1 + preLin | House / ID)
(see also this small code-example).
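For reference, lme4 expands the nesting shorthand into two random-effects terms:

# (1 + preLin | House / ID) is equivalent to
# (1 + preLin | House) + (1 + preLin | House:ID)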

Unbinding a list of factors in R

I've got a simple question I hope.
I'm trying to assign predicted factor levels from an ordinal logistic regression to a variable in order to calculate the hit-rate of the model. I used the following code:
telecomdata$predfpm1 <- predict(fpm1,type = "class")
fpm1 is an ordinal logistic regression, fitted as presented below:
fpm1 <- clm(proposition ~ DMpropHigh + age + rel_length + education + gender + income + num_phones + arpu_index + calls_out_6_index + calls_in_6_index + calls_6_index + DM , data = telecomdata)
The output can be found above. The numbers look okay, but the format is wrong. R now creates a list of factors for each observation, rather than assigning each individual predicted value to an observation. Can anyone help me with how to fix this?
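A likely fix, assuming fpm1 is a clm fit from the ordinal package: predict() there returns a list, so you need to extract its fit component, which holds the predicted factor levels.

pred <- predict(fpm1, type = "class")
telecomdata$predfpm1 <- pred$fit  # one predicted level per observation
# hit rate: share of correctly classified observations
mean(telecomdata$predfpm1 == telecomdata$proposition)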

Constrained linear regression coefficients in R [duplicate]

I'm estimating several ordinary least squares linear regressions in R. I want to constrain the estimated coefficients across the regressions such that they're the same. For example, I have the following:
z1 ~ x + y
z2 ~ x + y
And I would like the estimated coefficient on y in the first regression to be equal to the estimated coefficient on x in the second.
Is there a straightforward way to do this? Thanks in advance.
More detailed edit
I'm trying to estimate a system of linear demand functions, where the corresponding welfare function is quadratic. The welfare function has the form:
W = 0.5*ax*(Qx^2) + 0.5*ay*(Qy^2) + 0.5*bxy*Qx*Qy + 0.5*byx*Qy*Qx + cx*Qx + cy*Qy
Therefore, it follows that the demand functions are:
dW/dQx = Px = 2*0.5*ax*Qx + 0 + 0.5*bxy*Qy + 0.5*byx*Qy + 0 + cx
dW/dQx = Px = ax*Qx + 0.5*(bxy + byx)*Qy + cx
and
dW/dQy = Py = ay*Qy + 0.5*(byx + bxy)*Qx + cy
I would like to constrain the system so that byx = bxy (the cross-product coefficients in the welfare function). If this condition holds, the two demand functions become:
Px = ax*Qx + bxy*Qy + cx
Py = ay*Qy + bxy*Qx + cy
I have price (Px and Py) and quantity (Qx and Qy) data, but what I'm really interested in is the welfare (W) which I have no data for.
I know how to calculate and code all the matrix formulae for constrained least squares (which would take a fair few lines of code to get the coefficients, standard errors, measures of fit etc that come standard with lm()). But I was hoping there might be an existing R function (i.e. something that can be done to the lm() function) so that I wouldn't have to code all of this.
For your specified regression:
Px = ax*Qx + bxy*Qy + cx
Py = ay*Qy + bxy*Qx + cy
We can introduce a grouping factor:
id <- factor(rep.int(c("Px", "Py"), c(length(Px), length(Py))),
             levels = c("Px", "Py"))
We also need to combine data:
z <- c(Px, Py)  ## stacked response
x <- c(Qx, Qy)  ## own quantity (coefficients ax, ay)
y <- c(Qy, Qx)  ## cross quantity (shared coefficient bxy)
Then we can fit a linear model with lm using the formula:
z ~ id + x:id + y
Here the id main effect gives the two intercepts (cx and cy), x:id gives the separate own-quantity slopes (ax and ay), and y gets a single shared coefficient (bxy).
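A minimal simulated sketch of this stacked fit (all data-generating values below are made up for illustration):

set.seed(1)
n <- 100
Qx <- runif(n); Qy <- runif(n)
Px <- 1.0 + 2.0*Qx + 0.5*Qy + rnorm(n, sd = 0.1)  # cx = 1.0, ax = 2.0, bxy = 0.5
Py <- 0.5 + 1.5*Qy + 0.5*Qx + rnorm(n, sd = 0.1)  # cy = 0.5, ay = 1.5, bxy = 0.5

id <- factor(rep.int(c("Px", "Py"), c(length(Px), length(Py))),
             levels = c("Px", "Py"))
z <- c(Px, Py)  # stacked response
x <- c(Qx, Qy)  # own quantity
y <- c(Qy, Qx)  # cross quantity

fit <- lm(z ~ id + x:id + y)
coef(fit)  # (Intercept) = cx, idPy = cy - cx, y = bxy, idPx:x = ax, idPy:x = ay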
If the x and y values are the same in both regressions, then you could use this model:
lm(I(z1 + z2) ~ x + y)  # divide the resulting coefficients by 2
If they are separate data sets, then you could rbind the two after renaming z2 to z1.
