Calculate indirect effect of 1-1-1 (within-person, multilevel) mediation analyses - r

I have data from an Experience Sampling Study, which consists of 8140 observations nested in 106 participants. I want to test if there is a mediation, in which I also want to compare the predictors (X1= socialInteraction_tech, X2= socialInteraction_ftf, M = MPEE_int, Y= wellbeing). X1, X2, and M are person-mean centred in order to obtain the within-person effects. To account for the autocorrelation I have fit a model with an ARMA(2,1) structure. We control for time with the variable "obs".
This is the final model including all variables of interest:
fit_mainH1xmy <- lme(fixed = wellbeing ~ 1 + obs # Controls
+ MPEE_int_centred + socialInteraction_tech_centred + socialInteraction_ftf_centred,
random = ~ 1 + obs | ID, correlation = corARMA(form = ~ obs | ID, p = 2, q = 1),
data = file, method = "ML", na.action=na.exclude)
summary(fit_mainH1xmy)
The mediation is partial, as my predictor X still significantly predicts Y after adding M.
However, I can't find a way to calculate the indirect effect (a*b); c' (cprime) is the direct effect of X on Y that remains after adding M.
I have found the mlma package, but it looks weird and requires me to do transformations to my data.
I have tried melting the data in a long format and using lmer() to fit the model (following https://quantdev.ssri.psu.edu/sites/qdev/files/ILD_Ch07_2017_Within-PersonMedationWithMLM.html), but lmer() does not let me take into account the moving average (MA-part of the ARMA(2,1) structure).
Does anyone know how I could now obtain the indirect effect?
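One possible way to get at the indirect effect, sketched below under stated assumptions: when the within-person slopes are fixed (as in the model above, where only the intercept and obs vary by ID), the indirect effect is the product a*b. Here a is the X -> M slope from a second model with MPEE_int_centred as the outcome, and b is the M -> Y slope from the outcome model. A Monte Carlo interval can then be built from the two estimates and their standard errors. The numbers below are hypothetical placeholders, not results from this study, and the sketch treats a and b as independent. If both slopes were random, the average indirect effect would also include their covariance (Bauer, Preacher & Gil, 2006), which this sketch ignores.

```r
# Hypothetical estimates: replace with values from your own fitted lme models.
a_hat <- 0.30; se_a <- 0.05   # X -> M slope (model with M as the outcome)
b_hat <- 0.20; se_b <- 0.04   # M -> Y slope, controlling for X (outcome model)

set.seed(1)
n_draws <- 20000

# Draw a and b from their approximate sampling distributions
# (assumed normal and independent, since they come from separate models).
a_draws <- rnorm(n_draws, mean = a_hat, sd = se_a)
b_draws <- rnorm(n_draws, mean = b_hat, sd = se_b)

# Point estimate and Monte Carlo 95% interval for the indirect effect a*b.
indirect <- a_hat * b_hat
ci <- quantile(a_draws * b_draws, probs = c(0.025, 0.975))

print(indirect)
print(ci)
```

With two predictors (X1 and X2), the same steps would be repeated per predictor, and the two sets of product draws could be differenced to compare the indirect effects.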

Related

Panel regression in R with variable coefficients

We have tried to do two (similar) panel regressions in R.
1) One with time and individual fixed effects (the usual intercept dummies) using plm(). However, we are mostly interested in a "slope coefficient" or beta for each individual, not one common to all individuals:
Regression 1
where alpha_i is the individual fixed effect and gamma_t is the time fixed effect. The sum of X is the variable X and its three lags:
Sum X variable
We have already included the lagged X variables as new columns in our dataset so in our specification in the code we simply treat them as four different variables:
This is an attempt at using plm() and include our own dummy variables for each individual Beta
plm(income ~ (factor(firmid)-1)*(expense_rate + lag1 + lag2 + lag3), data = data1,
effect = c("time"), model = c("within"), index = c("name", "date"))
lag1, lag2, and lag3 are the lagged variables of expense_rate.
data1 is a data frame.
“(factor(firmid)-1)” is an attempt at introducing dummies to get Betas for each individual instead of one for all individuals.
2) The second (and simpler) regression is:
Regression 2
This is an example of our attempt at using pvcm
pvcm1 <- pvcm(income ~ expense_rate + lag1 + lag2 + lag3, data = data1,
effect = "individual", model = "within")
Our question is which specific code, packages, or functions would be suitable for these regressions. We have tried pvcm() to no avail, running into errors such as:
“Error in table(index[1], index[2], useNA = "ifany") :
attempt to make a table with >= 2^31 elements”
and
“Error: cannot allocate vector of size 599.7 Gb”
Furthermore, pvcm() does not seem to be able to cope with both individual and time fixed effects as in 1).

Error when trying to run fixed effects logistic regression

Not sure where I can get help, since this exact post was considered off-topic on StackExchange.
I want to run some regressions based on a balanced panel with electoral data from Brazil focusing on 2 time periods. I want to understand if after a change in legislation that prohibited firm donations to candidates, those individuals that depended most on these resources had a lower probability of getting elected.
I have already run a regression like this in R:
model_continuous <- plm(percentage_of_votes ~ time +
treatment + time*treatment, data = dataset, model = 'fd')
In this model I used a continuous variable (% of votes) as my dependent variable. My treatment units are those that at time = 0 had no campaign contributions coming from corporations.
Now I want to change my dependent variable to a binary variable indicating whether the candidate was elected in that year. All of my units were elected at time = 0. How can I estimate a logit or probit model using fixed effects? I have tried using the pglm package in R.
model_binary <- pglm(dummy_elected ~ time + treatment + time*treatment,
data = dataset,
effects = 'twoways',
model = 'within',
family = 'binomial',
start = NULL)
However, I got this error:
Error in maxRoutine(fn = logLik, grad = grad, hess = hess, start = start, :
argument "start" is missing, with no default
Why is that happening? What is wrong with my model? Is it conceptually correct?
I want the second regression to be as similar as possible to the first one.
I have read that the clogit function from the survival package could do the job, but I don't know how to use it.
Edit:
this is what a sample dataset could look like:
dataset <- data.frame(individual = c(1,1,2,2,3,3,4,4,5,5),
time = c(0,1,0,1,0,1,0,1,0,1),
treatment = c(0,0,1,1,0,0,1,1,0,0),
corporate = c(0,0,0.1,0,0,0,0.5,0,0,0))
Based on the comments, I believe the logistic regression reduces to treatment and dummy_elected. Accordingly I have fabricated the following dataset:
dataset <- data.frame("treatment" = c(rep(1,1000),rep(0,1000)),
"dummy_elected" = c(rep(1, 700), rep(0, 300), rep(1, 500), rep(0, 500)))
I then ran the GLM model:
library(MASS)
model_binary <- glm(dummy_elected ~ treatment, family = binomial(), data = dataset)
summary(model_binary)
Note that the treatment coefficient is significant and the coefficients are given. The resulting probabilities are thus
Probability(dummy_elected) = 1 => 1 / (1 + Exp(-(1.37674342264577E-16 + 0.847297860386033 * :treatment)))
Probability(dummy_elected) = 0 => 1 - 1 / (1 + Exp(-(1.37674342264577E-16 + 0.847297860386033 * :treatment)))
Note that these probabilities are consistent with the frequencies I used to generate the data.
So for each row, take the larger of the two probabilities above; the outcome it belongs to is the predicted value of dummy_elected.
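As a quick check of the probabilities above, here is a minimal sketch that rebuilds the fabricated dataset, refits the same glm, and recovers the fitted probabilities both with predict() and by hand with the inverse logit (plogis). The names p_hat and p_manual are introduced here for illustration only.

```r
# Rebuild the fabricated data: 70% elected among treated, 50% among untreated.
dataset <- data.frame(
  treatment     = c(rep(1, 1000), rep(0, 1000)),
  dummy_elected = c(rep(1, 700), rep(0, 300), rep(1, 500), rep(0, 500))
)

# glm() lives in the stats package, so no extra library is needed.
model_binary <- glm(dummy_elected ~ treatment, family = binomial(), data = dataset)

# Fitted probabilities for treatment = 0 and treatment = 1.
p_hat <- predict(model_binary,
                 newdata = data.frame(treatment = c(0, 1)),
                 type = "response")

# The same probabilities computed by hand from the coefficients.
b <- coef(model_binary)
p_manual <- plogis(b[["(Intercept)"]] + b[["treatment"]] * c(0, 1))

print(round(unname(p_hat), 3))
print(round(p_manual, 3))
```

The treatment coefficient equals log(0.7 / 0.3), the log odds ratio between the two groups, which is why the fitted probabilities reproduce the generating frequencies.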

Converting a mixed model with repeated and random effects and different covariance structures from SAS to R

I have a model, created in SAS by a colleague, with a repeated effect that has an ARH1 (autoregressive heterogeneous variances) covariance structure and a random effect (with a variance components covariance structure) that I am trying to re-create in R, where I have more experience.
The original SAS code is:
PROC MIXED DATA=mylib.sep_cover_data plots=all ALPHA=0.15 CL COVTEST;
CLASS soil_grp dummy_year plot pasture;
MODEL cover = pcp soil_grp / ALPHA=0.15 CL residual SOLUTION;
RANDOM pasture / ALPHA=0.15 CL SOLUTION;
REPEATED dummy_year / subject = plot type = ARH(1);
After looking through other similar questions on here, I'm pretty sure I'm able to re-create the repeated statement in R, including the covariance structure, using the nlme library:
library(nlme)
cover.data <- read.csv("https://drive.google.com/uc?export=donload&id=0Bxdatltmq5ljMVlObXh1NHFGck0", header = TRUE)
cover.data <- cover.data[cover.data$Flag=="data",]
cover.data$soil_grp <- factor(cover.data$soil_grp)
cover.data$dummy_year <- factor(cover.data$dummy_year)
cover.data$pcp <- as.numeric(as.character(cover.data$pcp))
cover.data$cover <- as.numeric(as.character(cover.data$cover))
model.test1 <- gls(cover ~ Pcp + soil_grp, corr = corAR1(, form = ~ 1 | year/plot), weights = varIdent(form = ~ 1 | dummy_year), data = cover.data, na.action = "na.omit")
but I can't figure out how to add in the random effect (with a different covariance structure) into this model.
Additionally, I can put both the repeated and random variables into the same model but don't know how to specify the covariance structures for them.
model.test2 <- lme(cover ~ Pcp_jan_jun + soil, random = list( plot = ~ 1, pasture = ~1), data = cover.data, na.action = "na.omit")
Is it possible to put both a repeated and a random effect with different covariance structures into the same model in R? If so, how do I code R to do it?
The data can be downloaded from
https://drive.google.com/uc?export=donload&id=0Bxdatltmq5ljMVlObXh1NHFGck0

Simulation-based power analysis for Linear Mixed Model (repeated measures) using pilot data

I am doing a simple mixed model analysis and would like to estimate the power of the study (for multiple possible sample sizes). I am using lme4 to fit the models and would like to use the simulate() function in order to simulate new data based on the parameters estimated in my pilot study model and then to use this data for power analysis.
My model is:
m.2 <- lmer(y ~ time + group + (time | subject), REML=FALSE)
And the data looks like this:
npg <- 20 # No of subjects per group
subject <- 1:(2*npg) # Subjects' ids
group <- gl(2, npg, labels = c("Control", "Treatment"))
dts <- data.frame(subject, group) # Subject-level data
dtL <- list(time = 0:9,
subject = subject)
dtLong <- expand.grid(dtL) # "Long" format
mrgDt <- merge(dtLong, dts, sort = FALSE) # Merged
I know I can get the parameters from the pilot data model like this:
newparams <- list(
beta = fixef(m.2),
theta = getME(m.2, "theta"),
sigma = getME(m.2, "sigma"))
And that I can simulate new data using this:
ss <- simulate(~ time + group + (time|subject),
nsim=1,
newdata=d,
family=gaussian,
newparams=newparams)
Unfortunately, I'm new to simulation and don't really know how to do power analysis with simulated data sets. Could you help me with some code and some advice?
EDIT:
I would like to do a power analysis similar to the one in Gałecki & Burzykowski (2013) book.
Gałecki, A., & Burzykowski, T. (2013). Linear mixed-effects models using R: A step-by-step approach. Springer Science & Business Media, p. 515. The example is: Empirical power of the F-test for the treatment effect based on the simulated values of the F-test statistics.
The info in https://peerj.com/articles/1226.pdf and the Supplemental Information (https://peerj.com/articles/1226/#supp-1) is very useful and I think it's the way to go, but I don't understand how to apply their code.
I wanted to simulate data based on the real parameters of the fitted model from my pilot study.
I don't have all the data ready at the current time. For the sake of example and reproducibility, let's say I use the plantgrowth data set
df<-read.table('http://www.uib.no/People/nzlkj/statkey/data/plantgrowth.csv',header=T)
and my model is
m.2 <- lmer(height ~ time + treat + (time | individual), REML=FALSE)
Under my control is the number of individuals in each arm of the study (the two groups, treatment and control).
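A minimal sketch of what a simulation-based power estimate could look like with lme4, assuming the package is installed. The pilot data here are simulated stand-ins (the effect sizes are made up), the model uses a random intercept only to keep the example quick and stable (the model in the question has a random slope for time as well), and significance is judged by |t| > 1.96, a normal approximation rather than the F-test used in the book.

```r
library(lme4)

# Stand-in "pilot" data: 20 subjects per group, 10 time points each.
set.seed(42)
pilot <- expand.grid(time = 0:9, subject = 1:40)
pilot$group <- factor(ifelse(pilot$subject <= 20, "Control", "Treatment"))
subj_int <- rnorm(40, mean = 0, sd = 1)[pilot$subject]   # subject intercepts
pilot$y <- 10 + 0.5 * pilot$time +
  0.8 * (pilot$group == "Treatment") + subj_int + rnorm(nrow(pilot))

# Fit the pilot model (simplified to a random intercept).
m_pilot <- lmer(y ~ time + group + (1 | subject), data = pilot, REML = FALSE)

# Simulate new responses from the fitted model, refit, and keep the
# t statistic of the treatment effect from each refit.
n_sim <- 100
sims <- simulate(m_pilot, nsim = n_sim, seed = 1)
t_vals <- vapply(sims, function(y_new) {
  fit <- refit(m_pilot, newresp = y_new)
  coef(summary(fit))["groupTreatment", "t value"]
}, numeric(1))

# Empirical power: share of simulations with a "significant" treatment effect.
power_hat <- mean(abs(t_vals) > qnorm(0.975))
print(power_hat)
```

To explore other sample sizes, one option would be to build a larger design data frame and pass it together with the pilot parameters through the newdata and newparams arguments of simulate(), as in the call shown in the question, then fit a fresh model to each simulated response instead of using refit().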

Converting Repeated Measures mixed model formula from SAS to R

There are several questions and posts about mixed models for more complex experimental designs, so I thought this simpler model would help other beginners as well as me.
My question: I would like to reproduce in R a repeated measures ANCOVA fitted with SAS PROC MIXED:
proc mixed data=df1;
FitStatistics=akaike
class GROUP person day;
model Y = GROUP X1 / solution alpha=.1 cl;
repeated / type=cs subject=person group=GROUP;
lsmeans GROUP;
run;
Here is the SAS output using the data created in R (below):
Effect     panel  Estimate  Std Error  DF  t Value  Pr > |t|  Alpha  Lower    Upper
Intercept         -9.8693   251.04     7   -0.04    0.9697    0.1    -485.49  465.75
panel      1      -247.17   112.86     7   -2.19    0.0647    0.1    -460.99  -33.3510
panel      2      0         .          .   .        .         .      .        .
X1                20.4125   10.0228    7   2.04     0.0811    0.1    1.4235   39.4016
Below is how I formulated the model in R using 'nlme' package, but am not getting similar coefficient estimates:
## create reproducible example fake panel data set:
set.seed(94); subject.id = abs(round(rnorm(10)*10000,0))
set.seed(99); sds = rnorm(10,15,5);means = 1:10*runif(10,7,13);trends = runif(10,0.5,2.5)
this = NULL; set.seed(98)
for(i in 1:10) { this = c(this,rnorm(6, mean = means[i], sd = sds[i])*trends[i]*1:6)}
set.seed(97)
that = sort(rep(rnorm(10,mean = 20, sd = 3),6))
df1 = data.frame(day = rep(1:6,10), GROUP = c(rep('TEST',30),rep('CONTROL',30)),
Y = this,
X1 = that,
person = sort(rep(subject.id,6)))
## use package nlme
require(nlme)
## run repeated measures mixed model using compound symmetry covariance structure:
summary(lme(Y ~ GROUP + X1, random = ~ +1 | person,
correlation=corCompSymm(form=~day|person), na.action = na.exclude,
data = df1,method='REML'))
Now, the output from R, which I now realize is similar to the output from lm():
Value Std.Error DF t-value p-value
(Intercept) -626.1622 527.9890 50 -1.1859379 0.2413
GROUPTEST -101.3647 156.2940 7 -0.6485518 0.5373
X1 47.0919 22.6698 7 2.0772934 0.0764
I believe I'm close as to the specification, but not sure what piece I'm missing to make the results match (within reason..). Any help would be appreciated!
UPDATE: Using the code in the answer below, the R output becomes:
> summary(model2)
Scroll to bottom for the parameter estimates -- look! identical to SAS.
Linear mixed-effects model fit by REML
Data: df1
AIC BIC logLik
776.942 793.2864 -380.471
Random effects:
Formula: ~GROUP - 1 | person
Structure: Diagonal
GROUPCONTROL GROUPTEST Residual
StdDev: 184.692 14.56864 93.28885
Correlation Structure: Compound symmetry
Formula: ~day | person
Parameter estimate(s):
Rho
-0.009929987
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | GROUP
Parameter estimates:
TEST CONTROL
1.000000 3.068837
Fixed effects: Y ~ GROUP + X1
Value Std.Error DF t-value p-value
(Intercept) -9.8706 251.04678 50 -0.0393178 0.9688
GROUPTEST -247.1712 112.85945 7 -2.1900795 0.0647
X1 20.4126 10.02292 7 2.0365914 0.0811
Please try below:
model1 <- lme(
Y ~ GROUP + X1,
random = ~ GROUP | person,
correlation = corCompSymm(form = ~ day | person),
na.action = na.exclude, data = df1, method = "REML"
)
summary(model1)
I think the random = ~ groupvar | subjvar option in R's lme gives a result similar to the repeated / subject = subjvar group = groupvar option in SAS PROC MIXED in this case.
Edit:
SAS/MIXED
R (a revised model2)
model2 <- lme(
Y ~ GROUP + X1,
random = list(person = pdDiag(form = ~ GROUP - 1)),
correlation = corCompSymm(form = ~ day | person),
weights = varIdent(form = ~ 1 | GROUP),
na.action = na.exclude, data = df1, method = "REML"
)
summary(model2)
So, I think these covariance structures are very similar ($\sigma_{g1} = \tau_g^2 + \sigma_1$).
Edit 2:
Covariance parameter estimates (SAS/MIXED):
Variance person GROUP TEST 8789.23
CS person GROUP TEST 125.79
Variance person GROUP CONTROL 82775
CS person GROUP CONTROL 33297
So
TEST group diagonal element
= 125.79 + 8789.23
= 8915.02
CONTROL group diagonal element
= 33297 + 82775
= 116072
where diagonal element = $\sigma_{k1} + \sigma_{k2}$.
Covariance parameter estimates (R lme):
Random effects:
Formula: ~GROUP - 1 | person
Structure: Diagonal
GROUP1TEST GROUP2CONTROL Residual
StdDev: 14.56864 184.692 93.28885
Correlation Structure: Compound symmetry
Formula: ~day | person
Parameter estimate(s):
Rho
-0.009929987
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | GROUP
Parameter estimates:
1TEST 2CONTROL
1.000000 3.068837
So
TEST group diagonal element
= 14.56864^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + 93.28885^2
= 8913.432
CONTROL group diagonal element
= 184.692^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + (3.068837 * 93.28885)^2
= 116070.5
where diagonal element = $\tau_g^2 + \sigma_1 + \sigma_g^2$.
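The arithmetic above can be reproduced with a few lines of base R. This is only a check of the numbers as the answer combines them, using the estimates copied from the two outputs; the variable names are introduced here for illustration.

```r
# Estimates copied from the R lme output above.
tau_test    <- 14.56864      # random-effect SD, TEST group
tau_control <- 184.692       # random-effect SD, CONTROL group
sigma       <- 93.28885      # residual SD (TEST stratum)
ratio       <- 3.068837      # varIdent SD ratio, CONTROL relative to TEST
rho         <- -0.009929987  # compound-symmetry correlation

# The middle term as written in the answer.
cross_term <- sqrt(ratio) * sigma * rho

# Diagonal elements on the R side.
diag_test    <- tau_test^2 + cross_term + sigma^2
diag_control <- tau_control^2 + cross_term + (ratio * sigma)^2

# Diagonal elements on the SAS side (CS + Variance per group).
sas_test    <- 125.79 + 8789.23
sas_control <- 33297 + 82775

print(round(c(R_test = diag_test, SAS_test = sas_test), 2))
print(round(c(R_control = diag_control, SAS_control = sas_control), 1))
```

The two sides agree to within about two units in each group, which is the sense in which the covariance structures are described as very similar.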
Oooh, this is going to be a tricky one, and if it's even possible using standard nlme functions, is going to take some serious study of Pinheiro/Bates.
Before you spend the time doing that though, you should make absolutely sure that this is the exact model you need. Perhaps there's something else that might fit the story of your data better. Or maybe there's something R can do more easily that is just as good, but not quite the same.
First, here's my take on what you're doing in SAS with this line:
repeated / type=cs subject=person group=GROUP;
This type=cs subject=person is inducing correlation between all the measurements on the same person, and that correlation is the same for all pairs of days. The group=GROUP is allowing the correlation for each group to be different.
In contrast, here's my take on what your R code is doing:
random = ~ +1 | person,
correlation=corCompSymm(form=~day|person)
This code is actually adding almost the same effect in two different ways; the random line is adding a random effect for each person, and the correlation line is inducing correlation between all the measurements on the same person. However, these two things are almost identical; if the correlation is positive, you get the exact same result by including either of them. I'm not sure what happens when you include both, but I do know that only one is necessary. Regardless, this code has the same correlation for all individuals, it's not allowing each group to have their own correlation.
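To see why the two specifications nearly coincide, here is a short derivation for a plain random-intercept model. The notation ($b_i$, $\tau^2$, $\sigma^2$, $\rho$) is introduced for this sketch and is not taken from the nlme output.

```latex
% Random-intercept model for measurement j on person i
y_{ij} = x_{ij}^{\top}\beta + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)

% Implied marginal variance and covariance within a person
\operatorname{Var}(y_{ij}) = \tau^2 + \sigma^2,
\qquad \operatorname{Cov}(y_{ij}, y_{ik}) = \tau^2 \quad (j \neq k)

% Implied within-person correlation: constant across pairs, never negative
\rho = \frac{\tau^2}{\tau^2 + \sigma^2} \ge 0
```

So a random intercept implies a compound-symmetry structure whose correlation cannot be negative, whereas corCompSymm on its own also allows a negative correlation. That is the sense in which the two agree when the correlation is positive.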
To let each group have their own correlation, I think you have to build a more complicated correlation structure up out of two different pieces; I've never done this but I'm pretty sure I remember Pinheiro/Bates doing it.
You might consider instead adding a random effect for person and then letting the variance be different for the different groups with weights=varIdent(form=~1|group) (from memory, check my syntax, please). This won't quite be the same but tells a similar story. The story in SAS is that the measurements on some individuals are more correlated than the measurements on other individuals. Thinking about what that means, the measurements for individuals with higher correlation will be closer together than the measurements for individuals with lower correlation. In contrast, the story in R is that the variability of measurements within individuals varies; thinking about that, measurements with higher variability will have lower correlation. So they do tell similar stories, but come at it from opposite sides.
It is even possible (but I would be surprised) that these two models end up being different parameterizations of the same thing. My intuition is that the overall measurement variability will be different in some way. But even if they aren't the same thing, it would be worth writing out the parameterizations just to be sure you understand them and to make sure that they are appropriately describing the story of your data.
