Compute effect sizes of path coefficients in SEM with R

I am currently using the lavaan package in R for structural equation models. I would like to compute the effect sizes (i.e., partial eta-squared) for each of my path coefficients. Is there already a package that does this?
For instance, how can I compute the effect size of the c, a and b regression coefficients?
library(lavaan)
set.seed(1234)
X <- rnorm(100)
M <- 0.5*X + rnorm(100)
Y <- 0.7*M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)
model <- ' # direct effect
Y ~ c*X
# mediator
M ~ a*X
Y ~ b*M
# indirect effect (a*b)
ab := a*b
# total effect
total := c + (a*b)
'
fit <- sem(model, data = Data)
summary(fit)
Ideally the method should also work when building models based on latent variables.
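One rough starting point (a hedged sketch of possible ingredients, not a packaged partial eta-squared) is to look at the standardized path coefficients and the R-squared of each endogenous variable that lavaan can report for the fitted object:
# Possible building blocks for an effect-size summary (illustrative only)
standardizedSolution(fit)    # standardized path coefficients (c, a, b)
lavInspect(fit, "rsquare")   # R-squared for the endogenous variables Y and M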

Related

How to obtain p-values (or CIs) for the correlation of random effects in a GLMM (lme4)

I want to test for a correlation between the random effects of a GLMM fitted in lme4. It has already been suggested that I conduct a likelihood-ratio comparison of models with and without the random correlation. That comparison is indeed significant, but I wanted to ask whether there is any way to get confidence intervals or p-values for this correlation from the model.
(Specifically, I have compared a model with the random-effects structure (1 + X1 + X2 || group) against (1 + X1 + X2 | group), but the problem is that the second model also includes the correlations with the intercept, and I want to test specifically for the significance of the correlation between X1 and X2. Unfortunately, a model with (1 + X1 | group) + (1 + X2 | group) does not converge.)
Any help would be appreciated.
You can use confint() to get likelihood profile confidence intervals. P-values would be harder; you could do parametric bootstrapping (a rough sketch is given at the end of this answer), but it would be slow.
set.seed(101)
dd <- data.frame(x = rnorm(1000), y = rnorm(1000),
                 g = factor(sample(1:20, size = 1000, replace = TRUE)))
library(lme4)
dd$z <- simulate(~ x + y + (1 + x + y | g),
                 newdata = dd,
                 newparams = list(beta = rep(1, 3),
                                  sigma = 1,
                                  theta = rep(1, 6)))[[1]]
m <- lmer(z ~ x + y + (1 + x + y | g),
          data = dd)
In the confint() call below, parm = "theta_" means "all covariance parameters". You could use parm = c(2, 3, 5) to select only the correlation parameters, but you'd have to read ?profile.merMod and think carefully to figure out the correct indices ...
cc <- confint(m, parm = "theta_", oldNames = FALSE)
Results give you 95% (by default) CIs for all of the covariance parameters. In this example, the x/y slope correlation is significant but the correlations between (intercept and x) and (intercept and y) aren't. (Note that the correlations aren't necessarily invariant to reparameterizing the model, in particular centering or otherwise shifting the predictors will change the answers ...)
cc
                          2.5 %    97.5 %
sd_(Intercept)|g     0.38142602 0.7451417
cor_x.(Intercept)|g -0.15990774 0.6492967
cor_y.(Intercept)|g -0.01148283 0.7294138
sd_x|g               0.67205037 1.2800681
cor_y.x|g            0.53404483 0.9116571
sd_y|g               0.83378353 1.5742580
sigma                0.94201110 1.0311559
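A rough sketch of the parametric bootstrap mentioned above, assuming the pbkrtest package (my choice of wrapper, not part of the original answer). Note that it compares the full model against a reduced model with all random-effect correlations dropped, so it tests the correlations jointly rather than only the x/y one:
library(pbkrtest)
# Reduced model: || drops all random-effect correlations (including those with the intercept)
m0 <- lmer(z ~ x + y + (1 + x + y || g), data = dd)
# Parametric bootstrap likelihood-ratio test; slow, since both models are refitted nsim times
pb <- PBmodcomp(m, m0, nsim = 500)
summary(pb)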

Simulate data from regression model with exact parameters in R

How can I simulate data so that the coefficients recovered by lm take particular pre-determined values and the residuals are normally distributed? For example, could I generate data so that lm(y ~ 1 + x) will yield (Intercept) = 1.500 and x = 4.000? I would like the solution to be versatile enough to work for multiple regression with continuous x (e.g., lm(y ~ 1 + x1 + x2)), with bonus points if it works for interactions as well (lm(y ~ 1 + x1 + x2 + x1*x2)). Also, it should work for small N (e.g., N < 200).
I know how to simulate random data which is generated by these parameters (see e.g. here), but that randomness carries over to variation in the estimated coefficients, e.g., Intercept = 1.488 and x = 4.067.
Related: It is possible to generate data that yields pre-determined correlation coefficients (see here and here). So I'm asking if this can be done for multiple regression?
One approach is to use perfectly symmetrical noise. The noise cancels itself out, so the estimated parameters are exactly the input parameters, yet the residuals appear normally distributed.
x <- 1:100
y <- cbind(1, x) %*% c(1.5, 4)
eps <- rnorm(100)
x <- c(x, x)
y <- c(y + eps, y - eps)
fit <- lm(y ~ x)
coef(fit)
# (Intercept)           x
#         1.5         4.0
plot(fit)
Residuals are normally distributed...
... but exhibit an abnormally perfect symmetry!
EDIT by OP: I wrote up general-purpose code exploiting the symmetrical-residuals trick. It scales well to more complex models. This example also shows that it works for categorical predictors and interaction effects.
library(dplyr)
# Data and residuals
df = tibble(
  # Predictors
  x1 = 1:100,                   # Continuous
  x2 = rep(c(0, 1), each = 50), # Dummy-coded categorical
  # Generate y from the model, including the interaction term
  y_model = 1.5 + 4 * x1 - 2.1 * x2 + 8.76543 * x1 * x2,
  noise = rnorm(100)            # Residuals
)
# Do the symmetrical-residuals trick
# This is copy-and-paste ready, no matter the model complexity.
df = bind_rows(
  df %>% mutate(y = y_model + noise),
  df %>% mutate(y = y_model - noise)  # Mirrored
)
# Check that it works
fit <- lm(y ~ x1 + x2 + x1*x2, df)
coef(fit)
# (Intercept)          x1          x2       x1:x2
#     1.50000     4.00000    -2.10000     8.76543
You could do rejection sampling:
set.seed(42)
tol <- 1e-8
x <- 1:100
continue <- TRUE
while (continue) {
  y <- cbind(1, x) %*% c(1.5, 4) + rnorm(length(x))
  if (sum((coef(lm(y ~ x)) - c(1.5, 4))^2) < tol) continue <- FALSE
}
coef(lm(y ~ x))
# (Intercept)           x
#    1.500013    4.000023
Obviously, this is a brute-force approach and the smaller the tolerance and the more complex the model, the longer this will take. A more efficient approach should be possible by providing residuals as input and then employing some matrix algebra to calculate y values. But that's more of a maths question ...
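A minimal sketch of that matrix-algebra idea (an addition under the same assumptions as the examples above): draw random noise, keep only its component orthogonal to the design matrix, and the fitted coefficients then equal the targets up to floating-point error while the residuals still look approximately normal.
set.seed(1)
n    <- 100
x    <- rnorm(n)
X    <- cbind(1, x)    # design matrix: intercept + x
beta <- c(1.5, 4)      # target coefficients
eps  <- rnorm(n)
# Project the noise onto the orthogonal complement of the columns of X
eps_perp <- residuals(lm(eps ~ X - 1))
y <- drop(X %*% beta) + eps_perp
coef(lm(y ~ x))    # (Intercept) = 1.5, x = 4.0, up to floating-point error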

Constrained weighted linear regression in R

I am trying to set up a constrained weighted linear regression. That is to say, I have a dataset of i observations and three different x variables, and each observation has a weight. I want to perform a weighted multiple linear regression with the restrictions that the weighted mean of each x variable has to be zero and the weighted standard deviation should be one.
Since I am new and have no reputation yet, I can't post images with LaTeX formulas, so I have to write them down this way.
First restriction $\sum_{i} w_{i} X_{i,k} = 0$ for k = 1,2,3.
Second one: $\sum_{i} w_{i} X_{i,k}^2 = 1$ for k = 1,2,3.
This is an example dataset:
y <- rnorm(10)
w <- rep(0.1, 10)
x1 <- rnorm(10)
x2 <- rnorm(10)
x3 <- rnorm(10)
data <- cbind(y, x1, x2, x3, w)
lm(y ~ x1 + x2 + x3, data = data, weights = data$w)
The weights do not have to be equal for each observation but have to add up to one.
I would like to include these restrictions in the regression. Is there a way to do that?
You could perhaps use the Generalised Linear Model:
glm(y ~ x1 + x2 + x3, weights = w, data=data)
Note that data needs to be a data.frame (your cbind() call creates a matrix).
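Putting that together, a minimal sketch of the suggested call (using the example variables from the question; assumed, not part of the original answer):
data <- data.frame(y, x1, x2, x3, w)    # a proper data.frame instead of a matrix
fit <- glm(y ~ x1 + x2 + x3, weights = w, data = data)
summary(fit)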

Removing additive effect in Multiple Regression in R

I have this data set that I will use for my model:
set.seed(123)
x <- rnorm(100)
y <- 4 + (1.5 * x) + rnorm(100, sd = 2)
DF <- data.frame(x = x,
                 y = y,
                 b = as.factor(round(abs(x / 3))),
                 c = as.factor(round(abs(y / 3))))
I was assigned to create a multiplicative model for them with a base constant of 5, like this equation:
y = 5 * b(i) * c(i)
but the best that I can do is this one:
m1 <- lm(y ~ b*c, data = DF)
summary(m1)
This model is okay, but I want to remove the additive effects and keep only the multiplicative term, and I would also like to replace the intercept with 5 and get distinct coefficients for the first levels of b and c.
Is there a way in R to do this task?
To fit the model without a constant, use lm(y ~ b*c - 1, ...). Fixing the constant at a known value can be done either by specifying it as an offset and not fitting a constant, or by subtracting the known constant from the dependent variable and fitting a model with no constant.
set.seed(123)
x <- rnorm(100)
DF <- as.data.frame(cbind(x))
DF$y  <- 4 + (1.5 * x) + rnorm(100, sd = 2)
DF$b  <- round(abs(DF$x / 3))
DF$c  <- round(abs(DF$y / 3))
DF$bc <- DF$b * DF$c
m1 <- lm(y ~ b*c, data = DF)                                 # model w/ a constant
m2 <- lm(y ~ b*c - 1, data = DF)                             # model w/o a constant
m3 <- lm(y ~ b*c - 1 + offset(rep(5, nrow(DF))), data = DF)  # model w/ a constant fixed at 5
m4 <- lm(y - 5 ~ b*c - 1, data = DF)                         # subtracting the fixed constant from the y's
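A quick check of what the offset version gives (an addition using the objects defined above): m3 has no intercept among its coefficients, and its fitted values include the fixed constant of 5 through the offset.
coef(m3)             # only b, c and b:c terms; no intercept
head(predict(m3))    # fitted values include the offset of 5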

How to simulate quantities of interest using arm or rstanarm packages in R?

I would like to know how to simulate quantities of interest out of a regression model estimated using either the arm or the rstanarm packages in R. I am a newbie in Bayesian methods and R and have been using the Zelig package for some time. I asked a similar question before, but I would like to know if it is possible to simulate those quantities using the posterior distribution estimated by those packages.
In Zelig you can set the values you want for the independent variables and it calculates the results for the outcome variable (expected value, probability, etc.). An example:
# Creating a dataset:
set.seed(10)
x <- rnorm(100,20,10)
z <- rnorm(100,10,5)
e <- rnorm(100,0,1)
y <- 2*x+3*z+e
df <- data.frame(x,z,e,y)
# Loading Zelig
require(Zelig)
# Model
m1.zelig <- zelig(y ~ x + z, model="ls", data=df)
summary(m1.zelig)
# Simulating z = 10
s1 <- setx(m1.zelig, z = 10)
simulation <- sim(m1.zelig, x = s1)
summary(simulation)
So Zelig keeps x at its mean (20.56), and simulates the quantity of interest with z = 10. In this case, y is approximately 71.
The same model using arm:
# Model
require(arm)
m1.arm <- bayesglm(y ~ x + z, data=df)
summary(m1.arm)
And using rstanarm:
# Model
require(rstanarm)
m1.stan <- stan_glm(y ~ x + z, data = df)
print(m1.stan)
Is there any way to simulate z = 10 and x equals to its mean with the posterior distribution estimated by those two packages and get the expected value of y? Thank you very much!
In the case of bayesglm, you could do
sims <- arm::sim(m1.arm, n.sims = 1000)
y_sim <- rnorm(n = 1000, mean = sims@coef %*% t(as.matrix(s1)), sd = sims@sigma)
mean(y_sim)
For the (unreleased) rstanarm, it would be similar
sims <- as.matrix(m1.stan)
y_sim <- rnorm(n = nrow(sims), mean = sims[, 1:(ncol(sims)-1)] %*% t(as.matrix(s1)),
               sd = sims[, ncol(sims)])
mean(y_sim)
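For what it's worth, in released versions of rstanarm the same quantity can be obtained more directly (an addition assuming the model above was fit with stan_glm):
# Posterior predictive draws of y at x = mean(x), z = 10
newdat <- data.frame(x = mean(df$x), z = 10)
y_sim <- posterior_predict(m1.stan, newdata = newdat)
mean(y_sim)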
In general for Stan, you could pass s1 as a row_vector and utilize it in a generated quantities block of a .stan file like
generated quantities {
  real y_sim;
  y_sim = normal_rng(s1 * beta, sigma);
}
in which case the posterior distribution of y_sim would appear when you print the posterior summary.
