I have a mixed model with two categorical predictors (x1, x2) and one continuous predictor (x3).
model <- lmer(z ~ x1 * x2 * x3 + (1|group), data = data)
I am interested in comparing high and low values of my continuous predictor.
My plan is to contrast the simple slopes for x3 at -1 SD, the mean, and +1 SD.
As far as I understand, this can be done with the emtrends() function from emmeans like so:
library(emmeans)
sd1 <- mean(data$x3, na.rm = TRUE) + sd(data$x3, na.rm = TRUE)
m   <- mean(data$x3, na.rm = TRUE)
sd2 <- mean(data$x3, na.rm = TRUE) - sd(data$x3, na.rm = TRUE)
mylist <- list(x3 = c(sd1, m, sd2))
emtrends(model, ~ x1 * x2 | x3,
         var = "x3",
         at = mylist)
However, the slopes returned are identical for all three values of x3.
Can anyone shed light on this?
The simple_slopes() function in the reghelper package could be an alternative to emmeans in this specific case. The following simulation probes the simple slopes at x3 values of -1, 0, and 1 (x3 is simulated with mean = 0 and sd = 1), but you can of course use any values. For the categorical predictors, all combinations are evaluated, as well as the slope of x3 at the mean values of both categorical predictors:
library(lme4)
library(reghelper)
set.seed(42143)
#sample size
n <- 1000
#generate the factor variables (dummy coded)
x1 <- sample(x = c(0, 1), size = n, replace = TRUE, prob = c(.90, .10))
x2 <- sample(x = c(0, 1), size = n, replace = TRUE, prob = c(.90, .10))
#generate the continuous variable
x3 <- rnorm(n, mean = 0, sd = 1)
#generate the group variable
group <- sample(letters, size = n, replace = TRUE)
#generate the dependent variable with some main-effect and three-way interaction weights
z <- 0.2*x1 - 0.3*x2 + 0.4*x3 + 0.2*x1*x2*x3 + rnorm(n)
#collect the variables into a data frame
dat <- data.frame(x1, x2, x3, group, z)
#run the model
model <- lmer(z ~ x1 * x2 * x3 + (1|group), data = dat)
#explore simple slopes at the specified variable values;
#"sstest" gives the simple-slope tests
simple_slopes(model, levels = list(x1 = c(0, 1, mean(dat$x1), "sstest"),
                                   x2 = c(0, 1, mean(dat$x2), "sstest"),
                                   x3 = c(-1, 0, 1, "sstest")))
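For comparison, emtrends() can also return the slope of x3 within each combination of the categorical predictors; a minimal sketch using the simulated model above (x1 and x2 are numeric 0/1 dummies here, so their levels are supplied via at =):
library(emmeans)
#slope of x3 in each x1-by-x2 cell
emtrends(model, ~ x1 * x2, var = "x3",
         at = list(x1 = c(0, 1), x2 = c(0, 1)))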
I want to use the propensity score fitted from a linear model to match observations using the MatchIt library.
For example, suppose df is a data frame. If I were to balance its treatment column in terms of x1, x2, and x3 using the propensity score from a logit, I would set distance = 'glm' and link = 'logit':
library(MatchIt)
m <- matchit(formula = treatment ~ x1 + x2 + x3,
             data = df,
             method = 'nearest',
             distance = 'glm',
             link = 'logit')
How can I do the same with a linear model instead of a logit?
As per the documentation:
When link is prepended by "linear.", the linear predictor is used instead of the predicted probabilities.
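In other words (a standalone sketch of what the documentation describes; matchit() computes this internally, the glm() call below is just for illustration):
ps_fit <- glm(treatment ~ x1 + x2 + x3, data = df, family = binomial)
lp <- predict(ps_fit, type = "link")      # linear predictor (log-odds): what link = 'linear.logit' uses
p  <- predict(ps_fit, type = "response")  # predicted probabilities: what link = 'logit' uses
all.equal(unname(p), unname(plogis(lp)))  # the two are related by the logistic function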
Hence, I tried:
m <- matchit(formula = treatment ~ x1 + x2 + x3,
             data = df,
             method = 'nearest',
             distance = 'glm',
             link = 'linear.logit')
I'm afraid, however, that doing this (link = 'linear.logit') would still use a score derived from the logit model, just on the log-odds scale.
Is there any way I can just use a linear model instead of a generalized linear model?
You cannot do this from within matchit() (and, in general, you shouldn't, which is why it is not allowed). But you can always estimate the propensity scores outside matchit() however you want and then supply them to matchit(). To use a linear probability model, you would run the following:
fit <- lm(treatment ~ x1 + x2 + x3, data = df)
ps <- fitted(fit)
m <- matchit(treatment ~ x1 + x2 + x3,
             data = df,
             method = 'nearest',
             distance = ps)
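As a quick follow-up check (not part of the original answer): linear-probability fitted values are not bounded to [0, 1], which is one reason this approach is generally discouraged, and balance after matching can be inspected in the usual way:
range(ps)   # may extend below 0 or above 1
summary(m)  # covariate balance before and after matching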
I have a numerical (continuous) dependent variable and more than 40 independent variables (2 numerical, 3 categorical, and the rest dummy variables). I tried both forward and backward selection to reduce the number of predictor variables in my model:
nullmod <- lm(y ~ 1, data = df)
fullmod <- lm(y ~ x1 + x2 + x3 + ... + x40, data = df)
reg1A <- step(nullmod, scope = list(lower = nullmod, upper = fullmod),
              direction = "forward")
I also did the same with direction = "backward".
The result says that my model is best with none of the variables added, yet when I run a multiple regression with just a few of my variables I get fairly significant results. Why is that?
I am trying to set up a constrained weighted linear regression. That is to say, I have a dataset of n observations (indexed by i) and three different x variables. Each observation has a weight. I want to perform a weighted multiple linear regression under the restrictions that the weighted mean of each x variable is zero and the weighted standard deviation is one.
Since I am new and have no reputation yet, I can't post images with LaTeX formulas, so I have to write them down this way.
First restriction: $\sum_{i} w_{i} X_{i,k} = 0$ for $k = 1, 2, 3$.
Second restriction: $\sum_{i} w_{i} X_{i,k}^2 = 1$ for $k = 1, 2, 3$.
This is an example dataset:
y <- rnorm(10)
w <- rep(0.1, 10)
x1 <- rnorm(10)
x2 <- rnorm(10)
x3 <- rnorm(10)
data <- cbind(y, x1, x2, x3, w)
lm(y ~ x1 + x2 + x3, data = data, weights = data$w)
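In R terms, the restrictions I am after are the following (shown for x1; analogously for x2 and x3):
sum(w * x1)    # weighted mean restriction: should equal 0
sum(w * x1^2)  # weighted second-moment restriction (variance of 1, given the zero weighted mean): should equal 1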
The weights do not have to be equal for each observation but have to add up to one.
I would like to include these restrictions into the regression. Is there a way to do that?
You could perhaps use the generalised linear model:
glm(y ~ x1 + x2 + x3, weights = w, data = data)
Note that data needs to be a data.frame(...); the cbind() above creates a matrix.
I want to observe the effect of a treatment variable on my outcome Y. I ran a multiple regression: fit <- lm(Y ~ x1 + x2 + x3). x1 is the treatment variable and x2, x3 are the control variables. I used the predict function holding x2 and x3 at their means, and I plotted these predictions.
Now I would like to add a line to my plot, similar to abline() for a simple regression, but I do not know how to do this.
I think I have to use lines(x, y), where y is the vector of predictions and x is a sequence of values for my variable x1. But R tells me the lengths of y and x differ.
I think you are looking for termplot:
## simulate some data
set.seed(0)
x1 <- runif(100)
x2 <- runif(100)
x3 <- runif(100)
y <- cbind(1,x1,x2,x3) %*% runif(4) + rnorm(100, sd = 0.1)
## fit a model
fit <- lm(y ~ x1 + x2 + x3)
termplot(fit, se = TRUE, terms = "x1")
termplot uses predict.lm(..., type = "terms") for term-wise prediction. If the model has an intercept (as above), predict.lm centres each term (see "What does predict.glm(..., type = 'terms') actually do?"). In this way, each term is predicted to be 0 at the mean of its covariate, and the standard error at the mean is 0 (hence the confidence interval touches the fitted line at the mean).
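A quick way to see this centring on the fit above (a small check, not part of termplot itself):
tt <- predict(fit, type = "terms")
colMeans(tt)          # each term column is centred, so the column means are numerically zero
attr(tt, "constant")  # the constant subtracted from the terms (row sums + constant = fitted values)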
I would like to use predict.lm to generate fitted values for the predictors in a model. In other words, I want to multiply the coefficient estimates in an lm object by the values in a new data frame with n rows. The result is a matrix with n rows in which each value has been multiplied by its corresponding coefficient.
This is easy to do when the new data frame has a column for each of the coefficient estimates in the lm object. For example:
y <- rnorm(6)
x1 <- rnorm(6)
x2 <- rnorm(6)
modEasy <- lm(y ~ x1 + x2)
dfEasy <- data.frame(x1 = rnorm(6), x2 = rnorm(6))
predict(modEasy, newdata = dfEasy, type = "terms")
But it is tricky to generate fitted values when the model includes a factor variable or a polynomial. In these cases, the new data frame has fewer columns than there are coefficients. For example:
y <- rnorm(6)
x1 <- rnorm(6)
x2 <- factor(rep(letters[1:3], 2))
modHard <- lm(y ~ x1 + x2)
dfHard <- data.frame(x1 = rnorm(6), x2 = factor(rep(letters[1:3], each=2)))
predict(modHard, newdata = dfHard, type = "terms")
In this case, modHard$coefficients contains an estimate for x1 and for each non-reference level of x2, just as it should. But dfHard has only two columns: one for x1 and one for x2. As a result, predict returns a matrix with only two columns: one for x1 and the other for x2.
In this situation, I want predict to return a matrix with columns for x1 and for each level of x2. I can produce that matrix with this code:
mf <- model.frame(formula(modHard), dfHard)
mm <- model.matrix(formula(modHard), mf)
t(modHard$coefficients * t(mm))
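As a sanity check (not in the original post), each row of this matrix should sum to the ordinary prediction for that observation, since the intercept column of mm is included:
contrib <- t(modHard$coefficients * t(mm))
all.equal(rowSums(contrib), predict(modHard, newdata = dfHard))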
But is there a simpler way?