R: Plotting prediction results for a multiple regression

I want to observe the effect of a treatment variable on my outcome Y. I fit a multiple regression, fit <- lm(Y ~ x1 + x2 + x3), where x1 is the treatment variable and x2, x3 are control variables. I then used predict(), holding x2 and x3 at their means, and plotted the result.
Now I would like to add a line to my plot, similar to abline() for a simple regression, but I do not know how to do this.
I think I have to use lines(x, y), where y is the vector of predictions and x is a sequence of values for x1, but R tells me the lengths of x and y differ.

I think you are looking for termplot:
## simulate some data
set.seed(0)
x1 <- runif(100)
x2 <- runif(100)
x3 <- runif(100)
y <- cbind(1,x1,x2,x3) %*% runif(4) + rnorm(100, sd = 0.1)
## fit a model
fit <- lm(y ~ x1 + x2 + x3)
termplot(fit, se = TRUE, terms = "x1")
termplot uses predict.lm(, type = "terms") for term-wise prediction. If the model has an intercept (as above), predict.lm centres each term (see the question "What does predict.glm(, type = "terms") actually do?"). In this way, each term is predicted to be 0 at the mean of its covariate, and the standard error at the mean is 0 (hence the confidence interval touches the line at the mean).
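If you would rather follow the approach described in the question (holding x2 and x3 at their means and drawing the fitted line yourself), here is a minimal sketch along those lines, reusing the simulated data above. The key points are that the drawing function is lines(), not line(), and that the x values passed to lines() must be the same grid passed to predict():
## grid of x1 values, with x2 and x3 held at their means
nd <- data.frame(x1 = seq(min(x1), max(x1), length.out = 100),
                 x2 = mean(x2), x3 = mean(x3))
plot(x1, y, col = "grey")                          ## raw data
lines(nd$x1, predict(fit, newdata = nd), lwd = 2)  ## fitted line for x1
Unlike termplot, this shows the predicted response rather than a centred term, so the line sits at the level of the data.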

Related

Replace lm coefficients and calculate results of the new lm in R

I am able to change the coefficients of my linear model. Then I want to compare the results of my "new" model with the new coefficients, but R is not calculating the results with the new coefficients.
As you can see in the following example, the summaries of my models fit and fit1 are exactly the same, although results like the multiple R-squared or the fitted values should change.
set.seed(2157010) #forgot set.
x1 <- 1998:2011
x2 <- x1 + rnorm(length(x1))
y <- 3*x2 + rnorm(length(x1)) #you had x, not x1 or x2
fit <- lm( y ~ x1 + x2)
# view original coefficients
coef(fit)
# generate second function for comparing results
fit1 <- fit
# replace coefficients with new values, use whole name which is coefficients:
fit1$coefficients[2:3] <- c(5, 1)
# view new coefficients
coef(fit1)
# Comparing
summary(fit)
summary(fit1)
Thanks in advance
It might be easier to compute the multiple R^2 yourself with the substituted parameters.
mult_r2 <- function(beta, y, X) {
  tot_ss <- var(y) * (length(y) - 1)  ## total sum of squares
  rss <- sum((y - X %*% beta)^2)      ## residual sum of squares under these coefficients
  1 - rss / tot_ss
}
(Or, more compactly, following the comments, you could compute p <- X %*% beta; cor(y, p)^2.)
mult_r2(coef(fit), y = model.response(model.frame(fit)), X = model.matrix(fit))
## 0.9931179, matches summary()
Now with new coefficients:
new_coef <- coef(fit)
new_coef[2:3] <- c(5,1)
mult_r2(new_coef, y = model.response(model.frame(fit)), X = model.matrix(fit))
## [1] -343917
That last result seems pretty wild, but the substituted coefficients are very different from the true least-squares coefficients, and a negative R^2 is possible when the model is bad enough ...
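For the same reason, summary() and fitted() keep reporting quantities stored when the model was originally fit; if you also want the fitted values under the substituted coefficients, it is easiest to rebuild them from the model matrix. A minimal sketch:
## fitted values under the substituted coefficients
X <- model.matrix(fit)
fitted_new <- drop(X %*% new_coef)
head(cbind(original = fitted(fit), new = fitted_new))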

How to get simple slopes for particular values in emtrends?

I have a mixed model with two categorical predictors (X1, X2) and one continuous predictor (X3).
model <- lmer(z ~ X1 * X2 * X3 + (1|group), data = data)
I am interested in comparing high and low traits of my continuous predictor.
My plan is to contrast the simple slopes for X3 (at -1 SD, M, +1 SD).
As far as I understand this can be done using the emtrends() function from emmeans like so:
sd1 <- mean(data$X3, na.rm = T) + sd(data$X3, na.rm = T)
mean <- mean(data$X3, na.rm = T)
sd2 <- mean(data$X3, na.rm = T) - sd(data$X3, na.rm = T)
mylist <- list(X3 = c(sd1, mean, sd2))
emtrends(model, ~ X1 * X2 | X3,
         var = "X3",
         at = mylist)
However, the coefficients provided are the same for the three values of X3.
Can anyone shed light on this?
The simple_slopes() function in the reghelper package could be an alternative to emmeans in this specific case. The following simulation probes simple slopes of x3 at the values -1, 0, and 1 (x3 was simulated with mean = 0 and sd = 1), but you can of course use any values. For the categorical predictors, all combinations are calculated, as well as the slope of x3 at the "mean" values of both categorical predictors:
library(lme4)
library(reghelper)
set.seed(42143)
#sample size
n=1000
#generate the factor variables (dummy coded)
x1<-sample(x = c(0,1),size = n,replace=T,prob=c(.90,.10))
x2<-sample(x = c(0,1),size = n,replace=T,prob=c(.90,.10))
#generate the continuous variable
x3<-rnorm(n,mean=0,sd=1)
#generate the group variable
group<-sample(letters,size=n,replace=T)
#generate the dependent variable with some main effect and three-way interaction weights
z<-0.2*x1-0.3*x2+0.4*x3+0.2*x1*x2*x3+rnorm(n)
#collect variables to a data frame
dat<-data.frame(x1,x2,x3,group,z)
#run the model
model <- lmer(z ~ x1 * x2 * x3 + (1|group), data = dat)
#explore simple slopes at specified variables values.
#"sstest" gives the slopes
simple_slopes(model, levels = list(x1 = c(0, 1, mean(dat$x1), "sstest"),
                                   x2 = c(0, 1, mean(dat$x2), "sstest"),
                                   x3 = c(-1, 0, 1, "sstest")))

Constrained weighted linear regression in R

I am trying to set up a constrained weighted linear regression. That is to say, I have a dataset of observations indexed by $i$ and three different x variables, and each observation has a weight. I want to perform a weighted multiple linear regression with the restrictions that the weighted mean of each x variable has to be zero and its weighted standard deviation has to be one.
First restriction $\sum_{i} w_{i} X_{i,k} = 0$ for k = 1,2,3.
Second one: $\sum_{i} w_{i} X_{i,k}^2 = 1$ for k = 1,2,3.
This is an example dataset:
y <- rnorm(10)
w <- rep(0.1, 10)
x1 <- rnorm(10)
x2 <- rnorm(10)
x3 <- rnorm(10)
data <- cbind(y, x1, x2, x3, w)
lm(y ~ x1 + x2 + x3, data = data, weights = data$w)
The weights do not have to be equal for each observation, but they have to add up to one.
I would like to include these restrictions in the regression. Is there a way to do that?
You could perhaps use the Generalised Linear Model:
glm(y ~ x1 + x2 + x3, weights = w, data = data)
Note that data needs to be a data.frame(...), not the matrix produced by cbind().
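The call above only fixes the weights argument; if the two restrictions themselves are what you want to impose, one option (a sketch, assuming the constraints are meant to hold for the predictor columns exactly as written) is to weight-standardise each predictor before fitting:
## weight-standardise x so that sum(w * x) = 0 and sum(w * x^2) = 1
w_std <- function(x, w) {
  x <- x - sum(w * x)       ## weighted mean zero (the weights sum to one)
  x / sqrt(sum(w * x^2))    ## weighted second moment one
}
dat <- data.frame(y, x1, x2, x3, w)
dat[c("x1", "x2", "x3")] <- lapply(dat[c("x1", "x2", "x3")], w_std, w = dat$w)
fit_w <- lm(y ~ x1 + x2 + x3, data = dat, weights = w)
## check the restrictions
colSums(dat$w * dat[c("x1", "x2", "x3")])    ## all approximately 0
colSums(dat$w * dat[c("x1", "x2", "x3")]^2)  ## all approximately 1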

Subset of predictors using coefplot()

I'd like to do a plot of coefficients using coefplot() that only takes into account a subset of the predictors that I'm using. For example, if you have the code
y1 <- rnorm(1000,50,23)
x1 <- rnorm(1000,50,2)
x2 <- rbinom(1000,1,prob=0.63)
x3 <- rpois(1000, 2)
fit1 <- lm(y1 ~ x1 + x2 + x3)
and then ran
coefplot(fit1)
it would give you a plot displaying the coefficients of the intercept, x1, x2 and x3. How can I modify this so I only get the coefficients for say, x1 and x2?
You can use the argument predictors and it will only plot the coefficients you need:
library(coefplot)
coefplot(fit1, predictors=c('x1','x2'))
Output: a coefficient plot showing only the x1 and x2 coefficients.

Optimal predictor value for multivariate regression in R

Suppose I have 1 response variable Y and 2 predictors X1 and X2, such as the following
Y X1 X2
2.3 1.1 1.2
2.5 1.24 1.17
......
Assuming I have a strong belief the following model works well
fit <- lm(Y ~ poly(X1,2) + X2)
in other words, there is a quadratic relationship between Y and X1 and a linear relationship between Y and X2.
Now here are my questions:
How do I find the value of (X1, X2) at which the fitted model reaches its maximum?
Now assuming X2 has to be fixed at some particular value, how do I find the X1 that maximizes the fitted value?
So here is an empirical way to do this:
# create some random data...
set.seed(1)
X1 <- 1:100
X2 <- sin(2*pi/100*(1:100))
df <- data.frame(Y=3 + 5*X1 -0.2 * X1^2 + 100*X2 + rnorm(100,0,5),X1,X2)
fit <- lm(Y ~ poly(X1,2,raw=T) + X2, data=df)
# X1 and X2 unconstrained
df$pred <- predict(fit)
result <- with(df,df[pred==max(pred),])
result
# Y X1 X2 pred
# 19 122.8838 19 0.9297765 119.2087
# max(Y|X2=0)
newdf <- data.frame(Y=df$Y, X1=df$X1, X2=0)
newdf$pred2 <- predict(fit,newdata=newdf)
result2 <- with(newdf,newdf[pred2==max(pred2),])
result2
# Y X1 X2 pred2
#12 104.6039 12 0 35.09141
So in this example, when X1 and X2 are unconstrained, the maximum predicted value is 119.2 and occurs at (X1, X2) = (19, 0.930). When X2 is constrained to 0, the maximum predicted value is 35.1 and occurs at (X1, X2) = (12, 0).
There are a couple of things to consider:
These are global maxima in the space of your data. In other words if your real data has a large number of variables there might be local maxima that you will not find this way.
This method has resolution only as great as your dataset. So if the true maximum occurs at a point between your data points, you will not find it this way (see the optimisation sketch after this list for one way around that).
This technique is restricted to the bounds of your dataset. So if the true maximum is outside those bounds, you will not find it. On the other hand, using a model outside the bounds of your data is, IMHO, the definition of reckless.
Finally, you should be aware that poly(...) produces orthogonal polynomials, which will generate a fit, but whose coefficients are very difficult to interpret. If you really want a quadratic fit, e.g. a + b*x + c*x^2, you are better off specifying that explicitly with Y ~ X1 + I(X1^2) + X2, or using raw = TRUE in the call to poly(...).
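Regarding the resolution caveat, you could also maximise the fitted surface directly instead of only scanning the observed rows. A minimal sketch with optim(), searching within the observed range of the data (the starting values are arbitrary):
## maximise the fitted surface over (X1, X2) within the range of the data
obj <- function(p) -predict(fit, newdata = data.frame(X1 = p[1], X2 = p[2]))
opt <- optim(c(X1 = 50, X2 = 0), obj, method = "L-BFGS-B",
             lower = c(min(df$X1), min(df$X2)),
             upper = c(max(df$X1), max(df$X2)))
opt$par     ## (X1, X2) at the maximum of the fitted surface
-opt$value  ## the corresponding maximum fitted value
For the second question (X2 fixed), fix X2 in newdata and optimise over X1 alone, e.g. with optimize(..., maximum = TRUE).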
Credit to @sashkello: basically, I have to extract the coefficients from the lm object and multiply them by the corresponding terms to build the formula explicitly, and proceed from there.
I think this is not very efficient, though. What if the regression has hundreds of predictors?
