How do I create a regression line with various variables in R

I have already fitted the regression, but I can't figure out how to get the regression line and a predicted line onto a plot.
m1 <- lm(variable1 ~ variable2 + variable3 + variable4 + variable5 + variable6 + variable7 + variable8, data = prog)
summary(m1)
and then I want to create the plot on the basis of hyp.data but I am still a bit lost.

Consider two (not 7!) predictor variables; one is numeric, the other categorical (i.e. a factor).
# Simulate data
set.seed(2017);
x1 <- 1:10;
x2 <- as.factor(sample(c("treated", "not_treated"), 10, replace = TRUE));
df <- cbind.data.frame(
y = 2 * x1 + as.numeric(x2) - 1 + rnorm(10),
x1 = x1,
x2 = x2);
In that case you can do the following:
# Fit the linear model
m1 <- lm(y ~ x1 + x2, data = df);
# Get predictions
df$pred <- predict(m1);
# Plot data
library(ggplot2);
ggplot(df, aes(x = x1, y = y)) +
geom_point() +
facet_wrap(~ x2, scales = "free") +
geom_line(aes(x = x1, y = pred), col = "red");
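With several numeric predictors, as in the question, one common approach is to vary a single predictor over its observed range while holding the others at their means, and then predict over that hypothetical data. The following is only a sketch under that assumption: the data are simulated, and the names prog_sim, m_multi, hyp and v1 to v4 are made up for illustration.

```r
library(ggplot2)

# Simulated stand-in for the asker's data (all names here are hypothetical)
set.seed(1)
n <- 100
prog_sim <- data.frame(v2 = rnorm(n), v3 = rnorm(n), v4 = rnorm(n))
prog_sim$v1 <- 1 + 2 * prog_sim$v2 - prog_sim$v3 + 0.5 * prog_sim$v4 + rnorm(n)

m_multi <- lm(v1 ~ v2 + v3 + v4, data = prog_sim)

# Hypothetical data: vary v2 over its range, hold v3 and v4 at their means
hyp <- data.frame(
  v2 = seq(min(prog_sim$v2), max(prog_sim$v2), length.out = 50),
  v3 = mean(prog_sim$v3),
  v4 = mean(prog_sim$v4)
)

# Predicted values with a 95% confidence band (adds columns fit, lwr, upr)
hyp <- cbind(hyp, predict(m_multi, newdata = hyp, interval = "confidence"))

ggplot(prog_sim, aes(x = v2, y = v1)) +
  geom_point() +
  geom_ribbon(data = hyp, aes(x = v2, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_line(data = hyp, aes(y = fit), colour = "red")
```

The red line shows the predicted response as v2 changes with the other predictors fixed, so it will not pass through the middle of the raw points if the other predictors matter a lot.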

Related

Plotting multiple lm() models in one plot

I have fitted 6 lm() models and 1 gam() model on the same dataset.
Now I want to plot them all in one plot on top of each other. Can I do this without defining the models again in ggplot?
My case is this
I have
model1 <- lm(y~1, data = data) %>% coef()
model2 <- lm(y~x, data = data) %>% coef()
model3 <- lm(y~abs(x), data = data) %>% coef()
...
model7 <- gam(y~s(x), data = data) %>% coef()
can I feed the stored coefficients of my models to ggplot?
ggplot(data, mapping = aes(x = x, y = y)) +
geom_point() +
geom_abline(model1) +
geom_abline(model2) +
....
Or is the only way to plot the model prediction lines to manually fill in the parameters like this:
ggplot(data, mapping = aes(x = x, y = y)) +
geom_point() +
geom_abline(intercept = model1[1]) +
geom_abline(slope = model2[2], intercept = model2[1]) +
geom_abline(slope = model3[2], intercept = model3[1]) +
...
Example code
set.seed(123)
x <- rnorm(50)
y <- rweibull(50,1)
d <- as.data.frame(cbind(x,y))
model1 <- coef(lm(y~1, data = d))
model2 <- coef(lm(y~x, data = d))
model3 <- coef(lm(y~abs(x), data = d))
Including the SE for each line/model and a legend would be welcome as well.
In order for this to work, you really need to save the whole model. So if we assume you have the entire model
# set.seed(101) used for sample data
model1 <- lm(y~1, data = d)
model2 <- lm(y~x, data = d)
model3 <- lm(y~abs(x), data = d)
We can write a helper function to predict new values from these models over the given range of x values. Here's such a function:
newvalsforx <- function(x) {
xrng <- seq(min(x), max(x), length.out=100)
function(m) data.frame(x=xrng, y=predict(m, data.frame(x=xrng)))
}
pred <- newvalsforx(d$x)
This pred() will make predictions from the models over the observed range of x. We can then pass these as new data to geom_line() layers that we add to a plot. For example:
ggplot(d, aes(x,y)) +
geom_point() +
geom_line(data=pred(model1), color="red") +
geom_line(data=pred(model2), color="blue") +
geom_line(data=pred(model3), color="green")
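The question also asked for standard errors and a legend. One possible way to get both, sketched here rather than taken from the answer, is to collect the predictions of all models in a single data frame with a model column and map that column to colour and fill. The names models, xrng and pred_all are assumptions made for this sketch, the band is plus or minus one standard error of the fit, and the gam() model is left out.

```r
library(ggplot2)

# Same example data as in the question
set.seed(123)
x <- rnorm(50)
y <- rweibull(50, 1)
d <- data.frame(x = x, y = y)

# Keep the whole model objects, labelled by formula
models <- list(
  "y ~ 1"      = lm(y ~ 1, data = d),
  "y ~ x"      = lm(y ~ x, data = d),
  "y ~ abs(x)" = lm(y ~ abs(x), data = d)
)

# Predict each model on a common grid, keeping the standard error of the fit
xrng <- data.frame(x = seq(min(d$x), max(d$x), length.out = 100))
pred_all <- do.call(rbind, lapply(names(models), function(nm) {
  p <- predict(models[[nm]], newdata = xrng, se.fit = TRUE)
  data.frame(x = xrng$x, y = p$fit, se = p$se.fit, model = nm)
}))

# Mapping model to colour/fill produces the legend automatically
ggplot(d, aes(x, y)) +
  geom_point() +
  geom_ribbon(data = pred_all,
              aes(ymin = y - se, ymax = y + se, fill = model),
              alpha = 0.2) +
  geom_line(data = pred_all, aes(colour = model))
```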

How to plot 3 models in one Figure in R?

I'm new with R and I have fit 3 models for my data as follows:
Model 1: y = a(x) + b
lm1 = lm(data$CBI ~ data$dNDVI)
Model 2: y = a(x)^2 + b(x) + c
lm2 <- lm(CBI ~ dNDVI + I(dNDVI^2), data=data)
Model 3: y = x(a|x| + b)–1
lm3 = nls(CBI ~ dNDVI*(a*abs(dNDVI) + b) - 1, start = c(a = 1.5, b = 2.7), data = data)
Now I would like to plot all three models in one figure, but I could not find a way to do it. Can you please help me? I have tried with the first two models as follows, and it works, but I don't know how to add Model 3 to it:
ggplot(data = data, aes(x = dNDVI, y = CBI)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, size = 1, se = FALSE) +
geom_smooth(method = lm, formula = y ~ x + I(x^2), size = 1, se = FALSE ) +
theme_bw()
I would also like to add a legend that shows 3 different colours or line types for the 3 models. Can you please guide me on how to do that in the figure?
Using iris as a dummy set to represent the three models:
library(ggplot2)
library(tidyr) # provides gather() and the %>% pipe
# New data frame over which to predict the fitted values for each model
new.dat <- data.frame(Sepal.Length=seq(min(iris$Sepal.Length),
max(iris$Sepal.Length), length.out=50))
m1 <- lm(Petal.Length ~ Sepal.Length, iris)
m2 <- lm(Petal.Length ~ Sepal.Length + I(Sepal.Length^2), data=iris)
m3 <- nls(Petal.Length ~ Sepal.Length*(a*abs(Sepal.Length) + b) - 1,
start = c(a = 1.5, b = 2.7), data = iris)
new.dat$m1.fitted <- predict(m1, new.dat)
new.dat$m2.fitted <- predict(m2, new.dat)
new.dat$m3.fitted <- predict(m3, new.dat)
new.dat <- new.dat %>% gather(var, val, m1.fitted:m3.fitted) #stacked format of fitted data of three models (to automatically generate the legend in ggplot)
ggplot(new.dat, aes(Sepal.Length, val, colour=var)) +
geom_line()
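In current versions of tidyr, gather() is superseded by pivot_longer(). As a sketch of the same stacking step with the newer function (only the two lm() fits are used here to keep it short; the object names wide and long are made up for this example):

```r
library(tidyr)

# Grid of x values and fitted values from two of the models
wide <- data.frame(Sepal.Length = seq(min(iris$Sepal.Length),
                                      max(iris$Sepal.Length),
                                      length.out = 50))
m1 <- lm(Petal.Length ~ Sepal.Length, data = iris)
m2 <- lm(Petal.Length ~ Sepal.Length + I(Sepal.Length^2), data = iris)
wide$m1.fitted <- predict(m1, wide)
wide$m2.fitted <- predict(m2, wide)

# Stack the fitted columns: one row per (x value, model) pair
long <- pivot_longer(wide, cols = m1.fitted:m2.fitted,
                     names_to = "var", values_to = "val")
```

The resulting long data frame can be passed to ggplot() with colour = var in exactly the same way as the gathered version.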

Stack coefficient plots in R

I'm running a set of models with the same independent variables but different dependent variables, and would like to create a set of coefficient plots in one figure in which each model gets its own panel. The following code gives the idea, but here all of the models are combined into one panel rather than shown as 3 separate panels side by side in one figure:
require("coefplot")
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
mod2 <- lm(y2 ~ x + z, data = dat)
mod3 <- lm(y3 ~ x + z, data = dat)
multiplot(mod1,mod2, mod3)
Any thoughts on how to get them to panel next to each other in one figure? Thanks!
I haven't used the coefplot package before, but you can create a coefficient plot directly in ggplot2.
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))
mod1 <- lm(y1 ~ x + z, data = dat)
mod2 <- lm(y2 ~ x + z, data = dat)
mod3 <- lm(y3 ~ x + z, data = dat)
## Create data frame of model coefficients and standard errors
# Function to extract what we need
ce = function(model.obj) {
extract = summary(get(model.obj))$coefficients[ ,1:2]
return(data.frame(extract, vars=row.names(extract), model=model.obj))
}
# Run function on the three models and bind into single data frame
coefs = do.call(rbind, sapply(paste0("mod",1:3), ce, simplify=FALSE))
names(coefs)[2] = "se"
# Faceted coefficient plot
library(ggplot2)
ggplot(coefs, aes(vars, Estimate)) +
geom_hline(yintercept=0, lty=2, lwd=1, colour="grey50") +
geom_errorbar(aes(ymin=Estimate - se, ymax=Estimate + se, colour=vars),
lwd=1, width=0) +
geom_point(size=3, aes(colour=vars)) +
facet_grid(. ~ model) +
coord_flip() +
guides(colour="none") +
labs(x="Coefficient", y="Value") +
theme_grey(base_size=15)
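The error bars above span one standard error on each side of the estimate. If 95% confidence intervals are preferred, a possible variant (a sketch, not part of the answer) builds the table from a named list of models with confint(), which also avoids looking models up by name with get(). The names mods, coef_ci, lwr and upr are assumptions for this example.

```r
set.seed(123)
dat <- data.frame(x = rnorm(100), z = rnorm(100),
                  y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))

# Keep the fitted models in a named list
mods <- list(
  mod1 = lm(y1 ~ x + z, data = dat),
  mod2 = lm(y2 ~ x + z, data = dat),
  mod3 = lm(y3 ~ x + z, data = dat)
)

# One row per (model, coefficient) with the estimate and its 95% interval
coef_ci <- do.call(rbind, lapply(names(mods), function(nm) {
  ci <- confint(mods[[nm]], level = 0.95)
  data.frame(model = nm,
             vars = rownames(ci),
             Estimate = coef(mods[[nm]]),
             lwr = ci[, 1],
             upr = ci[, 2],
             row.names = NULL)
}))
```

This table can be plotted with the same ggplot() call by using ymin=lwr and ymax=upr in geom_errorbar().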

Different behaviour lm in stat_smooth

In this question someone asked if it is possible change the colour in a ggplot2 plot depending on a linear regression line.
The proposed solution worked: the points have a different colour above and below the line.
library(ggplot2)
set.seed(2015)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
# Fit linear regression
l = lm(y ~ x, data = df)
# Make new group variable based on residuals
df$group = NA
df$group[which(l$residuals >= 0)] = "above"
df$group[which(l$residuals < 0)] = "below"
# Make the plot
ggplot(df, aes(x,y)) +
geom_point(aes(colour = group)) +
geom_smooth(method = "lm", formula = y ~ x)
But I would like to run the regression on y - 1, as asked in this question.
# Fit linear regression
l = lm(y - 1 ~ x, data = df)
# Make new group variable based on residuals
df$group = NA
df$group[which(l$residuals >= 0)] = "above"
df$group[which(l$residuals < 0)] = "below"
# Make the plot
ggplot(df, aes(x,y)) +
geom_point(aes(colour = group)) +
geom_smooth(method = "lm", formula = y - 1 ~ x)
This is not what I expected. It looks to me like stat_smooth did what I expected. The lm, however, gives the same result for y ~ x and y - 1 ~ x.
What am I missing here?
If you want to color points based on where they lie according to the line, you can try comparing the actual value to the predicted value rather than using the residual
df$group = NA
df$group[df$y>predict(l)] = "above"
df$group[df$y<predict(l)] = "below"
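The reason the residual-based grouping does not change is that subtracting a constant from the response only shifts the fitted intercept; the slope and the residuals stay the same. A small check on the same simulated data (the names l0 and l1 are introduced just for this sketch):

```r
set.seed(2015)
df <- data.frame(x = rnorm(100), y = rnorm(100))

l0 <- lm(y ~ x, data = df)      # original response
l1 <- lm(y - 1 ~ x, data = df)  # response shifted down by 1

# Only the intercept changes (by -1); the slope does not
coef(l1) - coef(l0)

# The residuals of the two fits agree
all.equal(residuals(l0), residuals(l1))

# predict(l1) is the line drawn by stat_smooth with formula = y - 1 ~ x,
# so comparing the raw y values against it groups points relative to that line
table(df$y > predict(l1))
```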

Constraining slope in stat_smooth with ggplot (plotting ANCOVA)

Using ggplot(), I am trying to plot the results of an ANCOVA in which slopes of the two linear components are equal: i.e., lm(y ~ x + A). The default behavior for geom_smooth(method = "lm") is to plot separate slopes and intercepts for each level of each factor. For example, with two levels of A
library(ggplot2)
set.seed(1234)
n <- 20
x1 <- rnorm(n); x2 <- rnorm(n)
y1 <- 2 * x1 + rnorm(n)
y2 <- 3 * x2 + (2 + rnorm(n))
A <- as.factor(rep(c(1, 2), each = n))
df <- data.frame(x = c(x1, x2), y = c(y1, y2), A = A)
p <- ggplot(df, aes(x = x, y = y, color = A))
p + geom_point() + geom_smooth(method = "lm")
I can fit the ANCOVA separately with lm() and then use geom_abline() to manually add the lines. This approach has a couple of drawbacks: the lines extend beyond the range of the data, and the colors have to be specified manually.
fm <- lm(y ~ x + A, data = df)
summary(fm)
a1 <- coef(fm)[1]
b <- coef(fm)[2]
a2 <- a1 + coef(fm)[3]
p + geom_point() +
geom_abline(intercept = a1, slope = b) +
geom_abline(intercept = a2, slope = b)
I know ancova() in the HH package automates the plotting, but I don't really care for lattice graphics. So I am looking for a ggplot()-centric solution.
library(HH)
ancova(y ~ x + A, data = df)
Is there a method to accomplish this using ggplot()? For this example, A has two levels, but I have situations with 3, 4, or more levels. The formula argument to geom_smooth() doesn't seem to have the answer (as far as I can tell).
For completeness, this works:
library(ggplot2)
set.seed(1234)
n <- 20
x1 <- rnorm(n); x2 <- rnorm(n)
y1 <- 2 * x1 + rnorm(n)
y2 <- 3 * x2 + (2 + rnorm(n))
A <- as.factor(rep(c(1, 2), each = n))
df <- data.frame(x = c(x1, x2), y = c(y1, y2), A = A)
fm <- lm(y ~ x + A, data = df)
p <- ggplot(data = cbind(df, pred = predict(fm)),
aes(x = x, y = y, color = A))
p + geom_point() + geom_line(aes(y = pred))
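The same idea should carry over to factors with more levels, since predict() returns one fitted value per row and the color aesthetic splits the line by level. A sketch with three levels, using simulated data with made-up intercepts (df3 and fm3 are hypothetical names):

```r
library(ggplot2)

set.seed(1234)
n <- 20
df3 <- data.frame(
  x = rnorm(3 * n),
  A = factor(rep(c("a", "b", "c"), each = n))
)
# Common slope of 2, level-specific intercepts 0, 2 and 4 (simulated values)
df3$y <- 2 * df3$x + c(a = 0, b = 2, c = 4)[as.character(df3$A)] + rnorm(3 * n)

# Equal-slopes ANCOVA and its fitted values
fm3 <- lm(y ~ x + A, data = df3)
df3$pred <- predict(fm3)

# One parallel line per level; colors and legend come from the A aesthetic
ggplot(df3, aes(x = x, y = y, color = A)) +
  geom_point() +
  geom_line(aes(y = pred))
```

Each line only spans the x range observed for its level, which addresses the drawback of geom_abline() mentioned in the question.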
