Plotting lm curve does not show using ggplot - r

As per ex18q1 in "R for Data Science" I am trying to find the best model for the data:
sim1a <- tibble(
x = rep(1:10, each = 3),
y = x * 1.5 + 6 + rt(length(x), df = 2)
)
I've fitted a linear model and am trying to plot the result on a graph using ggplot:
sim1a_mod <- lm(x ~ y, data = sim1a)
ggplot(sim1a, aes(x, y)) +
geom_point(size = 2, colour= "gray") +
geom_abline(intercept = coef(sim1a_mod)[[1]], slope = coef(sim1a_mod)[[2]], colour = "red")
coef(sim1a_mod)[[1]] prints -1.14403
coef(sim1a_mod)[[2]] prints 0.4384473
The plot shows the data points, but the model line does not appear. What am I doing wrong?

In R, formulas for model functions such as lm(), glm() and lmer() are always written as DV ~ IV1 + IV2 + ... + IVn, where DV is the dependent variable and IV1 to IVn are the independent variables. The dependent variable is usually plotted on the y-axis and the independent variable on the x-axis, so in your case you need to change the sim1a_mod model to lm(y ~ x, data = sim1a).
In your original code the line was being drawn, but outside the visible area: lm(x ~ y) regresses x on y, so with the coefficients you report the line only runs from about -0.7 to 3.2 over x = 1 to 10, well below most of the data points. If you widen the axis limits and plot the original model again with the following code, you will see the regression line:
ggplot(sim1a, aes(x, y)) +
geom_point(size = 2, colour= "gray") +
geom_abline(intercept = coef(sim1a_mod)[[1]], slope = coef(sim1a_mod)[[2]], colour = "red") +
scale_x_continuous(limits = c(-30, 30)) + scale_y_continuous(limits = c(-30, 30))
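For comparison, here is a minimal sketch of the corrected fit, with y as the dependent variable. The set.seed() call and the name p_fixed are additions for this illustration and are not part of the original question.

```r
library(ggplot2)
library(tibble)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

sim1a <- tibble(
  x = rep(1:10, each = 3),
  y = x * 1.5 + 6 + rt(length(x), df = 2)
)

# Dependent variable on the left of the tilde, predictor on the right
sim1a_mod <- lm(y ~ x, data = sim1a)

# The intercept and slope now match the axes used in aes(x, y)
p_fixed <- ggplot(sim1a, aes(x, y)) +
  geom_point(size = 2, colour = "gray") +
  geom_abline(
    intercept = coef(sim1a_mod)[[1]],
    slope = coef(sim1a_mod)[[2]],
    colour = "red"
  )

print(p_fixed)
```

With the formula in this orientation the line passes through the point cloud without any change to the axis limits.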

Related

How can a graph of a polynomial regression with a categorical variable be plotted?

In the R statistical package, is there a way to plot a graph of a second order polynomial regression with one continuous variable and one categorical variable?
To generate a linear regression graph with one categorical variable:
library(ggplot2)
library(ggthemes) ## theme_few()
set.seed(1)
df <- data.frame(minutes = runif(60, 5, 15), endtime = 60, category = "a")
df$category <- rep(letters[1:2], length.out = nrow(df)) # alternate the categories "a" and "b"
df$endtime <- df$endtime + df$minutes^3/180 + df$minutes*runif(60, 1, 2)
ggplot(df, aes(y=endtime, x=minutes, col = category)) +
geom_point() +
geom_smooth(method=lm) +
theme_few()
To plot a polynomial graph with one continuous variable:
ggplot(df, aes(x=minutes, y=endtime)) +
geom_point() +
stat_smooth(method='lm', formula = y ~ poly(x,2), size = 1) +
xlab('Minutes of warm up') +
ylab('End time')
But I can’t figure out how to plot a polynomial graph with one continuous variable and one categorical variable.
Just add a colour or group mapping. This makes ggplot fit and display a separate polynomial regression for each category. Two caveats: (1) stat_smooth() cannot display an additive model in which the categories share one polynomial shape (i.e. lm(y ~ poly(x,2) + category)), because it fits each group separately; (2) what's shown here is not quite equivalent to the interaction model lm(y ~ poly(x,2)*category), because the residual variances (and hence the widths of the confidence ribbons) are estimated separately for each group.
ggplot(df, aes(x=minutes, y=endtime, col = category)) +
geom_point() +
stat_smooth(method='lm', formula = y ~ poly(x,2)) +
labs(x = 'Minutes of warm up', y = 'End time') +
theme_few()
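If the additive model is what you want to show, one possible workaround is to fit it yourself and draw the predictions with geom_line(). The following is only a sketch under that assumption; the names fit_add, grid and p_add are made up for this illustration, and the data are simulated in the same way as in the question.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(minutes = runif(60, 5, 15))
df$category <- rep(letters[1:2], length.out = nrow(df))
df$endtime <- 60 + df$minutes^3 / 180 + df$minutes * runif(60, 1, 2)

# Additive model: one shared quadratic shape, shifted per category
fit_add <- lm(endtime ~ poly(minutes, 2) + category, data = df)

# Prediction grid covering the observed range for each category
grid <- expand.grid(
  minutes = seq(min(df$minutes), max(df$minutes), length.out = 50),
  category = unique(df$category),
  stringsAsFactors = FALSE
)
grid$endtime <- predict(fit_add, newdata = grid)

# Points from the data, curves from the pre-computed predictions
p_add <- ggplot(df, aes(x = minutes, y = endtime, colour = category)) +
  geom_point() +
  geom_line(data = grid) +
  labs(x = "Minutes of warm up", y = "End time")

print(p_add)
```

The two curves are parallel by construction, which is the visual signature of an additive (no interaction) model.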

How to plot a polynomial line and equation using ggplot and then combine them into one plot

I am trying to compare and contrast on one plot the difference between four relationships with a similar x-axis.
I can seem to plot the regression line but have no idea how to plot the equation and/or combine all four plots onto one.
Here is the basic foundation of my code: Sorry if it is pretty basic or clumsy, I am just beginning.
library(ggplot2)
library(cowplot)
p1 <- ggplot(NganokeData, aes(x=Depth,y=LCU1)) + geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 1') +
ylim(1,2)
p2 <- ggplot(NganokeData, aes(x=Depth,y=LCU2)) + geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 2') +
ylim(1,2)
p3 <- ggplot(NganokeData, aes(x=Depth,y=LCU3)) + geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 3') +
ylim(1,2)
p4 <- ggplot(NganokeData, aes(x=Depth,y=LCU4)) + geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)', title = 'Density Regression of Lake Nganoke Core 4') +
ylim(1,2)
p3 + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1) #Adds polynomial regression
It looks like you have a variable of interest (LCU1, LCU2, LCU3, LCU4) in the column names. You can use gather from the tidyr package to reshape the data frame:
library(tidyr)
long_data <- gather(NganokeData, key = "core", value = "density",
LCU1, LCU2, LCU3, LCU4)
And then use facet_grid from the ggplot2 package to divide your plot into the four facets you are looking for.
p <- ggplot(long_data, aes(x=Depth,y=density)) +
geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)',
title = 'Density Regression of Lake Nganoke Cores') +
ylim(1,2) +
facet_grid(rows = vars(core)) #can also use cols instead
p + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1)
Your code is great. But as a beginner, I highly recommend taking a few minutes to read and learn to use the tidyr package, as ggplot2 is built on the concepts of tidy data, and you'll find it much easier to make your visualizations if you can manipulate the data frame into the format you need before trying to plot it.
https://tidyr.tidyverse.org/index.html
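In recent versions of tidyr, pivot_longer() is the recommended replacement for gather(). Since NganokeData is not available here, the sketch below uses a small made-up data frame (toy_cores, with invented density values) purely to show the reshaping step.

```r
library(tidyr)

# Hypothetical stand-in for NganokeData: one row per depth, one column per core
toy_cores <- data.frame(
  Depth = 1:5,
  LCU1 = c(1.10, 1.20, 1.30, 1.40, 1.50),
  LCU2 = c(1.15, 1.25, 1.35, 1.45, 1.55),
  LCU3 = c(1.05, 1.15, 1.25, 1.35, 1.45),
  LCU4 = c(1.20, 1.30, 1.40, 1.50, 1.60)
)

# Wide to long: the core name goes into "core", the measurement into "density"
long_toy <- pivot_longer(
  toy_cores,
  cols = starts_with("LCU"),
  names_to = "core",
  values_to = "density"
)

print(long_toy)
```

The result has one row per depth and core, which is the shape facet_grid() needs.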
EDIT:
To add an annotation detailing the regression equation, I found code taken from this blog post by Jodie Burchell:
http://t-redactyl.io/blog/2016/05/creating-plots-in-r-using-ggplot2-part-11-linear-regression-plots.html
First, though, it's not going to be possible to glean a displayable regression equation using the poly function in your formula as you have it. The advantage of orthogonal polynomials is that they avoid collinearity, but the drawback is that you no longer have an easily interpretable regression equation with x and x squared and x cubed as regressors.
So we'll have to change the lm fit formula to
y ~ poly(x, 3, raw = TRUE)
which will fit the raw polynomials and give you the regression equation you are looking for.
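To illustrate the point on simulated data (the variables and coefficients below are invented for this sketch), the orthogonal and raw fits give the same fitted curve and differ only in how the coefficients are expressed:

```r
set.seed(42)  # assumption: simulated data, not the lake core data

x <- seq(0, 10, length.out = 40)
y <- 1 + 0.5 * x - 0.2 * x^2 + 0.01 * x^3 + rnorm(40, sd = 0.3)

fit_orth <- lm(y ~ poly(x, 3))              # orthogonal polynomial basis
fit_raw  <- lm(y ~ poly(x, 3, raw = TRUE))  # plain x, x^2, x^3

# Same fitted values, so the plotted curve is identical
same_curve <- isTRUE(all.equal(
  as.numeric(fitted(fit_orth)),
  as.numeric(fitted(fit_raw)),
  tolerance = 1e-6
))
print(same_curve)

# Only the raw coefficients can be read as a + b*x + c*x^2 + d*x^3
print(coef(fit_raw))
```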
You are going to have to alter the x and y position values to determine where on the graph to place the annotation, since I don't have your data, but here's the custom function you need:
equation = function(x) {
lm_coef <- list(a = round(coef(x)[1], digits = 2),
b = round(coef(x)[2], digits = 2),
c = round(coef(x)[3], digits = 2),
d = round(coef(x)[4], digits = 2),
r2 = round(summary(x)$r.squared, digits = 2));
lm_eq <- substitute(italic(y) == a + b %.% italic(x) + c %.% italic(x)^2 + d %.% italic(x)^3*","~~italic(R)^2~"="~r2,lm_coef)
as.character(as.expression(lm_eq));
}
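Then fit the model, add the annotation to your plot, adjust the x and y parameters as needed, and you'll be set:
fit <- lm(density ~ poly(Depth, 3, raw = TRUE), data = long_data) # one fit over all cores; fit each core separately for per-facet equations
p +
stat_smooth(method = "lm", formula = y ~ poly(x, 3, raw = TRUE), size = 1) +
annotate("text", x = 1, y = 10, label = equation(fit), parse = TRUE)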

how to plot predicted values on lm line for a null model using ggplot in r

I am trying to reproduce the base R code below using ggplot, but my attempt yields an incorrect result.
Base code:
model1 <- lm(wgt ~ 1, data = bdims)
model1_null <- augment(model1)
plot(bdims$hgt, bdims$wgt)
abline(model1, lwd = 2, col = "blue")
pre_null <- predict(model1)
segments(bdims$hgt, bdims$wgt, bdims$hgt, pre_null, col = "red")
ggplot code:
bdims %>%
ggplot(aes(hgt, wgt)) +
geom_point() +
geom_smooth(method = "lm", formula = bdims$hgt ~ 1) +
segments(bdims$hgt, bdims$wgt, bdims$hgt, pre_null, col = "red")
Here's an example using the built-in mtcars data:
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ 1) +
geom_segment(aes(xend = wt, yend = mean(mpg)), col = "firebrick2")
The formula in geom_smooth() refers to the aesthetics (x and y), not to the variable names, and you need to use geom_segment(), not the base graphics segments(). In a more complicated case you would pre-compute the model's predicted values for the segments, but for a null model it's easy enough to just use mean() inline.
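As a sketch of the more complicated case mentioned above, the predicted values can be added to the data frame first and then used as the segment end points. The model mpg ~ wt and the names fit_wt, cars_pred and p_resid are chosen only for this illustration.

```r
library(ggplot2)

# A non-null model: predictions now vary with wt
fit_wt <- lm(mpg ~ wt, data = mtcars)

# Store the fitted value for every row next to the observed data
cars_pred <- transform(mtcars, fitted_mpg = as.numeric(predict(fit_wt)))

# Each segment runs vertically from the observed point to its fitted value
p_resid <- ggplot(cars_pred, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_segment(aes(xend = wt, yend = fitted_mpg), colour = "firebrick2")

print(p_resid)
```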

How to add linear model results (adj-r squared, slope and p-value) onto regression plot in r

I have created a linear model and a regression plot. However, I would like to show the model results on the plot itself, something like the image below:
How do I show the key results on the plot? Below is my code for the plot:
library(ggplot2)
ggplot(HP_crime15, aes (x = as.numeric(HP_crime15$Theft15), y =
as.numeric(HP_crime15$X2015))) + geom_point(shape=1) +
geom_smooth(method=lm) + xlab ("Recorded number of Thefts") +
ylab("House prices (£)") + ggtitle("Title")
Ideally, a good question poses the problem with a reproducible example. That said, I approached this problem in two steps.
Step 1: Determine the linear regression model:
fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
Step 2: Plot the model:
library (ggplot2)
ggplot(fit1$model, aes_string(x = names(fit1$model)[2], y = names(fit1$model)[1])) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
labs(title = paste("Adj R2 = ",signif(summary(fit1)$adj.r.squared, 5),
"Intercept =",signif(fit1$coef[[1]],5 ),
" Slope =",signif(fit1$coef[[2]], 5),
" P =",signif(summary(fit1)$coef[2,4], 5)))
Here is another option: instead of adding the statistics to the title, you could add a label to the plot:
library (ggplot2)
fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
ggplot(fit1$model, aes_string(x = names(fit1$model)[2], y = names(fit1$model)[1])) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
geom_label(aes(x = 0, y = 7.5), hjust = 0,
label = paste("Adj R2 = ",signif(summary(fit1)$adj.r.squared, 5),
"\nIntercept =",signif(fit1$coef[[1]],5 ),
" \nSlope =",signif(fit1$coef[[2]], 5),
" \nP =",signif(summary(fit1)$coef[2,4], 5)))

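One detail worth knowing about the label approach: a geom_label() layer inherits the plot data, so a constant label is drawn once per row and the copies are stacked on top of each other. A possible variation, sketched here with the same iris model, is to build the text once and place it with annotate(); the names model_summary, stats_label and p_lab are made up for this example.

```r
library(ggplot2)

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
model_summary <- summary(fit1)

# Build the label text once, one statistic per line
stats_label <- paste0(
  "Adj R2 = ", signif(model_summary$adj.r.squared, 3),
  "\nIntercept = ", signif(coef(fit1)[[1]], 3),
  "\nSlope = ", signif(coef(fit1)[[2]], 3),
  "\nP = ", signif(model_summary$coefficients[2, 4], 3)
)

# annotate() draws a single label instead of one per data row
p_lab <- ggplot(iris, aes(Petal.Width, Sepal.Length)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, colour = "red") +
  annotate("label", x = 0, y = 7.5, hjust = 0, label = stats_label)

print(p_lab)
```
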
loess and glm plotting with ggplot2

I am trying to plot the model predictions from a binary choice glm against the empirical probability, using data from the Titanic. To show differences across class and sex I am using faceting, but there are two things I can't quite figure out. First, I'd like to restrict the loess curve to lie between 0 and 1, but if I add ylim(c(0,1)) to the end of the plot, the ribbon around the loess curve gets cut off wherever one side of it falls outside the bounds. Second, I'd like to draw a line in each facet from the minimum x-value (the predicted probability from the glm) to the maximum x-value within the same facet and y = 1, so as to show the glm predicted probability.
#info on this data http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3info.txt
load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.sav'))
titanic <- titanic3[ ,-c(3,8:14)]; rm(titanic3)
titanic <- na.omit(titanic) #probably missing completely at random
titanic$age <- as.numeric(titanic$age)
titanic$sibsp <- as.integer(titanic$sibsp)
titanic$survived <- as.integer(titanic$survived)
training.df <- titanic[sample(nrow(titanic), nrow(titanic) / 2), ]
validation.df <- titanic[!(row.names(titanic) %in% row.names(training.df)), ]
glm.fit <- glm(survived ~ sex + sibsp + age + I(age^2) + factor(pclass) + sibsp:sex,
family = binomial(link = "probit"), data = training.df)
glm.predict <- predict(glm.fit, newdata = validation.df, se.fit = TRUE, type = "response")
plot.data <- data.frame(mean = glm.predict$fit, response = validation.df$survived,
class = validation.df$pclass, sex = validation.df$sex)
require(ggplot2)
ggplot(data = plot.data, aes(x = as.numeric(mean), y = as.integer(response))) + geom_point() +
stat_smooth(method = "loess", formula = y ~ x) +
facet_wrap( ~ class + sex, scale = "free") + ylim(c(0,1)) +
xlab("Predicted Probability of Survival") + ylab("Empirical Survival Rate")
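The answer to your first question is to use coord_cartesian(ylim=c(0,1)) instead of ylim(0,1); this is a fairly frequently asked question. ylim() sets the limits of the scale, so values that fall outside them (including parts of the confidence ribbon) are discarded, whereas coord_cartesian() only zooms the view and leaves the data and the fitted smooth intact.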
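The difference between the two ways of limiting the y-axis can be sketched on simulated binary data (the data frame toy and the plot names below are assumptions for this illustration, not the Titanic data):

```r
library(ggplot2)

set.seed(123)  # assumption: simulated 0/1 outcomes
toy <- data.frame(x = runif(100))
toy$y <- rbinom(100, size = 1, prob = toy$x)

base_plot <- ggplot(toy, aes(x, y)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x)

# Scale limits: ribbon values outside [0, 1] are dropped, so the band can break
p_scale <- base_plot + ylim(0, 1)

# Coordinate limits: only the view is cropped, the ribbon itself stays complete
p_zoom <- base_plot + coord_cartesian(ylim = c(0, 1))

print(p_zoom)
```

Printing p_scale instead typically produces warnings about removed values wherever the ribbon leaves the 0 to 1 range.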
For your second question, there may be a way to do it within ggplot but it was easier for me to summarize the data externally:
g0 <- ggplot(data = plot.data, aes(x = mean, y = response)) + geom_point() +
stat_smooth(method = "loess") +
facet_wrap( ~ class + sex, scale = "free") +
coord_cartesian(ylim=c(0,1))+
labs(x="Predicted Probability of Survival",
y="Empirical Survival Rate")
(I shortened your code slightly by eliminating some default values and using labs.)
library(plyr) # provides ddply()
ss <- ddply(plot.data,c("class","sex"),summarise,minx=min(mean),maxx=max(mean))
g0 + geom_segment(data=ss,aes(x=minx,y=minx,xend=maxx,yend=maxx),
colour="red",alpha=0.5)
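If you prefer dplyr to plyr, the same per-facet summary can be sketched with group_by() and summarise(). The toy data frame below is invented for this illustration and only mimics the columns of plot.data.

```r
library(dplyr)

# Hypothetical stand-in for plot.data with two classes and two sexes
toy <- data.frame(
  class = rep(c("1st", "2nd"), each = 4),
  sex = rep(c("female", "male"), times = 4),
  mean = c(0.9, 0.4, 0.8, 0.3, 0.7, 0.2, 0.6, 0.1)
)

# One row per class/sex combination with the range of predicted probabilities
ss_toy <- toy %>%
  group_by(class, sex) %>%
  summarise(minx = min(mean), maxx = max(mean), .groups = "drop")

print(ss_toy)
```

The resulting data frame can be passed to geom_segment(data = ...) in the same way as ss above.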
