I would like to make a plot that has multiple geom_smooth(method="loess") lines for differing thresholds, but I'm having some issues.
Specifically, I want one geom_smooth() line for all points > 1 standard deviation (SD) or < -1 SD (which includes the points beyond +/-2 SD), one for the points < -2 SD or > 2 SD, and one for all the points together. However, the smooth is only being fitted to the data within each category (i.e. greater than 1 SD but less than 2 SD).
I have made some toy data here:
#test data
a <- c(rnorm(10000, mean=0, sd = 1))
b <- c(rnorm(10000, mean=0, sd = 1))
test <- as.data.frame(cbind(a,b))
test$Thresholds <- cut(test$a, breaks = c(-Inf,-2*sd(test$a),-sd(test$a),0,sd(test$a), 2*sd(test$a), Inf),
labels = c("2_SD+", "1_SD", "0_SD","0_SD", "1_SD", "2_SD+"))
plot <- ggplot(test, aes(x=b, y=a, color=Thresholds, alpha = 0.25, legend = F)) + geom_point() + geom_smooth(method="loess")
This creates the following plot:
Does anyone have any suggestions?
If you want the smoothing applied to different subsets of the data, you have to pass each subset through the data argument of its own geom_smooth() layer:
library(ggplot2)
library(dplyr)
#test data
a <- c(rnorm(10000, mean=0, sd = 1))
b <- c(rnorm(10000, mean=0, sd = 1))
test <- as.data.frame(cbind(a,b))
test$Thresholds <- cut(test$a, breaks = c(-Inf,-2*sd(test$a),-sd(test$a),0,sd(test$a), 2*sd(test$a), Inf),
labels = c("2_SD+", "1_SD", "0_SD","0_SD", "1_SD", "2_SD+"))
ggplot(test, aes(x=b, y=a)) +
geom_point() +
# just 2
geom_smooth(data = test %>% filter(Thresholds == "2_SD+"), method="loess") +
# 1 and 2
geom_smooth(data = test %>% filter(Thresholds == "1_SD" | Thresholds == "2_SD+" ), method="loess", color = "yellow") +
#all
geom_smooth(data = test, method="loess", color = "red")
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
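A possible variation on the same idea, in case a legend is wanted: because the groups are nested, each point can belong to several of them, so the subsets can be stacked into one data frame with a label column and handed to a single geom_smooth() layer. This is only a sketch; the `stacked` data frame, the `subset` column and the group labels are names made up here, and the sample is cut to 500 points just to keep loess fast.

```r
library(ggplot2)
library(dplyr)

set.seed(42)
test <- data.frame(a = rnorm(500), b = rnorm(500))
s <- sd(test$a)

# The groups overlap (points beyond 2 SD are also beyond 1 SD), so stack
# the subsets and label them instead of relying on a single cut() factor.
stacked <- bind_rows(
  "all"         = test,
  "beyond 1 SD" = filter(test, abs(a) > s),
  "beyond 2 SD" = filter(test, abs(a) > 2 * s),
  .id = "subset"   # hypothetical column name holding the group label
)

p <- ggplot(test, aes(x = b, y = a)) +
  geom_point(alpha = 0.25) +
  # one loess fit per label, coloured and listed in the legend
  geom_smooth(data = stacked, aes(colour = subset),
              method = "loess", formula = y ~ x)
```

Printing `p` draws the plot. The thresholds use abs(a) > s, which matches the cut() breaks at -sd and sd in the question.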
I have a dataset with two variables, hours studied and grade. I would like to take 100 samples of 20 observations each from this dataset and show the 100 resulting regression lines along with the original regression line. Any suggestions?
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.6.3
grades = read.csv("https://www.dropbox.com/s/me6wiww943hzddj/grades.csv?dl=1")
qplot(hours, grade, data = grades, geom = "point") + geom_smooth(method = lm)
#> `geom_smooth()` using formula 'y ~ x'
Using a loop:
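# start from the scatter plot with the full-data regression line
g = qplot(hours, grade, data = grades, geom = "point") + geom_smooth(method = lm)
n=100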
for(i in 1:n){
df = grades[sample(1:nrow(grades), 20),]
g = g + geom_smooth(method = lm, data=df, color="red", size=0.5, alpha = 0)
}
plot(g)
Output:
I encourage you to mess with the aesthetics of it, adding a dashed line for example:
We can also use sample_n:
library(dplyr)
library(ggplot2)
g <- qplot(hours, grade, data = grades, geom = "point") +
geom_smooth(method = lm)
n <- 100
for(i in seq_len(n)) {
tmpdat <- grades %>%
sample_n(20)
g <- g +
geom_smooth(method = lm, data = tmpdat, color = 'red',
size = 0.5, alpha = 0)
}
plot(g)
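Another option, if adding 100 separate layers feels heavy, might be to fit the 100 models first and draw them all with one geom_abline() layer. The sketch below uses simulated data as a stand-in for grades.csv (the numbers are invented), and `grades_sim` and `lines_df` are names chosen for illustration.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for the grades data (hours, grade); values are made up
grades_sim <- data.frame(hours = runif(200, 0, 10))
grades_sim$grade <- 50 + 4 * grades_sim$hours + rnorm(200, 0, 8)

# Fit one lm per random sample of 20 rows and keep intercept and slope
fits <- t(replicate(100, {
  idx <- sample(nrow(grades_sim), 20)
  coef(lm(grade ~ hours, data = grades_sim[idx, ]))
}))
lines_df <- data.frame(intercept = fits[, 1], slope = fits[, 2])

p <- ggplot(grades_sim, aes(hours, grade)) +
  geom_point() +
  # all 100 sample regression lines in a single layer
  geom_abline(data = lines_df, aes(intercept = intercept, slope = slope),
              colour = "red", alpha = 0.3) +
  # regression line for the full data set
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "blue")
```

Unlike geom_smooth(), geom_abline() extends each line across the whole panel rather than stopping at the range of its own sample.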
I need to create an insightful graphic with a regression line, data points, and confidence intervals. I am not looking for smoothed lines. I have tried several approaches, but I just can't get it right.
I am looking for something like this:
Some of the code I have tried:
p <- scatterplot(df.regsoft$w ~ df.regsoft$b,
data = df.regsoft,
boxplots = FALSE,
regLine = list(method=lm, col="red"),
pch = 16,
cex = 0.7,
xlab = "Fitted Values",
ylab = "Residuals",
legend = TRUE,
smooth = FALSE)
abline(coef = confint.lm(result.rs))
This doesn't produce what I want, although it is the closest I have come. Notice that I turned off "smooth", since a smoothed line is not what I am looking for.
How can I make this plot interactive?
If you don't mind switching to ggplot and the tidyverse, then this is simply a geom_smooth(method = "lm"):
library(tidyverse)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
d %>%
ggplot() +
aes(x, y) + #what to plot
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
Without method = "lm", geom_smooth() draws a smoothed (loess or GAM) line instead.
As for the Conf. interval (Obs 95%) lines, it seems to me that's simply a quantile regression. In that case, you can use the quantreg package.
If you want to make it interactive, you can use the plotly package:
library(plotly)
p <- d %>%
ggplot() +
aes(x, y) +
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
ggplotly(p)
================================================
P.S.
I am not completely sure this is what the figure you posted is showing (I guess so), but to add the quantile lines, I would just perform two quantile regressions (upper and lower) and then calculate the values of the quantile lines for your data:
library(tidyverse)
library(quantreg)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
# 95% quantile, two tailed
rq_low <- rq(y ~ x, data = d, tau = 0.025) #lower quantile
rq_high <- rq(y ~ x, data = d, tau = 0.975) #upper quantile
d %>%
mutate(low = rq_low$coefficients[1] + x * rq_low$coefficients[2],
high = rq_high$coefficients[1] + x * rq_high$coefficients[2]) %>%
ggplot() +
geom_point(aes(x, y)) +
geom_smooth(aes(x, y), method = "lm") +
geom_line(aes(x, low), linetype = "dashed") +
geom_line(aes(x, high), linetype = "dashed") +
theme_bw()
I would like to force a linear regression through a specific x-axis crossing point using "geom_smooth" in ggplot2:
geom_smooth(aes(x = x, y = y), method = "lm", formula = y ~ x)
Intuitively, choosing an x-axis intercept, one would use the formula y = a * (x - b) + c.
Implementing this in the "formula" code as e.g. :
geom_smooth(aes(x = x, y = y), method = "lm", formula = y ~ x - 5)
Does not work.
I am not sure it is possible to do this just using geom_smooth. However, you could predict the regression outside of your ggplot2 call, using an offset to set the intercept required and plot it subsequently.
For example:
set.seed(1)
# Generate some data
x <- 1:10
y <- 3 + 2*x + rnorm(length(x), 0, 2)
# Simple regression
z_1 <- lm(y ~ x)
# Regression with no intercept
z_2 <- lm(y ~ x + 0)
# Regression with intercept at (0,3) - the 'true' intercept
z_3 <- lm(y ~ x + 0, offset=rep(3, length(x)))
# See the coefficients
coef(z_1)
#(Intercept) x
# 2.662353 2.109464
coef(z_2)
# x
#2.4898
coef(z_3)
# x
#1.775515
# Combine into one dataframe
df <- cbind.data.frame(x,y,predict(z_1),predict(z_2), predict(z_3))
# Plot the three regression lines
library(ggplot2)
ggplot(df) + geom_point(aes(x,y)) +
geom_line(aes(x,predict(z_1)), color = "red") +
geom_line(aes(x,predict(z_2)), color = "blue") +
geom_line(aes(x,predict(z_3)), color = "green") +
scale_x_continuous(limits = c(0,10)) +
scale_y_continuous(limits = c(0,30))
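If the aim is a line that crosses the x-axis at a chosen point (say x = 5), one thing that may work directly in geom_smooth() is to wrap the shift in I(). In an R formula, `y ~ x - 5` is read as "remove the term 5", not as arithmetic, which is probably why the attempt in the question fails. The sketch below assumes the crossing point is 5 and uses made-up data; `x0`, `d` and `at_crossing` are illustrative names.

```r
library(ggplot2)

set.seed(1)
x0 <- 5                                   # assumed x-axis crossing point
d <- data.frame(x = 1:10)
d$y <- 2 * (d$x - x0) + rnorm(10, 0, 2)   # made-up data around y = 2 * (x - 5)

# I() protects the subtraction; "0 +" drops the free intercept,
# so the fitted line is y = a * (x - x0) and passes through (x0, 0)
fit <- lm(y ~ 0 + I(x - x0), data = d)
at_crossing <- unname(predict(fit, newdata = data.frame(x = x0)))

# The same model inside geom_smooth(), with the crossing point written literally
p <- ggplot(d, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ 0 + I(x - 5), se = FALSE)
```

Here `at_crossing` is the fitted value at x = 5, which this model forces to zero; only the slope is estimated.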
You'll need to use the offset argument of lm() to fix the intercept at a known value. It is passed via the method.args argument of geom_smooth(), since not all smoothing methods accept it.
You'll also need to specify the orientation argument to confirm that you've got an x-intercept, rather than the y-intercept.
I also specified the number of smoothing points to plot (n) and the offset repeats to match -- not sure if that's strictly necessary.
Some gymnastics to be sure, but hopefully it helps.
library("tidyverse")
mtcars %>%
ggplot(aes(disp, hp)) +
geom_point() +
geom_smooth(method = "lm",
orientation = "y",
formula = y ~ x + 0,
color= "blue",
se = FALSE,
n = nrow(mtcars),
method.args=list(offset=rep(100, nrow(mtcars))),
fullrange = TRUE) +
scale_x_continuous(limits =c(0, 600))
#> Warning: Removed 5 rows containing missing values (geom_smooth).
Created on 2020-07-08 by the reprex package (v0.3.0)
Borrowing the example data from this question, if I have the following data and I fit the following non linear model to it, how can I calculate the 95% prediction interval for my curve?
library(broom)
library(tidyverse)
x <- seq(0, 4, 0.1)
y1 <- (x * 2 / (0.2 + x))
y <- y1 + rnorm(length(y1), 0, 0.2)
d <- data.frame(x, y)
mymodel <- nls(y ~ v * x / (k + x),
start = list(v = 1.9, k = 0.19),
data = d)
mymodel_aug <- augment(mymodel)
ggplot(mymodel_aug, aes(x, y)) +
geom_point() +
geom_line(aes(y = .fitted), color = "red") +
theme_minimal()
As an example, I can easily calculate the prediction interval from a linear model like this:
## linear example
d2 <- d %>%
filter(x > 1)
mylinear <- lm(y ~ x, data = d2)
mypredictions <-
predict(mylinear, interval = "prediction", level = 0.95) %>%
as_tibble()
d3 <- bind_cols(d2, mypredictions)
ggplot(d3, aes(x, y)) +
geom_point() +
geom_line(aes(y = fit)) +
geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = .15) +
theme_minimal()
Based on the linked question, it looks like the investr::predFit function will do what you want.
investr::predFit(mymodel,interval="prediction")
?predFit doesn't explain how the intervals are computed, but ?plotFit says:
Confidence/prediction bands for nonlinear regression (i.e.,
objects of class ‘nls’) are based on a linear approximation as
described in Bates & Watts (2007). This fun[c]tion was in[s]pired by the
‘plotfit’ function from the ‘nlstools’ package.
This linear approximation is also known as the delta method (e.g. see emdbook::deltavar).
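For anyone who wants to see what such a linear approximation amounts to, here is a rough base-R sketch of a delta-method prediction interval for the model in the question. It is not taken from investr and may differ from what predFit does internally; `G`, `se_fit` and `pred` are names introduced here, and the seed is arbitrary.

```r
set.seed(1)
x <- seq(0, 4, 0.1)
y <- x * 2 / (0.2 + x) + rnorm(length(x), 0, 0.2)
d <- data.frame(x, y)

mymodel <- nls(y ~ v * x / (k + x),
               start = list(v = 1.9, k = 0.19),
               data = d)

G <- mymodel$m$gradient()                 # derivatives of the fitted values w.r.t. v and k
V <- vcov(mymodel)                        # covariance matrix of the estimates
se_fit <- sqrt(rowSums((G %*% V) * G))    # delta-method standard error of the fitted curve
s <- summary(mymodel)$sigma               # residual standard error
tcrit <- qt(0.975, df.residual(mymodel))

pred <- data.frame(x = d$x, fit = as.numeric(fitted(mymodel)))
# prediction interval: uncertainty in the curve plus residual noise
pred$lwr <- pred$fit - tcrit * sqrt(se_fit^2 + s^2)
pred$upr <- pred$fit + tcrit * sqrt(se_fit^2 + s^2)
```

The `lwr` and `upr` columns can then be passed to geom_ribbon() in the same way as in the linear example from the question.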
I'm new to R and I have fitted 3 models to my data as follows:
Model 1: y = a(x) + b
lm1 = lm(data$CBI ~ data$dNDVI)
Model 2: y = a(x)^2 + b(x) + c
lm2 <- lm(CBI ~ dNDVI + I(dNDVI^2), data=data)
Model 3: y = x(a|x| + b) - 1
lm3 = nls(CBI ~ dNDVI*(a*abs(dNDVI) + b) - 1, start = c(a = 1.5, b = 2.7), data = data)
Now I would like to plot all three models in R, but I could not find a way to do it. Can you please help me? I have tried the first two models as follows and it works, but I don't know how to add Model 3 to the plot:
ggplot(data = data, aes(x = dNDVI, y = CBI)) +
geom_point() +
geom_smooth(method = lm, formula = y ~ x, size = 1, se = FALSE) +
geom_smooth(method = lm, formula = y ~ x + I(x^2), size = 1, se = FALSE ) +
theme_bw()
I would also like to add a legend that shows 3 different colours or line types for the 3 models. Can you please show me how to add it to the figure?
Using iris as a dummy set to represent the three models:
library(tidyverse) # provides %>%, gather() and ggplot()
new.dat <- data.frame(Sepal.Length=seq(min(iris$Sepal.Length),
max(iris$Sepal.Length), length.out=50)) #new data.frame to predict the fitted values for each model
m1 <- lm(Petal.Length ~ Sepal.Length, iris)
m2 <- lm(Petal.Length ~ Sepal.Length + I(Sepal.Length^2), data=iris)
m3 <- nls(Petal.Length ~ Sepal.Length*(a*abs(Sepal.Length) + b) - 1,
start = c(a = 1.5, b = 2.7), data = iris)
new.dat$m1.fitted <- predict(m1, new.dat)
new.dat$m2.fitted <- predict(m2, new.dat)
new.dat$m3.fitted <- predict(m3, new.dat)
new.dat <- new.dat %>% gather(var, val, m1.fitted:m3.fitted) #stacked format of fitted data of three models (to automatically generate the legend in ggplot)
ggplot(new.dat, aes(Sepal.Length, val, colour=var)) +
geom_line()
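It may also be possible to stay with geom_smooth() for all three models, since it accepts method = "nls" with starting values passed through method.args. Mapping a constant string to colour inside aes() is what produces the legend. This is a sketch on iris again, with the "Model 1" to "Model 3" labels chosen here for illustration; se = FALSE is needed for the nls layer because predict() on an nls fit does not return standard errors.

```r
library(ggplot2)

# iris stands in for the real data, as in the answer above
p <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  geom_smooth(aes(colour = "Model 1"), method = "lm",
              formula = y ~ x, se = FALSE) +
  geom_smooth(aes(colour = "Model 2"), method = "lm",
              formula = y ~ x + I(x^2), se = FALSE) +
  # nls layer: starting values go through method.args, no confidence band
  geom_smooth(aes(colour = "Model 3"), method = "nls",
              formula = y ~ x * (a * abs(x) + b) - 1,
              method.args = list(start = c(a = 1.5, b = 2.7)),
              se = FALSE) +
  labs(colour = "Model") +
  theme_bw()
```

With the real data, x and y in the formulas stay as they are and only the aes() mapping changes to dNDVI and CBI.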