How to display standardized y-scores in interaction plot - r

I am trying to plot a two-way interaction of standardized data in R using the package "interplot". However, the displayed y-scores are not standardized anymore. Why is that and how can I fix that?
I have tried to change the y-limits and to use the "scale_y_continuous()" function.
# generate data
x <- rnorm(100, 0, 1)
y <- x + rnorm(100, 0, 1)
z <- y + rnorm(100, 0, 1)
df <- as.data.frame(cbind(x,y,z))
# build model with interaction term
model1 <- glm(y ~ x*z, data=df)
# plot interaction
require(interplot)
interplot(model1, var1 = "x",var2 = "z", ci = 0.95, predPro = TRUE,
var2_vals = c(-1, 1), hist=F) + xlim(-3, 3) +
theme_classic()
I expect the y-scale to display values between -3 and +3, since the scores are standardized. However, the displayed y-values are between 20 and 80.

Adapting the example from ?interplot:
set.seed(123)
# generate data
x <- rnorm(100, 0, 1)
y <- x + rnorm(100, 0, 1)
z <- y + rnorm(100, 0, 1)
df <- as.data.frame(cbind(x,y,z))
# build model with interaction term
model1 <- glm(y ~ x*z, data=df)
# equivalent to lm(y ~ x*z, data=df): glm() defaults to family = gaussian, i.e. a linear model
# plot interaction
require(interplot, quietly = TRUE, warn.conflicts = FALSE)
interplot(model1, var1 = "x",var2 = "z", ci = 0.95,
predPro = TRUE, var2_vals = c(-1,1)) +
xlim(-3, 3) +
xlab("x values") +
ylab("Estimated Coefficient for z") +
ggtitle('Estimated Coefficient of z by x conditionally to y in c(-1,1)') +
theme_classic()
interplot(model1, var1 = "x",var2 = "z", ci = 0.95) +
xlim(-3, 3) +
xlab("z values") +
ylab("Estimated Coefficient for x") +
ggtitle('Estimated Coefficient of x by z') +
theme_classic()
#> Warning: Removed 28 rows containing missing values (geom_path).
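As a cross-check on what interplot draws without predPro, the conditional coefficient of x in y ~ x*z can be computed by hand as b_x + b_x:z * z. The sketch below is only an illustration: it assumes "standardized" means each column is rescaled with scale() (the simulated data in the question are not rescaled), and the names df_std, z_vals and cond_coef are made up for this example.

```r
set.seed(123)
x <- rnorm(100, 0, 1)
y <- x + rnorm(100, 0, 1)
z <- y + rnorm(100, 0, 1)

# Standardize every column to mean 0 and sd 1
# (assumption: this is what "standardized" means in the question)
df_std <- as.data.frame(scale(data.frame(x, y, z)))

fit <- lm(y ~ x * z, data = df_std)
b <- coef(fit)

# Conditional coefficient of x at chosen values of z: b_x + b_x:z * z
z_vals <- c(-1, 0, 1)
cond_coef <- b[["x"]] + b[["x:z"]] * z_vals
data.frame(z = z_vals, coef_x = cond_coef)
```

These values are on the standardized scale. As far as the interplot documentation describes it, predPro = TRUE requests predicted probabilities rather than coefficients, which may be why the y-axis in the question shows values between 20 and 80.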

Related

How to run stratified bootstrapped linear regression in R?

In my model, x is a categorical variable with 3 categories: 0, 1 and 2, where 0 is the reference category. However, category 0 has more observations than the others (1, 2), so to avoid a biased sample I want to do stratified bootstrapping, but I could not find any relevant method for that.
df <- data.frame (x = c(0,0,0,0,0,1,1,2,2),
y = c(10,11,10,10,12,17,16,20,19),
m = c(6,5,6,7,2,10,14,8,11)
)
df$x <- as.factor(df$x)
df$x <- relevel(df$x,ref = "0")
fit <- lm(y ~ x*m, data = df)
summary(fit)
Expanding on Roland's answer in the comments, you can harvest the confidence intervals from bootstrapping using boot.ci:
library(boot)
b <- boot(df, \(DF, i) coef(lm(y ~ x*m, data = DF[i,])), strata = df$x, R = 999)
result <- do.call(rbind, lapply(seq_along(b$t0), function(i) {
m <- boot.ci(b, type = 'norm', index = i)$normal
data.frame(estimate = b$t0[i], lower = m[2], upper = m[3])
}))
result
#> estimate lower upper
#> (Intercept) 12.9189189 10.7166127 15.08403731
#> x1 6.5810811 2.0162637 8.73184665
#> x2 9.7477477 6.9556841 11.37390826
#> m -0.4459459 -0.8010925 -0.07451434
#> x1:m 0.1959459 -0.1842914 0.55627896
#> x2:m 0.1126126 -0.2572955 0.48352616
And even plot the results like this:
library(ggplot2)
ggplot(within(result, var <- rownames(result)), aes(estimate, var)) +
geom_vline(xintercept = 0, color = 'gray') +
geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.1) +
geom_point(color = 'red') +
theme_light()
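To see what the strata argument is doing, here is a minimal base-R sketch of a single stratified resample: rows are drawn with replacement inside each category of x, so every resample keeps the original category counts. The helper resample_once and the object strata_idx are hypothetical names for this illustration, not part of the boot package.

```r
set.seed(1)
df <- data.frame(x = factor(c(0, 0, 0, 0, 0, 1, 1, 2, 2)),
                 y = c(10, 11, 10, 10, 12, 17, 16, 20, 19),
                 m = c(6, 5, 6, 7, 2, 10, 14, 8, 11))

# Row indices grouped by stratum (one vector per category of x)
strata_idx <- split(seq_len(nrow(df)), df$x)

# Draw one stratified resample: sample with replacement within each stratum.
# sample.int() on the length avoids the sample(x) pitfall for length-1 vectors.
resample_once <- function() {
  unlist(lapply(strata_idx,
                function(idx) idx[sample.int(length(idx), replace = TRUE)]),
         use.names = FALSE)
}

i <- resample_once()
table(df$x[i])  # same counts per category as table(df$x)
```

With only two rows in categories 1 and 2, some resamples will contain the same row twice; in that case the interaction term for that category cannot be estimated and lm() reports NA for it, so intervals from such a small sample should be read with caution.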

How can I add confidence intervals to a scatterplot for a regression on two variables?

I need to create an insightful graphic with a regression line, data points, and confidence intervals. I am not looking for smoothed lines. I have tried multiple codes, but I just can't get it right.
I am looking for something like this:
Some code I have tried:
p <- scatterplot(df.regsoft$w ~ df.regsoft$b,
data = df.regsoft,
boxplots = FALSE,
regLine = list(method=lm, col="red"),
pch = 16,
cex = 0.7,
xlab = "Fitted Values",
ylab = "Residuals",
legend = TRUE,
smooth = FALSE)
abline(coef = confint.lm(result.rs))
This doesn't produce what I want, although it is the closest I have come. Note that I turned off "smooth", since that is not what I am looking for.
How can I make this plot interactive?
If you don't mind switching to ggplot and the tidyverse, then this is simply a geom_smooth(method = "lm"):
library(tidyverse)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
d %>%
ggplot() +
aes(x, y) + #what to plot
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
Without method = "lm", it draws a smoothed line.
As for the Conf. interval (Obs 95%) lines, it seems to me that's simply a quantile regression. In that case, you can use the quantreg package.
If you want to make it interactive, you can use the plotly package:
library(plotly)
p <- d %>%
ggplot() +
aes(x, y) +
geom_point() +
geom_smooth(method = "lm") +
theme_bw()
ggplotly(p)
================================================
P.S.
I am not completely sure this is what the figure you posted is showing (I guess so), but to add the quantile lines, I would just perform two quantile regressions (upper and lower) and then calculate the values of the quantile lines for your data:
library(tidyverse)
library(quantreg)
d <- tibble( #random stuff
x = rnorm(100, 0, 1),
y = 0.25 * x + rnorm(100, 0, 0.25)
)
m <- lm(y ~ x, data = d) #linear model
# 95% quantile, two tailed
rq_low <- rq(y ~ x, data = d, tau = 0.025) #lower quantile
rq_high <- rq(y ~ x, data = d, tau = 0.975) #upper quantile
d %>%
mutate(low = rq_low$coefficients[1] + x * rq_low$coefficients[2],
high = rq_high$coefficients[1] + x * rq_high$coefficients[2]) %>%
ggplot() +
geom_point(aes(x, y)) +
geom_smooth(aes(x, y), method = "lm") +
geom_line(aes(x, low), linetype = "dashed") +
geom_line(aes(x, high), linetype = "dashed") +
theme_bw()
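Another possible reading of the "Obs 95%" lines is a 95% prediction interval for individual observations rather than a pair of quantile regressions. A minimal sketch under that assumption, using predict() with interval = "prediction" on simulated data (the object names pred and bands are illustrative):

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = rnorm(100, 0, 1))
d$y <- 0.25 * d$x + rnorm(100, 0, 0.25)

m <- lm(y ~ x, data = d)

# Columns fit, lwr, upr: the interval is for a new observation at each x
pred <- as.data.frame(predict(m, newdata = d, interval = "prediction", level = 0.95))
bands <- cbind(d, pred)

ggplot(bands, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +   # line plus confidence band of the mean
  geom_line(aes(y = lwr), linetype = "dashed") +  # lower prediction limit
  geom_line(aes(y = upr), linetype = "dashed") +  # upper prediction limit
  theme_bw()
```

The shaded band from geom_smooth() describes uncertainty in the fitted mean, while the dashed lines describe where a new data point is expected to fall, so the dashed lines are always wider.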

Multiple geom_smooth at differing thresholds

I would like to make a plot that has multiple geom_smooth(method="loess") lines for differing thresholds, but I'm having some issues.
Specifically, I want one geom_smooth() line for all points > 1 standard deviation (SD) or < -1 SD (which includes the points beyond -/+2 SD), one for < -2 SD and > 2 SD, and one with all the points together. However, I'm running into an issue where the smooth is only fitted to the data within each category (i.e. greater than 1 SD but less than 2 SD).
I have made some toy data here:
#test data
a <- c(rnorm(10000, mean=0, sd = 1))
b <- c(rnorm(10000, mean=0, sd = 1))
test <- as.data.frame(cbind(a,b))
test$Thresholds <- cut(test$a, breaks = c(-Inf,-2*sd(test$a),-sd(test$a),0,sd(test$a), 2*sd(test$a), Inf),
labels = c("2_SD+", "1_SD", "0_SD","0_SD", "1_SD", "2_SD+"))
plot <- ggplot(test, aes(x=b, y=a, color=Thresholds, alpha = 0.25, legend = F)) + geom_point() + geom_smooth(method="loess")
This creates the following plot:
Does anyone have any suggestions?
If you want smoothing done for different quantities of x and y you have to manipulate the data component...
library(ggplot2)
library(dplyr)
#test data
a <- c(rnorm(10000, mean=0, sd = 1))
b <- c(rnorm(10000, mean=0, sd = 1))
test <- as.data.frame(cbind(a,b))
test$Thresholds <- cut(test$a, breaks = c(-Inf,-2*sd(test$a),-sd(test$a),0,sd(test$a), 2*sd(test$a), Inf),
labels = c("2_SD+", "1_SD", "0_SD","0_SD", "1_SD", "2_SD+"))
ggplot(test, aes(x=b, y=a)) +
geom_point() +
# just 2
geom_smooth(data = test %>% filter(Thresholds == "2_SD+"), method="loess") +
# 1 and 2
geom_smooth(data = test %>% filter(Thresholds == "1_SD" | Thresholds == "2_SD+" ), method="loess", color = "yellow") +
#all
geom_smooth(data = test, method="loess", color = "red")
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
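A variant of the same idea, sketched here as one possible alternative: stack the nested subsets into a single data frame with a label column, so that one geom_smooth() call draws all three lines and ggplot builds the legend. The names nested and group are made up for this example, and the sample is smaller than in the question only to keep loess fast.

```r
library(ggplot2)

set.seed(1)
test <- data.frame(a = rnorm(2000), b = rnorm(2000))
s <- sd(test$a)

# Nested subsets: each tail group also contains the more extreme points
nested <- rbind(
  cbind(test, group = "all"),
  cbind(test[abs(test$a) > s, ], group = "beyond 1 SD"),
  cbind(test[abs(test$a) > 2 * s, ], group = "beyond 2 SD")
)

ggplot(test, aes(x = b, y = a)) +
  geom_point(alpha = 0.25) +
  # one smooth per label, each fitted to its own (overlapping) subset
  geom_smooth(data = nested, aes(color = group), method = "loess", formula = y ~ x)
```

Because the rows are duplicated across groups, the points are drawn from the original data frame and only the smoothing layer uses the stacked one.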

How to calculate 95% prediction interval from nls

Borrowing the example data from this question, if I have the following data and I fit the following non linear model to it, how can I calculate the 95% prediction interval for my curve?
library(broom)
library(tidyverse)
x <- seq(0, 4, 0.1)
y1 <- (x * 2 / (0.2 + x))
y <- y1 + rnorm(length(y1), 0, 0.2)
d <- data.frame(x, y)
mymodel <- nls(y ~ v * x / (k + x),
start = list(v = 1.9, k = 0.19),
data = d)
mymodel_aug <- augment(mymodel)
ggplot(mymodel_aug, aes(x, y)) +
geom_point() +
geom_line(aes(y = .fitted), color = "red") +
theme_minimal()
As an example, I can easily calculate the prediction interval from a linear model like this:
## linear example
d2 <- d %>%
filter(x > 1)
mylinear <- lm(y ~ x, data = d2)
mypredictions <-
predict(mylinear, interval = "prediction", level = 0.95) %>%
as_tibble()
d3 <- bind_cols(d2, mypredictions)
ggplot(d3, aes(x, y)) +
geom_point() +
geom_line(aes(y = fit)) +
geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = .15) +
theme_minimal()
Based on the linked question, it looks like the investr::predFit function will do what you want.
investr::predFit(mymodel,interval="prediction")
?predFit doesn't explain how the intervals are computed, but ?plotFit says:
Confidence/prediction bands for nonlinear regression (i.e.,
objects of class ‘nls’) are based on a linear approximation as
described in Bates & Watts (2007). This fun[c]tion was in[s]pired by the
‘plotfit’ function from the ‘nlstools’ package.
This linear approximation is also known as the delta method (e.g. see emdbook::deltavar).
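For illustration, the linear approximation can be written out by hand in base R. This is a sketch of the delta-method idea under the usual assumptions (approximately normal errors with constant variance), not a reproduction of investr's internals; the names G, se_fit and bands are chosen for this example.

```r
set.seed(42)
x <- seq(0, 4, 0.1)
y <- x * 2 / (0.2 + x) + rnorm(length(x), 0, 0.2)
d <- data.frame(x, y)

fit <- nls(y ~ v * x / (k + x), start = list(v = 1.9, k = 0.19), data = d)
v_hat <- coef(fit)[["v"]]
k_hat <- coef(fit)[["k"]]

# Gradient of v * x / (k + x) with respect to (v, k), one row per x
G <- cbind(v = d$x / (k_hat + d$x),
           k = -v_hat * d$x / (k_hat + d$x)^2)

# Delta method: variance of the fitted curve is diag(G V G')
V <- vcov(fit)
se_fit <- sqrt(rowSums((G %*% V) * G))

sigma <- summary(fit)$sigma             # residual standard error
tcrit <- qt(0.975, df.residual(fit))    # two-sided 95% critical value
pred  <- as.numeric(predict(fit))

# Prediction interval adds the residual variance to the curve variance
bands <- data.frame(x   = d$x,
                    fit = pred,
                    lwr = pred - tcrit * sqrt(se_fit^2 + sigma^2),
                    upr = pred + tcrit * sqrt(se_fit^2 + sigma^2))
head(bands)
```

Dropping sigma^2 from the square root gives the corresponding confidence band for the curve itself.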

How do I propagate the error of a linear regression when projecting from Y to X?

I'm trying to figure out how to propagate errors in the following case.
I am calibrating a machine with a couple of standards (a, b, c) with
accepted values x. My machine measures y for these standards, with a
certain error (standard deviation of 1 in this example).
Then I measure replicates of a sample, yielding ynew. Now I want to
convert these values to the accepted measurement scale (the x-axis).
To do this, I can of course do some linear algebra and convert the slope and
intercept that I got from my standard measurements to a reversed slope and
intercept: if y = a + b * x, then x = -a / b + (1 / b) * y.
This works nicely to convert the input values, but how do I get proper estimates of the errors?
In R, I've tried the following:
library(broom) # for tidy lm
library(ggplot2) # for plotting
library(dplyr) # to allow piping
# find confidence value
cv <- function(x, level = 95) {
qt(1 - ((100 - level) / 100) / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
}
# find confidence interval
ci <- function(x, level = 95) {
xbar <- mean(x)
xci <- cv(x, level = level)
c(fit = xbar, lwr = xbar - xci, upr = xbar + xci)
}
set.seed(1337)
# create fake data
dat <- data.frame(id = rep(letters[1:3], 20),
x = rep(c(1, 7, 10), 20)) %>%
mutate(y = rnorm(n(), -20 + 1.25 * x, 1))
# generate linear model
mod <- lm(y ~ x, dat)
# tidy
mod_aug <- augment(mod)
# these are the new samples that my machine measures
ynew <- rnorm(10, max(dat$y) + 3)
# predict new x-value based on y-value that is outside of range
## predict(mod, newdata = data.frame(y = ynew), interval = "predict")
# Error in eval(predvars, data, env) : object 'x' not found
# or tidy
## augment(mod, newdata = data.frame(y = ynew))
# 50 row df that doesn't make sense
# found this function that should do the job, but it doesn't extrapolate
## approx(x = mod$fitted.values, y = dat$x, xout = ynew)$y
# [1] NA NA NA NA NA NA NA NA NA NA
# this one from Hmisc does allow for extrapolation
with_approx <- Hmisc::approxExtrap(x = mod_aug$.fitted, y = mod_aug$x, xout = ynew)$y
# but in case of lm, isn't using the slope and intercept of a model okay too?
with_itc_slp <- (- coef(mod)[1] / coef(mod)[2]) + (1 / coef(mod)[2] * ynew)
# this would be the 95% prediction interval of the model at the average
# sample position. Could also use "confidence" but this is more correct?
avg_prediction <- predict(mod,
newdata = data.frame(x = mean(with_itc_slp)),
interval = "prediction")
# plot it
ggplot(dat, aes(x = x, y = y, col = id)) +
geom_point() +
geom_hline(yintercept = ynew, col = "gray") +
geom_smooth(aes(group = 1), method = "lm", se = F, fullrange = T,
col = "lightblue") +
geom_smooth(aes(group = 1), method = "lm") +
# 95% CI of the new sample
annotate("pointrange", x = 1, y = mean(ynew),
ymin = ci(ynew)[2], ymax = ci(ynew)[3], col = "green") +
# 95% prediction interval of the linear model at the average transformed
# x-position
annotate("pointrange", x = mean(with_approx), y = mean(ynew),
ymin = avg_prediction[2], ymax = avg_prediction[3], col = "green") +
# transformed using approx
annotate("point", x = with_approx, y = ynew, size = 3, col = "blue",
shape = 1) +
# transformed using intercept and slope
annotate("point", x = with_itc_slp, y = ynew, size = 3, col = "red",
shape = 2) +
# it's pretty
coord_fixed()
resulting in this plot:
Now how do I go from these 95% CIs in the y-direction to transformed sample
x-values with a confidence interval in the x-direction?
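One common approximation for this kind of inverse prediction is first-order error propagation (the delta method) applied to x0 = (mean(ynew) - a) / b. The sketch below is an illustration under stated assumptions, not a definitive answer: it treats the replicate mean as independent of the calibration fit, takes the variance of that mean from the replicates themselves, and uses the residual degrees of freedom of the calibration model for the t quantile. The data are regenerated in base R, so the numbers will not match the plot above.

```r
set.seed(1337)
x <- rep(c(1, 7, 10), 20)
y <- rnorm(length(x), -20 + 1.25 * x, 1)
mod <- lm(y ~ x)
ynew <- rnorm(10, max(y) + 3)   # replicate measurements of the unknown sample

a <- coef(mod)[["(Intercept)"]]
b <- coef(mod)[["x"]]
V <- vcov(mod)                  # covariance matrix of intercept and slope

ybar <- mean(ynew)
x0 <- (ybar - a) / b            # point estimate on the x scale

# Var(x0) ~ [Var(ybar) + Var(a) + x0^2 Var(b) + 2 x0 Cov(a, b)] / b^2
var_x0 <- (var(ynew) / length(ynew) +
           V["(Intercept)", "(Intercept)"] +
           x0^2 * V["x", "x"] +
           2 * x0 * V["(Intercept)", "x"]) / b^2

tcrit <- qt(0.975, df = df.residual(mod))  # assumption: calibration df
ci_x <- c(fit = x0,
          lwr = x0 - tcrit * sqrt(var_x0),
          upr = x0 + tcrit * sqrt(var_x0))
ci_x
```

Because the sample lies outside the range of the standards, the x0^2 * Var(b) term dominates quickly, which is the usual cost of extrapolating a calibration line.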
