Plot standard error in base R scatterplot [duplicate]

This question already has answers here:
How can I plot data with confidence intervals?
(4 answers)
Closed last month.
I am well aware of how to plot the standard error of a regression using ggplot. As an example with the iris dataset, this can be easily done with this code:
library(tidyverse)
iris %>%
  ggplot(aes(x = Sepal.Width,
             y = Sepal.Length)) +
  geom_point() +
  geom_smooth(method = "lm",
              se = TRUE)
I also know that a regression using base R scatterplots can be achieved with this code:
#### Scatterplot ####
plot(iris$Sepal.Width,
     iris$Sepal.Length)
#### Fit Regression ####
fit <- lm(iris$Sepal.Length ~ iris$Sepal.Width)
#### Fit Line to Plot ####
abline(fit, col = "red")
However, when I looked up how to plot standard error in base R scatterplots, all I found on SO and Google was how to do it with error bars. I would instead like to shade the error region the way ggplot does above. How can one accomplish this?
Edit
To manually obtain the standard error of this regression, I believe you would calculate it like so:
#### Derive Standard Error ####
fit <- lm(Sepal.Length ~ Sepal.Width,
          iris)
n <- nrow(iris) # number of observations (length(iris) would count columns, not rows)
df <- n - 2 # degrees of freedom
y.hat <- fitted(fit)
res <- resid(fit)
sq.res <- res^2
ss.res <- sum(sq.res)
se <- sqrt(ss.res/df)
So if this helps fit it into a base R plot, I'm all ears.
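As a sanity check, this should match the residual standard error R reports directly (a quick sketch using the fit above):
sigma(fit)           # residual standard error
summary(fit)$sigma   # same value, via summary()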

Here's a slightly fiddly approach using broom::augment to generate a dataset with predictions and standard errors. You could also do it in base R with predict if you don't want to use broom, but that takes a couple of extra lines.
Note: I was puzzled as to why the interval in my graph is narrower than your ggplot interval in the question. But a look at the geom_smooth documentation shows that the se=TRUE option adds a 95% confidence interval rather than ±1 SE as you might expect. So it's probably better to generate your own intervals rather than letting the graphics package do it!
#### Fit Regression (note use of `data` argument) ####
fit <- lm(data = iris, Sepal.Length ~ Sepal.Width)
#### Generate predictions and se ####
dat <- broom::augment(fit, se_fit = TRUE)
#### Alternative using `predict` instead of broom ####
dat <- cbind(iris,
             .fitted = predict(fit, newdata = iris),
             .se.fit = predict(fit, newdata = iris, se.fit = TRUE)$se.fit)
#### Now sort the dataset in x-axis order ####
dat <- dat[order(dat$Sepal.Width), ]
#### Plot with predictions and standard errors ####
with(dat, {
  plot(Sepal.Width, Sepal.Length)
  polygon(c(Sepal.Width, rev(Sepal.Width)),
          c(.fitted + .se.fit, rev(.fitted - .se.fit)),
          border = NA, col = hsv(1, 1, 1, 0.2))
  lines(Sepal.Width, .fitted, lwd = 2)
  lines(Sepal.Width, .fitted + .se.fit, col = "red")
  lines(Sepal.Width, .fitted - .se.fit, col = "red")
})
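If you would rather match ggplot's default band, which is a 95% confidence interval rather than ±1 SE, scale the standard errors by the t critical value before drawing the polygon (a sketch reusing fit and dat from above):
ci <- qt(0.975, df.residual(fit)) * dat$.se.fit
with(dat, {
  polygon(c(Sepal.Width, rev(Sepal.Width)),
          c(.fitted + ci, rev(.fitted - ci)),
          border = NA, col = hsv(1, 1, 1, 0.2))
})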

Related

metafor provides 95% CIs that are different from the original values

I am using the metafor package for combining beta coefficients from linear regression models. I used the following code, supplying the reported se and beta values to the rma function. But when I look at the forest plot, the 95% confidence intervals are different from the ones reported in the studies. I also tried it using the mtcars data set by running three models and combining the coefficients. Still, the 95% CIs we see on the forest plot are different from the original models, and the deviations are too large to be rounding error. A reproducible example is below.
library(metafor)
library(dplyr)
lm1 <- lm(hp ~ mpg, data = mtcars[1:15,])
lm2 <- lm(hp ~ mpg, data = mtcars[1:32,])
lm3 <- lm(hp ~ mpg, data = mtcars[13:32,])
study <- c("study1", "study2", "study3")
beta_coef <- c(lm1$coefficients[2],
               lm2$coefficients[2],
               lm3$coefficients[2]) %>% as.numeric()
se <- c(1.856, 1.31, 1.458)
ci_lower <- c(confint(lm1)[2,1],
              confint(lm2)[2,1],
              confint(lm3)[2,1]) %>% as.numeric()
ci_upper <- c(confint(lm1)[2,2],
              confint(lm2)[2,2],
              confint(lm3)[2,2]) %>% as.numeric()
df <- cbind(study = study,
            beta_coef = beta_coef,
            se = se,
            ci_lower = ci_lower,
            ci_upper = ci_upper) %>% as.data.frame()
pooled <- rma(yi = beta_coef, vi = se, slab = study)
forest(pooled)
Compare the confidence intervals on the forest plot with the ones in the data frame.
The vi argument is for specifying the sampling variances, but you are passing the standard errors to it. So you should do:
pooled <- rma(yi=beta_coef, sei=se, slab=study)
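Equivalently, since the sampling variance is just the squared standard error, you could keep vi and square the values:
pooled <- rma(yi=beta_coef, vi=se^2, slab=study)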
But you will still find a discrepancy here, since the CIs in the forest plot are constructed based on a normal distribution, while the CIs you obtained from the regression model are based on t-distributions. If you want the exact same CIs in the forest plot, you could just pass the CI bounds to the function like this:
forest(beta_coef, ci.lb=ci_lower, ci.ub=ci_upper)
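To see the size of the normal-versus-t discrepancy, compare the critical values directly (a sketch; lm1 has 15 observations, hence 13 residual degrees of freedom):
qnorm(0.975)                 # ~1.96, what the forest plot CIs use
qt(0.975, df.residual(lm1))  # ~2.16, what confint() uses for lm1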
If you want to add a summary polygon from some meta-analysis to the forest plot, you can do this with addpoly(). So the complete code for this example would be:
forest(beta_coef, ci.lb=ci_lower, ci.ub=ci_upper, ylim=c(-1.5,6))
addpoly(pooled, row=-1)
abline(h=0)

Logistic Regression in R using a loop to save code

Using R to do some logistic regression with the Boston crime dataset.
This code works just fine:
#################################
library(MASS)
head(Boston)
?Boston
plot(Boston$zn, Boston$crim) # gives scatter plot
lm(formula=Boston$crim~Boston$zn, data=Boston) # gives slope and intercept of best fit line
lm.Boston <- lm(formula=Boston$crim~Boston$zn, data=Boston) # saves information as lm.Boston
abline(lm.Boston) # plots best fit line, adds on to existing plot
abline(v=mean(Boston$zn), col='red') # plots mean for zn
abline(h=mean(Boston$crim), col='red') # plots mean for crim
summary(Boston$zn)
###############################
But I have to replace the $zn with 13 other variables, and I am trying to do it in a loop to save repeating the code block 13 times!
Trying this, but I get an error:
for (i in 2:ncol(Boston)){
  clname <- colnames(Boston)[i]
  predictor <- paste('Boston$', clname, sep="")
  print(predictor)
  plot(eval(predictor), Boston$crim) # gives scatter plot
  # lm(formula=Boston$crim~predictor, data=Boston) # gives slope and intercept of best fit line
  # lm.Boston <- lm(formula=Boston$crim~predictor, data=Boston) # saves information as lm.Boston
  # abline(lm.Boston) # plots best fit line, adds on to existing plot
  # abline(v=mean(predictor), col='red') # plots mean for predictor
  # abline(h=mean(Boston$crim), col='red') # plots mean for crim
}
The predictor variable seems to be correct when I print it out, but the first plot statement gives an error (I commented out the rest of the code to try and fix this error).
Here is the error I get:
[1] "Boston$zn" Error in xy.coords(x, y, xlabel, ylabel, log) : 'x'
and 'y' lengths differ
You can store the column names in a separate vector and then iterate over it, or use the names directly in a for loop. I have stored them here in a separate vector.
After that, you need to add the proper axis labels with paste0:
library(MASS)
columns_boston <- colnames(Boston)[2:ncol(Boston)]
for (i in columns_boston){
  predictor <- Boston[, i]
  print(predictor)
  plot(predictor, Boston$crim, xlab=paste0(i), ylab=paste0('crim')) # gives scatter plot
  lm(formula=Boston$crim~predictor, data=Boston) # gives slope and intercept of best fit line
  lm.Boston <- lm(formula=Boston$crim~predictor, data=Boston) # saves information as lm.Boston
  abline(lm.Boston) # plots best fit line, adds on to existing plot
  abline(v=mean(predictor), col='red') # plots mean for predictor
  abline(h=mean(Boston$crim), col='red') # plots mean for crim
}
Sample output for the last column:
You can remove the ylab if you want.
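As an aside, the reason the original loop failed is that eval() does not parse character strings: "Boston$zn" stays a length-one string, so plot() sees mismatched x and y lengths. A minimal sketch of the difference:
predictor <- paste0("Boston$", "zn")
eval(predictor)                # still just the string "Boston$zn"
eval(parse(text = predictor))  # evaluates the expression and returns the column
Boston[["zn"]]                 # simpler and safer than parsing strings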
What you want (13 different charts) you can do like this:
library(tidyverse)
library(MASS)
plotVar = function(data, name) data %>%
  ggplot(aes(crim, val)) +
  geom_point() +
  stat_smooth(formula = y ~ x, method = "glm") +
  ylab(name)
Boston %>% pivot_longer(
  -crim, names_to = "var", values_to = "val"
) %>% group_by(var) %>%
  nest() %>%
  group_map(~plotVar(.x$data[[1]], .y))
First plot
Last plot
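Note that group_map() returns a list of ggplot objects, so if you assign the result you can pull out individual charts (a small sketch under that assumption):
plots <- Boston %>% pivot_longer(
  -crim, names_to = "var", values_to = "val"
) %>% group_by(var) %>%
  nest() %>%
  group_map(~plotVar(.x$data[[1]], .y))
plots[[1]]             # first plot
plots[[length(plots)]] # last plot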
This, however, is not a logistic regression!
You have to specify exactly what you want to achieve.

Plotting the predictions of a mixed model as a line in R

I'm trying to plot the predictions (predict()) of my mixed model below, so that I can obtain my conceptually desired plot as a line (shown below).
I have tried to plot my model's predictions, but I don't achieve my desired plot. Is there a better way to call predict() so I can achieve it?
library(lme4)
dat3 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/dat3.csv')
m4 <- lmer(math~pc1+pc2+discon+(pc1+pc2+discon|id), data=dat3)
newdata <- with(dat3, expand.grid(pc1=unique(pc1), pc2=unique(pc2), discon=unique(discon)))
y <- predict(m4, newdata=newdata, re.form=NA)
plot(newdata$pc1+newdata$pc2, y)
More sjPlot. See the grid parameter to wrap several predictors in one plot.
library(lme4)
library(sjPlot)
library(patchwork)
dat3 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/dat3.csv')
m4 <- lmer(math~pc1+pc2+discon+(pc1+pc2+discon|id), data=dat3) # Does not converge
m4 <- lmer(math~pc1+pc2+discon+(1|id), data=dat3) # Converges
# To remove discon
a <- plot_model(m4, type = 'pred')[[1]]
b <- plot_model(m4, type = 'pred', title = '')[[2]]
a + b
Edit 1: I had some trouble removing the discon term within the sjPlot framework. I gave up and fell back on patchwork. I'm sure Daniel knows the correct way.
As Magnus Nordmo suggests, this is very simple with sjPlot, which has some predefined functions for these types of plots.
library(lme4)
library(sjPlot) # for plot_model()
dat3 <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/dat3.csv')
m4 <- lmer(math ~ pc1 + pc2 + discon + (pc1 + pc2 + discon | id), data = dat3)
plot_model(m4, type = 'pred', terms = c('pc1', 'pc2'),
           ci.lvl = 0)
which gives the following result.
This plot is designed to include different quantiles of the second term in terms over the axis of pc1 and the predicted values. You could split these plots up and combine them using patchwork, and the interval can be changed by using square brackets after the term in terms (e.g. pc1 [-10:1] for the interval between -10 and 1), as sketched below.
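For instance, a sketch of splitting the two predictors into separate panels and restricting the pc1 range, combined with patchwork (the bracket syntax is the sjPlot/ggeffects convention mentioned above):
library(patchwork)
p1 <- plot_model(m4, type = 'pred', terms = 'pc1 [-10:1]')
p2 <- plot_model(m4, type = 'pred', terms = 'pc2')
p1 + p2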

Plotting a Regression Line Without Extracting Each Coefficient Separately [duplicate]

This question already has answers here:
How to plot a comparison of two fixed categorical values for linear regression of another continuous variable
(3 answers)
Closed 4 years ago.
For my Stats class, we are using R to compute all of our statistics, and we are working with numeric data that also has a categorical factor. The way we currently plot fitted lines is to run lm(), look at the summary to grab the coefficients manually, create a mesh, and then use the lines() function. I want an easier way to do this. I have seen the predict() function, but not how to use it with categories.
For example, the data set found here has two numerical variables and one categorical. I want to be able to plot the line of best fit for men and women in this set without having to extract each coefficient individually, as in my current code below.
bank<-read.table("http://www.uwyo.edu/crawford/datasets/bank.txt",header=TRUE)
fit <-lm(salary~years*gender,data=bank)
summary(fit)
yearhat <- seq(0, max(bank$salary), length=1000)
salaryfemalehat <- fit$coefficients[1] + fit$coefficients[2]*yearhat
salarymalehat <- (fit$coefficients[1] + fit$coefficients[3]) + (fit$coefficients[2] + fit$coefficients[4])*yearhat
Using what you have, you can get the same predicted values with
yearhat <- seq(0, max(bank$salary), length=1000) # as in your code (you may have meant max(bank$years) for a mesh over years)
salaryfemalehat <- predict(fit, data.frame(years=yearhat, gender="Female"))
salarymalehat <- predict(fit, data.frame(years=yearhat, gender="Male"))
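To actually draw them on a scatterplot, something like this should work (a sketch; the colours are arbitrary):
plot(bank$years, bank$salary)
lines(yearhat, salaryfemalehat, col = "red")
lines(yearhat, salarymalehat, col = "blue")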
To supplement MrFlick, in case of more levels we can try:
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
fit <- lm(mpg ~ disp*cyl, data = dat)
plot(dat$disp, dat$mpg)
with(dat,
  for(i in levels(cyl)){
    lines(disp, predict(fit, newdata = data.frame(disp = disp, cyl = i)),
          col = which(levels(cyl) == i))
  }
)
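A legend makes it clear which line belongs to which level (a small sketch matching the col indexing above):
legend("topright", legend = levels(dat$cyl),
       col = seq_along(levels(dat$cyl)), lty = 1)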

Getting vectors out of ggplot2

I am trying to show that there is a weird "bump" in some data I am analysing (it is to do with market share). My code is here:
qplot(Share, Rate, data = Dataset3, geom=c("point", "smooth"))
(I appreciate that this is not very useful code without the dataset).
Is there any way that I can get the numeric vector used to generate the smoothed line out of R? I just need that layer to try to fit a model to the smoothed data.
Any help gratefully received.
Yes, there is. ggplot uses the function loess as the default smoother in geom_smooth for a dataset of this size (for roughly 1000 or more observations it switches to a GAM). This means you can use loess directly to estimate your smoothing parameters.
Here is an example, adapted from ?loess:
qplot(speed, dist, data=cars, geom="smooth")
Use loess to estimate the smoothed data, and predict for the estimated values:
cars.lo <- loess(dist ~ speed, cars)
pc <- predict(cars.lo, data.frame(speed = seq(4, 25, 1)), se = TRUE)
The estimates are now in pc$fit and the standard errors in pc$se.fit. The following bit of code extracts the fitted values into a data.frame and then plots it using ggplot:
pc_df <- data.frame(
x=4:25,
fit=pc$fit)
ggplot(pc_df, aes(x=x, y=fit)) + geom_line()
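Since predict() also returned the standard errors, here is a sketch of adding a ±1 SE ribbon around the fitted line, reusing pc from above:
pc_df <- data.frame(x = 4:25,
                    fit = pc$fit,
                    se = pc$se.fit)
ggplot(pc_df, aes(x = x, y = fit)) +
  geom_ribbon(aes(ymin = fit - se, ymax = fit + se), alpha = 0.2) +
  geom_line()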
