Produce a nice linear regression plot (fitted line, confidence / prediction bands, etc.) in R

I have this sample data and want to use a linear regression to predict 10 years into the future:
date<-as.Date(c("2015-12-31", "2014-12-31", "2013-12-31", "2012-12-31"))
value<-c(16348, 14136, 12733, 10737)
#fit linear regression
model<-lm(value~date)
#build predict dataframe
dfuture<-data.frame(date=seq(as.Date("2016-12-31"), by="1 year", length.out = 10))
# predict the future
predict(model, dfuture, interval = "prediction")
How can I add confidence bands to this?

The following code generates a good-looking regression plot. The comments in the code explain each step. It uses value and model from your question.
## all dates you are interested in: 4 years with observations, 10 years for prediction
all_date <- seq(as.Date("2012-12-31"), by="1 year", length.out = 14)
## compute confidence bands (for all data)
pred.c <- predict(model, data.frame(date=all_date), interval="confidence")
## compute prediction bands (for new data only)
pred.p <- predict(model, data.frame(date=all_date[5:14]), interval="prediction")
## set up regression plot (plot nothing here; only set up range, axis)
ylim <- range(range(pred.c[,-1]), range(pred.p[,-1]))
plot(1:nrow(pred.c), numeric(nrow(pred.c)), col = "white", ylim = ylim,
xaxt = "n", xlab = "Date", ylab = "prediction",
main = "Regression Plot")
axis(1, at = 1:nrow(pred.c), labels = all_date)
## shade 95%-level confidence region
polygon(c(1:nrow(pred.c),nrow(pred.c):1), c(pred.c[, 2], rev(pred.c[, 3])),
col = "grey", border = NA)
## plot fitted values / lines
lines(1:nrow(pred.c), pred.c[, 1], lwd = 2, col = 4)
## add 95%-level confidence bands
lines(1:nrow(pred.c), pred.c[, 2], col = 2, lty = 2, lwd = 2)
lines(1:nrow(pred.c), pred.c[, 3], col = 2, lty = 2, lwd = 2)
## add 95%-level prediction bands
lines(4 + 1:nrow(pred.p), pred.p[, 2], col = 3, lty = 3, lwd = 2)
lines(4 + 1:nrow(pred.p), pred.p[, 3], col = 3, lty = 3, lwd = 2)
## add original observations on the plot
points(1:4, rev(value), pch = 20)
## finally, we add legend
legend(x = "topleft", legend = c("Obs", "Fitted", "95%-CI", "95%-PI"),
pch = c(20, NA, NA, NA), lty = c(NA, 1, 2, 3), col = c(1, 4, 2, 3),
text.col = c(1, 4, 2, 3), bty = "n")
The JPEG image was generated with:
jpeg("regression.jpeg", height = 500, width = 600, quality = 100)
## the above code
dev.off()
## check your working directory for this JPEG
## use getwd() to see this directory if you don't know it
As you can see from the plot:
- the confidence band grows wider as you predict further away from your observed data;
- the prediction interval is wider than the confidence interval.
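Both observations can be checked by rebuilding the intervals by hand from predict(..., se.fit = TRUE). This is a sketch assuming the textbook formulas for ordinary least squares with unit weights: the confidence half-width is t * se.fit, while the prediction half-width also adds the residual variance, t * sqrt(se.fit^2 + sigma^2). The names ci_man, pi_man, ci_half and pi_half are only illustrative.

```r
# Data and model from the question
date  <- as.Date(c("2015-12-31", "2014-12-31", "2013-12-31", "2012-12-31"))
value <- c(16348, 14136, 12733, 10737)
model <- lm(value ~ date)
newd  <- data.frame(date = seq(as.Date("2016-12-31"), by = "1 year", length.out = 10))

# Standard errors of the fitted mean at the new dates
p  <- predict(model, newd, se.fit = TRUE)
tq <- qt(0.975, df = p$df)   # 95% two-sided t quantile with residual df

# Confidence interval: uncertainty of the regression line only
ci_half <- tq * p$se.fit
# Prediction interval: also includes the scatter of a single new observation
pi_half <- tq * sqrt(p$se.fit^2 + p$residual.scale^2)

ci_man <- cbind(fit = p$fit, lwr = p$fit - ci_half, upr = p$fit + ci_half)
pi_man <- cbind(fit = p$fit, lwr = p$fit - pi_half, upr = p$fit + pi_half)

# Half-widths grow with distance from the data, and PI is wider than CI
round(cbind(ci_half, pi_half))
```

The hand-built matrices should agree with predict(model, newd, interval = "confidence") and interval = "prediction" respectively.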
If you want to know more about how predict.lm() computes confidence / prediction intervals internally, read the question "How does predict.lm() compute confidence interval and prediction interval?" and my answer there.
Thanks to Alex for demonstrating the simple use of the visreg package, but I still prefer base R.

You can simply use visreg::visreg
library(visreg)
visreg(model)
If you are interested in the values:
> head(visreg(model)$fit)
date value visregFit visregLwr visregUpr
1 2012-12-31 13434.5 10753.10 9909.073 11597.13
2 2013-01-10 13434.5 10807.81 9974.593 11641.02
3 2013-01-21 13434.5 10862.52 10040.033 11685.00
4 2013-02-01 13434.5 10917.22 10105.389 11729.06
5 2013-02-12 13434.5 10971.93 10170.658 11773.21
6 2013-02-23 13434.5 11026.64 10235.837 11817.44


How can I show non-inferiority with a plot using R

I compare two treatments, A and B. The objective is to show that A is not inferior to B. The non-inferiority margin is delta = -2.
Comparing Treatment A - Treatment B, I get these results:
Mean difference and 95% CI = -0.7 [-2.1, 0.8]
I would like to plot this, either with a package or manually, but I have no idea how to do it.
Welch Two Sample t-test
data: mydata$outcome[mydata$traitement == "Bras S"] and mydata$outcome[mydata$traitement == "B"]
t = 0.88938, df = 258.81, p-value = 0.3746
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.133224 0.805804
sample estimates:
mean of x mean of y
8.390977 9.054688
I want to create this kind of plot:
You could extract the relevant values from the t.test() results and then plot them in base R, using segments() and points() to draw the data and abline() to draw the relevant vertical lines. Since there were no reproducible data, I made some up, but the process is the same.
#sample data
set.seed(123)
tres <- t.test(runif(10), runif(10))
# get values to plot from t test results
ci <- tres$conf.int
ests <- tres$estimate[1] - tres$estimate[2]
# plot
plot(x = ci, ylim = c(0,2), xlim = c(-4, 4), type = "n", # blank plot
bty = "n", xlab = "Treatment A - Treatment B", ylab = "",
axes = FALSE)
points(x = ests, y = 1, pch = 20) # dot for point estimate
segments(x0 = ci[1], x1 = ci[2], y0 = 1) #CI line
abline(v = 0, lty = 2) # vertical line, dashed
abline(v = 2, lty = 1, col = "darkblue") # vertical line, solid, blue
axis(1, col = "darkblue") # add in x axis, blue
EDIT:
If you want to recreate your figure more accurately, with the x axis in descending order and using your statement "Mean difference and 95% CI = -0.7 [-2.1, 0.8]", you can modify the above approach as follows:
diff <- -0.7
ci <- c(-2.1, 0.8)
# plot
plot(1, xlim = c(-4, 4), type = "n",
bty = "n", xlab = "Treatment A - Treatment B", ylab = "",
axes = FALSE)
points(x = -diff, y = 1, pch = 20)
segments(x0 = -ci[2], x1 = -ci[1], y0 = 1)
abline(v = 0, lty = 2)
abline(v = 2, lty = 1, col = "darkblue")
axis(1, at = seq(-4,4,1), labels = seq(4, -4, -1), col = "darkblue")
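If you need this figure repeatedly, the steps above can be wrapped in a small function. noninf_plot() below is a hypothetical helper, not part of any package. It also returns whether the lower CI bound lies above the margin, which is the usual non-inferiority check assuming higher outcome values are better; if lower values are better for your outcome, flip the comparison.

```r
# Hypothetical helper: point estimate, CI, zero line and non-inferiority margin
noninf_plot <- function(est, ci, delta, xlim = c(-4, 4),
                        xlab = "Treatment A - Treatment B") {
  plot(NA, xlim = xlim, ylim = c(0, 2), bty = "n",
       xlab = xlab, ylab = "", axes = FALSE)      # blank plot
  segments(x0 = ci[1], x1 = ci[2], y0 = 1, lwd = 2) # CI line
  points(x = est, y = 1, pch = 20, cex = 1.5)       # point estimate
  abline(v = 0, lty = 2)                            # no difference, dashed
  abline(v = delta, col = "darkblue")               # non-inferiority margin
  axis(1, col = "darkblue")
  # Assumption: higher outcome values are better, so A is non-inferior
  # only if the whole CI lies above the margin delta
  invisible(ci[1] > delta)
}

res <- noninf_plot(est = -0.7, ci = c(-2.1, 0.8), delta = -2)
res
```

With the numbers from the question the lower bound (-2.1) falls just below the margin (-2), so under that assumption the helper returns FALSE.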

Which is the more appropriate predictive model to use in R for the following scenario?

I have x-axis values ranging from 300 mm down to 0.075 mm, and y-axis values from 0 to 100. I need to predict the value at x = 0.002 and plot the result on a semi-log plot. I tried to use the lm function in the following way:
f2 <- data.frame(sievesize = c(0.075, 1.18, 2.36, 4.75), weight = c(55, 66.9, 67.69, 75))
f3 <- data.frame(sievesize = 0.002)
model1 <- lm(weight ~ log10(sievesize), data = f2)
pred3 <- predict(model1, f3)
Is there any better way to predict the values for 0.002?
You cannot do much with these data except calculate the prediction interval, to understand the margin of error of your prediction (as shown below, it is about 38.5 +/- 21):
- you have just four data points (~18 bytes of data);
- the 0.002 mm sieve size is outside your data range [0.075, 4.75], and this kind of extrapolation leads to quite a large prediction error for any model;
- the log-linear relation you are fitting has a singularity as the sieve size approaches zero (log10(0) is -Inf);
- the data are distributed over a very narrow range for an exponential dependence.
Please see the code below:
f2 <- data.frame(sievesize = c(0.075, 1.18, 2.36, 4.75), weight = c(55, 66.9, 67.69, 75))
f3 <- data.frame(sievesize = c(0.002))
m_lm <- lm(weight ~ log10(sievesize), data = f2)
fit_lm <- predict(m_lm, f3, interval = "prediction")
fit_lm
#        fit      lwr      upr
# 1 38.46763 17.73941 59.19586
# prediction interval over a grid of sieve sizes, for plotting
pred_x <- data.frame(sievesize = seq(0.001, 5, .01))
fit_conf <- predict(m_lm, pred_x, interval = "prediction")
plot(log10(f2$sievesize), f2$weight, ylim = c(0, 85), pch = 16, xlim = c(-3, 1))
points(log10(f3$sievesize), fit_lm[, 1], col = "red", pch = 16)
lines(log10(pred_x$sievesize), fit_conf[, 1])
lines(log10(pred_x$sievesize), fit_conf[, 2], col = "blue")
lines(log10(pred_x$sievesize), fit_conf[, 3], col = "blue")
legend("bottomright",
legend = c("experiment", "fitted line", "prediction interval", "forecasted"),
lty = c(NA, 1, 1, NA),
lwd = c(NA, 1, 1, NA),
pch = c(16, NA, NA, 16),
col = c("black", "black", "blue", "red"))
The resulting graph illustrates the points mentioned above.
So using more advanced techniques, such as a nonlinear fit, a GLM or Bayesian regression, will not bring additional insight, as the data set is extremely small and distributed over a very narrow range.
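The extrapolation point can be checked directly by comparing the width of the 95% prediction interval inside the data range with its width at the smallest observed size and at the target size. This sketch reuses the four data points from the question; new_sizes, pint and width are illustrative names.

```r
# Data and model from the question
f2 <- data.frame(sievesize = c(0.075, 1.18, 2.36, 4.75),
                 weight = c(55, 66.9, 67.69, 75))
m_lm <- lm(weight ~ log10(sievesize), data = f2)

# A size inside the data range, the smallest observed size, and the target
new_sizes <- data.frame(sievesize = c(1.18, 0.075, 0.002))
pint <- predict(m_lm, new_sizes, interval = "prediction")

# Width of the 95% prediction interval at each size
width <- unname(pint[, "upr"] - pint[, "lwr"])
cbind(new_sizes, fit = pint[, "fit"], width = width)
```

The width grows as log10(sievesize) moves away from the centre of the observed data, which is why the forecast at 0.002 is so uncertain.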

How to change the increment value for xlim and ylim when I want to plot

I am quite new to R.
I am working on part of my MSc thesis and want to make some diurnal plots of, for instance, methane production over a period of time.
I want to see its variation over time and its correlation with another factor over the same period. So I have two questions.
First:
How do I set xlim and ylim so that the ticks increase by 2 hours? R has its own default, and when I give it, for example:
xlim = c(0, 23)
then the axis starts from 0 and goes up in steps of 5 hours. I want it to go up in steps of 2 hours.
Second:
How do I add another variable that might be correlated with my first variable over the same time period? Let's say methane production over 23 hours could be related to oxygen consumption, just as an example. How can I plot oxygen and methane on the same y axis against time (x)?
I would really appreciate your help with this.
Kind regards,
Farhad
You can use the at and labels arguments in the axis() call to customize tick locations and labels.
You can use axis() with the argument side = 4 to create a custom y axis on the right of your graph; par(new = TRUE) lets the second plot be drawn over the first.
The code below illustrates these points:
set.seed(123)
x <- 0:23
df<- data.frame(
x,
ch4 = 1000 - x ^ 2,
o2 = 2000 - 2 * (x - 10) ^ 2
)
par(mar = c(5, 5, 2, 5))
with(df, plot(x, ch4,
type = "l", col = "red3",
ylab = "CH4 emission",
lwd = 3,
xlim = c(0, 23),
xlab = "",
xaxt = "n"))
axis(1, at = seq(0, 23, 2), labels = seq(0, 23, 2))
par(new = TRUE)
with(df, plot(x, o2,
pch = 16, axes = FALSE,
xlab = NA, ylab = NA, cex = 1.2))
axis(side = 4)
mtext(side = 4, line = 3, "O2 consumption")
legend("topright",
legend = c("O2", "CH4"),
lty = c(1, 0),
lwd = c(3, NA),
pch = c(NA, 16),
col = c("red3", "black"))
Output:
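If you would rather answer the second question literally and put both series on a single y axis (workable here because the two toy series have comparable magnitudes), matplot() with a custom axis() call is one option. This is a sketch reusing the toy data above; the colours and the hours name are arbitrary choices.

```r
# Same toy data as in the answer above
x  <- 0:23
df <- data.frame(x, ch4 = 1000 - x^2, o2 = 2000 - 2 * (x - 10)^2)

# Both series share one y axis; xaxt = "n" suppresses the default x ticks
matplot(df$x, df[, c("ch4", "o2")], type = "l", lty = 1, lwd = 2,
        col = c("red3", "steelblue"), xaxt = "n",
        xlab = "Hour", ylab = "Value")

# Ticks every 2 hours
hours <- seq(0, 22, by = 2)
axis(1, at = hours, labels = hours)

legend("bottomleft", legend = c("CH4", "O2"), lty = 1, lwd = 2,
       col = c("red3", "steelblue"), bty = "n")
```

If the two variables differ by orders of magnitude, the secondary axis (side = 4) approach above is usually easier to read.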

Can anybody help figure out why my labels for the y-axis and x-axis are not appearing?

As part of my code to have a panel of 4 rows by 2 columns with 8 plots, it was suggested that I use the code below as an example, but when I do so I cannot get the text on the y and x axes to appear. Please see the code below.
# This is the code to arrange the plots as 4 rows x 2 columns on the page
m <- matrix(1:8, nrow = 4, byrow = TRUE) # rbind(c(1,2,3,4), c(5,6,7,8)) would give 2 rows x 4 columns
layout(m)
par(oma = c(6, 6, 1, 1)) # outer margins: room for the overall x and y axis titles
par(mar = c(.1, .1, .8, .8)) # inner margins: how close together or far apart the plots are
### code for one of my linear regression plots in this panel (imagine I have 7 other identical replicates of it)
####ASF 356 standard curve
asf_356 <- read.table("asf356.csv", header = TRUE, sep = ",")
asf_356
# Linear Regression
fit <- lm(ct ~ count, data = asf_356)
summary(fit) # show results
predict(fit, interval = "confidence", level = 0.95) # fitted values with 95% CI
newx <- seq(min(asf_356$count), max(asf_356$count), 0.1)
a <- predict(fit, newdata=data.frame(count=newx), interval="confidence")
plot(x = asf_356$count, y = asf_356$ct, xlab="Log(10) for total ASF 356 genome copies", ylab="Cycle threshold value", xlim=c(0,10), ylim=c(0,35), lty=1, family="serif")
curve(expr=fit$coefficients[1]+fit$coefficients[2]*x,xlim=c(min(asf_356$count), max(asf_356$count)),col="black", add=TRUE, lwd=2)
lines(newx,a[,2], lty=3)
lines(newx,a[,3], lty=3)
legend(x = 0.5, y = 20, legend = c("Linear regression model", "95% confidence interval"), lty = c("solid", "dotted"), col = c("black", "black"), bty = "n")
mod.fit=summary(fit)
r2 = mod.fit$r.squared
mylabel = bquote(italic(R)^2 == .(format(r2, digits = 3)))
text(x = 8.2, y = 25, labels = mylabel)
legend(x = 7, y = 35, legend =c("y= -3.774*x + 41.21"), bty="n")
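I found a similar post, and the call I was missing was the one below. With mar = c(.1, .1, .8, .8) there is no room in the inner margins for each panel's xlab/ylab, so the titles have to go in the outer margin reserved by oma: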
title(xlab="xx", ylab="xx", outer=TRUE, line=3, family="serif")
Thanks.
Finally I have it working; thanks to everyone who helped me before as well.
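For reference, here is a minimal self-contained version of the whole 4 x 2 panel, with simulated data standing in for asf356.csv (the data, slope and intercept are made up for illustration). The per-panel axes are suppressed because the tiny inner margins leave no room for them; the shared titles go into the outer margin via title(..., outer = TRUE).

```r
set.seed(1)  # simulated data, for illustration only
layout(matrix(1:8, nrow = 4, byrow = TRUE))   # 4 rows x 2 columns
par(oma = c(6, 6, 1, 1))                      # outer margins for shared titles
par(mar = c(0.1, 0.1, 0.8, 0.8))              # tiny inner margins

for (i in 1:8) {
  count <- runif(20, min = 2, max = 9)
  ct <- 41 - 3.8 * count + rnorm(20)          # made-up linear relationship
  fit <- lm(ct ~ count)
  plot(count, ct, xlim = c(0, 10), ylim = c(0, 35),
       xaxt = "n", yaxt = "n", xlab = "", ylab = "")
  abline(fit, lwd = 2)                        # fitted regression line
}

# Shared axis titles, drawn in the outer margin
title(xlab = "Log(10) genome copies", ylab = "Cycle threshold value",
      outer = TRUE, line = 3)
```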

Plot normal, left and right skewed distribution in R

I want to create 3 plots for illustration purposes:
- normal distribution
- right skewed distribution
- left skewed distribution
This should be an easy task, but I found only this link, which only shows a normal distribution. How do I do the rest?
If you are not tied to the normal distribution, I suggest the beta distribution, which can be symmetric, right skewed or left skewed depending on its shape parameters.
hist(rbeta(10000, 5, 2)) # left skewed (shape1 > shape2)
hist(rbeta(10000, 2, 5)) # right skewed (shape1 < shape2)
hist(rbeta(10000, 5, 5)) # symmetric (shape1 == shape2)
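To confirm the direction of skew numerically rather than by eye, you can compute the sample skewness of each beta sample. The skewness() function below is a simple hand-written moment estimator (packages such as e1071 or moments provide their own versions); it is negative for left-skewed and positive for right-skewed data.

```r
set.seed(42)
# Simple moment-based sample skewness (hand-written, for illustration)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

samples <- list(left  = rbeta(10000, 5, 2),
                right = rbeta(10000, 2, 5),
                sym   = rbeta(10000, 5, 5))
sk <- sapply(samples, skewness)
round(sk, 2)

# Three histograms side by side
op <- par(mfrow = c(1, 3))
for (nm in names(samples)) hist(samples[[nm]], main = nm, xlab = "")
par(op)
```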
Finally I got it working with help from both of you, although I relied on this site.
N <- 10000
x <- rnbinom(N, 10, .5)
hist(x,
xlim=c(min(x),max(x)), probability=TRUE, nclass=max(x)-min(x)+1,
col='lightblue', xlab=' ', ylab=' ', axes=FALSE,
main='Positive Skewed')
lines(density(x,bw=1), col='red', lwd=3)
This is also a valid solution:
curve(dbeta(x,8,4),xlim=c(0,1))
title(main="posterior distrobution of p")
just use fGarch package and these functions:
dsnorm(x, mean = 0, sd = 1, xi = 1.5, log = FALSE)
psnorm(q, mean = 0, sd = 1, xi = 1.5)
qsnorm(p, mean = 0, sd = 1, xi = 1.5)
rsnorm(n, mean = 0, sd = 1, xi = 1.5)
where mean is the location parameter, sd the scale parameter and xi the skewness parameter.
Examples
library(fGarch)
## snorm -
# Random numbers:
par(mfrow = c(2, 2))
set.seed(1953)
r = rsnorm(n = 1000)
plot(r, type = "l", main = "snorm", col = "steelblue")
# Plot empirical density and compare with true density:
hist(r, n = 25, probability = TRUE, border = "white", col = "steelblue")
box()
x = seq(min(r), max(r), length = 201)
lines(x, dsnorm(x), lwd = 2)
# Plot df and compare with true df:
plot(sort(r), (1:1000/1000), main = "Probability", col = "steelblue",
ylab = "Probability")
lines(x, psnorm(x), lwd = 2)
# Compute quantiles:
round(qsnorm(psnorm(q = seq(-1, 5, by = 1))), digits = 6)
