I have the following data, which I'm trying to model with a GLM using the Gamma family. The model fits, but abline() doesn't draw any line. What am I doing wrong?
y <- c(0.00904977380111,0.009174311972687,0.022573363475789,0.081632653008122,0.005571030584803,1e-04,0.02375296916921,0.004962779106823,0.013729977117333,0.00904977380111,0.004514672640982,0.016528925619835,1e-04,0.027855153258277,0.011834319585449,0.024999999936719,1e-04,0.026809651528869,0.016348773841071,1e-04,0.009345794439034,0.00457665899303,0.004705882305772,0.023201856194357,1e-04,0.033734939711656,0.014251781472007,0.004662004755245,0.009259259166667,0.056872037917387,0.018518518611111,0.014598540145986,0.009478673032951,0.023529411811211,0.004819277060357,0.018691588737881,0.018957345923721,0.005390835525461,0.056179775223141,0.016348773841071,0.01104972381185,0.010928961639344,1e-04,1e-04,0.010869565271444,0.011363636420778,0.016085790883856,0.016,0.005665722322786,0.01117318441372,0.028818443860841,1e-04,0.022988505862069,0.01010101,1e-04,0.018083182676638,0.00904977380111,0.00961538466323,0.005390835525461,0.005763688703004,1e-04,0.005571030584803,1e-04,0.014388489208633,0.005633802760722,0.005633802760722,1e-04,0.005361930241431,0.005698005811966,0.013986013986014,1e-04,1e-04)
x <- c(600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,744.47,744.47,744.47,744.47,744.47,744.47,744.47,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42)
hist(y,breaks=15)
plot(y~x)
fit <- glm(y~x,family='Gamma'(link='log'))
abline(fit)
abline() draws straight lines, such as the fit from a simple linear regression. A GLM with a Gamma family and a log link is nonlinear on the original scale, so its fit is a curve there. To visualize the fit of such a model, you can use predict() (an example is given below). Several R packages (e.g. effects or visreg) provide functions that plot the fit directly on the original scale, including confidence intervals.
Here is an example using visreg with your data and model:
library(visreg)
y <- c(0.00904977380111,0.009174311972687,0.022573363475789,0.081632653008122,0.005571030584803,1e-04,0.02375296916921,0.004962779106823,0.013729977117333,0.00904977380111,0.004514672640982,0.016528925619835,1e-04,0.027855153258277,0.011834319585449,0.024999999936719,1e-04,0.026809651528869,0.016348773841071,1e-04,0.009345794439034,0.00457665899303,0.004705882305772,0.023201856194357,1e-04,0.033734939711656,0.014251781472007,0.004662004755245,0.009259259166667,0.056872037917387,0.018518518611111,0.014598540145986,0.009478673032951,0.023529411811211,0.004819277060357,0.018691588737881,0.018957345923721,0.005390835525461,0.056179775223141,0.016348773841071,0.01104972381185,0.010928961639344,1e-04,1e-04,0.010869565271444,0.011363636420778,0.016085790883856,0.016,0.005665722322786,0.01117318441372,0.028818443860841,1e-04,0.022988505862069,0.01010101,1e-04,0.018083182676638,0.00904977380111,0.00961538466323,0.005390835525461,0.005763688703004,1e-04,0.005571030584803,1e-04,0.014388489208633,0.005633802760722,0.005633802760722,1e-04,0.005361930241431,0.005698005811966,0.013986013986014,1e-04,1e-04)
x <- c(600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,744.47,744.47,744.47,744.47,744.47,744.47,744.47,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42)
fit <- glm(y~x,family='Gamma'(link='log'))
visreg(fit, scale = "response")
And here is the example using base R graphics and predict():
pred_frame <- data.frame(
x = seq(min(x), max(x), length.out = 1000)
)
pred_frame$fit <- predict(fit, newdata = pred_frame, type = "response")
plot(y~x, pch = 16, las = 1, cex = 1.5)
lines(fit~x, data = pred_frame, col = "steelblue", lwd = 3)
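If you also want a confidence band with base graphics, one common approach is to predict on the link (log) scale with standard errors and back-transform the limits. The following is a minimal sketch under stated assumptions: the data are simulated stand-ins (not the data from the question), the column names lwr and upr are my own, and the band uses a normal approximation with qnorm().

```r
# Simulated stand-in data (an assumption; substitute your own x and y)
set.seed(1)
x <- rep(c(600, 630.42, 744.47, 3500), times = c(20, 9, 7, 36))
y <- rgamma(length(x), shape = 1.5, rate = 1.5 / exp(-4 - 1e-04 * x))

fit <- glm(y ~ x, family = Gamma(link = "log"))

pred_frame <- data.frame(x = seq(min(x), max(x), length.out = 200))

# Predict on the link (log) scale and ask for standard errors
link_pred <- predict(fit, newdata = pred_frame, type = "link", se.fit = TRUE)

# Approximate 95% limits on the link scale, then back-transform with exp()
crit <- qnorm(0.975)
pred_frame$fit <- exp(link_pred$fit)
pred_frame$lwr <- exp(link_pred$fit - crit * link_pred$se.fit)
pred_frame$upr <- exp(link_pred$fit + crit * link_pred$se.fit)

plot(y ~ x, pch = 16, las = 1)
lines(fit ~ x, data = pred_frame, col = "steelblue", lwd = 3)
lines(lwr ~ x, data = pred_frame, col = "steelblue", lty = 2)
lines(upr ~ x, data = pred_frame, col = "steelblue", lty = 2)
```

Building the interval on the link scale keeps both limits positive after back-transformation, which a symmetric interval on the response scale would not guarantee.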
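You are not being consistent here: you chose to model on the log scale, but you are plotting on the raw scale. Mind you, many published plots do the same. abline(fit) does draw a line, but it uses the coefficients on the log scale, so the line lies far below the plotted range of y. You need to either plot the points in log space, where abline() can draw the fitted line, or back-transform the fitted values and draw them as a curve with lines(), since the fit is not a straight line on the raw scale.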
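A minimal sketch of the first option, plotting in log space, might look like this. The data are simulated stand-ins (an assumption, not the data from the question); the point is only that the model's coefficients describe a straight line once the response axis is log(y).

```r
# Simulated stand-in data (an assumption; substitute your own x and y)
set.seed(1)
x <- rep(c(600, 630.42, 744.47, 3500), times = c(20, 9, 7, 36))
y <- rgamma(length(x), shape = 1.5, rate = 1.5 / exp(-4 - 1e-04 * x))

fit <- glm(y ~ x, family = Gamma(link = "log"))

# Plot the response on the log scale, where the linear predictor lives
plot(log(y) ~ x, pch = 16, las = 1)

# Intercept and slope of the linear predictor, passed to abline() explicitly
abline(coef = coef(fit), col = "steelblue", lwd = 3)
```

Note that the line is the log of the fitted mean, not the mean of log(y), so it will tend to sit somewhat above the centre of the log-transformed points.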
How can I calculate and plot a confidence interval for my regression in R? So far I have two numerical vectors of equal length (x, y) and a regression object (lm.out). I have made a scatterplot of y given x and added the regression line to this plot. I am looking for a way to add a 95% prediction confidence band for lm.out to the plot. I've tried using the predict function, but I don't even know where to start with that :/. Here is my code at the moment:
x=c(1,2,3,4,5,6,7,8,9,0)
y=c(13,28,43,35,96,84,101,110,108,13)
lm.out <- lm(y ~ x)
plot(x,y)
regression.data = summary(lm.out) #save regression summary as variable
names(regression.data) #get names so we can index this data
a= regression.data$coefficients["(Intercept)","Estimate"] #grab values
b= regression.data$coefficients["x","Estimate"]
abline(a,b) #add the regression line
Thank you!
Edit: I've taken a look at the proposed duplicate and can't quite get to the bottom of it.
You have to use predict() on a new vector of data, here newx.
x=c(1,2,3,4,5,6,7,8,9,0)
y=c(13,28,43,35,96,84,101,110,108,13)
lm.out <- lm(y ~ x)
newx = seq(min(x),max(x),by = 0.05)
conf_interval <- predict(lm.out, newdata=data.frame(x=newx), interval="confidence",
level = 0.95)
plot(x, y, xlab="x", ylab="y", main="Regression")
abline(lm.out, col="lightblue")
lines(newx, conf_interval[,2], col="blue", lty=2)
lines(newx, conf_interval[,3], col="blue", lty=2)
EDIT
As Ben mentions in the comments, this can also be done with matlines as follows:
plot(x, y, xlab="x", ylab="y", main="Regression")
abline(lm.out, col="lightblue")
matlines(newx, conf_interval[,2:3], col = "blue", lty=2)
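The question asks for a "prediction confidence band", which could mean either of two things. A confidence band describes uncertainty in the fitted mean, while a prediction band describes where new observations are expected to fall, and predict() supports both through its interval argument. The sketch below draws both for comparison; the object names conf_band and pred_band are my own.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 0)
y <- c(13, 28, 43, 35, 96, 84, 101, 110, 108, 13)
lm.out <- lm(y ~ x)

newx <- seq(min(x), max(x), by = 0.05)
new_data <- data.frame(x = newx)

# Band for the mean response
conf_band <- predict(lm.out, newdata = new_data,
                     interval = "confidence", level = 0.95)
# Band for individual new observations (always wider)
pred_band <- predict(lm.out, newdata = new_data,
                     interval = "prediction", level = 0.95)

# Widen the y axis so the prediction band is not clipped
plot(x, y, xlab = "x", ylab = "y", main = "Regression",
     ylim = range(pred_band, y))
abline(lm.out, col = "lightblue")
matlines(newx, conf_band[, c("lwr", "upr")], col = "blue", lty = 2)
matlines(newx, pred_band[, c("lwr", "upr")], col = "red", lty = 3)
```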
I'm going to add a tip that would have saved me a lot of frustration when trying the method given by @Alejandro Andrade: if your data are in a data frame, then when you build your model with lm(), use the data= argument rather than $ notation. E.g., use
lm.out <- lm(y ~ x, data = mydata)
rather than
lm.out <- lm(mydata$y ~ mydata$x)
If you do the latter, then this statement
predict(lm.out, newdata=data.frame(x=newx), interval="confidence", level = 0.95)
does not use the new values passed via newdata=. Because the formula refers to mydata$x, predict() looks that variable up in the original data frame rather than in newdata, so the output is the predictions for the original data, not the new data. R only issues a warning about this when the number of rows differs.
Also, be sure your x variable gets the same name in the new data frame that it had in the original. That's easier to figure out because you do get an error, but knowing it ahead of time might save you a round of debugging.
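The difference is easy to check by comparing the number of predictions returned. This is a small sketch using the data from the question placed in a data frame called mydata (the names good, bad, p_good and p_bad are my own):

```r
mydata <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 0),
  y = c(13, 28, 43, 35, 96, 84, 101, 110, 108, 13)
)
newx <- seq(0, 9, by = 0.5)

good <- lm(y ~ x, data = mydata)   # variables are named x and y
bad  <- lm(mydata$y ~ mydata$x)    # variables are named mydata$y and mydata$x

p_good <- predict(good, newdata = data.frame(x = newx))

# The warning about mismatched row counts is suppressed here only to keep
# the demonstration quiet; in normal use it is the hint that something is off
p_bad <- suppressWarnings(predict(bad, newdata = data.frame(x = newx)))

length(p_good)  # one prediction per element of newx
length(p_bad)   # one prediction per row of mydata; newx was not used
```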
Note: Tried to add this as a comment, but don't have enough reputation points.
When applying gam.check in the mgcv package, R produces some residual plots and basis dimension output. Is there a way to only produce the plots and not the printed output?
library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat)
plot(b, pages=1)
gam.check(b, pch=19, cex=.3)
There are four plots. From the top left, moving down and then across, we have:
A QQ plot of the residuals
A histogram of the residuals
A plot of residuals vs the linear predictor
A plot of observed vs fitted values.
In the code below, I assume b contains your fitted model, as per your example. First, some things we need:
type <- "deviance" ## "pearson" & "response" are other valid choices
resid <- residuals(b, type = type)
linpred <- napredict(b$na.action, b$linear.predictors)
observed.y <- napredict(b$na.action, b$y)
Note the last two lines are applying the NA handling method used when the model was fitted to the information on the linear.predictors and y, the stored copy of the response data.
The code above and the code shown below are all taken from the first 10 or so lines of the gam.check() source. To view this, just enter
gam.check
at the R prompt.
Each plot is produced as follows:
QQ plot
This is produced via qq.gam():
qq.gam(b, rep = 0, level = 0.9, type = type, rl.col = 2,
rep.col = "gray80")
Histogram of residuals
This is produced using
hist(resid, xlab = "Residuals", main = "Histogram of residuals")
Residuals vs linear predictor
This is produced using
plot(linpred, resid, main = "Resids vs. linear pred.",
xlab = "linear predictor", ylab = "residuals")
Observed vs fitted values
This is produced using
plot(fitted(b), observed.y, xlab = "Fitted Values",
ylab = "Response", main = "Response vs. Fitted Values")
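Putting the pieces together, the four plots can be drawn on one page without calling gam.check() at all, so nothing is printed to the console. This is a sketch rather than a drop-in replacement: it uses par(mfcol = c(2, 2)) to fill the page column by column, which is my own layout choice to match the order listed above and may not be identical to how gam.check() arranges its panels.

```r
library(mgcv)

set.seed(0)
dat <- gamSim(1, n = 200)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

type <- "deviance"
resid <- residuals(b, type = type)
linpred <- napredict(b$na.action, b$linear.predictors)
observed.y <- napredict(b$na.action, b$y)

# Fill a 2 x 2 page column by column; restore the old settings afterwards
old_par <- par(mfcol = c(2, 2))

qq.gam(b, rep = 0, level = 0.9, type = type, rl.col = 2,
       rep.col = "gray80")
hist(resid, xlab = "Residuals", main = "Histogram of residuals")
plot(linpred, resid, main = "Resids vs. linear pred.",
     xlab = "linear predictor", ylab = "residuals")
plot(fitted(b), observed.y, xlab = "Fitted Values",
     ylab = "Response", main = "Response vs. Fitted Values")

par(old_par)
```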
The packages gratia and mgcViz now provide functions that produce the gam.check plots as ggplots, which you can store as an object. The former doesn't print anything to the console; the latter does.
require(gratia)
appraise(b)
require(mgcViz)
b = getViz(b)
check(b)