Why abline won't show line from glm with Gamma family? - r

I have the following data, which I'm trying to model via GLM, using Gamma function. It works, except that abline won't show any line. What am I doing wrong?
y <- c(0.00904977380111,0.009174311972687,0.022573363475789,0.081632653008122,0.005571030584803,1e-04,0.02375296916921,0.004962779106823,0.013729977117333,0.00904977380111,0.004514672640982,0.016528925619835,1e-04,0.027855153258277,0.011834319585449,0.024999999936719,1e-04,0.026809651528869,0.016348773841071,1e-04,0.009345794439034,0.00457665899303,0.004705882305772,0.023201856194357,1e-04,0.033734939711656,0.014251781472007,0.004662004755245,0.009259259166667,0.056872037917387,0.018518518611111,0.014598540145986,0.009478673032951,0.023529411811211,0.004819277060357,0.018691588737881,0.018957345923721,0.005390835525461,0.056179775223141,0.016348773841071,0.01104972381185,0.010928961639344,1e-04,1e-04,0.010869565271444,0.011363636420778,0.016085790883856,0.016,0.005665722322786,0.01117318441372,0.028818443860841,1e-04,0.022988505862069,0.01010101,1e-04,0.018083182676638,0.00904977380111,0.00961538466323,0.005390835525461,0.005763688703004,1e-04,0.005571030584803,1e-04,0.014388489208633,0.005633802760722,0.005633802760722,1e-04,0.005361930241431,0.005698005811966,0.013986013986014,1e-04,1e-04)
x <- c(600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,744.47,744.47,744.47,744.47,744.47,744.47,744.47,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42)
hist(y,breaks=15)
plot(y~x)
fit <- glm(y~x,family='Gamma'(link='log'))
abline(fit)

abline plots linear functions, from a simple linear regression, say. A GLM with a Gamma family and a log link is nonlinear on the original scale. To visualize the fit of such a model, you could use predict (an example is given below). Several packages (e.g. effects or visreg) for R exist that feature functions that allow you to directly plot the fit on the original scale including confidence intervals.
Here is an example using visreg using your data and model:
library(visreg)
y <- c(0.00904977380111,0.009174311972687,0.022573363475789,0.081632653008122,0.005571030584803,1e-04,0.02375296916921,0.004962779106823,0.013729977117333,0.00904977380111,0.004514672640982,0.016528925619835,1e-04,0.027855153258277,0.011834319585449,0.024999999936719,1e-04,0.026809651528869,0.016348773841071,1e-04,0.009345794439034,0.00457665899303,0.004705882305772,0.023201856194357,1e-04,0.033734939711656,0.014251781472007,0.004662004755245,0.009259259166667,0.056872037917387,0.018518518611111,0.014598540145986,0.009478673032951,0.023529411811211,0.004819277060357,0.018691588737881,0.018957345923721,0.005390835525461,0.056179775223141,0.016348773841071,0.01104972381185,0.010928961639344,1e-04,1e-04,0.010869565271444,0.011363636420778,0.016085790883856,0.016,0.005665722322786,0.01117318441372,0.028818443860841,1e-04,0.022988505862069,0.01010101,1e-04,0.018083182676638,0.00904977380111,0.00961538466323,0.005390835525461,0.005763688703004,1e-04,0.005571030584803,1e-04,0.014388489208633,0.005633802760722,0.005633802760722,1e-04,0.005361930241431,0.005698005811966,0.013986013986014,1e-04,1e-04)
x <- c(600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,744.47,744.47,744.47,744.47,744.47,744.47,744.47,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42)
fit <- glm(y~x,family='Gamma'(link='log'))
visreg(fit, scale = "response")
An here is the example using R base graphics and predict:
pred_frame <- data.frame(
x = seq(min(x), max(x), length.out = 1000)
)
pred_frame$fit <- predict(fit, newdata = pred_frame, type = "response")
plot(y~x, pch = 16, las = 1, cex = 1.5)
lines(fit~x, data = pred_frame, col = "steelblue", lwd = 3)

You are not being consistent here since you chose to model on the log scale but you are plotting on the raw scale. Mind you many, many published plots do the same. You need to plot the points in log space or transform the coefficients and pass them to abline() explicitly.

Related

R plot confidence interval lines with a robust linear regression model (rlm)

I need to plot a Scatterplot with the confidence interval for a robust linear regression (rlm) model, all the examples I had found only work with LM.
This is my code:
model1 <- rlm(weightsE$brain ~ weightsE$body)
newx <- seq(min(weightsE$body), max(weightsE$body), length.out=70)
newx<-as.data.frame(newx)
colnames(newx)<-"brain"
conf_interval <- predict(model1, newdata = data.frame(x=newx), interval = 'confidence',
level=0.95)
#create scatterplot of values with regression line
plot(weightsE$body, weightsE$body)
abline(model1)
#add dashed lines (lty=2) for the 95% confidence interval
lines(newx, conf_interval[,2], col="blue", lty=2)
lines(newx, conf_interval[,3], col="blue", lty=2)
but the results of predict don't produce a straight line for the upper and lower level, they are more like random predictions.
You have a few problems to fix here.
When you generate a model, don't use rlm(weightsE$brain ~ weightsE$body), instead use rlm(brain ~ body, data = weightsE). Otherwise, the model cannot take new data for predictions. Any predictions you get will be produced from the original weightsE$body values, not from the new data you pass into predict
You are trying to create a prediction data frame with a column called "brain', but you are trying to predict the value of "brain", so you need a column called "body"
newx is already a data frame, but for some reason you are wrapping it inside another data frame when you do newdata = data.frame(x=newx). Just pass newx.
You are plotting with plot(weightsE$body, weightsE$body), when it should be plot(weightsE$body, weightsE$brain)
Putting all this together, and using a dummy data set with the same names as your own (see below), we get:
library(MASS)
model1 <- rlm(brain ~ body, data = weightsE)
newx <- data.frame(body = seq(min(weightsE$body),
max(weightsE$body), length.out=70))
conf_interval <- predict(model1, newdata = data.frame(x=newx),
interval = 'confidence',
level=0.95)
#create scatterplot of values with regression line
plot(weightsE$body, weightsE$brain)
abline(model1)
#add dashed lines (lty=2) for the 95% confidence interval
lines(newx$body, conf_interval[, 2], col = "blue", lty = 2)
lines(newx$body, conf_interval[, 3], col = "blue", lty = 2)
Incidentally, you could do the whole thing in ggplot in much less code:
library(ggplot2)
ggplot(weightsE, aes(body, brain)) +
geom_point() +
geom_smooth(method = MASS::rlm)
Reproducible dummy data
data(mtcars)
weightsE <- setNames(mtcars[c(1, 6)], c("brain", "body"))
weightsE$body <- 10 - weightsE$body

adding knn fit to plot in R

~Beginner in R~
I have the following code for a data set that has variables: price, mileage, and color. I have plotted a basic plot of x=mileage and y=price, and fitted a linear regression line to the plot.
cd = read.csv("https://bitbucket.org/remcc/rob-data-sets/downloads/susedcars.csv")
cd = cd[,c('price','mileage','color')]
n = nrow(cd)
set.seed(99)
pin = .75 #percent train (or percent in-sample)
ii = sample(1:n,floor(pin*n))
cdtr = cd[ii,]
cdte = cd[-ii,]
dim(cd)
plot(cd$mileage, cd$price, xlab="Mileage", ylab="Price", pch=16,cex=.8)
abline(lm(cd$price ~ cd$mileage), col="red", lwd=2)
## FITTING KNN
pred_knn=knn(data.frame(cdtr$mileage), data.frame(cdte$mileage), cl=cdtr$price, k=50)
I am trying to fit a line using pred_knn to the plot so that my plot looks like this:
However, I am not sure how to go about adding the kNN fit to my plot
As dcarlson mentions in the comments, it looks like you're using class::knn to predict a continuous variable, when it's meant to be used for classification (i.e. categorical responses).
The FNN package allows for kNN regression. Does this help:
library(FNN)
pred_knn = FNN::knn.reg(train = cdtr[2], test = cdte[2], y = cdtr[1], k = 50)
plot(cdte$mileage,cdte$price,, xlab="Mileage", ylab="Price", pch=16,cex=.8))
ORDER = order(cdte$mileage)
lines(cdte$mileage[ORDER],pred_knn$pred[ORDER])

Calculating piecewise quantile linear regression with segmented package R

I am looking for a way to obtain the piecewise quantile linear regression with R. I have been able to compute the Quantile regression with the package quantreg. However, I don't want just 1 unique slope but want to check for breakpoints in my dataset. I have seen that the segmented package can do so. While it works good if the fit is carried out with lm or glm (as shown below in an example), it doesn't manage to work for quantile.
On the segmented package info I have read that there is a segmented.default which can be used for specific regression models, such as Quantiles. However, when I apply it for my quantile outcome it gives me the following errors:
Error in diag(vv) : invalid 'nrow' value (too large or NA)
In addition: Warning message:
cannot compute the covariance matrix
If instead of using K=2 I use for example psi I get other type of errors:
Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix
I have created an example with the mtcars data so you can see the errors that I get.
library(quantreg)
library(segmented)
data(mtcars)
out.rq <- rq(mpg ~ wt, data= mtcars)
out.lm <- lm(mpg ~ wt, data= mtcars)
# Plotting the results
plot(mpg ~ wt, data = mtcars, pch = 1, main = "mpg ~ wt")
abline(out.lm, col = "red", lty = 2)
abline(out.rq, col = "blue", lty = 2)
legend("topright", legend = c("linear", "quantile"), col = c("red", "blue"), lty = 2)
#Generating segmented LM
o <- segmented(out.lm, seg.Z= ~wt, npsi=2, control=seg.control(display=FALSE))
plot(o, lwd=2, col=2:6, main="Segmented regression", res=FALSE) #lwd: line width #col: from 2 to 6 #RES: show datapoints
#Generating segmented Quantile
#using K=2
o.quantile <- segmented.default(out.rq, seg.Z= ~wt, control=seg.control(display=FALSE, K=2))
# using psi
o.quantile <- segmented.default(out.rq, seg.Z= ~wt, psi=list(wt=c(2,4)), control=seg.control(display=FALSE))
I came across this post after a long time because I have the same issue. Just in case others might be stuck with the problem in the future, I wanted to point out what the problem is.
I examined "segmented.default". There is a line in the source code as follows:
Cov <- try(vcov(objF), silent = TRUE)
vcov is used to calculate the covariance matrix but does not work for quantile regression object objF. To get the covariance matrix for quantile regression, you need:
summary(objF,se="boot",cov=TRUE)$cov
Here, I used bootstrap method to compute the covariance matrix by selecting se="boot" but you should choose the appropriate method for you. Check ?summary.rq then "se" section for different methods.
Additionally, you need to assign the row/column names as follows:
dimnames(Cov)[[1]] <- dimnames(Cov)[[2]] <- unlist(attributes(objF$coef))
After modifying the function, it worked for me.
Maybe the other answer isn't particularly clean, as you need to modify a package function.
Additionally, maybe boot isn't such a good idea for SEs, according to this answer.
To get it working a bit easier, add a function to your workspace:
vcov.rq <- function(object, ...) {
result = summary(object, se = "nid", covariance = TRUE)$cov
rownames(result) = colnames(result) = names(coef(object))
return(result)
}
Caveats from the Cross-Validated link apply.

Plotting a 95% confidence interval for a lm object

How can I calculate and plot a confidence interval for my regression in r? So far I have two numerical vectors of equal length (x,y) and a regression object(lm.out). I have made a scatterplot of y given x and added the regression line to this plot. I am looking for a way to add a 95% prediction confidence band for lm.out to the plot. I've tried using the predict function, but I don't even know where to start with that :/. Here is my code at the moment:
x=c(1,2,3,4,5,6,7,8,9,0)
y=c(13,28,43,35,96,84,101,110,108,13)
lm.out <- lm(y ~ x)
plot(x,y)
regression.data = summary(lm.out) #save regression summary as variable
names(regression.data) #get names so we can index this data
a= regression.data$coefficients["(Intercept)","Estimate"] #grab values
b= regression.data$coefficients["x","Estimate"]
abline(a,b) #add the regression line
Thank you!
Edit: I've taken a look at the proposed duplicate and can't quite get to the bottom of it.
You have yo use predict for a new vector of data, here newx.
x=c(1,2,3,4,5,6,7,8,9,0)
y=c(13,28,43,35,96,84,101,110,108,13)
lm.out <- lm(y ~ x)
newx = seq(min(x),max(x),by = 0.05)
conf_interval <- predict(lm.out, newdata=data.frame(x=newx), interval="confidence",
level = 0.95)
plot(x, y, xlab="x", ylab="y", main="Regression")
abline(lm.out, col="lightblue")
lines(newx, conf_interval[,2], col="blue", lty=2)
lines(newx, conf_interval[,3], col="blue", lty=2)
EDIT
as it is mention in the coments by Ben this can be done with matlines as follow:
plot(x, y, xlab="x", ylab="y", main="Regression")
abline(lm.out, col="lightblue")
matlines(newx, conf_interval[,2:3], col = "blue", lty=2)
I'm going to add a tip that would have saved me a lot of frustration when trying the method given by #Alejandro Andrade: If your data are in a data frame, then when you build your model with lm(), use the data= argument rather than $ notation. E.g., use
lm.out <- lm(y ~ x, data = mydata)
rather than
lm.out <- lm(mydata$y ~ mydata$x)
If you do the latter, then this statement
predict(lm.out, newdata=data.frame(x=newx), interval="confidence", level = 0.95)
seems to either ignore the new values passed using newdata= or there's a silent error. Either way, the output is the predictions from the original data, not the new data.
Also, be sure your x variable gets the same name in the new data frame that it had
in the original. That's easier to figure out because you do get an error, but knowing it ahead of time might save you a round of debugging.
Note: Tried to add this as a comment, but don't have enough reputation points.

get equation of linear SVM regression line

I'm struggling to find a way to get the equation for a linear SVM model in the regression case, since most of the questions deal with classification... I have fit it with caret package.
1- univariate case
set.seed(1)
fit=train(mpg~hp, data=mtcars, method="svmLinear")
plot(x=mtcars$hp, y=predict(fit, mtcars), pch=15)
points(x=mtcars$hp, y=mtcars$mpg, col="red")
abline(lm(mpg~hp, mtcars), col="blue")
Which gives the plot with red=actual, black=fitted, and blue line is classic regression. In this case I know I could manually calculate the SVM prediction line from 2 points, but is there a way to get directly the equation from the model structure? I actually need the equation like this y=a+bx (here mpg=?+?*hp ) with values in the original scale.
2-multivariate
same question but with 2 dependent variables (mpg~hp+wt)
Thanks,
Yes, I believe there is. Take a look at this answer, which is similar, but does not use the caret library. If you add svp = fit$finalModelto the example, you should be able to follow it almost exactly. I applied a similar technique to your data below. I scaled the data to fit nicely on the plot of the vectors since the library scales the data at runtime.
require(caret)
set.seed(1)
x = model.matrix(data=mtcars, mpg ~ scale(hp)) #set up data
y = mtcars$mpg
fit=train(x, y, method="svmLinear") #train
svp = fit$finalModel #extract s4 model object
plot(x, xlab="", ylab="")
w <- colSums(coef(svp)[[1]] * x[unlist(alphaindex(svp)),])
b <- b(svp)
abline(b/w[1],-w[2]/w[1], col='red')
abline((b+1)/w[1],-w[2]/w[1],lty=2, col='red')
abline((b-1)/w[1],-w[2]/w[1],lty=2, col='red')
And your second question:
x = model.matrix(data=mtcars, mpg ~ scale(hp) + scale(wt) - 1) #set up data
fit=train(x, y, method="svmLinear") #train
svp = fit$finalModel #extract s4 model object
plot(x, xlab="", ylab="")
w <- colSums(coef(svp)[[1]] * x[unlist(alphaindex(svp)),])
b <- b(svp)
abline(b/w[1],-w[2]/w[1], col='red')
abline((b+1)/w[1],-w[2]/w[1],lty=2, col='red')
abline((b-1)/w[1],-w[2]/w[1],lty=2, col='red')
Edit
The above answer concerns plotting a boundary, not the linear SVM regression line. To answer the question, one easy way to get the line is to extract the predicted values and plot the regression. You actually only need a couple of points to get the line, but for simplicity, I used the following code.
abline(lm(predict(fit, newdata=mtcars) ~ mtcars$hp), col='green')
or
abline(lm(predict(fit) ~ mtcars$hp), col='green')

Resources