Calculating piecewise quantile linear regression with the segmented package in R

I am looking for a way to fit a piecewise (segmented) quantile linear regression in R. I can compute a quantile regression with the quantreg package, but I don't want just one overall slope; I want to check for breakpoints in my dataset. The segmented package is supposed to do this, and while it works well when the fit comes from lm or glm (as shown in the example below), it fails for quantile regression.
The segmented package documentation says there is a segmented.default that can be used with other regression models, such as quantile regression. However, when I apply it to my quantile fit, it gives me the following errors:
Error in diag(vv) : invalid 'nrow' value (too large or NA)
In addition: Warning message:
cannot compute the covariance matrix
If, instead of K=2, I supply psi, I get a different error:
Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix
I have created an example with the mtcars data so you can see the errors that I get.
library(quantreg)
library(segmented)
data(mtcars)
out.rq <- rq(mpg ~ wt, data = mtcars)
out.lm <- lm(mpg ~ wt, data = mtcars)
# Plotting the results
plot(mpg ~ wt, data = mtcars, pch = 1, main = "mpg ~ wt")
abline(out.lm, col = "red", lty = 2)
abline(out.rq, col = "blue", lty = 2)
legend("topright", legend = c("linear", "quantile"), col = c("red", "blue"), lty = 2)
# Generating segmented LM
o <- segmented(out.lm, seg.Z = ~wt, npsi = 2, control = seg.control(display = FALSE))
plot(o, lwd = 2, col = 2:6, main = "Segmented regression", res = FALSE)  # res: show data points
# Generating segmented quantile regression
# using K = 2
o.quantile <- segmented.default(out.rq, seg.Z = ~wt, control = seg.control(display = FALSE, K = 2))
# using psi
o.quantile <- segmented.default(out.rq, seg.Z = ~wt, psi = list(wt = c(2, 4)), control = seg.control(display = FALSE))

I came across this post a long time later because I had the same issue. In case others get stuck on this problem in the future, I want to point out what the cause is.
I examined "segmented.default". There is a line in the source code as follows:
Cov <- try(vcov(objF), silent = TRUE)
vcov is used to compute the covariance matrix, but it does not work on the quantile regression object objF. To get the covariance matrix for a quantile regression, you need:
summary(objF, se = "boot", covariance = TRUE)$cov
Here I used the bootstrap method to compute the covariance matrix by selecting se = "boot", but you should choose the method that is appropriate for your case; see the "se" section of ?summary.rq for the available methods.
Additionally, you need to assign the row/column names as follows:
dimnames(Cov)[[1]] <- dimnames(Cov)[[2]] <- unlist(attributes(objF$coef))
After modifying the function, it worked for me.

The other answer isn't particularly clean, though, as you need to modify a package function. Additionally, se = "boot" may not be such a good idea for the standard errors, according to this answer on Cross Validated. An easier way to get it working is to add a vcov method for rq objects to your workspace:
# vcov() method for "rq" objects, so that vcov() calls inside
# segmented.default() succeed; se = "nid" avoids the bootstrap
vcov.rq <- function(object, ...) {
  result <- summary(object, se = "nid", covariance = TRUE)$cov
  rownames(result) <- colnames(result) <- names(coef(object))
  return(result)
}
Caveats from the Cross-Validated link apply.
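With vcov.rq defined, the segmented.default call from the question should be able to retrieve the covariance matrix. A minimal sketch (the starting breakpoint wt = 3 is just an illustrative guess):
library(quantreg)
library(segmented)
out.rq <- rq(mpg ~ wt, data = mtcars)
# segmented.default() now finds vcov.rq() via S3 dispatch
o.quantile <- segmented.default(out.rq, seg.Z = ~wt, psi = list(wt = 3))
summary(o.quantile)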

Stargazer custom confidence intervals with multiple models

Stargazer exponentiates the 'wrong' confidence intervals because it uses the normal distribution instead of the t-distribution, so one has to use custom confidence intervals (Stargazer Confidence Interval Incorrect?).
But how does one do this with multiple models?
model1 <- glm(vs ~ mpg + hp, data = mtcars, family = 'binomial')
model2 <- glm(vs ~ mpg + disp, data = mtcars, family = 'binomial')
library(stargazer)
stargazer(model1,
          apply.coef = exp,
          digits = 3,
          ci = T,
          t.auto = F,
          type = "text",
          ci.custom = list(exp(confint(model1))))
This works as intended. But when I add
ci.custom = list(exp(confint(model1, model2))))
then I'll get
Error in Pnames[which] : invalid subscript type 'list'
I tried with c() but to no avail.
The documentation says
a list of two-column numeric matrices ...
so
cc <- lapply(list(model1, model2), function(x) exp(confint(x)))
stargazer(model1, model2,
          ...,
          ci.custom = cc)
should work. (cc <- list(exp(confint(model1)), exp(confint(model2))) also works, and is a little more explicit, but won't generalize as well ...)
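Putting it together with the options from the original call, the full sketch would be:
cc <- lapply(list(model1, model2), function(x) exp(confint(x)))
stargazer(model1, model2,
          apply.coef = exp,
          digits = 3,
          ci = TRUE,
          t.auto = FALSE,
          type = "text",
          ci.custom = cc)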
For what it's worth, for generalized linear models the difference between the default CIs and those provided by confint() is not a Normal-vs-Student-t distinction (unlike the linked answer about linear models); it is the difference between Wald and profile likelihood confidence intervals. (There is some theory for finite-size corrections in GLMs, called Bartlett corrections, but they're not easy to compute or widely available.)
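To see the distinction directly, compare the two interval types on one of the models (confint.default() gives Wald intervals, while confint() on a glm profiles the likelihood):
confint.default(model1)  # Wald
confint(model1)          # profile likelihood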

transforming variables to use for prediction in regression model

I have the following dataset. I wish to find the best variables among pred1, pred2, pred3, pred5, and pred6 to build a regression model predicting resp1 and resp2.
So far, I have only found that pred2 seems like the one to use for resp1.
library(ggplot2)
testdat <- read.csv("testdat.csv", header = T)
plot(testdat$pred2, testdat$resp1)
m1 <- lm(resp1 ~ pred2, data = testdat)
smooth <- smooth.spline(testdat$pred2, testdat$resp1, spar = 1)
lines(smooth, col = 'red', lwd = 2, lty = 2)
I tried transformations (^2, sqrt, log), but whatever I try, when I check summary(m1) the R-squared is never higher than 53%. I'm getting desperate.
The same goes for transforming variables to predict resp2: no higher than 66%.
plot(testdat$pred3, testdat$resp2)
m1 <- lm(resp2 ~ pred3, data = testdat)
smooth <- smooth.spline(testdat$pred3, testdat$resp2, spar = 1)
lines(smooth, col = 'red', lwd = 2, lty = 2)
Sample dataset:
https://www.filehosting.org/file/details/846977/testdat.csv
Using more than one variable is definitely on the table: if you were trying to build a regression model predicting resp1 and then resp2 from one or more of pred1, pred2, pred3, pred5, and pred6, what would you do?
If you are looking to use linear regression, you could perform a RESET test on your data and work from there:
library(lmtest)
resettest(resp1 ~ pred2, data = testdat, power = 2:3, type = "regressor")
If the p-value is lower than 0.05, the functional form is misspecified and you have to change the model.
And use adjusted R2 instead of R2 to make comparisons between models.
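For example, a quick adjusted-R-squared comparison might look like this (the multi-predictor combination is just an illustration, not a recommendation):
m_single <- lm(resp1 ~ pred2, data = testdat)
m_multi <- lm(resp1 ~ pred2 + pred3 + pred5, data = testdat)
summary(m_single)$adj.r.squared
summary(m_multi)$adj.r.squared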
Otherwise, more information regarding your data might be necessary (e.g. what is its nature?)

How to plot multiple glmer models into one single plot?

I have two glmer models with two covariates each that I'm trying to plot into a single figure.
MWE:
## generalized linear mixed model
library(lme4)    # glmer() and the cbpp data
library(sjPlot)  # plot_model()
cbpp$response <- sample(c(0, 1), replace = TRUE, size = nrow(cbpp))
gm1 <- glmer(response ~ size + incidence + (1 | herd),
             data = cbpp, family = binomial)
cbpp$obs <- 1:nrow(cbpp)
gm2 <- glmer(response ~ size + incidence + (1 | herd) + (1 | obs),
             family = binomial, data = cbpp)
I am trying to plot the predicted values against each covariate for each model. I found the sjPlot library and its plot_model function, which can plot these predictions when using type = "pred". Calling this function individually on each model works perfectly and yields a separate figure for each model.
However, I'm not familiar with R and I am having a hard time trying to get the 4 plots into the same figure.
The plot_model function has a grid parameter, which only works for models with a Poisson distribution. For gm1 and gm2, I get the following error when I call plot_model(gm1, type = "pred", grid = TRUE):
Error in if (attr(x, "logistic", exact = TRUE) == "1" && attr(x, "is.trial", : missing value where TRUE/FALSE needed
In any case, that would not let me plot the two models in one figure, so I tried three different approaches. First, I saw the plot_models function, which takes multiple models as input. When I pass the two models as arguments, calling plot_models(gm1, gm2), I get the following error:
Error: $ operator not defined for this S4 class
Second, I tried using the par function, setting mfrow and then calling plot_model again, without success. I don't get any error, but the plots keep showing up as individual figures.
Third, I tried using the gridExtra library. Calling
p1 <- plot_model(gm1, type = "pred")
p2 <- plot_model(gm2, type = "pred")
grid.arrange(p1, p2)
results in the following error:
Error in gList(list(ppt = list(data = list(x = c(-2, -1, 0, 1, 2, 3, 4, : only 'grobs' allowed in "gList"
Does anyone have an insight on this?
EDIT
This seems to work:
pp1 <- plot_model(gm1, type = "pred")
pp2 <- plot_model(gm2, type = "pred")
plot_grid(c(pp1, pp2))
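This works because plot_model(..., type = "pred") returns a list of ggplot objects (one per covariate), and sjPlot's plot_grid() accepts such a list; c(pp1, pp2) concatenates the two lists into four plots. For comparison, a sketch of the gridExtra route, which needs the plots passed through the grobs argument rather than individually:
library(gridExtra)
grid.arrange(grobs = c(pp1, pp2), ncol = 2)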

Why won't abline show the line from a glm with a Gamma family?

I have the following data, which I'm trying to model via a GLM with a Gamma family. It works, except that abline won't show any line. What am I doing wrong?
y <- c(0.00904977380111,0.009174311972687,0.022573363475789,0.081632653008122,0.005571030584803,1e-04,0.02375296916921,0.004962779106823,0.013729977117333,0.00904977380111,0.004514672640982,0.016528925619835,1e-04,0.027855153258277,0.011834319585449,0.024999999936719,1e-04,0.026809651528869,0.016348773841071,1e-04,0.009345794439034,0.00457665899303,0.004705882305772,0.023201856194357,1e-04,0.033734939711656,0.014251781472007,0.004662004755245,0.009259259166667,0.056872037917387,0.018518518611111,0.014598540145986,0.009478673032951,0.023529411811211,0.004819277060357,0.018691588737881,0.018957345923721,0.005390835525461,0.056179775223141,0.016348773841071,0.01104972381185,0.010928961639344,1e-04,1e-04,0.010869565271444,0.011363636420778,0.016085790883856,0.016,0.005665722322786,0.01117318441372,0.028818443860841,1e-04,0.022988505862069,0.01010101,1e-04,0.018083182676638,0.00904977380111,0.00961538466323,0.005390835525461,0.005763688703004,1e-04,0.005571030584803,1e-04,0.014388489208633,0.005633802760722,0.005633802760722,1e-04,0.005361930241431,0.005698005811966,0.013986013986014,1e-04,1e-04)
x <- c(600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,600,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,3500,744.47,744.47,744.47,744.47,744.47,744.47,744.47,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42,630.42)
hist(y, breaks = 15)
plot(y ~ x)
fit <- glm(y ~ x, family = Gamma(link = 'log'))
abline(fit)
abline plots straight lines, e.g. from a simple linear regression. A GLM with a Gamma family and a log link is nonlinear on the original scale. To visualize the fit of such a model, you can use predict (an example is given below). Several R packages (e.g. effects or visreg) have functions that let you directly plot the fit on the original scale, including confidence intervals.
Here is an example using visreg with your data and model:
library(visreg)
# y, x, and fit as defined in the question above
visreg(fit, scale = "response")
And here is the example using base R graphics and predict:
pred_frame <- data.frame(x = seq(min(x), max(x), length.out = 1000))
pred_frame$fit <- predict(fit, newdata = pred_frame, type = "response")
plot(y ~ x, pch = 16, las = 1, cex = 1.5)
lines(fit ~ x, data = pred_frame, col = "steelblue", lwd = 3)
You are not being consistent here: you chose to model on the log scale, but you are plotting on the raw scale. (Mind you, many published plots do the same.) You need to either plot the points in log space or transform the coefficients and pass them to abline() explicitly, as sketched below.
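A sketch of the first option, assuming y, x, and fit from the question: on the log scale the fitted relationship is a straight line, so abline works again.
plot(log(y) ~ x, las = 1)
abline(coef(fit)[1], coef(fit)[2], col = "steelblue")  # coefficients are already on the log scale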

predicted values with MuMIn throwing error when full = FALSE

I am running into an error when I try to compute the predicted values from a model-averaged object using the MuMIn package's predict.averaging. I have been assured that when the full argument is set to FALSE, the function should return predicted values based on the conditional (subset) average coefficients. However, it returns an error. See the example below using the mtcars dataset; it is very similar to my actual setup.
library(MuMIn)
options(na.action = "na.fail")
global.model <- glm(mpg ~ hp + drat + wt, data = mtcars)
dr <- dredge(global.model)
mod.avg <- model.avg(dr, subset = delta < 2, fit = T)
summary(mod.avg)
predict(mod.avg, se.fit = TRUE, full = FALSE)
The error indicates that full is ignored, meaning that the full-model coefficients are used for the predicted values (not what I want). I have confirmed this by some simple manual checking of values. It is also evident by examining the predict() output: notice how the values jump, suggesting that a coefficient is being set to zero or similar. It has also been suggested that changing glm to lm will fix the issue, but it does not, at least for me.
Thanks!
Comparing predictions from the component models to the averaged ones, you can see that the "full averaged" predictions fall within the range of the component predictions (as they should).
The "subset averaged" coefficients, on the other hand, produce predictions that are quite biased. This is because the effects are inflated: the zero coefficients are ignored when calculating the mean.
# Full/subset averaged predictions
pyfa <- predict(mod.avg, full = TRUE)
pysa <- predict(mod.avg, full = FALSE)
# Note: full = FALSE works only with se.fit = FALSE
# Predictions from the component models
pycm <- do.call("cbind", lapply(get.models(mod.avg, TRUE), predict))
n <- ncol(pycm)
k <- rep(1:3, c(n, 1, 1))
lty <- c(2, 1, 1); lwd <- c(1, 2, 2); col <- c(3, 1, 2)
matplot(cbind(pycm, pyfa, pysa), type = "l",
        lty = lty[k], lwd = lwd[k], col = col[k],
        ylab = "predicted")
legend("topleft", legend = c("component", "full average", "subset average"),
       lty = lty, lwd = lwd, col = col)
