I have around 250 series of annual maximum rainfall measurements stored as columns of a matrix, maxima (the predictor is in column 1). I want to apply quantile regression to all series at once and obtain the significance of each regression model in R.
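library(quantreg)
qmag <- array(NA, c(250,4))        # one row per series, one column per tau
taus <- c(0.05, 0.1, 0.95, 0.975)
for(igau in 1:250){
  # regress series igau (column igau+1) on the predictor in column 1, for all taus at once
  qure <- rq(maxima[,igau+1]~maxima[,1], tau=taus)
  qmag[igau,] <- coef(qure)[2,]    # store the slope estimate for each tau
}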
I've tried
summary(qure, se="boot")$p.value
ci(qure)
and other similar variations, but I only get NULL values. Is it possible to extract the p-values from quantreg into a table automatically, rather than viewing them individually in summary() for each model?
Have a look at the structure of the summary object with str():
library(quantreg)
data(engel)
mod <- rq(foodexp ~ income, data = engel)
summ <- summary(mod, se = "boot")   # bootstrap standard errors, so the table includes p-values
summ
str(summ)
summ$coefficients[,4]               # column 4 holds Pr(>|t|)
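Applied to the loop from the question, a minimal sketch could look like the block below. With several taus, rq() returns an rqs object and summary() returns a list holding one summary.rq per tau; there is no $p.value element, and the p-values sit in column 4 of each element's coefficients matrix. If I remember the quantreg defaults correctly, summary.rq uses se = "rank" for small samples, which reports confidence bounds instead of p-values, so se = "boot" (or "nid"/"ker") is needed. The maxima matrix here is made-up data and qpval is a hypothetical name.

```r
library(quantreg)

set.seed(1)
# Made-up stand-in for the real data: column 1 is the predictor,
# columns 2 to n_series + 1 are the individual series
n_obs <- 100
n_series <- 5
maxima <- cbind(1:n_obs,
                matrix(rgamma(n_obs * n_series, shape = 2, scale = 20),
                       nrow = n_obs, ncol = n_series))

taus <- c(0.05, 0.1, 0.95, 0.975)
qmag <- matrix(NA_real_, n_series, length(taus), dimnames = list(NULL, taus))
qpval <- qmag  # hypothetical matrix for the slope p-values

for (igau in 1:n_series) {
  qure <- rq(maxima[, igau + 1] ~ maxima[, 1], tau = taus)
  # With several taus, summary() returns a list: one summary.rq per tau
  summ <- summary(qure, se = "boot")
  qmag[igau, ] <- coef(qure)[2, ]
  # Row 2 is the slope; column 4 of the coefficient table is Pr(>|t|)
  qpval[igau, ] <- sapply(summ, function(s) s$coefficients[2, 4])
}

qpval
```

For the real data, drop the simulated maxima and loop over all 250 series; with bootstrapped standard errors for 1000 fits this may take a while.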
I am using the metafor package to combine beta coefficients from linear regression models. I supplied the reported SEs and beta values to the rma() function, but in the forest plot the 95% confidence intervals differ from the ones reported in the studies. I also tried this with the mtcars data set by fitting three models and combining the coefficients. Still, the 95% CIs in the forest plot differ from those of the original models, and the deviations are far larger than rounding errors. A reproducible example is below.
library(metafor)
library(dplyr)
lm1 <- lm(hp~mpg, data=mtcars[1:15,])
lm2 <- lm(hp~mpg, data=mtcars[1:32,])
lm3 <- lm(hp~mpg, data=mtcars[13:32,])
study <- c("study1", "study2", "study3")
beta_coef <- c(lm1$coefficients[2],
lm2$coefficients[2],
lm3$coefficients[2]) %>% as.numeric()
se <- c(1.856, 1.31,1.458)
ci_lower <- c(confint(lm1)[2,1],
confint(lm2)[2,1],
confint(lm3)[2,1]) %>% as.numeric()
ci_upper <- c(confint(lm1)[2,2],
confint(lm2)[2,2],
confint(lm3)[2,2]) %>% as.numeric()
# data.frame() keeps the numeric columns numeric (cbind() would turn them all into character)
df <- data.frame(study=study,
                 beta_coef=beta_coef,
                 se=se,
                 ci_lower=ci_lower,
                 ci_upper=ci_upper)
pooled <- rma(yi=beta_coef, vi=se, slab=study)
forest(pooled)
Compare the confidence intervals on the forest plot with those in the data frame.
The vi argument is for the sampling variances, but you are passing the standard errors to it. Pass them to sei instead (equivalently, use vi = se^2):
pooled <- rma(yi=beta_coef, sei=se, slab=study)
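To see why this matters, here is a small base-R check of the per-study 95% half-widths implied by each call, assuming normal-based intervals (estimate ± 1.96·SE). Passing se as vi makes the standard error effectively sqrt(se), which for these SEs (all above 1) gives intervals that are too narrow.

```r
se <- c(1.856, 1.31, 1.458)  # standard errors from the question
z <- qnorm(0.975)

half_sei <- z * se           # rma(sei = se), equivalent to rma(vi = se^2)
half_vi_se <- z * sqrt(se)   # rma(vi = se): se is read as a variance

round(cbind(se, half_sei, half_vi_se), 3)
```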
But you will still see a discrepancy, because the CIs in the forest plot are based on a normal distribution, while the CIs from the regression models are based on t-distributions. If you want exactly the same CIs in the forest plot, you can pass the CI bounds to forest() directly:
forest(beta_coef, ci.lb=ci_lower, ci.ub=ci_upper)
If you want to add a summary polygon from some meta-analysis to the forest plot, you can do this with addpoly(). So the complete code for this example would be:
forest(beta_coef, ci.lb=ci_lower, ci.ub=ci_upper, ylim=c(-1.5,6))
addpoly(pooled, row=-1)
abline(h=0)
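The t-versus-normal difference can be checked directly for one of the example models. This sketch refits lm1 from the question and compares confint() (t quantile with 13 residual degrees of freedom) with a normal-quantile interval built from the same estimate and SE, which, per the explanation above, is how the forest plot draws each study's interval.

```r
lm1 <- lm(hp ~ mpg, data = mtcars[1:15, ])

b1 <- coef(summary(lm1))["mpg", "Estimate"]
se1 <- coef(summary(lm1))["mpg", "Std. Error"]

ci_t <- confint(lm1)["mpg", ]                     # t quantile, df = 15 - 2 = 13
ci_normal <- b1 + c(-1, 1) * qnorm(0.975) * se1   # normal quantile

rbind(t = ci_t, normal = ci_normal)
```

The normal interval is always somewhat narrower, and the gap shrinks as the residual degrees of freedom grow.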
My aim is to study how the inclusion of several risk factors improves a clinical model predicting the incidence of stroke in a survival analysis. I want to use the NRI to compare two models (a baseline model vs. the baseline model plus a new risk factor). However, I would prefer to stratify the predicted probabilities of these models into categories (e.g., risk 0-3%, risk 4-6%, risk >7%...).
For that purpose I have written the following R code, but I am not sure whether it is correct and would like to check it =).
I use the open survival data set lung as a reproducible example. IMPORTANT: I do not account for censoring in this analysis, but it should be handled appropriately in further analyses:
library(survival)
library(pec)   ## to calculate event probabilities at a given time point
library(Hmisc) ## to calculate the NRI (improveProb)
data(lung)
lung <- na.omit(lung)
lung$status <- lung$status-1  ## recode 1/2 to 0/1, needed for the NRI function
stats <- lung$status
tempo <- lung$time
##Create two models: 1 baseline and 1 baseline + new predictor
model1 <- coxph(Surv(tempo,stats)~age+sex, data=lung, x=TRUE)
model2 <- coxph(Surv(tempo,stats)~age+sex+factor(ph.ecog), data=lung, x=TRUE)
##Estimate the survival probability at time = 500
lung$x <- predictSurvProb(model1, newdata=lung, times=c(500))
lung$y <- predictSurvProb(model2, newdata=lung, times=c(500))
##Calculate the probability of the event (1 - survival) and categorize it
lung$x <- cut(1-lung$x, c(0, 0.25, 0.5, 0.75, 1), include.lowest=TRUE, right=FALSE)
lung$y <- cut(1-lung$y, c(0, 0.25, 0.5, 0.75, 1), include.lowest=TRUE, right=FALSE)
##Confusion matrix
cases <- subset(lung, stats==1)
controls <- subset(lung, stats==0)
table(lung$x, lung$y)##All patients
table(cases$x, cases$y)##cases
table(controls$x, controls$y)##controls
##Calculate NRI
x <- as.numeric(lung$x)/4  ##category codes rescaled so that predictions lie between 0 and 1
y <- as.numeric(lung$y)/4
improveProb(x, y, stats)
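The categorical NRI itself is simple enough to compute by hand, which gives a cross-check for question 2: NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]. As far as I understand Hmisc::improveProb(), it counts any increase or decrease in the predicted value as a move, so passing category codes, as the code above does, should give this same number. The helper categorical_nri() is a hypothetical name, the toy data are made up, and, like the question, it ignores censoring.

```r
# Categorical NRI from two risk-category factors that share the same levels
categorical_nri <- function(old_cat, new_cat, event) {
  stopifnot(identical(levels(old_cat), levels(new_cat)))
  # +1 = moved to a higher category, -1 = lower, 0 = unchanged
  move <- sign(as.integer(new_cat) - as.integer(old_cat))
  ev <- event == 1
  nri_events <- mean(move[ev] == 1) - mean(move[ev] == -1)
  nri_nonevents <- mean(move[!ev] == -1) - mean(move[!ev] == 1)
  c(nri = nri_events + nri_nonevents,
    events = nri_events,
    nonevents = nri_nonevents)
}

# Toy example (made-up data), small enough to check by hand
lev <- c("low", "mid", "high")
old_cat <- factor(c("low", "low", "mid", "mid", "high", "low"), levels = lev)
new_cat <- factor(c("mid", "low", "low", "mid", "high", "mid"), levels = lev)
event <- c(1, 1, 0, 0, 1, 0)
res <- categorical_nri(old_cat, new_cat, event)
res
```

Here one of three events moves up and none move down (1/3), while among non-events one moves down and one moves up (0), so the NRI is 1/3. With the question's objects the call would be categorical_nri(lung$x, lung$y, stats).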
My questions:
1) Is this the correct way to calculate the risk of the event at t = 500?
2) Is it correct to calculate the NRI with this function? I have tried other packages (e.g., nricens) and obtained the same results.
Thanks in advance for your help!
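As a cross-check for question 1, the same 1 - S(500) can be computed with the survival package alone, via survfit() on the Cox model and summary(..., times = 500), without pec. This sketch only covers model1; I would expect it to agree closely with predictSurvProb(), but treat exact agreement as an assumption.

```r
library(survival)

lung2 <- na.omit(survival::lung)
lung2$status <- lung2$status - 1  # 1/2 -> 0/1, as in the question

model1 <- coxph(Surv(time, status) ~ age + sex, data = lung2)

# One predicted survival curve per patient, read off at t = 500
sf <- survfit(model1, newdata = lung2)
surv500 <- drop(summary(sf, times = 500)$surv)
risk500 <- 1 - surv500  # predicted probability of the event by t = 500

summary(risk500)
```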
I'm sure this is easily resolvable, but I have a question regarding quantile regression.
Say I have a data frame that follows the trend of a second-order polynomial curve, and I fit quantile regressions at different quantiles of the data:
##Data preparation
set.seed(5)
d <- data.frame(x=seq(-5, 5, len=51))
d$y <- 50 - 0.3*d$x^2 + rnorm(nrow(d))
##Quantile regression
Taus <- c(0.1,0.5,0.9)
QUA<-rq(y ~ 1 + x + I(x^2), tau=Taus, data=d)
plot(y~x,data=d)
for (k in 1:length(Taus)){
curve((QUA$coef[1,k])+(QUA$coef[2,k])*(x)+(QUA$coef[3,k])*(x^2),lwd=2,lty=1, add = TRUE)
}
I can obtain the maximum fitted y value for each quantile with predict.rq():
##Maximum prediction
Pred_df<- as.data.frame(predict.rq(QUA))
apply(Pred_df,2,max)
So my question is: how do I obtain the x-value that corresponds to the maximum fitted y-value (i.e., the vertex, where the slope changes sign) for each quantile?
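The broom package is useful here: augment() returns the fitted values together with a .tau column, so you can keep the row with the largest fitted value for each quantile: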
library(broom)
library(dplyr)
augment(QUA) %>%
group_by(.tau) %>%
filter(.fitted == max(.fitted))
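Because each fitted curve is a quadratic, the x-value of the maximum can also be computed exactly from the coefficients as x* = -b1 / (2 b2), instead of being picked from the fitted values at the observed x-values (which is what the broom approach returns, so it is limited to the x grid in the data). A small sketch; the helper name quad_vertex is my own, and with the model from the question it would be called as quad_vertex(coef(QUA)).

```r
# Vertex of y = b0 + b1*x + b2*x^2 for each column of a coefficient matrix
# laid out like coef(QUA): rows (Intercept), x, I(x^2); one column per tau.
# For b2 < 0 the vertex is the maximum.
quad_vertex <- function(cf) {
  b0 <- cf[1, ]
  b1 <- cf[2, ]
  b2 <- cf[3, ]
  x_star <- -b1 / (2 * b2)  # where the derivative b1 + 2*b2*x is zero
  rbind(x = x_star, y = b0 + b1 * x_star + b2 * x_star^2)
}

# Hand-made coefficients for two hypothetical quantiles
cf <- cbind(c(50, 0, -0.3), c(49, 0.6, -0.3))
vtx <- quad_vertex(cf)
vtx
```

The vertex can fall outside the observed x range, in which case it is an extrapolation.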
For my manuscript, I plotted an lme model with an interaction between two continuous variables:
Create data:
mydata <- data.frame(SID=sample(1:150,400,replace=TRUE),
                     age=sample(50:70,400,replace=TRUE),
                     sex=sample(c("Male","Female"),400,replace=TRUE),
                     time=seq(0.7, 6.2, length.out=400),
                     Vol=rnorm(400),
                     HCD=rnorm(400))
mydata$time <- as.numeric(mydata$time)
Run the model:
library(nlme)
model <- lme(HCD ~ age*time+sex*time+Vol*time, random=~time|SID, data=mydata)
Make plot:
library(sjPlot)
sjp.int(model, swap.pred=TRUE, show.ci=TRUE, mdrt.values="meansd")
The reviewer now wants me to add the raw data points to this plot. How can I do this? I tried adding geom_point() with mydata, but that did not work.
Any ideas?
Update:
I thought I could extract each subject's random slope of HCD, residualize those slopes and Vol for the covariates, and plot the two sets of residuals against each other; that would make things easier, because the points could then go into a simple 2D plot.
So I extracted the slopes and used them in a linear regression, but the results differ: in the reproducible example the effect is less significant, and in my real data the interaction became non-significant, although it was significant in the lme. I am not sure what this means, or whether it just shows that I should not try to plot it this way.
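For an ordinary linear model, the "residualize both and plot" idea is an added-variable (partial regression) plot. By the Frisch-Waugh-Lovell theorem, the slope of the residual-on-residual regression equals the Vol coefficient of the full model, so the plot shows the same effect the model estimates (its standard error differs slightly because the simple regression uses different degrees of freedom). A sketch with simulated data whose names only mirror the question; it does not carry over automatically to lme, where the per-subject slopes are shrunken estimates rather than observed values.

```r
set.seed(1)
n <- 200
# Simulated stand-in for the per-subject data (names mirror the question)
d <- data.frame(age = rnorm(n, 60, 5),
                sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
                Vol = rnorm(n))
d$slopes <- 0.02 * d$age + 0.3 * d$Vol + rnorm(n)

full <- lm(slopes ~ age + sex + Vol, data = d)

# Residualize both the outcome and Vol on the covariates
r_y <- resid(lm(slopes ~ age + sex, data = d))
r_x <- resid(lm(Vol ~ age + sex, data = d))
partial <- lm(r_y ~ r_x)

# Added-variable plot: raw (residualized) points plus the fitted line
plot(r_x, r_y, xlab = "Vol (residualized)", ylab = "slopes (residualized)")
abline(partial)

c(full = unname(coef(full)["Vol"]), partial = unname(coef(partial)["r_x"]))
```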
get the slopes:
model <- lme(HCD ~ time, random=~time|SID, data=mydata)
# ranef() gives one row per subject (rownames = SID); the "time" column is the random slope.
# Building the data frame directly keeps both columns numeric, so there is no need to save
# and reopen it (rbind() of character IDs with numeric slopes makes everything character).
slop <- data.frame(SID = as.integer(rownames(ranef(model))),
                   slopes = ranef(model)[, "time"])
Then create a cross-sectional data frame and merge the slopes:
mydata$time2 <- round(mydata$time)
new <- reshape(mydata,idvar = "SID", timevar="time2", direction="wide")
newdata <- dplyr::left_join(new, slop, by="SID")
The lm:
modelw <- lm(slopes ~ age.1+sex.1+Vol.1, data=newdata)  # use the merged slopes column so rows stay aligned
Vol now has a p-value of 0.8 (previously 0.14).
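I don't know whether sjp.int() exposes its plot object for adding layers, so here is a package-light alternative using base graphics and nlme: draw the raw points, then overlay fixed-effect predictions at mean ± 1 SD of the moderator, which is my understanding of what mdrt.values = "meansd" does. It assumes time is on the x-axis and Vol is the moderator, holds age at its mean and sex at "Female", simplifies the random effects to a random intercept, and uses simulated data in place of the real data.

```r
library(nlme)

set.seed(42)
# Simulated data in the spirit of the question (sex as a factor)
mydata <- data.frame(SID = sample(1:150, 400, replace = TRUE),
                     age = sample(50:70, 400, replace = TRUE),
                     sex = factor(sample(c("Male", "Female"), 400, replace = TRUE)),
                     time = seq(0.7, 6.2, length.out = 400),
                     Vol = rnorm(400),
                     HCD = rnorm(400))

# Random intercept only, to keep this toy fit stable (the question uses ~time|SID)
model <- lme(HCD ~ age * time + sex * time + Vol * time, random = ~ 1 | SID,
             data = mydata, control = lmeControl(returnObject = TRUE))

# Moderator values: mean - SD, mean, mean + SD of Vol
vol_vals <- mean(mydata$Vol) + c(-1, 0, 1) * sd(mydata$Vol)
grid <- expand.grid(time = seq(min(mydata$time), max(mydata$time), length.out = 50),
                    Vol = vol_vals)
grid$age <- mean(mydata$age)
grid$sex <- factor("Female", levels = levels(mydata$sex))
grid$SID <- mydata$SID[1]  # grouping column present, but unused at level = 0
# level = 0 gives population-level (fixed-effects) predictions
grid$fit <- predict(model, newdata = grid, level = 0)

# Raw data first, then one fitted line per Vol value
plot(HCD ~ time, data = mydata, col = "grey60", pch = 16)
for (i in seq_along(vol_vals)) {
  with(grid[grid$Vol == vol_vals[i], ], lines(time, fit, lwd = 2, lty = i))
}
legend("topright", legend = c("Vol: mean - SD", "Vol: mean", "Vol: mean + SD"),
       lty = 1:3, lwd = 2)
```

Plotting the points from the data and the lines from a prediction grid keeps the two layers independent, so the same pattern works with ggplot2 (geom_point() on mydata, geom_line() on grid) if the figure needs to match the rest of the manuscript.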
I have been struggling with the following problem for some time and would be very grateful for any help. I am fitting a multinomial logit model in R with the mlogit function and can generate the predicted probability of choosing each alternative for given values of the predictors as follows:
library(mlogit)
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
Fish_fit<-Fish[-(1:4),]
Fish_test<-Fish[1:4,]
m <- mlogit(mode ~price+ catch | income, data = Fish_fit)
predict(m, newdata=Fish_test)
I cannot, however, work out how to add confidence intervals to the predicted probability estimates. I have already tried adding arguments to the predict function, but none seem to generate them. Any ideas on how it can be achieved would be much appreciated.
One approach here is Monte Carlo simulation: draw many coefficient vectors from a multivariate normal distribution whose mean is the estimated coefficient vector and whose covariance matrix is vcov(m).
For each draw, compute the predicted probabilities, and use their empirical distribution across draws to get the confidence intervals.
library(MASS)
est_preds <- predict(m, newdata = Fish_test)             # point estimates
sim_betas <- mvrnorm(1000, m$coefficients, vcov(m))      # 1000 coefficient draws
sim_preds <- apply(sim_betas, 1, function(x) {
  m$coefficients <- x                                    # plug in one draw
  predict(m, newdata = Fish_test)                        # predicted probabilities for that draw
})
sim_ci <- apply(sim_preds, 1, quantile, c(.025, .975))   # percentile interval per alternative
cbind(prob = est_preds, t(sim_ci))
# prob 2.5% 97.5%
# beach 0.1414336 0.10403634 0.1920795
# boat 0.3869535 0.33521346 0.4406527
# charter 0.3363766 0.28751240 0.3894717
# pier 0.1352363 0.09858375 0.1823240
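The same recipe works for any model that provides coefficients and a vcov(). This self-contained version uses base R plus MASS on a binary logit (glm) so that it runs without mlogit; the mtcars model and the wt values are arbitrary stand-ins. Draws are made on the coefficient (log-odds) scale and pushed through plogis(), which keeps every simulated probability, and hence the interval, within [0, 1].

```r
library(MASS)

set.seed(123)
# Binary logit as a stand-in for the mlogit model
fit <- glm(am ~ wt, data = mtcars, family = binomial)
newd <- data.frame(wt = c(2.5, 3.5))
X <- model.matrix(~ wt, data = newd)

est <- drop(plogis(X %*% coef(fit)))              # point predictions
sims <- mvrnorm(2000, coef(fit), vcov(fit))       # 2000 coefficient draws
sim_p <- plogis(X %*% t(sims))                    # one column of predictions per draw
ci <- t(apply(sim_p, 1, quantile, c(0.025, 0.975)))

cbind(prob = est, ci)
```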