Display single coefficient plots in quantile regressions?

I am plotting regression summaries for a quantile regression I did with quantreg.
Obviously the method plot.summary.rqs is in use here. The problem is that I use quite a few explanatory variables, each of which is displayed in the plot. Most of the coefficients do not behave significantly differently from OLS, so I just want to pick out and display a few of them.
How can I select only the plots that I need to show? I am using knitr for my reports and do not want to show dozens of variables (and you get there quickly using dummies). Is there a way to cherry-pick?

By default, plot.summary.rqs plots all coefficients:
library(quantreg)
data(stackloss)
myrq <- rq(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., tau = seq(0.35, 0.65, 0.1), data=stackloss)
plot(summary(myrq)) # Plots all 4 coefficients
To cherry pick coefficients, the parm argument can be used:
plot(summary(myrq), parm = 2) # Plot only second regressor (Air.Flow)
plot(summary(myrq), parm = "Water.Temp") # Plot only Water.Temp
plot(summary(myrq), parm = 3:4) # Plot third and fourth regressor
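Note that parm also accepts a vector, so several coefficients can be picked out in one call, by index or by name (a small extension of the examples above):
plot(summary(myrq), parm = c("Air.Flow", "Water.Temp")) # Plot two named regressors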

Related

Plotting Cumulative Events from Adjusted Survival Curve in R

I am attempting to create an adjusted survival curve (from a Cox model) and would like to display this information as cumulative events.
I have attempted this:
library(survival)
data("ovarian")
library(survminer)
model<-coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian)
gplot<-ggadjustedcurves(model) ## Expected plot of adjusted survival curve
Because the "fun=" still has not been implemented in ggadjustedcurves I took the advice of a user on this page and extracted the elements into plotdata and created a new column as shown below.
library(magrittr) # provides the %<>% assignment pipe
library(dplyr)    # provides mutate()
plotdata <- gplot$data
plotdata %<>%
  mutate(new = 1 - surv) ## 1 - survival probability
I am new to R and ggplot2, so how can I plot the new adjusted survival curve from the newly created column while keeping the theme of the original plot (contained in gplot)?
Thanks!
Edit:
My current solution is as follows.
library(rms)
model<-coxph(Surv(futime, fustat) ~ age+ strata(rx), data=ovarian)
survfit(model, conf.type = "plain", conf.int = 1)
plot(survfit(model), conf.int = T,col = c(1,2), fun='event')
This achieves the survival curve I wanted; however, I am not sure whether the confidence bars really are standard errors (+/- 1 SE). I supplied 1 to the conf.int argument and believe this creates the bands that way, since conf.type is specified as "plain".
How can I further customize this plot, as the base graph looks rather bland? How do I get a display as close as possible to the survminer curves?
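(For completeness, a minimal sketch of the direct route with the extracted plotdata, not from the original thread: it assumes the time/surv/variable column names that ggadjustedcurves stores in its plot data, and it only approximates the survminer theme.)
library(ggplot2)
# Sketch: plot cumulative events (1 - survival) per stratum from plotdata;
# check names(plotdata) to confirm the column names on your object
ggplot(plotdata, aes(x = time, y = new, color = variable)) +
  geom_step(size = 1) +
  labs(x = "Time", y = "Cumulative events") +
  theme_classic()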
You can use the adjustedCurves package instead, which allows both plotting confidence intervals and naturally includes an option to display cumulative incidence functions. First, install it using:
devtools::install_github("https://github.com/RobinDenz1/adjustedCurves")
Now you can use:
library(adjustedCurves)
library(survival)
library(riskRegression)
# needs to be a factor
ovarian$rx <- factor(ovarian$rx)
# needs to include x=TRUE
model <- coxph(Surv(futime, fustat) ~ age + strata(rx), data=ovarian, x=TRUE)
adj <- adjustedsurv(data = ovarian,
                    event = "fustat",
                    ev_time = "futime",
                    variable = "rx",
                    method = "direct",
                    outcome_model = model,
                    conf_int = TRUE)
plot(adj, cif=TRUE, conf_int=TRUE)
This produces a plot of the adjusted cumulative incidence curves (figure not reproduced here).
I would probably not use this method here, though. Simulation studies have shown that the Cox-regression-based method performs badly in small samples. You might want to take a look at method="iptw" or method="aiptw" in the adjustedCurves package instead.
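(A rough sketch of what that might look like; the iptw_km method name and the treatment_model argument are assumptions on my part, so check ?adjustedsurv for the current interface.)
# Sketch (unverified against the package docs): IPTW-based adjusted curves
# replace the outcome model with a treatment (propensity score) model
ps_model <- glm(rx ~ age, data = ovarian, family = "binomial")
adj_iptw <- adjustedsurv(data = ovarian,
                         variable = "rx",
                         ev_time = "futime",
                         event = "fustat",
                         method = "iptw_km",
                         treatment_model = ps_model,
                         conf_int = TRUE)
plot(adj_iptw, cif = TRUE, conf_int = TRUE)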

Plot standard error bars from mixed model using ggplot2

I want to plot (using ggplot2) a linear mixed effects model (the lmer function from lme4) together with error bars representing standard errors. Here is the model:
m1 <- lmer(repinterv ~ prevoutc * outcome * prevtask + (1|id), p1)
Repinterv is a continuous dependent variable, while the three factors are binary, within-subjects factors. Each line of the data frame is a single experimental trial.
While I have a working line to compute a fit for each effect and interaction, I'm really struggling with the error bars.
p1$fit = model.matrix(m1) %*% fixef(m1) # fits
p1$fitse = model.matrix(m1) %*% coef(summary(m1))[,2] # standard errors
The first line calculates the fitted value for each observation. I tried the same approach for the standard errors from the model summary, but the problem is that while the fixed effects are expressed as differences relative to the intercept, the SEs are absolute values (as I understand it). If I use this method, I get summed standard errors for each fitted value instead of the actual values from coef(summary(m1)).
ggplot(p1, aes(x = outcome, y = fit, fill = prevoutc)) + # grouped bar plot
  facet_wrap(~ prevtask, labeller = gridlab) +
  stat_summary(fun.y = mean, geom = "bar", position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = fit - fitse, ymax = fit + fitse),
                width = 0.1, size = 0.5, position = position_dodge(0.9))
Can you suggest whether I should use some other operator or a different method to extract the SEs for this model?
Edit:
Below are the coefficients of my model. I want to plot estimates and corresponding standard errors.
                              Estimate Std. Error   t value
(Intercept)                  335.69881  16.190304 20.734558
prevoutc1                     10.74602   7.143445  1.504318
outcome1                      37.36665   8.471898  4.410659
prevtask1                     12.92135   7.330930  1.762580
prevoutc1:outcome1           -14.39956   9.338283 -1.541992
prevoutc1:prevtask1           17.37322  10.491121  1.655993
outcome1:prevtask1           -29.37134   9.957079 -2.949795
prevoutc1:outcome1:prevtask1  14.75692  13.539756  1.089896
And that's the plot I currently have (grouped bar chart with error bars; figure not reproduced here).
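For what it's worth, a minimal sketch of one standard way to get pointwise standard errors for the fixed-effects fit (not from the thread; it uses the covariance matrix of the fixed effects, so it ignores random-effect uncertainty):
X <- model.matrix(m1)
V <- as.matrix(vcov(m1))                  # covariance matrix of the fixed effects
p1$fit   <- as.vector(X %*% fixef(m1))    # fitted values, as before
p1$fitse <- sqrt(rowSums((X %*% V) * X))  # sqrt(diag(X V X')), one SE per row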

Predict Future values using polynomial regression in R

I was trying to predict the future values of a sample using polynomial regression in R. The y values within the sample form a wave pattern.
For example
x = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
y= 1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4
But when the graph was plotted for future values, the resulting y values were completely different from what was expected. Instead of a wave pattern, I was getting a graph where the y values keep increasing.
futurY = 17,18,19,20,21,22
I tried different degrees of polynomial regression, but the predicted results for futurY were drastically different from what was expected.
The following is the sample R code that was used to get the results:
dfram <- data.frame('x'=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16))
dfram$y <- c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4)
plot(dfram$x, dfram$y, type="l", lwd=3)
pred <- data.frame('x'=c(17,18,19,20,21,22))
myFit <- lm(y ~ poly(x,5), data=dfram)
newdata <- predict(myFit, pred)
print(newdata)
plot(pred[,1],data.frame(newdata)[,1],type="l",col="red", lwd=3)
Is this the correct technique to use for predicting unknown future y values, or should I be using other techniques like forecasting?
# Reproducing your data frame
dfram <- data.frame("x" = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16),
"y" = c(1,2,3,4,5,4,3,2,1,0,1,2,3,4,5,4))
From your graph I've read off the phase and period of the signal. There are better ways of calculating these automatically.
# Phase and period
fase = 1
per = 10
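(As an aside, and not part of the original answer: one rough way to automate this is to pick the dominant frequency from the periodogram, e.g. with base R's spec.pgram; with only 16 samples the frequency grid is coarse, so treat the estimate as a starting point.)
# Sketch: estimate the dominant period from the periodogram
sp <- spec.pgram(dfram$y, plot = FALSE)
per_est <- 1 / sp$freq[which.max(sp$spec)]
per_est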
In the linear model function I've put the equations of a triangular signal:
fit <- lm(y ~ I((((trunc((x - fase)/(per/2)) %% 2) * 2) - 1) * ((x - fase) %% (per/2)))
            + I((((trunc((x - fase)/(per/2)) %% 2) * 2) - 1) * ((per/2) - ((x - fase) %% (per/2)))),
          data = dfram)
# Predict the old data
p_olddata <- predict(fit,type="response")
# Predict the new data
newdata <- data.frame('x'=c(17,18,19,20,21,22))
p_newdata <- predict(fit,newdata,type="response")
# Plotting old and new data
plot(x = c(dfram$x, newdata$x),
     y = c(p_olddata, p_newdata),
     col = c(rep("blue", length(p_olddata)), rep("green", length(p_newdata))),
     xlab = "x",
     ylab = "y")
lines(dfram)
Where the black line is the original signal, the blue circles are the prediction for the original points and the green circles are the prediction for the new data.
The graph shows a perfect fit because there is no noise in the data. A real dataset will usually contain noise, so the fit will not look as clean as this.

How to get only the plots from gam.check

When applying gam.check in the mgcv package, R produces some residual plots and basis dimension output. Is there a way to only produce the plots and not the printed output?
library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3), data=dat)
plot(b, pages=1)
gam.check(b, pch=19, cex=.3)
There are four plots; from the top left, moving down and then across, we have:
A QQ plot of the residuals
A histogram of the residuals
A plot of residuals vs the linear predictor
A plot of observed vs fitted values.
In the code below, I assume b contains your fitted model, as per your example. First, some things we need:
type <- "deviance" ## "pearson" & "response" are other valid choices
resid <- residuals(b, type = type)
linpred <- napredict(b$na.action, b$linear.predictors)
observed.y <- napredict(b$na.action, b$y)
Note the last two lines are applying the NA handling method used when the model was fitted to the information on the linear.predictors and y, the stored copy of the response data.
The code above, and that shown below, is all given in the first 10 or so lines of the gam.check() source. To view it, just enter
gam.check
at the R prompt.
Each plot is produced as follows:
QQ plot
This is produced via qq.gam():
qq.gam(b, rep = 0, level = 0.9, type = type, rl.col = 2,
rep.col = "gray80")
Histogram of residuals
This is produced using
hist(resid, xlab = "Residuals", main = "Histogram of residuals")
Residuals vs linear predictor
This is produced using
plot(linpred, resid, main = "Resids vs. linear pred.",
xlab = "linear predictor", ylab = "residuals")
Observed vs fitted values
This is produced using
plot(fitted(b), observed.y, xlab = "Fitted Values",
ylab = "Response", main = "Response vs. Fitted Values")
There are now two packages, gratia and mgcViz, which have functions to produce the gam.check output as ggplots that you can store as objects. The former doesn't print anything to the console; the latter does.
require(gratia)
appraise(b)
require(mgcViz)
b = getViz(b)
check(b)

Conditionally colour data points outside of confidence bands in R

I need to colour data points that are outside of the confidence bands on the plot below differently from those within the bands. Should I add a separate column to my dataset to record whether the data points are within the confidence bands? Can you provide an example please?
Example dataset:
## Dataset from http://www.apsnet.org/education/advancedplantpath/topics/RModules/doc1/04_Linear_regression.html
## Disease severity as a function of temperature
# Response variable, disease severity
diseasesev<-c(1.9,3.1,3.3,4.8,5.3,6.1,6.4,7.6,9.8,12.4)
# Predictor variable, (Centigrade)
temperature<-c(2,1,5,5,20,20,23,10,30,25)
## For convenience, the data may be formatted into a dataframe
severity <- as.data.frame(cbind(diseasesev,temperature))
## Fit a linear model for the data and summarize the output from function lm()
severity.lm <- lm(diseasesev~temperature,data=severity)
# Take a look at the data
plot(diseasesev ~ temperature,
     data = severity,
     xlab = "Temperature",
     ylab = "% Disease Severity",
     pch = 16,
     pty = "s",
     xlim = c(0, 30),
     ylim = c(0, 30))
title(main="Graph of % Disease Severity vs Temperature")
par(new=TRUE) # don't start a new plot
## Get datapoints predicted by best fit line and confidence bands
## at every 0.01 interval
xRange = data.frame(temperature = seq(min(temperature), max(temperature), 0.01))
pred4plot <- predict(lm(diseasesev ~ temperature),
                     xRange,
                     level = 0.95,
                     interval = "confidence")
## Plot lines derived from the best-fit line and confidence band datapoints
matplot(xRange,
        pred4plot,
        lty = c(1, 2, 2), # vector of line types
        type = "l",       # draw lines for each column of y
        xlim = c(0, 30),
        ylim = c(0, 30),
        xlab = "",
        ylab = "")
Well, I thought that this would be pretty easy with ggplot2, but now I realize that I have no idea how the confidence limits for stat_smooth/geom_smooth are calculated.
Consider the following:
library(ggplot2)
pred <- as.data.frame(predict(severity.lm, level = 0.95, interval = "confidence"))
dat <- data.frame(diseasesev, temperature,
                  in_interval = diseasesev <= pred$upr & diseasesev >= pred$lwr,
                  pred)
ggplot(dat, aes(y = diseasesev, x = temperature)) +
  stat_smooth(method = 'lm') + geom_point(aes(colour = in_interval)) +
  geom_line(aes(y = lwr), colour = I('red')) + geom_line(aes(y = upr), colour = I('red'))
This produces:
[plot: http://ifellows.ucsd.edu/pmwiki/uploads/Main/strangeplot.jpg]
I don't understand why the confidence band calculated by stat_smooth is inconsistent with the band calculated directly from predict (i.e. the red lines). Can anyone shed some light on this?
Edit:
Figured it out: ggplot2 uses 1.96 * standard error to draw the intervals for all smoothing methods.
pred <- as.data.frame(predict(severity.lm, se.fit = TRUE,
                              level = 0.95, interval = "confidence"))
dat <- data.frame(diseasesev, temperature,
                  in_interval = diseasesev <= pred$fit.upr & diseasesev >= pred$fit.lwr,
                  pred)
ggplot(dat, aes(y = diseasesev, x = temperature)) +
  stat_smooth(method = 'lm') +
  geom_point(aes(colour = in_interval)) +
  geom_line(aes(y = fit.lwr), colour = I('red')) +
  geom_line(aes(y = fit.upr), colour = I('red')) +
  geom_line(aes(y = fit.fit - 1.96 * se.fit), colour = I('green')) +
  geom_line(aes(y = fit.fit + 1.96 * se.fit), colour = I('green'))
The easiest way is probably to calculate a vector of TRUE/FALSE values that indicates whether each data point is inside the confidence interval. I'm going to reshuffle your example a little so that all of the calculations are completed before the plotting commands are executed; this provides a clean separation in the program logic that could be exploited if you were to package some of this into a function.
The first part is pretty much the same, except I replaced the additional call to lm() inside predict() with the severity.lm variable; there is no need to spend additional computing resources recalculating the linear model when we already have it stored:
## Dataset from
# apsnet.org/education/advancedplantpath/topics/
# RModules/doc1/04_Linear_regression.html
## Disease severity as a function of temperature
# Response variable, disease severity
diseasesev<-c(1.9,3.1,3.3,4.8,5.3,6.1,6.4,7.6,9.8,12.4)
# Predictor variable, (Centigrade)
temperature<-c(2,1,5,5,20,20,23,10,30,25)
## For convenience, the data may be formatted into a dataframe
severity <- as.data.frame(cbind(diseasesev,temperature))
## Fit a linear model for the data and summarize the output from function lm()
severity.lm <- lm(diseasesev~temperature,data=severity)
## Get datapoints predicted by best fit line and confidence bands
## at every 0.01 interval
xRange = data.frame(temperature = seq(min(temperature), max(temperature), 0.01))
pred4plot <- predict(severity.lm,
                     xRange,
                     level = 0.95,
                     interval = "confidence")
Now we'll calculate the confidence intervals for the original data points and run a test to see whether the points fall inside the interval:
modelConfInt <- predict(severity.lm,
                        level = 0.95,
                        interval = "confidence")
insideInterval <- modelConfInt[,'lwr'] < severity[['diseasesev']] &
  severity[['diseasesev']] < modelConfInt[,'upr']
Then we'll do the plot: first the high-level plotting function plot(), as you used it in your example, but plotting only the points inside the interval. We will then follow up with the low-level function points(), which will plot all the points outside the interval in a different colour. Finally, matplot() will be used to fill in the confidence intervals as you used it. However, instead of calling par(new=TRUE) I prefer to pass the argument add=TRUE to high-level functions to make them act like low-level functions.
Using par(new=TRUE) is like playing a dirty trick on a plotting function, which can have unforeseen consequences. The add argument is provided by many functions to make them add information to an existing plot rather than redraw it; I would recommend exploiting this argument wherever possible and falling back on par() manipulations only as a last resort.
# Take a look at the data: those points inside the interval
plot(diseasesev ~ temperature,
     data = severity[insideInterval, ],
     xlab = "Temperature",
     ylab = "% Disease Severity",
     pch = 16,
     pty = "s",
     xlim = c(0, 30),
     ylim = c(0, 30))
title(main="Graph of % Disease Severity vs Temperature")
# Add points outside the interval, color differently
points(diseasesev ~ temperature,
       pch = 16,
       col = 'red',
       data = severity[!insideInterval, ])
# Add regression line and confidence intervals
matplot(xRange,
        pred4plot,
        lty = c(1, 2, 2), # vector of line types
        type = "l",       # draw lines for each column of y
        add = TRUE)
I liked the idea and tried to make a function for it. Of course, it's far from perfect; your comments are welcome.
diseasesev<-c(1.9,3.1,3.3,4.8,5.3,6.1,6.4,7.6,9.8,12.4)
# Predictor variable, (Centigrade)
temperature<-c(2,1,5,5,20,20,23,10,30,25)
## For convenience, the data may be formatted into a dataframe
severity <- as.data.frame(cbind(diseasesev,temperature))
## Fit a linear model for the data and summarize the output from function lm()
severity.lm <- lm(diseasesev~temperature,data=severity)
# Function to plot the linear regression and overlay the confidence intervals
ci.lines <- function(model, conf = .95, interval = "confidence"){
  x <- model$model[[2]]   # predictor column of the model frame (was model[[12]][[2]])
  y <- model$model[[1]]   # response column of the model frame
  xm <- mean(x)
  n <- length(x)
  ssx <- sum((x - mean(x))^2)
  s.t <- qt(1 - (1 - conf)/2, (n - 2))
  xv <- seq(min(x), max(x), (max(x) - min(x))/100)
  yv <- coef(model)[1] + coef(model)[2] * xv
  se <- switch(interval,
               confidence = summary(model)$sigma * sqrt(1/n + (xv - xm)^2/ssx),
               prediction = summary(model)$sigma * sqrt(1 + 1/n + (xv - xm)^2/ssx))
  ci <- s.t * se
  uyv <- yv + ci
  lyv <- yv - ci
  limits1 <- min(c(x, y))
  limits2 <- max(c(x, y))
  predictions <- predict(model, level = conf, interval = interval)
  insideCI <- predictions[,'lwr'] < y & y < predictions[,'upr']
  x_name <- rownames(attr(model$terms, "factors"))[2]
  y_name <- rownames(attr(model$terms, "factors"))[1]
  plot(x[insideCI], y[insideCI],
       pch = 16, pty = "s",
       xlim = c(limits1, limits2), ylim = c(limits1, limits2),
       xlab = x_name,
       ylab = y_name,
       main = paste("Graph of ", y_name, " vs ", x_name, sep = ""))
  abline(model)
  points(x[!insideCI], y[!insideCI], pch = 16, col = 'red')
  lines(xv, uyv, lty = 2, col = 3)
  lines(xv, lyv, lty = 2, col = 3)
}
Use it like this:
ci.lines(severity.lm, conf = .95, interval = "confidence")
ci.lines(severity.lm, conf = .85, interval = "prediction")
