Plot standard error bars from mixed model using ggplot2

I want to plot (using ggplot2) linear mixed effects model (lmer function from lme4) together with error bars representing standard errors. Here is the model:
m1 <- lmer(repinterv ~ prevoutc * outcome * prevtask + (1|id), p1)
Repinterv is a continuous dependent variable, while three factors are binary, within-subjects. Each line of the data frame is a single experimental trial.
While I have a working line to compute a fit for each effect and interaction, I'm really struggling with the error bars.
p1$fit = model.matrix(m1) %*% fixef(m1) # fitted values
p1$fitse = model.matrix(m1) %*% coef(summary(m1))[,2] # standard errors
The first line here calculates the fitted value for each observation. I tried the same approach for the standard errors from the model's summary, but the problem is that while the fixed effects are expressed as differences from the intercept, the SEs are (as I understand it) absolute values. With this method I therefore get summed standard errors for each fitted value, instead of the actual values from coef(summary(m1)).
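For what it's worth, a hedged sketch of the usual fix (assuming the m1 and p1 objects defined above): the SE of each fitted value comes from the full variance-covariance matrix of the fixed effects, not from pushing the per-coefficient SEs through the design matrix.

```r
# Sketch (assumes m1 and p1 from above):
# Var(X %*% b) = X %*% V %*% t(X), where V = vcov(m1) is the
# covariance matrix of the fixed effects.
X <- model.matrix(m1)
V <- as.matrix(vcov(m1))
p1$fit <- as.vector(X %*% fixef(m1))
# rowSums((X %*% V) * X) gives the diagonal of X V X' without
# forming the full n-by-n matrix
p1$fitse <- sqrt(rowSums((X %*% V) * X))
```

This gives one SE per row of p1, on the same scale as the fitted values, so fit - fitse / fit + fitse can go straight into geom_errorbar.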
ggplot(p1, aes(x = outcome, y = fit, fill = prevoutc)) + # grouped bar plot
facet_wrap(~ prevtask, labeller = gridlab) +
stat_summary(fun.y=mean,geom="bar", position=position_dodge(0.9)) +
geom_errorbar(aes(ymin=fit-fitse, ymax=fit+fitse), width = 0.1, size =0.5, position = position_dodge(0.9))
Can you please advise whether I should use a different operator or another method to obtain the SEs for this model?
Edit:
Below are the coefficients of my model. I want to plot estimates and corresponding standard errors.
Estimate Std. Error t value
(Intercept) 335.69881 16.190304 20.734558
prevoutc1 10.74602 7.143445 1.504318
outcome1 37.36665 8.471898 4.410659
prevtask1 12.92135 7.330930 1.762580
prevoutc1:outcome1 -14.39956 9.338283 -1.541992
prevoutc1:prevtask1 17.37322 10.491121 1.655993
outcome1:prevtask1 -29.37134 9.957079 -2.949795
prevoutc1:outcome1:prevtask1 14.75692 13.539756 1.089896
And that's the plot I currently have:

Related

ROC for Logistic regression in R

I would like to ask for help with my project. My goal is to get ROC curve from existing logistic regression.
First of all, here is what I'm analyzing.
glm.fit <- glm(Severity_Binary ~ Side + State + Timezone + Temperature.F. + Wind_Chill.F. + Humidity... + Pressure.in. + Visibility.mi. + Wind_Direction + Wind_Speed.mph. + Precipitation.in. + Amenity + Bump + Crossing + Give_Way + Junction + No_Exit + Railway + Station + Stop + Traffic_Calming + Traffic_Signal + Sunrise_Sunset , data = train_data, family = binomial)
glm.probs <- predict(glm.fit,type = "response")
glm.probs = predict(glm.fit, newdata = test_data, type = "response")
glm.pred = ifelse(glm.probs > 0.5, "1", "0")
This part works fine, I am able to show a table of prediction and mean result. But here comes the problem for me, I'm using pROC library, but I am open to use anything else which you can help me with. I'm using test_data with approximately 975 rows, but variable proc has only 3 sensitivities/specificities values.
library(pROC)
proc <- roc(test_data$Severity_Binary,glm.probs)
test_data$sens <- proc$sensitivities[1:975]
test_data$spec <- proc$specificities[1:975]
ggplot(test_data, aes(x=spec, y=sens)) + geom_line()
Here's what I have as a result:
With Warning message:
Removed 972 row(s) containing missing values (geom_path).
As I found out, proc has only 3 values as I said.
You can't (and shouldn't) assign the sensitivity and specificity to the data. They are summary data and exist in a different dimension than your data.
Specifically, these two lines are wrong and make no sense at all:
test_data$sens <- proc$sensitivities[1:975]
test_data$spec <- proc$specificities[1:975]
Instead you must either save them to a new data.frame, or use some of the existing functions like ggroc:
ggroc(proc)
If you consider what the ROC curve does, there is no reason to expect it to have the same dimensions as your dataframe. It provides summary statistics of your model performance (sensitivity, specificity) evaluated on your dataset for different thresholds in your prediction.
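If you do want the curve in ggplot2 rather than via ggroc, a minimal sketch (assuming the proc object from above) is to put the ROC coordinates in their own data frame:

```r
# proc$sensitivities and proc$specificities have one entry per
# threshold, not per row of test_data, so they get their own frame
rocdf <- data.frame(sens = proc$sensitivities,
                    spec = proc$specificities)
ggplot(rocdf, aes(x = spec, y = sens)) +
  geom_line() +
  scale_x_reverse()  # conventional ROC orientation: specificity 1 -> 0
```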
Usually you would expect some more nuance on the curve (more than the 3 datapoints at thresholds -Inf, 0.5, Inf). You can look at the distribution of your glm.probs - this ROC curve indicates that all predictions are either 0 or 1, with very little in between (hence only one threshold at 0.5 on your curve). [This could also mean that you unintentionally used your binary glm.pred for calculating the ROC curve, and not glm.probs as shown in the question.]
This seems to be more an issue with your model than with your code - here is an example from a different dataset, using the same steps you took (glm(..., family = binomial) and predict(..., type = "response")). This produces a ROC curve with 333 steps for ~1300 datapoints.
PS: Ignore the fact that this is evaluated on training data; the point is that the code looks alright up to the point of generating the ROC curve.
m1 <- glm(survived ~ passengerClass + sex + age, data = dftitanic, family = binomial)
myroc <- roc(dftitanic$survived,predict(m1, dftitanic, type = "response"))
plot(myroc)

Interpretation of contour plots (mgcv)

When we plot a GAM model using the mgcv package with isotropic smoothers, we have a contour plot that looks something like this:
x axis for one predictor,
y axis for another predictor,
the main title shows the fitted function s(x1, x2) (an isotropic smoother).
Suppose that in this model we have many other isotropic smoothers like:
y ~ s(x1, x2) + s(x3, x4) + s(x5, x6)
My doubts are: when interpreting the contour plot for s(x1, x2), what happens to the other isotropic smoothers? Are they "fixed at their medians"? Can we interpret the s(x1, x2) plot separately?
Because this model is additive in the functions you can interpret the functions (the separate s() terms) separately, but not necessarily as separate effects of covariates on the response. In your case there is no overlap between the covariates in each of the bivariate smooths, so you can also interpret them as the effects of the covariates on the response separately from the other smoothers.
All of the smooth functions are typically subject to a sum to zero constraint to allow the model constant term (the intercept) to be an identifiable parameter. As such, the 0 line in each plot is the value of the model constant term (on the scale of the link function or linear predictor).
The plots shown in the output from plot.gam(model) are partial effects plots or partial plots. You can essentially ignore the other terms if you are interested in understanding the effect of that term on the response as a function of the covariates for the term.
If you have other terms in the model that include one or more covariates that also appear in another term, and you want to look at how the response changes as you vary that term or covariate, then you should predict from the model over the range of the variables you are interested in, whilst holding the other variables at some representative values, say their means or medians.
For example if you had
model <- gam(y ~ s(x, z) + s(x, v), data = foo, method = 'REML')
and you want to know how the response varied as a function of x only, you would fix z and v at representative values and then predict over a range of values for x:
newdf <- with(foo, expand.grid(x = seq(min(x), max(x), length = 100),
                               z = median(z),
                               v = median(v)))
newdf <- cbind(newdf, fit = predict(model, newdata = newdf, type = 'response'))
plot(fit ~ x, data = newdf, type = 'l')
Also, see ?vis.gam in the mgcv package as a means of preparing plots like this but where it does the hard work.
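For instance, a hedged sketch with the hypothetical model above - vis.gam builds the prediction grid and holds the remaining covariates fixed for you:

```r
# view picks the two covariates to vary; cond fixes the others
# (model and foo are the hypothetical objects from the example above)
vis.gam(model, view = c("x", "z"),
        cond = list(v = median(foo$v)),
        plot.type = "contour")
```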

How do I plot predicted probabilities for a Logit regression with fixed effects in R?

I am a complete newbie to R.
I have the following logit equation I am estimating:
allAM <- glm (AM ~ VS + Prom + LS_Exp + Sex + Age + Age2 + Jpart + X2004LS + X2009LS + X2014LS + factor(State), family = binomial(link = "logit"), data = mydata)
AM is a standard binary (happened/didn’t happen). The three “X****LS” variables are dummies indicating different sessions of congress and “factor(State)” is used to generate fixed effects/dummies for each state.
VS is the key independent variable of interest and I want to generate the predicted probability that AM=1 for each value of VS between 0 and 60, holding everything else at its mean.
I am running into trouble, however, generating and plotting the predicted probabilities because “State” is a factor. I want to be able to show the average effects, not 50 different charts/effects for each state.
Per (Hanmer and Kalkan 2013) http://onlinelibrary.wiley.com/doi/10.1111/j.1540-5907.2012.00602.x/abstract I was advised to do the following to plot the predicted probabilities:
pred.seq <- seq(from=0, to=60, by=0.01)
pred.out <- c()
for(i in 1:length(pred.seq)){
  mydata.c <- mydata
  mydata.c$VS <- pred.seq[i]
  pred.out[i] <- mean(predict(allAM, newdata=mydata.c, type="response"))
}
plot(pred.out ~ pred.seq, type="l")
This approach seems to work, though I don’t really understand it.
I want to add the upper and lower 95% confidence intervals to the plot, but when I attempt to do it by hand the way I know how:
lower <- pred.out$fit - (1.96*pred.out$se.fit)
upper <- pred.out$fit + (1.96*pred.out$se.fit)
I get the following error:
Error in pred.out$fit : $ operator is invalid for atomic vectors
Can anyone advise how I can plot the confidence intervals and how I can specify different levels of VS so that I can report some specific predicted probabilities?
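A hedged sketch of one possible approach (an assumption on my part, not necessarily what Hanmer and Kalkan recommend): the error above occurs because pred.out is a plain numeric vector, not a list with $fit and $se.fit. predict(..., se.fit = TRUE) does return such a list, so one option is to build the interval on the link scale inside the averaging loop and back-transform:

```r
# Assumes allAM and mydata from the question; a coarser grid keeps
# the loop fast
pred.seq <- seq(from = 0, to = 60, by = 0.5)
pred.out <- lower <- upper <- numeric(length(pred.seq))
for (i in seq_along(pred.seq)) {
  mydata.c <- mydata
  mydata.c$VS <- pred.seq[i]
  pr <- predict(allAM, newdata = mydata.c, type = "link", se.fit = TRUE)
  # 95% interval on the link scale, then back-transform with plogis
  pred.out[i] <- mean(plogis(pr$fit))
  lower[i]    <- mean(plogis(pr$fit - 1.96 * pr$se.fit))
  upper[i]    <- mean(plogis(pr$fit + 1.96 * pr$se.fit))
}
plot(pred.out ~ pred.seq, type = "l", ylim = range(lower, upper))
lines(lower ~ pred.seq, lty = 2)
lines(upper ~ pred.seq, lty = 2)
```

Specific predicted probabilities at chosen VS values can then be read off with, e.g., pred.out[pred.seq == 30].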

Continuous quantiles of a scatterplot

I have a data set, for which I graphed a regression (using ggplot2's stat_smooth) :
ggplot(data = mydf, aes(x=time, y=pdm)) + geom_point() + stat_smooth(col="red")
I'd also like to have the quantiles (if it's simpler, having only the quartiles will do) using the same method. All I manage to get is the following:
ggplot(data = mydf, aes(x=time, y=pdm, z=surface)) + geom_point() + stat_smooth(col="red") + stat_quantile(quantiles = c(0.25,0.75))
Unfortunately, I can't put method="loess" in stat_quantile(), which, if I'm not mistaken, would solve my problem.
(In case it's not clear, the desired behavior is non-linear regressions for the quantiles: the Q25 and Q75 curves should lie below and above (respectively) my red curve, and Q50, if plotted, would coincide with the red curve.)
Thanks
stat_quantile is, by default, plotting best-fit lines for the 25th and 75th percentiles at each x-value. stat_quantile uses the rq function from the quantreg package (implicitly, method="rq" in the stat_quantile call). As far as I know, rq doesn't do loess regression. However, you can use other flexible functions for the quantile regression. Here are two examples:
B-Spline:
library(splines)
stat_quantile(formula=y ~ bs(x, df=4), quantiles = c(0.25,0.75))
Second-Order Polynomial:
stat_quantile(formula=y ~ poly(x, 2), quantiles = c(0.25,0.75))
stat_quantile is still using rq, but rq accepts formulas of the type listed above (if you don't supply a formula, then stat_quantile is implicitly using formula=y~x). If you use the same formula in geom_smooth as for stat_quantile, you'll have consistent regression methods being used for the quantiles and for the mean expectation.
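Putting this together with the question's variables (a sketch; mydf, time, and pdm are as in the question):

```r
library(ggplot2)
library(splines)  # for bs()

# Same flexible formula for the mean curve and the quantile curves,
# so the Q25/Q75 fits bracket the central fit consistently
ggplot(mydf, aes(x = time, y = pdm)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ bs(x, df = 4), col = "red") +
  stat_quantile(formula = y ~ bs(x, df = 4), quantiles = c(0.25, 0.75))
```

Here geom_smooth with a spline formula stands in for stat_smooth's default loess, so the central curve and the quantile curves use the same family of fits.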

Display single coefficient plots in quantile regressions?

I am plotting regression summaries for a quantile regression I did with quantreg.
Obviously the method plot.summary.rqs is in use here. The problem is that I use quite a few explanatory variables, each of which is displayed in the plot. Most of the coefficients do not behave significantly differently from OLS, so I just want to pick out and display a few of them.
How can I select the plots that I need to show? I am using knitr for my reports but do not want to show dozens of variables (and you get there quickly using dummies). Is there a way to cherry pick?
By default, plot.summary.rqs plots all coefficients:
library(quantreg)
data(stackloss)
myrq <- rq(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., tau = seq(0.35, 0.65, 0.1), data=stackloss)
plot(summary(myrq)) # Plots all 4 coefficients
To cherry pick coefficients, the parm argument can be used:
plot(summary(myrq), parm = 2) # Plot only second regressor (Air.Flow)
plot(summary(myrq), parm = "Water.Temp") # Plot only Water.Temp
plot(summary(myrq), parm = 3:4) # Plot third and fourth regressor