Obtain residual diagnostic plots from rms package ols() function

How do you obtain residual diagnostic plots from an ols() object? Normally, if using glm() or lm(), I'd just do plot(lm()), but plot(ols()) gives an error.
My code is:
fit <- ols(y ~ rcs(x1,4)*x2, data=data, x=TRUE, y=TRUE)
plot(fit)
The error message I receive is
Error in match.arg(type) :
'arg' should be one of “ordinary”, “score”, “dfbeta”, “dfbetas”, “dffit”, “dffits”, “hat”, “hscore”

Laboriously (but flexibly), you can compute residuals and fitted values (using resid() and fitted()), bind them into your data frame, and then use a plotting package like ggplot2 or lattice to create the plots yourself. Harrell gives examples at the bottom of p. 153 of the 2nd edition of his book, which describes the use of this package in detail.
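A minimal sketch of that manual route, using base graphics rather than ggplot2 or lattice. The simulated data frame d and its columns are hypothetical stand-ins for the question's data; resid() on an ols fit returns ordinary residuals by default.

```r
library(rms)

# Hypothetical simulated data standing in for the question's `data`
set.seed(1)
d <- data.frame(x1 = runif(200, 0, 10), x2 = rnorm(200))
d$y <- sin(d$x1) + 0.5 * d$x2 + rnorm(200, sd = 0.3)

fit <- ols(y ~ rcs(x1, 4) * x2, data = d, x = TRUE, y = TRUE)

# Bind residuals and fitted values into the data frame
d$resid  <- resid(fit)   # ordinary residuals (first/default type for ols)
d$fitted <- fitted(fit)

op <- par(mfrow = c(1, 2))
# Residuals vs fitted, with a zero reference line and a lowess smooth
plot(d$fitted, d$resid, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(lowess(d$fitted, d$resid), col = "red")
# Normal Q-Q plot of the residuals
qqnorm(d$resid)
qqline(d$resid)
par(op)
```

Once the columns are in the data frame, any other plot (residuals against x1, against x2, by group, etc.) is just another call on d.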
As a quick and dirty alternative, you can fit a version using conventional functions (e.g. lm()) and plot() will return the usual diagnostic plots. Things like rcs() will work with many of the base fitting functions.
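A sketch of the quick alternative, assuming rms is installed only for rcs(); the simulated data are hypothetical. Because the fit is a plain "lm" object, plot() dispatches to the usual four-panel diagnostics.

```r
library(rms)  # used here only for rcs()

# Hypothetical simulated data
set.seed(1)
d <- data.frame(x1 = runif(200, 0, 10), x2 = rnorm(200))
d$y <- sin(d$x1) + 0.5 * d$x2 + rnorm(200, sd = 0.3)

# Same model specification, fitted with base lm() instead of ols()
fit_lm <- lm(y ~ rcs(x1, 4) * x2, data = d)

op <- par(mfrow = c(2, 2))
plot(fit_lm)   # residuals vs fitted, Q-Q, scale-location, residuals vs leverage
par(op)
```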

The ols object does inherit from the lm object,
class(rcsLogFit)
## [1] "ols" "rms" "lm"
but I was unable to get stats:::plot.lm() to work without the error noted here.
A quick workaround is to make a copy that forgets its rms-specific classes:
rcsLogCopy <- rcsLogFit
class(rcsLogCopy) <- "lm"
and then
plot(rcsLogCopy)
works fine. I suppose you could do this directly on the original instead.

Related

plot() does not show all diagnostic plots for lme/lmer

When using the lme/lmer function, I cannot get R to display all 4 diagnostic plots (res vs fit, normal-QQ, scale-location, res vs leverage) with par(mfrow=c(2,2)) and plot().
I just get the res vs fit plot and nothing else.
I have no problem when using the lm function.
Does anybody know how to do this?
library(lme4)
m0<-lmer(hematology~Treatment*day+Gender+(1|ID),data=long,na.action=na.omit,REML=FALSE)
par(mfrow=c(2,2))
plot(m0)
tl;dr ?plot.merMod explains in quite a bit of detail how the plotting methods work for fits produced by [g]lmer ...
You can get at least the first three plots corresponding to plot.lm fairly easily:
fitted vs residual with smooth line added
plot(lmer_model, type=c("p","smooth"), col.line=1)
(it's harder to get the smooth and the zero line drawn in different colours)
scale-location plot
plot(lmer_model,
sqrt(abs(resid(.)))~fitted(.),
type=c("p","smooth"), col.line=1)
Q-Q plot
lattice::qqmath(lmer_model)
residuals vs leverage
plot(lmer_model, rstudent(.) ~ hatvalues(.))
(the Cook's distances can be computed via cooks.distance() but superimposing the contours of CD={0.5,1} isn't so easy ...)
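If you specifically want the 2x2 base-graphics layout from the question, a rough sketch is to extract the pieces yourself. This uses raw (response-scale) residuals, whereas plot.lm uses standardized residuals for its scale-location panel, so the panels are analogues rather than exact replicas; the sleepstudy model is a stand-in for the question's model.

```r
library(lme4)

m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
r <- resid(m)      # raw residuals
f <- fitted(m)
h <- hatvalues(m)  # lme4 provides a hatvalues() method for merMod fits

op <- par(mfrow = c(2, 2))
plot(f, r, xlab = "Fitted", ylab = "Residuals", main = "Residuals vs fitted")
abline(h = 0, lty = 2)
lines(lowess(f, r), col = "red")
qqnorm(r, main = "Normal Q-Q")
qqline(r)
plot(f, sqrt(abs(r)), xlab = "Fitted", ylab = "sqrt(|residuals|)",
     main = "Scale-location")
lines(lowess(f, sqrt(abs(r))), col = "red")
plot(h, r, xlab = "Leverage", ylab = "Residuals", main = "Residuals vs leverage")
abline(h = 0, lty = 2)
par(op)
```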
historical note
The design and implementation of lme4 diagnostic plot methods differ from plot.lm, which is the canonical example in base R. Why? I don't know for sure, but this approach is derived from the nlme package, which predates R. The earliest version I could find is this page from the Wayback Machine (1998), which links to a copy of the user's guide for version 1.2, dated February 1995; that's three months before the first source-code release of R (via ftp) in June 1995. The main differences are:
it uses lattice (derived from Trellis™ graphics) rather than base-R graphics
although it doesn't automatically construct e.g. scale-location plots, it is more flexible. You can use formulas to show fitted or residual values vs parameters, facet, etc., e.g. plot(fm1,residuals(.)~Days|Subject)
there are separate commands for plotting residuals etc. (plot) and Q-Q plots (qqnorm in nlme, qqmath in lme4)
I know that this is a 2-year-old question, but I was having the same issue (September 2022), and then I found Panel of Diagnostic Residual Plots and redres:
resid_panel(mod1, smoother = TRUE, qqbands = TRUE)
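The answer does not show how mod1 was created. A self-contained sketch, assuming resid_panel() comes from the ggResidpanel package and using an lmer fit on lme4's sleepstudy data as a stand-in for mod1:

```r
library(lme4)
library(ggResidpanel)

# Stand-in for the answer's mod1
mod1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Panel of residual diagnostics with a smoother and Q-Q confidence bands
resid_panel(mod1, smoother = TRUE, qqbands = TRUE)
```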
As long as we're adding answers, the performance package is now available:
library(lme4)
library(performance)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
check_model(fm1)
In R, plot is a generic function. This means that when you call plot, R examines the class of the object you have passed to the first argument and chooses the plotting method according to this class.
Let's take an example. Suppose I use the lm function to create a model. The resulting model object will have class "lm":
lm_model <- lm(Sepal.Length ~ Sepal.Width, data = iris)
class(lm_model)
#> [1] "lm"
That means that when I call plot(lm_model), R will see that I am calling plot on an object of class lm. Instead of trying to construct a basic xy plot as it would if I did plot(1:10), R now knows to call a plotting method that has been specifically written to plot objects of type "lm". In this case, it will dispatch the method stats:::plot.lm, which is a long function that takes the "lm" object and creates the 4 diagnostic plots.
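A small sketch of the dispatch mechanism itself, using made-up classes "toyfit" and "childfit" (hypothetical, for illustration only). It also shows that when an object has several classes, R tries them in order and uses the first one with a method, which is how an object like an ols fit with class c("ols", "rms", "lm") can fall through to lm methods.

```r
# Hypothetical class "toyfit" with its own plot method
plot.toyfit <- function(x, ...) {
  message("plot.toyfit() was called")
  invisible("toyfit method")
}

obj  <- structure(list(), class = "toyfit")
res1 <- plot(obj)   # R finds plot.toyfit() from class(obj)

# No plot.childfit() exists, so R moves on to the next class, "toyfit"
obj2 <- structure(list(), class = c("childfit", "toyfit"))
res2 <- plot(obj2)

# The method that plot(lm_model) dispatches to
plot_lm_method <- utils::getS3method("plot", "lm")
```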
Now let's see what we get when we create a model with lmer:
library(lme4)
lmer_model <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
class(lmer_model)
#> [1] "lmerMod"
#> attr(,"package")
#> [1] "lme4"
Our model is an object of type "lmerMod". When we call plot on this object, R looks up the correct method to plot an object of this class. Since it has a completely different structure from an object of class "lm", it wouldn't make sense to plot it with plot.lm, so the authors who created the lme4 package had to decide what the best way to plot an object of class "lmerMod" was. They wrote the method lme4:::plot.merMod, which draws the single plot you see when you call plot on your model.
Why is this? That's one for the authors to answer, but it seems the main reason is that they wanted a plot method that would cover GLMM, LMM and REML models. The diagnostic plots for lm don't make sense for all of these model types.
So the short answer is that there is no problem to "solve" as such; this is just not how "lmerMod" objects are plotted. If you have specific concerns about some aspects of your fit that can be answered by these diagnostic plots, you should examine these individually.

PCA in R using the caret package vs prcomp PCA

I have a dataframe data with more than 50 variables and I am trying to do a PCA in R using the caret package.
library(caret)
library(e1071)
trans <- preProcess(data,method=c("YeoJohnson", "center","scale", "pca"))
If I understand this code correctly, it applies a Yeo-Johnson transformation (because data has zeros in it), standardises data and then applies PCA (by default, the function keeps only the PCs that are necessary to explain at least 95% of the variability in the data).
However, when I use the prcomp command,
model<-prcomp(data,scale=TRUE)
I can get more outputs, like printing the summary or doing plot(model, type = "l"), which I am not able to do with trans. Does anyone know if there are any functions in the caret package producing the same outputs as prcomp?
You can access the principal components themselves with the predict function.
df <- predict(trans, data)
summary(df)
You won't have exactly the same output as with prcomp: while caret uses prcomp(), it discards the original prcomp class object and does not return it.
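If you want the prcomp object itself, one workaround is to let caret do only the transformations and then call prcomp() yourself. A sketch, with a hypothetical count matrix standing in for data; assuming caret's "pca" step applies prcomp() to the same transformed data, the components should correspond up to sign, but this version keeps all of them rather than applying the 95% threshold.

```r
library(caret)

# Hypothetical data: 100 rows of non-negative counts, some zeros
set.seed(42)
dat <- as.data.frame(matrix(rpois(600, lambda = 2), ncol = 6))

# Same preprocessing as in the question, minus the "pca" step
pp    <- preProcess(dat, method = c("YeoJohnson", "center", "scale"))
dat_t <- predict(pp, dat)

# PCA on the already centred and scaled data, keeping the prcomp object
pca <- prcomp(dat_t, center = FALSE, scale. = FALSE)
summary(pca)           # proportion of variance per component
plot(pca, type = "l")  # scree plot
```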

Creating a Function from svyglm

I fit the following glm model using the survey package:
design <- svydesign(ids = ~name, data = training.data)
significant.model <- svyglm(Win~x+ y + start+ speed+ vx0 + vy0 + ay + az + length+ rate+ height+ hand+ zone+ count, design=design, family=quasibinomial, data=training.data)
I have a set of test data that I excluded from the model fitting process so that I would be able to see how the model predicts the outcomes for the test data and examine the difference.
Typically, I would use makeFun in the mosaic package, but this does not support objects of type svyglm. Is there another function or method that I can use to create a function for the model?
There are a lot of categorical variables with multiple levels, so writing a user-defined function is not ideal in this situation.
I'm not sure what difficulty you were experiencing since your example is not reproducible. But since an svyglm object is a glm object, makeFun() will create a wrapper around predict() just as it would do for any glm object. This has not been tested extensively, but it seems to work in the following example:
library(survey)
library(mosaic)
example(svyglm)
f <- makeFun(api.reg)
f(enroll = 500)
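If makeFun() still gives trouble, a hand-rolled wrapper around predict() does the same job for held-out data. A sketch based on the stratified design from the survey package's own examples; predict_prob() is a hypothetical helper name, and the newdata values are made up for illustration.

```r
library(survey)
data(api)

# Stratified design from the survey package examples
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)
fit <- svyglm(sch.wide ~ ell + meals, design = dstrat,
              family = quasibinomial())

# Hypothetical helper: predicted probabilities for new (test) data
predict_prob <- function(newdata) {
  as.numeric(predict(fit, newdata = newdata, type = "response"))
}

probs <- predict_prob(data.frame(ell = c(10, 30, 50), meals = c(20, 50, 80)))
```

Factor predictors with many levels need no special handling here, since predict() builds the model matrix from the fitted terms.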

Forest plot from logistf

I have fitted a few penalized logistic regression models in R using the logistf package. However, I wish to draw forest plots for them.
The sjPlot package : http://www.strengejacke.de/sjPlot/custplot/
provides an excellent function for glm output, but none for logistf fits.
Any assistance?
The logistf objects differ in their structure from glm objects, but not too much. I've added support for logistf-fitted models; however, 1) model summaries can't be printed and 2) predicted probability plots are currently not supported for logistf models.
I'll update the code on GitHub tonight, so you can try the updated sjp.glm function...
library(sjPlot)
library(logistf)
data(sex2)
fit<-logistf(case ~ age+oc+vic+vicl+vis+dia, data=sex2)
# for this example, axisLimits need to be specified manually
sjp.glm(fit, axisLimits = c(0.05, 25), transformTicks = TRUE)
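If sjp.glm() is not an option in your sjPlot version, a basic forest plot can be drawn with base graphics from the fit's own components. This sketch assumes the coefficients, ci.lower and ci.upper components documented for logistf objects (profile penalized-likelihood limits by default).

```r
library(logistf)
data(sex2)
fit <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)

# Odds ratios with confidence limits, intercept dropped
est <- data.frame(term = names(fit$coefficients),
                  or   = exp(fit$coefficients),
                  lo   = exp(fit$ci.lower),
                  hi   = exp(fit$ci.upper))[-1, ]
y <- seq_len(nrow(est))

# Forest plot on a log-scaled x axis, reference line at OR = 1
plot(est$or, y, log = "x", xlim = range(est$lo, est$hi), pch = 19,
     yaxt = "n", xlab = "Odds ratio (log scale)", ylab = "")
segments(est$lo, y, est$hi, y)
axis(2, at = y, labels = est$term, las = 1)
abline(v = 1, lty = 2)
```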

How to do a Tukey HSD test with the Anova command (car package)

I'm dealing with an unbalanced design/sample and originally learned aov(). I know now that for my ANOVA tests I need to use Type III sums of squares, which involves fitting with lm() rather than aov().
The problem is getting post-hoc tests (specifically Tukey's HSD) using lm(). All the research I've done says that using simint in the multcomp package would work, but now that the package has been updated, that command no longer seems to be available. It also seems to rely on going through aov() for the calculation.
Essentially all of the Tukey HSD tests I've found for R assume that you use aov() for the comparison rather than lm(). To get the Type III Sum of Squares I need for the unbalanced design I have to use:
library(car)  # provides Anova()
mod<-lm(Snavg~StudentEthnicity*StudentGender)
Anova(mod, type="III")
How do I use a Tukey HSD test with my mod using lm()? Or conversely, calculate my ANOVA using Type III and still be able to run a Tukey HSD test?
Thanks!
Try HSD.test in agricolae
library(agricolae)
data(sweetpotato)
model<-lm(yield~virus, data=sweetpotato)
comparison <- HSD.test(model,"virus", group=TRUE,
main="Yield of sweetpotato\nDealt with different virus")
Output
Study: Yield of sweetpotato
Dealt with different virus
HSD Test for yield
Mean Square Error: 22.48917
virus, means
yield std.err replication
cc 24.40000 2.084067 3
fc 12.86667 1.246774 3
ff 36.33333 4.233727 3
oo 36.90000 2.482606 3
alpha: 0.05 ; Df Error: 8
Critical Value of Studentized Range: 4.52881
Honestly Significant Difference: 12.39967
Means with the same letter are not significantly different.
Groups, Treatments and means
a oo 36.9
ab ff 36.33333
bc cc 24.4
c fc 12.86667
As an initial note, unless it's been changed, to get the correct results for Type III sums of squares you need to set the contrast coding for the factor variables. This can be done inside the lm call or with options. The example below uses options.
I would be cautious about using HSD.test and similar functions with unbalanced designs unless the documentation addresses their use in these situations. The documentation for TukeyHSD mentions that it adjusts for "mildly unbalanced" designs. I don't know if HSD.test handles things differently. You'd have to check additional documentation for the package or the original reference cited for the function.
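For comparison, base R's TukeyHSD() works from an aov() fit. A minimal sketch on mtcars, where the cylinder groups have 11, 7 and 14 cars, so the one-way layout is unbalanced; whether that counts as "mildly" unbalanced for your own data is a judgement you would still need to make.

```r
# One-way layout: 11, 7 and 14 cars with 4, 6 and 8 cylinders
mtcars$cyl.f <- factor(mtcars$cyl)
fit_aov <- aov(mpg ~ cyl.f, data = mtcars)

# All pairwise differences with Tukey-adjusted intervals and p-values
tk <- TukeyHSD(fit_aov, "cyl.f", conf.level = 0.95)
tk
```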
As a side note, enclosing the whole HSD.test function in parentheses will cause it to print the results. See example below.
In general, I would recommend using the flexible emmeans (née lsmeans) or multcomp packages for all your post-hoc comparison needs. emmeans is particularly useful for doing mean separations on interactions or for examining contrasts among treatments. [EDIT: Caveat that I am the author of these pages.]
With an unbalanced design, you may want to report the E.M. (or L.S.) means instead of the arithmetic means. See SAEPER: What are least square means?. [EDIT: Caveat that I am the author of this page.] Note in the example below that the marginal means reported by emmeans are different from those reported by HSD.test.
Also note that the "Tukey" in glht has nothing to do with Tukey HSD or Tukey-adjusted comparisons; it just sets up the contrasts for all pairwise tests, as the output says.
However, the adjust="tukey" in emmeans functions does mean to use Tukey-adjusted comparisons, as the output says.
The following example is partially adapted from ARCHBS: One-way Anova.
### EDIT: Some code changed to reflect changes to some functions
### in the emmeans package
if(!require(car)){install.packages("car")}
library(car)
data(mtcars)
mtcars$cyl.f = factor(mtcars$cyl)
mtcars$carb.f = factor(mtcars$carb)
options(contrasts = c("contr.sum", "contr.poly"))
model = lm(mpg ~ cyl.f + carb.f, data=mtcars)
library(car)
Anova(model, type="III")
if(!require(agricolae)){install.packages("agricolae")}
library(agricolae)
(HSD.test(model, "cyl.f")$groups)
if(!require(emmeans)){install.packages("emmeans")}
library(emmeans)
marginal = emmeans(model,
~ cyl.f)
pairs(marginal, adjust="tukey")
if(!require(multcomp)){install.packages("multcomp")}
library(multcomp)
cld(marginal, adjust="tukey", Letters=letters)
if(!require(multcomp)){install.packages("multcomp")}
library(multcomp)
mc = glht(model,
mcp(cyl.f = "Tukey"))
summary(mc, test=adjusted("single-step"))
cld(mc)
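The example above sets sum-to-zero contrasts globally with options(). The other route mentioned at the start, setting them inside the lm() call, only affects the named factors. A sketch using the same mtcars model (model2 is just a new name to avoid overwriting model):

```r
library(car)

mtcars$cyl.f  <- factor(mtcars$cyl)
mtcars$carb.f <- factor(mtcars$carb)

# Sum-to-zero contrasts for these factors only, no global options() change
model2 <- lm(mpg ~ cyl.f + carb.f, data = mtcars,
             contrasts = list(cyl.f = "contr.sum", carb.f = "contr.sum"))
Anova(model2, type = "III")
```

Under contr.sum the coefficients are labelled cyl.f1, cyl.f2, ... instead of by level (cyl.f6, cyl.f8), which is a quick way to check that the coding took effect.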
I also found HSD.test() to be very particular about the way you have built the lm() or aov() model that you pass to it.
There was no output from HSD.test() with my data when I had coded the lm() model like this:
model<-lm(sweetpotato$yield ~ sweetpotato$virus)
out <- HSD.test(model,"virus", group=TRUE, console=TRUE)
Output was only:
Name: virus
sweetpotato$virus
The output was equally bad when using the same logic for aov()
model<-aov(sweetpotato$yield ~ sweetpotato$virus)
To get output from HSD.test(), the lm() model (or the aov() model, if you use aov()) must be constructed strictly following the formula-plus-data pattern in the MYaseen208 answer:
model <- lm(yield~virus, data=sweetpotato)
Hope this helps someone who's not getting a proper output from HSD.test().
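One plausible explanation, offered as an assumption rather than something confirmed from the agricolae source: HSD.test() looks up the treatment by name, and with $-style formulas the model term is called sweetpotato$virus rather than virus (which matches the "Name: virus / sweetpotato$virus" output above). The term labels show the difference; the small data frame d is hypothetical toy data, not the sweetpotato values.

```r
# Hypothetical toy data
d <- data.frame(yield = c(28.5, 21.7, 23.0, 14.9, 10.6, 13.1),
                virus = factor(rep(c("cc", "fc"), each = 3)))

m_bad  <- lm(d$yield ~ d$virus)        # variables referenced with $
m_good <- lm(yield ~ virus, data = d)  # formula plus data argument

attr(terms(m_bad), "term.labels")   # "d$virus"
attr(terms(m_good), "term.labels")  # "virus"
```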
I was stuck with the same problem of HSD.test() printing nothing. You need to put console=TRUE inside the function call so that it prints the results automatically.
For example:
HSD.test(alturacrit.anova, "fator", console=TRUE)
Hope it helps!
