Custom plots using the effects package - r

I am trying to customize the multiline plots from the effects package.
Is there any way to position the legend in the example below inside the plotting area rather than above the graph?
Alternatively, does anyone know how to plot the results of the multiline regressions calculated by the effects package using ggplot2?
I appreciate any help.
Andy
Example:
library(effects)
data(Prestige)
mod5 <- lm(prestige ~ income*type + education, data=Prestige)
eff_cf <- effect("income*type", mod5)
print(plot(eff_cf, multiline=TRUE))

This is how you can plot an effect object with ggplot2:
library(ggplot2)
## Change effect object to dataframe
eff_df <- data.frame(eff_cf)
## Plot ggplot with legend on the bottom
ggplot(eff_df)+geom_line(aes(income,fit,linetype=type))+theme_bw()+
xlab("Income")+ylab("Prestige")+coord_cartesian(xlim=c(0,25000),ylim=c(30,110))+
theme(legend.position="bottom")
You can change xlim and ylim depending on how you want to display your data.
The output is as follows:
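For the original question of placing the legend inside the plotting area, a minimal ggplot2 sketch follows. It assumes that as.data.frame() on the effect object returns columns named fit, lower and upper, and the legend coordinates are purely illustrative. ggplot2 3.5.0 and later spell an inside position as legend.position = "inside" plus legend.position.inside, while older versions take a numeric vector, so the sketch branches on the installed version.

```r
library(effects)   # loads carData, which provides Prestige
library(ggplot2)

mod5   <- lm(prestige ~ income * type + education, data = Prestige)
eff_df <- as.data.frame(effect("income*type", mod5))

# Legend coordinates are relative to the plot area (0 = left/bottom, 1 = right/top).
legend_inside <- if (packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside", legend.position.inside = c(0.98, 0.02))
} else {
  theme(legend.position = c(0.98, 0.02))
}

p <- ggplot(eff_df, aes(income, fit, linetype = type)) +
  # Confidence band from the lower/upper columns of the effect data frame
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.15, show.legend = FALSE) +
  geom_line() +
  labs(x = "Income", y = "Prestige") +
  theme_bw() +
  legend_inside +
  theme(legend.justification = c(1, 0))   # anchor the legend by its bottom-right corner
print(p)
```

Because legend.justification is c(1, 0), the coordinates refer to the bottom-right corner of the legend box, which keeps it from spilling outside the panel.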

From ?xyplot you read:
Alternatively, the key can be positioned inside the plot region by
specifying components x, y and corner. x and y determine the location
of the corner of the key given by corner, which is usually one of
c(0,0), c(1,0), c(1,1) and c(0,1), which denote the corners of the
unit square.
and from ?plot.eff you read
key.args additional arguments to be passed to the key trellis
argument to xyplot or densityplot, e.g., to position the key (legend)
in the plotting region.
So for example you can do the following:
plot(eff_cf, multiline=TRUE,
key.args=list(x=0.2,y=0.9,corner=c(x=1, y=1)))
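The same x, y and corner components can be tried directly in lattice, which may help when experimenting with positions. This is only a sketch on the built-in iris data (not the Prestige model), and the coordinates are illustrative.

```r
library(lattice)

# corner = c(0, 1) selects the key's top-left corner; x and y then place that
# corner 5% from the left and 95% from the bottom of the plot region.
p <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
            type = c("p", "r"),
            auto.key = list(x = 0.05, y = 0.95, corner = c(0, 1),
                            points = TRUE, lines = TRUE))
print(p)
```

Changing corner to c(1, 0) and using x = 0.95, y = 0.05 would anchor the key by its bottom-right corner instead.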

Based on Ruben's answer, you can try the following:
library(sjPlot)
sjp.int(mod5, type = "eff", swapPredictors = T)
which will reproduce the plot with ggplot, and sjp.int also returns the plot object for further customization. However, you can also set certain legend-parameters with the sjPlot-package:
sjp.setTheme(legend.pos = "bottom right",
legend.inside = T)
sjp.int(mod5, type = "eff", swapPredictors = T)
which gives you the following plot:
See sjPlot-manual for examples on how to customize plot-appearance and legend-position/size etc.
For plotting estimates of your model as forest plot, or marginal effects of all model terms, see ?sjp.lm in the sjPlot-package, or you may even try out the latest features in my package from GitHub.

@Tom Wenseleers
You can use sjPlot::sjp.int with type='eff' for this.
However, it won't give you rug plots or raw data points yet.
mod5 <- lm(prestige ~ type * income + education, data=Prestige)
library(sjPlot)
sjp.int(mod5,showCI = T, type = 'eff')
There's an argument partial.residuals = T to the effect() function.
This gives you fitted values, partial.residuals.raw and partial.residuals.adjusted.
I suppose you could merge that data on the original dataset and then plot smooths by group, but I ran into some difficulties early on (e.g. na.action=na.exclude is not respected).

Related

Editing the appearance of the confidence intervals in a marginal effects plot

I'm producing a series of marginal effects plots from a logistic regression, using plot_model. I would like to change the appearance of the confidence intervals in the plot below, but I can't figure out a way to do it. I assume this would be through editing the ggplot theme?
Ideally, I would like to be able to make the parallel bars smaller or remove them entirely, change the line thickness, etc. If you could point me in the right direction that would be very helpful.
library(sjPlot)
mtcars$am <- factor(mtcars$am)
m <- glm(vs ~ am, mtcars, family = 'binomial')
plot_model(m, type = "pred", terms = "am")
Output:
I'm new to ggplot2, so sorry if there is a simple answer to this!
Thanks
plot_model produces a ggplot object. The problem with using extension packages like sjPlot is that one has to sacrifice some of one's ability to customize a plot in return for ease of use.
It is possible to alter a ggplot after it has been created, but it does require altering the layer specifications of the plot. This isn't too difficult if you know where to look, but for a relatively new user it can be quite intimidating.
First, store your plot:
p <- plot_model(m, type = "pred", terms = "am")
Now, if we want to change the size of the parallel bars, we can do:
p$layers[[2]]$geom_params$width <- 0.01
(Obviously to get rid of them completely set it to 0 instead of 0.01)
To change the thickness of the lines, we can do:
p$layers[[2]]$aes_params$size <- 1.4
And to change the color of the lines we do:
p$layers[[2]]$aes_params$colour <- 'deepskyblue4'
It will also look better to have the points in front of the lines rather than behind them, so we can copy the back layer to the front like this:
p$layers[[3]] <- p$layers[[1]]
That leaves us with the following plot:
p
However, we can still add scales, coords and themes to this plot to customize it, so for example, we might wish to do:
p +
theme_minimal(base_size = 20) +
coord_cartesian() +
theme(aspect.ratio = 1.5,
plot.title = element_text(hjust = 0.5),
plot.title.position = 'plot')
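The layer indices used above are specific to what plot_model happens to return, so it can help to inspect the layers before editing them. The sketch below uses a plain ggplot2 plot with hypothetical summary data (d, est, lo and hi are made up for illustration) and looks up the error-bar layer by its geom class rather than by position. Note that geom_params and aes_params are ggplot2 internals rather than a documented API, so behaviour may differ between versions.

```r
library(ggplot2)

# Hypothetical summary data standing in for model predictions
d <- data.frame(group = c("a", "b"), est = c(0.3, 0.6),
                lo = c(0.1, 0.4), hi = c(0.5, 0.8))

p <- ggplot(d, aes(group, est)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lo, ymax = hi))

# Name of the geom used by each layer, in drawing order
geoms <- vapply(p$layers, function(l) class(l$geom)[1], character(1))
print(geoms)

# Find the error-bar layer by class rather than by position
idx <- which(geoms == "GeomErrorbar")
p$layers[[idx]]$geom_params$width <- 0.1             # request narrower end caps
p$layers[[idx]]$aes_params$colour <- "deepskyblue4"  # fixed colour for the bars
print(p)
```

Layers are reference objects, so these assignments modify the stored plot in place. Rebuild the plot if you need the original back.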

When using GAM in R, why is the plot for two continuous variables different from the first one plotted with two continuous by categorical?

I'm using GAM in R and I can't understand why the outputs of two different equations that should give the same plot are not exactly the same.
For example, when using the mpg dataset with a multivariate equation as follows, I get the plot for the additive effect of weight and rpm on hw.mpg. Then I want to see what happens when I plot the rpm data by fuel type. This gives me 3 plots, and I expected the first one (weight) to be exactly the same as the one plotted previously without the "by fuel" differentiation. Am I missing something? And what is graph 1 in figure 2 showing?
To get figure 1:
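library(mgcv)  # provides gam()
# Assumption: mpg is the version shipped with the gamair package (it has hw.mpg, weight, rpm, fuel)
data(mpg, package = "gamair")
par(mfrow=c(1,2))
mod_hwy1 <- gam(hw.mpg ~ s(weight) + s(rpm), data = mpg, method = "REML")
plot(mod_hwy1)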
To get figure 2:
par(mfrow=c(1,3))
mod_hwy2 <- gam(hw.mpg ~ s(weight) + s(rpm, by=fuel), data = mpg, method = "REML")
plot(mod_hwy2)
With my own data, it is even more visible that the two graphs are not exactly the same:
Could someone please help me understand?
The main problem with your model is that you forgot to include the group means for the levels of fuel. As a result, the smooths, which are centred about the overall mean of the response, also have to model the group means for the levels of fuel.
Fit the model as:
mod_hwy2 <- gam(hw.mpg ~ fuel + # <--- group means
s(weight) + s(rpm, by=fuel),
data = mpg, method = "REML")
Then add in Gregor's point about these effects being conditional upon the other terms in the model, and you should be able to understand what's going on and why things change.
And regarding one of your comments; the locations are shown in your plot, look at the label for the y-axis of each plot.
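To see the effect of the missing group means in isolation, here is a small sketch on simulated data (the variable names and the group means of 0, 3 and 6 are made up for illustration, not taken from the mpg data). It fits the model with and without the parametric factor term and compares the two.

```r
library(mgcv)

set.seed(1)
n  <- 300
g  <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
x1 <- runif(n)
x2 <- runif(n)
# Simulated response: group means of 0, 3 and 6, smooth effects of x1 and x2, plus noise
y  <- c(0, 3, 6)[as.integer(g)] +
  sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.5)
dat <- data.frame(y, x1, x2, g)

m_without <- gam(y ~ s(x1) + s(x2, by = g), data = dat, method = "REML")
m_with    <- gam(y ~ g + s(x1) + s(x2, by = g), data = dat, method = "REML")

# Only the second model has parametric coefficients for the group means
print(names(coef(m_with))[1:3])
print(c(without = deviance(m_without), with = deviance(m_with)))
```

In this simulation the model without the factor term has no way to represent the differences in level between groups, which shows up as a much larger residual deviance.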

R icenReg package: Move plot legend for ic_np fit

I need to create a plot that compares interval censored survival curves for three species. I am able to generate a plot that shows all three curves using the ic_np function in the icenReg package in R. When I plot the output of this ic_np fit using base R plot(), a legend appears in the bottom left corner.
This example from the icenReg package documentation yields a similar figure:
library(icenReg)
data(miceData)
fit <- ic_np(cbind(l, u) ~ grp, data = miceData) #Stratifies fit by group
plot(fit)
However, having the legend in the bottom left covers the most interesting comparison of my survival curves, so I would like to move the legend to the top right.
I have seen this question about setting a legend position for basic plots in base R. Answers to this question seem to assume that I can generate a plot without the legend, but I have not been able to do that.
I have also seen this question about adding a legend to other types of survival analysis that do not seem to generate a legend by default, but I have not been able to implement these methods with interval censored data.
I have read that I can't move a legend that has already been added to a plot, but I don't know how to generate this particular plot without a legend so that I can add one back in where I want it (top right).
How can I either (a) generate this plot of interval censored Kaplan-Meier survival curves using ic_np without a legend -- maybe using some hidden parameter of plot() -- OR (b) generate this figure using a different plotting device, assuming the plot legend is then moveable?
There doesn't seem to be a help page in the package for the plot function so you need to determine the class of the fit-object and look at the code:
class(fit)
#[1] "ic_npList"
#attr(,"package")
#[1] "icenReg"
plot.ic_npList
#Error: object 'plot.ic_npList' not found
So it's not exported and we need to dig deeper (not surprising, since exported functions need to have help pages).
getAnywhere(plot.ic_npList)
#-----------
A single object matching ‘plot.ic_npList’ was found
It was found in the following places
registered S3 method for plot from namespace icenReg
namespace:icenReg
with value
function (x, fitNames = NULL, lgdLocation = "bottomleft", ...)
{
addList <- list(xlim = x$xRange, ylim = c(0, 1), xlab = "time",
ylab = "S(t)", x = NA)
dotList <- list(...)
#.........
#..........
legend(lgdLocation, legend = grpNames, col = cols, lty = 1)
}
<bytecode: 0x7fc9784fa660>
<environment: namespace:icenReg>
So there is a location parameter for the legend placement and the obvious alternative to try is:
plot(fit, lgdLocation = "topright")
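The same lookup can be done programmatically with getS3method(), which retrieves a registered S3 method whether or not it is exported. The sketch below uses plot.lm from base R as a stand-in so that it runs without icenReg installed. With icenReg loaded, the equivalent call would presumably be getS3method("plot", "ic_npList").

```r
# plot.lm is a registered S3 method; it stands in here for plot.ic_npList
m <- getS3method("plot", "lm")

# The argument names show which options the method accepts
print(names(formals(m)))

# optional = TRUE returns NULL instead of raising an error when no method exists
print(is.null(getS3method("plot", "no_such_class", optional = TRUE)))
```

Scanning the formals for anything legend-related is often quicker than reading the whole function body.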

R - Infer legend details from variable

It's my first question on SO. Hopefully it will be enough detail:
I've made a Kaplan-Meier plot and would like to add a legend, however I am not having much luck with it. I know how to make a legend when you know which line is representative of its respective category. Unfortunately, since I'm using a large data set, I don't know which line is which, and therefore am having a hard time creating the legend manually. Is there any way R can infer which category is which colour on the following graph? (The current selections aren't right in the legend, I was just guessing)
kmsurv1 <- survfit(Surv(as.numeric(time),hydraulic)~type)
# Specify axis options within plot()
plot(kmsurv1, col=c(1:12), main="Hydraulic Breakdown of Vehicles", sub="subtitle", xlab="Time", ylab="Probability of Being Operational", xlim=c(15400, 16500),ylim=c(.6,1.0))
legend("bottomleft", inset = 0, title = "Vehicle Type",legend= c("Hitachi Backhoe","Transport Trucks", "Water Trucks","Cat D8 Dozers", "D10 Dozers")
,fill = c(1:12), horiz=TRUE)
I'm assuming you are using the survival package. The package document specifies that with print(obj), the order of printout is the order in which they plot. You can then extract those names with rownames(summary(sfit)$table). Just make sure that the colors you choose are in the same order in the plot and legend lines. Here's an example:
library(survival)
sfit <- survfit(Surv(start, stop, event) ~ sex, mgus1, subset=(enum==1))
print(sfit) # the order of printout is the order in which they plot
plot(sfit, col=1:4)
legend("topleft",legend=rownames(summary(sfit)$table),col=1:4, lty=1)
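An alternative way to get the labels, sketched here on the lung data from the survival package rather than the asker's data, is to read them from the strata component of the fit. Its names are stored in the order the curves are drawn, so the same index vector can be used for the colours in both plot() and legend().

```r
library(survival)

fit <- survfit(Surv(time, status) ~ sex, data = lung)

# One entry per curve, in the order the curves are drawn
labels <- names(fit$strata)
cols   <- seq_along(labels)

plot(fit, col = cols, xlab = "Days", ylab = "Survival probability")
legend("topright", legend = labels, col = cols, lty = 1, title = "Stratum")
```

Using col and lty in legend() (rather than fill) makes the keys match the drawn lines.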
I found an extremely easy answer from: http://rpubs.com/sinhrks/plot_surv
install.packages('ggfortify')
install.packages("ggplot2")
library(ggplot2)
library(ggfortify)
fit <- survfit(Surv(as.numeric(time),hydraulic)~type, data = mydata)
autoplot(fit,xlim = c(15000,16500))
My Answer
Thank you to https://stackoverflow.com/users/3396821/mlavoie for pointing me in the right direction.

ggplot2 2d Density Weights

I'm trying to plot some data with 2d density contours using ggplot2 in R.
I'm getting one slightly odd result.
First I set up my ggplot object:
p <- ggplot(data, aes(x=Distance,y=Rate, colour = Company))
I then plot this with geom_point and geom_density2d. I want geom_density2d to be weighted based on the organisation's size (OrgSize variable). However, when I add OrgSize as a weighting variable, nothing changes in the plot:
This:
p+geom_point()+geom_density2d()
Gives an identical plot to this:
p+geom_point()+geom_density2d(aes(weight = OrgSize))
However, if I do the same with a loess line using geom_smooth, the weighting does make a clear difference.
This:
p+geom_point()+geom_smooth()
Gives a different plot to this:
p+geom_point()+geom_smooth(aes(weight=OrgSize))
I was wondering if I'm using density2d inappropriately, should I instead be using contour and supplying OrgSize as the 'height'? If so then why does geom_density2d accept a weighting factor?
Code below:
require(ggplot2)
Company <- c("One","One","One","One","One","Two","Two","Two","Two","Two")
Store <- c(1,2,3,4,5,6,7,8,9,10)
Distance <- c(1.5,1.6,1.8,5.8,4.2,4.3,6.5,4.9,7.4,7.2)
Rate <- c(0.1,0.3,0.2,0.4,0.4,0.5,0.6,0.7,0.8,0.9)
OrgSize <- c(500,1000,200,300,1500,800,50,1000,75,800)
data <- data.frame(Company,Store,Distance,Rate,OrgSize)
p <- ggplot(data, aes(x=Distance,y=Rate))
# Difference is apparent between these two
p+geom_point()+geom_smooth()
p+geom_point()+geom_smooth(aes(weight = OrgSize))
# Difference is not apparent between these two
p+geom_point()+geom_density2d()
p+geom_point()+geom_density2d(aes(weight = OrgSize))
geom_density2d is "accepting" the weight parameter, but then not passing it to MASS::kde2d, since that function has no weights argument. As a consequence, you will need to use a different 2D density method.
(I realize my answer does not address why the help page says that geom_density2d "understands" the weight argument, but when I have tried to calculate weighted 2D KDEs, I have needed to use packages other than MASS. Maybe this is a TODO that @hadley put in the help page that then got overlooked?)
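As one possible workaround, a weighted 2D density can be computed by hand and drawn with geom_contour. The sketch below is not what geom_density2d does internally: weighted_kde2d is a hypothetical helper that uses a product of Gaussian kernels, with bandwidths from bw.nrd0() as an assumed default, applied to the data from the question.

```r
library(ggplot2)

# Weighted 2D kernel density estimate on a regular grid, using a product of
# Gaussian kernels. Hypothetical helper; bandwidths default to bw.nrd0().
weighted_kde2d <- function(x, y, w, n = 50, hx = bw.nrd0(x), hy = bw.nrd0(y)) {
  w  <- w / sum(w)                                        # normalise the weights
  gx <- seq(min(x) - 3 * hx, max(x) + 3 * hx, length.out = n)
  gy <- seq(min(y) - 3 * hy, max(y) + 3 * hy, length.out = n)
  kx <- outer(gx, x, function(g, xi) dnorm(g, xi, hx))    # n x N kernel values
  ky <- outer(gy, y, function(g, yi) dnorm(g, yi, hy))    # n x N kernel values
  z  <- kx %*% (w * t(ky))   # z[i, j] = sum_k w[k] * kx[i, k] * ky[j, k]
  list(x = gx, y = gy, z = z)
}

Distance <- c(1.5, 1.6, 1.8, 5.8, 4.2, 4.3, 6.5, 4.9, 7.4, 7.2)
Rate     <- c(0.1, 0.3, 0.2, 0.4, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
OrgSize  <- c(500, 1000, 200, 300, 1500, 800, 50, 1000, 75, 800)

dens_w <- weighted_kde2d(Distance, Rate, OrgSize)                    # weighted
dens_u <- weighted_kde2d(Distance, Rate, rep(1, length(Distance)))   # unweighted

# Long format for ggplot2: expand.grid() varies x fastest, matching as.vector(z)
grid <- expand.grid(Distance = dens_w$x, Rate = dens_w$y)
grid$density <- as.vector(dens_w$z)

p <- ggplot(grid, aes(Distance, Rate)) +
  geom_contour(aes(z = density)) +
  geom_point(data = data.frame(Distance, Rate, OrgSize), aes(size = OrgSize))
print(p)
```

Swapping dens_w for dens_u when building grid gives the unweighted contours for comparison.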
