Editing the appearance of the confidence intervals in a marginal effects plot - r

I'm producing a series of marginal effects plots from a logistic regression, using plot_model. I would like to change the appearance of the confidence intervals in the plot below, but I can't figure out a way to do it. I assume this would be through editing the ggplot theme?
Ideally, I would like to be able to make the parallel bars smaller or remove them entirely, change the line thickness, etc. If you could point me in the right direction that would be very helpful.
library(sjPlot)
mtcars$am <- factor(mtcars$am)
m <- glm(vs ~ am, mtcars, family = 'binomial')
plot_model(m, type = "pred", terms = "am")
Output:
I'm new to ggplot2, so sorry if there is a simple answer to this!
Thanks

plot_model produces a ggplot object. The problem with using extension packages like sjPlot is that one has to sacrifice some of one's ability to customize a plot in return for ease of use.
It is possible to alter a ggplot after it has been created, but it does require altering the layer specifications of the plot. This isn't too difficult if you know where to look, but for a relatively new user it can be quite intimidating.
First, store your plot:
p <- plot_model(m, type = "pred", terms = "am")
Now, if we want to change the size of the parallel bars, we can do:
p$layers[[2]]$geom_params$width <- 0.01
(Obviously to get rid of them completely set it to 0 instead of 0.01)
To change the thickness of the lines, we can do:
p$layers[[2]]$aes_params$size <- 1.4
And to change the color of the lines we do:
p$layers[[2]]$aes_params$colour <- 'deepskyblue4'
It will also look better to have the points in front of the lines rather than behind them, so we can copy the back layer to the front like this:
p$layers[[3]] <- p$layers[[1]]
That leaves us with the following plot:
p
However, we can still add scales, coords and themes to this plot to customize it, so for example, we might wish to do:
p +
theme_minimal(base_size = 20) +
coord_cartesian() +
theme(aspect.ratio = 1.5,
plot.title = element_text(hjust = 0.5),
plot.title.position = 'plot')
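If editing layers by index feels fragile (it depends on the layer order sjPlot happens to produce), another route is to compute the predictions yourself and draw them with plain ggplot2, where every geom is under your control. The following is only a minimal sketch: it assumes Wald intervals built on the logit scale, which may not match what plot_model computes internally, and the names dat, newdat and p_manual are illustrative.

```r
library(ggplot2)

# Work on a copy so the built-in mtcars data is left untouched
dat <- mtcars
dat$am <- factor(dat$am)
m <- glm(vs ~ am, data = dat, family = "binomial")

# Predict on the link (logit) scale, then back-transform the interval
newdat <- data.frame(am = factor(levels(dat$am), levels = levels(dat$am)))
pr <- predict(m, newdata = newdat, type = "link", se.fit = TRUE)
z <- qnorm(0.975)  # assumption: 95% Wald interval
newdat$fit  <- plogis(pr$fit)
newdat$low  <- plogis(pr$fit - z * pr$se.fit)
newdat$high <- plogis(pr$fit + z * pr$se.fit)

# width controls the parallel bars; set it to 0 to drop them
p_manual <- ggplot(newdat, aes(am, fit)) +
  geom_errorbar(aes(ymin = low, ymax = high), width = 0.05,
                colour = "deepskyblue4") +
  geom_point(size = 3) +
  labs(y = "Predicted probability of vs")
p_manual
```

Because the error bars are added before the points, the points are drawn on top without any layer shuffling.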

Related

Is it possible to over-ride the x axis range in R package ggbio when using autoplot and ensdb transcripts?

I am trying to use ggbio to plot gene transcripts. I want to plot a very specific range so it matches my ggplot2 plots. The problem is my example plot ends up having range of 133,567,500-133,570,000 regardless of the GRange and whether I specify xlim or not.
This example should only plot a small bit of intron (the thin arrowed line) but instead plots the full 2 exons and intron in between. I believe autoplot wants to plot the entire transcript or transcripts present in the range and widens the range to accommodate for that.
library(EnsDb.Hsapiens.v86)
library(ggbio)
ensdb <- EnsDb.Hsapiens.v86
mut <- GRanges("10", IRanges(133568909, 133569095))
gene <- autoplot(ensdb, which = mut, names.expr = "gene_name", xlim = c(133568909, 133569095))
gene.gg <- gene@ggplot
png("test_gene_plot_5.png")
gene.gg
dev.off()
Is there any way to override this? I've looked at the manual page for autoplot and I couldn't narrow down an option that would fix it. Others have said to use xlim, but that does not seem to change anything.
I like ggbio because it can make a ggplot2 object to be plotted along with other ggplot2 objects. I have not seen an example for that with other approaches like Gvis. But I would entertain other approaches if they could be combined with my existing plots.
Thanks!
Amy
It depends on whether you want clipped or squished data. Usually autoplot outputs a ggplot object at some point that can be manipulated as such.
For squished data:
library(GenomicRanges) # just to be sure start and end work
gene@ggplot +
scale_x_continuous(limits = c(start(mut), end(mut)), oob = scales::squish)
For clipped data:
gene@ggplot +
coord_cartesian(xlim = c(start(mut), end(mut)))
But to be totally honest, I'm unsure whether this is the most informative way to communicate that you are plotting the internals of an intron.
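The difference between the two options can be seen on a toy example that needs only ggplot2 and scales, not ggbio. This is a sketch with made-up data (a single segment from 1 to 10 standing in for a transcript); the names lims, seg, p_squish and p_clip are illustrative.

```r
library(ggplot2)

lims <- c(3, 7)

# Squishing moves out-of-range values to the nearest limit
scales::squish(c(1, 5, 10), range = lims)
# The default scale behaviour (censoring) turns them into NA instead,
# which is why features can vanish when plain limits are set
scales::censor(c(1, 5, 10), range = lims)

# One long segment that extends beyond the window of interest
seg <- data.frame(xmin = 1, xmax = 10, y = 1)
base <- ggplot(seg) +
  geom_segment(aes(x = xmin, xend = xmax, y = y, yend = y))

# Squished: the segment's ends are pulled in to the limits
p_squish <- base + scale_x_continuous(limits = lims, oob = scales::squish)

# Clipped: the data are untouched, only the visible window changes
p_clip <- base + coord_cartesian(xlim = lims)
```

With squishing, the drawn coordinates themselves change, so anything computed from them (for example arrow positions) moves too; coord_cartesian only zooms.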
Alternatively, I've written a gene model geom at some point that doesn't work through the autoplot methods (which can sometimes be a pain if you want to customise everything). Downside is that you'd have to do some manual gene searching and setting aesthetics. Upside is that it works like most other geoms and is therefore easy to combine with some other data.
library(ggnomics) # from: https://github.com/teunbrand/ggnomics
# Finding a gene's exons manually
my_gene <- transcriptsByOverlaps(EnsDb.Hsapiens.v86, mut)
my_gene <- exonsByOverlaps(EnsDb.Hsapiens.v86, my_gene)
my_gene <- as.data.frame(my_gene)
some_other_data <- data.frame(
x = seq(start(mut), end(mut), by = 10),
y = cumsum(rnorm(19))
)
ggplot(some_other_data) +
geom_line(aes(x, y)) +
geom_genemodel(data = my_gene,
aes(xmin = start, xmax = end,
y = max(some_other_data$y) + 1,
group = 1, strand = strand)) +
coord_cartesian(xlim = c(start(mut), end(mut)))
Hope that helped!

Axis breaks in ggplot histogram in R [duplicate]

I have data that is mostly centered in a small range (1-10) but there is a significant number of points (say, 10%) which are in (10-1000). I would like to plot a histogram for this data that will focus on (1-10) but will also show the (10-1000) data. Something like a log scale for the histogram.
Yes, I know this means not all bins are of equal size.
A simple hist(x) gives
while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives
none of which is what I want.
update
following the answers here I now produce something that is almost exactly what I want (I went with a continuous plot instead of bar-histogram):
breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)
The only problem is that I'd like the scale to match the actual bars plotted. There are two options for doing that: one is to simply use the actual margins of the plotted bars (how?) and accept "ugly" x-axis labels like 1.1754, 1.2985, etc. The other, which I prefer, is to control the actual bin margins used so they match the breaks.
Log scale histograms are easier with ggplot than with base graphics. Try something like
library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.
h <- hist(log10(dfr$x), axes = FALSE)
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
For completeness, the lattice solution would be
library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))
AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:
If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.
hist(dfr$x)
The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.
hist(dfr$x, log = "y")
Neither does this.
par(xlog = TRUE)
hist(dfr$x)
That means that we need to log transform the data before we draw the plot.
hist(log10(dfr$x))
Unfortunately, this messes up the axes, which brings us to the workaround above.
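The same log-transform idea also covers the case where the bin edges themselves should match chosen break points: pass the log-transformed edges to hist and label the axis with the original values. This is a sketch in base graphics with simulated data; the vector inner is an arbitrary set of edges picked for illustration, and the outer edges are taken from the data so that every observation falls inside a bin.

```r
set.seed(1)
x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)

# Desired bin edges on the original scale (illustrative values);
# keep only those strictly inside the data range and add the extremes
inner <- c(1, 1.5, 2, 3, 5, 10, 20, 50, 100, 500)
brks <- c(min(x), inner[inner > min(x) & inner < max(x)], max(x))

# Bin on the log scale using the log-transformed edges
h <- hist(log10(x), breaks = log10(brks), plot = FALSE)

# Bins are unequal on the log scale, so densities are plotted by default
plot(h, axes = FALSE, col = "blue",
     xlab = "x (log scale)", main = "Histogram with custom bin edges")
axis(2)
# Tick marks sit exactly on the bin edges, labelled in original units
axis(1, at = log10(brks), labels = signif(brks, 2))
```

Because the ticks are placed at log10(brks), every label lines up with a bar boundary.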
Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:
EDIT : new code provided
x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)
breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)
H <- hist(log10(x),plot=F)
plot(H$mids,H$counts,type="n",
xaxt="n",
xlab="X",ylab="Counts",
main="Histogram of X",
bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=T,freq=T,col="blue")
#Position of ticks
at <- log10(breaks)
#Creation X axis
axis(1,at=at,labels=10^at)
This is as close as I can get to the ggplot2. Putting the background grey is not that straightforward, but doable if you define a rectangle with the size of your plot screen and put the background as grey.
Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
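The grey background mentioned above can be sketched with rect and par("usr"), which returns the extent of the plotting region in user coordinates. This is only one way to do it; the colours and the use of pretty for the grid and tick positions are choices made for this example.

```r
set.seed(1)
x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)
H <- hist(log10(x), plot = FALSE)

# Set up an empty plot that spans the full range of the bars
plot(H$mids, H$counts, type = "n", xaxt = "n",
     xlim = range(H$breaks), ylim = c(0, max(H$counts)),
     xlab = "X", ylab = "Counts", main = "Histogram of X")

# par("usr") is c(x1, x2, y1, y2): fill that rectangle with grey
usr <- par("usr")
rect(usr[1], usr[3], usr[2], usr[4], col = "grey90", border = NA)

# White grid lines on top of the grey, then the bars themselves
abline(v = pretty(H$breaks), h = pretty(H$counts), col = "white")
plot(H, add = TRUE, freq = TRUE, col = "blue")

# X axis labelled in original units
at <- pretty(H$breaks)
axis(1, at = at, labels = 10^at)
box()
```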
A dynamic graph would also help here. Use the manipulate package from RStudio to draw a histogram over a dynamically selected range:
library(manipulate)
data_dist <- table(data)
manipulate(barplot(data_dist[x:y]), x = slider(1,length(data_dist)), y = slider(10, length(data_dist)))
Then you will be able to use the sliders to view the distribution over a dynamically selected range.

effect plots: how to add legend, change text size and add significance levels?

Some friends and I need to 'upgrade' a GLM and LMER plot.
We need to add the significance levels, change the text size, and change the legend position.
How do we do this in the "allEffectsplot"?
It needs to look somewhat like this graph (the right image is the correct one):
changing the position of the legend:
I just want to answer the question in case anyone else comes here and wants to try what worked for me.
I don't know which package was used to make the effect plot in the question, but I had the same problem, and adding key.args = list(space="top") to my code worked. To make the effect plot I used the effects package.
Here is an example of my code:
library(effects)
library(car)
library(MASS)
library(splines)
library(lattice)
plot(Effect(focal.predictors = c("income","age", "country"),
mod = mymodel, xlevels = list(age = 50:80), latent = TRUE),
rug = FALSE, axes = list(grid = TRUE), multiline=TRUE,
colors = c("color1", "color2", "color3", "color4", "color5"),
lattice = list(layout = c(5,1), key.args = list(space="top")))
Just add as many colors as you have categories. I had 5 categories of income, so I listed the names of 5 colors I liked. If you want the legend on the right side of the plot, write "right" instead of "top".

Custom plots using the effects package

I am trying to customize the multiline graphs from the effects package.
Is there any way to position the legend in the example below within the plotting area and not above the graph?
Alternatively: Does anyone know how to plot the results of the multiline regressions calculated by the effects package using ggplot2?
I appreciate any help.
Andy
Example:
library(effects)
data(Prestige)
mod5 <- lm(prestige ~ income*type + education, data=Prestige)
eff_cf <- effect("income*type", mod5)
print(plot(eff_cf, multiline=TRUE))
This is how you plot an effect object in ggplot:
library(ggplot2)
## Change effect object to dataframe
eff_df <- data.frame(eff_cf)
## Plot ggplot with legend on the bottom
ggplot(eff_df)+geom_line(aes(income,fit,linetype=type))+theme_bw()+
xlab("Income")+ylab("Prestige")+coord_cartesian(xlim=c(0,25000),ylim=c(30,110))+
theme(legend.position="bottom")
You can change xlim and ylim depending on how you want to display your data.
The output is as follows:
From ?xyplot you read :
Alternatively, the key can be positioned inside the plot region by
specifying components x, y and corner. x and y determine the location
of the corner of the key given by corner, which is usually one of
c(0,0), c(1,0), c(1,1) and c(0,1), which denote the corners of the
unit square.
and from ?plot.eff you read
key.args additional arguments to be passed to the key trellis
argument to xyplot or densityplot, e.g., to position the key (legend)
in the plotting region.
So for example you can do the following:
plot(eff_cf, multiline=TRUE,
key.args=list(x=0.2,y=0.9,corner=c(x=1, y=1)))
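The x, y and corner components quoted from ?xyplot can be tried on a plain lattice plot, without the effects package. This is a sketch using the built-in mtcars data; the variable choice and the name p_key are illustrative only.

```r
library(lattice)

# x and y are in the unit square of the plot region; corner = c(1, 1)
# means the key's top-right corner is placed at (x, y)
p_key <- xyplot(mpg ~ wt, data = mtcars, groups = cyl,
                type = c("p", "r"),
                auto.key = list(x = 0.95, y = 0.95, corner = c(1, 1),
                                points = TRUE, lines = TRUE))
print(p_key)
```

Changing corner to c(0, 0) with small x and y values would anchor the key's bottom-left corner near the bottom left of the panel instead.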
Based on Ruben's answer, you can try the following:
library(sjPlot)
sjp.int(mod5, type = "eff", swapPredictors = T)
which will reproduce the plot with ggplot; sjp.int also returns the plot object for further customization. However, you can also set certain legend parameters with the sjPlot package:
sjp.setTheme(legend.pos = "bottom right",
legend.inside = T)
sjp.int(mod5, type = "eff", swapPredictors = T)
which gives you the following plot:
See the sjPlot manual for examples of how to customize plot appearance, legend position, legend size, etc.
For plotting estimates of your model as forest plot, or marginal effects of all model terms, see ?sjp.lm in the sjPlot-package, or you may even try out the latest features in my package from GitHub.
@Tom Wenseleers
You can use sjPlot::sjp.int with type='eff' for this.
However, it won't give you rug plots and no raw data points yet either.
mod5 <- lm(prestige ~ type * income + education, data=Prestige)
library(sjPlot)
sjp.int(mod5,showCI = T, type = 'eff')
There's an argument partial.residuals = T to the effect() function.
This gives you fitted values, partial.residuals.raw and partial.residuals.adjusted.
I suppose you could merge that data on the original dataset and then plot smooths by group, but I ran into some difficulties early on (e.g. na.action=na.exclude is not respected).

Resources