Dealing with factors in geom_pointrange in ggplot - r

I am trying to visualize some data that consist of odds ratios and confidence intervals for regions nested in countries. I am using geom_pointrange for that, and in general it works very well.
My problem is that since the odds ratios (and upper confidence limits) can take quite high values, the axes of the plot are stretched to accommodate them. As a result, confidence intervals that lie between 0 and 1 do not appear clearly enough. One option I found through this community is to change the values into factors, so that the distance between them is the same for every measurement. This works for the odds ratios (I still need to tweak the axis tick marks), but when the values of the lower and upper confidence limits are involved, the positions are totally wrong and the confidence intervals do not include the point estimate. I tried to solve this by including all values as levels of the factor, but this did not solve the issue.
What I am trying to do is either to "magnify" the area between 0 and 1 in the graph while leaving the rest of the plot area unchanged, or to make ggplot place the confidence intervals correctly around the odds ratios.
Below I include a simplified version of my data and the code I have been using, for reproducibility.
dat <- data.frame(region = rep(LETTERS[1:5], 2),
country = rep(c("A1", "A2"), each = 5),
or = c(6.459578, 1.696221, 0.895115, 3.393235, 2.325510,
4.457805, 0.407111, 22.760861, 3.354883, 2.214915),
lower = c(5.768999699, 0.237062909, 0.347443105, 0.369881529,
0.010233696, 1.020315696, 0.004419494, 3.87391259,
0.808667764, 0.874415935),
upper = c(7.2328221, 12.1367207, 2.3060778, 31.1290104,
28.4497981, 19.4763489, 0.750188, 337.2960785,
13.9182469, 5.610429))
library(ggplot2)
ggplot(dat, aes(x = region, y = or, ymin = lower, ymax = upper))+
geom_pointrange() +
geom_hline(yintercept = 1, linetype = 2) +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 1, 4), "lines"),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none") +
facet_wrap(~ country) +
coord_flip(ylim = c(0, 100))
# Change numeric variable into factors
f.levels <- c(dat$or, dat$lower, dat$upper)
f.levels <- unique(f.levels)
f.levels <- as.character(f.levels[order(f.levels)])
dat$or <- factor(dat$or, levels = f.levels)
dat$lower <- factor(dat$lower, levels = f.levels)
dat$upper <- factor(dat$upper, levels = f.levels)
ggplot(dat, aes(x = region, y = or, ymin = lower, ymax = upper))+
geom_pointrange() +
geom_hline(yintercept = 1, linetype = 2) +
theme_bw() +
theme(plot.margin = unit(c(1, 1, 1, 4), "lines"),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none") +
facet_wrap(~ country) +
coord_flip(ylim = c(0, 30))
I am relatively new to ggplot so please excuse any newbie mistakes.
Any suggestions on this problem are highly appreciated.
Thank you!

I think the standard solution for this problem is plotting the ORs on a log10 scale. For a neat explanation see https://blogs.sas.com/content/iml/2015/07/29/or-plots-log-scale.html
ggplot(dat, aes(x = region, y = or, ymin = lower, ymax = upper)) +
geom_pointrange() +
geom_hline(yintercept = 1, linetype = 2) +
scale_y_log10() + # this is the line that applies the log10 transformation
theme_bw() +
theme(plot.margin = unit(c(1, 1, 1, 4), "lines"),
axis.title = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position="none") +
facet_wrap(~ country) +
coord_flip()
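To see why the log scale helps here, the following minimal sketch (base R only, with made-up odds ratios that are not from the question's data) compares the distance of each value from the reference line at OR = 1 on the linear scale and on the log10 scale. The variable names are illustrative assumptions.

```r
# Hypothetical odds ratios: 0.25 and 4 are "equally strong" effects
# in opposite directions, as are 0.5 and 2
or <- c(0.25, 0.5, 1, 2, 4)

# Distance from the reference line (OR = 1) on each scale
linear_dist <- abs(or - 1)
log_dist <- abs(log10(or) - log10(1))

comparison <- data.frame(or, linear_dist, log_dist)
print(comparison)
```

On the linear scale, 0.25 sits much closer to 1 than 4 does, while on the log10 scale the two are equally far away, so intervals below 1 are no longer squeezed. Note that a log scale can only show strictly positive values, which holds for every value in dat.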

Related

Modifying a ggplot of odds ratios

I am working on a plot of 21 different odds ratios and their respective confidence intervals - the odds ratios are stratified by racial group (7 groups) and death category (3 categories), and I'm pretty close to what I want, I just am stuck on a few things.
Here is what I've run so far:
library(ggplot2)
install.packages("ggstance")
early <- data.frame(labels=c("Early:Overall","Early:Non-Hispanic White","Early:Non-Hispanic Black","Early:Non-Hispanic Asian",
"Early:Non-Hispanic Other","Early:Hispanic","Early:Unknown","Neo:Overall","Neo:NHW","Neo:NHB",
"Neo:NHA","Neo:NHO","Neo:Hisp","Neo:Unknown","Inf:Overall","Inf:NHW","Inf:NHB","Inf:NHA","Inf:NHO",
"Inf:Hisp","Inf:Unknown"),
odds=c(317.77,355.54,187.82,495.49,213.23,345.45,1818.05,114.02,128.84,52.70,271.15,57.86,158.21,579.40,46.76,52.50,22.46,104.81,22.41,67.93,214.85),
low=c(282.25,301.37,141.12,292.51,113.06,263.85,624.20,103.53,112.63,42.20,168.26,34.34,126.00,255.58,42.87,46.46,18.42,67.25,14.29,55.32,108.01),
high=c(357.64,419.32,249.42,831.05,396.78,450.64,6710.68,125.41,147.03,65.47,426.48,93.53,197.31,1354.38,50.93,59.16,27.16,157.91,33.76,82.78,416.06),
group=rep(c("Overall","Non-Hispanic White","Non-Hispanic Black","Non-Hispanic Asian",
"Non-Hispanic Other","Hispanic","Unknown"),3),
death=c("Early:Overall"="Early Neonatal Death","Early:Non-Hispanic White"="Early Neonatal Death",
"Early:Non-Hispanic Black"="Early Neonatal Death","Early:Non-Hispanic Asian"="Early Neonatal Death",
"Early:Non-Hispanic Other"="Early Neonatal Death","Early:Hispanic"="Early Neonatal Death",
"Early:Unknown"="Early Neonatal Death","Neo:Overall"="Neonatal Death","Neo:NHW"="Neonatal Death",
"Neo:NHB"="Neonatal Death","Neo:NHA"="Neonatal Death","Neo:NHO"="Neonatal Death","Neo:Hisp"="Neonatal Death",
"Neo:Unknown"="Neonatal Death","Inf:Overall"="Infant Death","Inf:NHW"="Infant Death",
"Inf:NHB"="Infant Death","Inf:NHA"="Infant Death","Inf:NHO"="Infant Death","Inf:Hisp"="Infant Death","Inf:Unknown"="Infant Death"))
ggplot(early,aes(x = odds, y = group)) +
geom_rect(aes(xmin = 0.001, xmax = 1000,
ymin = -Inf, ymax = Inf)) +
geom_errorbarh(aes(xmin = low, xmax = high)) +
geom_point(aes(colour = group,shape=death), size = 3 ) +
coord_cartesian(xlim = c(20, 600)) +
facet_grid(labels~., switch = "y") +
theme_bw() +
theme(panel.spacing.y = unit(0, "points"),
panel.border = element_blank(),
panel.background= element_blank(),
panel.grid.major.x = element_line(color="white"),
plot.background = element_rect(fill="white"),
axis.text.y = element_blank(),
axis.ticks.length.y = unit(0, "points"),
strip.text.y.left = element_text(angle = 0),
strip.background.y = element_blank(),
axis.line = element_line(),
legend.box.background = element_blank(),
legend.box.margin = margin(6, 6, 6, 6))
plot+ labs(title="Race-Stratified Odds Ratios by Death Category",x="Odds Ratios",y="Maternal Race Group")
And this is the plot I currently have:
I'm not sure why the background of the plot is still gray or why some of the shapes are partially obstructed, but I'm assuming there's some kind of grey bars over the background. I've tried deleting each line of my code one by one and the grey never went away. I'm trying to make the background just white, so if anyone has any suggestions for how to do that I would really appreciate it!
Also, I was hoping to not show the individual labels (i.e. "Early:Non-Hispanic White") on the plot and instead only have the 3 death labels (i.e. "Early Neonatal Death"). Is there a way to do that?
Thank you!
The problem is simply that you are drawing the gray background with your call to geom_rect, which by default is gray. You can either make this white, or better still, remove it and use scales and themes to give your plot the desired look.
To remove the color guide from the legend, you can add + scale_color_discrete(guide = guide_none()) to your plot.
The symbols are being clipped (and don't align perfectly with the labels) because each of the facets is actually preserving a tiny space for all the groups. You therefore need to specify scales = "free_y" to level everything out, give your error bars greater width and prevent the symbols from clipping.
You can also choose a global theme that requires fewer individual tweaks to the theme parameters, and you may prefer the look of making the strip labels right-aligned and external to the y axis line.
ggplot(early,aes(x = odds, y = group)) +
geom_errorbarh(aes(xmin = low, xmax = high)) +
geom_point(aes(colour = group, shape = death), size = 3) +
scale_color_discrete(guide = guide_none()) +
coord_cartesian(xlim = c(20, 600)) +
facet_grid(labels ~ ., switch = "y", scales = "free_y") +
labs(title = "Race-Stratified Odds Ratios by Death Category",
x = "Odds Ratios",
y = "Maternal Race Group") +
theme_classic() +
theme(panel.spacing.y = unit(0, "points"),
axis.text.y = element_blank(),
axis.ticks.length.y = unit(0, "points"),
strip.placement = "outside",
strip.text.y.left = element_text(angle = 0, hjust = 1),
strip.background.y = element_blank(),
legend.box.margin = margin(6, 6, 6, 6))
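The question also asked for strips that show only the three death categories. One possible way to get that, sketched here under the assumption that faceting by death instead of labels is acceptable, is shown on a small subset of the question's data (the Overall and Hispanic rows only). The name early_small is hypothetical, and the y axis text is kept so the race groups stay identifiable.

```r
library(ggplot2)

# Subset of the question's data: Overall and Hispanic rows for each death category
early_small <- data.frame(
  group = rep(c("Overall", "Hispanic"), 3),
  death = rep(c("Early Neonatal Death", "Neonatal Death", "Infant Death"), each = 2),
  odds  = c(317.77, 345.45, 114.02, 158.21, 46.76, 67.93),
  low   = c(282.25, 263.85, 103.53, 126.00, 42.87, 55.32),
  high  = c(357.64, 450.64, 125.41, 197.31, 50.93, 82.78)
)

# Fix the order of the strips instead of relying on alphabetical order
early_small$death <- factor(
  early_small$death,
  levels = c("Early Neonatal Death", "Neonatal Death", "Infant Death")
)

# One facet row per death category; the groups share the y axis within each row
p <- ggplot(early_small, aes(x = odds, y = group)) +
  geom_errorbarh(aes(xmin = low, xmax = high)) +
  geom_point(aes(colour = group), size = 3) +
  facet_grid(death ~ ., switch = "y", scales = "free_y", space = "free_y") +
  labs(x = "Odds Ratios", y = NULL) +
  theme_classic() +
  theme(strip.placement = "outside",
        strip.text.y.left = element_text(angle = 0, hjust = 1),
        strip.background = element_blank())
```

Printing p draws three facet rows, one per death category. geom_errorbarh is used only to stay consistent with the code above; newer ggplot2 releases may prefer a different geom for horizontal error bars.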

facet_wrap text labelling issues with stat_fit_glance

I am wondering why the text is trending higher in the plots... it won't stay put with the facet_wrap or facet_grid. In a more complex dataset plot, the text is illegible because of the overlap.
Below is data and code to reproduce the plot and issue. Adding geom="text" to stat_fit_glance results in Error: Discrete value supplied to continuous scale.
library(ggpmisc)
library(ggplot2)
DF <- data.frame(Site = rep(LETTERS[20:24], each = 4),
Region = rep(LETTERS[14:18], each = 4),
time = rep(LETTERS[1:10], each = 10),
group = rep(LETTERS[1:4], each = 10),
value1 = runif(n = 1000, min = 10, max = 15),
value2 = runif(n = 1000, min = 100, max = 150))
DF$time <- as.numeric(factor(DF$time)) # map letters A-J to 1-10; as.numeric() on character strings would return NA
formula1 <- y~x
plot1 <- ggplot(data=DF,
aes(x=time, y= value2,group=Site)) +
geom_point(col="gray", alpha=0.5) +
geom_line(aes(group=Site),col="gray", alpha=0.5) +
geom_smooth(se=F, col="darkorange", alpha=0.8, fill="orange",
method="lm",formula=formula1) +
theme_bw() +
theme(strip.text.x = element_text(size=10),
strip.text.y = element_text(size=10, face="bold", angle=0),
strip.background = element_rect(colour="black", fill="gray90"),
axis.text.x = element_text(size=10), # x-axis text size
axis.text.y = element_text(size=10), # y-axis text size
axis.ticks = element_blank(), # remove axis ticks
axis.title.x = element_text(size=18), # x-axis title size
axis.title.y = element_text(size=25), # y-axis title size
panel.background = element_blank(),
panel.grid.major = element_blank(), # remove major grid lines
panel.grid.minor = element_blank(), # remove minor grid lines
plot.background = element_blank()) +
labs(y="", x="Year", title = "")+ facet_wrap(~group)
plot1 + stat_fit_glance(method = "lm", label.x="right", label.y="bottom",
method.args = list(formula = formula1),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE)
When the position of the labels is set automatically, the npcy position is increased for each level of the grouping variable. You map Site to the group aesthetic, and because Site has 5 levels that appear unevenly across the facets, the rather crude algorithm in 'ggpmisc' positions the labels unevenly: each of the five rows corresponds to one of the five Sites. I have changed the mapping to use colour so that this becomes more obvious. I have also deleted all code that is irrelevant to this question.
plot1 <- ggplot(data=DF,
aes(x=time, y= value2, color=Site)) +
geom_smooth(se=F, alpha=0.8,
method="lm",formula=formula1) +
facet_wrap(~group)
plot1 +
stat_fit_glance(method = "lm", label.x="right", label.y="bottom",
method.args = list(formula = formula1),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE) +
expand_limits(y = 110)
To use fixed positions, one can either pass npcy coordinates when using the default "geom_text_npcy()", or pass data coordinates and use "geom_text()". One position corresponds to each level of the grouping factor Site. If the vector is shorter, it is recycled. Of course, to fit more labels you can reduce the size of the text and add space by expanding the plotting area. In any case, in practice you will need to indicate in one way or another which estimates correspond to which line.
plot1 +
stat_fit_glance(method = "lm", label.x="right", label.y= c(0.01, 0.06, 0.11, 0.01, 0.06),
method.args = list(formula = formula1),
aes(label = sprintf('R^2~"="~%.3f~~italic(p)~"="~%.2f',
stat(..r.squared..),stat(..p.value..))),
parse = TRUE, size = 2.5) +
expand_limits(y = 110)
Note: the Error: Discrete value supplied to continuous scale raised when attempting to use geom_text() is a bug in 'ggpmisc' that I fixed some days ago, but the fix has not yet made it to CRAN (future version 0.3.3).
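The recycling of a short label.y vector mentioned above can be illustrated in base R. This is only a sketch of the general recycling rule using rep_len(), not the code 'ggpmisc' runs internally, and the names sites, short_positions, recycled and stacked are made up for the illustration.

```r
# The five Site levels used in the example data (T, U, V, W, X)
sites <- LETTERS[20:24]

# Three npc positions recycled over five sites: the 4th and 5th labels
# reuse the 1st and 2nd positions and may overlap in a shared facet
short_positions <- c(0.01, 0.06, 0.11)
recycled <- rep_len(short_positions, length(sites))
names(recycled) <- sites
print(recycled)

# One distinct, evenly stacked position per site avoids that overlap
stacked <- seq(from = 0.01, by = 0.05, length.out = length(sites))
names(stacked) <- sites
print(stacked)
```

The recycled vector reproduces the positions passed as label.y in the last call above, while stacked could be passed instead when all five sites can appear in the same facet.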

Plot timeseries and regression line for two groups of data

I have data from two sites across years (note the differences in sampling years). A sample is below:
df<- data.frame( year= c(seq(1997,2016,1), seq(2001,2017,1)),
site= c(rep("cr", 20),rep("ec", 17)),
mean= sample(1:50,37))
I would like to make a time series-like graph of mean for each year. Each data point would be connected (in the typical zig-zag fashion of time-series graphs) and then a regression line is superimposed to indicate the trend. I have created a time series-like plot using ggplot (I do not mind a solution from base package), but I am having trouble superimposing a dashed-regression line for each site without error.
Here is the code I have tried:
f1 <- ggplot(data = df, aes(x = year, y = mean, group= site, color=
site))+
geom_line(aes(color=site)) +
geom_point( aes(color=site),size=0.5)+
geom_smooth(method = "lm", se = FALSE, size= 0.5, aes(fill=site,
linetype= 2 ))+
scale_linetype_manual(values=c("solid", "solid"))+
scale_color_manual(values=c("#CC0000", "#000000"))+
theme_minimal()+
scale_x_continuous("Year",limits = c(1997, 2020), breaks =
seq(1995,2020,5)) +
scale_y_continuous("Mean Monthly Abundance", limits = c(0, 1500),
breaks=seq(0, 1500, by = 100)) +
theme_bw()+
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank())
f1
A few details I would like this graph to illustrate:
- Each group (site) will have a different color (black, red) for the points and the line connecting each point.
- The regression lines for each group (site) will be dashed and match the color specified above.
- The regression lines should NOT extend to the y-axis and should be limited to the range of the data.
- Points do not need to be visible. Only the line connecting each point should be visible.
- Preferably the dashed regression line will NOT display the shaded 95% CI.
As @kath stated, adding linetype = "dashed" would fix it. I've made some minor modifications to the code as well:
ggplot(data = df, aes(x = year, y = mean, group= site, color = site))+
geom_line() +
geom_point(size=0.5)+
geom_smooth(method = "lm", se = FALSE, size= 0.5, linetype = "dashed")+
scale_color_manual(values=c("#CC0000", "#000000"))+
theme_minimal()+
scale_x_continuous("Year",limits = c(1997, 2020), breaks =
seq(1995,2020,5)) +
scale_y_continuous("Mean Monthly Abundance", limits = c(0, 1500),
breaks=seq(0, 1500, by = 100)) +
theme_bw()+
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank())
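If explicit control over where each regression line starts and ends is wanted, one option is to fit the models outside ggplot and predict only at each site's first and last year. The sketch below uses base R only; set.seed(1) and the names trend and fitted are assumptions added for the illustration.

```r
set.seed(1)  # assumed seed so the random sample data are reproducible
df <- data.frame(year = c(1997:2016, 2001:2017),
                 site = c(rep("cr", 20), rep("ec", 17)),
                 mean = sample(1:50, 37))

# Fit one linear model per site and predict at that site's first and last year only
trend <- do.call(rbind, lapply(split(df, df$site), function(d) {
  fit <- lm(mean ~ year, data = d)
  ends <- data.frame(year = range(d$year))
  data.frame(site = d$site[1],
             year = ends$year,
             fitted = predict(fit, newdata = ends))
}))
rownames(trend) <- NULL
print(trend)
```

The resulting data frame has two rows per site, so it could be added to the plot with something like geom_line(data = trend, aes(y = fitted), linetype = "dashed"), which keeps each trend line within the years actually sampled at that site.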

Overlaying Pie Charts in ggplot2

I am making a pie chart to go along with a series of plots all made in ggplot2. The data I'm using have two categories broken in to a total of three subcategories. Basically, the data look like this:
  Category Category_Value Super_Category
    <fctr>          <dbl>          <dbl>
1        A     0.03733874              1
2        B     0.66732754              0
3        C     0.29533372              1
Here is the basic pie chart I have at the subcategory level:
And here is what I'd like to have (or something similar):
I had never made a pie chart in ggplot2 before, so here is my basic code to generate the top plot:
pie.chart <- ggplot(pie.data, aes(x = "", y = Category_Value, fill = Category, width = 1)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start = 0) +
scale_fill_manual(values = c("#4DAF4A", "#377EB8", "#E41A1C")) +
theme_bw() +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank()
)
Is this something that's doable? I messed around with making another plot grouped at the major category level and overlaying them without success.
You could use annotate to get an approximation of your picture.
Firstly I've used your small subset of data
pie.data <- data.frame(
Category = c("A", "B", "C"),
Category_Value = c(0.03733874, 0.66732754, 0.29533372),
Super_Category = c(1,0,1))
Then I've applied your code
pie.chart <- ggplot(pie.data, aes(x = "", y = Category_Value, fill = Category, width = 1)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start = 0) +
scale_fill_manual(values = c("#4DAF4A", "#377EB8", "#E41A1C")) +
theme_bw() +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank()
)
And finally I drew the line using
pie.chart + annotate("rect", xmin = 1.5, xmax = 1.9, ymin = 0.2, ymax = 0.30, alpha=0,colour = "black")
And the output:
Note that because you have more data than my sample, you will have to play with the values of ymin = 0.2 and ymax = 0.30 in annotate so that the line covers the values that you want.
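Rather than finding ymin and ymax by trial and error, the slice boundaries can be computed from the data. The sketch below assumes ggplot2's default stacking order for this chart, in which the last Category level sits at the bottom of the stack (starting at y = 0) and the first level at the top; that assumption should be checked against the actual plot. The name bottom_up is made up for the illustration.

```r
pie.data <- data.frame(
  Category = c("A", "B", "C"),
  Category_Value = c(0.03733874, 0.66732754, 0.29533372),
  Super_Category = c(1, 0, 1))

# Assumed stacking order: reverse of the row/level order, so C starts at y = 0
bottom_up <- pie.data[rev(seq_len(nrow(pie.data))), ]

# Upper and lower boundary of each slice along the stacked y axis
bottom_up$ymax <- cumsum(bottom_up$Category_Value)
bottom_up$ymin <- bottom_up$ymax - bottom_up$Category_Value

print(bottom_up[, c("Category", "Super_Category", "ymin", "ymax")])
```

The ymin and ymax columns of the rows belonging to one Super_Category can then be used as the ymin and ymax arguments of annotate("rect", ...), one rectangle per contiguous run of slices.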

ggplot2 x - y axis intersect while keeping axis labels

I posted my original question yesterday, which got solved perfectly here:
Original post
I made a few additions to my code:
library(lubridate)
library(ggplot2)
library(grid)
library(scales) # provides date_format(), date_breaks() and percent used below
### Set up dummy data.
dayVec <- seq(ymd('2016-01-01'), ymd('2016-01-10'), by = '1 day')
dayCount <- length(dayVec)
dayValVec1 <- c(0,-0.22,0.15,0.3,0.4,0.10,0.17,0.22,0.50,0.89)
dayValVec2 <- c(0,0.2,-0.17,0.6,0.16,0.41,0.55,0.80,0.90,1.00)
dayValVec3 <- dayValVec2
dayDF <- data.frame(Date = rep(dayVec, 3),
DataType = factor(c(rep('A', dayCount), rep('B', dayCount), rep('C', dayCount))),
Value = c(dayValVec1, dayValVec2, dayValVec3))
ggplot(dayDF, aes(Date, Value, colour = DataType)) +
theme_bw() +
ggtitle("Cumulative Returns \n") +
scale_color_manual("",values = c("#033563", "#E1E2D2", "#4C633C"),
labels = c("Portfolio ", "Index ", "In-Sample ")) +
geom_rect(aes(xmin = ymd('2016-01-01'),
xmax = ymd('2016-01-06'),
ymin = -Inf,
ymax = Inf
), fill = "#E1E2D2", alpha = 0.03, colour = "#E1E2D2") +
geom_line(size = 2) +
scale_x_datetime(labels = date_format('%b-%d'),
breaks = date_breaks('1 day'),
expand = c(0,0)) +
scale_y_continuous( expand = c(0,0), labels = percent) +
theme(axis.text.x = element_text(angle = 90),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
axis.line = element_line(size = 1),
axis.ticks = element_line(size = 1),
axis.text = element_text(size = 20, colour = "#033563"),
axis.title.x = element_text(hjust = 2),
plot.title = element_text(size = 40, face = "bold", colour = "#033563"),
legend.position = 'bottom',
legend.text = element_text(colour = "#033563", size = 20),
legend.key = element_blank()
)
which produces this output
The only thing that I still cannot get working is the position of the x axis. I want the x axis to be at y = 0 but still keep the x axis labels under the chart, exactly as in the Excel version of it. I know the data sets are not the same, but I didn't have the original data at hand, so I produced some dummy data. I hope this was worth a new question, thanks.
> grid.ls(grid.force())
GRID.gTableParent.12660
background.1-5-7-1
spacer.4-3-4-3
panel.3-4-3-4
grill.gTree.12619
panel.background.rect.12613
panel.grid.minor.y.zeroGrob.12614
panel.grid.minor.x.zeroGrob.12615
panel.grid.major.y.polyline.12617
panel.grid.major.x.zeroGrob.12618
geom_rect.rect.12607
GRID.polyline.12608
panel.border.rect.12610
axis-l.3-3-3-3
axis.line.y.polyline.12631
axis
axis-b.4-4-4-4
axis.line.x.polyline.12624
axis
xlab.5-4-5-4
ylab.3-2-3-2
guide-box.6-4-6-4
title.2-4-2-4
> grid.gget("axis.1-1-1-1", grep=T)
NULL
ggplot2 doesn't make this easy. Below is one way to approach this interactively. Basically, you just grab the relevant parts of the plot (the axis line and ticks) and reposition them.
If p is your plot
p
grid.force()
# grab the relevant parts - have a look at grid.ls()
tck <- grid.gget("axis.1-1-1-1", grep=T)[[2]] # tick marks
ax <- grid.gget("axis.line.x", grep=T) # x-axis line
# add them to the plot, this time suppressing the x-axis at its default position
p + lapply(list(ax, tck), annotation_custom, ymax=0) +
theme(axis.line.x=element_blank(),
axis.ticks.x=element_blank())
Which produces
A quick note: more recent versions of ggplot2, by design, do not show the axis line. Also, changes to axis.line are not automatically passed down to the x and y axes. Therefore, I tweaked your theme to define axis.line.x and axis.line.y separately.
That said, perhaps it's easier (and more robust?) to use geom_hline as suggested in the comments, and geom_segment for the ticks.
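A minimal sketch of that geom_hline plus geom_segment idea follows. It uses a single series from the dummy data, base as.Date() instead of lubridate, and a hypothetical tick length tick_len given in data units; all of these are simplifying assumptions rather than part of the original code.

```r
library(ggplot2)

days <- seq(as.Date("2016-01-01"), as.Date("2016-01-10"), by = "1 day")
dayDF <- data.frame(
  Date = days,
  Value = c(0, -0.22, 0.15, 0.3, 0.4, 0.10, 0.17, 0.22, 0.50, 0.89)
)

ticks <- data.frame(Date = days)
tick_len <- 0.02  # assumed tick length, in the units of the y axis

p <- ggplot(dayDF, aes(Date, Value)) +
  geom_line() +
  geom_hline(yintercept = 0) +                        # axis line drawn at y = 0
  geom_segment(data = ticks,                          # one short tick per day
               aes(x = Date, xend = Date, y = 0, yend = -tick_len),
               inherit.aes = FALSE) +
  scale_x_date(date_labels = "%b-%d", date_breaks = "1 day") +
  theme_bw() +
  theme(axis.line.x = element_blank(),                # hide the default axis line
        axis.ticks.x = element_blank(),               # and the default ticks
        axis.text.x = element_text(angle = 90))
```

Printing p shows the line and ticks at y = 0 while the date labels stay at the bottom of the panel. Because the ticks are ordinary data-space segments, their length scales with the y axis instead of being a fixed physical size.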
