How to create odds ratio and 95 % CI plot in R - r

I have estimates of odds ratio with corresponding 95% CI of six pollutants overs 4 lag periods. How can I create a vertical plot similar to the attached figure in R? The figure below was created in SPSS.
Sample data that produced the figure is the following:
lag pollut or lcl ucl
0 CO 0.97 0.90 1.06
0 PM10 1.00 0.91 1.09
0 NO 0.97 0.92 1.02
0 NO2 1.01 0.89 1.15
0 SO2 0.97 0.85 1.11
0 Ozone 1.00 0.87 1.15
1 CO 1.03 0.95 1.10
1 PM10 0.93 0.86 1.01
1 NO 1.01 0.97 1.06
1 NO2 1.08 0.97 1.20
1 SO2 0.94 0.84 1.04
1 Ozone 0.94 0.84 1.04
2 CO 1.09 1.02 1.16
2 PM10 1.04 0.96 1.13
2 NO 1.04 1.00 1.08
2 NO2 1.07 0.96 1.18
2 SO2 1.05 0.95 1.17
2 Ozone 0.93 0.84 1.03
3 CO 0.98 0.91 1.06
3 PM10 1.14 1.05 1.24
3 NO 0.99 0.95 1.04
3 NO2 1.01 0.91 1.12
3 SO2 1.11 1.00 1.23
3 Ozone 1.00 0.90 1.11

You can also do this with ggplot2. The code is somewhat shorter:
dat <- read.table("clipboard", header = T)
dat$lag <- paste0("L", dat$lag)
library(ggplot2)
ggplot(dat, aes(x = pollut, y = or, ymin = lcl, ymax = ucl)) + geom_pointrange(aes(col = factor(lag)), position=position_dodge(width=0.30)) +
ylab("Odds ratio & 95% CI") + geom_hline(aes(yintercept = 1)) + scale_color_discrete(name = "Lag") + xlab("")
EDIT: Here is a version is closer to the SPSS figure:
ggplot(dat, aes(x = pollut, y = or, ymin = lcl, ymax = ucl)) + geom_linerange(aes(col = factor(lag)), position=position_dodge(width=0.30)) +
geom_point(aes(shape = factor(lag)), position=position_dodge(width=0.30)) + ylab("Odds ratio & 95% CI") + geom_hline(aes(yintercept = 1)) + xlab("")

Assuming your data are in datf...
I'd sort it first into just what you want order wise.
datf <- datf[order(datf$pollut, datf$lag), ]
You want a space before and after every lab grouping so I'd add some extra rows in that are NA. That makes it easier because then you'll automatically have blanks in your plot calls.
datfPlusNA <- lapply(split(datf, datf$pollut), function(x) rbind(NA, x, NA))
datf <- do.call(rbind, datfPlusNA)
Now that you have your data.frame sorted and with the extra NAs the plotting is easy.
nr <- nrow(datf) # find out how many rows all together
with(datf, {# this allows entering your commands more succinctly
# first you could set up the plot so you can select the order of drawing
plot(1:nr, or, ylim = c(0.8, 1.3), type = 'n', xaxt = 'n', xlab = '', ylab = 'Odds Ratio and 95% CI', frame.plot = TRUE, panel.first = grid(nx = NA, ny = NULL))
# arrows(1:nr, lcl, 1:nr, ucl, length = 0.02, angle = 90, code = 3, col = factor(lag))
# you could use arrows above but you don't want ends so segments is easier
segments(1:nr, lcl, 1:nr, ucl, col = factor(lag))
# add your points
points(1:nr, or, pch = 19, cex = 0.6)
xLabels <- na.omit(unique(pollut))
axis(1, seq(4, 34, by = 6) - 0.5, xLabels)
})
abline(h = 1.0)
There are packages that make this kind of thing easier but if you can do it like this you can start doing any graphs that you can imagine.

Related

Need to put asterisk on the top of ggplot barplot to flag the level of significance (pvalue)?

I have a lm model results containing R2 and pvalue, and I plotted them in a bar plot. I have then facetted them using two discrete variables.
I want to put * on the top of bars to flag statistical significance (pvlue <= 0.05), as shown on the bottom-left-most panel of the below image.
I have not found an insightful tutorial on how to do this.
Any way to do this, please?
Here is some code I used
> head(res_all_s2)
WI aggre_per Season yield_level slope Intercept r.squared
1 R IDW2 Dec Season2 Region II -7.06 6091 0.41
2 R IDW2 Dec Season2 Region I -7.29 6280 0.40
3 GDD AS OND Season2 Region II 14.23 -18270 0.34
4 GDD AS Nov Season2 Region II 36.84 -14760 0.33
5 SPI1 IDW2 Dec Season2 Region II -405.10 5358 0.31
6 SPI1 IDW2 Dec Season2 Region I -421.70 5523 0.32
adj.r.squared fstatistic.value pval pearson
1 0.36 9.58 0.01 -0.64
2 0.36 9.49 0.01 -0.64
3 0.29 7.09 0.02 0.58
4 0.28 6.97 0.02 0.58
5 0.26 6.40 0.02 -0.56
6 0.27 6.51 0.02 -0.56
> # significance (pval <= 0.05)
> signif_reg <- res_all_s2 %>% filter(pval <= 0.05)
> head(signif_reg)
WI aggre_per Season yield_level slope Intercept r.squared
1 R IDW2 Dec Season2 Region II -7.06 6091 0.41
2 R IDW2 Dec Season2 Region I -7.29 6280 0.40
3 GDD AS OND Season2 Region II 14.23 -18270 0.34
4 GDD AS Nov Season2 Region II 36.84 -14760 0.33
5 SPI1 IDW2 Dec Season2 Region II -405.10 5358 0.31
6 SPI1 IDW2 Dec Season2 Region I -421.70 5523 0.32
adj.r.squared fstatistic.value pval pearson
1 0.36 9.58 0.01 -0.64
2 0.36 9.49 0.01 -0.64
3 0.29 7.09 0.02 0.58
4 0.28 6.97 0.02 0.58
5 0.26 6.40 0.02 -0.56
6 0.27 6.51 0.02 -0.56
>
> # Plot R2
>
> r <- res_all_s2 %>% ggplot(aes(x=aggre_per,
+ y=r.squared )) +
+ geom_bar(stat="identity", width=0.8) +
+ facet_grid(yield_level ~ WI,
+ scales = "free_y",
+ switch = "y") +
+ scale_y_continuous(limits = c(0, 1)) +
+ xlab("Aggregation period") +
+ ylab(expression(paste("R-squared"))) +
+ theme_bw() +
+ theme(axis.title = element_text(size = 12), # all titles
+ axis.text = element_text(colour = "black"),
+ axis.text.x = element_text(angle = 90, vjust = 0.5,
+ hjust = 1, color = "black"),
+ strip.text.y.left = element_text(angle = 0),
+ panel.border = element_rect(color = "black",
+ size = .5))
> r
And, here is the link to my res_all_s2 dataset https://1drv.ms/u/s!Ajl_vaNPXhANgckJeqDKA0fzfFEbhg?e=VfoFaB
Technically, you can always add an appropriate geom with its independent dataset (that would be your data filtered to exclude pval > .05):
df_filtered <- res_all_s2 %>% filter(...)
## ggplot(...) +
geom_point(data = df_filtered, pch = 8)
## pch = point character, no. 8 = asterisk
or
## ... +
geom_text(data = df_filtered, aes(label = '*'), nudge_y = .05)
## nudge_y = vertical offset
or color only significant columns:
## ... +
geom_col(aes(fill = c('grey','red')[1 + pval <= .05]))
So, yes, technically that's feasible. But before throwing the results of 13 x 7 x 5 = 455 linear models at your audience, please consider the issues of p-hacking, the benefits of multivariate analysis and the viewers' ressources ;-)

How to make a forest plot from logistic regression results before and after adjusting for covariates

I have a result that I intend to have a forest plot from. This is the result of logistic regression (OR[95%CI]) in which I examined the association of three groups with several treatments before and after adjustment for covariates. The table is as follows. I used the following codes but the result of before and after are shown as merged lines without any labels that make the figure difficult to understand. How can I split the figure to before and after adjustment results?
Treatment
OR
Low_CI
High_CI
P_value
Group
Adjustment
A
0.78
0.64
0.94
0.011
1
Before
B
0.91
0.84
0.98
0.025
1
Before
C
1.63
1.19
2.21
0.0019
2
Before
A
1.24
1.03
1.50
0.02
2
Before
C
0.46
0.25
0.79
0.008
4
Before
B
0.88
0.79
0.98
0.025
1
After
C
1.72
1.19
2.44
0.002
2
After
D
0.89
0.79
0.99
0.03
2
After
c
0.48
0.23
0.87
0.027
4
After
data$Sortnum <- nrow(data) - as.numeric(rownames(data))
data$Label <- fct_reorder(data$Treatment,data$Sortnum)
ggplot(data ,aes(x=Label, y=OR, color =Group )) + geom_point(size=1) + geom_line(size=2) + coord_flip() + geom_errorbar(aes(ymin=Low.CI, ymax=Up.CI),width=0.2,cex=1, color = 'gray50')+ ylab("OR(95% Confidence Interval)")+ geom_hline(aes(fill = Label),yintercept =1, linetype=2)+ facet_grid(Group~., scales = "free_y", space = "free_y") + scale_alpha_identity()+ scale_y_log10(breaks=c(0.2,0.4,0.6,0.8,1.0,1.2,1.4, 1.6, 2.0, 2.4),position="left",limits=c(0.2,2.5))+ theme(panel.background = element_blank(), axis.line = element_line(colour = "black"),axis.title=element_text(size=20),text = element_text(size=18), axis.title.y=element_blank(),axis.text.x = element_text(colour="grey20",size=16),axis.text.y = element_text(colour="black",size=20), legend.position=c(0.25, 0.2), legend.title=element_blank(), legend.text = element_text(size=16), legend.background = element_rect(color = "black",fill = "white", size = 0.3, linetype = "solid")
)

several return plotting in R

Try to plot very basic data in R.
Year X1 X2 X3 X4 X5 X6 X7
2004 0.91 0.23 0.28 1.02 0.90 0.95 0.94
2005 0.57 -0.03 0.88 0.52 0.47 0.55 0.56
2006 1.30 -0.43 1.95 1.27 1.00 1.19 1.26
2007 0.44 0.63 0.60 0.34 0.60 0.50 0.46
2008 1.69 0.34 -2.81 -2.41 -1.80 -1.87 -1.83
What I am looking for is a basic line chart over time with x = year and y = value and the chart itself should include all X1-X7.
I was looking at the ggplot2 functionality, but I don't know where to start.
# Libraries
library(tidyverse)
library(streamgraph)
library(viridis)
library(plotly)
# Plot
p <- data %>%
ggplot(aes(x = year, y = n) +
geom_area() +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position = "none") +
ggtitle("multiple X over time") +
theme_ipsum() +
theme(legend.position = "none")
ggplotly(p, tooltip = "text")
Would anyone give me a hand on it, please? Is there an easy way to do it in basic R?
Thanks.
It’s unclear what you intend to do with geom_area but in the following you’ll see a basic line chart of the data you’ve shown.
The key point is that ‘ggplot2’ works on tidy data, so you need to first transform your data into long form:
data %>%
pivot_longer(-Year, names_to = 'Vars', values_to = 'Values') %>%
ggplot() +
aes(x = Year, y = Values, color = Vars) +
geom_line()

Different colors in ggplot2 using groups

I have a problem trying to use different colors in my plot for two groups. I created a plot with odds ratios (including 95%CI) over a period of serveral years for 2 groups (mfin and ffin). When using the syntax below, all points and lines are black and my attempts to adjust them e.g. geom_linerange(colour=c("red","blue")) have failed (Error: Incompatible lengths for set aesthetics: colour).
Can anyone help me with this?
ggplot(rbind(data.frame(mfin, group=mfin), data.frame(ffin, group=ffin)),
aes(x = JAAR, y = ror, ymin = llror, ymax = ulror)) +
geom_linerange() +
geom_point() +
geom_hline(yintercept = 1) +
ylab("Odds ratio & 95% CI") +
xlab("") +
geom_errorbar(width=0.2)
Below are some sample data (1st group = mfin, #ND GROUP + ffin)
JAAR ror llror ulror
2008 2.00 1.49 2.51
2009 2.01 1.57 2.59
2010 2.06 1.55 2.56
2011 2.07 1.56 2.58
2012 2.19 1.70 2.69
2013 2.23 1.73 2.72
2014 2.20 1.71 2.69
2015 2.31 1.84 2.78
2016 .230 1.83 2.76
JAAR ror llror ulror
2008 1.36 0.88 1.84
2009 1.20 0.73 1.68
2010 1.16 0.68 1.64
2011 1.23 0.77 1.69
2012 1.43 1.00 1.86
2013 1.46 1.04 1.88
2014 1.49 1.07 1.90
2015 1.30 0.89 1.70
2016 1.29 0.89 1.70
You need to map the group membership variable to the color aesthetic (in the long version of the data):
library(readr)
library(dplyr)
library(ggplot2)
# simulate some data
year_min = 1985
year_max = 2016
num_years = year_max - year_min + 1
num_groups = 2
num_estimates = num_years*num_groups
df_foo = data_frame(
upper_limit = runif(n = num_estimates, min = -20, max = 20),
lower_limit = upper_limit - runif(n = num_estimates, min = 0, max = 5),
point_estimate = runif(num_estimates, min = lower_limit, max = upper_limit),
year = rep(seq(year_min, year_max), num_groups),
group = rep(c("mfin", "ffin"), each = num_years)
)
# plot the confidence intervals
df_foo %>%
ggplot(aes(x = year, y = point_estimate,
ymin = lower_limit, ymax = upper_limit,
color = group)) +
geom_point() +
geom_errorbar() +
theme_bw() +
ylab("Odds Ratio & 95% CI") +
xlab("Year") +
scale_color_discrete(name = "Group")
This produces what I think you are looking for, except the simulated data makes it look somewhat messy:

How to add shaded confidence intervals to line plot with specified values

I have a small table of summary data with the odds ratio, upper and lower confidence limits for four categories, with six levels within each category. I'd like to produce a chart using ggplot2 that looks similar to the usual one created when you specify a lm and it's se, but I'd like R just to use the pre-specified values I have in my table. I've managed to create the line graph with error bars, but these overlap and make it unclear. The data look like this:
interval OR Drug lower upper
14 0.004 a 0.002 0.205
30 0.022 a 0.001 0.101
60 0.13 a 0.061 0.23
90 0.22 a 0.14 0.34
180 0.25 a 0.17 0.35
365 0.31 a 0.23 0.41
14 0.84 b 0.59 1.19
30 0.85 b 0.66 1.084
60 0.94 b 0.75 1.17
90 0.83 b 0.68 1.01
180 1.28 b 1.09 1.51
365 1.58 b 1.38 1.82
14 1.9 c 0.9 4.27
30 2.91 c 1.47 6.29
60 2.57 c 1.52 4.55
90 2.05 c 1.31 3.27
180 2.422 c 1.596 3.769
365 2.83 c 1.93 4.26
14 0.29 d 0.04 1.18
30 0.09 d 0.01 0.29
60 0.39 d 0.17 0.82
90 0.39 d 0.2 0.7
180 0.37 d 0.22 0.59
365 0.34 d 0.21 0.53
I have tried this:
limits <- aes(ymax=upper, ymin=lower)
dodge <- position_dodge(width=0.9)
ggplot(data, aes(y=OR, x=days, colour=Drug)) +
geom_line(stat="identity") +
geom_errorbar(limits, position=dodge)
and searched for a suitable answer to create a pretty plot, but I'm flummoxed!
Any help greatly appreciated!
You need the following lines:
p<-ggplot(data=data, aes(x=interval, y=OR, colour=Drug)) + geom_point() + geom_line()
p<-p+geom_ribbon(aes(ymin=data$lower, ymax=data$upper), linetype=2, alpha=0.1)
Here is a base R approach using polygon() since #jmb requested a solution in the comments. Note that I have to define two sets of x-values and associated y values for the polygon to plot. It works by plotting the outer perimeter of the polygon. I define plot type = 'n' and use points() separately to get the points on top of the polygon. My personal preference is the ggplot solutions above when possible since polygon() is pretty clunky.
library(tidyverse)
data('mtcars') #built in dataset
mean.mpg = mtcars %>%
group_by(cyl) %>%
summarise(N = n(),
avg.mpg = mean(mpg),
SE.low = avg.mpg - (sd(mpg)/sqrt(N)),
SE.high =avg.mpg + (sd(mpg)/sqrt(N)))
plot(avg.mpg ~ cyl, data = mean.mpg, ylim = c(10,30), type = 'n')
#note I have defined c(x1, x2) and c(y1, y2)
polygon(c(mean.mpg$cyl, rev(mean.mpg$cyl)),
c(mean.mpg$SE.low,rev(mean.mpg$SE.high)), density = 200, col ='grey90')
points(avg.mpg ~ cyl, data = mean.mpg, pch = 19, col = 'firebrick')

Resources