Error when using multiple datasets to plot polygon annotation on ggplot2 - r

I am creating a forest plot for a meta-analysis using ggplot2. I want to manually add a skewed diamond shape (asymmetric on the y-scale) to represent an effect size and confidence interval.
I can draw the forest plot and add four segments to create the diamond, but this doesn't give a clean, sharp diamond. Instead I've used geom_polygon with a set of coordinates in a second data frame. When I try to write the plot to a PDF, I receive the error shown after the code:
summarydiamond <- data.frame(
x = c(sleepstress.r.CI.L, sleepstress.r.estimate, sleepstress.r.CI.U, sleepstress.r.estimate, sleepstress.r.CI.L),
y = c(-1, -1.5, -1, -0.5, -1)
)
forest.plot <-
dat.sleepstress %>%
ggplot(aes(x = rev(key.pairing), y = r, ymin = r.CI.lower, ymax = r.CI.upper))+
geom_errorbar(width = 0.5) +
geom_point(aes(size = r.weights)) +
scale_size(range = c(1, 7)) +
geom_hline(yintercept = 0) +
theme_minimal() +
coord_flip() +
theme(legend.position = "none") +
labs(x = "", y = "Correlation coefficient") +
theme(text = element_text(size=14)) +
scale_x_discrete(limits=rev) +
geom_text(aes(label = paste0(format(round(r, 2),nsmall = 2),
" (",
format(round(r.CI.lower, 2),nsmall = 2),
", ",
format(round(r.CI.upper, 2),nsmall = 2),
")"),
y = 0.85),
hjust="inward") +
geom_polygon(aes(x=x, y=y), data = summarydiamond)
pdf(file = 'forestplot.pdf', width = 10, height = 10)
forest.plot
dev.off()
Output:
forest.plot
Error in FUN(X[[i]], ...) : object 'r.CI.lower' not found
I've tried adding the data= argument to all of the geom_ calls but this doesn't fix it.
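One likely cause, offered as an assumption since no answer is recorded here: a layer inherits every aesthetic mapped in ggplot(), so geom_polygon() also tries to evaluate ymin = r.CI.lower and ymax = r.CI.upper inside summarydiamond, where those columns do not exist. Passing data = alone does not change that, but inherit.aes = FALSE does. A minimal sketch with invented stand-in data (the studies and diamond data frames and all their values are hypothetical):

```r
library(ggplot2)

# Hypothetical stand-in for dat.sleepstress; values are invented
studies <- data.frame(
  study      = 1:3,
  r          = c(0.20, 0.35, 0.10),
  r.CI.lower = c(0.05, 0.20, -0.05),
  r.CI.upper = c(0.35, 0.50, 0.25)
)

# Hypothetical summary diamond drawn at study position 0
diamond <- data.frame(
  x = c(0, -0.3, 0, 0.3),
  y = c(0.12, 0.22, 0.32, 0.22)
)

p <- ggplot(studies, aes(x = study, y = r, ymin = r.CI.lower, ymax = r.CI.upper)) +
  geom_errorbar(width = 0.3) +
  geom_point() +
  # inherit.aes = FALSE stops this layer from looking up ymin/ymax in `diamond`
  geom_polygon(aes(x = x, y = y), data = diamond, inherit.aes = FALSE) +
  coord_flip()

# Building the polygon layer succeeds because only x and y are evaluated
polygon_data <- layer_data(p, 3)
```

The same argument should apply unchanged to the geom_polygon() call in the question.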

Related

How to add OR and 95% CI as text outside a forest plot?

I previously asked a similar question: "How to add OR and 95% CI as text into a forest plot?"
In that question, I took my code from a third question, answered by a user named stupidwolf.
I used that code to get a forest plot, but without the OR and CI as text. This is the code from stupidwolf, which worked for me:
library('ggplot2')
Outcome_order <- c('Outcome C', 'Outcome A', 'Outcome B', 'Outcome D')
#this is the first dataset you have
df1 <- data.frame(Outcome=c("Outcome A", "Outcome B", "Outcome C", "Outcome D"),
OR=c(1.50, 2.60, 1.70, 1.30),
Lower=c(1.00, 0.98, 0.60, 1.20),
Upper=c(2.00, 3.01, 1.80, 2.20))
# add a group column
df1$group <- "X"
# create a second dataset, similar format to first
df2 <- df1
# different group
df2$group <- "Y"
# and we adjust the values a bit, so it will look different in the plot
df2[,c("OR","Lower","Upper")] <- df2[,c("OR","Lower","Upper")] +0.5
# combine the two datasets
df = rbind(df1,df2)
# you can do the factoring here
df$Outcome = factor(df$Outcome, levels = Outcome_order)
#define colours for dots and bars
dotCOLS = c("#a6d8f0","#f9b282")
barCOLS = c("#008fd5","#de6b35")
p <- ggplot(df, aes(x=Outcome, y=OR, ymin=Lower, ymax=Upper,col=group,fill=group)) +
#specify position here
geom_linerange(size=5,position=position_dodge(width = 0.5)) +
geom_hline(yintercept=1, lty=2) +
#specify position here too
geom_point(size=3, shape=21, colour="white", stroke = 0.5,position=position_dodge(width = 0.5)) +
scale_fill_manual(values=barCOLS)+
scale_color_manual(values=dotCOLS)+
scale_x_discrete(name="(Post)operative outcomes") +
scale_y_continuous(name="Odds ratio", limits = c(0.5, 5)) +
coord_flip() +
theme_minimal()
In my previous question I then asked how to add the OR and CI as text to the forest plot, and Allan Cameron helped me with that.
This almost solved my problem.
Following his suggestion, I used the code below, which also worked for me:
ggplot(df, aes(x = Outcome, y = OR, ymin = Lower, ymax = Upper,
col = group, fill = group)) +
geom_linerange(linewidth = 5, position = position_dodge(width = 0.5)) +
geom_hline(yintercept = 1, lty = 2) +
geom_point(size = 3, shape = 21, colour = "white", stroke = 0.5,
position = position_dodge(width = 0.5)) +
geom_text(aes(y = 3.75, group = group,
label = paste0("OR ", round(OR, 2), ", (", round(Lower, 2),
" - ", round(Upper, 2), ")")), hjust = 0,
position = position_dodge(width = 0.5), color = "black") +
scale_fill_manual(values = barCOLS) +
scale_color_manual(values = dotCOLS) +
scale_x_discrete(name = "(Post)operative outcomes") +
scale_y_continuous(name = "Odds ratio", limits = c(0.5, 5)) +
coord_flip() +
theme_minimal()
And I get this forest plot
As you can see, the OR and CI text sits inside the plot area. I have the following questions that I hope someone can help with:
How can I add a single title "OR" above all the OR values, instead of writing "OR" in front of each value?
How can I place the OR and CI text outside the plot, for example to the right of it? In my real plot the CIs are very long, so the text overlaps the horizontal CI lines. If I move the text further right by increasing y = 3.75, part of it is cut off because it is pushed out of the plot area. Would plotting it outside the panel solve this, and if so, how?
This is the link to my previous question if necessary: How to add OR and 95% CI as text into a forest plot?
You can do this with the patchwork package: draw the text in a second plot (p2) and place it next to the forest plot (p1).
library(ggplot2)
library(patchwork)
p1 <- ggplot(df, aes(x = Outcome, y = OR, ymin = Lower, ymax = Upper,
col = group, fill = group)) +
geom_linerange(linewidth = 5, position = position_dodge(width = 0.5)) +
geom_hline(yintercept = 1, lty = 2) +
geom_point(size = 3, shape = 21, colour = "white", stroke = 0.5,
position = position_dodge(width = 0.5)) +
scale_fill_manual(values = barCOLS) +
scale_color_manual(values = dotCOLS) +
scale_x_discrete(name = "(Post)operative outcomes") +
scale_y_continuous(name = "Odds ratio", limits = c(0.5, 5)) +
coord_flip() +
theme_minimal()
p2 <- ggplot(df, aes(x = Outcome, y = 1.25, ymin = Lower, ymax = Upper)) +
geom_text(aes(group = group,
label = paste0(round(OR, 2), ", (", round(Lower, 2),
" - ", round(Upper, 2), ")")),
position = position_dodge(width = 0.5), color = "black") +
labs(title = "OR") +
coord_flip() +
theme_void()
p1 + p2
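If the text panel takes up too much room, patchwork's plot_layout() can set the relative widths. A minimal sketch, assuming a cut-down stand-in for df (the df_demo values are invented) and a 3:1 width ratio chosen only for illustration:

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-in for df from the question
df_demo <- data.frame(
  Outcome = c("Outcome A", "Outcome B"),
  OR      = c(1.50, 2.60),
  Lower   = c(1.00, 0.98),
  Upper   = c(2.00, 3.01)
)

# Left panel: the forest plot itself
p_left <- ggplot(df_demo, aes(x = Outcome, y = OR, ymin = Lower, ymax = Upper)) +
  geom_pointrange() +
  coord_flip() +
  theme_minimal()

# Right panel: one text label per outcome, under a single "OR" title
p_right <- ggplot(df_demo, aes(x = Outcome, y = 1)) +
  geom_text(aes(label = paste0(round(OR, 2), " (", round(Lower, 2),
                               " - ", round(Upper, 2), ")"))) +
  labs(title = "OR") +
  coord_flip() +
  theme_void()

# Give the forest plot three times the width of the text column
combined <- p_left + p_right + plot_layout(widths = c(3, 1))
```

Printing combined draws both panels side by side; the ratio would need tuning for longer labels.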

Not smooth density plot using ggplot2

When I try to plot the density of some numerical data using either geom_density() or stat_density(), I get a non-smooth curve. Changing adjust does not fix this.
Here I've used facet_zoom(), but also coord_cartesian(xlim = c(...)) produces this non-smooth curve. Pretty weird in my opinion. Any suggestions what's going on?
https://drive.google.com/file/d/1PjQp7XkY5G21NoIo8y8lyeaXKvuvrqVk/view?usp=sharing
Edit: I have uploaded 50000 rows of the original data. To reproduce the plot (not using ggforce), use the code:
data <- read.table("rep.txt")
(
ggplot(data, aes(x = x))
+ geom_density(adjust = 1, fill = "grey")
+ coord_cartesian(xlim = c(-50000,50000))
+ labs(x = "", y = "")
+ theme_bw()
)
I reproduced your code but was unable to reproduce the exact image in your original question. Are you concerned about the lack of smoothness at the very tip of the geom_density plot? There are other arguments you can try like kernel and bw, but the sheer number of zeroes in your data will make it hard to achieve a smooth curve (unless you ramp up your adjust value).
library(tidyverse)
options(scipen = 999999)
# https://stackoverflow.com/questions/33135060/read-csv-file-hosted-on-google-drive
id <- "1PjQp7XkY5G21NoIo8y8lyeaXKvuvrqVk" # google file ID
data <- read.table(sprintf("https://docs.google.com/uc?id=%s&export=download", id)) %>%
rownames_to_column(var = "var")
ggplot(data, aes(x = x)) +
geom_density(
adjust = 10,
fill = "grey",
kernel = "cosine",
bw = "nrd0") +
coord_cartesian(xlim = c(-50000,50000)) +
labs(x = "", y = "") + theme_bw()
# I didn't export images for these, but they showcase how many zeroes you have
ggplot(data, aes(x = x)) +
geom_histogram(bins = 1000) +
coord_cartesian(xlim = c(0,50000)) +
labs(x = "", y = "") + theme_bw()
ggplot(data, aes(x = x)) +
geom_freqpoly(bins = 1000) +
coord_cartesian(xlim = c(0,50000)) +
labs(x = "", y = "") + theme_bw()
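To see how adjust interacts with a large spike of zeros, the bandwidth can be inspected directly with base R's density(). This is only a sketch on simulated data (the 40% share of zeros and the standard deviation are invented, not taken from rep.txt):

```r
set.seed(1)

# Simulated stand-in: a point mass at zero plus a wide normal component
x <- c(rep(0, 4000), rnorm(6000, sd = 20000))

d1  <- density(x, adjust = 1)   # default bandwidth (bw.nrd0)
d10 <- density(x, adjust = 10)  # bandwidth multiplied by 10

# adjust scales the selected bandwidth; a wider kernel flattens the spike at zero
bandwidths <- c(adjust_1 = d1$bw, adjust_10 = d10$bw)
peaks      <- c(adjust_1 = max(d1$y), adjust_10 = max(d10$y))
```

Comparing bandwidths and peaks shows how much smoothing is needed before the spike at zero stops dominating the curve.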

Why is fullrange=TRUE not working for geom_smooth in ggplot2?

I have a plot where I am plotting both the linear regressions for each level of a variable as well as the linear regression for the total sample.
library(ggplot2);library(curl)
df<-read.csv(curl("https://raw.githubusercontent.com/megaraptor1/mydata/main/example.csv"))
df$group<-as.factor(df$group)
ggplot(df,aes(x,y))+
geom_point(size=2.5,shape=21,aes(fill=group),col="black")+
geom_smooth(formula=y~x,aes(col=group,group=group),method="lm",size=1,se=F)+
geom_smooth(formula=y~x,method="lm",col="black",size=1,fullrange=T,se=F)+
theme_classic()+
theme(legend.position = "none")
I am trying to extend the black line (which represents all specimens) to span the full range of the axes using the command fullrange=T. However, I have found the command fullrange=T is not working on this graph regardless of what I try. This is especially strange as I have not called any limits for the graph or set any additional global factors.
This question was the closest I was able to find to my current problem, but it does not appear to be describing the same issue because that issue had to do with how the limits of the graph were called.
This seems a bit heavy-handed, but it allows you to extend your regression line to whatever limits you choose for the x axis.
The argument fullrange is not really documented very helpfully. If you have a look at http://www.mosaic-web.org/ggformula/reference/gf_smooth.html it appears that "fullrange" applies to the points in the dataframe that is used to generate the regression line. So in your case your regression line is extending to the "fullrange". It's just that your definition of "fullrange" is not quite the same as that used by geom_smooth.
library(ggplot2)
library(dplyr)
library(curl)
lm_formula <- lm(formula = y~x, data = df)
f_lm <- function(x){lm_formula$coefficients[1] + lm_formula$coefficients[2] * x}
df_lim <-
data.frame(x = c(0, 5)) %>%
mutate(y = f_lm(x))
ggplot(df,aes(x,y))+
geom_point(size=2.5,shape=21,aes(fill=group),col="black")+
geom_smooth(formula=y~x,aes(col=group,group=group),method="lm",size=1,se=F)+
geom_line(data = df_lim)+
coord_cartesian(xlim = df_lim$x, ylim = df_lim$y, expand = FALSE)+
theme_classic()+
theme(legend.position = "none")
data
df<-read.csv(curl("https://raw.githubusercontent.com/megaraptor1/mydata/main/example.csv"))
df$group<-as.factor(df$group)
Created on 2021-04-05 by the reprex package (v1.0.0)
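The same idea can be written with predict() instead of assembling the line from the coefficients by hand. A minimal sketch on simulated data (df_sim and its values are invented stand-ins for the CSV in the question, and the limits 0 and 5 are simply the ones used above):

```r
library(ggplot2)

# Simulated stand-in for the question's data
set.seed(42)
df_sim <- data.frame(x = runif(60, min = 1, max = 4))
df_sim$y <- 2 + 0.8 * df_sim$x + rnorm(60, sd = 0.3)

fit <- lm(y ~ x, data = df_sim)

# Evaluate the fitted line at the x limits the line should reach
line_ends <- data.frame(x = c(0, 5))
line_ends$y <- predict(fit, newdata = line_ends)

p <- ggplot(df_sim, aes(x, y)) +
  geom_point() +
  geom_line(data = line_ends) +                    # overall regression line
  coord_cartesian(xlim = c(0, 5), expand = FALSE)  # no padding past the limits
```

Using predict() keeps the code unchanged if the model formula later gains extra terms.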
I had the same issue. Despite setting fullrange = TRUE, the line of best fit was only being drawn in the data range.
ggplot(data = df, aes(x = diameter, y = height)) +
geom_point(size = 2) +
geom_smooth(method = lm, se = FALSE, fullrange = TRUE) +
labs(x = "Diameter", y = "Height", title = "Tree Height vs. Diameter") +
theme(plot.title = element_text(hjust = 0.5, size = 15, face = 'bold'))
Bad plot: 1
Using scale_x_continuous() and scale_y_continuous() worked for me (thank you @markus). I added two lines of code below geom_smooth() to fix the issue.
ggplot(data = df, aes(x = diameter, y = height)) +
geom_point(size = 2) +
geom_smooth(method = lm, se = FALSE, fullrange = TRUE) +
scale_x_continuous(expand = c(0,0), limits=c(5, 32)) + #expand = c(num1,num2) => line of best fit stops being drawn at x = 32 + (32 - 5)*num1 + num2 = 32 + (32 - 5)*0 + 0 = 32
scale_y_continuous(expand = c(0,0), limits=c(7, 25)) + #expand = c(num1,num2) => line of best fit stops being drawn at y = 25 + (25 - 7)*num1 + num2 = 25 + (25 - 7)*0 + 0 = 25
labs(x = "Diameter", y = "Height", title = "Tree Height vs. Diameter") +
theme(plot.title = element_text(hjust = 0.5, size = 15, face = 'bold'))
Good plot: 2
Source: How does ggplot scale_continuous expand argument work?

Plotting power vs. effect size using R pwr package

I can successfully create plots of power vs. sample size in R using the pwr package. Example code below.
library(pwr)
library(tidyverse)
plot.out <- pwr.t2n.test(n1=30, n2=30, d=0.5, alternative="two.sided")
#See output in link below
plot(plot.out)
plot() output
I would like to create a similar plot -- a two-sample t-test in which effect size is on the y-axis and power is on the x-axis, with fixed sample sizes.
Is there a way to do this using pwr and/or the plot function? Or would I have to unlist the plot.out object and use it somehow?
I'm still new to power curves in R. Thanks in advance for any advice.
In the code below, the power is computed in a loop over the effect sizes in d_seq. The power values are then extracted from the results list, and a data.frame is created and plotted.
library(pwr)
library(ggplot2)
d_seq <- seq(0, 2, by = 0.1)
pwr_list <- lapply(d_seq, function(d){
pwr.t2n.test(n1 = 30, n2 = 30,
d = d,
power = NULL,
sig.level = 0.05,
alternative = "two.sided")
})
pwr <- sapply(pwr_list, '[[', 'power')
dfpwr <- data.frame(power = pwr, effect.size = d_seq)
ggplot(dfpwr, aes(effect.size, power)) +
geom_point(size = 2, colour = "black") +
geom_line(size = 0.5, colour = "red") +
scale_y_continuous(labels = scales::percent) +
xlab("effect size") +
ylab(expression("test power =" ~ 1 - beta))
To draw a line where power is 80% and get the effect size, first compute the effect size from the pwr vector by linear interpolation.
pwr80 <- approx(x = pwr, y = d_seq, xout = 0.8)
Now create a label for geom_text and plot it.
lbl80 <- paste("Power = 80%\n")
lbl80 <- paste(lbl80, "Effect size =", round(pwr80$y, 2))
ggplot(dfpwr, aes(effect.size, power)) +
geom_point(size = 2, colour = "black") +
geom_line(size = 0.5, colour = "red") +
geom_hline(yintercept = 0.8, linetype = "dotted") +
geom_text(x = pwr80$y, y = pwr80$x,
label = lbl80,
hjust = 1, vjust = -1) +
scale_y_continuous(labels = scales::percent) +
xlab("effect size") +
ylab(expression("test power =" ~ 1 - beta))
To also draw a vertical line, add
geom_vline(xintercept = pwr80$y, linetype = "dotted")
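As an alternative to interpolating, pwr.t2n.test() can solve for the effect size directly when d is left as NULL and power is supplied. The sketch below does this over a grid of power values and puts power on the x-axis and effect size on the y-axis, as asked in the question; the grid power_seq and the names d_needed and df_d are illustrative choices only.

```r
library(pwr)
library(ggplot2)

# Grid of target power values (an arbitrary choice for illustration)
power_seq <- seq(0.10, 0.95, by = 0.05)

# With d = NULL, pwr.t2n.test() solves for the effect size at each power
d_needed <- sapply(power_seq, function(p) {
  pwr.t2n.test(n1 = 30, n2 = 30, d = NULL, power = p,
               sig.level = 0.05, alternative = "two.sided")$d
})

df_d <- data.frame(power = power_seq, effect.size = d_needed)

p_d <- ggplot(df_d, aes(power, effect.size)) +
  geom_line(colour = "red") +
  geom_point() +
  scale_x_continuous(labels = scales::percent) +
  labs(x = "test power", y = "detectable effect size")
```

The row of df_d with power equal to 0.8 gives the effect size detectable at 80% power without interpolation.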

Inserting a custom label on the y axis in ggplot2

Using ggplot2 in R, I'm trying to add a red line that indicates the mean of a chain. I would like to print the mean value next to the line, so that the reader does not have to estimate it from the axis.
I tried using a negative x coordinate, but it did not work: the value ends up hidden behind the axis.
ggplot(data = chain.fmBC) +
geom_line(aes(1:25000, chain.fmBC$V2)) +
labs(y = "", x = "") +
labs(caption= "Bayes C") +
geom_hline(yintercept = mean(chain.fmBC$V2), colour = "RED") +
geom_text(label = round(mean(chain.fmBC$V2), 2),
x = 0, y = min(chain.fmBC$V2), colour = "RED")
This is a picture of my graph:
How can I put the value shown in red (the mean) to the left of the y-axis, instead of between 0 and 5000 inside the plot, as if it were a y-axis label?
You can set your y-axis ticks manually so that they include the mean value. This will give you a nicely positioned annotation. If the real issue is colouring that axis label, unfortunately this does not solve it.
Example:
ggplot(mtcars, aes(disp)) +
geom_histogram() +
geom_hline(yintercept = 0.5, color = "red") +
scale_y_continuous(breaks = c(0,0.5,1,2,3,4)) +
theme(axis.text.y = element_text())
Which will give you this:
I got it working by following the suggestions, and I would like to share the result.
I got good help here.
library(ggplot2)
library(grid) # provides textGrob(), gpar() and grid.draw()
cadeia.bayesc <- ggplot(data = chain.fmBC) + geom_line(aes(1:25000, chain.fmBC$V2)) +
theme(plot.margin = unit(c(0.5,0.5,0.5,1), "lines")) + # Make room for the grob
labs(y = "", x = "") + labs(caption= "Bayes C")
cadeia.bayesc <- cadeia.bayesc + geom_hline(yintercept = mean(chain.fmBC$V2), colour = "RED") # insert the line
cadeia.bayesc <- cadeia.bayesc + annotation_custom( # grid::textGrob configures the label
grob = textGrob(label = round(mean(chain.fmBC$V2),2), hjust = 0, gp = gpar(cex = .7, col ="RED")),
xmin = -6000, xmax = -100, ymin = mean(chain.fmBC$V2), ymax = mean(chain.fmBC$V2))
# Code to override clipping
cadeia.bayesc.plot <- ggplot_gtable(ggplot_build(cadeia.bayesc))
cadeia.bayesc.plot$layout$clip[cadeia.bayesc.plot$layout$name == "panel"] <- "off"
grid.draw(cadeia.bayesc.plot)
result (https://i.imgur.com/ggbuNuK.jpg)
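In ggplot2 3.0.0 and later, clipping can also be switched off in the coordinate system, which avoids editing the gtable by hand. A minimal sketch on a simulated chain (the chain data frame, its length and the label position x = -20 are all hypothetical and would need tuning so the label does not collide with the axis text):

```r
library(ggplot2)

# Simulated stand-in for chain.fmBC$V2
set.seed(1)
chain <- data.frame(iter = 1:1000, value = rnorm(1000, mean = 5))
chain_mean <- mean(chain$value)

p_mean <- ggplot(chain, aes(iter, value)) +
  geom_line() +
  geom_hline(yintercept = chain_mean, colour = "red") +
  # Label placed left of the panel; hjust = 1 right-aligns it at x = -20
  annotate("text", x = -20, y = chain_mean,
           label = format(round(chain_mean, 2)),
           colour = "red", hjust = 1, size = 3) +
  # clip = "off" lets the label be drawn outside the panel area
  coord_cartesian(xlim = c(0, 1000), expand = FALSE, clip = "off") +
  theme(plot.margin = margin(t = 5, r = 5, b = 5, l = 30))
```

The left plot margin is widened so the label has room outside the panel.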
