I'm trying to do almost exactly this. I have this graph:
From this data frame:
Data<-structure(list(Op = c("No", "No", "No", "No", "Yes", "Yes", "Yes",
"Yes"), Drug = c("No", "No", "Yes", "Yes", "No", "No", "Yes",
"Yes"), Follow = c("No", "Yes", "No", "Yes", "No", "Yes", "No",
"Yes"), n = c(46, 101, 25, 27, 2, 65, 2, 22), Percent = c(31.29251701,
68.70748299, 48.07692308, 51.92307692, 2.985074627, 97.01492537,
8.333333333, 91.66666667)), spec = structure(list(cols = list(
Op = structure(list(), class = c("collector_character", "collector"
)), Drug = structure(list(), class = c("collector_character",
"collector")), Follow = structure(list(), class = c("collector_character",
"collector")), n = structure(list(), class = c("collector_double",
"collector")), Percent = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ","), class = "col_spec"), row.names = c(NA,
-8L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
And the graph was made with this code:
Data %>%
ggplot(aes(x = Drug,
y = Percent,
fill = Follow)) +
geom_col(position = "dodge2") +
labs(x = "",
fill = "Follow",
title = "Drug vs Follow") +
geom_text(aes(label = paste0("n=",n)),
vjust = -0.2,
position = position_dodge(.9)) +
facet_grid(.~Op) +
geom_text(aes(label=paste0("n=",n)),
vjust = -0.2,
position = position_dodge(.9))
I'd love to manually draw in significance bars that look like this:
But I am having trouble following the code in that linked answer. When I add:
geom_signif(data = data.frame(Op = c("No","Yes")),
aes(y_position=c(5.3, 8.3),
xmin=c(0.8, 0.8),
xmax=c(1.2, 1.2),
annotations=c("**", "NS")),
tip_length=0,
manual = T)
to the end of my code, I get the error ' "Follow" not found '. (P.S. I'm sure the numbers are wrong and the bars are in the wrong position; I was just hoping to get some bars on the plot and then move them around.)
Any help would be appreciated! Thank you!
The issue is that geom_signif inherits the global aesthetics you set inside ggplot(). In particular, because you set fill = Follow globally, ggplot2 expects a variable with this name in the data frame you passed via the data argument of geom_signif.
One option is to make fill = Follow a local aesthetic of geom_col. This also requires mapping Follow to the group aesthetic inside geom_text so that the labels are still dodged.
library(ggplot2)
library(ggsignif)
ggplot(Data, aes(x = Drug, y = Percent)) +
geom_col(aes(fill = Follow), position = "dodge2") +
labs(x = "", fill = "Follow", title = "Drug vs Follow") +
geom_text(aes(label = paste0("n=", n), group = Follow), vjust = -0.2, position = position_dodge(.9)) +
facet_grid(. ~ Op) +
geom_signif(data = data.frame(Op = c("No", "Yes")), aes(y_position = c(5.3, 8.3), xmin = c(0.8, 0.8), xmax = c(1.2, 1.2), annotations = c("**", "NS")), tip_length = 0, manual = T)
#> Warning: Ignoring unknown aesthetics: y_position, xmin, xmax, annotations
A second option would be to set inherit.aes=FALSE in geom_signif.
ggplot(Data, aes(x = Drug, y = Percent, fill = Follow)) +
geom_col(position = "dodge2") +
labs(x = "", fill = "Follow", title = "Drug vs Follow") +
geom_text(aes(label = paste0("n=", n)), vjust = -0.2, position = position_dodge(.9)) +
facet_grid(. ~ Op) +
geom_signif(data = data.frame(Op = c("No", "Yes")),
aes(y_position = c(5.3, 8.3), xmin = c(0.8, 0.8), xmax = c(1.2, 1.2),
annotations = c("**", "NS")), tip_length = 0, manual = T,
inherit.aes = FALSE)
#> Warning: Ignoring unknown aesthetics: y_position, xmin, xmax, annotations
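The inheritance behaviour described above can be reproduced without ggsignif. Here is a minimal sketch, assuming only ggplot2 is installed; geom_text stands in for geom_signif, and the names toy, notes, and builds() are made up for illustration. A layer that brings its own data still inherits fill = Follow from ggplot() and fails at build time unless inherit.aes = FALSE is set.

```r
library(ggplot2)

# Toy data shaped like the question's (values are illustrative only).
toy <- data.frame(
  Op      = c("No", "No", "Yes", "Yes"),
  Drug    = c("No", "No", "No", "No"),
  Follow  = c("No", "Yes", "No", "Yes"),
  Percent = c(31, 69, 3, 97)
)

# Annotation data: one row per facet, and no Follow column.
notes <- data.frame(Op = c("No", "Yes"), Drug = "No",
                    Percent = 105, label = c("**", "NS"))

base <- ggplot(toy, aes(x = Drug, y = Percent, fill = Follow)) +
  geom_col(position = "dodge2") +
  facet_grid(. ~ Op)

# Inherits fill = Follow, which `notes` does not have.
p_inherit <- base + geom_text(data = notes, aes(label = label))

# Does not inherit, so every aesthetic it needs is mapped locally.
p_local <- base +
  geom_text(data = notes, aes(x = Drug, y = Percent, label = label),
            inherit.aes = FALSE)

# Hypothetical helper: TRUE if the plot builds without an error.
builds <- function(p) {
  tryCatch({ ggplot_build(p); TRUE }, error = function(e) FALSE)
}

builds(p_inherit)
builds(p_local)
```

Under these assumptions the first plot should fail to build (the missing Follow column) and the second should build, which mirrors the two fixes shown above.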
Related
I'm using geom_label_repel to place percentages in a faceted horizontal bar chart, which mostly works great, except that I can't seem to find a way to shift the labels to the middle (or as close as possible) of each bar. Instead, they are justified right, which visually is a bit confusing as some bars are close to one another.
I've tried using hjust and vjust, but these just center the text within its box rather than the box within the plot. If I drop position_identity() in favour of nudge_x the bars all drop out, so that doesn't get me there. I assume there is some simple way to do this, but I'm not seeing it!
Here's the function I've made to do the plot:
plot_single_result_with_facets <- function(x) {
x %>%
# we need to get the data including facet info in long format, so we use pivot_longer()
pivot_longer(!response, names_to = "bin_name", values_to = "b") %>%
# add counts for plot below
count(response, bin_name, b) %>%
# remove nas
filter(!is.na(response)) %>%
# add grouping by bins
group_by(bin_name,b) %>%
# calculate percentages
mutate(perc=paste0(round(n*100/sum(n),0),"%")) %>%
# run ggplot
ggplot(aes(x = n, y = "", fill = response, label = perc)) +
# reversing order here using forcats::fct_rev() note - needs to be changed under geom_label_repel as well
geom_col(position=position_fill(), aes(fill=forcats::fct_rev(response))) +
coord_cartesian(clip = "off") +
geom_vline(xintercept = x_limits, linetype = 3) +
geom_label_repel(
# important to make sure grouping of data matches grouping of labels so they aren't backwards
# reversing order here using forcats::fct_rev() note - needs to be changed above as well
aes(group = forcats::fct_rev(response), label = perc),
# justify text using center = 0.5, left = 0 and right = 1
hjust = 0.5,
vjust = 0.5,
direction = "y",
force = 1.5,
fill = "white",
# font size in the text labels
size = 1.5,
# allow labels to overlap
max.overlaps = Inf,
# make sure that bars are included
position = position_fill(),
# hide points
segment.size = 0.2,
point.size = NA,
# reduce padding around each text label
box.padding = 0.001
) +
scale_fill_brewer(palette="YlOrBr", direction = -1) +
scale_x_continuous(labels = scales::percent_format(), expand = c(0.05, 0.05)) +
facet_grid(vars(b), vars(bin_name), labeller=as_labeller(facet_names)) +
labs(title = title, caption = caption, x = "", y = "") +
guides(fill = guide_legend(title = NULL)) +
theme_classic()
}
And the code that uses it:
caption <- NULL
df <- select(data, Q51_bin, Q52_bin, Q57_bin, Q53_bin, Q4)
df <- as_factor(df)
names(df) <- c("Q51_bin", "Q52_bin", "Q57_bin", "Q53_bin", "response")
facet_names <- c(`Q51_bin` = "Nature Relatedness", `Q52_bin` = "Spirituality", `Q57_bin` = "Religiosity", `Q53_bin` = "Politics L/R", `low`="low", `medium`="medium", `high`="high")
facet_labeller <- function(variable,value){return(facet_names[value])}
x_limits <- c(.50, NA)
facet_grid(~fct_relevel(df,'Nature Relatedness','Spirituality','Religiosity','Politics L/R'))
plot_single_result_with_facets(df)
ggsave("figures/q5_faceted.png", width = 20, height = 10, units = "cm")
Here's the plot as it currently stands:
And a bit of data to make it reproducible:
df <- structure(list(Q51_bin = structure(c(2L, 2L, 2L, 2L), levels = c("low", "medium", "high"), class = "factor"), Q52_bin = structure(c(3L, 2L, 2L, 2L), levels = c("low", "medium", "high"), class = "factor"), Q57_bin = structure(c(2L, 2L, 2L, 2L), levels = c("low", "medium", "high"), class = "factor"), Q53_bin = structure(c(2L, 3L, 2L, 2L), levels = c("low", "medium", "high"), class = "factor"), Q4 = structure(c(2, 3, 3, 5), label = "How much have you thought about climate change before today?", format.spss = "F40.0", display_width = 5L, labels = c(`Not at all` = 1, `A little` = 2, Some = 3, `A lot` = 4, `A great deal` = 5, `Don't know` = 99), class = c("haven_labelled", "vctrs_vctr", "double"))), class = c("rowwise_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), groups = structure(list(.rows = structure(list(1L, 2L, 3L, 4L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame")))
You need to put vjust = 0.5 inside position_fill:
x %>%
pivot_longer(!response, names_to = "bin_name", values_to = "b") %>%
count(response, bin_name, b) %>%
filter(!is.na(response)) %>%
group_by(bin_name,b) %>%
mutate(perc=paste0(round(n*100/sum(n),0),"%")) %>%
ggplot(aes(x = n, y = "", fill = response, label = perc)) +
geom_col(position=position_fill(), aes(fill=forcats::fct_rev(response))) +
coord_cartesian(clip = "off") +
geom_vline(xintercept = x_limits, linetype = 3) +
geom_label_repel(
aes(group = forcats::fct_rev(response), label = perc),
hjust = 0.5,
vjust = 0.5,
direction = "y",
force = 1.5,
fill = "white",
size = 1.5,
max.overlaps = Inf,
position = position_fill(vjust = 0.5),
segment.size = 0.2,
point.size = NA,
box.padding = 0.001
) +
scale_fill_brewer(palette="YlOrBr", direction = -1) +
scale_x_continuous(labels = scales::percent_format(), expand = c(0.05, 0.05)) +
facet_grid(vars(b), vars(bin_name), labeller=as_labeller(facet_names)) +
labs(title = 'title', caption = caption, x = "", y = "") +
guides(fill = guide_legend(title = NULL)) +
theme_classic()
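To see what vjust does inside position_fill, here is a small sketch on made-up counts (the toy data frame and the plain geom_text layer are assumptions for illustration, not the asker's data or ggrepel). It compares the label positions with and without vjust = 0.5.

```r
library(ggplot2)

# Toy counts for a single horizontal bar (illustrative values).
toy <- data.frame(bin = "a", response = c("x", "y"), n = c(30, 70))

base <- ggplot(toy, aes(x = n, y = bin, group = response)) +
  geom_col(aes(fill = response), position = position_fill())

# Default (vjust = 1): each label sits at the end of its segment.
p_end <- base + geom_text(aes(label = n), position = position_fill())

# vjust = 0.5: each label sits in the middle of its segment.
p_mid <- base + geom_text(aes(label = n), position = position_fill(vjust = 0.5))

# Label x positions after stacking (layer 2 is the text layer).
x_end <- layer_data(p_end, 2)$x
x_mid <- layer_data(p_mid, 2)$x

# Distance from each segment's end to its label: half the segment's share.
sort(x_end - x_mid)
```

The shift equals half of each segment's share of the bar, so the labels land at the segment midpoints.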
The output of dput(Data) is as follows:
structure(list(Year = c(1986, 1987, 1988, 1989, 1990, 1991, 1992,
1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001), RwandaGDP = c(266296395453522,
266232388162044, 278209717380819, 278108075482628, 271435453185924,
264610535380715, 280150385073342, 257433853555685, 128078318071279,
173019272512077, 195267342948145, 222311386633263, 242005217615319,
252537014428159, 273676681432581, 296896832706772), ChadGDP = c(221078469390513,
215510570376333, 248876690715831, 261033657789193, 250126438514823,
271475073131674, 293196997307684, 247136226809204, 272188148422562,
275553889112468, 282165595568286, 297579071872462, 318265518859647,
316009224207253, 313311638596115, 349837931311225), RwandaLifeExpectancy = c(50.233,
47.409, 43.361, 38.439, 33.413, 29.248, 26.691, 26.172, 27.738,
31.037, 35.38, 39.838, 43.686, 46.639, 48.649, 49.936), ChadLifeExpectancy = c(46.397,
46.601, 46.772, 46.91, 47.019, 47.108, 47.187, 47.265, 47.345,
47.426, 47.498, 47.559, 47.61, 47.657, 47.713, 47.789)), row.names = c(NA,
-16L), spec = structure(list(cols = list(Year = structure(list(), class = c("collector_double",
"collector")), RwandaGDP = structure(list(), class = c("collector_double",
"collector")), ChadGDP = structure(list(), class = c("collector_double",
"collector")), RwandaLifeExpectancy = structure(list(), class = c("collector_double",
"collector")), ChadLifeExpectancy = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), delim = ";"), class = "col_spec"), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"))
I have just run a difference-in-differences regression in R with the following code:
GDP <- as.numeric(Data$RwandaGDP, Data$ChadGDP)
MyDataTime <- ifelse(Data$Year >= "1994", 1, 0)
MyDataTreated <- Data$RwandaLifeExpectancy
MyDataDiD <- MyDataTime * MyDataTreated
DidReg = lm(GDP ~ MyDataTime + MyDataTreated + MyDataDiD, data = Data)
summary(DidReg)
Now, there is only one thing left to do, which is to plot the results.
I am looking for something akin to what can be seen in point 3.4 (line plot) on this website:
https://rpubs.com/phle/r_tutorial_difference_in_differences
However, when I try to adapt my code to the code provided on that website, I keep getting the error "Discrete value supplied to continuous scale".
I've been stuck with this issue for hours, and I really don't know what I am doing wrong in my code.
Any help would be enormously appreciated!
Many thanks in advance!
EDIT
My adapted code is as follows:
Data %>%
mutate(label = if_else(Year == "1994", as.character(GDP), NA_character_)) %>%
ggplot(aes(x=Data$Year,y=Data$RwandaGDP, group=GDP)) +
geom_line(aes(color=GDP), size=1.2) +
geom_vline(xintercept = "Rwandan Genocide", linetype="dotted",
color = "black", size=1.1) +
scale_color_brewer(palette = "Accent") +
scale_y_continuous(limits = c(17,24)) +
ggrepel::geom_label_repel(aes(label = label),
nudge_x = 0.5, nudge_y = -0.5,
na.rm = TRUE) +
guides(scale="none") +
labs(x="", y="GDP") +
annotate(
"text",
x = "1994",
y = "",
label = "{Difference-in-Differences}",
angle = 90,
size = 3
)
You can use more than one geom but will need to approach your labels differently. If you have more than one country you are looking to display, consider putting your data in a long format with tidyr::pivot_longer.
Data %>%
ggplot(aes(x=Year)) +
geom_line(aes(y=ChadGDP, color=GDP), size=1.2) +
geom_line(aes(y=RwandaGDP, color=GDP), size=1.2)+
scale_color_continuous(type = "gradient") +
geom_vline(xintercept = 1994, linetype="dotted",
color = "black", size=1.1) +
ggrepel::geom_label_repel(data = Data[Data$Year == 1994,], aes(label = RwandaGDP, y = RwandaGDP),
nudge_x = 0.5, nudge_y = -0.5,
na.rm = TRUE) +
ggrepel::geom_label_repel(data = Data[Data$Year == 1994,], aes(label = ChadGDP, y = ChadGDP),
nudge_x = 0.5, nudge_y = -0.5,
na.rm = TRUE) +
guides(scale="none") +
labs(x="", y="GDP") +
annotate(
"text",
x = 1994,
y = median(GDP),
label = "{Difference-in-Differences}",
angle = 90,
size = 3
)
I'm not sure why you want to use a gradient color scale when GDP is already represented on the y-axis. Consider something like this instead. This approach also sets you up to graph your other variables and multiple countries.
Rwanda <- Data %>%
select(Year, LifeExpectancy = RwandaLifeExpectancy, GDP = RwandaGDP) %>%
mutate(Country = "Rwanda")
Chad <- Data %>%
select(Year, LifeExpectancy = ChadLifeExpectancy, GDP = ChadGDP) %>%
mutate(Country = "Chad")
CountryData <- rbind(Rwanda, Chad) %>%
mutate(`GDP(Trillions)` = round(GDP/1000000000000,2))
CountryData %>%
ggplot(aes(x=Year)) +
geom_line(aes(y=`GDP(Trillions)`, group = Country, color = Country), size=1.2) +
geom_vline(xintercept = 1994, linetype="dotted",
color = "black", size=1.1) +
ggrepel::geom_label_repel(data = CountryData[CountryData$Year == 1994,], aes(label = `GDP(Trillions)`, y = `GDP(Trillions)`),
nudge_x = 0.5, nudge_y = -0.5,
na.rm = TRUE) +
guides(scale="none") +
labs(x="", y="GDP (USD Trillions)") +
annotate(
"text",
x = 1994,
y = median(CountryData$`GDP(Trillions)`),
label = "{Difference-in-Differences}",
angle = 90,
size = 3
)
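The long format built above also lends itself to the regression from the question. The following is only a sketch in base R, and it rests on two assumptions that are mine rather than the asker's: Rwanda is treated as the treated unit, and 1994 onward is the post period. With two 0/1 indicators, the interaction coefficient equals the difference in differences of the four group means, which the last lines check.

```r
# GDP series copied from the question's dput() output, rescaled to trillions.
years <- 1986:2001
rwanda <- c(266296395453522, 266232388162044, 278209717380819, 278108075482628,
            271435453185924, 264610535380715, 280150385073342, 257433853555685,
            128078318071279, 173019272512077, 195267342948145, 222311386633263,
            242005217615319, 252537014428159, 273676681432581, 296896832706772) / 1e12
chad <- c(221078469390513, 215510570376333, 248876690715831, 261033657789193,
          250126438514823, 271475073131674, 293196997307684, 247136226809204,
          272188148422562, 275553889112468, 282165595568286, 297579071872462,
          318265518859647, 316009224207253, 313311638596115, 349837931311225) / 1e12

# Long format: one row per country-year, as in CountryData above.
long <- data.frame(
  Year    = rep(years, times = 2),
  Country = rep(c("Rwanda", "Chad"), each = length(years)),
  GDP     = c(rwanda, chad)
)

# Assumptions for illustration: Rwanda is treated, 1994 onward is "post".
long$Treated <- as.numeric(long$Country == "Rwanda")
long$Post    <- as.numeric(long$Year >= 1994)

# Treated * Post expands to Treated + Post + Treated:Post.
fit <- lm(GDP ~ Treated * Post, data = long)
did_coef <- unname(coef(fit)["Treated:Post"])

# The same quantity computed from the four group means.
m <- tapply(long$GDP, list(long$Treated, long$Post), mean)
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

c(regression = did_coef, group_means = did_means)
```

Note that Year is compared with the number 1994 here, not the string "1994", so the comparison is numeric.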
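Try changing your scale_color_brewer(palette = "Accent") to scale_colour_continuous(type = "gradient").
I also removed your scale_y_continuous(limits = c(17, 24)). I'm unsure of the rationale behind it, and since the GDP values are on the order of 1e14, those limits would exclude every point.
Edit: added pivot_longer.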
Data %>%
pivot_longer(cols = c("RwandaGDP","ChadGDP"), names_to = "country", values_to = "value") %>%
mutate(Year = as.numeric(Year),
label = if_else(Year == 1994, as.character(value), NA_character_)) %>%
ggplot(aes(x=Year,y=value,col=country)) +
geom_line(size=1.2) +
# scale_color_continuous(type = "gradient") +
geom_vline(xintercept = 1994, linetype="dotted",
color = "black", size=1.1) +
# scale_color_discrete(palette = "Accent") +
# scale_y_continuous(limits = c(17,24)) +
ggrepel::geom_label_repel(aes(label = label),
nudge_x = 0.5, nudge_y = -0.5,
na.rm = TRUE) +
guides(scale="none") +
labs(x="", y="GDP") +
annotate(
geom = "text",
x = 1994,
y = median(GDP),
label = "Difference-in-Differences",
angle = 90,
size = 3
)
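For what it's worth, the "Discrete value supplied to continuous scale" error in the question most likely comes from passing character values (such as xintercept = "Rwandan Genocide" or x = "1994") to an axis that the numeric Year column has already made continuous. A minimal sketch on made-up data, assuming only ggplot2; the toy data frame and the builds() helper are hypothetical names:

```r
library(ggplot2)

# Toy series with a numeric Year column (illustrative values).
toy <- data.frame(Year = 1990:1998, GDP = c(5, 6, 7, 8, 3, 4, 5, 6, 7))
base <- ggplot(toy, aes(x = Year, y = GDP)) + geom_line()

# Character intercept on a continuous x scale.
p_character <- base + geom_vline(xintercept = "1994", linetype = "dotted")

# Numeric intercept on the same scale.
p_numeric <- base + geom_vline(xintercept = 1994, linetype = "dotted")

# Hypothetical helper: TRUE if the plot builds without an error.
builds <- function(p) {
  tryCatch({ ggplot_build(p); TRUE }, error = function(e) FALSE)
}

builds(p_character)
builds(p_numeric)
```

Only the numeric version should build, which is why the answers above use xintercept = 1994 and x = 1994.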
I am trying to add p-values from a three-factor t-test to a ggplot, but it keeps failing with the error 'Error in FUN(X[[i]], ...) : object 'estimate' not found'. My code is as follows:
First, I ran a t-test:
mixt.test_others<-studymix_fit %>%
group_by(reciprocity,stimu)%>%
t_test(Study_rate ~ estimate)%>%
adjust_pvalue()%>%
add_significance()
mixt.test_others1 <- mixt.test_others %>%
add_xy_position(x = "estimate", fun = "mean_sd", dodge = 0.8)
Then I add it to ggplot
ggplot(studymix_fit,aes(x=reciprocity, y=Study_rate,color = estimate,
fill = estimate))+
facet_grid(. ~ stimu)+
geom_violin(alpha = 0.1, adjust = 1.5)+
geom_boxplot(width = 0.1,alpha = 0.2,
position = position_dodge(width = 0.9))+
geom_dotplot(binaxis = "y", stackdir = 'center',
dotsize = 0.5, alpha = 0.5,
position = position_dodge(width = 0.9) )+
stat_summary(fun.data="mean_sdl", fun.args = list(mult=1),
geom="pointrange",
color = "red", alpha = 1,width = 0.15,
position = position_dodge(width = 0.9))+
stat_compare_means(method = 'anova', label.y = 1.4)+
add_pvalue(mixt.test_others1,
label = "p = {p.adj.signif}",
tip.length = 0.01,
step.increase = 0.05,
y.position = 1.05)+
theme_classic()+
scale_fill_brewer(type = 'div', palette = 'Accent', direction = 1)+
scale_color_brewer(type = 'div', palette = 'Accent', direction = 1)+
labs(x="Reciprocity",y="Select Rate of Positive Reciprocity")
Here is part of my dataset:
structure(list(ID = c(102, 102, 102, 102, 103),
condition = c("A", "C", "B", "D", "A"),
Study_rate = c(1, 0, 1, 0, 0.666666667),
reciprocity = c("PS", "NS", "PS", "NS", "PS"),
estimate = c("PO", "PO", "NO", "NO", "PO"),
stimu = c("subject", "subject", "subject", "subject", "subject")))
I checked my original data set, and it seems to have no problems.
I'd like to know whether I ran the three-factor t-test correctly.
Is there any other mistake in my code?
Please help me figure it out. Thanks very much!
If anyone has similar problems, we can discuss them and figure them out.
Thanks
I have a plot that looks at 2 quarters' worth of data. I also included a target value (dashed line) and a YTD section (which is the cumsum(count)).
I am having an issue trying to show the number in the section added for YTD, but only for one of the quarters (since Q1 already has a value inside the bar). Currently the plot below shows 0 and 2, but I only want to show the values for quarters after Q1.
Current plot
I have tried this approach, but it does not seem to work:
geom_text(aes(label = ifelse((quarter_2022= "Q1"), total_attainment, ifelse(quarter_2022="Q2",total_attainment+2)),
position = position_stack(vjust = 1))) +
Plot Code
ggplot(df1, aes(x=quarter_2022, y=total_attainment)) +
geom_col(aes(y = YTD_TOTAL), fill = c("green1", "green2"), color = "black") +
geom_text(aes(y = YTD_TOTAL, label = scales::percent(YTD_PERCENT_ATTAINMENT)),
vjust = -0.5) +
geom_col(fill = "gray70", color = "gray20") +
geom_text(aes(label = YTD_TOTAL - total_attainment),
position = position_stack(vjust = 1.25))+
geom_text(aes(label = total_attainment),
position = position_stack(vjust = 0.5))+
geom_segment(aes(x = as.numeric(as.factor(quarter_2022)) - 0.4,
xend = as.numeric(as.factor(quarter_2022)) + 0.4,
y = attainment_target, yend = attainment_target),
linetype = "dashed") +
geom_text(aes(label = attainment_target),
position = position_stack(vjust = 4))
Here is the data:
structure(list(attainment_target = c(7.5, 15), quarter_2022 = c("Q1",
"Q2"), year = structure(c(1640995200, 1640995200), class = c("POSIXct",
"POSIXt"), tzone = ""), total_attainment = c(2, 4), percent_attainment_by_quarter = c(0.2666,
0.2666), ytd = c(2, 6), YTD_TOTAL = c(2, 6), YTD_PERCENT_ATTAINMENT = c(0.266666666666667,
0.4)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))
Create a logical column in your dataset that indicates whether the label is 0. In the geom_text that creates the label, set the color aesthetic to the logical column. Use scale_color_manual(values = c(NA, "black"), na.value = NA) to assign no color to the labels that were 0s.
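A rough sketch of that suggestion, using the relevant columns from the question's data. The column names ytd_extra and show_extra, and the vjust value used to place the label, are my own choices for illustration:

```r
library(ggplot2)

# The question's data, reduced to the columns used here.
df1 <- data.frame(
  quarter_2022     = c("Q1", "Q2"),
  total_attainment = c(2, 4),
  YTD_TOTAL        = c(2, 6)
)

# Amount added on top of the quarter's own value, and a flag for non-zero amounts.
df1$ytd_extra  <- df1$YTD_TOTAL - df1$total_attainment
df1$show_extra <- df1$ytd_extra > 0

p <- ggplot(df1, aes(x = quarter_2022, y = total_attainment)) +
  geom_col(aes(y = YTD_TOTAL), fill = "green2", color = "black") +
  geom_col(fill = "gray70", color = "gray20") +
  geom_text(aes(label = total_attainment),
            position = position_stack(vjust = 0.5)) +
  # Put the extra amount just under the top of the YTD bar; color by the flag.
  geom_text(aes(y = YTD_TOTAL, label = ytd_extra, color = show_extra),
            vjust = 1.5, show.legend = FALSE) +
  # FALSE maps to NA (no color, so nothing visible), TRUE maps to black.
  scale_color_manual(values = c(NA, "black"), na.value = NA)

# Colors assigned to the two labels in the fourth layer.
label_colors <- layer_data(p, 4)$colour
label_colors
```

The Q1 label (a zero) gets no color and the Q2 label is black, so only the value added after Q1 is visible.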
I have a stacked bar plot, with highly unequal heights of bars. I would like to show the percentages on top of each bar.
What I have done so far is the following
df = structure(list(Type = c("Bronchoscopy", "Bronchoscopy", "Endoscopy",
"Endoscopy"), Bacteremia = structure(c(1L, 2L, 1L, 2L), .Label = c("False",
"True"), class = "factor"), count = c(2710L, 64L, 13065L, 103L
), perc = c(97.6928622927181, 2.3071377072819, 99.2178007290401,
0.782199270959903)), class = c("grouped_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -4L), groups = structure(list(
Type = c("Bronchoscopy", "Endoscopy"), .rows = list(1:2,
3:4)), row.names = c(NA, -2L), class = c("tbl_df", "tbl",
"data.frame"), .drop = TRUE))
ggplot(df, aes(x = Type, y = perc, fill = Bacteremia)) +
geom_bar(stat = "identity") +
ylab("percent") +
geom_text(aes(label = paste0(round(perc, 2), "%")), position =
position_stack(vjust = -0.1), color = "black", fontface = "bold")
I can't seem to get the vjust right. It seems like it's not behaving in the same way for the bottom versus the top bar.
What I would like to achieve is to place the percentages slightly higher than the top edge of each bar.
Any ideas?
Here's a possible approach:
ggplot(df, aes(x = Type, y = perc, fill = Bacteremia)) +
geom_bar(stat = "identity") +
ylab("percent") +
geom_text(aes(label = paste0("", round(perc, 2), "%\n"), y = perc),
color = "black", fontface = "bold", nudge_y = 2)
I should elaborate that ggplot2 will place the geom_text() labels relative to the data. If you want to align the text labels horizontally, you will need to either use annotate() or supply a labelling dataset with Type, perc, and Bacteremia and pass it to geom_text(), as below.
labdf <- cbind(df, ypos = c(103, 5, 103, 5))
ggplot(df, aes(x = Type, y = perc, fill = Bacteremia)) +
geom_bar(stat = "identity") +
ylab("percent") +
geom_text(data = labdf,
aes(label = paste0("", round(perc, 2), "%"), y = ypos, x = Type),
color = "black", fontface = "bold")
Here's one way to do it:
df <-
tibble(
Type = c("Bronchoscopy", "Bronchoscopy", "Endoscopy", "Endoscopy"),
Bacteremia = c("False", "True", "False", "True"),
count = c(2710L, 64L, 13065L, 103L)
) %>%
group_by(Type) %>%
mutate(Percent = round((count / sum(count) * 100), 1))
df %>%
ggplot(aes(x = Type, y = Percent, fill = Bacteremia)) +
geom_col() +
geom_label(
data = . %>% filter(Bacteremia == "True"),
aes(y = Percent + 5, label = str_c(Percent, "%")),
show.legend = FALSE
) +
geom_label(
data = . %>% filter(Bacteremia == "False"),
aes(y = 105, label = str_c(Percent, "%")),
show.legend = FALSE
)
The choices of 5 and 105 work on my computer, but may need to be tweaked a bit based on your specific settings and aspect ratio. The first geom_label call sets the y position based on the precise percentage, while the second one sets it at a constant level above the bars.
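If it helps to see the numbers, the label positions described above can be worked out in base R. This is just a sketch of the arithmetic; the counts data frame and the label_y column are names I made up, and the offsets 5 and 105 are the ones from the code above.

```r
# Counts from the question.
counts <- data.frame(
  Type       = c("Bronchoscopy", "Bronchoscopy", "Endoscopy", "Endoscopy"),
  Bacteremia = c("False", "True", "False", "True"),
  count      = c(2710L, 64L, 13065L, 103L)
)

# Percentage within each Type, rounded to one decimal place.
counts$Percent <- round(100 * counts$count / ave(counts$count, counts$Type, FUN = sum), 1)

# "True" labels sit 5 units above their own (small, bottom) segment;
# "False" labels sit at a fixed height of 105, just above the full bar.
counts$label_y <- ifelse(counts$Bacteremia == "True", counts$Percent + 5, 105)

counts[, c("Type", "Bacteremia", "Percent", "label_y")]
```

The "True" labels end up at 7.3 and 5.8, and both "False" labels at 105.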
You might also want to play around with using geom_text vs. geom_label to experiment with different color and label settings. The nice thing about geom_label is that it will make it very clear which group is being labeled.