ggplot does not find the value that is present in the dataset - r

I have data as follows:
library(ggplot2)
library(dplyr)
library(tidyverse)
library(ggsignif)
graph <- structure(list(Constraint = structure(c(4L, 2L, 3L, 1L, 5L, 4L,
2L, 3L, 1L, 5L), .Label = c("Major Constraint", "Minor Constraint",
"Moderate Constraint", "No Constraint", "Total"), class = "factor"),
`Observation for Crime = 0` = c(3124, 2484, 3511, 4646, 13765,
3124, 2484, 3511, 4646, 13765), `Observation for Crime = 1` = c(762,
629, 1118, 1677, 4186, 762, 629, 1118, 1677, 4186), `Total Observations` = c(3886,
3113, 4629, 6323, 17951, 3886, 3113, 4629, 6323, 17951),
variable = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L), .Label = c("Crime = 0", "Crime = 1"), class = "factor"),
value = c(80.3911477097272, 79.79441053646, 75.847915316483,
73.4777795350308, 76.6809648487549, 19.6088522902728, 20.20558946354,
24.152084683517, 26.5222204649692, 23.3190351512451)), row.names = c(NA,
-10L), class = "data.frame")
            Constraint Observation for Crime = 0 Observation for Crime = 1 Total Observations  variable    value
1        No Constraint                      3124                       762               3886 Crime = 0 80.39115
2     Minor Constraint                      2484                       629               3113 Crime = 0 79.79441
3  Moderate Constraint                      3511                      1118               4629 Crime = 0 75.84792
4     Major Constraint                      4646                      1677               6323 Crime = 0 73.47778
5                Total                     13765                      4186              17951 Crime = 0 76.68096
6        No Constraint                      3124                       762               3886 Crime = 1 19.60885
7     Minor Constraint                      2484                       629               3113 Crime = 1 20.20559
8  Moderate Constraint                      3511                      1118               4629 Crime = 1 24.15208
9     Major Constraint                      4646                      1677               6323 Crime = 1 26.52222
10               Total                     13765                      4186              17951 Crime = 1 23.31904
I am trying to create something like this:
graph %>%
mutate(`Constraint` = fct_relevel(`Constraint`, "No Constraint", "Minor Constraint", "Moderate Constraint", "Major Constraint")) %>%
ggplot(aes(x = `Constraint`, y = value, fill = variable, label=sprintf("%.02f %%", round(value, digits = 1)))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = .9), # move to center of bars
vjust = -0.5, # nudge above top of bar
size = 4) +
scale_fill_grey(start = 0.8, end = 0.5) +
theme_bw(base_size = 15) +
geom_signif(stat="identity",
data=data.frame(x=c(0.875, 1.875), xend=c(1.125, 2.125),
y=c(5.8, 8.5), annotation=c("**", "NS")),
aes(x=x,xend=xend, y=y, yend=y, annotation=annotation)) +
geom_signif(comparisons=list(c("treatment", "control")), annotations="***",
y_position = 9.3, tip_length = 0, vjust=0.4)
Hoping for an appearance close to the following picture:
But it gives the error that the value is not found, while the value is in the data. Does anyone know what the problem could be?

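Move the fill and label aesthetics out of ggplot() and into geom_col() and geom_text(). The geom_signif() layer uses its own data frame, which has no variable or value column, so it fails when it inherits those aesthetics from ggplot():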
library(tidyverse)
library(ggsignif)
graph %>%
mutate(`Constraint` = fct_relevel(`Constraint`, "No Constraint", "Minor Constraint", "Moderate Constraint", "Major Constraint")) %>%
ggplot(aes(x = `Constraint`, y = value)) +
geom_col(position = 'dodge', aes(fill = variable)) +
geom_text(position = position_dodge(width = .9), # move to center of bars
aes(label=sprintf("%.02f %%", round(value, digits = 1))),
vjust = -0.5, # nudge above top of bar
size = 4) +
scale_fill_grey(start = 0.8, end = 0.5) +
theme_bw(base_size = 15) +
geom_signif(stat="identity",
data=data.frame(x=c(0.875, 1.875), xend=c(1.125, 2.125),
y=c(5.8, 8.5), annotation=c("**", "NS")),
aes(x=x,xend=xend, y=y, yend=y, annotation=annotation))
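A likely reason this helps: ggplot2 evaluates each aesthetic inside the data of the layer being drawn, and aesthetics set in ggplot() are inherited by every layer. The base R sketch below only mimics that lookup and is not ggplot2's actual internals. The names annot and lookup are made up for illustration.

```r
# Stand-in for the data frame handed to geom_signif(); it has no `value` column.
annot <- data.frame(x = c(0.875, 1.875), xend = c(1.125, 2.125),
                    y = c(5.8, 8.5), annotation = c("**", "NS"))

# Evaluate a column name inside a data frame, roughly as a layer does with an
# aesthetic. Using baseenv() as the enclosure keeps the lookup away from any
# objects in the workspace. (Hypothetical helper, for illustration only.)
lookup <- function(column, data) {
  tryCatch(
    eval(as.name(column), data, baseenv()),
    error = function(e) conditionMessage(e)
  )
}

found     <- lookup("y", annot)      # column exists, so its values come back
not_found <- lookup("value", annot)  # column absent, so an error message comes back

print(found)
print(not_found)
```

As far as I know, passing inherit.aes = FALSE to the annotation layer is another way to stop it from picking up the plot-level mappings.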

Related

Adding line with specific df value to each bar in ggplot

My df is organized this way for example:
OCCURED_COUNTRY_DESC | a   | b    | c   | d    | flagged | type | MedDRA_PT | E
UNITED STATES        | 403 | 1243 | 473 | 4077 | yes     | disp | Seizure   | 144.208
My data:
structure(list(OCCURED_COUNTRY_DESC = c("AUSTRALIA", "AUSTRIA",
"BELGIUM", "BRAZIL", "CANADA"), a = c(4L, 7L, 20L, 5L, 11L),
b = c(31, 27, 100, 51, 125), c = c(872, 869, 856, 871, 865
), d = c(5289, 5293, 5220, 5269, 5195), w = c(876, 876, 876,
876, 876), x = c(5320, 5320, 5320, 5320, 5320), y = c(35L,
34L, 120L, 56L, 136L), z = c(6161, 6162, 6076, 6140, 6060
), N = c(6196, 6196, 6196, 6196, 6196), k = c("0.5", "0.5",
"0.5", "0.5", "0.5"), SOR = c(0.80199821407511, 1.52042360060514,
1.21312776329214, 0.615857869962066, 0.539569644832803),
log = c(-0.318329070860348, 0.604473324558599, 0.278731499148795,
-0.699330656240263, -0.890118907611227), LL99 = c(-0.695969674426877,
0.382102954188229, 0.198127619344382, -1.00534117464748,
-1.03425468471322), UL99 = c(-0.0544058884186467, 0.763880731966007,
0.337239065783058, -0.482651467660248, -0.785935460582379
), flagged = c("no", "no", "no", "no", "yes"), type = c(NA,
NA, NA, NA, "under"), MedDRA_PT = c("Seizure", "Seizure",
"Seizure", "Seizure", "Seizure"), E = c(5.11098506333901,
4.43283582089552, 16.3984674329502, 8.43063199848168, 20.8132820019249
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
I am using ggplot2 to create a bar chart using the following piece of code:
test2 %>% #using test2 as the df
ggplot(aes(a, OCCURED_COUNTRY_DESC, fill=type)) +
geom_bar(stat="identity")+
scale_fill_manual(values = c("disp" = "#FF8C00",
"under" = "#7EC0EE",
"NA"="#EEE9E9"))+
theme_classic()+
labs(title = "Seizure",
x = "Count",
y = "")
What I would like to do is add a black line in each bar corresponding to the E value for that country from the data frame. However, I haven't been successful. Can someone kindly guide me on how to achieve this?
Thanks!
One option to achieve your desired result would be geom_segment, where you map your E column to both the x and the xend position. The tricky part is the y positions. However, as a categorical axis is still a numeric axis underneath, we could add a helper column to your data which contains the numeric positions of your categorical OCCURED_COUNTRY_DESC column. This helper column could then be mapped to the y and yend aesthetics needed by geom_segment, where we also take into account the width of the bars:
library(ggplot2)
test2$OCCURED_COUNTRY_DESC_num <- as.numeric(factor(test2$OCCURED_COUNTRY_DESC))
width <- .9 # default width of bars
ggplot(test2, aes(a, OCCURED_COUNTRY_DESC, fill = type)) +
geom_bar(stat = "identity") +
geom_segment(aes(x = E, xend = E,
y = OCCURED_COUNTRY_DESC_num - width / 2,
yend = OCCURED_COUNTRY_DESC_num + width / 2),
color = "black", size = 1) +
scale_fill_manual(values = c(
"disp" = "#FF8C00",
"under" = "#7EC0EE",
"NA" = "#EEE9E9"
)) +
theme_classic() +
labs(
title = "Seizure",
x = "Count",
y = ""
)
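To see what the helper column contains, here is a minimal base R sketch on a made-up, deliberately unsorted country vector (not the data above). It assumes the axis orders the categories alphabetically, which should match what factor() does with a character column.

```r
# Hypothetical, deliberately unsorted country column.
country <- c("CANADA", "AUSTRIA", "BRAZIL")

# factor() sorts its levels alphabetically; as.numeric() returns each row's
# level index, i.e. the numeric position of that category on the axis.
country_num <- as.numeric(factor(country))

width <- 0.9  # bar width assumed in the answer above

# Start and end of the segment for each row, spanning the full bar width.
segments <- data.frame(country,
                       y    = country_num - width / 2,
                       yend = country_num + width / 2)
print(segments)
```

The positions follow the level order rather than the row order, which is why the helper column is built from the factor and not from the row number.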

Adding p-values to ggplot; ggsignif says it can only handle data with groups that are plotted on the x-axis

I have data as follows, to which I am trying to add p-values:
library(ggplot2)
library(ggsignif)
library(dplyr)
data <- structure(list(treatment = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1), New_Compare_Truth = c(57,
61, 12, 14, 141, 87, 104, 90, 12, 14), total_Hy = c(135,
168, 9, 15, 103, 83, 238, 251, 9, 15), total = c(285, 305, 60,
70, 705, 435, 520, 450, 60, 70), ratio = c(47.3684210526316,
55.0819672131148, 15, 21.4285714285714, 14.6099290780142, 19.0804597701149,
45.7692307692308, 55.7777777777778, 15, 21.4285714285714), Type = structure(c(2L,
2L, 1L, 1L, 3L, 3L, 5L, 5L, 4L, 4L), .Label = c("A1. Others \nMore \nH",
"A2. Similar \nNorm", "A3. Others \nLess \nH", "B1. Others \nMore \nH",
"B2. Similar \nNorm or \nHigher"), class = "factor"), `Sample Selection` = c("Answers pr",
"Answers pu", "Answers pr", "Answers pu", "Answers pr",
"Answers pu", "Answers pr", "Answers pu", "Answers pr",
"Answers pu"), p_value = c(0.0610371842601616, 0.0610371842601616,
0.346302201593934, 0.346302201593934, 0.0472159407450147, 0.0472159407450147,
0.0018764377521242, 0.0018764377521242, 0.346302201593934, 0.346302201593934
), x = c(2, 2, 1, 1, 3, 3, 5.5, 5.5, 4.5, 4.5)), row.names = c(NA,
-10L), class = c("data.table", "data.frame"))
breaks_labels <- structure(list(Type = structure(c(2L, 1L, 3L, 5L, 4L), .Label = c("A1. Others \nMore \nH",
"A2. Similar \nNorm", "A3. Others \nLess \nH", "B1. Others \nMore \nH",
"B2. Similar \nNorm or \nHigher"), class = "factor"), x = c(2,
1, 3, 5.5, 4.5)), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
data %>%
ggplot(aes(x = x, y = ratio)) +
geom_col(aes(fill = `Sample Selection`), position = position_dodge(preserve = "single"), na.rm = TRUE) +
geom_text(position = position_dodge(width = .9), # move to center of bars
aes(label=sprintf("%.02f %%", round(ratio, digits = 1)), group = `Sample Selection`),
vjust = -1.5, # nudge above top of bar
size = 4,
na.rm = TRUE) +
# geom_text(position = position_dodge(width = .9), # move to center of bars
# aes(label= paste0("(", ifelse(variable == "Crime = 0", `Observation for Crime = 0`, `Observation for Crime = 1`), ")"), group = `Sample Selection`),
# vjust = -0.6, # nudge above top of bar
# size = 4,
# na.rm = TRUE) +
scale_fill_grey(start = 0.8, end = 0.5) +
scale_y_continuous(expand = expansion(mult = c(0, .1))) +
scale_x_continuous(breaks = breaks_labels$x, labels = breaks_labels$Type) +
theme_bw(base_size = 15) +
xlab("Norm group for corporate Hy") +
ylab("Percentage Compliant Decisions") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_signif(annotation=c("p=0.35", "p=0.06", "p=0.05", "p=0.34", "p=0.00"), y_position = c(30, 40, 55 ,75, 90), xmin=c(0.75,1.75,2.75,3.75,4.75),
xmax=c(1.25,2.25,3.25,4.25,5.25))
For some reason, the last line causes the following error:
Error in f(...) :
Can only handle data with groups that are plotted on the x-axis
Since I am just putting in text and not referring to any variable, I don't really understand why this happens. Can anyone help me out? Without the last line it looks like this:
EDIT: Please note that I would like to keep the space between the third and the fourth column (which is apparently also what caused the problem, see Jared's answer).
Edit
Thanks for clarifying your expected outcome. Here is one way to include geom_signif() annotations without altering the original plot:
library(tidyverse)
library(ggsignif)
data <- structure(list(treatment = c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1), New_Compare_Truth = c(57,
61, 12, 14, 141, 87, 104, 90, 12, 14), total_Hy = c(135,
168, 9, 15, 103, 83, 238, 251, 9, 15), total = c(285, 305, 60,
70, 705, 435, 520, 450, 60, 70), ratio = c(47.3684210526316,
55.0819672131148, 15, 21.4285714285714, 14.6099290780142, 19.0804597701149,
45.7692307692308, 55.7777777777778, 15, 21.4285714285714), Type = structure(c(2L,
2L, 1L, 1L, 3L, 3L, 5L, 5L, 4L, 4L), .Label = c("A1. Others \nMore \nH",
"A2. Similar \nNorm", "A3. Others \nLess \nH", "B1. Others \nMore \nH",
"B2. Similar \nNorm or \nHigher"), class = "factor"), `Sample Selection` = c("Answers pr",
"Answers pu", "Answers pr", "Answers pu", "Answers pr",
"Answers pu", "Answers pr", "Answers pu", "Answers pr",
"Answers pu"), p_value = c(0.0610371842601616, 0.0610371842601616,
0.346302201593934, 0.346302201593934, 0.0472159407450147, 0.0472159407450147,
0.0018764377521242, 0.0018764377521242, 0.346302201593934, 0.346302201593934
), x = c(2, 2, 1, 1, 3, 3, 5.5, 5.5, 4.5, 4.5)), row.names = c(NA,
-10L), class = c("data.table", "data.frame"))
breaks_labels <- structure(list(Type = structure(c(2L, 1L, 3L, 5L, 4L), .Label = c("A1. Others \nMore \nH",
"A2. Similar \nNorm", "A3. Others \nLess \nH", "B1. Others \nMore \nH",
"B2. Similar \nNorm or \nHigher"), class = "factor"), x = c(2,
1, 3, 5.5, 4.5)), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
annotation_df <- data.frame(signif = c("p=0.35", "p=0.06", "p=0.05", "p=0.34", "p=0.00"),
y_position = c(30, 40, 55 ,75, 90),
xmin = c(0.75,1.75,2.75,4.25,5.25),
xmax = c(1.25,2.25,3.25,4.75,5.75),
group = c(1,2,3,4,5))
data %>%
ggplot(aes(x = x, y = ratio, group = `Sample Selection`)) +
geom_col(aes(fill = `Sample Selection`),
position = position_dodge(preserve = "single"), na.rm = TRUE) +
geom_text(position = position_dodge(width = .9), # move to center of bars
aes(label=sprintf("%.02f %%", round(ratio, digits = 1))),
vjust = -1.5, # nudge above top of bar
size = 4,
na.rm = TRUE) +
scale_fill_grey(start = 0.8, end = 0.5) +
scale_y_continuous(expand = expansion(mult = c(0, .1))) +
scale_x_continuous(breaks = breaks_labels$x, labels = breaks_labels$Type) +
theme_bw(base_size = 15) +
xlab("Norm group for corporate Hy") +
ylab("Percentage Compliant Decisions") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_signif(aes(xmin = xmin,
xmax = xmax,
y_position = y_position,
annotations = signif,
group = group),
data = annotation_df, manual = TRUE)
#> Warning: Ignoring unknown aesthetics: xmin, xmax, y_position, annotations
Created on 2021-07-20 by the reprex package (v2.0.0)
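The xmin and xmax values in annotation_df do not have to be typed by hand. A small sketch, assuming each bracket should span 0.25 on either side of a bar-group centre (half_span is a name introduced here, not part of ggsignif):

```r
# Bar-group centres, taken from breaks_labels$x in the question, in axis order.
centres <- sort(c(2, 1, 3, 5.5, 4.5))
half_span <- 0.25  # assumed half-length of each bracket

annotation_df <- data.frame(
  signif     = c("p=0.35", "p=0.06", "p=0.05", "p=0.34", "p=0.00"),
  y_position = c(30, 40, 55, 75, 90),
  xmin       = centres - half_span,  # left end of each bracket
  xmax       = centres + half_span,  # right end of each bracket
  group      = seq_along(centres)    # one group per bracket
)
print(annotation_df)
```

Deriving the bracket ends from the centres keeps the gap between the third and fourth group intact if the x positions change later.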
Previous answer
One potential solution to your problem is to plot "Type" on the x axis instead of "x", e.g.
data %>%
ggplot(aes(x = Type, y = ratio)) +
geom_col(aes(fill = `Sample Selection`),
position = position_dodge(preserve = "single"), na.rm = TRUE) +
geom_text(position = position_dodge(width = .9), # move to center of bars
aes(label=sprintf("%.02f %%", round(ratio, digits = 1)),
group = `Sample Selection`),
vjust = -1.5,
size = 4,
na.rm = TRUE) +
scale_fill_grey(start = 0.8, end = 0.5) +
scale_y_continuous(expand = expansion(mult = c(0, .1))) +
theme_bw(base_size = 15) +
xlab("Norm group for corporate Hy") +
ylab("Percentage Compliant Decisions") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_signif(annotation=c("p=0.35", "p=0.06", "p=0.05", "p=0.34", "p=0.00"),
y_position = c(30, 40, 55 ,75, 90),
xmin=c(0.75,1.75,2.75,3.75,4.75),
xmax=c(1.25,2.25,3.25,4.25,5.25))

How do I label the sum total of y-axis column values from consecutive bar values, as in the example of "Confirmed" cases per x-axis "Date"?

I have been working on this for some time, and am re-posting it hoping to simplify the definition of the problem and to add some clarity based on feedback on my previous attempt. I am able to label each individual column value, but I am not able to put together the code necessary to sum the total. The examples I have looked at never work the way I try to combine them, for example with group_by or summarize. I would like to sum only the values of "Confirmed" cases and not show the other column values, because with many c("x", "Y", ... "data") labels the plot becomes impossible to read.
Here is the data frame:
dput(COVID1[1:12, ])
structure(list(COUNTY = c("Antrim", "Antrim", "Antrim", "Charlevoix",
"Charlevoix", "Grand Traverse", "Grand Traverse", "Grand Traverse",
"Antrim", "Grand Traverse", "Grand Traverse", "Grand Traverse"
), Date = structure(c(18453, 18456, 18457, 18453, 18455, 18453,
18456, 18457, 18455, 18453, 18456, 18457), class = "Date"), CASE_STATUS = c("Confirmed",
"Confirmed", "Confirmed", "Confirmed", "Confirmed", "Confirmed",
"Confirmed", "Confirmed", "Probable", "Probable", "Probable",
"Probable"), Cases = c(1L, 1L, 2L, 1L, 3L, 2L, 2L, 1L, 1L, 1L,
1L, 1L)), row.names = c(NA, 12L), class = "data.frame")
Code:
ggplot(filter(COVID1, COUNTY %in% c("Antrim", "Charlevoix", "Grand Traverse"), Cases > 0)) +
geom_col(aes(x = Date, y = Cases, fill = CASE_STATUS), position = position_stack(reverse = TRUE), width = .88)+
geom_text(aes(x = Date, y = Cases, label = (Cases)), position = position_stack(reverse = TRUE), vjust = 1.5, size = 3, color = "white") +
scale_fill_manual(values = c('blue',"tomato"))+
scale_x_date(labels = date_format("%m/%d"), limits = as.Date(c('2020-07-09','today()')), breaks = "1 week")+
theme(axis.text.x = element_text(angle=0))+
labs(title = "Antrim - Grand Traverse - Charlevoix")
I'm not sure if I understood the question but I think you want to add the sum of the confirmed cases as labels. There might be a ggplot way of doing it but I think the most straightforward way is to make another dataset with your labels and feed it in.
date_labels <- filter(COVID1, COUNTY %in% c("Antrim", "Charlevoix", "Grand Traverse"), Cases > 0) %>% group_by(Date) %>% summarise(confirmed_cases = sum(Cases[CASE_STATUS == "Confirmed"]))
ggplot(filter(COVID1, COUNTY %in% c("Antrim", "Charlevoix", "Grand Traverse"), Cases > 0)) +
geom_col(aes(x = Date, y = Cases, fill = CASE_STATUS), position = position_stack(reverse = TRUE), width = .88)+
geom_text(data = date_labels, aes(x = Date, y = 1, label = confirmed_cases), position = position_stack(reverse = TRUE), vjust = 1.5, size = 3, color = "white") +
scale_fill_manual(values = c('blue',"tomato"))+
scale_x_date(labels = label_date("%m/%d"), limits = as.Date(c('2020-07-09','today()')), breaks = "1 week")+
theme(axis.text.x = element_text(angle=0))+
labs(title = "Antrim - Grand Traverse - Charlevoix")
Gives me this result:
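For readers who want to check the label values without dplyr, the same per-date total can be sketched in base R. This rebuilds only the Date, CASE_STATUS and Cases columns from the dput above (COUNTY is left out because all twelve rows pass the county filter), and uses aggregate() in place of group_by() and summarise().

```r
# Date, CASE_STATUS and Cases copied from the dput output in the question.
COVID1 <- data.frame(
  Date = as.Date(c(18453, 18456, 18457, 18453, 18455, 18453,
                   18456, 18457, 18455, 18453, 18456, 18457),
                 origin = "1970-01-01"),
  CASE_STATUS = c(rep("Confirmed", 8), rep("Probable", 4)),
  Cases = c(1L, 1L, 2L, 1L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 1L)
)

# Count only confirmed cases; probable rows contribute zero.
COVID1$confirmed_cases <- ifelse(COVID1$CASE_STATUS == "Confirmed",
                                 COVID1$Cases, 0L)

# One row per date with the confirmed total, ordered by date.
date_labels <- aggregate(confirmed_cases ~ Date, data = COVID1, FUN = sum)
date_labels <- date_labels[order(date_labels$Date), ]
print(date_labels)
```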

Add character variable (weekdays) to plot

I want to add on this plot the weekday as text on top of the bars.
The only function to add text in ggplot I found, is "annotate", which does not work the way I want.
It should look like this:
[image: Plot with weekdays]
geom_text gives me this:
[image: Geom_text]
My code:
ggplot(data = filter(T2G2_dayav, site %in% c("S17S", "S17N"), !is.na(distance)),
mapping = aes(as.factor(x = date_days))) +
geom_col(mapping = aes(y = T2pn_av, fill = as.factor(distance)),
position = position_dodge(width = 0.9)) +
theme_bw() + ylab("Particle Number (#/cm³), day-av") + xlab("Date") +
scale_y_continuous(limits = c(0, 30000)) +
scale_fill_discrete(name = "T2, Distance from road (m)") +
scale_color_grey(name = "Reference intrument G2") +
ggtitle("Day-averaged Particle Number (PN) per distance")
the head of my data:
distance date_days site T2pn_av T2pn_avambient T2wdir_med weekday Date G2pn_av G2pn_min G2pn_max G2ws_av G2ws_min G2ws_max G2wdir_med
<int> <dttm> <chr> <dbl> <dbl> <dbl> <chr> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -10 2017-07-18 S17N 28814.83 16917.831 110 Di 2017-07-18 13655.29 4621 105100 0.6781284 0 3.6 51.0
2 -10 2017-07-19 S17N 24210.95 15565.951 100 Mi 2017-07-19 10627.73 2908 67250 1.3673618 0 5.5 70.0
3 -10 2017-07-24 S17N 16143.44 7907.442 80 Mo 2017-07-24 11686.54 3582 55080 0.8178753 0 4.8 95.5
4 -10 2017-07-29 S17N 11762.56 5574.563 270 Sa 2017-07-29 12180.73 5413 45490 1.0304985 0 5.7 265.0
5 -10 2017-07-30 S17N 12138.22 6360.225 290 So 2017-07-30 10404.75 6113 23860 1.2385791 0 6.6 274.0
6 -10 2017-07-31 S17N 13815.32 9008.320 270 Mo 2017-07-31 11849.89 4595 46270 0.8554044 0 4.4 230.0
dput(head(T2G2_dayav))
structure(list(distance = c(-10L, -10L, -10L, -10L, -10L, -10L
), date_days = structure(c(1500328800, 1500415200, 1500847200,
1501279200, 1501365600, 1501452000), class = c("POSIXct", "POSIXt"
), tzone = "Europe/Berlin"), site = c("S17N", "S17N", "S17N",
"S17N", "S17N", "S17N"), T2pn_av = c(28814.8306772908, 24210.9512670565,
16143.442364532, 11762.5630630631, 12138.2247114732, 13815.3198380567
), T2pn_avambient = c(16917.8306772908, 15565.9512670565, 7907.44236453202,
5574.56306306306, 6360.22471147318, 9008.31983805668), T2wdir_med = c(110,
100, 80, 270, 290, 270), weekday = c("Di", "Mi", "Mo", "Sa",
"So", "Mo"), Date = structure(c(1500328800, 1500415200, 1500847200,
1501279200, 1501365600, 1501452000), class = c("POSIXct", "POSIXt"
), tzone = "Europe/Berlin"), G2pn_av = c(13655.2885517401, 10627.7329973352,
11686.5429216867, 12180.7308516181, 10404.7472642001, 11849.8893070109
), G2pn_min = c(4621, 2908, 3582, 5413, 6113, 4595), G2pn_max = c(105100,
67250, 55080, 45490, 23860, 46270), G2ws_av = c(0.678128438241936,
1.36736183524505, 0.817875347544022, 1.0304984658137, 1.23857912107,
0.855404388351763), G2ws_min = c(0, 0, 0, 0, 0, 0), G2ws_max = c(3.6,
5.5, 4.8, 5.7, 6.6, 4.4), G2wdir_med = c(51, 70, 95.5, 265, 274,
230)), .Names = c("distance", "date_days", "site", "T2pn_av",
"T2pn_avambient", "T2wdir_med", "weekday", "Date", "G2pn_av",
"G2pn_min", "G2pn_max", "G2ws_av", "G2ws_min", "G2ws_max", "G2wdir_med"
), row.names = c(NA, -6L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = c("distance", "date_days"), drop = TRUE, indices = list(
0L, 1L, 2L, 3L, 4L, 5L), group_sizes = c(1L, 1L, 1L, 1L,
1L, 1L), biggest_group_size = 1L, labels = structure(list(distance = c(-10L,
-10L, -10L, -10L, -10L, -10L), date_days = structure(c(1500328800,
1500415200, 1500847200, 1501279200, 1501365600, 1501452000), class = c("POSIXct",
"POSIXt"), tzone = "Europe/Berlin")), row.names = c(NA, -6L), class = "data.frame", vars = c("distance",
"date_days"), drop = TRUE, .Names = c("distance", "date_days"
)))
The idea is to add a text label on top of every bar. vjust = 0 places it directly on top of the bar, vjust = -.5 (used below) leaves a little more space, and vjust = 1.5 puts it inside the bar, which looks nice as well. The rest of the geom_text call is basically the same as in geom_col. In general, you could put commonly used aesthetics in the first call, within ggplot(aes(...)), as you already did with the x value.
ggplot(data = filter(T2G2_dayav, site %in% c("S17S", "S17N"), !is.na(distance)),
mapping = aes(as.factor(x = date_days))) +
geom_col(mapping = aes(y = T2pn_av, fill = as.factor(distance)),
position = position_dodge(width = 0.9)) +
geom_text(aes(label = weekday, y = T2pn_av), vjust = -.5, # add these
position = position_dodge(width = 0.9)) + # lines
theme_bw() + ylab("Particle Number (#/cm³), day-av") + xlab("Date") +
scale_y_continuous(limits = c(0, 30000)) +
scale_fill_discrete(name = "T2, Distance from road (m)") +
scale_color_grey(name = "Reference intrument G2") +
ggtitle("Day-averaged Particle Number (PN) per distance")
The following should solve your problem with too many labels. It takes the highest label and places it in the center of the bars of that x-value. Find a plot below with additional rows added to your data:
T2G2_dayav <- rbind(T2G2_dayav %>% ungroup(), T2G2_dayav %>% ungroup() %>% mutate(distance = 5)) # add more observations for testing
T2G2_dayav <- T2G2_dayav %>% mutate(T2pn_av = ifelse(distance == 5, T2pn_av/2, T2pn_av)) # halve the values of the added rows so the bars differ in height
The following should work with your data:
ggplot(data = filter(T2G2_dayav, site %in% c("S17S", "S17N"), !is.na(distance)) %>%
group_by(date_days) %>% # group by days
mutate(weekday2 = ifelse(T2pn_av == max(T2pn_av), weekday, NA)), # within each day (group), only label the highest
mapping = aes(as.factor(x = date_days))) +
geom_col(mapping = aes(y = T2pn_av, fill = as.factor(distance)),
position = position_dodge(width = 0.9)) +
geom_text(aes(label = weekday2, y = T2pn_av), vjust = -.5, # add these
position = position_dodge(width = 0.9)) + # lines
theme_bw() + ylab("Particle Number (#/cm³), day-av") + xlab("Date") +
scale_y_continuous(limits = c(0, 30000)) +
scale_fill_discrete(name = "T2, Distance from road (m)") +
scale_color_grey(name = "Reference intrument G2") +
ggtitle("Day-averaged Particle Number (PN) per distance")
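The "label only the highest bar per day" step can be checked on its own. A minimal base R sketch on made-up values (the toy data frame is hypothetical), using ave() in place of group_by() and mutate():

```r
# Hypothetical data: two bars per day, each carrying the same weekday label.
toy <- data.frame(
  day     = c("2017-07-18", "2017-07-18", "2017-07-19", "2017-07-19"),
  T2pn_av = c(28814.8, 14407.4, 12105.5, 24211.0),
  weekday = c("Di", "Di", "Mi", "Mi")
)

# ave() returns the per-day maximum repeated for every row of that day.
day_max <- ave(toy$T2pn_av, toy$day, FUN = max)

# Keep the weekday only on the tallest bar of each day; NA elsewhere.
toy$weekday2 <- ifelse(toy$T2pn_av == day_max, toy$weekday, NA)
print(toy)
```

Rows with NA in weekday2 get no label; geom_text() drops them and warns about removed rows unless na.rm = TRUE is set.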

Why does ggplot2 geom_hline plot more lines than intended?

Here is a sample of the dataframe I am working with.
> head(tbl[,c('logFC', 'CI_L', 'CI_R', "adj_P_Value","gene",'Group1','Group2', 'Study_ID')])
logFC CI_L CI_R adj_P_Value gene Group1 Group2 Study_ID
1 -0.09017596 -0.43955752 0.25920561 1 CD244 Male Female GSE2461
2 0.08704844 -0.26134341 0.43544028 1 CD244 ulcerative colitis irritable bowel syndrome GSE2461
3 -0.03501474 -0.12677636 0.05674688 1 CD244 nonlesional skin lesional skin GSE27887
4 0.01096914 -0.08064105 0.10257932 1 CD244 pretreatment posttreatment GSE27887
5 -0.03707265 -0.12407201 0.04992672 1 CD244 Infliximab Before treatment GSE42296
6 0.07644834 -0.02849309 0.18138977 1 CD244 Responder Nonresponder GSE42296
> dput(droplevels(head(tbl, 4)))
structure(list(Probe_gene = c("211828_s_at", "213107_at", "213109_at",
"211828_s_at"), logFC = c(0.299038590078202, 0.110797898105632,
0.183214738942169, -0.733505457149486), CI_L = c(-0.0332844208935414,
-0.246475718463096, -0.103358698007331, -1.06488707237429), CI_R = c(0.631361601049945,
0.46807151467436, 0.469788175891669, -0.402123841924678), AveExpr = c(7.38827278419383,
7.83576862202959, 6.68411901305011, 7.38827278419383), t = c(2.08930195860002,
0.720053829585981, 1.48442706763586, -5.13936340603241), P_Value = c(0.0714526369900392,
0.492771856681782, 0.177447421180599, 0.000998740960213292),
adj_P_Value = c(1, 1, 1, 1), B = c(-4.07430683864883, -5.56181503167371,
-4.83144498851773, -0.294306065125513), gene = c("TNIK",
"TNIK", "TNIK", "TNIK"), Study_ID = c("GSE2461", "GSE2461",
"GSE2461", "GSE2461"), Group1 = c("Male", "Male", "Male",
"ulcerative colitis"), Group2 = c("Female", "Female", "Female",
"irritable bowel syndrome"), Study_ID = c("GSE2461", "GSE2461",
"GSE2461", "GSE2461"), Disease = c("irritable bowel syndrome; ulcerative colitis",
"irritable bowel syndrome; ulcerative colitis", "irritable bowel syndrome; ulcerative colitis",
"irritable bowel syndrome; ulcerative colitis"), DOID = c(9778L,
9778L, 9778L, 9778L), Title = c("Control (IBS) & Ulcerative colitis (UC) subjects",
"Control (IBS) & Ulcerative colitis (UC) subjects", "Control (IBS) & Ulcerative colitis (UC) subjects",
"Control (IBS) & Ulcerative colitis (UC) subjects"), GEO_Platform_ID = c("GPL96",
"GPL96", "GPL96", "GPL96"), Platform = c("Affymetrix Human U133A Array",
"Affymetrix Human U133A Array", "Affymetrix Human U133A Array",
"Affymetrix Human U133A Array"), PMID = c(0L, 0L, 0L, 0L),
Organism = c("Homo sapiens", "Homo sapiens", "Homo sapiens",
"Homo sapiens"), Data_Type = c("RNA", "RNA", "RNA", "RNA"
), Biomaterial = c("Colonic Mucosal biopsy", "Colonic Mucosal biopsy",
"Colonic Mucosal biopsy", "Colonic Mucosal biopsy"), Study_Type = c("in vivo",
"in vivo", "in vivo", "in vivo"), Samples = c(8L, 8L, 8L,
8L), Time_Point = c("Baseline", "Baseline", "Baseline", "Baseline"
), Treatment = c("NA", "NA", "NA", "NA"), Treatment_Protocol = c("NA",
"NA", "NA", "NA"), Raw_Data = c(0L, 0L, 0L, 0L), Notes = c("controls are IBS, not healty",
"controls are IBS, not healty", "controls are IBS, not healty",
"controls are IBS, not healty"), ylab = c("Female → Male",
"Female → Male", "Female → Male", "irritable bowel syndrome → ulcerative colitis"
)), .Names = c("Probe_gene", "logFC", "CI_L", "CI_R", "AveExpr",
"t", "P_Value", "adj_P_Value", "B", "gene", "Study_ID", "Group1",
"Group2", "Study_ID", "Disease", "DOID", "Title", "GEO_Platform_ID",
"Platform", "PMID", "Organism", "Data_Type", "Biomaterial", "Study_Type",
"Samples", "Time_Point", "Treatment", "Treatment_Protocol", "Raw_Data",
"Notes", "ylab"), row.names = c(NA, 4L), class = "data.frame")
I am using this to construct a plot that has the GSE # (Study_ID), followed by the contrast (Group1 vs Group2) on the y-axis, and logFC as the x-axis. I want to plot a horizontal line between each of the different GSE #'s for visual clarity, but my code doesn't seem to be working.
datasetList = tbl$Study_ID
hLines =(which(duplicated(datasetList) == FALSE) - 0.5)
tbl$ylab <- paste(tbl$Group2," \U2192 ", tbl$Group1, sep = "")
p <- ggplot(data = tbl, aes(x = logFC, y = Probe_gene, group = Study_ID)) +
geom_point() +
geom_vline(xintercept = log(0.5,2), size = 0.2) +
geom_vline(xintercept = log(2/3,2), size = 0.2) +
geom_vline(xintercept = log(1.5,2), size = 0.2) +
geom_vline(xintercept = log(2,2), size = 0.2) +
geom_hline(yintercept = hLines) +
labs(title = tbl$gene, y = "Contrasts", x = bquote(~Log[2]~'(Fold Change)')) +
geom_errorbarh(aes(x = logFC, xmin = CI_L, xmax = CI_R), height = .1) +
geom_point(aes(colour = cut(adj_P_Value, c(-Inf, 0.01, 0.05, Inf)))) +
scale_color_manual(name = "P Value",
values = c("(-Inf,0.01]" = "red",
"(0.01,0.05)" = "orange",
"(0.05, Inf]" = "black"),
labels = c("<= 0.01", "0.01 < P Value <= 0.05", "> 0.05")) +
#theme_bw()+
theme(axis.text.y = element_blank(), strip.text.y = element_text(angle = 180),
panel.spacing.y = unit(0,'lines'), axis.ticks.y = element_blank()) +
facet_grid(Study_ID+ylab~ ., scales = 'free', space = 'free', switch = 'both')
p
For some reason, with the code I have now, ggplot prints many more horizontal lines than I need. It is printing a line in between each GSE #, when I only need it to print a line in between the unique GSE #'s. What am I doing wrong? hLines contains the y-intercepts of where the lines should go.
P.S. As a bit of a side question, if anyone knows of a way for me to specify the shapes that appears (similar to how I specify the colors), that would be very appreciated. In reference to the colors, I need red circles, orange squares, and black crosses for the same conditions that appear in the scale_color_manual() function.
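To make the hLines computation concrete, here is a small sketch on a made-up Study_ID vector (not the real tbl), assuming the rows are already sorted by study:

```r
# Hypothetical Study_ID column, already sorted by study.
study_id <- c("GSE2461", "GSE2461", "GSE27887", "GSE27887", "GSE42296", "GSE42296")

# Row index at which each study first appears, shifted down by half a row
# so the line would sit just below that row on a single shared axis.
h_lines <- which(!duplicated(study_id)) - 0.5
print(h_lines)
```

These are positions on one shared axis. A plain vector passed to geom_hline() carries no facet information, so every panel receives the full set of intercepts. That may be why extra lines appear once facet_grid() splits the rows into separate panels with free scales.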

Resources