ggplot boxplots with 2 y axes - r

I have been looking everywhere to find out how to ggplot boxplots with 2 y axes.
This is what I want the plot to look like:
[image: boxplot]
Example data:
Sample Tumor Score_1 Score_2
1 A 100 -20
2 B 80 -10
3 C 5 -5
4 C 6 -7
5 C 80 -8
6 C 70 -30
7 C 80 -5
8 C 90 -6
9 A 150 -8
10 B 1 -10
11 B 2 -10
12 B 4 -9
13 B 5 -7
14 B 8 -6
15 B 10 -4
16 B 12 -8
17 B 7 -10
18 B 6 -11
19 C 70 -15
20 C 90 -4
21 C 95 -3
22 C 120 -6
23 C 130 -9
24 C 50 -5
25 C 113 -10
26 C 100 -2
27 C 90 -1
28 C 50 -11
29 C 80 -15
30 A 200 -7
31 A 200 -4
32 A 180 -3
33 A 160 -9
34 A 107 -15
35 A 115 -11
36 A 80 -12
37 A 90 -14
38 A 130 -13
39 A 140 -9
40 A 120 -10
library(tidyverse)    # %>%, pivot_longer(), mutate(), ggplot()
library(wesanderson)  # wes_palette()
myboxplot <- read.csv("Example.csv")
#Set up labels
ylim.prim <- c(0, 500)
ylim.sec <- c(-35, 0)
b <- diff(ylim.prim)/diff(ylim.sec)
a <- b*(ylim.prim[1] - ylim.sec[1])
myboxplot %>%
pivot_longer(cols = c(Score_1, Score_2)) %>%
mutate(name = factor(name, levels = c("Score_1", "Score_2"))) %>%
ggplot(aes(x = Tumor)) +
geom_boxplot(aes(y = value, fill = name)) +
scale_y_continuous(name ="Score 1", sec.axis = sec_axis(~ ((. - a)/b), name = expression("Score 2"))) +
scale_x_discrete(name = "Tumor") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())+
theme(plot.title = element_text(size = 14, face = "bold"),
text = element_text(size = 12),
#axis.title = element_text(face="bold"),
axis.text.x=element_text(size = 11),
legend.position = "right") +
scale_fill_manual(values = wes_palette("GrandBudapest2"))
I do get the plot in the image (linked above). The problem is that my second set of data (the purple boxplots, "Score 2") is not aligned with the second y axis; it is aligned with the first y axis. Since Score 2 has a much smaller range (-35 to 0), you can't see the difference between the tumor types. Does anyone have any ideas how to change this?
Thank you in advance!

I think the plot you are requesting might be misleading. Instead, how about a facet?
library(tidyverse)
# `data` is assumed to hold the example data from the question (called myboxplot there)
data %>%
pivot_longer(-c("Sample","Tumor"), names_to = "Score") %>%
ggplot(aes( x= Tumor, y = value, fill = Score)) +
geom_boxplot() +
facet_wrap(.~Score, scales = "free")
Or as @NickCox suggests:
data %>%
pivot_longer(-c("Sample","Tumor"), names_to = "Score") %>%
group_by(Score,Tumor) %>%
arrange(value) %>%
mutate(xcoord = seq(-0.25,0.25,length.out = n()),
Tumor = factor(Tumor)) %>%
ggplot(aes( x= Tumor, y = value, fill = Score)) +
geom_boxplot(outlier.shape = NA, coef = 0) +
geom_point(aes(x = xcoord + as.integer(Tumor))) +
facet_wrap(.~Score, scales = "free")
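As a side note on why the purple boxes follow the left axis in the question: sec_axis() only relabels the axis, it does not rescale the data, so Score_2 would have to be mapped onto the Score_1 range before plotting. A minimal sketch of that linear mapping in base R, using the limits from the question; to_primary() and to_secondary() are hypothetical helper names, not part of ggplot2:

```r
# Axis limits taken from the question
ylim.prim <- c(0, 500)   # Score_1 (left axis)
ylim.sec  <- c(-35, 0)   # Score_2 (right axis)

# Linear map between the two ranges: primary = a + b * secondary
b <- diff(ylim.prim) / diff(ylim.sec)
a <- ylim.prim[1] - b * ylim.sec[1]

# Hypothetical helpers (names are assumptions for this sketch)
to_primary   <- function(y) a + b * y      # Score_2 units -> left-axis units
to_secondary <- function(y) (y - a) / b    # left-axis units -> Score_2 units

score_2  <- c(-20, -10, -5)
rescaled <- to_primary(score_2)

print(rescaled)                # Score_2 values expressed on the left-axis scale
print(to_secondary(rescaled))  # round trip back to the original values
```

In the plot, the Score_2 values would go through to_primary() before geom_boxplot(), and the right axis would be labelled with sec_axis(~ to_secondary(.)). The caveat above still applies: two scales in one panel can mislead.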

[This was posted when the question was on Cross Validated]
Box plots I find oversold whenever, as is usual, there is scope to show more detail. Here is one of several possibilities, a quantile-box plot in Parzen's sense in which for each group a standard box showing median and quartiles is superimposed on a quantile plot, in which the implicit horizontal axis is rank order. The detail that, apart from some small integers, many values are just multiples of 10 is of interest and should help a little in interpretation.
This plot doesn't use R. People who use R should find doing something similar or better to be trivial -- and those whose favourite software is different should be able to say the same. If not, you need new favourite software.

Related

Adding text in one of the four facets [duplicate]

This question already has an answer here:
Annotation on only the first facet of ggplot in R?
(1 answer)
Closed last month.
I want to add a few text labels to one facet out of the four facets in my ggplot.
I am using the annotate function to add text, but it places the text at the given location (x, y) in every facet. Because the variables have different y ranges in each facet, the text does not appear at the desired location.
Please let me know what should be done. Thanks.
library(dplyr)
library(tidyr)
library(ggplot2)
df%>%
select(Date, Ca, Na, K, Mg)%>%
gather(var,value,-Date)%>%
ggplot(aes(as.Date(Date), value))+
geom_point()+
theme_bw()+
facet_wrap(~var,scales = 'free_y',ncol = 1)+
ylab(" (ppm) (ppm)
(ppm) (ppm)")+
facet_wrap(~var,scales = 'free_y',ncol = 1, strip.position = "right")+
geom_vline(aes(xintercept = as.Date("2021-04-28")), col = "red")+
geom_vline(aes(xintercept = as.Date("2021-04-28")), col = "red")+
geom_vline(aes(xintercept = as.Date("2021-04-29")), col = "red")+
theme(axis.title = element_text(face="bold"))+
theme(axis.text = element_text(face="bold"))+
xlab('Date')+
theme(axis.title.x = element_text(margin = margin(t = 10)))+
theme(axis.title.y = element_text(margin = margin(r = 10)))+
annotate("text", label = "E1", x = as.Date("2021-04-28"), y = 2.8)
This is the code I am using for the desired output. I want to label the xintercept lines E1, E2, E3 (from left to right) at the top of the plot, i.e. above the first facet, which shows the variable Ca. Any suggestions?
Here is a part of my data:
df <- read.table(text = "
Date Ca K Mg Na
2/18/2021 1 25 21 19
2/22/2021 2 26 22 20
2/26/2021 3 27 23 21
3/4/2021 4 28 5 22
3/6/2021 5 29 6 8
3/10/2021 6 30 7 9
3/13/2021 7 31 8 10
3/17/2021 8 32 9 11
3/20/2021 9 33 10 12
3/23/2021 10 34 11 13
3/27/2021 11 35 12 14
3/31/2021 12 36 13 15
4/3/2021 13 37 14 16
4/7/2021 14 38 15 17
4/10/2021 15 39 16 18
4/13/2021 16 40 17 19
4/16/2021 17 41 18 20
4/19/2021 8 42 19 21
4/22/2021 9 43 20 22
4/26/2021 0 44 21 23
4/28/2021 1 45 22 24
4/28/2021 2 46 23 25
4/28/2021 3 47 24 26
4/28/2021 5 48 25 27
4/29/2021 6 49 26 28
5/4/2021 7 50 27 29
5/7/2021 8 51 28 30
5/8/2021 9 1 29 31
5/10/2021 1 2 30 32
5/29/2021 3 17 43 45
5/31/2021 6 18 44 46
6/1/2021 4 19 45 47
6/2/2021 8 20 46 48
6/3/2021 2 21 47 49
6/7/2021 3 22 48 50
6/10/2021 5 23 49 51
6/14/2021 3 5 50 1
6/18/2021 1 6 51 2
", header = TRUE)
Prepare the data before plotting, and make a separate data frame for the text annotation:
dfplot <- df %>%
select(Date, Ca, Na, K, Mg) %>%
#convert to date class before plotting
mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
#using pivot instead of gather. gather is superseded.
#gather(var, value, -Date)
pivot_longer(cols = 2:5, names_to = "grp", values_to = "ppm")
dftext <- data.frame(grp = "Ca", # we want text to show up only on "Ca" facet.
ppm = max(dfplot[ dfplot$grp == "Ca", "ppm" ]),
Date = as.Date(c("2021-04-27", "2021-04-28", "2021-04-29")),
label = c("E1", "E2", "E3"))
After cleaning up your code, we can use geom_text with dftext:
ggplot(dfplot, aes(Date, ppm)) +
geom_point() +
facet_wrap(~grp, scales = 'free_y',ncol = 1, strip.position = "right") +
geom_vline(xintercept = dftext$Date, col = "red") +
geom_text(aes(x = Date, y = ppm, label = label), data = dftext, nudge_y = -2)
Try the ggrepel package to avoid label overlap; replace geom_text with one of these:
#geom_text_repel(aes(x = Date, y = ppm, label = label), data = dftext)
#geom_label_repel(aes(x = Date, y = ppm, label = label), data = dftext)
After cleaning up the code and seeing the plot, I think this post is a duplicate of "Annotation on only the first facet of ggplot in R?".

Remove link between time series and add minor date tick on x_axis in ggplot

I was trying to plot a time series composed of weekly averages. Here is the plot that I have obtained:
[weekly averages A]
[1]: https://i.stack.imgur.com/XMGMs.png
As you can see, the time series does not cover every year completely, so where I have no data ggplot links two subsequent years. I think I have to group the data in some way, but I do not understand how. Here is the code:
df4 <- data.frame(df$Date, df$A)
colnames(df4)<- c("date","A")
df4$date <- as.Date(df4$date,"%Y/%m/%d")
df4$week_day <- as.numeric(format(df4$date, format='%w'))
df4$endofweek <- df4$date + (6 - df4$week_day)
week_aveA <- df4 %>%
group_by(endofweek) %>%
summarise_all(list(mean=mean), na.rm=TRUE) %>%
na.omit()
g1 = ggplot() +
geom_step(data=week_aveA, aes(group = 1, x = (endofweek), y = (A_mean)), colour="gray25") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2500)) +
scale_x_date(breaks="year", labels=date_format("%Y")) +
labs(y = expression(A~ ~index),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
Here is an extract (the first years) of the dataset:
endofweek date_mean A_mean week_day_mean
1 20/03/2010 17/03/2010 939,2533437 3
2 27/03/2010 24/03/2010 867,3620121 3
3 03/04/2010 31/03/2010 1426,791222 3
4 10/04/2010 07/04/2010 358,5698314 3
5 17/04/2010 13/04/2010 301,1815352 2
6 24/04/2010 21/04/2010 273,4922895 3,333333333
7 01/05/2010 28/04/2010 128,5989633 3
8 08/05/2010 05/05/2010 447,8858881 3
9 15/05/2010 12/05/2010 387,9828891 3
10 22/05/2010 19/05/2010 138,0770986 3
11 29/05/2010 26/05/2010 370,2147933 3
12 05/06/2010 02/06/2010 139,0451791 3
13 12/06/2010 09/06/2010 217,1286356 3
14 19/06/2010 16/06/2010 72,36972411 3
15 26/06/2010 23/06/2010 282,2911902 3
16 03/07/2010 30/06/2010 324,3215936 3
17 10/07/2010 07/07/2010 210,568691 3
18 17/07/2010 14/07/2010 91,76930829 3
19 24/07/2010 21/07/2010 36,4211218 3,666666667
20 31/07/2010 28/07/2010 37,53981103 3
21 07/08/2010 04/08/2010 91,33282642 3
22 14/08/2010 11/08/2010 28,38587352 3
23 21/08/2010 18/08/2010 58,72836406 3
24 28/08/2010 24/08/2010 102,1050612 2,5
25 04/09/2010 02/09/2010 13,45357513 4,5
26 11/09/2010 08/09/2010 51,24017212 3
27 18/09/2010 15/09/2010 159,7395663 3
28 25/09/2010 21/09/2010 62,71136678 2
29 02/04/2011 31/03/2011 1484,661164 4
30 09/04/2011 06/04/2011 656,1827964 3
31 16/04/2011 13/04/2011 315,3097313 3
32 23/04/2011 20/04/2011 293,2904042 3
33 30/04/2011 26/04/2011 255,7517519 2,4
34 07/05/2011 04/05/2011 360,7035289 3
35 14/05/2011 11/05/2011 342,0902797 3
36 21/05/2011 18/05/2011 386,1380421 3
37 28/05/2011 24/05/2011 418,9624807 2,833333333
38 04/06/2011 01/06/2011 112,7568 3
39 11/06/2011 08/06/2011 85,17855619 3,2
40 18/06/2011 15/06/2011 351,8714638 3
41 25/06/2011 22/06/2011 139,7936898 3
42 02/07/2011 29/06/2011 68,57716191 3,6
43 09/07/2011 06/07/2011 62,31823822 3
44 16/07/2011 13/07/2011 80,7328917 3
45 23/07/2011 20/07/2011 114,9475331 3
46 30/07/2011 27/07/2011 90,13118758 3
47 06/08/2011 03/08/2011 43,29372258 3
48 13/08/2011 10/08/2011 49,39935204 3
49 20/08/2011 16/08/2011 133,746822 2
50 03/09/2011 31/08/2011 76,03928942 3
51 10/09/2011 05/09/2011 27,99834637 1
52 24/03/2012 23/03/2012 366,2625797 5,5
53 31/03/2012 28/03/2012 878,8535513 3
54 07/04/2012 04/04/2012 1029,909052 3
55 14/04/2012 11/04/2012 892,9163416 3
56 21/04/2012 18/04/2012 534,8278693 3
57 28/04/2012 25/04/2012 255,1177585 3
58 05/05/2012 02/05/2012 564,5280546 3
59 12/05/2012 09/05/2012 767,5018168 3
60 19/05/2012 16/05/2012 516,2680148 3
61 26/05/2012 23/05/2012 241,2113073 3
62 02/06/2012 30/05/2012 863,6123397 3
63 09/06/2012 06/06/2012 201,2019288 3
64 16/06/2012 13/06/2012 222,9955486 3
65 23/06/2012 20/06/2012 91,14166632 3
66 30/06/2012 27/06/2012 26,93145693 3
67 07/07/2012 04/07/2012 67,32183278 3
68 14/07/2012 11/07/2012 46,25297513 3
69 21/07/2012 18/07/2012 81,34359825 3,666666667
70 28/07/2012 25/07/2012 49,59130851 3
71 04/08/2012 01/08/2012 44,13438077 3
72 11/08/2012 08/08/2012 30,15773151 3
73 18/08/2012 15/08/2012 57,47256772 3
74 25/08/2012 22/08/2012 31,9109555 3
75 01/09/2012 29/08/2012 52,71058484 3
76 08/09/2012 04/09/2012 24,52495229 2
77 06/04/2013 01/04/2013 1344,388042 1,5
78 13/04/2013 10/04/2013 1304,838687 3
79 20/04/2013 17/04/2013 892,620141 3
80 27/04/2013 24/04/2013 400,1720434 3
81 04/05/2013 01/05/2013 424,8473083 3
82 11/05/2013 08/05/2013 269,2380208 3
83 18/05/2013 15/05/2013 238,9993749 3
84 25/05/2013 22/05/2013 128,4096151 3
85 01/06/2013 29/05/2013 158,5576121 3
86 08/06/2013 05/06/2013 175,2036942 3
87 15/06/2013 12/06/2013 79,20250839 3
88 22/06/2013 19/06/2013 126,9065428 3
89 29/06/2013 26/06/2013 133,7480108 3
90 06/07/2013 03/07/2013 218,0092943 3
91 13/07/2013 10/07/2013 54,08460936 3
92 20/07/2013 17/07/2013 91,54285041 3
93 27/07/2013 24/07/2013 44,64567928 3
94 03/08/2013 31/07/2013 229,5067999 3
95 10/08/2013 07/08/2013 49,70729373 3
96 17/08/2013 14/08/2013 53,38618335 3
97 24/08/2013 21/08/2013 217,2800997 3
98 31/08/2013 28/08/2013 49,43590136 3
99 07/09/2013 04/09/2013 64,88783029 3
100 14/09/2013 11/09/2013 11,04300773 3
So in the end I have one main question: how can I eliminate the connection between the years? And an aesthetic question: how can I add minor ticks on the x axis? At least one every 6 months, just to make the plot easier to read.
Thanks in advance for any suggestion!
Edit
This is the code I tried with the suggestion; maybe I mistyped some part of it.
library(tidyverse)
library(dplyr)
library(lubridate)
df4 <- data.frame(df$Date, df$A)
colnames(df4)<- c("date","A")
df4$date <- as.Date(df4$date,"%Y/%m/%d")
df4$week_day <- as.numeric(format(df4$date, format='%w'))
df4$endofweek <- df4$date + (6 - df4$week_day)
week_aveA <- df4 %>%
group_by(endofweek) %>%
summarise_all(list(mean=mean), na.rm=TRUE) %>%
na.omit()
week_aveA$endofweek <- as.Date(week_aveA$endofweek,"%d/%m/%Y")
week_aveA$A_mean <- as.numeric(gsub(",", ".", week_aveA$A_mean))
week_aveA$week_day_mean <- as.numeric(gsub(",", ".", week_aveA$week_day_mean))
week_aveA$year <- format(week_aveA$endofweek, "%Y")
library(ggplot2)
library(methods)
library(scales)
mylabel <- function(x) {
ifelse(grepl("-07-01$", x), "", format(x, "%Y"))
}
ggplot() +
geom_step(data=week_aveA, aes(x = endofweek, y = A_mean, group = year), colour="gray25") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2500)) +
scale_x_date(breaks="6 month", labels = mylabel) +
labs(y = expression(A~ ~index),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
You have to group by year:
Add a variable with the year to your dataset
Map the year variable on the group aesthetic
For the ticks, increase the number of breaks. If you want only ticks but not labels, you can use a custom function to get rid of the unwanted labels, e.g. my approach below sets the breaks to "6 month" but replaces the mid-year labels with an empty string:
week_aveA$endofweek <- as.Date(week_aveA$endofweek,"%d/%m/%Y")
week_aveA$A_mean <- as.numeric(gsub(",", ".", week_aveA$A_mean))
week_aveA$week_day_mean <- as.numeric(gsub(",", ".", week_aveA$week_day_mean))
week_aveA$year <- format(week_aveA$endofweek, "%Y")
library(ggplot2)
mylabel <- function(x) {
ifelse(grepl("-07-01$", x), "", format(x, "%Y"))
}
ggplot() +
geom_step(data=week_aveA, aes(x = endofweek, y = A_mean, group = year), colour="gray25") +
scale_y_continuous(expand = c(0, 0), limits = c(0, 2500)) +
scale_x_date(breaks="6 month", labels = mylabel) +
labs(y = expression(A~ ~index),
x = NULL) +
theme(axis.text.x = element_text(size=10),
axis.title = element_text(size=10))
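To see what the two pieces do without drawing anything, here is a small base R sketch. The three dates and the three break positions are illustrative values chosen for this example (assumptions, not the full dataset); mylabel is the function from the answer above.

```r
# Raw dates: two in 2010, one in 2011 (illustrative values)
d <- as.Date(c("2010-03-17", "2010-09-21", "2011-03-31"))

# Saturday ending each week, as in the question: %w runs 0 (Sunday) to 6 (Saturday)
week_day  <- as.numeric(format(d, "%w"))
endofweek <- d + (6 - week_day)

# Year mapped to the group aesthetic, so no line is drawn from one year to the next
year <- format(endofweek, "%Y")

# Label function from the answer: blank out the mid-year breaks
mylabel <- function(x) {
  ifelse(grepl("-07-01$", x), "", format(x, "%Y"))
}
breaks <- as.Date(c("2010-01-01", "2010-07-01", "2011-01-01"))

print(endofweek)
print(year)
print(mylabel(breaks))
```

The last call leaves the 1 July break unlabelled, so the tick is drawn but only the year labels remain.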

ggplot2: geom_bar(); how to alternate order of fill so bars are not lost inside a bar with a higher value?

I am trying to position two bars at the same position on the x-axis, separated by colour (almost as if stacking).
However, instead of stacking I want one bar simply inside the other, with the smaller Y-value visible inside the bar with the larger Y-value.
I can get this to work to some extent, but the issue is that neither of the two factor levels has the consistently higher Y-value. This leads to bars being 'lost' within a bar with a higher Y-value.
Here is a subset of my dataset and the current ggplot code:
condition hours expression freq_genes
1 tofde 9 up 27
2 tofde 12 up 92
3 tofde 15 up 628
17 tofde 9 down 0
18 tofde 12 down 1
19 tofde 15 down 0
33 tofp 9 up 2462
34 tofp 12 up 786
35 tofp 15 up 298
49 tofp 9 down 651
50 tofp 12 down 982
51 tofp 15 down 1034
65 tos 0 up 27
66 tos 3 up 123
67 tos 6 up 752
81 tos 0 down 1
82 tos 3 down 98
83 tos 6 down 594
sf_plot <- ggplot(data = gene_freq,
aes(x = hours,
y = freq_genes,
group = condition,
fill = factor(expression,
labels=c("Down",
"Up"))))
sf_plot <- sf_plot + labs(fill="Expression")
sf_plot <- sf_plot + geom_bar(stat = "identity",
width = 2.5,
position = "dodge")
sf_plot <- sf_plot + scale_fill_manual(values=c("#9ecae1",
"#3182bd"))
sf_plot <- sf_plot + xlab("Time (Hours)")
sf_plot <- sf_plot + scale_x_continuous(breaks =
seq(min(gene_freq$freq_genes),
max(gene_freq$freq_genes),
by = 3))
sf_plot <- sf_plot + ylab("Gene Frequency")
sf_plot <- sf_plot + facet_grid(. ~ condition, scales = "free")
sf_plot <- sf_plot + theme_bw()
sf_plot <- sf_plot + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
sf_plot <- sf_plot + theme(axis.text.x = element_text(angle = 90))
# Print plot
sf_plot
You can add alpha = 0.5 to your geom_bar() statement to make the bars transparent. This will allow both bars to be seen. Adding that alpha statement and nothing else will produce what you're looking for, to make both overlaid bars visible. The colors, however, make seeing the two different bars challenging.
Another (and maybe better) option is to change the order in which the plot is created. If I recall correctly, ggplot will plot the bars in alphabetical or numeric or factor-level order. Here, your expression values are c("Down", "Up") and "Down" is being plotted first. If you force "Up" to be plotted first, you could resolve this, too.
library(dplyr)
library(ggplot2)
dat <-
read.table(text = "condition hours expression freq_genes
1 tofde 9 up 27
2 tofde 12 up 92
3 tofde 15 up 628
17 tofde 9 down 0
18 tofde 12 down 1
19 tofde 15 down 0
33 tofp 9 up 2462
34 tofp 12 up 786
35 tofp 15 up 298
49 tofp 9 down 651
50 tofp 12 down 982
51 tofp 15 down 1034
65 tos 0 up 27
66 tos 3 up 123
67 tos 6 up 752
81 tos 0 down 1
82 tos 3 down 98
83 tos 6 down 594") %>%
mutate(expression2 = ifelse(expression == "up", 1, 2))
dat %>%
ggplot(aes(x = hours, y = freq_genes, group = condition,
fill = factor(expression2, labels=c("Up", "Down")))) +
labs(fill="Expression") +
geom_bar(stat = "identity", position = "dodge", width = 2.5, alpha = 0.5) +
scale_fill_manual(values=c("#9ecae1", "#3182bd")) +
xlab("Time (Hours)") +
scale_x_continuous(breaks = seq(min(dat$hours),  # breaks span the x variable (hours)
max(dat$hours),
by = 3)) +
ylab("Gene Frequency") +
facet_grid(. ~ condition, scales = "free") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "bottom",
axis.text.x = element_text(angle = 90))
Here, I've created a new column called expression2 that is just a numeric version of expression. I changed the fill variable in aes() to match with those new labels. I left the colors in scale_fill_manual() the same as in your original statement and kept the alpha value. "Down" is being plotted on top of "Up" but in keeping the same colors with the alpha value, both bars are easier to see. You can play with the legend to display "Down" before "Up" if that's necessary.
Note that providing machine readable data goes a long way in allowing others to help you out. Consider using dput() to output your data next time rather than pasting it in. Also note that you can "chain" together ggplot() statements with a +. This makes code much more compact and easier to read.
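The numeric recode above is one way to control the order. Assuming the only goal is to decide which level comes first, setting the factor levels directly is another; a minimal base R sketch (the short vector is an illustrative stand-in for the expression column):

```r
# Illustrative stand-in for the expression column
expression <- c("up", "down", "up", "down")

# Default: levels are sorted alphabetically, so "down" comes first
default_order <- factor(expression)

# Explicit levels: "up" comes first; labels are matched to levels in that order
explicit_order <- factor(expression,
                         levels = c("up", "down"),
                         labels = c("Up", "Down"))

print(levels(default_order))
print(levels(explicit_order))
```

Passing explicit_order to fill would then play the role of factor(expression2, labels = c("Up", "Down")) without the helper column.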

Add a percent to y axis labels [duplicate]

This question already has answers here:
How can I change the Y-axis figures into percentages in a barplot?
(4 answers)
Closed 4 years ago.
I'm sure I missed an obvious solution to this problem but I can't figure out how to add a percent sign to the y axis labels.
Data Sample:
Provider Month Total_Count Total_Visits Procedures RX State
Roberts 2 19 19 0 0 IL
Allen 2 85 81 4 4 IL
Dawson 2 34 34 0 0 CA
Engle 2 104 100 4 4 CA
Goldbloom 2 7 6 1 1 NM
Nathan 2 221 192 29 20 NM
Castro 2 6 6 0 0 AK
Sherwin 2 24 24 0 0 AK
Brown 2 282 270 12 12 UT
Jackson 2 114 96 18 16 UT
Corwin 2 22 22 0 0 CO
Dorris 2 124 102 22 22 CO
Ferris 2 427 318 109 108 OH
Jeffries 2 319 237 82 67 OH
The following code gives graphs with inaccurate values because R seems to be multiplying by 100.
procs <- read.csv(paste0(dirdata, "Procedure percents Feb.csv"))
procs$Percentage <- round(procs$Procedures/procs$Total.Visits*100, 2)
procs$Percentage[is.na(procs$Percentage)] <- 0
procsplit <- split(procs, procs$State)
plots <- function(procs) {
ggplot(data = procs, aes(x= Provider, y= Percentage, fill= Percentage)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(x = Provider, y = Percentage, label = sprintf("%.1f%%", Percentage)), position = position_dodge(width = 0.9), hjust = .5, vjust = 0 , angle = 0) +
theme(axis.text.x = element_text(angle = 45, vjust = .5)) +
ggtitle("Procedure Percentages- February 2018", procs$State) +
theme(plot.title = element_text(size = 22, hjust = .5, family = "serif")) +
theme(plot.subtitle = element_text(size = 18, hjust = .5, family = "serif")) +
scale_y_continuous(name = "Percentage", labels = percent)
}
lapply(procsplit, plots)
I'm not sure if there's a way to use sprintf to add it or if there's a way to paste it onto the labels.
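Using scale_y_continuous(labels = function(x) paste0(x, "%")) in place of labels = percent fixes this issue: percent multiplies the values by 100 before adding the sign, and Percentage has already been multiplied by 100.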
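A small base R sketch of what that label function does, using three rows from the data sample; add_percent is a hypothetical name for the anonymous function, and the tick values are illustrative:

```r
# Percentage as computed in the question: already on a 0-100 scale
procedures   <- c(4, 29, 109)     # Allen, Nathan, Ferris
total_visits <- c(81, 192, 318)
percentage   <- round(procedures / total_visits * 100, 2)

# Label function from the answer: append the sign without rescaling
# (add_percent is a hypothetical name for this sketch)
add_percent <- function(x) paste0(x, "%")

print(percentage)
print(add_percent(c(0, 10, 20, 30)))  # what the axis ticks would show
```

If you would rather stay within the scales package, label_percent(scale = 1) should behave the same way, since scale is the factor applied before the sign is added.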

R hist vs geom_hist break points

I am using both geom_histogram and hist in R with the same breakpoints, but I get different graphs. I did a quick search; does anyone know how the breaks are defined and why there would be a difference?
These produce two different plots.
set.seed(25)
data <- data.frame(Mos=rnorm(500, mean = 25, sd = 8))
data$Mos<-round(data$Mos)
pAge <- ggplot(data, aes(x=Mos))
pAge + geom_histogram(breaks=seq(0, 50, by = 2))
hist(data$Mos,breaks=seq(0, 50, by = 2))
Thanks
To get the same histogram in ggplot2 you specify the breaks inside scale_x_continuous and binwidth inside geom_histogram.
Additionally, hist and histograms in ggplot2 use different defaults to create the intervals:
hist: right-closed (left open) intervals. Default: right = TRUE
stat_bin (ggplot2): left-closed (right open) intervals. Default: right = FALSE
**hist** **ggplot2**
freq1 Freq freq2 Freq
1 (0,2] 0 [0,2) 0
2 (2,4] 2 [2,4) 2
3 (4,6] 2 [4,6) 1
4 (6,8] 1 [6,8) 2
5 (8,10] 6 [8,10) 2
6 (10,12] 9 [10,12) 7
7 (12,14] 24 [12,14) 17
8 (14,16] 27 [14,16) 26
9 (16,18] 39 [16,18) 31
10 (18,20] 48 [18,20) 46
11 (20,22] 52 [20,22) 43
12 (22,24] 38 [22,24) 57
13 (24,26] 44 [24,26) 36
14 (26,28] 46 [26,28) 52
15 (28,30] 39 [28,30) 39
16 (30,32] 31 [30,32) 33
17 (32,34] 30 [32,34) 26
18 (34,36] 24 [34,36) 29
19 (36,38] 18 [36,38) 27
20 (38,40] 9 [38,40) 12
21 (40,42] 5 [40,42) 6
22 (42,44] 4 [42,44) 0
23 (44,46] 1 [44,46) 5
24 (46,48] 1 [46,48) 0
25 (48,50] 0 [48,50) 1
I included the argument right = FALSE in hist so the histogram intervals are left-closed (right open), as they are in ggplot2. I added the count labels in both plots, so it is easier to check that the intervals are the same.
ggplot(data, aes(x = Mos))+
geom_histogram(binwidth = 2, colour = "black", fill = "white")+
scale_x_continuous(breaks = seq(0, 50, by = 2))+
stat_bin(binwidth = 2, aes(label=..count..), vjust=-0.5, geom = "text")
hist(data$Mos,breaks=seq(0, 50, by = 2), labels =TRUE, right =FALSE)
To check the frequencies in each bin:
freq <- cut(data$Mos, breaks = seq(0, 50, by = 2), dig.lab = 4, right = FALSE)
as.data.frame(table(freq))
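Because Mos is rounded to whole numbers, many values sit exactly on a break, which is why the closed side matters so much here. A minimal base R sketch with made-up values (an assumption for illustration, not the question's data):

```r
# Integer data sitting exactly on bin edges (illustrative values)
x      <- c(2, 2, 3, 4)
breaks <- c(0, 2, 4, 6)

# Same data, same breaks; only the closed side of each interval differs
right_closed <- table(cut(x, breaks = breaks, right = TRUE))   # (0,2] (2,4] (4,6]
left_closed  <- table(cut(x, breaks = breaks, right = FALSE))  # [0,2) [2,4) [4,6)

print(right_closed)
print(left_closed)
```

Which side ggplot2 closes by default depends on the installed version (newer releases expose this through the closed argument of stat_bin, as far as I know), so it is worth checking locally before comparing with hist.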

Resources