I'll use the diamonds data set from ggplot2 to illustrate my point. I want to draw a histogram of price, but I want to show the count for each bin for each cut.
This is my code:
ggplot(aes(x = price ) , data = diamonds_df) +
geom_histogram(aes(fill = cut , binwidth = 1500)) +
stat_bin(binwidth= 1500, geom="text", aes(label=..count..) ,
vjust = -1) +
scale_x_continuous(breaks = seq(0 , max(stores_1_5$Weekly_Sales) , 1500 )
, labels = comma)
Here is my current plot.
As you can see, the number shows the count for all cuts in each bin, but I want to display the count for each cut in each bin.
Bonus points if I can also configure the y axis manually, instead of having it display numbers in steps of 5000.
Update for ggplot2 2.x
You can now center labels within stacked bars without pre-summarizing the data using position=position_stack(vjust=0.5). For example:
ggplot(aes(x = price ) , data = diamonds) +
geom_histogram(aes(fill=cut), binwidth=1500, colour="grey20", lwd=0.2) +
stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
aes(label=..count.., group=cut), position=position_stack(vjust=0.5)) +
scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))
Original Answer
You can get the counts for each value of cut by adding cut as a group aesthetic to stat_bin. I also moved binwidth outside of aes, which was causing binwidth to be ignored in your original code:
ggplot(aes(x = price ), data = diamonds) +
geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
aes(label=..count.., group=cut, y=0.8*(..count..))) +
scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))
One issue with the code above is that I'd like the labels to be vertically centered within each bar section, but I'm not sure how to do that within stat_bin, or if it's even possible. Multiplying by 0.8 (or whatever) moves each label by a different relative amount. So, to get the labels centered, I created a separate data frame for the labels in the code below:
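library(dplyr)  # provides %>%, group_by(), summarise(), mutate()
# Create text labels: count and vertical midpoint for each cut within each price bin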
dat = diamonds %>%
group_by(cut,
price=cut(price, seq(0,max(diamonds$price)+1500,1500),
labels=seq(0,max(diamonds$price),1500), right=FALSE)) %>%
summarise(count=n()) %>%
group_by(price) %>%
mutate(ypos = cumsum(count) - 0.5*count) %>%
ungroup() %>%
mutate(price = as.numeric(as.character(price)) + 750)
ggplot(aes(x = price ) , data = diamonds) +
geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
geom_text(data=dat, aes(label=count, y=ypos), colour="white", size=3.5)
To configure the breaks on the y axis, just add scale_y_continuous(breaks=seq(0,20000,2000)) or whatever breaks you'd like.
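As a minimal sketch of that suggestion, assuming the built-in diamonds data and purely illustrative break values (the name y_breaks is mine, not from the question):

```r
library(ggplot2)

# Illustrative breaks: every 2000 from 0 to 20000; substitute your own values
y_breaks <- seq(0, 20000, 2000)

p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(aes(fill = cut), binwidth = 1500, colour = "grey20") +
  scale_y_continuous(breaks = y_breaks)   # manual control of the y-axis ticks

p
```

Any numeric vector can be passed to breaks, so uneven steps such as c(0, 1000, 5000, 20000) should work as well.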
As of ggplot2 2.2.0, the position_stack(vjust = 0.5) option makes this easier:
library(ggplot2)
s <- ggplot(mpg, aes(manufacturer, fill = class))
s + geom_bar(position = "stack") +
theme(axis.text.x = element_text(angle=90, vjust=1)) +
geom_text(stat='count', aes(label=..count..), position = position_stack(vjust = 0.5),size=4)
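In ggplot2 3.4.0 and later, the ..count.. notation used in the answers above is deprecated in favour of after_stat(count). The following is a sketch of the stacked-histogram labels rewritten that way, assuming ggplot2 3.3.0 or newer (where after_stat() is available) and the built-in diamonds data. It is not taken from the original answers.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = price)) +
  # Stacked histogram, one fill colour per cut
  geom_histogram(aes(fill = cut), binwidth = 1500, colour = "grey20") +
  # Same binning, drawn as text; group = cut gives one count per cut per bin,
  # and position_stack(vjust = 0.5) centres each label in its bar segment
  stat_bin(binwidth = 1500, geom = "text", colour = "white", size = 3.5,
           aes(label = after_stat(count), group = cut),
           position = position_stack(vjust = 0.5)) +
  scale_x_continuous(breaks = seq(0, max(diamonds$price), 1500))

p
```

The text layer has to use the same binwidth as the histogram layer, otherwise the labels will not line up with the bars.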
Related
I have a plot of the depth of individual fish over time. The background represents the temperature, the grey dots are the raw depth data, and the black line is the geom_smooth fit to the raw data (an image of the plot is attached here). I used ggplot to make the graphs, but my x-axis (date/time) is shifted slightly to the right. I need the axis to be centered (standardized). This is my very long code for the plot:
tibble(y=c(-7:0)) %>%
expand_grid(TBRtemperature %>% select(`Date and Time (UTC)`, Temperature)) %>%
rename(dt="Date and Time (UTC)") %>%
filter(yday(dt)>136&yday(dt)<147) %>%
mutate(dt=with_tz(dt, "Europe/Oslo")) %>%
ggplot(aes(dt, y, fill=Temperature)) +
geom_tile() +
scale_fill_gradientn(colours = c("lightblue", "white", "red")) +
scale_x_datetime(expand = c(0,0)) +
scale_y_continuous(expand = c(0,0)) +
geom_point(data=fbd_TBR %>% filter(yday(dt)>136&yday(dt)<147, n()>100), aes(dt, -Data/10, group=paste0(ID, Trial)), colour="grey50", alpha=0.2) +
geom_smooth(data=fbd_TBR %>% filter(yday(dt)>136&yday(dt)<147, n()>100), aes(dt, -Data/10, group=paste0(ID, Trial), colour=paste0(ID, sep=" - ", Weight)), colour="black") +
labs(y="Depth (m)", x=("Time (days)"), title = "Trial 2") +
facet_wrap(~paste(ID, sep = " - ", Weight)) +
theme_classic() +
theme(plot.title = element_text(face="bold"), strip.text = element_text(face = "bold"))
Does anyone know how the axis can be adjusted?
I think specifying the range you want by adding this to the ggplot call should solve it (the limits shown are placeholders; use values in the units of your own axes):
+coord_cartesian(xlim = c(1, 10),ylim = c(10,40))
I am trying to figure out how to add counts on top of the histogram bins when I also use cut().
Age is one of the variables in my data set (continuous, range 23 to 99), and I need to produce a histogram with 8 bins, each representing an age group ("<30", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-99").
I was able to work out the code for everything except adding the count on top of each bin.
The code I normally use for adding counts is something like this:
geom_text(stat= "count", aes(label=..count..), vjust=-1, size=3)
+ ylim(c(0,300))
However, I don't think stat="count" works in this case.
The code I show below works fine except for the last two lines (my attempt to add the counts).
Thanks to everyone for their help!!
output <- cut(df$age, breaks = seq(20,100, by= 10),
labels = c("<30","30-39","40-49", "50-59","60-69","70-79","80-89","90-99"))
table(output) %>%
as.data.frame() %>%
ggplot(aes(x = output, y = Freq, fill=output)) +
geom_col() +
scale_fill_manual(values=c("firebrick1", "chocolate1",
"yellow1", "springgreen3", "steelblue1",
"navyblue", "darkorchid1", "darkmagenta"),
name="Age group",
labels=c("<30","30-39","40-49", "50-59",
"60-69","70-79","80-89","90-99")) +
theme(legend.title = element_blank()) +
theme(legend.position = "none") +
labs(title="Histogram for Age") +
labs(x="Age Group", y="Frequency") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_text(stat= "count", aes(label=..count..), vjust=-1,
size=3) +
ylim(c(0,300))
Error: stat_count() must not be used with a y aesthetic.
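You don't need to calculate the frequencies with stat = "count": you already have them in your data, and you are already using them in aes(x = output, y = Freq, fill=output). That is also why you get the error, since stat_count() does its own counting and cannot be combined with a y aesthetic. So you can do: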
geom_text(aes(label=Freq), vjust=-1, size=3)
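Put together, the approach might look like the sketch below. The original df is not shown, so the ages here are simulated, and the names age_group and freq are mine. I also pass right = FALSE to cut() so that an age of exactly 30 falls in "30-39" rather than "<30"; that is my assumption about the intended grouping, not something stated in the question.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data: 500 simulated ages between 23 and 99
df <- data.frame(age = sample(23:99, 500, replace = TRUE))

# Bin ages into decades; right = FALSE makes the intervals [20,30), [30,40), ...
age_group <- cut(df$age, breaks = seq(20, 100, by = 10), right = FALSE,
                 labels = c("<30", "30-39", "40-49", "50-59",
                            "60-69", "70-79", "80-89", "90-99"))

# Pre-computed frequencies: columns age_group and Freq
freq <- as.data.frame(table(age_group))

p <- ggplot(freq, aes(x = age_group, y = Freq, fill = age_group)) +
  geom_col(show.legend = FALSE) +
  # Label each bar with the count already stored in the data
  geom_text(aes(label = Freq), vjust = -1, size = 3) +
  # Leave some headroom so the labels above the tallest bar are not clipped
  expand_limits(y = max(freq$Freq) * 1.1) +
  labs(title = "Histogram for Age", x = "Age Group", y = "Frequency")

p
```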
I am trying to create a donut chart using ggplot2 with the following data (example).
library(ggplot2)
library(svglite)
library(scales)
# dataframe
Sex = c('Male', 'Female')
Number = c(125, 375)
df = data.frame(Sex, Number)
df
The code I used to generate donut chart is
ggplot(aes(x= Sex, y = Number, fill = Sex), data = df) +
geom_bar(stat = "identity") +
coord_polar("y") +
theme_void() +
theme (legend.position="top") + # legend position
geom_text(aes(label = percent(Number/sum(Number))), position = position_stack(vjust = 0.75), size = 3) +
ggtitle("Participants by Sex")
The above code generated the following chart, which I am not convinced by.
For our purposes, the following chart would communicate the message better. How do I create a chart like this? Where am I going wrong in my code? I have googled without any success.
Thanks in advance for your help.
They aren't in the same 'circle' because they have different x values. Imagine it as a normal plot first (i.e. without coord_polar("y")) and this will become clear. What you really want is them set at the same x value and then stacked. Here I set x to 2 because it then makes a nicely sized "donut".
donut <- ggplot(df, aes(x = 2, y = Number, fill = Sex)) +
geom_col(position = "stack", width = 1) +
geom_text(aes(label = percent(Number/sum(Number))), position = position_stack(vjust = 0.75), size = 3) +
xlim(0.5, 2.5) +
ggtitle("Participants by Sex")
donut
donut +
coord_polar("y") +
theme_void() +
theme(legend.position="top")
Using ggplot2 1.0.0, I followed the instructions in the post below to figure out how to plot percentage bar plots across factors:
Sum percentages for each facet - respect "fill"
test <- data.frame(
test1 = sample(letters[1:2], 100, replace = TRUE),
test2 = sample(letters[3:8], 100, replace = TRUE)
)
library(ggplot2)
library(scales)
ggplot(test, aes(x= test2, group = test1)) +
geom_bar(aes(y = ..density.., fill = factor(..x..))) +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
However, I cannot seem to get a label for either the total count or the percentage above each of the bar plots when using geom_text.
What is the correct addition to the above code that also preserves the percentage y-axis?
Staying within ggplot, you might try:
ggplot(test, aes(x= test2, group=test1)) +
geom_bar(aes(y = ..density.., fill = factor(..x..))) +
geom_text(aes( label = format(100*..density.., digits=2, drop0trailing=TRUE),
y= ..density.. ), stat= "bin", vjust = -.5) +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
For counts, change ..density.. to ..count.. in both geom_bar and geom_text.
UPDATE for ggplot 2.x
ggplot2 2.0.0 made many changes, including one that broke the original version of this code: it changed the default stat used by geom_bar. Instead of calling stat_bin, as before, to bin the data, geom_bar now calls stat_count to count observations at each location. stat_count returns prop, the groupwise proportion of the counts, rather than density.
The code below has been modified to work with this new release of ggplot2. I've included two versions, both of which show the height of the bars as a percentage of counts. The first displays the proportion of the count above the bar as a percent while the second shows the count above the bar. I've also added labels for the y axis and legend.
library(ggplot2)
library(scales)
#
# Displays bar heights as percents with percentages above bars
#
ggplot(test, aes(x= test2, group=test1)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat="count") +
geom_text(aes( label = scales::percent(..prop..),
y= ..prop.. ), stat= "count", vjust = -.5) +
labs(y = "Percent", fill="test2") +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
#
# Displays bar heights as percents with counts above bars
#
ggplot(test, aes(x= test2, group=test1)) +
geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat="count") +
geom_text(aes(label = ..count.., y= ..prop..), stat= "count", vjust = -.5) +
labs(y = "Percent", fill="test2") +
facet_grid(~test1) +
scale_y_continuous(labels=percent)
The plot from the first version is shown below.
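To check what stat_count actually computes, one option is to inspect the layer data. The sketch below assumes ggplot2 3.3.0 or newer, so it uses after_stat(prop) instead of ..prop.., and it regenerates the test data with an arbitrary seed. The names computed and prop_totals are mine.

```r
library(ggplot2)

set.seed(25)
test <- data.frame(
  test1 = sample(letters[1:2], 100, replace = TRUE),
  test2 = sample(letters[3:8], 100, replace = TRUE)
)

p <- ggplot(test, aes(x = test2, group = test1)) +
  geom_bar(aes(y = after_stat(prop)), stat = "count") +
  facet_grid(~test1)

# One row per bar, including the variables computed by stat_count
computed <- layer_data(p, 1)
head(computed[, c("PANEL", "x", "count", "prop")])

# prop is computed within each group, so it sums to 1 in each facet
prop_totals <- tapply(computed$prop, computed$PANEL, sum)
prop_totals
```

Because group = test1, each facet holds exactly one group, which is why the proportions add up to 1 per panel rather than across the whole data set.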
This is easier to do if you pre-summarize your data. For example:
library(ggplot2)
library(scales)
library(dplyr)
set.seed(25)
test <- data.frame(
test1 = sample(letters[1:2], 100, replace = TRUE),
test2 = sample(letters[3:8], 100, replace = TRUE)
)
# Summarize to get counts and percentages
test.pct = test %>% group_by(test1, test2) %>%
summarise(count=n()) %>%
mutate(pct=count/sum(count))
ggplot(test.pct, aes(x=test2, y=pct, colour=test2, fill=test2)) +
geom_bar(stat="identity") +
facet_grid(. ~ test1) +
scale_y_continuous(labels=percent, limits=c(0,0.27)) +
geom_text(data=test.pct, aes(label=paste0(round(pct*100,1),"%"),
y=pct+0.012), size=4)
(FYI, you can put the labels inside the bar as well, for example, by changing the last line of code to this: y=pct*0.5), size=4, colour="white"))
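I used all of your code and came up with this. First assign your plot to a variable, i.e. p <- ggplot(...) + geom_bar(...) etc. Then you can do the following. You don't need to summarize much yourself, since ggplot_build() already gives you the computed values. Note that ldply() and mapvalues() come from plyr and select() and do() from dplyr, and that the density column is only present when geom_bar bins with stat_bin, as it did in ggplot2 1.0.0. I'll leave the formatting to you. Good luck.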
dat <- ggplot_build(p)$data %>% ldply() %>% select(group,density) %>%
do(data.frame(xval = rep(1:6, times = 2),test1 = mapvalues(.$group, from = c(1,2), to = c("a","b")), density = .$density))
p + geom_text(data=dat, aes(x = xval, y = (density + .02), label = percent(density)), colour="black", size = 3)