I am plotting the number of COVID-19 PCR positives in the towns of my province. The problem is that many towns don't have any positive PCR tests. I need a way to plot only the towns with at least one positive PCR.
This is my code:
library(tidyverse)
library('data.table')
dfcsv1 <- read.csv("https://dadesobertes.gva.es/datastore/dump/ee17a346-a596-4866-a2ac-a530eb811737?bom=True",
encoding = "UTF-8", header = TRUE, sep = ",")
colnames(dfcsv1) <- c("code","code2","Municipio", "PCR", "TasaPCR", "PCR14",
"TasaPCR14", "Muertos", "TasaMuertos")
dfcsv1$TasaMuertos = as.numeric(gsub(",","\\.",dfcsv1$TasaMuertos))
dfcsv1$TasaPCR = as.numeric(gsub(",","\\.",dfcsv1$TasaPCR))
dfcsv1$TasaPCR14 = as.numeric(gsub(",","\\.",dfcsv1$TasaPCR14))
dfcsv1 %>%
mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
ggplot(aes(x=Municipio, y=PCR14, fill =TasaPCR14)) +
geom_bar(stat="identity", width=0.6) +
coord_flip() +
geom_text(data=dfcsv1, aes(y=PCR14,label=PCR14),vjust=1)+
scale_fill_gradient(low="steelblue", high="red")
As others have said in the comments, you need to keep only the rows where PCR14 is greater than 0 before reordering the factor levels. However, you will also need to remove the data parameter from geom_text, otherwise all those factor levels come back and you will have a big mess. It's already a bit crowded with the zero levels removed.
I think you should also change the vjust to an hjust to put the text in a nicer position since you have flipped the coordinates, with a compensating increase in the (flipped) y axis range to accommodate it:
dfcsv1 %>%
filter(PCR14 > 0) %>%
mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
ggplot(aes(x = Municipio, y = PCR14, fill = TasaPCR14)) +
geom_bar(stat = "identity", width = 0.6) +
coord_flip() +
geom_text(aes(y = PCR14,label = PCR14), hjust= -0.5) +
scale_fill_gradient(low = "steelblue", high = "red") +
ylim(c(0, 45))
Incidentally, it looks a lot better with the ones removed too:
dfcsv1 %>%
filter(PCR14 > 1) %>%
mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
ggplot(aes(x=Municipio, y=PCR14, fill =TasaPCR14)) +
geom_bar(stat="identity", width=0.6) + coord_flip() +
geom_text(aes(y=PCR14,label=PCR14),hjust=-0.5)+
scale_fill_gradient(low="steelblue", high="red") +
ylim(c(0, 45))
As a general rule, regardless of the type of plot or whether you are using ggplot, lattice or the base plot function, subsetting should happen first.
plot(x[y>0] , y[y>0])
The rest is aesthetics.
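A minimal sketch of the subset-first idea in base R, using a made-up data frame (the town names and counts are hypothetical, not the real data from the question). Because the filter runs before the factor is built, the zero-count towns never become factor levels:

```r
# Hypothetical toy data standing in for the town-level counts
towns <- data.frame(
  Municipio = c("A", "B", "C", "D"),
  PCR14     = c(0, 7, 0, 3),
  stringsAsFactors = FALSE
)

# 1. Subset first: keep only towns with at least one positive PCR
positive <- towns[towns$PCR14 > 0, ]

# 2. Then order the factor levels by the count
#    (reorder() is used here as a base R stand-in for fct_reorder())
positive$Municipio <- reorder(factor(positive$Municipio), positive$PCR14)

# Only the two remaining towns are levels, ordered by increasing PCR14
levels(positive$Municipio)
```

Doing it in the other order (reorder, then filter) would leave the dropped towns behind as unused levels, which is where the empty rows on the axis come from.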
This is not my data (for confidentiality reasons), but I have tried to create a reproducible example using a dataset included in the ggplot2 library. I have a histogram summarizing the values of some variable by group (a factor with 2 levels). First, I did not want counts but proportions of the total, so I used this code:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>% as.data.frame() %>% filter(cut=="Premium" | cut=="Ideal")
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="Count") +
theme_bw() + theme(legend.position="none")
It gave me this as a result.
The issue is that I would like to print the numeric percentages on top of the bins and haven't found a way to do so.
As I saw it done for printing counts elsewhere, I attempted to print them using stat_bin(), including the same y and label values as the y in geom_histogram, thinking it would print the right numbers:
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
stat_bin(aes(y=after_stat(width*density),label=after_stat(width*density*100)),geom="text",vjust=-.5) +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="%") +
theme_bw() + theme(legend.position="none")
However, it prints way more values than there are bins, these values do not appear consistent with the bar heights, and they do not respect vjust=-.5, which should make them appear slightly above the bars.
What am I missing here? I know that if there was no grouping variable/facet_wrap, I could use after_stat(count/sum(count)) instead of after_stat(width*density) and it seems that it would have fixed my issue. But I need the histograms for both groups to appear next to each other. Thanks in advance!
You have to use the same arguments in stat_bin as in geom_histogram when adding your labels, so that both layers use the same binning and the labels align with the bars:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>%
as.data.frame() %>%
filter(cut == "Premium" | cut == "Ideal")
ggplot(df_example, aes(x = z, fill = cut)) +
geom_histogram(aes(y = after_stat(width * density)),
binwidth = 1, center = 0.5, col = "black"
) +
stat_bin(
aes(
y = after_stat(width * density),
label = scales::number(after_stat(width * density), scale = 100, accuracy = 1)
),
geom = "text", binwidth = 1, center = 0.5, vjust = -.25
) +
facet_wrap(~cut) +
scale_x_continuous(breaks = seq(0, 9, by = 1)) +
scale_y_continuous(labels = scales::number_format(scale = 100)) +
scale_fill_manual(values = c("#CC79A7", "#009E73")) +
labs(x = "Depth (mm)", y = "%") +
theme_bw() +
theme(legend.position = "none")
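As a rough illustration of why width * density gives a per-group proportion, here is a sketch with base R's hist() on made-up values. It is only an analogy for what stat_bin computes, not ggplot2's own code: the density of a bin is count / (total * width), so multiplying by the width leaves count / total for whatever set of points was binned together.

```r
# Hypothetical values for a single group (one facet)
z <- c(0.2, 0.7, 1.1, 1.4, 1.9, 2.5, 2.6, 3.3)

# Bin with width 1, without drawing anything
h <- hist(z, breaks = 0:4, plot = FALSE)

bin_width  <- diff(h$breaks)        # all equal to 1 here
proportion <- h$density * bin_width # same idea as after_stat(width * density)

# Identical to the share of observations falling in each bin
proportion
h$counts / sum(h$counts)
```

Since the binning is done separately for each group and panel, the proportions sum to 1 within each facet, which is what after_stat(count / sum(count)) would not give you once there are several groups.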
I have the following graph and I want to highlight both columns for watermelons, as it has the highest juice_content and weight. I know how to change the color of the columns, but I would like the WHOLE columns to be highlighted. Any idea on how to achieve this? There doesn't seem to be anything similar online.
fruits <- c("apple","orange","watermelons")
juice_content <- c(10,1,1000)
weight <- c(5,2,2000)
df <- data.frame(fruits,juice_content,weight)
df <- gather(df,compare,measure,juice_content:weight, factor_key=TRUE)
plot <- ggplot(df, aes(fruits,measure, fill=compare)) + geom_bar(stat="identity", position=position_dodge()) + scale_y_log10()
An option is to use gghighlight
library(gghighlight)
ggplot(df, aes(fruits,measure, fill = compare)) +
geom_col(position = position_dodge()) +
scale_y_log10() +
gghighlight(fruits == "watermelons")
In response to your comment, how about working with different alpha values
ggplot(df, aes(fruits,measure)) +
geom_col(data = . %>% filter(fruits == "watermelons"),
mapping = aes(fill = compare),
position = position_dodge()) +
geom_col(data = . %>% filter(fruits != "watermelons"),
mapping = aes(fill = compare),
alpha = 0.2,
position = position_dodge()) +
scale_y_log10()
Or you can achieve the same with one geom_col and a conditional alpha (thanks @Tjebo)
ggplot(df, aes(fruits, measure)) +
geom_col(
mapping = aes(fill = compare, alpha = fruits == 'watermelons'),
position = position_dodge()) +
scale_alpha_manual(values = c(0.2, 1)) +
scale_y_log10()
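A small variation on the conditional alpha, sketched under a few assumptions: the long data frame is rebuilt by hand (df_long and the highlight column are names made up for this example), and the alpha values are given as a named vector so that the mapping from FALSE/TRUE to transparency does not depend on the order of the values. The plot itself is only built if ggplot2 is installed.

```r
fruits <- c("apple", "orange", "watermelons")

# Hand-built long format, equivalent to the gather() step in the question
df_long <- data.frame(
  fruits  = rep(fruits, times = 2),
  compare = rep(c("juice_content", "weight"), each = 3),
  measure = c(10, 1, 1000, 5, 2, 2000)
)

# Explicit mapping: non-highlighted bars faded, highlighted bars opaque
alpha_values <- c("FALSE" = 0.2, "TRUE" = 1)

# Logical flag marking the rows to highlight
df_long$highlight <- df_long$fruits == "watermelons"

# The alpha each row ends up with under this mapping
bar_alpha <- unname(alpha_values[as.character(df_long$highlight)])

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_long, aes(fruits, measure)) +
    geom_col(aes(fill = compare, alpha = highlight),
             position = position_dodge()) +
    scale_alpha_manual(values = alpha_values, guide = "none") +
    scale_y_log10()
}
```

Hiding the alpha legend with guide = "none" is optional; it just avoids a TRUE/FALSE legend that adds little.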
You could use geom_area to highlight behind the bars. You have to force the x scale to be discrete first, which is why I've used geom_blank (see this answer: geom_ribbon overlay when x-axis is discrete). Note that geom_ribbon and geom_area are effectively the same, except that geom_area always has 0 as ymin.
# minor edit so that the level isn't hard coded
# factor() is needed when fruits is a character column (the data.frame() default since R 4.0)
watermelon_level <- which(levels(factor(df$fruits)) == "watermelons")
AreaDF <- data.frame(fruits = c(watermelon_level-0.5,watermelon_level+0.5))
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes( y = max(df$measure)), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
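The reason the ± 0.5 works can be sketched in base R: on a discrete axis the categories sit at the integer positions 1, 2, 3, ... in level order, so a band from position - 0.5 to position + 0.5 covers exactly one category. The names band and fruit_levels below are just for this illustration.

```r
# Categories in the order a discrete axis would place them (alphabetical here)
fruit_levels <- levels(factor(c("apple", "orange", "watermelons")))

# Integer position of the category to highlight
pos <- match("watermelons", fruit_levels)

# Horizontal extent of the highlight band around that position
band <- c(xmin = pos - 0.5, xmax = pos + 0.5)
band
```

If the factor levels are reordered, pos changes with them, which is why looking the level up is safer than hard coding the number 3.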
Edit to address comment
If you want to highlight multiple fruits then you could do something like this. You need a data.frame with the x and y values where you want the geom_area drawn, including dropping it back down between highlighted fruits. I'm sure there are slightly tidier ways of building the data.frame, but this one works:
# factor() is needed when fruits is a character column (the data.frame() default since R 4.0)
highlight_level <- which(levels(factor(df$fruits)) %in% c("apple", "watermelons"))
AreaDF <- data.frame(fruits = unlist(lapply(highlight_level, function(x) c(x -0.51,x -0.5,x+0.5,x+0.51))),
yval = rep(c(1,max(df$measure),max(df$measure),1), length(highlight_level)))
AreaDF <- AreaDF %>% mutate(
yval = ifelse(floor(fruits) %in% highlight_level & ceiling(fruits) %in% highlight_level, max(df$measure), yval)) %>%
arrange(fruits) %>% distinct()
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes(y = yval ), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
plot
I'm trying to display percentage numbers as labels inside the bars of a stacked bar plot in ggplot2. I found some other post from 3 years ago but I'm not able to reproduce it: How to draw stacked bars in ggplot2 that show percentages based on group?
The answer to that post is almost exactly what I'm trying to do.
Here is a simple example of my data:
df = data.frame('sample' = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4'),
'class' = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3'))
ggplot(data=df, aes(x=sample, fill=class)) +
coord_flip() +
geom_bar(position=position_fill(reverse=TRUE), width=0.7)
I'd like for every bar to show the percentage/fraction, so in this case they would all be 33%. In reality it would be nice if the values would be calculated on the fly, but I can also hand the percentages manually if necessary. Can anybody help?
Side question: How can I reduce the space between the bars? I found many answers to that as well but they suggest using the width parameter in position_fill(), which doesn't seem to exist anymore.
Thanks so much!
EDIT:
So far, there are two examples that show exactly what I was asking for (big thanks for responding so quickly), however they fail when applying it to my real data. Here is the example data with just another element added to show what happens:
df = data.frame('sample' = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4','cond1'),
'class' = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3','class2'))
Essentially, I'd like to have only one label per class/condition combination.
I think what OP wanted was labels on the actual sections of the bars. We can do this using data.table to get the count percentages and the formatted percentages and then plot using ggplot:
library(data.table)
library(scales)
dt <- setDT(df)[,list(count = .N), by = .(sample,class)][,list(class = class, count = count,
percent_fmt = paste0(formatC(count*100/sum(count), digits = 2), "%"),
percent_num = count/sum(count)
), by = sample]
ggplot(data=dt, aes(x=sample, y= percent_num, fill=class)) +
geom_bar(position=position_fill(reverse=TRUE), stat = "identity", width=0.7) +
geom_text(aes(label = percent_fmt),position = position_stack(vjust = 0.5)) + coord_flip()
Edit: Another solution where you calculate the y-value of your label in the aggregate. This is so we don't have to rely on position_stack(vjust = 0.5):
dt <- setDT(df)[,list(count = .N), by = .(sample,class)][,list(class = class, count = count,
percent_fmt = paste0(formatC(count*100/sum(count), digits = 2), "%"),
percent_num = count/sum(count),
cum_pct = cumsum(count/sum(count)),
label_y = (cumsum(count/sum(count)) + cumsum(ifelse(is.na(shift(count/sum(count))),0,shift(count/sum(count))))) / 2
), by = sample]
ggplot(data=dt, aes(x=sample, y= percent_num, fill=class)) +
geom_bar(position=position_fill(reverse=TRUE), stat = "identity", width=0.7) +
geom_text(aes(label = percent_fmt, y = label_y)) + coord_flip()
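The label_y expression above is the midpoint of each segment, i.e. the average of its upper and lower edge. A stripped-down sketch of the same arithmetic in base R, using the class shares of cond1 from the extended example (the variable names are only for illustration):

```r
# Share of each class within one bar (cond1 in the extended example data)
p <- c(class1 = 0.25, class2 = 0.50, class3 = 0.25)

upper <- cumsum(p)   # top edge of each stacked segment
lower <- upper - p   # bottom edge of each stacked segment

# Midpoint of each segment, equivalent to cumsum(p) - p / 2
label_y <- (upper + lower) / 2
label_y
```

Note this assumes the segments are stacked in the same order as the rows; if the stacking order is reversed in the plot, the cumulative sum has to be taken in the reversed order too.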
Here is a solution where you first calculate the percentages using dplyr and then plot them:
UPDATED:
options(stringsAsFactors = F)
df = data.frame(sample = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4'),
class = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3'))
library(dplyr)
library(scales)
df%>%
# count how often each class occurs in each sample.
count(sample, class)%>%
group_by(sample)%>%
mutate(pct = n / sum(n))%>%
ggplot(aes(x = sample, y = pct, fill = class)) +
coord_flip() +
geom_col(width=0.7)+
geom_text(aes(label = paste0(round(pct * 100), '%')),
position = position_stack(vjust = 0.5))
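For comparison, the same count-then-normalise step can be sketched without any packages, using table() and prop.table() on the extended example data from the question's edit (the one with the extra cond1/class2 row). This is an alternative to the dplyr pipeline, not what the answer itself uses:

```r
# Extended example data: cond1 has class2 twice
df <- data.frame(
  sample = c(rep(c("cond1", "cond2", "cond3", "cond4"), each = 3), "cond1"),
  class  = c(rep(c("class1", "class2", "class3"), times = 4), "class2")
)

# Count each sample/class combination once
counts <- table(sample = df$sample, class = df$class)

# Row-wise proportions (margin = 1 normalises within each sample),
# reshaped to one row per sample/class combination
pct <- as.data.frame(prop.table(counts, margin = 1), responseName = "pct")
pct
```

Because the data are aggregated before plotting, there is exactly one row, and therefore one label, per class/condition combination, which is what the edit to the question asks for.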
Use scales
library(scales)
ggplot(data=df, aes(x=sample, fill=class)) +
coord_flip() +
geom_bar(position=position_fill(reverse=TRUE), width=0.7) +
scale_y_continuous(labels =percent_format())
I am trying to make some changes to my plot, but am having difficulty doing so.
(1) I would like warm, avg, and cold to be filled in as the colors red, yellow, and blue, respectively.
(2) I am trying to make the y-axis read "Count" and have it be horizontally written.
(3) In the legend, I would like the title to be Temperatures, rather than variable
Any help making these changes would be much appreciated along with other suggestions to make the plot look nicer.
df <- read.table(textConnection(
'Statistic Warm Avg Cold
Homers(Away) 1.151 1.028 .841
Homers(Home) 1.202 1.058 .949'), header = TRUE)
library(ggplot2)
library(reshape2)
df <- melt(df, id = 'Statistic')
ggplot(
data = df,
aes(
y = value,
x = Statistic,
group = variable,
shape = variable,
fill = variable
)
) +
geom_bar(stat = "identity")
You are on the right lines by trying to reshape the data into long format. My preference is to use gather from the tidyr package for that. You can also create the variable names Temperatures and Count in the gather step.
The next step is to turn the 3 classes of temperature into a factor, ordered from cold, through average, to warm.
Now you can plot. You want position = "dodge" to get the bars side by side, since it makes no sense to stack the values in a single bar. Fill colours you specify using scale_fill_manual.
You rotate the y-axis title by manipulating axis.title.y.
So putting all of that together (plus a black/white theme):
library(dplyr)
library(tidyr)
library(ggplot2)
df %>%
gather(Temperatures, Count, -Statistic) %>%
mutate(Temperatures = factor(Temperatures, c("Cold", "Avg", "Warm"))) %>%
ggplot(aes(Statistic, Count)) +
geom_col(aes(fill = Temperatures), position = "dodge") +
scale_fill_manual(values = c("blue", "yellow", "red")) +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
Result:
I'd question whether Count is a sensible variable name in this case.
You are almost there. To map specific colors to specific factor levels you can use scale_fill_manual and create your own scale:
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
Changing the y axis title is also easy in ggplot:
ylab("Count") +
And to change the legend title you can use:
labs(fill='TEMPERATURE') +
Giving us:
ggplot(df, aes(y = value, x = Statistic, group= variable, fill = variable)) +
geom_bar(stat = "identity") +
scale_fill_manual(values=c("Warm"="red", "Avg"="yellow", "Cold"="blue")) +
labs(fill='TEMPERATURE') +
ylab("Count") +
xlab("") +
theme_bw() +
theme(axis.title.y = element_text(angle = 0, vjust = 0.5))
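One point worth spelling out about the named vector in scale_fill_manual: the colours are matched to the levels by name, not by position, so the result does not depend on how the factor happens to be ordered. A base R sketch of that lookup (temperature_colours and temps are names invented for this example):

```r
# Colour for each temperature class, matched by name
temperature_colours <- c(Warm = "red", Avg = "yellow", Cold = "blue")

# A factor whose level order differs from the order of the vector above
temps <- factor(c("Warm", "Avg", "Cold"), levels = c("Cold", "Avg", "Warm"))

# Looking colours up by level name gives the intended colour for each level
colour_by_level <- unname(temperature_colours[levels(temps)])
colour_by_level
```

With an unnamed vector such as c("blue", "yellow", "red"), the colours are assigned in level order instead, so the factor levels have to be set to Cold, Avg, Warm first, as the other answer does.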
I'll use the diamonds data set in ggplot2 to illustrate my point. I want to draw a histogram of price, but I want to show the count for each bin for each cut.
This is my code:
ggplot(aes(x = price ) , data = diamonds_df) +
geom_histogram(aes(fill = cut , binwidth = 1500)) +
stat_bin(binwidth= 1500, geom="text", aes(label=..count..) ,
vjust = -1) +
scale_x_continuous(breaks = seq(0 , max(stores_1_5$Weekly_Sales) , 1500 )
, labels = comma)
Here is my current plot.
But as you can see, the number shows the count for all cuts in each bin. I want to display the count for each cut in each bin.
Also, a bonus point if I could configure the y axis manually, instead of having it display numbers in steps of 5000.
Update for ggplot2 2.x
You can now center labels within stacked bars without pre-summarizing the data using position=position_stack(vjust=0.5). For example:
ggplot(aes(x = price ) , data = diamonds) +
geom_histogram(aes(fill=cut), binwidth=1500, colour="grey20", lwd=0.2) +
stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
aes(label=..count.., group=cut), position=position_stack(vjust=0.5)) +
scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))
Original Answer
You can get the counts for each value of cut by adding cut as a group aesthetic to stat_bin. I also moved binwidth outside of aes, which was causing binwidth to be ignored in your original code:
ggplot(aes(x = price ), data = diamonds) +
geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
aes(label=..count.., group=cut, y=0.8*(..count..))) +
scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))
One issue with the code above is that I'd like the labels to be vertically centered within each bar section, but I'm not sure how to do that within stat_bin, or if it's even possible. Multiplying by 0.8 (or whatever) moves each label by a different relative amount. So, to get the labels centered, I created a separate data frame for the labels in the code below:
library(dplyr)
# Create text labels
dat = diamonds %>%
group_by(cut,
price=cut(price, seq(0,max(diamonds$price)+1500,1500),
labels=seq(0,max(diamonds$price),1500), right=FALSE)) %>%
summarise(count=n()) %>%
group_by(price) %>%
mutate(ypos = cumsum(count) - 0.5*count) %>%
ungroup() %>%
mutate(price = as.numeric(as.character(price)) + 750)
ggplot(aes(x = price ) , data = diamonds) +
geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
geom_text(data=dat, aes(label=count, y=ypos), colour="white", size=3.5)
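The binning step in the label data frame can be looked at in isolation. The sketch below repeats the cut() call on a handful of made-up prices (the prices vector is hypothetical) to show how each value is assigned to a left-closed bin labelled by its lower edge, and how adding half the bin width turns that label into the x position of the text:

```r
# Hypothetical prices standing in for diamonds$price
prices   <- c(300, 1600, 2900, 3100, 4400)
binwidth <- 1500

# Bin edges and labels built the same way as in the answer
breaks <- seq(0, max(prices) + binwidth, binwidth)
labels <- seq(0, max(prices), binwidth)

# Left-closed bins [0, 1500), [1500, 3000), ... labelled by their lower edge
bin <- cut(prices, breaks = breaks, labels = labels, right = FALSE)

# Convert the factor labels back to numbers and shift to the bin centre
label_x <- as.numeric(as.character(bin)) + binwidth / 2
label_x
```

The as.character() step matters: calling as.numeric() directly on the factor would return the internal level codes (1, 2, 3) rather than the bin edges.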
To configure the breaks on the y axis, just add scale_y_continuous(breaks=seq(0,20000,2000)) or whatever breaks you'd like.
With ggplot2 2.2.0, the position_stack options make this easier:
library(ggplot2)
s <- ggplot(mpg, aes(manufacturer, fill = class))
s + geom_bar(position = "stack") +
theme(axis.text.x = element_text(angle=90, vjust=1)) +
geom_text(stat='count', aes(label=..count..), position = position_stack(vjust = 0.5),size=4)