How to highlight a column in ggplot2 - r

I have the following graph and I want to highlight the columns (both) for watermelons as it has the highest juice_content and weight. I know how to change the color of the columns but I would like to WHOLE columns to be highlighted. Any idea on how to achieve this? There doesn't seems to be any similar online.
fruits <- c("apple","orange","watermelons")
juice_content <- c(10,1,1000)
weight <- c(5,2,2000)
df <- data.frame(fruits,juice_content,weight)
df <- gather(df,compare,measure,juice_content:weight, factor_key=TRUE)
plot <- ggplot(df, aes(fruits,measure, fill=compare)) + geom_bar(stat="identity", position=position_dodge()) + scale_y_log10()

An option is to use gghighlight
library(gghighlight)
ggplot(df, aes(fruits,measure, fill = compare)) +
geom_col(position = position_dodge()) +
scale_y_log10() +
gghighlight(fruits == "watermelons")
In response to your comment, how about working with different alpha values
ggplot(df, aes(fruits,measure)) +
geom_col(data = . %>% filter(fruits == "watermelons"),
mapping = aes(fill = compare),
position = position_dodge()) +
geom_col(data = . %>% filter(fruits != "watermelons"),
mapping = aes(fill = compare),
alpha = 0.2,
position = position_dodge()) +
scale_y_log10()
Or you can achieve the same with one geom_col and a conditional alpha (thanks #Tjebo)
ggplot(df, aes(fruits, measure)) +
geom_col(
mapping = aes(fill = compare, alpha = fruits == 'watermelons'),
position = position_dodge()) +
scale_alpha_manual(values = c(0.2, 1)) +
scale_y_log10()

You could use geom_area to highlight behind the bars. You have to force the x scale to discrete first which is why I've used geom_blank (see this answer geom_ribbon overlay when x-axis is discrete) noting that geom_ribbon and geom_area are effectively the same except geom_area always has 0 as ymin
#minor edit so that the level isn't hard coded
watermelon_level <- which(levels(df$fruits) == "watermelons")
AreaDF <- data.frame(fruits = c(watermelon_level-0.5,watermelon_level+0.5))
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes( y = max(df$measure)), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
Edit to address comment
If you want to highlight multiple fruits then you could do something like this. You need a data.frame with where you want the geom_area x and y, including dropping it to 0 between. I'm sure there's slightly tidier methods of getting the data.frame but this one works
highlight_level <- which(levels(df$fruits) %in% c("apple", "watermelons"))
AreaDF <- data.frame(fruits = unlist(lapply(highlight_level, function(x) c(x -0.51,x -0.5,x+0.5,x+0.51))),
yval = rep(c(1,max(df$measure),max(df$measure),1), length(highlight_level)))
AreaDF <- AreaDF %>% mutate(
yval = ifelse(floor(fruits) %in% highlight_level & ceiling(fruits) %in% highlight_level, max(df$measure), yval)) %>%
arrange(fruits) %>% distinct()
plot <- ggplot(df, aes(fruits)) +
geom_blank(aes(y=measure, fill=compare))+
geom_area(data = AreaDF, aes(y = yval ), fill= "yellow")+
geom_bar(aes(y=measure, fill=compare),stat="identity", position=position_dodge()) + scale_y_log10()
plot

Related

How to add percentages on top of an histogram when data is grouped

This is not my data (for confidentiality reasons), but I have tried to create a reproducible example using a dataset included in the ggplot2 library. I have an histogram summarizing the value of some variable by group (factor of 2 levels). First, I did not want the counts but proportions of the total, so I used that code:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>% as.data.frame() %>% filter(cut=="Premium" | cut=="Ideal")
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="Count") +
theme_bw() + theme(legend.position="none")
It gave me this as a result.
enter image description here
The issue is that I would like to print the numeric percentages on top of the bins and haven't find a way to do so.
As I saw it done for printing counts elsewhere, I attempted to print them using stat_bin(), including the same y and label values as the y in geom_histogram, thinking it would print the right numbers:
ggplot(df_example,aes(x=z,fill=cut)) +
geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
stat_bin(aes(y=after_stat(width*density),label=after_stat(width*density*100)),geom="text",vjust=-.5) +
facet_wrap(~cut) +
scale_x_continuous(breaks=seq(0,9,by=1)) +
scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
scale_fill_manual(values=c("#CC79A7","#009E73")) +
labs(x="Depth (mm)",y="%") +
theme_bw() + theme(legend.position="none")
However, it does print way more values than there are bins, these values do not appear consistent with what is portrayed by the bar heights and they do not print in respect to vjust=-.5 which would make them appear slightly above the bars.
enter image description here
What am I missing here? I know that if there was no grouping variable/facet_wrap, I could use after_stat(count/sum(count)) instead of after_stat(width*density) and it seems that it would have fixed my issue. But I need the histograms for both groups to appear next to each other. Thanks in advance!
You have to use the same arguments in stat_bin as for the histogram when adding your labels to get same binning for both layers and to align the labels with the bars:
library(ggplot2)
library(dplyr)
df_example <- diamonds %>%
as.data.frame() %>%
filter(cut == "Premium" | cut == "Ideal")
ggplot(df_example, aes(x = z, fill = cut)) +
geom_histogram(aes(y = after_stat(width * density)),
binwidth = 1, center = 0.5, col = "black"
) +
stat_bin(
aes(
y = after_stat(width * density),
label = scales::number(after_stat(width * density), scale = 100, accuracy = 1)
),
geom = "text", binwidth = 1, center = 0.5, vjust = -.25
) +
facet_wrap(~cut) +
scale_x_continuous(breaks = seq(0, 9, by = 1)) +
scale_y_continuous(labels = scales::number_format(scale = 100)) +
scale_fill_manual(values = c("#CC79A7", "#009E73")) +
labs(x = "Depth (mm)", y = "%") +
theme_bw() +
theme(legend.position = "none")

How can I change the size of a bar in a grouped bar chart when one group has no data? [duplicate]

Is there a way to set a constant width for geom_bar() in the event of missing data in the time series example below? I've tried setting width in aes() with no luck. Compare May '11 to June '11 width of bars in the plot below the code example.
colours <- c("#FF0000", "#33CC33", "#CCCCCC", "#FFA500", "#000000" )
iris$Month <- rep(seq(from=as.Date("2011-01-01"), to=as.Date("2011-10-01"), by="month"), 15)
colours <- c("#FF0000", "#33CC33", "#CCCCCC", "#FFA500", "#000000" )
iris$Month <- rep(seq(from=as.Date("2011-01-01"), to=as.Date("2011-10-01"), by="month"), 15)
d<-aggregate(iris$Sepal.Length, by=list(iris$Month, iris$Species), sum)
d$quota<-seq(from=2000, to=60000, by=2000)
colnames(d) <- c("Month", "Species", "Sepal.Width", "Quota")
d$Sepal.Width<-d$Sepal.Width * 1000
g1 <- ggplot(data=d, aes(x=Month, y=Quota, color="Quota")) + geom_line(size=1)
g1 + geom_bar(data=d[c(-1:-5),], aes(x=Month, y=Sepal.Width, width=10, group=Species, fill=Species), stat="identity", position="dodge") + scale_fill_manual(values=colours)
Some new options for position_dodge() and the new position_dodge2(), introduced in ggplot2 3.0.0 can help.
You can use preserve = "single" in position_dodge() to base the widths off a single element, so the widths of all bars will be the same.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge(preserve = "single") ) +
scale_fill_manual(values = colours)
Using position_dodge2() changes the way things are centered, centering each set of bars at each x axis location. It has some padding built in, so use padding = 0 to remove.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge2(preserve = "single", padding = 0) ) +
scale_fill_manual(values = colours)
The easiest way is to supplement your data set so that every combination is present, even if it has NA as its value. Taking a simpler example (as yours has a lot of unneeded features):
dat <- data.frame(a=rep(LETTERS[1:3],3),
b=rep(letters[1:3],each=3),
v=1:9)[-2,]
ggplot(dat, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
This shows the behavior you are trying to avoid: in group "B", there is no group "a", so the bars are wider. Supplement dat with a dataframe with all the combinations of a and b:
dat.all <- rbind(dat, cbind(expand.grid(a=levels(dat$a), b=levels(dat$b)), v=NA))
ggplot(dat.all, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
I had the same problem but was looking for a solution that works with the pipe (%>%). Using tidyr::spread and tidyr::gather from the tidyverse does the trick. I use the same data as #Brian Diggs, but with uppercase variable names to not end up with double variable names when transforming to wide:
library(tidyverse)
dat <- data.frame(A = rep(LETTERS[1:3], 3),
B = rep(letters[1:3], each = 3),
V = 1:9)[-2, ]
dat %>%
spread(key = B, value = V, fill = NA) %>% # turn data to wide, using fill = NA to generate missing values
gather(key = B, value = V, -A) %>% # go back to long, with the missings
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())
Edit:
There actually is a even simpler solution to that problem in combination with the pipe. Use tidyr::complete gives the same result in one line:
dat %>%
complete(A, B) %>%
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())

How do I add data labels to a ggplot histogram with a log(x) axis?

I am wondering how to add data labels to a ggplot showing the true value of the data points when the x-axis is in log scale.
I have this data:
date <- c("4/3/2021", "4/7/2021","4/10/2021","4/12/2021","4/13/2021","4/13/2021")
amount <- c(105.00, 96.32, 89.00, 80.84, 121.82, 159.38)
address <- c("A","B","C","D","E","F")
df <- data.frame(date, amount, address)
And I plot it in ggplot2:
plot <- ggplot(df, aes(x = log(amount))) +
geom_histogram(binwidth = 1)
plot + theme_minimal() + geom_text(label = amount)
... but I get the error
"Error: geom_text requires the following missing aesthetics: y"
I have 2 questions as a result:
Why am I getting this error with geom_histogram? Shouldn't it assume to use count as the y value?
Will this successfully show the true values of the data points from the 'amount' column despite the plot's log scale x-axis?
Perhaps like this?
ggplot(df, aes(x = log(amount), y = ..count.., label = ..count..)) +
geom_histogram(binwidth = 1) +
stat_bin(geom = "text", binwidth = 1, vjust = -0.5) +
theme_minimal()
ggplot2 layers do not (at least in any situations I can think of) take the summary calculations of other layers, so I think the simplest thing would be to replicate the calculation using stat_bin(geom = "text"...
Or perhaps simpler, you could pre-calculate the numbers:
library(dplyr)
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 1) +
geom_text(vjust = -0.5)
EDIT -- to show buckets without the log transform we could use:
df %>%
count(log_amt = round(log(amount))) %>%
ggplot(aes(log_amt, n, label = n)) +
geom_col(width = 0.5) +
geom_text(vjust = -0.5) +
scale_x_continuous(labels = ~scales::comma(10^.),
minor_breaks = NULL)

Prevent reordering within facet_wrap()

I have an issue with my ggplot() reordering the data. I have an example code below. I have data, and reordered the factors in feed to my content, but after the str_extract() in facet_wrap(), the data gets reordered back before I reordered it. Is there a way to prevent that from occurring? For my actual code, it is important for me to use regex within the facet_wrap() in ggplot,
data <- chickwts
data <- mutate(data, time = 1:nrow(data))
lvl <- c("linseed", "meatmeal", "sunflower", "soybean",
"casein", "horsebean")
data$feed <- factor(data$feed, levels = lvl)
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~str_extract(feed,"[a-z]+"))
You could put the factor inside the facet_wrap:
ggplot(data, aes(x = time, y = weight, color = feed)) +
geom_line(size = 1) + geom_point(size = 1.75) +
facet_wrap(~ factor(str_extract(feed,"[a-z]+"), levels = lvl))

R ggplot . HOW TO plot only the variables >0?

I am plotting the number of covid19 PCR in the towns of my province. The problem its that many town havenĀ“t any PCR positive. I need a way to plot only the towns with at least 1+ PCR.
This is my code:
library(tidyverse)
library('data.table')
dfcsv1 <- read.csv("https://dadesobertes.gva.es/datastore/dump/ee17a346-a596-4866-a2ac-a530eb811737?bom=True",
encoding = "UTF-8", header = TRUE, sep = ",")
colnames(dfcsv1) <- c("code","code2","Municipio", "PCR", "TasaPCR", "PCR14",
"TasaPCR14", "Muertos", "TasaMuertos")
dfcsv1$TasaMuertos = as.numeric(gsub(",","\\.",dfcsv1$TasaMuertos))
dfcsv1$TasaPCR = as.numeric(gsub(",","\\.",dfcsv1$TasaPCR))
dfcsv1$TasaPCR14 = as.numeric(gsub(",","\\.",dfcsv1$TasaPCR14))
dfcsv1 %>%
mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
ggplot(aes(x=Municipio, y=PCR14, fill =TasaPCR14)) +
geom_bar(stat="identity", width=0.6) +
coord_flip() +
geom_text(data=dfcsv1, aes(y=PCR14,label=PCR14),vjust=1)+
scale_fill_gradient(low="steelblue", high="red")
As others have said in the comments, you need to filter out the PCR14 that is greater than 0 before reordering the factor levels. However, you will also need to remove the data parameter from geom_text, otherwise all those factor levels come back and you will have a big mess. It's already a bit crowded with the zero levels removed.
I think you should also change the vjust to an hjust to put the text in a nicer position since you have flipped the coordinates, with a compensating increase in the (flipped) y axis range to accommodate it:
dfcsv1 %>%
filter(PCR14 > 0) %>%
mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
ggplot(aes(x = Municipio, y = PCR14, fill = TasaPCR14)) +
geom_bar(stat = "identity", width = 0.6) +
coord_flip() +
geom_text(aes(y = PCR14,label = PCR14), hjust= -0.5) +
scale_fill_gradient(low = "steelblue", high = "red") +
ylim(c(0, 45))
Incidentally, it looks a lot better with the ones removed too:
dfcsv1 %>%
filter(PCR14 > 1) %>%
mutate(Municipio = fct_reorder(Municipio, PCR14)) %>%
ggplot(aes(x=Municipio, y=PCR14, fill =TasaPCR14)) +
geom_bar(stat="identity", width=0.6) + coord_flip() +
geom_text(aes(y=PCR14,label=PCR14),hjust=-0.5)+
scale_fill_gradient(low="steelblue", high="red") +
ylim(c(0, 45))
As a general rule, regardless of the type of plot or whether you are using ggplot , lattice or the base plot function, subsetting should happen first.
plot(x[y>0] , y[y>0])
The rest is aesthetics.

Resources