Geom_mosaic - How to add the variable labels to the axis? - r

I have this database:
I've tried to make a geom_mosaic graph.
This is my code:
ggplot(data) +
geom_mosaic(aes(x = product(substanceabuse,probation),fill=substanceabuse))
and this is the result:
How do I add the 'Yes' and 'No' labels as I got from the mosaicplot function:
mosaicplot graph
Thanks ahead!

It looks like geom_mosaic() has some bugs. I would suggest building the plot from the summarised counts instead, which may be useful for you:
library(ggplot2)
library(dplyr)
#Data
data <- data.frame(ClientId=1:6,
substanceabuse=rep(c('Yes','No'),each=3),
probation=c('No',rep('Yes',3),rep('No',2)),stringsAsFactors = F)
#Plot
data %>% group_by(substanceabuse,probation) %>%
summarise(count = n()) %>%
mutate(cut.count = sum(count),
prop = count/sum(count)) %>%
ungroup() %>%
ggplot(aes(x = substanceabuse, y = prop, width = cut.count, fill = probation)) +
geom_bar(stat = "identity", position = "fill", colour = "black") +
geom_text(aes(label = scales::percent(prop)), position = position_stack(vjust = 0.5)) + # if labels are desired
facet_grid(~substanceabuse, scales = "free_x", space = "free_x")+
theme(strip.background = element_blank(),
strip.text = element_blank())
Output:
Which in some sense is close to what you want.
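To make explicit what the pipeline above feeds into the plot, here is a minimal base R sketch that computes the same two quantities: the group sizes (used as `cut.count`, the bar widths) and the within-group proportions (used as `prop`, the bar heights). The names `counts`, `widths` and `heights` are only illustrative and do not appear in the answer.

```r
# Same toy data as in the answer above
data <- data.frame(
  ClientId = 1:6,
  substanceabuse = rep(c("Yes", "No"), each = 3),
  probation = c("No", rep("Yes", 3), rep("No", 2)),
  stringsAsFactors = FALSE
)

# Cross-tabulate the two variables
counts <- table(substanceabuse = data$substanceabuse,
                probation = data$probation)

# Bar widths: number of clients in each substanceabuse group
widths <- rowSums(counts)

# Bar heights: share of each probation level within a substanceabuse group
heights <- prop.table(counts, margin = 1)

print(widths)
print(round(heights, 2))
```

Each row of `heights` sums to 1, which is why the stacked bars fill the full height of the panel.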

Related

Placing data labels for stacked bar chart at top of bar

I have been attempting to add a label on top of each bar to represent the proportion that each ethnic group makes up in referrals.
For some reason I cannot get the labels to be placed at the top of each bar. How do I fix this?
My code below
freq <- df %>%
group_by(ethnicity) %>%
summarise(n = n()) %>%
mutate(f = round(n/sum(n)*100, 1))
df %>%
group_by(pathway) %>%
count(ethnicity) %>%
ggplot(aes(x = ethnicity, y = n , fill = pathway)) +
geom_bar(stat = "identity", position = "stack") +
geom_text(data = freq,
aes(x= ethnicity, y = f, label = f),
inherit.aes = FALSE) +
theme(legend.position = "bottom") +
scale_fill_manual(name = "",
values = c("light blue", "deepskyblue4"),
labels = c("a", "b")) +
xlab("") +
ylab("Number of Referrals") +
scale_y_continuous(breaks = seq(0, 2250, 250), expand = c(0,0))
Here is what it currently looks like
Since you are using the count as your y-axis position in geom_bar, you need to use the same thing in your geom_text to get the labels in the right place. Below is an example using mtcars dataset. Using vjust = -1 I put a little bit of space between the label and the bars to make it more legible and aesthetically pleasing.
library(tidyverse)
mtcars %>%
group_by(carb) %>%
summarise(n = n()) %>%
mutate(f = round(proportions(n) * 100, 1)) -> frq
mtcars %>%
group_by(gear) %>%
count(carb) -> df
df %>%
ggplot(aes(x = carb, y = n, fill = gear)) +
geom_bar(stat = "identity", position = "stack") +
geom_text(data = frq,
vjust = -1,
aes(x= carb, y = n, label = f),
inherit.aes = FALSE)
Created on 2022-10-31 by the reprex package (v2.0.1)
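The key point of the answer is that the label text and the label position are two different columns. A small base R sketch, assuming the same mtcars example, shows the summary table that geom_text() receives: `n` is the total bar height (the y position) and `f` is only the text to print.

```r
# Count cars per carb value; this total is the height of each stacked bar
carb_counts <- table(mtcars$carb)

frq <- data.frame(
  carb = as.numeric(names(carb_counts)),
  n    = as.vector(carb_counts)
)

# Percentage of all cars, used only as the label text
frq$f <- round(100 * frq$n / sum(frq$n), 1)

print(frq)
```

Mapping `y = f` places a label at the percentage value on a count axis, which is why the labels in the question ended up far from the bar tops.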

Horizontal percent total stacked bar chart with labels on each end

I have a simple data frame which has the probabilities that an id is real and fake, respectively:
library(tidyverse)
dat <- data.frame(id = "999", real = 0.7, fake = 0.3)
I know that I can show this as a horizontal bar chart using the code below:
dat %>%
gather(key = grp, value = prob, -id) %>%
ggplot(aes(x = id, y = prob, fill = grp)) +
geom_bar(stat = "identity") +
coord_flip()
But I was wondering if there was a way to show this in the same way as shown below, with the class labels and probabilities on either end of the bar chart?
Many thanks
A straightforward, maybe somewhat cheeky workaround is to re-define your 0.
I added a few calls that are not strictly necessary, but make it look closer to your example plot.
library(tidyverse)
dat <- data.frame(id = "999", real = -0.7, fake = 0.3) # note the minus sign!
dat %>%
gather(key = grp, value = prob, -id) %>%
ggplot(aes(x = id, y = prob, fill = grp)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = stringr::str_to_title(paste0(grp, " (", as.character(100*abs(prob)), "%)"))),
hjust = c(1,0))+
coord_flip(clip = "off") +
scale_fill_brewer(palette = "Greys") +
theme_void() +
theme(aspect.ratio = .1,
plot.margin = margin(r = 3, l = 3, unit = "lines"))
Created on 2021-02-06 by the reprex package (v0.3.0)
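The label logic in that answer can be checked on its own, without drawing anything. The sketch below uses base R only; `title_case` is a hypothetical helper standing in for stringr::str_to_title(), and the `hjust` column mirrors the `hjust = c(1,0)` argument so that each label points away from the bar end it belongs to.

```r
# Long-format version of the data, with the sign flipped for "real"
dat <- data.frame(grp = c("real", "fake"), prob = c(-0.7, 0.3))

# Hypothetical helper: capitalise the first letter of each string
title_case <- function(x) paste0(toupper(substring(x, 1, 1)), substring(x, 2))

# abs() hides the artificial minus sign from the reader
dat$label <- paste0(title_case(dat$grp), " (", round(100 * abs(dat$prob)), "%)")

# Right-align labels at the negative end, left-align at the positive end
dat$hjust <- ifelse(dat$prob < 0, 1, 0)

print(dat)
```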
I'm not sure this fully answers the question, but I think it will improve the plot. Can you try it out?
dat %>%
gather(key = grp, value = prob, -id) %>%
ggplot(aes(x = id, y = prob, fill = grp)) +
geom_bar(stat = "identity", position = "fill") +
scale_y_continuous("Proportion") +
scale_x_discrete("", expand = c(0,0)) +
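scale_fill_manual(values = c(real = "grey30", fake = "grey70")) + # explicit colours; scale_fill_identity() would treat "real"/"fake" as colour names and fail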
coord_flip()

How to automatically choose a good ylim to read geom_labels in ggplot2 in R

Suppose I write the following code with the diamonds dataset:
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(total_value = sum(price, na.rm = TRUE)) %>%
arrange(total_value) %>%
mutate(cut = as_factor(cut)) %>%
mutate(across(where(is.numeric), ~round(., 1))) %>%
ggplot(aes(x = cut, y = total_value)) +
geom_col(aes(fill = cut)) +
theme(legend.position = "none") +
coord_flip() +
geom_label(aes(label = paste0("$", total_value)), size = 6) +
labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "") +
theme(axis.text = element_text(size = rel(1)))
which outputs the following plot:
As you can see, it is impossible to read the last digit(s) of the first category ("Ideal").
I know I can simply write something like coord_flip(ylim = c(0, 80000000)) and this would solve the problem. My question is: what could I write instead so that ggplot2 automatically works out how much space it should provide in ylim for the geom_label()s to be clearly readable, without me having to set it manually?
I'm trying to create an automatic Dashboard with multiple plots such as this, but I cannot manually tune every one of those, I need an automatic mechanism and I haven't found anything regarding this on StackOverflow for geom_label() specifically.
Thanks.
Instead of positioning your label at the end of the bar, you could move it closer to the middle and adjust its position with hjust, so it won't spill out of the plot area, which is sized to include the bars.
library(tidyverse)
diamonds %>%
group_by(cut) %>%
summarize(total_value = sum(price, na.rm = TRUE)) %>%
arrange(total_value) %>%
mutate(cut = as_factor(cut)) %>%
mutate(across(where(is.numeric), ~round(., 1))) %>%
ggplot(aes(x = cut, y = total_value)) +
geom_col(aes(fill = cut)) +
theme(legend.position = "none") +
coord_flip() +
geom_label(aes(label = paste0("$", total_value), y = total_value/2), size = 6, hjust = 0.2) +
labs(title = "Total Value of Diamonds by Cut", y = "USD", x = "") +
theme(axis.text = element_text(size = rel(1)))
That gives:
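If the labels should stay at the bar ends, another option is to derive the upper limit from the data rather than typing it in. The sketch below is only an illustration: `padded_limits` is a hypothetical helper, the 15% padding is an arbitrary assumption that would need tuning to the label size, and `totals` holds made-up values rather than the diamonds totals.

```r
# Hypothetical helper: axis limits from 0 to the largest value plus some headroom
padded_limits <- function(values, pad = 0.15) {
  stopifnot(is.numeric(values), length(values) > 0, pad >= 0)
  c(0, max(values, na.rm = TRUE) * (1 + pad))
}

# Made-up totals, for illustration only
totals <- c(100, 250, 400)
lims <- padded_limits(totals)
print(lims)

# Possible use in the plot (not run here), with df being the summarised data:
#   coord_flip(ylim = padded_limits(df$total_value))
```

ggplot2 also offers a built-in route to the same effect: scale_y_continuous(expand = expansion(mult = c(0, 0.15))) adds proportional headroom at the upper end of the axis without computing limits by hand.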

How to make a bar chart with multiple columns from an Excel file?

How can I make a bar chart using barplot() or ggplot() from an Excel file that has 83 columns?
I need to plot every column that has a value greater than 0 in each row. (Each column represents a gene function, and I need to know how many functions there are in each cluster.)
I was trying this, but it didn't work:
ggplot(x, aes(x=Cluster, y=value, fill=variable)) +
geom_bar(stat="bin", position="dodge") +
theme_bw() +
ylab("Functions in cluster") +
xlab("Cluster") +
scale_fill_brewer(palette="Blues")
Link to the excel:
https://github.com/annabmarques/GenesCorazon/blob/master/AllclusPathwayEDIT.xlsx
What about a heatmap? A rough example:
library(dplyr)
library(tidyr)
library(ggplot2)
library(openxlsx)
data <- read.xlsx("AllclusPathwayEDIT.xlsx")
data <- data %>%
mutate(cluster_nr = row_number()) %>%
pivot_longer(cols = -c(Cluster, cluster_nr),
names_to = "observations",
values_to = "value") %>%
mutate(value = as.factor(value))
ggplot(data, aes(x = cluster_nr, y = observations, fill = value)) +
geom_tile() +
scale_fill_brewer(palette = "Blues")
Given the large number of observations consider breaking this up into multiple charts.
It's difficult to understand exactly what you're trying to do. Is this what you're trying to achieve?
#install.packages("readxl")
library(tidyverse)
library(readxl)
read_excel("AllclusPathwayEDIT.xlsx") %>%
pivot_longer(!Cluster, names_to = "gene_counts", values_to = "count") %>%
mutate(Cluster = as.factor(Cluster)) %>%
ggplot(aes(x = Cluster, y = count, fill = gene_counts)) +
geom_bar(position="stack", stat = "identity") +
theme(legend.position = "right",
legend.key.size = unit(0.4,"line"),
legend.text = element_text(size = 7),
legend.title = element_blank()) +
guides(fill = guide_legend(ncol = 1))
ggsave(filename = "example.pdf", height = 20, width = 35, units = "cm")
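If the goal is literally "how many functions are present in each cluster", the count can be computed before plotting. The sketch below uses a small made-up data frame in place of the Excel sheet; the column names `funcA`, `funcB`, `funcC` and the new column `n_functions` are assumptions for illustration, and only the `Cluster` column name is taken from the thread.

```r
# Made-up stand-in for the spreadsheet: one row per cluster, one column per function
x <- data.frame(
  Cluster = c("c1", "c2", "c3"),
  funcA   = c(2, 0, 1),
  funcB   = c(0, 0, 3),
  funcC   = c(5, 1, 0)
)

# Every column except the cluster identifier is treated as a gene function
function_cols <- setdiff(names(x), "Cluster")

# Number of functions with a value greater than 0 in each row (cluster)
x$n_functions <- rowSums(x[function_cols] > 0)

print(x[c("Cluster", "n_functions")])

# A simple bar chart of these counts could then be drawn with (not run here):
#   barplot(x$n_functions, names.arg = x$Cluster)
```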

How to sort the double bar using ggplot in r?

I am learning R and I have trouble sorting the double bars in ascending or descending order. I also want to place the legend at the top of the plot, with the two colours shown in one row and two columns, for example:
The title Time
box color Breakfast box color Dinner
And the plot here
Here is my dataframe:
dat <- data.frame(
time = factor(c("Breakfast","Breakfast","Breakfast","Breakfast","Breakfast","Lunch","Lunch","Lunch","Lunch","Lunch","Lunch","Dinner","Dinner","Dinner","Dinner","Dinner","Dinner","Dinner"), levels=c("Breakfast","Lunch","Dinner")),
class = c("a","a","b","b","c","a","b","b","c","c","c","a","a","b","b","b","c","c"))
And here is my code to make change:
dat %>%
filter(time %in% c("Breakfast", "Dinner")) %>%
droplevels %>%
count(time, class) %>%
group_by(time) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = class, y = prop, fill = time, label = scales::percent(prop))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = 0.9), vjust = 0.5, size = 3) +
scale_y_continuous(labels = scales::percent)+
coord_flip()
Any help would be appreciated.
Something like this should be close to what you are asking for. Feel free to ask if you need more.
Resources consulted during the answer: http://www.sthda.com/english/wiki/ggplot2-legend-easy-steps-to-change-the-position-and-the-appearance-of-a-graph-legend-in-r-software
Using part of the answer you can look further into https://ggplot2.tidyverse.org/reference/theme.html
library(tidyverse)
dat <- data.frame(
time = factor(c("Breakfast","Breakfast","Breakfast","Breakfast","Breakfast","Lunch","Lunch","Lunch","Lunch","Lunch","Lunch","Dinner","Dinner","Dinner","Dinner","Dinner","Dinner","Dinner"), levels=c("Breakfast","Lunch","Dinner")),
class = c("a","a","b","b","c","a","b","b","c","c","c","a","a","b","b","b","c","c"))
dat %>%
filter(time %in% c("Breakfast", "Dinner")) %>%
droplevels %>%
count(time, class) %>%
group_by(time) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = fct_reorder(class,prop), y = prop, fill = time, label = scales::percent(prop))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = 0.9), vjust = 0.5, size = 3) +
scale_y_continuous(labels = scales::percent)+
coord_flip() +
labs(x = "class",fill = "Time") +
theme(legend.position = "top", legend.direction="vertical", legend.title=element_text(hjust = 0.5,face = "bold",size = 12))
Created on 2020-05-08 by the reprex package (v0.3.0)
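The sorting in that answer comes from fct_reorder(class, prop), which reorders the levels of `class` by a summary of `prop` (the median by default). A base R sketch of the same idea uses reorder(), which summarises with the mean by default; with two values per class the mean and median coincide, so the resulting order is the same here. The name `class_sorted` is only illustrative.

```r
# Breakfast and Dinner rows from the question's data
dat <- data.frame(
  time  = c(rep("Breakfast", 5), rep("Dinner", 7)),
  class = c("a", "a", "b", "b", "c", "a", "a", "b", "b", "b", "c", "c")
)

# Proportion of each class within each time
props <- as.data.frame(
  prop.table(table(time = dat$time, class = dat$class), margin = 1),
  responseName = "prop"
)

# Reorder class levels by their average proportion across the two times
props$class_sorted <- reorder(props$class, props$prop)

print(levels(props$class_sorted))
```

Because both bars of a class share one axis position, the order is a compromise between Breakfast and Dinner. Sorting by a single time would need the summary to be computed from that subset only.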
Getting the legend title above the legend keys requires a few additional adjustments to the theme and guides.
dat %>%
filter(time %in% c("Breakfast", "Dinner")) %>%
droplevels %>%
count(time, class) %>%
group_by(time) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x = class, y = prop, fill = time, label = scales::percent(prop))) +
geom_col(position = 'dodge') +
geom_text(position = position_dodge(width = 0.9), vjust = 0.5, size = 3) +
scale_y_continuous(labels = scales::percent)+
coord_flip() +
theme(legend.position="top", legend.direction="vertical", legend.title=element_text(hjust = 0.5))+
guides(fill = guide_legend(title = "Time", nrow = 1))
