Customize ggplot2 legend with different variables - r

I have the following data about American and German teenagers' coding skills. I can easily display their bar plots, but I need to present the total number of teenagers from each country as well.
DF <- data.frame(code = rep(c("A","B","C"), each = 2),
Freq = c(441,121,700,866,45,95),
Country = rep(c("USA","Germany"),3),
Total = rep(c(1186,1082),3))
ggplot(DF, aes(code, Freq, fill = code)) + geom_bar(stat = "identity", alpha = 0.7) +
facet_wrap(~Country, scales = "free") +
theme_bw() +
theme(legend.position="none")
For example, instead of presenting the default legend for the code, I could replace it with the Country and the Total. Your help is appreciated.

Here's what I would suggest:
library(dplyr); library(ggplot2)
DF %>%
add_count(Country, wt = Freq) %>% # sum Freq per country; weighting by Total would add the repeated total three times
mutate(Country_total = paste0(Country, ": Total=", n)) %>%
ggplot(aes(code, Freq, fill = code)) + geom_bar(stat = "identity", alpha = 0.7) +
facet_wrap(~Country_total, scales = "free") +
theme_bw() +
theme(legend.position="none")
Doing exactly what you're requesting would take a different approach, because the information you describe is not strictly a ggplot2 legend (a legend explains how a variable is mapped to one of the plot's aesthetics). It is rather a table or annotation displayed alongside the plot, which can be generated separately and added to the figure using the patchwork or grid packages.
For instance:
library(patchwork); library(gridExtra)
ggplot(DF, aes(code, Freq, fill = code)) + geom_bar(stat = "identity", alpha = 0.7) +
facet_wrap(~Country, scales = "free") +
theme_bw() +
theme(legend.position="none") +
tableGrob(count(DF, Country, wt = Freq)) + # per-country totals; count() comes from dplyr
plot_layout(widths = c(2,1))
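If the goal is only to show each country's total next to its panel, another option is to leave the data untouched and relabel the facet strips instead. The following is a minimal sketch, assuming that Total is simply the sum of Freq within each country (which holds for the sample data); facet_labels is a name introduced here for illustration only.

```r
library(ggplot2)

DF <- data.frame(code = rep(c("A", "B", "C"), each = 2),
                 Freq = c(441, 121, 700, 866, 45, 95),
                 Country = rep(c("USA", "Germany"), 3),
                 Total = rep(c(1186, 1082), 3))

# Sum Freq within each country; for this data it reproduces the Total column
totals <- aggregate(Freq ~ Country, data = DF, FUN = sum)

# Named vector: names are the facet values, elements are the strip labels
facet_labels <- setNames(paste0(totals$Country, ": Total=", totals$Freq),
                         totals$Country)

p <- ggplot(DF, aes(code, Freq, fill = code)) +
  geom_col(alpha = 0.7) +
  facet_wrap(~Country, scales = "free", labeller = as_labeller(facet_labels)) +
  theme_bw() +
  theme(legend.position = "none")
p
```

Because the labels are built from the data, they stay in sync if the frequencies change.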

Related

How can I add a nested y-axis title in my graph?

I created a ggplot graph using geom_segment for certain subcategories and their cost.
df <- data.frame(category = c("A","A","A","A","A","A","B","B","B","B","B","B","B"),
subcat = c("S1","S2","S3","S4","S5","S6","S7","S8","S9","S10","S11","S12","S13"),
value = c(100,200,300,400,500,600,700,800,900,1000,1100,1200,1300))
df2 <- df %>%
arrange(desc(value)) %>%
mutate(subcat=factor(subcat, levels = subcat)) %>%
ggplot(aes(x=subcat, y=value)) +
geom_segment(aes(xend=subcat, yend=0)) +
geom_point(size=4, color="steelblue") +
geom_text(data=df, aes(x=subcat, y=value, label = dollar(value, accuracy = 1)), position = position_nudge(x = -0.3), hjust = "inward") +
theme_classic() +
coord_flip() +
scale_y_continuous(labels = scales::dollar_format()) +
ylab("Cost Value") +
xlab("subcategory")
df2
This code results in a graph that is shown below:
My main issue is I want the category variable on the left of the subcategory variables. It should look like this:
How do I add the category variables in the y-axis, such that it looks nested?
As mentioned in my comment, and adapting this post by @AllanCameron to your case, one option to achieve your desired result is the "facet trick", which uses faceting to get the nesting and some styling to remove the facet look:
Facet by category and free the scales and the space so that the distance between categories is the same.
Remove the spacing between panels and place the strip text outside of the axis text.
Additionally, set the expansion of the discrete x scale to .5 to ensure that the distance between categories is the same at the facet boundaries as inside the facets.
library(dplyr)
library(ggplot2)
library(scales)
df1 <- df %>%
arrange(desc(value)) %>%
mutate(subcat=factor(subcat, levels = subcat))
ggplot(df1, aes(x=subcat, y=value)) +
geom_segment(aes(xend=subcat, yend=0)) +
geom_point(size=4, color="steelblue") +
geom_text(data=df, aes(x=subcat, y=value, label = dollar(value, accuracy = 1)), position = position_nudge(x = -0.3), hjust = "inward") +
theme_classic() +
coord_flip() +
scale_y_continuous(labels = scales::dollar_format()) +
scale_x_discrete(expand = c(0, .5)) +
facet_grid(category~., scales = "free_y", switch = "y", space = "free_y") +
ylab("Cost Value") +
xlab("subcategory") +
theme(panel.spacing.y = unit(0, "pt"), strip.placement = "outside")
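The same facet trick can also be written without coord_flip() by mapping the subcategory to y directly. This is only a sketch under a few assumptions: it rebuilds the question's df, leaves out the text labels for brevity, and uses reorder() in place of the arrange()/factor() step.

```r
library(ggplot2)

df <- data.frame(category = rep(c("A", "B"), times = c(6, 7)),
                 subcat = paste0("S", 1:13),
                 value = seq(100, 1300, by = 100))

# Order the subcategories by descending value, as in the answer above
df$subcat <- reorder(df$subcat, -df$value)

p <- ggplot(df, aes(x = value, y = subcat)) +
  geom_segment(aes(xend = 0, yend = subcat)) +
  geom_point(size = 4, color = "steelblue") +
  scale_x_continuous(labels = scales::dollar_format()) +
  # No multiplicative expansion, half a category of padding at each end
  scale_y_discrete(expand = c(0, 0.5)) +
  # Free y scale and free space keep the row height equal across categories
  facet_grid(category ~ ., scales = "free_y", space = "free_y", switch = "y") +
  theme_classic() +
  theme(panel.spacing.y = unit(0, "pt"), strip.placement = "outside") +
  labs(x = "Cost Value", y = "subcategory")
p
```

With the discrete variable already on y, the faceting arguments refer to the axis you actually see, which can make the call easier to read.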

Faceted bar plot with observation name adjacent to bar for each group and space=free

Using ggplot I'm trying to make something like a faceted barplot where
bars representing the same value are the same size (sort of like space = "free")
names are adjacent to bars (sort of like scales = "free_y")
graphs are generated with code - no trial and error adjustment of size or scale or stuff
I'm open to a multi-plot solution with something like cowplot::plot_grid
Here's a sample dataset.
df <- data.frame(name = c('A very long name','A short name','A really truly long big name that is very long','One shorter name'),
value =c(100,50,10,10),
group = c(2022,2022,2022,2021))
What I'm aiming for would look something like this:
Two things I've tried and rejected:
ggplot(df,
aes(x = name, y = value)) +
geom_col(aes(fill = -value)) +
coord_flip() +
facet_grid(~group, space = "free", scales = "free_x") +
theme(legend.position = "none")
ggplot(df,
aes(x = name, y = value)) +
geom_col(aes(fill = -value)) +
coord_flip() +
facet_wrap(~group, scales = "free_y") +
theme(legend.position = "none")
Here is a solution using vanilla ggplot2, taken from a related ggplot2 issue. You can use the fact that breaks and limits arguments accept functions. Below, we use that to pad limits with dummy names, and then use the breaks function to censor the dummy names. It requires you to know the maximum number of categories on a facet beforehand though.
library(ggplot2)
df <- data.frame(name = c('A very long name','A short name','A really truly long big name that is very long','One shorter name'),
value =c(100,50,10,10),
group = c(2022,2022,2022,2021))
max_categories <- 3
ggplot(df,
aes(y = name, x = value)) +
geom_col(aes(fill = -value)) +
scale_y_discrete(
limits = function(x) {
y <- paste0("dummy", seq_len(max_categories))
c(y[seq_len(max_categories - length(x))], x)
},
breaks = function(x) {
x[!startsWith(x, "dummy")]
}
) +
facet_wrap(~group, scales = "free_y") +
theme(legend.position = "none")
Created on 2021-05-09 by the reprex package (v0.3.0)
A few side notes: I switched the x and y aesthetics to make coord_flip() unnecessary. Also, if you want the panels to adjust their width in response to the data, you can use facet_grid() with scales = "free" and space = "free_x" (facet_wrap() has no space argument).
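To see what the two scale functions in the vanilla ggplot2 answer do, they can be pulled out and called on their own. In this sketch, pad_limits and drop_dummies are hypothetical names for the anonymous limits and breaks functions; the logic is otherwise copied from the answer.

```r
max_categories <- 3

# Pads a facet's categories with dummy entries up to max_categories
pad_limits <- function(x) {
  y <- paste0("dummy", seq_len(max_categories))
  c(y[seq_len(max_categories - length(x))], x)
}

# Hides the dummy entries again so they get no axis label
drop_dummies <- function(x) {
  x[!startsWith(x, "dummy")]
}

# The 2021 facet has a single name, so two dummies are added in front of it
padded <- pad_limits("One shorter name")
padded
# "dummy1" "dummy2" "One shorter name"

drop_dummies(padded)
# "One shorter name"

# A facet that already has three names is left unchanged
pad_limits(c("a", "b", "c"))
# "a" "b" "c"
```

Every facet therefore gets the same number of slots, which is what makes the bars equally thick. Note that seq_len() fails on a negative argument, so a facet with more than max_categories names would raise an error.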
With patchwork you could try:
library(ggplot2)
library(dplyr)
library(patchwork)
df <- data.frame(name = c('A very long name','A short name','A really truly long big name that is very long','One shorter name'),
value =c(100,50,10,10),
group = c(2022,2022,2022,2021))
# plots could be simplified with a function and appearance edited to suit your needs
p2022 <-
ggplot(data = filter(df, group == 2022), aes(x = name, y = value)) +
geom_col(aes(fill = -value)) +
coord_flip() +
labs(x = NULL) +
facet_grid(~group) +
theme(legend.position = "none")
p2021 <-
ggplot(data = filter(df, group == 2021), aes(x = name, y = value)) +
geom_col(aes(fill = -value)) +
coord_flip() +
scale_y_continuous(limits = c(0, max(df$value)))+
labs(x = NULL) +
facet_grid(~group) +
theme(legend.position = "none")
# define the plotting layout
design <- "
12
#2
#2"
# plot
p2021 + p2022 + plot_layout(design = design)
Created on 2021-05-09 by the reprex package (v2.0.0)
Another approach could be:
ggplot(df,
aes(x = name, y = value)) +
geom_col(aes(fill = -value)) +
coord_flip() +
facet_wrap(~group)+
theme(legend.position = "none")
This could be an alternative approach:
p <- ggplot(df,
aes(x = name, y = value)) +
geom_col(aes(fill = -value)) +
coord_flip() +
facet_grid(group~., space = "free", scales = "free") +
theme(legend.position = "none")

Plotting multiple Pie Charts with label in one plot

I came across this question the other day and tried to re-create it for myself: "ggplot, facet, piechart: placing text in the middle of pie chart slices". My data is in a very similar format, but sadly the accepted answer did not help, which is why I am reposting.
I essentially want to create the accepted answer but with my own data, yet the issue I run into is that coord_polar does not support free scale. Using the first answer:
I tried it using the second version of the answer, with the dplyr version, but I also do not get my desired output. Using the second answer:
Clearly neither of these has the desired effect. I would prefer to create one with six pie charts, as follows (only four are shown as an example):
I did this in Excel, but with one legend and no background grid.
Code
title<-c(1,1,2,2,3,3,4,4,5,5,6,6)
type<-c('A','B','A','B','A','B','A','B','A','B','A','B')
value<-c(0.25,0.75,0.3,0.7,0.4,0.6,0.5,0.5,0.1,0.9,0.15,0.85)
piec<-data.frame(title,type,value)
library(tidyverse)
p1<-ggplot(data = piec, aes(x = "", y = value, fill = type)) +
geom_bar(stat = "identity") +
geom_text(aes(label = value), position = position_stack(vjust = 0.5)) +
coord_polar(theta = "y")
#facet_grid(title ~ ., scales = "free")
p1
piec <- piec %>% group_by(title) %>% mutate(pos=cumsum(value)-0.5*value)
p2<-ggplot(data = piec) +
geom_bar(aes(x = "", y = value, fill = type), stat = "identity") +
geom_text(aes(x = "", y = pos, label = value)) +
coord_polar(theta = "y")
#facet_grid(Channel ~ ., scales = "free")
p2
You don't have to supply different y values for geom_text and geom_bar (use y = value for both of them). Next you have to specify position in geom_text. Finally, remove scales from facets.
library(ggplot2)
title<-c(1,1,2,2,3,3,4,4,5,5,6,6)
type<-c('A','B','A','B','A','B','A','B','A','B','A','B')
value<-c(0.25,0.75,0.3,0.7,0.4,0.6,0.5,0.5,0.1,0.9,0.15,0.85)
piec<-data.frame(title,type,value)
ggplot(piec, aes("", value, fill = type)) +
geom_bar(stat = "identity", color = "white", size = 1) +
geom_text(aes(label = paste0(value * 100, "%")),
position = position_stack(vjust = 0.5),
color = "white", size = 3) +
coord_polar(theta = "y") +
facet_wrap(~ title, ncol = 3) +
scale_fill_manual(values = c("#0048cc", "#cc8400")) +
theme_void()
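Dropping the free scales works here because every pie already adds up to the same total, so all facets can share one angular scale. A quick check of that assumption is sketched below; the share column is a hypothetical addition, only needed for data where the groups do not sum to the same value.

```r
title <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
type <- c('A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B')
value <- c(0.25, 0.75, 0.3, 0.7, 0.4, 0.6, 0.5, 0.5, 0.1, 0.9, 0.15, 0.85)
piec <- data.frame(title, type, value)

# Total per pie; compared with a tolerance because the values are floating point
totals <- tapply(piec$value, piec$title, sum)
same_total <- all(abs(totals - 1) < 1e-9)
same_total

# If the totals differed, rescaling each value to its share of the group
# total would make every pie a full circle on a shared scale
piec$share <- piec$value / ave(piec$value, piec$title, FUN = sum)
```

Plotting share instead of value would then give complete pies in every facet without needing scales = "free".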

Make overlapping histogram with geom_histogram

I am trying to make an overlapping histogram like this:
ggplot(histogram, aes = (x), mapping = aes(x = value)) +
geom_histogram(data = melt(tpm_18_L_SD), breaks = seq(1,10,by = 1),
aes(y = 100*(..count../sum(..count..))), alpha=0.2) +
geom_histogram(data = melt(tpm_18_S_SD), breaks = seq(1,10,by = 1),
aes(y = 100*(..count../sum(..count..))), alpha=0.2) +
geom_histogram(data = melt(tpm_18_N_SD), breaks = seq(1,10,by = 1),
aes(y = 100*(..count../sum(..count..))), alpha=0.2) +
facet_wrap(~variable, scales = 'free_x') +
ylim(0, 20) +
ylab("Percentage of Genes") +
xlab("Standard Deviation")
My code can only make them plot side by side and I would like to also make them overlap. Thank you! I based mine off of the original post where this came from but it did not work for me. It was originally 3 separate graphs which I combined with grid and ggarrange. It looks like this right now.
Here is the code of the three separate graphs.
SD_18_L <- ggplot(data = melt(tpm_18_L_SD), mapping = aes(x = value)) +
geom_histogram(aes(y = 100*(..count../sum(..count..))), breaks = seq(1, 10, by = 1)) +
facet_wrap(~variable, scales = 'free_x') +
ylim(0, 20) +
ylab("Percentage of Genes") +
xlab("Standard Deviation")
SD_18_S <- ggplot(data = melt(tpm_18_S_SD), mapping = aes(x = value)) +
geom_histogram(aes(y = 100*(..count../sum(..count..))), breaks = seq(1, 10, by = 1)) +
facet_wrap(~variable, scales = 'free_x') +
ylim(0, 20) +
ylab("Percentage of Genes") +
xlab("Standard Deviation")
SD_18_N <- ggplot(data = melt(tpm_18_N_SD), mapping = aes(x = value)) +
geom_histogram(aes(y = 100*(..count../sum(..count..))), breaks = seq(1, 10, by = 1)) +
facet_wrap(~variable, scales = 'free_x') +
ylim(0, 20) +
ylab("Percentage of Genes") +
xlab("Standard Deviation")
What my graphs look like now:
ggplot expects dataframes in a long format. I'm not sure what your data looks like, but you shouldn't have to call geom_histogram for each category. Instead, get all your data into a single dataframe (you can use rbind for this) in long format (what you're doing already with melt) first, then feed it into ggplot and map fill to whatever your categorical variable is.
Your call to facet_wrap is what puts them in 3 different plots. If you want them all on the same plot, take that line out.
An example using the iris data:
ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
geom_histogram(alpha = 0.6, position = "identity")
I decreased alpha in geom_histogram so you can see where colors overlap, and added position = "identity" so observations aren't being stacked. Hope that helps!
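Applied to three separate data frames, the rbind step could look roughly like this. The data below is randomly generated and purely hypothetical (L, S and N stand in for melt(tpm_18_L_SD) and the other two melted data frames, which are not shown in the question), and the sample column is a name introduced here.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-ins for the three melted data frames
L <- data.frame(value = runif(200, min = 1, max = 10))
S <- data.frame(value = runif(200, min = 1, max = 10))
N <- data.frame(value = runif(200, min = 1, max = 10))

# Stack them into one long data frame, tagging each row with its source
combined <- rbind(cbind(L, sample = "L"),
                  cbind(S, sample = "S"),
                  cbind(N, sample = "N"))

# One geom_histogram call; fill separates the groups and
# position = "identity" overlays them instead of stacking
p <- ggplot(combined, aes(x = value, fill = sample)) +
  geom_histogram(breaks = seq(1, 10, by = 1), alpha = 0.4,
                 position = "identity") +
  ylab("Number of Genes") +
  xlab("Standard Deviation")
p
```

This sketch plots raw counts; converting them to percentages within each group would need an extra step that is not shown here.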

ggplot2 pie chart bad position of labels

Sample data
data <- data.frame(Country = c("Mexico","USA","Canada","Chile"), Per = c(15.5,75.3,5.2,4.0))
I tried to set the position of the labels.
ggplot(data =data) +
geom_bar(aes(x = "", y = Per, fill = Country), stat = "identity", width = 1) +
coord_polar("y", start = 0) +
theme_void()+
geom_text(aes(x = 1.2, y = cumsum(Per), label = Per))
But the pie chart actually looks like this:
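You have to sort the data into the order in which the slices are stacked before calculating the cumulative sum (by default ggplot2 stacks the fill groups in reverse order, so the last Country comes first). Then, you can optimize the label position, e.g. by subtracting half of Per:
library(tidyverse)
data %>%
arrange(desc(Country)) %>%
mutate(Per_cumsum=cumsum(Per)) %>%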
ggplot(aes(x=1, y=Per, fill=Country)) +
geom_col() +
geom_text(aes(x=1,y = Per_cumsum-Per/2, label=Per)) +
coord_polar("y", start=0) +
theme_void()
PS: geom_col uses stat_identity by default: it leaves the data as is.
Or simply use position_stack, which computes the stacked positions for you:
data %>%
ggplot(aes(x=1, y=Per, fill=Country)) +
geom_col() +
geom_text(aes(label = Per), position = position_stack(vjust = 0.5))+
coord_polar(theta = "y") +
theme_void()
From the help:
# To place text in the middle of each bar in a stacked barplot, you
# need to set the vjust parameter of position_stack()
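What position_stack(vjust = 0.5) computes can be reproduced by hand, which also shows why the row order matters for a manual cumulative sum. This sketch assumes ggplot2's default stacking, where the fill groups are stacked in reverse order (last Country first); stacked, upper and mid are names introduced here.

```r
data <- data.frame(Country = c("Mexico", "USA", "Canada", "Chile"),
                   Per = c(15.5, 75.3, 5.2, 4.0))

# Assumed default stacking order: reverse alphabetical by Country
stacked <- data[order(data$Country, decreasing = TRUE), ]

# Upper edge of each slice, then its midpoint
stacked$upper <- cumsum(stacked$Per)
stacked$mid <- stacked$upper - stacked$Per / 2
stacked[, c("Country", "Per", "mid")]
# USA 37.65, Mexico 83.05, Chile 92.8, Canada 97.4
```

If the rows were sorted by Per instead, Canada (5.2) would come before Chile (4.0) and the labels of those two slices would land in each other's segment.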
