How to reorder facet_wrap based on two variables - r

I need to make a bar plot based on two variables (points and type) with fill.
Below is a minimal example. I would like to rank the players (the facets) in two ways: by their points as a guard, and by their points as either guard or forward.
I tried ~reorder(names, -c(type, points)) but it doesn't work.
library(ggplot2)
library(dplyr)
name <- c("James Harden","James Harden","Lebron James","Lebron James","Lebron James","Kawhi Leonard","Kawhi Leonard","Klay Thompson","Steph Curry","Kevin Durant","Kevin Durant","Chris Paul","Chris Paul")
team <- c("HOU","OKC","LAL","MIA","CLE","SAS","TOR","GSW","GSW","GSW","OKC","HOU","LAC")
points <- c(2000,12000,2000,10000,20000,7000,2000,14000,20000,6000,18000,4000,14000)
type <- c("G","G","F","G","F","G","G","G","G","F","F","G","G")
nba <- data.frame(name,team,points,type)
nba <- nba %>% arrange(desc(type))
ggplot(nba, aes(x = type, y = points, fill = team)) +
geom_bar(stat = 'identity', position = 'stack', color = 'black') +
facet_wrap(~reorder(name,-points), ncol = 1, strip.position = "top") +
coord_flip() + theme_minimal() +
labs(x = "players", y = "points", title = "Rank by points as Guard")
If it's ranked by points as guard, I would like to see Steph Curry at the top, Chris Paul second, James Harden and Klay tied for third, Lebron fifth, Kawhi sixth, and KD at the bottom.
If it's ranked by points as either guard or forward, I'd like to see Lebron at the top, KD second, and so on.

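You can sort by points as guard by adding a helper column that keeps only the points scored as a guard (zero otherwise) and passing sum to reorder, so that each player's rows are totalled rather than averaged (reorder uses mean by default). See below: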
library(ggplot2)
library(dplyr)
nba %>%
mutate(guardpoints = points * (type=="G")) %>%
ggplot(aes(x = type, y = points, fill = team)) +
geom_bar(stat = 'identity', position = 'stack', color = 'black') +
facet_wrap(~reorder(name, -guardpoints, sum), ncol = 1, strip.position = "top") +
coord_flip() + theme_minimal() +
labs(x = "players", y = "points", title = "Rank by points as Guard")
nba %>%
ggplot(aes(x = type, y = points, fill = team)) +
geom_bar(stat = 'identity', position = 'stack', color = 'black') +
facet_wrap(~reorder(name, -points, sum), ncol = 1, strip.position = "top") +
coord_flip() + theme_minimal() +
labs(x = "players", y = "points", title = "Rank by points")
Created on 2019-06-04 by the reprex package (v0.3.0)
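To check which order the two facet_wrap calls should produce, the per-player totals that reorder(..., sum) ranks on can be computed directly. A minimal sketch with dplyr, reusing the question's data (spelling Kawhi Leonard); the column names guard_points and total_points are illustrative, not part of the original answer:

```r
library(dplyr)

# Same data as in the question
name <- c("James Harden","James Harden","Lebron James","Lebron James","Lebron James",
          "Kawhi Leonard","Kawhi Leonard","Klay Thompson","Steph Curry","Kevin Durant",
          "Kevin Durant","Chris Paul","Chris Paul")
team <- c("HOU","OKC","LAL","MIA","CLE","SAS","TOR","GSW","GSW","GSW","OKC","HOU","LAC")
points <- c(2000,12000,2000,10000,20000,7000,2000,14000,20000,6000,18000,4000,14000)
type <- c("G","G","F","G","F","G","G","G","G","F","F","G","G")
nba <- data.frame(name, team, points, type)

# Per-player sums: guard-only points (the helper-column idea) and all points
ranking <- nba %>%
  group_by(name) %>%
  summarise(guard_points = sum(points[type == "G"]),
            total_points = sum(points)) %>%
  arrange(desc(guard_points))
ranking

# Level order produced by reorder(name, -guardpoints, sum), i.e. the facet order
guard_order <- levels(reorder(nba$name, -nba$points * (nba$type == "G"), sum))
guard_order
```

Ties (James Harden and Klay Thompson, both 14000 as guard) should keep the alphabetical order in which tapply() returns the players, so Harden is expected before Klay.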

Related

How to add a legend in a ggplot?

I am having a problem with my ggplot: I cannot insert a legend. I just want to show the total number of facilities per region (manually).
Here is my code:
Note: my CSV file has 17,333 IDs; I wondered whether that might be the cause, but I'm not sure.
library(ggplot2)
library(dplyr)
library(ggthemes)
library(tidyverse)
doh = read.csv("doh.csv")
doh %>%
ggplot(aes( y = region, color = region)) +
geom_bar(position = "identity", size = 0.7, alpha = 0.8, fill = "#28d1eb", colour="black") +
labs(title = "Total Number of COVID-19 Facilities per Region",
x = "Count",
y = "Region") +
theme_minimal() +
theme(plot.title = element_text(lineheight=6, face="bold", color="black",size=15))
I have tried inserting a legend in my code, but it isn't working and I'm not sure where I went wrong.
Code:
library(ggplot2)
library(dplyr)
library(ggthemes)
library(tidyverse)
doh = read.csv("doh.csv")
doh %>%
ggplot(aes( y = region, fill = region, color = region)) +
geom_bar(position = "identity", size = 1.0, alpha = 0.8, fill = "#28d1eb", colour="black") +
labs(title = "Total Number of COVID-19 Facilities per Region") +
theme_minimal() +
theme(plot.title = element_text(lineheight=6, face="bold", color="black",size=15)) +
barplot(data,
col = c("#f2b50c", "#d96fe3")) +
legend("topright",
legend = c("REGION XIII (CARAGA)"))
PS: I only included "REGION XIII (CARAGA)" because I just wanted to see whether it works, but it doesn't.
Thanks in advance!
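Here is a way: count the regions before piping to ggplot, then label each bar with its count. No legend can appear in your code because fill = "#28d1eb" and colour = "black" are set as constants in geom_bar, which overrides the fill and color mappings; barplot() and legend() are base-graphics functions and cannot be added to a ggplot with +.
The example below uses the diamonds data set; substitute region for clarity and the code should work.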
suppressPackageStartupMessages({
library(ggplot2)
library(dplyr)
})
data(diamonds)
diamonds %>%
select(clarity) %>%
count(clarity, name = "Count") %>%
ggplot(aes(x = clarity, y = Count, fill = clarity)) +
geom_col(alpha = 0.8, fill = "#28d1eb", colour = "black") +
geom_text(aes(label = Count), hjust = -0.1) +
coord_flip() +
labs(title = "Total Number of COVID-19 Facilities per Region") +
theme_minimal(base_size = 15) +
theme(plot.title = element_text(lineheight = 6, face = "bold", color="black"))
Created on 2022-11-21 with reprex v2.0.2
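If an actual legend with the totals is wanted (rather than text labels on the bars), one option is to keep the fill mapping, drop the constant fill, and relabel the legend keys with their counts. A sketch using the same diamonds/clarity stand-in; label_with_count is a hypothetical helper, and it relies on scale_fill_discrete() accepting a function for labels:

```r
library(ggplot2)
library(dplyr)

# Count rows per category first (diamonds stands in for the doh data)
counts <- diamonds %>% count(clarity, name = "Count")

# Hypothetical helper: turn each legend key into "level (count)"
label_with_count <- function(x) {
  paste0(x, " (", counts$Count[match(x, counts$clarity)], ")")
}

p <- ggplot(counts, aes(x = clarity, y = Count, fill = clarity)) +
  # no constant fill here, so the fill mapping (and its legend) is kept
  geom_col(alpha = 0.8, colour = "black") +
  coord_flip() +
  scale_fill_discrete(labels = label_with_count) +
  labs(title = "Total Number of COVID-19 Facilities per Region",
       x = "Region", y = "Count", fill = "Region") +
  theme_minimal(base_size = 15)

p
```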

How to graph two different columns on one ggplot?

I am trying to plot one column by Date (points in a different color for each animal category) and, on the same graph, plot a second column by Date as well. The second column has entries for every day, but only for one category, Large Dog: there is no adoption_with_discount for small or medium dogs (please see the reproducible example data set, example_data). When I plot them separately they display fine, but not when plotted together. I thought I would just overlay a separate geom, but that is not working.
I want to combine the two plots into one. My goal is for the points plot to have the line graph on top of it. I am trying to visualize the adoption as points colored by animal and put a line on the same graph of adoption_with_discount.
Thank you for your help!
# Make example -----------------------------------------------------------
# Here is an example data set
# You can see in the `adoption_with_discount` the values I want to add as a line.
library(lubridate)
library(tidyverse)
example_days <- data.frame(Date = c(seq.Date(from = as.Date('2022-03-01'), to = as.Date('2022-04-30'), by = 'days')))
example_small <-
example_days %>%
mutate(animal = "Small Dog")
a <- sample(100:150, nrow(example_small), replace = TRUE)
example_small <-
example_small %>%
mutate(adoption = a,
adoption_with_discount = NA)
example_med <-
example_days %>%
mutate(animal = "Medium Dog")
b <- sample(150:180, nrow(example_med), replace = TRUE)
example_med <-
example_med %>%
mutate(adoption = b,
adoption_with_discount = NA)
example_large <-
example_days %>%
mutate(animal = "Large Dog")
c <- sample(150:200, nrow(example_large), replace = TRUE)
example_large <-
example_large %>%
mutate(adoption = c)
example_large <-
example_large %>%
mutate(adoption_with_discount = adoption - 15)
example_data <- rbind(example_small, example_med, example_large)
# Plot --------------------------------------------------------------------
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
ggtitle("Dog Adoption by Size") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
# Plot with Fee -----------------------------------------------------------
# This is where the problem is occurring
# When I want to add a line that plots the adoption with discount by day
# on top of the points, it does not populate.
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
# See if just Discount will Plot -----------------------------------------
#This plots separately
ggplot(data = example_large) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Discount") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
While subsetting is an option to fix the issue, the reason no line is plotted is simply the missing grouping: in geom_line you are trying to plot the observations for all three dog types as one group, i.e. one line. Because two of the three values at each date are NA, every non-missing value sits between NAs, so no line segment can be drawn. An easy way to solve this is to explicitly map animal on the group aesthetic. Additionally, I added na.rm = TRUE to silence the warning about removed NAs. Finally, I right-aligned your axis labels by adding hjust = 1:
library(ggplot2)
ggplot(data = example_data) +
geom_point(mapping = aes(
x = Date,
y = adoption,
color = animal
)) +
geom_line(
mapping = aes(
x = Date,
y = adoption_with_discount,
group = animal
),
color = "black",
na.rm = TRUE
) +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
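The grouping explanation can be checked on a tiny, hypothetical slice of example_data (two dates, three animals). count_segments is an illustrative helper that counts consecutive pairs in which neither value is NA, which is roughly what a line needs in order to draw a segment:

```r
# Hypothetical miniature of example_data: 2 dates x 3 animals,
# where only Large Dog has adoption_with_discount values
mini <- data.frame(
  Date = rep(as.Date(c("2022-03-01", "2022-03-02")), each = 3),
  animal = rep(c("Small Dog", "Medium Dog", "Large Dog"), times = 2),
  adoption_with_discount = c(NA, NA, 150, NA, NA, 160)
)

# Hypothetical helper: consecutive pairs where neither end is NA
count_segments <- function(y) {
  if (length(y) < 2) return(0L)
  sum(!is.na(head(y, -1)) & !is.na(tail(y, -1)))
}

# One group (no group aesthetic): values in Date order, NAs interleaved
one_group <- mini$adoption_with_discount[order(mini$Date)]
count_segments(one_group)

# group = animal: each animal is its own line
per_animal <- sapply(split(mini, mini$animal), function(d) {
  count_segments(d$adoption_with_discount[order(d$Date)])
})
per_animal
```

With one group every value is flanked by NAs, so nothing can be connected; split by animal, the Large Dog series is an unbroken run.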
Based on the discussion here, I found that you can use a subset aesthetic in the aes of geom_line to select the values that are not NA in the adoption_with_discount column. (Note that subset is not a documented ggplot2 aesthetic, and recent versions may warn that it is unknown; the line appears because the logical TRUE/FALSE values are treated as a discrete variable and therefore split the data into separate groups, much like group = animal.)
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(mapping = aes(x = Date,
y = adoption_with_discount,
subset = !is.na(adoption_with_discount)),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))
The result:
It looks like the NAs included in the geom_line layer are causing the issue, so you can filter them out before plotting the line (filter() comes from dplyr, which is loaded with tidyverse):
ggplot(data = example_data) +
geom_point(mapping = aes(x = Date,
y = adoption,
color = animal)) +
geom_line(data=example_data %>% filter(!is.na(adoption_with_discount)),
mapping = aes(x = Date,
y = adoption_with_discount),
color = "black") +
ggtitle("Dog Adoption by Size with Discount Included") +
labs(x = "Date", y = "Adoption Fee") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.text.x = element_text(angle = 45))

Set size line plot with different y axis as addition to a stacked barplot

I would like to plot a stacked barplot with an added line plot that shows the overall set sizes. Plotting the stacked barplot in ggplot2 works without problems; the additional line with a different y axis is the difficulty. I'm using a long-format table as input, so there is no 'overall size' column.
Code to reproduce sample table:
library(data.table)
library(ggplot2)
library(scales)
df <- data.frame(Sample=c("S1","S2","S3","S4","S5","S6"), A=c(30,52,50,81,23,48), B=c(12,20,15,22,30,14), C=c(rep(15,6)))
df.melt <- melt(setDT(df), id.vars = "Sample", variable.name = "Group")
Head of the table:
Sample Group value
1: S1 A 30
2: S2 A 52
3: S3 A 50
4: S4 A 81
5: S5 A 23
6: S6 A 48
Code to draw stacked barplot:
ggplot(df.melt, aes(x = Sample, y = value, fill = Group)) +
geom_col(position = position_fill(reverse = TRUE)) +
theme(axis.text.x=element_text(angle=45, hjust=1), legend.title=element_blank()) +
scale_fill_brewer(palette="Set3") +
ylab("% of Total") +
scale_y_continuous(labels = percent) +
scale_x_discrete(limits = unique(df.melt$Sample))
The line should run through the six stacked bars and mark the size of each set, e.g. for sample S1 it would be 57 (A + B + C), and y-axis labels on the right of the plot should show the range of set sizes.
You can pass a data set directly to each geom, which lets you use a different data set for each geom. Secondary axes are a bit tricky: they must be a transformation of the primary axis, so the data have to be rescaled accordingly. I've used 120 as the scaling factor, just above the largest set size (118, for S4), so the line stays within the 0-100% range of the filled bars.
library(dplyr)
# labels for the default primary-axis breaks 0, .25, .5, .75, 1
percent <- c("0%", "25%", "50%", "75%", "100%")
# total set size per sample (A + B + C)
set_sizes <- df %>%
mutate(Size = A + B + C)
ggplot() +
geom_col(data = df.melt, mapping = aes(x = Sample, y = value, fill = Group), position = position_fill(reverse = TRUE)) +
# divide by 120 so the line fits the 0-1 range of the filled bars
geom_line(data = set_sizes, mapping = aes(x = Sample, y = Size / 120, group = 1)) +
# the secondary axis undoes the scaling
scale_y_continuous(name = "% of Total", labels = percent, sec.axis = sec_axis(~ .*120, name = "Sample Size")) +
theme(axis.text.x=element_text(angle=45, hjust=1), legend.title=element_blank()) +
scale_fill_brewer(palette="Set3") +
scale_x_discrete(limits = unique(df.melt$Sample))
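The factor 120 can be checked with a little arithmetic on the set sizes. A sketch in base R; rounding the largest size up to the next multiple of 10 is just one assumed way to pick the factor, not part of the original answer:

```r
df <- data.frame(Sample = c("S1","S2","S3","S4","S5","S6"),
                 A = c(30,52,50,81,23,48), B = c(12,20,15,22,30,14), C = rep(15, 6))

# Set size per sample: S1 = 57, ..., S4 = 118 (the largest)
sizes <- df$A + df$B + df$C
sizes

# Assumed rule: round the largest size up to the next multiple of 10
scale_factor <- ceiling(max(sizes) / 10) * 10
scale_factor

scaled <- sizes / scale_factor       # values drawn against the 0-1 primary axis
recovered <- scaled * scale_factor   # what sec_axis(~ . * scale_factor) displays
```

Any factor of at least 118 keeps the line inside the bars' 0-1 range; a rounder number simply gives tidier secondary-axis breaks.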
Alternatively, you can use cowplot to arrange two independent plots on top of each other, e.g.:
suppressMessages(invisible(lapply(c("data.table", "ggplot2", "cowplot"),
require, character.only=TRUE)))
df <- data.table(Sample=c("S1","S2","S3","S4","S5","S6"),
A=c(30,52,50,81,23,48), B=c(12,20,15,22,30,14), C=c(rep(15,6)))
df.melt <- melt(df, id.vars = "Sample", variable.name = "Group")
percent <- paste0(sprintf("%s", seq(0, 100, 25)), "%")
p1 <- ggplot(df.melt, aes(x = Sample, y = value, fill = Group)) +
geom_col(position = position_fill(reverse = TRUE)) +
theme(axis.text.x=element_text(angle=45, hjust=1), legend.title=element_blank()) +
scale_fill_brewer(palette="Set3") +
ylab("% of Total") +
scale_y_continuous(labels = percent) +
scale_x_discrete(limits = unique(df.melt$Sample))
p2 <- ggplot(df.melt[, .(value=sum(value)), by="Sample"],
aes(x = Sample, y = value, group=1)) +
geom_line() +
scale_x_discrete(labels = NULL, breaks = NULL) +
labs(x = NULL)
plot_grid(p2, NULL, p1, align="hv", nrow=3, axis='tlbr', rel_heights=c(1, -.28, 4), greedy=FALSE)
Created on 2022-02-20 by the reprex package (v2.0.1)

sankey/alluvial diagram with percentage and partial fill in R

I would like to modify an existing Sankey plot made with ggplot2 and ggalluvial to make it more appealing.
My example is from https://corybrunson.github.io/ggalluvial/articles/ggalluvial.html:
library(ggplot2)
library(ggalluvial)
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject,
y = freq,
fill = response, label = response)) +
scale_x_discrete(expand = c(.1, .1)) +
geom_flow() +
geom_stratum(alpha = .5) +
geom_text(stat = "stratum", size = 3) +
theme(legend.position = "none") +
ggtitle("vaccination survey responses at three points in time")
Created on 2020-10-01 by the reprex package (v0.3.0)
Now I would like to change this plot so that it looks similar to a plot from https://sciolisticramblings.wordpress.com/2018/11/23/sankey-charts-the-new-pie-chart/, i.e. (1) change absolute values to relative values (percentages), (2) add percentage labels, and (3) apply a partial fill (e.g. highlighting only "Missing" and "Never").
My approach:
I think I could change the axis to percentage with something like: scale_y_continuous(label = scales::percent_format(scale = 100))
However, I am not sure about step 2. and 3.
This could be achieved like so:
Changing to percentages could be achieved by adding a new column to your df with the percentage shares by survey, which can then be mapped on y instead of freq.
To get nice percentage labels on the y axis you can use scale_y_continuous(label = scales::percent_format()).
For the partial filling you can map e.g. response %in% c("Missing", "Never") on fill (which gives TRUE for "Missing" and "Never" and FALSE otherwise) and set the fill colors via scale_fill_manual.
The percentages of each stratum can be added to the label via label = paste0(..stratum.., "\n", scales::percent(..count.., accuracy = .1)) in geom_text where I make use of the variables ..stratum.. and ..count.. computed by stat_stratum.
library(ggplot2)
library(ggalluvial)
library(dplyr)
data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
vaccinations <- vaccinations %>%
group_by(survey) %>%
mutate(pct = freq / sum(freq))
ggplot(vaccinations,
aes(x = survey, stratum = response, alluvium = subject,
y = pct,
fill = response %in% c("Missing", "Never"),
label = response)) +
scale_x_discrete(expand = c(.1, .1)) +
scale_y_continuous(label = scales::percent_format()) +
scale_fill_manual(values = c(`TRUE` = "cadetblue1", `FALSE` = "grey50")) +
geom_flow() +
geom_stratum(alpha = .5) +
geom_text(aes(label = paste0(..stratum.., "\n", scales::percent(..count.., accuracy = .1))), stat = "stratum", size = 3) +
theme(legend.position = "none") +
ggtitle("vaccination survey responses at three points in time")
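The percentage column can be sanity-checked on a small hypothetical data frame with the same column names (survey, response, freq) as vaccinations; the survey names and values below are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data: two surveys, two responses each
toy <- data.frame(
  survey = rep(c("wave1", "wave2"), each = 2),
  response = rep(c("Always", "Never"), times = 2),
  freq = c(30, 10, 25, 25)
)

# Same step as in the answer: share of each response within its survey
toy_pct <- toy %>%
  group_by(survey) %>%
  mutate(pct = freq / sum(freq)) %>%
  ungroup()

# Each survey's shares sum to 1, so every stacked column spans 0-100%
totals <- tapply(toy_pct$pct, toy_pct$survey, sum)

# The kind of text geom_text shows for each stratum
labels <- scales::percent(toy_pct$pct, accuracy = .1)
labels
```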

Aesthetics must be either length 1 or the same as the data (1): x, y, label

I'm working on some data on party polarization (something like this) and used geom_dumbbell from ggalt together with ggplot2. I keep getting the same aes error, and other solutions on the forum did not resolve it. This is my sample data:
df <- data_frame(policy=c("Not enough restrictions on gun ownership", "Climate change is an immediate threat", "Abortion should be illegal"),
Democrats=c(0.54, 0.82, 0.30),
Republicans=c(0.23, 0.38, 0.40),
diff=sprintf("+%d", as.integer((Democrats-Republicans)*100)))
I wanted to keep the order of the plot, so I converted policy to a factor; I also wanted the % sign to be shown only on the first label.
df <- arrange(df, desc(diff))
df$policy <- factor(df$policy, levels=rev(df$policy))
percent_first <- function(x) {
x <- sprintf("%d%%", round(x*100))
x[2:length(x)] <- sub("%$", "", x[2:length(x)])
x
}
Then I used ggplot, which rendered something close to what I wanted:
gg2 <- ggplot()
gg2 <- gg + geom_segment(data = df, aes(y=country, yend=country, x=0, xend=1), color = "#b2b2b2", size = 0.15)
# making the dumbbell
gg2 <- gg + geom_dumbbell(data=df, aes(y=country, x=Democrats, xend=Republicans),
size=1.5, color = "#B2B2B2", point.size.l=3, point.size.r=3,
point.color.l = "#9FB059", point.color.r = "#EDAE52")
I then wanted to label the two points of each dumbbell with "Democrats" and "Republicans" on top (like this). This is where I get the error:
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Democrats, y=country, label="Democrats"),
color="#9fb059", size=3, vjust=-2, fontface="bold", family="Calibri")
gg2 <- gg + geom_text(data=filter(df, country=="Government will not control gun violence"),
aes(x=Republicans, y=country, label="Republicans"),
color="#edae52", size=3, vjust=-2, fontface="bold", family="Calibri")
Any thoughts on what I might be doing wrong?
I think it would be easier to build your own "dumbbells" with geom_path() and geom_point(). Working with your df and changing the variable references "country" to "policy":
library(tidyverse)
# gather data into long form to make ggplot happy
df2 <- gather(df,"party", "value", Democrats:Republicans)
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
# our dumbell
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
# the text labels
geom_text(aes(label = party), vjust = -1.5) + # use vjust to shift text up to no overlap
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) + # named vector to map colors to values in df2
scale_x_continuous(limits = c(0,1), labels = scales::percent) # scales::percent formats the axis labels, no manual pasting needed
Produces this plot:
This has some overlapping labels. You could avoid that by using just the first letter of each party, like this:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(aes(label = gsub("^(\\D).*", "\\1", party)), vjust = -1.5) + # just the first letter instead
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red"),
guide = "none") +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
To label only the top issue with the party names:
ggplot(data = df2, aes(y = policy, x = value, color = party)) +
geom_path(aes(group = policy), color = "#b2b2b2", size = 2) +
geom_point(size = 7, show.legend = FALSE) +
geom_text(data = filter(df2, policy == "Not enough restrictions on gun ownership"),
aes(label = party), vjust = -1.5) +
scale_color_manual(values = c("Democrats" = "blue", "Republicans" = "red")) +
scale_x_continuous(limits = c(0,1), labels = scales::percent)
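gather() is superseded in current tidyr. Assuming tidyr 1.0 or later, the same long-form reshaping can be sketched with pivot_longer(); the rows come out interleaved by policy instead of stacked by party, which should not affect the plots above:

```r
library(dplyr)
library(tidyr)

df <- tibble(
  policy = c("Not enough restrictions on gun ownership",
             "Climate change is an immediate threat",
             "Abortion should be illegal"),
  Democrats = c(0.54, 0.82, 0.30),
  Republicans = c(0.23, 0.38, 0.40)
)

# Equivalent of gather(df, "party", "value", Democrats:Republicans)
df2 <- df %>%
  pivot_longer(Democrats:Republicans, names_to = "party", values_to = "value")
df2
```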
