Succinctly Reproducing the following graph with R and ggplot2

Succinctly Reproducing the following graph with R and ggplot2 - r

I borrowed the R code from the link and produced the following graph:
Using the same idea, I tried with my data as follows:
library(tidyverse)
library(tidytable)
library(ggforce)
library(ggtext)
library(camcorder)
library(bibliometrix)
library(bibliometrixData)
data(management)
M <- metaTagExtraction(management, "AU_CO")
CO <-
tidytable(
Country = unlist(strsplit(M$AU_CO,";"))
, year = rep(M$PY, lengths(strsplit(M$AU_CO,";")))
, nAuPerArt = rep(lengths(strsplit(M$AU_CO,";")),lengths(strsplit(M$AU_CO,";")))
)
df0 <-
CO %>%
summarise.(
frequency = length(Country)
, frequencyFractionalized = sum(1/nAuPerArt)
, .by = c(Country, year)
) %>%
arrange.(Country, year)
df1 <-
df0 %>%
mutate.(
min_year = min(year)
, n_total = sum(frequency)
, .by = Country
) %>%
mutate.(Country = fct_reorder(Country, min_year)) %>%
count(Country, n_total, min_year, year) %>%
mutate.(
a_deg = as.numeric(Country) * 2.7 + 8.5
, a = a_deg * pi/180
, x = -(year - min(year) + 10) * cos(a + pi/2.07)
, y = (year - min(year) + 10) * sin(a + pi/2.07)
, label_a = ifelse(a_deg > 180, 270 - a_deg, 90 - a_deg)
, h = ifelse(a_deg > 180, 1, 0)
, label = ifelse(h == 0,
paste0(Country, " <span style = 'color:darkorange;'>(", n_total, ")</span>"),
paste0(" <span style = 'color:darkorange;'>(", n_total, ")</span>", Country))
) %>%
arrange.(as.character(Country), year)
df1
# df1 %>% view()
Years <-
tidytable(
r = seq(
from = 10
, to = 280
, length.out = 12
)
, l = seq(from = min(df0$year), to = max(df0$year), by = 3)
) %>%
mutate.(
lt = ifelse(row_number.() %% 2 == 0, "dotted", "solid")
)
Years
f1 = "Porpora"
gg_record(dir = "temp", device = "png", width = 10, height = 11, units = "in", dpi = 320)
ggplot(data = df1) +
# Purple points
geom_point(data = df1, aes(x = x, y = y, size = n * 10), shape = 21, stroke = 0.15, fill = "purple") +
# Year circles
geom_circle(
data = Years
, aes(x0 = 0, y0 = 0, r = r, linetype = lt), size = 0.08, color = "grey50"
) +
# Year labels
geom_label(
data = Years
, aes(x = 0, y = r, label = l), size = 3, family = f1, label.padding = unit(0.25, "lines"), label.size = NA, fill = "grey95", color = "grey70") +
# Orange points (totals)
geom_point(aes(x = -290 * cos(a + pi/2.07), y = 290 * sin(a + pi/2.07), size = n_total), stat = "unique", shape = 21, stroke = 0.5, fill = "orange") +
# Family names and totals
geom_richtext(aes(x = -305 * cos(a + pi/2.07),
y = 305 * sin(a + pi/2.07),
label = label,
angle = label_a,
hjust = h), stat = "unique", Country = f1, size = 3.5,
fill = NA, label.color = NA, color = "#0b5029") +
# Annotations
annotate("text", 0, 293, label = "Total", Country = f1, color = "orange") +
scale_size_continuous(range = c(0, 8)) +
scale_color_viridis_c(option = "turbo") +
coord_fixed(clip = "off", xlim = c(-400, 400)) +
# labs(
# caption = "<span style = 'font-size:30px;'>Taxonomic publications, 1757-2021</span><br>
# Publications by family and year <span style = 'color:purple;'>(purple circles)</span>and total publications by family <span style = 'color:darkorange;'>(orange circles and text)</span><br>
# <span style = 'color:black;'>Source: World Spider Catalog - Graphic: Georgios Karamanis</span>"
# ) +
theme_void() +
theme(
legend.position = "none",
plot.background = element_rect(fill = "grey95", color = NA),
plot.margin = margin(0, 20, 20, 20),
plot.caption = element_markdown(family = f1, hjust = 0.5, margin = margin(100, 0, -100, 0), size = 14, lineheight = 1.4, color = "#0b5029")
)
which produces the following graph
However, I'm facing two issues with the produced graph.
How to get pink points aligned correctly?
How to use whole circle for country names?
Edited
Looking for more succinct and reproducible code which can be used for different data sets. Preferably looking for a generic function which can handle different data sets.

You can do calculations within a function for the x and y values to construct the ggplot which extends the circle all the way round and gives labels correct heights.
I've adapted a function to work with other datasets. This takes a dataset in a tidy format, with:
a 'year' column
one row per 'event'
a grouping variable (such as country)
I've used Nobel laurate data from here as an example dataset to show the function in practice. Data setup:
library(tidyverse)
library(ggforce)
library(ggtext)
nobel <- read_csv("archive.csv")
# Filtering in this example to create a plottable dataset
nobel_filt <- nobel %>%
mutate(country = fct_lump_n(factor(`Birth Country`), n = 50)) %>%
filter(country != "Other")
nobel_filt
#> # A tibble: 883 x 19
#> Year Category Prize Motivation `Prize Share` `Laureate ID` `Laureate Type`
#> <dbl> <chr> <chr> <chr> <chr> <dbl> <chr>
#> 1 1901 Chemistry The ~ "\"in rec~ 1/1 160 Individual
#> 2 1901 Literature The ~ "\"in spe~ 1/1 569 Individual
#> 3 1901 Medicine The ~ "\"for hi~ 1/1 293 Individual
#> 4 1901 Peace The ~ <NA> 1/2 462 Individual
#> 5 1901 Peace The ~ <NA> 1/2 463 Individual
#> 6 1901 Physics The ~ "\"in rec~ 1/1 1 Individual
#> 7 1902 Chemistry The ~ "\"in rec~ 1/1 161 Individual
#> 8 1902 Literature The ~ "\"the gr~ 1/1 571 Individual
#> 9 1902 Medicine The ~ "\"for hi~ 1/1 294 Individual
#> 10 1902 Peace The ~ <NA> 1/2 464 Individual
#> # ... with 873 more rows, and 12 more variables: Full Name <chr>,
#> # Birth Date <date>, Birth City <chr>, Birth Country <chr>, Sex <chr>,
#> # Organization Name <chr>, Organization City <chr>,
#> # Organization Country <chr>, Death Date <date>, Death City <chr>,
#> # Death Country <chr>, country <fct>
This function will then take the dataframe as an argument, along with the names of the column to group by and the column to mark time periods by. It's not super-succinct, as there is a lot of data processing going on. But hopefully within a function it's tidier.
circle_plot <- function(data, group_var, time_var) {
df_full <-
data %>%
select(group = {{group_var}}, year = {{time_var}}) %>%
mutate(group = factor(group),
group = fct_reorder(group, year, .fun = min),
order = as.numeric(group))
year_vals <-
tibble(year = as.character(seq(min(df_full$year), max(df_full$year), 1)),
level = 1 + 1:length(year))
y_vals <- year_vals %>%
bind_rows(tribble(~ year, ~ level,
"total", max(year_vals$level) + 5,
"title", max(year_vals$level) + 10
))
year_labs <-
tibble(year = as.character(floor(seq(
min(df_full$year), max(df_full$year), length.out = 10
)))) %>%
left_join(y_vals, by = "year")
x_len <- max(df_full$order)
df_ang <- df_full %>%
mutate(year = as.character(year)) %>%
count(group, order, year) %>%
left_join(y_vals, by = "year") %>%
mutate(a_deg = order * 350/x_len + 5,
x = - level * cos(a_deg * pi/180 + pi/2.07),
y = level * sin(a_deg * pi/180 + pi/2.07))
df_lab <- df_ang %>%
group_by(group, a_deg) %>%
summarise(n_total = n()) %>%
mutate(
group_name = str_trunc(as.character(group), 30),
label_a = ifelse(a_deg > 180, 270 - a_deg, 90 - a_deg),
h = ifelse(a_deg > 180, 1, 0),
label = ifelse(
h == 0,
paste0(
group_name,
" <span style = 'color:darkorange;'>(",
n_total,
")</span>"
),
paste0(
"<span style = 'color:darkorange;'>(",
n_total,
")</span> ",
group_name
)
),
year = "title"
) %>%
left_join(y_vals, by = "year") %>%
mutate(
x = -level * cos(a_deg * pi / 180 + pi / 2.07),
y = level * sin(a_deg * pi / 180 + pi / 2.07),
total_x = -(level - 5) * cos(a_deg * pi / 180 + pi / 2.07),
total_y = (level - 5) * sin(a_deg * pi / 180 + pi / 2.07)
)
ggplot() +
geom_circle(
data = year_labs,
aes(
x0 = 0,
y0 = 0,
r = level
),
size = 0.08,
color = "grey50"
) +
geom_label(
data = year_labs,
aes(x = 0, y = level, label = year),
size = 3,
label.padding = unit(0.25, "lines"),
label.size = NA,
fill = "grey95",
color = "grey70"
) +
geom_point(
data = df_ang,
aes(x = x, y = y, size = n),
shape = 21,
stroke = 0.15,
fill = "purple"
) +
geom_point(
data = df_lab,
aes(total_x, total_y,
size = n_total
),
stat = "unique",
shape = 21,
stroke = 0.5,
fill = "orange"
) +
geom_richtext(
data = df_lab,
aes(x, y,
label = label,
angle = label_a,
hjust = h
),
stat = "unique",
size = 4,
fill = NA,
label.color = NA,
color = "#0b5029"
) +
annotate(
"text",
0,
y = y_vals[y_vals$year=="total",]$level,
label = "Total",
color = "orange",
size = 4,
vjust = 0
) +
scale_size_continuous(range = c(1, 9)) +
scale_color_viridis_c(option = "turbo") +
coord_fixed(clip = "off", xlim = c(-120, 120)) +
theme_void() +
theme(
legend.position = "none",
plot.background = element_rect(fill = "grey95", color = NA),
plot.margin = margin(100, 180, 150, 180),
)
}
circle_plot(nobel_filt, `Birth Country`, Year)
# ggsave("test.png", height = 10, width = 10)
This creates the following graph:
The biggest headache (as you can see here) will be changing margins to accommodate long labels and exporting plot sizes which fit the sizes of text/numbers of year circles neatly. This might have to be experimented with across each plot. You can adapt the margin call within the function to a sensible default, or add a further theme element to the function call like so:
circle_plot(nobel_filt, `Birth Country`, Year) +
theme(plot.margin = margin(80, 150, 120, 150))
Hope that helps!
Created on 2021-12-27 by the reprex package (v2.0.1)

Related

How do you get and use data (logos) from R library (nflplotR) properly?

I've been trying to figure out how to pull logos from nflplotR and display them in the visual but can't seem to figure out how to do it properly. I'm fairly new to R, so I think there's some nuances I am not good with.
I tried to remove the "logos" variable and tried a different approach as well but had no luck. It might be the way I'm pulling the data from the library but not really sure. Any help would be appreciated.
library(nflfastR)
library(nflplotR)
library(tidyverse)
library(gghighcontrast)
library(scales)
library(ggimage)
library(ggthemes)
plot_for_data <-
function(data,
logos,
foreground_color,
background_color) {
# NOTE: Doesn't work well with facet_wrap()...need to specify
# team logos more dynamically based on the data being charted.
single_game_id <- data[1, ]$game_id
game_title_pieces <- strsplit(single_game_id, "_")[[1]]
game_year <- game_title_pieces[1]
game_week <- game_title_pieces[2]
# Get home_team and away_team and annotate on chart
home_team_abbr <- data[1, ]$home_team
away_team_abbr <- data[1, ]$away_team
# Build a data frame with coordinates of team logo to place on chart
logo_placement_data <- data.frame(
a = c(3600, 3600),
b = c(0.875, 0.125),
team_abbr = c(home_team_abbr, away_team_abbr),
stringsAsFactors = FALSE
) %>% inner_join(logos, by = "team_abbr")
#ggplot(logo_placement_data, aes(x = a, y = b)) +
# geom_nfl_logos(aes(team_abbr = c(home_team_abbr, away_team_abbr), alpha = alpha, colour = colour), width = 0.075) +
# geom_label(aes(label = teams), nudge_y = -0.35, alpha = 0.5) +
# scale_alpha_identity() +
# scale_color_identity() +
# theme_void()
plot <- ggplot(data,
aes(x = game_seconds_remaining, y = home_wp_custom)) +
# 50% reference line
geom_hline(yintercept = 0.5,
color = grey,
size = 1) +
# Reference line for each quarter (and halftime)
geom_vline(xintercept = 15 * 60, color = grey) +
geom_vline(xintercept = 30 * 60, color = grey) +
geom_vline(xintercept = 45 * 60, color = grey) +
annotate(
"text",
x = 58 * 60,
y = 0.95,
label = "Q1",
family = "InputMono",
color = grey,
size = 2
) +
annotate(
"text",
x = 43 * 60,
y = 0.95,
label = "Q2",
family = "InputMono",
color = grey,
size = 2
) +
annotate(
"text",
x = 28 * 60,
y = 0.95,
label = "Q3",
family = "InputMono",
color = grey,
size = 2
) +
annotate(
"text",
x = 13 * 60,
y = 0.95,
label = "Q4",
family = "InputMono",
color = grey,
size = 2
) +
# Win Probability
geom_line(aes(y = home_wp_post), color = grey) +
geom_line(size = 0.8) +
# Scoring events
geom_rug(
data = filter(data, away_scoring_play == 1),
color = foreground_color,
sides = "b",
size = 1.5
) +
geom_rug(
data = filter(data, home_scoring_play == 1),
color = foreground_color,
sides = "t",
size = 1.5
) +
# Draw home and away team logo
geom_image(
data = logo_placement_data,
aes(x = x, y = y, image = logos),
size = 0.08,
asp = 16 / 9
) +
# Formatting
scale_x_reverse() +
scale_y_continuous(labels = percent, limits = c(0, 1)) +
theme_high_contrast(
base_family = "InputMono",
background_color = background_color,
foreground_color = foreground_color
) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
labs(
title = str_interp(
"${game_year} Week ${game_week}: ${away_team_abbr} at ${home_team_abbr}"
),
subtitle = "Custom win probability model compared to nflfastR WP (grey)",
caption = "Data from nflfastR",
x = "Quarters",
y = "Home Win Probability"
)
}
for (single_game_id in game_ids) {
plot <-
plot_for_data(
filter(pbp_data, game_id == single_game_id),
logos,
foreground_color,
background_color
)
ggsave(
str_interp("wp-${single_game_id}.png"),
plot = plot,
width = 6,
height = 4
)
}

Use the nflplotR::geom_nfl_logos function instead of geom_image. Check out this vignette for different examples of how to add logos to your ggplot.
Based on your code, this should work:
ggplot(logo_placement_data, aes(x = a, y = b)) +
geom_nfl_logos(aes(team_abbr = team_abbr), width = 0.075) +
geom_label(aes(label = teams), nudge_y = -0.35, alpha = 0.5) +
scale_alpha_identity() +
scale_color_identity() +
theme_void()

How to get this plot in R

I have similar data like this :
Dat<-read.table (text=" Class Value
0 8
1 10
0 9
2 12
2 13
1 10
3 15
2 12
3 12
4 17
4 19
1 13
2 12
4 10
", header=TRUE)
A= 0, B=1, C=2, D=3, E=4
I have tried to sketch my plot as follows, but I failed to plot it using geom_line.
The sketch is hypothetical and may not reflect the hypothetical data.
The X-axis shows a range of the values according to the groups, i.e., A, B, C, D, E. For example, in the data, E has a range from 10 and 19.

Another option using geom_linerange like this:
library(dplyr)
library(ggplot2)
Dat %>%
mutate(Class = as.factor(Class)) %>%
group_by(Class) %>%
summarise(x_min = min(Value),
x_max = max(Value)) %>%
ggplot(aes(y = Class, colour = Class, group = Class)) +
geom_linerange(aes(xmin = x_min, xmax = x_max)) +
scale_y_discrete(labels=c("A","B","C","D", "E")) +
theme_classic() +
labs(x = "Value", y = "Class", color = "") +
theme(legend.position = "none")
Created on 2023-01-01 with reprex v2.0.2

Just wanted to throw a base R plot() approach in there for posterity though I see this is tagged with ggplot:
plot(Dat[,2:1], type = "n", xlim = c(5, 20), axes = FALSE, ylab = "")
sapply(unique(Dat$Class), function(x)
segments(x0 = min(Dat$Value[Dat$Class %in% x]),
x1 = max(Dat$Value[Dat$Class == x]),
y0 = Dat$Class[Dat$Class == x][1], col = x+1))
axis(1)
axis(2, at = unique(Dat$Class), labels = LETTERS[unique(Dat$Class)+1], las = 1)
mtext("Class", side = 2, las = 1, adj = 2)

A bit tongue-in-cheek, but this replicates your plot style with minimal code, and uses geom_line, as requested.
library(ggplot2)
ggplot(within(Dat, Class <- LETTERS[Class + 1]),
aes(Value, Class, color = Class)) +
geom_line(linewidth = 4, lineend = "round") +
scale_color_manual(values = c("#ed2028", "#22b14c", "#ffaec9", "#a349a4",
"#c4c4c4"), guide = "none") +
scale_x_continuous(breaks = c(5, 10, 15, 20), limits = c(5, 20)) +
theme_classic(base_size = 30)

library(tidyverse)
library(ggalt)
Dat %>%
group_by(Class) %>%
summarise(min = min(Value),
max = max(Value)) %>%
ggplot() +
geom_dumbbell(aes(
y = Class,
x = min,
xend = max,
color = factor(Class)
), size = 1) +
theme_light() +
xlab("Value")

I am working with NFL positional data provided for the 2022 NFL Big Data bowl and I can't get the package GGanimate to animate the play in R

I was working on data provided by the NFL for the "Big Data Bowl" and I could not get the points to move as the code intended. Instead, I am getting screenshots of each frame of the data, rather than it flowing together in the animation. Can anyone help identify problems in my code?
The data can be accessed here on the NFL Big Data Bowl 2022 page on Kaggle.
# Big Data Bowl
##### Understanding How to Plot Players Moving
library(tidyverse)
library(gganimate)
library(cowplot)
#Load a week to view a play
week1<-read.csv("week1.csv")
head(week1)
#Load the Plays Data
plays<-read.csv("plays.csv")
head(plays)
#Load the Player Data
player<-read.csv("players.csv")
head(player)
#----------------------- Make an example Play ---------------------------------------
# Join the three data sets together
tracking.example.merged <- week1 %>% inner_join(plays) %>% inner_join(player)
tracking.example.merged[tracking.example.merged$playResult==-8,] # Play 4298 Results in a -8 Yard Loss (Sack on Matt Ryan), we will use this as an example
#Filter to Specific Play
example.play <- tracking.example.merged %>% filter(playId == 4298)
example.play %>% select(playDescription) %>% slice(1)
#> # A tibble: 1 x 1
#> playDescription
#1 (3:50) (No Huddle, Shotgun) M.Ryan sacked at ATL 41 for -8 yards (H.Ridgeway).
#-----------------------Set the Field-----------------------------------------
## General field boundaries
xmin <- 0
xmax <- 160/3
hash.right <- 38.35
hash.left <- 12
hash.width <- 3.3
## Specific boundaries for a given play
ymin <- max(round(min(example.play$x, na.rm = TRUE) - 10, -1), 0)
ymax <- min(round(max(example.play$x, na.rm = TRUE) + 10, -1), 120)
df.hash <- expand.grid(x = c(0, 23.36667, 29.96667, xmax), y = (10:110))
df.hash <- df.hash %>% filter(!(floor(y %% 5) == 0))
df.hash <- df.hash %>% filter(y < ymax, y > ymin)
animate.play <- ggplot() +
scale_size_manual(values = c(6, 4, 6), guide = FALSE) +
scale_shape_manual(values = c(21, 16, 21), guide = FALSE) +
scale_fill_manual(values = c("#e31837", "#654321", "#002244"), guide = FALSE) +
scale_colour_manual(values = c("black", "#654321", "#c60c30"), guide = FALSE) +
annotate("text", x = df.hash$x[df.hash$x < 55/2],
y = df.hash$y[df.hash$x < 55/2], label = "_", hjust = 0, vjust = -0.2) +
annotate("text", x = df.hash$x[df.hash$x > 55/2],
y = df.hash$y[df.hash$x > 55/2], label = "_", hjust = 1, vjust = -0.2) +
annotate("segment", x = xmin,
y = seq(max(10, ymin), min(ymax, 110), by = 5),
xend = xmax,
yend = seq(max(10, ymin), min(ymax, 110), by = 5)) +
annotate("text", x = rep(hash.left, 11), y = seq(10, 110, by = 10),
label = c("G ", seq(10, 50, by = 10), rev(seq(10, 40, by = 10)), " G"),
angle = 270, size = 4) +
annotate("text", x = rep((xmax - hash.left), 11), y = seq(10, 110, by = 10),
label = c(" G", seq(10, 50, by = 10), rev(seq(10, 40, by = 10)), "G "),
angle = 90, size = 4) +
annotate("segment", x = c(xmin, xmin, xmax, xmax),
y = c(ymin, ymax, ymax, ymin),
xend = c(xmin, xmax, xmax, xmin),
yend = c(ymax, ymax, ymin, ymin), colour = "black") +
geom_point(data = example.play, aes(x = (xmax-y), y = x, shape = team,
fill = team, group = nflId, size = team, colour = team), alpha = 0.7) +
geom_text(data = example.play, aes(x = (xmax-y), y = x, label = jerseyNumber), colour = "white",
vjust = 0.36, size = 3.5) +
ylim(ymin, ymax) +
coord_fixed() +
theme_nothing() +
transition_time(frame.id) +
ease_aes('linear') +
NULL
## Ensure timing of play matches 10 frames-per-second
play.length.ex <- length(unique(example.play$frame.id))
animate(animate.play, fps = 10, nframe = play.length.ex)

try installing gifski and png packages.
after installing restart RStudio and it should work.
Please refer to these extremally useful links:
GitHub issues:
https://github.com/thomasp85/gganimate/issues/93
More...
https://gganimate.com/index.html#old-api
Vingete:
https://cran.r-project.org/web/packages/gganimate/vignettes/gganimate.html
Author:
https://community.rstudio.com/u/mattwarkentin

The total value alongside the percentage displayed on the top of Bar charts

my dataframe is here
data.frame(
stringsAsFactors = FALSE,
Subtype = c("OPC", "Hypopharynx", "Larynx"),
alive = c(88, 22, 100),
dead = c(12, 55, 17),
uncertain = c(10, 2, 2)
)
Title of the dataframe is tata4.
When I enter this code, I get a good graph with percentages. can't embed the image for some reason!
tata4 %>%
gather(key = "status", value = value, -Subtype) %>%
group_by(Subtype) %>%
mutate(value = value / sum(value) * 100, status = factor(status, c("alive", "dead", "uncertain"))) %>%
ggplot(aes(y = value, x = Subtype, fill = status)) +
geom_col(width = .6, position = position_dodge(width = .6, preserve = "total")) +
geom_text(aes(label = paste(round(value, 1), "%"), y = value + 7,group = status),
position = position_dodge(width = .6, preserve = "total"),size=3,angle=90) +
labs(y = "percentage", x = "Status") +
scale_fill_manual(values = c("alive" = "tan2", "dead" = "red", "uncertain" = "green"),
aesthetics = c("color", "fill")) +
theme(text = element_text(size = 15)) +
scale_y_continuous(labels = scales::label_percent(scale = 1, accuracy = 1),limits = c(0, 100))
What I'm trying to do is to get the absolute value as well the percentage displayed on the top of the graph. I'm trying to use the scales function as below, but the code below does not seem to work (Gives me strange values. What am I doing wrong? thanks
tata4 %>%
gather(key = "status", value = value, -Subtype) %>%
group_by(Subtype) %>%
mutate(perc = value / sum(value) * 100, status = factor(status, c("alive", "dead", "uncertain"))) %>%
ggplot(aes(y = value, x = Subtype, fill = status)) +
geom_col(width = .6, position = position_dodge(width = .6, preserve = "total")) +
geom_text(aes(Subtype,value + 5,label = glue::glue("{value}({percent(perc)})"))) +
labs(y = "percentage", x = "Status") +
scale_fill_manual(values = c("alive" = "tan2", "dead" = "red", "uncertain" = "green"),
aesthetics = c("color", "fill")) +
theme(text = element_text(size = 15))

The "strange" labels you get come from your scales::percent(perc) use inside the glue function. This transforms a value that was already manually scaled to be % again by 100 (80 to 8000%). You can either manually define the labels, or pass a scale = 1 argument. Preparing the labels already in the dataframe:
df0 <- tata4 %>%
gather(key = "status", value = value, -Subtype) %>%
group_by(Subtype) %>%
mutate(perc = value / sum(value) * 100, status = factor(status, c("alive", "dead", "uncertain")))
# this scales the percentage label twice:
df1 <- df0 %>% mutate(perc_label = glue::glue("{value}({percent(perc)})"))
# Subtype status value perc perc_label
#<chr> <fct> <dbl> <dbl> <glue>
#1 OPC alive 88 80 88(8 000%)
# option 1: manually define % labels
df2 <- df0 %>% mutate(perc_label = glue::glue("{value} ({round(perc, 1)}%)"))
#Subtype status value perc perc_label
#<chr> <fct> <dbl> <dbl> <glue>
#1 OPC alive 88 80 88 (80%)
# option 2: use scale parameter
df3 <- df0 %>% mutate(perc_label = glue::glue("{value}({scales::percent(perc, scale=1)})"))
# Subtype status value perc perc_label
# <chr> <fct> <dbl> <dbl> <glue>
#1 OPC alive 88 80 88(80.0%)
For the plot, we in addition need to map perc to the y-axis instead of value (the calculated percentage is defined in different columns in both examples).
df2 %>%
ggplot(aes(y = perc, x = Subtype, fill = status)) +
geom_col(width = .6, position = position_dodge(width = .6, preserve = "total")) +
geom_text(aes(Subtype, perc + 8, label = perc_label),
position = position_dodge(width = .6, preserve = "total"),size=3,angle=90) +
labs(y = "percentage", x = "Status") +
scale_fill_manual(values = c("alive" = "tan2", "dead" = "red", "uncertain" = "green"),
aesthetics = c("color", "fill")) +
theme(text = element_text(size = 15)) +
coord_cartesian(ylim = c(0, 100)) +
scale_y_continuous(labels = label_percent(scale = 1))

How to find if a point is inside a specific zone in a plot?

I have created the below example plot using ggplot2.
Example plot
The plot area is divided into various zones (SD, TC, CC, CCS) between the gray lines. I know the equations of each of these gray lines. I have also plotted my data What I would like to do is to find the zone where each data point is plotted, so I can divide them into categories and e.g. give them different colors for each zone. Does anyone have any idea? Thanks a lot in advance.
EDIT
I added code to reproduce a simple example.
# load libraries
library(tidyverse)
# sample data
sample_data <-
tibble(
x = c(8, 8, 8, 8, 8, 0),
y = c(5, 10, 20, 50, 100, 100)
)
# names of zones
zoning_data <-
tibble(
x = c(
18,
18,
0,
18
),
y = c(
70,
25,
600,
600
),
label = c(
"CC",
"CCS",
"SD",
"TC"
)
)
# plot
ggplot(
data = sample_data,
aes(
x,
y
)
) +
# points
geom_point() +
scale_x_continuous(
expand = c(0, 0),
minor_breaks = NULL,
breaks = c(
seq(-2, 20, by = 2)
)
) +
scale_y_continuous(
trans = "log10",
expand = c(0, 0),
minor = c(
seq(1, 10, by = 1),
seq(10, 100, by = 10),
seq(100, 1000, by = 100)
)
) +
coord_cartesian(
xlim = c(-2, 20),
ylim = c(1, 1000),
expand = c(0, 0)
) +
# equations for zoning lines
geom_function(
fun = function(x) ((x - 1.05) * 5)^1.0526,
xlim = c(1.31, 20)
) +
geom_function(
fun = function(x) ((x - 1.1) * 1.5)^1.0989,
xlim = c(2.82, 20)
) +
geom_function(
fun = function(x) x,
xlim = c(-2, 20)
) +
# zoning names
geom_text(
data = zoning_data,
aes(
x = x,
y = y,
label = label
)
) +
# theme
theme(aspect.ratio = 1)

It's difficult to answer your question without a minimal reproducible example, but perhaps you can adapt this approach to your dataset:
library(tidyverse)
# Using the 'diamonds' example dataset
data(diamonds)
diamonds %>%
ggplot(aes(x = carat, y = price)) +
geom_point() +
coord_cartesian(ylim = c(0, 20000))
# example equations:
# price = 7000 * carat - 3000
# price = 7050 * carat - 1750
# Create a new variable ("zone")
# and categorise each data point
# using mutate and case_when
diamonds %>%
mutate(zone = case_when(price <= 7000 * carat - 3000 ~ "zone 1",
price > 7000 * carat - 3000 & price < 7050 * carat - 1750 ~ "zone 2",
price >= 7050 * carat - 1750 ~ "zone 3")) %>%
ggplot(aes(x = carat, y = price)) +
geom_point(aes(colour = zone)) +
coord_cartesian(ylim = c(0, 20000))
Created on 2021-09-15 by the reprex package (v2.0.1)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Succinctly Reproducing the following graph with R and ggplot2 - r

Related

How do you get and use data (logos) from R library (nflplotR) properly?

How to get this plot in R

I am working with NFL positional data provided for the 2022 NFL Big Data bowl and I can't get the package GGanimate to animate the play in R

The total value alongside the percentage displayed on the top of Bar charts

How to find if a point is inside a specific zone in a plot?

Categories

Resources