Using segment labels in ggplot with ggrepel with smooth segments (R)

This is my dataframe:
df<-structure(list(year = c(1984, 1984), team = c("Australia", "Brazil"
), continent = c("Oceania", "Americas"), medal = structure(c(3L,
3L), .Label = c("Bronze", "Silver", "Gold"), class = "factor"),
n = c(84L, 12L)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
And this is my ggplot (my question is about the annotation for the Brazil label):
library(ggplot2)
library(ggrepel)
ggplot(data = df) +
  geom_point(aes(x = year, y = n)) +
  geom_text_repel(aes(x = year, y = n, label = team),
                  size = 3, color = 'black',
                  seed = 10,
                  nudge_x = -.029,
                  nudge_y = 35,
                  segment.size = .65,
                  segment.curvature = -1,
                  segment.angle = 178.975,
                  segment.ncp = 1) +
  coord_flip()
So the segment is drawn in two parts, and both parts show 'small breaks'. How can I avoid them?
I already tried using segment.ncp and changing nudge_x or nudge_y, but it's not working.
Any help?

Not really sure what is going on here. This is the best I could get by experimenting with different values for the segment.* arguments.
There is some guidance at https://ggrepel.slowkow.com/articles/examples.html, which has an example with shorter leader lines; maybe that's an approach you could use.
df<-structure(list(year = c(1984, 1984), team = c("Australia", "Brazil"
), continent = c("Oceania", "Americas"), medal = structure(c(3L,
3L), .Label = c("Bronze", "Silver", "Gold"), class = "factor"),
n = c(84L, 12L)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
library(ggplot2)
library(ggrepel)
ggplot(data = df) +
  geom_point(aes(x = year, y = n)) +
  geom_text_repel(aes(x = year, y = n, label = team),
                  size = 3, color = 'black',
                  seed = 1,
                  nudge_x = -0.029,
                  nudge_y = 35,
                  segment.size = 0.5,
                  segment.curvature = -0.0000002, # close to 0, so the segment is drawn almost straight
                  segment.angle = 1,
                  segment.ncp = 1000) +
  coord_flip()
Created on 2021-08-26 by the reprex package (v2.0.0)
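If the curved ggrepel segment keeps showing breaks, one workaround (a swapped-in technique, not a ggrepel setting) is to skip the repulsion for these two labels and draw the leader lines yourself with plain ggplot2: one straight geom_segment() from each point to a fixed label position, plus geom_text(). The sketch below assumes fixed offsets that mirror the question's nudge_x / nudge_y; lab_pos, label_x and label_y are hypothetical names.

```r
library(ggplot2)

# Only the columns the plot uses, copied from the question's dput()
df <- data.frame(
  year = c(1984, 1984),
  team = c("Australia", "Brazil"),
  n = c(84L, 12L)
)

# Fixed label positions; the offsets mirror the question's nudge_x / nudge_y
lab_pos <- transform(df, label_x = year - 0.029, label_y = n + 35)

p <- ggplot(df, aes(x = year, y = n)) +
  geom_point() +
  # one straight segment from each point to its label (no curve control points)
  geom_segment(data = lab_pos, aes(xend = label_x, yend = label_y)) +
  # hjust = 0 starts the text at the end of the segment
  geom_text(data = lab_pos, aes(x = label_x, y = label_y, label = team),
            size = 3, hjust = 0) +
  coord_flip()
p
```

The trade-off is that labels no longer repel each other, so this only suits a handful of labels whose positions you are happy to choose by hand.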

Related

Adding line with specific df value to each bar in ggplot

My df is organized this way for example:
OCCURED_COUNTRY_DESC | a   | b    | c   | d    | flagged | type | MedDRA_PT | E
---------------------+-----+------+-----+------+---------+------+-----------+--------
UNITED STATES        | 403 | 1243 | 473 | 4077 | yes     | disp | Seizure   | 144.208
My data:
test2 <- structure(list(OCCURED_COUNTRY_DESC = c("AUSTRALIA", "AUSTRIA",
"BELGIUM", "BRAZIL", "CANADA"), a = c(4L, 7L, 20L, 5L, 11L),
b = c(31, 27, 100, 51, 125), c = c(872, 869, 856, 871, 865
), d = c(5289, 5293, 5220, 5269, 5195), w = c(876, 876, 876,
876, 876), x = c(5320, 5320, 5320, 5320, 5320), y = c(35L,
34L, 120L, 56L, 136L), z = c(6161, 6162, 6076, 6140, 6060
), N = c(6196, 6196, 6196, 6196, 6196), k = c("0.5", "0.5",
"0.5", "0.5", "0.5"), SOR = c(0.80199821407511, 1.52042360060514,
1.21312776329214, 0.615857869962066, 0.539569644832803),
log = c(-0.318329070860348, 0.604473324558599, 0.278731499148795,
-0.699330656240263, -0.890118907611227), LL99 = c(-0.695969674426877,
0.382102954188229, 0.198127619344382, -1.00534117464748,
-1.03425468471322), UL99 = c(-0.0544058884186467, 0.763880731966007,
0.337239065783058, -0.482651467660248, -0.785935460582379
), flagged = c("no", "no", "no", "no", "yes"), type = c(NA,
NA, NA, NA, "under"), MedDRA_PT = c("Seizure", "Seizure",
"Seizure", "Seizure", "Seizure"), E = c(5.11098506333901,
4.43283582089552, 16.3984674329502, 8.43063199848168, 20.8132820019249
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
I am using ggplot2 to create a bar chart using the following piece of code:
library(dplyr) # for %>%
library(ggplot2)
test2 %>% # using test2 as the df
  ggplot(aes(a, OCCURED_COUNTRY_DESC, fill = type)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("disp" = "#FF8C00",
                               "under" = "#7EC0EE",
                               "NA" = "#EEE9E9")) +
  theme_classic() +
  labs(title = "Seizure",
       x = "Count",
       y = "")
What I would like to do is add a black line to each bar at that country's E value from the dataframe. However, I haven't been successful. Can someone kindly guide me on how to achieve this?
Thanks!
One option to achieve your desired result is geom_segment, mapping your E column on both the x and the xend position. The "tricky" part is the y positions. However, a categorical axis is still a numeric axis under the hood (the categories sit at 1, 2, 3, ...), so we can add a helper column to your data containing the numeric position of each OCCURED_COUNTRY_DESC value. This helper column can then be mapped on the y and yend aesthetics needed by geom_segment, where we also take the width of the bars into account:
library(ggplot2)
test2$OCCURED_COUNTRY_DESC_num <- as.numeric(factor(test2$OCCURED_COUNTRY_DESC))
width <- .9 # default width of bars
ggplot(test2, aes(a, OCCURED_COUNTRY_DESC, fill = type)) +
geom_bar(stat = "identity") +
geom_segment(aes(x = E, xend = E,
y = OCCURED_COUNTRY_DESC_num - width / 2,
yend = OCCURED_COUNTRY_DESC_num + width / 2),
color = "black", size = 1) +
scale_fill_manual(values = c(
"disp" = "#FF8C00",
"under" = "#7EC0EE",
"NA" = "#EEE9E9"
)) +
theme_classic() +
labs(
title = "Seizure",
x = "Count",
y = ""
)
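To see why the helper column lines up with the bars, here is a base-R sketch of the same arithmetic on a hypothetical three-country subset (E values rounded from the question's data, rows deliberately out of alphabetical order). It assumes the default alphabetical ordering of the y axis; if you reorder the factor levels yourself, the helper column has to use the same levels.

```r
# Hypothetical three-country subset, deliberately not in alphabetical order;
# E values rounded from the question's data
countries <- c("CANADA", "AUSTRALIA", "BRAZIL")
E <- c(20.81, 5.11, 8.43)

# factor() sorts the levels alphabetically, the same default order ggplot2
# uses for a character axis, so these integers are the bar positions
pos <- as.numeric(factor(countries))

width <- 0.9  # default bar width
seg <- data.frame(
  country = countries,
  x = E, xend = E,          # a vertical line at the E value
  y = pos - width / 2,      # lower edge of the bar
  yend = pos + width / 2    # upper edge of the bar
)
seg
```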

Using ggalluvial with nodes holding different values

My data is a set of activities completed by persons. The sequence of activities a person takes varies. The data below show the activities for each step (Step1, Step2, etc.). I'd like an alluvial plot that labels the activities at each step (each a different node 1, 2, 3...). What is the best approach? Here's what I have so far:
df<-structure(list(acts_activity_id = c("9928131", "445661", "686203", "687868", "688564"), Step1 = c("Unable to Reach", "Unable to Reach",
"Search Correspondence", "Unable to Reach", "Unable to Reach"), Step2 = c("Match Request", NA, "Connection Made", NA, "Match Request"
), Step3 = c("Support Group Request", NA, "Connection Contact Attempt", NA, "Support Group Request"),Step4 = c("Information Provided",
NA, "Not Available to Support", NA, "Information Provided"),
Step5 = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_)), class = c("grouped_df", "tbl_df", "tbl",
"data.frame"),
row.names = c(NA, -5L),
groups = structure(list(acts_activity_id = c("9928131", "445661", "686203", "687868", "688564"), .rows = structure(list(1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), .drop = TRUE))
library(dplyr)
library(ggplot2)
library(ggalluvial)
df %>%
  ggplot(
    aes(
      axis1 = Step1, # each step has different values; individuals go through different sequences of steps
      axis2 = Step2, axis3 = Step3, axis4 = Step4, axis5 = Step5)) +
  geom_flow() +
  geom_stratum() +
  labs(title = "Activity Sequence")
If your data are in this wide layout (each column holds the activity for one step), you can use ggsankey:
library(dplyr)
library(ggplot2)
library(ggsankey) # not on CRAN; available from GitHub (davidsjoberg/ggsankey)
x <- df %>%
  ungroup() %>% # df is grouped by acts_activity_id
  select(-acts_activity_id) %>%
  make_long(Step1, Step2, Step3, Step4, Step5)
ggplot(x, aes(x = x, next_x = next_x,
              node = node, next_node = next_node,
              fill = factor(node), label = node)) +
  geom_sankey(flow.alpha = 0.6, node.color = "gray30") +
  geom_sankey_label(size = 3, color = "white", fill = "gray40") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))
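make_long() is the ggsankey function doing the real work here. To show roughly what the long format looks like, here is a hypothetical base-R helper, to_long() (my own name, not part of ggsankey), that produces one row per sequence and step with the columns the answer maps in aes(): x, node, next_x and next_node. Its output may differ from make_long() in details such as factor versus character columns, so treat it as an illustration only.

```r
# Two toy sequences in the question's wide layout (one column per step)
steps <- data.frame(
  Step1 = c("Unable to Reach", "Search Correspondence"),
  Step2 = c("Match Request", "Connection Made"),
  Step3 = c(NA, "Connection Contact Attempt"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every step, pair each node with the node
# in the following step (NA for the last step)
to_long <- function(d) {
  cols <- names(d)
  pieces <- lapply(seq_along(cols), function(i) {
    last <- i == length(cols)
    data.frame(
      x = cols[i],
      node = d[[i]],
      next_x = if (last) NA_character_ else cols[i + 1],
      next_node = if (last) NA_character_ else d[[i + 1]],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

long <- to_long(steps)
long
```

Sequences that stop early (like the first one here) leave NA nodes, which you may want to filter out before plotting.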

How to specify a certain csv in the errorbar line

I am trying to make a plot from three different CSV files. Two of them have the same columns, i.e. Time, GMSL and GMSLerror.
In the Frederikse file the columns are Year, GMSL, GMSLerrorlow and GMSLerrorup. How can I tell R to plot the Frederikse error using the columns GMSLerrorlow and GMSLerrorup? I tried the following but it did not work. Thanks.
p1<-files <- c("Frederikse.csv", "ChurchandWhite.csv","Hay.csv")
map_dfr(files, ~ read_csv(.x) %>%
mutate(Author = .x)) %>%
ggplot(aes(x = Time, y = GMSL, color = Author,fill=Author)) +
geom_line(size=0.6)+
theme_bw(12)+
theme(panel.grid.major = element_blank())+
theme(panel.grid.minor = element_blank())+
labs(x = "Year", y = "GMSL (mm)",color="Author")+
geom_errorbar(aes(ymin=GMSL-GMSLerror, ymax =GMSL+GMSLerror,alpha=Author))+
geom_errorbar("Frederikse.csv",(aes(ymin=GMSL-GMSLerrorlow, ymax =GMSL+GMSLerrorup,alpha=Author)))
scale_alpha_manual(values = c(0.3, 0.3, 0.8))+
scale_colour_manual(values=c("#BAB3F0","#1D3E72","#201641"))
p1
Frederikse.csv:
structure(list(Year = 1900:1905, GMSLerrorlow = c(-203.5572666,
-201.0185091, -212.0740442, -202.6975639, -200.1670151, -192.1312551
), GMSL = c(-173.2614421, -168.8016753, -180.389967, -170.2678322,
-168.7200709, -160.9814287), GMSLerrorup = c(-141.002807, -135.8976091,
-148.213824, -138.9305182, -137.4501224, -130.3514508)), row.names = c(NA,
6L), class = "data.frame")
The other two files:
structure(list(Time = 1900:1905, GMSL = c(-131.15, -130.5, -129.77,
-128.85, -128.1, -127.56), GMSLerror = c(25.32, 25.17, 25.01,
24.86, 24.7, 24.55)), row.names = c(NA, 6L), class = "data.frame")
structure(list(Time = c(1880.0417, 1880.125, 1880.2083, 1880.2917,
1880.375, 1880.4583), GMSL = c(-183, -171.1, -164.3, -158.2,
-158.7, -159.6), GMSLerror = c(24.2, 24.2, 24.2, 24.2, 24.2,
24.2)), row.names = c(NA, 6L), class = "data.frame")
You can do this with mutate(): read all files into one data frame, then build common columns before plotting. Note that Frederikse.csv stores the lower and upper bounds themselves (not offsets from GMSL) and calls its time column Year instead of Time, so both differences need handling:
library(tidyverse)
files <- c("Frederikse.csv", "ChurchandWhite.csv", "Hay.csv")
p1 <- set_names(files) %>% # give names - can use str_remove to drop `.csv` from names
  map_dfr(~ read_csv(.x), .id = "Author") %>% # use .id argument to record the file name
  mutate(
    Time = coalesce(Time, Year), # Frederikse has Year, the other files have Time
    # Frederikse: bounds are stored directly; others: symmetric error around GMSL
    ymin = if_else(Author == "Frederikse.csv", GMSLerrorlow, GMSL - GMSLerror),
    ymax = if_else(Author == "Frederikse.csv", GMSLerrorup, GMSL + GMSLerror)
  ) %>%
  ggplot(aes(x = Time, y = GMSL, color = Author, fill = Author)) +
  geom_line(size = 0.6) +
  theme_bw(12) +
  theme(panel.grid.major = element_blank()) +
  theme(panel.grid.minor = element_blank()) +
  labs(x = "Year", y = "GMSL (mm)", color = "Author") +
  geom_errorbar(aes(ymin = ymin, ymax = ymax, alpha = Author)) +
  scale_alpha_manual(values = c(0.3, 0.3, 0.8)) +
  scale_colour_manual(values = c("#BAB3F0", "#1D3E72", "#201641"))
p1
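The difference between the two error formats can be checked on the sample rows from the question's dput() output. This base-R sketch (fred, other and bounds are hypothetical names) builds the same ymin/ymax columns and shows that subtracting a Frederikse bound as if it were an offset would put the "lower" end of the bar above GMSL.

```r
# First rows of two of the question's files (values copied from the dput() output)
fred <- data.frame(
  Year = 1900:1902,
  GMSLerrorlow = c(-203.5572666, -201.0185091, -212.0740442),
  GMSL = c(-173.2614421, -168.8016753, -180.389967),
  GMSLerrorup = c(-141.002807, -135.8976091, -148.213824)
)
other <- data.frame(
  Time = 1900:1902,
  GMSL = c(-131.15, -130.5, -129.77),
  GMSLerror = c(25.32, 25.17, 25.01)
)

# Frederikse already stores the lower and upper bounds themselves
fred_bounds <- data.frame(Time = fred$Year, GMSL = fred$GMSL,
                          ymin = fred$GMSLerrorlow, ymax = fred$GMSLerrorup)
# The other files store a symmetric error around GMSL
other_bounds <- data.frame(Time = other$Time, GMSL = other$GMSL,
                           ymin = other$GMSL - other$GMSLerror,
                           ymax = other$GMSL + other$GMSLerror)
bounds <- rbind(fred_bounds, other_bounds)

# Treating the Frederikse bound as an offset flips it above GMSL
wrong_ymin <- fred$GMSL - fred$GMSLerrorlow
bounds
```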

Make a subway graph with 102 topics in ggplot2 (R)

This is a followup from subway-style graph for word frequency across three datasets in ggplot2
I used the code from the answer to that question, but I am struggling with how best to adjust the graph so that it fits 100 unique dict entries without completely messing up the dict word labels in the margins.
I have tested feeding different numbers of words into the subway graph and found that it cannot contain more than 25 words.
I have data:
df <- structure(list(dict = c("apple", "apple", "apple",
"mandarin", "mandarin", "mandarin", "orange", "orange", "orange", "pear"),
name = c("freq_ongov", "freq_onindiv", "freq_onmedia", "freq_ongov",
"freq_onindiv", "freq_onmedia", "freq_ongov", "freq_onindiv",
"freq_onmedia", "freq_ongov"), value = c(0, 87, 63, 0, 44,
20, 3, 27, 25, 0), rank = c(26, 85, 70, 26, 61, 42.5, 86,
47, 48, 26)), row.names = c(NA, -10L), groups = structure(list(
name = c("freq_ongov", "freq_onindiv", "freq_onmedia"), .rows = structure(list(
c(1L, 4L, 7L, 10L), c(2L, 5L, 8L), c(3L, 6L, 9L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
But my full data has 100 dict entries that I want to include in the following code:
library(ggplot2)
leftlabels <- df$dict[df$name == "freq_ongov"]
leftlabels <- leftlabels[order(df$rank[df$name == "freq_ongov"])]
rightlabels <- df$dict[df$name == "freq_onmedia"]
rightlabels <- rightlabels[order(df$rank[df$name == "freq_onmedia"])]
ggplot(df, aes(name, rank, color = dict, group = dict)) +
  geom_line(size = 4) +
  geom_point(shape = 21, fill = "white", size = 4) +
  scale_y_continuous(breaks = seq(max(df$rank)), labels = leftlabels,
                     sec.axis = sec_axis(~., breaks = seq(max(df$rank)),
                                         labels = rightlabels)) +
  scale_x_discrete(expand = c(0.01, 0)) +
  guides(color = guide_none()) +
  coord_cartesian(clip = "off") +
  theme(axis.ticks.length.y = unit(0, "points"))
I tried changing the y.int and the width of the y axis to fit in 100 words, but that only makes the y-axis longer without changing the spacing between the word labels, so all the words still get squeezed together. Any suggestions?
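One constraint worth checking before scaling up: scale_y_continuous() needs exactly one label per break, so breaks = seq(max(df$rank)) with labels = leftlabels only works if each column's ranks are the integers 1..n with no ties. Ranks such as 26 or 42.5 in the sample suggest ties were averaged. A minimal base-R sketch, assuming ties may simply be broken by row order, on a hypothetical four-word freq_ongov column:

```r
# Toy freq_ongov values for four words (values from the question's sample)
words <- c("apple", "mandarin", "orange", "pear")
value <- c(0, 0, 3, 0)

# Integer ranks 1..n, highest frequency first; ties broken by row order
r <- rank(-value, ties.method = "first")

leftlabels <- words[order(r)]  # label for break 1, break 2, ...
breaks <- seq(max(r))          # same construction as the question's scale_y_continuous()

stopifnot(length(breaks) == length(leftlabels))
leftlabels
```

With 100 words this gives 100 breaks; how far apart they sit on screen then depends on the height of the output device, not on the scale.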

Create new column with percentages in data frame

I have the following dataframe:
dput(df1)
structure(list(month = c(1, 1, 2, 2, 3, 4), transaction_type = c("AAA",
"BBB", "BBB", "CCC",
"DDD", "AAA"), max_wt_per_month = c(54.9,
51.6833333333333, 52.3333333333333, 49.4666666666667, 49.85,
48.5833333333333), min_wt_per_month = c(0, 0, 0, 0, 0, 0), avg_wt_per_month = c(8.41701333107861,
7.65211141060198, 6.44184012508551, 7.74798927613941, 7.4360566888844,
7.50611319574734), prop = c(Inf, Inf, Inf, Inf, Inf, Inf)), .Names = c("month",
"transaction_type", "max_wt_per_month", "min_wt_per_month", "avg_wt_per_month",
"prop"), row.names = c(NA, -6L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = list(month), drop = TRUE, indices = list(
0:5), group_sizes = 6L, biggest_group_size = 6L, labels = structure(list(
month = 1), row.names = c(NA, -1L), class = "data.frame", vars = list(
month), drop = TRUE, .Names = "month"))
I want to create a column prop containing each row's maximum waiting time as a percentage of the total for its month. If I run the code below, I get Inf values in most rows (this is especially evident in the real dataset):
my_fun = function(vec) {
  100 * as.numeric(vec[3]) /
    sum(with(data_merged_transactions, ifelse(month == vec[1], max_wt_per_month, 0)))
}
data_merged_transactions$prop = apply(data_merged_transactions, 1, my_fun)
Finally, I need to create a filled area chart in which each area is a percentage, with all areas adding up to 100%:
ggplot(data_merged_transactions, aes(x=month, y=prop, fill=transaction_type)) +
geom_area(alpha=0.6 , size=1, colour="black")
Why do I get Inf if the sum is not equal to 0?
Moreover, is it possible to create a filled area chart with months as factors (Jan, Feb, etc.) rather than numbers? I tried replacing the month IDs with month names, but then I got very thin bars instead of a filled area.
Is this what you were looking for?
library(tidyverse)
df1_tidy <- df1 %>%
  group_by(month) %>%
  summarise(SUM = sum(max_wt_per_month)) %>%
  full_join(df1, by = "month") %>%
  mutate(prop = max_wt_per_month / SUM)
ggplot(data = df1_tidy,
       aes(x = month,
           y = prop,
           fill = transaction_type)) +
  geom_area(alpha = 0.6,
            size = 1,
            colour = "black") +
  # explicit breaks so the four labels line up with months 1-4
  scale_x_continuous(breaks = 1:4, labels = c("Jan", "Feb", "Mar", "Apr"))
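On the "why Inf" part, which the answer doesn't cover: a likely cause (an inference, since the real data isn't shown) is that apply() first converts the data frame with as.matrix(), and because transaction_type is character, every numeric column is turned into text with format(), padded to a common width. With months 1 to 12, month 1 becomes " 1", which never equals 1 coerced to "1", so the sum is 0 and the division gives Inf. A base-R sketch on hypothetical toy data (d, bad and good are made-up names):

```r
# Hypothetical toy data: month 1 has two rows, month 10 has one,
# plus a character column so apply() has to build a character matrix
d <- data.frame(
  month = c(1, 1, 10),
  transaction_type = c("AAA", "BBB", "AAA"),
  max_wt_per_month = c(60, 40, 50)
)

# apply() calls as.matrix(), which format()s numeric columns to a common width
m <- as.matrix(d)
m[, "month"]

# The question's approach: vec[1] is " 1", which never equals 1 coerced to "1",
# so the denominator is 0 for those rows
bad <- apply(d, 1, function(vec) {
  100 * as.numeric(vec[3]) / sum(ifelse(d$month == vec[1], d$max_wt_per_month, 0))
})
bad

# Working on the numeric columns directly avoids the character conversion
good <- 100 * d$max_wt_per_month / ave(d$max_wt_per_month, d$month, FUN = sum)
good
```

The grouped summarise() and join in the answer avoid the problem for the same reason: they never turn the numbers into text.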
