Using ggalluvial with nodes holding different values - r

My data is a set of activities completed by persons, and the sequence of activities varies from person to person. The data below show the activity at each step (Step1, Step2, etc.). I'd like an alluvial plot that labels the activities at each step (each step being a different node: 1, 2, 3...). What is the best approach? Here's what I have so far:
df<-structure(list(acts_activity_id = c("9928131", "445661", "686203", "687868", "688564"), Step1 = c("Unable to Reach", "Unable to Reach",
"Search Correspondence", "Unable to Reach", "Unable to Reach"), Step2 = c("Match Request", NA, "Connection Made", NA, "Match Request"
), Step3 = c("Support Group Request", NA, "Connection Contact Attempt", NA, "Support Group Request"),Step4 = c("Information Provided",
NA, "Not Available to Support", NA, "Information Provided"),
Step5 = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_)), class = c("grouped_df", "tbl_df", "tbl",
"data.frame"),
row.names = c(NA, -5L),
groups = structure(list(acts_activity_id = c("9928131", "445661", "686203", "687868", "688564"), .rows = structure(list(1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), .drop = TRUE))
library(dplyr)
library(ggplot2)
library(ggalluvial)
df %>%
ggplot(
aes(
axis1=Step1, #each step has different values; individuals go thru different sequence of steps
axis2=Step2, axis3=Step3, axis4=Step4, axis5=Step5 ))+
geom_flow()+
geom_stratum()+
labs(title="Activity Sequence")

If you have your data in this order (each column is a set of different activities), then use ggsankey:
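library(dplyr)
library(ggplot2)
library(ggsankey)  # installed from GitHub (davidsjoberg/ggsankey)
# Drop the grouping first so that removing the id column leaves a plain tibble
df <- dplyr::ungroup(df)
df$acts_activity_id <- NULL
x <- df %>% ggsankey::make_long(Step1, Step2, Step3, Step4, Step5)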
ggplot(x, aes(x = x, next_x = next_x,
node = node, next_node = next_node,
fill = factor(node), label = node)) +
geom_sankey(flow.alpha = 0.6, node.color = "gray30") +
geom_sankey_label(size = 3, color = "white", fill = "gray40") +
scale_fill_viridis_d() +
theme_sankey(base_size = 18) +
labs(x = NULL) +
theme(legend.position = "none",
plot.title = element_text(hjust = .5))
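To make the reshaping step less opaque, here is a rough sketch of the long layout that the plot above consumes (the x, node, next_x and next_node columns mapped in aes()). It is an emulation with dplyr and tidyr on hypothetical toy data, not ggsankey's actual implementation, and the id column is an assumption added only to keep each person's steps together.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per person, one column per step
steps <- tibble(
  Step1 = c("Unable to Reach", "Search Correspondence"),
  Step2 = c("Match Request", "Connection Made"),
  Step3 = c("Support Group Request", NA)
)

long <- steps %>%
  mutate(id = row_number()) %>%                       # assumed helper id, one per person
  pivot_longer(-id, names_to = "x", values_to = "node") %>%
  group_by(id) %>%
  mutate(next_x = lead(x),                            # the step that follows within a person
         next_node = lead(node)) %>%                  # the activity at that following step
  ungroup()

print(long)
```

Each row pairs an activity with the one that follows it for the same person, which is why columns holding completely different sets of values per step are not a problem in this layout.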

geom_bar(position="dodge") and errorbar not dodging properly

I've run into a small issue. Here's my code:
ggplot(GFAPdata_numb, aes(x=Level, y=Pos.Area, fill=Statut))+
geom_bar(stat="identity", color="black", position = "dodge")+
geom_errorbar(aes(ymin=lower, ymax=higher), width=.2, position=position_dodge(.9))
For some unknown reason, my plot looks like this: [image: weird dodge]
I don't know why! The dodge seems to have worked somehow, but it looks like a "ghost" of the data is still stacked and interfering with my error bars.
Does anyone have an idea what's causing that?
Edit: I was asked to share some data with dput, so here it is (this is my first time using this function, so I'm not sure I did it right):
> dput(head(GFAPdata_numb))
structure(list(Agneau = c(1L, 1L, 1L, 1L, 2L, 2L), Statut = c("terme",
"terme", "terme", "terme", "terme", "terme"), Area = c(6.53,
6.53, 6.53, 6.53, 4.93, 4.93), Level = c("Weak", "Pos", "Strong",
"Neg", "Weak", "Pos"), Values = c(6744015L, 5076648L, 787615L,
13099676L, 5356151L, 3978924L), Positivity = c(0.262331844844596,
0.197473824638087, 0.0306370160768142, 0.509557314440504, 0.275961978086681,
0.205003880155091), Pos.Area = c(0.0401733299915154, 0.0302410144928157,
0.00469173293672499, 0.0780332793936453, 0.0559760604638299,
0.041582937151134), moyenne = c(0.0382848392036753, 0.0382848392036753,
0.0382848392036753, 0.0382848392036753, 0.050709939148073, 0.050709939148073
), ecart.type = c(0.0304231534615388, 0.0304231534615388, 0.0304231534615388,
0.0304231534615388, 0.0391149666608345, 0.0391149666608345),
SEM = c(0.0152115767307694, 0.0152115767307694, 0.0152115767307694,
0.0152115767307694, 0.0195574833304173, 0.0195574833304173
), lower = c(0.00847014881136729, 0.00847014881136729, 0.00847014881136729,
0.00847014881136729, 0.0123772718204552, 0.0123772718204552
), higher = c(0.0680995295959834, 0.0680995295959834, 0.0680995295959834,
0.0680995295959834, 0.0890426064756909, 0.0890426064756909
)), row.names = c(NA, -6L), groups = structure(list(Agneau = 1:2,
.rows = structure(list(1:4, 5:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
OK, I think I managed to fix the issue myself after a couple of days away from it to clear my head.
I was using "Pos.Area" as my y-value (that is, the value for each individual sample) instead of the mean of Pos.Area. I guess that's why my error bars looked so wild: the bars were drawn once per sample value of Pos.Area, while the error bars came from the per-group summary.
Once I changed that, the plot was much better.
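The fix described above amounts to summarising the data to one row per bar before plotting. A minimal sketch on hypothetical toy data follows; the second Statut level ("premature") and the random values are made up for illustration, and the 1.96 multiplier is an assumption chosen because the lower/higher columns in the dput above appear to be moyenne ± 1.96 × SEM.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: four samples per Level/Statut combination
set.seed(1)
toy <- expand.grid(Level = c("Neg", "Weak", "Pos", "Strong"),
                   Statut = c("terme", "premature"),   # "premature" is invented
                   rep = 1:4,
                   stringsAsFactors = FALSE)
toy$Pos.Area <- runif(nrow(toy), 0, 0.1)

# One row per bar: mean and interval per Level/Statut
summ <- toy %>%
  group_by(Level, Statut) %>%
  summarise(moyenne = mean(Pos.Area),
            SEM = sd(Pos.Area) / sqrt(n()),
            .groups = "drop") %>%
  mutate(lower = moyenne - 1.96 * SEM,
         higher = moyenne + 1.96 * SEM)

# Bars and error bars share the same data and the same dodge width
p <- ggplot(summ, aes(x = Level, y = moyenne, fill = Statut)) +
  geom_col(color = "black", position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = lower, ymax = higher),
                width = 0.2, position = position_dodge(0.9))
```

Because every Level/Statut pair now has exactly one row, each bar gets exactly one error bar and nothing is drawn on top of anything else.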

Using segment labels in ggplot with ggrepel with smooth segments

This is my dataframe:
df<-structure(list(year = c(1984, 1984), team = c("Australia", "Brazil"
), continent = c("Oceania", "Americas"), medal = structure(c(3L,
3L), .Label = c("Bronze", "Silver", "Gold"), class = "factor"),
n = c(84L, 12L)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
And this is my ggplot (my question concerns the annotation for the Brazil label):
ggplot(data = df)+
geom_point(aes(x = year, y = n)) +
geom_text_repel(aes(x = year, y = n, label = team),
size = 3, color = 'black',
seed = 10,
nudge_x = -.029,
nudge_y = 35,
segment.size = .65,
segment.curvature = -1,
segment.angle = 178.975,
segment.ncp = 1)+
coord_flip()
So, I have a segment divided into two parts, and both parts have small breaks in them. How can I avoid them?
I already tried using segment.ncp and changing nudge_x or nudge_y, but it's not working.
Any help?
Not really sure what is going on here. This is the best I could generate by experimenting with variations to the input values for segment... arguments.
There is some guidance at: https://ggrepel.slowkow.com/articles/examples.html which has an example with shorter leader lines, maybe that's an approach you could use.
df<-structure(list(year = c(1984, 1984), team = c("Australia", "Brazil"
), continent = c("Oceania", "Americas"), medal = structure(c(3L,
3L), .Label = c("Bronze", "Silver", "Gold"), class = "factor"),
n = c(84L, 12L)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
library(ggplot2)
library(ggrepel)
ggplot(data = df)+
geom_point(aes(x = year, y = n)) +
geom_text_repel(aes(x = year, y = n, label = team),
size = 3, color = 'black',
seed = 1,
nudge_x = -0.029,
nudge_y = 35,
segment.size = 0.5,
segment.curvature = -0.0000002,
segment.angle = 1,
segment.ncp = 1000)+
coord_flip()
Created on 2021-08-26 by the reprex package (v2.0.0)

make subway graph include 102 topics in ggplot2 r

This is a follow-up to "subway-style graph for word frequency across three datasets in ggplot2".
I used the code from the answer to that question, but I'm struggling with how best to adjust the graph so that it fits 100 unique dict entries without completely messing up the dict word labels on the margins.
I have tested out different amounts of words to feed into the subway graph, and found that it cannot contain more than 25 words.
I have data:
df <- structure(list(dict = c("apple", "apple", "apple",
"mandarin", "mandarin", "mandarin", "orange", "orange", "orange", "pear"),
name = c("freq_ongov", "freq_onindiv", "freq_onmedia", "freq_ongov",
"freq_onindiv", "freq_onmedia", "freq_ongov", "freq_onindiv",
"freq_onmedia", "freq_ongov"), value = c(0, 87, 63, 0, 44,
20, 3, 27, 25, 0), rank = c(26, 85, 70, 26, 61, 42.5, 86,
47, 48, 26)), row.names = c(NA, -10L), groups = structure(list(
name = c("freq_ongov", "freq_onindiv", "freq_onmedia"), .rows = structure(list(
c(1L, 4L, 7L, 10L), c(2L, 5L, 8L), c(3L, 6L, 9L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
The real data has 100 rows, which I want to include using the following code:
leftlabels <- df$dict[df$name == "freq_ongov"]
leftlabels <- leftlabels[order(df$rank[df$name == "freq_ongov"])]
rightlabels <- df$dict[df$name == "freq_onmedia"]
rightlabels <- rightlabels[order(df$rank[df$name == "freq_onmedia"])]
ggplot(df, aes(name, rank, color = dict, group = dict)) +
geom_line(size = 4) +
geom_point(shape = 21, fill = "white", size = 4) +
scale_y_continuous(breaks = seq(max(df$rank)), labels = leftlabels,
sec.axis = sec_axis(~., breaks = seq(max(df$rank)),
labels = rightlabels)) +
scale_x_discrete(expand = c(0.01, 0)) +
guides(color = guide_none()) +
coord_cartesian(clip = "off") +
theme(axis.ticks.length.y = unit(0, "points"))
I tried changing the y.int and width of the y axis to fit in 100 words, but that only makes the y-axis longer, without changing the spacing between each word label on the y-axis, so all the words get squeezed together. Any suggestions?

igraph arrow.mode seems to have no effect

I have a network with some directed and some undirected edges. I'm trying to plot it with igraph using the arrow.mode parameter, but the graph always shows arrows with the default parameters. Here's an example.
Here are some data:
spearRhoP_lagged4 <- structure(list(Var1 = c("ARISA_538.9", "ARISA_538.9", "ARISA_666.4",
"ARISA_686.9", "ARISA_538.9", "ARISA_594.1"), Var2 = c("ARISA_666.4",
"ARISA_686.9", "ARISA_686.9", "ARISA_666.4", "ARISA_561.8", "ARISA_561.8"
), rho = c(0.280885191364122, 0.415365287156247, 0.614493076574831,
0.312630564055403, 0.295296877306726, 0.381890811408216), p = c(0.00206314544835896,
2.9098006351119e-06, 1.35005674822095e-13, 0.000567475872663549,
0.00116911931220592, 1.98010880043619e-05), delay = c(0, 0, 0,
1, 0, 0), fdr = c(0.0135393920048557, 7.97032347878478e-05, 2.83511917126399e-11,
0.00503534929264839, 0.00898225813036257, 0.000366902513022),
arrow = c("-", "-", "-", ">", "-", "-")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
Var1 = c("ARISA_538.9", "ARISA_538.9", "ARISA_538.9", "ARISA_594.1",
"ARISA_666.4", "ARISA_686.9"), Var2 = c("ARISA_561.8", "ARISA_666.4",
"ARISA_686.9", "ARISA_561.8", "ARISA_686.9", "ARISA_666.4"
), .rows = list(5L, 1L, 2L, 6L, 3L, 4L)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
Then I build the graph:
LaggedSpearGraph <- graph_from_data_frame(spearRhoP_lagged4)
Lastly, I plot the graph, telling it that I want the arrow direction to be specified by the arrow column:
plot(LaggedSpearGraph,
vertex.size=2,
arrow.mode = E(LaggedSpearGraph)$arrow)
I get an output that looks like this.
But what I want is a network where there is only one edge with an arrow on it.
Any suggestions?
You need to add edge. as a prefix to the parameter name when passing it to plot():
library(igraph)
LaggedSpearGraph <- graph_from_data_frame(spearRhoP_lagged4, directed = TRUE)
plot(LaggedSpearGraph,
vertex.size=10,
edge.arrow.mode = E(LaggedSpearGraph)$arrow)
See here:
https://github.com/igraph/igraph/issues/954
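A small self-contained sketch of the same idea, on a hypothetical three-edge graph (the vertex names A, B, C are made up). As far as I know, edge.arrow.mode accepts either character codes such as "-" and ">" or the integers 0 (no arrow) and 2 (forward arrow), so the second plot call shows the integer form as an alternative.

```r
library(igraph)

# Hypothetical edge list: only the A -> C edge should carry an arrow
edges <- data.frame(from  = c("A", "A", "B"),
                    to    = c("B", "C", "C"),
                    arrow = c("-", ">", "-"),
                    stringsAsFactors = FALSE)

# Extra columns of the data frame become edge attributes
g <- graph_from_data_frame(edges, directed = TRUE)

# Character codes: "-" draws no arrow, ">" draws a forward arrow
plot(g, vertex.size = 20, edge.arrow.mode = E(g)$arrow)

# Equivalent integer codes: 0 = no arrow, 2 = forward arrow
arrow_codes <- ifelse(E(g)$arrow == ">", 2, 0)
plot(g, vertex.size = 20, edge.arrow.mode = arrow_codes)
```

The graph itself stays directed; only the drawing of arrowheads is switched off per edge, which is why the parameter needs the edge. prefix when passed to plot().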

Create new column with percentages in data frame

I have the following dataframe:
dput(df1)
structure(list(month = c(1, 1, 2, 2, 3, 4), transaction_type = c("AAA",
"BBB", "BBB", "CCC",
"DDD", "AAA"), max_wt_per_month = c(54.9,
51.6833333333333, 52.3333333333333, 49.4666666666667, 49.85,
48.5833333333333), min_wt_per_month = c(0, 0, 0, 0, 0, 0), avg_wt_per_month = c(8.41701333107861,
7.65211141060198, 6.44184012508551, 7.74798927613941, 7.4360566888844,
7.50611319574734), prop = c(Inf, Inf, Inf, Inf, Inf, Inf)), .Names = c("month",
"transaction_type", "max_wt_per_month", "min_wt_per_month", "avg_wt_per_month",
"prop"), row.names = c(NA, -6L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = list(month), drop = TRUE, indices = list(
0:5), group_sizes = 6L, biggest_group_size = 6L, labels = structure(list(
month = 1), row.names = c(NA, -1L), class = "data.frame", vars = list(
month), drop = TRUE, .Names = "month"))
I want to create a column prop containing each row's maximum waiting time as a percentage of the total for its month. If I run this code, I get Inf values in most of the rows (this is especially evident in the real dataset):
my_fun=function(vec){
100*as.numeric(vec[3]) /
sum(with(data_merged_transactions, ifelse(month == vec[1], max_wt_per_month, 0))) }
data_merged_transactions$prop=apply(data_merged_transactions , 1 , my_fun)
Finally, I need to create a filled area chart in which each area is a percentage out of 100%:
ggplot(data_merged_transactions, aes(x=month, y=prop, fill=transaction_type)) +
geom_area(alpha=0.6 , size=1, colour="black")
Why do I get Inf if the sum is not equal to 0?
Moreover, is it possible to create the filled area chart with months as factors (Jan, Feb, etc.) rather than numbers? I tried replacing the month ids with month names, but then I got very thin bars instead of a filled area.
Is this what you were looking for?
library(tidyverse)
df1_tidy <- df1 %>%
group_by(month) %>%
summarise(SUM = sum(max_wt_per_month)) %>%
full_join(df1) %>%
mutate(prop = max_wt_per_month / SUM)
ggplot(data = df1_tidy,
aes(x = month,
y = prop,
fill = transaction_type)) +
geom_area(alpha = 0.6,
size = 1,
colour = "black") +
scale_x_continuous(breaks = 1:4,  # explicit breaks so the four labels line up with months 1-4
labels = c("Jan", "Feb", "Mar", "Apr"))
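The same per-month share can also be computed without a join, using a grouped mutate. The sketch below uses a hypothetical toy data frame with rounded values, multiplies by 100 to get percentages (the answer above leaves prop as a fraction), and adds an assumed month_label factor to illustrate the month-name variant. Setting group = transaction_type is my guess at what keeps the areas connected when x is discrete; I have not checked it against the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data shaped like df1 (values rounded for readability)
toy <- data.frame(
  month = c(1, 1, 2, 2, 3, 4),
  transaction_type = c("AAA", "BBB", "BBB", "CCC", "DDD", "AAA"),
  max_wt_per_month = c(54.9, 51.7, 52.3, 49.5, 49.85, 48.6)
)

toy_prop <- toy %>%
  group_by(month) %>%
  mutate(prop = 100 * max_wt_per_month / sum(max_wt_per_month)) %>%  # share within month
  ungroup() %>%
  mutate(month_label = factor(month.abb[month], levels = month.abb[1:4]))

# With a discrete x, an explicit group keeps each transaction type as one area
p <- ggplot(toy_prop, aes(x = month_label, y = prop,
                          fill = transaction_type, group = transaction_type)) +
  geom_area(alpha = 0.6, colour = "black")
```

Within each month the prop values add up to 100, so a month with a single transaction type gets a prop of exactly 100.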
