How to assign levels to nodes in multi-level sankey diagram? - r

I am trying to build an interactive multi-level sankey diagram in R, but I can't find a way to assign levels to nodes. For example, the a1 node should be on the second level of the chart, not the fifth. It seems that the package places the last node in each chain at the rightmost position, which is not what I want.
I tried different packages, such as echarts4r, networkD3 and googleVis, but none of them seems to provide a way to manage levels in the graph.
If you know how to solve this issue, please share.
library('networkD3')
library('echarts4r')
library('googleVis')
sankey <- data.frame(
source = c("a", "a", "b", "c", "d", "c"),
target = c("b", "a1", "c", "d", "e", "e"),
value = ceiling(rnorm(6, 10, 1)),
stringsAsFactors = FALSE
)
# googleVis solution
plot(gvisSankey(sankey, from = 'source', to = 'target', weight = 'value'))
# echarts4r solution
sankey %>%
e_charts() %>%
e_sankey(source, target, value, focusNodeAdjacency = 'allEdges')
# networkD3 solution
nodes <- data.frame(name = c("a", "a1", "b", "c", "d", "e"))
links <- data.frame(
source = c(0, 0, 2, 3, 4, 3),
target = c(2, 1, 3, 4, 5, 5),
value = ceiling(rnorm(6, 10, 1))
)
sankeyNetwork(Links = links,
Nodes = nodes,
Source = "source",
Target = "target", Value = "value", NodeID = "name",
fontSize = 12, nodeWidth = 30, sinksRight = TRUE)

With networkD3, change sinksRight = TRUE to sinksRight = FALSE so that sink nodes such as a1 are no longer pushed to the rightmost column:
library('networkD3')
# nodes and links as defined in the question (zero-based indices into nodes)
nodes <- data.frame(name = c("a", "a1", "b", "c", "d", "e"))
links <- data.frame(
source = c(0, 0, 2, 3, 4, 3),
target = c(2, 1, 3, 4, 5, 5),
value = ceiling(rnorm(6, 10, 1))
)
sankeyNetwork(Links = links,
Nodes = nodes,
Source = "source",
Target = "target", Value = "value", NodeID = "name",
fontSize = 12, nodeWidth = 30, sinksRight = FALSE)
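To see why a1 belongs in the second column once sinks are no longer moved right, it can help to compute each node's depth by hand. The following is only a minimal base-R sketch of longest-path layering, assuming an acyclic graph; it illustrates the idea and is not networkD3's actual layout code.

```r
# Links from the question, by node name.
links <- data.frame(
  source = c("a", "a", "b", "c", "d", "c"),
  target = c("b", "a1", "c", "d", "e", "e"),
  stringsAsFactors = FALSE
)

node_names <- unique(c(links$source, links$target))
depth <- setNames(rep(0L, length(node_names)), node_names)

# Relax repeatedly: every target sits at least one column to the right of
# its source. length(node_names) passes are enough for an acyclic graph.
for (pass in seq_along(node_names)) {
  for (k in seq_len(nrow(links))) {
    src <- links$source[k]
    tgt <- links$target[k]
    depth[tgt] <- max(depth[tgt], depth[src] + 1L)
  }
}

print(depth)
```

Here a1 gets depth 1 (the second column) and e gets depth 4 (the fifth). With sinksRight = TRUE, nodes without outgoing links, such as a1, are moved to the last column instead.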

Related

R ggplot legend with Waffle chart

library(tidyverse)
library(waffle)
df_2 <- structure(list(group = c(2, 2, 2, 1, 1, 1),
parts = c("A", "B", "C", "A", "B", "C"),
values = c(1, 39, 60, 14, 15, 71)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
df_2 %>% ggplot(aes(label = parts)) +
geom_pictogram(
n_rows = 10, aes(color = parts, values = values),
family = "fontawesome-webfont",
flip = TRUE
) +
scale_label_pictogram(
name = "Case",
values = c("male"),
breaks = c("A", "B", "C"),
labels = c("A", "B", "C")
) +
scale_color_manual(
name = "Case",
values = c("A" = "red", "B" = "green", "C" = "grey85"),
breaks = c("A", "B", "C"),
labels = c("A", "B", "C")
) +
facet_grid(~group)
With the above code, I got the legend I expected.
However, when I replaced df_2 with the following df_1 data frame, I was unable to combine the two legends.
df_1 <- structure(list(group = c(2, 2, 2, 1, 1, 1),
parts = c("A", "B", "C", "A", "B", "C"),
values = c(0, 0, 100, 0, 0, 100)),
row.names = c(NA,-6L), class = c("tbl_df", "tbl", "data.frame"))
I kind of know the cause of the problem (0 values) but I would like to keep the legend the same as the graph above. Any suggestions would be appreciated.
To make it clear, the package "waffle" referred to here is not the CRAN package "waffle", but the GitHub-only package:
remotes::install_github("hrbrmstr/waffle")
library(waffle)
You will also need a way of displaying the pictograms, such as:
library(emojifont)
load.fontawesome()
Now, as with any other discrete scale, if you want to add values that are not present in the (post-stat) data, you need to use the limits argument:
df_1 %>% ggplot(aes(label = parts)) +
geom_pictogram(
n_rows = 10, aes(color = parts, values = values),
family = "fontawesome-webfont",
flip = TRUE
) +
scale_label_pictogram(
name = "Case",
values = c("male"),
breaks = c("A", "B", "C"),
labels = c("A", "B", "C"),
limits = c("A", "B", "C")
) +
scale_color_manual(
name = "Case",
values = c("A" = "red", "B" = "green", "C" = "grey85"),
breaks = c("A", "B", "C"),
labels = c("A", "B", "C")
) +
facet_grid(~group)
It is a bit of a hack, but you could add 1 to every value so that all three parts are drawn and the legend appears as before, then use ggplot_build() to remove the extra rows from each panel so that the counts are correct again:
library(tidyverse) # attaches ggplot2 and dplyr
library(waffle)
library(emojifont)
library(extrafont)
p <- df_1 %>% ggplot(aes(label = parts)) +
geom_pictogram(
n_rows = 10, aes(color = parts, values = values+1),
family = "fontawesome-webfont",
flip = TRUE
) +
scale_label_pictogram(
name = "Case",
values = c("male"),
breaks = c("A", "B", "C"),
labels = c("A", "B", "C")
) +
scale_color_manual(
name = "Case",
values = c("A" = "red", "B" = "green", "C" = "grey85"),
breaks = c("A", "B", "C"),
labels = c("A", "B", "C")
) +
facet_grid(~group)
q <- ggplot_build(p)
q$data[[1]] <- q$data[[1]] %>%
group_by(PANEL) %>%
slice(4:n())
q <- ggplot_gtable(q)
plot(q)
Created on 2022-10-20 with reprex v2.0.2

Sankey Diagram with networkD3 package maximises CPU usage and does not show the diagram

Most of the time when I try to plot a Sankey diagram in R, my CPU usage maxes out and the diagram does not show. Does anyone know what could be causing this? My computer is decent, with an i7-7600U CPU @ 2.80GHz 2.90GHz. To bring the CPU usage back down, I have had to restart R.
For example, I was using this example code:
library("networkD3")
myDf <- list(
nodes=data.frame(name=c( "A", "B", "C", "D", "E",
"V", "W", "X", "Y", "Z")),
links=data.frame(source=as.integer(c(0, 1, 2, 3, 3, 4, 4)),
target=as.integer(c(7, 6, 7, 8, 7, 5, 9)),
value = c(1, 4, 1, 5, 1, 5, 3)
)
)
sankeyNetwork(Links = myDf$links, Nodes = myDf$nodes, Source = "source",
Target = "target", Value = "value", NodeID = "name",
units = "TWh", fontSize = 25, nodeWidth = 30,
fontFamily = "sans-serif", iterations = 0)
This could be an issue with your package or RStudio configuration.
Try saving the sankeyNetwork as an HTML file using the saveNetwork() command from the same package (networkD3) and opening that.
e.g.
library("networkD3")
myDf <- list(
nodes=data.frame(name=c( "A", "B", "C", "D", "E",
"V", "W", "X", "Y", "Z")),
links=data.frame(source=as.integer(c(0, 1, 2, 3, 3, 4, 4)),
target=as.integer(c(7, 6, 7, 8, 7, 5, 9)),
value = c(1, 4, 1, 5, 1, 5, 3)
)
)
sk <- sankeyNetwork(Links = myDf$links, Nodes = myDf$nodes, Source = "source",
Target = "target", Value = "value", NodeID = "name",
units = "TWh", fontSize = 25, nodeWidth = 30,
fontFamily = "sans-serif", iterations = 0)
saveNetwork(sk,file = "sk.html")

Most elegant way to convert lists into igraph object for plotting

I am new to igraph and it seems to be a very powerful (and therefore also complex) package.
I tried to convert the following lists into an igraph object.
graph <- list(s = c("a", "b"),
a = c("s", "b", "c", "d"),
b = c("s", "a", "c", "d"),
c = c("a", "b", "d", "e", "f"),
d = c("a", "b", "c", "e", "f"),
e = c("c", "d", "f", "z"),
f = c("c", "d", "e", "z"),
z = c("e", "f"))
weights <- list(s = c(3, 5),
a = c(3, 1, 10, 11),
b = c(5, 3, 2, 3),
c = c(10, 2, 3, 7, 12),
d = c(15, 7, 2, 11, 2),
e = c(7, 11, 3, 2),
f = c(12, 2, 3, 2),
z = c(2, 2))
Interpretation is as follows: s is the starting node, it links to nodes a and b. The edges are weighted 3 for s to a and 5 for s to b and so on.
I tried all kinds of functions from igraph but only got all kinds of errors. What is the most elegant and easy way to convert the above into an igraph object for plotting the graph?
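Create an edge list, build a graph from it, then assign the weights and plot it. Note that stack() returns the values (the targets) in the first column and the list names (the sources) in the second, so the columns are swapped here to keep each edge pointing from source to target and matched with its weight.
library(igraph)
set.seed(123)
e <- as.matrix(stack(graph)[, c("ind", "values")])
g <- graph_from_edgelist(e)
E(g)$weight <- stack(weights)[[1]]
plot(g, edge.label = E(g)$weight)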
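An alternative sketch keeps each weight on the same row as its edge by building a data frame first. The small graph below is a shortened, made-up subset of the question's lists, and the igraph step is guarded so the base-R part runs even if igraph is not installed.

```r
# Shortened example lists in the same shape as the question's data.
graph <- list(s = c("a", "b"), a = c("s", "b"), b = c("s", "a"))
weights <- list(s = c(3, 5), a = c(3, 1), b = c(5, 3))

# One row per edge: the list name is the source, each element is a target.
edges <- data.frame(
  from = rep(names(graph), lengths(graph)),
  to = unlist(graph, use.names = FALSE),
  weight = unlist(weights, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Extra columns of the data frame become edge attributes in igraph.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = TRUE)
  plot(g, edge.label = igraph::E(g)$weight)
}
```

Because source, target and weight share a row, it is easy to inspect edges before plotting and confirm that, for example, the edge from a to b carries weight 1 while the edge from b to a carries weight 3.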

Removing "unused" nodes in sankey network

I am trying to build a sankey network.
This is my data and code:
library(networkD3)
nodes <- data.frame(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "D", "E", "N", "O", "P", "Q", "R"))
names(nodes) <- "name"
nodes$name = as.character(nodes$name)
links <- data.frame(matrix(
c(0, 2, 318.167,
0, 3, 73.85,
0, 4, 51.1262,
0, 5, 6.83333,
0, 6, 5.68571,
0, 7, 27.4167,
0, 8, 4.16667,
0, 9, 27.7381,
1, 10, 627.015,
1, 3, 884.428,
1, 4, 364.211,
1, 13, 12.33333,
1, 14, 9,
1, 15, 37.2833,
1, 16, 9.6,
1, 17, 30.5485), nrow=16, ncol=3, byrow = TRUE))
colnames(links) <- c("source", "target", "value")
links$source = as.integer(links$source)
links$target = as.integer(links$target)
links$value = as.numeric(links$value)
sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
Target = "target", Value = "value", NodeID = "name",
fontSize = 12, fontFamily = 'Arial', nodeWidth = 20)
The problem is that the only targets A and B have in common are D and E.
Although the links are displayed correctly, D and E also appear as separate nodes at the bottom right.
How can I avoid this?
Note: If I specify
nodes <- data.frame(c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "N", "O", "P", "Q", "R"))
no network at all is created.
Node names must be unique; see the example below. I removed the repeated nodes "D" and "E", and then removed the links that reference nodes that no longer exist. That leaves only 16 nodes, indexed from zero as 0:15, whereas the last two rows of your links data frame reference 16 and 17.
Or, as @CJYetman (networkD3 author) comments:
Another way to say it... every node that is in the nodes data frame will be plotted, even if it has the same name as another node, because the index is technically the unique id.
library(networkD3)
nodes <- data.frame(name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "N", "O", "P", "Q", "R"),
ix = 0:15)
links <- data.frame(matrix(
c(0, 2, 318.167,
0, 3, 73.85,
0, 4, 51.1262,
0, 5, 6.83333,
0, 6, 5.68571,
0, 7, 27.4167,
0, 8, 4.16667,
0, 9, 27.7381,
1, 10, 627.015,
1, 3, 884.428,
1, 4, 364.211,
1, 13, 12.33333,
1, 14, 9,
1, 15, 37.2833), nrow=14, ncol=3, byrow = TRUE))
colnames(links) <- c("source", "target", "value")
sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
Target = "target", Value = "value", NodeID = "name",
fontSize = 12, fontFamily = 'Arial', nodeWidth = 20)
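One thing to watch when removing rows from the nodes data frame: every later node's zero-based index shifts (in the question's data, "N" moves from 13 to 11), so link indices generally need to be remapped rather than only trimmed. Below is a minimal base-R sketch of such a remapping; dedupe_nodes is a hypothetical helper written for illustration, not a networkD3 function, and the small data set is made up.

```r
# Hypothetical helper (not part of networkD3): collapse duplicate node
# names and re-point zero-based link indices at the surviving rows.
dedupe_nodes <- function(nodes, links) {
  unique_names <- unique(nodes$name)
  # For each old row, the zero-based index of the first node with that name.
  remap <- match(nodes$name, unique_names) - 1L
  links$source <- remap[links$source + 1L]
  links$target <- remap[links$target + 1L]
  list(
    nodes = data.frame(name = unique_names, stringsAsFactors = FALSE),
    links = links
  )
}

# Small made-up example: "D" appears twice (old indices 3 and 5).
nodes <- data.frame(
  name = c("A", "B", "C", "D", "E", "D", "N"),
  stringsAsFactors = FALSE
)
links <- data.frame(
  source = c(0L, 1L, 1L),
  target = c(3L, 5L, 6L),
  value = c(1, 2, 3)
)

fixed <- dedupe_nodes(nodes, links)
print(fixed$links)
```

The result can be passed on as sankeyNetwork(Links = fixed$links, Nodes = fixed$nodes, ...); both links into "D" now point at index 3, and "N" is found at its new index 5.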

Render multiple transition plots on one page (Gmisc)

I wonder if there is a way to arrange multiple of the nice transition plots of the Gmisc package on one page (e.g. two next to each other or two-by-two)? I tried various common approaches (e.g. par(mfrow = c(2,2)) and grid.arrange()) but was not successful thus far. I would appreciate any help. Thanks!
library(Gmisc)
data.1 <- data.frame(source = c("A", "A", "A", "B", "B", "C", "C"),
target = c("A", "B", "C", "B", "C", "C", "C"))
data.2 <- data.frame(source = c("D", "D", "E", "E", "E", "E", "F"),
target = c("D", "E", "D", "E", "F", "F", "F"))
transitions.1 <- getRefClass("Transition")$new(table(data.1$source, data.1$target), label = c("Before", "After"))
transitions.2 <- getRefClass("Transition")$new(table(data.2$source, data.2$target), label = c("Before", "After"))
# wish to render transition 1 and transition 2 next to each other
transitions.1$render()
transitions.2$render()
This was actually a bug prior to the 1.9 version (uploading to CRAN when writing this, available now from GitHub). What you need to do is use the grid::viewport system:
library(grid)
grid.newpage()
pushViewport(viewport(name = "basevp", layout = grid.layout(nrow=1, ncol=2)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
transitions.1$render(new_page = FALSE)
popViewport()
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
transitions.2$render(new_page = FALSE)
