Isolating a "branch" in a sankey diagram using networkd3 - r

I am using sankeyNetwork() from the networkD3 package for visualizing some data. I was wondering if theres a way to "isolate" a branch from start to finish, ignoring the irrelevant links.
Example: I've got this: SankeyGot
And I want to extract this: SankeyWant
reproducible example:
set.seed(9)
df <- tibble(
source = sample(stringr::words, 5) %>% rep(2),
target = c(sample(words, 7), source[1:3]),
values = rnorm(10, 10, 7) %>% round(0) %>% abs)
nodes <- data.frame(names = unique(c(df$source, df$target)))
links <- tibble(
source = match(
df$source, nodes$names) -1,
target = match(
df$target, nodes$names) -1,
value = df$values
)
sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
Target = "target", Value = "value", NodeID = "names",
iterations = 64, sinksRight = F, fontSize = 14)
I'd like to be able to filter out "name" for example and get everything that connects to that on all levels upstream and downstream - how would i go about doing this?

Calculating the paths from a node in a graph is non-trivial, but the igraph package can help with the all_simple_paths(). However, heed that warning in the help file...
Note that potentially there are exponentially many paths between two
vertices of a graph, and you may run out of memory when using this
function, if your graph is lattice-like.
(I don't know what your words vector is, so I recreated the links data.frame manually)
library(dplyr)
library(networkD3)
set.seed(9)
df <- read.csv(header = TRUE, text = "
source,target
summer,obvious
summer,structure
however,either
however,match
obvious,about
obvious,non
either,contract
either,produce
contract,paint
contract,name
")
df$values <- rnorm(10, 10, 7) %>% round(0) %>% abs()
# use graph to calculate the paths from a node
library(igraph)
graph <- graph_from_data_frame(df)
start_node <- "name"
# get nodes along a uni-directional path going IN to the start_node
connected_nodes_in <-
all_simple_paths(graph, from = start_node, mode = "in") %>%
unlist() %>%
names() %>%
unique()
# get nodes along a uni-directional path going OUT of the start_node
connected_nodes_out <-
all_simple_paths(graph, from = start_node, mode = "out") %>%
unlist() %>%
names() %>%
unique()
# combine them
connected_nodes <- unique(c(connected_nodes_in, connected_nodes_out))
# filter your data frame so it only includes links/edges that start and
# end at connected nodes
df <- df %>% filter(source %in% connected_nodes & target %in% connected_nodes)
nodes <- data.frame(names = unique(c(df$source, df$target)))
links <- tibble(
source = match(
df$source, nodes$names) -1,
target = match(
df$target, nodes$names) -1,
value = df$values
)
sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
Target = "target", Value = "value", NodeID = "names",
iterations = 64, sinksRight = F, fontSize = 14)

If you code sankeyNetwork as an object you can use str(object) to identify it as a list, with a matrix called x that holds your input df
list_sankey <- sankeyNetwork(Links = links, Nodes = nodes, Source = "source", Target = "target", Value = "value", NodeID = "names", iterations = 64, sinksRight = F, fontSize = 14)
str(list_sankey)
You can then filter the x matrix to only contrain your desired input source and output target nodes
list_sankey_filter <- list_sankey
list_sankey_filter$x$links <- list_sankey_filter$x$links %>% filter(source %in% c(4, 2, 0), target %in% c(4, 2, 0, 10))
This then gives you the object below.

Related

Dataframe issue when double clicking on VisNetwork node to run a function

I have a network diagram with a fairly large amount of nodes (~600), each node having some data, including an ID and its name.
I want to be able to run a very simple function when double-clicking on a specific node.
For that purpose, I have followed the instructions from this thread.
Using the code provided:
library(shiny)
library(visNetwork)
ui <- fluidPage(
visNetworkOutput('network')
)
server <- function(input, output, session) {
getDiagramPlot <- function(nodes, edges){
v <- visNetwork(
nodes,
edges
) %>%
visPhysics(stabilization = TRUE, enabled = TRUE) %>%
visOptions(highlightNearest = list(enabled = T, degree = 1, hover = F), autoResize = TRUE, collapse = FALSE) %>%
visEdges(color = list(highlight = "red")) %>% # The colour of the edge linking nodes
visLayout(improvedLayout = TRUE) %>%
visEdges(arrows = edges$arrows) %>%
visInteraction(multiselect = F) %>%
visEvents(doubleClick = "function(nodes) {
Shiny.onInputChange('current_node_id', nodes.nodes);
;}")
return(v)
}
testFunction <- function(node_id){
print(paste("The selected node ID is:", node_id))
}
nodes <- data.frame(id = 1:3, label = 1:3)
edges <- data.frame(from = c(1,2), to = c(1,3))
output$network <- renderVisNetwork(
getDiagramPlot(nodes, edges)
)
observeEvent(input$current_node_id,{
testFunction(input$current_node_id)
})
}
shinyApp(ui, server)
The codes works well but when I replace the simple nodes and edges dataframe provided as example by my data (much larger network) then the code doesn't work anymore (nothing gets printed in the console when I double-click on any nodes).
Would anyone know why the code is not running with my data ?
Here is the adjustments that should be done to the code below:
load("NodesEdges.RData")
# nodes <- data.frame(id = 1:3, label = 1:3)
# edges <- data.frame(from = c(1,2), to = c(1,3))
Best wishes,
C.
I have tried:
adding more columns to the example nodes/edges (group, value, color, etc.) and the codes still runs well.
restricting my larger nodes/edges dataframes respectively to the "id", "label" and "from", "to" columns (same as example data) but the codes still fails.
I wonder whether the problem comes from the size of the dataframe.

how to give different color for the lines in the Sankey plot to show different groups?

I have a question on this Sankey plot in R. So basically I want to give different color for the line that connect the source and target nodes based on the variable group. Below are the codes I found from one of the R platform. Essentially the code give you the plot but the connecting line are similar in color. My question is how to give different color for the lines to know that specific group is represented in specific color.
Thank you!
Best
AD
# Libraries
library(tidyverse)
library(viridis)
library(patchwork)
library(hrbrthemes)
library(circlize)
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
# Package
library(networkD3)
# I need a long format
data_long <- data %>%
rownames_to_column %>%
gather(key = 'key', value = 'value', -rowname) %>%
filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
data_long$target <- paste(data_long$target, " ", sep="")
data_long$group <- c(rep("A", 10), rep("B",7), rep("C", 8), rep("D", 10))
# From these flows we need to create a node data frame: it lists every entities involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source), as.character(data_long$target)) %>% unique())
# With networkD3, connection must be provided using id, not using real name like in the links dataframe.. So we need to reformat it.
data_long$IDsource=match(data_long$source, nodes$name)-1
data_long$IDtarget=match(data_long$target, nodes$name)-1
# Make the Network
sankeyNetwork(Links = data_long, Nodes = nodes,
Source = "IDsource", Target = "IDtarget",
Value = "value", NodeID = "name",
sinksRight=FALSE, nodeWidth=40, fontSize=13, nodePadding=20)
Following the example from the networkD3::sankeyNetwork documentation you could add a links variable to the data and set the LinkGroup argument...
# Libraries
library(dplyr)
library(tidyr)
library(tibble)
library(networkD3)
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
data_long <-
data %>%
rownames_to_column() %>%
gather(key = 'key', value = 'value', -rowname) %>%
filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
data_long$target <- paste(data_long$target, " ", sep="")
data_long$group <- c(rep("A", 10), rep("B",7), rep("C", 8), rep("D", 10))
# From these flows we need to create a node data frame: it lists every entities involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source), as.character(data_long$target)) %>% unique())
# With networkD3, connection must be provided using id, not using real name like in the links dataframe.. So we need to reformat it.
data_long$IDsource=match(data_long$source, nodes$name)-1
data_long$IDtarget=match(data_long$target, nodes$name)-1
# Colour links
data_long$links$source <- sub(' .*', '',
data_long$nodes[data_long$links$source + 1, 'name'])
# Make the Network
sankeyNetwork(Links = data_long,
Nodes = nodes,
Source = "IDsource",
Target = "IDtarget",
Value = "value",
NodeID = "name",
sinksRight=FALSE,
nodeWidth=40,
fontSize=13,
nodePadding=20,
LinkGroup = 'source')
Created on 2021-12-02 by the reprex package (v2.0.1)

R: Display "popup" information when mouse hovers over (graph) visnetwork

I simulated some data and created a graph network in R using visnetwork:
library(igraph)
library(dplyr)
library(visNetwork)
#create file from which to sample from
x5 <- sample(1:100, 1100, replace=T)
#convert to data frame
x5 = as.data.frame(x5)
#create first file (take a random sample from the created file)
a = sample_n(x5, 1000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 1000)
#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")
#create graph
graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
plot(graph)
fc <- fastgreedy.community(graph)
V(graph)$community <- fc$membership
library(visNetwork)
nodes <- data.frame(id = V(graph)$name, title = V(graph)$name, group = V(graph)$community)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(graph, what="edges")[1:2]
#visnet graph
visNetwork(nodes, edges) %>% visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)
Right now, the graph only displays node information when you click on it. Suppose if each node had observed properties in the "original file". E.g.
#add some information corresponding to the original data
other_damages_in_dollars <- rnorm(1000,104,9)
location <- c("canada","usa")
location <- sample(location, 1000, replace=TRUE, prob=c(0.3, 0.7))
type_of_house <- c("single","townhome", "rental" )
type_of_house<- sample(type_of_house , 1000, replace=TRUE, prob=c(0.5, 0.3, 0.2))
#heres how the original data would have looked like
original_data = data.frame(a,b, other_damages_in_dollars, location, type_of_house)
Is there a way to add this information when you click on each node?
#visnet graph - is it possible to use the '$' operator to add these properties?
visNetwork(nodes, edges) %>% visIgraphLayout(layout = "layout_with_fr") %>%
%>% visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)visEvents(selectEdge = "function(properties) { alert(this.body.data.edges._data[properties.edges[0]].original_data$location); }") %>% visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)visEvents(selectEdge = "function(properties) { alert(this.body.data.edges._data[properties.edges[0]].original_data$type_of_house); }") %>% visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE)visEvents(selectEdge = "function(properties) { alert(this.body.data.edges._data[properties.edges[0]].original_data$other_damage_in_dollars); }")
You don't need an event. This is built into many of the vis.js elements.
So, I'll start with designing the content of my tooltip. Of the 1000 rows you made of location, home types, and costs, I created a subset with the same number of rows as there are nodes. This is what will be shown in my tooltip.
newTitle = paste0("Location: ", toupper(location[1:nrow(nodes)]),
"<br>Home Type: ", type_of_house[1:nrow(nodes)],
"<br>Damage Related Costs: ",
sprintf("$%.2f", other_damages_in_dollars[1:nrow(nodes)]))
#check it; looks okay
Now I'm going to make my tooltips the titles of my nodes.
# replace the node titles:
nodes$title = newTitle
Call the network and click anywhere on the graph once to activate it. Now you just have to hover....(note the blue box, that means it's listening). There are a lot of nodes really close together, so there will be a bit of delayed response when you move from node to node.
You can get rid of the need to click to activate with visOptions(clickToUse = F).
visNetwork(nodes, edges) %>% visIgraphLayout(layout = "layout_with_fr")
FYI
I didn't go through all of the code in the original question; there's a lot! I'm going to include what I ran before creating my graph, so you know what was in and what was not. This code is not changed from your question.
library(igraph)
library(dplyr)
library(visNetwork)
#create file from which to sample from
x5 <- sample(1:100, 1100, replace=T)
#convert to data frame
x5 = as.data.frame(x5)
#create first file (take a random sample from the created file)
a = sample_n(x5, 1000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 1000)
#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")
#create graph
graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
fc <- fastgreedy.community(graph)
V(graph)$community <- fc$membership
nodes <- data.frame(id = V(graph)$name, title = V(graph)$name, group = V(graph)$community)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(graph, what="edges")[1:2]
#add some information corresponding to the original data
other_damages_in_dollars <- rnorm(1000,104,9)
location <- c("canada","usa")
location <- sample(location, 1000, replace=TRUE, prob=c(0.3, 0.7))
type_of_house <- c("single","townhome", "rental" )
type_of_house<- sample(type_of_house , 1000, replace=TRUE, prob=c(0.5, 0.3, 0.2))
#heres how the original data would have looked like
original_data = data.frame(a,b, other_damages_in_dollars, location, type_of_house)

networkD3 and Shiny - filter by number of nodes

I have this shiny app that generates a network graph from a df.
library(shiny)
library(dplyr)
library(tibble)
library(networkD3)
ui <- fluidPage(
sidebarPanel(
fluidRow(selectInput("nos","Mínimo de orientações",c(1:10),selected=c(1)))
),
fluidRow(simpleNetworkOutput(
"redes", width = "100%", height = "800px"
))
)
server <- function(input, output, session) {
df_orientadores <- data.frame(orientador=c("Chet Baker","Bill Evans","Miles Davis","Miles Davis","Dizzy Gillespie","Miles Davis"),
autor=c("Clifford Brown","Freddie Hubbard","Kenny Dorham","Kenny Burrell","Arturo Sandoval","Goku"))
output$redes <- renderSimpleNetwork({
sources <- df_orientadores %>%
select(orientador) %>%
dplyr::rename(label = orientador)
destination <- df_orientadores %>%
select(autor) %>%
dplyr::rename(label = autor)
nodes <- full_join(sources, destination, by = "label")
nodes <- nodes %>% group_by(label) %>% count(label) %>% rename(freq=n)
nodes <- nodes %>% rowid_to_column("id")
nodes$peso <- ((nodes$freq)^3)
orientacoes_network <- df_orientadores %>%
group_by(orientador, autor) %>%
dplyr::summarise(weight = n()) %>%
ungroup()
edges <- orientacoes_network %>%
left_join(nodes, by = c("orientador" = "label")) %>%
dplyr::rename(from = id)
edges <- edges %>%
left_join(nodes, by = c("autor" = "label")) %>%
dplyr::rename(to = id)
edges <- select(edges, from, to, weight)
nodes_d3 <- mutate(nodes, id = id - 1)
edges_d3 <- mutate(edges, from = from - 1, to = to - 1)
filtro_nos <- nodes_d3
edges_d3$value <- 1
forceNetwork(Links = edges_d3, Nodes = nodes_d3, Source = "from", Target = "to",
NodeID = "label", Group = "id", Value = "value",
opacity = 1, fontSize = 20, zoom = TRUE, Nodesize = "peso",
arrows = TRUE)
})
}
shinyApp(ui, server)
I want to update the graph by the minimum number of nodes (described as freq in the nodes_d3 dataframe) that the user chooses (on the input$nos)
I've tried filtering the nodes_d3 and the edges_d3 by the number of frequencies but it return the error Warning: Error in $<-.data.frame: replacement has 1 row, data has 0 [No stack trace available]
any ideas how to do it?
I've tried using reactiveValues as well, but it wouldn't do. I don't know if in this kind of situation I have to subset the original dataframe and generate the network, or simply subsetting the dfs used in the forcenetwork (which I think I did but still didn't work.)
Once you've created your data, you need to filter both the edges_d3 and the nodes_d3 data frames, and then you need to re-adjust the from and to values in the filtered edges_d3 data frame to reflect the new positions of the nodes they refer to in the nodes_d3 data frame.
# determine the nodes that have at least the minimum freq
nodes_d3_min_freq <-
nodes_d3 %>%
filter(freq >= input$nos)
# filter the edge list to contain only links to or from the nodes that have
# the minimum or more freq
edges_d3_filtered <-
edges_d3 %>%
filter(from %in% nodes_d3_min_freq$id | to %in% nodes_d3_filtered$id)
# filter the nodes list to contain only nodes that are in or are linked to
# nodes in the filtered edge list
nodes_d3_filtered <-
nodes_d3 %>%
filter(id %in% unlist(select(edges_d3_filtered, from, to)))
# re-adjust the from and to values to reflect the new positions of nodes in
# the filtered nodes list
edges_d3_filtered$from <- match(edges_d3_filtered$from, nodes_d3_filtered$id) - 1
edges_d3_filtered$to <- match(edges_d3_filtered$to, nodes_d3_filtered$id) - 1
forceNetwork(Links = edges_d3_filtered, Nodes = nodes_d3_filtered,
Source = "from", Target = "to", NodeID = "label",
Group = "id", Value = "value", opacity = 1, fontSize = 20,
zoom = TRUE, Nodesize = "peso", arrows = TRUE)

Sankey Diagram with networkD3 package will not plot

I am using the sankeyNetwork function in the networkD3 package in R using as an example the code found here. However, all I get is a blank screen. The diagram is supposed to show the flow of infections between age groups (by gender). My code is as below:
library(RCurl)
library(networkD3)
edges <- read.csv(curl("https://raw.githubusercontent.com/kilimba/data/master/infection_flows.csv"),stringsAsFactors = FALSE )
nodes = data.frame(ID = unique(c(edges$Source, edges$Target)))
nodes$indx =0
for (i in 1:nrow(nodes)){
nodes[i,]["indx"] = i - 1
}
edges2 <- merge(edges,nodes,by.x = "Source",by.y = "ID")
edges2$Source <-NULL
names(edges2) <- c("target","value","source")
edges2 <- merge(edges2,nodes,by.x = "target",by.y = "ID")
edges2$target <- NULL
names(edges2) <- c("value","source","target")
nodes$indx <- NULL
# Plot
sankeyNetwork(Links = edges2, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "ID",
width = 700, fontsize = 12, nodeWidth = 30)
Are you sure there are no errors printed in your R console?
This works for me with two small modifications:
Load the curl package as well at the beginning
library("curl")
The fontsize parameter apparently does not work and should be removed.
# Plot
sankeyNetwork(Links = edges2, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "ID",
width = 700, #fontsize = 12,
nodeWidth = 30)
Adjusting fontsize does work, but your argument is missing a capitalization: fontSize
sankeyNetwork(Links = edges2, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "ID",
width = 700, fontSize = 12,
nodeWidth = 30)
you do not need RCurl, read.csv is able to read directly from a URL
it's probably safer to use the stringsAsFactors = FALSE option when creating the nodes data.frame
as others have pointed out, you must make sure that the source and target variables in the links data are numeric, and that they are zero-indexed
as others have pointed out, the font size parameter is properly named fontSize
I have provided a more direct way of creating the links data with numeric indexes of the nodes in the nodes data.frame
library(networkD3)
edges <- read.csv("https://raw.githubusercontent.com/kilimba/data/master/infection_flows.csv",stringsAsFactors = FALSE)
nodes = data.frame(ID = unique(c(edges$Source, edges$Target)), stringsAsFactors = FALSE)
edges$Source <- match(edges$Source, nodes$ID) - 1
edges$Target <- match(edges$Target, nodes$ID) - 1
sankeyNetwork(Links = edges, Nodes = nodes,
Source = "Source", Target = "Target",
Value = "Value", NodeID = "ID",
width = 700, fontSize = 12, nodeWidth = 30)
I solved it for me by making sure that source, target and values were all numeric.
For example:
Energy$links$value <- as.numeric(Energy$links$value)

Resources