Disclaimer: I'm an R newbie, so I may be overlooking something really obvious here...
I am currently working on a sankeyNetwork diagram using R, and I am facing a problem that almost seems to be a bug, but I'm completely clueless...
I've googled extensively, and haven't been able to find anybody else reporting the same...
The problem is that in my code I currently have 7 nodes, and 5 links. When I plot the diagram, everything works fine:
Plot 1, everything working fine
This is the code for Plot 1:
library(networkD3)
# List of nodes (portfolios & targets)
nodes = data.frame("trialnodes" =
c("portfolio1", # 0
"portfolio2", # 1
"portfolio3", # 2
"portfolio4", # 3
"target1", # 4
"target2", # 5
"target3" # 6
))
# List of links
links = as.data.frame(matrix(c(
0,4,2,
1,6,1,
2,3,1,
2,6,1,
3,5,1),
byrow = TRUE, ncol = 3))
# Column names of data frame
names(links) = c("source", "target", "value")
# check
links
# Sankey Diagram
# Colour scale
colourScale = JS("d3.scaleOrdinal(d3.schemeCategory20);")
# Diagram
sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target", Value = "value", NodeID = "trialnodes",
fontSize = 14, nodeWidth = 10, nodePadding = 140, iterations = 0,
colourScale = colourScale)
However, as soon as I add one more node and one more link, the plot's formatting breaks completely: the links between nodes are drawn as thin gray lines that no longer represent the Value, and the nodeWidth and nodePadding settings are ignored :(
Plot 2, links as thin gray lines
This is the code for Plot 2:
library(networkD3)
# List of nodes (portfolios & targets)
nodes = data.frame("trialnodes" =
c("portfolio1", # 0
"portfolio2", # 1
"portfolio3", # 2
"portfolio4", # 3
"target1", # 4
"target2", # 5
"target3", # 6
"target4" # 7
))
# List of links
links = as.data.frame(matrix(c(
0,4,2,
0,7,1,
1,6,1,
2,3,1,
2,6,1,
3,5,1),
byrow = TRUE, ncol = 3))
# Column names of data frame
names(links) = c("source", "target", "value")
# check
links
# Sankey Diagram
# Colour scale
colourScale = JS("d3.scaleOrdinal(d3.schemeCategory20);")
# Diagram
sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target", Value = "value", NodeID = "trialnodes",
fontSize = 14, nodeWidth = 10, nodePadding = 140, iterations = 0,
colourScale = colourScale)
Can anybody spot what's going on? I hope someone can help... I'm desperate D: Thank you very much in advance! :)
Either reduce your nodePadding value to something reasonable, or make the viewer/browser window tall enough to fit the maximum number of nodes you have in any one column * 140 pixels (plus some extra for the nodes themselves) and then refresh. In your second example that comes out to about 600px. The code below takes the first route and uses nodePadding = 14:
library(networkD3)
# List of nodes (portfolios & targets)
nodes = data.frame("trialnodes" =
c("portfolio1", # 0
"portfolio2", # 1
"portfolio3", # 2
"portfolio4", # 3
"target1", # 4
"target2", # 5
"target3", # 6
"target4" # 7
))
# List of links
links = as.data.frame(matrix(c(
0,4,2,
0,7,1,
1,6,1,
2,3,1,
2,6,1,
3,5,1),
byrow = TRUE, ncol = 3))
# Column names of data frame
names(links) = c("source", "target", "value")
# check
links
# Sankey Diagram
# Colour scale
colourScale = JS("d3.scaleOrdinal(d3.schemeCategory20);")
# Diagram
sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target", Value = "value", NodeID = "trialnodes",
fontSize = 14, nodeWidth = 10, nodePadding = 14, iterations = 0,
colourScale = colourScale)
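If the large padding is really wanted, the rule of thumb above can be turned into an explicit widget height instead of relying on the viewer size. This is only a sketch: `estimate_sankey_height` and its `node_allowance` of 40 pixels are made-up names and values for illustration, and the count of 4 nodes in the tallest column is taken from the ~600px figure above rather than computed. `height` is a regular argument of sankeyNetwork.

```r
# Hypothetical helper: tallest column * nodePadding, plus an assumed allowance
# for the nodes themselves (the 40 px default is a guess, not a networkD3 value).
estimate_sankey_height <- function(max_nodes_per_column, node_padding,
                                   node_allowance = 40) {
  max_nodes_per_column * node_padding + node_allowance
}

# Assumption: the tallest column in the second example holds 4 nodes.
height_px <- estimate_sankey_height(max_nodes_per_column = 4, node_padding = 140)

if (requireNamespace("networkD3", quietly = TRUE)) {
  nodes <- data.frame(trialnodes = c(paste0("portfolio", 1:4), paste0("target", 1:4)))
  links <- data.frame(source = c(0, 0, 1, 2, 2, 3),
                      target = c(4, 7, 6, 3, 6, 5),
                      value  = c(2, 1, 1, 1, 1, 1))
  # Same call as in the question, but with the height fixed up front.
  sn <- networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                                 Source = "source", Target = "target",
                                 Value = "value", NodeID = "trialnodes",
                                 fontSize = 14, nodeWidth = 10, nodePadding = 140,
                                 iterations = 0, height = height_px)
  # print(sn) to display the widget
}
```

With these assumed numbers `height_px` comes out at 600, in line with the estimate above.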
I want to add an onClick handler to this Sankey diagram so that clicking on a link shows the details of the link between the two nodes, similar to what plotly_click does.
library(networkD3)
nodes = data.frame("name" =
c("r1", # Node 0
"r2", # Node 1
"r3", # Node 2
"r4", # Node 3
"r5", # Node 4
"r6", # Node 5
"r7", # Node 6
"Blood Test", # Node 7
"Check Out", # Node 8
"Discuss Results", # Node 9
"MRI Scan", # Node 10
"Registration", # Node 11
"Triage and Assessment", # Node 12
"X-ray"))# Node 13
links = as.data.frame(matrix(c(
0, 11, 500, # Each row represents a link. The first number
1, 12, 500, # represents the node being connected from,
2, 7, 237, # the second number represents the node connected to.
3, 10, 236,
4, 13, 261,
5, 9, 495,
6, 8, 492),# The third number is the value of the link
byrow = TRUE, ncol = 3))
names(links) = c("source", "target", "value")
sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "name",
fontSize= 12, nodeWidth = 30)
You can add click events using the htmlwidgets::onRender function. It's not clear what details you want to see, but this, for example, will show a link's value in an alert box when you click it...
library(htmlwidgets)
sn <- sankeyNetwork(Links = links, Nodes = nodes,
Source = "source", Target = "target",
Value = "value", NodeID = "name",
fontSize = 12, nodeWidth = 30)
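# onRender expects JavaScript that evaluates to a function; it is called with the widget's element once rendered
clickJS <- 'function(el, x) { d3.select(el).selectAll(".link").on("click", function(d){ alert(d.value); }); }'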
htmlwidgets::onRender(sn, clickJS)
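To show more than the value, the handler can also read the names of the two nodes a link connects. The sketch below assumes that each link's bound data carries `source` and `target` node objects with a `name` field, which is how networkD3 appears to build its own link tooltips; treat that as an assumption to verify against your installed version. It rebuilds a small `nodes`/`links` pair so that it runs on its own.

```r
# Assumption: the link data exposes d.source.name, d.target.name and d.value.
clickJS <- '
function(el, x) {
  d3.select(el).selectAll(".link").on("click", function(d) {
    alert(d.source.name + " -> " + d.target.name + ": " + d.value);
  });
}
'

if (requireNamespace("networkD3", quietly = TRUE) &&
    requireNamespace("htmlwidgets", quietly = TRUE)) {
  # Cut-down version of the data in the question.
  nodes <- data.frame(name = c("r1", "r2", "Registration", "Triage and Assessment"))
  links <- data.frame(source = c(0, 1), target = c(2, 3), value = c(500, 500))
  sn <- networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                                 Source = "source", Target = "target",
                                 Value = "value", NodeID = "name",
                                 fontSize = 12, nodeWidth = 30)
  sn <- htmlwidgets::onRender(sn, clickJS)
  # print(sn) to display the widget, then click a link
}
```

Scoping the selection with `d3.select(el)` keeps the handler limited to this widget if several diagrams share a page.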
Here is an interesting solution based on the parsetR package:
devtools::install_github("timelyportfolio/parsetR")
library(parsetR)
links$source <- as.character(factor(links$source, labels=nodes[1:7,1]))
links$target <- as.character(factor(links$target, labels=nodes[8:14,1]))
parset(links, dimensions = c('source', 'target'),
value = htmlwidgets::JS("function(d) {return d.value}"),
tension = 0.5)
I have this data frame, and I'm trying to draw a vertical line on a plot whose x-axis is categorical.
data <- data.frame(
condition = c('1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3'),
AssessmentGrade = c('400', '410', '420', '430', '440', '500', '510', '520', '530', '540',
'300', '310', '320', '330', '340'),
Freq = c('1', '2', '1', '5', '7', '9', '1', '5', '3', '4', '5', '8', '1', '3', '5'),
MathGrade = c('A+', 'B-', 'C-', 'D', 'F', 'A-', 'B', 'C+', 'D-', 'F', 'A+', 'D', 'D', 'F', 'C'),
Condition = c('Condition 1', 'Condition 1', 'Condition 1', 'Condition 1', 'Condition 1',
'Condition 2', 'Condition 2', 'Condition 2', 'Condition 2', 'Condition 2',
'Condition 3', 'Condition 3', 'Condition 3', 'Condition 3', 'Condition 3'))
I tried adding a field to make the grade numeric, and that helped:
data$Gradenum <- as.numeric(factor(data$MathGrade)) # factor() first: as.numeric() on plain character grades returns NA
I used ggplot to get a bubble graph, but I was wondering how I would edit it to use my company's standard colors:
library(ggplot2)
p <- ggplot(data, aes(x = MathGrade, y = AssessmentGrade, size = Freq, fill = Condition)) +
geom_point(aes(colour = Condition)) +
ggtitle("Main Title") +
labs(x = "First Math Grade", y = "Math Assessment Score")
How can I get a vertical line between C+ and D? I see a lot of information out there if your x axis is a date but not for other categorical values
Hardcoded solutions are error-prone
MrSnake's solution works - but only for the given data set because the value of 7.5 is hardcoded.
It will fail with just a minor change to the data, e.g., by replacing grade "A+" in row 1 of data by an "A".
Using the hardcoded xintercept of 7.5
p + geom_vline(xintercept = 7.5)
draws the line between grades C- and C+ instead of C+ and D:
This can be solved using ordered factors. But first note that the chart contains another flaw: The grades on the x-axis are ordered alphabetically
A, A-, A+, B, B-, C, C-, C+, D, D-, F
where I would have expected
A+, A, A-, B, B-, C+, C, C-, D, D-, F
Fixing the x-axis
This can be fixed by turning MathGrade into an ordered factor with levels in a given order:
grades <- c(as.vector(t(outer(LETTERS[1:4], c("+", "", "-"), paste0))), "F")
grades
[1] "A+" "A" "A-" "B+" "B" "B-" "C+" "C" "C-" "D+" "D" "D-" "F"
data$MathGrade <- ordered(data$MathGrade, levels = grades)
factor() would be sufficient to plot a properly ordered x-axis, but we need an ordered factor for the next step: the correct placement of the vertical line.
Programmatically placing the vertical line
Let's suppose that the vertical line should be drawn between grades C- and D+. However, either or both grades may be missing from the data, and missing factor levels won't be plotted. In the sample data set there are no data with grade D+, so the vertical line should be plotted between grades C- and D.
So, we need to look for the lowest grade equal to or greater than D+ and the highest grade equal to or less than C- in the data set:
upper <- as.character(min(data$MathGrade[data$MathGrade >= "D+"]))
lower <- as.character(max(data$MathGrade[data$MathGrade <= "C-"]))
These are the two grades actually present in the data set between which the vertical line is to be drawn. Its x position is the mean of their positions among the levels that are in use:
xintercpt <- mean(which(levels(droplevels(data$MathGrade)) %in% c(lower, upper)))
p + geom_vline(xintercept = xintercpt)
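The steps above can be wrapped into a small helper so the position is recomputed whenever the data changes. `vline_position` and its argument names are invented here for illustration; it assumes `x` is an ordered factor and that at least one observed grade falls on each side of the boundary.

```r
# Hypothetical helper: x position halfway between the two observed grades
# that straddle the boundary between `lower_bound` and `upper_bound`.
vline_position <- function(x, lower_bound, upper_bound) {
  stopifnot(is.ordered(x))
  lower <- as.character(max(x[x <= lower_bound]))  # last observed level up to lower_bound
  upper <- as.character(min(x[x >= upper_bound]))  # first observed level from upper_bound on
  # Positions on the plotted axis only count levels that occur in the data.
  mean(which(levels(droplevels(x)) %in% c(lower, upper)))
}

grades <- c(as.vector(t(outer(LETTERS[1:4], c("+", "", "-"), paste0))), "F")
math_grade <- c("A+", "B-", "C-", "D", "F", "A-", "B", "C+", "D-", "F",
                "A+", "D", "D", "F", "C")

x1 <- ordered(math_grade, levels = grades)
pos1 <- vline_position(x1, "C-", "D+")   # 7.5

# Same data with the first "A+" replaced by "A": one more level is in use,
# so the line has to move one slot to the right.
x2 <- ordered(replace(math_grade, 1, "A"), levels = grades)
pos2 <- vline_position(x2, "C-", "D+")   # 8.5
```

With the plot from the question, `p + geom_vline(xintercept = vline_position(data$MathGrade, "C-", "D+"))` would then draw the line without any hardcoded number.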
Just add geom_vline ;)
p + geom_vline(xintercept = 7.5)
To change the colors to fit your company's scheme, you can add something like:
+ scale_color_manual(values = c('Condition 1' = 'grey20',
'Condition 2' = 'darkred',
'Condition 3' = 'blue'))
I'm trying to plot a bipartite network in which each edge has the same color as one of its nodes.
For example, let's take a movie/actor bipartite graph with 7 movies and 15 actors, where each actor has a nationality.
I want the edge between an actor and a movie to have the same color as the actor's nationality.
library(igraph)
library(ggraph)
NG1 <- 7
NG2 <- 15
Nat <- sample(x = c("French", "English", "Italian", "American", "Chinese"), size = NG2, replace = TRUE)
G <- make_empty_graph(NG1+NG2)
[Here, head(Nat) returns "Italian" "English" "American" "French" "French" "French"]
The code to create the edgelist:
E1 <- sample(x=1:NG1, size = 30, replace = T)
E2 <- sample(x=(NG1+1):(NG1+NG2), size = 30, replace = T)
EL <- c(rbind(E1, E2))
G <- add_edges(G, EL, nat = Nat[E2-NG1])
[Here, head(EL) returns 1 14 3 13 2 15]
The different aes arguments:
GROUP <- c(rep("Movie", NG1), rep("Act", NG2))
COL <- c(rep("Movie", NG1), Nat)
TXT <- c(as.character(1:NG1), letters[1:NG2])
And now the ggraph instructions:
ggraph(G, layout = 'kk') +
geom_node_point(aes(col = COL, shape = GROUP, size = 7)) +
geom_edge_link(aes(col = nat)) +
geom_node_text(aes(label = TXT), size = 4)
As you can see at the bottom of the plot, actor a, who is Italian, has a blue node but is connected to movie 7 by a pink edge... How can I specify the color palette for the nodes (here 6 colors) and for the edges (the first 5 colors of the nodes)?
I hope I have made myself clear.
After two hours, I finally found a solution!
I used the gg_color_hue function, which emulates ggplot2's default palette, to reproduce the 6 colors used for the nodes, and then added:
+ scale_edge_colour_manual(values = cols[1:5])
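For completeness, here is a sketch of how `cols` might be built. The `gg_color_hue` definition below is the commonly circulated one that emulates ggplot2's default discrete palette; it is reproduced here as an assumption, not taken from the post. Taking the first five colours works because the node groups sort alphabetically with "Movie" last, which in turn assumes that all five nationalities actually occur in the sample.

```r
# Emulates ggplot2's default hue palette for n discrete groups
# (equally spaced hues at fixed luminance and chroma).
gg_color_hue <- function(n) {
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

cols <- gg_color_hue(6)   # 5 nationalities + "Movie"
edge_cols <- cols[1:5]    # the five nationalities, in alphabetical order
```

`edge_cols` would then be passed on as `scale_edge_colour_manual(values = edge_cols)`.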
I have generated a graph:
library(DiagrammeR)
grViz("
digraph boxes_and_circles {
# a 'graph' statement
graph [layout = neato, overlap = true, fontsize = 10, outputorder = edgesfirst]
# several 'node' statements
node [shape = circle,
fontname = Helvetica]
A [pos = '1,1!'];
B [pos = '0,2!'];
C [pos = '1.5,3!'];
D [pos = '2.5,1!'];
E [pos = '4,1!'];
F [pos = '4,2!'];
G [pos = '5,1!'];
H [pos = '6,2!'];
I [pos = '1.5,-0.1!'];
# several 'edge' statements
A->B B->C
D->E D->F E->F E->G F->G G->H F->H
}
")
Which produces:
Now I would like to draw a box with dotted lines around the nodes A, B, and C.
How can I accomplish this in R? A key requirement of the solution is that it is reproducible, i.e. that I can run the script multiple times and get the same result.
Here's another approach based on igraph. It is inspired by this igraph code sample.
I'm assuming that using igraph instead of DiagrammeR is an option - maybe that is not the case...
We leave positioning of the vertices to a standard layout algorithm and query it for the resulting vertex positions. These positions are then used to draw a dotted rectangle around an arbitrary set of "selected" vertices. No user interaction is needed.
We start with the graph topology.
library(igraph)
set.seed(42)
df <- data.frame(from = c('A', 'B', 'I', 'D', 'D', 'E', 'E', 'F', 'F', 'G'),
to = c('B', 'C', 'I', 'E', 'F', 'G', 'F', 'H', 'G', 'H'))
g <- graph.data.frame(df, directed = TRUE)
The size of the vertices and arrows in the graph can be set freely, according to taste.
vertexsize <- 50
arrowsize <- 0.2
We ask the Fruchterman-Reingold layout engine to calculate the coordinates of the vertices.
coords <- layout_with_fr(g)
Then plot the graph.
plot(g,
layout = coords,
vertex.size = vertexsize,
edge.arrow.size = arrowsize,
rescale = FALSE,
xlim = range(coords[,1]),
ylim = range(coords[,2]))
If we'd like to see what's going on, we can add coordinate axes and print the vertex coordinates:
axis(1)
axis(2)
V(g) # ordered vertex list
coords # coordinates of the vertices (in the same coordinate system as our dotted rectangle)
We now figure out the bounding box of the vertices that we want a rectangle around.
selectedVertices = c("A", "B", "C")
vertexIndices <- sapply(selectedVertices, FUN = function(x) { return(as.numeric(V(g)[x])) } )
llx <- min(coords[vertexIndices, 1])
lly <- min(coords[vertexIndices, 2])
urx <- max(coords[vertexIndices, 1])
ury <- max(coords[vertexIndices, 2])
Almost there. We already have the coordinates of the vertex centers in coords[], but we also need the size of the vertices in the coordinate system of plot(). From the plot.igraph source code we can see that the vertex.size option for plot() gets divided by 200 and then used as radius for drawing the vertex. We use a 50% bigger value as the margin around the bounding box of the vertex coordinates when drawing the dotted rectangle.
margin <- (vertexsize / 200) * 1.5
rect(llx - margin, lly - margin, urx + margin, ury + margin, lty = 'dotted')
This is the result we get:
You could use @StevenBeaupre's solution for the widget, but there are a few packages for graphing networks using R's graphics. One is igraph, if you are open to using other solutions.
This will make the graph
library('igraph')
set.seed(11)
g <- data.frame(from = c('A', 'B', 'I', 'D', 'D', 'E', 'E', 'F', 'F', 'G'),
to = c('B', 'C', 'I', 'E', 'F', 'G', 'F', 'H', 'G', 'H'))
(gg <- graph.data.frame(g, directed = TRUE))
plot(gg, vertex.color = 'white')
And there are many ways to add a box to R graphics; here is one where you click the plot twice (once for each of two opposite corners) to add the box without having to calculate anything
rekt <- function(...) {
coords <- c(unlist(locator(1)), unlist(locator(1)))
rect(coords[1], coords[2], coords[3], coords[4], ..., xpd = NA)
}
rekt(border = 'red', lty = 'dotted', lwd = 2)
I get this
An easy solution with DiagrammeR would be to use dot rather than neato. You mostly lose the ability to manually position the nodes (the pos attribute doesn't work anymore), but you gain the ability to use cluster and subgraph to draw lines around sets of nodes.
library(DiagrammeR)
grViz("
digraph boxes_and_circles {
# a 'graph' statement
graph [ fontsize = 10,rankdir=LR]
# several 'node' statements
node [shape = circle,
fontname = Helvetica]
# several 'edge' statements
subgraph cluster_1 {
style=dotted
A->B->C
}
D->E D->F E->F E->G F->G G->H F->H
I
}
")