Which R packages are suitable for plotting disease transmission over time? - r

I am trying to find a way to plot a disease transmission tree that allows me to:
plot the tree over a timeline (a timeline spanning 2 months)
specify the shape and colour of the nodes in the tree (so that you can easily identify which nodes belong to the same household for example)
format the link between the nodes (dashed lines, two way arrows, solid lines...etc.)
plot "stray branches" that aren't linked to the root/parent node.
The dataset I am working with is relatively small (22 nodes) so I don't mind working with a package that is a bit fiddly!
I have thought about using phylogeny trees, but I'm uncertain whether they will allow me to plot stray nodes. Which package would be most suitable for this task?
Thanks!

Try DiagrammeR. I don't have much experience, but it did what I needed to do and I know I barely scratched the surface of what it can do.

Related

Plot each individual community with iGraph

I have generated a community network using iGraph and qGraph to determine the relatedness between individual's genetic codes above a threshold edge value. To further analyze this data, I used cluster_walktrap and then graphed this along with the network. I am wondering if there is a way label the individual communities, as well as plot a single community from the batch. I can provide extra details if need be. Thanks
Do you mean something using components, e.g.,
sapply(components(g), plot)

R: Untangle graph plot

I have the following graph plot with 131 vertices made by using plot with an object of the igraph class. My question is whether there is any way to present this in a cleaner way so that nodes at least don't overlap each other and edges are more visible.
I'm not too familiar with these types of graphs, so I can't provide a specific answer. But this sounds like a good time to use the jitter() function which adds random noise between the data points, thus separating them out.

Visualisation for hierarchically clustered graph on map in python

I am starting a new project in python (to be used through jupyter-notebooks), where I'll need to visualise some hierarchically clustered graphs.
I have looked for existing packages, but so far I am not convinced by what I have seen.
I am not interested in the clustering process in itself, because this will be another part of the project and I know (roughly) how the graphs will be built up progressivelly.
What I am looking for are:
an appropriate data structure for storing hierarchically clustered graphs,
visualisation tools that would allow to represent the graph on a map (based on X and Y coordinates of the nodes) and either represent the subparts of the clusters, or simplify the clusters depending on their type or depth in the graph structure,
ideally, bring some interactivity, for example the ability to zoom-in or-out, or click on some clustered nodes to expand the nodes that were hidden in the cluster.
It looks pretty specific and despite some cool packages I have seen I am not sure which one would help without having too much to reimplement. So far, NetworkX looks like a cool starting point, especially with some D3.js (as shown here), but it is still far from what I have in mind.
Any advice about where to start digging?
Thanks a lot.
Gautier
For Python, Seaborn's clustermaps are nice. Seaborn is mainly meant to be used with Pandas dataframes; however, the documentation for clustermap says it can be rectangular data, and so I think it means other arrays will wor.
See also:
Dendrogram with heat map
SciPy Hierarchical Clustering and Dendrogram Tutorial
Hierarchical Clustering in Python

How to visualize a large network in R?

Network visualizations become common in science in practice. But as networks are increasing in size, common visualizations become less useful. There are simply too many nodes/vertices and links/edges. Often visualization efforts end up in producing "hairballs".
Some new approaches have been proposed to overcome this issue, e.g.:
Edge bundling:
http://vis.stanford.edu/papers/divided-edge-bundling or
https://gephi.org/tag/edge-bundling/
Hierarchial edge bundling:
http://graphics.cs.illinois.edu/sites/graphics.dev.engr.illinois.edu/files/edgebundles.pdf
Group Attributes Layout:
http://wiki.cytoscape.org/Cytoscape_3/UserManual
How to make grouped layout in igraph?
I am sure that there are many more approaches. Thus, my question is:
How to overcome the hairball issue, i.e. how to visualize large networks by using R?
Here is some code that simulates an exemplary network:
# Load packages
lapply(c("devtools", "sna", "intergraph", "igraph", "network"), install.packages)
library(devtools)
devtools::install_github(repo="ggally", username="ggobi")
lapply(c("sna", "intergraph", "GGally", "igraph", "network"),
require, character.only=T)
# Set up data
set.seed(123)
g <- barabasi.game(1000)
# Plot data
g.plot <- ggnet(g, mode = "fruchtermanreingold")
g.plot
This questions is related to
Visualizing Undirected Graph That's Too Large for GraphViz?. However, here I am searching not for general software recommendations but for concrete examples (using the data provided above) which techniques help to make a good visualization of a large network by using R (comparable to the examples in this thread: R: Scatterplot with too many points).
Another way to visualize very large networks is with BioFabric (www.BioFabric.org), which uses horizontal lines instead of points to represent the nodes. Edges are then shown using vertical line segments. A quick D3 demo of this technique is shown at: http://www.biofabric.org/gallery/pages/SuperQuickBioFabric.html.
BioFabric is a Java application, but a simple R version is available at: https://github.com/wjrl/RBioFabric.
Here is a snippet of R code:
# You need 'devtools':
install.packages("devtools")
library(devtools)
# you need igraph:
install.packages("igraph")
library(igraph)
# install and load 'RBioFabric' from GitHub
install_github('RBioFabric', username='wjrl')
library(RBioFabric)
#
# This is the example provided in the question:
#
set.seed(123)
bfGraph = barabasi.game(1000)
# This example has 1000 nodes, just like the provided example, but it
# adds 6 edges in each step, making for an interesting shape; play
# around with different values.
# bfGraph = barabasi.game(1000, m=6, directed=FALSE)
# Plot it up! For best results, make the PDF in the same
# aspect ratio as the network, though a little extra height
# covers the top labels. Given the size of the network,
# a PDF width of 100 gives us good resolution.
height <- vcount(bfGraph)
width <- ecount(bfGraph)
aspect <- height / width;
plotWidth <- 100.0
plotHeight <- plotWidth * (aspect * 1.2)
pdf("myBioFabricOutput.pdf", width=plotWidth, height=plotHeight)
bioFabric(bfGraph)
dev.off()
Here is a shot of the BioFabric version of the data provided by the questioner, though networks created with values of m > 1 are more interesting. The inset detail shows a close-up of the upper left corner of the network; node BF4 is the highest-degree node in the network, and the default layout is a breadth-first search of the network (ignoring edge directions) starting from that node, with neighboring nodes traversed in order of decreasing node degree. Note that we can immediately see that, for example, about 60% of node BF4's neighbors are degree 1. We can also see from the strict 45-degree lower edge that this 1000-node network has 999 edges, and is therefore a tree.
Full disclosure: BioFabric is a tool that I wrote.
That's an interesting question, I didn't know most of the tools you listed, so thanks. You can add HivePlot to the list. It's a deterministic method consisting in projecting nodes on a fixed number of axes (usually 2 or 3). Look a the linked page, there're many visual examples.
It works better if you have a categorical nodal attribute in your dataset, so that you can use it to select which axis a node goes to. For instance, when studying the social network of a university: students on one axis, teachers on another and administrative staff on the third. But of course, it can also work with a discretized numerical attribute (eg. young, middle-aged and older people on their respective axes).
Then you need another attribute, and it has to be numerical (or at least ordinal) this time. It is used to determine the position of a node on its axis. You can also use some topological measure, such as degree or transitivity (clustering coefficient).
(source: hiveplot.net)
The fact the method is deterministic is interesting, because it allows comparing different networks representing distinct (but comparable) systems. For example, you can compare two universities (provided you use the same attributes/measures to determine axes and position). It also allows describing the same network in various ways, by choosing different combinations of attributes/measures to generate the visualization. This is the recommanded way of visualizing a network, actually, thanks to a so-called hive panel.
Several softwares able of generating those hive plots are listed in the page I mentioned at the beginning of this post, including implementations in Java and R.
I've been dealing with this problem recently. As a result, I've come up with another solution. Collapse the graph by communities/clusters. This approach is similar to the third option outlined by the OP above. As a word of warning, this approach will work best with undirected graphs. For example:
library(igraph)
set.seed(123)
g <- barabasi.game(1000) %>%
as.undirected()
#Choose your favorite algorithm to find communities. The algorithm below is great for large networks but only works with undirected graphs
c_g <- fastgreedy.community(g)
#Collapse the graph by communities. This insight is due to this post http://stackoverflow.com/questions/35000554/collapsing-graph-by-clusters-in-igraph/35000823#35000823
res_g <- simplify(contract(g, membership(c_g)))
The result of this process is the below figure, where the vertices' names represent community membership.
plot(g, margin = -.5)
The above is clearly nicer than this hideous mess
plot(r_g, margin = -.5)
To link communities to original vertices you will need something akin to the following
mem <- data.frame(vertices = 1:vcount(g), memeber = as.numeric(membership(c_g)))
IMO this is a nice approach for two reasons. First, it can in theory deal with any size graph. The process of finding communities can be continuously repeated on collapsed graphs. Second, adopting a interactive approach would yield very readable results. For example, one can imagine the user being able to click on a vertex in the collapsed graph to expand that community revealing all of its original vertices.
I have looked around and found no good solution. My approach has been to remove nodes and play with edge transparency. It is more of a design solution rather than a technical one, but I've been able to plot gephi-like networks of up to 50,000 edges without much complications on my laptop.
with your example:
plot(simplify(g), vertex.size= 0.01,edge.arrow.size=0.001,vertex.label.cex = 0.75,vertex.label.color = "black" ,vertex.frame.color = adjustcolor("white", alpha.f = 0),vertex.color = adjustcolor("white", alpha.f = 0),edge.color=adjustcolor(1, alpha.f = 0.15),display.isolates=FALSE,vertex.label=ifelse(page_rank(g)$vector > 0.1 , "important nodes", NA))
Example of twitter mentions network with 30,000 edges:
Yet another interesting package is networkD3. There are a myriad of means of representing graphs within this library. In particular, I find the forceNetwork an interesting option. It is interactive and therefore allows you to really explore your network. It is great for EDA, but it maybe too "wiggly" for final work.
I tired this pacakge. It's very fast.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("netbiov")
https://www.bioconductor.org/packages/release/bioc/html/netbiov.html
Examples:
https://www.bioconductor.org/packages/release/bioc/vignettes/netbiov/inst/doc/netbiov-intro.pdf

visualization - size of circle proportionate to the value of the item

I'm getting familiar with Graphviz and wonder if it's doable to generate a diagram/graph like the one below (not sure what you call it). If not, does anyone know what's a good open source framework that does it? (pref, C++, Java or Python).
According to Many Eyes‌​, this is a bubble chart. They say:
It is especially useful for data sets with dozens to hundreds of values, or with values that differ by several orders of magnitude.
...
To see the exact value of a circle on the chart, move your mouse over it. If you are charting more than one dimension, use the menu to choose which dimension to show. If your data set has multiple numeric columns, you can choose which column to base the circle sizes on by using the menu at the bottom of the chart.
Thus, any presentation with a lot of bubbles in it (especially with many small bubbles) would have to be dynamic to respond to the mouse.
My usual practice with bubble charts is to show three or four variables (x, y and another variable through the size of the bubble, and perhaps another variable with the color or shading of the bubble). With animation, you can show development over time too - see GapMinder. FlowingData provides a good example with a tutorial on how to make static bubble charts in R.
In the example shown in the question, though, the bubbles appear to be located somewhat to have similar companies close together. Even then, the exact design criteria are unclear to me. For example, I'd have expected Volkswagen to be closer to General Motors than Pfizer is (if some measure of company similarity is used to place the bubbles), but that isn't so in this diagram.
You could use Graphviz to produce a static version of a bubble chart, but there would be quite a lot of work involved to do so. You would have to preprocess the data to calculate a similarity matrix, obtain edge weights from that matrix, assign colours and sizes to each bubble and then have the preprocessing script write the Graphviz file with all edges hidden and run the Graphviz file through neato to draw it.

Resources