I'm trying to use data.tree and networkD3 in R to create a tree representation of a file system in which the nodes of the graph are weighted by file size.
library(data.tree)
library(networkD3)
repo <- Node$new("Repository")
git <- repo$AddChild(".git")
prod <- repo$AddChild("Production")
exp <- repo$AddChild("Experimental")
repo$size <- 866000
git$size <- 661000
prod$size <- 153000
exp$size <- 48000
I can get a vector of these sizes using Get:
sizes <- repo$Get("size")
But when I try to put it all together, I'm not sure how to include this weight information in the network visualization step. Trying to do something like this...
reponet <- ToDataFrameNetwork(repo,"repo")
net <- forceNetwork(reponet, Nodesize = repo$Get("size"))
to no avail. Basically I'm trying to do what Julia Silge did in this great SO blog post. Does anyone know how to set this?
Check the help file for forceNetwork... there are several mandatory parameters that you have not set.
You can use simpleNetwork to plot a network with just a links data frame like you have, but it doesn't allow you to control the node size... for example...
simpleNetwork(reponet)
To control the node size, you need to use forceNetwork, but it requires both a links data frame and a nodes data frame. You could build the nodes data frame from the sizes object you created, and then convert the source and target IDs in your links data frame to the indexes of the matching nodes in your nodes data frame (0-indexed, because they are sent to JavaScript)... for example...
nodesdf <- data.frame(name = names(sizes), nodesize = sizes / 10000, group = 1)
reponet$from <- match(reponet$from, nodesdf$name) - 1
reponet$to <- match(reponet$to, nodesdf$name) - 1
forceNetwork(reponet, Nodes = nodesdf, Source = "from", Target = "to",
             NodeID = "name", Group = "group", Nodesize = "nodesize")
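The remapping above can be wrapped in a small reusable function. This is a minimal base-R sketch: the name `to_d3_frames` and its `scale` argument are hypothetical (not part of data.tree or networkD3), and it assumes a links data frame with `from`/`to` name columns and a named size vector like the one returned by `repo$Get("size")`.

```r
# Hypothetical helper (not part of data.tree or networkD3): build the Links
# and Nodes data frames that forceNetwork() expects.
to_d3_frames <- function(links, sizes, scale = 10000) {
  nodes <- data.frame(name = names(sizes),
                      nodesize = as.numeric(sizes) / scale,
                      group = 1,
                      stringsAsFactors = FALSE)
  # match() returns 1-based row positions; subtract 1 because the indices
  # are read by JavaScript, which counts from 0
  src <- match(links$from, nodes$name) - 1
  tgt <- match(links$to, nodes$name) - 1
  # refuse to continue if a link points at a node with no size
  if (anyNA(src) || anyNA(tgt)) {
    stop("every link endpoint must appear in names(sizes)")
  }
  list(links = data.frame(from = src, to = tgt), nodes = nodes)
}

# Example data shaped like the Repository tree in the question
links <- data.frame(from = c("Repository", "Repository", "Repository"),
                    to   = c(".git", "Production", "Experimental"),
                    stringsAsFactors = FALSE)
sizes <- c(Repository = 866000, .git = 661000,
           Production = 153000, Experimental = 48000)
frames <- to_d3_frames(links, sizes)
print(frames$links)
```

`frames$links` and `frames$nodes` can then be passed to `forceNetwork()` with `Source = "from"`, `Target = "to"`, `NodeID = "name"`, `Group = "group"` and `Nodesize = "nodesize"`, as in the call above. Stopping on unmatched names is a deliberate choice, so that an `NA` index is not passed on to the JavaScript side unnoticed.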
I'm trying to create a graph of the relationships between the people involved in the 9/11 attacks, but I don't really understand the input format. I use loops to group the hijackers (hijacker1 knows hijacker2, hijacker5 knows hijacker3, etc.), but it doesn't work for me.
The result of my work should be a relationship graph as on this page: LINK
I use data in csv format: Data to download
The data schema looks like the screenshots below. There are three files available, but if I understand correctly, the data from the first file should be enough to get what I want (?)
Hijackers ASSOCIATES
Hijackers ATTR
Hijackers PRIORITY_CONTACT
        HName1 HName2 HName3
HName1       0      1      0
HName2       1      0      1
HName3       0      1      0
...
I would like to draw a relationship diagram and find out which of the hijackers had the most relationships. (Should I use betweenness() from the igraph library?)
Here's an approach with igraph:
First, let's grab the data and make it into an adjacency matrix:
temp <- tempfile(fileext = ".zip")
download.file("https://sites.google.com/site/ucinetsoftware/datasets/covert-networks/911hijackers/9%2011%20Hijackers%20CSV.zip?attredirects=0&d=1",
temp,
mode = "wb")
data <- read.csv(unz(temp,"CSV/9_11_HIJACKERS_ASSOCIATES.csv"))
my.rownames <- data$X
data2 <- sapply(data[,-1], as.numeric)
rownames(data2) <- my.rownames
Adj <- as.matrix(data2)
Now the easy parts. We can convert the adjacency matrix into an igraph graph, compute the vertex degree and add that data to the graph.
library(igraph)
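# the associates matrix is symmetric, so build an undirected graph;
# the default (directed) would make degree() count every tie twice
Graph <- graph_from_adjacency_matrix(Adj, mode = "undirected")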
V(Graph)$vertex_degree <- degree(Graph)
Finally we can plot the graph with the vertex size being proportional to the degree:
plot.igraph(Graph,
            vertex.size = V(Graph)$vertex_degree,
            layout = layout_with_fr, main = "Hijacker Relationships")
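For the "most relationships" part of the question, degree is the relevant measure: it counts a vertex's direct ties, whereas `betweenness()` measures how often a vertex lies on shortest paths between other vertices, which is closer to "who brokers connections". For a symmetric 0/1 adjacency matrix with no self-loops, degree is simply the row sum. A base-R sketch on the small sample matrix from the question:

```r
# Small symmetric 0/1 adjacency matrix, same shape as the sample in the question
Adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("HName1", "HName2", "HName3"),
                              c("HName1", "HName2", "HName3")))
# In an undirected, unweighted graph a vertex's degree is the number of 1s
# in its row (a 1 on the diagonal, i.e. a self-loop, would need extra care)
deg <- rowSums(Adj)
# Rank hijackers by number of direct relationships
deg_sorted <- sort(deg, decreasing = TRUE)
most_connected <- names(which.max(deg))
print(deg_sorted)
```

On the full matrix, this ranking should agree with igraph's `degree()` on an undirected graph built from the same matrix.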
I have plotted a network graph using the d3 library, but as the number of nodes grows, the animation takes a huge performance toll. For analytics I don't need a nice UI, just something quick.
Question: How do I stop the real-time simulation and yet still enable manipulation of the graph?
Working Code Example
require(igraph)
require(networkD3)
# Use igraph to make the graph and find membership
karate <- make_graph("Zachary")
wc <- cluster_walktrap(karate)
members <- membership(wc)
# Convert to object suitable for networkD3
karate_d3 <- igraph_to_networkD3(karate, group = members)
# Create force directed network plot
forceNetwork(Links = karate_d3$links, Nodes = karate_d3$nodes,
Source = 'source', Target = 'target', NodeID = 'name',
Group = 'group')
I think this might be an easy question, but I could not solve it after reading the pegas documentation. I want to plot a haplotype network from a FASTA file and identify which mutations separate the distinct haplotypes.
Example:
library(pegas)  # read.FASTA() comes from ape, which pegas loads
fa <- read.FASTA("example.fa")
haps <- haplotype(fa)
haps50 <- subset(haps, minfreq = 50)
(network <- haploNet(haps50))
plot(network, size = attr(network, "freq"), show.mutation = 1, labels = TRUE)
How can I identify the position of the mutation in my FASTA file that separates, for example, haplotype XX from haplotype V?
Extra question:
Would it also be possible to get the sequence of one of the haplotypes, for example haplotype V, which is quite frequent?
The pegas package contains a function diffHaplo, which specifies differences between haplotypes.
diffHaplo(haps50, a = "XX", b = "V")
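To sanity-check what `diffHaplo()` reports, or to compare two sequences outside pegas, the underlying idea is a column-by-column comparison of aligned sequences. This base-R sketch is not how pegas implements it: `diff_positions` is a hypothetical helper, and it assumes both haplotypes are aligned to the same length and given as plain strings, such as the one built with `paste()` further down. Gaps and ambiguity codes are treated as ordinary characters.

```r
# Hypothetical helper: report the alignment columns at which two aligned
# sequences (each a single character string) differ.
diff_positions <- function(a, b) {
  a <- strsplit(tolower(a), "")[[1]]
  b <- strsplit(tolower(b), "")[[1]]
  if (length(a) != length(b)) {
    stop("sequences must be aligned to the same length")
  }
  pos <- which(a != b)  # 1-based alignment columns that differ
  data.frame(pos = pos, a = a[pos], b = b[pos], stringsAsFactors = FALSE)
}

# Made-up aligned sequences standing in for haplotypes XX and V
hapXX <- "ACGTACGTAC"
hapV  <- "ACGTTCGTAA"
d <- diff_positions(hapXX, hapV)
print(d)
```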
To extract the DNA sequence of the frequent haplotype V, use the indices stored in the object of class haplotype: they identify which DNA sequences in the alignment carry each haplotype.
# haplotype index from its name
i <- which(attr(haps50, "dimnames")[[1]] == "V")
# index of the first sequence with the haplotype
s <- attr(haps50, "index")[[i]][1]
The corresponding sequence can then be taken from the alignment fa, to save it to a file or print it on screen.
write.dna(fa[s], file = "hapV.fas", format = "fasta", nbcol = -1, colsep = "")
paste(unlist(as.character(fa[s])), collapse = "")
I am using R to visualize the relations between, say, 5-6 different nodes. A graph is probably the best way to do it. The issue is that there are far too many edges: there can be a hundred edges between two vertices, which makes the graph look very cluttered. I want the edge names to be displayed, but with a hundred edge names displayed, they overlap each other and are not readable.
So, I have two questions:
Is there any other way in which I can represent the data? Some kind of chart probably?
I want to export the final output to HTML using d3.js or a similar library, keeping the edge names and some other information intact. What would be the best plugin to use in that case?
I am using the igraph library to create the graph in R.
I also tried using the networkD3 library to export it to HTML and make it interactive.
graph <- graph.data.frame(edges, directed = TRUE, vertices = vertex)
plot(graph, edge.label = E(graph)$name)
wc <- cluster_walktrap(graph)
members <- membership(wc)
graph_d3 <- igraph_to_networkD3(graph, group = members)
graph_forceNetwork <- forceNetwork(Links = graph_d3$links, Nodes = graph_d3$nodes,
                                   Source = 'source',
                                   Target = 'target',
                                   NodeID = 'name',
                                   Group = 'group',
                                   zoom = TRUE,
                                   fontSize = 20)
Right now, it is a graph with only two vertices and about 60-70 edges between them, so I did not use any particular layout.
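One way to reduce this kind of clutter (a suggestion, not something from the original thread) is to collapse parallel edges into a single edge per vertex pair, keeping a count that can drive edge width and a combined label that can be shown on demand instead of drawn permanently. This is a base-R sketch assuming an edge data frame with `from`, `to` and `name` columns; `collapse_edges` is a hypothetical helper, and directed edges A to B and B to A are kept separate.

```r
# Hypothetical helper: one row per (from, to) pair, with the number of
# parallel edges and their names joined into a single label.
collapse_edges <- function(edges) {
  n   <- aggregate(name ~ from + to, data = edges, FUN = length)
  lab <- aggregate(name ~ from + to, data = edges,
                   FUN = function(x) paste(x, collapse = ", "))
  names(n)[3]   <- "weight"
  names(lab)[3] <- "label"
  merge(n, lab, by = c("from", "to"))
}

# Made-up edge list with two parallel A -> B edges
edges <- data.frame(from = c("A", "A", "B", "A"),
                    to   = c("B", "B", "A", "C"),
                    name = c("e1", "e2", "e3", "e4"),
                    stringsAsFactors = FALSE)
collapsed <- collapse_edges(edges)
print(collapsed)
```

In networkD3, a numeric column such as `weight` can be mapped to link width through the `Value` argument of `forceNetwork()`, once the endpoints have been converted to 0-based node indices.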
Following the example of plotting a term-document matrix below,
library("tm")
# plot() on a TermDocumentMatrix needs the Rgraphviz package (Bioconductor)
data("crude")
tdm <- TermDocumentMatrix(crude, control = list(removePunctuation = TRUE,
                                                removeNumbers = TRUE,
                                                stopwords = TRUE))
plot(tdm, terms = findFreqTerms(tdm, lowfreq = 6)[1:25], corThreshold = 0.5)
Is there a way to colorize the nodes based on how many vertices they have? Is there an example of making the nodes with more vertices larger or something to that effect as well?
It appears that the nodes that end up being plotted are of class AgNode. The properties you can set on an AgNode are listed on the ?AgNode help page. Once you know which properties you would like to set, you can pass a list to the nodeAttrs parameter of your plotting command. (EDIT: actually, a better list is probably the node attributes description in the Rgraphviz documentation.)
The nodeAttrs parameter takes a list where each named element corresponds to one of the properties of AgNode. Each element holds a named vector: the names correspond to node names (i.e. the words in your term matrix) and the values are the values for that attribute. For example:
list(
  color = c(futures = "blue", demand = "red"),
  shape = c(crude = "ellipse", budget = "circle")
)
Since you want to color the terms by the number of vertices they have, I'm going to assume you mean edges, as each word is a single vertex in the graph. So, using your tdm object:
freqterms <- findFreqTerms(tdm, lowfreq = 6)[1:25]
vtxcnt <- rowSums(cor(as.matrix(t(tdm[freqterms,])))>.5)-1
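To see what that one-liner counts, here is the same expression on a toy term-by-document matrix (terms and counts are made up). `cor()` correlates columns, so the matrix is transposed to put terms in columns, and 1 is subtracted because each term correlates perfectly with itself on the diagonal.

```r
# Toy term-by-document counts (rows = terms, columns = documents); made up
m <- rbind(oil    = c(3, 0, 2, 5),
           crude  = c(2, 0, 2, 4),
           prices = c(0, 4, 1, 0))
# Transpose so terms are columns, then correlate term against term
r <- cor(t(m))
# Count partners above the 0.5 threshold; subtract 1 for the diagonal
vtxcnt <- rowSums(r > 0.5) - 1
print(vtxcnt)
```

One caveat: a term that does not vary across documents has an undefined (NA) correlation, and that NA would propagate into the counts; `rowSums(..., na.rm = TRUE)` is one way around it.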
I saved the terms you wanted, and then basically copied the code inside the plot command to calculate the correlations with your cutoff of 0.5, to see how many other words each word in this subset is connected to. That's the vtxcnt variable. (There may be a more efficient way to extract this, but I could not find it.) Now I'm ready to assign colors:
mycols<-c("#f7fbff","#deebf7","#c6dbef",
"#9ecae1","#6baed6","#4292c6",
"#2171b5", "#084594")
vc <- mycols[vtxcnt+1]
names(vc) <- names(vtxcnt)
Here I grabbed some colors from ColorBrewer. I have 8 values because the values of vtxcnt range from 0 to 7 (vtxcnt + 1 is used as the index into mycols). If you had a wider range or wanted to collapse categories, you could use the cut() function to categorize them. Then I created a named vector vc that matches up each word to the appropriate color. vc should look like this:
head(vc)
# ability accord agreement ali also analysts
# "#084594" "#c6dbef" "#2171b5" "#9ecae1" "#f7fbff" "#4292c6"
And now we are ready to make the plot:
pp <- plot(tdm,
terms = freqterms,
corThreshold = 0.5,
nodeAttrs=list(fillcolor=vc))
So, as you can see, customizing nodes is pretty flexible. You can color them however you like, as long as you pass the correct values to nodeAttrs.
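If the counts span a wider range than the palette has colors, `cut()` can bin them first. A minimal sketch with made-up counts and a four-color palette; the break points are arbitrary choices for illustration.

```r
# Made-up connection counts and a 4-color ColorBrewer-style palette
counts <- c(ability = 0, accord = 3, agreement = 7, ali = 12)
pal <- c("#f7fbff", "#deebf7", "#c6dbef", "#9ecae1")
# Right-closed bins: (-Inf,0], (0,3], (3,7], (7,Inf] -> codes 1..4
bins <- cut(counts, breaks = c(-Inf, 0, 3, 7, Inf), labels = FALSE)
# Reattach the term names so the vector can go straight into nodeAttrs
vc <- setNames(pal[bins], names(counts))
print(vc)
```

The result can then be used as `nodeAttrs = list(fillcolor = vc)`, as in the plot call above.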