Word graph in R using igraph

I have a simple problem. I have two text documents and I want to build a graph of each document with igraph or a similar library, then combine the two subgraphs into one large graph. I tried the following code:
library(tm)
Topic1 = c("I love Pakistan")
Topic2 = c("Pakistan played well")
src = data.frame(Topic1,Topic2)
mycorpus = Corpus(VectorSource(src))
tdm = as.matrix(TermDocumentMatrix(mycorpus))
Now I don't know what to do next.
The first graph (Topic1) will have 3 nodes and 3 edges; similarly, the second graph (Topic2) will have 3 nodes and 3 edges. I want to merge these two graphs into one. The merged graph will have 5 nodes and 6 edges, and the node Pakistan will have 4 edges.
Can anybody help me?

Finally, I found the solution myself. First, make a graph of the terms from Topic1, using every term that has a frequency greater than 0. Then do the same for Topic2 and combine the two edge lists into one graph.
library(tm)
library(igraph)
tdm = as.matrix(TermDocumentMatrix(mycorpus))
# terms that occur in document 1, connected pairwise
x = names(tdm[,1][tdm[,1]>0])
k = t(combn(x,2))
g = graph_from_edgelist(k,directed = FALSE)
plot(g)
# same for document 2
x2 = names(tdm[,2][tdm[,2]>0])
k2 = t(combn(x2,2))
g2 = graph_from_edgelist(k2,directed = FALSE)
plot(g2)
# stack both edge lists and build the merged graph
E1 = as_edgelist(g)
E2 = as_edgelist(g2)
E3 = rbind(E1,E2)
g3 = graph_from_edgelist(E3,directed = FALSE)
# drop duplicate edges and loops before plotting
g3 = simplify(g3,remove.multiple = TRUE, remove.loops = TRUE)
plot(g3)
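The same idea can be sketched without tm, which makes it easy to check the counts from the question (5 nodes, 6 edges, Pakistan with 4 edges). This is only a minimal sketch: the helper word_graph is a hypothetical name, the text is split on whitespace rather than run through a term-document matrix, and every document is assumed to contain at least two distinct words (combn() fails otherwise).

```r
library(igraph)

docs <- c(Topic1 = "I love Pakistan", Topic2 = "Pakistan played well")

# Hypothetical helper: one complete graph per document,
# with an edge between every pair of distinct words.
word_graph <- function(text) {
  words <- unique(strsplit(text, "\\s+")[[1]])
  graph_from_edgelist(t(combn(words, 2)), directed = FALSE)
}

graphs <- lapply(docs, word_graph)

# Stack the edge lists of all documents, then drop duplicated edges.
all_edges <- do.call(rbind, lapply(unname(graphs), as_edgelist))
merged <- simplify(graph_from_edgelist(all_edges, directed = FALSE))

vcount(merged)               # number of distinct words
ecount(merged)               # number of distinct word pairs
degree(merged, "Pakistan")   # edges touching the shared word
```

Because the merge works on vertex names, a word shared by both documents becomes a single node in the merged graph.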

Related

fixing nodes when plotting networks

mynet is a network object with 93 vertices and three vertex attributes: sex, indegree, and outdegree. Another network object, simnet, is a simulated version of the network. The nodes and degree distributions are the same, but some edges have been rewired.
I plot them side by side...
par(mfrow=c(1,2))
plot(mynet, vertex.col="sex", main="mynet")
plot(simnet, vertex.col="sex", main="simnet")
...and get the following result:
This would be much more useful if I could fix the node location in both plots, as it would make the differences in edges very clear. Is there a way to do this with the base plot() function? If not, what is the simplest way to do this without manually entering coordinates for each node?
There is a way to do this by setting the layout in advance of plotting and using the same layout for both plots. We can do this using the names of the nodes since these are the same nodes between each graph. The approach is a little hacky but seems to work. Example code below:
library(igraph)
# Make some fake networks
set.seed(42)
df1 <- data.frame(e1 = sample(1:5, 10, replace = TRUE),
e2 = sample(1:5, 10, replace = TRUE))
df2 <- data.frame(e1 = sample(1:5, 10, replace = TRUE),
e2 = sample(1:5, 10, replace = TRUE))
# the original
g1 <- graph_from_data_frame(df1, directed = FALSE)
# the 'simulations'
g2 <- graph_from_data_frame(df2, directed = FALSE)
# set up the plot
par(mfrow=c(1,2))
# we set the layout
lo <- layout_with_kk(g1)
# this is a matrix of positions. Positions
# refer to the order of the nodes
head(lo)
#> [,1] [,2]
#> [1,] -0.03760207 0.08115827
#> [2,] 1.06606602 0.35564140
#> [3,] -1.09026110 0.28291157
#> [4,] -0.90060771 -0.72591181
#> [5,] 0.67151585 -1.82471026
V(g1)
#> + 5/5 vertices, named, from 418e4e6:
#> [1] 5 2 4 3 1
# If the layout has names for the rows then we can
# use those names to fiddle with the order
row.names(lo) <- names(V(g1))
# plot with layout
plot(g1, layout = lo)
# plot with layout but reorder the layout to match the order
# in which nodes appear in g2
plot(g2, layout = lo[names(V(g2)), ])
Created on 2018-11-15 by the reprex package (v0.2.1)
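The reordering step can be wrapped in a small function so it is harder to get wrong. The sketch below is an illustration, not part of the original answer: layout_for is a hypothetical helper, the two toy graphs are made up, and it assumes both graphs carry vertex names and that every vertex of the second graph also exists in the first.

```r
library(igraph)

# Two toy graphs on the same named vertices, listed in a different order.
g1 <- graph_from_literal(a - b, b - c, c - d, d - a)
g2 <- graph_from_literal(d - b, a - c, c - b, a - d)

# Compute the layout once and label its rows with the vertex names of g1.
lo <- layout_with_kk(g1)
rownames(lo) <- V(g1)$name

# Hypothetical helper: pick the rows of a named layout in the vertex order of g.
layout_for <- function(g, named_layout) {
  stopifnot(all(V(g)$name %in% rownames(named_layout)))
  named_layout[V(g)$name, , drop = FALSE]
}

par(mfrow = c(1, 2))
plot(g1, layout = layout_for(g1, lo), main = "g1")
plot(g2, layout = layout_for(g2, lo), main = "g2")
```

Each vertex keeps the same coordinates in both panels, so only the edges differ between the plots.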

Depict supply network structures

I would like to plot a supply network structure. I have tried to use igraph, but until now did not come up with a reasonable result. An example would look like this:
library(igraph)
d <- read.table(text = "V1 V2 weight
s1 p1 88
s3 p1 100
s2 p2 100
s3 p2 43
p1 c1 21
p1 c2 79
p1 c3 88
p2 c1 22
p2 c2 121
", stringsAsFactors = F, header = T)
g <- graph_from_data_frame(d, directed = T)
plot(g, layout=layout.fruchterman.reingold,
edge.width=E(g)$weight/20,
vertex.shape = "none", vertex.label.font = 2,
vertex.label.cex=1.1, edge.color="gray70")
Which gives:
The problem is that the network has additional structure. A reasonable layout, among others, would place the "s" nodes (suppliers) in the left third, the "p" nodes (plants) in the middle, and the "c" nodes (customers) on the right-hand side. Is this even doable with igraph, and if so, how? Is there another package that could do this?
Yes, this is doable with igraph. One way is to make your own layout. A simple approach is to place all "s" nodes at x=1, "p" nodes at x=2 and "c" nodes at x=3. Each node within a type (s, p, c) should get a unique y value so that nodes do not overlap. Using your example graph:
LO = matrix(0, nrow=vcount(g), ncol=2)
LO[grep("s", V(g)$name), 1] = 1
LO[grep("p", V(g)$name), 1] = 2
LO[grep("c", V(g)$name), 1] = 3
LO[,2] = ave(rep(1, vcount(g)), LO[,1], FUN = seq_along)
plot(g, layout=LO, edge.width=E(g)$weight/20,
vertex.shape = "none", vertex.label.font = 2,
vertex.label.cex=1.1, edge.color="gray70")
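A variation on the manual layout above, offered as a sketch rather than a replacement: it looks up the column from the first letter of each vertex name (so a name such as "p1s" cannot match two types) and centres each column vertically. The lookup vector tier is a name introduced here for illustration, and the edge list is retyped from the question.

```r
library(igraph)

d <- data.frame(
  from   = c("s1", "s3", "s2", "s3", "p1", "p1", "p1", "p2", "p2"),
  to     = c("p1", "p1", "p2", "p2", "c1", "c2", "c3", "c1", "c2"),
  weight = c(88, 100, 100, 43, 21, 79, 88, 22, 121)
)
g <- graph_from_data_frame(d, directed = TRUE)

# Map the first letter of each vertex name to a column (assumed naming scheme).
tier <- c(s = 1, p = 2, c = 3)
x <- unname(tier[substr(V(g)$name, 1, 1)])

# Number the vertices within each column, then centre the column on y = 0.
y <- ave(x, x, FUN = function(v) seq_along(v) - mean(seq_along(v)))

LO <- cbind(x, y)
plot(g, layout = LO, edge.width = E(g)$weight / 20,
     vertex.shape = "none", vertex.label.font = 2,
     vertex.label.cex = 1.1, edge.color = "gray70")
```

Centring keeps the two plants vertically between the three suppliers and three customers instead of stacking every column from the bottom.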
Also, following up on the comment of @Henrik, you can use layout_with_sugiyama. You still need to define the (s,p,c) layers. Sugiyama arranges the layers vertically, so you need to swap the x and y coordinates to get a horizontal layout.
Layers = rep(0,vcount(g))
Layers[grep("s", V(g)$name)] = 3
Layers[grep("p", V(g)$name)] = 2
Layers[grep("c", V(g)$name)] = 1
LO2 = layout_with_sugiyama(g, layers=Layers)$layout
LO2 = LO2[,2:1]
plot(g, layout=LO2, edge.width=E(g)$weight/20,
vertex.shape = "none", vertex.label.font = 2,
vertex.label.cex=1.1, edge.color="gray70")

turning a weighted edgelist into an unweighted in r

I have data on every interaction that could and did happen at a university club weekly social hour
id1 id2 timestalked date
1 2 1 1/1/2010
1 3 0 1/1/2010
...
100 2 4 1/8/2010
...
I want to first load this in as a directed graph for the entire time period for visualization. For the weighted matrix I did:
library(igraph);
el <- read.csv("el.csv", header = TRUE);
G <- graph.data.frame(el,directed=TRUE);
A <- as_adjacency_matrix(G,type="both",names=TRUE,sparse=FALSE,attr="timestalked");
I thought removing attr="timestalked" would turn the weights > 0 into 1, but that does not seem to work:
library(igraph);
el <- read.csv("el.csv", header = TRUE);
G_unweight <- graph.data.frame(el,directed=TRUE);
A_unweight <- as_adjacency_matrix(G_unweight,type="both",names=TRUE,sparse=FALSE)
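as_adjacency_matrix() has no argument that turns weights into 0/1. Called without attr, it returns the number of edges between each pair of nodes, so a pair that appears on several dates gets a value above 1. Every row of the edge list counts as an edge here, including rows where timestalked is 0.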
To turn the weighted edgelist into an unweighted one, try this
A <- as_adjacency_matrix(G, type = "both", names = TRUE, sparse = FALSE)
A[A > 1] <- 1
Note that you can also use the graph_from_adjacency_matrix() function to create an unweighted igraph graph from the adjacency matrix by specifying weighted = NULL.
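Since the edge list in the question also contains pairs that could have talked but did not (timestalked equal to 0), one option is to drop those rows before building the graph. The following is a sketch under assumptions: the small data frame is a made-up stand-in for el.csv, and the vertices argument is used so that people with no conversations still appear in the matrix.

```r
library(igraph)

# Made-up stand-in for el.csv, with the columns described in the question.
el <- data.frame(
  id1 = c(1, 1, 2, 1),
  id2 = c(2, 3, 3, 2),
  timestalked = c(1, 0, 4, 2),
  date = c("1/1/2010", "1/1/2010", "1/1/2010", "1/8/2010")
)

# Keep every id as a vertex, but only rows with at least one conversation as edges.
ids <- sort(unique(c(el$id1, el$id2)))
talked <- el[el$timestalked > 0, ]
G <- graph_from_data_frame(talked, directed = TRUE,
                           vertices = data.frame(name = ids))

# Edge counts per pair, collapsed to 0/1.
A <- as_adjacency_matrix(G, sparse = FALSE)
A[A > 0] <- 1
A
```

Here the pair 1 to 2 talked on two dates and ends up as a single 1, while the pair 1 to 3 never talked and stays 0.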

Cleaning a graph with R package "igraph"

I need to "clean" a graph in R. By cleaning, I mean that I need to delete all nodes that are not linked with a specific node. For instance, if my graph has 4 nodes with these edges:
1 to 3
1 to 2
4 to 2
I want to keep only the nodes linked to node 1, plus node 1 itself; that is, I need to delete node 4.
Is there any way with igraph to do this for a very big graph (say, more than 1000 nodes and 1 000 000 edges)?
Use subcomponent and induced.subgraph:
edges_df <- data.frame(from = c(1, 1, 4), to = c(3, 2, 2))
g1 <- graph.data.frame(edges_df, directed = TRUE)
g2 <- induced.subgraph(g1, subcomponent(g1, "1", mode = "out"))
As for the "big" graphs: 1000 is not so big. On my laptop:
system.time({
g3 <- graph.full(n = 1000, directed = TRUE)
# graph.full() creates unnamed vertices, so refer to the vertex by index
g4 <- induced.subgraph(g3, subcomponent(g3, 1, mode = "out"))
})
# user system elapsed
# 0.47 0.10 0.57
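What counts as "linked" depends on the mode argument of subcomponent(). The sketch below reuses the three edges from the question and compares mode = "out" (follow edge directions away from node 1) with mode = "all" (ignore directions). It uses the newer function names graph_from_data_frame() and induced_subgraph(), which assume igraph 1.0 or later.

```r
library(igraph)

edges_df <- data.frame(from = c(1, 1, 4), to = c(3, 2, 2))
g <- graph_from_data_frame(edges_df, directed = TRUE)

# Vertices reachable from "1" when following edge directions.
reach_out <- subcomponent(g, "1", mode = "out")
# Vertices connected to "1" when directions are ignored.
reach_all <- subcomponent(g, "1", mode = "all")

kept <- induced_subgraph(g, reach_out)

sort(V(kept)$name)     # nodes kept with mode = "out"
sort(names(reach_all)) # nodes kept if direction is ignored
```

With mode = "out", node 4 is dropped as the question asks. With mode = "all" it would be kept, because the edge 4 to 2 connects it to the same component once direction is ignored.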

How to mine for motifs in R with iGraph

I'm trying to mine for 3-node motifs in R using the package igraph. I would like to retrieve the number of motifs for each individual vertex in the graph, which does not appear possible from the graph.motifs() function.
So, for the example graph:
testGraph = barabasi.game(10,
m = 5,
power = 2,
out.pref = TRUE,
zero.appeal = 0.5,
directed = TRUE)
I can use graph.motifs() to count the total number of each 3-node motif in the entire graph:
graph.motifs(testGraph,
size = 3)
[1] 0 0 26 0 16 0 2 58 0 0 0 0 0 0 0 0
But I would like to know the individual vertex participation. So, how many motifs (and what type) does vertex 1 participate in? Does anybody know a simple way to do that?
Here is a quick how-to.
If you are interested in the triads of vertex A, first create the induced subgraph that contains A and its immediate neighbors. You can do this via neighborhood() and induced.subgraph(), or simply with graph.neighborhood().
Then find the motifs in this subgraph, not with graph.motifs() but with triad.census(), because that counts all possible triples, even non-connected ones.
Then remove A from this subgraph and call triad.census() again. The difference between the two count vectors is exactly the set of motifs that include A.
Here's a self-contained example of Gabor's solution:
testGraph = barabasi.game(10,
m = 5,
power = 0.6,
out.pref = TRUE,
zero.appeal = 0.5,
directed = TRUE)
# Label nodes to more easily keep track during subsets/deletions
V(testGraph)$name = c('one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten')
subGraph = graph.neighborhood(testGraph, order = 1, V(testGraph)[1], mode = 'all')[[1]]
allMotifs = triad.census(subGraph)
removeNode = delete.vertices(subGraph, 'one')
node1Motifs = allMotifs - triad.census(removeNode)
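The same steps can be wrapped in a function and checked on a tiny graph where the answer is easy to verify by hand. This is a sketch only: triads_of is a hypothetical helper, it uses the newer names make_ego_graph(), triad_census() and delete_vertices() (igraph 1.0 or later), and the labels in triad_types follow the order listed in the igraph documentation for the triad census.

```r
library(igraph)

# Small directed graph: a 3-cycle a -> b -> c -> a, plus a -> d.
g <- graph_from_literal(a -+ b, b -+ c, c -+ a, a -+ d)

# Triad type labels, in the order documented for triad_census().
triad_types <- c("003", "012", "102", "021D", "021U", "021C", "111D", "111U",
                 "030T", "030C", "201", "120D", "120U", "120C", "210", "300")

# Hypothetical helper: triad counts for the triples that contain vertex v.
triads_of <- function(graph, v) {
  ego <- make_ego_graph(graph, order = 1, nodes = v, mode = "all")[[1]]
  counts <- triad_census(ego) - triad_census(delete_vertices(ego, v))
  setNames(counts, triad_types)
}

a_triads <- triads_of(g, "a")
a_triads[a_triads > 0]
```

Vertex a has three neighbors, so it takes part in choose(3, 2) = 3 triples, and the counts returned for it sum to 3. One of them is the cycle a, b, c, which is type 030C.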
