Scale-Free graph, access the vertices of the graph (R)

I'm trying to generate a Scale-Free graph according to the Barabasi-Albert model. I used the barabasi.game function to generate a graph with 1000 vertices:
library('igraph')
g <- barabasi.game(1000)
I would now like to pick a random vertex of g and then, among its neighbors, pick another random vertex. How can I access the vertices of the graph?
Edit: I had problems with the solution kindly suggested by G5W. For this graph (picture not available), I obtained, from the first instruction
RV<-sample(V(g), 1)
the result RV=4, but from the second
RVn<-sample(neighbors(g, RV, mode="all"), 1)
I obtained RVn=1. As the picture shows, this is wrong; moreover, the instruction
neighbors(g, i)
returns
+ 1/10 vertex, named, from 57207c1:
[1] 2
Why?
Thank you.

You can pick a random vertex and a random neighbor like this:
RV = sample(V(g), 1)
NRV = neighbors(g, RV, mode="all")
RVn = ifelse(length(NRV) == 1, NRV, sample(NRV, 1))
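This works whether RV has one neighbor or more. The ifelse guard is needed because sample(x, 1) treats a single number x >= 1 as the range 1:x: when RV has exactly one neighbor, sample() draws from 1:(that neighbor's ID) instead of returning it. That is why the original code could return 1 for a vertex whose only neighbor is 2.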
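For comparison, a minimal Python sketch of the same two-step choice, assuming the graph is held as a plain adjacency dict (a hypothetical stand-in for g, not an igraph object). Python's random.choice has no single-number special case, so a vertex with one neighbor needs no guard.

```python
import random


def random_vertex_and_neighbor(adj, rng=random):
    """Pick a uniformly random vertex of `adj`, then a uniformly random neighbor.

    `adj` maps each vertex to the list of its neighbors with edge direction
    ignored (the analogue of mode="all"). random.choice on a one-element list
    simply returns that element, so degree-1 vertices need no special case.
    """
    v = rng.choice(list(adj))
    neighbors = adj[v]
    if not neighbors:
        raise ValueError(f"vertex {v} has no neighbors")
    return v, rng.choice(neighbors)


# Hypothetical small undirected graph stored as an adjacency dict
adj = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1, 5], 5: [4]}
v, w = random_vertex_and_neighbor(adj, random.Random(42))
print(v, w)
```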
Note that most vertices have only one neighbor, so for them the random choice of a neighbor has only one possible outcome:
table(sapply(V(g), function(v) length(neighbors(g, v, mode="all"))))
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  18  21  37  38  92
657 183  67  35  12  11   3   7   4   6   2   4   1   2   2   1   1   1   1
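The many degree-1 vertices follow from how the graph is grown: barabasi.game(1000) adds one edge per new vertex, so the result is a tree (the degrees above sum to 2 × 999) whose new edges favor already well-connected vertices. A simplified Python sketch of that preferential-attachment step, assuming the attachment probability is proportional to total degree (igraph's exact weighting rule may differ in details):

```python
import random
from collections import Counter


def preferential_attachment_tree(n, seed=None):
    """Grow an undirected tree on vertices 1..n by preferential attachment.

    Each new vertex adds one edge to an existing vertex chosen with
    probability proportional to that vertex's current degree.
    """
    if n < 2:
        raise ValueError("need at least two vertices")
    rng = random.Random(seed)
    adj = {1: [2], 2: [1]}
    # Every vertex appears in `ends` once per incident edge, so a uniform
    # draw from `ends` is a degree-proportional draw.
    ends = [1, 2]
    for v in range(3, n + 1):
        target = rng.choice(ends)
        adj[v] = [target]
        adj[target].append(v)
        ends.extend([v, target])
    return adj


g = preferential_attachment_tree(1000, seed=1)
# Degree -> number of vertices with that degree, like table(...) in R
degree_counts = Counter(len(nbrs) for nbrs in g.values())
print(sorted(degree_counts.items()))
```

In the usual analysis of this model about two thirds of the vertices end up with degree 1, in line with the 657 of 1000 in the table above.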

Related

Build new adjacency matrix after graph partitioning

I have an adjacency matrix stored in CSR format, e.g.
xadj = 0 2 5 8 11 13 16 20 24 28 31 33 36 39 42 44
adjncy = 1 5 0 2 6 1 3 7 2 4 8 3 9 0 6 10 1 5 7 11 2 6 8 12 3 7 9 13 4 8 14 5 11 6 10 12 7 11 13 8 12 14 9 13
I am now partitioning said graph using METIS. This gives me the partition vector part of the graph: basically a list that tells me which partition each vertex is in. Is there an efficient way to build the new adjacency matrix for this partitioning such that I can partition the new graph again? E.g. a function rebuildAdjacency(xadj, adjncy, part). If possible, reusing xadj and adjncy.
I'm assuming that by "rebuild" you mean removing the edges between vertices that have been assigned to different partitions. If so, probably the best you can do is to iterate over your CSR list, generate a new CSR list, and skip all edges that run between partitions.
In Python (vertex IDs are kept, so the result is still one graph on the original vertices):
def rebuild_adjacency(xadj, adjncy, part):
    n = len(xadj) - 1  # number of vertices
    new_xadj = []
    new_adjncy = []
    for row in range(n):
        row_index = xadj[row]
        next_row_index = xadj[row + 1]
        # New row index for the row we are currently building
        new_xadj.append(len(new_adjncy))
        for col in adjncy[row_index:next_row_index]:
            if part[row] == part[col]:
                # Same partition: put the row->col edge into the new CSR list
                new_adjncy.append(col)
            # Otherwise the edge crosses partitions and is dropped
    # Last entry in the row index field is the number of entries
    new_xadj.append(len(new_adjncy))
    return new_xadj, new_adjncy
I don't think you can do this very efficiently while reusing the old xadj and adjncy arrays. However, if you are doing this recursively, you can save memory allocation/deallocation by keeping exactly two copies of xadj and adjncy and alternating between them.
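If the goal is to hand each partition back to METIS on its own, one option is to extract every partition as a standalone CSR graph with its vertices renumbered from 0. A minimal Python sketch using the question's arrays; the function name extract_part and the partition vector are hypothetical (the first three grid columns in part 0, the last two in part 1):

```python
def extract_part(xadj, adjncy, part, p):
    """Return (sub_xadj, sub_adjncy, global_ids) for the vertices in partition p.

    Vertices of p are renumbered 0..k-1 so the result is a standalone CSR
    graph; global_ids[i] is the original ID of local vertex i.
    """
    n = len(xadj) - 1
    global_ids = [v for v in range(n) if part[v] == p]
    # Map each original vertex ID in p to its new local ID
    local_id = {g: i for i, g in enumerate(global_ids)}
    sub_xadj, sub_adjncy = [0], []
    for g in global_ids:
        for nb in adjncy[xadj[g]:xadj[g + 1]]:
            if part[nb] == p:  # keep only edges that stay inside p
                sub_adjncy.append(local_id[nb])
        sub_xadj.append(len(sub_adjncy))
    return sub_xadj, sub_adjncy, global_ids


# CSR arrays from the question (a 3 x 5 grid graph)
xadj = [0, 2, 5, 8, 11, 13, 16, 20, 24, 28, 31, 33, 36, 39, 42, 44]
adjncy = [1, 5, 0, 2, 6, 1, 3, 7, 2, 4, 8, 3, 9, 0, 6, 10, 1, 5, 7, 11, 2, 6,
          8, 12, 3, 7, 9, 13, 4, 8, 14, 5, 11, 6, 10, 12, 7, 11, 13, 8, 12, 14,
          9, 13]
# Hypothetical partition vector: grid columns 0-2 -> part 0, columns 3-4 -> part 1
part = [0, 0, 0, 1, 1] * 3
sub_xadj, sub_adjncy, ids = extract_part(xadj, adjncy, part, 1)
print(sub_xadj, sub_adjncy, ids)
```

Keeping global_ids lets you map the results of partitioning a subgraph back to the original vertex IDs.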

How to get the ID of each node from topological sort?

I have a network (a directed acyclic graph):
dag_1 <- barabasi.game(20)
I applied a topological sort:
top1 <- topo_sort(dag_1)
top1
+ 20/20 vertices, from 0ee5d26:
[1] 5 8 11 13 14 15 16 17 18 20 4 7 12 19 2 10 9 6 3 1
If I type top1 and hit enter, the results are above. I need to access the vector
5 8 11 13, ..., 1
I tried top1[1] and top1[[1]]. Neither of them gave me the vector.
How can I get it?
top1 is an igraph.vs object, and indexing it (e.g. top1[1:10]) still returns an igraph.vs, i.e. vertices of the graph rather than plain numbers. To get a plain vector of the vertex IDs, use:
as.vector(top1)
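For reference, a minimal Python sketch of one standard way to compute such an order, Kahn's algorithm, which yields a plain list of IDs directly. It is not necessarily what topo_sort() does internally, and ties may be broken differently. The edge list is hypothetical, shaped like barabasi.game's default directed output, where each newer vertex points to an older one:

```python
from collections import deque


def topo_sort(n, edges):
    """Kahn's algorithm on vertices 1..n; edges are (from, to) pairs."""
    out = {v: [] for v in range(1, n + 1)}
    indeg = {v: 0 for v in range(1, n + 1)}
    for u, v in edges:
        out[u].append(v)
        indeg[v] += 1
    # Start from the vertices that no edge points to
    queue = deque(v for v in range(1, n + 1) if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != n:
        raise ValueError("graph has a cycle, so no topological order exists")
    return order


edges = [(2, 1), (3, 1), (4, 2), (5, 3)]
print(topo_sort(5, edges))
```

Here the result is [4, 5, 2, 3, 1]: the vertices nothing points to come first and vertex 1 comes last, as in top1.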

Vertex Labels in igraph R

I am using igraph to plot an undirected force network.
I have a dataframe of nodes and links as follows:
> links
source target value sourceID targetID
1 3 4 0.6245 1450552 1519842
2 6 8 0.5723 2607133 3051992
3 9 7 0.7150 3101536 3025831
4 0 1 0.7695 401517 425784
5 2 5 0.5535 1045501 2258363
> nodes
name group size
1 401517 1 8
2 425784 1 8
3 1045501 1 8
4 1450552 1 8
5 1519842 1 8
6 2258363 1 8
7 2607133 1 8
8 3025831 1 8
9 3051992 1 8
10 3101536 1 8
I plot these using igraph as follows:
gg <- graph.data.frame(links,directed=FALSE)
plot(gg, vertex.color = 'lightblue', edge.label=links$value, vertex.size=1, edge.color="darkgreen",
vertex.label.font=1, edge.label.font =1, edge.label.cex = 1,
vertex.label.cex = 2 )
On this plot, igraph has used the proxy indexes for source and target as vertex labels.
I want to use the real IDs, which my links table holds as sourceID and targetID.
So, for:
source target value sourceID targetID
1 3 4 0.6245 1450552 1519842
This would show as:
(1450552) ----- 0.6245 ----- (1519842)
Instead of:
(3) ----- 0.6245 ----- (4)
(Note that the proxy indexes are zero indexed in the links dataframe, and one indexed in the nodes dataframe. This offset by 1 is necessary for igraph plotting).
I know I need to somehow map the proxy indexes to their corresponding names in the nodes dataframe. However, I am at a loss, as I do not know the order in which igraph plots the labels.
How can I achieve this?
I have consulted the following questions to no avail:
Vertex Labels in igraph with R
how to specify the labels of vertices in R
R igraph rename vertices
You can specify the labels like this:
library(igraph)
gg <- graph.data.frame(
links,directed=FALSE,
vertices = rbind(
setNames(links[,c(1,4)],c("id","label")),
setNames(links[,c(2,5)], c("id","label"))))
plot(gg, vertex.color = 'lightblue', edge.label=links$value,
vertex.size=1, edge.color="darkgreen",
vertex.label.font=1, edge.label.font =1, edge.label.cex = 1,
vertex.label.cex = 2 )
You could also pass
merge(rbind(
setNames(links[,c(1,4)],c("id","label")),
setNames(links[,c(2,5)], c("id","label"))),
nodes,
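by.x="label", by.y="name")[, c("id", "label", "group", "size")]
to the vertices argument if you need the other node attributes. The column selection keeps id first: merge() moves the key column (label) to the front, while graph.data.frame() treats the first column of vertices as the vertex names, which must match the values in source and target.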
Data:
links <- read.table(header=T, text="
source target value sourceID targetID
1 3 4 0.6245 1450552 1519842
2 6 8 0.5723 2607133 3051992
3 9 7 0.7150 3101536 3025831
4 0 1 0.7695 401517 425784
5 2 5 0.5535 1045501 2258363")
nodes <- read.table(header=T, text="
name group size
1 401517 1 8
2 425784 1 8
3 1045501 1 8
4 1450552 1 8
5 1519842 1 8
6 2258363 1 8
7 2607133 1 8
8 3025831 1 8
9 3051992 1 8
10 3101536 1 8")
It appears I was able to repurpose the answer to this question to achieve this.
r igraph - how to add labels to vertices based on vertex id
The key was to use the vertex.label argument of plot() and select the matching subset of nodes$name.
For the index we can use the default labels igraph assigns, in vertex order. To extract these, type V(gg)$name.
Within plot(gg) we can then write:
vertex.label = nodes[c(as.numeric(V(gg)$name)+1),]$name
# 1 Convert to numeric
# 2 Add 1 for offset between proxy links index and nodes index
# 3 Select subset of nodes with above as row index. Return name column
As full code:
gg <- graph.data.frame(links,directed=FALSE)
plot(gg, vertex.color = 'lightblue', edge.label=links$value, vertex.size=1, edge.color="darkgreen",
vertex.label.font=1, edge.label.font =1, edge.label.cex = 1,
vertex.label.cex = 2, vertex.label = nodes[c(as.numeric(V(gg)$name)+1),]$name)
With the data above, this gave the following plot (image not available).
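The same index arithmetic can be sketched outside R: because source and target are zero-based, proxy index i corresponds to row i + 1 of nodes in R, which is simply position i in a zero-based Python list. A minimal Python sketch with values copied from the question's links and nodes tables (the function name relabel is hypothetical):

```python
# Real IDs from the `name` column of `nodes`, in row order (row 1 -> index 0)
node_names = [401517, 425784, 1045501, 1450552, 1519842,
              2258363, 2607133, 3025831, 3051992, 3101536]
# (source, target, value) proxy edges copied from `links`
links = [(3, 4, 0.6245), (6, 8, 0.5723), (9, 7, 0.7150),
         (0, 1, 0.7695), (2, 5, 0.5535)]


def relabel(edges, names):
    # A zero-based proxy index is already a valid Python list index
    return [(names[s], names[t], w) for s, t, w in edges]


labelled = relabel(links, node_names)
for s, t, w in labelled:
    print(f"({s}) ----- {w} ----- ({t})")
```

The first line printed is (1450552) ----- 0.6245 ----- (1519842), the form asked for in the question, and the mapped pairs agree with the sourceID and targetID columns.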
The easiest solution would be to reorder the columns of links, because according to the graph.data.frame documentation:
"If vertices is NULL, then the first two columns of d are used as a symbolic edge list and additional columns as edge attributes."
Hence, your code will give the correct output after running:
links <- links[,c(4,5,3)]

How to get the edge list of a strongly connected components in a graph?

I have a weighted directed multigraph with a few cycles. Using the clusters function in the igraph package, I can get the nodes belonging to each strongly connected component. But I need the path/order of the nodes that form a cycle.
EDIT after @josilber's response
I have a very dense graph, with 30 nodes and around 2000 edges. So graph.get.subisomorphisms.vf2 takes too long to run in my case.
I'm not familiar with graph algorithms, but I'm thinking that a DFS on the original or the reversed graph, using its order or order.out result, might work; I'm not sure.
Any other ideas to make this run faster are welcome!
Example
library(igraph)
set.seed(123)
graph <-graph(c(1,2,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,8,10,9,10,9,10,10,11,10,5,11,12,12,13,13,14,14,15,14,20,15,16, 16,17,17,18,18,19,19,20,20,21,21,1,22,23,22,23,23,22),directed=T)
E(graph)$weight= runif(ecount(graph),0,10)
> clusters(graph, "strong")
$membership
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1
$csize
[1] 2 21
$no
[1] 2
How do I get the edge list of a cycle with the highest weight here? Thanks!
Assuming that all nodes in each strongly connected component form a single cycle and that you're only interested in that cycle (e.g. in your example, the cycle with nodes 1:21 and the cycle with nodes 22:23), you can extract the node order that forms the cycle, grab the weights on the edges, and compute the total weight of the cycle.
# Compute the node order of the cycle for each component by performing an
# isomorphism with a same-sized directed ring graph
clusts <- clusters(graph, "strong")
(cycles <- lapply(1:clusts$no, function(x) {
sg <- induced.subgraph(graph, clusts$membership == x)
n <- sum(clusts$membership == x)
col <- rep(c(1, 0), c(1, n-1)) # Used to grab isomorphism starting at 1
sg.idx <- graph.get.subisomorphisms.vf2(sg, graph.ring(n, directed=TRUE), col, col)[[1]]
which(clusts$membership == x)[sg.idx]
}))
# [[1]]
# [1] 22 23
#
# [[2]]
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Now you can grab the sum of the edge weights for each cycle:
sapply(cycles, function(x) sum(graph[from=x, to=c(tail(x, -1), x[1])]))
# [1] 8.833018 129.959437
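The weight sum above adds, for each cycle, the weights of the edges x[1] -> x[2], ..., x[n] -> x[1]. A plain-Python sketch of that wrap-around sum, assuming edges are given as (from, to, weight) tuples. For parallel edges it uses the heaviest one, an assumption aimed at the "highest weight" question that may not match how igraph's graph[from=, to=] indexing handles multi-edges:

```python
from collections import defaultdict


def cycle_weight(order, edges):
    """Total weight of the directed cycle order[0] -> order[1] -> ... -> order[0].

    `edges` is a list of (from, to, weight); among parallel edges the
    heaviest is used.
    """
    best = defaultdict(lambda: float("-inf"))
    for u, v, w in edges:
        best[(u, v)] = max(best[(u, v)], w)
    total = 0.0
    # Pair each node with its successor, wrapping around to the first node
    for u, v in zip(order, order[1:] + order[:1]):
        if (u, v) not in best:
            raise ValueError(f"no edge {u}->{v}, so this is not a cycle")
        total += best[(u, v)]
    return total


# Hypothetical weights on the 22 <-> 23 component (two parallel 22->23 edges)
print(cycle_weight([22, 23], [(22, 23, 1.5), (22, 23, 2.5), (23, 22, 4.0)]))
```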
Note that this is in general NP-hard, because finding a Hamiltonian cycle in a general graph is NP-hard. Therefore the graph.get.subisomorphisms.vf2 call could be quite slow for large graphs.
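The DFS idea from the question is essentially Kosaraju's algorithm: one DFS on the original graph to record finishing order, then DFS on the reversed graph in decreasing finishing order. It recovers the strongly connected components (what clusters(graph, "strong") already returns), not the cycle order, so on its own it does not avoid the Hamiltonian-cycle search. A minimal iterative Python sketch using the question's edge vector:

```python
from collections import defaultdict


def strong_components(n, edges):
    """Kosaraju's algorithm on vertices 1..n; returns a component label per vertex."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: iterative DFS on the original graph, recording finish order
    seen, finish = set(), []
    for s in range(1, n + 1):
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(fwd[s]))]
        while stack:
            u, it = stack[-1]
            nxt = next((w for w in it if w not in seen), None)
            if nxt is None:
                stack.pop()
                finish.append(u)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(fwd[nxt])))

    # Pass 2: search the reversed graph in decreasing finish order;
    # each search collects exactly one strongly connected component
    comp, label = {}, 0
    for s in reversed(finish):
        if s in comp:
            continue
        label += 1
        comp[s] = label
        todo = [s]
        while todo:
            u = todo.pop()
            for w in rev[u]:
                if w not in comp:
                    comp[w] = label
                    todo.append(w)
    return [comp[v] for v in range(1, n + 1)]


# Edge vector from the question's graph(c(...)) call
flat = [1, 2, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 8, 10, 9, 10,
        9, 10, 10, 11, 10, 5, 11, 12, 12, 13, 13, 14, 14, 15, 14, 20, 15, 16,
        16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 1, 22, 23, 22, 23, 23, 22]
edges = list(zip(flat[0::2], flat[1::2]))
labels = strong_components(23, edges)
print(labels)
```

On this example the sketch puts vertices 1-21 in one component and 22-23 in the other, the same grouping as the $membership output shown in the question.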

Filter between threshold

I am working with a large dataset and I am trying to first identify clusters of values that meet specific threshold values. My aim then is to only keep clusters of a minimum length. Below is some example data and my progress thus far:
Test = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
Sequence = c(1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10)
Value = c(3,2,3,4,3,4,4,5,5,2,2,4,5,6,4,4,6,2,3,2)
Data <- data.frame(Test, Sequence, Value)
Using package evd, I have identified clusters of values >3
C1 <- clusters(Data$Value, u = 3, r = 1, cmax = F, plot = T)
Which produces
C1
$cluster1
4
4
$cluster2
6 7 8 9
4 4 5 5
$cluster3
12 13 14 15 16 17
4 5 6 4 4 6
My problem is twofold:
1) I don't know how to relate this back to the original dataframe (for example to Test A & B)
2) How can I only keep clusters with a minimum size of 3 (thus excluding Cluster 1)
I have looked into various filtering options etc. however they do not cluster data according to a desired threshold, with no options for the minimum size of the cluster either.
Any help is much appreciated.
Q1: relate back to the original dataframe: have a look at Carl Witthoft's answer to "detect intervals of the consequent integer sequences". He wrote seqle(), a variant of rle() that looks for runs of consecutive integers rather than runs of repeated values.
Q2: only keep clusters of certain length:
C1[sapply(C1, length) >= 3]
yields the 2 clusters that are long enough:
$cluster2
6 7 8 9
4 4 5 5
$cluster3
12 13 14 15 16 17
4 5 6 4 4 6
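Both parts can also be handled in one pass. A minimal Python sketch, assuming a cluster is a run of consecutive rows with Value > u that ends at the first value <= u (assumed here to correspond to r = 1 in the evd call) and that runs should not cross from one Test to the next. It returns the original row numbers, so each cluster can be related back to Test and Sequence; the function name threshold_runs is hypothetical:

```python
def threshold_runs(tests, values, u, min_len):
    """Runs of consecutive rows with value > u, split wherever the test changes.

    Returns (test, rows) pairs with 1-based row numbers, as in R, and drops
    runs shorter than min_len.
    """
    runs, current = [], []
    for i, (t, v) in enumerate(zip(tests, values), start=1):
        # Extend the open run only if v exceeds u and the test is unchanged
        if v > u and current and tests[current[-1] - 1] == t:
            current.append(i)
        else:
            if current:
                runs.append(current)
            current = [i] if v > u else []
    if current:
        runs.append(current)
    return [(tests[r[0] - 1], r) for r in runs if len(r) >= min_len]


# Example data from the question
Test = ["A"] * 10 + ["B"] * 10
Sequence = list(range(1, 11)) * 2
Value = [3, 2, 3, 4, 3, 4, 4, 5, 5, 2, 2, 4, 5, 6, 4, 4, 6, 2, 3, 2]

for test, rows in threshold_runs(Test, Value, u=3, min_len=3):
    print(test, [Sequence[r - 1] for r in rows], [Value[r - 1] for r in rows])
```

For the example data this keeps rows 6-9 (Test A, Sequence 6-9) and rows 12-17 (Test B, Sequence 2-7), the same two clusters as above.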
