How to create a graph with a large number of points in R?

I have a large dataset containing a large number of nodes (more than 25,000), stored in a .csv file. Its structure is similar to the following:
node freq
a 3
b 2
c 5
I want to create a graph from these nodes in which the edges between nodes are created with a probability that is a function of the freq column. I have used the rgraph function from the sna package, as follows:
library(sna)     # rgraph()
library(igraph)  # graph.adjacency()
num_nodes <- length(data$node)
pLink <- data$freq / 10
# create 1 graph with the nodes and link probability, no loops (diag = FALSE)
graph_adj <- rgraph(num_nodes, m = 1, tprob = pLink, mode = "graph", diag = FALSE)
graph <- graph.adjacency(graph_adj, mode = "undirected")
The above code runs fine for a small number of nodes, but with a large number of nodes the R session aborts with the following error:
Error: C stack usage 19924416 is too close to the limit
Is there another way to create a graph with these properties: a large number of nodes, with edges created at random according to a probability?
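A minimal sketch of one way to avoid the dense matrix, assuming the igraph package. The code above builds a full num_nodes × num_nodes adjacency matrix before igraph sees it (for 25,000 nodes that is 625 million cells); the sketch instead samples the candidate partners of one node at a time and stores only the edges that are drawn. The rule that nodes i and j are linked with probability pLink[i] * pLink[j] is an assumption for illustration, since the question does not say how the per-node freq values combine into a pair probability, and sample_product_graph is a hypothetical helper name.

```r
library(igraph)

# Hypothetical helper: undirected graph on length(pLink) nodes where each pair
# (i, j) with i < j is linked independently with probability pLink[i] * pLink[j].
# The product rule is an assumption; swap in whatever function of freq you need.
sample_product_graph <- function(node_names, pLink) {
  n <- length(pLink)
  edges <- vector("list", n - 1)
  for (i in seq_len(n - 1)) {
    j <- (i + 1):n                                # only j > i: no loops, no duplicate pairs
    keep <- runif(length(j)) < pLink[i] * pLink[j]
    if (any(keep)) edges[[i]] <- cbind(i, j[keep])
  }
  el <- do.call(rbind, edges)                     # rbind skips the NULL entries
  if (is.null(el)) el <- matrix(integer(0), ncol = 2)
  g <- make_empty_graph(n = n, directed = FALSE)
  g <- add_edges(g, as.vector(t(el)))             # edge vector: from1, to1, from2, to2, ...
  V(g)$name <- as.character(node_names)
  g
}

# Toy data with the same columns as the question's .csv
data <- data.frame(node = c("a", "b", "c"), freq = c(3, 2, 5))
pLink <- data$freq / 10
set.seed(1)
g <- sample_product_graph(data$node, pLink)

# If one global link probability is acceptable, igraph's G(n, p) generator
# also works without an n x n matrix:
g_gnp <- sample_gnp(25000, p = 0.0004)
```

The loop still visits every pair, so running time grows with n², but memory is only needed for the edges that are kept.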

Related

Batching batches of graphs in PyTorch Geometric

I'm working with data that has a precise structure, and I'm struggling to keep that structure when using graphs. The data is nested:
one sample has L sub-samples
one sub-sample has S HeteroData graphs
You can probably already see the problem. In the standard case, one has data samples and creates batches from them (e.g. if the batch size is B, one batch contains B graphs). In my case, however, I first need to group my S graphs into a sub-sample, then group the L sub-samples into a sample, and finally create batches from the samples. In other words, with batch size B, one batch would contain B samples, each with L sub-samples, each of which has S graphs.
Is there a way to do that in PyTorch Geometric?

Alternative for shortest_path algorithm

I have a network consisting of 335 nodes. I computed the weighted shortest.paths between all of the nodes.
Now I would like to see which path sequences were used to travel between the nodes.
I use the shortest_paths command in igraph and iterate through all combinations of nodes in my network: 335² combinations, minus the 335 paths from a node to itself (length 0), divided by 2 because the graph is undirected. So in total I have to iterate over 55,945 combinations.
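The pair count can be double-checked in R; choose(n, 2) counts unordered pairs of distinct nodes, which is the same quantity as (n² − n)/2:

```r
n <- 335
(n^2 - n) / 2   # 55945: self-paths removed, each undirected pair counted once
choose(n, 2)    # 55945: same count via the binomial coefficient
```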
My approach looks like this, where net is my network and sp_data is a data frame with all combinations of links in the network:
results1 <- sapply(sp_data[, 1], function(x) {
  shortest_paths(net, from = x, to = V(net), output = "epath")
})
Unfortunately, this takes ages to compute, and in the end I don't have enough memory to store the results (Error: cannot allocate vector of size 72 Kb).
Basically I have two questions:
How can it be that the shortest.paths command needs seconds to compute the distances between all nodes of my network, whereas extracting the path sequences (not just their lengths) needs days and exceeds the memory capacity?
Is there an alternative way to get the desired output (the node sequences of the shortest paths)? I guess sapply should already be faster than a for loop?
You could try the cppRouting package.
It provides the get_multi_paths function, which returns a list containing the node sequence of each shortest path.
library(igraph)
library(cppRouting)
# example graph: complete graph on 335 nodes
g <- make_full_graph(335)
# convert to a three-column data.frame (from, to, dist)
df <- data.frame(igraph::as_data_frame(g), dist = 1)
# instantiate the cppRouting graph; g is undirected, so declare it as such
gr <- cppRouting::makegraph(df, directed = FALSE)
# extract all nodes
all_nodes <- unique(c(df$from, df$to))
# get all path sequences
all_paths <- get_multi_paths(Graph = gr, from = all_nodes, to = all_nodes)
# get the path from node 1 to node 3
all_paths[["1"]][["3"]]
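If you would rather stay with igraph, a minimal sketch: the question's code calls shortest_paths(from = x, to = V(net)) once per row of sp_data, and every such call already returns the paths from x to all vertices, so the same work is repeated for each row that shares a source. Calling it once per distinct source vertex avoids that. The lattice with random weights below is an assumed stand-in for net, which is not shown in the question.

```r
library(igraph)

# Stand-in for `net` (an assumption): a connected 5 x 10 lattice
# with random positive edge weights.
set.seed(42)
net <- make_lattice(c(5, 10))
E(net)$weight <- runif(ecount(net), min = 1, max = 10)

# One shortest_paths() call per source vertex instead of one per node pair.
# With weights = NULL (the default), igraph uses the E(net)$weight attribute.
all_vpaths <- lapply(seq_len(vcount(net)), function(s) {
  res <- shortest_paths(net, from = s, to = V(net), output = "vpath")
  lapply(res$vpath, as.integer)   # keep plain vertex ids rather than vertex sequences
})

# Vertex ids along the shortest path from vertex 1 to vertex 5
all_vpaths[[1]][[5]]
```

If edge ids are needed, as in the question, use output = "epath" and read res$epath instead.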

How to create a network with both edges and isolates using statnet/igraph

My question is similar to the one posted here: Network adding edges error
I am creating a network from scratch: I have data on 228 vertices over a period of 13 years. In the first year I have only 1,781 edges, and they do not involve all of my vertices (barely 164), so the remaining nodes should appear as isolates.
I created the network from my edge list using the code
fdi.graph.2003 <- graph_from_data_frame(fdi.edge.2003, directed = TRUE, vertices = fdi.attr.2003)
where fdi.edge.2003 is a data.frame containing the edges and their attributes (including some potential weight columns), which involves only 164 of the 228 vertices, and fdi.attr.2003 is a data.frame containing a row for each vertex that is involved in the edge list (164 in total).
All I get is a network with 164 vertices and no isolates. However, I know they do exist in my data! Any suggestion on how to do this? I think I should initialize a network with all 228 vertices, add their attributes, and then add the edges. However, nothing I try works; instead, I get all sorts of errors related to "Illegal vertex reference in addEdges_R".
Any suggestion is more than welcome, including ones that use the alternative package igraph, where I run into the same problem.
Filippo
Use add.isolates from the sna package:
library(network)  # as.network(), %v%
library(sna)      # add.isolates()
net1 <- as.network(cbind(1:3, 3:5)) # 5 vertices, 3 edges
net2 <- as.network(add.isolates(net1, 10), matrix.type = "edgelist") # 15 vertices, 3 edges
And then you'll probably want to create new vertex names, e.g.
net2 %v% "vertex.names" <- 1:15
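For the igraph route, a minimal sketch, assuming the data frame passed as vertices lists every vertex (all 228 in the question's case), not only those in the edge list: graph_from_data_frame creates one vertex per row of vertices, so rows with no matching edges become isolates. In the question, fdi.attr.2003 has only the 164 vertices that appear in the edge list, which would explain why no isolates show up. The small data frames below are hypothetical stand-ins for fdi.edge.2003 and fdi.attr.2003.

```r
library(igraph)

# Hypothetical stand-ins: 6 vertices, but the edges touch only 4 of them
edges <- data.frame(from = c("A", "B", "C"),
                    to = c("B", "C", "D"),
                    weight = c(1.5, 2, 0.5))
verts <- data.frame(name = c("A", "B", "C", "D", "E", "F"),
                    size = c(10, 20, 5, 8, 3, 1))

# The first column of `vertices` holds the vertex names; every name used in
# `edges` must appear there, and extra rows become isolated vertices.
g <- graph_from_data_frame(edges, directed = TRUE, vertices = verts)

vcount(g)                  # 6
ecount(g)                  # 3
V(g)$name[degree(g) == 0]  # "E" "F"
```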

Get node descendants in a tree graph

I have a directed graph (grafopri1fase1); it has no loops and a tree structure (not a binary tree).
I have an array of nodes (meterdiretti) that I have extracted from the graph (grafopri1fase1) because they match a condition.
Starting from each node of meterdiretti, I would like to know how many nodes lie under it.
The result I would like is a matrix with the following format:
first column        second column
meterdiretti[1]     total number of nodes reachable from meterdiretti[1]
meterdiretti[2]     total number of nodes reachable from meterdiretti[2]
...
meterdiretti[n]     total number of nodes reachable from meterdiretti[n]
Taking a punt at what you want; it would be good if you could add a reproducible example to your question.
I think what you want is to count the descendants of a node. You can do this with neighborhood.size and the mode = "out" argument.
library(igraph)
# create an example tree: 17 vertices, each with up to 2 children
g <- graph.tree(17, children = 2)
plot(g, layout = layout.reingold.tilford)
# test on a single node (vertex 1, the root)
neighborhood.size(g, vcount(g), 1, "out") - 1
# [1] 16
# apply over a few nodes
neighborhood.size(g, vcount(g), c(1, 4, 7), "out") - 1
# [1] 16 4 2
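To get the two-column matrix the question asks for, a minimal sketch using ego_size(), which is the name neighborhood.size() has in igraph 1.0 and later. The selection c(1, 4, 7) is a hypothetical stand-in for meterdiretti; if meterdiretti is an igraph vertex sequence, as_ids(meterdiretti) gives the ids (or names) for the first column.

```r
library(igraph)

g <- make_tree(17, children = 2)   # same example tree as above
meterdiretti <- c(1, 4, 7)         # hypothetical selection of vertex ids

# With order = vcount(g) and mode = "out", ego_size() counts every vertex
# reachable downwards, including the start vertex itself, hence the - 1.
descendants <- ego_size(g, order = vcount(g), nodes = meterdiretti, mode = "out") - 1

result <- cbind(node = meterdiretti, descendants = descendants)
result   # descendants column: 16 4 2
```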

Extract just the number of edges from a graphNEL object

Using R, I have created an undirected graphNEL object with nodes and edges. I want to save the numbers that get printed when I print the variable the graph is stored in:
graphNEL graph with undirected edges
Number of Nodes = 671
Number of Edges = 4267
I tried using the function edgeL(), but the number I get is the number of nodes. I suspect this is because each gene has its own set of edges, so the output has one entry per node. All I want is to save the number of edges in this graph. How can I do this?
Thanks
I am not familiar with the package, but it looks like nodes and edgeL are stored as slots. This means you should be able to access the data with the @ slot operator if you save your graphNEL as an object:
yourgraphObject@edgeL
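As an alternative to reading the slot, a minimal sketch assuming the Bioconductor graph package: the counts shown when a graphNEL is printed appear to correspond to numNodes() and numEdges(), whose results can be stored directly. The three-node triangle is a hypothetical example; its edgeL has one entry per node, which matches the length the question observed.

```r
library(graph)   # Bioconductor package that defines graphNEL

# Hypothetical undirected triangle a-b, b-c, c-a; each edge is listed from both ends
g <- graphNEL(nodes = c("a", "b", "c"),
              edgeL = list(a = c("b", "c"), b = c("a", "c"), c = c("a", "b")),
              edgemode = "undirected")

n_nodes <- numNodes(g)   # 3
n_edges <- numEdges(g)   # 3: each undirected edge is counted once
length(edgeL(g))         # 3: one entry per node, not per edge
```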
