Find previous node in tree - r

I've got the following code:
df <- read.table(text='verkoop V621
verkoopcode V62123
verkoopcodenaam V6212355
verkoopdatum V621335
verkoopdatumchar V62133526
verkooppr V6216
verkoopprijs V62162
verkoopsafdeling V621213452
verkoopsartikel V62126324')
# use igraph package
require(igraph)
# create adjacency matrix
adj <- nchar(sapply(df$V1, gsub, x=df$V1, replacement=''))
adj[!sapply(df$V1, grepl, x=df$V1)] <- 0
# name adjacency matrix
colnames(adj) <- df$V2
# original graph
gr <- graph.adjacency(adj, mode='directed', weighted=TRUE)
layout(matrix(1:2, ncol=2))
plot(gr)
# minimum spanning tree
mst <- minimum.spanning.tree(gr)
shortest.paths(mst, to="V621", weights=rep(1, ecount(mst)))
This gives me the depth of every node in the tree. Now I want to determine which node comes directly before a specific node. For example, for 'verkoopdatumchar' I want to find 'verkoopdatum'.
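One possible way to look up the predecessor, sketched here in base R without igraph. It assumes that the parent of a node is the longest other name that is a proper prefix of it. That assumption matches the example data, although the adjacency matrix above is built with substring matching rather than prefix matching. The helper name find_parent is hypothetical.

```r
# Node names taken from the first column of the question's data
nodes <- c("verkoop", "verkoopcode", "verkoopcodenaam", "verkoopdatum",
           "verkoopdatumchar", "verkooppr", "verkoopprijs",
           "verkoopsafdeling", "verkoopsartikel")

# Hypothetical helper: the parent of x is assumed to be the longest
# other name that is a proper prefix of x
find_parent <- function(x, all_names) {
  candidates <- all_names[all_names != x & startsWith(x, all_names)]
  if (length(candidates) == 0) return(NA_character_)  # the root has no parent
  candidates[which.max(nchar(candidates))]
}

# Parent of a single node
find_parent("verkoopdatumchar", nodes)

# Parent of every node, as a named character vector
parents <- vapply(nodes, find_parent, character(1), all_names = nodes)
```

For 'verkoopdatumchar' this returns 'verkoopdatum'; the root 'verkoop' gets NA.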

Related

R igraph make “subgraph” from igraph object from list of vertices, infer edges between selected vertices if there are connected nodes in original

I am using igraph from R. I know we can make a subgraph with selected vertices but if those nodes aren’t directly connected, there won’t be an edge in the new subgraph. Is there a way to make a subgraph which creates an edge between two nodes if there are other nodes (that are not a part of the vertex list) indirectly connecting those two nodes?
For example, if I have a graph which has the following edges:
E-F
F-G
And my vertex list contains E and G, how can I create a new subgraph that creates that edge E-G?
Thank you!!!
One way to find neighbors that are two steps away is to multiply the adjacency matrix with itself (see comments here for example).
First create the graph described in the question:
library(igraph)
g <- graph_from_literal(E--F, F--G)
Then take the adjacency matrix (m) and multiply it with itself.
m <- get.adjacency(g, sparse = F)
m2 <- m %*% m
Build a new graph from the resulting adjacency matrix and remove all vertices that have a degree of 0 (no second-degree neighbor):
g2 <- graph_from_adjacency_matrix(m2, diag = F, mode = "undirected")
induced_subgraph(g2, degree(g2) > 0)
#> IGRAPH 089bf67 UN-- 2 1 --
#> + attr: name (v/c)
#> + edge from 089bf67 (vertex names):
#> [1] E--G
Created on 2022-08-26 with reprex v2.0.2
Building upon the suggestions in the comments, I arrive at:
require(igraph)
set.seed(1)
g <- erdos.renyi.game(2^6, 1/32)
V(g)$name <- seq(vcount(g))
filter <- c(7,22, 1, 4, 6)
amg <- g[] # adjacency matrix g
clg <- clusters(g)$membership # connected components (g is undirected)
amtc <- clg[row(amg)] == clg[col(amg)] # adjacency matrix of transitive closure
dim(amtc) <- dim(amg)
gtc <- simplify(graph.adjacency(amtc, mode="undirected")) # transitive closure of g
V(gtc)$name <- V(g)$name
isg <- induced_subgraph(gtc, filter)
plot(isg)
However this solution is not feasible if g is large and the subgraph significantly smaller.
If subgraph << original graph then:
require(igraph)
set.seed(1)
g <- erdos.renyi.game(2^6, 1/32)
V(g)$name <- seq(vcount(g))
filter <- c(1, 4, 6, 7, 22, 25)
stopifnot(!is.directed(g)) # assume undirected graph
mscc <- components(g)$membership[filter] # component membership of the filtered vertices
amfi <- outer(X=mscc, mscc, FUN = "==")*1 # cross product = 1, when equal
fitc <- simplify(graph.adjacency(amfi, mode="undirected")) # transitive closure of filter in g
plot(fitc)
Building on Szabolcs's suggestion, note that connect(g, vcount(g)) computes the transitive closure of g. However, it is not suitable for larger graphs (vcount > 8192).
require(igraph)
g <- make_graph(~ E-G, G-F)
fi <- c("E", "F")
system.time(tcg <- connect(g, vcount(g)) )
sg <- induced_subgraph(tcg, V(tcg)[fi])
sg
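As a further sketch along the same lines, the pairwise distances can be computed only between the selected vertices, so the transitive closure of the whole graph is never built. Two selected vertices are joined whenever a path of any length exists between them in the original graph. The example graph and the variable names (keep, d, adj) are illustrative assumptions, not part of the answers above.

```r
library(igraph)

# Example graph: E-F-G is a path, X-Y is a separate component
g <- graph_from_literal(E--F, F--G, X--Y)
keep <- c("E", "G", "X")  # vertices to keep

# Shortest-path lengths between the kept vertices only (Inf = unreachable)
d <- distances(g, v = keep, to = keep)

# Join two kept vertices when a path exists between them (distance > 0 excludes self-pairs)
adj <- (is.finite(d) & d > 0) * 1
sg <- graph_from_adjacency_matrix(adj, mode = "undirected")
sg
```

Here sg contains the single edge E--G, and X stays isolated because Y is its only neighbour in g.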

Does igraph have a function that generates sub-graphs limited by weights? dfs, random_walk

I have a weighted graph in igraph R environment.
I need to obtain sub-graphs recursively, starting from any random node. The sum of weights in each sub-graph has to be less than a given number.
The depth-first search algorithm seems to deal with this problem, as does the random walk function.
Does anybody know which igraph function could tackle this?
This iterative function finds the sub-graph grown from the vertex given in vertex, for any undirected graph, that has the largest possible weight-sum below the value specified in limit.
A challenge in finding such a graph is the computational load of evaluating the weight sum of every possible sub-graph. Consider this example, where one iteration has found a sub-graph A-B with a weight sum of 1.
The shortest path to any new vertex is A-C (with a weight of 3); a sub-graph of A-B-D has a weight-sum of 6, while A-B-C would have a weight-sum of 12 because the edge B-C is also included in the sub-graph.
The function below looks ahead: at each step it enlarges the sub-graph by the vertex that results in the lowest sub-graph weight-sum, rather than by the vertex with the shortest direct path.
In terms of optimisation, this leaves something to be desired, but I think it does what you requested in your first question.
find_maxweight_subgraph_from <- function(graph, vertex, limit=0, sub_graph=c(vertex), current_ws=0){
  # Keep a shortlist of possible vertices to add next
  shortlist <- data.frame(k=integer(0), ws=numeric(0))
  limit <- min(limit, sum(E(graph)$weight))
  while(current_ws < limit){
    # To find the next possible vertices to include, a listing of
    # potential candidates is computed to be able to choose the most
    # efficient one.
    # Each iteration chooses amongst vertices that are connected to the sub-graph:
    adjacents <- as.vector(adjacent_vertices(graph, vertex, mode="all")[[1]])
    # A shortlist of possible enlargements of the sub-graph is kept to be able
    # to compare each potential enlargement of the sub-graph and always choose
    # the one which results in the smallest increase of sub-graph weight-sum.
    #
    # The shortlist is enlarged by vertices that are:
    # 1) adjacent to the latest added vertex
    # 2) not already IN the sub-graph
    new_k <- adjacents[!adjacents %in% sub_graph]
    shortlist <- rbind(shortlist[!is.na(shortlist$k),],
                       data.frame(k = new_k,
                                  ws = rep(Inf, length(new_k)) )
    )
    # Stop when no candidate is left (nothing more can be reached)
    if(nrow(shortlist) == 0) break
    # The addition to the weight-sum is NOT calculated by the weight on individual
    # edges leading to vertices on the shortlist BUT on the ACTUAL weight-sum of
    # a sub-graph that would be the result of adding a vertex `k` to the sub-graph.
    shortlist$ws <- sapply(shortlist$k, function(x) sum( E(induced_subgraph(graph, c(sub_graph,x)))$weight ) )
    # We choose the vertex with the lowest impact on weight-sum:
    shortlist <- shortlist[order(shortlist$ws),]
    vertex <- shortlist$k[1]
    current_ws <- shortlist$ws[1]
    # Drop the chosen candidate (2:nrow() would misbehave with a single row)
    shortlist <- shortlist[-1,]
    # Each iteration adds a new vertex to the sub-graph
    if(current_ws <= limit){
      sub_graph <- c(sub_graph, vertex)
    }
  }
  (induced_subgraph(graph, sub_graph))
}
# Test function using a random graph
g <- erdos.renyi.game(16, 30, type="gnm", directed=F)
E(g)$weight <- sample(1:1000/100, length(E(g)))
sum(E(g)$weight)
plot(g, edge.width = E(g)$weight, vertex.size=2)
sg <- find_maxweight_subgraph_from(g, vertex=12, limit=60)
sum(E(sg)$weight)
plot(sg, edge.width = E(sg)$weight, vertex.size=2)
# Test function using your example code:
g <- make_tree(10, children = 2, mode = c("undirected"))
s <- seq_len(ecount(g)) # one weight per edge (a 10-vertex tree has 9 edges)
g <- set_edge_attr(g, "weight", value= s)
plot(g, edge.width = E(g)$weight)
sg <- find_maxweight_subgraph_from(g, 2, 47)
sum(E(sg)$weight)
plot(sg, edge.width = E(sg)$weight)
My attempt is below; however, it does not seem to be effective.
#######Example code
g <- make_tree(10, children = 2, mode = c("undirected"))
s <- seq_len(ecount(g)) # one weight per edge (a 10-vertex tree has 9 edges)
g <- set_edge_attr(g, "weight", value= s)
plot(g)
is_weighted(g)
E(g)$weight
threshold <- 5
eval <- function(r){
  #r <- 10
  # DFS visiting order starting from root r
  Vertice_dfs <- dfs(g, root = r)
  Sequencia <- as.numeric(Vertice_dfs$order)
  for (i in 1:length(Sequencia)) {
    #i <- 2
    # callback invoked when the DFS enters a vertex; returning TRUE stops the search
    # (the -1 assumes the callback receives zero-based vertex ids)
    f.in <- function(graph, data, extra) {
      data[1] == Sequencia[i]-1
    }
    # run the DFS with the callback
    dfs_res <- dfs(g, root = r, in.callback = f.in)
    # vertices visited by the DFS
    dfs_vertices <- na.omit(as.numeric(dfs_res$order))
    # resulting subgraph
    g2 <- induced_subgraph(g, dfs_vertices)
    # total weight of subgraph g2
    T_W <- sum(E(g2)$weight)
    if (T_W > threshold) {
      print(T_W)
      return(T_W)
    }
  }
}
#search by vertice
result <- lapply(1:length(V(g)),eval)

Sort list into hash table according to specific comparison criteria in R

I am looking for a way, in R, to convert a list into a hash table, grouping elements that are similar according to a specific criterion.
The details are specific to "graph theory", as explained below, but I suppose the answer is a general procedure to hash based on some specific criterion.
The list is comprised of "graph" objects (from igraph package).
library(igraph)
#Creating the list of graphs
edgeList <- data.frame(
idA=c(008, 001, 001, 010, 047, 002, 005, 005),
idB=c(100, 010, 020, 030, 030, 001, 011, 111)
)
edgeList$idB= edgeList$idB+0.1
g <- graph_from_data_frame(edgeList, directed = TRUE)
g_list <- decompose(g, mode = "weak")
#from the 8 edges we obtain 5 graphs (connected components of the original graph)
The similarity criterion is that graphs must be isomorphic:
isomorphic(g_list[[1]],g_list[[4]])
How can I hash the indexes for the elements in g_list into a hash table?
For this toy example the expected result should be:
g_inded_hash
[[1]]
[1] 1 4
[[2]]
[1] 2 5
[[3]]
[1] 3
(not necessarily a list, but some data structure that groups graphs (1 and 4) and (2 and 5) which are similar)
In reality, I have 40 million (small) graphs that I need to group according to their isomorphisms.
From searching I found the answer must be related to the hash package or environment, but could not adapt that into a solution.
EDIT: changed directed = TRUE in graph_from_data_frame(), above.
Since isomorphism is transitive, we can look at all the pairs of components (i,j), such that i < j, then build a graph where the nodes are the components and the edges are defined by the isomorphic property. The hash table can be extracted from the connected components of this new graph.
# all pairs (i,j) such that i < j
combinations <- unlist(sapply(seq_along(g_list),
function(j) lapply(seq_len(j-1),
function(i) c(i,j))),
recursive = FALSE)
# filter the isomorphic pairs
iso <- Filter(function(pair) isomorphic(g_list[[pair[1]]],g_list[[pair[2]]]),
combinations)
# convert to data frame
df <- data.frame(matrix(unlist(iso), ncol = 2, byrow = TRUE))
# build graph where the vertices are the components
# and the edges indicate the isomorphic property
g_iso <- graph_from_data_frame(df, directed = FALSE)
# identify groups that share the same property
groups <- clusters(g_iso)$membership
# the names are the indices of g_list
g_hash <- lapply(unique(groups),
function(i) as.integer(names(which(groups == i))))
Result:
> g_hash
[[1]]
[1] 2 3 5
[[2]]
[1] 1 4
This does not match the expected result in the question but isomorphic(g_list[[2]],g_list[[3]]) and isomorphic(g_list[[3]],g_list[[5]]) are true.
It's probably not the most straightforward way to do this but that's what came to mind.
I managed to write a solution for my problem. It is probably not very "Rish", not very efficient, with all the loops, but I think it works. Please let me know of a better way to do this.
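# gl is assumed to be the list of graphs (g_list in the question)
gl_hash <- list()
gl_hash[1] <- 1
j <- 1
for(i in 2:length(gl)) {
  m <- 0
  for(k in 1:j){
    # compare graph i with the first member of group k
    if(isomorphic( gl[[ gl_hash[[k]][1] ]], gl[[i]])) {
      # append i to group k (not to group 1)
      gl_hash[[k]] <- c(gl_hash[[k]],i)
      m <- 1
      break
    }
  }
  # no matching group found: start a new one
  if(m==0) {
    j <- j+ 1
    gl_hash[j] <- i
  }
}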
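For a very large number of small graphs, one possible refinement is to bucket the graphs by a cheap invariant first and to call isomorphic() only inside each bucket. The sketch below uses the vertex count, the edge count and the sorted in- and out-degree sequences as the key. Graphs with different keys cannot be isomorphic, while graphs with equal keys still need the explicit test. The function names graph_key and group_isomorphic are hypothetical, and the small test list is made up for illustration.

```r
library(igraph)

# Hypothetical helper: a cheap key that is identical for isomorphic graphs
graph_key <- function(g) {
  paste(vcount(g), ecount(g),
        paste(sort(degree(g, mode = "in")), collapse = ","),
        paste(sort(degree(g, mode = "out")), collapse = ","),
        sep = "|")
}

# Hypothetical helper: group the indices of isomorphic graphs
group_isomorphic <- function(graphs) {
  keys <- vapply(graphs, graph_key, character(1))
  buckets <- split(seq_along(graphs), keys)  # indices sharing a key
  groups <- list()
  for (bucket in buckets) {
    classes <- list()  # isomorphism classes found inside this bucket
    for (i in bucket) {
      placed <- FALSE
      for (k in seq_along(classes)) {
        # compare against the first member of each class only
        if (isomorphic(graphs[[classes[[k]][1]]], graphs[[i]])) {
          classes[[k]] <- c(classes[[k]], i)
          placed <- TRUE
          break
        }
      }
      if (!placed) classes[[length(classes) + 1]] <- i
    }
    groups <- c(groups, classes)
  }
  groups
}

# Illustrative input: two rings, two stars and one triangle
graphs <- list(make_ring(4), make_star(4, mode = "undirected"), make_ring(4),
               make_full_graph(3), make_star(4, mode = "undirected"))
groups <- group_isomorphic(graphs)
```

The order of the returned groups follows the sorted keys, so it may differ from the order in which the graphs appear in the list.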

Efficient way to analyse neighbours of subsets of nodes in large graph

I have a graph of 6 million nodes such as
require(igraph)
# Graph of 1000 nodes
g <- ba.game(1000)
with the following four attributes defined for each node
# Attributes
V(g)$attribute1 <- V(g) %in% sample(V(g), 20)
V(g)$attribute2 <- V(g) %in% sample(V(g), 20)
V(g)$attribute3 <- V(g) %in% sample(V(g), 20)
V(g)$attribute4 <- V(g) %in% sample(V(g), 20)
Among the nodes I have a subset of 12,000 that are of particular interest:
# Subset of 100 nodes
V(g)$subset <- V(g) %in% sample(V(g), 100)
What I want to obtain is an analysis (count) of the neighbourhood of my subset. That is, I want to define
V(g)$neigh.attr1 <- rep(NA, vcount(g))
V(g)$neigh.attr2 <- rep(NA, vcount(g))
V(g)$neigh.attr3 <- rep(NA, vcount(g))
V(g)$neigh.attr4 <- rep(NA, vcount(g))
such that NA is replaced for every node in the subset with the corresponding count of neighbouring nodes with V(g)$attribute{1..4}==TRUE.
I can easily create a list of the neighbourhood of interest with
neighbours <- neighborhood(g, order = 1, V(g)[V(g)$subset==TRUE], mode = "out")
but I can't think of an efficient way to iterate over all the neighbourhoods and compute the statistics for each of the four attributes. Indeed, the only way I've come up with is a loop which, given the size of my original graph, takes far too long:
subset_indices <- as.numeric(V(g)[V(g)$subset==TRUE])
for (i in 1:length(neighbours)) {
V(g)$neigh.attr1[subset_indices[i]] <- sum(V(g)$attribute1[neighbours[[i]]])
V(g)$neigh.attr2[subset_indices[i]] <- sum(V(g)$attribute2[neighbours[[i]]])
V(g)$neigh.attr3[subset_indices[i]] <- sum(V(g)$attribute3[neighbours[[i]]])
V(g)$neigh.attr4[subset_indices[i]] <- sum(V(g)$attribute4[neighbours[[i]]])
}
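A likely bottleneck in the loop above is that every assignment of the form V(g)$neigh.attr1[...] <- ... modifies the graph object again. A minimal sketch of one alternative is to read each attribute into a plain vector once, compute all counts with vapply(), and write each result back in a single assignment. The graph below is a small stand-in for the large one, and the variable names (attr_names, subset_idx, counts) are assumptions for illustration.

```r
library(igraph)
set.seed(1)

# Small stand-in for the large graph
g <- sample_pa(1000)
attr_names <- paste0("attribute", 1:4)
for (a in attr_names) {
  g <- set_vertex_attr(g, a, value = seq_len(vcount(g)) %in% sample(vcount(g), 20))
}
V(g)$subset <- seq_len(vcount(g)) %in% sample(vcount(g), 100)

subset_idx <- which(V(g)$subset)

# Order-1 out-neighbourhood of every subset vertex (includes the vertex itself,
# as neighborhood() does in the question), stored as plain integer ids
neighbours <- lapply(ego(g, order = 1, nodes = subset_idx, mode = "out"), as.integer)

for (k in 1:4) {
  values <- vertex_attr(g, attr_names[k])  # read the attribute once
  counts <- rep(NA_integer_, vcount(g))    # NA outside the subset
  counts[subset_idx] <- vapply(neighbours, function(nb) sum(values[nb]), integer(1))
  g <- set_vertex_attr(g, paste0("neigh.attr", k), value = counts)  # write once
}
```

Whether this is fast enough for 6 million nodes has not been measured here; it only avoids the repeated attribute assignments inside the loop.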

Deleting a single node in R

I'm trying to visualize a preferential network of products using R. I already have a graph of the product network using igraph, but I want to see what would happen if I were to remove one product. I found that I can delete a node using
g2 <- g - V(g)[15]
but it would also delete all the edges connected to that specific node.
Is there any way to delete just the node and to see how the other nodes reconnect to each other after the deletion of that one node? Any help in this matter is appreciated.
P.S.
Hopefully this will make it clearer:
For example, if we generate the random graph:
set.seed(10)
Data <- data.frame(
X = sample(1:10),
Y = sample(3, 10, replace=T)
)
d <- graph.data.frame(Data)
plot(d)
d2 <- d-V(d)[2] #deleting '3' from the network
plot(d2)
If you notice, when you delete node '3' from the network, node '9' remains unconnected. Is there a way to see the new edge of node '9' after node '3' is deleted? Still following the same plot, we would expect it to connect to node '2'. Is there a function that does this in igraph, or should I write code for it?
Maybe not the most efficient way, but it should work:
library(igraph)
set.seed(10) # for plot images reproducibility
# create a graph
df <- data.frame(
X = c('A','A','B','B','D','E'),
Y = c('B','C','C','F','B','B')
)
d <- graph.data.frame(df)
# plot the original graph
plot(d)
# function to remove the vertex
removeVertexAndKeepConnections <- function(g,v){
  # multiple vertices are not supported
  stopifnot(length(v) == 1)
  vert2rem <- V(g)[v]
  if(is.directed(g) == FALSE){
    # get the neighbors of the vertex to remove
    vx <- as.integer(neighbors(g, vert2rem))
    # create edges to add before vertex removal
    newEdges <- as.matrix(unique(expand.grid(vx,vx)))
    # remove the self-loops
    newEdges <- newEdges[newEdges[,1] != newEdges[,2],]
    # sort each row index to remove all the duplicates
    newEdges <- t(apply(newEdges,1,sort))
    newEdges <- unique(newEdges)
  }else{
    # get the incoming/outgoing neighbors of the vertex to remove
    vx <- as.integer(neighbors(g, vert2rem, mode='in'))
    vy <- as.integer(neighbors(g, vert2rem, mode='out'))
    # create edges to add before vertex removal
    newEdges <- as.matrix(unique(expand.grid(vx,vy)))
  }
  # remove already existing edges
  newEdges <- newEdges[!apply(newEdges,MARGIN=1,FUN=function(x)are.connected(g,x[1],x[2])),]
  # add the edges
  g <- g + edges(as.integer(t(newEdges)))
  # remove the vertex
  g <- g - vertex(vert2rem)
  return(g)
}
# let's remove B (you can also use the index)
v <- 'B'
plot(removeVertexAndKeepConnections(d,v))
Original :
Modified :

Resources