I'm trying to visualize a preferential network of products using R. I already have a graph of the product network using igraph, but I want to see what would happen if I were to remove one product. I found that I can delete a node using
g2 <- g - V(g)[15]
but it would also delete all the edges connected to that specific node.
Is there any way to delete just the node and to see how the other nodes reconnect to each other after the deletion of that one node? Any help in this matter is appreciated.
P.S.
Hopefully this will make it clearer:
For example, if we generate the random graph:
set.seed(10)
Data <- data.frame(
X = sample(1:10),
Y = sample(3, 10, replace = TRUE)
)
d <- graph_from_data_frame(Data)
plot(d)
d2 <- d-V(d)[2] #deleting '3' from the network
plot(d2)
If you notice, when you delete the node '3' from the network, node '9' remains unconnected. Is there a way to see the new edge of node '9' after node '3' is deleted? Still following the same plot, we would expect it to connect to node '2'. Is there a function that does this in igraph, or should I write the code myself?
Maybe not the most efficient way, but it should work:
library(igraph)
set.seed(10) # for plot images reproducibility
# create a graph
df <- data.frame(
  X = c('A','A','B','B','D','E'),
  Y = c('B','C','C','F','B','B')
)
d <- graph_from_data_frame(df)
# plot the original graph
plot(d)
# function to remove a vertex while keeping the connections that ran through it
removeVertexAndKeepConnections <- function(g, v){
  # multiple vertices are not supported
  stopifnot(length(v) == 1)
  # work with the numeric vertex id (v may be a name or an index)
  vert2rem <- as.integer(V(g)[v])
  if (!is_directed(g)) {
    # get the neighbors of the vertex to remove
    vx <- as.integer(neighbors(g, vert2rem))
    # create edges to add before vertex removal
    newEdges <- as.matrix(unique(expand.grid(vx, vx)))
    # keep each unordered pair once (smaller id first); this also removes the cycles
    newEdges <- newEdges[newEdges[, 1] < newEdges[, 2], , drop = FALSE]
  } else {
    # get the ingoing/outgoing neighbors of the vertex to remove
    vx <- as.integer(neighbors(g, vert2rem, mode = "in"))
    vy <- as.integer(neighbors(g, vert2rem, mode = "out"))
    # create edges to add before vertex removal: every in-neighbor -> every out-neighbor
    newEdges <- as.matrix(unique(expand.grid(vx, vy)))
    # remove the cycles (x -> x when x is both an in- and an out-neighbor)
    newEdges <- newEdges[newEdges[, 1] != newEdges[, 2], , drop = FALSE]
  }
  if (nrow(newEdges) > 0) {
    # remove already existing edges
    existing <- apply(newEdges, MARGIN = 1, FUN = function(x) are_adjacent(g, x[1], x[2]))
    newEdges <- newEdges[!existing, , drop = FALSE]
  }
  # add the edges, given as a flat vector: from1, to1, from2, to2, ...
  if (nrow(newEdges) > 0) g <- add_edges(g, as.vector(t(newEdges)))
  # remove the vertex (its id is unchanged, since add_edges only appends edges)
  g <- delete_vertices(g, vert2rem)
  return(g)
}
# let's remove B (you can also use the vertex index)
v <- 'B'
plot(removeVertexAndKeepConnections(d, v))
I have a weighted graph in igraph R environment.
I need to obtain sub-graphs recursively, starting from any random node. The sum of the weights in each sub-graph has to be less than a given number.
The Depth-First Search algorithm seems suited to this problem, as does the random walk function.
Does anybody know which igraph function could tackle this?
This iterative function finds, in any undirected graph, the sub-graph grown from the vertex passed as vertex that has the biggest possible weight-sum below the value specified in limit.
A challenge in finding such a sub-graph is the computational load of evaluating the weight-sum of every possible sub-graph. Consider this example, where one iteration has found a sub-graph A-B with a weight-sum of 1.
The shortest path to any new vertex is A-C (with a weight of 3), but a sub-graph A-B-D has a weight-sum of 6, while A-B-C would have a weight-sum of 12 because the edge B-C is then included in the sub-graph.
The function below looks ahead: at each step it enlarges the sub-graph with the vertex that results in the lowest sub-graph weight-sum, rather than with the vertex that has the shortest direct path.
In terms of optimisation this leaves something to be desired, but I think it does what you asked in your first question.
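The figure for the A-B-C-D example is not reproduced here, so the edge weights below are a reconstruction chosen to match the sums quoted above (A-B = 1, A-C = 3, B-D = 5 and B-C = 8 are assumptions). This small sketch shows why a candidate is scored by the weight-sum of the induced sub-graph rather than by the weight of the single edge that reaches it:

```r
library(igraph)

# Hypothetical weights, reconstructed from the sums quoted in the text
edges <- data.frame(from   = c("A", "A", "B", "B"),
                    to     = c("B", "C", "D", "C"),
                    weight = c(1, 3, 5, 8))
g <- graph_from_data_frame(edges, directed = FALSE)

# Weight-sum of the sub-graph induced by a set of vertices
ws <- function(g, vs) sum(E(induced_subgraph(g, vs))$weight)

ws(g, c("A", "B"))       # 1: the current sub-graph
ws(g, c("A", "B", "D"))  # 6: adding D only brings in B-D
ws(g, c("A", "B", "C"))  # 12: adding C brings in both A-C and B-C
```

So although A-C is the cheapest single edge leaving the sub-graph, D is the cheaper vertex to add.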
find_maxweight_subgraph_from <- function(graph, vertex, limit=0, sub_graph=c(vertex), current_ws=0){
  # Keep a shortlist of candidate vertices to add next
  shortlist <- data.frame(k = integer(0), ws = numeric(0))
  limit <- min(limit, sum(E(graph)$weight))
  while (current_ws < limit) {
    # To find the next possible vertices to include, a listing of
    # potential candidates is computed to be able to choose the most
    # efficient one.
    # Each iteration chooses amongst vertices that are connected to the sub-graph:
    adjacents <- as.vector(adjacent_vertices(graph, vertex, mode = "all")[[1]])
    # A shortlist of possible enlargements of the sub-graph is kept to be able
    # to compare each potential enlargement of the sub-graph and always choose
    # the one which results in the smallest increase of sub-graph weight-sum.
    #
    # The shortlist is enlarged by vertices that are:
    # 1) adjacent to the latest added vertex
    # 2) not already IN the sub-graph
    new_k <- adjacents[!adjacents %in% sub_graph]
    shortlist <- rbind(shortlist, data.frame(k = new_k, ws = rep(Inf, length(new_k))))
    # Drop candidates that are already in the sub-graph, and duplicates
    shortlist <- shortlist[!shortlist$k %in% sub_graph & !duplicated(shortlist$k), , drop = FALSE]
    # Stop when no vertex is left that could enlarge the sub-graph
    if (nrow(shortlist) == 0) break
    # The addition to the weight-sum is NOT calculated by the weight on individual
    # edges leading to vertices on the shortlist BUT on the ACTUAL weight-sum of
    # a sub-graph that would be the result of adding a vertex `k` to the sub-graph.
    shortlist$ws <- sapply(shortlist$k, function(x) sum(E(induced_subgraph(graph, c(sub_graph, x)))$weight))
    # We choose the vertex with the lowest impact on weight-sum:
    shortlist <- shortlist[order(shortlist$ws), , drop = FALSE]
    vertex <- shortlist$k[1]
    current_ws <- shortlist$ws[1]
    shortlist <- shortlist[-1, , drop = FALSE]
    # Each iteration adds a new vertex to the sub-graph, as long as the limit is respected
    if (current_ws <= limit) {
      sub_graph <- c(sub_graph, vertex)
    }
  }
  induced_subgraph(graph, sub_graph)
}
# Test function using a random graph
g <- sample_gnm(16, 30)
E(g)$weight <- sample(1:1000/100, ecount(g))
sum(E(g)$weight)
plot(g, edge.width = E(g)$weight, vertex.size=2)
sg <- find_maxweight_subgraph_from(g, vertex=12, limit=60)
sum(E(sg)$weight)
plot(sg, edge.width = E(sg)$weight, vertex.size=2)
# Test function using your example code:
g <- make_tree(10, children = 2, mode = "undirected")
s <- seq_len(ecount(g)) # one weight per edge: a 10-vertex tree has 9 edges
g <- set_edge_attr(g, "weight", value = s)
plot(g, edge.width = E(g)$weight)
sg <- find_maxweight_subgraph_from(g, 2, 47)
sum(E(sg)$weight)
plot(sg, edge.width = E(sg)$weight)
My attempt is below; however, it does not seem to be effective.
#######Example code
g <- make_tree(10, children = 2, mode = c("undirected"))
s <- seq_len(ecount(g)) # one weight per edge: a 10-vertex tree has 9 edges
g <- set_edge_attr(g, "weight", value= s)
plot(g)
is_weighted(g)
E(g)$weight
threshold <- 5
eval <- function(r){
#r <- 10
Vertice_dfs <- dfs(g, root = r)
Sequencia <- as.numeric(Vertice_dfs$order)
for (i in 1:length(Sequencia)) {
#i <- 2
# function callback by vertice to dfs
f.in <- function(graph, data, extra) {
data[1] == Sequencia[i]-1
}
# DFS algorithm to the function
dfs <- dfs(g, root = r,in.callback=f.in)
# Vertices resulted from DFS
dfs_eges <- na.omit(as.numeric(dfs$order))
# Rsulted subgraph
g2 <- induced_subgraph(g, dfs_eges)
# Total weight subgraph g2
T_W <- sum(E(g2)$weight)
if (T_W > threshold) {
print(T_W)
return(T_W)
break
}
}
}
#search by vertice
result <- lapply(1:length(V(g)),eval)
For example, if we have a graph 1-2-3 and delete the vertex 2, then the graph should become 1-3. I have a huge graph with 10,000,000+ vertices, so I can't delete the vertices and add the new edges by hand. When I use delete.vertices(g, verticesToDelete), it automatically deletes the edges those vertices had with their neighbors.
Let's say we have a graph of Stack Overflow users and badges, where an edge means that a user has that badge. After deleting the badges, I want edges between all the users that had the same badge. Below is a code sample:
users <- c(1,2,3,4,5,6,7,8)
badges <- c('Teacher','Teacher','Teacher','Student','Student','Student','Popular Question','Popular Question')
edgeList <- data.frame(users,badges)
library(igraph)
g <- graph_from_data_frame(edgeList,directed = FALSE)
plot(g)
verticesToDelete <- c('Teacher','Student','Popular Question')
g2 <- delete.vertices(g, verticesToDelete)
plot(g2)
# I want the graph to be like the one below after the deletions
users1 <- c(1,1,2,4,4,5,7)
users2 <- c(2,3,3,5,6,6,8)
edgeList2 <- data.frame(users1,users2)
g3 <- graph_from_data_frame(edgeList2,directed = FALSE)
plot(g3)
How about this?
library(dplyr) # for %>% and filter()
edgeList <- data.frame(users,badges)
edgeList_badges <- merge(edgeList, edgeList, by = "badges", all = TRUE)
edgeList_badges$badges <- NULL
edgeList_badges <- edgeList_badges %>% filter(users.x != users.y)
edgeList_badges <- edgeList_badges[!duplicated(t(apply(edgeList_badges[1:2], 1, sort))), ]
g4 <- graph_from_data_frame(edgeList_badges,directed = FALSE)
plot(g4)
1. Merge the table edgeList with itself by badge to get all combinations of users with the same badge.
2. Delete the badges column: we do not need it.
3. Delete the relations of users with themselves.
4. Delete reversed duplicates: if there is a link between 1 and 2, I do not need a link between 2 and 1.
Enjoy your graph (if this was the graph you asked for...)
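A more general route, which also covers the 1-2-3 example at the top of the question, is to connect every pair of neighbours of each vertex before deleting it. The sketch below does this for undirected graphs; bypass_vertices is a hypothetical helper, not an igraph function, and it assumes that no two vertices being deleted are adjacent to each other (true for the badge graph, where badges only touch users):

```r
library(igraph)

# Hypothetical helper: delete vertices, but first link up their neighbours.
# Assumes no two vertices in `to_delete` are adjacent to each other.
bypass_vertices <- function(g, to_delete) {
  new_edges <- character(0)
  for (v in to_delete) {
    # neighbours of v that survive the deletion
    nb <- setdiff(as_ids(neighbors(g, v)), to_delete)
    if (length(nb) >= 2) {
      # each unordered pair of neighbours becomes an edge
      new_edges <- c(new_edges, as.vector(combn(nb, 2)))
    }
  }
  if (length(new_edges) > 0) g <- add_edges(g, new_edges)
  g <- delete_vertices(g, to_delete)
  # merge duplicate edges and drop any loops that were created
  simplify(g)
}

# The 1-2-3 example: deleting 2 should leave the single edge 1-3
g <- graph_from_data_frame(data.frame(from = c("1", "2"), to = c("2", "3")),
                           directed = FALSE)
g2 <- bypass_vertices(g, "2")
as_edgelist(g2)
```

Applied to the question's badge graph as bypass_vertices(g, verticesToDelete), this should yield the same seven user-user edges as g3. The loop runs once per deleted vertex, so on a graph with millions of vertices it may be slow.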
Here is another option
library(DescTools) # for CombSet()
library(dplyr) # for %>%, filter() and select()
edgeList <- data.frame(users,badges)
combSetTmp <- list()
for(badge in seq_along(verticesToDelete)){
tmp <- edgeList %>% filter(badges==verticesToDelete[badge]) %>% select(users)
combSetTmp[[badge]] <- CombSet(tmp$users,2)
}
combSet <- do.call(rbind, combSetTmp)
g4 <- graph_from_edgelist(combSet,directed = FALSE)
plot(g4)
1. Filter the users having the same badge.
2. Create all pairs of those users.
3. Join all the pairs.
4. Draw the graph.
It should be more "memory-friendly".
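Since users and badges form a two-mode (bipartite) network, another option, not used in the answers above, is igraph's bipartite_projection(), which links two vertices of one type whenever they share a neighbour of the other type. A minimal sketch, assuming badge vertices can be recognised by their names:

```r
library(igraph)

users  <- c(1, 2, 3, 4, 5, 6, 7, 8)
badges <- c('Teacher', 'Teacher', 'Teacher', 'Student', 'Student', 'Student',
            'Popular Question', 'Popular Question')
g <- graph_from_data_frame(data.frame(users, badges), directed = FALSE)

# Vertex type: TRUE for badges, FALSE for users
V(g)$type <- V(g)$name %in% badges

# proj1 is the projection onto the FALSE-type (user) vertices;
# by default its edges get a "weight" = number of shared badges
g_users <- bipartite_projection(g)$proj1
plot(g_users)
```

The projection is computed inside igraph's C code, so it avoids building the intermediate all-pairs table in R.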
I would like to know if I can find so-called n-cliques in an igraph object. Those are defined as "a maximal subgraph in which the largest geodesic distance between any two nodes is no greater than n" according to Wasserman & Faust. I'm aware that cliques of n=1 can be found via cliques() and that the sizes of cliques can be defined beforehand, but is there any way to find cliques of n larger than 1?
In theory, you could try RBGL::kCliques:
library(igraph)
library(RBGL)
set.seed(1)
g <- random.graph.game(100, p.or.m = 300, type = "gnm")
coords <- layout.auto(g)
cl <- kCliques(igraph.to.graphNEL(g))
k <- 2
clSel <- cl[[paste0(k, '-cliques')]][[1]] # select first of all k-cliques (e.g.)
plot(
g,
layout = coords,
vertex.shape = "none",
vertex.label.color = ifelse(V(g) %in% clSel, "red", "darkgrey"),
edge.color = ifelse(tail_of(g, E(g)) %in% clSel & head_of(g, E(g)) %in% clSel, "orange", "#F0F0F099"),
vertex.size = .5,
edge.curved = 1
)
However, in practice...
all(print(distances(induced_subgraph(g, clSel))) <=k ) # should be TRUE
# [1] FALSE
There seems to be something wrong if we use the definition:
In Social Network Analysis, a k-clique in a graph is a subgraph where
the distance between any two nodes is no greater than k.
Or maybe I misunderstood something...
Thanks to lukeA for pointing out RBGL::kCliques as a solution within R for this problem.
n-cliques are allowed to have paths that run through nodes outside the clique. So
A -- B -- C -- D, with B -- E and C -- E as well, can be a 2-clique if A and D are linked through another node, F, even though F is not in the 2-clique (since it is 3 steps away from E). See http://faculty.ucr.edu/~hanneman/nettext/C11_Cliques.html#nclique
n-clans are not allowed to have this behavior, however; all paths must pass through members of the subgraph to count. lukeA's test therefore demonstrates that the n-cliques are not all n-clans.
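The A-F example can be checked numerically. A minimal sketch, assuming A and D are joined through F by the two edges A-F and F-D (the linked page's figure is not reproduced here):

```r
library(igraph)

g <- graph_from_data_frame(
  data.frame(from = c("A", "B", "C", "B", "C", "A", "F"),
             to   = c("B", "C", "D", "E", "E", "F", "D")),
  directed = FALSE)

members <- c("A", "B", "C", "D", "E")

# Distances in the whole graph: paths may pass through F
max(distances(g, v = members, to = members))   # 2, so a 2-clique

# Distances inside the induced sub-graph: paths must stay within the members
max(distances(induced_subgraph(g, members)))   # 3 (A-B-C-D), so not a 2-clan

# F is 3 steps away from E, so it cannot join the 2-clique
distances(g, v = "F", to = "E")
```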
You could construct a function that outputs n-clans by throwing out all subgraphs in which the paths aren't fully within the subgraph, e.g.,
library(igraph)
library(RBGL) # Bioconductor package; it also attaches graph, which provides ugraph()
nclan <- function(g,n){
g <- as.undirected(g)
E(g)$weight <- 1 #just in case g has weights - does not modify original graph
ncliques <- kCliques(ugraph(igraph.to.graphNEL(g))) #get cliques
n.cand <- ncliques[[n]] #n-clique candidates to be an n-clan
n.clan <- list() #initializes a list to store the n-clans
n.clan.i <- 1 #initializes a list pointer
for (n.cand.i in seq_along(n.cand)){ #loop over all of the candidates
g.n.cand <- induced_subgraph(g,n.cand[[n.cand.i]]) #get the subgraph
if (diameter(g.n.cand)<=n){ #check diameter of the subgraph
n.clan[[n.clan.i]] <- n.cand[[n.cand.i]] #add n-clan to the list
n.clan.i <- n.clan.i+1 #increment list pointer
}
}
return(n.clan) #return the entire list
}
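Setting every edge weight to 1 works around an odd bug in RBGL's kCliques implementation. Similarly, you can write a k-plex function, where a k-plex is a set of m vertices in which every vertex is adjacent to at least m - k of the others: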
kplex <- function(g,k,m){
g.sym <- as.undirected(g) #to make sure that degree functions properly
g.sym.degmk <- induced_subgraph(g.sym,igraph::degree(g.sym)>=(m-k)) #makes algorithm faster
k.cand <- combn(V(g.sym.degmk)$name,m) #all candidate combinations with m members (requires named vertices)
k.plex <- list() #initializes a list to store the k-plexes
k.plex.i <- 1 #initializes a list pointer
for (k.cand.i in 1:dim(k.cand)[2]){ #loop over all of the columns
g.k.cand <- induced_subgraph(g.sym.degmk,k.cand[,k.cand.i]) #get the subgraph
if (min(igraph::degree(g.k.cand))>=(m-k)){ #if the minimum degree of the subgraph is >= m-k, it is a k-plex
k.plex[[k.plex.i]] <- k.cand[,k.cand.i] #add k-plex to list
k.plex.i <- k.plex.i+1 #increment list pointer
}
}
return(k.plex) #return the entire list
}
You can use the connect.neighborhood() igraph function to connect each vertex to all others no more than distance k away. Then you can find the maximal cliques in the resulting graph. This will give you the "k-cliques", as you defined them.
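The same idea can be sketched without connect.neighborhood(): build the graph that links every pair of vertices at geodesic distance at most n, then take its maximal cliques. This assumes the full distance matrix fits in memory, which limits it to small and medium graphs; n_cliques is a hypothetical helper name:

```r
library(igraph)

# Hypothetical helper: n-cliques as maximal cliques of the "distance <= n" graph
n_cliques <- function(g, n) {
  d <- distances(g)          # all-pairs geodesic distances (Inf if unreachable)
  adj <- (d <= n) * 1        # 1 where two vertices are at most n steps apart
  diag(adj) <- 0             # no self-loops
  g_n <- graph_from_adjacency_matrix(adj, mode = "undirected")
  max_cliques(g_n)           # maximal cliques only
}

# A path 1-2-3-4-5: its 2-cliques are {1,2,3}, {2,3,4} and {3,4,5}
path <- make_ring(5, circular = FALSE)
n_cliques(path, 2)
```

With n = 1 the helper reduces to ordinary maximal cliques, since only adjacent vertices are linked.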
Those are defined as "a maximal subgraph ...
I'm aware that cliques of n=1 can be found via cliques()
Careful here. cliques() finds all cliques, both maximal and non-maximal. max_cliques() finds only maximal cliques. Choose the one that is appropriate for your application.
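On a triangle, for example, cliques() also reports each edge as a (non-maximal) clique, while max_cliques() reports only the triangle itself. A minimal illustration:

```r
library(igraph)

g <- make_full_graph(3)      # a triangle
length(cliques(g, min = 2))  # 4: the three edges plus the triangle
length(max_cliques(g))       # 1: only the triangle is maximal
```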
I've got the following code:
df <- read.table(text='verkoop V621
verkoopcode V62123
verkoopcodenaam V6212355
verkoopdatum V621335
verkoopdatumchar V62133526
verkooppr V6216
verkoopprijs V62162
verkoopsafdeling V621213452
verkoopsartikel V62126324', stringsAsFactors = FALSE)
# use igraph package
library(igraph)
# create adjacency matrix
adj <- nchar(sapply(df$V1, gsub, x=df$V1, replacement=''))
adj[!sapply(df$V1, grepl, x=df$V1)] <- 0
# name adjacency matrix
colnames(adj) <- df$V2
# original graph
gr <- graph_from_adjacency_matrix(adj, mode='directed', weighted=TRUE)
layout(matrix(1:2, ncol=2))
plot(gr)
# minimum spanning tree
mst <- mst(gr)
distances(mst, to="V621", weights=rep(1, ecount(mst)))
This gives me the depth of every node in the tree. Now I want to determine which node comes directly before a specific node (its parent in the tree). For example, for 'verkoopdatumchar' I want to find 'verkoopdatum'.
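One possible approach: in a tree there is exactly one path from a node to the root, so the vertex that follows the node on that path is its parent. In the sketch below the tree is a hand-built stand-in for mst (its edges are assumed, linking each code to the code of its longest prefix word in df), and parent_of is a hypothetical helper. Using mode = "all" makes it independent of how the tree's edges are oriented:

```r
library(igraph)

# Hand-built stand-in for the minimum spanning tree (assumed edges)
edges <- data.frame(
  from = c("V62123", "V6212355", "V621335", "V62133526", "V6216", "V62162",
           "V621213452", "V62126324"),
  to   = c("V621", "V62123", "V621", "V621335", "V621", "V6216",
           "V621", "V621")
)
tree <- graph_from_data_frame(edges, directed = TRUE)

# Hypothetical helper: the vertex right after `node` on its path to `root`
parent_of <- function(tree, node, root) {
  path <- shortest_paths(tree, from = node, to = root,
                         mode = "all", output = "vpath")$vpath[[1]]
  if (length(path) < 2) return(NA_character_)  # node is the root
  as_ids(path)[2]
}

parent_of(tree, "V62133526", "V621")  # "V621335"
```

The returned value is a code from the V2 column; with the question's df, the matching word should be df$V1[df$V2 == "V621335"], i.e. 'verkoopdatum'.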
I am trying to generate a graph using the igraph library in R.
Script
library("igraph")
#set the size of the graph
graphSize = 20
g <- graph.empty(graphSize, dir =FALSE)
V(g)$color <- c("grey")
#Number of steps to create the graph
steps <- 10
#pick 'steps' random vertices (with replacement); each one may get a new edge
x <- sample(1:graphSize, steps, replace=T)
print("Node To Connect")
print (x)
#coin tossing with Bias
toss <- function(){
y <- sample(0:1, size = 1, replace = TRUE, prob = c(0.1,0.9))
#print (y)
}
#Connect the given vertex to a random vertex if the toss result is 1
connect <- function( graph , vertice, graphSize ) {
tossResult <- toss()
if ( tossResult == 1 ){
nodeToConnect <- sample(1:graphSize, 1 , replace=T)
print(nodeToConnect)
graph <- graph + edge(vertice, nodeToConnect)
}
}
#for each random vertex chosen previously in x, try to create an edge with another random vertex
for (i in seq_along(x) ){
g <- connect(g, x[i], graphSize)
}
After sourcing the code into R, sometimes it gives me the desired graph and sometimes it shows the following error:
Error
Error in graph + edge(vertice, nodeToConnect) : non-numeric argument to binary operator
Interestingly, after the error the graph object "g" becomes NULL. I am new to R, so any suggestions for improving the function are welcome too.
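Your connect function does not always return the graph. In R a function returns the value of the last expression it evaluates, which here is the if statement. When the toss gives 1, the if evaluates to the assignment, i.e. the updated graph; when the toss gives 0, the if has no else branch and evaluates to NULL. The loop then stores NULL in g, and the next call fails at graph + edge(vertice, nodeToConnect) with the "non-numeric argument" error.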
The fix is:
connect <- function( graph , vertice, graphSize ) {
tossResult <- toss()
if ( tossResult == 1 ){
nodeToConnect <- sample(1:graphSize, 1 , replace=T)
print(nodeToConnect)
graph <- graph + edge(vertice, nodeToConnect)
}
graph # always return the graph, whether or not an edge was added
}
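The root cause can be reproduced without igraph. A short illustration, where f is just a placeholder name:

```r
# An `if` without `else` evaluates to NULL when its condition is FALSE
f <- function(flag) {
  if (flag) "updated graph"
}
f(TRUE)   # "updated graph"
f(FALSE)  # NULL (returned invisibly)
```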
Your approach creates an empty graph with 20 vertices, then adds edges at random. Here's a different way to generate this type of graph which essentially creates the edges first (as an adjacency matrix), and then creates the graph at the end.
library(igraph)
graphSize <- 20
steps <- 10
# set.seed(1) # for reproducible example
x <- sample(1:graphSize, steps, replace = TRUE)
adj <- matrix(0, ncol = graphSize, nrow = graphSize)
# for each chosen vertex, pick a random partner and set the entry to 1 with probability 0.9
for (i in x) adj[i, sample(1:graphSize, 1)] <- sample(0:1, 1, prob = c(0.1, 0.9))
g <- graph_from_adjacency_matrix(adj) # create the graph
g <- as.undirected(g) # declare it undirected
plot(g)