R/Network Analysis - How to create edges by node's attributes

Dear Stackoverflow community,
I am currently using R to compile an affiliation network where nodes are companies/umbrella organisations and ties are defined as "member of". At the moment, my list is still small and I can create edges as follows, based on the position of the nodes (I use igraph):
g <- igraph::add_edges(g, c(1,51,
1,52,
1,53,
1,54))
However, I am adding new nodes and the final network will include at least 500 organisations. This means that the position of a node can change every time I add a new one. Since I cannot redo the edges every time I add a new node, is there a way to add edges using the names of the nodes?
The names of the nodes are stored as an attribute. I tried to use the same command as above with names instead of positions, but it did not work:
g <- igraph::add_edges(g, c(V(g)$name=="Company1", V(g)$name == "Umbrella2"))
Any suggestions on how I could create edges by specifying the names rather than the positions?

I believe you're looking for as.numeric(V(g)["Company1"]).
I would strongly advise against building your network structure in an R script, though. Even for a small network, I would enter the data in an Excel file and write an R script that reads it as an edge list and builds an igraph object from it. That way, you can add companies and organisations as you go, with better oversight of what data has actually gone into your network, which I guess is what you're looking for in the first place. Doing that here would be out of scope for the question, though.
As for adding edges by name, I wrote this example for you, which I hope is pedagogical.
library(igraph)
# Make an empty Bipartite graph
g <- make_bipartite_graph(0, NULL, directed=TRUE)
g <- delete_vertices(g, 1)
# Create vertices of two different types: companies and umbrellas
g <- add_vertices(g, 5, color = "red", type=TRUE, name=paste("Company", 1:5, sep="_"))
g <- add_vertices(g, 2, color = "blue", type=FALSE, name=paste("Umbrella", 1:2, sep="_"))
# In a bipartite graph, edges may only appear BETWEEN vertices of different types. Companies
# can belong to umbrellas, but not to each other.
# Look at the types:
ifelse(V(g)$type, 'Company', 'Umbrella') # TRUE for companies, FALSE for umbrellas
# Let's add some edges one by one. This is what I believe you're asking for in the question:
g <- add_edges(g, c(as.numeric(V(g)["Company_1"]), as.numeric(V(g)["Umbrella_1"])))
g <- add_edges(g, c(as.numeric(V(g)["Company_1"]), as.numeric(V(g)["Umbrella_2"])))
g <- add_edges(g, c(as.numeric(V(g)["Company_2"]), as.numeric(V(g)["Umbrella_1"])))
g <- add_edges(g, c(as.numeric(V(g)["Company_3"]), as.numeric(V(g)["Umbrella_1"])))
g <- add_edges(g, c(as.numeric(V(g)["Company_4"]), as.numeric(V(g)["Umbrella_2"])))
g <- add_edges(g, c(as.numeric(V(g)["Company_5"]), as.numeric(V(g)["Umbrella_2"])))
# Note that "Company_1" belongs to two umbrella organisations, as I assume your companies can:
plot(g)
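The edge-list approach recommended above, which the answer leaves out, could look roughly like the sketch below. The memberships table and its column names are made up for illustration (in practice it might come from read.csv() on your own file). graph_from_data_frame() builds the graph from the names directly, and add_edges() also accepts vertex names on a named graph, so positions never need to be looked up.

```r
library(igraph)

# Hypothetical membership table (illustrative data, not from the question);
# each row reads "from is a member of to".
memberships <- data.frame(
  from = c("Company_1", "Company_1", "Company_2", "Company_3"),
  to   = c("Umbrella_1", "Umbrella_2", "Umbrella_1", "Umbrella_1"),
  stringsAsFactors = FALSE
)

# Vertices are created from the names that occur in the table
g_el <- graph_from_data_frame(memberships, directed = TRUE)

# Mark companies vs. umbrellas, following the TRUE/FALSE convention used above
V(g_el)$type <- grepl("^Company", V(g_el)$name)

# A later addition, referring to vertices by name instead of position
g_el <- add_vertices(g_el, 1, name = "Company_4", type = TRUE)
g_el <- add_edges(g_el, c("Company_4", "Umbrella_2"))
```

With this layout, growing the network mostly means appending rows to the table and rerunning the script.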

Related

Creating weighted igraph network using two-column edge list

I'm in the process of creating a weighted igraph network object from an edge list containing two columns, from and to. It has proven somewhat challenging for me, because when I use a workaround I notice changes in the network metrics, and I believe I'm doing something wrong.
library(igraph)
links <- read.csv2("edgelist.csv")
vertices <- read.csv2("vertices.csv")
network <- graph_from_data_frame(d=links,vertices = vertices,directed = TRUE)
##the following step is included to remove self-loops that I have used to include all isolate nodes to the network##
network <- simplify(network,remove.multiple = FALSE, remove.loops = TRUE)
At this point I have successfully created a network object. However, it is not weighted. Therefore I create a second network object by taking the adjacency matrix of the object created earlier and building a new igraph object from it, like this:
gettheweights <- get.adjacency(network)
network2 <- graph_from_adjacency_matrix(gettheweights,mode = "directed",weighted = TRUE)
However, when I print both objects afterwards, I notice a difference in the number of edges. Why is this?
network2
IGRAPH ef31b3a DNW- 200 1092 --
network
IGRAPH 934d444 DN-- 200 3626 --
Additionally, I believe I've done something wrong because if they indeed would be the same network, shouldn't their densities be the same? Now it is not the case:
graph.density(network2)
[1] 0.02743719
graph.density(network)
[1] 0.09110553
I browsed and tried several answers found here, but none matched my case exactly and I failed to find a solution.
All seems to be in order. When you re-project a network so that duplicate edges are represented as a weight (the number of edges between a given pair of vertices), the number of edges drops, so the density of your graph should change.
When you compare graph.density(network2) and graph.density(network), they should differ if duplicate edges were indeed reduced to single edges with weight as an edge attribute, as the printed summaries of network2 and network suggest.
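The numbers in the question bear this out. Assuming the density of a directed graph without self-loops is the edge count divided by n(n-1), a quick check in plain R on the vertex and edge counts printed in the question gives both reported densities:

```r
n <- 200                       # vertices in both graphs
possible <- n * (n - 1)        # ordered vertex pairs, self-loops excluded

d_multi    <- 3626 / possible  # 'network': every duplicate edge is counted
d_weighted <- 1092 / possible  # 'network2': one edge per connected pair

c(d_multi, d_weighted)
```

So the drop in density only reflects the 3626 parallel edges collapsing into 1092 weighted ones.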
This (over-) commented code goes through the process.
library(igraph)
# Data that should resemble yours
edges <- data.frame(from=c("A","B","C","D","E","A","A","A","B","C"),
to =c("A","C","D","A","B","B","B","C","B","D"))
vertices <- unique(unlist(edges))
# Building the graph in the same way as you do
g0 <- graph_from_data_frame(d=edges, vertices=vertices, directed = TRUE)
# Note that the graph is "DN--": directed, named, but NOT weighted, since
# instead of weighted edges, we have a whole lot of double edges
(g0)
plot(g0)
# We can see the double edges in the adjacency matrix as >1
get.adjacency(g0)
# Use simplify to remove LOOPS ONLY, as we can see in the adjacency matrix test
g1 <- simplify(g0, remove.multiple = FALSE, remove.loops = TRUE)
get.adjacency(g1) == get.adjacency(g0)
# Turn the multiple edges into edge-weights by jumping through an adjacency matrix
g2 <- graph_from_adjacency_matrix(get.adjacency(g1), mode = "directed", weighted = TRUE)
# Instead of multiple edges (like many links between "A" and "B"), there are now
# just single edges with weights (hence the density of the network's changed).
graph.density(g1) == graph.density(g2)
# The former double edges are now here:
E(g2)$weight
# And we can see that the g2 is now "Named-Directed-Weighted" where g1 was only
# "Named-Directed" and no weights.
(g1);(g2)
# Let's plot the weights
E(g2)$width = E(g2)$weight*5
plot(g2)
A shortcoming of this/your method, however, is that the adjacency matrix can carry only the edge count between any given pair of vertices. If your edge list contains more variables than i and j, graph_from_data_frame() would normally embed those variables as edge attributes for you, straight from your CSV import (which is nice).
When you convert the edges into weights, however, you lose that information. And, come to think of it, that information would have to be "converted" too. What should we do with two edges between the same vertices that have different edge attributes?
At this point, the answer goes slightly beyond your question, but it still stays in the realm of explaining the relation between graphs with multiple edges between the same vertices and their representation as weighted graphs with only one structural edge per pair of vertices.
To carry edge attributes through this transformation into a weighted graph, I suggest using dplyr (plus tidyr for unite()) to "rebuild" the edge attributes manually, so that you keep control of how they are merged when recasting the graph into a weighted one.
This picks up where the code above left off:
# Let's imagine that our original network had these two edge-attributes
E(g0)$coolness <- c(1,2,1,2,3,2,3,3,2,2)
E(g0)$hotness <- c(9,8,2,3,4,5,6,7,8,9)
# Plot the hotness
E(g0)$color <- colorRampPalette(c("green", "red"))(10)[E(g0)$hotness]
plot(g0)
# Note that the hotness values of the two edges between C and D are very different
# When we make your transformation into a weighted network, we lose the coolness
# and hotness information
g2 <- g0 %>% simplify(remove.multiple = FALSE, remove.loops = TRUE) %>%
get.adjacency() %>%
graph_from_adjacency_matrix(mode = "directed", weighted = TRUE)
E(g2)$hotness # NULL: naturally, the edge attributes were lost!
# We can use dplyr to take control over how we'd like the edge attributes transferred
# when multiple edges in g0 with different edge attributes are supposed to merge into
# one single edge
library(dplyr)
library(tidyr) # provides unite(), used below
recalculated_edge_attributes <-
data.frame(name = ends(g0, E(g0)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->"),
hotness = E(g0)$hotness) %>%
group_by(name) %>%
summarise(mean_hotness = mean(hotness))
# We used a string version of the names of the connected vertices (like "A->B") to refer
# to the attributes of each edge. This can now be used to merge the recalculated
# edge attributes back onto the weighted graph in g2
g2_attributes <- data.frame(name = ends(g2, E(g2)) %>% as.data.frame() %>% unite("name", V1:V2, sep="->")) %>%
left_join(recalculated_edge_attributes, by="name")
# And manually re-attach our mean attributes onto the g2 network
E(g2)$mean_hotness <- g2_attributes$mean_hotness
E(g2)$color <- colorRampPalette(c("green", "red"))(max(E(g2)$mean_hotness))[E(g2)$mean_hotness]
# Note how the link between A and B has turned into the brown mean of the two previous
# green and red hotness-edges
plot(g2)
Sometimes your analyses may benefit from either structure (weighted without duplicates, or unweighted with duplicates). Shortest-path algorithms, for example, can incorporate edge weights as described in this answer, but other analyses might not allow for the weighted version of your network data, or might not be intuitive when using it.
Let purpose guide your structure.
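As an alternative to the adjacency-matrix round trip and the manual join (this is not the approach taken in the answer above), igraph's simplify() takes an edge.attr.comb argument that states how each edge attribute should be combined when parallel edges are merged. The sketch below assumes toy data shaped like the edges above, with hotness stored as a column of the edge list; the name g_w is made up for illustration. Setting weight to 1 on every edge first means that summing it gives the edge count.

```r
library(igraph)

# Toy edge list resembling the one above, with one extra edge attribute
edges <- data.frame(
  from    = c("A","B","C","D","E","A","A","A","B","C"),
  to      = c("A","C","D","A","B","B","B","C","B","D"),
  hotness = c(9, 8, 2, 3, 4, 5, 6, 7, 8, 9)
)
g0 <- graph_from_data_frame(edges, directed = TRUE)

# Every original edge counts once; summing these yields the multiplicity
E(g0)$weight <- 1

# Drop loops, merge parallel edges, and say how each attribute is combined;
# the unnamed "ignore" entry drops any attribute not listed explicitly
g_w <- simplify(
  g0,
  remove.multiple = TRUE,
  remove.loops    = TRUE,
  edge.attr.comb  = list(weight = "sum", hotness = "mean", "ignore")
)

# One row per merged edge, with its weight and mean hotness
igraph::as_data_frame(g_w, what = "edges")
```

Other combiners such as "max", "min" or "first" can be given per attribute, which keeps the merge rule next to the attribute it applies to.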

cluster walktrap returns three communities, but when plotting they are all on top of each other, with no visible clustering

I've been following documentation tutorials and even lecture tutorials step by step, but for some reason my plot looks like this:
The output doesn't make any sense to me. There clearly is no structure and there are no communities in the plot: as you can see, the bigger circles all overlap. Shouldn't this return only a single community in that case? Additionally, the modularity of my network is ~0.02, which again suggests there is no community structure. So why does it return 3 communities?
This is my code (exactly the same as in the documentation, with a different dataset):
m <- data.matrix(df)
g <- graph_from_adjacency_matrix(m, mode = "undirected")
#el <- get.edgelist(g)
wc <- cluster_walktrap(g)
modularity(wc)
membership(wc)
plot(wc,g)
My data set is a 500x500 adjacency matrix stored as a CSV, with column and row names 1-500, each corresponding to a person.
I tried understanding the communities class and using different variables for the plot, e.g. membership(wc)[2] etc. My thought is that the coloring is simply wrong, but nothing I've tried so far seems to fix the issue.
You can have inter-community connections. You're working with a graph of 500 nodes, and each node can have multiple connections. There will be a large number of connections between nodes of different communities, but if you conduct a random walk you are more likely to traverse connections between nodes of the same community.
If you separate the communities in the plot (using @G5W's code from "(igraph) Grouped layout based on attribute"), you can see the different groups.
set.seed(4321)
g <- sample_gnp(500, .25)
plot(g, vertex.label = '', vertex.size = 5)
wc <- cluster_walktrap(g)
V(g)$community <- membership(wc)
E(g)$weight = 1
g_grouped = g
for(i in unique(V(g)$community)){
groupV = which(V(g)$community == i)
g_grouped = add_edges(g_grouped, combn(groupV, 2), attr=list(weight = 2))
}
l <- layout_nicely(g_grouped)
plot( wc,g, layout = l, vertex.label = '', vertex.size = 5, edge.width = .1)
Red edges are inter-community connections and black edges are intra-community connections.
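To put a number on how many edges run between communities, one option is igraph's crossing(), which flags every edge whose endpoints fall in different communities. This is a minimal sketch on the same kind of random graph as above, not on the asker's data, and the name is_crossing is made up for illustration.

```r
library(igraph)

set.seed(4321)
g  <- sample_gnp(500, .25)   # dense random graph, as in the example above
wc <- cluster_walktrap(g)

# TRUE for edges that connect two different communities
is_crossing <- igraph::crossing(wc, g)

table(is_crossing)   # counts of intra- vs. inter-community edges
mean(is_crossing)    # share of edges that cross community boundaries
modularity(wc)       # low modularity goes with a large crossing share
sizes(wc)            # number of vertices per detected community
```

A large share of crossing edges together with a modularity near zero would support the asker's suspicion that the detected communities are weak, even though the algorithm still returns a partition.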

Why igraph::cluster_walktrap gives a different result for non directed isomorphic graphs?

I'm trying to use igraph::cluster_walktrap in R to look for communities inside of a graph, however I noticed a weird behaviour (or at least, a behaviour I am not able to explain).
Suppose you are given an undirected graph by defining a list of its edges. Say
a,b
c,d
e,f
...
Then, if I define another graph by swapping randomly selected vertices in the edge list definition:
a,b
d,c
e,f
...
I expect the two graphs to be isomorphic and the difference between the two graphs to be empty. This is exactly what happens in R in my toy example. Following this line of reasoning, calling cluster_walktrap on the two graphs (using set.seed appropriately) should yield the same result, since the two graphs are the same. This is not happening, and the only explanation I can give is that the starting point of each random walk is not the same for the two graphs. Why is this?
You can follow my reasoning in the toy example below. I don't understand why the last two objects are not identical.
require(igraph)
# Number of vertices
verteces <- 50
# Swap randomly some elements in the edges definition
set.seed(20)
row_swapped <- sample(1:verteces,25,replace=F)
m_values <- sample(letters, verteces*2, replace=T) #1:100
# Build edge lists
m1 <- matrix(m_values, verteces, 2)
m1
a <- m1
colS <- seq(round(ncol(m1)*0.3))
m1[row_swapped, 2:1] <- m1[row_swapped, 1:2]
m1
b <- m1
# Define the two graphs
ag <- igraph::graph_from_edgelist(a, directed = F)
bg <- igraph::graph_from_edgelist(b, directed = F)
# Another way of building an isomorphic graph for testing
#bg <- permute(ag, sample(vcount(ag)))
# Should be empty: ok
difference(ag, bg)
# Should be TRUE: ok
isomorphic(ag,bg)
# I expect it to be TRUE but it isn't...
identical(ag, bg)
# Vertices
V(ag)
ag
V(bg)
bg
# Calculate community
set.seed(100)
ac1 <- cluster_walktrap(ag)
set.seed(100)
bc1 <- cluster_walktrap(bg)
# I expect all to be TRUE, however
# merges is different
# membership is different
# names are different
identical(ac1$merges, bc1$merges)
identical(ac1$modularity, bc1$modularity)
identical(ac1$membership, bc1$membership)
identical(ac1$names, bc1$names)
identical(ac1$vcount, bc1$vcount)
identical(ac1$algorithm, bc1$algorithm)
The results are not different. Two things are going on that make your graphs isomorphic but not identical. I emphasize identical because it has a very strict definition.
1) identical(ag, bg) is FALSE because the vertices and edges are not in the same order in the two graphs. Exactly the same nodes and edges exist, but they are not stored in the same place or orientation. For example, if I shuffle the rows of a and make a new graph...
a1 <- a[sample(1:nrow(a)), ]
a1g <- igraph::graph_from_edgelist(a1, directed = F)
identical(ag, a1g)
#[1] FALSE
2) This goes for edges as well. An edge is stored as node1, node2, plus a flag saying whether the edge is directed. So when you swap the endpoints within a row, the representation at the "byte level" (I use this term loosely) is different even though the relationship is the same. Edge 44 represents the same relationship in both graphs but is stored according to how it was constructed.
E(ag)[44]
# + 1/50 edge from 6318240 (vertex names):
# [1] q--d
E(bg)[44]
# + 1/50 edge from 38042e0 (vertex names):
# [1] d--q
So, on to your cluster_walktrap. First, the function returns membership by vertex index, not by name, which can be misleading. This means the objects aren't identical because ag and bg store their nodes in a different order.
If I reorder the membership by node name, the two become identical.
identical(membership(bc1)[order(names(membership(bc1)))], membership(ac1)[order(names(membership(ac1)))])
#[1] TRUE
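The ordering effect described in point 1 can be seen on a much smaller case. The sketch below uses a made-up two-edge list (not the asker's data) and swaps the endpoints of the first row only. It relies on graph_from_edgelist() numbering vertices in the order their names first appear, which is my understanding of its behaviour for character matrices.

```r
library(igraph)

# Two tiny edge lists that describe the same undirected graph
a <- matrix(c("q", "d",
              "d", "x"), ncol = 2, byrow = TRUE)
b <- a
b[1, 2:1] <- b[1, 1:2]   # swap the endpoints of the first edge only

ag_small <- graph_from_edgelist(a, directed = FALSE)
bg_small <- graph_from_edgelist(b, directed = FALSE)

# Same set of vertex names, but stored in a different order
V(ag_small)$name
V(bg_small)$name

isomorphic(ag_small, bg_small)   # same structure
identical(V(ag_small)$name, V(bg_small)$name)   # different storage order
```

Any result indexed by vertex position, such as a membership vector, inherits this ordering, which is why sorting by name before comparing is needed.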

Looking to save coordinates/layout to make temporal networks in Igraph with DRL

I would like to create temporal networks in R, but the only resources I've found work with FR or KK layouts. However, the primary graph that I would like to base the layout on uses a DRL layout. How could I code this in R so that the layout is kept?
Thank you
Added:
Code:
drl <- layout.drl(netfull, options=list(simmer.attraction=0))
plot(netfull, edge.arrow.size=2, vertex.size=.5, vertex.label.cex=.3, vertex.label.dist=.1, vertex.label.degree=pi, layout=drl)
plot(net7, edge.arrow.size=2, vertex.size=.5, vertex.label.cex=.3, vertex.label.dist=.1, vertex.label.degree=pi, layout=drl)
You can just explicitly compute your layout before plotting and then use the layout argument when you want to plot. DRL is one of the standard options provided by igraph.
library(igraph)
## create test graph
set.seed(1234)
g = erdos.renyi.game(15, 0.2, type = "gnp")
## Create a reusable layout for the graph
LO = layout_with_drl(g)
## plot using the layout
plot(g, layout=LO)
Edit
Based on the discussion in the comments, I have a different understanding of the question. I think that the question is this: given a graph g and a subgraph g2, plot both g and g2 with the corresponding nodes in the same place. This extra response addresses that.
Start with the example above to create the graph g and the layout LO.
Now we want to take a subgraph and plot it with the corresponding nodes in the same place. I will use as an example the graph that we get by removing nodes 2, 9, and 15.
If we simply remove those nodes, the new graph will have 12 nodes and they will have node IDs 1-12. In order to preserve the original numbering, we need to save the node IDs as labels.
V(g)$label = 1:15
Now let's create the subgraph by removing nodes 2,9 and 15.
g2 = induced_subgraph(g, V(g)[-c(2,9,15)])
We want to reuse the layout LO, but LO has the positions for all 15 original nodes. We want to select only the part for the remaining nodes in g2.
LO2 = LO[-c(2,9,15),]
Now we are ready to plot the original graph and the reduced graph so that the nodes line up.
par(mfrow=c(1,2), mar=c(2,1,2,1))
plot(g, layout=LO, frame=TRUE)
plot(g2, layout=LO2, frame=TRUE)
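When the smaller graph is a separate object (like net7 in the question) and the removed positions are not known, a variant is to look the coordinates up by vertex name rather than by index. This is a sketch under the assumption that both graphs carry a name vertex attribute with matching names; the "n1".."n15" names are made up for illustration.

```r
library(igraph)

set.seed(1234)
g <- sample_gnp(15, 0.2)
V(g)$name <- paste0("n", 1:15)   # hypothetical vertex names

# Compute the DRL layout once and label its rows with the vertex names
LO <- layout_with_drl(g)
rownames(LO) <- V(g)$name

# Any subgraph whose vertex names also occur in g
g2 <- induced_subgraph(g, V(g)[-c(2, 9, 15)])

# Pick the coordinates for exactly the vertices present in g2, in g2's order
LO2 <- LO[V(g2)$name, , drop = FALSE]
```

LO and LO2 can then be passed to plot() via the layout argument exactly as above, and the same lookup works for each time slice of a temporal network as long as vertex names are stable.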

Using layers in Rgraphviz

In Rgraphviz, how can I assign nodes and edges to particular layers, and plot only a selected layer or layers?
After perusing and searching the Rgraphviz documentation, I think I've figured out how to assign nodes and edges to particular layers -- but I still can't figure out how to plot only a selected layer.
This "How to use layers" item on the graphviz wiki implies that there should be a "layerselect" graph attribute that can be used to plot only a specified layer. However, the list of allowed graph attributes in Rgraphviz does not include "layerselect." The "layers" graph attribute says "Only those components belonging to the current layer appear," but I can find no information on how to set (or even query) the "current layer."
I tried to do it anyway, but wasn't successful. Here's a reproducible example of my attempt:
require('Rgraphviz')
params <- LETTERS[1:5] #nodes A, B, C, D, E
edgelist <- vector('list', #set up edges
length=length(params))
names(edgelist) <- params
edgelist[['B']] <- c('E','A') #add edges from B to E and from B to A
edgelist[['A']] <- 'C' #add edge from A to C
edgelist[['D']] <- 'E' #add edge from D to E
graph.nel <- new('graphNEL', #construct graphNEL object
nodes=params,
edgeL=edgelist,
edgemode='directed')
#I want there to be two layers:
#"redlayer", containing nodes A, B, C, E and the edges between them;
#"blacklayer", containing node D and the edge from D to E.
#Assign the colors and layers of the edges
eAttr <- list(color=c('B~A'='red',
'A~C'='red',
'B~E'='red',
'D~E'='black'),
layer=c('B~A'='redlayer',
'A~C'='redlayer',
'B~E'='redlayer',
'D~E'='blacklayer')
)
#Assign the colors and layers of the nodes
nAttr <- list(color=c(B='red',
A='red',
C='red',
E='red',
D='black'),
layer=c(B='redlayer',
A='redlayer',
C='redlayer',
E='redlayer',
D='blacklayer'))
#Now plot
plot(graph.nel,
attrs=list(graph=list(layers='redlayer:blacklayer', #Define the two layers
layersep=':', #Explicitly define the layer separation character
layerselect='redlayer')), #Attempt to select only the 'redlayer' for plotting
edgeAttrs=eAttr, #specify edge attributes
nodeAttrs=nAttr #specify node attributes
)
The result of the above code is the following plot:
However, I expected that only the nodes and edges colored red (i.e. those assigned to the layer named 'redlayer') would appear!
I've also tried
plot(graph.nel,
attrs=list(graph=list(layers='redlayer')), #Attempt to select only the 'redlayer' for plotting
edgeAttrs=eAttr, #specify edge attributes
nodeAttrs=nAttr #specify node attributes
)
but it results in exactly the same plot -- that is, both layers are still being plotted.
Is there any way to plot only the layer named 'redlayer' in this example?
