Transforming directed network into undirected while preserving degree distribution - r

I have a directed network where 50 nodes have a degree of 3 and another 50 have a degree of 10.
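# install the Bioconductor "graph" package (BiocManager replaces the retired biocLite script)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("graph")
# load graph and make the specified graph
library(graph)
degrees <- c(rep(3, 50), rep(10, 50))
names(degrees) <- paste("node", seq_along(degrees))  # nodes must be named
x <- randomNodeGraph(degrees)
# verify the graph: count how often each node appears as an edge endpoint
edges <- edgeMatrix(x)
edgecount <- table(as.vector(edges))
table(edgecount)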
This is a directed network in which each node's total degree is the sum of its indegree and outdegree.
I would like a network in which every incoming edge is matched by an outgoing edge and vice versa: for example, if node 1 has an edge to node 5, then node 5 must also have an edge to node 1. My main goal is to preserve the degree distribution, i.e. 50 nodes with a degree of 3 and 50 nodes with a degree of 10.

Simply setting the graph to be undirected seems to do it:
x2 <- x
edgemode(x2) <- "undirected"
edges <- edgeMatrix(x2)  # check the undirected copy, not the original x
edgecount <- table(as.vector(edges))
table(edgecount)
This gives the same results as your code.
Also, an undirected graph will always have an edge from 5 to 1 if there is an edge from 1 to 5. A single edge satisfies this property.

Paul Shannon suggests the following:
library(graph)
library(igraph)
degrees=c(rep(3,50),rep(10,50))
g <- igraph.to.graphNEL(degree.sequence.game(degrees))
table(graph::degree(g))
This gives the same results as your code.
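The degree-preserving behaviour discussed in both answers can be checked without any packages. The following is a minimal base R sketch of stub matching (the configuration-model idea), not the internals of randomNodeGraph or degree.sequence.game; the names stubs, edges, and realised are illustrative only. Each node gets one stub per unit of degree, the stubs are shuffled and paired, and every pair is read as a single undirected edge.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
degrees <- c(rep(3, 50), rep(10, 50))

# One stub per unit of degree, labelled by node index, in random order.
stubs <- sample(rep(seq_along(degrees), degrees))

# Pair the stubs: each row is one undirected edge between two node indices.
edges <- matrix(stubs, ncol = 2)

# Degree of each node = number of times it appears as an edge endpoint.
realised <- tabulate(as.vector(edges), nbins = length(degrees))
table(realised)
```

Because every stub is used exactly once, the realised degrees match the requested ones. Self-loops and repeated edges can occur in this sketch, and a self-loop counts twice toward its node's degree.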

Related

"sna" or "igraph" : Why do I get different degree values for undirected graph?

I am doing some basic network analysis using networks from the R package "networkdata". To this end, I use the package "igraph" as well as "sna". However, I realised that the results of descriptive network statistics vary depending on the package I use. Most of the differences are minor, but the average degree of my undirected graph halved as soon as I switched from "sna" to "igraph".
library(networkdata)
n_1 <- covert_28
library(igraph)
library(sna)
n_1_adjmat <- as_adjacency_matrix(n_1)
n_1_adjmat2 <- as.matrix(n_1_adjmat)
mean(sna::degree(n_1_adjmat2, cmode = "freeman")) # [1] 23.33333
mean(igraph::degree(n_1, mode = "all")) # [1] 11.66667
This doesn't happen in case of my directed graph. Here, I get the same results regardless of using "sna" or "igraph".
Is there any explanation for this phenomenon? And if so, is there anything I can do in order to prevent this from happening?
Thank you in advance!
This is explained in the documentation for sna::degree.
indegree of a vertex, v, corresponds to the cardinality
of the vertex set N^+(v) = {i in V(G) : (i,v) in E(G)};
outdegree corresponds to the cardinality of the vertex
set N^-(v) = {i in V(G) : (v,i) in E(G)}; and total
(or “Freeman”) degree corresponds to |N^+(v)| + |N^-(v)|.
(Note that, for simple graphs,
indegree=outdegree=total degree/2.)
A simpler example than yours makes it clear.
library(igraph)
library(sna)
g = make_ring(3)
plot(g)
AM = as.matrix(as_adjacency_matrix(g))
sna::degree(AM)
[1] 4 4 4
igraph::degree(g)
[1] 2 2 2
Vertex 1 has links to both vertices 2 and 3. In the symmetric adjacency matrix each link counts once toward the in-degree and once toward the out-degree, so
Freeman = in + out = 2 + 2 = 4
The "Note" in the documentation states this.
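The arithmetic can be reproduced in base R without either package. This is only a sketch of the counting, not the code that sna or igraph run internally; the variable names are illustrative.

```r
# Symmetric adjacency matrix of the undirected triangle (ring of 3 vertices).
AM <- matrix(c(0, 1, 1,
               1, 0, 1,
               1, 1, 0), nrow = 3, byrow = TRUE)

in_degree  <- colSums(AM)              # ties arriving at each vertex
out_degree <- rowSums(AM)              # ties leaving each vertex
freeman    <- in_degree + out_degree   # directed reading: every edge counted twice
undirected <- rowSums(AM)              # undirected reading: once per endpoint

freeman     # 4 4 4
undirected  # 2 2 2
```

As for preventing the mismatch: sna::degree takes a gmode argument, and as far as I understand its documentation, gmode = "graph" tells it to treat the edges as undirected, which should bring the two packages into agreement. That call is not run here.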

R iGraph: degree in the case of bidirectional edges

I have noticed that the function degree in iGraph doesn't straightforwardly allow calculating the degree in the undirected skeleton graph of a directed graph whenever bidirectional edges are involved.
For example,
g <-graph_from_literal( a-+b,a++c,d-+a,a-+e,a-+f )
d1 <- degree(g,v='a',mode="all")
# 6
nn <- unique(neighbors(g,'a',mode='all'))
d2 <- length(nn)
# 5
As I wanted d2, instead of d1, I have used a different route based on finding the neighbors of the considered vertex.
My question is: is there a better/faster way to do this, maybe using some other iGraph function that I'm not aware of?
Create an undirected copy of the graph, collapse the multiple edges in the undirected graph into a single edge, and then calculate the degree on that:
> g2 <- as.undirected(g, mode="collapse")
> degree(g2)
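To see why collapsing gives 5 rather than 6 for vertex a, here is a base R sketch of the same idea on an adjacency matrix. It assumes the edge list below is a faithful reading of the graph_from_literal call in the question (a++c taken as two arcs); A, arcs, total, and skeleton are illustrative names.

```r
v <- c("a", "b", "c", "d", "e", "f")
A <- matrix(0, 6, 6, dimnames = list(v, v))

# Directed arcs from the question; a++c contributes one arc in each direction.
arcs <- rbind(c("a", "b"), c("a", "c"), c("c", "a"),
              c("d", "a"), c("a", "e"), c("a", "f"))
A[arcs] <- 1   # two-column character matrix indexes by row/column names

total    <- rowSums(A) + colSums(A)     # in + out, mutual pair counted twice
skeleton <- rowSums((A + t(A)) > 0)     # distinct neighbours in either direction

total[["a"]]     # 6
skeleton[["a"]]  # 5
```

Symmetrising with A + t(A) > 0 plays the role of collapsing multiple edges into one.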

igraph --- find shortest path including weight at turns

The following example gives the shortest path 1-2-6-7-3-4, where only edge weights are considered and the cost of turning at vertices is not accounted for. Can someone suggest a procedure to include a weight at each vertex depending on whether the movement is a no-turn (NT), right-turn (RT), or left-turn (LT)? We can assume the weights (NT, RT, LT) = (0, 0.5, 1). When edge weights are combined with the turn effect, the shortest path would become 1-2-3-4. Below is the example in question. Thank you.
library(igraph)
# node ids and coordinates
n <- c(1,2,3,4,5,6,7,8)
x <- c(1,4,7,10,1,4,7,10)
y <- c(1,1,1,1,4,4,4,4)
node <- data.frame(n,x,y)
# undirected links and their weights
fm <- c(1,2,3,5,6,7,1,2,3,4)
to <- c(2,3,4,6,7,8,5,6,7,8)
weight <- c(1,4,1,1,1,2,5,1,1,1)
link <- data.frame(fm,to,weight)
g <- graph.data.frame(link,directed=FALSE,vertices=node)
# weights=NULL uses the "weight" edge attribute
sv <- get.shortest.paths(g,1,4,weights=NULL,output="vpath")
sv
E(g)$color <- "pink"
# highlight the edges along the shortest path (the path is the first element of sv$vpath)
E(g, path=sv$vpath[[1]])$width <- 8
plot(g,edge.color="red")
plot(g,edge.label=weight,edge.label.color="blue",edge.label.cex=2)
As a preprocessing step, for each vertex v, let a be its number of incoming edges and b its number of outgoing edges. Split v into a vertices attached to the incoming edges and b vertices attached to the outgoing edges. Then connect each incoming copy to each outgoing copy with an edge whose weight is the corresponding turning cost.
In principle, Jeffery is describing what we want, but the problem is large enough that we need a programmatic solution: perhaps 200,000 vertices with 3 to 6 edges each. Is there a way to explode, for instance, a standard intersection with 4 edges in and 4 edges out into its 16 movements (right, through, left) and assign the left, through, and right penalties automatically?
Most important is the ability to apply smaller penalties for turning at T intersections (for ease of wayfinding) than for turning at traditional intersections/vertices.
Is this tractable for a huge network?
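One way to make the vertex-splitting idea programmatic is to search over directed arcs instead of vertices, which is equivalent to giving each vertex one copy per incoming edge. Below is a minimal base R sketch on the toy network, under several assumptions that are mine rather than the posters': node ids equal row numbers, the y axis points up (so a positive cross product means a left turn), U-turns are forbidden, and every intersection uses the same (NT, RT, LT) penalties. turn_cost and turn_aware_path are hypothetical helper names, and the linear scan for the next arc is only suitable for small examples.

```r
# Toy network from the question: node coordinates and weighted undirected links.
node <- data.frame(n = 1:8,
                   x = c(1, 4, 7, 10, 1, 4, 7, 10),
                   y = c(1, 1, 1, 1, 4, 4, 4, 4))
link <- data.frame(fm     = c(1, 2, 3, 5, 6, 7, 1, 2, 3, 4),
                   to     = c(2, 3, 4, 6, 7, 8, 5, 6, 7, 8),
                   weight = c(1, 4, 1, 1, 1, 2, 5, 1, 1, 1))

# Every undirected link becomes two directed arcs; each arc is one search state.
arcs <- rbind(link, data.frame(fm = link$to, to = link$fm, weight = link$weight))

# Penalty for leaving arc i and entering arc j, from the sign of the cross product.
turn_cost <- function(i, j, penalty = c(NT = 0, RT = 0.5, LT = 1)) {
  p <- node[arcs$fm[i], ]; q <- node[arcs$to[i], ]; r <- node[arcs$to[j], ]
  cross <- (q$x - p$x) * (r$y - q$y) - (q$y - p$y) * (r$x - q$x)
  if (cross > 0) penalty[["LT"]] else if (cross < 0) penalty[["RT"]] else penalty[["NT"]]
}

# Dijkstra over arcs: dist[i] is the cheapest cost of a route that ends with arc i.
turn_aware_path <- function(source, target) {
  m <- nrow(arcs)
  dist <- rep(Inf, m); prev <- rep(NA_integer_, m); done <- rep(FALSE, m)
  start <- which(arcs$fm == source)
  dist[start] <- arcs$weight[start]
  repeat {
    open <- which(!done & is.finite(dist))
    if (length(open) == 0) break
    i <- open[which.min(dist[open])]
    done[i] <- TRUE
    # Successors: arcs leaving the head of arc i, excluding the U-turn.
    for (j in which(arcs$fm == arcs$to[i] & arcs$to != arcs$fm[i])) {
      cand <- dist[i] + turn_cost(i, j) + arcs$weight[j]
      if (cand < dist[j]) { dist[j] <- cand; prev[j] <- i }
    }
  }
  ends <- which(arcs$to == target)
  best <- ends[which.min(dist[ends])]
  path <- arcs$to[best]; k <- best
  while (!is.na(k)) { path <- c(arcs$fm[k], path); k <- prev[k] }
  list(path = path, cost = dist[best])
}

res <- turn_aware_path(1, 4)
res$path  # 1 2 3 4
res$cost  # 6
```

The arc-based state space has one state per directed arc, so a network with 200,000 vertices and 3 to 6 edges each stays in the low millions of states; with a priority queue, or by building the expanded graph once and handing it to a graph library, that size is usually manageable. Cheaper turns at T intersections would need turn_cost to also look at the number of arcs meeting at the vertex, which this sketch does not do.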

How to turn data from R data frames into a network

Suppose I have the following data frames
df <- data.frame(dev = c("A","A","B","B","C","C","C"),
proj = c("W","X","Y","X","W","X","Z"))
types <- data.frame(proj = c("W","X","Y","Z"),
type = c("blue","orange","orange","blue"))
> df
  dev proj
1   A    W
2   A    X
3   B    Y
4   B    X
5   C    W
6   C    X
7   C    Z
> types
  proj   type
1    W   blue
2    X orange
3    Y orange
4    Z   blue
I would like to turn these into the following network:
The nodes are the unique entries in proj. For nodes u,v, there is an arc from u to v if u and v share an element from dev. The data is a list of developers and projects that each developer has worked on, and I would like to form a network which connects projects that have a developer in common. Each project is of a particular type, and that information would need to be encoded in the graph (I did this in this toy example via colour).
From this graph what I need is the degree of each node, as well as one or more measures of centrality. In particular I need the closeness centrality of each node, as well as a modified version of closeness centrality which measures the centrality within each type. So my end goal is to obtain a table like this:
proj  degree  closeness_centrality  type_centrality
W     2       0.75                  1
X     3       1                     1
Y     1       0.60                  1
Z     2       0.75                  1
For reference, the closeness centrality of a node u is defined as C(u)=(N-1)/(sum over all nodes v of the distance from u to v), where N is the number of nodes in the graph and the distance from u to v is the length of the shortest u-v-path. The type centrality is C(T,u)=|T-u|/(sum over all nodes v in T of the distance from u to v) where T is the set of all nodes of a given type, and |T-u| is the size of T with u excluded (so either |T| or |T|-1 depending on the type of u).
One of the big challenges is that my actual df has almost 300,000 rows and this graph will have around 155,000 vertices. The average degree will be very low though so I think that it is doable.
My questions are:
Is R the best tool to be using for this? Are there good packages for performing these types of calculations on graphs?
What is the best way to store this kind of data? Should I form an adjacency matrix, or something else?
Any insight or tips at all would be well appreciated; as an economics major I'm kind of in over my head comp-sci-wise here.
Thanks!
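For the toy data, the definitions above can be worked through in base R. This is a sketch only: it uses a dense matrix and Floyd-Warshall, which is fine for four nodes but not practical for roughly 155,000 vertices, where a sparse representation or a graph package would be needed. It also assumes type_centrality means centrality within the node's own type. The names inc, adj, d, type_of, and result are illustrative.

```r
df <- data.frame(dev  = c("A", "A", "B", "B", "C", "C", "C"),
                 proj = c("W", "X", "Y", "X", "W", "X", "Z"))
types <- data.frame(proj = c("W", "X", "Y", "Z"),
                    type = c("blue", "orange", "orange", "blue"))

# Developer-by-project incidence matrix, then project-by-project co-occurrence.
inc <- unclass(table(df$dev, df$proj))
adj <- (crossprod(inc) > 0) * 1   # 1 if two projects share at least one developer
diag(adj) <- 0

# All-pairs shortest path lengths (Floyd-Warshall on the unweighted graph).
n <- nrow(adj)
d <- ifelse(adj == 1, 1, Inf)
diag(d) <- 0
for (k in seq_len(n)) d <- pmin(d, outer(d[, k], d[k, ], "+"))

# Within-type closeness: |T - u| divided by the summed distance to same-type nodes.
type_of <- types$type[match(rownames(adj), types$proj)]
type_centrality <- sapply(seq_len(n), function(u) {
  same <- setdiff(which(type_of == type_of[u]), u)   # same-type nodes, u excluded
  length(same) / sum(d[u, same])
})

result <- data.frame(proj = rownames(adj),
                     degree = unname(rowSums(adj)),
                     closeness_centrality = unname((n - 1) / rowSums(d)),
                     type_centrality = type_centrality)
result
```

With df as given, the degrees come out as W = 2, X = 3, Y = 1, Z = 2, since Y is linked only to X (through developer B) while Z is linked to both W and X (through developer C).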

DFS on undirected graph complexity

Let's say I have an undirected graph with V nodes and E edges. If I represent the graph with adjacency lists, then whenever I store the edge between x and y, I must also store the edge between y and x.
I know that DFS for directed graphs has V+E complexity. For undirected graphs, doesn't it have V+2*E complexity, because you visit each edge 2 times? Sorry if it's a noobish question. I really want to understand this thing. Thank you.
The complexity is normally expressed as O(|V| + |E|), and this isn't affected by the factor of 2.
But the factor of 2 is actually there. One undirected edge behaves just like 2 directed edges. E.g. the algorithm (for a connected undirected graph) is
visit(v) {
    mark(v)
    for each unmarked w adjacent to v, visit(w)
}
The for loop will consider each edge incident to each vertex once. Since each undirected edge is incident to 2 vertices, it will clearly be considered twice!
Note that DFS on a connected undirected graph doesn't have to worry about restarting from all the sources. You can pick any vertex.
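The factor of 2 can be seen by counting. Here is a small R sketch (R only to match the other examples on this page; count_edge_checks and its output fields are hypothetical names) that runs the recursive visit on an adjacency list and counts every time the for loop looks at a neighbour.

```r
count_edge_checks <- function(adj, start = 1) {
  marked <- rep(FALSE, length(adj))
  checks <- 0
  visit <- function(v) {
    marked[v] <<- TRUE
    for (w in adj[[v]]) {
      checks <<- checks + 1        # one adjacency-list entry examined
      if (!marked[w]) visit(w)
    }
  }
  visit(start)
  c(vertices = sum(marked), edge_checks = checks)
}

# Undirected 4-cycle 1-2-3-4-1: 4 edges, each stored in two adjacency lists.
cycle4 <- list(c(2, 4), c(1, 3), c(2, 4), c(1, 3))
count_edge_checks(cycle4)   # vertices = 4, edge_checks = 8
```

With E = 4 the loop body runs 2E = 8 times, so the work is V + 2E, which is still O(|V| + |E|).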
