Trying to find networking metrics with R - r

I have created a directed network in R. I have to find the average degree, which I think I have, the diameter and the maximum/minimum clustering. The diameter is the longest of the shortest distances between two nodes. If this makes sense to anyone, please point me in the right direction. I have what I have coded below so far.
library(igraph)
ghw <- graph.formula(1-+4:5:9:12:14, 2-+11:16:17, 3-+4:5:7,
4-+1:3:6:7:8, 5-+1:3:6:7, 6-+4:5:8,
7-+3:4:5:8:13, 8-+4:6:7, 9-+10:12:14:15,
10-+9:12:14, 11-+2:16:17, 12-+1:9:10:14,
13-+7:15:18, 14-+1:9:10:12, 15-+13:16:18,
16-+2:11:15:17:18, 17-+2:11:16:18, 18-+13:15:16:17)
plot(ghw)
get.adjacency(ghw)
# Total number of directed edges
numdeg <- ecount(ghw)
# Average number of edges per node
avgdeg <- numdeg / 18

How about looking at the documentation?
diameter(ghw)
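I am not sure what you mean by maximum/minimum clustering, but maybe you want the range of the local clustering coefficients (igraph ignores edge directions when it computes transitivity):
range(transitivity(ghw, type="local"))
By the way, your average number of edges per node counts each edge only once. ecount(ghw) / 18 is the average out-degree; because every edge is attached to two nodes, the average total degree (in plus out) is twice that, which is what mean(degree(ghw)) returns.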
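As a rough illustration of these three metrics, here is a minimal sketch on a small toy graph (a hypothetical 4-node example, not the 18-node network from the question). It assumes the igraph package is installed; the variable names are made up for the example.

```r
library(igraph)

# Toy directed graph: a triangle 1 -> 2 -> 3 -> 1 plus an extra edge 3 -> 4.
g <- make_graph(c(1, 2,  2, 3,  3, 1,  3, 4), directed = TRUE)

# Average out-degree: each directed edge is counted once.
avg_out <- ecount(g) / vcount(g)

# Average total degree (in + out): each edge is counted at both of its ends.
avg_total <- mean(degree(g, mode = "all"))

# Longest of the shortest directed paths between reachable pairs.
diam <- diameter(g, directed = TRUE)

# Min and max local clustering; vertices with fewer than two neighbours give NaN,
# so they are dropped with na.rm = TRUE.
clust <- range(transitivity(g, type = "local"), na.rm = TRUE)
```

Here the average total degree is exactly twice the average out-degree, which is the factor of two mentioned in the answer.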

Related

Maximum number of directed graph

Given a set of N points, what is the maximum number of directed graphs that can be created? I'm having trouble with the isomorphism problem.
Edit (1): Only simple directed graphs without self-loops; they are not required to be connected.
Edit (2): All points in the set are treated as interchangeable, so the main problem is to count and subtract the isomorphic graphs that arise from different sets of edges.
Number of unlabeled directed graphs with n vertices is here (OEIS A000273)
1, 1, 3, 16, 218, 9608, 1540944, 882033440, 1793359192848
There is no closed formula. An approximate value is the number of labeled graphs divided by the number of vertex permutations:
2^(n*(n-1)) / n!
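To get a feel for how good this approximation is, the following sketch compares it with the A000273 values quoted above for n = 0 to 8. It uses base R only; the variable names are just for illustration.

```r
# Exact counts of unlabeled directed graphs for n = 0..8 (OEIS A000273, as quoted above).
a000273 <- c(1, 1, 3, 16, 218, 9608, 1540944, 882033440, 1793359192848)
n <- 0:8

# Number of labeled directed graphs without self-loops: 2^(n(n-1)).
labeled <- 2^(n * (n - 1))

# Approximation: divide by the number of vertex permutations n!.
approx_unlabeled <- labeled / factorial(n)

# Ratio of approximation to exact count; it moves toward 1 as n grows.
ratio <- approx_unlabeled / a000273
print(data.frame(n, exact = a000273, approx = approx_unlabeled, ratio))
```

The approximation underestimates for small n (for n = 3 it gives about 10.7 instead of 16), because graphs with symmetries have fewer than n! distinct labelings; by n = 8 the ratio is above 0.99.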
Each node has n-1 possible outgoing edges, so there are n(n-1) possible directed edges in total.
Each possible graph either contains a particular edge or it doesn't.
So the number of possible graphs is 2^(n(n-1)).
EDIT: This only applies under the assumption that there are no loops and each edge is unique. It also counts labeled graphs, so isomorphic graphs are counted separately.
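The gap between the labeled count and the count up to isomorphism can be checked by brute force for a tiny case. The sketch below (base R, with hypothetical helper names perms and canonical) enumerates all labeled directed graphs on n = 3 vertices and groups them by a canonical form, taken here as the smallest adjacency string over all relabelings. This is only feasible for very small n.

```r
n <- 3

# All ordered pairs (from, to) with from != to: the n(n-1) possible directed edges.
pairs <- subset(expand.grid(from = 1:n, to = 1:n), from != to)
k <- nrow(pairs)
n_labeled <- 2^k

# All permutations of a vector (simple recursive helper).
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}
all_perms <- perms(1:n)

# Canonical form: smallest flattened adjacency matrix over all vertex relabelings.
canonical <- function(adj) {
  keys <- vapply(all_perms, function(p) paste(adj[p, p], collapse = ""), character(1))
  min(keys)
}

forms <- character(n_labeled)
for (code in 0:(n_labeled - 1)) {
  bits <- (code %/% 2^(0:(k - 1))) %% 2      # which edges are present
  adj <- matrix(0, n, n)
  adj[cbind(pairs$from, pairs$to)] <- bits
  forms[code + 1] <- canonical(adj)
}

n_unlabeled <- length(unique(forms))
cat(n_labeled, "labeled graphs,", n_unlabeled, "up to isomorphism\n")
```

For n = 3 this gives 64 labeled graphs and 16 isomorphism classes, matching the A000273 entry.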
A loop basically means coming back to the same node again, so I'm assuming double-headed arrows are not allowed. Now, if there are n nodes available, the graphs you make without loops can have n-1 edges. Let m be the number of homeomorphic graphs you can make out of n nodes, and let si be the number of symmetries present in the ith of those m homeomorphic graphs. The symmetries I'm talking about are the kind studied in group theory for geometric figures. Each edge can have 2 states, i.e. left head and right head.
So the total number of distinct directed graphs can be given as:
Note: If these symmetries were not present, then it would simply have been m*2^(n-1)
(Edit 1) Also, this is valid for a connected graph with n nodes. If you want to include graphs that don't need to be connected, then you'll have to modify a few things in this equation, for example by adding the number of smaller partitions of this n-node graph you can form and applying this formula to each of those combinations.
Permutations and combinations, group theory, symmetries, partitions: overall it's messy, so this was the only simple way I could put it.

k means clustering limit?

I'm doing a k-means clustering to analyze my data. So far it's working perfectly.
This is my code so far:
library(Ckmeans.1d.dp)
file=read.csv(file.choose(),header=T)
attach(file)
sortfile=file[order(normalized),]
results=Ckmeans.1d.dp(normalized,3)
plot(results)
Now, I'm able to get the clusters and the centers, but I'm more interested in getting the "limits" of each cluster. Not the maximum value in one cluster among the data I used, but the limits of the clusters I have now. Is that possible? How can I do it?
K-means labels points based on their closest centroid (cluster center). So the "limits" between clusters (called the decision boundary) are the points that have at least two different centroids as their closest centroids (i.e. they are at exactly the same distance from both).
For example, in 2D, calculate the closest centroids for each point in the plane. If a point has more than one (i.e. at least two centroids are at minimal distance from it), then it is part of the decision boundary. In 1D, as in your case, the boundary between two neighbouring clusters is a single point: the midpoint between their centers.
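For one-dimensional data, the decision boundaries reduce to the midpoints between neighbouring cluster centres. The sketch below shows that on made-up data using base R's kmeans; with Ckmeans.1d.dp the centres would presumably come from results$centers instead (an assumption based on the question saying the centres are available).

```r
# Hypothetical 1-D data with three well-separated groups.
x <- c(1, 2, 3, 10, 11, 12, 20, 21, 22)

# Base-R k-means with fixed starting centres so the result is reproducible.
fit <- kmeans(x, centers = matrix(c(2, 11, 21), ncol = 1))

# Sort the centres; each boundary is the midpoint of two neighbouring centres.
centers <- sort(as.vector(fit$centers))
limits <- (head(centers, -1) + tail(centers, -1)) / 2

print(centers)  # 2 11 21
print(limits)   # 6.5 16
```

Any new value below 6.5 would be assigned to the first cluster, values between 6.5 and 16 to the second, and values above 16 to the third.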

Number of vertices in Igraph in R

I'm fairly new to IGraph in R.
I'm doing community detection using IGraph and have already built my communities /clusters using the walktrap technique.
Next, within each cluster, I want to count the number of vertices between each two certain vertices. The reason I want to do this is, for each vertex XX, I want to list vertices that are connected to XX via say max 3 vertices, meaning no further than 3 vertices away from XX.
Can anyone help how this can be done in R please?
making a random graph (for demonstration):
g <- erdos.renyi.game(100, 1/25)
plot(g,vertex.size=3)
get walktrap communities and save as vertex attribute:
V(g)$community<-walktrap.community(g, modularity = TRUE, membership = TRUE)$membership
V(g)$community
now make a subgraph containing only edges and vertices of one community, e.g. community 2:
sub<-induced.subgraph(g,v=V(g)$community==2)
plot(sub)
make a matrix containing all shortest paths:
shortestPs<-shortest.paths(sub)
now count the number of shortest paths with length smaller than or equal to 3.
I also exclude shortest paths from each node to itself (shortestPs!=0).
also divide by two, because every node pair appears twice in the matrix for undirected graphs.
Number_of_shortest_paths_smaller_3 <- length(which(shortestPs<=3 & shortestPs!=0))/2
Number_of_shortest_paths_smaller_3
Hope that's close to what you need, good luck!
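If what you need is the list of vertices within 3 steps of each vertex, rather than the number of such pairs, the same distance matrix can be used row by row. The sketch below uses a toy path graph as a hypothetical stand-in for one community and assumes the igraph package; ego() is shown as an alternative that returns the same neighbourhood directly.

```r
library(igraph)

# Toy undirected path 1 - 2 - ... - 10 (stand-in for one community subgraph).
sub <- make_ring(10, circular = FALSE)

# Hop-count distances between all vertex pairs.
d <- distances(sub)

# For every vertex, the ids of the other vertices at most 3 steps away.
within3 <- lapply(seq_len(vcount(sub)), function(v) {
  unname(which(d[v, ] > 0 & d[v, ] <= 3))
})

# The same neighbourhood for vertex 5 via ego(); mindist = 1 drops the vertex itself.
nb5 <- sort(as.integer(ego(sub, order = 3, nodes = 5, mindist = 1)[[1]]))

print(within3[[5]])  # 2 3 4 6 7 8
```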

Is it possible to use a metric (distance) unit in igraph cutoff argument?

Newbie igraph user here. I'm trying to calculate betweenness values for every segment (edge) in a street network. I would ideally like to restrict the calculations so that only paths of less than X metres be considered. The igraph::edge.betweenness.estimate function has a cutoff argument which restricts this for steps (turns) but I would like to know if it is possible to use a metric distance instead.
So far the closest question I have been able to find is http://lists.nongnu.org/archive/html/igraph-help/2012-11/msg00083.html on the igraph help, and this suggests that it might not be possible.
I have been using my network aspatially, as a simple graph, but have an attribute of street segment length - LnkLength. From reading other StackOverflow posts it is possible to use spatial networks with igraph (with the help of spatial packages). If LnkLength could be used as a weight for the network would this solve my problem?
If anyone has any ideas I'd be very grateful to hear them.
data <- data.frame(
Node1 = as.factor(c("AA", "AB", "AC", "AD", "AE", "AF", "AG", "AH", "AI", "AJ")),
Node2 = as.factor(c("BA", "BB", "BC", "AA", "AB", "AC", "BA", "BB", "BC", "AA")),
LnkLength = as.numeric(c(23.05, 42.81, 77.08, 39.63, 147.87, 56.46, 13.43, 25.53, 197.19, 34.9)))
data.graph <- graph.data.frame(data, directed=FALSE, vertices=NULL)
# attempt to limit the betweenness estimates to 800m
btw.trunc <- edge.betweenness.estimate(data.graph, e=E(data.graph), directed = FALSE, cutoff=20, weights = NULL)
I'm assuming you are computing geodesic betweenness scores (i.e. counts of the number of times a shortest path uses a given edge) as opposed to, say, current-flow betweenness (which would treat the graph like a network of electrical resistors).
I think one solution would be to first use the 'distances' function to obtain the matrix of all pairwise distances, using the edge attribute LnkLength as the weight
e.g.
distMat <- distances(g, weights = E(g)$LnkLength)
Now assuming you have a cutoff dCut, you can mark which pairs of points to consider (unreachable pairs have distance Inf and are dropped as well)
keep <- distMat > 0 & distMat <= dCut
Assuming there are no one-way streets and LnkLength is not a directed property, the matrix is symmetric, so you only have to consider either the upper or the lower triangular portion of it
keep <- keep & upper.tri(distMat)
If your graph is directed (e.g. LnkLength is not the same in both directions or there are one-way streets), then you would skip this step and leave keep as is.
The row and column indices of the remaining entries give the pairs
pairs <- which(keep, arr.ind = TRUE)
fromInds <- pairs[, "row"]
toInds <- pairs[, "col"]
You would now be ready to start assigning betweenness scores iteratively by looping over the index list, computing shortest_paths for each pair and using the returned path to increment the betweenness scores of the appropriate edges
E(g)$betweenness <- 0
for (iPair in seq_along(fromInds)) {
  # ask for the edges on the path directly, so no vertex-to-edge conversion is needed
  epath <- shortest_paths(g, from = fromInds[iPair], to = toInds[iPair],
                          weights = E(g)$LnkLength, output = "epath")$epath[[1]]
  eList <- as.integer(epath)
  # increment betweenness scores of edges in eList
  E(g)$betweenness[eList] <- E(g)$betweenness[eList] + 1
}
Your graph should now have the property 'betweenness' assigned to its edges, using a geodesic betweenness metric computed over the subset of paths with a distance of at most 'dCut'. Note that this counts one shortest path per pair; if several shortest paths tie, only the one returned by shortest_paths is counted.
If your graph is quite large, it may be worthwhile to port the above for loop over to Rcpp, since it could take quite a long time otherwise. But that is a whole new can of worms...
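To make the approach concrete, here is a compact, self-contained sketch of it on a made-up four-node street network. The data, the cutoff value and the variable names are all hypothetical; it assumes the igraph package and counts a single shortest path per pair.

```r
library(igraph)

# Hypothetical street segments with lengths in metres: a - b - c - d.
segs <- data.frame(
  from = c("a", "b", "c"),
  to = c("b", "c", "d"),
  LnkLength = c(10, 10, 100)
)
g <- graph_from_data_frame(segs, directed = FALSE)
dCut <- 25  # assumed cutoff in metres

# Weighted distances between all vertex pairs.
d <- distances(g, weights = E(g)$LnkLength)

# Keep each unordered pair once, and only if it lies within the cutoff.
pairs <- which(d > 0 & d <= dCut & upper.tri(d), arr.ind = TRUE)

# Count how often each edge lies on the shortest path of a kept pair.
btw <- numeric(ecount(g))
for (i in seq_len(nrow(pairs))) {
  ep <- shortest_paths(g, from = pairs[i, "row"], to = pairs[i, "col"],
                       weights = E(g)$LnkLength, output = "epath")$epath[[1]]
  idx <- as.integer(ep)
  btw[idx] <- btw[idx] + 1
}
E(g)$betweenness <- btw
print(btw)  # 2 2 0
```

Only the pairs a-b, b-c and a-c are within 25 m, so the two short segments each get a score of 2 and the 100 m segment gets 0. Accumulating into a plain numeric vector and assigning it once at the end avoids repeated attribute updates on the graph inside the loop.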

Finding probability of edges in a graph

I have a random graph G(n, p) with n = 5000 vertices and an edge probability of p = 0.004.
I wonder what would be the expected number of edges in the graph but I have not much knowledge in probability-theory.
Can anyone help me?
Thank you so much!
EDIT:
If pE is the number of possible edges in the Graph, wouldn't I have to calculate 0.004 * pE to get the expected number of edges in the graph?
First, ask yourself what the maximum number of possible edges in the graph is. This is when every vertex is connected to every other vertex (nC2 = n(n-1)/2, assuming this is an undirected graph without self-loops).
If each possible edge has a likelihood of 0.004, and the # of possible edges is n(n-1)/2, then the expected number of edges will be 0.004*(n(n-1)/2).
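Plugging in the numbers from the question, as a small base-R sketch (the variable names are just for illustration):

```r
n <- 5000
p <- 0.004

# Undirected graph without self-loops: choose(n, 2) = n(n-1)/2 possible edges.
possible_undirected <- choose(n, 2)
expected_undirected <- p * possible_undirected

# If i->j and j->i were counted as separate edges, there would be n(n-1) possible edges.
expected_directed <- p * n * (n - 1)

print(possible_undirected)  # 12497500
print(expected_undirected)  # 49990
print(expected_directed)    # 99980
```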
The expected number of edges depends on the number of nodes and the edge probability: E = p(n(n-1)/2).
The total number of possible edges in your graph is n(n-1) if any i is allowed to be linked to any j as both i->j and j->i (I am your friend, you are mine). If the graph is undirected (and an edge only means that we are friends), the total number of possible edges drops by half to n(n-1)/2, since i->j and j->i are the same.
Multiplying by p gives the expected number of edges, since every possible edge becomes real or not depending on the probability. p=1 gives n(n-1)/2 edges, since every possible edge actually happens. For graphs with p<1, the actual edge count will differ from one run to the next if you generate random graphs using the p and n of your choice. The expected edge count is the average edge count you would observe if you were to generate a very large number of random graphs. NetLogo is a very pedagogical tool if you want to generate random graphs and get a feel for how network measurements arise from random graphs of different structures.

Resources