I am analysing a weighted (and potentially directed) network with R igraph. The network is based on a correlation matrix, i.e. the weights range from -1 to +1. This particular network is clearly undirected, but I am also interested in the more general directed case.
Based on this network I would like to perform community detection to group "similar" nodes together. I know there is a whole range of community detection methods in R igraph.
But none of these methods deals with negative weights.
Is there an implementation in igraph (or in some other R package) which can deal with directed networks that have negative weights? Any hints are much appreciated.
I am not 100% sure whether this violates any assumptions, but as a workaround I set all negative edge weights to zero before running Louvain community detection with igraph in R. That way they are at least never included in any community relationship.
# clamp negative weights to zero so that only positive associations enter the clustering
E(g)$width <- ifelse(E(g)$width < 0, 0, E(g)$width)
# run Louvain using the (now non-negative) weights
g.louv <- cluster_louvain(g, weights = E(g)$width)
Note: this applies only to undirected graphs (I overlooked this detail of the question, sorry)
Related
I have a network model of trade between nodes, where each edge weight is the total trade flow between two nodes; the edges are therefore undirected.
I would like to cluster the network using modularity maximization, but the binary (0/1) adjacency matrix it seems to require does not adequately represent the different edge weights. Is there a way to deal with this, or is modularity maximization simply the wrong method?
Happy for any hints!
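For what it is worth, modularity has a weighted generalisation, and igraph's modularity-based methods accept edge weights directly, so you do not need to binarise the adjacency matrix. A minimal sketch, where the trade matrix is made up purely for illustration:

library(igraph)

# hypothetical symmetric trade-flow matrix (entries = total trade between two nodes)
trade <- matrix(c(0, 5, 2, 0,
                  5, 0, 1, 0,
                  2, 1, 0, 7,
                  0, 0, 7, 0), nrow = 4, byrow = TRUE)

# build a weighted, undirected graph directly from the matrix
g <- graph_from_adjacency_matrix(trade, mode = "undirected", weighted = TRUE)

# modularity maximisation that takes the weights into account
comm <- cluster_louvain(g, weights = E(g)$weight)
membership(comm)
modularity(g, membership(comm), weights = E(g)$weight)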
I have seen a paper that uses a non-binary adjacency matrix to define the weights of node connections. The weights are ratios in the range [0, 1]. Can these weights also be considered edge features? And if so, what is the difference between having an adjacency matrix and an edge feature matrix?
It all depends on how you utilise this information. You can use a binary adjacency matrix to define a graph, but you can also interpret it as a fully connected graph with 0/1 edge features. The same holds for weights in [0, 1]: depending on their semantics they can mean the probability of observing an edge (with 0 being no edge), or they can be seen as a fully connected graph with float edge features. Depending on which interpretation you act on, you can end up with neural nets of different representational power, inductive biases, etc. So unfortunately, "it all depends".
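As a toy illustration of the two readings in R with igraph (the matrix and the 0.5 threshold below are arbitrary assumptions), the same [0, 1] matrix can either be thresholded into a binary adjacency matrix or kept as a weighted graph whose weights act as a scalar edge feature:

library(igraph)

# made-up [0,1] weight matrix, purely for illustration
W <- matrix(c(0.0, 0.9, 0.1,
              0.9, 0.0, 0.6,
              0.1, 0.6, 0.0), nrow = 3, byrow = TRUE)

# reading 1: threshold into a binary adjacency matrix (edge / no edge)
A <- (W > 0.5) * 1
g_bin <- graph_from_adjacency_matrix(A, mode = "undirected")

# reading 2: keep the values as weights, i.e. a single scalar edge feature
g_wt <- graph_from_adjacency_matrix(W, mode = "undirected", weighted = TRUE)
E(g_wt)$weight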
I currently have an adjacency matrix on which I would like to perform spectral clustering to determine the community each node belongs to. I have looked around, but there do not appear to be implementations in either igraph or other packages.
Another issue is determining how many clusters you want. Does R have any packages that might help find the optimal number of clusters to break an adjacency matrix into? Thanks.
I cannot advise for R; however, I can suggest this example implementation of spectral clustering using Python and NetworkX (which is comparable to igraph). It should not be hard to translate it into R.
For an introduction to Spectral Clustering see lectures 28-34 here and this paper.
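In case it helps, here is a minimal sketch of unnormalised spectral clustering on an adjacency matrix in base R; the toy matrix and the choice of k = 2 are assumptions for illustration:

# unnormalised spectral clustering in base R
spectral_cluster <- function(A, k) {
  D <- diag(rowSums(A))                      # degree matrix
  L <- D - A                                 # unnormalised graph Laplacian
  eig <- eigen(L, symmetric = TRUE)          # eigenvalues come back in decreasing order
  n <- nrow(A)
  U <- eig$vectors[, (n - k + 1):n, drop = FALSE]  # eigenvectors of the k smallest eigenvalues
  kmeans(U, centers = k, nstart = 20)$cluster
}

# toy symmetric adjacency matrix with two obvious groups and one bridge
A <- matrix(0, 6, 6)
A[1, 2] <- A[1, 3] <- A[2, 3] <- 1
A[4, 5] <- A[4, 6] <- A[5, 6] <- 1
A[3, 4] <- 1
A <- A + t(A)

spectral_cluster(A, k = 2)

For choosing the number of clusters, one common rule of thumb is the eigengap heuristic: sort the Laplacian eigenvalues and look for a large jump.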
I am running community detection on graphs, using the different algorithms implemented in igraph listed here:
1. edge.betweenness.community (w, -d)
2. walktrap.community (w, -d)
3. fastgreedy.community (w)
4. spinglass.community (w, d; not for unconnected graphs)
5. infomap.community (w, d)
6. label.propagation.community (w)
7. multilevel.community (w)
8. leading.eigenvector.community (w)
I have two types of graphs: one is directed and weighted, the other is undirected and unweighted.
The algorithms I could use for both are four (1, 2, 4, 5), but I get an error on the fourth one because my graph is disconnected, so three remain.
Now I want to compare them using different evaluation metrics, such as those provided at http://lab41.github.io/Circulo/. As far as I have found, igraph offers modularity and compare.communities (the methods listed at http://www.inside-r.org/packages/cran/igraph/docs/compare.communities are "vi", "nmi", "split.join", "rand" and "adjusted.rand").
What I am wondering is:
Is there any other algorithm implemented in igraph that is not in this list, and which would also give me overlapping communities?
Which of these metrics can be used for weighted and directed graphs, and is there an implementation in igraph?
Also, which metric can be used with which algorithm? In one of the articles on edge betweenness, the evaluation was against ground truth, i.e. a comparison with the known community structure of the graph.
Thank you in advance.
Yes, there are many algorithms which are not in the igraph package; to name one: RG+, presented in "Cluster Cores and Modularity Maximization" (2010).
Modularity is by far the best measure to evaluate communities.
edge.betweenness simply gives you the betweenness centrality values of all the edges; it is not a measure for evaluating communities, but it can be used to find them.
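As a small illustration of the evaluation side in R igraph (the graph, weights and algorithms below are chosen arbitrarily), modularity() scores a single partition, while compare() contrasts two partitions, or a partition against a known ground truth, using the methods mentioned above:

library(igraph)

g <- make_graph("Zachary")                      # Zachary karate club, undirected
E(g)$weight <- runif(ecount(g), 0.5, 1)         # made-up weights, just for the example

c_walk <- cluster_walktrap(g, weights = E(g)$weight)
c_fast <- cluster_fast_greedy(g, weights = E(g)$weight)

# score each partition with (weighted) modularity
modularity(g, membership(c_walk), weights = E(g)$weight)
modularity(g, membership(c_fast), weights = E(g)$weight)

# compare the two partitions (or a partition against ground truth) with the listed metrics
compare(membership(c_walk), membership(c_fast), method = "vi")
compare(membership(c_walk), membership(c_fast), method = "nmi")
compare(membership(c_walk), membership(c_fast), method = "adjusted.rand")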
I want to cluster binary vectors (millions of them) into k clusters. I am using Hamming distance to find the nearest neighbours of the initial cluster centres (which is also very slow). I think k-means clustering does not really fit here: the problem is calculating the mean of the nearest neighbours (which are binary vectors) of some initial cluster centre in order to update the centroid.
A second option is to use k-medoids, in which the new cluster centre is chosen from among the nearest neighbours (the one that is closest to all the neighbours of a particular cluster centre). But finding that is another problem, because the number of nearest neighbours is also quite large.
Can someone please guide me?
It is possible to do k-means clustering with binary feature vectors. The TopSig paper I co-authored has the details. The centroids are calculated by taking the most frequently occurring bit in each dimension. The TopSig paper applied this to document clustering, where we had binary feature vectors created by random projection of sparse, high-dimensional bag-of-words feature vectors. There is an implementation in Java at http://ktree.sf.net. We are currently working on a C++ version, but it is very early code which is still messy and probably contains bugs; you can find it at http://github.com/cmdevries/LMW-tree. If you have any questions, please feel free to contact me at chris#de-vries.id.au.
If you want to cluster a lot of binary vectors, there are also more scalable tree-based clustering algorithms: K-tree, TSVQ and EM-tree. For more details on these algorithms you can see a paper relating to the EM-tree that I have recently submitted for peer review and that is not yet published.
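For reference, here is a minimal R sketch of that centroid rule (majority bit per dimension) combined with Hamming-distance assignment; the data, k and iteration cap are made up, and this is nowhere near as optimised as the implementations linked above:

# k-means-style clustering of binary vectors: assign by Hamming distance,
# update each centroid to the majority bit in every dimension
binary_kmeans <- function(X, k, iters = 20) {
  n <- nrow(X)
  centroids <- X[sample(n, k), , drop = FALSE]
  assign <- rep(1L, n)
  for (it in seq_len(iters)) {
    # Hamming distance of every vector to every centroid
    d <- sapply(seq_len(k), function(j)
      rowSums(X != matrix(centroids[j, ], n, ncol(X), byrow = TRUE)))
    new_assign <- max.col(-d, ties.method = "first")
    if (all(new_assign == assign) && it > 1) break
    assign <- new_assign
    # majority vote per dimension (ties fall to 1 here; an arbitrary choice)
    for (j in seq_len(k)) {
      members <- X[assign == j, , drop = FALSE]
      if (nrow(members) > 0) centroids[j, ] <- as.integer(colMeans(members) >= 0.5)
    }
  }
  list(cluster = assign, centroids = centroids)
}

# made-up example: 200 random 32-bit vectors, 4 clusters
set.seed(1)
X <- matrix(rbinom(200 * 32, 1, 0.5), nrow = 200)
res <- binary_kmeans(X, k = 4)
table(res$cluster)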
Indeed k-means is not too appropriate here, because the means won't be reasonable on binary data.
Why do you need exactly k clusters? This will likely mean that some vectors won't fit their clusters very well.
Some stuff you could look into for clustering: minhash, locality sensitive hashing.
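To make the minhash suggestion a bit more concrete, here is a toy R sketch that computes minhash signatures from the 1-bit positions of each binary vector using random permutations of the dimensions (all parameters are arbitrary); vectors whose signatures collide in many positions are candidates for the same cluster:

# toy minhash: for each random permutation of the dimensions,
# record the smallest permuted index among a vector's 1-bits
minhash_signatures <- function(X, n_hashes = 16) {
  d <- ncol(X)
  perms <- replicate(n_hashes, sample(d))          # one random permutation per hash
  t(apply(X, 1, function(row) {
    ones <- which(row == 1)
    if (length(ones) == 0) return(rep(NA_integer_, n_hashes))
    apply(perms, 2, function(p) min(p[ones]))      # first 1-bit under each permutation
  }))
}

set.seed(1)
X <- matrix(rbinom(10 * 64, 1, 0.3), nrow = 10)
sig <- minhash_signatures(X)
# the fraction of matching signature positions estimates the Jaccard similarity of two vectors
mean(sig[1, ] == sig[2, ])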