is there a way to set the resolution parameter when using the function cluster_louvain to detect communities in igraph for R? It makes a lot of difference for the result, as this parameter is related to the hierarchical dissimilarity between nodes. Thank you.
The easiest way to do it is through the resolution package, available in this link https://github.com/analyxcompany/resolution
It is based on this paper http://arxiv.org/pdf/0812.1770.pdf
It pretty much has 2 functions cluster_resolution() and cluster_resolution_RandomOrderFULL().
In both you can state the resolution t and how many repetitions you want rep. And, you can just use the igraph object in the function.
cluster_resolution_RandomOrderFULL(g,t=0.5)
cluster_resolution_RandomOrderFULL(g,rep=20)
NOTE/EDIT: it will not accept signed networks though! I'm trying to either contact the owner of the code or costumize it myself to make it suitable for signed networks.
EDIT2: I was able to translate the function community_louvain.m from the Brain Connectivity Toolbox for Matlab to R.
Here is the github link for the signed_louvain()
you can pretty much just put for ex. signed_louvain(g, gamma = 1, mod = 'modularity')
it works with igraph or matrix objects as input. If it has negative values, you have to choose mod = 'neg_sym' or 'neg_asym'.
Related
I am applying the Cluster Info Map algorithm for community detection for a set of large networks. I can achieve high modularity scores of around 0.6 however this comes with getting very high numbers of communities (18-24) which complicates my final analysis. Ideally, I would like to work with 10-15 communities. Is there any way to adjust how many communities are ultimately detected in my network using a resolution/modularity parameter or otherwise?
I've seen similar questions for Louvain and other algorithms, but nothing specifically on the Cluster Info Map algorithm. Nor does the official documentation for the standard function show anything about a resolution parameter. https://search.r-project.org/CRAN/refmans/igraph/html/cluster_infomap.html
I'm wondering if there is some other workaround available? Example of my code below for a weighted, directed network.
ci <-cluster_infomap(network,
e.weights = E(network)$weight,
nb.trial=50)
The Infomap implementation included in igraph does not have any parameters that allow controlling the number of communities or the resolution. igraph's cluster_infomap is based on the old Infomap package.
The new Python Infomap package supports several features which you might find helpful, including hierarchical partitioning.
I've been searching paper about method review in graph clustering but not satisfied me,
please tell me what is best method (according to you) in graph clustering, so sorry if my question very general
Thanks
With such an open question, I guess I can recommend you to try WEKA.
It has a nice set of user interfaces to let you import your dataset and then try and compare various classification and clustering algorithms on your data, without writing even one line of code.
After you identified an algorithm that works for your problem, you can then search for a nice and fast implementation in the programming language of your choice.
EDIT: since you mentioned the graph tag, maybe you should have a look at Markov Cluster Algorithm, or else, you will have a hard time trying to represent your graph data in a format suitable for the distance based clustering algorithms in WEKA.
The problem
I have a set of locations on a plane (actually they are pins in a KML file) and I want to partition this graph into subgraphs. Connectivity is pretty good - as with all real world road networks - so I assume that if two locations are close they have some kind of connection. The resulting set of subgraphs should adhere to these constraints:
Every node has to be covered by a subgraph
Every node should be in exactly 1 subgraph
Every node within a subgraph should be close to each other (L2 norm distances)
Every subgraph should contain at least 5 locations
The amount of subgraphs should be minimal
Right now the amount of locations is no more than 100 so I thought about brute forcing through every possibility but this obviously won't scale well.
I thought about using some k-Nearest-Neighbors algorithm (e.g. using QuickGraph) but I can't get my head around where to start and how to extend/shrink the subgraphs on the way. Maybe it's possible to map this problem to another problem that can easily be solved with some numerical procedure (e.g. Simplex) ...
Maybe someone has experience in this kind of optimization problems and is willing to help me find a solution? I don't have access to Mathematica/Matlab or the like ... but sufficient .NET programming skills and hmm Excel :-)
Thanks a lot!
As soon as there are multiple criteria that need to be appeased in the best possible way simultanously, it is usually starting to get difficult.
A numerical solution could work as follows: You could define yourself a utility function, that maps partitionings of your locations to positive real values, describing how "good" a partition is by assigning it a "rating" (good could be high "bad" could be near zero).
Once you have such a function assigning partitions their according "values", you simply need to optimize it and then you hopefully obtain a good solution if you defined your utility function reasonably. Evolutionary algorithms are good at that task since your utility function is probably analytically too complex to solve due to its discrete nature.
The problem is then only how you assign "values" to partitions via this utility function. This is then your task. It can be done for example by weighing each criterion with a factor and summing the results up, or even more complex functions (least squares etc.). The factors you use in the definition of the utility function are tuning parameters and can be varied until the result seems to be good.
Some CA software wold help a lot for testing if you can get your hands on one, bit I guess to obtain a black box solver for your partitioning problem, you need to implement the complete procedure yourself using a language of your choice.
I am using the InfoMap algorithm in the igraph package to perform community detection on a directed and non-weighted graph (34943 vertices, 206366 edges). In the graph, vertices represent websites and edges represent the existence of a hyperlink between websites.
A problem I have encountered after running the algorithm is that the majority of vertices have a membership in a single massive community (32920 or 94%). The rest of the vertices are dispersed into hundreds of other tiny communities.
I have tried different settings with the nb.trials parameter (i.e. 50, 100, and now running 500). However, this doesn't seem to change the result much.
I am feeling rather exasperated because the run-time on the algorithm is quite high, so I have to wait each time for the results (with no luck yet!!).
Many thanks.
Thanks for all the excellent comments. In the end, I got it working by downloading and running the source code for Infomap, which is available at: http://www.mapequation.org/code.html.
Due to licence issues, the latest code has not been integrated with igraph.
This solved the problem of too many nodes being 'lumped' into a single massive community.
Specifically, I used the following options from the command line: -N 10 --directed --two-level --map
Kudos to Martin Rosvall from the Infomap project for providing me with detailed help to resolve this problem.
For the interested reader, here is more information about this issue:
When a network collapses into one major cluster, it is most often because of a very dense and random link structure ... In the code for directed networks implemented in iGraph, teleportation is encoded. If many nodes have no outlinks, the effect of teleportation can be significant because it randomly connect nodes. We have made new code available here: http://www.mapequation.org/code.html that can cluster network without encoding the random teleportation necessary to make the dynamics ergodic. For details, see this paper: http://pre.aps.org/abstract/PRE/v85/i5/e056107
I was going to put this in a comment, but it ended up being too long and hard to read in that format, so this is a tangentially related answer.
One thing you should do is assess whether the algorithm is doing a good job at finding community structure. You can try to visualise your network to establish:
Is the algorithm returning community structures that make sense? Maybe there is one massive community?
If not does the visualisation provide insight as to why?
This will help inform your next steps. Maybe the structure of the network requires a different algorithm?
One thing I find useful for large networks is plotting your edges as a heatmap. This is simple to do if you have your edges stored in an adjacency matrix.
For this, you can use the image function, passing in your matrix of edges as the argument z. Hopefully this will allow you to see by eye the community structure.
However you also want to assess the correctness of your algorithm, so you want to sort the nodes (rows and columns of your adjacency matrix) by the community they've been assigned to.
Another thing to note is that if your edges are directed it may be more difficult to assess by eye as edges can appear on either side of the diagonal of the heatmap. One thing you can do is instead plot the underlying graph -- that is the adjacency matrix assuming your edges are undirected.
If your algorithm is doing a good job, you would expect to see square blocks along the diagonal, one for each detected community.
I want to count no of objects in an image using open cv. I have a soybean image and now I want to count the soybean numbers. If possible please help me and let me know the counting algorithms.
Thanks and I look forward to hear from you.
Regards,
Sumon
Sumon,
There is no one algorithm for counting objects. It greatly depends on the image itself. Depending on the contrast of the beans to the background it may be possible to use a simple threshold then a labeling algorithm, or even just finding contours.
The threshold function in opencv is cvThreshold. The contour finding algorithm is cvFindContours using this you could count the number of contours found.
Also the blob library has many facilities for this type of machine vision applications including connected component labeling which is basically what you need here. The library's description I believe is already included in opencv. The description of it can be found here.
I could provide some more assistance if I knew a little more about the image.