igraph Components: Which Algorithm (citation)? - r

I'm using igraph in academic research and I need to provide a proper citation for the algorithm used in the components() command. This algorithm returns the connected components of the graph. The command in question is documented here. It's part of the R/CRAN igraph library.
I think the algorithm used is the one below, which seems to be the canonical workhorse algorithm cited on the Wikipedia page for connected components.
Hopcroft, J.; Tarjan, R. (1973), "Algorithm 447: efficient algorithms for graph manipulation", Communications of the ACM, 16 (6): 372–378, doi:10.1145/362248.362272
Does anyone know what algorithm is used?

Note that igraph for R is actually written in C/C++. If you want to dig into the details of how components() is implemented, you should trace it back to the C/C++ source code.
Here is a link to the source code for components
https://github.com/igraph/igraph/blob/f9b6ace881c3c0ba46956f6665043e43b95fa196/src/components.c
However, the source code does not seem to name the algorithm it applies. You could email the authors and ask.
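For reference, the textbook way to compute connected components is a linear-time traversal: start a breadth-first (or depth-first) search from every vertex that has not yet been labelled, and give every vertex it reaches the same component id. The sketch below is plain Python under that assumption; it is not a transcription of igraph's components.c, so check the C source before citing anything specific. The function name and the (n, edges) input format are hypothetical.

```python
from collections import deque


def connected_components(n, edges):
    """Label vertices 0..n-1 with component ids via BFS.

    Edge direction is ignored, so for a directed graph this gives
    weakly connected components.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)  # treat every edge as undirected

    membership = [-1] * n  # -1 means "not visited yet"
    count = 0
    for start in range(n):
        if membership[start] != -1:
            continue
        # New component: flood it with a breadth-first search.
        membership[start] = count
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if membership[w] == -1:
                    membership[w] = count
                    queue.append(w)
        count += 1
    return membership, count
```

Each vertex is enqueued once and each edge is scanned twice, so the running time is O(|V| + |E|), which is what any of the standard traversal-based algorithms achieves.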

Related

Graph Clustering

I've been searching for review papers on graph clustering methods, but none of them has satisfied me.
Please tell me which graph clustering method you consider best. Sorry if my question is very general.
Thanks
With such an open question, I can only recommend that you try WEKA.
It has a nice set of user interfaces to let you import your dataset and then try and compare various classification and clustering algorithms on your data, without writing even one line of code.
After you identified an algorithm that works for your problem, you can then search for a nice and fast implementation in the programming language of your choice.
EDIT: since you used the graph tag, have a look at the Markov Cluster Algorithm (MCL); otherwise you will have a hard time representing your graph data in a format suitable for the distance-based clustering algorithms in WEKA.
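To give an idea of what MCL does, here is a minimal pure-Python sketch of the algorithm as it is usually described: add self-loops, normalise the columns of the adjacency matrix into a random-walk matrix, then alternate expansion (a matrix power) and inflation (an element-wise power followed by renormalisation) until the matrix stops changing. The function names and the defaults (expansion=2, inflation=2.0) are assumptions for illustration; for real graphs use an established MCL implementation rather than this O(n³)-per-step toy.

```python
def _normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Return clusters (lists of vertex indices) for a square 0/1 or weighted adjacency matrix."""
    n = len(adj)
    # Self-loops stop the random walk from oscillating between vertex sets.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(max_iter):
        prev = m
        expanded = m
        for _ in range(expansion - 1):
            expanded = _matmul(expanded, m)  # expansion: flow spreads along longer paths
        # Inflation: boost strong flows, damp weak ones, then renormalise the columns.
        m = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # A non-zero row i is an attractor; the columns it holds flow for form its cluster.
    clusters, seen = [], set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters
```

The inflation exponent is the main knob: larger values tend to produce more, smaller clusters.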

System Dependence Graph with Frama-C

I read (in "Which free tools can I use to generate the program dependence graph for C code?") that Frama-C can generate a PDG.
My question is: is there a way to make it generate an SDG (a set of PDGs that models interprocedural dependences)?
Could anybody help me, or give me tips about which tools can generate an SDG?
Thank you
I'm not completely sure that it answers your question, but Frama-C's PDG plug-in does have inter-procedural information, in the form of nodes for parameters and implicit inputs (globals that are read by the callee), as well as for the returned value and output locations (globals that are written). It uses the results of the From plug-in to compute dependencies.
If I understand the PDG API in Db.Pdg correctly, you should be able to obtain all nodes corresponding to a given call with the Db.Pdg.find_simple_stmt_nodes function.
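Frama-C's own API is not reproduced here. As a language-neutral illustration of the structure described above (one PDG per function, with nodes for inputs and outputs), the following hypothetical sketch shows how per-function PDGs are stitched into an SDG: each call site gets a call edge to the callee, its actual-in nodes are linked to the callee's formal-in nodes, and the callee's formal-out nodes are linked back to the caller's actual-out nodes. The classes, field names and edge labels are invented for this example, and nodes are matched by position.

```python
from dataclasses import dataclass, field


@dataclass
class CallSite:
    callee: str
    actual_in: list   # caller-side values passed in (arguments, globals the callee reads)
    actual_out: list  # caller-side locations receiving results (return value, globals written)


@dataclass
class PDG:
    name: str
    formal_in: list   # parameters and implicit inputs (globals read)
    formal_out: list  # returned value and output locations (globals written)
    calls: list = field(default_factory=list)


def build_sdg_edges(pdgs):
    """Interprocedural edges joining per-function PDGs into one SDG (hypothetical model)."""
    by_name = {p.name: p for p in pdgs}
    edges = []
    for caller in pdgs:
        for k, call in enumerate(caller.calls):
            callee = by_name[call.callee]
            edges.append(("call", (caller.name, k), (callee.name, "entry")))
            # Data flows into the callee through parameter-in edges...
            for a, f in zip(call.actual_in, callee.formal_in):
                edges.append(("param-in", (caller.name, a), (callee.name, f)))
            # ...and back to the caller through parameter-out edges.
            for f, a in zip(callee.formal_out, call.actual_out):
                edges.append(("param-out", (callee.name, f), (caller.name, a)))
    return edges
```

The intraprocedural edges stay inside each PDG; only these extra edges are needed to answer interprocedural slicing queries.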

Write igraph clustering to file

I am currently testing various community detection algorithms in the igraph package to compare against my implementation.
I am able to run the algorithms on different graphs but I was wondering if there was a way for me to write the clustering to a file, where all nodes in one community are written to one line and so on. I am able to obtain the membership of each node using membership(communities_object) and write that to a file using dput() but I don't know how to write it the way I want.
This is the first time I am working with R as well. I apologize if this has been asked before.
This does not have much to do with igraph: the clustering is given by a simple numeric vector. See ?write.
write(membership(communities_object), file="myfile", ncolumns=1)
write(communities_object$membership, file="myfile", ncolumns=1) also works.
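With ncolumns=1, write() puts one community id per line, i.e. one line per vertex. To get the layout asked for (all vertices of a community on one line), the membership vector first has to be grouped by community. A sketch of that grouping step in Python; numbering vertices from 1 mirrors R's 1-based indexing and is an assumption, as are the function names.

```python
from collections import defaultdict


def communities_to_lines(membership):
    """One string per community: its vertex ids (1-based), space-separated."""
    groups = defaultdict(list)
    for vertex, community in enumerate(membership, start=1):
        groups[community].append(vertex)
    # Sort by community id so the output order is deterministic.
    return [" ".join(map(str, groups[c])) for c in sorted(groups)]


def write_communities(membership, path):
    with open(path, "w") as fh:
        for line in communities_to_lines(membership):
            fh.write(line + "\n")
```

In R the same grouping should be available as split(seq_along(m), m) for a membership vector m.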

Community detection with InfoMap algorithm producing one massive module

I am using the InfoMap algorithm in the igraph package to perform community detection on a directed, unweighted graph (34943 vertices, 206366 edges). In the graph, vertices represent websites and edges represent the existence of a hyperlink between websites.
A problem I have encountered after running the algorithm is that most vertices are assigned to a single massive community (32920, or 94%). The rest of the vertices are dispersed into hundreds of other tiny communities.
I have tried different settings for the nb.trials parameter (50, 100, and now 500). However, this doesn't seem to change the result much.
I am feeling rather exasperated because the algorithm's run time is quite long, so I have to wait each time for the results (with no luck yet!!).
Many thanks.
Thanks for all the excellent comments. In the end, I got it working by downloading and running the source code for Infomap, which is available at: http://www.mapequation.org/code.html.
Due to licence issues, the latest code has not been integrated with igraph.
This solved the problem of too many nodes being 'lumped' into a single massive community.
Specifically, I used the following options from the command line: -N 10 --directed --two-level --map
Kudos to Martin Rosvall from the Infomap project for providing me with detailed help to resolve this problem.
For the interested reader, here is more information about this issue:
When a network collapses into one major cluster, it is most often because of a very dense and random link structure ... In the code for directed networks implemented in iGraph, teleportation is encoded. If many nodes have no outlinks, the effect of teleportation can be significant because it randomly connects nodes. We have made new code available here: http://www.mapequation.org/code.html that can cluster networks without encoding the random teleportation necessary to make the dynamics ergodic. For details, see this paper: http://pre.aps.org/abstract/PRE/v85/i5/e056107
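To see why nodes without out-links matter, here is a PageRank-style sketch of the random walk with teleportation that the quote refers to: with probability 1 - tau the walker follows a random out-link, otherwise it jumps to a uniformly random node, and a node with no out-links always jumps. The teleportation rate tau=0.15 and the function names are assumptions; this is not Infomap's code, only an illustration of how much of the flow ends up moving by random jumps rather than along real links.

```python
def visit_rates(n, out_links, tau=0.15, iters=500):
    """Stationary visit rates of a random walk with uniform teleportation (power iteration)."""
    p = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        teleport = 0.0
        for u in range(n):
            targets = out_links.get(u, [])
            if targets:
                # Follow a random out-link with probability 1 - tau ...
                share = (1.0 - tau) * p[u] / len(targets)
                for v in targets:
                    nxt[v] += share
                teleport += tau * p[u]  # ... otherwise teleport.
            else:
                teleport += p[u]  # dangling node: always teleports
        p = [x + teleport / n for x in nxt]
    return p


def teleport_share(n, out_links, p, tau=0.15):
    """Fraction of the flow per step that moves by teleportation instead of along links."""
    return sum(p[u] if not out_links.get(u) else tau * p[u] for u in range(n))
```

On a directed cycle the teleported share is just tau, but on a star whose centre has no out-links it rises above one half, which is the "random connections" effect described in the quote.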
I was going to put this in a comment, but it ended up being too long and hard to read in that format, so this is a tangentially related answer.
One thing you should do is assess whether the algorithm is doing a good job at finding community structure. You can try to visualise your network to establish:
Is the algorithm returning community structures that make sense? Maybe there is one massive community?
If not, does the visualisation provide insight into why?
This will help inform your next steps. Maybe the structure of the network requires a different algorithm?
One thing I find useful for large networks is plotting your edges as a heatmap. This is simple to do if you have your edges stored in an adjacency matrix.
For this, you can use the image function, passing in your matrix of edges as the argument z. Hopefully this will allow you to see by eye the community structure.
However, you also want to assess the correctness of your algorithm, so you should sort the nodes (the rows and columns of your adjacency matrix) by the community they have been assigned to.
Another thing to note is that if your edges are directed, it may be more difficult to assess by eye, as edges can appear on either side of the diagonal of the heatmap. One thing you can do instead is plot the underlying graph, that is, the adjacency matrix with your edges treated as undirected.
If your algorithm is doing a good job, you would expect to see square blocks along the diagonal, one for each detected community.
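The reordering step above can be sketched as follows: sort vertices by community, permute the rows and columns of the adjacency matrix accordingly, optionally symmetrise it to get the underlying undirected graph, and render it. The answer uses R's image() for the plot; this Python sketch prints a text heatmap instead and adds a simple numeric check (the fraction of edges that fall inside diagonal blocks). All function names are hypothetical.

```python
def reorder_by_community(adj, membership):
    """Return the vertex order and the adjacency matrix permuted so communities are contiguous."""
    order = sorted(range(len(adj)), key=lambda v: (membership[v], v))
    return order, [[adj[i][j] for j in order] for i in order]


def symmetrize(adj):
    """Underlying undirected graph: an edge exists if it exists in either direction."""
    n = len(adj)
    return [[1 if adj[i][j] or adj[j][i] else 0 for j in range(n)] for i in range(n)]


def within_block_fraction(adj, membership):
    """Share of edge weight whose endpoints are in the same community (1.0 = perfect blocks)."""
    total = inside = 0
    for i, row in enumerate(adj):
        for j, w in enumerate(row):
            if w:
                total += w
                if membership[i] == membership[j]:
                    inside += w
    return inside / total if total else 0.0


def ascii_heatmap(adj):
    """Text stand-in for image(): '#' marks an edge, '.' marks no edge."""
    return "\n".join("".join("#" if w else "." for w in row) for row in adj)
```

With a good clustering, ascii_heatmap(reorder_by_community(adj, membership)[1]) shows filled square blocks along the diagonal.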

Quickly cross-check complex math results?

I am doing matrix operations on large matrices in my C++ program and need to verify the results. Until now I have used WolframAlpha for this, but my inputs are now very large, and the web interface does NOT accept such large values (the text field is limited).
I am looking for a better solution to quickly cross-check/do math problems.
I know of MATLAB, but I have never used it, and I don't know whether it would meet my needs or how steep the learning curve would be.
Is it time to make the jump, or are there other solutions?
If you don't mind using Python, NumPy might be an option.
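As an illustration of such a cross-check, here is a stdlib-only Python sketch: load matrices that the C++ program has dumped as whitespace-separated text, recompute the product independently, and compare element-wise with a tolerance, since floating-point results from two implementations rarely match bit for bit. The file format, function names and tolerances are assumptions.

```python
import math


def read_matrix(path):
    """Read a whitespace-separated text matrix, one row per non-empty line."""
    with open(path) as fh:
        return [[float(x) for x in line.split()] for line in fh if line.strip()]


def matmul(a, b):
    """Reference product; math.fsum reduces rounding error in the dot products."""
    cols = list(zip(*b))
    return [[math.fsum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matrices_close(x, y, rel_tol=1e-9, abs_tol=1e-12):
    """True if x and y have the same shape and all entries agree within tolerance."""
    return len(x) == len(y) and all(
        len(r) == len(s)
        and all(math.isclose(u, v, rel_tol=rel_tol, abs_tol=abs_tol) for u, v in zip(r, s))
        for r, s in zip(x, y)
    )
```

With NumPy installed, the same check is roughly np.allclose(A @ B, C) after loading each file with np.loadtxt, and it is much faster for large matrices.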
Apart from the licence costs, MATLAB is the state-of-the-art numerical math tool. GNU Octave is a free, open-source alternative with a similar syntax. The learning curve for both tools is very gentle!
WolframAlpha is built on Wolfram Mathematica and accepts much of the same command syntax. If you have access to Mathematica at your university, it would be the smoothest choice for you, since you already have experience with WolframAlpha.
You may also try some packages to convert Mathematica commands to MATLAB:
ToMatlab
Mathematica Symbolic Toolbox for MATLAB 2.0
Tell us in more detail what your validation process is: what does your data look like, and what commands have you used in WolframAlpha? Then we can help you with a MATLAB alternative.
