How can Modularity help in Network Analysis [closed]

I have a large network of routers all interconnected in a community network. I am trying to find different ways in which I could analyse this network and gain helpful insights into how it could be improved just by analyzing the graph (using Gephi). I came across a measure called "Modularity", which is defined as:
a measure of the strength of division of a network into modules (also called groups, clusters or communities). Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules.
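Concretely, the quantity behind that definition is Newman's modularity, which in LaTeX notation reads:

    Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)

where A_ij is the adjacency matrix, k_i is the degree of node i, m is the total number of edges, and δ(c_i, c_j) is 1 when nodes i and j are in the same community and 0 otherwise. Each term compares the observed connection between two nodes to what a random network with the same degrees would produce.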
My question is, what can I learn about the network by using the "Modularity" measure? When I use it in Gephi, for example, the network is colored by community, but how is that helpful?

The modularity algorithm implemented in Gephi looks for nodes that are more densely connected to each other than to the rest of the network (it's well explained in the paper published by the algorithm's creators, which you can find on Google Scholar: Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks.)
So when you apply this measure, the colors indicate the different communities determined by the algorithm. In your case, it'll show which routers are more densely connected to each other than to the rest of the network.
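As a point of reference, here is a minimal sketch of running the same Louvain-style detection outside Gephi with python-igraph (the file name routers.graphml is a placeholder for your own export):

    import igraph as ig

    # Load the router network; "routers.graphml" is a placeholder file name.
    g = ig.Graph.Read_GraphML("routers.graphml")

    # Louvain ("multilevel") community detection, the same family of
    # algorithm that Gephi's modularity tool uses.
    communities = g.community_multilevel()

    print("communities:", len(communities))
    print("modularity Q:", g.modularity(communities.membership))

    # communities.membership[i] is the community id of vertex i, which is
    # what Gephi encodes as node color.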
To make this information really helpful, though, you have to juxtapose it with at least one more measure. For instance, if you apply the Betweenness Centrality measure (which shows which routers connect the most communities together, i.e., the most influential junction nodes in the network), you'd be able to identify the most vulnerable routers in every community, which should be monitored more closely. You could also filter out a community and identify the most connected routers within it (highest degree measure), which would then show you which routers are important for that specific community.
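Continuing the sketch above (same python-igraph graph g and communities as before; vertex ids stand in for your router names):

    # Betweenness centrality over the whole network: high values mark the
    # junction routers that stitch communities together.
    bet = g.betweenness()

    # For each community, report its most central (most vulnerable) router.
    for cid, members in enumerate(communities):
        critical = max(members, key=lambda v: bet[v])
        print(f"community {cid}: watch vertex {critical} "
              f"(betweenness {bet[critical]:.1f})")

    # Within a single community, the best-connected router by degree:
    sub = communities.subgraph(0)                   # community 0 only
    hub = max(range(sub.vcount()), key=sub.degree)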
All in all, the modularity measure lets you see the vulnerable spots of your network and gives you a general idea of its structure.
There is also interesting research on modularity as a measure of a network's robustness. For example, if your network's modularity is too high, it's more robust against random external attacks, but it's also susceptible to targeted attacks on the most connected hubs (high betweenness centrality nodes). On the other hand, if it's too interconnected, it can be brought down more easily by a large-scale attack on the routers (or by a blackout, for example). There's some good explanation of this in the paper (or video / slide show) on information epidemics here and a more general explanation of metastability vs modularity here.
Hope this helps, and let me know if you have more questions, I love this subject!

Related

Difference between iterative and incremental software process models? [closed]

I am confused about the difference between the iterative and incremental software process models.
What is the main reason for having these two models if they both work the same way?
Iterative and incremental are not opposing kinds of process. Models use the terms to describe different aspects of the process, and there are models that are both incremental and iterative.
See https://www.testingexcellence.com/iterative-incremental-development-agile/
There is a software design principle: Design iteratively, build incrementally.
From Wikipedia:
Iterative design is a design methodology based on a cyclic process of prototyping, testing, analyzing, and refining a product or process. ... The process should be repeated until user issues have been reduced to an acceptable level.
From Righting Software by Juval Lowy:
While the car company may have had a team of designers designing a car across multiple iterations, when it is time to build the car, the manufacturing process does not start with a skateboard, grow that to a scooter, then a bicycle, then a motorcycle, and finally a car. Instead, a car is built incrementally. First, the workers weld a chassis together, then they bolt on the engine block, and then they add the seats, the skin, and the tires. They paint the car, add the dashboard, and finally install the upholstery.

There are two reasons why you can build only incrementally, and not iteratively. First, building iteratively is horrendously wasteful and difficult (turning a motorcycle into a car is much more difficult than just building a car). Second, and much more importantly, the intermediate iterations do not have any business value. If the customer wants a car to take the kids to school, what would the customer do with a motorcycle and why should the customer pay for it?

Need advice on choosing a graph database [closed]

I'm looking for options for a graph database to be used in a project. I expect ~100,000 writes (vertex + edge) per day and far fewer reads (several per hour). The most frequent query is a 2-edge-deep traversal that I expect to return ~10-20 result nodes.
I don't have experience with graph databases and want to work with Gremlin to be able to switch to another graph database if needed. Right now I'm considering two possibilities: Neo4j and Titan.
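For what it's worth, a 2-hop traversal like the one described would look roughly like this in Gremlin via gremlinpython (the endpoint, label, and property names are invented for illustration):

    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    # The endpoint and the "device"/"name" schema are assumptions.
    conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")
    g = traversal().withRemote(conn)

    # Start from one vertex, follow outgoing edges two hops, deduplicate.
    results = (g.V().has("device", "name", "router-42")
                 .out().out()
                 .dedup()
                 .limit(20)
                 .valueMap()
                 .toList())
    print(results)
    conn.close()

The same traversal should run against any TinkerPop-enabled backend, which is exactly the portability you're after.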
As far as I can see, there is a sizable community and plenty of information and tooling for Neo4j, so I'd prefer to start with it. Their capacity numbers should be enough for our needs (~34 billion nodes, ~34 billion edges). But I'm not sure what hardware requirements I'll face in this case. I also didn't see any parallelisation options for their queries.
On the other hand, Titan is built for horizontal scalability and has integrations with massively parallel tools like Spark, so I can expect hardware requirements to scale in a linear way. But there is much less information / community / tooling for Titan.
I'll be glad to hear your suggestions.
Sebastian Good made a wonderful presentation comparing several databases to each other. You might have a look at his results here.
A quick summary of the presentation is here
For benchmarks of the graph databases with different datasets, node sizes, and caches, have a look at this GitHub repository by socialsensor. Just to let you know, the results in the repo differ a bit from the ones in the presentation.
My personal recommendation is:
If you have deep pockets, go for Neo4j. With the technical support and easy Cypher, things will go pretty quickly.
If you support Open Source (and are patient with its development cycles), go for Titan DB with the Amazon DynamoDB backend. This will give you "infinite" scalability and good performance with both EC2 machines and DynamoDB tables. Check here for docs and here for their code for more information.

Wireshark network topology [closed]

Does anyone know of a programme that can take a Wireshark (pcap) trace and turn it into a visual network topology?
I have 3 pcap files with a lot of data and I really want to see if I can make sense of some things.
I've played with tools like NetworkMiner, but found nothing that can give a visual cue to the data.
You are in fact asking two questions:
How to discover the network topology from network traces
How to visualize the discovered topology
Topology Discovery
This is the hard part. The community has not yet developed reliable tools, because network traffic exhibits so much hard-to-deal-with crud. The most useful tool that comes to mind in this space is Bro, which creates quality connection logs.
It is straightforward to extract communication graphs, i.e., graphs that show who communicates with whom. By weighting the edges with some metric (number of packets / bytes / connections), you can get an idea of the relative contribution of a given node.
For more sophisticated analyses, you will have to develop some heuristics. For example, detecting routers may involve looking at packet forwarding behavior or extracting default gateways from DHCP ACK messages. Bro ("the Python for the network") allows you to codify such analysis in a very natural form.
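As a rough illustration of the communication-graph idea (this is not Bro; just a toy sketch using scapy, with placeholder file names):

    from collections import Counter
    from scapy.all import rdpcap, IP  # pip install scapy

    # Count packets per (src, dst) pair; "trace.pcap" is a placeholder path.
    edges = Counter()
    for pkt in rdpcap("trace.pcap"):
        if IP in pkt:
            edges[(pkt[IP].src, pkt[IP].dst)] += 1

    # Emit a GraphViz DOT digraph, weighting edges by packet count.
    with open("comm_graph.dot", "w") as f:
        f.write("digraph comms {\n")
        for (src, dst), n in edges.most_common():
            f.write(f'  "{src}" -> "{dst}" [label="{n}"];\n')
        f.write("}\n")

The resulting .dot file feeds directly into the visualization tools below.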
Graph Visualization
The low-key approach involves generating GraphViz output. AfterGlow offers some wrapping that makes the output more digestible. For inspiration, check out http://secviz.org/, where you'll find many examples of such graphs. Most of them were created with AfterGlow.
There is also Gephi, a fancier graph visualization engine, which supports a variety of graph input formats. The generated graphs look quite polished and can also be explored interactively.

Do modern routers/network devices/ISPs prevent fake IP headers? [closed]

I know it's possible to modify the IP headers and change the source IP address, but it should be simple for network devices to detect and discard those messages. If they don't, why not? Does it add too much overhead?
The industry name for the feature you are asking about is "Unicast Reverse Path Forwarding" (or, as Cisco calls it, "uRPF"); it is defined in RFC 3704 and is considered a Best Current Practice (see BCP 38).
Speaking at a very high level, most of the hardware used by ISPs has this feature built into an ASIC; normally there is not a huge penalty for turning it on. Sometimes there are feature conflicts, but again this is not a huge deal in most cases.
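To illustrate the strict-mode check itself, here is a toy model in Python (not actual router code; the routing table is invented): accept a packet only if the route back to its source points out the interface it arrived on.

    import ipaddress

    # Toy forwarding table: prefix -> interface used to reach it (invented).
    fib = {
        ipaddress.ip_network("10.0.0.0/8"):     "eth0",
        ipaddress.ip_network("192.168.1.0/24"): "eth1",
    }

    def urpf_accept(src_ip: str, arrival_iface: str) -> bool:
        """Strict uRPF: accept only if the longest-prefix-match route back
        to the packet's source uses the interface the packet arrived on."""
        src = ipaddress.ip_address(src_ip)
        matches = [net for net in fib if src in net]
        if not matches:
            return False  # no route back to the source: drop
        best = max(matches, key=lambda n: n.prefixlen)
        return fib[best] == arrival_iface

    print(urpf_accept("10.1.2.3", "eth0"))  # True: expected interface
    print(urpf_accept("10.1.2.3", "eth1"))  # False: looks spoofed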
The biggest reason it isn't universal is that the internet is still somewhat like the American "Wild West" of the 1800s, with ISPs playing the role of a town's sheriff. The policies and competency of engineering / operational personnel vary, and many ISPs are so busy making things "work" that they don't have cycles to make things "work well".
That dynamic is particularly true in smaller countries; I worked for a large network equipment manufacturer in a previous life and occasionally traveled throughout Southeast Asia conducting ISP seminars. Smaller countries are often half a decade (or more) behind the practices and competency of ISPs here in the US (that's not to say that US ISPs are terribly great on the whole either, but they are generally much better off than, say, some of the ISPs operating on the smaller islands in the Pacific).
This results in the non-trivial amount of spam / hacker traffic on the internet today; it's there because the attackers have no lack of places to hide, and source IP address spoofing is one of their first lines of defense.

Are there implementations of algorithms for community detection in graphs? [closed]

I am looking for implementations of community detection algorithms, such as the Girvan-Newman algorithm (2002). I have visited the websites of several researchers in this field (Newman, Fortunato, etc.) but was unable to find any code. I imagine someone out there has published implementations of these algorithms (maybe even a toolkit?), but I can't seem to find them.
Community detection algorithms are sometimes part of a library (such as JUNG for Java) or a tool (see Gephi). When authors publish a new method, they sometimes make their code available; for example, the Louvain and Infomap methods.
Side note: the Girvan-Newman algorithm is sometimes still used, but it has mostly been replaced by faster and more accurate methods. For a good overview of the topic, I recommend Community detection algorithms: a comparative analysis or the longer Community detection in graphs (103 pages).
You should have a look at the igraph library:
8 community detection algorithms (including those mentioned above):
Edge betweenness (Girvan-Newman link centrality-based approach),
Walktrap (Pons-Latapy random walk-based approach),
Leading Eigenvectors (Newman's spectral approach),
Fast Greedy (Clauset et al., modularity optimization),
Label Propagation (Raghavan et al.),
Louvain (Blondel et al., modularity optimization),
Spinglass (Reichardt-Bornholdt, modularity optimization),
InfoMap (Rosvall-Bergstrom, compression-based approach).
Other related functions: compute modularity, handle hierarchical structures, etc.
Available in R, C and Python
Open source
In my opinion, it is the most complete tool for community detection.
For more details, also check: What are the differences between community detection algorithms in igraph?
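A minimal sketch of calling two of those methods from igraph's Python interface (the built-in Zachary karate-club graph stands in for your own data):

    import igraph as ig

    g = ig.Graph.Famous("Zachary")  # built-in toy network

    # Girvan-Newman: edge-betweenness dendrogram, cut at best modularity.
    gn = g.community_edge_betweenness().as_clustering()

    # Louvain-style multilevel modularity optimization.
    louvain = g.community_multilevel()

    for name, clustering in [("edge betweenness", gn), ("louvain", louvain)]:
        print(f"{name}: {len(clustering)} communities, "
              f"Q = {g.modularity(clustering.membership):.3f}")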
You can try the SNAP library (Stanford Network Analysis Platform, http://snap.stanford.edu/), which includes the Modularity, Girvan-Newman, and Clauset-Newman-Moore algorithms. It's written in C++ and is under the BSD licence. As a number of papers have used it (see http://snap.stanford.edu/papers.html), it should be good.
We have recently implemented our algorithm, which is based on the Constant Potts Model, fast Louvain optimization, and the reliable map equation of InfoMap, for weighted and signed networks. Here is the open source Java project + an executable jar.
