What is the difference (if any) between a node and a vertex? I can't find the answer after looking at countless sites! Even my book doesn't specify it so I am kind of lost!
It is worth mentioning that I am looking for the difference besides the fact that it is called a 'vertex' when used in a graph and a 'node' when used in a tree.
There are no differences between the words Node and Vertex. Even in some books that explain graph theory and graph algorithms they name it as:
Vertex denoted by v, and sometimes it's called nodes also
There are neither major nor minor differences between them.
This is mentioned in the book Data Structures and Algorithms with Object-Oriented Design Patterns in C# by Bruno R. Preiss.
In "The Practitioner's Guide to Graph Data", the authors avoid the term "node/nodes" and only use vertex/vertices, and they explain why as follows:
...because we are focusing on distributed graphs, and nodes has different meanings in distributed systems, graph theory and computer science.
In distributed systems, a node can be a client, server or peer, while in computer networking it can be a computer or a modem. In computer science, as you already pointed out, it can be used either for a graph or for a tree.
So in the context of graph theory, node and vertex are used interchangeably. But if you would like to be clear and avoid any misunderstanding, vertex/vertices is the way to go.
I think the two terminologies come from different perceptions of graphs and networks. Albert-László Barabási writes in his recent textbook:
"In the scientific literature the terms network and graph are used interchangeably:
Network science | Graph theory
Network | Graph
Node | Vertex
Link | Edge
Yet, there is a subtle distinction between the two terminologies: the {network, node, link} combination often refers to real systems: The WWW is a network of web documents linked by URLs; society is a network of individuals linked by family, friendship or professional ties; the metabolic network is the sum of all chemical reactions that take place in a cell. In contrast, we use the terms {graph, vertex, edge} when we discuss the mathematical representation of these networks: We talk about the web graph, the social graph (a term made popular by Facebook), or the metabolic graph. Yet, this distinction is rarely made, so these two terminologies are often synonyms of each other."
<tl;dr> Same, same, but different.
There is no difference between a node and a vertex. Most books use V to denote the vertices of a graph. I've seen node mostly associated with trees.
For instance, you may have come across O(V + E) being used to represent the time complexity for depth first search and breadth first search graph traversals.
Similarly, V is used as part of time complexity analysis for other graph algorithms like Prim's, Kruskal's, etc.
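To make the O(V + E) bound concrete, here is a minimal Python sketch of breadth-first search that counts its own work. The function name `bfs_counts` and the adjacency-list layout are assumptions made for illustration only. Each reachable vertex is dequeued once, and each adjacency-list entry is scanned once, which is where the V and E terms come from.

```python
from collections import deque

def bfs_counts(adj, start):
    """Breadth-first search that also counts how much work it does.

    adj is an adjacency list: {vertex: [neighbouring vertices]}.
    Returns (visit order, vertices dequeued, adjacency entries scanned).
    """
    seen = {start}
    order = []
    queue = deque([start])
    vertices_dequeued = 0
    entries_scanned = 0
    while queue:
        v = queue.popleft()
        vertices_dequeued += 1          # the "V" part of O(V + E)
        order.append(v)
        for w in adj[v]:
            entries_scanned += 1        # the "E" part of O(V + E)
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order, vertices_dequeued, entries_scanned

# A small undirected graph with 4 vertices and 4 edges.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
order, v_count, e_count = bfs_counts(graph, "a")
```

In an undirected adjacency list every edge appears twice, so the scan count here is 2E, which is still O(E).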
As my question title suggests, I am confused about the fat-tree structure.
I am trying to write a program, where I get a certain number of nodes as my input and I should generate an output that builds a fat-tree topology out of them.
For example, if my input is 4, my output must represent a fat-tree topology made of 4 nodes (n1, n2, n3, n4).
As far as I could read, a fat-tree topology depends only on the number of ports rather than the number of nodes. This is why I am confused about whether it is possible at all to create a fat-tree structure with the number of nodes as my only input.
I am very new to networking concepts, so I would appreciate any guidance.
If I understood the question, you have a certain number of nodes as input, and you want to build a FatTree topology with these nodes.
Unfortunately, you cannot create a complete FatTree topology with an arbitrary number of nodes.
If you are confused about the construction, I suggest having a look at this link
For my master's thesis, I explored some data center topologies and their feasibility for network tomography-based monitoring applications. This resulted in a few Python models (FatTree included), implemented using the networkx library, that are available on GitHub. The code is not the prettiest, especially the visualisation parts, and could surely be improved, but I hope it can still be useful for gaining an intuition about how these topologies scale.
If you start playing around with the different scales of the FatTree, you will quickly see that Giuseppe is right. A fat-tree has a very strict structure that depends only on the port number parameter. It is therefore indeed not possible to construct a fat-tree with an arbitrary number of nodes.
Although I'm late in answering this, and others have already given the correct answer, I'd still like to add some value with respect to FatTree topology design.
For a fat-tree topology built from k-port switches, you can derive these values from tree data structure properties and the topology requirements:
- number of core switches = (k/2)^2
- number of pods = k
- number of aggregation switches in each pod = k/2
- number of edge switches in each pod = k/2
- each aggregation switch connected to k/2 core switches and k/2 edge switches
- each edge switch connected to k/2 aggregation switches and k/2 nodes
- i.e., each pod consists of (k/2)^2 nodes
- number of nodes that can be connected to the network = (k^3)/4
Since the number of servers that can be connected to this network is expressed in terms of k, you can clearly see that you can't create a fat-tree topology with just any number of nodes. The node count can only take the form (k^3)/4 for even values of k (so that the counts above are integers), e.g., 16 for k = 4, 54 for k = 6, and so on. In other words, you can't have a proper fat-tree topology with an arbitrary node count that cannot be expressed this way!
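The counts listed above can be turned into a small Python sketch. The function names `fat_tree_sizes` and `ports_for_nodes` are made up for illustration; the formulas are the ones from the list, nothing more.

```python
def fat_tree_sizes(k):
    """Element counts of a fat-tree built from k-port switches (k even)."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return {
        "pods": k,
        "core_switches": half * half,          # (k/2)^2
        "aggregation_switches": k * half,      # k/2 per pod
        "edge_switches": k * half,             # k/2 per pod
        "nodes_per_pod": half * half,          # (k/2)^2
        "nodes": k * half * half,              # k^3 / 4
    }

def ports_for_nodes(n):
    """Return the even k with k^3/4 == n, or None if no such k exists."""
    k = 2
    while k ** 3 // 4 < n:
        k += 2
    return k if k ** 3 // 4 == n else None

sizes = fat_tree_sizes(4)
```

With this, `ports_for_nodes(4)` returns `None`, which matches the point made above: 4 nodes cannot fill a complete fat-tree, while 16 (k = 4) or 54 (k = 6) can.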
I am given a list of edges (each of which has 'From' and 'To' properties that specify which vertices it connects).
I want to either return null if they don't form a cycle in this (undirected) graph, or return the list of vertices forming a cycle.
Does anyone know how I would go about such a problem? I am clueless.
The way I've been taught to do this involves storing a list of visited vertices.
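Navigate through the graph, adding each vertex you reach to the list. At each step, compare the current vertex to the list - if it is present, you've visited it before, and are therefore in a cycle. Because the graph is undirected, don't count stepping straight back along the edge you just arrived by.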
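A minimal Python sketch of this visited-list idea, assuming each edge is given as a (from, to) pair. The tuple layout and the names `find_cycle` and `_close_cycle` are illustrative choices, not something from the question. It uses a breadth-first search and remembers, for each vertex, which vertex and edge it was reached from, so that the cycle itself can be reconstructed.

```python
from collections import deque

def find_cycle(edges):
    """Return a list of vertices forming a cycle, or None if there is none.

    edges is a list of (from_vertex, to_vertex) pairs of an undirected graph.
    """
    adj = {}
    for index, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, index))
        adj.setdefault(v, []).append((u, index))

    parent = {}  # vertex -> (parent vertex, index of the edge used to reach it)
    for root in adj:
        if root in parent:
            continue  # already explored as part of an earlier component
        parent[root] = (None, None)
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for w, index in adj[v]:
                if index == parent[v][1]:
                    continue  # the edge we arrived by is not a cycle
                if w not in parent:
                    parent[w] = (v, index)
                    queue.append(w)
                else:
                    return _close_cycle(parent, v, w)
    return None

def _close_cycle(parent, v, w):
    """Join the tree paths of v and w at their lowest common ancestor."""
    path_v = []
    x = v
    while x is not None:
        path_v.append(x)
        x = parent[x][0]
    position = {x: i for i, x in enumerate(path_v)}
    path_w = []
    x = w
    while x not in position:
        path_w.append(x)
        x = parent[x][0]
    # v ... ancestor, then back down to w; the edge w-v closes the cycle.
    return path_v[:position[x] + 1] + path_w[::-1]

triangle = find_cycle([(1, 2), (2, 3), (3, 1)])
no_cycle = find_cycle([(1, 2), (2, 3)])
```

Edges are tracked by their index rather than by endpoints, so two parallel edges between the same pair of vertices are still reported as a cycle.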
Algorithms of this type are called Graph Cycle Detection Algorithms. There are some intricacies about which algorithm to select based on the needs of the application or the context of the problem. For example: do you want to find the first cycle, the shortest cycle, the longest cycle, or all of the cycles? Is the graph directed or undirected?
There are numerous cycle detection algorithms to select from, depending on the need and the allowable computational complexity and the nature of the cycle (e.g. finding first, longest, etc.). Some common algorithms include the following:
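Floyd's Algorithm (finds a cycle in a sequence of iterated function values, e.g. a linked list)
Brent's Algorithm (same setting as Floyd's; see also here)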
Tarjan's Strongly Connected Components Algorithm (see also this Stack Overflow post)
The specific algorithm you select will depend on your needs and on how efficient the algorithm must be. If serious efficiency is needed, I would suggest looking through some scholarly articles on the topic and comparing the trade-offs of the various algorithms.
I know there are some well-known graph partitioning tools like METIS, which is implemented by the Karypis Lab (http://glaros.dtc.umn.edu/gkhome/metis/metis/overview),
but I want to know: is there any method to partition a graph stored in Neo4j?
Or do I have to dump the Neo4j data and manually transform the node and edge format to fit the METIS input format?
Regarding new-ish and interesting algorithms, this is by no means exhaustive or state of the art, but these are the first places I would look:
Specific Algorithm: DiDiC (Distributed Diffusive Clustering) - I used it once in my thesis (Partitioning Graph Databases)
You iterate over all nodes, then for each node retrieve all of its neighbors, in order to spread a portion of "some unit" to those neighbors
Easy to implement.
Can be made deterministic
Iterative - as it's based on iterations (like Super Steps in Pregel) you can stop it at any time. The longer you leave it the better the result, in theory (though in some cases, on certain graph shapes it can be unstable)
When we implemented this we ran it for 100 iterations on a machine with ~30GB RAM, for up to ~4 million nodes - it took no more than two days to complete.
Specific Algorithm: EvoCut "Finding sparse cuts locally using evolving sets" - local probabilistic algorithm from Microsoft - related to these papers
Difficult to implement
Local algorithm - BFS-like access patterns (random walks)
It's been a while since I read that paper, but I remember it was built on clean abstractions:
EvoNibble (pluggable - decides how much of the neighborhood to add to the current cluster)
EvoCut (calls EvoNibble multiple times to find the local cluster)
EvoPartition (calls EvoCut repeatedly to partition entire graph)
Not deterministic
General Algorithm Family: Hierarchical Graph Clustering
From a high level:
Coarsen the graph by collapsing nodes into aggregate nodes
coarsening strategy is selectable
Find clusters in the coarsened/smaller graph
clustering strategy is selectable
Incrementally decoarsen the graph, refining the clustering at each step
refining strategy is selectable
Notes:
If the graph changes slowly (or results don't need to be right up to date) it may be possible to coarsen once (or infrequently) then work with the coarsened graph - to save computation
I don't know of a specific algorithm to recommend
General limitations - the things few clustering algorithms do:
Node types not acknowledged - i.e., all nodes treated equally
Relationship types not acknowledged - i.e., all relationships treated equally
Relationship direction not acknowledged - i.e., relationships treated as undirected
Having worked independently with METIS and Neo4j in the past, I am not aware of any tool for generating a METIS file from Neo4j. That being said, writing such a tool should be an easy task and would be a great community contribution.
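As a rough idea of what such a tool would have to produce, here is a Python sketch that writes the plain (unweighted) METIS graph format as I understand it from the METIS manual: a header line with the vertex and edge counts, then one line per vertex listing its neighbours, with vertices numbered from 1. The export from Neo4j is not shown. The sketch assumes the relationships have already been pulled out as pairs of integer node ids, and the name `to_metis` is made up.

```python
def to_metis(edges):
    """Build METIS graph-file text from undirected (node_id, node_id) pairs.

    Returns (text, id_map), where id_map maps the original node ids to the
    1-based vertex numbers used in the file.
    """
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops are dropped; the format does not allow them
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    nodes = sorted(neighbours)
    id_map = {node: number for number, node in enumerate(nodes, start=1)}
    # Every undirected edge was stored twice, once per endpoint.
    edge_count = sum(len(ns) for ns in neighbours.values()) // 2

    lines = [f"{len(nodes)} {edge_count}"]
    for node in nodes:
        numbers = sorted(id_map[n] for n in neighbours[node])
        lines.append(" ".join(str(n) for n in numbers))
    return "\n".join(lines) + "\n", id_map

text, id_map = to_metis([(10, 20), (20, 30), (10, 30), (30, 40)])
```

Note the limitations of the sketch: nodes without any relationship never appear in the edge list and are therefore left out, and relationship direction, types and weights are ignored. The returned `id_map` is what you would need to map the partition METIS computes back onto the Neo4j nodes.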
Another approach for integrating METIS with Neo4j might be in connecting METIS to Neo4j from C++ via JNI. However this is going to be much more involved as it would have to take care of things like transactions, concurrency etc.
On the more general question of partitioning graphs, it is quite possible to implement some of the better-known, simpler algorithms with reasonable effort.
So I have an undirected, unweighted graph. It contains cycles. I would like to find the path which visits the most nodes with no repeat visits to any node. Since this is a graph traversal, you can start and end at any node you like.
Background Research:
I have looked at Travelling Salesman Problem (TSP); this problem is different and does NOT allow you to finish where you started from and there are no weights. I have looked at several other algorithms, but have found none suitable for this problem.
Graph Size: There are 100 nodes in the graph; with 10 disconnected nodes.
UPDATE: I have moved this to: https://math.stackexchange.com/questions/243375/what-is-the-maximum-number-of-nodes-i-can-traverse-in-an-undirected-graph-visiti
Look for the Hamiltonian Cycle problem
http://en.wikipedia.org/wiki/Hamiltonian_cycle
You should take a look at the Wikipedia entry, which has an algorithm for acyclic graphs. Your graph has cycles, which makes your problem NP-hard.
I would try and create a DAG with nodes representing strongly connected components. Then you could at least find the path that visits the most strongly connected components. You could then expand that path by replacing the individual (strongly connected components) nodes with the longest paths in each of the subgraphs.
Finding the longest paths in the subgraphs is now the same as your original problem, but at least your graphs are smaller. If you're in luck, the subproblems are easy and you're done. In the general case they might not be so small, and you could use some advanced heuristics. Maybe have a look at this paper or this question (you could use the answer there to solve your problem completely, but I'm not sure)
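For subgraphs that are small enough, the longest simple path can also be found by plain exhaustive backtracking. The Python sketch below is a generic brute-force search, not the method from the linked paper or question, and the name `longest_simple_path` is made up. Its running time is exponential in the worst case, so it is only an option for small components.

```python
def longest_simple_path(adj):
    """Return one longest path that visits no vertex twice.

    adj is an adjacency list: {vertex: [neighbouring vertices]}.
    Exhaustive search: exponential time in the worst case.
    """
    best = []

    def extend(path, on_path):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for w in adj[path[-1]]:
            if w not in on_path:
                path.append(w)
                on_path.add(w)
                extend(path, on_path)
                path.pop()           # backtrack and try the next neighbour
                on_path.remove(w)

    for start in adj:                # the path may start at any vertex
        extend([start], {start})
    return best

graph = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "d"],
    "d": ["b", "c"],
    "e": [],                         # a disconnected node
}
path = longest_simple_path(graph)
```

In this example the result has four vertices: it starts at "a", goes to "b", and then covers "c" and "d" in one of the two possible orders.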
I have graphs with thousands of nodes to millions of nodes. I want to detect all possible cycles in such graphs.
I use a hash table to store the edges. ( (source node,edge weight) -> (target node) ).
What would be an efficient way of implementing it in OCaml?
It looks like Tarjan's algorithm is the best one.
What would be the most efficient implementation of it?
Yes, Tarjan's algorithm for strongly connected components is a good solution. You may also use so-called path-based strong component algorithms which have (when done carefully) comparable linear complexity.
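For reference, here is a compact sketch of Tarjan's strongly connected components algorithm. It is written in Python rather than OCaml purely to show the structure, and the name `strongly_connected_components` and the adjacency-list input are assumptions. The recursive form shown here would hit the recursion limit on graphs with millions of nodes, where an explicit stack would be needed instead.

```python
def strongly_connected_components(adj):
    """Tarjan's algorithm. adj maps each vertex to a list of successors."""
    index = {}        # discovery number of each visited vertex
    lowlink = {}      # smallest discovery number reachable from the vertex
    stack = []
    on_stack = set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in adj.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in adj:
        if v not in index:
            visit(v)
    return components

graph = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: []}
components = strongly_connected_components(graph)
```

Note what this does and does not give you: every component with more than one vertex contains at least one cycle, and every cycle lies inside a single component, but the components are not a list of the individual cycles.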
If you pick reasonable data structures, they should work. It's hard to say much more before you have implemented and profiled a prototype.
I don't understand what your graph representation is: are your hash keys really a (node,weight) couple? Then how do you find all neighbors of a given node? For a large graph structure you should optimize access time, of course, but also memory efficiency.
If you really want to find all possible cycles, the problem seems at least exponential in the worst case. For a complete graph, every nonempty subset of nodes gives you a different cycle (including a link from the last back to the first). Furthermore, every distinct cyclic ordering of each subset gives you a different cycle. Depending on the sparsity of your graphs, the problem could be tractable in practice.
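To get a feel for how fast this grows, the following Python sketch counts the simple cycles of a complete directed graph on n vertices. It assumes there are no self-loops, so a cycle needs at least 2 vertices: a subset of k vertices can be arranged into (k - 1)! distinct directed cycles, and there are C(n, k) such subsets. The function name is made up.

```python
from math import comb, factorial

def cycles_in_complete_digraph(n):
    """Number of simple directed cycles in a complete digraph on n vertices.

    Assumes no self-loops: each k-subset (k >= 2) contributes (k - 1)! cycles.
    """
    return sum(comb(n, k) * factorial(k - 1) for k in range(2, n + 1))

small = [cycles_in_complete_digraph(n) for n in (2, 3, 4)]
```

For n = 2, 3 and 4 this gives 1, 5 and 20 cycles. The largest term alone is (n - 1)!, so enumerating every cycle of a dense graph with thousands of nodes is out of reach regardless of the implementation language.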