R: merging nodes that share the same interactions in a network table

I have a large network table that I want to simplify by merging nodes that share the same interactions, so that the network is easier to read once visualized (I am using Cytoscape). The interactions do not have direction. As a mini example, suppose I have a table such as the one below.
A E
B E
C G
C H
D G
H D
E F
R S
The two columns are nodes that interact with each other. In this case since nodes A, B and F all only have connections to node E, I want to merge them so it's A,B,F as one node that interacts with E. Similarly since both C and D only interact with G and H I would want to merge them together. The resulting table should look something like below.
A,B,F E
C,D G
C,D H
R S
I have created a list with all the nodes, but I am not sure how to see if they have matching interactions since they can be in either column. Is there a good way/program to handle this?

For most networks, there is no single solution to this problem. Once you start collapsing, you begin to exclude other equally valid solutions. For example, even in your simplified example, these are other valid solutions:
A,B,F E
G,H C
G,H D
R S
or even
A,B,F E
C,D G,H
R S
And it gets more complicated once you include interactions that cut across your relatively clean example, e.g., A G. But if your data really is partitioned like this example (which should be immediately apparent when you load it into iGraph or Cytoscape and perform any basic layout to see a bunch of separate networks), then you might be able to get a solution by querying all interaction partners for each node, then collapsing based on identical partner sets.
I don't know of a function in iGraph or Cytoscape that can do this for you.
And if your network is not a partitioned set of multiple networks, then I don't think this is feasible at all.
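The "collapse on identical partner sets" idea can be sketched in a few lines of Python. This is only an illustration under the assumption that the table has been read into a list of node pairs; the function name `merge_identical_neighbors` is made up for this example and is not part of iGraph or Cytoscape.

```python
from collections import defaultdict

def merge_identical_neighbors(edges):
    """Merge nodes whose partner sets are identical (undirected edges)."""
    # Collect the set of interaction partners for every node.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Group nodes by their (hashable) partner set.
    groups = defaultdict(list)
    for node in sorted(neighbors):
        groups[frozenset(neighbors[node])].append(node)

    # Give every node the comma-joined label of its group.
    label = {}
    for members in groups.values():
        name = ",".join(members)
        for node in members:
            label[node] = name

    # Rewrite the edges with merged labels and drop duplicates.
    merged = {tuple(sorted((label[a], label[b]))) for a, b in edges}
    return sorted(merged)

edges = [("A", "E"), ("B", "E"), ("C", "G"), ("C", "H"),
         ("D", "G"), ("H", "D"), ("E", "F"), ("R", "S")]
for left, right in merge_identical_neighbors(edges):
    print(left, right)
```

On the example table this merges on both sides at once, so it produces the last variant listed above (A,B,F with E; C,D with G,H; R with S) rather than the table in the question.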

Do I understand correctly that your Cytoscape graph contains all of these elements and that the 'interactions' are edges in Cytoscape? If so, you could, for example (I don't think this will be a fast solution), get a collection of all your nodes with
cy.nodes();
and then call
cy.edges("[target = '" + nodeId + "']").length;
for the id of every node in that collection and save the resulting edge count for each node. Note that this only counts edges whose target is that node; since your interactions have no direction, run the same query with source instead of target as well and add the two counts. This way you can find all nodes with at least x nodes attached, and then you can do whatever you want with them. You could, e.g., turn the selected nodes into parents, so that the connected nodes sit inside the parent nodes.
Please tell me if this helps you or not :)

Related

Representation of graphs in a hash table

I'm currently writing my master's thesis about clusterings in graphs. My prof said he wants the graph to be represented as a hash table, because it needs less space than an adjacency matrix and is faster than adjacency lists at checking whether an edge exists between two vertices.
Anyway, I have a lot of problems understanding how a graph can be built with (perfect) hash functions. I know there should be two tables, one nested inside the other. The first contains every node and the second contains all the adjacent vertices. But how do I find a hash function that does this correctly?
After I built the graph I have to assign a weight to each edge. Is it better to build a new graph or keep the old one? How can I assign the weights correctly to each edge and how do I save it?
And the last question: How fast can I do a degree query for one vertex? O(1)?
Sorry for all these questions but I read so many papers and I'm still confused.
Thank you in advance for any help!!!
Lisa
You have to ask your professor, but I would assume it is something simple.
E.g. let us say you have a triangle A,B,C then in the hash you just represent it as
A {B,C}
B {A,C}
C {A,B}
So the edge A,B is stored twice and can be looked up from either A's entry or B's entry.
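As a rough sketch of that layout (not necessarily what the professor has in mind), a hash table of hash tables can be written in Python with nested dicts. The class and method names here are invented for illustration. Python's built-in dict is an ordinary hash table, not a perfect hash, so the O(1) figures below are average-case.

```python
from collections import defaultdict

class HashGraph:
    """Undirected weighted graph stored as a hash table of hash tables."""

    def __init__(self):
        # Outer table: vertex -> inner table; inner table: neighbor -> weight.
        self.adj = defaultdict(dict)

    def add_edge(self, u, v, weight=1.0):
        # Store the edge under both endpoints so either one can find it.
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def has_edge(self, u, v):
        # Two hash lookups: O(1) on average.
        return u in self.adj and v in self.adj[u]

    def degree(self, u):
        # The size of the inner table is stored, so this is O(1).
        return len(self.adj[u]) if u in self.adj else 0

g = HashGraph()
g.add_edge("A", "B", 0.5)
g.add_edge("A", "C", 2.0)
g.add_edge("B", "C", 1.5)
print(g.has_edge("B", "A"), g.degree("A"), g.adj["C"]["B"])
```

Because the weight lives in the inner table, assigning or updating weights later does not require building a second graph; calling `add_edge` again with a new weight overwrites the old one.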

Gremlin query cosmosDB to find a connection

I have a graph modeled this way. A --calls('for', 'Item1')--> B --calls('for', 'Item1')--> C --calls('for', 'Item1')--> D.
A calls B for Item1 (a property of the edge). B calls C and C calls D. There can be other chains in the graph as well in which some vertex calls D for Item1. How can I determine all such chains, i.e. all the ways in which D can be called for Item1?
Apologies if the question is too basic. My knowledge on graph is very minimal and I am using cosmosDB to model this.
I suppose I would start at "D" and follow "Item1" paths from there using repeat(). Assuming "D" is the actual T.id (element identifier):
g.V("D").repeat(inE('calls').has('for','Item1').outV()).emit().path()
The above is the beginnings of such a query. You might need a terminating condition to that repeat() loop and methods to avoid cycles (i.e. simplePath()) if your graph contains such things so that you avoid infinite traversals along such paths.
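To make the shape of that traversal concrete, here is a plain-Python sketch of the same idea on an in-memory edge list. It is not Gremlin and does not talk to CosmosDB; the function name and the `(caller, callee, item)` tuple layout are assumptions for illustration. It walks incoming edges backwards from the target, emits every partial chain, and skips vertices already on the current chain, which plays the role of the cycle guard mentioned above.

```python
def caller_chains(edges, target, item):
    """Return every simple chain of callers leading to `target` for `item`."""
    # Index incoming edges, keeping only those made for the requested item.
    incoming = {}
    for caller, callee, for_item in edges:
        if for_item == item:
            incoming.setdefault(callee, []).append(caller)

    chains = []

    def walk(path):
        for caller in incoming.get(path[-1], []):
            if caller in path:
                continue  # already on this chain: avoid cycling forever
            new_path = path + [caller]
            chains.append(new_path)  # emit every intermediate chain
            walk(new_path)

    walk([target])
    return chains

calls = [("A", "B", "Item1"), ("B", "C", "Item1"), ("C", "D", "Item1"),
         ("X", "D", "Item2"), ("Y", "C", "Item1")]
for chain in caller_chains(calls, "D", "Item1"):
    print(" <- ".join(chain))
```

Each chain is listed from D outwards, so reading it right to left gives the call order.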

Enumerate all paths from a single source in a graph

I was wondering if you are aware of an algorithm to enumerate all possible simple paths in a graph from a single source, without repeating any of the vertices. Keep in mind that the graph will be very small (16 nodes) and relatively sparse (2-5 edges per node).
To make my question clear:
Vertices: A,B,C
A connects to B, C
B connects to A, C
C connects to A, B
Paths (from A):
A,B
A,C
A,B,C
A,C,B
Vertices: A,B,C,D
A connects to B, C
B connects to A, C, D
C connects to A, B, D
Paths (from A):
A,B
A,C
A,B,C
A,B,D
A,C,B
A,C,D
A,B,C,D
A,C,B,D
It is surely not plain BFS or DFS, although one of their possible variants might work. Most of the similar problems I saw on SO dealt with paths between a pair of nodes, so my problem is slightly different.
Also, the question "Find all possible paths from one vertex in a directed cyclic graph in Erlang" is related, but the answers are too Erlang-specific, or it is not clear what exactly needs to be done. As I see it, the algorithm could alternatively be described as finding all possible simple paths for a given number of hops from a single source. Then for each number of hops (1 to N) we could find all solutions.
I work with Java but even a pseudocode is more than enough help for me.
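In Python style, it is a DFS that un-marks each vertex when it backtracks, so the same vertex can appear again on other paths:
    def multiple_path(graph, path, node, visited):
        # graph maps each vertex to a list of the vertices it connects to.
        visited.add(node)
        path.append(node)
        if len(path) > 1:
            print(path)  # every path with at least one edge is a result
        for vertex in graph.get(node, ()):
            if vertex not in visited:
                multiple_path(graph, path, vertex, visited)
        # Backtrack: remove the vertex so other paths may pass through it.
        path.pop()
        visited.remove(node)

    # Example call for the first graph in the question:
    multiple_path({"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}, [], "A", set())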

Neo4J - Extracting graph as a list based on relationship strength

I have a typical friend of friend graph database i.e. a social network database. The requirement is to extract all the nodes as a list in such a way that the least connected nodes appear together in the list and the most connected nodes are placed further apart in the list.
Basically it's asking for a graph to be represented as a list, and I'm not sure if we can really do that. E.g., if A is related to B with strength 10, B is related to C with strength 80, and A to C is 20,
then how do we place this in a list?
A, B, C - no, because then A is relatively more distant from C than from B, which is not the case
A, C, B - yes, because A and B are less related than A,C and C,B.
With 3 nodes it's very simple, but with a lot of nodes, is it possible to put them in a list based on relationship strength?
Ok, I think this is maybe what you want. An inverse of the shortestPath traversal with weights. If not, tell me how the output should be.
http://console.neo4j.org/r/n8npue
MATCH p=(n)-[*]-(m) // search all paths
WHERE n <> m
AND ALL (x IN nodes(p) WHERE length([x2 IN nodes(p) WHERE x2=x])=1) // this filters simple paths
RETURN [n IN nodes(p)| n.name] AS names, // get the names out
reduce(acc=0, r IN relationships(p)| acc + r.Strength) AS totalStrength // calculate total strength produced by following this path
ORDER BY length(p) DESC , totalStrength ASC // get the max length (hopefully a full traversal), and the minimum strength
LIMIT 1
This is not going to be efficient for a large graph, but I think it's definitely doable--probably needs using the traversal/graphalgo API shortest path functionality if you need speed on a large graph.

Directed Acyclical Graph Traversal... help?

I'm a little out of my depth here and need to phone a friend. I've got a directed acyclic graph I need to traverse and I'm stumbling into graph theory for the first time. I've been reading a lot about it lately, but unfortunately I don't have time to figure this out academically. Can someone give me a kick with some help as to how to process this tree?
Here are the rules:
there are n root nodes (I call them "sources")
there are n end nodes
source nodes carry a numeric value
downstream nodes (I call them "worker" nodes) perform various operations on the incoming values like Add, Mult, etc.
As you can see from the graph below, nodes a, b, and c need to be processed before d, e, or f.
What's the proper order to walk this tree?
I would look into linearization of DAGs, which should be achievable through a topological sort.
Linearization, from what I remember, basically sorts the nodes in an order that maintains this invariant: for every node NodeX that has an outgoing edge to some node NodeA, NodeX appears before NodeA.
This would mean that, from your example, nodes a,b, and d would be processed first. Node c second. Nodes e and f, last.
http://en.wikipedia.org/wiki/Topological_sorting
You need to process the nodes via a Topological sort. The sort is not necessarily unique so there might be more than one available order (not that this should matter anyway).
The linked wikipedia page should have concrete algorithms to help you.
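As one possible concrete version, here is a minimal sketch of Kahn's algorithm (one of the algorithms described on that Wikipedia page) in Python. The example graph is hypothetical, since the picture from the question is not available here; it only assumes that sources a, b and c feed workers d, e and f.

```python
from collections import deque

def topological_order(successors):
    """Order nodes so that every node comes after all of its inputs."""
    # Count incoming edges for every node.
    indegree = {node: 0 for node in successors}
    for targets in successors.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    # Start with the nodes that have no inputs (the "sources").
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # This node is done, so its successors wait on one input fewer.
        for target in successors.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order

# Hypothetical wiring: a and b feed d, b and c feed e, c feeds f.
dag = {"a": ["d"], "b": ["d", "e"], "c": ["e", "f"],
       "d": [], "e": [], "f": []}
print(topological_order(dag))
```

Walking the nodes in the returned order guarantees that each worker's incoming values have already been computed by the time it is processed.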

Resources