Longest path in adjacency matrix in R

A hypothetical scenario as a descriptive example: I have a model consisting of 10 parts (vertices) that are to be put together. Each part can be connected to others (edges) as defined by a connection table.
igraph has a shortest.paths function, but the aim here is to find the longest path in the adjacency matrix, that is, a path that uses as many parts as possible, ideally all of them, so that no part of the model is left over in the end. An MWE follows:
library(igraph)
connections <- read.table(text="A B
1 2
1 7
1 9
1 10
2 7
2 9
2 10
3 1
3 7
3 9
3 10
4 1
4 6
4 7
7 5
7 9
7 10
8 9
8 10
9 10", header=TRUE)
# Build the adjacency matrix from the edge list (current igraph function names)
adj <- as_adjacency_matrix(graph_from_edgelist(as.matrix(connections), directed=FALSE))
g1 <- graph_from_adjacency_matrix(adj, weighted=TRUE, mode="undirected")
plot(g1)
Edit:
The result should look something like this: if the first part of the model is 8, it can be combined with 9 or 10. Say 10 is selected; the next part can then be 1, 2, 7 or 9. If 9 is selected next, the follow-up could be 1, 2, 3, 7 or 8. If 8 is then selected, the model is finished, as part 10 is already in use. The question is how to find a way (path) to put together as many parts as possible, ideally all of them. The latter would only be possible by starting with 6 or 5.

Your graph contains cycles, and you did not originally state that the same vertex (part) cannot be used more than once. In that case the longest path could be infinitely long, because you can traverse a cycle as many times as you like and then proceed to your destination.
As per your edit, I take it that this is not allowed. I think you can use dynamic programming for this. Start with a DFS-like algorithm and mark every vertex except the starting one as unvisited. Then recurse: from the current vertex, take the maximum over the longest paths starting at each vertex that is reachable from it and not yet visited.
The problem is NP-hard, so in the worst case you have to check all possible paths!
See https://en.wikipedia.org/wiki/Longest_path_problem . You will have to modify the algorithm there to work on graphs with cycles by adding, as stated above, a flag that records which vertices have already been visited.
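The visited-flag search described above can be sketched as follows. This is a minimal illustration in Python rather than R, and the function names (`build_adjacency`, `longest_simple_path`) are made up for this example. It tries every start vertex, so it is exponential in the worst case and only practical for small graphs such as the 10-part model in the question.

```python
from collections import defaultdict

# Edge list copied from the connection table in the question.
EDGES = [(1, 2), (1, 7), (1, 9), (1, 10), (2, 7), (2, 9), (2, 10),
         (3, 1), (3, 7), (3, 9), (3, 10), (4, 1), (4, 6), (4, 7),
         (7, 5), (7, 9), (7, 10), (8, 9), (8, 10), (9, 10)]


def build_adjacency(edges):
    """Undirected adjacency sets keyed by vertex."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def longest_simple_path(adjacency):
    """Exhaustive DFS with backtracking over all start vertices."""
    best = []
    total = len(adjacency)

    def extend(path, visited):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for nxt in sorted(adjacency[path[-1]]):
            if len(best) == total:
                return  # every vertex is used; nothing can be longer
            if nxt not in visited:
                visited.add(nxt)       # mark as used
                path.append(nxt)
                extend(path, visited)
                path.pop()             # backtrack
                visited.remove(nxt)

    for start in sorted(adjacency):
        extend([start], {start})
        if len(best) == total:
            break
    return best


path = longest_simple_path(build_adjacency(EDGES))
print(len(path), path)
```

On this graph the search finds a path that uses all 10 parts, and its two ends are parts 5 and 6, which agrees with the observation in the question's edit.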

Tell me if I have this right: you are trying to find a path that touches the maximum number of nodes?
If so, this is basically an instance of the Hamiltonian path problem, or an easier version of it if you are allowed to pass through a node more than once.
You can take a look at that algorithm.
To respect your edit, you could look at graph search algorithms; you can find something here. Be advised, however, that this type of algorithm is quite heavy in terms of memory complexity.

Related

What is the correct name for those mathematical operations?

I am not a native English speaker, but I need to write code that prints messages in English, so I need the English terminology from math, statistics, etc.
This is the case:
I have two lists and I compare them, let's say:
list 1 - 1 2 3 4 5
list 2 - 2 4 6
When I compare the two lists, 2 and 4 are present in both. What is this operation called? When I try to translate it from my language into English I get "section" or "cutting", and I don't believe that is the official mathematical term.
I would also like to know what it is called when you show the items that appear in only one of the two lists, for example 1 3 5 6.
Thanks, and sorry for the silly question.

Intersection for {1,2,3,4,5} ; {2,4,6} = {2,4}
Symmetric difference for {1,2,3,4,5} ; {2,4,6} = {1,3,5,6}
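As a small illustration of the two terms, here is how they look with Python's built-in set type (the variable names are just for this example):

```python
list_1 = [1, 2, 3, 4, 5]
list_2 = [2, 4, 6]

# Intersection: elements present in both lists.
common = sorted(set(list_1) & set(list_2))

# Symmetric difference: elements present in exactly one of the lists.
only_in_one = sorted(set(list_1) ^ set(list_2))

print("Intersection:", common)
print("Symmetric difference:", only_in_one)
```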

How to check if a path exists between two nodes of length exactly x in an undirected graph?

An undirected graph is given (as an adjacency list or adjacency matrix). For multiple queries, check whether a path of length exactly x exists between two nodes. The same node can be visited more than once.
I know that for a single query this is easy to check: raise the adjacency matrix to the power x (the number of steps) and check whether the value at [first node][second node] is greater than 0. This takes too long, and for bigger matrices it takes too much memory, even more so for multiple queries.
How can I solve this problem using as little space and time as possible?
Example:
[Figure: example graph]
Queries:
Is it possible to reach 3 from 2 in 1 step? yes
Is it possible to reach 4 from 1 in 1 step? no
Is it possible to reach 5 from 5 in 8 steps? yes
Is it possible to reach 8 from 1 in 10 steps? no
Thank you in advance.
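For reference, the matrix-power check described in the question can be sketched like this. It works on the adjacency matrix, uses boolean entries so the numbers cannot overflow, and uses repeated squaring so that only about log2(x) matrix products are needed. The three-node path graph is a made-up example, since the graph from the question is not reproduced here, and the sketch does not address the memory concern for large matrices.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is True if some k links i to j."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_exists(adjacency, source, target, steps):
    """True if a walk of exactly `steps` edges leads from source to target."""
    n = len(adjacency)
    result = [[i == j for j in range(n)] for i in range(n)]  # identity = 0 steps
    base = [[bool(v) for v in row] for row in adjacency]
    while steps > 0:
        if steps & 1:
            result = bool_matmul(result, base)
        base = bool_matmul(base, base)   # square for the next binary digit
        steps >>= 1
    return result[source][target]


# Hypothetical example: the path graph 0 - 1 - 2.
PATH_GRAPH = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]

print(walk_exists(PATH_GRAPH, 0, 2, 2))  # two steps: 0 -> 1 -> 2
print(walk_exists(PATH_GRAPH, 0, 2, 3))  # odd lengths cannot connect 0 and 2
```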

5 values for a face in PLY files?

I was just barely introduced to .ply files and I don't understand how they work. The vertex list has only 3 values for each vertex: x,y,z. But each face has 5 values and I don't know what those 5 values mean. I just need a little explanation. Thanks!

This simply means that the face has 5 sides or 5 vertices and each vertex's position is specified in space by the 3 values x,y,z.

To my knowledge, the above answer is not quite right.
The first value is the number of vertices that define the face; in your case it should be 4, as there are only 4 numbers left. The other four numbers are the indices of those vertices. As is common in computer science, the index of the first vertex is 0, not 1.
Some programs, though, only support triangular meshes and do not accept faces with more than 3 edges.
Furthermore, if you use a visualization program that adds lighting and other rendering effects, the face 3 0 1 2 is not the same as 3 0 2 1, because the triangle has the opposite orientation.
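To make the layout concrete, here is a minimal sketch of reading one face line. It assumes an ASCII PLY file whose face element is declared as a list of vertex indices preceded by a count; the function name `parse_face_line` and the sample vertex list are made up for illustration.

```python
def parse_face_line(line):
    """Split a face line into its vertex indices, checking the leading count."""
    values = [int(token) for token in line.split()]
    count, indices = values[0], values[1:]
    if count != len(indices):
        raise ValueError(f"expected {count} indices, got {len(indices)}")
    return indices


# Hypothetical vertex list: one (x, y, z) tuple per vertex, numbered from 0.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

quad = parse_face_line("4 0 1 2 3")          # 5 values: a count plus 4 indices
corners = [vertices[i] for i in quad]        # look up each corner's position
print(quad, corners)
```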

Point handling for closed loop searching

I have a set of line segments, each defined by exactly 2 nodes. I want to find the closed cycles that are produced by joining line segments. More precisely, I am looking for the smallest loop if more than one exists. If you can, please suggest a good solution for this.
As an example, I have added my line list below, together with the point indices, to give an idea of my case. (The first value is the line number; the next 2 values are the point indices.)
0 - 9 11
1 - 9 18
2 - 9 16
3 - 11 26
4 - 11 45
5 - 16 25
6 - 16 49
7 - 18 26
8 - 18 25
9 - 18 21
10 - 25 49
11 - 26 45
So, assume I start from line 1, that is, I start looking for connected loops from points 9 and 18. Could you please explain, step by step, how I can get the closed loops from that line?

Well, I don't see any C++ code, but I'll try to suggest a C++ solution (although I'm not going to write it for you).
If your graph is undirected (if it's directed, s/adjacent/in-edges' vertices/), and you want to find all the shortest cycles passing through some vertex N, then I think you could follow this procedure:
G <= a graph
N <= some vertex in G
P <= a path (set of vertexes/edges connecting them)
P_heap <= a priority queue, ascending by distance(P) where P is a path
for each vertex in adjacent(N):
    G' = G - edge(vertex, N)
    P = dijkstraShortestPath(vertex, N, G')
    push(P, P_heap)
You could also just throw out all but the shortest loop, but that's less succinct. As long as you don't allow negative edge weights (which, since you'll be using line segment length for weights, you don't), I think this should work. Also, fortunately Boost.Graph provides all of the necessary functionality to do this in C++ (you don't even have to implement Dijkstra's algorithm)! You can find documentation about it here:
http://www.boost.org/doc/libs/1_47_0/libs/graph/doc/table_of_contents.html
EDIT: you will first have to create the graph from the data you listed before you can do this. Define your graph's property_map accordingly, and before inserting a vertex make sure its distance to every vertex already in the graph is greater than zero; otherwise the vertex is already in the graph and you don't want to insert it again.
Happy graphing!
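The procedure above can be sketched in Python (not Boost.Graph) on the segment list from the question. Since the question gives no coordinates, every segment is assumed to have length 1; with real data the segment lengths would be used as weights. All function names here are made up for this illustration.

```python
import heapq
from collections import defaultdict

# Point-index pairs copied from the line list in the question.
SEGMENTS = [(9, 11), (9, 18), (9, 16), (11, 26), (11, 45), (16, 25),
            (16, 49), (18, 26), (18, 25), (18, 21), (25, 49), (26, 45)]


def build_graph(segments, weight=1.0):
    """Undirected weighted graph; unit weights are an assumption."""
    graph = defaultdict(dict)
    for a, b in segments:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def dijkstra_path(graph, source, target, skipped_edge):
    """Shortest path from source to target that never uses skipped_edge."""
    skipped = frozenset(skipped_edge)
    dist, prev, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if frozenset((u, v)) == skipped:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


def shortest_cycle_through(graph, n):
    """Try each edge (n, neighbour): remove it, then route back to n."""
    best_cycle, best_len = None, float("inf")
    for neighbour, w in graph[n].items():
        path, length = dijkstra_path(graph, neighbour, n, (neighbour, n))
        if path is not None and length + w < best_len:
            best_cycle, best_len = [n] + path, length + w
    return best_cycle, best_len


cycle, length = shortest_cycle_through(build_graph(SEGMENTS), 9)
print(cycle, length)
```

With unit weights this reports a loop of four segments through point 9. Several loops of that size exist in the data, so which one is returned depends on the iteration order.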

Determine how different are some vectors

I want to compare data vectors to find those that are similar. For example:
A=[4,5,6,7,8];
B=[4,5,6,6,8];
C=[4,5,6,7,7];
D=[1,2,3,9,9];
E=[1,2,3,9,8];
In this example I want to detect that vectors A, B and C are similar (not identical) to each other, and that D and E are similar to each other. The result should be something like: A, B, C are similar and D, E are similar, but the group A, B, C is not similar to the group D, E. Can MATLAB do this?
I was thinking of using some classification algorithm, k-means, ROC, etc., but I'm not sure which one would be best.
Any suggestions? Thanks in advance.

One of my new favourite methods for this sort of thing is agglomerative clustering.
First, concatenate all your vectors into a matrix, where each row is a separate vector. This makes such methods much easier to use:
F = [A; B; C; D; E];
Then the linkages can be found:
Z = linkage(F, 'ward', 'euclidean');
This can be plotted using:
dendrogram(Z);
This shows a tree, where each leaf at the bottom is one of the original vectors. Lengths of the branches show similarities and dissimilarities.
As you can see, 1, 2 and 3 are shown to be very close, as are 4 and 5. This even gives a measure of closeness, and shows that vectors 1 and 3 are deemed to be closer than vectors 2 and 3 (in the sense that, percentagewise, 7 is closer to 8 than 6 is to 7).

If all the vectors you are comparing are of the same length, a suitable norm on pairwise differences may well be enough. The norm to choose will depend on your particular criteria of closeness, of course, but with the examples you show, simply summing the absolute values of the components of the pairwise differences gives:
     A   B   C   D   E
A    0   1   1  12  11
B        0   2  13  12
C            0  13  12
D                0   1
E                    0
which doesn't need a particularly well-tuned threshold to work.
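The table above can be reproduced with a few lines of Python (used here instead of MATLAB purely for illustration). The grouping rule and the threshold of 5 are assumptions made for this sketch; any value between the largest within-group distance (2) and the smallest between-group distance (11) would give the same split.

```python
VECTORS = {
    "A": [4, 5, 6, 7, 8],
    "B": [4, 5, 6, 6, 8],
    "C": [4, 5, 6, 7, 7],
    "D": [1, 2, 3, 9, 9],
    "E": [1, 2, 3, 9, 8],
}


def l1_distance(u, v):
    """Sum of absolute component-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def group_by_threshold(vectors, threshold):
    """Put each vector into the first group whose members are all within threshold."""
    groups = []
    for name, vec in vectors.items():
        for group in groups:
            if all(l1_distance(vec, vectors[m]) <= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])  # no existing group fits; start a new one
    return groups


groups = group_by_threshold(VECTORS, threshold=5)  # threshold is an assumption
print(groups)
```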

You can use pdist(); this function gives you the pairwise distances.
Various distance metrics (distance being the opposite of similarity) are already implemented. 'euclidean' seems appropriate for your situation, although you may want to try out the effect of different metrics.

Here is the solution I propose, based on your answers:
Z = [A;B;C;D;E];
Y = pdist(Z);
matrix = squareform(Y);
matrix_round = round(matrix);
Now that we have the distance matrix, we can set a threshold based on the maximum value and decide which threshold is the most appropriate.
It would be nice to create some cluster plot showing the differences between them.
Best regards