I have an n-partite (undirected) graph, given as an adjacency matrix, for instance this one here:
a b c d
a 0 1 1 0
b 0 0 0 1
c 0 0 0 1
d 0 0 0 0
I would like to know if there is a set of matrix operations that I can apply to this matrix, which will result in a matrix that "lists" all paths (of length n, i.e. through all the partitions) in this graph. For the above example, there are paths a->b->d and a->c->d. Hence, I would like to get the following matrix as a result:
a b c d
1 1 0 1
1 0 1 1
The first path contains nodes a,b,d and the second one nodes a,c,d. If necessary, the result matrix may have some all-0 lines, as here:
a b c d
1 1 0 1
0 0 0 0
1 0 1 1
0 0 0 0
Thanks!
P.S. I have looked at algorithms for computing the transitive closure, but these usually only tell if there is a path between two nodes, and not directly which nodes are on that path.
One thing you can do is to compute the nth power of your matrix A. The result will tell you how many paths of length n there are from any one vertex to any other.
Now if you're interested in knowing all of the vertices along the path, I don't think that using purely matrix operations is the way to go. Bearing in mind that you have an n-partite graph, I would set up a data structure as follows: (Bear in mind that space costs will be expensive for all but small values.)
Each column will have one entry for each of the nodes in our graph. The n-th column will contain 1 if this node is reachable on the n-th iteration from our designated start vertex or start set, and zero otherwise. Each column entry will also contain a list of back pointers to the vertices in column n-1 which led to this vertex in the nth column. (This is like the Viterbi algorithm, except that we have to maintain a list of back pointers for each entry rather than just one.) The complexity of doing this is (m^2)*n, where m is the number of vertices in the graph, and n is the length of the desired path.
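The back-pointer structure described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the adjacency matrix from the question is read as directed edges from one partition to the next, `a` is taken as the start vertex, and the names `enumerate_paths`, `layers`, `backtrack` and `to_indicator_rows` are made up for this example.

```python
def enumerate_paths(adj, start, steps):
    """List every walk with `steps` edges from `start`, via per-layer back pointers.

    Assumes edges only lead from one partition to the next, so no vertex can
    repeat and every walk is also a path.
    """
    n = len(adj)
    # layers[t] maps each vertex reachable after t steps to the list of its
    # predecessors in layer t - 1 (the back pointers).
    layers = [{start: []}]
    for _ in range(steps):
        nxt = {}
        for u in layers[-1]:
            for v in range(n):
                if adj[u][v]:
                    nxt.setdefault(v, []).append(u)
        layers.append(nxt)

    def backtrack(t, v):
        # Follow the back pointers from vertex v in layer t down to the start.
        if t == 0:
            return [[v]]
        return [p + [v] for u in layers[t][v] for p in backtrack(t - 1, u)]

    return [p for v in layers[steps] for p in backtrack(steps, v)]


def to_indicator_rows(paths, n):
    # One row per path, with a 1 in the column of every vertex on that path.
    return [[1 if v in p else 0 for v in range(n)] for p in paths]


# Adjacency matrix from the question, vertices ordered a, b, c, d.
adj = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
paths = enumerate_paths(adj, start=0, steps=2)
rows = to_indicator_rows(paths, len(adj))
print(rows)  # [[1, 1, 0, 1], [1, 0, 1, 1]]
```

The forward pass only stores predecessors, so the memory cost stays at one list per vertex per layer; the number of paths produced by the backtracking step can still grow exponentially.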
I'm a little bit confused by your top matrix: with an undirected graph, I would expect the adjacency matrix to be symmetric.
No, there is no pure matrix way to generate all paths. Please use purely combinatorial algorithms.
'One thing you can do is to compute the nth power of your matrix A. The result will tell you how many paths of length n there are from any one vertex to any other.'
Powers of the adjacency matrix count walks, not paths.
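A small Python sketch of the difference, using a hypothetical undirected graph with a single edge a-b (the helper names `mat_mul` and `mat_pow` are made up for this example):

```python
def mat_mul(x, y):
    # Plain square-matrix product on lists of lists.
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, k):
    # k-th power by repeated multiplication, starting from the identity.
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, a)
    return result


# Hypothetical graph: two vertices a and b joined by one undirected edge.
A = [[0, 1],
     [1, 0]]

A2 = mat_pow(A, 2)
print(A2)  # [[1, 0], [0, 1]]
```

The entry `A2[0][0] == 1` counts the walk a-b-a, which revisits a and is therefore not a path. In a graph where edges only lead from one partition to the next, no vertex can be revisited, so there walks and paths coincide.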
Disclaimer: This question is not the same question as other projection matrix questions.
So Projection Matrices are 4x4 Matrices that are multiplied with 4D vectors to flatten them onto a 2D plane. Like this one:
1 0 0 0
0 1 0 0
0 0 0 0
0 0 1 0
But in the explanation, it says that the x and y coordinates of the vector are divided by Z. But I don't understand how this works because each part of the matrix that is multiplied by Z is 0. A comment in another question on this subject said, "The hardware does this for you." And I didn't quite get what it meant by that. Thank you in advance!
I was confounded by this nomenclature issue, too. Here is a bit better explanation in regards to Vulkan: https://matthewwellings.com/blog/the-new-vulkan-coordinate-system/
After the programmable vertex stage a set of fixed function vertex operations are run. During this process your homogeneous coordinates in clip space are divided by wc
Clearly, calling those matrices projection matrices is very misleading if the perspective correction isn't actually done by them. :)
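To make the two stages concrete, here is a rough Python sketch using the matrix from the question. The matrix only copies z into the fourth component (w); the division by w is a separate step, which is what the fixed-function hardware does after the vertex stage. The names `mat_vec` and `project` are made up for this illustration.

```python
# Matrix from the question: keeps x and y, zeroes z, and copies z into w.
P = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]


def mat_vec(m, v):
    # 4x4 matrix times 4-component column vector.
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def project(point):
    x, y, z = point
    # Stage 1: the matrix multiply produces homogeneous clip coordinates.
    xc, yc, zc, wc = mat_vec(P, [x, y, z, 1])
    # Stage 2: the perspective divide (done by the hardware) divides by w.
    return (xc / wc, yc / wc, zc / wc)


clip = mat_vec(P, [2, 4, 2, 1])
ndc = project((2, 4, 2))
print(clip)  # [2, 4, 0, 2]
print(ndc)   # (1.0, 2.0, 0.0)
```

Since w equals the original z after the multiply, dividing by w is exactly the "divide x and y by z" step from the explanation.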
I have a bidirectional network, that is, a network where flow exists both from i->j and from j->i. I want to count the simple paths between each pair [i,j] and report the counts in matrices, one per path length. For each [i,j] pair there is a certain number of simple paths of length 2, 3, 4, etc., so I would like one reporting matrix with the number of simple paths of length 2 between i and j, another with the number of simple paths of length 3 between i and j, and so on.
The solution I found was to write code that takes the original input matrix and searches for paths of length x from i->n by looking at the connections of i with the other variables, then at the connections of these with the remaining variables excluding i, and so on for x+1 variables until we get to x->n. E.g. for paths of length two, it looks for any i->x connection and any x->n connection. If both exist, there is a simple path of length two between i and n. Left like this, when analysing bidirectional matrices or matrices with self-loops, the code would count self-loops as simple paths and pass through the same vertex more than once. To solve this problem, the conditions in the code need to verify one more parameter. This parameter is a restriction on the assignment of the variables of the original matrix to our general variables: when assigning a new general variable during the path search, the variable assigned cannot be one already assigned to another general variable in that path search:
* when looking for a path of length 2 between i and n, the variable assigned to x cannot be the one already assigned to i (this stops self-loops from counting as paths), and in the same way n cannot be assigned a variable already used by either i or x (this eliminates the reporting of cases like i->x->i as paths of length 2, and also the reporting of paths that pass through the same variable more than once, e.g. i->x->x2->i for paths of length 3). So the code I use is basically this:
#the adjacency matrix
> MM<-matrix(c(1,1,0,0,0,1,1,1,1,0,0,1,1,1,0,0,1,0,1,1,0,0,0,1,1), 5, byrow=T)
> colnames(MM)<-c("A", "B", "C", "D", "E")
> row.names(MM)=colnames(MM)
> MM
A B C D E
A 1 1 0 0 0
B 1 1 1 1 0
C 0 1 1 1 0
D 0 1 0 1 1
E 0 0 0 1 1
#this is the reporting matrix where the results will be reported
> MMres2<-matrix(rep(0,length(MM)), sqrt(length(MM)))
> colnames(MMres2)=colnames(MM)
> row.names(MMres2)=row.names(MM)
#this is the code for the calculation and report of simple paths of length 2
> n <- nrow(MM)
> for(i in 1:n){
    for(j in 1:n){
      for(k in 1:n){
        #count i->j->k only when i, j and k are all distinct
        if(MM[i,j]==1 && MM[j,k]==1 && j!=i && k!=i && k!=j){
          MMres2[i,k]=MMres2[i,k]+1
        }
      }
    }
  }
#the reported results
> MMres2
A B C D E
A 0 0 1 1 0
B 0 0 0 1 2
C 1 1 0 2 1
D 1 0 1 0 0
E 0 1 0 0 0
If I want to calculate the number of simple paths of length 3 between any i->n, we just need to add the condition [x2,n]==1 and make sure the new variable is not equal to any of the previously assigned ones.
And here, at last, lies my problem. I don't want to calculate only the number of paths of length 2, 3 or 4, but of every possible length (the maximum possible length of a path is the total number of variables minus 1). Obviously, writing separate code for each path length x for each matrix would be cumbersome, and the larger the number N of variables, the more cumbersome it gets. The ideal solution would be code that looks at all pairs i and j and calculates the number of paths between them for every possible number of links per path, up to paths of tot.var-1 links (that is, the maximum number of links on a path between each pair of i and j).
Take again the M2 matrix: the ideal code would look for a link between i and a variable x and then between a variable x’ and j, and whenever the condition is met, it would report the result each time a path was found:
[i,x]==1 & [x’,j]==1 -> Res.mat[i,j] + 1
Where, x and x’ are any (and any number) of variables between i and j.
What distinguishes this approach from the original one above is that here x can stand for several variables: when looking for a path of 2 links, x will be one variable, while when looking for a path of 3 links, x will be two variables, and so forth.
E.g.:
For a path of length 2:
[i,xa]==1 & [xa,j]==1 -> Res.mat2[i,j] +1
For a path of length 3:
[i,xa]==1 & [xa,xb]==1 & [xb,j]==1 -> Res.mat3[i,j] +1
For a path of length 4:
[i,xa]==1 & [xa,xb]==1 & [xb,xc]==1 & [xc,j]==1 -> Res.mat4[i,j] +1
In this code, x would progressively take on all the other variables excluding i and j, and each path would be reported in the respective reporting matrix: the ones of length two in the length-2 reporting matrix, etc.
Sorry for the very, very long post. This is something I've been researching for a long time; I have talked with colleagues and no one seems able to either understand or help me, which is why I made this post long, to try to be as clear as possible.
So, does anyone know how I can do this?
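One possible way to generalise the nested loops to every path length is a depth-first search that tracks the vertices already on the current path. The Python sketch below is only an illustration of that idea, not a tuned implementation: the function name `count_simple_paths` and the 4-vertex example graph are hypothetical, and the running time grows exponentially with the number of vertices because every simple path is visited once.

```python
def count_simple_paths(adj):
    """Count simple paths by length for every ordered pair of vertices.

    Returns a dict: counts[L][i][j] is the number of simple paths with
    exactly L edges from i to j, for L = 1 .. n - 1.
    """
    n = len(adj)
    counts = {L: [[0] * n for _ in range(n)] for L in range(1, n)}

    def extend(start, node, visited, length):
        for nxt in range(n):
            # Skipping visited vertices excludes self-loops and revisits.
            if adj[node][nxt] and nxt not in visited:
                counts[length + 1][start][nxt] += 1
                visited.add(nxt)
                extend(start, nxt, visited, length + 1)
                visited.remove(nxt)

    for i in range(n):
        extend(i, i, {i}, 0)
    return counts


# Hypothetical undirected example: edges 0-1, 0-2, 1-2, 2-3, plus a
# self-loop on vertex 0 that must not be counted.
adj = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
counts = count_simple_paths(adj)
print(counts[2][0][3])  # 1  (0 -> 2 -> 3)
print(counts[3][0][3])  # 1  (0 -> 1 -> 2 -> 3)
```

Each `counts[L]` plays the role of one reporting matrix, so `counts[2]` corresponds to what `MMres2` holds in the R code.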
I'd like to split a sequence into k parts, and optimize the homogeneity of these sub-parts.
Example : 0 0 0 0 0 1 1 2 3 3 3 2 2 3 2 1 0 0 0
Result : 0 0 0 0 0 | 1 1 2 | 3 3 3 2 2 3 2 | 1 0 0 0 when you ask for 4 parts (k = 4)
Here, the algorithm did not try to split in fixed-length parts, but instead tried to make sure elements in the same parts are as homogeneous as possible.
What algorithm should I use ? Is there an implementation of it in R ?
Maybe you can use Expectation-maximization algorithm. Your points would be (value, position). In your example, this would be something like:
With the E-M algorithm, the result would be something like (by hand):
This is the desired output, so you can consider using this and check whether it really works in all your scenarios. Note that you must fix the number of clusters you want in advance, but I think that's not a problem for you, given how you have set out your question.
Let me know if this worked ;)
Edit:
See this picture; it is what you talked about. With k-means you should control the delta value, that is, how much the position increments, so that it is on the same scale as the value. But with E-M this doesn't matter.
Edit 2:
Ok, I was not correct: you do need to control the delta value. It is not the same if you increment position by 1 or by 3: (two clusters)
Thus, as you said, this algorithm could decide to cluster points that are not neighbours if their positions are far apart but their values are close. You need to guarantee that this does not happen by using a large delta increment. I think that with an increment of 2 * (max - min) of the values of your sequence this wouldn't happen.
Now, your points would have the form (value, delta * position).
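As a small illustration, the scaled points could be built like this in Python before handing them to whatever clustering routine you use. The function name `scaled_points` is made up, and counting positions from 0 is an assumption; the choice delta = 2 * (max - min) is the guess from above, not a proven bound.

```python
def scaled_points(seq):
    # Position increment suggested above: twice the range of the values.
    delta = 2 * (max(seq) - min(seq))
    # Each point is (value, delta * position), with positions counted from 0.
    return [(value, delta * position) for position, value in enumerate(seq)]


seq = [0, 0, 0, 0, 0, 1, 1, 2, 3, 3, 3, 2, 2, 3, 2, 1, 0, 0, 0]
points = scaled_points(seq)
print(points[:3])  # [(0, 0), (0, 6), (0, 12)]
```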
Per DICOM specification, a UID is defined by: 9.1 UID Encoding Rules. In other words the following are valid DICOM UIDs:
"1.2.3.4.5"
"1.3.6.1.4.35045.103501438824148998807202626810206788999"
"1.2.826.0.1.3680043.2.1143.5028470438645158236649541857909059554"
while the following are illegal DICOM UIDs:
".1.2.3.4.5"
"1..2.3.4.5"
"1.2.3.4.5."
"1.2.3.4.05"
"12345"
"1.2.826.0.1.3680043.2.1143.50284704386451582366495418579090595540"
Therefore I know that the string is at most 64 bytes and should match the regex [0-9\.]+. However, this regex really describes a superset, since there are far fewer than (10+1)^64 (=4457915684525902395869512133369841539490161434991526715513934826241L) possibilities.
How would one compute precisely the number of possibilities that respect the DICOM UID rules?
Reading the org root / suffix rule clearly indicates that I need at least one dot ('.'). In which case the combination is at least 3 bytes (char) in the form: [0-9].[0-9]. In which case there are 10x10=100 possibilities for UID of length 3.
Looking at the first answer, there seems to be something unclear about:
The first digit of each component shall not be zero unless the
component is a single digit.
What this means is that:
"0.0" is valid
"00.0" or "1.01" are not valid
Thus I would say a proper expression would be:
(([1-9][0-9]*)|0)(\.([1-9][0-9]*|0))+
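As a quick sanity check, this expression can be combined with the 64-character limit in a few lines of Python. This is a sketch only: `is_valid_uid` is a made-up helper name, and it checks syntax, not whether the root is actually registered.

```python
import re

# Same expression as above, written with non-capturing groups.
UID_PATTERN = re.compile(r"(?:[1-9][0-9]*|0)(?:\.(?:[1-9][0-9]*|0))+")


def is_valid_uid(uid):
    # At most 64 characters, and the whole string must match the pattern.
    return len(uid) <= 64 and UID_PATTERN.fullmatch(uid) is not None


print(is_valid_uid("1.2.3.4.5"))   # True
print(is_valid_uid("0.0"))         # True
print(is_valid_uid("1.2.3.4.05"))  # False
print(is_valid_uid("12345"))       # False
```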
Using a simple C code, I could find:
f(0) = 0
f(1) = 0
f(2) = 0
f(3) = 100
f(4) = 1800
f(5) = 27100
f(6) = 369000
f(7) = 4753000
f(8) = 59049000
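For small lengths, these counts can be reproduced by brute force. The Python sketch below (not the C code mentioned above; `brute_force_count` is a made-up name) enumerates every string over the 11-character alphabet and keeps those matching the expression. It is only practical for short lengths, since the number of candidates is 11^n.

```python
import re
from itertools import product

UID_PATTERN = re.compile(r"(?:[1-9][0-9]*|0)(?:\.(?:[1-9][0-9]*|0))+")


def brute_force_count(length):
    # Try all 11**length strings over digits and '.', count the valid ones.
    alphabet = "0123456789."
    return sum(
        1
        for chars in product(alphabet, repeat=length)
        if UID_PATTERN.fullmatch("".join(chars))
    )


counts = {n: brute_force_count(n) for n in range(1, 6)}
print(counts)  # {1: 0, 2: 0, 3: 100, 4: 1800, 5: 27100}
```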
The validation of the Root UID part is outside the scope of this question. A second validation step could take care of rejecting some OID that cannot possibly be registered (some people mention restriction on first and second arc for example). For simplicity we'll accept all possible (valid) Root UID.
While my other answer takes good care of this specific application, here is a more generic approach. It takes care of situations where you have a different regular expression describing the language in question. It also allows for considerably longer string lengths, since it only requires O(log n) arithmetic operations to compute the number of combinations for strings of length up to n. In this case the number of strings grows so quickly that the cost of these arithmetic operations will grow dramatically, but that may not be the case for other, otherwise similar situations.
Build a finite state automaton
Start with a regular expression description of your language in question. Translate that regular expression into a finite state automaton. In your case the regular expression can be given as
(([1-9][0-9]*)|0)(\.([1-9][0-9]*|0))+
The automaton could look like this:
Eliminate ε-transitions
This automaton usually contains ε-transitions (i.e. state transitions which do not correspond to any input character). Remove those, so that one transition corresponds to one character of input. Then add an ε-transition to the accepting state(s). If the accepting states have other outgoing transitions, don't add ε-loops to them, but instead add an ε-transition to an accepting state with no outgoing edges and then add the loop to that. This can be seen as padding the input with ε at its end, without allowing ε in the middle. Taken together, this transformation ensures that performing exactly n state transitions corresponds to processing an input of n characters or less. The modified automaton might look like this:
Note that both the construction of the first automaton from the regular expression and the elimination of ε-transitions can be performed automatically (and perhaps even in a single step). The resulting automata might be more complicated than what I constructed here manually, but the principle is the same.
Ensuring unique paths
You don't have to make the automaton deterministic in the sense that for every combination of source state and input character there is only one target state. That's not the case in my manually constructed one either. But you have to make sure that every complete input has only one possible path to the accepting state, since you'll essentially be counting paths. Making the automaton deterministic would ensure this weaker property, too, so go for that unless you can ensure unique paths without this. In my example the length of each component clearly dictates which path to use, so I didn't make it deterministic. But I've included an example with a deterministic approach at the end of this post.
Build transition matrix
Next, write down the transition matrix. Associate the rows and columns with your states (in order a, b, c, d, e, f in my example). For each arrow in your automaton, write the number of characters included in the label of that arrow in the column associated with the source state and the row associated with the target state of that arrow.
⎛ 0 0 0 0 0 0⎞
⎜ 9 10 0 0 0 0⎟
⎜10 10 0 10 10 0⎟
⎜ 0 0 1 0 0 0⎟
⎜ 0 0 0 9 10 0⎟
⎝ 0 0 0 10 10 1⎠
Read result off that matrix
Now applying this matrix to a column vector once has the following meaning: if the number of possible ways to arrive in a given state is encoded in the input vector, the output vector gives you the number of ways one transition later. Take the 64th power of that matrix, concentrate on the first column (since the start situation is encoded as (1,0,0,0,0,0), meaning only one way to end up in the start state) and sum up all the entries that correspond to accepting states (only the last one in this case). The bottom left element of the 64th power of this matrix is
1474472506836676237371358967075549167865631190000000000000000000000
which confirms my other answer.
Compute matrix powers efficiently
In order to actually compute the 64th power of that matrix, the easiest approach would be repeated squaring: after squaring the matrix 6 times you have an exponent of 2^6 = 64. If in some other scenario your exponent (i.e. maximal string length) is not a power of two, you can still perform exponentiation by squaring by multiplying the relevant squares according to the bit pattern of the exponent. This is what makes this approach take O(log n) arithmetic operations to compute the result for string length n, assuming a fixed number of states and therefore fixed cost for each matrix squaring.
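A compact Python sketch of this computation, using the 6x6 transition matrix given above and exponentiation by squaring driven by the bits of the exponent. The helper names `mat_mul`, `mat_pow` and `count_up_to` are made up for this illustration; Python's built-in big integers take care of the large numbers.

```python
# Transition matrix from above: columns are source states, rows are target
# states, in the order a, b, c, d, e, f.
M = [
    [0, 0, 0, 0, 0, 0],
    [9, 10, 0, 0, 0, 0],
    [10, 10, 0, 10, 10, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 9, 10, 0],
    [0, 0, 0, 10, 10, 1],
]


def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, exponent):
    # Exponentiation by squaring: multiply in the squares selected by the
    # bit pattern of the exponent.
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    square = m
    while exponent > 0:
        if exponent & 1:
            result = mat_mul(result, square)
        square = mat_mul(square, square)
        exponent >>= 1
    return result


def count_up_to(max_len):
    # Bottom-left entry: ways to get from start state a to accepting state f.
    return mat_pow(M, max_len)[5][0]


print(count_up_to(3))   # 100
print(count_up_to(4))   # 1900
print(count_up_to(64))
```

Because of the padding loop on the accepting state, `count_up_to(n)` counts all strings of length up to n, so `count_up_to(4)` is 100 + 1800.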
Example with deterministic automaton
If you were to make my automaton deterministic using the usual powerset construction, you'd end up with
and sorting the states as a, bc, c, d, cf, cef, f one would get the transition matrix
⎛ 0 0 0 0 0 0 0⎞
⎜ 9 10 0 0 0 0 0⎟
⎜ 1 0 0 0 0 0 0⎟
⎜ 0 1 1 0 1 1 0⎟
⎜ 0 0 0 1 0 0 0⎟
⎜ 0 0 0 9 0 10 0⎟
⎝ 0 0 0 0 1 1 1⎠
and could sum the last three elements of the first column of its 64th power to obtain the same result as above.
Single component
Start by looking for ways to form a single component. The corresponding regular expression for a single component is
0|[1-9][0-9]*
so it is either zero or a non-zero digit followed by arbitrarily many further digits. (I had missed the possible sole zero case at first, but the comment by malat made me aware of this.) If the total length of such a component is to be n, and you write h(n) to denote the number of ways to form such a component of length exactly n, then you can compute that h(n) as
h(n) = if n = 1 then 10 else 9 * 10^(n - 1)
where the n = 1 case allows for all possible digits, and the other cases ensure a non-zero first digit.
One or more components
Subsection 9.1 only writes that a UID is a bunch of dot-separated number components, as outlined above. So in regular expressions that would be
(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*
Suppose f(n) is the number of ways to write a UID of length n. Then you have
f(n) = h(n) + sum h(i) * f(n-i-1) for i from 1 to n-2
The first term describes the case of a single component, while the sum takes care of the case where it consists of more than one component. In that case you have a first component of length i, then a dot which accounts for the -1 in the formula, and then the remaining digits form one or more components which is expressed via the recursive use of f.
Two or more components
As the comment by cneller indicates, the part of section 9 before subsection 9.1 indicates that there has to be at least two components. So the proper regular expression would be more like
(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))+
with a + at the end indicating that we want at least one repetition of the parenthesized expression. Deriving an expression for this simply means leaving out the one-component-only case in the definition of f:
g(n) = sum h(i) * f(n-i-1) for i from 1 to n-2
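Written out in LaTeX, with the first two non-zero values of g worked by hand as a check against the small counts listed in the question (f(3) = 100 and f(4) = 1800 there correspond to g(3) and g(4) here):

```latex
\begin{align*}
h(n) &= \begin{cases} 10 & n = 1 \\ 9 \cdot 10^{\,n-1} & n \ge 2 \end{cases} \\
f(n) &= h(n) + \sum_{i=1}^{n-2} h(i)\, f(n-i-1) \\
g(n) &= \sum_{i=1}^{n-2} h(i)\, f(n-i-1) \\[1ex]
f(1) &= h(1) = 10, \qquad f(2) = h(2) = 90 \\
g(3) &= h(1)\, f(1) = 10 \cdot 10 = 100 \\
g(4) &= h(1)\, f(2) + h(2)\, f(1) = 10 \cdot 90 + 90 \cdot 10 = 1800
\end{align*}
```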
If you sum all the g(n) for n from 3 (the minimal possible UID length) through 64 you get the number of possible UIDs as
1474472506836676237371358967075549167865631190000000000000000000000
or approximately 1.5e66, which is considerably less than the 4.5e66 you get from your computation in terms of absolute difference, although it's definitely of the same order of magnitude. By the way, your estimate doesn't explicitly mention UIDs shorter than 64, but you can always consider padding them with dots in your setup. I did the computation using a few lines of Python code:
f = [0]
g = [0]
h = [0, 10] + [9 * (10**(n-1)) for n in range(2, 65)]
s = 0
for n in range(1, 65):
    x = 0
    if n >= 3:
        for i in range(1, n - 1):
            x += h[i] * f[n-i-1]
    g.append(x)
    f.append(x + h[n])
    s += x
print(h)
print(f)
print(g)
print(s)
I have a dataframe that looks like
src dst sign
0 1 +1
1 2 -1
2 5 +1
1 0 -1
...
to describe a signed graph (with two types of edges: +/-)
I want to calculate edge embeddedness of this graph.
Currently, I am writing two nested loops (i.e., a brute-force approach: just counting one by one).
As you can imagine, this solution is very slow.
Is there a better way to perform the task?
Thank you very much,