How to approximately match bowtie graphs using igraph? - r

I'm trying to learn how match_vertices works in igraph so that I can use it to join node and attribute information from graphs generated by a few different black-box computer programs. I tried looking at the single provided example, but it was too complicated for me to understand, so I threw together an even simpler toy example that I am now trying to make sense of.
library(igraph)
bow = make_graph(~ A - B - C - A - D - E - A)
tie = make_graph(~ a - e - d - a - c - b - a)
isomorphic(bow, tie)
It looks to me like the code should be:
A = get.adjacency(bow)
B = get.adjacency(tie)
P0 = diag(nrow(A))
corr = match_vertices(A, B,
                      start = P0,
                      m = 0,
                      iteration = 30)
corr$P
The permutation matrix didn't seem to change: the returned corr$P is identical to the starting permutation matrix P0. Why is that?
I defined a random permutation matrix and repeated the exercise to
see if that was always the case. It was!
random_permutation = function(n, ...) {
  # Permute the rows of the identity matrix to get a permutation matrix
  P = diag(n)
  i = sample(1:n, ...)
  P[i, , drop = FALSE]
}
Can anyone recommend some simple toy examples of using match_vertices that demonstrate its main features?
- show how to approximately match two graphs
- show how to approximately match two graphs when some vertices are known
- show how to approximately match two graphs when they have different numbers of vertices
Also, are there any ways of matching graphs when no seeds are known a priori, but the nodes carry attributes that should be matched as consistently as possible?
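For what it's worth, the same bowtie experiment can be sketched in Python with SciPy's quadratic_assignment, which implements the FAQ relaxation that is (as far as I can tell) closely related to the seeded graph matching behind match_vertices; the adj helper is a throwaway function made up for this sketch:
import numpy as np
from scipy.optimize import quadratic_assignment

def adj(n, edges):
    # build a symmetric 0/1 adjacency matrix from an edge list
    M = np.zeros((n, n), dtype=int)
    for i, j in edges:
        M[i, j] = M[j, i] = 1
    return M

# bow: A - B - C - A - D - E - A, with A,B,C,D,E numbered 0..4
A = adj(5, [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)])
# tie: a - e - d - a - c - b - a, with a,b,c,d,e numbered 0..4
B = adj(5, [(0, 4), (4, 3), (3, 0), (0, 2), (2, 1), (1, 0)])

res = quadratic_assignment(A, B, options={"maximize": True})
print(res.col_ind)  # res.col_ind[i] is the vertex of tie matched to vertex i of bow
print(res.fun)      # edge overlap; 12 (= 2 * 6 edges) would mean an exact match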

Related

Generating orthonormal vectors to a given set of vectors

I'm trying to simulate a problem in physics for which I require a unitary operator in a Hilbert space with an inner product defined as transpose(x*)x. Given two orthogonal column vectors, I want to generate more orthonormal vectors. Here is the way I tried approaching this problem: I randomly generate complex vectors and subtract their projections onto the already available orthogonal vectors, similar to this. Then I check the norm with respect to the inner product (InProd). Here is an attempt to implement this in Python.
import numpy as np

def InProd(x, y):
    # inner product <x, y> = transpose(conj(x)) y; not shown in the question,
    # reconstructed here from the description above
    return np.vdot(x, y)

def stinemod():
    comped = [[1,0,0,0,0],[0,1,0,0,0]]
    d = len(comped[0])
    r = len(comped)
    while r < d:
        randr = np.random.rand(d)
        randc = np.random.rand(d)
        vr = randr + 1j*randc
        vo = vr
        for v in comped:
            vo = vo - (InProd(vr, v)*np.array(v)/InProd(v, v))
        k = 1e-10
        if abs(InProd(vo, vo)) < k*abs(InProd(vr, vr)):  # abs() keeps the comparison real-valued
            pass
        else:
            r = r + 1
            comped.append(np.array(vr)/InProd(vr, vr))
    return np.transpose(comped)
But on running this code and checking unitarity using,
A = stinemod()
print(abs(np.matmul(np.transpose(np.conj(A)),A)))
Output:
[[1. 0. 0.28003392 0.24068132 0.1977418 ]
[0. 1. 0.53992755 0.24199218 0.06786818]
[0.28003392 0.53992755 0.58108559 0.29561698 0.23971144]
[0.24068132 0.24199218 0.29561698 0.21599542 0.18374313]
[0.1977418 0.06786818 0.23971144 0.18374313 0.21586778]]
I get an output suggesting that it is not unitary, which means the columns are not orthonormal. I can't seem to figure out what the mistake here is.
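For what it's worth, here is a minimal corrected sketch of the intended procedure (gram_schmidt_extend is a name made up here). If I read the code above right, the likely mistakes are that it appends the raw random vector vr instead of the orthogonalised vo, and that it scales by InProd(vr, vr) itself instead of the square root of the squared norm:
import numpy as np

def gram_schmidt_extend(start, d, tol=1e-10):
    # Extend a list of orthonormal d-vectors to a full orthonormal basis
    basis = [np.asarray(v, dtype=complex) for v in start]
    while len(basis) < d:
        vr = np.random.rand(d) + 1j * np.random.rand(d)
        vo = vr.copy()
        for v in basis:
            vo -= np.vdot(v, vo) * v              # subtract projection onto v
        nrm2 = np.vdot(vo, vo).real               # squared norm, a real number
        if nrm2 > tol * np.vdot(vr, vr).real:     # skip (nearly) dependent draws
            basis.append(vo / np.sqrt(nrm2))      # append vo, normalised by sqrt
    return np.transpose(basis)

A = gram_schmidt_extend([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]], 5)
print(np.round(abs(np.conj(A.T) @ A), 10))        # should print the identity matrix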

Add iteratively one edge in a graph generated by a k regular game maintaining past connections

I have generated an undirected regular graph with an even number of nodes, all of the same degree, say k, by using the function k.regular.game of the R package igraph.
Now I need to iteratively add one edge to each node, so that in each iteration the degree remains constant for every node and it is equal to k + i, where i is the number of iterations performed.
In addition, I want connections to be preserved in each iteration, that is: the set of neighbors of agent j for iteration i should be the same of the set of neighbors of agent j for iteration i + 1 except for one connection: e.g., if j is connected to w and y when k = 2, j must be connected to w, y and z when k = 3.
My final goal is to obtain (n-1) graphs, where n is equal to the number of nodes in the regular graph. As a result, I will obtain that the first generated graph has k = 1 and the last generated graph has k = (n-1).
Any suggestion on how to do this?
This is a nice network problem; below are two partial solutions.
Let's imagine there is a function which would bring a graph g from all degrees being 1 to all degrees being 2. It would have to be a graph with an even number of nodes.
increment.k <- function(g){}
It follows that increment.k will increase the degree of each node by one by adding |V|/2 edges to the graph - one edge for each two nodes. From what I understand of your problem specification, none of those edges may connect two nodes that are already connected. This makes increment.k() a puzzle in which a randomly placed edge might close off the possibility of all nodes reaching the new k-value of degrees. What if a graph has k=1, we start adding edges at random, and at the very last edge we find that the only two nodes still at degree 1 are already connected?!
I cannot intuitively grasp whether this admits graphs that cannot be incremented at all, where no combination of random edges allows for the creation of |V|/2 edges between previously unconnected nodes. But I can imagine that such graphs exist.
I've done this example on a graph with 20 nodes (which consequently can have a k between 1 and 19):
g <- k.regular.game(no.of.nodes=20, k=1, directed=F)
What if you were to generate random k.regular.games with a higher k until you found a graph where the edges of your graph is a subset of the edges of the higher-k random graph? It should be spectacularly slow.
The problem, of course, is that you don't want to allow duplicated edges. Otherwise the solution would be quite simple:
increase.k.allowing.duplicates <- function(graph){
  if(length(V(graph)) %% 2 != 0){
    stop("k can only be incremented for graphs with an even number of nodes.")
  }
  # Add random edges to the graph and allow dual edges just to increase k
  graph %>% add_edges(as.numeric(sample(1:length(V(graph)), length(V(graph)))))
}
The above code would solve the problem if duplicate edges were allowed. It would return graphs of ever higher k, and would let k grow towards infinity, since the number of nodes of the graph doesn't set any maximum average degree.
I have come up with the Monte Carlo approach below. To increase k by one, a given number of edges is added one by one between nodes, but if the loop runs out of alternatives when placing edges between nodes that are 1) not connected and 2) not already incremented to the higher k/degree, the process of creating a new graph with a higher k starts over. The maximum number of restarts is set by maximum.tries.
increase.k <- function(graph, maximum.tries=200){
  if(length(V(graph)) %% 2 != 0){
    stop("k can only be incremented for graphs with an even number of nodes.")
  }
  k <- mean(degree(graph))
  if(k != round(k)){
    stop("Nodes in graph do not have the same degree")
  }
  if(k >= length(V(graph)) - 1){
    stop("This graph is complete")
  }
  # Each node has the following available edges before starting the iteration:
  # possible.edges <- lapply(neighbors(graph,1), function(x) setdiff(V(graph), x[2:length(x)]))
  # Here we must lay the puzzle. If we run into a one-way street with the
  # edges we add, we'll have to start afresh.
  original.graph <- graph
  for(it in 1:maximum.tries){
    # We might need many tries to get the puzzle right by brute-forcing.
    # For each try we increment in a loop to avoid duplicate links.
    for(e_ij in 1:(length(V(graph))/2)){
      # Note that while(mean(degree(graph)) < k + 1){} is a logical
      # possibility, but less safe.
      # Add a new edge between two nodes of degree k: i is any such node and
      # j is any such node not already connected to i.
      i <- sample(as.numeric(V(graph)[degree(graph) == k]), 1)
      js <- as.numeric(V(graph)[degree(graph) == k &
                                  !V(graph) %in% c(as.numeric(neighbors(graph, i)), i)])
      # Abandon this try if no node unconnected to i and with degree == k exists
      if(length(js) == 0){break}
      # sample(js, 1) misbehaves when js has length one, hence the special case
      j <- if(length(js) == 1) js else sample(js, 1)
      graph <- graph %>% add_edges(c(i, j))
    }
    # Did we lay the puzzle to completion, successfully creating a random
    # graph with a higher k?
    if(mean(degree(graph)) == k + 1){
      # Success
      print(paste("Succeeded at iteration", it))
      break
    } else {
      # Failure, let's try again
      graph <- original.graph
      print("Failed")
    }
  }
  graph
}
# Compare the two approaches
g1 <- increase.k.allowing.duplicates(g)
g2 <- increase.k(g)
degree(g1) == degree(g2)
l <- layout_with_gem(g2)
par(mfrow=c(1,2))
plot(g1, layout=l, vertex.label="")
plot(g2,layout=l, vertex.label="")
dev.off()
# Note that increase.k() can be run incrementally up until a complete graph:
is.complete <- function(graph){mean(degree(graph)) >= (length(V(graph))-1)}
while(!is.complete(g)){
print(mean(degree(g)))
g <- increase.k(g)
}
# and that increase.k() cannot increase k in already complete graphs.
g <- increase.k(g)
The above code has solved the problem for some graphs. The larger the graph, the more iterations are needed to lay the puzzle. In this example with only 20 nodes, each k-level from 1 to 19 can be generated relatively quickly, and I did manage to get 19 separate networks from k=1 to k=19. But I have also managed to get stuck in the loop, which I take as evidence that network structures exist for which k cannot be successfully incremented - particularly since the same starting specification sometimes gets stuck and on other occasions arrives at a complete graph.
To test the function, I set maximum.tries to 25 and tried to go from k=1 to 19 100 times. It never worked. The higher the k, the more difficult it is to lay the puzzle and find edges that fit, even though the next-to-last iteration is faster before a collapse. The risk of hitting the cap of 25 increased between the 15th and 18th iteration, and most graphs only made it to k=17.
It is possible to imagine this method being performed backwards, starting at a complete graph and removing edges within a Monte Carlo process that tries to reach a graph with all degrees at k-1. It should run into similar problems, though.
The code above is really an attempt to brute-force this problem without going into the underlying mathematics of graphs of this type. I am not a mathematician and lack the skills, but maybe the creation of a fail-safe increase.k() function is a real and unsolved mathematical problem. If any graph theoreticians come by this post, please enlighten us.
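One observation for the graph-theoretically inclined: adding |V|/2 edges so that every degree rises from k to k+1, with no duplicates, is exactly the problem of finding a perfect matching in the complement graph, and general maximum matching is solved by Edmonds' blossom algorithm. A sketch in Python with networkx (which, unlike igraph's R interface as far as I know, ships a general matching routine):
import networkx as nx

def increase_k(G):
    # Add a perfect matching of the complement graph, raising every degree by one
    H = nx.complement(G)                                # candidate new edges
    M = nx.max_weight_matching(H, maxcardinality=True)  # blossom algorithm
    if 2 * len(M) < G.number_of_nodes():
        raise ValueError("complement has no perfect matching: cannot increment k")
    G = G.copy()
    G.add_edges_from(M)
    return G

G = nx.random_regular_graph(1, 20)       # start at k = 1 on 20 nodes
while G.number_of_edges() < 20 * 19 // 2:
    G = increase_k(G)                    # may still raise at an unlucky intermediate graph
    print(min(dict(G.degree()).values()), max(dict(G.degree()).values()))
This also confirms the suspicion above that some regular graphs cannot be incremented at all: K_{3,3} is 3-regular on 6 nodes, but its complement is two disjoint triangles, which have no perfect matching, so no three new edges can make it 4-regular.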

Linear regression / line finding for non-function lines

Given a number of points scattered around a line in 2D space, I want to find that line. The line is defined by two points, or by one point and an angle. What would be the algorithm for it?
There's a lot about this on SO, on the internet, and in Numerical Recipes, but all the examples seem to focus on the functional form of the line (y = ax + b), which is not going to work well for (almost) vertical lines.
I could possibly detect if the line is more horizontal or more vertical, and swap coordinates in the other case, but maybe there exists some more elegant solution?
I'm using C# at the moment, but I can probably translate from any language.
I'm sorry I can't provide a reference, but here's how:
Suppose your N (2d) data points are p[] and you want to find a vector a and a scalar d to minimise
E = Sum{ i | sqr( a'*p[i] - d) }/N
(The line is { q | a'*q = d }; with the normalisation |a| = 1, E is the sum of the squares of the distances of the data points from the line. Without that constraint, a = 0 would trivially minimise E.)
Some tedious algebra (write p[i] = M + q[i]; then a'*p[i] - d = a'*q[i] + (a'*M - d), and the cross terms vanish on summing because Sum{ i | q[i] } = 0) shows that
E = a'*C*a + sqr(d - a'*M)
where M is the mean and C the covariance of the data, ie
M = Sum{ i | p[i] } / N
C = Sum{ i | (p[i]-M)*(p[i]-M)' } / N
E will be minimised by choosing d = a'*M, and a to be a unit eigenvector of C corresponding to the smaller eigenvalue.
So the algorithm is:
Compute M and C
Find the smaller eigenvalue of C and the corresponding eigenvector a
Compute d = a'*M
(Note that the same thing works in higher dimensions too. For example in 3d we would find the 'best' plane).
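A direct transcription of that recipe in Python with NumPy (the question is in C#, but the translation is mechanical; fit_line_tls is a name invented for this sketch):
import numpy as np

def fit_line_tls(points):
    # Total least squares: returns (a, d) with |a| = 1, the line being { q | a'*q = d }
    p = np.asarray(points, dtype=float)
    M = p.mean(axis=0)                       # mean of the data
    C = np.cov(p, rowvar=False, bias=True)   # covariance matrix
    w, V = np.linalg.eigh(C)                 # eigenvalues in ascending order
    a = V[:, 0]                              # unit eigenvector of the smaller eigenvalue
    return a, a @ M

# An almost-vertical line x = 2, which defeats the y = ax + b form:
pts = np.c_[2 + 0.01 * np.random.randn(100), np.linspace(0, 10, 100)]
a, d = fit_line_tls(pts)
print(a, d)  # expect a close to [1, 0] and d close to 2 (both possibly negated)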

Dimensions of fractals: boxing count, hausdorff, packing in R^n space

I would like to calculate dimensions of a fractal written as an n-dimensional array of 0s and 1s: the box-counting, Hausdorff and packing dimensions.
I only have an idea of how to code the box-counting dimension (just counting 1s in the n-dimensional matrix and then using this formula:
boxing_count = -log(v)/log(n);
where v is the number of 1s and n is the space dimension (R^n)).
This approach simulates counting boxes of minimal resolution 1 x 1 x ... x 1, so numerically it is like the limit eps -> 0. What do you think about this solution?
Do you have any idea (or maybe code) for calculating the Hausdorff or packing dimension?
The Hausdorff and packing dimensions are purely mathematical tools based in measure theory. They have wonderful properties in that context but are not well suited for experimentation. In short, there is no reason to expect that you can estimate their values based on a single matrix approximation to some set.
Box counting dimension, by contrast, is well suited for numerical investigation. Specifically, let N(e) denote the number of squares of side length e required to cover your fractal set. As you seem to know, the box counting dimension of your set is the limit as e->0 of
log(N(e))/log(1/e)
However, I don't think that just choosing the smallest available value of e is generally a good idea. The standard interpretation in the physics literature, as I understand it, is to presume that the relationship between N(e) and e should be maintained over a broad range of values. A standard way to compute the box-counting dimension is to compute N(e) for choices of e drawn from a sequence that tends geometrically to zero, and then fit a line to the points in a log-log plot of N(e) versus 1/e. The box-counting dimension should be approximately the slope of that line.
Example
As a concrete example, the following Python code generates a binary matrix that describes a fractal structure.
import numpy as np
size = 1024
first_row = np.zeros(size, dtype=int)
first_row[int(size/2)-1] = 1
rows = np.zeros((int(size/2), size), dtype=int)
rows[0] = first_row
for i in range(1, int(size/2)):
    rows[i] = (np.roll(rows[i-1], -1) + rows[i-1] + np.roll(rows[i-1], 1)) % 2
m = int(np.log(size)/np.log(2))
rows = rows[0:2**(m-1), 0:2**m]
We can view the fractal structure by simply interpreting each 1 as a black pixel and each zero as white pixel.
import matplotlib.pyplot as plt
plt.matshow(rows, cmap = plt.cm.binary)
This matrix makes a nice test since it can be shown that there is an actual limiting object whose fractal dimension is log(1+sqrt(5))/log(2) or approximately 1.694, yet it's complicated enough to make the box counting estimate a little tricky.
Now, this matrix is 512 rows by 1024 columns; it decomposes naturally into 2 matrices that are 512 by 512. Each of those decomposes naturally into 4 matrices that are 256 by 256, etc. For each such decomposition, we need to count the number of sub matrices that have at least one non-zero element. We can perform this analysis as follows:
cnts = []
for lev in range(m):
    block_size = 2**lev
    cnt = 0
    for j in range(int(size/(2*block_size))):
        for i in range(int(size/block_size)):
            cnt = cnt + rows[j*block_size:(j+1)*block_size,
                             i*block_size:(i+1)*block_size].any()
    cnts.append(cnt)
data = np.array([(2**(m-(k+1)),cnts[k]) for k in range(m)])
data
# Out:
# array([[ 512, 45568],
# [ 256, 22784],
# [ 128, 7040],
# [ 64, 2176],
# [ 32, 672],
# [ 16, 208],
# [ 8, 64],
# [ 4, 20],
# [ 2, 6],
# [ 1, 2]])
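(As an aside, since the matrix dimensions are powers of two, the nested counting loops can be replaced by a single reshape. A sketch, equivalent to the loop above:)
def box_count(mat, block):
    # count the block x block sub-matrices containing at least one 1
    h, w = mat.shape
    return int(mat.reshape(h // block, block, w // block, block)
                  .any(axis=(1, 3)).sum())

cnts = [box_count(rows, 2**lev) for lev in range(m)]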
Now, your idea is to simply compute log(45568)/log(512) or approximately 1.7195, which is not too bad. I'm recommending that we examine a log-log plot of this data.
xs = np.log(data[:,0])
ys = np.log(data[:,1])
plt.plot(xs,ys, 'o')
This indeed looks close to linear, indicating that we might expect our box-counting technique to work reasonably well. First, though, it might be reasonable to exclude the one point that appears to be an outlier. In fact, that's one of the desirable characteristics of this approach. Here's how to do so:
plt.plot(xs, ys, 'o')
xs = xs[1:]
ys = ys[1:]
A = np.vstack([xs, np.ones(len(xs))]).T
m, b = np.linalg.lstsq(A, ys, rcond=None)[0]
def line(x): return m*x + b
ys = line(xs)
plt.plot(xs, ys)
m
# Out: 1.6902585379630133
Well, the result looks pretty good. In particular, this is a definitive example that this approach can work better than the simple idea of using just one data point. In fairness, though, it's not hard to find examples where the simple approach works better. Also, this set is regular enough that we get some nice results. Generally, one can't really expect box-counting computations to be too reliable.

Expand large polynomial with Sage (List colouring, combinatorial Nullstellensatz)

Disclaimer: I have no experience with sage, programming or any computer calculations.
I want to expand a polynomial in Sage. The input is a factored polynomial and I need a certain coefficient. However, since the polynomial has 30 factors, my computer won't do it.
Should I look for somebody with a better computer or are 30 factors simply too much?
Here is my sage code:
R.<x_1,x_2,x_3,x_4,x_5,x_6,x_7,x_8,x_9,x_10,x_11,x_12> = QQbar[]
f = (x_1-x_2)*(x_1-x_3)*(x_1-x_9)*(x_1-x_10)*(x_2-x_3)*(x_2-x_10)*(x_2-x_11)*(x_2-x_12)*(x_3-x_4)*(x_4-x_11)*(x_4-x_5)*(x_4-x_6)*(x_4-x_11)*(x_5-x_6)*(x_5-x_10)*(x_5-x_11)*(x_5-x_12)*(x_6-x_7)*(x_6-x_12)*(x_7-x_9)*(x_7-x_8)*(x_7-x_12)*(x_8-x_9)*(x_8-x_10)*(x_8-x_11)*(x_8-x_12)*(x_9-x_10)*(x_10-x_11)*(x_10-x_12)*(x_11-x_12);
c = f.coefficient({x_1:2,x_2:2,x_3:2,x_4:2,x_5:2,x_6:2,x_7:2,x_8:2,x_9:2,x_10:5,x_11:5,x_12:5}); c
Just some background. I'm trying to solve an instance of list edge colouring with the combinatorial Nullstellensatz.
https://en.wikipedia.org/wiki/List_edge-coloring
Given a graph G=(V,E), we associate a variable x_i with each vertex i in V. The graph monomial eps(G) is defined as the product \prod_{ij \in E} (x_i-x_j). (Note that we fixed an orientation of the edges, but that's not important here.)
Suppose that lists of colours are assigned to the vertices, such that vertex i has a list of size a(i). Then, by the combinatorial Nullstellensatz, there is a colouring from those lists (i.e. each vertex receives a colour from its list and two adjacent vertices do not receive the same colour) if the coefficient of \prod_{i \in V} x_i^{a(i)-1} is non-zero in eps(G).
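As a tiny sanity check of this criterion, here is a sketch in Python with sympy (rather than Sage) for the triangle, whose graph monomial is eps = (x1-x2)(x2-x3)(x1-x3). The coefficient of x1*x2*x3 (all lists of size 2) is 0, consistent with the triangle not being colourable from three identical lists of size 2, while the coefficient of x1^2*x2 (lists of sizes 3, 2, 1) is non-zero:
from sympy import symbols, expand, Poly

x1, x2, x3 = symbols('x1 x2 x3')
eps = expand((x1 - x2)*(x2 - x3)*(x1 - x3))   # graph monomial of the triangle
p = Poly(eps, x1, x2, x3)
print(p.coeff_monomial(x1*x2*x3))   # 0 -> the criterion certifies nothing here
print(p.coeff_monomial(x1**2*x2))   # 1 -> colourable from any lists of sizes 3, 2, 1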
I want to apply this to the line graph of the graph G(M) with incidence matrix:
M = Matrix([[0,0,0,3,3,0,3],[0,0,0,0,3,3,3],[0,0,0,3,0,3,3],[0,0,0,3,3,0,3],[3,0,3,0,0,0,6],[3,3,0,0,0,0,6],[0,3,3,0,0,0,6],[3,3,3,6,6,6,0]])
(Here the sizes of the lists are indicated by the integers.)
I believe that it takes so long because your coefficients are in QQbar, and arithmetic in QQbar is much slower than over QQ, for example. Is there a good reason for not using QQ?
If I change the coefficient ring to QQ, Sage fairly quickly tells me that c is 0 (note that a zero coefficient is inconclusive for the Nullstellensatz criterion: it fails to certify a colouring but does not rule one out):
sage: R.<x_1,x_2,x_3,x_4,x_5,x_6,x_7,x_8,x_9,x_10,x_11,x_12> = QQ[]
sage: f = (x_1-x_2)*(x_1-x_3)*(x_1-x_9)*(x_1-x_10)*(x_2-x_3)*(x_2-x_10)*(x_2-x_11)*(x_2-x_12)*(x_3-x_4)*(x_4-x_11)*(x_4-x_5)*(x_4-x_6)*(x_4-x_11)*(x_5-x_6)*(x_5-x_10)*(x_5-x_11)*(x_5-x_12)*(x_6-x_7)*(x_6-x_12)*(x_7-x_9)*(x_7-x_8)*(x_7-x_12)*(x_8-x_9)*(x_8-x_10)*(x_8-x_11)*(x_8-x_12)*(x_9-x_10)*(x_10-x_11)*(x_10-x_12)*(x_11-x_12)
sage: c = f.coefficient({x_1:2,x_2:2,x_3:2,x_4:2,x_5:2,x_6:2,x_7:2,x_8:2,x_9:2,x_10:5,x_11:5,x_12:5})
sage: c
0
