How to optimize the building of a graph?

I am working on a problem where I want to traverse a graph. However, when I profile my code I can see that building the graph is the expensive part.
Every node should have a value of fixed length M. The graph should contain all binary strings of that length. For example, for M = 3 we have "000", "001", "010", "011", "100", "101", "110", "111", i.e. 2^M = 8 combinations.
I then want to link the nodes together in a very specific way. Every node has two outgoing edges, with values "0" and "1". For example, "000" is connected to "001" by edge "1": if I delete the leftmost digit and append the edge's value at the end, I end up with "001". Similarly, "111" is connected to "110" by edge "0".
Help needed. Note that the nodes do not necessarily have to be represented as Strings; that is how I implemented it, but it seems to run too slowly. The important thing is that the nodes are connected correctly.
I have solved this by storing the nodes in a HashTable and then looping through the whole set to connect the nodes to each other.
Suggestions on how to make this smarter are appreciated.

UPDATE:
So you basically want to take a number and derive two numbers from it:
1) shift it one bit to the left, unset the first (highest) bit, and let the last bit be zero
2) the same as above, but set the last bit to one
The original number is then connected to the two numbers described above.
That is my understanding.
Here is some code I wrote to compute such a graph:
import pygraphviz as pgv

# length of binary codes
for n in range(3, 8):
    def b(x):
        # n-digit binary label of x
        return bin(x)[2:].zfill(n)

    # strict=False so that self-loops such as 000 -> 000 are kept
    G = pgv.AGraph(directed=True, strict=False)
    # start at 0 so that the all-zero node is included
    for i in range(2**n):
        for j in range(2**n):
            I = b(i)
            J = b(j)
            # we make room for another bit (the zero bit)
            i1 = i << 1
            # we unset the first bit (it sits at position n after the shift)
            i1 = i1 & ~(1 << n)
            # we copy the previous result
            i2 = i1
            # we set the last bit
            i2 = i2 | 1
            if i1 == j:
                G.add_edge(I, J, label="0")
            elif i2 == j:
                G.add_edge(I, J, label="1")
    G.layout(prog='dot')
    G.draw("graph" + str(n) + ".png")
[Images: the resulting graphs for n=3, n=4, n=5 and n=6]
P.S. Initially I tried using networkx, but soon realized pygraphviz was much easier to use for this.

Given that your vertices are actually numbers, why don't you use an adjacency matrix where the column and row numbers represent the vertices?
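Because the vertices are just the numbers 0 .. 2^M - 1, both successors of a vertex can be computed with a shift and a mask, so no hash table or pairwise search is needed. A minimal sketch in Python, assuming the graph is stored as a list indexed by vertex; the names build_graph and label are made up for this illustration:

```python
def build_graph(m):
    """Return succ, where succ[v] = (target of edge "0", target of edge "1")."""
    mask = (1 << m) - 1              # keeps only the lowest m bits
    succ = []
    for v in range(1 << m):          # every m-bit value, including 0
        shifted = (v << 1) & mask    # drop the leftmost bit, append a 0
        succ.append((shifted, shifted | 1))
    return succ


def label(v, m):
    """m-digit binary string for vertex v, e.g. label(1, 3) == "001"."""
    return format(v, "0{}b".format(m))


succ = build_graph(3)
for v, (zero, one) in enumerate(succ):
    print(label(v, 3), "-0->", label(zero, 3), " -1->", label(one, 3))
```

This does a constant amount of work per vertex. A full adjacency matrix would need 2^M x 2^M cells even though each row has only two edges, so a successor table like this is probably the more compact choice here.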


Generate sets from given overlap matrix

Note: I edited the original question to explain more precisely.
While running a simulation for my new method, I needed to generate a special type of dataset that consists of multiple subsets. The problem is that some variables are "shared" across the subsets, and the number of shared variables is called the "overlap" here. Since the distribution of the overlap proportion is given, I need to generate a list of variables whose overlap follows the given distribution. But I have failed to implement such an algorithm...
I am not sure whether there is a specific algorithm for this kind of problem, but I have failed to find one after a long search.
I prefer an R solution, but anything else will also be much appreciated. Please help me to solve this problem! Thank you so much in advance!
Below is a standardized explanation of my problem. I tried to explain it as generally as I can, but please let me know if it is not sufficient.
Purpose: Generate n sets from given overlap matrix of elements. Each set contains k elements.
Input: An n*n matrix whose (i,j)th cell gives the number of elements shared by the i-th and j-th sets.
Output: A list of k element identifiers (anything can be used, such as numbers) for each of the n sets.
Assumption: The number of elements for each set is k, and it is same across all n sets. Hence, the input matrix is symmetric.
Example (assumes k=3 and n=3)
Input
3 1 0
1 3 1
0 1 3
Output
Set 1: A B C
Set 2: A D E
Set 3: D F G
In the example input above, the (1,2)th and (2,1)th cells are 1 because sets 1 and 2 share the element "A", and the diagonal cells are 3 (= k) because each set shares all of its elements with itself.
I would repeat the following process until I had accounted for all the matrix entries:
1) Treat the matrix as the adjacency matrix of a graph, and find the largest clique in it. That is, find the largest possible set S of indexes such that M(i,j) > 0 for all i, j in S.
2) Create an item that is in all of the sets which correspond to the indexes in S. In fact, if the minimum value of M(i,j) over all i, j in S is v, create v such items.
3) Subtract v from M(i,j) for all i, j in S, accounting for the counts generated by the items you have just created.
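The three steps above can be sketched in Python as follows. This is only an illustration under some assumptions: the function name sets_from_overlap is made up, the clique is found by brute force over all index subsets (exponential, so only reasonable for small n), the diagonal is treated like any other entry so that single-index "cliques" produce the unshared elements, and items are plain integers. The greedy procedure is not guaranteed to reproduce every matrix, so the result should be checked against the input.

```python
from itertools import combinations


def sets_from_overlap(M):
    """Greedy sketch: repeatedly serve the largest clique of positive entries."""
    M = [row[:] for row in M]            # work on a copy of the input
    n = len(M)
    sets = [[] for _ in range(n)]
    next_id = 0                          # items are just consecutive integers
    while True:
        clique = None
        # Step 1: brute-force search, largest subset size first.
        for size in range(n, 0, -1):
            for S in combinations(range(n), size):
                if all(M[i][j] > 0 for i in S for j in S):
                    clique = S
                    break
            if clique is not None:
                break
        if clique is None:               # nothing positive left to serve
            break
        # Step 2: create v items shared by every set in the clique.
        v = min(M[i][j] for i in clique for j in clique)
        for _ in range(v):
            for i in clique:
                sets[i].append(next_id)
            next_id += 1
        # Step 3: account for the items just created.
        for i in clique:
            for j in clique:
                M[i][j] -= v
    return sets


example = [[3, 1, 0],
           [1, 3, 1],
           [0, 1, 3]]
for number, members in enumerate(sets_from_overlap(example), start=1):
    print("Set", number, members)
```

For the example matrix this yields three sets of three items each, with the same pairwise overlaps as the input; the identifiers are numbers rather than the letters used in the question.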

R code for simulating picking tiles from a bag

The problem: there is a bag with 5 tiles numbered 1, 2, 3, 4, 5. I pick 2 tiles, note the numbers, and drop the tiles back in the bag. Then I pick 2 tiles again and note the numbers. What is the probability that there is no overlap between the numbers? Say I get 1,4 the first time and 3,5 the second time: no overlap. The theoretical result is 3/10, but this simulation keeps giving me an answer close to 0.5. Any insights into what I am doing wrong? Could it be the sample function in R?
I make a matrix with all possible pairs you could get with 5 tiles (1,2; 1,3; etc.) and then generate two random numbers which give the row numbers. I treat these rows as the two draws and compare their numbers.
set.seed(1234)
n = 10000
count = 0
t <- cbind(c(1,1,1,1,2,2,2,3,3,4), c(2,3,4,5,3,4,5,4,5,5))
idx <- sample(1:10, 2*n, replace=T)
i <- idx[1:n]
j <- idx[(n+1):(2*n)]
for (ii in 1:n) {
  if ((t[i[ii],1] != t[j[ii],1]) && (t[i[ii],2] != t[j[ii],2]))
    count = count + 1
}
count/n
[1] 0.5004
Any insights will be helpful. I am sure the theoretical answer is 3/10
It's been a while since I've used R, so apologies if I'm a little rusty. It seems to me you're almost there. The problem is in the if statement within the for loop. You're testing whether the first number in the first pair is different from the first number in the second pair AND the second number in the first pair is different from the second number in the second pair. But you're forgetting to test whether the first number in the first pair is different from the second number in the second pair AND the second number in the first pair is different from the first number in the second pair. Here's the full line:
if (
  (t[i[ii],1] != t[j[ii],1]) &&
  (t[i[ii],2] != t[j[ii],2]) &&
  (t[i[ii],1] != t[j[ii],2]) &&
  (t[i[ii],2] != t[j[ii],1])
) count = count + 1
There might be other ways to accomplish this, but this seems to do the trick. I get about 0.3 for the result. And thanks for the opportunity to think about R again.
I would not use a loop. 10000 observations is not so many that it prevents you from building a data.frame with your samples. In the following code, I draw the samples twice and put them in an object with 10000 rows and 4 columns. I then identify which rows contain duplicated picks and divide their count by the total number of rows. The "1-" is there because the code counts the duplicates. My result is in line with the theoretical number.
n <- 10000
res <- cbind(t(replicate(n, sample(1:5, 2, replace=FALSE))), t(replicate(n, sample(1:5, 2, replace=FALSE))))
1 - sum(apply(apply(res, 1, duplicated), 2, any))/n
#[1] 0.2979
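The same check can be written with sets, which avoids comparing the four positions by hand. The following is a rough Python sketch, not a translation of either R snippet: it first enumerates all 10 x 10 combinations of draws to get the exact probability, then runs a Monte Carlo estimate. The seed and the sample size are arbitrary choices made for this example.

```python
import random
from itertools import combinations

tiles = range(1, 6)
pairs = list(combinations(tiles, 2))     # the 10 possible unordered draws

# Exact probability: enumerate every (first draw, second draw) combination
# and count those that share no tile.
disjoint = sum(1 for p in pairs for q in pairs if not set(p) & set(q))
exact = disjoint / (len(pairs) ** 2)

# Monte Carlo estimate; the seed only makes the run repeatable.
rng = random.Random(1234)
n = 100000
hits = 0
for _ in range(n):
    first = set(rng.sample(tiles, 2))    # two distinct tiles
    second = set(rng.sample(tiles, 2))   # tiles were put back, so draw again
    if not first & second:               # empty intersection means no overlap
        hits += 1
estimate = hits / n

print(exact, estimate)
```

The exact value comes out as 30 disjoint combinations out of 100, which agrees with the theoretical 3/10.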

skipping recursion

The first line is a number, int x. The following m lines contain letters. After m lines, you read in a number, int y.
The goal is to find solution number y, where each solution is formed by recursively taking 1 letter from each line.
The problem states that there’s a much faster solution which avoids going through each possible password. That is where my question lies. How can this be done? Any help would be greatly appreciated.
This is not that complicated. Count the number of letters in each of the m lines. Then calculate, for every line, how many possible solutions are skipped when you move on by one letter in that line. Visualized:
abc -> 3 letters
xy -> 2 letters
dmnr -> 4 letters
If you skip from the n-th letter to the (n+1)-th letter in the "abc" line, you skip as many possible solutions as the product of the lengths of all following lines, so you skip 2*4 = 8 solutions.
Repeat this step for the "xy" line: 4 solutions skipped.
The last line always skips 1 solution, because it is the recursion path itself.
So now you know how many solutions you skip when you skip to a specific letter. The last step is simple: start with 1 and add the calculated value of each line to the number until it reaches exactly r, the number of the solution you are looking for.
In C++ this means:
int v = 1, r = 10;
int i1 = 0, i2 = 0, i3 = 0;
while (v <= r - 8) {
    i1++;
    v += 8;
}
while (v <= r - 4) {
    i2++;
    v += 4;
}
while (v <= r - 1) {
    i3++;
    v++;
}
Now i1 is the index of the letter you need to use from the "abc" line, i2 is the index of the letter from "xy" and i3 the one from "dmnr" :) That's all. The algorithm should end with i1=1, i2=0, i3=1 -> "b" + "x" + "m"
I hope this helps. It removes the recursion, but that's no problem, is it? ;)
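The counting idea above generalizes to any number of lines: the r-th solution is r - 1 written in a mixed-radix system whose digit weights are the products of the following line lengths. A small Python sketch under that reading; the function name nth_combination is invented for this example, and r is assumed to be 1-based as in the C++ snippet:

```python
def nth_combination(lines, r):
    """Return the r-th (1-based) word built from one letter per line."""
    # weights[k] = number of solutions skipped per step in line k,
    # i.e. the product of the lengths of all following lines.
    weights = []
    total = 1
    for line in reversed(lines):
        weights.append(total)
        total *= len(line)
    weights.reverse()

    index = r - 1                        # switch to a 0-based position
    if not 0 <= index < total:
        raise ValueError("r must be between 1 and %d" % total)

    letters = []
    for line, weight in zip(lines, weights):
        step, index = divmod(index, weight)   # how many letters to skip here
        letters.append(line[step])
    return "".join(letters)


print(nth_combination(["abc", "xy", "dmnr"], 10))   # prints "bxm"
```

Using divmod replaces the three while loops with one division per line, so the work grows with the number of lines rather than with r.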

Completing a list of possible binary sequences given a binary sequence with gaps

So, I am working on a program in Scilab which solves a binary puzzle. I have come across a problem, however. Can anyone explain to me the logic behind solving a binary sequence with gaps (like [1 0 -1 0 -1 1 -1], where -1 means an empty cell)? I want all possible solutions of a given sequence. So far I have:
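function P = mogelijkeCombos(V)
    P = [];
    aantalleeg = 0;  // the counter must be initialised before use
    for i=1:size(V,"*")  // all elements, also when V is a row vector
        if(V(i) == -1)
            aantalleeg = aantalleeg + 1
        end
    end
    for i=1:2^aantalleeg
        //creating combos here
    end
endfunction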
sorry that some words are in dutch
aantalleeg means amountempty by which I mean the amount of empty cells
I hope I gave you guys enough info. I don't need any code written, I'd just like ideas of how I can make every possible rendition as I am completely stuck atm.
BTW this is a school assignment, but the assignment is way bigger than this and it's just a tiny part I need some ideas on
ty in advance
Short answer
You could create the combos by extending your code: create all possible binary words of length "amountempty" and substitute them bit by bit into the empty cells of V.
Step-by-step description
1) Find all the empty cell positions
2) Count the number of positions you've found (which equals the number of empty cells)
3) Create all possible binary numbers with the length of your count
4) For each binary number you generate, place the bits in the empty cells
5) Print out / store the possible sequence with the filled-in bits
Example
Find all the empty cell positions
You could for example check from left-to-right starting at 1 and if a cell is empty add the position to your position list.
V = [  1  0 -1  0 -1  1 -1 ]
             ^     ^     ^
             |     |     |
       1  2  3  4  5  6  7
// result
positions = [3 5 7]
Count the number of positions you've found
//result
amountEmpty = 3;
Create all possible binary numbers with the length amountEmpty
You could create all possible numbers or words with the dec2bin function in Scilab. The number of possible words is easy to determine, because a word of amountEmpty bits can represent 2^amountEmpty distinct values; let i run from 0 to 2^amountEmpty - 1.
// Create the binary word of amountEmpty bits long
binaryWord = dec2bin( i, amountEmpty );
The generated binaryWord will be a string; you will have to split it into separate characters and convert them to numbers (called binaryWordPerBit below).
For each binaryWord you generate
Now create a possible solution by starting with the original V and filling in every empty cell, at the positions from your position list, with a bit from binaryWordPerBit
possibleSequence = V;
for j=1:amountEmpty
    possibleSequence( positions(j) ) = binaryWordPerBit(j);
end
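For comparison, here are the same steps as a short Python sketch rather than Scilab. The function name fill_gaps is made up for this example, and itertools.product stands in for the dec2bin loop by generating every 0/1 word of the required length directly:

```python
from itertools import product


def fill_gaps(V, empty=-1):
    """Return every sequence obtained by filling the empty cells with 0 or 1."""
    # Steps 1 and 2: positions of the empty cells (0-based here).
    positions = [i for i, cell in enumerate(V) if cell == empty]
    results = []
    # Step 3: every binary word with one bit per empty cell.
    for bits in product((0, 1), repeat=len(positions)):
        # Step 4: start from the original and fill in the gaps.
        candidate = list(V)
        for pos, bit in zip(positions, bits):
            candidate[pos] = bit
        # Step 5: store the completed sequence.
        results.append(candidate)
    return results


for sequence in fill_gaps([1, 0, -1, 0, -1, 1, -1]):
    print(sequence)
```

With three empty cells this prints 2^3 = 8 candidate sequences; checking them against the puzzle's rules would be a separate step.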
I wish you "veel succes met je opdracht"

How to get a value of a multi-dimensional array by an INCOMPLETE vector of indices

This question is very similar to
R - how to get a value of a multi-dimensional array by a vector of indices
I have:
dim_count <- 5
dims <- rep(3, dim_count)
pi <- array(1:3^5, dims)
I want to extract an entire line (all values along one dimension), with the index of that line built automatically.
For example, I would like to get:
pi[1,,2,2,3]
## [1] 199 202 205
You could insert a sequence covering the whole dimension in the appropriate slot:
do.call("[",list(pi,1,1:dim(pi)[2],2,2,3))
By the way, defining a variable called pi is a little dangerous (I know this was inherited from the previous question) -- suppose you tried a few lines later to compute the circumference of a circle as pi*diameter ...
