Note: I edited the original question to explain more precisely.
While I was doing a simulation for my new method, I needed to generate a special type of dataset consists of multiple subset. The problem is that there is some "shared" variables across the subsets, and the number of shared variable is called "overlap" here. Since the distribution of overlap proportion is given, I need to generate an appropriate list of variables and their overlap follows the given distribution. But I have failed to implement such algorithm...
I am not sure whether there is a specific algorithm for this kind of question,
but I have failed to find such thing after a long search.
I prefer R solution, but anything others also will be very appreciated. Please help me to solve this problem! Thank you so much in advance!
The below is a standardized explanation for my problem. I tried to explain as general as possible I can, but please give me any suggestion if it is not sufficient.
Purpose: Generate n sets from given overlap matrix of elements. Each set contains k elements.
Input: There is a n*n matrix whose (i,j)th cell value represents a number of overlapped elements from (i)th set to (j)th set.
Output: A list of k element identifiers (whatever can be used such as number) for n sets.
Assumption: The number of elements for each set is k, and it is same across all n sets. Hence, the input matrix is symmetric.
Example (assumes k=3 and n=3)
Input
3 1 0
1 3 1
0 1 3
Output
Set 1: A B C
Set 2: A D E
Set 3: D F G
In the above example input, (1,2)th and (2,1)th cells are 1 because set 1 and 2 share "A" element and vice versa, and diagonal cells are 3(=k) because each set shares all elements with itself.
I would repeat the following process until I had accounted for all the matrix entries:
1) Treat the matrix as the adjacency matrix of a graph, and find the largest clique in it. That is, find the largest possible set S of indexes such that for all i, j in set S M(i,j) > 0
2) Create an item that is in all of the sets which correspond to the indexes in S - in fact, if the minimum value of M(i,j) = v, create v such items.
3) subtract v from M(i,j) for all i, j in set S, accounting for the counts generated by the items you have just created.
minibatch = torch.Tensor(5, 2, 3,5)
m = nn.View(-1):setNumInputDims(1)
m:forward(minibatch)
gives a tensor of size
30x5
m = nn.View(-1):setNumInputDims(3)
m:forward(minibatch)
gives a tensor of size
5 x 30
m = nn.View(-1):setNumInputDims(2)
m:forward(minibatch)
gives a tensor of size
10 x 15
What is going on? I don't understand why I'm getting the dimensions I am.
The reason I don' think I understand it is that I'm thinking that the View m is expecting n dims as the input. So if n = 1, then we take 5 as the 1st dim and 30 as the 2nd dim, which is what seems to be happening when the numInputDims is set to 2.
As its name indicates, View(-1):setNumInputDims(n) is to set the number of input dimensions of View(-1).
To understand the role of View(-1), please refer to How view() method works for tensor in torch
If there is any situation that you don't know how many rows you want but are sure of the number of columns then you can mention it as -1(You can extend this to tensors with more dimensions. Only one of the axis value can be -1). This is a way of telling the library; give me a tensor that has these many columns and you compute the appropriate number of rows that is necessary to make this happen.
So View(-1) converts the input to a two-dimensional matrix. Note View(-1) corresponds to the columns of this matrix. Hence its input dimension is the latter half of the complete input. Its number of dimensions means how many dimensions are "allocated" for the columns, and any dimensions before these dimensions are used for the rows.
Therefore in your example:
minibatch = torch.Tensor(5, 2, 3,5)
m = nn.View(-1):setNumInputDims(2)
It allocates the last two dimensions (3*5) to the columns and the first two dimensions (5*2) to the rows. The result tensor is then 10*15.
So in my text book there is this example of a recursive function using f#
let rec gcd = function
| (0,n) -> n
| (m,n) -> gcd(n % m,m);;
with this function my text book gives the example by executing:
gcd(36,116);;
and since the m = 36 and not 0 then it ofcourse goes for the second clause like this:
gcd(116 % 36,36)
gcd(8,36)
gcd(36 % 8,8)
gcd(4,8)
gcd(8 % 4,4)
gcd(0,4)
and now hits the first clause stating this entire thing is = 4.
What i don't get is this (%)percentage sign/operator or whatever it is called in this connection. for an instance i don't get how
116 % 36 = 8
I have turned this so many times in my head now and I can't figure how this can turn into 8?
I know this is probably a silly question for those of you who knows this but I would very much appreciate your help the same.
% is a questionable version of modulo, which is the remainder of an integer division.
In the positive, you can think of % as the remainder of the division. See for example Wikipedia on Euclidean Divison. Consider 9 % 4: 4 fits into 9 twice. But two times four is only eight. Thus, there is a remainder of one.
If there are negative operands, % effectively ignores the signs to calculate the remainder and then uses the sign of the dividend as the sign of the result. This corresponds to the remainder of an integer division that rounds to zero, i.e. -2 / 3 = 0.
This is a mathematically unusual definition of division and remainder that has some bad properties. Normally, when calculating modulo n, adding or subtracting n on the input has no effect. Not so for this operator: 2 % 3 is not equal to (2 - 3) % 3.
I usually have the following defined to get useful remainders when there are negative operands:
/// Euclidean remainder, the proper modulo operation
let inline (%!) a b = (a % b + b) % b
So far, this operator was valid for all cases I have encountered where a modulo was needed, while the raw % repeatedly wasn't. For example:
When filling rows and columns from a single index, you could calculate rowNumber = index / nCols and colNumber = index % nCols. But if index and colNumber can be negative, this mapping becomes invalid, while Euclidean division and remainder remain valid.
If you want to normalize an angle to (0, 2pi), angle %! (2. * System.Math.PI) does the job, while the "normal" % might give you a headache.
Because
116 / 36 = 3
116 - (3*36) = 8
Basically, the % operator, known as the modulo operator will divide a number by other and give the rest if it can't divide any longer. Usually, the first time you would use it to understand it would be if you want to see if a number is even or odd by doing something like this in f#
let firstUsageModulo = 55 %2 =0 // false because leaves 1 not 0
When it leaves 8 the first time means that it divided you 116 with 36 and the closest integer was 8 to give.
Just to help you in future with similar problems: in IDEs such as Xamarin Studio and Visual Studio, if you hover the mouse cursor over an operator such as % you should get a tooltip, thus:
Module operator tool tip
Even if you don't understand the tool tip directly, it'll give you something to google.
I need to have a vector with a fixed number of elements e.g. 20. Usually I would take:
x0 <- 1
seq(x0, 560, 1)
Right now I want to have each number double to triple:
x<-(1,1,2,2,3,3,...)
or
x<-(1,1,1,2,2,2,3,3,3,...)
Looking like this but automatically generation, because I need too many elements.
Here is the solution as per #Arun:
x<-rep(c(1,2,3),3)
Say we have a set of 3D (integer) coordinates from (0,0,0) to (100,100,100)
We want to visit each possible coordinate (100^3 possible coordinates to visit) without visiting each coordinate more than once.
The sum of the differences between each coordinate in adjacent steps cannot be more than 2 (I don't know if this is possible. If not, then minimized)
for example, the step from (0,2,1) to (2,0,0) has a total difference of 5 because |x1-x2|+|y1-y2|+|z1-z2| = 5
How do we generate such a sequence of coordinates?
for example, to start:
(0,0,0)
(0,0,1)
(0,1,0)
(1,0,0)
(1,0,1)
(0,0,2)
(0,1,1)
(0,2,0)
(1,1,0)
(2,0,0)
(3,0,0)
(2,0,1)
(1,0,2)
(0,0,3)
etc...
Anyone know an algorithm that will generate such a sequence to an arbitrary coordinate (x,y,z) where x=y=z or can prove that it is impossible for such and algorithm to exist? Thanks
Extra credit: Show how to generate such a sequence with x!=y!=z :D
One of the tricks (there are other approaches) is to do it one line [segment] at a time, one plane [square] at a time. Addressing the last part of the question, this approach works, even if the size of the volume visited is not the same in each dimension (ex: a 100 x 6 x 33 block).
In other words:
Start at (0,0,0),
move only on the Z axis till the end of the segment, i.e.
(0,0,1), (0,0,2), (0,0,3), ... (0,0,100),
Then move to the next line, i.e.
(0,1,100)
and come backward on the line, i.e.
(0,1,99), (0,1,98), (0,1,97), ... (0,1,0),
Next to the next line, going "forward"
And repeat till the whole "panel is painted", i.e ending at
... (0,100,99), (0,100,100),
Then move, finally, by 1, on the X axis, i.e.
(1,100,100)
and repeat on the other panel,but on this panel going "upward"
etc.
Essentially, each move is done on a single dimension, by exactly one. It is a bit as if you were "walking" from room to room in a 101 x 101 x 101 building where each room can lead to any room directly next to it on a given axis (i.e. not going joining diagonally).
Implementing this kind of of logic in a programming language is trivial! The only mildly challenging part is to deal with the "back-and-forth", i.e. the fact that sometimes, some of the changes in a given dimension are positive, and sometimes negative).
Edit: (Sid's question about doing the same diagonally):
Yes! that would be quite possible, since the problem states that we can have a [Manhattan] distance of two, which is what is required to go diagonally.
The path would be similar to the one listed above, i.e. doing lines, back-and-forth (only here lines of variable length), then moving to the next "panel", in the third dimension, and repeating, only going "upward" etc.
(0,0,0) (0,0,1)
(0,1,0) first diagonal, only 1 in lengh.
(0,2,0) "turn around"
(0,1,1) (0,0,2) second diagonal: 2 in length
(0,0,3) "turn around"
(0,1,2) (0,2,1) (0,3,0) third diagonal: 3 in length
(0,4,0) turn around
etc.
It is indeed possible to mix-and-match these approaches, both at the level of complete "panel", for example doing one diagonally and the next one horizontally, as well as within a given panel, for example starting diagonally, but when on the top line, proceeding with the horizontal pattern, simply stopping a bit earlier when looping on the "left" side, since part of that side has been handled with the diagonals.
Effectively this allows a rather big (but obviously finite) number of ways to "paint" the whole space. The key thing is to avoid leaving (too many) non painted adjacent area behind, for getting back to them may either bring us to a dead-end or may require a "jump" of more than 2.
Maybe you can generalize Gray Codes, which seem to solve a special case of the problem.
Seems trivial at first but once started, it is tricky! Especially the steps can be 1 or 2.
This is not an answer but more of a demostration of the first 10+ steps for a particular sequence which hopefully can help others to visualise. Sid, please let me know if the following is wrong:
s = No. of steps from the prev corrdinates
c1 = Condition 1 (x = y = z)
c2 = Condition 2 (x!= y!= z)
(x,y,z) s c1 c2
---------------
(0,0,0) * (start)
(0,0,1) 1
(0,1,0) 2
(1,0,0) 2
(1,0,1) 1
(1,1,0) 2
(1,1,1) 1 *
(2,1,1) 1
(2,0,1) 1 *
(2,0,0) 1
(2,1,0) 1 *
(2,2,0) 1
(2,2,1) 1
(2,2,2) 1 *
(2,3,2) 1
(2,3,3) 1
(3,3,3) 1 *
(3,3,1) 2
(3,2,1) 1 *
(3,2,0) 1 *
.
.
.