Repetitively taking XOR of consecutive elements - math

Given a binary array of size N
e.g. A[1:N] = 1 0 0 1 0 1 1 1
A new array of size N-1 will be created by taking XOR of 2 consecutive elements.
A'[1:N-1] = 1 0 1 1 1 0 0
Repeat this operation until one element is left.
1 0 0 1 0 1 1 1
1 0 1 1 1 0 0
1 1 0 0 1 0
0 1 0 1 1
1 1 1 0
0 0 1
0 1
1
I want to find the last element left (0 or 1)
One can find the answer by repetitively performing the operation. This approach will take O(N*N) time. Is there a way to solve the problem more efficiently?
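For reference, the repeated operation described above can be sketched directly in Python. This is only the quadratic baseline, not a faster method; the function name `xor_reduce_naive` is an illustrative choice, and the input is assumed to be a non-empty list of 0s and 1s.

```python
def xor_reduce_naive(bits):
    """Repeatedly XOR adjacent elements until one value is left (O(N*N))."""
    row = list(bits)  # work on a copy so the caller's list is untouched
    while len(row) > 1:
        # Each pass produces a row that is one element shorter.
        row = [x ^ y for x, y in zip(row, row[1:])]
    return row[0]


print(xor_reduce_naive([1, 0, 0, 1, 0, 1, 1, 1]))  # 1, matching the rows above
```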

There's a very efficient solution to this problem, which needs just a few lines of code, but it's rather complicated to explain. I'll have a go, anyway.
Suppose you need to reduce a list of, say, 6 numbers that are all zero except for one element. By symmetry, there are just three cases to consider:
1 0 0 0 0 0     0 1 0 0 0 0     0 0 1 0 0 0
1 0 0 0 0       1 1 0 0 0       0 1 1 0 0
1 0 0 0         0 1 0 0         1 0 1 0
1 0 0           1 1 0           1 1 1
1 0             0 1             0 0
1               1               0
In the first case, a single '1' at the edge doesn't really do anything much. It basically just stays put. But in the other two cases, more elements of the list get involved and the situation is more complex. A '1' in the second element of the list produces a result of '1', but a '1' in the third element produces a result of '0'. Is there a simple rule that explains this behaviour?
Yes, there is. Take a look at this:
Row 0: 1
Row 1: 1 1
Row 2: 1 2 1
Row 3: 1 3 3 1
Row 4: 1 4 6 4 1
Row 5: 1 5 10 10 5 1
I'm sure you've seen this before. It's Pascal's triangle, where each row is obtained by adding adjacent elements taken from the row above. The larger numbers in the middle of the triangle reflect the fact that these numbers are obtained by adding together values drawn from a broader subset of the preceding rows.
Notice that in Row 5, the two numbers in the middle are both even, while the other numbers are all odd. This exactly matches the behaviour of the three examples shown above; the XOR of an even number of '1's is '0', and the XOR of an odd number of '1's is '1'.
To make things clearer, let's just consider the parity of the numbers in this triangle (i.e., '1' for odd numbers, '0' for even numbers):
Row 0: 1
Row 1: 1 1
Row 2: 1 0 1
Row 3: 1 1 1 1
Row 4: 1 0 0 0 1
Row 5: 1 1 0 0 1 1
This is actually called a Sierpinski triangle. Where a zero appears in this triangle, it tells us that it doesn't matter if your list has a '1' or a '0' in this position; it will have no effect on the resulting value because if you wrote out the expression showing the value of the final result in terms of all the initial values in your list, this element would appear an even number of times.
Take a look at Row 4, for example. Every element is zero except at the extreme edges. That means if your list has 5 elements, the end result depends only on the first and last elements in the list. (The same applies to any list where the number of elements is one more than a power of 2.)
The rows of the Sierpinski triangle are easy to calculate. As mentioned in oeis.org/A047999:
Lucas's Theorem is that T(n,k) = 1 if and only if the 1's in the binary expansion of k are a subset of the 1's in the binary expansion of n; or equivalently, k AND NOT n is zero, where AND and NOT are bitwise operators.
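As a quick sanity check of the bitwise rule quoted above, here is a small Python sketch that builds the parity rows with `k AND NOT n` and compares them with the real binomial coefficients. The helper name `parity_row` is made up for this illustration, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def parity_row(n):
    """Row n of Pascal's triangle mod 2, using the 'k AND NOT n' test."""
    return [1 if (k & ~n) == 0 else 0 for k in range(n + 1)]


for n in range(6):
    # Cross-check the bitwise rule against the actual binomial coefficients.
    assert parity_row(n) == [comb(n, k) % 2 for k in range(n + 1)]
    print("Row %d:" % n, *parity_row(n))
```

The printed rows should reproduce the parity triangle shown earlier, ending with `Row 5: 1 1 0 0 1 1`.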
So, after that long-winded explanation, here's my code:
def xor_reduction(a):
    n, r = len(a), 0
    for k in range(n):
        b = 0 if k & -n > 0 else 1  # b is the parity of C(n-1, k)
        r ^= b & a.pop()  # note: pop() consumes the caller's list
    return r

assert xor_reduction([1, 0, 0, 1, 0, 1, 1, 1]) == 1
I said it was short. In case you're wondering, the 4th line has k & -n (k AND minus n) instead of k & ~n (k AND not n) because n in this function is the number of elements in the list, which is one more than the row number, and ~(n-1) is the same thing as -n (in Python, at least).

Related

Find the minimum number of non-intersecting lines from a matrix in R

I have a set of lines some of which intersect each other. I can generate an intercept matrix.
  1 2 3 4 5 6
1 0 1 0 1 0 0
2 1 0 1 1 0 0
3 0 1 0 0 1 0
4 1 1 0 0 0 1
5 0 0 1 0 0 0
6 0 0 0 1 0 0
Where 1 = intersects and 0 = does not intersect
For example line 1 intersects with lines 2 and 4.
I would like to produce the minimum number of sets of lines where no lines intersect within the set.
For this example, the best I could come up with is three sets containing:
lines 2, 5, 6
lines 1,3
line 4
I'm programming this in R but I really need a mathematical/conceptual answer to the problem.
If you consider the lines as nodes in a graph and the intersection relation as edges (i.e. your matrix is the adjacency matrix), then you want to assign every vertex to a group, such that two neighboring vertices are not in the same group.
This is equivalent to the vertex coloring problem. A number of algorithms for this problem can be found on the Wikipedia page. Finding an optimal coloring is NP-hard. If an approximate answer is good enough, you can use the greedy approach, which runs in O(V D) time, where V is the number of vertices and D is the maximum vertex degree, and never uses more than D + 1 groups.
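A minimal sketch of the greedy approach, written in Python rather than R since the question asks for the concept. The names `greedy_coloring` and `intersects` are illustrative, and this version scans whole matrix rows, so it costs O(V*V) rather than O(V D). It visits the lines in index order; a different order can give a different (possibly larger) number of groups.

```python
def greedy_coloring(adj):
    """Give each vertex the smallest group number not used by a neighbour."""
    n = len(adj)
    colors = [None] * n
    for v in range(n):
        # Groups already taken by neighbours that have been assigned.
        used = {colors[u] for u in range(n) if adj[v][u] and colors[u] is not None}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


# The intersection matrix from the question (lines 1..6).
intersects = [
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]

groups = {}
for line, color in enumerate(greedy_coloring(intersects), start=1):
    groups.setdefault(color, []).append(line)
print(groups)  # {0: [1, 3, 6], 1: [2, 5], 2: [4]}
```

Lines 1, 2 and 4 all intersect each other, so no solution for this matrix can use fewer than three groups.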

What is the rule of multiple borrowing in binary subtraction?

  11000
-   111
= 10001
Below is the procedure. It seems that when borrowing across multiple positions, the value of a borrowed position never changes?
For instance, in this example the last '0' needs to borrow from a '1'. It finally finds one at the second digit, and that '1' then seems to propagate to the right, feeding every '0' after it with '10'?
Is this the rule?
Can a '1' fill all the following '0's with '10'?
No, the '1' does not fill all the following 0's with '10' in one go. The borrow moves one position at a time:
11000 becomes 1 0 '10' 0 0
that can then become 1 0 1 10 0 [as 10 - 1 = 1 in binary]
this then becomes 1 0 1 1 10
now
  1 0 1 1 10
- 0 0 1 1 1
will be
  1 0 0 0 1
It works the same way as borrowing in ordinary decimal subtraction.
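The borrowing steps can be traced with a short Python sketch of column-by-column subtraction on bit strings. The function name `subtract_binary` is hypothetical, and the sketch assumes the first number is at least as large as the second.

```python
def subtract_binary(minuend, subtrahend):
    """Column-by-column binary subtraction with explicit borrowing."""
    if int(minuend, 2) < int(subtrahend, 2):
        raise ValueError("this sketch assumes minuend >= subtrahend")
    width = max(len(minuend), len(subtrahend))
    a = [int(ch) for ch in minuend.zfill(width)]
    b = [int(ch) for ch in subtrahend.zfill(width)]
    result = []
    for i in range(width - 1, -1, -1):  # rightmost column first
        if a[i] < b[i]:
            j = i - 1
            while a[j] == 0:  # walk left to the nearest '1'
                j -= 1
            a[j] = 0  # the lender drops from 1 to 0
            for m in range(j + 1, i):
                a[m] = 1  # each 0 passed becomes 10, then lends 1: 10 - 1 = 1
            a[i] += 2  # the current column now holds binary 10
        result.append(a[i] - b[i])
    return "".join(str(d) for d in reversed(result))


print(subtract_binary("11000", "111"))  # 10001
```

The inner loop is the "multiple borrow": every 0 between the lender and the current column ends up as 1, which is why it can look as if the '1' filled them all at once.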

Updating 0 vector values based on preceding and successive values

I have a data frame which has a cumulative count for each event (an event in this case being represented by a sequence of 1's in the bin column) with separating values given the value 0 and each event given an ID as such:
bin cumul ID
0 0 0
1 1 3
1 1 3
1 1 3
1 1 3
0 0 0
0 0 0
0 0 0
0 0 0
1 2 2
1 2 2
1 2 2
1 2 2
1 2 2
0 0 0
0 0 0
0 0 0
0 0 0
1 3 1
1 3 1
1 3 1
I want to update the ID column so each non-event (0 in the bin column) is assigned an ID value based on the previous and subsequent ID.
Therefore, if a non-event is preceded and succeeded by events of equal ID values (e.g. both 3) the non-event also carries this ID value (3). However if the non-event is preceded by an event with one value but succeeded with an event with a different value then the first half of the non-event is given an ID value equal to the preceding event and the final half of the non-event is given an ID value equal to the ID value of the succeeding event. Giving the final data frame:
bin cumul ID
0 0 3
1 1 3
1 1 3
1 1 3
1 1 3
0 0 3
0 0 3
0 0 2
0 0 2
1 2 2
1 2 2
1 2 2
1 2 2
1 2 2
0 0 2
0 0 2
0 0 1
0 0 1
1 3 1
1 3 1
1 3 1
If the question were how to fill in the zeros with ID that matched the preceding values, or matched successive values, then you could use na.locf from the zoo-package and it would be a one liner. For this task I think you might reach for the rle function:
rle(dat$ID)
#Run Length Encoding
# lengths: int [1:6] 1 4 4 5 4 3
# values : int [1:6] 0 3 0 2 0 1
Thinking about how to use that result, I came up with an algorithm like:
for each '0' in values:
    assign the first [`length`/2 + .9] values as $values[ idx-1 ]
    assign the next [`length`/2] values as $values[ idx+1 ]
    ( using `rep` will truncate/floor the fractional counts, and adding a number
      slightly less than 1.0 will take care of the edge cases where there are an
      odd number of zeros in a row.)
    ( `sum` on the lengths can recover the correct positions.)
and for the beginning and ending 0-cases:
    replace with successive and preceding values respectively
After considerable debugging effort (and commenting out the debugging cat-calls):
rldat <- rle(dat$ID)
for (nth in seq_along(rldat$lengths)) {  # cat("nth=", nth, "\n")
  if (rldat$values[nth] == 0) {
    if (nth == 1) {  # cat("first value=", rldat$values[nth+1], "\n")
      # leading zeros take the ID of the event that follows
      dat$ID[1:rldat$lengths[nth]] <- rldat$values[nth+1]
    } else {
      if (nth == length(rldat$lengths)) {
        # trailing zeros take the ID of the event that precedes
        dat$ID[(length(dat$ID) - rldat$lengths[nth] + 1):length(dat$ID)] <-
          rldat$values[nth-1]
      } else {
        # cat("seq=", (sum(rldat$lengths[1:(nth-1)])+1):sum(rldat$lengths[1:nth]), "\n")
        # interior zeros: first half from the previous event, rest from the next
        dat$ID[(sum(rldat$lengths[1:(nth-1)]) + 1):sum(rldat$lengths[1:nth])] <-
          c(rep(rldat$values[nth-1], rldat$lengths[nth]/2 + .9),
            rep(rldat$values[nth+1], rldat$lengths[nth]/2))
      }
    }
  }
}
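For comparison, the same run-length idea can be sketched in Python. This is an illustration of the algorithm, not a replacement for the R code; `fill_zero_runs` is a made-up name, and it works on a plain list of IDs rather than a data frame. As in the R version, an odd run of zeros gives the extra element to the preceding ID.

```python
from itertools import groupby


def fill_zero_runs(ids):
    """Replace each run of zeros using the IDs of the neighbouring runs."""
    runs = [(value, len(list(group))) for value, group in groupby(ids)]
    filled = []
    for idx, (value, length) in enumerate(runs):
        if value != 0:
            filled.extend([value] * length)
            continue
        before = runs[idx - 1][0] if idx > 0 else None
        after = runs[idx + 1][0] if idx + 1 < len(runs) else None
        # Leading/trailing zero runs only have one neighbour to copy from.
        # (If the whole input is zeros, both stay None.)
        if before is None:
            before = after
        if after is None:
            after = before
        first_half = (length + 1) // 2  # rounds up for odd lengths
        filled.extend([before] * first_half + [after] * (length - first_half))
    return filled


ids = [0, 3, 3, 3, 3, 0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1]
print(fill_zero_runs(ids))
```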

Determining if each vector element exceeds all previous elements

I need to compare element i with all previous elements i-1,i-2,..., and if i > i-1, i-2, ... return 1, otherwise return 0.
data <- c(10.3,14.3,7.7,15.8,14.4,16.7,15.3,20.2,17.1,7.7,15.3,16.3,19.9,14.4,18.7,20.7)
The result of comparing should be the following.
0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 1
Here's one standard way:
as.integer(cummax(data) == data)
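The value of the first element is 1 here instead of the OP's preferred 0, but that is easy to tweak. Note also that an element that merely equals the running maximum, rather than exceeding it, is flagged 1 as well.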
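The running-maximum idea behind `cummax` can be spelled out as a loop. The sketch below is in Python, not R, and `exceeds_all_previous` is an illustrative name; it uses a strict comparison and gives the first element 0, as the question requests.

```python
def exceeds_all_previous(values):
    """1 where an element is strictly greater than every earlier one, else 0."""
    flags = []
    running_max = None
    for v in values:
        if running_max is None:
            flags.append(0)  # the first element has nothing to exceed
            running_max = v
        else:
            flags.append(1 if v > running_max else 0)
            running_max = max(running_max, v)
    return flags


data = [10.3, 14.3, 7.7, 15.8, 14.4, 16.7, 15.3, 20.2,
        17.1, 7.7, 15.3, 16.3, 19.9, 14.4, 18.7, 20.7]
print(exceeds_all_previous(data))
```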

In R: Sample from a "totals" column, then subtract 1 from sampled column, store value, and resample

I am definitely not an R coder but am trying to stumble my way through this code. I have a dataframe that looks like this--with 200 rows (just 8 shown here).
Ind.ID V1 V2 V3 V4 V5 V6 V7 Captures
1 1 0 0 1 1 0 0 0 2
2 2 0 0 1 0 0 0 1 2
3 3 1 1 0 1 1 0 1 5
4 4 0 0 1 1 0 0 0 2
5 5 1 0 0 0 0 1 0 2
6 6 0 1 1 0 0 0 0 2
7 7 0 0 1 1 1 0 0 3
8 8 1 0 0 0 1 0 0 2
I am trying to sample from the Captures column (which is the sum of the row) and output the Ind.ID value. If there is a 0 in the Captures column, I want it to subtract 1 from i (i=i-1) and resample--to ensure that I get the correct number of samples. I also want to then subtract 1 from the sampled column (i.e., decrease the Captures value by 1 if it was sampled), and then resample. I am trying to get 400 samples (I think the current code will get me only 200, but I can't figure out how to get 400).
I want my output to be
23
45
197
64
.....
Here's my code:
sess1 <- (numeric(200))  # create a place for output
for (i in 1:length(dep.pop$Captures)) {
  if (dep.pop[i, 'Captures'] != 0) {  # if the value of Captures is not 0, sample and
    sample(dep.pop$Captures, size=1, replace=TRUE)  # want to resample the row if Captures >1
    # code here to decrease the value of the sampled Captures column by 1. create new vector for resampling?
  } else {
    if (dep.pop[i, 'Captures'] == 0) {  # if the value of Captures = 0
      i <- i - 1  # decrease the value of i by 1 to ensure 200 samples
      sample(dep.pop$Captures, size=1, replace=TRUE)  # and resample
    }
    # sess1 <-  # store the value from a different column (ID column) that represents the sampled row
  }
}
Thanks!
Assuming sum(dep.pop$Captures) is at least 400 then the following code may meet your needs to sample up to the number of captures for each individual id:
sample(rep(dep.pop$Ind.ID, times=dep.pop$Captures), size=400)
If you wish to sample with replacement (so you do not need to worry about the total number of captures) but still want to use the number of captures per individual id as sampling weights, then perhaps
sample(dep.pop$Ind.ID, size=400, replace=TRUE, prob=dep.pop$Captures)
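To show why the `rep` trick works, here is a rough Python equivalent of both one-liners. It is a sketch, not R: the function names are hypothetical, and the data are the eight rows shown in the question rather than the full 200. Each ID is repeated once per capture, so drawing without replacement from that pool can never pick an individual more often than it was captured.

```python
import random


def sample_without_replacement(ind_ids, captures, size, seed=None):
    """Draw IDs so that no ID appears more often than its capture count."""
    rng = random.Random(seed)
    # One pool entry per capture, like rep(Ind.ID, times=Captures) in R.
    pool = [ind for ind, count in zip(ind_ids, captures) for _ in range(count)]
    if size > len(pool):
        raise ValueError("not enough captures to draw that many samples")
    return rng.sample(pool, size)


def sample_weighted(ind_ids, captures, size, seed=None):
    """Draw IDs with replacement, using the capture counts as weights."""
    rng = random.Random(seed)
    return rng.choices(ind_ids, weights=captures, k=size)


ind_ids = [1, 2, 3, 4, 5, 6, 7, 8]
captures = [2, 2, 5, 2, 2, 2, 3, 2]
print(sample_without_replacement(ind_ids, captures, size=10, seed=1))
print(sample_weighted(ind_ids, captures, size=10, seed=1))
```

Rows with zero captures need no special handling in either version: they add nothing to the pool and have zero weight.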
