Binary data fusion - math

Can you please advise a way to do binary data fusion?
Here is the task:
There are n sources (n is odd) of binary labels (0 or 1), so every data "frame" contains n labels. The task is to produce a single label per frame by fusing all of its labels. For example:
S1 0 0 0 1 1 1 0 0 0 1 1 0
S2 0 0 1 1 1 1 1 0 0 1 1 1
S3 0 0 0 0 1 1 1 0 0 0 1 0
--------------------------
0 0 0 1 1 1 1 0 0 1 1 0
Majority voting was used in this case: 0 0 0 -> 0; 1 1 0 -> 1, etc.
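A minimal R sketch of the per-frame majority vote, storing the three label streams above as a sources-by-frames matrix. The names `S` and `majority_vote` are illustrative and not from the question.

```r
# The three label streams from the example: one row per source, one column per frame
S <- rbind(
  S1 = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0),
  S2 = c(0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1),
  S3 = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0)
)

# Majority vote per frame: with an odd number of sources there are no ties,
# so a frame gets label 1 exactly when more than half of the sources say 1
majority_vote <- function(S) as.integer(colSums(S) > nrow(S) / 2)

majority_vote(S)
# returns 0 0 0 1 1 1 1 0 0 1 1 0
```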
Majority voting can also be extended in the horizontal direction, so that the vote for every i-th frame is taken over k frames (frames i to i+k-1, i.e. n*k labels). E.g. for k=3:
F1 round( (0+0+0+0+0+0+0+1+0) / 9) = 0
F2 round( (0+0+0+0+1+0+1+1+0) / 9) = 0
F3 round( (0+1+0+1+1+0+1+1+1) / 9) = 1 # was 0
F4 round( (1+1+0+1+1+1+1+1+1) / 9) = 1
..
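A sketch of the windowed variant. It assumes, as the F1 to F4 sums suggest, that the window for frame i covers frames i to i+k-1, and it only returns results for frames where a complete window exists (how to treat the last k-1 frames is not specified in the question). `windowed_vote` is a hypothetical name. Comparing the window sum with half the window size gives the same answer as `round(sum / (n*k))` whenever n*k is odd, because the ratio can then never be exactly 0.5.

```r
S <- rbind(
  S1 = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0),
  S2 = c(0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1),
  S3 = c(0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0)
)

# Majority vote over a window of k consecutive frames, all n sources included:
# frame i uses frames i, i+1, ..., i+k-1, i.e. n*k labels in total
windowed_vote <- function(S, k) {
  starts <- seq_len(ncol(S) - k + 1)   # only frames with a complete window
  sapply(starts, function(i) {
    block <- S[, i:(i + k - 1), drop = FALSE]
    as.integer(sum(block) > length(block) / 2)
  })
}

windowed_vote(S, 3)
# returns 0 0 1 1 1 1 0 0 1 1
```

With k = 1 this reduces to the plain per-frame majority vote.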
Are there any other fusion schemes that come to your mind?
Thank you!

It looks to me like you might be interested in:
The tradeoff between reliability, consistency and availability. You can read about it with Amazon's Dynamo as an example.
Forward Error Correction


mlogit gives error: the two indexes don't define unique observations

My dataframe named longData looks like:
ID Set Choice Apple Microsoft IBM Google Intel HewlettPackard Sony Dell Yahoo Nokia
1 1 1 0 1 0 0 0 0 0 0 0 0 0
2 1 2 0 0 1 0 0 0 0 0 0 0 0
3 1 3 0 0 0 1 0 0 0 0 0 0 0
4 1 4 1 0 0 0 1 0 0 0 0 0 0
5 1 5 0 0 0 0 0 0 0 0 0 0 1
6 1 6 0 -1 0 0 0 0 0 0 0 0 0
I am trying to run mlogit on it by:
logitModel = mlogit(Choice ~ Apple+Microsoft+IBM+Google+Intel+HewlettPackard+Sony+Dell+Yahoo+Nokia | 0, data = longData, shape = "long")
it gives the following error:
Error in dfidx::dfidx(data = data, dfa$idx, drop.index = dfa$drop.index, :
the two indexes don't define unique observations
After some digging, I found that this error is raised by dfidx in the following check:
z <- data[, c(posid1[1], posid2[1])]
if (nrow(z) != nrow(unique(z)))
stop("the two indexes don't define unique observations")
But when I call the following, it runs without error and returns two indexes that uniquely identify each row of the data frame:
dfidx(longData)$idx
This gives the expected output:
~~~ indexes ~~~~
ID Set
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
7 1 7
8 1 8
9 1 9
10 1 10
indexes: 1, 2
So what am I doing wrong? I looked at some related questions (1, 2) but couldn't find what I am missing.
It looks like your example comes from here: https://docs.displayr.com/wiki/MaxDiff_Analysis_Case_Study_Using_R
The code seems outdated: I remember it working for me, but it doesn't anymore.
The error message is valid because every pair (ID, Set) appears several times, once for each alternative.
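To see why, here is a small base-R sketch on a hypothetical toy data set (one respondent, two choice sets, three alternatives per set) that applies the same uniqueness test as the dfidx code quoted in the question. The helper `idx_unique` and the toy data are illustrative only; with a single respondent, `Set` alone identifies the choice situation.

```r
# Hypothetical long-format data: one respondent (ID 1), two choice sets,
# three alternatives per set, so each (ID, Set) pair occurs three times
toy <- data.frame(ID = 1, Set = rep(1:2, each = 3), Choice = c(0, 1, 0, 1, 0, 0))

# Same check as in dfidx: do the two index columns identify every row uniquely?
idx_unique <- function(d, cols) nrow(unique(d[, cols, drop = FALSE])) == nrow(d)

idx_unique(toy, c("ID", "Set"))          # FALSE: pairs repeat once per alternative

# Number the alternatives within each set, as the answer's code does
n_alts_per_set <- 3
toy$Alternative <- 1 + (0:(nrow(toy) - 1) %% n_alts_per_set)

idx_unique(toy, c("Set", "Alternative")) # TRUE: (choice set, alternative) is unique
```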
However, this works:
library(mlogit)
# convert Choice to logical; otherwise mlogit complains that it can't be coerced to logical
longData$Choice <- as.logical(longData$Choice)
# number of alternatives in each choice set (5 in this example)
nAltsPerSet <- 5
# number the alternatives 1..nAltsPerSet within each set (assumes rows are ordered set by set)
longData$Alternative <- 1 + (0:(nrow(longData) - 1) %% nAltsPerSet)
# define dataset
mdata <- mlogit.data(data = longData, shape = "long", choice = "Choice", alt.var = "Alternative", id.var = "ID")
# model (Apple is left out of the formula, presumably as the reference level)
logitModel = mlogit(Choice ~ Microsoft+IBM+Google+Intel+HewlettPackard+Sony+Dell+Yahoo+Nokia | 0,
  data = mdata
)
summary(logitModel)

Find all m-tuples that sum to n

I want to find ALL the non-negative integer solutions to the equation i+j+k+l+m=n where n is a non-negative integer. That is, I want to find all possible 5-tuples (i,j,k,l,m) with respect to a certain n, in R.
I wrote some code, but it doesn't work. I suspect something is wrong with the loops.
For convenience I have taken n=3, so I am trying to compute all 35 vectors (i,j,k,l,m); the 35-by-5 matrix a is supposed to hold those vectors. Everything is in the function sample(n), so calling sample(3) should give me the matrix a. Note that a (35 by 5) is defined beforehand with all entries 0.
sample=function(n){
i=0
j=0
k=0
l=0
m=0
for(p in 1:35){
while(i<=3){
while(j<=3){
while(k<=3){
while(l<=3){
m=n-(i+j+k+l)
if(m>-1){
a[p,]=c(i,j,k,l,m)
}
l=l+1}
k=k+1}
j=j+1}
i=i+1}
}
return(a)
}
When I call sample(3), I get my original a, i.e. the matrix with all elements 0. What is wrong with this code? Please correct it.
I don't think a brute-force approach will bring you much joy for this task. Instead, you should look for existing functions that are efficient (i.e. implemented in C/C++).
n <- 3
library(partitions)
blockparts(rep(n, 5), n)
#[1,] 3 2 1 0 2 1 0 1 0 0 2 1 0 1 0 0 1 0 0 0 2 1 0 1 0 0 1 0 0 0 1 0 0 0 0
#[2,] 0 1 2 3 0 1 2 0 1 0 0 1 2 0 1 0 0 1 0 0 0 1 2 0 1 0 0 1 0 0 0 1 0 0 0
#[3,] 0 0 0 0 1 1 1 2 2 3 0 0 0 1 1 2 0 0 1 0 0 0 0 1 1 2 0 0 1 0 0 0 1 0 0
#[4,] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 2 2 2 3 0 0 0 0 0 0 1 1 1 2 0 0 0 1 0
#[5,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 3
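For small n, a base-R cross-check that needs no extra package: enumerate every 5-tuple with entries in 0..n via expand.grid and keep those summing to n. This is exactly the brute force warned against above ((n+1)^5 candidate rows), so treat it only as a sanity check. The count should match the stars-and-bars formula choose(n + 4, 4).

```r
n <- 3
# All (n + 1)^5 candidate tuples with entries 0..n, one per row
g <- expand.grid(rep(list(0:n), 5))
# Keep the rows whose entries sum to n
sols <- as.matrix(g[rowSums(g) == n, ])

nrow(sols)        # 35
choose(n + 4, 4)  # 35, the number of non-negative solutions of i+j+k+l+m = n
```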
I believe that your code doesn't answer your stated problem (as I understand it), in addition to possibly containing errors.
One way to think of the problem: given the quadruple (i,j,k,l), m is fixed at m = n - (i + j + k + l), and the quadruple is constrained so that i+j+k+l <= n and i,j,k,l >= 0. For example, consider the following algorithm:
Let i freely take any value between 0 and n.
Given i, j can take values between 0 and n-i.
Given (i,j), k takes values between 0 and n-i-j.
Given (i,j,k), l takes values between 0 and n-i-j-k.
Given (i,j,k,l), m is defined as m = n - i - j - k - l.
The following code ought to answer your question. Please comment if this is not what you were looking for.
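sample.example = function(n){
  # start with an empty 0-by-5 matrix and append one row per solution
  a = array(0, c(0, 5))
  for(i in 0:n){
    for(j in seq(from=0, to=n-i, by=1)){         # j ranges over 0..n-i
      for(k in seq(from=0, to=n-i-j, by=1)){     # k ranges over 0..n-i-j
        for(l in seq(from=0, to=n-i-j-k, by=1)){ # l ranges over 0..n-i-j-k
          m = n - i - j - k - l                  # m is then fully determined
          a = rbind(a, c(i, j, k, l, m))
        }
      }
    }
  }
  return(a)
}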
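The same idea (pick one component, then distribute what is left over the remaining ones) generalizes to any tuple length m by recursion instead of a fixed number of nested loops. This is a sketch; the function name `tuples_summing_to` is hypothetical, and like the loop version it grows the result in plain R, so it is meant for modest n and m.

```r
# All non-negative integer m-tuples that sum to n, one tuple per row.
# The first entry runs over 0..n; the remaining m - 1 entries must sum to
# whatever is left, which is the same problem one size smaller.
tuples_summing_to <- function(n, m) {
  if (m == 1) return(matrix(n, nrow = 1, ncol = 1))  # a single entry must equal n
  do.call(rbind, lapply(0:n, function(first) {
    rest <- tuples_summing_to(n - first, m - 1)
    cbind(first, rest, deparse.level = 0)              # prepend the first entry to each row
  }))
}

res <- tuples_summing_to(3, 5)
nrow(res)  # 35
res[1, ]   # 0 0 0 0 3
```

Rows come out in the same order as the nested loops: the first entry ascending, then the second, and so on.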

Random binary distribution

I would like to generate a random binary combination (in random row order) in my data frame df:
bin
2
2
2
2
3
2
3
2
In this example I want to generate six 0s (as many as there are 2s) and two 1s (as many as there are 3s). I expect something like this:
bin
0
0
1
0
0
1
0
0
Any ideas? Thank you
So given a vector bin
bin<-c(2,2,2,2,3,2,3,2)
You would like to create a new vector that contains the same number of 0's as the number of 2's in bin, and the same number of 1's as the number of 3's in bin. Assuming that's correct, then
sample(rep(0:1, table(bin)))
Should do the trick. Here are the results of running that command several times:
# 0 0 0 0 1 1 0 0
# 0 0 0 1 0 0 1 0
# 0 0 0 1 0 0 1 0
# 0 0 1 0 1 0 0 0
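The one-liner relies on table(bin) listing the counts of 2 and 3 in that order. A slightly more explicit sketch, assuming bin only ever contains 2 and 3 and that the result should go back into df (the column name `bin01` is hypothetical): recode 3 to 1 and 2 to 0, then let sample() shuffle the recoded values, which keeps the counts and only randomizes the positions.

```r
df <- data.frame(bin = c(2, 2, 2, 2, 3, 2, 3, 2))

# Recode 3 -> 1 and 2 -> 0, then permute: sample(x) returns x in random order
df$bin01 <- sample(as.integer(df$bin == 3))

table(df$bin01)  # always six 0s and two 1s; only their positions change
```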

r sum several columns by another column

I have a 39-column data frame (with upwards of 100,000 rows) whose last eleven columns look like this (the rest of the columns do not concern my question):
H3K27me3_gross_bin H3K4me3_gross_bin H3K4me1_gross_bin UtoP UtoM UPU UPP UPM UMU UMP UMM
cg00000029 3 3 6 1 1 0 0 0 0 0 0
cg00000321 6 1 5 1 0 0 1 0 0 0 0
cg00000363 6 1 1 1 0 1 0 0 0 0 0
cg00000622 1 2 1 0 0 0 0 0 0 0 0
cg00000714 2 5 6 1 0 0 0 0 0 0 0
cg00000734 2 6 2 0 0 0 0 0 0 0 0
I want to create a matrix that will:
a) for each of the first three columns (H3K27me3_gross_bin, H3K4me3_gross_bin, H3K4me1_gross_bin), count by bin value the number of rows in which UPU, UPP or UPM is 1 (separately for each of these three value columns)
b) sum those counts over UPU, UPP and UPM for each bin value of each of the first three columns
I came up with this incredibly cumbersome way of doing this:
UtoPFrac<-seq(6)
UtoPTotEvents<-seq(6)
for (j in 1:3){
y<-df[,28+j]
for (i in 1:3){
UtoPFrac<-cbind(UtoPFrac,tapply(df[which(is.na(y)==FALSE),33+i],y[which(is.na(y)==FALSE)], function(x) length(which(x==1))))
}
}
UtoPFrac<-UtoPFrac[,2:10]
UtoPEvents<-cbind(rowSums(UtoPFrac[,1:3]),rowSums(UtoPFrac[,4:6]),rowSums(UtoPFrac[,7:9]))
I am certain there is a more elegant way of doing this, probably using aggregate() or ddply(), but I was unable to get it working.
I would appreciate any help doing this more efficiently.
Thanks in advance
Not tested:
library(plyr)
# groups by the combination of all three bin columns
ddply(df, .(H3K27me3_gross_bin, H3K4me3_gross_bin, H3K4me1_gross_bin), summarize,
      UPUl = sum(UPU == 1, na.rm = TRUE), UPPl = sum(UPP == 1, na.rm = TRUE),
      UPMl = sum(UPM == 1, na.rm = TRUE), mysum = sum(UPU + UPP + UPM))
P.S. If you dput the data and provide the expected output, I will test the above code
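Note that ddply groups by the combination of all three bin columns, whereas the loop in the question groups by each bin column separately. Below is a base-R sketch of the latter. It uses the six rows shown in the question and assumes the bins run from 1 to 6 (as seq(6) in the question implies); `count_ones_by` is a hypothetical helper name.

```r
# The six example rows from the question (only the columns used here)
df <- data.frame(
  H3K27me3_gross_bin = c(3, 6, 6, 1, 2, 2),
  H3K4me3_gross_bin  = c(3, 1, 1, 2, 5, 6),
  H3K4me1_gross_bin  = c(6, 5, 1, 1, 6, 2),
  UPU = c(0, 0, 1, 0, 0, 0),
  UPP = c(0, 1, 0, 0, 0, 0),
  UPM = c(0, 0, 0, 0, 0, 0),
  row.names = c("cg00000029", "cg00000321", "cg00000363",
                "cg00000622", "cg00000714", "cg00000734")
)

bin_cols   <- c("H3K27me3_gross_bin", "H3K4me3_gross_bin", "H3K4me1_gross_bin")
value_cols <- c("UPU", "UPP", "UPM")

# Part (a) for one grouping column: a bins-by-value-columns matrix counting
# the rows where each value column equals 1 (rows with a missing bin are dropped)
count_ones_by <- function(df, bin_col, value_cols, bins = 1:6) {
  keep <- !is.na(df[[bin_col]])
  grp  <- factor(df[[bin_col]][keep], levels = bins)  # empty bins still get a row
  sapply(value_cols, function(v)
    tapply(df[[v]][keep] == 1, grp, sum, na.rm = TRUE, default = 0))
}

# One count matrix per grouping column (like UtoPFrac, split into three blocks)
counts <- lapply(setNames(bin_cols, bin_cols), count_ones_by,
                 df = df, value_cols = value_cols)

# Part (b): total over UPU, UPP and UPM for each bin (like UtoPEvents)
totals <- sapply(counts, rowSums)
totals
```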

reverse lexicographic order after using expand.grid

I'm trying to generate the following matrix, based on a multinomial framework. For example, if I had three columns, I'd get:
0 0 0
1 0 0
0 1 0
0 0 1
1 1 0
1 0 1
0 1 1
1 1 1
But I want many more columns. I know I can use expand.grid, like this:
u <- list(0:1)
expand.grid(rep(u,3))
But it returns what I want in the wrong order:
0 0 0
1 0 0
0 1 0
1 1 0
0 0 1
1 0 1
0 1 1
1 1 1
Any ideas? Thanks.
You can reorder your rows to match your expected output:
u <- list(0:1)
g <- expand.grid(rep(u,3))
g <- g[order(rowSums(g)), ]
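The same reordering wrapped into a small function for any number of columns k (`binary_rows` is a hypothetical name). order() is stable, so rows with the same number of 1s keep the order in which expand.grid produced them. For k = 3 that reproduces the matrix in the question exactly; for larger k it is worth checking that this tie order is the one you want.

```r
# Binary matrix with k columns, rows ordered by their number of 1s;
# ties keep expand.grid's order because order() is stable
binary_rows <- function(k) {
  g <- expand.grid(rep(list(0:1), k))
  g <- g[order(rowSums(g)), ]
  rownames(g) <- NULL      # renumber rows 1..2^k after reordering
  as.matrix(g)
}

binary_rows(3)
```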
