I have a matrix of values with thousands of rows and a couple dozen columns. For a given row, $$R_0$$, I'd like to find all other complementary rows. A complementary row is defined as:
if given row has a non-zero value for a column, then the complement must have a zero value for that column
the sum of the elements of a given row and its complements must be less than 1.0
To illustrate, here is a toy matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0.1816416 0 0.1796779
[2,] 0.1889351 0 0 0 0 0
[3,] 0 0 0.1539683 0 0 0.1983812
[4,] 0 0.155489 0.1869410 0 0 0
[5,] 0 0 0 0 0.1739382 0
For row 1, there are values for columns 4 and 6. A complementary row must have "0" for columns 4 and 6.
I don't know what data structure my desired output should be. But I know the output should tell me:
row 1 has the following complementary rows: 2, 4, 5
row 2 has the following complementary rows: 1, 3, 4, 5
row 3 has the following complementary rows: 2, 5
row 4 has the following complementary rows: 1, 2, 5
row 5 has the following complementary rows: 1, 2, 3, 4
Perhaps a list of lists? I.e.:
[1: 2, 4, 5;
2: 1, 3, 4, 5;
3: 2, 5;
4: 1, 2, 5;
5: 1, 2, 3, 4]
But I'm open to other data structures.
The following code generates the toy matrix above.
set.seed(1)
a = runif(n=30, min=0, max=0.2)
a[a<0.15] = 0
A = matrix(a, # the data elements
nrow=5, # number of rows
ncol=6, # number of columns
byrow = TRUE) # fill matrix by rows
Is there a package or clever way to approach this problem?
We can create a function to check whether a pair of rows is complementary:
check_compliment <- function(x, y) {
all(A[y, A[x,] != 0] == 0) & sum(c(A[x, ], A[y, ])) < 1
}
Here, we subset row y to the columns where row x is non-zero and check that all of those values are 0. We also check that the sum of rows x and y is less than 1.
Apply this function to every pair of rows using outer:
sapply(data.frame(outer(1:nrow(A), 1:nrow(A), Vectorize(check_compliment))), which)
#$X1
#[1] 2 4 5
#$X2
#[1] 1 3 4 5
#$X3
#[1] 2 5
#$X4
#[1] 1 2 5
#$X5
#[1] 1 2 3 4
The outer step gives a TRUE/FALSE value for every pair of rows, indicating whether the two rows are complements:
outer(1:nrow(A), 1:nrow(A), Vectorize(check_compliment))
# [,1] [,2] [,3] [,4] [,5]
#[1,] FALSE TRUE FALSE TRUE TRUE
#[2,] TRUE FALSE TRUE TRUE TRUE
#[3,] FALSE TRUE FALSE FALSE TRUE
#[4,] TRUE TRUE FALSE FALSE TRUE
#[5,] TRUE TRUE TRUE TRUE FALSE
We convert this to a data frame and use which to get the indices of the TRUE values in every column.
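With thousands of rows, calling an R function once per pair of rows can get slow. One possible alternative, sketched here on the assumption that an n-by-n matrix fits in memory, computes every pairwise overlap with a single matrix product. The names nz, overlap, total, comp and complements are illustrative only, and the toy matrix is typed in by hand from the question.

```r
# Toy matrix from the question, entered literally
A <- matrix(c(
  0,         0,        0,         0.1816416, 0,         0.1796779,
  0.1889351, 0,        0,         0,         0,         0,
  0,         0,        0.1539683, 0,         0,         0.1983812,
  0,         0.155489, 0.1869410, 0,         0,         0,
  0,         0,        0,         0,         0.1739382, 0
), nrow = 5, byrow = TRUE)

# 1 where a cell is non-zero, 0 otherwise
nz <- (A != 0) * 1

# overlap[i, j] = number of columns in which rows i and j are both non-zero
overlap <- tcrossprod(nz)

# total[i, j] = sum of all elements of row i plus all elements of row j
total <- outer(rowSums(A), rowSums(A), "+")

# Complementary pairs: no shared non-zero column and combined sum below 1
comp <- overlap == 0 & total < 1
diag(comp) <- FALSE  # a row is never reported as its own complement

# One integer vector of complementary row indices per row
complements <- lapply(seq_len(nrow(comp)), function(i) which(comp[i, ]))
```

For the toy matrix, complements[[1]] holds 2, 4 and 5, which agrees with the outer-based result above.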
I have two arrays.
Using numpy.append we can merge two arrays.
How can we do the same thing in R?
merge cannot do that.
Python Output/Example:
a=np.array([1,2,3,4,5,5])
b=np.array([0,0,0,0,0,0])
np.append(a,b)
array([1, 2, 3, 4, 5, 5, 0, 0, 0, 0, 0, 0]) # this is what I want
x<-c(mat , (0.0) * (l - length(demeaned)))
mat is a matrix (size 20)
l - length(demeaned) is 10
I want the result to have size 30 at the end
The c function concatenates its arguments. A vector can be a concatenation of numbers or of other vectors:
a = c(1,2,3,4,5,5)
b = c(0,0,0,0,0,0)
c(a,b)
[1] 1 2 3 4 5 5 0 0 0 0 0 0
At least for one-dimensional arrays like those in your Python example, this is equivalent to np.append.
Adding to the previous answer, you can use rbind or cbind to create two-dimensional arrays (matrices) from simple arrays (vectors):
cbind(a,b)
# output
a b
[1,] 1 0
[2,] 2 0
[3,] 3 0
[4,] 4 0
[5,] 5 0
[6,] 5 0
or
rbind(a,b)
# output
[,1] [,2] [,3] [,4] [,5] [,6]
a 1 2 3 4 5 5
b 0 0 0 0 0 0
If you want to convert it back to vector, use as.vector. This
as.vector(rbind(a,b))
will give you a joined vector with alternating elements.
Also, note that c can flatten lists if you use the recursive=TRUE argument:
a <- list(1,list(1,2,list(3,4)))
b <- 10
c(a,b, recursive = TRUE)
# output
[1] 1 1 2 3 4 10
Finally, you can use rep to generate sequences of repeating numbers:
rep(0,10)
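Putting c and rep together gives the zero padding the question asks for. This is a minimal sketch: values and target_len are hypothetical stand-ins for the question's mat and l, which are not shown in full.

```r
# Hypothetical stand-ins for the question's objects: 20 values, target length 30
values <- seq_len(20)
target_len <- 30

# c() concatenates its arguments; rep() supplies the missing zeros
padded <- c(values, rep(0, target_len - length(values)))
length(padded)
```

Note that multiplying 0.0 by a count, as in the question's attempt, yields a single zero rather than that many zeros, which is why rep is used here.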
I want to compare each value of a row of a data.frame to its corresponding value in a vector. Here is an example:
df1 <- matrix(c(2,2,4,8,6,9,9,6,4), ncol = 3)
df2 <- c(5,4,6)
> df1
[,1] [,2] [,3]
[1,] 2 8 9
[2,] 2 6 6
[3,] 4 9 4
> df2
[1] 5 4 6
The comparison would be, if a value in a row of df1 is smaller than its corresponding value in df2, so row1: 2 < 5, 8 < 5, 9 < 5; row2: 2 < 4, 6 < 4, 6 < 4; row3: 4 < 6, 9 < 6, 4 < 6
> result
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] TRUE FALSE FALSE
[3,] TRUE FALSE TRUE
Is there any way to do this without use of a loop?
Thanks lads!
We can just do a comparison to create the logical matrix
df1 < df2
# [,1] [,2] [,3]
#[1,] TRUE FALSE FALSE
#[2,] TRUE FALSE FALSE
#[3,] TRUE FALSE TRUE
This works because of vector recycling. A matrix is stored column by column, and the vector 'df2' is recycled down each column in turn, so its i-th element is compared with every value in row i of 'df1'.
To make that row-wise pairing explicit instead of relying on recycling, we can index the vector by row number
df1 < df2[row(df1)]
# [,1] [,2] [,3]
#[1,] TRUE FALSE FALSE
#[2,] TRUE FALSE FALSE
#[3,] TRUE FALSE TRUE
Another option is sweep, where the margin 1 means that 'df2' is applied along the rows
sweep(df1, 1, df2, "<")
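Because the example matrix is square, it hides the difference between pairing the vector with rows and pairing it with columns. The sketch below uses a made-up 2-by-3 matrix (m, thr_row and thr_col are illustrative names) to show both directions.

```r
m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)

# One threshold per ROW: plain recycling already pairs thr_row[i] with row i
thr_row <- c(4, 3)
by_row <- m < thr_row

# One threshold per COLUMN: index by column number, or use sweep with margin 2
thr_col <- c(2, 3, 7)
by_col <- m < thr_col[col(m)]
by_col_sweep <- sweep(m, 2, thr_col, "<")
```

Here by_col and by_col_sweep contain the same values; which form reads better is largely a matter of taste.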
I have data in the form of an n*n matrix, and I want to do some computations (e.g. sum) on the elements that lie between the two diagonals (excluding the diagonals themselves).
For example for this matrix:
[,1] [,2] [,3] [,4] [,5]
[1,] 2 0 1 4 3
[2,] 5 3 6 0 4
[3,] 3 5 2 3 1
[4,] 2 1 5 3 2
[5,] 1 4 3 4 1
The result for sum (between diagonal elements) would be:
# left slice 5+3+2+5 = 15
# bottom slice 4+3+4+5 = 16
# right slice 4+1+2+3 = 10
# top slice 0+1+4+6 = 11
# dput(m)
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))
How to accomplish that efficiently?
Here's how you can get the "top slice":
sum(m[lower.tri(m)[nrow(m):1,] & upper.tri(m)])
#[1] 11
to visualize it:
lower.tri(m)[nrow(m):1,] & upper.tri(m)
# [,1] [,2] [,3] [,4] [,5]
#[1,] FALSE TRUE TRUE TRUE FALSE
#[2,] FALSE FALSE TRUE FALSE FALSE
#[3,] FALSE FALSE FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE FALSE FALSE
Here's how you can compute all 4 of the slices:
up <- upper.tri(m)
lo <- lower.tri(m)
n <- nrow(m)
# top
sum(m[lo[n:1,] & up])
# left
sum(m[lo[n:1,] & lo])
# right
sum(m[up[n:1,] & up])
# bottom
sum(m[up[n:1,] & lo])
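The same four masks can also be written directly in terms of row and column indices, which some readers may find easier to check. This is a sketch for a square matrix; slice_sums is a hypothetical helper name, not part of base R.

```r
# Matrix from the question
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))

slice_sums <- function(m) {
  n <- nrow(m)
  i <- row(m)                       # row index of every cell
  j <- col(m)                       # column index of every cell
  above_main <- i < j               # strictly above the main diagonal
  below_main <- i > j               # strictly below the main diagonal
  above_anti <- i + j < n + 1       # strictly above the anti-diagonal
  below_anti <- i + j > n + 1       # strictly below the anti-diagonal
  c(top    = sum(m[above_main & above_anti]),
    left   = sum(m[below_main & above_anti]),
    right  = sum(m[above_main & below_anti]),
    bottom = sum(m[below_main & below_anti]))
}

slices <- slice_sums(m)
```

For the example matrix this gives 11, 15, 10 and 16 for top, left, right and bottom, matching the sums listed in the question.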
sum(sapply(1:dim(m)[[2L]], function(i) sum(m[c(-i,-(dim(m)[[1L]]-i+1)),i])))
This goes column by column, and for each column takes out the diagonal elements and sums the rest. These partial results are then summed up.
I believe this would be fast because we go column by column and matrices in R are stored column by column (i.e. it will be CPU cache friendly). We also do not have to produce a large vector of indices, only a vector of two indices (those taken out) for each column.
EDIT: I read the question again more carefully. The code can be updated to produce a list of four values for each element in sapply, one for each of the regions. The idea stays the same: for a large matrix, it will be fast if you go column by column, not jumping back and forth between columns.
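One way the per-region variant described in the EDIT might look is sketched below. It is an illustration under the assumption of a square matrix, not the answerer's own code, and slice_sums_by_column is a hypothetical name. In column i the two diagonal cells sit in rows i and n - i + 1; everything above both belongs to the top slice, everything below both to the bottom slice, and anything in between to the left or right slice depending on which half the column is in.

```r
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))

slice_sums_by_column <- function(m) {
  n <- nrow(m)
  out <- c(top = 0, left = 0, right = 0, bottom = 0)
  for (i in seq_len(n)) {
    lo <- min(i, n - i + 1)          # upper of the two diagonal rows
    hi <- max(i, n - i + 1)          # lower of the two diagonal rows
    col_i <- m[, i]
    # rows above both diagonals
    out["top"] <- out["top"] + sum(col_i[seq_len(lo - 1)])
    # rows below both diagonals
    out["bottom"] <- out["bottom"] + sum(col_i[seq_len(n - hi) + hi])
    # rows strictly between the diagonals, if any
    between <- if (hi - lo > 1) sum(col_i[(lo + 1):(hi - 1)]) else 0
    side <- if (i < n - i + 1) "left" else "right"
    out[side] <- out[side] + between
  }
  out
}

by_column <- slice_sums_by_column(m)
```

Whether this is actually faster than the logical-mask approach would need to be measured on a matrix of realistic size.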
I have a two-column matrix and I want to produce a new matrix/data.frame where column N has 1 if it holds the row maximum and 0 otherwise (the two values are never equal). This is my attempt:
testM <- matrix(c(1,2,3, 1,1,5), ncol = 2, byrow = T)
>testM
V1 V2
1 1 2
2 3 1
3 1 5
apply(data.frame(testM), 1, function(row) ifelse(max(row[1],row[2]),1,0))
I expect to have:
0 1
1 0
0 1
because of the 1,0 arguments passed to ifelse(), but I just get
[1] 1 1 1
Any ideas?
Or using pmax (the double unary minus coerces the logical matrix to 0/1)
testM <- matrix(c(1,2,3, 1,1,5), ncol = 2, byrow = T)
--(testM==pmax(testM[,1],testM[,2]))
[,1] [,2]
[1,] 0 1
[2,] 1 0
[3,] 0 1
You can perform arithmetic on Booleans in R! Just check whether each element in a row is equal to the row's max value and multiply by 1.
t(apply(testM, 1, function(row) 1*(row == max(row))))
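For context, here is a small sketch of why the attempt in the question returns 1 for every row. The vector r is a made-up stand-in for one row of testM.

```r
r <- c(1, 2)   # stand-in for the first row of testM

# max() returns a single number; ifelse() coerces it to logical,
# and any non-zero number counts as TRUE, so the answer is always 1
attempt <- ifelse(max(r[1], r[2]), 1, 0)

# Comparing every element with the maximum gives one flag per element
per_element <- as.integer(r == max(r))
```

The first expression collapses the row to one value, while the second keeps one 0/1 flag per column, which is what the expected output needs.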
You can use max.col and col to produce a logical matrix:
res <- col(testM) == max.col(testM)
res
[,1] [,2]
[1,] FALSE TRUE
[2,] TRUE FALSE
[3,] FALSE TRUE
If you want it as 0/1, you can do:
res <- as.integer(col(testM) == max.col(testM)) # this removes the dimension
dim(res) <- dim(testM) # puts the dimension back
res
[,1] [,2]
[1,] 0 1
[2,] 1 0
[3,] 0 1
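The question states that the two values are never equal. If that assumption did not hold, note that max.col breaks ties at random by default. The sketch below passes ties.method = "first" to make the choice deterministic and converts to 0/1 by multiplying, which keeps the dimensions.

```r
testM <- matrix(c(1, 2, 3, 1, 1, 5), ncol = 2, byrow = TRUE)

# Column index of the row maximum; with ties, always take the first maximum
winner <- max.col(testM, ties.method = "first")

# Logical matrix times 1L gives an integer matrix of the same shape
res01 <- (col(testM) == winner) * 1L
```

Multiplying by 1L avoids the separate step of restoring dim that as.integer requires.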
I have a matrix with 3 columns. The first column contains either 1 or 0 in each row. I want to delete all rows of the matrix where the first column is equal to zero (in other words, keep only the rows where it is one).
Thanks.
So, say that you have this matrix:
A= matrix(c(1, 2, 3, 0, 3, 5, 1, 3, 8),3,3, byrow=T)
The following command will give you a vector of TRUE/FALSE for each row, depending on whether the 1st column is 1 or not:
A[,1]==1
You can then select only those rows like this:
FILTERED = A[A[,1]==1,]
And you'll then find what you ask for in FILTERED
Try this:
#dummy matrix
x <- matrix(rep(c(1,0,1),4),ncol=3)
x
# [,1] [,2] [,3]
# [1,] 1 0 1
# [2,] 0 1 1
# [3,] 1 1 0
# [4,] 1 0 1
#keep rows where 1st column equals to 1
x[x[,1] == 1,]
# [,1] [,2] [,3]
# [1,] 1 0 1
# [2,] 1 1 0
# [3,] 1 0 1
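One caveat with this kind of subsetting: if only a single row matches, R drops the result to a plain vector unless drop = FALSE is given. The matrix below is made up for illustration so that exactly one row has a 1 in the first column.

```r
# Rows are (1, 2, 5), (0, 3, 6) and (0, 4, 7); only the first row matches
x <- matrix(c(1, 0, 0, 2, 3, 4, 5, 6, 7), ncol = 3)

as_vector <- x[x[, 1] == 1, ]                 # dimensions are dropped
as_matrix <- x[x[, 1] == 1, , drop = FALSE]   # stays a 1 x 3 matrix
```

Using drop = FALSE is a reasonable default when later code expects a matrix regardless of how many rows survive the filter.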