I have data in the form of an n*n matrix, and I want to do some computations (e.g. sum) on the elements that lie between the diagonals (excluding the diagonals themselves).
For example for this matrix:
[,1] [,2] [,3] [,4] [,5]
[1,] 2 0 1 4 3
[2,] 5 3 6 0 4
[3,] 3 5 2 3 1
[4,] 2 1 5 3 2
[5,] 1 4 3 4 1
The result for sum (between diagonal elements) would be:
# left slice 5+3+2+5 = 15
# bottom slice 4+3+4+5 = 16
# right slice 4+1+2+3 = 10
# top slice 0+1+4+6 = 11
# dput(m)
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))
How can I accomplish that efficiently?
Here's how you can get the "top slice":
sum(m[lower.tri(m)[nrow(m):1,] & upper.tri(m)])
#[1] 11
to visualize it:
lower.tri(m)[nrow(m):1,] & upper.tri(m)
# [,1] [,2] [,3] [,4] [,5]
#[1,] FALSE TRUE TRUE TRUE FALSE
#[2,] FALSE FALSE TRUE FALSE FALSE
#[3,] FALSE FALSE FALSE FALSE FALSE
#[4,] FALSE FALSE FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE FALSE FALSE
Here's how you can compute all 4 of the slices:
up <- upper.tri(m)
lo <- lower.tri(m)
n <- nrow(m)
# top
sum(m[lo[n:1,] & up])
# left
sum(m[lo[n:1,] & lo])
# right
sum(m[up[n:1,] & up])
# bottom
sum(m[up[n:1,] & lo])
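The four masks above can also be written directly in terms of row and column indices, which may be easier to read. The following is only a sketch under the assumption that m is square; the helper name slice_sums is hypothetical and not part of base R. It relies on the fact that the main diagonal is where row == col and the anti-diagonal is where row + col == n + 1.

```r
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))

# slice_sums is a hypothetical helper name; it assumes a square matrix.
slice_sums <- function(m) {
  stopifnot(nrow(m) == ncol(m))
  n <- nrow(m)
  i <- row(m)                        # row index of every cell
  j <- col(m)                        # column index of every cell
  above_main <- i < j                # strictly above the main diagonal
  below_main <- i > j                # strictly below the main diagonal
  above_anti <- i + j < n + 1        # strictly above the anti-diagonal
  below_anti <- i + j > n + 1        # strictly below the anti-diagonal
  c(top    = sum(m[above_main & above_anti]),
    left   = sum(m[below_main & above_anti]),
    right  = sum(m[above_main & below_anti]),
    bottom = sum(m[below_main & below_anti]))
}

slice_sums(m)
#    top   left  right bottom
#     11     15     10     16
```

Swapping sum for another function (mean, max, ...) inside the helper gives the other computations mentioned in the question.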
sum(sapply(1:dim(m)[[2L]], function(i) sum(m[c(-i,-(dim(m)[[1L]]-i+1)),i])))
This goes column by column, and for each column takes out the diagonal elements and sums the rest. These partial results are then summed up.
I believe this would be fast because we go column by column and matrices in R are stored column by column (i.e. it will be CPU cache friendly). We also do not have to produce a large vector of indices, only a vector of two indices (those taken out) for each column.
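As a quick sanity check (a sketch, not a benchmark), the column-wise expression can be compared against a mask built from row() and col(). Both give the total over all four regions together, not the per-slice sums. The variable names here are only illustrative.

```r
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))
n <- nrow(m)

# Column by column: drop the two diagonal rows of column i, sum the rest.
column_wise <- sum(sapply(seq_len(n), function(i) sum(m[-c(i, n - i + 1), i])))

# Same total via a logical mask that excludes both diagonals.
off_diagonal <- row(m) != col(m) & row(m) + col(m) != n + 1
mask_based <- sum(m[off_diagonal])

c(column_wise = column_wise, mask_based = mask_based)
# both are 52 for this m, i.e. 15 + 16 + 10 + 11
```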
EDIT: I read the question again more carefully. The code can be updated to produce four values per column in the sapply call, one for each of the regions. The idea stays the same: for a large matrix, it will be fast if you go column by column, not jumping back and forth between columns.
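One possible way to spell out that idea is sketched below. It is an assumption about what the per-region version could look like, not the answerer's own code, and slice_sums_by_column is a hypothetical name. In column j the two diagonals sit in rows j and n + 1 - j; everything above both belongs to the top slice, everything below both to the bottom slice, and the rows strictly between them belong to the left slice (left half of the matrix) or the right slice (right half).

```r
m <- structure(c(2, 5, 3, 2, 1, 0, 3, 5, 1, 4, 1, 6, 2, 5, 3, 4, 0,
                 3, 3, 4, 3, 4, 1, 2, 1), .Dim = c(5L, 5L))

# Hypothetical helper: accumulates the four slice sums one column at a time.
slice_sums_by_column <- function(m) {
  n <- nrow(m)
  out <- c(top = 0, left = 0, right = 0, bottom = 0)
  for (j in seq_len(n)) {
    lo <- min(j, n + 1 - j)          # upper of the two diagonal rows
    hi <- max(j, n + 1 - j)          # lower of the two diagonal rows
    column <- m[, j]
    out["top"]    <- out["top"]    + sum(column[seq_len(lo - 1)])
    out["bottom"] <- out["bottom"] + sum(column[seq_len(n - hi) + hi])
    if (hi - lo > 1) {               # rows strictly between the diagonals
      side <- if (j < n + 1 - j) "left" else "right"
      out[side] <- out[side] + sum(column[(lo + 1):(hi - 1)])
    }
  }
  out
}

slice_sums_by_column(m)
#    top   left  right bottom
#     11     15     10     16
```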
Related
I have a matrix of values with thousands of rows and a couple dozen columns. For a given row, $$R_0$$, I'd like to find all other complementary rows. A complementary row is defined as:
if given row has a non-zero value for a column, then the complement must have a zero value for that column
the sum of the elements of a given row and its complements must be less than 1.0
To illustrate, here is a toy matrix:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0.1816416 0 0.1796779
[2,] 0.1889351 0 0 0 0 0
[3,] 0 0 0.1539683 0 0 0.1983812
[4,] 0 0.155489 0.1869410 0 0 0
[5,] 0 0 0 0 0.1739382 0
For row 1, there are values for columns 4 and 6. A complementary row must have "0" for columns 4 and 6.
I don't know what data structure my desired output should be. But I know the output should tell me:
row 1 has the following complementary rows: 2, 4, 5
row 2 has the following complementary rows: 1, 3, 4, 5
row 3 has the following complementary rows: 2, 5
row 4 has the following complementary rows: 1, 2, 5
row 5 has the following complementary rows: 1, 2, 3, 4
Perhaps a list of lists? I.e.:
[1: 2, 4, 5;
2: 1, 3, 4, 5;
3: 2, 5;
4: 1, 2, 5;
5: 1, 2, 3, 4]
But I'm open to other data structures.
The following code generates the toy matrix above.
set.seed(1)
a = runif(n=30, min=0, max=0.2)
a[a<0.15] = 0
A = matrix(a, # the data elements
nrow=5, # number of rows
ncol=6, # number of columns
byrow = TRUE) # fill matrix by rows
Is there a package or clever way to approach this problem?
We can create a function to check whether one row is a complement of another:
check_compliment <- function(x, y) {
all(A[y, A[x,] != 0] == 0) & sum(c(A[x, ], A[y, ])) < 1
}
Here, we subset row y to the columns where row x is not 0 and check that all of those values are 0. We also check that the combined sum of rows x and y is less than 1.
We then apply this function to every combination of rows using outer:
sapply(data.frame(outer(1:nrow(A), 1:nrow(A), Vectorize(check_compliment))), which)
#$X1
#[1] 2 4 5
#$X2
#[1] 1 3 4 5
#$X3
#[1] 2 5
#$X4
#[1] 1 2 5
#$X5
#[1] 1 2 3 4
The outer step gives us a TRUE/FALSE value for every combination of a row with every other row, indicating whether it is a complement:
outer(1:nrow(A), 1:nrow(A), Vectorize(check_compliment))
# [,1] [,2] [,3] [,4] [,5]
#[1,] FALSE TRUE FALSE TRUE TRUE
#[2,] TRUE FALSE TRUE TRUE TRUE
#[3,] FALSE TRUE FALSE FALSE TRUE
#[4,] TRUE TRUE FALSE FALSE TRUE
#[5,] TRUE TRUE TRUE TRUE FALSE
We convert this to a data frame and use which to get the indices for every column.
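For thousands of rows, calling a function once per pair through Vectorize may become slow. A possible alternative, sketched here under the assumption that the pairwise logical matrix fits in memory, counts shared non-zero columns with a single matrix product. The names nz, shared and complements are illustrative, and the toy matrix is typed in from the values shown in the question. The diagonal is set to FALSE on the assumption that a row should never be listed as its own complement.

```r
# Toy matrix from the question, entered row by row.
A <- matrix(c(
  0,         0,        0,         0.1816416, 0,         0.1796779,
  0.1889351, 0,        0,         0,         0,         0,
  0,         0,        0.1539683, 0,         0,         0.1983812,
  0,         0.155489, 0.1869410, 0,         0,         0,
  0,         0,        0,         0,         0.1739382, 0
), nrow = 5, byrow = TRUE)

nz <- (A != 0) + 0                   # 1 where a value is non-zero, else 0
shared <- tcrossprod(nz)             # shared[i, j] = number of columns where both rows are non-zero
row_totals <- rowSums(A)
is_complement <- shared == 0 & outer(row_totals, row_totals, "+") < 1
diag(is_complement) <- FALSE         # assumption: a row is not its own complement

complements <- lapply(seq_len(nrow(A)), function(i) which(is_complement[i, ]))
complements[[1]]
# [1] 2 4 5
```

The result is a plain list indexed by row number, which is one way to get the "list of lists" structure the question asks about.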
In R, say you have a matrix A:
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18), nrow=6, ncol=3)
and another matrix B:
B <- matrix(c(1, 2, 3, 4, 5, 6, 7, 9, 11, 13, 15, 17), nrow=6, ncol=2)
and you want to check, row by row, whether each value in A is one of the values in the corresponding row of B, testing each value separately.
E.g. You would like to see if the values in the first row of A (1, 7, 13) are equal to either 1 or 7 (first row of B).
How could you do that?
My problem is that the two matrices are not of equal size, and I would like to get a matrix the size of A which contains either TRUE or FALSE.
E.g.: The first line of this result matrix C would be (TRUE, TRUE, FALSE), since 1 is equal to 1 or 7, 7 is equal to 1 or 7, but 13 is not equal to 1 or 7.
I have tried solutions with %in%, but since I am pretty new to R I couldn't find out how to apply it to every row and not just the first one. A solution with "==" didn't work either, because the matrices don't have the same size and I want to compare every cell of A with both numbers in the same row of B, not cell by cell.
I really appreciate your help!
You could use sapply() over the row indices of A like so:
sapply(seq_len(nrow(A)), function(x) A[x, ] %in% B[x, ])
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] TRUE TRUE TRUE TRUE TRUE TRUE
#> [2,] TRUE FALSE FALSE FALSE FALSE FALSE
#> [3,] FALSE FALSE FALSE FALSE FALSE FALSE
This results in a matrix where each column corresponds to a row in A. To transpose it you can use t():
t(sapply(seq_len(nrow(A)), function(x) A[x, ] %in% B[x, ]))
#> [,1] [,2] [,3]
#> [1,] TRUE TRUE FALSE
#> [2,] TRUE FALSE FALSE
#> [3,] TRUE FALSE FALSE
#> [4,] TRUE FALSE FALSE
#> [5,] TRUE FALSE FALSE
#> [6,] TRUE FALSE FALSE
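A possible alternative that avoids looping over rows is sketched below. It assumes B has only a few columns: A is compared against each column of B in turn (the column is recycled down the columns of A, so cell [i, j] is compared with B[i, k]), and the resulting logical matrices are combined with "|". The name C follows the question; the rest is illustrative.

```r
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18), nrow = 6, ncol = 3)
B <- matrix(c(1, 2, 3, 4, 5, 6, 7, 9, 11, 13, 15, 17), nrow = 6, ncol = 2)

# One logical matrix per column of B, then an element-wise OR across them.
per_column <- lapply(seq_len(ncol(B)), function(k) A == B[, k])
C <- Reduce(`|`, per_column)

C[1, ]
# [1]  TRUE  TRUE FALSE
dim(C)
# [1] 6 3
```

This works because nrow(A) equals nrow(B), so each column of B lines up with the rows of A without any partial recycling.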
I am trying to map matching columns between 2 matrices. For simplicity, I have 2 simple matrices, a and b:
a <- matrix(c(1, 2), nrow = 2, ncol = 2)
b <- matrix(c(1,2,1,2,3:8), nrow = 2, ncol = 5)
> a
[,1] [,2]
[1,] 1 1
[2,] 2 2
> b
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 3 5 7
[2,] 2 2 4 6 8
I want to create a vector of length ncol(a) = 2, i.e.
> out
[1] 1 2
Where the first element of out is the column number in b that matches the first column of a, and the second element of out is the column number in b that matches the second column in a. I have tried
> match(data.frame(a), data.frame(b))
[1] 1 1
but I need each element of the resulting vector to be unique. Probably simple solution, but I am not seeing it. Thanks!
Maybe you are looking for something like intersect.
a <- matrix(c(10, 20), nrow = 2, ncol = 2)
b <- matrix(c(10,20,1,2,3:6,10,20), nrow = 2, ncol = 5)
#> b
# [,1] [,2] [,3] [,4] [,5]
#[1,] 10 1 3 5 10
#[2,] 20 2 4 6 20
#Finding matching columns in b from a. Only 1st column of a is considered
matched <- b[,1:ncol(b)] == a[,1:1]
#> matched
# [,1] [,2] [,3] [,4] [,5]
#[1,] TRUE FALSE FALSE FALSE TRUE
#[2,] TRUE FALSE FALSE FALSE TRUE
desired <- which(matched[1,], arr.ind = TRUE)
#> desired
#[1] 1 5
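The matching columns, 1 and 5, are returned.
I guess I'm not allowed to comment on here. Anyhoo... the above answer by MKR looks good, but I would add this line before creating the "desired" object. It ensures that every element of a column matches (instead of testing the first row only). Since matched then becomes a logical vector rather than a matrix, desired should be computed as which(matched) instead of which(matched[1,], arr.ind = TRUE).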
matched<-sapply(1:ncol(matched),function(x) all(matched[,x]))
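Neither snippet above handles the uniqueness requirement from the question (each column of b used at most once). A minimal sketch of one way to do that follows; match_columns_unique is a hypothetical helper name, and it assumes that a greedy "first unused match" assignment is acceptable and that the matrices contain no NA values.

```r
a <- matrix(c(1, 2), nrow = 2, ncol = 2)
b <- matrix(c(1, 2, 1, 2, 3:8), nrow = 2, ncol = 5)

# Hypothetical helper: for each column of a, pick the first column of b
# that matches in every row and has not been used yet (NA if none is left).
match_columns_unique <- function(a, b) {
  out <- rep(NA_integer_, ncol(a))
  used <- rep(FALSE, ncol(b))
  for (k in seq_len(ncol(a))) {
    full_match <- colSums(b == a[, k]) == nrow(b)   # all rows of the column agree
    hits <- which(full_match & !used)
    if (length(hits) > 0) {
      out[k] <- hits[1]
      used[hits[1]] <- TRUE
    }
  }
  out
}

match_columns_unique(a, b)
# [1] 1 2
```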
I want to compare each value of a row of a data.frame to its corresponding value in a vector. Here is an example:
df1 <- matrix(c(2,2,4,8,6,9,9,6,4), ncol = 3)
df2 <- c(5,4,6)
> df1
[,1] [,2] [,3]
[1,] 2 8 9
[2,] 2 6 6
[3,] 4 9 4
> df2
[1] 5 4 6
The comparison should test whether each value in a row of df1 is smaller than the corresponding value in df2, i.e. row1: 2 < 5, 8 < 5, 9 < 5; row2: 2 < 4, 6 < 4, 6 < 4; row3: 4 < 6, 9 < 6, 4 < 6
> result
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] TRUE FALSE FALSE
[3,] TRUE FALSE TRUE
Is there any way to do this without use of a loop?
Thanks lads!
We can just do a comparison to create the logical matrix
df1 < df2
# [,1] [,2] [,3]
#[1,] TRUE FALSE FALSE
#[2,] TRUE FALSE FALSE
#[3,] TRUE FALSE TRUE
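The reason why it works is the recycling of the vector. A matrix is stored column by column, so the elements of the vector 'df2' are compared with the first column of 'df1', then recycled and compared with the second column, and so on. This pairs df2[i] with every value in row i because the length of the vector equals the number of rows.
To make the row-wise pairing explicit rather than relying on recycling, we can index the vector by the row index of each cell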
df1 < df2[row(df1)]
# [,1] [,2] [,3]
#[1,] TRUE FALSE FALSE
#[2,] TRUE FALSE FALSE
#[3,] TRUE FALSE TRUE
Or another option is sweep
sweep(df1, 1, df2, "<")
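The same two tools also cover the opposite case, where each element of the vector belongs to a column rather than a row. The sketch below is an illustration, not part of the original answer: it indexes by col() or uses margin 2 in sweep, and the variable names are arbitrary.

```r
df1 <- matrix(c(2, 2, 4, 8, 6, 9, 9, 6, 4), ncol = 3)
df2 <- c(5, 4, 6)

# Row-wise pairing (df2[i] against row i): three equivalent forms.
by_row_recycled <- df1 < df2
by_row_indexed  <- df1 < df2[row(df1)]
by_row_sweep    <- sweep(df1, 1, df2, "<")

# Column-wise pairing (df2[j] against column j): plain recycling would be wrong here.
by_col_indexed <- df1 < df2[col(df1)]
by_col_sweep   <- sweep(df1, 2, df2, "<")

by_col_indexed
#      [,1]  [,2]  [,3]
# [1,] TRUE FALSE FALSE
# [2,] TRUE FALSE FALSE
# [3,] TRUE FALSE  TRUE
```

For this particular data the column-wise result happens to equal the row-wise one, so the example shows the syntax rather than a difference in output.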
I wish to select a set of rows from a data frame in R given multiple parameters. Normally this could be done using an OR statement, however the values are housed in an array. I am querying them as such (and with no luck):
Some data to get us rolling:
x = array(c(1,2,3),c(5,5))
y=c(1,2)
The command I'm presently using is (filtering by column 1):
x[x[,1] == y, ]
The above command yields this warning:
Warning message:
In x[, 1] == y :
longer object length is not a multiple of shorter object length
Which makes sense. I just don't know how to get around it.
What I am looking for is:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 2 1 3
[2,] 2 1 3 2 1
[3,] 1 3 2 1 3
[4,] 2 1 3 2 1
Thanks in advance for the help!
You are looking for %in%.
> x[x[,1] %in% y, ]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 3 2 1 3
# [2,] 2 1 3 2 1
# [3,] 1 3 2 1 3
# [4,] 2 1 3 2 1
As @Ricardo said in the comments, it helps to explain why this is happening. When you compare x[,1] to y with ==, you get:
x[,1] == y
[1] TRUE TRUE FALSE FALSE FALSE
Since y is c(1, 2), its two elements are compared with the first two elements of x[, 1], and both match, giving TRUE. Since the length of the output must equal length(x[, 1]), y is then "recycled" for the rest (y = 1, 2, 1 against x = 3, 1, 2), which results in FALSE. If you now use this result as the row index of x, only the first two values are TRUE, so only the first two rows will be picked. Using %in% instead results in:
x[,1] %in% y
# [1] TRUE TRUE FALSE TRUE TRUE
Which is what you expect.
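The difference can be reproduced side by side with the data from the question. This is just a small sketch; suppressWarnings() is used only to keep the recycling warning out of the output, and the variable names are illustrative.

```r
x <- array(c(1, 2, 3), c(5, 5))
y <- c(1, 2)

by_equality   <- suppressWarnings(x[, 1] == y)   # element-wise, y recycled to 1 2 1 2 1
by_membership <- x[, 1] %in% y                   # is each element anywhere in y?
by_or         <- x[, 1] == 1 | x[, 1] == 2       # the explicit OR that %in% replaces

by_equality
# [1]  TRUE  TRUE FALSE FALSE FALSE
by_membership
# [1]  TRUE  TRUE FALSE  TRUE  TRUE
nrow(x[by_membership, ])
# [1] 4
```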
To add to @Arun's answer: if the two vectors being compared are of different sizes, R will recycle the shorter one so that it is comparing two vectors of the same size, and then it makes a pair-wise comparison (i.e. it compares the first element of each vector, then the second element of each vector, etc.).
It does not, for example, compare the first element of vector one against all of the elements in vector two (for that you need %in%, as @Arun mentioned).
For example, take a look at the following.
The first two examples yield equivalent output
> c(0, 1, 2, 0, 1, 2) == c(1, 2)
[1] FALSE FALSE FALSE FALSE TRUE TRUE
> c(0, 1, 2, 0, 1, 2) == c(1, 2, 1, 2, 1, 2)
[1] FALSE FALSE FALSE FALSE TRUE TRUE
The comparisons being made are:
# element# LHS RHS areEqual
# 1. 0 1 FALSE <~~ Notice that the '0' from LHS is being compared with the '1' from RHS
# 2. 1 2 FALSE
# 3. 2 1 FALSE
# 4. 0 2 FALSE
# 5. 1 1 TRUE
# 6. 2 2 TRUE
Here is another example, with the LHS "shifted" relative to the previous example.
> c(1, 2, 0, 1, 2, 0) == c(1, 2)
[1] TRUE TRUE FALSE FALSE FALSE FALSE
Note what happens when the length of the longer vector is not an exact multiple of the length of the shorter one
(i.e., 2 does not go into 7).
The recycling still occurs, but the last recycle of the shorter vector is cut short.
R gives us a warning, just in case we did not expect the lengths to be mismatched.
> c(1, 2, 3, 4, 1, 2, 0) == c(1, 2)
[1] TRUE TRUE FALSE FALSE TRUE TRUE FALSE
Warning message:
In c(1, 2, 3, 4, 1, 2, 0) == c(1, 2) :
longer object length is not a multiple of shorter object length
Notice that it does not matter if the longer vector is on the RHS or LHS; the recycling works just the same
> c(1, 2) == c(1, 2, 0, 1, 2, 0)
[1] TRUE TRUE FALSE FALSE FALSE FALSE
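The recycling described above can be made visible with rep_len(), which repeats a vector up to a requested length. The sketch below only illustrates the behaviour and does not show how R implements it internally; the variable names are arbitrary.

```r
lhs <- c(1, 2, 3, 4, 1, 2, 0)
rhs <- c(1, 2)

# Stretch the shorter vector to the length of the longer one by hand.
rhs_recycled <- rep_len(rhs, length(lhs))
rhs_recycled
# [1] 1 2 1 2 1 2 1

explicit <- lhs == rhs_recycled                 # no warning: lengths now agree
implicit <- suppressWarnings(lhs == rhs)        # R recycles (and would warn)
identical(explicit, implicit)
# [1] TRUE
```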