R: Get the complement of a small matrix in a larger matrix

I have an n×2 matrix A and an m×2 matrix B with m<n. I want to find the complement of B in A, i.e. all rows of A that are not in B. How would I do that in base R?
setdiff does not work because it does not respect the matrix structure. rbind + duplicated does not work either, since there may be rows in B that are not in A at all.

We can paste the values row-wise and check if they are present in B using %in% :
A[!paste(A[, 1], A[, 2], sep = '-') %in% paste(B[, 1], B[, 2], sep = '-'),]
Using reproducible data :
A <- matrix(1:16, ncol = 2)
B <- matrix(c(2, 10, 1, 2, 5, 13, 6, 14, 8, 16), ncol = 2, byrow = TRUE)
A
# [,1] [,2]
#[1,] 1 9
#[2,] 2 10
#[3,] 3 11
#[4,] 4 12
#[5,] 5 13
#[6,] 6 14
#[7,] 7 15
#[8,] 8 16
B
# [,1] [,2]
#[1,] 2 10
#[2,] 1 2
#[3,] 5 13
#[4,] 6 14
#[5,] 8 16
A[!paste(A[, 1], A[, 2], sep = '-') %in% paste(B[, 1], B[, 2], sep = '-'),]
# [,1] [,2]
#[1,] 1 9
#[2,] 3 11
#[3,] 4 12
#[4,] 7 15
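The same idea extends to matrices with more than two columns. Below is a minimal sketch, not part of the original answer: row_complement is a hypothetical helper name, and the "\r" separator is an assumption chosen so that keys are unlikely to collide (for example with negative numbers or character data containing '-'). Note that paste compares the printed representation, so doubles that differ only by floating-point noise may or may not match.

```r
# Hypothetical helper: rows of A that do not occur in B, for any number of columns.
row_complement <- function(A, B, sep = "\r") {
  # Build one string key per row by pasting all columns together.
  key <- function(M) do.call(paste, c(as.data.frame(M), sep = sep))
  # drop = FALSE keeps the result a matrix even if only one row remains.
  A[!key(A) %in% key(B), , drop = FALSE]
}

A <- matrix(1:16, ncol = 2)
B <- matrix(c(2, 10, 1, 2, 5, 13, 6, 14, 8, 16), ncol = 2, byrow = TRUE)
row_complement(A, B)
```

With the data above this should return rows 1, 3, 4 and 7 of A, the same rows as the two-column version.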

Related

Bind dimensions of an array

Let's start with an exemplary multi-dimensional array like
a <- array(1:24, dim = c(3, 2, 2, 2)); a
, , 1, 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2, 1
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 1, 2
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
, , 2, 2
[,1] [,2]
[1,] 19 22
[2,] 20 23
[3,] 21 24
Now I want to cbind or rbind the first two dimensions, which are matrices over the remaining dimensions 3 and 4, to an entire data.frame.
The resulting data.frame should look like this using rbind:
[,1] [,2]
[1, ] 1 4
[2, ] 2 5
[3, ] 3 6
[4, ] 7 10
[5, ] 8 11
[6, ] 9 12
...
What would be an efficient way to bind the first two dimensions of a multi-dimensional array to an entire structure like data.frame? Please consider that the array can have any number of dimensions greater than 2, and not only 4 like in the above given example.
Thanks in advance
You can use apply:
apply(a, 2, identity)
# [,1] [,2]
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6
# [4,] 7 10
# [5,] 8 11
# [6,] 9 12
# [7,] 13 16
# [8,] 14 17
# [9,] 15 18
#[10,] 19 22
#[11,] 20 23
#[12,] 21 24
Permuting and modifying dimensions is quite efficient:
a <- array(1:24, dim = c(3, 2, 2, 2))
a <- aperm(a, c(2, 1, 3, 4))
dim(a) <- c(dim(a)[1], prod(dim(a)[-1]))
t(a)
# [,1] [,2]
# [1,] 1 4
# [2,] 2 5
# [3,] 3 6
# [4,] 7 10
# [5,] 8 11
# [6,] 9 12
# [7,] 13 16
# [8,] 14 17
# [9,] 15 18
#[10,] 19 22
#[11,] 20 23
#[12,] 21 24
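Since the question asks for any number of dimensions and a data.frame as the result, here is a sketch that wraps the aperm approach in a function. bind_first_two is a hypothetical name, and the 5-dimensional test array is made up for illustration.

```r
# Hypothetical wrapper around the aperm approach for arrays with > 2 dimensions.
bind_first_two <- function(a) {
  d <- dim(a)
  stopifnot(length(d) > 2)
  # Swap the first two dimensions, keep the remaining ones in order.
  b <- aperm(a, c(2, 1, seq_along(d)[-(1:2)]))
  # Collapse everything except the (original) second dimension.
  dim(b) <- c(d[2], prod(d[-2]))
  as.data.frame(t(b))
}

a5 <- array(1:48, dim = c(3, 2, 2, 2, 2))
res <- bind_first_two(a5)
dim(res)
```

For a5 the result should have 24 rows and 2 columns and agree element-wise with apply(a5, 2, identity).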

Class of output object differs as input data differs

I am trying to draw a variable number of samples for each of n attempts. In this example n = 8 because length(n.obs) == 8. Once all of the samples have been drawn I want to combine them into a matrix.
Here is my first attempt:
set.seed(1234)
n.obs <- c(2,1,2,2,2,2,2,2)
my.samples <- sapply(1:8, function(x) sample(1:4, size=n.obs[x], prob=c(0.1,0.2,0.3,0.4), replace=TRUE))
my.samples
This approach produces a list.
class(my.samples)
#[1] "list"
I identify the number of columns needed in the output matrix using:
max.len <- max(sapply(my.samples, length))
max.len
#[1] 2
The output matrix can be created using:
corrected.list <- lapply(my.samples, function(x) {c(x, rep(NA, max.len - length(x)))})
output.matrix <- do.call(rbind, corrected.list)
output.matrix[is.na(output.matrix)] <- 0
output.matrix
# [,1] [,2]
#[1,] 4 3
#[2,] 3 0
#[3,] 3 2
#[4,] 3 4
#[5,] 4 3
#[6,] 3 3
#[7,] 3 4
#[8,] 1 4
The above approach seems to work fine as long as n.obs includes multiple values and at least one element of n.obs is greater than 1. However, I want the code to be flexible enough to handle each of the following n.obs:
The above sapply statement returns a 2 x 8 matrix with the following n.obs.
set.seed(1234)
n.obs <- c(2,2,2,2,2,2,2,2)
The above sapply statement returns an integer with the following n.obs.
set.seed(3333)
n.obs <- c(1,1,1,1,1,1,1,1)
The above sapply statement returns a list with the following n.obs.
n.obs <- c(0,0,0,0,0,0,0,0)
Here are example desired results for each of the above three n.obs:
desired.output <- matrix(c(4, 3,
3, 3,
2, 3,
4, 4,
3, 3,
3, 3,
4, 1,
4, 2), ncol = 2, byrow = TRUE)
desired.output <- matrix(c(2,
3,
4,
2,
3,
4,
4,
1), ncol = 1, byrow = TRUE)
desired.output <- matrix(c(0,
0,
0,
0,
0,
0,
0,
0), ncol = 1, byrow = TRUE)
How can I generalize the code so that it always returns a matrix with eight rows regardless of the n.obs used as input? One way would be to use a series of if statements to handle problematic cases, but I thought there might be a simpler and more efficient solution.
We can write a function :
get_matrix <- function(n.obs) {
  nr <- length(n.obs)
  my.samples <- sapply(n.obs, function(x)
    sample(1:4, size=x, prob=c(0.1,0.2,0.3,0.4), replace=TRUE))
  max.len <- max(lengths(my.samples))
  mat <- matrix(c(sapply(my.samples, `[`, 1:max.len)), nrow = nr, byrow = TRUE)
  mat[is.na(mat)] <- 0
  mat
}
Checking output :
get_matrix(c(2,1,2,2,2,2,2,2))
# [,1] [,2]
#[1,] 1 4
#[2,] 4 0
#[3,] 4 3
#[4,] 4 4
#[5,] 4 2
#[6,] 4 3
#[7,] 4 4
#[8,] 4 4
get_matrix(c(1,1,1,1,1,1,1,1))
# [,1]
#[1,] 4
#[2,] 4
#[3,] 3
#[4,] 4
#[5,] 2
#[6,] 4
#[7,] 1
#[8,] 4
get_matrix(c(0,0,0,0,0,0,0,0))
# [,1]
#[1,] 0
#[2,] 0
#[3,] 0
#[4,] 0
#[5,] 0
#[6,] 0
#[7,] 0
#[8,] 0
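get_matrix depends on sapply returning a list, a vector or a matrix depending on n.obs, and on 1:max.len behaving usefully when max.len is 0. A sketch of a variant that avoids both by always working on a list is shown below; get_matrix2 is a hypothetical name, and padding with 0 follows the desired output in the question.

```r
# Hypothetical variant: lapply always returns a list, vapply has a fixed shape.
get_matrix2 <- function(n.obs, prob = c(0.1, 0.2, 0.3, 0.4)) {
  samples <- lapply(n.obs, function(k)
    sample(1:4, size = k, prob = prob, replace = TRUE))
  # At least one column, so an all-zero n.obs still yields an 8 x 1 matrix.
  width <- max(1L, lengths(samples))
  # Pad every sample with zeros up to the common width.
  padded <- vapply(samples,
                   function(s) c(s, rep(0L, width - length(s))),
                   integer(width))
  # padded holds one sample per column, so fill the result row by row.
  matrix(padded, nrow = length(n.obs), ncol = width, byrow = TRUE)
}

set.seed(1234)
m <- get_matrix2(c(2, 1, 2, 0))
dim(m)
```

The sampled values depend on the seed, but the shape does not: one row per element of n.obs and max(1, max(n.obs)) columns.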
You could Vectorize the sample function on the size= argument.
samplev <- Vectorize(sample, "size", SIMPLIFY=FALSE)
Wrap samplev into a function and assign maximal length using length<- in an lapply.
FUN <- function(n.obs, prob.=c(.1,.2,.3,.4)) {
  s <- do.call(rbind, lapply(
    samplev(1:4, size=n.obs, prob=prob., replace=TRUE),
    `length<-`, max(n.obs)))
  if (!all(dim(s))) matrix(0, length(n.obs))
  else ({s[is.na(s)] <- 0; s})
}
Results:
set.seed(1234)
FUN(c(2,1,2,2,2,2,2,2))
# [,1] [,2]
# [1,] 4 3
# [2,] 3 0
# [3,] 3 2
# [4,] 3 4
# [5,] 4 3
# [6,] 3 3
# [7,] 3 4
# [8,] 1 4
FUN(c(2,2,2,2,2,2,2,2))
# [,1] [,2]
# [1,] 2 4
# [2,] 4 4
# [3,] 4 4
# [4,] 4 4
# [5,] 4 4
# [6,] 2 3
# [7,] 1 2
# [8,] 4 3
FUN(c(1,1,1,1,1,1,1,1))
# [,1]
# [1,] 4
# [2,] 4
# [3,] 3
# [4,] 4
# [5,] 2
# [6,] 4
# [7,] 4
# [8,] 1
FUN(c(0,0,0,0,0,0,0,0))
# [,1]
# [1,] 0
# [2,] 0
# [3,] 0
# [4,] 0
# [5,] 0
# [6,] 0
# [7,] 0
# [8,] 0
FUN(c(3, 4))
# [,1] [,2] [,3] [,4]
# [1,] 2 3 3 0
# [2,] 4 3 4 3

Reshape matrix by rows

I have a matrix with size 18000 x 54. I would like to reshape it as a matrix with size 54000 x 18, in which each row of my initial matrix becomes a matrix which has 3 rows.
Let's take an example. I have a matrix as follows:
a = matrix(1:18, nrow = 2, ncol = 9, byrow = T)
a
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
I would like to reshape this matrix so that it becomes:
[,1] [,2] [,3]
1 4 7
2 5 8
3 6 9
10 13 16
11 14 17
12 15 18
I tried the following two approaches, but neither works. The first is:
dim(a) = c(6,3)
The second one is to create a function and then apply to each row:
reshapeX = function(x){
  dim(x) = c(3,as.integer(length(x)/3))
  return(as.matrix(x))
}
rbind(apply(a, 1, reshapeX))
But that does not work either. Can someone help, please?
You can do:
do.call(rbind, lapply(1:nrow(a), function(i) matrix(a[i, ], nrow=3)))
with your data:
a <- matrix(1:18, nrow = 2, ncol = 9, byrow = TRUE)
do.call(rbind, lapply(1:nrow(a), function(i) matrix(a[i, ], nrow=3)))
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
# [4,] 10 13 16
# [5,] 11 14 17
# [6,] 12 15 18
Here is a loop-free method (as written, it assumes that a has exactly two rows):
m1 <- matrix(c(a), ncol = 3, nrow = 6)
rbind(m1[c(TRUE, FALSE),], m1[c(FALSE, TRUE),])
# [,1] [,2] [,3]
#[1,] 1 4 7
#[2,] 2 5 8
#[3,] 3 6 9
#[4,] 10 13 16
#[5,] 11 14 17
#[6,] 12 15 18
An option would be
out <- sapply(split.default(as.data.frame(a),
                            as.integer(gl(ncol(a), 3, ncol(a)))),
              function(x) c(t(x)))
colnames(out) <- NULL
out
# [,1] [,2] [,3]
#[1,] 1 4 7
#[2,] 2 5 8
#[3,] 3 6 9
#[4,] 10 13 16
#[5,] 11 14 17
#[6,] 12 15 18
Or, as a shorter form of the above:
sapply(split(a,(col(a)-1) %/%3), function(x) c(matrix(x, nrow = 3, byrow = TRUE)))
Or this can be done more compactly with array
apply(array(c(t(a)), c(3, 3, 2)), 2, c)
# [,1] [,2] [,3]
#[1,] 1 4 7
#[2,] 2 5 8
#[3,] 3 6 9
#[4,] 10 13 16
#[5,] 11 14 17
#[6,] 12 15 18
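For the 18000 x 54 case from the question, the array idea can be wrapped in a small function. This is only a sketch: stack_row_blocks and its argument k (the number of rows each original row should become) are hypothetical names, and aperm is used here in place of apply.

```r
# Hypothetical helper: turn each row of `a` into a block of k rows,
# filled column by column, and stack the blocks.
stack_row_blocks <- function(a, k) {
  stopifnot(ncol(a) %% k == 0)
  p <- ncol(a) / k                       # columns of the result
  # t(a) stores the rows of `a` one after another; view each as a k x p slice.
  arr <- array(t(a), dim = c(k, p, nrow(a)))
  # Move the slice index next to the row index, then collapse those two.
  out <- aperm(arr, c(1, 3, 2))
  dim(out) <- c(k * nrow(a), p)
  out
}

a <- matrix(1:18, nrow = 2, ncol = 9, byrow = TRUE)
stack_row_blocks(a, 3)

big <- matrix(seq_len(18000 * 54), nrow = 18000)
dim(stack_row_blocks(big, 3))
```

On the small example this should reproduce the 6 x 3 matrix shown above, and on the large one the dimensions should come out as 54000 x 18.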

R Create Matrix From an Operation on a "Row" Vector and a "Column" Vector

First create a "row" vector and a "column" vector in R:
> row.vector <- seq(from = 1, length = 4, by = 1)
> col.vector <- {t(seq(from = 1, length = 3, by = 2))}
From that I'd like to create a matrix by, e.g., multiplying each value in the row vector with each value in the column vector, thus creating from just those two vectors:
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 6 10
[3,] 3 9 15
[4,] 4 12 20
Can this be done with somehow using apply()? sweep()? ...a for loop?
Thank you for any help!
Simple matrix multiplication will work just fine
row.vector %*% col.vector
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 2 6 10
# [3,] 3 9 15
# [4,] 4 12 20
You'd be better off working with two actual vectors, instead of a vector and a matrix:
outer(row.vector,as.vector(col.vector))
# [,1] [,2] [,3]
#[1,] 1 3 5
#[2,] 2 6 10
#[3,] 3 9 15
#[4,] 4 12 20
Here's a way to get there with apply. Is there a reason why you're not using matrix?
> apply(col.vector, 2, function(x) row.vector * x)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 6 10
## [3,] 3 9 15
## [4,] 4 12 20
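Since the question mentions multiplication only as an example, it may help to note that outer accepts any vectorized function through its FUN argument. A short sketch with two plain vectors follows; rv and cv are hypothetical names holding the same values as in the question.

```r
# Plain vectors with the same values as row.vector and col.vector.
rv <- seq(from = 1, length.out = 4, by = 1)   # 1 2 3 4
cv <- seq(from = 1, length.out = 3, by = 2)   # 1 3 5

prods <- outer(rv, cv)                          # default FUN is "*"
sums  <- outer(rv, cv, FUN = "+")               # rv[i] + cv[j]
other <- outer(rv, cv, function(x, y) x^2 - y)  # any vectorized function

prods
sums
```

Each result is a 4 x 3 matrix whose entry [i, j] is the function applied to rv[i] and cv[j].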

Multiplication of matrix in to a vector in R

I have a matrix m and a vector v. I would like to multiply the matrix m by the vector v and get a matrix with the same dimensions as m, i.e. multiply the first element of m by the first element of v, and so on. How can I do this in R?
m = matrix(c(1, 2, 3, 4, 5), ncol=1)
v = c(1, 2, 3, 4, 5)
> z
[,1]
[1,] 1
[2,] 4
[3,] 9
[4,] 16
[5,] 25
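The %*% operator performs matrix multiplication; with a 5×1 matrix and a length-5 vector it returns the 5×5 outer product. For the element-wise product shown in the question, use * instead: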
> m = matrix(c(1, 2, 3, 4, 5), ncol=1)
> v = c(1, 2, 3, 4, 5)
> m %*% v
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 4 6 8 10
[3,] 3 6 9 12 15
[4,] 4 8 12 16 20
[5,] 5 10 15 20 25
> m * v
[,1]
[1,] 1
[2,] 4
[3,] 9
[4,] 16
[5,] 25
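One thing to keep in mind when m has more than one column: * recycles the vector down the columns, so row i of m is multiplied by v[i]. The sketch below uses made-up values (m2, v2 and w are hypothetical, not from the question) and shows sweep as a base R way to scale per column instead.

```r
m2 <- matrix(1:6, ncol = 2)   # rows: (1,4), (2,5), (3,6)
v2 <- c(10, 20, 30)

by_row <- m2 * v2             # row i multiplied by v2[i]

w <- c(1, 100)
by_col <- sweep(m2, 2, w, `*`)  # column j multiplied by w[j]

by_row
by_col
```

Both results keep the dimensions of m2; only the direction in which the vector is applied differs.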

Resources