Creating a matrix of multiple counters in R

My goal is to take an input vector and build an output matrix of running counters: every time a value appears in the input, I want to find that value's counter and increment it by 1. That is hard to explain in words, so a simple version is illustrated below. I also want to make 2 changes, which I list after the example so that they make sense.
nums = c(1,2,3,4,5,1,2,4,3,5)
unis = unique(nums)
counter = matrix(NA, nrow = length(nums), ncol = length(unis))
colnames(counter) = unis
for (i in 1:length(nums)){
temp = nums[i]
if (i == 1){
counter[1,] = 0
counter[1,temp] = 1
} else {
counter[i,] = counter[i-1,]
counter[i,temp] = counter[i-1,temp]+1
}
}
counter
which outputs
> counter
1 2 3 4 5
[1,] 1 0 0 0 0
[2,] 1 1 0 0 0
[3,] 1 1 1 0 0
[4,] 1 1 1 1 0
[5,] 1 1 1 1 1
[6,] 2 1 1 1 1
[7,] 2 2 1 1 1
[8,] 2 2 1 2 1
[9,] 2 2 2 2 1
[10,] 2 2 2 2 2
The 2 modifications: 1) Since the real data is much larger, I would like to do this with apply, or with whatever approach people who know R better than I do recommend. 2) Here each element of the input vector is a single value; how could this be generalized if an element were a tuple? For example, if the last element of nums were the tuple (4, 5), both counters would be incremented in that step and the last line of the output would be 2,2,2,3,2.
Thanks, and if anything is unclear please ask and I'll try to clarify.

Using the Matrix package (which ships with a standard installation of R)
nums <- c(1,2,3,4,5,1,2,4,3,5)
apply(Matrix::sparseMatrix(i=seq_along(nums), j=nums), 2, cumsum)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 0 0 0 0
# [2,] 1 1 0 0 0
# [3,] 1 1 1 0 0
# [4,] 1 1 1 1 0
# [5,] 1 1 1 1 1
# [6,] 2 1 1 1 1
# [7,] 2 2 1 1 1
# [8,] 2 2 1 2 1
# [9,] 2 2 2 2 1
# [10,] 2 2 2 2 2
Note that this behaves a bit differently in a couple of ways from thelatemail's suggested solution. Which behavior you prefer will depend on what you are using this for.
Here's a small example that illustrates the differences:
nums <- c(5,2,1,1)
# My suggestion
apply(Matrix::sparseMatrix(i=seq_along(nums), j=nums), 2, cumsum)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0 0 0 0 1
# [2,] 0 1 0 0 1
# [3,] 1 1 0 0 1
# [4,] 2 1 0 0 1
# thelatemail's suggestion
sapply(unique(nums), function(x) cumsum(nums==x) )
# [,1] [,2] [,3]
# [1,] 1 0 0
# [2,] 1 1 0
# [3,] 1 1 1
# [4,] 1 1 2
For your second question, you could do something like this:
nums <- list(1,2,3,4,5,1,2,4,3,c(4,5))
ii <- rep(seq_along(nums), times=lengths(nums)) ## lengths() is in R>=3.2.0
jj <- unlist(nums)
apply(Matrix::sparseMatrix(i=ii, j=jj), 2, cumsum)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 0 0 0 0
# [2,] 1 1 0 0 0
# [3,] 1 1 1 0 0
# [4,] 1 1 1 1 0
# [5,] 1 1 1 1 1
# [6,] 2 1 1 1 1
# [7,] 2 2 1 1 1
# [8,] 2 2 1 2 1
# [9,] 2 2 2 2 1
# [10,] 2 2 2 3 2

For your first query, you can get there with something like:
sapply(unique(nums), function(x) cumsum(nums==x) )
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 0 0 0 0
# [2,] 1 1 0 0 0
# [3,] 1 1 1 0 0
# [4,] 1 1 1 1 0
# [5,] 1 1 1 1 1
# [6,] 2 1 1 1 1
# [7,] 2 2 1 1 1
# [8,] 2 2 1 2 1
# [9,] 2 2 2 2 1
#[10,] 2 2 2 2 2
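If the column names from the question's loop should be kept, one possible variant (a sketch, not part of the answer above) builds the indicator matrix with outer() and names the columns afterwards. The names unis and counter are borrowed from the question; hits is introduced here only for illustration.

```r
nums <- c(1, 2, 3, 4, 5, 1, 2, 4, 3, 5)
unis <- unique(nums)

# outer() gives a logical matrix: row i, column j is TRUE when nums[i] == unis[j]
hits <- outer(nums, unis, "==")

# a cumulative sum down each column turns the indicators into running counters
counter <- apply(hits, 2, cumsum)
colnames(counter) <- unis

counter[10, ]  # every counter has reached 2 by the last step
```

Like the sapply() version, this orders the columns by first appearance in nums rather than by value.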

Another idea:
do.call(rbind, Reduce("+", lapply(nums, tabulate, max(unlist(nums))), accumulate = TRUE))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 0 0 0 0
# [2,] 1 1 0 0 0
# [3,] 1 1 1 0 0
# [4,] 1 1 1 1 0
# [5,] 1 1 1 1 1
# [6,] 2 1 1 1 1
# [7,] 2 2 1 1 1
# [8,] 2 2 1 2 1
# [9,] 2 2 2 2 1
#[10,] 2 2 2 2 2
And generally:
x = list(1, 3, 6, c(6, 3), 2, c(4, 6, 1), c(1, 2), 3)
do.call(rbind, Reduce("+", lapply(x, tabulate, max(unlist(x))), accumulate = TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 0 0 0 0 0
#[2,] 1 0 1 0 0 0
#[3,] 1 0 1 0 0 1
#[4,] 1 0 2 0 0 2
#[5,] 1 1 2 0 0 2
#[6,] 2 1 2 1 0 3
#[7,] 3 2 2 1 0 3
#[8,] 3 2 3 1 0 3
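To see why the one-liner works, it can help to split it into its three steps. This is only an unrolled sketch of the same call; the intermediate names nbins, steps, running and counter are made up here.

```r
x <- list(1, 3, 6, c(6, 3), 2, c(4, 6, 1), c(1, 2), 3)

# number of counters needed: the largest value seen anywhere in the input
nbins <- max(unlist(x))

# tabulate() turns each element into a vector of per-step increments,
# e.g. c(4, 6, 1) becomes 1 0 0 1 0 1
steps <- lapply(x, tabulate, nbins = nbins)

# Reduce(..., accumulate = TRUE) keeps every partial sum, i.e. the counter state after each step
running <- Reduce(`+`, steps, accumulate = TRUE)

# stack the states into one row per step
counter <- do.call(rbind, running)
counter
```

Note that tabulate() only counts positive integers, so this approach assumes the inputs are positive integer codes.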

Related

Get index locations of 0s which are completely surrounded by 1s

I have a matrix like so:
m <- matrix(c(1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,2,0,1,1,1,1,1,1,1,1,1), nrow = 12, ncol = 12)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 1 1 1 1 1 1 1 1 1 1 1
[2,] 1 1 2 1 1 2 1 1 2 1 1 2
[3,] 1 1 0 1 1 0 1 1 0 1 1 0
[4,] 1 1 1 1 1 1 1 1 1 1 1 1
[5,] 1 1 1 1 1 1 1 1 1 1 1 1
[6,] 1 1 1 1 1 1 1 1 1 1 1 1
[7,] 1 1 1 1 1 1 1 1 1 1 1 1
[8,] 1 1 1 1 1 1 1 1 1 1 1 1
[9,] 0 1 1 0 1 1 0 1 1 0 1 1
[10,] 1 1 1 1 1 1 1 1 1 1 1 1
[11,] 1 1 1 1 1 1 1 1 1 1 1 1
[12,] 1 1 1 1 1 1 1 1 1 1 1 1
and I want to find the index locations where a 0 is completely surrounded by 1s in a 3x3 window. I can find all the zeros with:
which(m == 0), but this will also return places where a 2 is adjacent to the 0, such as at index location m[3,3].
w <- which(m == 0, arr.ind = TRUE)
w
# row col
# [1,] 9 1
# [2,] 3 3
# [3,] 9 4
# [4,] 3 6
# [5,] 9 7
# [6,] 3 9
# [7,] 9 10
# [8,] 3 12
We don't need the zeroes that sit on a boundary (their 3x3 window would fall outside the matrix), so filter those out:
w <- w[ w[,1] > 1 & w[,1] < nrow(m) & w[,2] > 1 & w[,2] < ncol(m), ]
w
# row col
# [1,] 3 3
# [2,] 9 4
# [3,] 3 6
# [4,] 9 7
# [5,] 3 9
# [6,] 9 10
Now we can take those inner indices and build 3x3 submatrices into a list. Here are the first couple (of six):
Map(function(rn,cn) m[rn+(-1:1),cn+(-1:1)], w[,1], w[,2])[1:2]
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 2 1
# [2,] 1 0 1
# [3,] 1 1 1
# [[2]]
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 0 1
# [3,] 1 1 1
Now we can keep only the ones where there is exactly one non-1 entry in the submatrix.
Filter(function(m3) sum(m3 != 1) == 1, Map(function(rn,cn) m[rn+(-1:1),cn+(-1:1)], w[,1], w[,2]))
# [[1]]
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 0 1
# [3,] 1 1 1
# [[2]]
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 0 1
# [3,] 1 1 1
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 0 1
# [3,] 1 1 1
Since you need to just count the occurrences, add length(...) around that, and you have your answer.
(If you're curious, the reason I went with sum(m3!=1)==1 is because I wasn't certain if you wanted the border submatrices as well. If you wanted those, then the number of 1s would be reduced, not "8" as a typical 3x3 would be. But we know that there should always be exactly one non-1 in the submatrix: the center 0.)
To get just the indices that match,
w[mapply(function(rn,cn) sum(m[rn+(-1:1),cn+(-1:1)] != 1) == 1,
w[,1], w[,2]),]
# row col
# [1,] 9 4
# [2,] 9 7
# [3,] 9 10
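The steps above could be wrapped into a single function, roughly as sketched below. The name surrounded_zeros() is hypothetical; drop = FALSE and vapply() are additions so that the sketch also behaves when only one (or no) candidate zero remains.

```r
m <- matrix(c(1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,
              1,1,1,1,1,1,1,2,0,1,1,1,1,1,1,1,1,1), nrow = 12, ncol = 12)

# surrounded_zeros() is a hypothetical helper name, not from the answer above
surrounded_zeros <- function(m) {
  w <- which(m == 0, arr.ind = TRUE)
  # keep interior cells only, so every 3x3 window fits inside the matrix
  inner <- w[, 1] > 1 & w[, 1] < nrow(m) & w[, 2] > 1 & w[, 2] < ncol(m)
  w <- w[inner, , drop = FALSE]
  # a zero is surrounded when its window holds exactly one non-1 value (itself)
  ok <- vapply(seq_len(nrow(w)), function(k) {
    win <- m[w[k, 1] + (-1:1), w[k, 2] + (-1:1)]
    sum(win != 1) == 1
  }, logical(1))
  w[ok, , drop = FALSE]
}

hits <- surrounded_zeros(m)
hits        # index locations of the surrounded zeros
nrow(hits)  # how many there are
```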

Two-circulant matrix in R

How to construct a two-circulant matrix?
For example, the following matrix A is two-circulant, i.e., every column (except for the first one) is obtained from the previous one by moving its last two elements to the front. Note that the first column is the generator of the matrix.
N=12
k=6
x=c(0,0,0,0,1,1,1,1,2,2,2,2)
A=matrix(0,N,k)
A[,1]=x
for( j in 2:ncol(A) )
{
A[,j]=c(A[11:12,j-1],A[1:10,j-1])
}
> A
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 2 2 1 1 0
[2,] 0 2 2 1 1 0
[3,] 0 0 2 2 1 1
[4,] 0 0 2 2 1 1
[5,] 1 0 0 2 2 1
[6,] 1 0 0 2 2 1
[7,] 1 1 0 0 2 2
[8,] 1 1 0 0 2 2
[9,] 2 1 1 0 0 2
[10,] 2 1 1 0 0 2
[11,] 2 2 1 1 0 0
[12,] 2 2 1 1 0 0
Is there any other way to construct the matrix A? For example, by using a function.
You could use the following:
circular_matrix <- function(x, ncol) {
coll <- list(x)
for (i in seq_len(ncol - 1)) {
current <- coll[[length(coll)]]
coll[[length(coll) + 1]] <- c(tail(current, 2), current[1:(length(current) - 2)])
}
do.call(cbind, coll)
}
circular_matrix(1:10, 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 9 7 5 3
# [2,] 2 10 8 6 4
# [3,] 3 1 9 7 5
# [4,] 4 2 10 8 6
# [5,] 5 3 1 9 7
# [6,] 6 4 2 10 8
# [7,] 7 5 3 1 9
# [8,] 8 6 4 2 10
# [9,] 9 7 5 3 1
#[10,] 10 8 6 4 2
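A loop-free alternative is to compute every index at once with modular arithmetic: entry (i, j) is the generator element at position ((i - 1) - shift * (j - 1)) mod N, plus 1. The sketch below assumes this indexing view; the function name two_circulant() and its shift argument are made up for illustration.

```r
# two_circulant() is a hypothetical name; shift = 2 gives the two-circulant case
two_circulant <- function(x, k, shift = 2) {
  n <- length(x)
  # zero-based row position minus the accumulated shift of each column, wrapped into 0..n-1
  idx <- outer(seq_len(n) - 1, shift * (seq_len(k) - 1), "-") %% n + 1
  matrix(x[as.vector(idx)], nrow = n, ncol = k)
}

A <- two_circulant(c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2), 6)
A
B <- two_circulant(1:10, 5)
B
```

With shift = 1 the same function would produce an ordinary circulant layout.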

How to modify matrix multiplication to sum only positives or negatives values in R

I want to do a matrix multiplication with a twist.
I have this matrix:
A <- matrix(c(1,-1,-1,0,-1,0,1,0,0,1,0,0,0,1,-1,1,-1,0,0,-1,1,0,1,0,1,-1,-1,1,-1,1), nrow = 6, ncol = 5)
A
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 0 1
[2,] -1 0 1 -1 -1
[3,] -1 0 -1 1 -1
[4,] 0 1 1 0 1
[5,] -1 0 -1 1 -1
[6,] 0 0 0 0 1
And I want to get two different matrices. The first matrix is this:
C
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 2 0 1
[2,] 0 0 2 1 2 0
[3,] 0 2 0 0 4 0
[4,] 2 1 0 0 0 1
[5,] 0 2 4 0 0 0
[6,] 1 0 0 1 0 0
This "convergence matrix" is something like the multiplication of A by its transpose (in R, something like A%*%t(A)), but with a little twist: during the sum that produces each cell, I only want the sum of the positive values. For example, for the cell C23 the regular sum would be:
(-1)(-1) + (0)(0) + (1)(-1) + (-1)(1) + (-1)(-1) = 0
, but I only want the sum of the positive products, in this example the first [(-1)(-1)] and the last [(-1)(-1)] to obtain 2.
The second matrix is this:
D
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 2 2 0 2 0
[2,] 2 0 2 1 2 1
[3,] 2 2 0 2 0 1
[4,] 0 1 2 0 2 0
[5,] 2 2 0 2 0 1
[6,] 0 1 1 0 1 0
This "divergence matrix" is similar to the previous one, with the difference that I only want to sum the absolute values of the negative values. For example, for the cell D23 the regular sum would be:
(-1)(-1) + (0)(0) + (1)(-1) + (-1)(1) + (-1)(-1) = 0
, but I only want the sum of the absolute values of the negative products, in this example the third abs[(1)(-1)] and the fourth abs[(-1)(1)], to obtain 2.
I've been trying with apply, sweep and loops but I can't get it.
Thanks for your responses.
Another take:
D <- A
D[D<0] = -1i*D[D<0]
D <- Im(tcrossprod(D))
C <- tcrossprod(A) + D
A is defined in the question.
Output:
> D
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 2 2 0 2 0
[2,] 2 0 2 1 2 1
[3,] 2 2 0 2 0 1
[4,] 0 1 2 0 2 0
[5,] 2 2 0 2 0 1
[6,] 0 1 1 0 1 0
> C
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 0 0 2 0 1
[2,] 0 4 2 1 2 0
[3,] 0 2 4 0 4 0
[4,] 2 1 0 3 0 1
[5,] 0 2 4 0 4 0
[6,] 1 0 0 1 0 1
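A real-valued way to reach the same result is to split A into its positive and negative parts. A product of two entries is positive when both come from the same part and negative when they come from different parts. The sketch below uses that decomposition; the names P and N are introduced here and are not part of the answer above.

```r
A <- matrix(c(1,-1,-1,0,-1,0,1,0,0,1,0,0,0,1,-1,1,-1,0,0,-1,1,0,1,0,1,-1,-1,1,-1,1),
            nrow = 6, ncol = 5)

P <- pmax(A, 0)  # positive entries of A, zeros elsewhere
N <- pmin(A, 0)  # negative entries of A, zeros elsewhere

# positive products: (+)(+) and (-)(-)
C <- tcrossprod(P) + tcrossprod(N)

# negative products: (+)(-) and (-)(+), negated to get absolute values
D <- -(tcrossprod(P, N) + tcrossprod(N, P))

C[2, 3]  # 2
D[2, 3]  # 2
```

As in the output above, the diagonal of C holds each row's sum of squares. If it should be zero, as in the question's expected matrix, diag(C) <- 0 would do that.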
This is a try in base R. So basically you follow the matrix cross-product approach but you try to manage the sum step manually:
f <- function(A, convergence=TRUE){
sapply(seq_len(nrow(A)), function(i) {
r <- t(matrix(A[i,],ncol(A),nrow(A)))*A
if(convergence)
r[r<0] <- 0
else
r[r>0] <- 0
rowSums(abs(r))
})
}
> f(A, convergence = TRUE)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 0 0 2 0 1
[2,] 0 4 2 1 2 0
[3,] 0 2 4 0 4 0
[4,] 2 1 0 3 0 1
[5,] 0 2 4 0 4 0
[6,] 1 0 0 1 0 1
> f(A, convergence = FALSE)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 2 2 0 2 0
[2,] 2 0 2 1 2 1
[3,] 2 2 0 2 0 1
[4,] 0 1 2 0 2 0
[5,] 2 2 0 2 0 1
[6,] 0 1 1 0 1 0
It will be significantly less efficient, but you could break the matrix into a list of row vectors, which are easier to calculate with. Using purrr, which is handy for lists (note that cross2() has been deprecated since purrr 1.0.0),
library(purrr)
A <- matrix(c(1,-1,-1,0,-1,0,1,0,0,1,0,0,0,1,-1,1,-1,0,0,-1,1,0,1,0,1,-1,-1,1,-1,1),
nrow = 6, ncol = 5)
C <- seq(nrow(A)) %>% # generate a sequence of row indices
map(~A[.x, ]) %>% # subset matrix into a list of rows
cross2(., .) %>% # do a Cartesian join to get pairs of rows
# calculate products, then subset before summing. Simplify to vector
map_dbl(~{ij <- .x[[1]] * .x[[2]]; sum(ij[ij >= 0])}) %>%
matrix(nrow(A)) # reassemble to matrix
C
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 3 0 0 2 0 1
#> [2,] 0 4 2 1 2 0
#> [3,] 0 2 4 0 4 0
#> [4,] 2 1 0 3 0 1
#> [5,] 0 2 4 0 4 0
#> [6,] 1 0 0 1 0 1
# same except subsetting and `-` to make negatives positive
D <- seq(nrow(A)) %>%
map(~A[.x, ]) %>%
cross2(., .) %>%
map_dbl(~{ij <- .x[[1]] * .x[[2]]; sum(-ij[ij <= 0])}) %>%
matrix(nrow(A))
D
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0 2 2 0 2 0
#> [2,] 2 0 2 1 2 1
#> [3,] 2 2 0 2 0 1
#> [4,] 0 1 2 0 2 0
#> [5,] 2 2 0 2 0 1
#> [6,] 0 1 1 0 1 0

maximal number of identical elements between any two columns of a matrix in R

I just was wondering if there was an easy way to compute the maximal number of identical elements between any two columns of a matrix in R.
For example, I have a matrix
test <- replicate(10, sample((0:3), 10, replace = TRUE))
test
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 0 1 0 2 2 1 0 2 0
[2,] 1 1 3 2 0 2 3 0 2 2
[3,] 2 3 0 0 1 2 0 3 0 2
[4,] 2 2 1 1 2 0 0 1 1 0
[5,] 2 0 1 2 0 1 1 1 0 0
[6,] 1 0 1 3 2 3 3 1 3 2
[7,] 0 1 3 2 1 0 1 2 1 1
[8,] 0 3 1 3 0 2 3 1 1 1
[9,] 2 3 1 3 0 1 0 1 3 2
[10,] 3 2 1 0 2 1 3 2 3 1
To compare column 1 and 2 I use
table(test[,1] == test[,2])
FALSE TRUE
8 2
So there are two identical elements between these two columns.
I could now repeat this for all pairs of columns using two nested for loops and then find the maximum number of TRUE calls but this does not look nice. Can anyone think of a better way?
Cheers,
Maik
It is always interesting to see a reasonable answer being voted down. Though I don't like this minus score, I would keep my answer. Voter, what do you think?
Let's first get some reproducible toy data:
set.seed(0); x <- replicate(10, sample((0:3), 10, replace = TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 3 0 3 1 1 2 1 3 3 0
# [2,] 1 0 3 1 3 1 3 1 1 0
# [3,] 1 0 0 2 2 3 1 3 2 0
# [4,] 2 2 2 1 3 1 1 1 1 2
# [5,] 3 1 0 0 2 0 1 1 1 3
# [6,] 0 3 1 3 2 0 2 1 3 3
# [7,] 3 1 1 2 3 0 1 3 0 3
# [8,] 3 2 0 3 0 1 1 3 2 1
# [9,] 2 3 1 0 1 2 3 1 0 1
#[10,] 2 1 3 2 2 2 0 3 0 3
For any input matrix x, you can use:
y <- unlist(lapply(seq_len(ncol(x)-1L),
function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] == x[, i])))
# [1] 1 2 3 2 4 1 4 2 3 3 1 0 0 3 1 3 5 1 3 1 2 4 1 4 3 4 2 3 5 1 1 3 2 1 2 2 3 3
#[39] 1 2 3 1 4 3 1
max(y)
# [1] 5
The comment by @David is doing essentially the same thing but way slower:
y <- combn(ncol(x), 2, FUN = function(u) sum(x[, u[1]] == x[, u[2]]))
# [1] 1 2 3 2 4 1 4 2 3 3 1 0 0 3 1 3 5 1 3 1 2 4 1 4 3 4 2 3 5 1 1 3 2 1 2 2 3 3
#[39] 1 2 3 1 4 3 1
max(y)
# [1] 5
Benchmarking
We generate a 10 * 1000 matrix for experiment:
set.seed(0); x <- replicate(1e+3, sample((0:3), 10, replace = TRUE))
system.time(unlist(lapply(seq_len(ncol(x)-1L), function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] == x[, i]))))
# user system elapsed
# 0.176 0.032 0.207
system.time(combn(ncol(x), 2, FUN = function(u) sum(x[, u[1]] == x[, u[2]])))
# user system elapsed
# 4.692 0.008 4.708
Something like a distance matrix?
With this idea, you could also generate a "distance" matrix for number of non-equal elements between all columns (just replace the == with !=):
y <- unlist(lapply(seq_len(ncol(x)-1L),
function(i) colSums(x[, (i+1):ncol(x), drop = FALSE] != x[, i])))
z <- matrix(0L, ncol(x), ncol(x))
z[lower.tri(z)] <- y
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 0 0 0 0 0 0 0 0 0 0
# [2,] 9 0 0 0 0 0 0 0 0 0
# [3,] 8 7 0 0 0 0 0 0 0 0
# [4,] 7 9 9 0 0 0 0 0 0 0
# [5,] 8 10 7 7 0 0 0 0 0 0
# [6,] 6 10 9 6 9 0 0 0 0 0
# [7,] 9 7 8 8 7 8 0 0 0 0
# [8,] 6 9 6 7 8 7 8 0 0 0
# [9,] 8 7 9 5 9 7 7 6 0 0
#[10,] 7 5 6 9 8 9 9 7 9 0
Note that only the lower triangle is computed, due to symmetry. The diagonal is all zeros (of course).
Try:
max(combn(split(test, col(test)), 2, function(x) sum(x[[1]] == x[[2]])))
If you want to know which pair has the greatest number of equal elements it's a little more complicated.
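One way that might work for finding the pair itself is to keep the column pairs from combn() next to their match counts. This is a sketch under that assumption; pairs, matches and best are illustrative names, and the seed is there only so the example data is reproducible within one R version.

```r
set.seed(0)
test <- replicate(10, sample(0:3, 10, replace = TRUE))

# each column of pairs holds one combination of two column indices
pairs <- combn(ncol(test), 2)

# number of identical elements for every pair, in the same order as pairs
matches <- apply(pairs, 2, function(p) sum(test[, p[1]] == test[, p[2]]))

# keep every pair that reaches the maximum (there may be ties)
best <- pairs[, matches == max(matches), drop = FALSE]
best
max(matches)
```

Each column of best is one pair of column indices of test.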

How to generate a matrix to store all non-empty subsets of a set

Suppose I have a set N={1,2,3}, then we can list all its 7 non-empty subsets.
n=3 # number of elements in a set
a=2^n-1 # number of non-empty subsets for that set
subsets=lapply(1:n, function(x) combn(n, x)) # list all the non-empty subsets
subsets
Now I want to put these subsets into a matrix organized like this for n=3 (left), or as an index matrix (right):
1 0 0    1 0 0
0 2 0    0 1 0
0 0 3    0 0 1
1 2 0    1 1 0
1 0 3    1 0 1
0 2 3    0 1 1
1 2 3    1 1 1
Anyone knows how to write the code that could be easily extended to any n (=4, 5, 6...)? I tried this:
subindex=matrix(c(0), nrow=a, ncol=n)
i=1
while(i<=a){
j=n
b=2^(n-1)
N=i
while(N>0){
if(b<=N) {subindex[i,j]=1}&{N=N-b}
b=trunc(b/2)
j=j-1
}
i=i+1
}
subindex
But the index matrix I get is wrong in rows 3 and 4, and with n=4 there are more errors. Can anybody correct or simplify this code, or write completely new code? I'd really appreciate it.
n <- 4
lapply(seq_len(n), function(i)t(combn(n, i, FUN = tabulate, nbins = n)))
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 1 0 0 0
# [2,] 0 1 0 0
# [3,] 0 0 1 0
# [4,] 0 0 0 1
#
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 1 1 0 0
# [2,] 1 0 1 0
# [3,] 1 0 0 1
# [4,] 0 1 1 0
# [5,] 0 1 0 1
# [6,] 0 0 1 1
#
# [[3]]
# [,1] [,2] [,3] [,4]
# [1,] 1 1 1 0
# [2,] 1 1 0 1
# [3,] 1 0 1 1
# [4,] 0 1 1 1
#
# [[4]]
# [,1] [,2] [,3] [,4]
# [1,] 1 1 1 1
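To get the single matrix the question asks for, the list elements can be stacked with rbind. The sketch below does that for n = 3 and also derives the matrix that holds the element values instead of 0/1 flags; the names index and subsets_mat are made up for illustration.

```r
n <- 3

# one block of rows per subset size, stacked into a single (2^n - 1) x n index matrix
index <- do.call(rbind,
                 lapply(seq_len(n), function(i) t(combn(n, i, FUN = tabulate, nbins = n))))
index

# multiply column j by j to show the element values instead of 0/1 flags
subsets_mat <- sweep(index, 2, seq_len(n), "*")
subsets_mat
```

Rows are ordered by subset size and then lexicographically within each size, which matches the layout shown in the question.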
