Replacing diagonal elements using dplyr pipe - r

I want to replace the diagonal elements of a matrix in the middle of a piping process but can't figure out a way to do this. I know I can replace the diagonal elements using the diag() function, but I don't know how to use diag() inside a pipe. Sample data is given below, and I want the following steps put together in a piping process. Thanks in advance.
aa <- matrix(1:25, nrow =5)
diag(aa) <- NA

One option could be:
aa %>%
`diag<-`(., NA)
[,1] [,2] [,3] [,4] [,5]
[1,] NA 6 11 16 21
[2,] 2 NA 12 17 22
[3,] 3 8 NA 18 23
[4,] 4 9 14 NA 24
[5,] 5 10 15 20 NA

We could use replace with a logical condition
library(dplyr)
aa %>%
replace(., col(.) == row(.), NA)
-output
# [,1] [,2] [,3] [,4] [,5]
#[1,] NA 6 11 16 21
#[2,] 2 NA 12 17 22
#[3,] 3 8 NA 18 23
#[4,] 4 9 14 NA 24
#[5,] 5 10 15 20 NA
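If the backtick call to `diag<-` reads awkwardly, another way to get the same effect is to wrap the assignment in a small function that returns the modified matrix. The sketch below assumes a helper named set_diag, which is a made-up name for illustration and not part of base R, dplyr, or magrittr.

```r
# Hypothetical helper (not from any package): return a copy of m whose
# diagonal has been replaced by value, so it can be used as a pipe step.
set_diag <- function(m, value) {
  diag(m) <- value
  m
}

aa <- matrix(1:25, nrow = 5)

# Direct call; in a pipeline this would be written as
#   aa %>% set_diag(NA)      (magrittr / dplyr)
#   aa |> set_diag(NA)       (native pipe, R >= 4.1)
res <- set_diag(aa, NA)
res
```

Because the function works on its own copy of m, the original aa is left unchanged, which is what a pipe step is expected to do.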

Related

shift matrix elements in R

n <- 5
a <- matrix(1:n^2, nrow = n, byrow = TRUE)
output is
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
[4,] 16 17 18 19 20
[5,] 21 22 23 24 25
how do I shift the '1' to the current position of '25' to look like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 4 5 6
[2,] 7 8 9 10 11
[3,] 12 13 14 15 16
[4,] 17 18 19 20 21
[5,] 22 23 24 25 1
a <- t(a); a[] <- c(a[-1], a[1]); a <- t(a)
a
# [,1] [,2] [,3] [,4] [,5]
# [1,] 2 3 4 5 6
# [2,] 7 8 9 10 11
# [3,] 12 13 14 15 16
# [4,] 17 18 19 20 21
# [5,] 22 23 24 25 1
c(a) unwinds or unlists the matrix into a vector. It does this column-first, so c(a) results in [1] 1 6 11 16 21 2 .... We want it to be row-first, though, so:
t(a) transposes it, so that what was row-first is now column-first, allowing c(a) and such to work.
c(a[-1], a[1]) is just "concatenate all except the first with the first", the classic way to put the first element of a vector at the end.
a[] <- is a way to do calcs on its values where the calcs do not preserve the "dimensionality" of the object.
After we've rearranged, we then transpose back to the original shape and row/column-order.
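The same transpose, rotate, transpose-back idea can be wrapped in a function that shifts by any number of positions. This is only a sketch: shift_rowwise and its k argument are names invented here for illustration, and it rebuilds the matrix with byrow = TRUE instead of transposing back.

```r
# Hypothetical helper: rotate the elements of a matrix by k positions
# in row-major (reading) order. k = 1 moves the first element to the end.
shift_rowwise <- function(m, k = 1) {
  v <- c(t(m))                                    # unwind row-first
  idx <- (seq_along(v) - 1 + k) %% length(v) + 1  # rotated positions
  matrix(v[idx], nrow = nrow(m), ncol = ncol(m), byrow = TRUE)
}

a <- matrix(1:25, nrow = 5, byrow = TRUE)
shifted <- shift_rowwise(a)
shifted
```

With k = 1 this should give the same matrix as the answer above; k = 0 returns the input unchanged.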
Here is a base R one-liner
> t(`dim<-`(t(a)[seq_along(a)%%length(a)+1],rev(dim(a))))
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 4 5 6
[2,] 7 8 9 10 11
[3,] 12 13 14 15 16
[4,] 17 18 19 20 21
[5,] 22 23 24 25 1

Picking top n% percent of elements from matrix rows, different number of elements on each row

I have a problem with picking the top n% largest and smallest elements from each row of a data matrix. Specifically, I would like to find the column numbers of those top n% elements. This would not be a problem if each row had the same number of non-NA elements, but in this situation the number of picked elements is different for each row. Here's an example of the situation (the real data matrix is 195x1030, so I won't be using it here), where the top 40% are picked:
data=
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 NA 100 98 200 78 80 35 NA 55
[2,] 32 67 15 73 NA 12 91 230 3 99
[3,] NA NA NA 45 53 26 112 64 80 41
[4,] 54 38 60 70 163 69 109 205 5 31
[5,] 107 28 296 254 30 40 NA 18 28 90
The resulting top 40% column-number matrices should look like these (the number of picked elements is calculated by rounding down, as the function as.integer does):
largest= smallest=
[,1] [,2] [,3] [,4] [,1] [,2] [,3] [,4]
[1,] 5 3 4 NA [1,] 1 8 10 NA
[2,] 8 10 7 NA [2,] 9 6 3 NA
[3,] 7 9 NA NA [3,] 6 10 NA NA
[4,] 8 5 7 4 [4,] 9 10 2 1
[5,] 3 4 1 10 [5,] 8 9 2 5
So the top numbers are selected looking only at the non-NA elements of each row. For example, the first row of the data matrix contains only 8 non-NA numbers, so 0.4 * 8 = 3.2, rounded down to 3 elements, are selected. This is what creates the NAs in the resulting matrices.
Once again, I tried using a for-loop (this code is to finding the largest 40%):
largest <- matrix(rep(NA, 20), nrow = 5)
for(i in 1:5){
largest[i,]<-order(data[i,], decreasing=T)
[1:as.integer(0.4*nrow(data[complete.cases(data[,i]),]))]
}
but R returns an error: "number of items to replace is not a multiple of replacement length", which I think means that since not all the elements of a row of the original largest matrix are replaced while looping, this for-loop can't be used. Am I right?
How could this sort of picking be done?
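The following reproduces your expected output, except for row 5: that row has 9 non-NA values, so floor(0.4 * 9) = 3 elements are picked rather than the 4 shown in the question.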
# Determine number of columns for output matrix as
# maximum of 40% of all non-NA values per row
ncol <- max(floor(apply(mat, 1, function(x) sum(!is.na(x))) * 0.4))
# Top 40% largest
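t(apply(mat, 1, function(x) {
# n = number of elements to pick in this row (40% of the non-NA values, rounded down)
n <- floor(sum(!is.na(x)) * 0.4)
# seq_len(n) instead of 1:n, so that a row with n = 0 yields only NAs
replace(rep(NA, ncol), seq_len(n), order(x, decreasing = TRUE)[seq_len(n)])
}))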
# [,1] [,2] [,3] [,4]
#[1,] 5 3 4 NA
#[2,] 8 10 7 NA
#[3,] 7 9 NA NA
#[4,] 8 5 7 4
#[5,] 3 4 1 NA
# Top 40% smallest
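t(apply(mat, 1, function(x) {
# n = number of elements to pick in this row (40% of the non-NA values, rounded down)
n <- floor(sum(!is.na(x)) * 0.4)
# seq_len(n) instead of 1:n, so that a row with n = 0 yields only NAs
replace(rep(NA, ncol), seq_len(n), order(x, decreasing = FALSE)[seq_len(n)])
}))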
# [,1] [,2] [,3] [,4]
#[1,] 1 8 10 NA
#[2,] 9 6 3 NA
#[3,] 6 10 NA NA
#[4,] 9 10 2 1
#[5,] 8 2 9 NA
Explanation: We first determine the max number of columns for both output matrices; we then loop through mat row-by-row, determine the row-specific number n of non-NA entries corresponding to 40% of all non-NA numbers in that row, and return a column vector of the top 40% decreasing/increasing entries padded with NAs. Final transpose gives the expected output.
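To make the per-row step concrete, here is the same logic worked through for the first row of the example data only. The variable names (x, p, width, top, padded) are chosen for this illustration, and the output width of 4 is taken from the example above.

```r
# Row 1 of the example data from the question
x <- c(1, NA, 100, 98, 200, 78, 80, 35, NA, 55)
p <- 0.4      # fraction of non-NA elements to pick
width <- 4    # number of columns in the output matrix (from the example)

# 8 non-NA values, so floor(0.4 * 8) = floor(3.2) = 3 elements are picked
n <- floor(sum(!is.na(x)) * p)

# order() puts NAs last by default, so the first n positions are
# guaranteed to point at non-NA values
top <- order(x, decreasing = TRUE)[seq_len(n)]

# Pad with NA so every row has the same length
padded <- c(top, rep(NA, width - n))
padded
```

Padding every row to a common length is what avoids the "number of items to replace" error from the loop in the question, since each row assignment then receives exactly as many values as the row has columns.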
Posting my (less exact and very similar) answer as it is in form of a function, which might be handy:
toppct <- function(x, p, largest = TRUE){
t(apply(x, 1, function(y){
c(which(y %in% sort(y, decreasing = largest)[1:floor(length(which(!is.na(y)))*p)]),
rep(NA, floor(length(y)*p) - floor(length(which(!is.na(y)))*p)))
}))
}
This produces the output in the question, just without sorting the top percent positions. For smallest, just set largest = FALSE.
> toppct(mat, .4)
[,1] [,2] [,3] [,4]
[1,] 3 4 5 NA
[2,] 7 8 10 NA
[3,] 7 9 NA NA
[4,] 4 5 7 8
[5,] 1 3 4 NA
> toppct(mat, .4, largest = FALSE)
[,1] [,2] [,3] [,4]
[1,] 1 8 10 NA
[2,] 3 6 9 NA
[3,] 6 10 NA NA
[4,] 1 2 9 10
[5,] 2 8 9 NA
I want to emphasize that I think Maurits' answer is the one to accept, as he gets the output exactly as expected.

One-liner to create a list of iterating sequences?

I need to create a list of sequences that always goes back to the first digit in the sequence. I've written the code below but it seems clunky. Is there a solution that uses fewer characters?
(i = seq(1, 24, by = 3))
#> [1] 1 4 7 10 13 16 19 22
(i_list = purrr::map(i, ~c(.:(. + 2), .)))
#> [[1]]
#> [1] 1 2 3 1
#>
#> [[2]]
#> ...
Edit: here's a way with lapply(). Not sure why this is getting downvotes, any advice on how to improve the question welcome!
(i_list = lapply(i, function(x) c(x:(x+2), x)))
I was wondering if there's a way with replicate() so have added that tag.
In matrix, rather than list form, there's:
cbind(matrix(1:24, ncol=3,byrow=TRUE),seq(1, 24, by = 3))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 1
[2,] 4 5 6 4
[3,] 7 8 9 7
[4,] 10 11 12 10
[5,] 13 14 15 13
[6,] 16 17 18 16
[7,] 19 20 21 19
[8,] 22 23 24 22
and then you'd iterate over rows of the matrix instead of elements of the list.
Or if you are into code golf:
> seq(1,24,by=3) + t(matrix(c(0,1,2,0),ncol=8,nrow=4))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 1
[2,] 4 5 6 4
[3,] 7 8 9 7
[4,] 10 11 12 10
...
but then how much work do you put into constructing the RHS of the + in this case? How is your question parameterised?
This depends on i having a regular pattern (with some adjustment for step size); it doesn't work for arbitrary i sequences.
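For starting values without a regular step, one possible variation is outer(), which adds a fixed vector of offsets to each element of i. This is a sketch only; the irregular values in i and the name offsets are made up for illustration.

```r
# Arbitrary (irregular) starting values, chosen only for illustration
i <- c(1, 4, 7, 20)

# Offsets that walk two steps forward and then return to the start
offsets <- c(0, 1, 2, 0)

# m[r, c] = i[r] + offsets[c], one sequence per row
m <- outer(i, offsets, "+")

# Convert to a list if list form is needed
i_list <- lapply(seq_len(nrow(m)), function(r) m[r, ])
i_list[[4]]
```

The length of the run is then controlled entirely by offsets, independent of how the values in i are spaced.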

Transforming NA to specific arrays of a matrix in R

I have a matrix of the form,
mat <- matrix(1:25, 5,5)
that looks like the following:
Now, I need to transform this matrix in the form as shown below:
That is, I want to keep all elements of rows 2 and 4 as well as columns 2 and 4, and replace all other values with NA. This is just a simple example to explain the problem. My actual matrix size is about 2000 x 2000. Any help would be much appreciated.
Your first and second matrices are different in that the first one is filled as R would fill a matrix (i.e. column-major order) and the second is row-major.
Assuming that you meant to have identical matrices, your task can be addressed with simple matrix operations:
mat <- matrix(1:25, 5,5)
mat2 <- matrix(NA, 5,5)
mat2[c(2,4),] <- 1
mat2[,c(2,4)] <- 1
mat * mat2
[,1] [,2] [,3] [,4] [,5]
[1,] NA 6 NA 16 NA
[2,] 2 7 12 17 22
[3,] NA 8 NA 18 NA
[4,] 4 9 14 19 24
[5,] NA 10 NA 20 NA
If not, just transpose your initial matrix with t(mat) and follow the same approach as above.
mat = t(mat)
replace(x = mat, which((matrix(row(mat) %in% c(2, 4), NROW(mat), NCOL(mat)) |
matrix(col(mat) %in% c(2, 4), NROW(mat), NCOL(mat))) == FALSE,
arr.ind = TRUE), NA)
# [,1] [,2] [,3] [,4] [,5]
#[1,] NA 2 NA 4 NA
#[2,] 6 7 8 9 10
#[3,] NA 12 NA 14 NA
#[4,] 16 17 18 19 20
#[5,] NA 22 NA 24 NA
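A shorter variant of the same idea relies on negative indexing: the cells to blank out are exactly those whose row and column are both outside the kept set. The sketch below assumes the kept indices are stored in a vector called keep (a name chosen here) and that keep is not empty.

```r
mat <- matrix(1:25, 5, 5)
keep <- c(2, 4)   # rows and columns to preserve (assumed non-empty)

out <- mat
# Negative indices drop rows 2 and 4 and columns 2 and 4 from the selection,
# so only cells lying outside both kept rows and kept columns become NA
out[-keep, -keep] <- NA
out
```

For the column-major mat this should match the first output above, and it should scale to a 2000 x 2000 matrix without building a second matrix of the same size.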

Add columns of a matrix based on values of another vector

Suppose I have the following matrix:
mat <- matrix(1:20, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
and the following vector
counts=c(2,1,2)
I need to collapse this matrix by adding up columns according to the values in counts. That is, the first two columns must be added together, the third column stays as it is, and the last two columns must be added together. My resulting matrix should look like this:
[,1] [,2] [,3]
[1,] 6 9 30
[2,] 8 10 32
[3,] 10 11 34
[4,] 12 12 36
How could I do this in an automatic way, given that in my case I have a very big matrix and with a vector of counts with different values?
One way would be to replicate the sequence of 'counts' by 'counts' vector, use that to split the column sequence of 'mat' to return a list, loop through the list with sapply, use the column index to subset the 'mat' for each list element and get the rowSums.
mat2 <- sapply(split(1:ncol(mat), rep(seq_along(counts), counts)),
function(i) rowSums(mat[,i,drop=FALSE]))
dimnames(mat2) <- NULL
mat2
# [,1] [,2] [,3]
#[1,] 6 9 30
#[2,] 8 10 32
#[3,] 10 11 34
#[4,] 12 12 36
Another idea, conceptually similar to akrun's:
t(rowsum(t(mat), rep(seq_along(counts), counts)))
# 1 2 3
#[1,] 6 9 30
#[2,] 8 10 32
#[3,] 10 11 34
#[4,] 12 12 36
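The grouping can also be expressed as a matrix product with a 0/1 indicator matrix, which may be convenient for large inputs. This is a sketch under the assumption that sum(counts) equals ncol(mat); the names groups, ind, and collapsed are chosen here for illustration.

```r
mat <- matrix(1:20, ncol = 5)
counts <- c(2, 1, 2)

# Assumption: counts accounts for every column of mat
stopifnot(sum(counts) == ncol(mat))

# Group label for each column of mat: 1 1 2 3 3
groups <- rep(seq_along(counts), counts)

# 5 x 3 indicator matrix: ind[j, g] is 1 if column j belongs to group g
ind <- outer(groups, seq_along(counts), "==") * 1

# Each output column is the sum of the input columns in that group
collapsed <- mat %*% ind
collapsed
```

Note that the result is a numeric (double) matrix even though mat holds integers, because ind is numeric.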
