Sumif function in R

I'm having trouble reproducing Excel's SUMIF function in R. There are two matrices, m and n. I want n to hold the sum of each column j of m, restricted to the rows in which the neighbouring column j+1 is not empty (the example below should make this clearer).
Below is my code:
m <- matrix(c(1,2,3,4,5,'',7,'',''), nrow = 3)
n <- matrix('', nrow = 2, ncol = 3)
for (j in 1:2) {
n[2,j] <- sum(as.numeric(m[,j])[!is.na(m[,j+1])])
}
n[2,3] <- ''
Below is matrix m:
> m <- matrix(c(1,2,3,4,5,'',7,'',''), nrow = 3)
> m
[,1] [,2] [,3]
[1,] "1" "4" "7"
[2,] "2" "5" ""
[3,] "3" "" ""
The code above yields this matrix n:
> n <- matrix('', nrow = 2, ncol = 3)
> n
[,1] [,2] [,3]
[1,] "" "" ""
[2,] "" "" ""
But I want the code to yield this result:
> n <- matrix('', nrow = 2, ncol = 3)
> n
[,1] [,2] [,3]
[1,] "" "" ""
[2,] "3" "4" ""
Please help! Thanks!

Using numeric data:
m <- matrix(c(1,2,3,4,5,NA,7,NA,NA), nrow = 3)
n <- matrix(NA, nrow = 2, ncol = 3)
Bottom line up-front:
n[2,] <- colSums(m * cbind(!is.na(m)[,-1], FALSE), na.rm = TRUE)
Stepping through the logic:
Find the non-NA cells and shift the result one column to the left:
cbind(!is.na(m)[,-1], FALSE)
# [,1] [,2] [,3]
# [1,] TRUE TRUE FALSE
# [2,] TRUE FALSE FALSE
# [3,] FALSE FALSE FALSE
We can multiply that by the original m, where FALSE is effectively 0.
m * cbind(!is.na(m)[,-1], FALSE)
# [,1] [,2] [,3]
# [1,] 1 4 0
# [2,] 2 0 NA
# [3,] 0 NA NA
Column sums, using colSums(..., na.rm = TRUE)
colSums(m * cbind(!is.na(m)[,-1], FALSE), na.rm = TRUE)
# [1] 3 4 0
Assign that value to the second row of n:
n[2,] <- colSums(m * cbind(!is.na(m)[,-1], FALSE), na.rm = TRUE)
n
# [,1] [,2] [,3]
# [1,] NA NA NA
# [2,] 3 4 0

A matrix can hold only one class, so empty character values ("") coerce all the numeric values to character. Use NA instead: it keeps the matrix numeric, and you can still sum it. Also, I don't really understand why you need the additional empty (or NA) rows when your actual data is present only in the last row.
Having said that, you can use apply column-wise to sum each column's values up to, but not including, its last non-NA value.
m <- matrix(c(1,2,3,4,5,NA,7,NA,NA), nrow = 3)
n <- matrix(NA, nrow = 2, ncol = 3)
n[2, ] <- apply(m, 2, function(x) sum(x[seq_len(max(which(!is.na(x))) - 1)]))
n
# [,1] [,2] [,3]
#[1,] NA NA NA
#[2,] 3 4 0
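If the data really does arrive as a character matrix with "" for empty cells, as in the question, one possible way to get to the numeric form used above is sketched here. The names m_chr, m_num, mask and sums are introduced only for this illustration.

```r
# Character matrix from the question; "" marks an empty cell
m_chr <- matrix(c(1, 2, 3, 4, 5, "", 7, "", ""), nrow = 3)

# Replace "" by NA first, then convert to numeric while keeping the shape
m_chr[m_chr == ""] <- NA
m_num <- matrix(as.numeric(m_chr), nrow = nrow(m_chr))

# Mask: TRUE where the cell one column to the right is not NA
# (the last column has no right-hand neighbour, so it gets FALSE)
mask <- cbind(!is.na(m_num)[, -1, drop = FALSE], FALSE)

# Masked column sums
sums <- colSums(m_num * mask, na.rm = TRUE)
sums
# [1] 3 4 0
```

Converting "" to NA before calling as.numeric() avoids the coercion warning that as.numeric("") would otherwise raise.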

Related

row sum based on matrix with logicals

I have two dataframes that look similar to this example:
> matrix(1:9, nrow = 3, ncol = 3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> matrix(rexp(9), 3) < 1
[,1] [,2] [,3]
[1,] TRUE TRUE FALSE
[2,] FALSE TRUE FALSE
[3,] FALSE FALSE TRUE
I want to sum individual entries of a row, but only when the logical matrix of the same size is TRUE, else this row element should not be in the sum. All rows have at least one case where one element of matrix 2 is TRUE.
The result should look like this
[,1]
[1,] 12
[2,] 5
[3,] 9
Thanks for the help.
Multiplying your T/F matrix by your other one will zero out all the elements where FALSE. You can then sum by row.
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(rexp(9), 3) < 1
as.matrix(rowSums(m1 * m2), ncol = 1)
Alternatively, replace the elements to be excluded with NA and use rowSums with na.rm = TRUE (here m2 is the inverted mask defined under "data" below):
matrix(rowSums(replace(m1, m2, NA), na.rm = TRUE))
# [,1]
#[1,] 12
#[2,] 5
#[3,] 9
Or use ifelse
matrix(rowSums(ifelse(m2, 0, m1)))
data
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(rexp(9), 3) >= 1
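Because rexp() gives different values on every run, a deterministic check can help. The sketch below hard-codes the logical matrix printed in the question (the name keep is an assumption made for this example) and compares the three approaches. With that particular mask the row sums are 5, 5 and 9.

```r
m1 <- matrix(1:9, nrow = 3, ncol = 3)

# Logical matrix as printed in the question, entered column by column
keep <- matrix(c(TRUE,  FALSE, FALSE,
                 TRUE,  TRUE,  FALSE,
                 FALSE, FALSE, TRUE), nrow = 3)

# 1. Multiply: FALSE acts as 0, TRUE as 1
by_product <- rowSums(m1 * keep)

# 2. Replace excluded cells by NA and drop them while summing
by_na <- rowSums(replace(m1, !keep, NA), na.rm = TRUE)

# 3. ifelse: keep the value where the mask is TRUE, otherwise 0
by_ifelse <- rowSums(ifelse(keep, m1, 0))

matrix(by_product)
#      [,1]
# [1,]    5
# [2,]    5
# [3,]    9
```

The multiplication variant propagates NA if m1 itself contains missing values in excluded cells, whereas the ifelse variant does not, which may matter for real data.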

Create a symmetric matrix from circular shifts of a vector

I'm struggling with the creation of a symmetric matrix.
Let's say a vector v <- c(1,2,3)
I want to create a matrix like this:
matrix(ncol = 3, nrow = 3, c(1,2,3,2,3,1,3,1,2), byrow = FALSE)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 1
[3,] 3 1 2
(This is just a reprex; I have many vectors with different lengths.)
Notice this is a symmetric matrix with diagonal c(1,3,2) (different from vector v) and the manual process to create the matrix would be like this:
Using the first row as base (vector v) the process is to fill the empty spaces with the remaining values on the left side.
Any help is appreciated. Thanks!
Let me answer my own question in order to close it properly, using the incredibly simple and easy solution from Henrik's comment:
matrix(v, nrow = 3, ncol = 4, byrow = TRUE)[ , 1:3]
Maybe the byrow = TRUE matches the three steps of the illustration best conceptually, but the output is the same with:
matrix(v, nrow = 4, ncol = 3)[1:3, ]
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 2 3 1
# [3,] 3 1 2
Because there may be "many vectors with different lengths", it could be convenient to make a simple function and apply it to the vectors stored in a list:
cycle = function(x){
  len = length(x)
  matrix(x, nrow = len + 1, ncol = len)[1:len , ]
}
l = list(v1 = 1:3, v2 = letters[1:4])
lapply(l, cycle)
# $v1
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 2 3 1
# [3,] 3 1 2
#
# $v2
# [,1] [,2] [,3] [,4]
# [1,] "a" "b" "c" "d"
# [2,] "b" "c" "d" "a"
# [3,] "c" "d" "a" "b"
# [4,] "d" "a" "b" "c"
Another option is to use Reduce with accumulate = TRUE, so that the shift c(x[-1], x[1]) is applied cumulatively:
do.call(rbind, Reduce(function(x, y) c(x[-1], x[1]), v[-1], v, accumulate = TRUE))
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 2 3 1
#[3,] 3 1 2
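The same matrix can also be built from explicit modular indices, which makes the circular shift visible: entry (i, j) is element ((i + j - 2) mod n) + 1 of the vector. This is only a sketch, and the function name cycle_idx is made up for this example.

```r
# Build the matrix of circular shifts by index arithmetic
cycle_idx <- function(x) {
  n <- length(x)
  # idx[i, j] = ((i + j - 2) mod n) + 1, a symmetric index matrix
  idx <- (outer(seq_len(n), seq_len(n), "+") - 2) %% n + 1
  # Indexing a plain vector with idx returns a vector, so restore the shape
  matrix(x[idx], nrow = n)
}

v <- c(1, 2, 3)
cycle_idx(v)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    2    3    1
# [3,]    3    1    2
```

Since the index depends only on i + j, the result is symmetric for any input length.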

Find common values between matrices and return matrix with row-col position

I'd like to find the values shared between two matrices and return their locations (row-col) in a matrix.
set.seed(123)
m <- matrix(sample(4), 2, 2, byrow = T)
# m
# [,1] [,2]
# [1,] 2 3
# [2,] 1 4
m2 <- matrix(sample(4), 2, 2, byrow = F)
# m2
# [,1] [,2]
# [1,] 4 2
# [2,] 1 3
Expected output:
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
Bonus if this could be generalized to non-identical matrices (different dim).
Equal sizes
One option would be
replace(m * NA, m == m2, paste(row(m), col(m), sep = "-")[m == m2])
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
Different sizes
I believe that in this case, regardless of the approach, you will first need to trim both matrices to be of equal size.
set.seed(12)
(m <- matrix(sample(6), 2, 3, byrow = TRUE))
# [,1] [,2] [,3]
# [1,] 1 5 4
# [2,] 6 3 2
(m2 <- matrix(sample(6), 3, 2, byrow = FALSE))
# [,1] [,2]
# [1,] 2 5
# [2,] 4 3
# [3,] 1 6
out <- matrix(NA, max(nrow(m), nrow(m2)), max(ncol(m), ncol(m2)))
mrow <- min(nrow(m), nrow(m2))
mcol <- min(ncol(m), ncol(m2))
mTrim <- m[1:mrow, 1:mcol]
m2Trim <- m2[1:mrow, 1:mcol]
out[1:mrow, 1:mcol][mTrim == m2Trim] <- paste(row(mTrim), col(mTrim), sep = "-")[mTrim == m2Trim]
out
# [,1] [,2] [,3]
# [1,] NA "1-2" NA
# [2,] NA "2-2" NA
# [3,] NA NA NA
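The trimming steps above could be wrapped in a small helper. The following is a sketch under the same assumption (only the overlapping top-left block is compared). The name match_positions and the two test matrices a and b are introduced here for illustration, with a and b typed in by hand to mirror the printed m and m2.

```r
# Compare the overlapping top-left block of two matrices and report
# matching cells as "row-col" strings; everything else stays NA
match_positions <- function(a, b) {
  out <- matrix(NA_character_,
                max(nrow(a), nrow(b)),
                max(ncol(a), ncol(b)))
  r <- seq_len(min(nrow(a), nrow(b)))
  k <- seq_len(min(ncol(a), ncol(b)))
  a_sub <- a[r, k, drop = FALSE]
  b_sub <- b[r, k, drop = FALSE]
  # Two-column (row, col) index matrix of the matching cells
  hit <- which(a_sub == b_sub, arr.ind = TRUE)
  out[hit] <- paste(hit[, "row"], hit[, "col"], sep = "-")
  out
}

a <- matrix(c(1, 6, 5, 3, 4, 2), nrow = 2)  # same values as m above
b <- matrix(c(2, 4, 1, 5, 3, 6), nrow = 3)  # same values as m2 above
res <- match_positions(a, b)
res
#      [,1] [,2]  [,3]
# [1,] NA   "1-2" NA
# [2,] NA   "2-2" NA
# [3,] NA   NA    NA
```

Using drop = FALSE keeps the trimmed blocks as matrices even when the overlap is a single row or column.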
This function gives the desired output, provided that dim() is equal for the two matrices.
To generalize it to non-identical matrices, one solution would be to subset the bigger matrix first.
The key is which(mat1==mat2, arr.ind=T) to get row-col index:
which(m==m2, arr.ind=T)
row col
[1,] 2 1
Inside a function:
find_in_matr <- function(mat1, mat2) {
  if (!all(dim(mat1) == dim(mat2))) {
    stop("mat1 and mat2 need to have the same dim()!")
  }
  m <- mat1
  m[] <- NA # copy mat1 dim, and empty values
  loc <- which(mat1==mat2, arr.ind=T) # find positions (both indxs)
  m[loc] <- mapply(paste, sep="-", loc[, 1], loc[, 2]) # paste indxs
  return(m)
}
Example:
set.seed(123)
m <- matrix(sample(4), 2, 2, byrow = T)
# m
# [,1] [,2]
# [1,] 2 3
# [2,] 1 4
m2 <- matrix(sample(4), 2, 2, byrow = F)
# m2
# [,1] [,2]
# [1,] 4 2
# [2,] 1 3
find_in_matr(m, m2)
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
Silly piped version
library(magrittr)
(m == m2) %>%
`[<-`(!., NA) %>%
`[<-`((w <- which(., arr = T)), apply(w, 1, paste, collapse = '-'))
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
Here is an approach with ifelse(). The labels are built for every cell, so that each one lines up with its own position:
x <- paste(row(m), col(m), sep = "-")
ifelse(m != m2, NA, x)
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
This method works for matrices of any dimensions, as long as both have the same dim().
e.g.
set.seed(999)
m1 <- matrix(sample(1:3, 12, replace = T), 3, 4)
m2 <- matrix(sample(1:3, 12, replace = T), 3, 4)
x <- paste(row(m1), col(m1), sep = "-")
ifelse(m1 != m2, NA, x)
# [,1] [,2] [,3] [,4]
# [1,] NA "1-2" NA "1-4"
# [2,] NA NA "2-3" NA
# [3,] "3-1" NA NA "3-4"

Output converted from matrix to vector in apply

I want to apply a function over one margin (columns in my example) of a matrix. The problem is that the function returns a matrix, and apply flattens each result to a vector, so the overall result is again a matrix. My goal is to get a three-dimensional array. Here is an example (note that matrix() is not the function of interest, just an illustration):
x <- matrix(1:12, 4, 3)
apply(x, 2, matrix, nrow = 2, ncol = 2)
The output is exactly the same as the input. I have a pretty dull solution to this:
library(abind)
library(magrittr) # provides %>%
abind2 <- function (x, ...)
  abind(x, ..., along = length(dim(x)) + 1) # bind along a new last dimension
apply(x, 2, list) %>%
lapply(unlist) %>%
lapply(matrix, nrow = 2, ncol = 2) %>%
do.call(what = 'abind2')
I believe there must be something better than this, something that does not involve list()ing and unlist()ing columns.
Edit:
Also, the solution should be easily applicable to an array of any dimension with any choice of MARGIN, which mine is not.
For example, I want this to return a 4-dimensional array:
x <- array(1:24, c(4,3,2))
apply(x, 2:3, list) %>%
lapply(unlist) %>%
lapply(matrix, nrow = 2, ncol = 2) %>%
do.call(what = 'abind2')
Not that complicated at all. Simply use
array(x, dim = c(2, 2, ncol(x)))
Matrices and general arrays are stored column by column as one long 1D vector in memory, so you can simply reassign the dimensions.
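A short sketch of why the reshape works, and how it might extend to the 3-dimensional input from the question's edit. The names res, x3 and res3 are used only in this example, and the extension to x3 assumes the first dimension is the one being split into 2 x 2.

```r
x <- matrix(1:12, 4, 3)

# Split each length-4 column into a 2 x 2 slice
res <- array(x, dim = c(2, 2, ncol(x)))
res[, , 2]
#      [,1] [,2]
# [1,]    5    7
# [2,]    6    8

# Same idea for a 3-D input: replace the first dimension (4) by 2 x 2
# and keep the remaining dimensions as they are
x3 <- array(1:24, c(4, 3, 2))
res3 <- array(x3, dim = c(2, 2, dim(x3)[-1]))
dim(res3)
# [1] 2 2 3 2
```

Each slice res3[, , y, z] then equals matrix(x3[, y, z], 2, 2), because the elements keep their original column-major order.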
OK, here is possibly what you want to do in general:
tapply(x, col(x), FUN = matrix, nrow = 2, ncol = 2)
#$`1`
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
#
#$`2`
# [,1] [,2]
#[1,] 5 7
#[2,] 6 8
#
#$`3`
# [,1] [,2]
#[1,] 9 11
#[2,] 10 12
You can try converting your matrix into a data.frame and using lapply to apply your function to the columns (a data.frame is a list). It returns a list in which each element is the function result for one column:
lapply(as.data.frame(x), matrix, nrow = 2, ncol = 2)
# $V1
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
# $V2
# [,1] [,2]
# [1,] 5 7
# [2,] 6 8
# $V3
# [,1] [,2]
# [1,] 9 11
# [2,] 10 12
EDIT with the second definition of x:
x <- array(1:24, c(4,3,2))
lapply(as.data.frame(x), matrix, nrow = 2, ncol = 2)
# $V1
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
# $V2
# [,1] [,2]
# [1,] 5 7
# [2,] 6 8
# $V3
# [,1] [,2]
# [1,] 9 11
# [2,] 10 12
# $V4
# [,1] [,2]
# [1,] 13 15
# [2,] 14 16
# $V5
# [,1] [,2]
# [1,] 17 19
# [2,] 18 20
# $V6
# [,1] [,2]
# [1,] 21 23
# [2,] 22 24
EDIT2: an attempt to get an array as the result
Based on this similar question, you may try this code:
x <- array(1:24, c(4,3,2))
sapply(1:3,
function(y) sapply(1:ncol(x[, y, ]),
function(z) matrix(x[,y,z], ncol=2, nrow=2),
simplify="array"),
simplify="array")
Dimension of the result is 2 2 2 3.
Actually, the problem here is that two separate calls to apply are needed when x is an array with more than 2 dimensions. In the last example of the question (with x <- array(1:24, c(4,3,2))), we want to apply, to each element of the third dimension, a function that applies the matrix function to each element of the second dimension.
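A possible alternative to nesting two calls, sketched here as an assumption rather than as part of the answer above: call apply once over both margins, let it flatten each 2 x 2 result into a length-4 vector, and then restore the shape by resetting the dimensions. The function block is a hypothetical stand-in for "the function of interest".

```r
x <- array(1:24, c(4, 3, 2))

# Hypothetical function of interest: takes a length-4 vector and
# returns a 2 x 2 matrix
block <- function(v) outer(v[1:2], v[3:4])

# apply flattens each 2 x 2 result into a length-4 vector,
# giving an array of dimension 4 x 3 x 2
flat <- apply(x, 2:3, block)

# Restore the 2 x 2 shape in front of the two margin dimensions
res <- array(flat, dim = c(2, 2, dim(x)[2:3]))
dim(res)
# [1] 2 2 3 2
```

Here res[, , y, z] holds block(x[, y, z]), so the margin order of the input is preserved in the last two dimensions.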

R Inserting a Dataframe/List into a Dataframe Element

I'd like to insert a dataframe into a dataframe element, such that if I called df1[1,1] I would get:
[A B]
[C D]
I thought this was possible in R but perhaps I am mistaken. In a project of mine, I am essentially working with a 50x50 matrix, where I'd like each element to contain a column of data containing numbers and labeled rows.
Trying to do something like df1[1,1] <- df2 yields the following warning:
Warning message:
In [<-.data.frame(*tmp*, i, j, value = list(DJN.10 = c(0, 3, :
replacement element 1 has 144 rows to replace 1 rows
And calling df1[1,1] yields 0. I've tried inserting the data in various ways, such as with as.vector() and as.list(), to no success.
Best,
Perhaps a matrix could work for you, like so:
x <- matrix(list(), nrow=2, ncol=3)
print(x)
# [,1] [,2] [,3]
#[1,] NULL NULL NULL
#[2,] NULL NULL NULL
x[[1,1]] <- data.frame(a=c("A","C"), b=c("B","D"))
x[[1,2]] <- data.frame(c=2:3)
x[[2,3]] <- data.frame(x=1, y=2:4)
x[[2,1]] <- list(1,2,3,5)
x[[1,3]] <- list("a","b","c","d")
x[[2,2]] <- list(1:5)
print(x)
# [,1] [,2] [,3]
#[1,] List,2 List,1 List,4
#[2,] List,4 List,1 List,2
x[[1,1]]
# a b
#1 A B
#2 C D
class(x)
#[1] "matrix" "array"   (plain "matrix" before R 4.0.0)
typeof(x)
#[1] "list"
See here for details.
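One detail that is easy to trip over with a list-matrix is the difference between single and double brackets. The sketch below uses made-up names (cells, inner, wrapped) to show it.

```r
# A 2 x 2 list-matrix whose cells can hold arbitrary objects
cells <- matrix(list(), nrow = 2, ncol = 2)
cells[[1, 1]] <- data.frame(a = c("A", "C"), b = c("B", "D"))

# Double brackets extract the stored object itself
inner <- cells[[1, 1]]
is.data.frame(inner)
# [1] TRUE

# Single brackets return a length-one list wrapping the object
wrapped <- cells[1, 1]
is.data.frame(wrapped)
# [1] FALSE
```

So for the df1[1,1] style of access asked about in the question, cells[[1, 1]] is the call that returns the data frame directly.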
Each column in your data.frame can be a list. Just make sure that the list is as long as the number of rows in your data.frame.
Columns can be added using the standard $ notation.
Example:
x <- data.frame(matrix(NA, nrow=2, ncol=3))
x$X1 <- I(list(data.frame(a=c("A","C"), b=c("B","D")), matrix(1:10, ncol = 5)))
x$X2 <- I(list(data.frame(c = 2:3), list(1, 2, 3, 4)))
x$X3 <- I(list(list("a", "b", "c"), 1:5))
x
# X1 X2 X3
# 1 1:2, 1:2 2:3 a, b, c
# 2 1, 2, 3,.... 1, 2, 3, 4 1, 2, 3,....
x[1, 1]
# [[1]]
# a b
# 1 A B
# 2 C D
#
x[2, 1]
# [[1]]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 3 5 7 9
# [2,] 2 4 6 8 10
