Sum by Rows in R

Let me restate my question: how can I sum the numbers in each row and append the sums as a new last column, as in the second table (sum = a + b + c + d + e)?
I also want to know: if some of the values are N/A, can I still treat the rest as numbers?
Sample input:
a b c d e
1 90 67 18 39 74
2 100 103 20 45 50
3 80 87 23 44 89
4 95 57 48 79 90
5 74 81 61 95 131
Desired output:
a b c d e sum
1 90 67 18 39 74 288
2 100 103 20 45 50 318
3 80 87 23 44 89 323
4 95 57 48 79 90 369
5 74 81 61 95 131 442

To add a row sum, you can use addmargins
M <- matrix(c(90,67,18,39,74), nrow=1)
addmargins(M, 2) #2 = row margin
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 90 67 18 39 74 288
If you have missing data, you'll need to change the margin function to one that properly handles the NA values:
M <- matrix(c(90,67,18,NA,74), nrow=1)
addmargins(M, 2, FUN=function(...) sum(..., na.rm=TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 90 67 18 NA 74 249

Consider using apply(). For example:
set.seed(10) # optional, but this command will replicate data as shown
# create some data
x <- matrix(rnorm(25), nrow=5, ncol=5) # 5x5 matrix of random numbers
x
[,1] [,2] [,3] [,4] [,5]
[1,] 0.01874617 0.3897943 1.1017795 0.08934727 -0.5963106
[2,] -0.18425254 -1.2080762 0.7557815 -0.95494386 -2.1852868
[3,] -1.37133055 -0.3636760 -0.2382336 -0.19515038 -0.6748659
[4,] -0.59916772 -1.6266727 0.9874447 0.92552126 -2.1190612
[5,] 0.29454513 -0.2564784 0.7413901 0.48297852 -1.2651980
x.sum <-apply(x,1,sum) # sum the rows. Note: apply(x,2,sum) sums cols
x.sum
[1] 1.003356605 -3.776777904 -2.843256446 -2.431935624 -0.002762636
# attach new column (x.sum) to matrix x
x.sum.1 <-cbind(x,x.sum)
x.sum.1
x.sum
[1,] 0.01874617 0.3897943 1.1017795 0.08934727 -0.5963106 1.003356605
[2,] -0.18425254 -1.2080762 0.7557815 -0.95494386 -2.1852868 -3.776777904
[3,] -1.37133055 -0.3636760 -0.2382336 -0.19515038 -0.6748659 -2.843256446
[4,] -0.59916772 -1.6266727 0.9874447 0.92552126 -2.1190612 -2.431935624
[5,] 0.29454513 -0.2564784 0.7413901 0.48297852 -1.2651980 -0.002762636
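For plain row sums, base R's rowSums() returns the same values as apply(x, 1, sum) and is usually faster on large matrices. A minimal sketch, reusing the seed from above (the names row_totals and x_with_sum are only illustrative):

```r
set.seed(10)
x <- matrix(rnorm(25), nrow = 5, ncol = 5)

# rowSums() is the vectorized counterpart of apply(x, 1, sum)
row_totals <- rowSums(x)
same <- isTRUE(all.equal(row_totals, apply(x, 1, sum)))

# append the totals as a named sixth column
x_with_sum <- cbind(x, sum = row_totals)
```

Pass na.rm = TRUE to rowSums() if the matrix may contain NA values.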

Let's say you have the dataframe df, then you could try something like this:
# Assuming the columns a,b,c,d,e are at indices 1:5
df$sum = rowSums(df[ , c(1:5)], na.rm = T)
Or you could also try this (note that na.rm must be passed to rowSums, not to transform):
transform(df, sum = rowSums(df, na.rm = TRUE))
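To make this concrete, here is a sketch that rebuilds the sample input from the question and also covers the "N/A" case. It assumes the missing values arrive as the literal text N/A in a whitespace-separated file; the names txt and df_na are made up for the illustration.

```r
# sample input from the question
df <- data.frame(
  a = c(90, 100, 80, 95, 74),
  b = c(67, 103, 87, 57, 81),
  c = c(18, 20, 23, 48, 61),
  d = c(39, 45, 44, 79, 95),
  e = c(74, 50, 89, 90, 131)
)
df$sum <- rowSums(df[, c("a", "b", "c", "d", "e")], na.rm = TRUE)
# df$sum is 288 318 323 369 442

# assumed input format: "N/A" marks a missing value in the text
txt <- "a b c d e
90 67 18 N/A 74
100 103 20 45 50"
df_na <- read.table(text = txt, header = TRUE, na.strings = "N/A")
df_na$sum <- rowSums(df_na, na.rm = TRUE)
# the NA is skipped, so the first row sums to 249
```

Reading the marker with na.strings keeps the columns numeric, so no later conversion is needed.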

Related

Is there a way to vectorize selection of columns (with repetition) from a matrix?

I have a matrix L of size n x k and a vector Z of size p. Z is composed of integers which represent the column indices of L. I want to create a matrix X of size n x p which is the aggregation of the corresponding columns of L selected based on the values in Z.
Z = c(1, 3, 1, 2)
L = matrix(c(73,50,4,14,87,5,34,51,17,57,47,65),nrow=4)
> L
[,1] [,2] [,3]
[1,] 73 87 17
[2,] 50 5 57
[3,] 4 34 47
[4,] 14 51 65
I want X to be
> X
[,1] [,2] [,3] [,4]
[1,] 73 17 73 87
[2,] 50 57 50 5
[3,] 4 47 4 34
[4,] 14 65 14 51
In my original data, p, k and n are quite big (30K, 500 and 2K, respectively), and a loop over all Z values to select and combine the columns from L takes a very long time. Can there be a vectorized way (no loops) to do this task?
Pretty sure this is just:
L[,Z]
# [,1] [,2] [,3] [,4]
#[1,] 73 17 73 87
#[2,] 50 57 50 5
#[3,] 4 47 4 34
#[4,] 14 65 14 51
R doesn't care if you have repeating column indexes when you do selections from most objects.
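As a quick check, the sketch below compares the indexed result with an explicit per-column construction on the question's data. X_loop is an illustrative name, and drop = FALSE is added only so the result stays a matrix even when Z has length one.

```r
L <- matrix(c(73, 50, 4, 14, 87, 5, 34, 51, 17, 57, 47, 65), nrow = 4)
Z <- c(1, 3, 1, 2)

# vectorized selection, repeated indices allowed
X <- L[, Z, drop = FALSE]

# reference result built one column at a time
X_loop <- do.call(cbind, lapply(Z, function(z) L[, z]))

agree <- all(dim(X) == c(4, 4)) && all(X == X_loop)
```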

aperm clarification in R

I am following the thread "2d matrix to 3d stacked array in r" and have a question about the aperm function.
1) I get the first part of the solution, but did not understand the c(2,1,3) used in the function. Could you kindly clarify that?
2) Also I am trying a slight variation of the example in that thread.
My case is as follows:
For a similar matrix in example:
set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)
[,1] [,2] [,3] [,4] [,5]
[1,] 27 69 27 80 74
[2,] 38 39 39 11 70
[3,] 58 77 2 73 48
[4,] 91 50 39 42 87
[5,] 21 72 87 83 44
[6,] 90 100 35 65 25
[7,] 95 39 49 79 8
[8,] 67 78 60 56 10
[9,] 63 94 50 53 32
[10,] 7 22 19 79 52
[11,] 21 66 83 3 67
[12,] 18 13 67 48 41
I am trying to rearrange such that I have a 3 (row) X 5 (col) x 11 (third dim) array.
So, essentially the rows would overlap and show something like:
,,1
27 69 27 80 74
38 39 39 11 70
58 77 2 73 48
,,2
38 39 39 11 70
58 77 2 73 48
91 50 39 42 87
,,3
58 77 2 73 48
91 50 39 42 87
21 72 87 83 44
and so on until we hit ,,11
Would someone have any experience with this?
Thanks!
Just stumbled over this question. Though the answer comes a little late, here are two options for you.
First, you need to extend mat in such a way that its rows overlap. We can use this vector for row indexing.
#[1] 1 2 3 2 3 4 3 4 5 4 5 6 5 6 7 6 7 8 7 8 9 8 9 10 9 10 11 10 11 12
I used rollapply from the zoo package to create it as follows:
library(zoo)
row_nums <- c(t(rollapply(1:nrow(mat), width = 3, FUN = rep, 1)))
mat <- mat[row_nums, ]
dim(mat)
#[1] 30 5
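If you would rather avoid the zoo dependency, the same index vector can be built in base R with outer(). This is an alternative sketch, not part of the original answer; w (window width) and starts are illustrative names.

```r
n_rows <- 12   # number of rows in the original mat
w <- 3         # window width

# first row of each window: 1, 2, ..., 10
starts <- seq_len(n_rows - w + 1)

# column k of the outer() result is k, k + 1, k + 2;
# c() flattens it column by column
row_nums <- c(outer(seq_len(w) - 1, starts, "+"))
```

The result is the same 1 2 3 2 3 4 ... 10 11 12 sequence shown above.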
Now use the matsplitter function that @MrFlick provided in this answer (please consider upvoting his answer; the function is reproduced below) to get the desired output:
matsplitter(mat, 3, 5)
#, , 1
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 27 69 27 80 74
#[2,] 38 39 39 11 70
#[3,] 58 77 2 73 48
#
#, , 2
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 38 39 39 11 70
#[2,] 58 77 2 73 48
#[3,] 91 50 39 42 87
#
#, , 3
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 58 77 2 73 48
#[2,] 91 50 39 42 87
#[3,] 21 72 87 83 44
#
#, , 4
# ...
Note that you will end up with an array of dimension 3 x 5 x 10, not 11.
matsplitter <- function(M, r, c) {
  rg <- (row(M) - 1) %/% r + 1
  cg <- (col(M) - 1) %/% c + 1
  rci <- (rg - 1) * max(cg) + cg
  N <- prod(dim(M)) / r / c
  cv <- unlist(lapply(1:N, function(x) M[rci == x]))
  dim(cv) <- c(r, c, N)
  cv
}
Here is a solution using aperm as in the linked answer (assuming that mat was extended as above and is of dimension 30 x 5).
aperm(`dim<-`(t(mat), list(5, 3, 10)), c(2, 1, 3))
t(mat): transposes mat (new dimension: 5 x 30).
`dim<-`(t(mat), list(5, 3, 10)): changes the dimension of t(mat) from 5 x 30 to 5 x 3 x 10.
aperm(..., c(2, 1, 3)): permutes the dimensions of the array `dim<-`(t(mat), list(5, 3, 10)) from 5 x 3 x 10 to 3 x 5 x 10, i.e. the second dimension becomes the first, the first dimension becomes the second, and the third dimension stays the same.
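Putting the steps together, here is a self-contained sketch of the whole pipeline in base R. It uses array() in place of the `dim<-` call and builds the row index with outer() rather than zoo, so treat it as one possible variant; ext, starts and arr are illustrative names.

```r
set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)

w <- 3                                   # rows per slice
starts <- seq_len(nrow(mat) - w + 1)     # 10 overlapping windows
row_nums <- c(outer(seq_len(w) - 1, starts, "+"))
ext <- mat[row_nums, ]                   # 30 x 5 extended matrix

# t(ext) is 5 x 30; reshaping to 5 x 3 x 10 puts one window in each slice,
# stored as columns x rows, so aperm swaps the first two dimensions
arr <- aperm(array(t(ext), dim = c(ncol(mat), w, length(starts))), c(2, 1, 3))

dim(arr)   # 3 5 10
```

Each slice arr[, , k] equals rows k to k + 2 of the original mat.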

Multiply specific columns of one matrix with specific columns of another matrix for many indices

I have two large matrices P and Q (around 10k x 50k each, but to test this yourself a random 10 x 10 matrix for P and Q is sufficient). I have a list of indices, e.g.
i j
1 4
1 625
1 9207
2 827
... ...
etc. This means that I need to find the dot product of column 1 in P and column 4 in Q, then column 1 in P and column 625 in Q and so on. I could easily solve this with a for loop but I know they are not very efficient in R. Anyone got any ideas?
edit: asked for a reproducible example
P <- matrix(c(1,0,1,0,0,1,0,1,0), nrow = 3, ncol = 3)
Q <- matrix(c(0,0,1,0,1,0,1,0,1), nrow = 3, ncol = 3)
i <- c(1,1,2)
j <- c(2,1,3)
gives output (if in dot product form)
1: 0
2: 1
3: 1
P <- matrix(1:50, nrow = 5,ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1,2,4,7)
j <- c(5,3,7,2)
P
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
Q
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
P[,i] * Q[, j]
# [,1] [,2] [,3] [,4]
# [1,] 21 66 496 186
# [2,] 44 84 544 224
# [3,] 69 104 594 264
# [4,] 96 126 646 306
# [5,] 125 150 700 350
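The element-wise product above has one column per (i, j) pair; summing each column with colSums() turns it into the requested dot products. A minimal sketch on the question's small example (dots and check are illustrative names):

```r
P <- matrix(c(1, 0, 1, 0, 0, 1, 0, 1, 0), nrow = 3, ncol = 3)
Q <- matrix(c(0, 0, 1, 0, 1, 0, 1, 0, 1), nrow = 3, ncol = 3)
i <- c(1, 1, 2)
j <- c(2, 1, 3)

# column k holds P[, i[k]] * Q[, j[k]]; its sum is the k-th dot product
dots <- colSums(P[, i, drop = FALSE] * Q[, j, drop = FALSE])
# → 0 1 1

# cross-check against an explicit per-pair computation
check <- mapply(function(a, b) sum(P[, a] * Q[, b]), i, j)
```

This needs only one temporary matrix with nrow(P) rows and length(i) columns, whereas a diag(t(P[, i]) %*% Q[, j]) approach builds a full length(i) x length(i) product first, which can get large when there are many index pairs.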
Using matrix multiplication, you can do
diag(t(P[, i]) %*% Q[, j])
[1] 0 1 1
Here is a second solution with apply.
apply(cbind(i, j), 1, function(x) t(P[, x[1]]) %*% Q[, x[2]])
[1] 0 1 1
To verify these agree in a second example:
set.seed(1234)
A <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
B <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
inds <- matrix(sample(10, 10, replace=TRUE), 5)
matrix multiplication
diag(t(A[, inds[,1]]) %*% B[, inds[,2]])
[1] 215 260 306 237 317
and apply
apply(inds, 1, function(x) t(A[, x[1]]) %*% B[, x[2]])
[1] 215 260 306 237 317

How to convert from a matrix to a contingency table?

I have the following matrix
matrix(c(1228,39,2,158,100,649,1,107,1,0,54,9,73,12,4,137), nrow=4)
[,1] [,2] [,3] [,4]
[1,] 1228 100 1 73
[2,] 39 649 0 12
[3,] 2 1 54 4
[4,] 158 107 9 137
And I would like to convert it into a contingency table with named "axes" and ordered column names (basically keeping the existing row and column indexing).
In other words, something like:
Variable 1
[,1] [,2] [,3] [,4]
[1,] 1228 100 1 73
var2[2,] 39 649 0 12
[3,] 2 1 54 4
[4,] 158 107 9 137
You can assign dimnames when creating the matrix:
m = matrix(c(1228,39,2,158,100,649,1,107,1,0,54,9,73,12,4,137), nrow=4)
matrix(m, nrow = NROW(m), dimnames=list(var1 = sequence(NROW(m)), var2 = sequence(NCOL(m))))
# var2
#var1 1 2 3 4
# 1 1228 100 1 73
# 2 39 649 0 12
# 3 2 1 54 4
# 4 158 107 9 137
In fact, you could use dimnames right at the start when creating m too
We can use dimnames and names:
dimnames(m1) <- list(NULL, NULL)
names(dimnames(m1)) <- c("Var2", "Variable 1")
m1
# Variable 1
#Var2 [,1] [,2] [,3] [,4]
# [1,] 1228 100 1 73
# [2,] 39 649 0 12
# [3,] 2 1 54 4
# [4,] 158 107 9 137
Or in one line
dimnames(m1) <- list(Var2 = NULL, `Variable 1` = NULL)
Or another way to write it
dimnames(m1) <- setNames(vector("list", 2), c("Var2", "Variable 1"))
data
m1 <- matrix(c(1228,39,2,158,100,649,1,107,1,0,54,9,73,12,4,137), nrow=4)
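If an object of class "table" is what you need (for example to pass to addmargins or summary), the matrix can be converted with as.table() once the dimnames are set. This is a sketch under that assumption; the axis labels are taken from the question, and the index labels 1 to 4 stand in for the row and column positions.

```r
m <- matrix(c(1228, 39, 2, 158, 100, 649, 1, 107,
              1, 0, 54, 9, 73, 12, 4, 137), nrow = 4)

# named dimnames become the axis titles of the table
dimnames(m) <- list(var2 = 1:4, `Variable 1` = 1:4)
tab <- as.table(m)

class(tab)            # "table"
names(dimnames(tab))  # "var2" "Variable 1"
```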

How could I build a function that extracts the diagonal block matrices of a larger one in R

How could I build a function that extracts the diagonal block matrices of a larger one? The problem is as follows: the function takes a centred matrix as its argument, computes the full error covariance matrix, and extracts the blocks on the leading diagonal. I tried the following, but it is not working.
err_cov <- function(x){
m <- nrow(x)
n <- ncol(x)
#compute the full error covariance matrix as the inner product
#of vec(x) and its transpose. Note that, omega is a mnxmn matrix
vec <- as.vector(x)
omega <- vec%*%t(vec)
sigmas <- list()
for(i in 0:n-1){
#here the blocks have to be m nxn matrices along the
#leading diagonal
for (j in 1:m)
sigmas[[j]] <- omega[(n*i+1):n*(i+1), (n*i+1):n*(i+1)]
}
return(sigmas)
}
So, for instance for
A
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
> B<-as.vector(A)
> B
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> C<-B%*%t(B)
> C
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 5 6 7 8 9 10 11 12
[2,] 2 4 6 8 10 12 14 16 18 20 22 24
[3,] 3 6 9 12 15 18 21 24 27 30 33 36
[4,] 4 8 12 16 20 24 28 32 36 40 44 48
[5,] 5 10 15 20 25 30 35 40 45 50 55 60
[6,] 6 12 18 24 30 36 42 48 54 60 66 72
[7,] 7 14 21 28 35 42 49 56 63 70 77 84
[8,] 8 16 24 32 40 48 56 64 72 80 88 96
[9,] 9 18 27 36 45 54 63 72 81 90 99 108
[10,] 10 20 30 40 50 60 70 80 90 100 110 120
[11,] 11 22 33 44 55 66 77 88 99 110 121 132
[12,] 12 24 36 48 60 72 84 96 108 120 132 144
The function should return:
> C1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
> C2
[,1] [,2] [,3]
[1,] 16 20 24
[2,] 20 25 30
[3,] 24 30 36
> C3
[,1] [,2] [,3]
[1,] 49 56 63
[2,] 56 64 72
[3,] 63 72 81
> C4
[,1] [,2] [,3]
[1,] 100 110 120
[2,] 110 121 132
[3,] 120 132 144
Thanks for answering.
I think a clearer solution is to reset the dimensions and then let R do the index calculations for you:
err_cov <- function(x){
m <- nrow(x)
n <- ncol(x)
#compute the full error covariance matrix as the inner product
#of vec(x) and its transpose
vec <- as.vector(x)
omega <- tcrossprod(vec)
dim(omega) <- c(n,m,n,m)
sigmas <- list()
for (j in 1:m)
sigmas[[j]] <- omega[,j,,j]
return(sigmas)
}
Here is an example:
> x
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> tcrossprod(vec)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 2 4 6 8 10 12
[3,] 3 6 9 12 15 18
[4,] 4 8 12 16 20 24
[5,] 5 10 15 20 25 30
[6,] 6 12 18 24 30 36
> err_cov(x)
[[1]]
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
[[2]]
[,1] [,2] [,3]
[1,] 16 20 24
[2,] 20 25 30
[3,] 24 30 36
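Note that the function above returns m blocks of size n x n, which follows the comment in the question's code. The question's worked example (a 3 x 4 matrix A giving C1 to C4) instead expects n blocks of size m x m. If that is the layout you want, swapping the dimension order should do it; the following is a sketch of that variant, with diag_blocks as a hypothetical name:

```r
diag_blocks <- function(x) {
  m <- nrow(x)
  n <- ncol(x)
  omega <- tcrossprod(as.vector(x))   # (m*n) x (m*n) outer product
  dim(omega) <- c(m, n, m, n)         # row = (a, j), column = (b, k)
  # keep the m x m block where both column indices equal j
  lapply(seq_len(n), function(j) omega[, j, , j])
}

A <- matrix(1:12, nrow = 3)
blocks <- diag_blocks(A)
length(blocks)   # 4
blocks[[2]]      # rows and columns 4:6 of the full 12 x 12 matrix
```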

Resources