To restate my question: how can I sum the numbers in each row and append the sum after the last column as a new column, as in the second table (sum = a + b + c + d + e)?
I would also like to know: if some of the values are N/A, can I still treat them as numbers?
Sample input:
a b c d e
1 90 67 18 39 74
2 100 103 20 45 50
3 80 87 23 44 89
4 95 57 48 79 90
5 74 81 61 95 131
Desired output:
a b c d e sum
1 90 67 18 39 74 288
2 100 103 20 45 50 318
3 80 87 23 44 89 323
4 95 57 48 79 90 369
5 74 81 61 95 131 442
To add a row sum, you can use addmargins
M <- matrix(c(90,67,18,39,74), nrow=1)
addmargins(M, 2) # margin 2 appends a column holding the row totals
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 90 67 18 39 74 288
If you have missing data, you'll need to change the margin function to something that handles the NA values properly:
M <- matrix(c(90,67,18,NA,74), nrow=1)
addmargins(M, 2, FUN=function(...) sum(..., na.rm=TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 90 67 18 NA 74 249
Consider using apply(). For example:
set.seed(10) # optional, but this command will replicate data as shown
# create some data
x <- matrix(rnorm(25),nrow=5,ncol=5) # 5x5 matrix of random numbers
x
[,1] [,2] [,3] [,4] [,5]
[1,] 0.01874617 0.3897943 1.1017795 0.08934727 -0.5963106
[2,] -0.18425254 -1.2080762 0.7557815 -0.95494386 -2.1852868
[3,] -1.37133055 -0.3636760 -0.2382336 -0.19515038 -0.6748659
[4,] -0.59916772 -1.6266727 0.9874447 0.92552126 -2.1190612
[5,] 0.29454513 -0.2564784 0.7413901 0.48297852 -1.2651980
x.sum <-apply(x,1,sum) # sum the rows. Note: apply(x,2,sum) sums cols
x.sum
[1] 1.003356605 -3.776777904 -2.843256446 -2.431935624 -0.002762636
# attach new column (x.sum) to matrix x
x.sum.1 <-cbind(x,x.sum)
x.sum.1
x.sum
[1,] 0.01874617 0.3897943 1.1017795 0.08934727 -0.5963106 1.003356605
[2,] -0.18425254 -1.2080762 0.7557815 -0.95494386 -2.1852868 -3.776777904
[3,] -1.37133055 -0.3636760 -0.2382336 -0.19515038 -0.6748659 -2.843256446
[4,] -0.59916772 -1.6266727 0.9874447 0.92552126 -2.1190612 -2.431935624
[5,] 0.29454513 -0.2564784 0.7413901 0.48297852 -1.2651980 -0.002762636
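As a possible shortcut, base R's rowSums() should give the same totals as apply(x, 1, sum), and apply() forwards extra arguments such as na.rm to the summing function. A minimal sketch; the NA placed in x_na is only there for illustration:

```r
set.seed(10)
x <- matrix(rnorm(25), nrow = 5, ncol = 5)

# rowSums() and apply(x, 1, sum) agree on complete data
by_apply   <- apply(x, 1, sum)
by_rowsums <- rowSums(x)

# Illustration only: blank out one cell, then skip NAs while summing
x_na <- x
x_na[2, 3] <- NA
na_apply   <- apply(x_na, 1, sum, na.rm = TRUE)  # na.rm is passed on to sum()
na_rowsums <- rowSums(x_na, na.rm = TRUE)
```

Without na.rm = TRUE, any row containing an NA would have an NA total.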
Let's say you have the dataframe df, then you could try something like this:
# Assuming the columns a,b,c,d,e are at indices 1:5
df$sum = rowSums(df[ , c(1:5)], na.rm = T)
Or you could also try this (note that na.rm belongs inside rowSums, not transform):
transform(df, sum = rowSums(df, na.rm = TRUE))
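To address the N/A part of the question, here is a small sketch assuming the table arrives as text in which missing values are written literally as "N/A". The input in txt is hypothetical (three rows of the sample with one value replaced); read.table's na.strings argument turns those entries into NA so the columns stay numeric:

```r
# Hypothetical input: part of the question's table with one value set to "N/A"
txt <- "a b c d e
90 67 18 39 74
100 103 20 N/A 50
80 87 23 44 89"

# na.strings tells read.table which text marks a missing value
df <- read.table(text = txt, header = TRUE, na.strings = "N/A")

# Sum columns a..e per row, skipping missing values
df$sum <- rowSums(df[, c("a", "b", "c", "d", "e")], na.rm = TRUE)
df
```

With na.rm = TRUE a missing value contributes nothing to the total, so the second row sums the four remaining numbers.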
I have a matrix L of size n x k and a vector Z of size p. Z is composed of integers which represent the column indices of L. I want to create a matrix X of size n x p which is the aggregation of the corresponding columns of L selected based on the values in Z.
Z = c(1, 3, 1, 2)
L = matrix(c(73,50,4,14,87,5,34,51,17,57,47,65),nrow=4)
> L
[,1] [,2] [,3]
[1,] 73 87 17
[2,] 50 5 57
[3,] 4 34 47
[4,] 14 51 65
I want X to be
> X
[,1] [,2] [,3] [,4]
[1,] 73 17 73 87
[2,] 50 57 50 5
[3,] 4 47 4 34
[4,] 14 65 14 51
In my original data, p, k and n are quite big (30K, 500 and 2K, respectively), and a loop over all Z values to select and combine the columns from L takes a very long time. Can there be a vectorized way (no loops) to do this task?
Pretty sure this is just:
L[,Z]
# [,1] [,2] [,3] [,4]
#[1,] 73 17 73 87
#[2,] 50 57 50 5
#[3,] 4 47 4 34
#[4,] 14 65 14 51
R doesn't care if you have repeating column indexes when you do selections from most objects.
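One caveat worth sketching: if Z ever contains a single index, plain indexing returns a vector rather than a one-column matrix. Passing drop = FALSE keeps the matrix shape in all cases. The object names X, one_col and as_vec are only illustrative:

```r
Z <- c(1, 3, 1, 2)
L <- matrix(c(73, 50, 4, 14, 87, 5, 34, 51, 17, 57, 47, 65), nrow = 4)

# Repeated column indices are fine; drop = FALSE guarantees a matrix result
X <- L[, Z, drop = FALSE]

# With a single index the difference becomes visible
one_col <- L[, Z[1], drop = FALSE]  # 4 x 1 matrix
as_vec  <- L[, Z[1]]                # plain vector, dimensions dropped
```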
I am following the thread 2d matrix to 3d stacked array in r and have a clarification on the aperm function.
1) I get the first part of the solution, but did not understand the c(2,1,3) used in the function. Could you kindly clarify that?
2) Also I am trying a slight variation of the example in that thread.
My case is as follows:
For a similar matrix in example:
set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)
[,1] [,2] [,3] [,4] [,5]
[1,] 27 69 27 80 74
[2,] 38 39 39 11 70
[3,] 58 77 2 73 48
[4,] 91 50 39 42 87
[5,] 21 72 87 83 44
[6,] 90 100 35 65 25
[7,] 95 39 49 79 8
[8,] 67 78 60 56 10
[9,] 63 94 50 53 32
[10,] 7 22 19 79 52
[11,] 21 66 83 3 67
[12,] 18 13 67 48 41
I am trying to rearrange such that I have a 3 (row) X 5 (col) x 11 (third dim) array.
So, essentially the rows would overlap and show something like:
,,1
27 69 27 80 74
38 39 39 11 70
58 77 2 73 48
,,2
38 39 39 11 70
58 77 2 73 48
91 50 39 42 87
,,3
58 77 2 73 48
91 50 39 42 87
21 72 87 83 44
and so on until we hit ,,11
Would someone have any experience with this?
Thanks!
Just stumbled over this question. Though the answer comes a little late, here are two options for you.
First, you need to extend mat in such a way that its rows overlap. We can use this vector for row indexing.
#[1] 1 2 3 2 3 4 3 4 5 4 5 6 5 6 7 6 7 8 7 8 9 8 9 10 9 10 11 10 11 12
I used rollapply from the zoo package to create it as follows:
library(zoo)
row_nums <- c(t(rollapply(1:nrow(mat), width = 3, FUN = rep, 1)))
mat <- mat[row_nums, ]
dim(mat)
#[1] 30 5
Now use the matsplitter function that @Mr.Flick provided in this answer (please consider upvoting his answer) to get the desired output:
matsplitter(mat, 3, 5)
#, , 1
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 27 69 27 80 74
#[2,] 38 39 39 11 70
#[3,] 58 77 2 73 48
#
#, , 2
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 38 39 39 11 70
#[2,] 58 77 2 73 48
#[3,] 91 50 39 42 87
#
#, , 3
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 58 77 2 73 48
#[2,] 91 50 39 42 87
#[3,] 21 72 87 83 44
#
#, , 4
# ...
Note that you will end up with an array of dimension 3 x 5 x 10, not 11. For reference, matsplitter is defined as follows:
matsplitter <- function(M, r, c) {
  rg <- (row(M) - 1) %/% r + 1
  cg <- (col(M) - 1) %/% c + 1
  rci <- (rg - 1) * max(cg) + cg
  N <- prod(dim(M)) / r / c
  cv <- unlist(lapply(1:N, function(x)
    M[rci == x]))
  dim(cv) <- c(r, c, N)
  cv
}
Here is a solution using aperm as in the linked answer (assuming that mat was extended as above and is of dimension 30 x 5).
aperm(`dim<-`(t(mat), list(5, 3, 10)), c(2, 1, 3))
t(mat): transposes mat (new dimension: 5 x 30).
`dim<-`(t(mat), list(5, 3, 10)): changes the dimension of t(mat) from 5 x 30 to 5 x 3 x 10.
aperm(..., c(2, 1, 3)): permutes the dimensions of the array `dim<-`(t(mat), list(5, 3, 10)) from 5 x 3 x 10 to 3 x 5 x 10, i.e. the second dimension becomes the first, the first dimension becomes the second and the third dimension stays the same.
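The whole procedure can also be sketched in base R without zoo. This is an assumption-laden variant, not the method from the linked answer: the overlapping row indices are built with outer(), and the names width, n_win and arr are introduced here for illustration only.

```r
set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)

width <- 3                        # rows per window
n_win <- nrow(mat) - width + 1    # number of overlapping windows (10 here)

# Column k of the outer() result is k, k + 1, k + 2; c() flattens it to
# 1 2 3 2 3 4 3 4 5 ...
row_nums <- c(outer(seq_len(width) - 1, seq_len(n_win), "+"))

# Same reshaping idea as above: transpose, set dimensions, swap the first two
arr <- aperm(array(t(mat[row_nums, ]), dim = c(ncol(mat), width, n_win)),
             c(2, 1, 3))
dim(arr)
```

Each slice arr[, , k] should then equal mat[k:(k + 2), ].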
I have two large matrices P and Q (both around 10k x 50k, but a random 10 x 10 matrix for each is sufficient to test this yourself). I have a list of index pairs, e.g.
i j
1 4
1 625
1 9207
2 827
... ...
etc. This means that I need to find the dot product of column 1 in P and column 4 in Q, then column 1 in P and column 625 in Q and so on. I could easily solve this with a for loop but I know they are not very efficient in R. Anyone got any ideas?
edit: asked for a reproducible example
P <- matrix(c(1,0,1,0,0,1,0,1,0), nrow = 3, ncol = 3)
Q <- matrix(c(0,0,1,0,1,0,1,0,1), nrow = 3, ncol = 3)
i <- c(1,1,2)
j <- c(2,1,3)
gives output (if in dot product form)
1: 0
2: 1
3: 1
P <- matrix(1:50, nrow = 5,ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1,2,4,7)
j <- c(5,3,7,2)
P
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
Q
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
P[,i] * Q[, j]
# [,1] [,2] [,3] [,4]
# [1,] 21 66 496 186
# [2,] 44 84 544 224
# [3,] 69 104 594 264
# [4,] 96 126 646 306
# [5,] 125 150 700 350
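The element-wise product above still has to be summed down each column to obtain the dot products. A minimal sketch of that last step using colSums(); drop = FALSE is added here as a precaution so that a single index pair still yields a matrix, and the names dots, P2, Q2 and dots2 are illustrative:

```r
P <- matrix(1:50, nrow = 5, ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1, 2, 4, 7)
j <- c(5, 3, 7, 2)

# Column k of the product holds P[, i[k]] * Q[, j[k]]; summing each column
# gives the k-th dot product without forming a full cross-product matrix
dots <- colSums(P[, i, drop = FALSE] * Q[, j, drop = FALSE])
dots
# [1]  355  530 2980 1330

# The reproducible example from the question
P2 <- matrix(c(1, 0, 1, 0, 0, 1, 0, 1, 0), nrow = 3, ncol = 3)
Q2 <- matrix(c(0, 0, 1, 0, 1, 0, 1, 0, 1), nrow = 3, ncol = 3)
dots2 <- colSums(P2[, c(1, 1, 2), drop = FALSE] * Q2[, c(2, 1, 3), drop = FALSE])
dots2
# [1] 0 1 1
```

This only computes the requested pairs, which may matter when the index list is long.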
Using matrix multiplication, you can do
diag(t(P[, i]) %*% Q[, j])
[1] 0 1 1
Here is a second solution with apply.
apply(cbind(i, j), 1, function(x) t(P[, x[1]]) %*% Q[, x[2]])
[1] 0 1 1
To verify these agree in a second example:
set.seed(1234)
A <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
B <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
inds <- matrix(sample(10, 10, replace=TRUE), 5)
matrix multiplication
diag(t(A[, inds[,1]]) %*% B[, inds[,2]])
[1] 215 260 306 237 317
and apply
apply(inds, 1, function(x) t(A[, x[1]]) %*% B[, x[2]])
[1] 215 260 306 237 317
I have the following matrix
matrix(c(1228,39,2,158,100,649,1,107,1,0,54,9,73,12,4,137), nrow=4)
[,1] [,2] [,3] [,4]
[1,] 1228 100 1 73
[2,] 39 649 0 12
[3,] 2 1 54 4
[4,] 158 107 9 137
I would like to convert it into a contingency table with named "axes" and ordered column names (basically keeping the existing row and column indexing).
In other words, something like:
Variable 1
[,1] [,2] [,3] [,4]
[1,] 1228 100 1 73
var2[2,] 39 649 0 12
[3,] 2 1 54 4
[4,] 158 107 9 137
You can assign dimnames when creating the matrix:
m = matrix(c(1228,39,2,158,100,649,1,107,1,0,54,9,73,12,4,137), nrow=4)
matrix(m, nrow = NROW(m), dimnames=list(var1 = sequence(NROW(m)), var2 = sequence(NCOL(m))))
# var2
#var1 1 2 3 4
# 1 1228 100 1 73
# 2 39 649 0 12
# 3 2 1 54 4
# 4 158 107 9 137
In fact, you could use dimnames right at the start when creating m too
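A short sketch of that one-step version, passing dimnames directly to matrix(). The axis names var1 and var2 follow the example above:

```r
# Build the matrix and name both axes in a single call
m <- matrix(c(1228, 39, 2, 158, 100, 649, 1, 107, 1, 0, 54, 9, 73, 12, 4, 137),
            nrow = 4,
            dimnames = list(var1 = 1:4, var2 = 1:4))
m
names(dimnames(m))
```

The row and column labels are stored as character strings, so cells can also be addressed by label, e.g. m["2", "2"].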
We can use dimnames and names
dimnames(m1) <- list(NULL, NULL)
names(dimnames(m1)) <- c("Var2", "Variable 1")
m1
# Variable 1
#Var2 [,1] [,2] [,3] [,4]
# [1,] 1228 100 1 73
# [2,] 39 649 0 12
# [3,] 2 1 54 4
# [4,] 158 107 9 137
Or in one line
dimnames(m1) <- list(Var2 = NULL, `Variable 1` = NULL)
Or another way to write it
dimnames(m1) <- setNames(vector("list", 2), c("Var2", "Variable 1"))
data
m1 <- matrix(c(1228,39,2,158,100,649,1,107,1,0,54,9,73,12,4,137), nrow=4)
How could I build a function that extracts the diagonal block matrices of a larger one? The problem is as follows: the function takes a centred matrix as its argument, computes the full error covariance matrix and extracts the blocks on the leading diagonal. I tried the following, but it is not working.
err_cov <- function(x){
  m <- nrow(x)
  n <- ncol(x)
  #compute the full error covariance matrix as the inner product
  #of vec(x) and its transpose. Note that, omega is a mnxmn matrix
  vec <- as.vector(x)
  omega <- vec%*%t(vec)
  sigmas <- list()
  for(i in 0:n-1){
    #here the blocks have to be m nxn matrices along the
    #leading diagonal
    for (j in 1:m)
      sigmas[[j]] <- omega[(n*i+1):n*(i+1), (n*i+1):n*(i+1)]
  }
  return(sigmas)
}
So, for instance for
A
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
> B<-as.vector(A)
> B
[1] 1 2 3 4 5 6 7 8 9 10 11 12
> C<-B%*%t(B)
> C
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 5 6 7 8 9 10 11 12
[2,] 2 4 6 8 10 12 14 16 18 20 22 24
[3,] 3 6 9 12 15 18 21 24 27 30 33 36
[4,] 4 8 12 16 20 24 28 32 36 40 44 48
[5,] 5 10 15 20 25 30 35 40 45 50 55 60
[6,] 6 12 18 24 30 36 42 48 54 60 66 72
[7,] 7 14 21 28 35 42 49 56 63 70 77 84
[8,] 8 16 24 32 40 48 56 64 72 80 88 96
[9,] 9 18 27 36 45 54 63 72 81 90 99 108
[10,] 10 20 30 40 50 60 70 80 90 100 110 120
[11,] 11 22 33 44 55 66 77 88 99 110 121 132
[12,] 12 24 36 48 60 72 84 96 108 120 132 144
The function should return:
> C1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
> C2
[,1] [,2] [,3]
[1,] 16 20 24
[2,] 20 25 30
[3,] 24 30 36
> C3
[,1] [,2] [,3]
[1,] 49 56 63
[2,] 56 64 72
[3,] 63 72 81
> C4
[,1] [,2] [,3]
[1,] 100 110 120
[2,] 110 121 132
[3,] 120 132 144
Thanks for answering.
I think a clearer solution is to reset the dimensions and then let R do the index calculations for you:
err_cov <- function(x){
  m <- nrow(x)
  n <- ncol(x)
  #compute the full error covariance matrix as the inner product
  #of vec(x) and its transpose
  vec <- as.vector(x)
  omega <- tcrossprod(vec)
  dim(omega) <- c(n,m,n,m)
  sigmas <- list()
  for (j in 1:m)
    sigmas[[j]] <- omega[,j,,j]
  return(sigmas)
}
Here is an example:
> x
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> tcrossprod(vec)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 2 4 6 8 10 12
[3,] 3 6 9 12 15 18
[4,] 4 8 12 16 20 24
[5,] 5 10 15 20 25 30
[6,] 6 12 18 24 30 36
> err_cov(x)
[[1]]
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
[[2]]
[,1] [,2] [,3]
[1,] 16 20 24
[2,] 20 25 30
[3,] 24 30 36
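The function above returns m blocks of size n x n, which matches the comment in the question's code. The question's 3 x 4 example, however, shows n blocks of size m x m (one per column of the input). If that is the layout wanted, a variant might look like the sketch below; the function name diag_blocks_by_column is hypothetical and the only change is the order of the dimensions:

```r
# Hypothetical variant: one m x m block per column of x
diag_blocks_by_column <- function(x) {
  m <- nrow(x)
  n <- ncol(x)
  omega <- tcrossprod(as.vector(x))   # mn x mn outer product of vec(x)
  dim(omega) <- c(m, n, m, n)         # view as [row, col, row, col]
  # Block j pairs column j of x with itself
  lapply(seq_len(n), function(j) omega[, j, , j])
}

A <- matrix(1:12, nrow = 3)
blocks <- diag_blocks_by_column(A)
blocks[[2]]
```

For this A, the four blocks should correspond to C1 through C4 in the question. Separately, the loop in the question is affected by operator precedence: 0:n-1 is evaluated as (0:n)-1 and (n*i+1):n*(i+1) as ((n*i+1):n)*(i+1), so the ranges there would need explicit parentheses.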