I am following the thread "2d matrix to 3d stacked array in r" and have a question about the aperm function.
1) I get the first part of the solution, but did not understand the c(2,1,3) used in the function. Could you kindly clarify that?
2) Also I am trying a slight variation of the example in that thread.
My case is as follows, using a matrix similar to the one in that example:
set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)
[,1] [,2] [,3] [,4] [,5]
[1,] 27 69 27 80 74
[2,] 38 39 39 11 70
[3,] 58 77 2 73 48
[4,] 91 50 39 42 87
[5,] 21 72 87 83 44
[6,] 90 100 35 65 25
[7,] 95 39 49 79 8
[8,] 67 78 60 56 10
[9,] 63 94 50 53 32
[10,] 7 22 19 79 52
[11,] 21 66 83 3 67
[12,] 18 13 67 48 41
I am trying to rearrange such that I have a 3 (row) X 5 (col) x 11 (third dim) array.
So, essentially the rows would overlap and show something like:
,,1
27 69 27 80 74
38 39 39 11 70
58 77 2 73 48
,,2
38 39 39 11 70
58 77 2 73 48
91 50 39 42 87
,,3
58 77 2 73 48
91 50 39 42 87
21 72 87 83 44
and so on until we hit ,,11
Would someone have any experience with this?
Thanks!
Just stumbled over this question. Though the answer comes a little late, here are two options for you.
First, you need to extend mat in such a way that its rows overlap. We can use this vector for row indexing.
#[1] 1 2 3 2 3 4 3 4 5 4 5 6 5 6 7 6 7 8 7 8 9 8 9 10 9 10 11 10 11 12
I used rollapply from the zoo package to create it as follows:
library(zoo)
row_nums <- c(t(rollapply(1:nrow(mat), width = 3, FUN = rep, 1)))
mat <- mat[row_nums, ]
dim(mat)
#[1] 30 5
Now use the matsplitter function that @Mr.Flick provided in this answer (please consider upvoting his answer) to get the desired output:
matsplitter(mat, 3, 5)
#, , 1
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 27 69 27 80 74
#[2,] 38 39 39 11 70
#[3,] 58 77 2 73 48
#
#, , 2
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 38 39 39 11 70
#[2,] 58 77 2 73 48
#[3,] 91 50 39 42 87
#
#, , 3
#
# [,1] [,2] [,3] [,4] [,5]
#[1,] 58 77 2 73 48
#[2,] 91 50 39 42 87
#[3,] 21 72 87 83 44
#
#, , 4
# ...
Note that you will end up with an array of dimension 3 x 5 x 10, not 11.
matsplitter <- function(M, r, c) {
  rg <- (row(M) - 1) %/% r + 1
  cg <- (col(M) - 1) %/% c + 1
  rci <- (rg - 1) * max(cg) + cg
  N <- prod(dim(M)) / r / c
  cv <- unlist(lapply(1:N, function(x)
    M[rci == x]))
  dim(cv) <- c(r, c, N)
  cv
}
Here is a solution using aperm as in the linked answer (assuming that mat was extended as above and is of dimension 30 x 5).
aperm(`dim<-`(t(mat), list(5, 3, 10)), c(2, 1, 3))
t(mat): transposes mat (new dimension: 5 x 30)
`dim<-`(t(mat), list(5, 3, 10)): changes the dimension of t(mat) from 5 x 30 to 5 x 3 x 10
aperm(..., c(2, 1, 3)): permutes the dimensions of the array `dim<-`(t(mat), list(5, 3, 10)) from 5 x 3 x 10 to 3 x 5 x 10, i.e. the second dimension becomes the first, the first dimension becomes the second and the third dimension stays the same.
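The whole pipeline can also be sketched in base R without zoo. This is a minimal sketch rather than the answerer's code: the names w, n_win, ext and arr are introduced here for illustration, and the overlapping row index is built with outer() instead of rollapply().

```r
set.seed(1)
mat <- matrix(sample(100, 12 * 5, TRUE), ncol = 5)

w <- 3                        # window height (rows per slice)
n_win <- nrow(mat) - w + 1    # number of overlapping windows: 12 - 3 + 1 = 10

# Column k of the outer() result is k:(k + w - 1); c() flattens it column by
# column into 1 2 3 2 3 4 3 4 5 ...
row_nums <- c(outer(seq_len(w), seq_len(n_win) - 1, `+`))
ext <- mat[row_nums, ]        # extended matrix, (w * n_win) x 5

# Reshape the transposed matrix to 5 x w x n_win, then swap the first two
# dimensions so that each slice is w x 5.
arr <- aperm(`dim<-`(t(ext), c(ncol(mat), w, n_win)), c(2, 1, 3))
dim(arr)
```

With this construction, slice arr[, , k] holds rows k to k + 2 of the original mat.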
I have two large matrices P and Q (both around 10k x 50k, but a random 10x10 matrix for each is sufficient to test this yourself). I have a list of indices, e.g.
i j
1 4
1 625
1 9207
2 827
... ...
etc. This means that I need to find the dot product of column 1 in P and column 4 in Q, then column 1 in P and column 625 in Q and so on. I could easily solve this with a for loop but I know they are not very efficient in R. Anyone got any ideas?
edit: asked for a reproducible example
P <- matrix(c(1,0,1,0,0,1,0,1,0), nrow = 3, ncol = 3)
Q <- matrix(c(0,0,1,0,1,0,1,0,1), nrow = 3, ncol = 3)
i <- c(1,1,2)
j <- c(2,1,3)
gives output (if in dot product form)
1: 0
2: 1
3: 1
P <- matrix(1:50, nrow = 5,ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1,2,4,7)
j <- c(5,3,7,2)
P
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
Q
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
P[,i] * Q[, j]
# [,1] [,2] [,3] [,4]
# [1,] 21 66 496 186
# [2,] 44 84 544 224
# [3,] 69 104 594 264
# [4,] 96 126 646 306
# [5,] 125 150 700 350
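Each column of the element-wise product P[, i] * Q[, j] holds the terms of one dot product, so summing the columns gives the requested values. A minimal sketch on the same P, Q, i and j; the names dots and dots_loop are made up here, and drop = FALSE is added as a precaution so that a single index pair still yields a matrix.

```r
P <- matrix(1:50, nrow = 5, ncol = 10)
Q <- matrix(1:50, nrow = 5, ncol = 10)
i <- c(1, 2, 4, 7)
j <- c(5, 3, 7, 2)

# One dot product per (i, j) pair: sum each column of the element-wise product
dots <- colSums(P[, i, drop = FALSE] * Q[, j, drop = FALSE])
dots
# [1]  355  530 2980 1330

# Cross-check against an explicit loop over the index pairs
dots_loop <- vapply(seq_along(i),
                    function(k) sum(P[, i[k]] * Q[, j[k]]),
                    numeric(1))
stopifnot(all.equal(dots, dots_loop))
```

This only ever computes length(i) dot products, which matters when the index list is long.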
Using matrix multiplication, you can do
diag(t(P[, i]) %*% Q[, j])
[1] 0 1 1
Here is a second solution with apply.
apply(cbind(i, j), 1, function(x) t(P[, x[1]]) %*% Q[, x[2]])
[1] 0 1 1
To verify these agree in a second example:
set.seed(1234)
A <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
B <- matrix(sample(0:10, 100, replace=TRUE), 10, 10)
inds <- matrix(sample(10, 10, replace=TRUE), 5)
matrix multiplication
diag(t(A[, inds[,1]]) %*% B[, inds[,2]])
[1] 215 260 306 237 317
and apply
apply(inds, 1, function(x) t(A[, x[1]]) %*% B[, x[2]])
[1] 215 260 306 237 317
I'm trying to convert some code from MATLAB to R.
I'm having particular problems converting this part of a differential equation:
In MATLAB :
dA.*(A*N - N.*sum(A,2))
where dA is an integer, A is a 10x10 matrix and N is a 10x1 matrix (see example code below)
In R so far I've got this:
dA*(A*N - N*colSums(A))
but for some reason it doesn't seem to be giving the same result. Does anyone have any ideas as to what I've done wrong?
Example of the data I'm using below:
in MATLAB:
dA = 0.1;
N = 120000*ones(1,nN);
seq = [0 1 0 0 0 1 0];
seq2 = repmat(seq,1,20);
seq100 = seq2(1:100)
A = AA-diag(diag(AA));
in R:
dA <- 0.1
N <- c(120000, 120000, 120000, 120000, 120000, 120000, 120000, 120000, 120000, 120000)
num_zeros_int <- zeros(70, 1)
num_ones_int <- ones(30, 1)
seq <- c(0,1,0,0,0,1,0)
seq2<- rep(seq, times = 20)
seq100 <- seq2[0:100]
int_mat <- matrix(seq100, nests, nests)
Matlab expression:
dA.*(A*N - N.*sum(A,2))
where
dA: real number
A: 10 x 10 matrix
N: 10 x 1 matrix
A*N: matrix multiplication
sum(A,2): row sums of A (10 x 1 matrix)
N.*sum(A,2): element by element multiplication (10 x 1 matrix)
Let's set up the following example in R:
A = matrix(data = 1:100,nrow = 10)
N = matrix(data = 1:10)
dA = 0.1
> A
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 11 21 31 41 51 61 71 81 91
[2,] 2 12 22 32 42 52 62 72 82 92
[3,] 3 13 23 33 43 53 63 73 83 93
[4,] 4 14 24 34 44 54 64 74 84 94
[5,] 5 15 25 35 45 55 65 75 85 95
[6,] 6 16 26 36 46 56 66 76 86 96
[7,] 7 17 27 37 47 57 67 77 87 97
[8,] 8 18 28 38 48 58 68 78 88 98
[9,] 9 19 29 39 49 59 69 79 89 99
[10,] 10 20 30 40 50 60 70 80 90 100
> N
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
[5,] 5
[6,] 6
[7,] 7
[8,] 8
[9,] 9
[10,] 10
The first term is:
z1 = A %*% N
And the second term:
srow = rowSums(A)
z2 = srow * N
Which leads to the final result:
result = dA * (z1-z2)
Final equation
result = dA * (A %*% N - rowSums(A)*N)
This should give you the same answer as Matlab's dA.*(A*N - N.*sum(A,2))
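As a sanity check, the vectorised expression can be compared with an explicit computation row by row. This is only a sketch on the example data above; expected is a name introduced here. It also illustrates the two differences from the attempt in the question: %*% rather than * for the matrix product, and rowSums() rather than colSums() for MATLAB's sum(A,2).

```r
A  <- matrix(1:100, nrow = 10)
N  <- matrix(1:10)
dA <- 0.1

result <- dA * (A %*% N - rowSums(A) * N)

# Row r of the result: dA * (sum_k A[r, k] * N[k]  -  N[r] * sum_k A[r, k])
expected <- vapply(seq_len(nrow(A)), function(r) {
  dA * (sum(A[r, ] * N[, 1]) - N[r, 1] * sum(A[r, ]))
}, numeric(1))

stopifnot(all.equal(as.vector(result), expected))
dim(result)
# [1] 10  1
```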
I have a matrix (V), which looks like this
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[V1,] 37 15 30 3 4 11 35 31
[V2,] 44 31 45 30 24 39 1 18
[V3,] 39 49 7 36 14 43 26 24
[V4,] 45 31 26 33 12 47 37 15
[V5,] 23 27 34 29 30 34 17 4
[V6,] 9 46 39 34 8 43 42 37
I have another matrix (X)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[X1,] 37 15 21 3 4 11 35 31
[X2,] 37 37 45 30 24 39 1 18
[X3,] 39 49 7 36 14 43 26 24
[X4,] 45 31 26 37 12 47 37 15
[X5,] 23 27 34 29 30 37 17 4
[X6,] 9 46 39 34 8 37 42 37
Now each row of matrix V should be matched against each row of matrix X to get a count matrix like
[,V1] [,V2] [,V3] [,V4] [,V5] [,V6]
[X1,] 7
[X2,]
where each entry is the number of values the two rows have in common (for example, X1 and V1 share 7 values).
How do I do this in R? Any suggestions would be appreciated.
Here is one quick 'brute-force' way with apply
row.names(V) <- paste0("V",seq(6))
row.names(X) <- paste0("X",seq(6))
apply(V, 1, function(i){
apply(X, 1, function(j){
length(intersect(i, j))
}
)
})
V1 V2 V3 V4 V5 V6
X1 7 1 0 3 1 1
X2 2 6 2 2 1 2
X3 0 2 8 1 0 2
X4 3 2 1 7 0 1
X5 3 1 0 1 7 2
X6 1 1 1 1 1 7
Use == to compare the elements of the two matrices position by position. This will give you a matrix of logicals (TRUEs and FALSEs). You can then add up the number of TRUEs in each row using apply(). Note that this only compares row i of V with row i of X, at matching positions.
apply(V==X, 1, sum)
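To make the two answers above reproducible, here is a sketch that rebuilds V and X from the tables in the question (entered row by row) and runs both approaches. The object names common and same_pos are introduced here for illustration. intersect() counts distinct shared values regardless of position, whereas V == X counts matches at the same position in the same row.

```r
V <- matrix(c(37, 15, 30,  3,  4, 11, 35, 31,
              44, 31, 45, 30, 24, 39,  1, 18,
              39, 49,  7, 36, 14, 43, 26, 24,
              45, 31, 26, 33, 12, 47, 37, 15,
              23, 27, 34, 29, 30, 34, 17,  4,
               9, 46, 39, 34,  8, 43, 42, 37),
            nrow = 6, byrow = TRUE,
            dimnames = list(paste0("V", 1:6), NULL))
X <- matrix(c(37, 15, 21,  3,  4, 11, 35, 31,
              37, 37, 45, 30, 24, 39,  1, 18,
              39, 49,  7, 36, 14, 43, 26, 24,
              45, 31, 26, 37, 12, 47, 37, 15,
              23, 27, 34, 29, 30, 37, 17,  4,
               9, 46, 39, 34,  8, 37, 42, 37),
            nrow = 6, byrow = TRUE,
            dimnames = list(paste0("X", 1:6), NULL))

# All pairs of rows: number of distinct values shared by row Xi and row Vj
common <- apply(V, 1, function(v) apply(X, 1, function(x) length(intersect(v, x))))
common["X1", "V1"]   # 7

# Same-numbered rows only, and only where the positions agree
same_pos <- rowSums(V == X)
```

For this particular data the diagonal of common and same_pos happen to agree (7, 6, 8, 7, 7, 7), but in general they answer different questions.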
Let me rephrase my question: how can I sum the numbers by row and append the sum after the last column as a new column, as in the second table (sum = a + b + c + d + e)?
I would also like to know: if some of the values are NA, can I still treat them as numbers?
Sample input:
a b c d e
1 90 67 18 39 74
2 100 103 20 45 50
3 80 87 23 44 89
4 95 57 48 79 90
5 74 81 61 95 131
Desired output:
a b c d e sum
1 90 67 18 39 74 288
2 100 103 20 45 50 318
3 80 87 23 44 89 323
4 95 57 48 79 90 369
5 74 81 61 95 131 442
To add a row sum, you can use addmargins
M <- matrix(c(90,67,18,39,74), nrow=1)
addmargins(M, 2) #2 = row margin
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 90 67 18 39 74 288
If you have missing data, you'll need to change the margin function to something that will properly handle the NA values
M<-matrix(c(90,67,18,NA,74), nrow=1)
addmargins(M, 2, FUN=function(...) sum(..., na.rm=TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 90 67 18 NA 74 249
Consider using apply(). For example:
set.seed(10) # optional, but this command will replicate data as shown
# create some data
x <- matrix(rnorm(25), nrow = 5, ncol = 5) # 5x5 matrix of random numbers
x
[,1] [,2] [,3] [,4] [,5]
[1,] 0.01874617 0.3897943 1.1017795 0.08934727 -0.5963106
[2,] -0.18425254 -1.2080762 0.7557815 -0.95494386 -2.1852868
[3,] -1.37133055 -0.3636760 -0.2382336 -0.19515038 -0.6748659
[4,] -0.59916772 -1.6266727 0.9874447 0.92552126 -2.1190612
[5,] 0.29454513 -0.2564784 0.7413901 0.48297852 -1.2651980
x.sum <-apply(x,1,sum) # sum the rows. Note: apply(x,2,sum) sums cols
x.sum
[1] 1.003356605 -3.776777904 -2.843256446 -2.431935624 -0.002762636
# attach new column (x.sum) to matrix x
x.sum.1 <-cbind(x,x.sum)
x.sum.1
x.sum
[1,] 0.01874617 0.3897943 1.1017795 0.08934727 -0.5963106 1.003356605
[2,] -0.18425254 -1.2080762 0.7557815 -0.95494386 -2.1852868 -3.776777904
[3,] -1.37133055 -0.3636760 -0.2382336 -0.19515038 -0.6748659 -2.843256446
[4,] -0.59916772 -1.6266727 0.9874447 0.92552126 -2.1190612 -2.431935624
[5,] 0.29454513 -0.2564784 0.7413901 0.48297852 -1.2651980 -0.002762636
Let's say you have the dataframe df, then you could try something like this:
# Assuming the columns a,b,c,d,e are at indices 1:5
df$sum <- rowSums(df[, 1:5], na.rm = TRUE)
Or you could also try this:
transform(df, sum = rowSums(df, na.rm = TRUE))
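For a self-contained check, the sample input from the question can be rebuilt as a data frame (called df, as above) and summed with rowSums(). The second part is a hypothetical variation, with one value set to NA in a copy named df_na, to show the effect of na.rm = TRUE.

```r
df <- data.frame(
  a = c(90, 100, 80, 95, 74),
  b = c(67, 103, 87, 57, 81),
  c = c(18, 20, 23, 48, 61),
  d = c(39, 45, 44, 79, 95),
  e = c(74, 50, 89, 90, 131)
)

# Append the row sums as a new last column
df$sum <- rowSums(df[, c("a", "b", "c", "d", "e")], na.rm = TRUE)
df   # the sum column holds 288, 318, 323, 369, 442

# Hypothetical variation: a missing value in the first row
df_na <- df[, c("a", "b", "c", "d", "e")]
df_na$d[1] <- NA
rowSums(df_na)                 # first sum is NA
rowSums(df_na, na.rm = TRUE)   # first sum is 90 + 67 + 18 + 74 = 249
```

Selecting the columns by name keeps the result stable if the sum column is added more than once or the column order changes.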
How could I Replace a NA with mean of its previous and next rows in a fast manner?
name grade
1 A 56
2 B NA
3 C 70
4 D 96
such that B's grade would be 63.
Or you may try na.approx from package zoo: "Missing values (NAs) are replaced by linear interpolation"
library(zoo)
x <- c(56, NA, 70, 96)
na.approx(x)
# [1] 56 63 70 96
This also works if you have more than one consecutive NA:
vals <- c(1, NA, NA, 7, NA, 10)
na.approx(vals)
# [1] 1.0 3.0 5.0 7.0 8.5 10.0
na.approx is based on the base function approx, which may be used instead:
vals <- c(1, NA, NA, 7, NA, 10)
xout <- seq_along(vals)
x <- xout[!is.na(vals)]
y <- vals[!is.na(vals)]
approx(x = x, y = y, xout = xout)$y
# [1] 1.0 3.0 5.0 7.0 8.5 10.0
Assume you have a data.frame df like this:
> df
name grade
1 A 56
2 B NA
3 C 70
4 D 96
5 E NA
6 F 95
Then you can use the following:
> ind <- which(is.na(df$grade))
> df$grade[ind] <- sapply(ind, function(i) with(df, mean(c(grade[i-1], grade[i+1]))))
> df
name grade
1 A 56
2 B 63
3 C 70
4 D 96
5 E 95.5
6 F 95
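The sapply() approach assumes each NA has non-missing neighbours; for example, an NA in the last row has no next value and would stay NA. The following is a hedged sketch of a vectorised variant (the helper name fill_na_neighbour_mean is made up here) that falls back to the single available neighbour at either end. If both neighbours are missing the result is NaN, so for runs of consecutive NAs an interpolation approach is the better fit.

```r
fill_na_neighbour_mean <- function(v) {
  ind <- which(is.na(v))
  if (length(ind) == 0) return(v)
  prev <- c(NA, v)[ind]       # v[i - 1], or NA when i is the first position
  nxt  <- c(v, NA)[ind + 1]   # v[i + 1], or NA when i is the last position
  # Mean of the two neighbours, ignoring a neighbour that is itself missing
  v[ind] <- rowMeans(cbind(prev, nxt), na.rm = TRUE)
  v
}

grade <- c(56, NA, 70, 96, NA, 95)
fill_na_neighbour_mean(grade)
# [1] 56.0 63.0 70.0 96.0 95.5 95.0
```

Like the sapply() version, all replacements are computed from the original values before any of them is written back.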
An alternative solution, using the median instead of mean, is represented by the na.roughfix function of the randomForest package.
As described in the documentation, it works with a data frame or numeric matrix.
Specifically, for numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains no NAs, it is returned unaltered.
Using the same examples as @Henrik,
library(randomForest)
x <- c(56, NA, 70, 96)
na.roughfix(x)
#[1] 56 70 70 96
or with a larger matrix:
y <- matrix(1:50, nrow = 10)
y[sample(1:length(y), 4, replace = FALSE)] <- NA
y
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 11 21 31 41
# [2,] 2 12 22 32 42
# [3,] 3 NA 23 33 NA
# [4,] 4 14 24 34 44
# [5,] 5 15 25 35 45
# [6,] 6 16 NA 36 46
# [7,] 7 17 27 37 47
# [8,] 8 18 28 38 48
# [9,] 9 19 29 39 49
# [10,] 10 20 NA 40 50
na.roughfix(y)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 11 21.0 31 41
# [2,] 2 12 22.0 32 42
# [3,] 3 16 23.0 33 46
# [4,] 4 14 24.0 34 44
# [5,] 5 15 25.0 35 45
# [6,] 6 16 24.5 36 46
# [7,] 7 17 27.0 37 47
# [8,] 8 18 28.0 38 48
# [9,] 9 19 29.0 39 49
#[10,] 10 20 24.5 40 50
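For numeric matrices, the column-median rule quoted above is simple enough to sketch in base R, which can help to see what is being computed. This is an illustrative stand-in, not the randomForest implementation: the function name impute_col_median is hypothetical, and the factor case is ignored.

```r
impute_col_median <- function(m) {
  for (k in seq_len(ncol(m))) {
    is_na <- is.na(m[, k])
    if (any(is_na)) {
      # Replace the missing entries of column k by that column's median
      m[is_na, k] <- median(m[, k], na.rm = TRUE)
    }
  }
  m
}

y <- matrix(1:50, nrow = 10)
# Same NA positions as in the example above: (3,2), (3,5), (6,3), (10,3)
y[cbind(c(3, 3, 6, 10), c(2, 5, 3, 3))] <- NA
filled <- impute_col_median(y)
filled[c(3, 6, 10), ]   # the rows that contained NAs
```

Keep in mind that this fills every NA in a column with the same value, so unlike the interpolation answers it does not use the neighbouring rows.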