Diagonal of a 3D array - r

How can one extract the "diagonal" from a three-dimensional array in R? For a matrix (2D array) one can use the diag(...) function. Similarly, given an N x N x M array, a natural operation is to convert it into an N x M matrix by taking the diagonal of each N x N slice and collecting the diagonals as the columns of a matrix.
It's easy to do this using a loop, but that is not idiomatic R and is slow. Another possibility is to use slightly complex indexing (see my own answer to this question) but it is a bit hard to read. What other alternatives are there? Is there a standard R way to do this?

Create an array and fill it with some values:
> a=array(0,c(10,10,5))
> for (i in 1:10) for (j in 1:10) for (k in 1:5) a[i,j,k]=100*i+10*j+k-111
Run the apply function:
> apply(a,3,diag)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 2 3 4
[2,] 110 111 112 113 114
[3,] 220 221 222 223 224
[4,] 330 331 332 333 334
[5,] 440 441 442 443 444
[6,] 550 551 552 553 554
[7,] 660 661 662 663 664
[8,] 770 771 772 773 774
[9,] 880 881 882 883 884
[10,] 990 991 992 993 994
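The triple loop used to fill the test array can be avoided as well. A minimal sketch, assuming the same formula 100*i + 10*j + k - 111 as above; the names ij, a2 and d are only for illustration:

```r
# Build the 10 x 10 x 5 test array without loops.
# outer() applied to a matrix and a vector yields a 3D array.
ij <- outer(100 * (1:10), 10 * (1:10), "+")   # 100*i + 10*j for every (i, j)
a2 <- outer(ij, 1:5, "+") - 111               # add k, then subtract 111

d <- apply(a2, 3, diag)   # 10 x 5 matrix: one column per slice
d[2, 3]                   # 112, matching row 2, column 3 of the output above
```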

Various diagonals:
A = array(1:12, c(2, 2, 3))
apply(A, 1, diag)
# [,1] [,2]
#[1,] 1 2
#[2,] 7 8
apply(A, 2, diag)
# [,1] [,2]
#[1,] 1 3
#[2,] 6 8
apply(A, 3, diag)
# [,1] [,2] [,3]
#[1,] 1 5 9
#[2,] 4 8 12

Although I'm not enamored of the term "3d.diagonal" for this result, it can be achieved with this simple function (the result matches that of diag.3d up to a transpose):
arr <- array(1:27,c(3,3,3) )
apply(arr, 3, function(x) x[row(x)==col(x)] )
# returns the same values as t(diag.3d(arr))
[,1] [,2] [,3]
[1,] 1 10 19
[2,] 5 14 23
[3,] 9 18 27
I think a "real diagonal" would be arr[ cbind(1:3,1:3,1:3) ]
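To make the "real diagonal" concrete, here is a small sketch using the same 3 x 3 x 3 array; the variable name idx is only illustrative:

```r
arr <- array(1:27, c(3, 3, 3))

# Each row of the index matrix is one (i, j, k) position,
# so this picks arr[1,1,1], arr[2,2,2] and arr[3,3,3].
idx <- cbind(1:3, 1:3, 1:3)
arr[idx]
# [1]  1 14 27
```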

One possible approach is to use indexing, where the indices are a matrix with three columns. For example:
diag.3d <- function(A) {
  # Expect an N x N x M array
  stopifnot(length(dim(A)) == 3)
  n <- nrow(A)
  stopifnot(n == ncol(A))
  m <- dim(A)[3]
  # One (row, column, slice) triple per diagonal element
  IXS <- cbind(1:n, 1:n, rep(1:m, each = n))
  cn <- colnames(A)
  rn <- dimnames(A)[[3]]
  # The result is M x N: one row per slice
  matrix(A[IXS], ncol = n, byrow = TRUE, dimnames = list(rn, cn))
}
The index matrix (variable IXS) is somewhat hard to read, though.
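If the index matrix feels opaque, one possible alternative is a logical mask built with base R's slice.index(). This is only a sketch: the function name diag_slices is made up for illustration, and it returns an N x M matrix (one column per slice), like apply(A, 3, diag).

```r
diag_slices <- function(A) {
  d <- dim(A)
  stopifnot(length(d) == 3, d[1] == d[2])
  # slice.index(A, 1) holds each element's row index,
  # slice.index(A, 2) its column index
  on_diag <- slice.index(A, 1) == slice.index(A, 2)
  # Selected elements come out in column-major order:
  # the diagonal of slice 1, then of slice 2, and so on
  matrix(A[on_diag], nrow = d[1])
}

A <- array(1:27, c(3, 3, 3))
diag_slices(A)
```

The mask is as large as the array itself, so this trades some memory for readability.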

Another approach is subsetting the 3-dimensional array with a 2-dimensional index matrix:
a <- array(1:100,dim = c(5,5,4))
ref <- cbind(1:5,1:5,rep(1:4,each= 5))
a[ref]
The output is a vector rather than a matrix. On my computer this is faster than apply(), and the same index matrix can also be used to assign to (fill) the diagonal values.
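A short sketch of both points, reshaping the extracted vector into a matrix and assigning to the diagonals. The variable d and the fill value 0 are arbitrary choices for illustration:

```r
a <- array(1:100, dim = c(5, 5, 4))
ref <- cbind(1:5, 1:5, rep(1:4, each = 5))

# Reshape the extracted vector into a 5 x 4 matrix (one column per slice)
d <- matrix(a[ref], nrow = 5)

# The same index matrix works on the left-hand side of an assignment
a[ref] <- 0L
```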

Related

How to sort every column in ascending order in R

M = matrix(data = c(6,34,923,5,0, 112:116, 5,9,34,76,2, 545:549), nrow = 4)
I want every column to be sorted in ascending order.
I'm assuming that you meant nrow = 5, because that corresponds to the matrix you've shown. Then all you need to do is use apply() to sort each column:
M = matrix(data = c(6,34,923,5,0, 112:116, 5,9,34,76,2, 545:549), nrow = 5)
apply(M, 2, sort)
[,1] [,2] [,3] [,4]
[1,] 0 112 2 545
[2,] 5 113 5 546
[3,] 6 114 9 547
[4,] 34 115 34 548
[5,] 923 116 76 549
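Extra arguments given to apply() after the function are passed on to sort(), which covers two common variations. A sketch, assuming the same matrix; the NA is inserted only to illustrate that sort() drops missing values by default, which would leave the columns with different lengths:

```r
M <- matrix(c(6, 34, 923, 5, 0, 112:116, 5, 9, 34, 76, 2, 545:549), nrow = 5)

# Descending order per column
desc <- apply(M, 2, sort, decreasing = TRUE)

# With missing values, na.last = TRUE keeps every column at full length,
# so apply() still returns a matrix
M[1, 1] <- NA
with_na <- apply(M, 2, sort, na.last = TRUE)
```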

Is there a way to calculate the number of peaks above a threshold for multiple dependent variables in R?

I apologize if this question has been asked already. I'm a beginner to R and do not have an advanced stats background. I am trying to determine the number of peaks (maximums) for my data in R. For those familiar with GraphPad Prism, essentially I am trying to find the "Number of Peaks" by doing an Area Under the Curve analysis and using a threshold of y=2. My dataframe is below (called example1).
time a b c d e f
0 248 413 473 370 501 235
3 243 408 468 356 496 237
6 243 406 476 354 503 235
9 243 412 469 352 500 232
12 245 411 467 355 504 235
15 241 420 483 348 499 227
18 239 416 477 344 493 230
21 236 413 478 338 490 227
24 234 406 475 337 481 229
My x variable would be the first column and my y variable would be the rest of the columns (very large dataset- 50+ dependent variables). For each dependent variable or column, I am trying to find the number of peaks (local maxima). I need to make my y threshold = 2.
I have already plotted my data (code below) using ggplot by melting my dataframe.
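library(reshape2)  # for melt(); assuming reshape2 is the package in use
library(ggplot2)
# Melt data
melted <- melt(example1, id.vars = "time")
# Create line graph
ggplot(data = melted, aes(x = time, y = value, group = variable)) +
  geom_line(show.legend = TRUE)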
How can I calculate and output the number of peaks per column (with the y=2 threshold)? Will I need to do an area under the curve analysis? I'm hoping to calculate something like this (number of peaks per column)...
a = 0, b = 3, c = 0, d = 6, e = 1, f = 0
but the output could be something like 0, 3, 0, 6, 1, 0
I do not need to produce another graph. I just need an output of number of peaks per dependent variable.
There is a findpeaks() function available through the pracma package that is exceptionally useful for this type of thing; see its documentation for details. You can specify the threshold or go with the default settings. There are also some parameters to help ignore or include peaks that span multiple points.
You feed findpeaks() the time-series vector (meaning make sure that it is ordered by your x axis first), and it will output a matrix where the number of rows corresponds to the number of peaks, and for each peak you get maxima (y value), index, beginning index, and end index. See the utilization below with your example1 dataset:
library(pracma)  # provides findpeaks()
peak_info <- lapply(example1[, 2:7], findpeaks, threshold = 2)
> peak_info
$a
[,1] [,2] [,3] [,4]
[1,] 245 5 4 9
$b
[,1] [,2] [,3] [,4]
[1,] 420 6 5 9
$c
[,1] [,2] [,3] [,4]
[1,] 476 3 2 5
[2,] 483 6 5 7
$d
[,1] [,2] [,3] [,4]
[1,] 355 5 4 9
$e
[,1] [,2] [,3] [,4]
[1,] 503 3 2 4
[2,] 504 5 4 9
$f
[,1] [,2] [,3] [,4]
[1,] 237 2 1 4
[2,] 235 5 4 6
[3,] 230 7 6 8
If you just want to know the number of peaks, you can run the following:
> unlist(lapply(peak_info, nrow))
a b c d e f
1 1 2 1 2 3
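One caveat worth guarding against: if a column has no peaks, the corresponding list entry may be empty rather than a matrix, in which case nrow() gives NULL and unlist() silently drops that column. The sketch below assumes findpeaks() returns NULL in that situation (check the pracma documentation to confirm). It uses a hand-made stand-in list, peak_info_demo, so that it runs without the package; count_peaks is a hypothetical helper.

```r
# Stand-in for lapply(..., findpeaks): one matrix per column and,
# by assumption, NULL where no peak was found
peak_info_demo <- list(
  a = matrix(c(245, 5, 4, 9), nrow = 1),
  b = NULL,
  c = matrix(c(476, 483, 3, 6, 2, 5, 5, 7), nrow = 2)
)

# Report 0 instead of dropping columns without peaks
count_peaks <- function(p) if (is.null(p)) 0L else nrow(p)
counts <- vapply(peak_info_demo, count_peaks, integer(1))
counts
# a b c
# 1 0 2
```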
If it is local maxima or minima that you are looking for, you may want to have a look at this post.
If that is what you are looking for, you only have to apply it to each column:
df <- data.frame(var1 = c(1, 2, 3, 2, 1, 1, 2, 1),
                 var2 = c(1, 2, 3, 2, 1, 1, 1, 1),
                 var3 = c(1, 2, 3, 2, 1, 5, 1, 2))
res_list <- apply(df, 2, function(x) {
  return(which(diff(sign(diff(x))) == -2) + 1)
})
res_list :
$var1
[1] 3 7
$var2
[1] 3
$var3
[1] 3 6
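It doesn't work for extrema on the edge of your domain, nor for flat-topped peaks (plateaus), where the sign change is spread over two steps and never equals -2.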
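If only the number of local maxima per column is needed, the same sign-change test can be wrapped in a counter. A minimal sketch with a hypothetical helper count_maxima; it does not apply any height threshold:

```r
df <- data.frame(var1 = c(1, 2, 3, 2, 1, 1, 2, 1),
                 var2 = c(1, 2, 3, 2, 1, 1, 1, 1),
                 var3 = c(1, 2, 3, 2, 1, 5, 1, 2))

# A strict interior maximum is where the slope sign flips from +1 to -1
count_maxima <- function(x) sum(diff(sign(diff(x))) == -2)

sapply(df, count_maxima)
# var1 var2 var3
#    2    1    2
```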

Convert bigger dimension matrix to smaller dimension matrix with a loop

I currently have a 185*185 matrix and the goal is to convert it into a 35*35 matrix by summing the values over groups of rows and columns of the 185*185 matrix.
Example:
I have a 8*8 matrix as below:
matrix_x <- matrix(1:64, nrow = 8)
Then I want to convert it into a 4*4 matrix:
matrix_y <- matrix(NA, nrow = 4, ncol = 4)
The list below is created for aggregating the 8*8 matrix cols to a 4*4 matrix
col_list <- list(
1,
2:3,
c(4,8),
5:7
)
What I've done to achieve this is by assigning the value manually as below
matrix_y[1,1] <- sum(matrix_x[col_list[[1]],col_list[[1]]])
matrix_y[1,2] <- sum(matrix_x[col_list[[1]],col_list[[2]]])
matrix_y[1,3] <- sum(matrix_x[col_list[[1]],col_list[[3]]])
matrix_y[1,4] <- sum(matrix_x[col_list[[1]],col_list[[4]]])
matrix_y[2,1] <- sum(matrix_x[col_list[[2]],col_list[[1]]])
matrix_y[2,2] <- sum(matrix_x[col_list[[2]],col_list[[2]]])
matrix_y[2,3] <- sum(matrix_x[col_list[[2]],col_list[[3]]])
matrix_y[2,4] <- sum(matrix_x[col_list[[2]],col_list[[4]]])
matrix_y[3,1] <- sum(matrix_x[col_list[[3]],col_list[[1]]])
matrix_y[3,2] <- sum(matrix_x[col_list[[3]],col_list[[2]]])
matrix_y[3,3] <- sum(matrix_x[col_list[[3]],col_list[[3]]])
matrix_y[3,4] <- sum(matrix_x[col_list[[3]],col_list[[4]]])
matrix_y[4,1] <- sum(matrix_x[col_list[[4]],col_list[[1]]])
matrix_y[4,2] <- sum(matrix_x[col_list[[4]],col_list[[2]]])
matrix_y[4,3] <- sum(matrix_x[col_list[[4]],col_list[[3]]])
matrix_y[4,4] <- sum(matrix_x[col_list[[4]],col_list[[4]]])
This approach works well, but I'm looking for a more efficient way to achieve this, since my approach takes so many lines of code.
There should be a neater/easier way to do this, but here is one straightforward option:
n <- 4
t(sapply(seq_len(n), function(p) sapply(col_list, function(q) sum(matrix_x[p, q]))))
# [,1] [,2] [,3] [,4]
#[1,] 1 26 82 123
#[2,] 2 28 84 126
#[3,] 3 30 86 129
#[4,] 4 32 88 132
This gives the same matrix as matrix_y in the original version of the post, where only the columns were aggregated.
For the updated question, we can use outer
apply_fun <- function(x, y) sum(matrix_x[x, y])
outer(col_list, col_list, Vectorize(apply_fun))
# [,1] [,2] [,3] [,4]
#[1,] 1 26 82 123
#[2,] 5 58 170 255
#[3,] 12 72 184 276
#[4,] 18 108 276 414
Or, following the same approach as in the original answer, with nested sapply:
t(sapply(col_list, function(p) sapply(col_list, function(q) sum(matrix_x[p, q]))))
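For larger cases such as the 185*185 to 35*35 reduction, another option is to express the grouping as a 0/1 membership matrix and use matrix multiplication. This is a sketch of a different technique from the answers above; the name G is illustrative, and it assumes the same index groups are used for rows and columns:

```r
matrix_x <- matrix(1:64, nrow = 8)
col_list <- list(1, 2:3, c(4, 8), 5:7)

# G[i, g] is 1 when original index i belongs to group g, else 0
G <- sapply(col_list, function(idx) as.numeric(seq_len(nrow(matrix_x)) %in% idx))

# Summing rows and columns by group is then two matrix products
matrix_y <- t(G) %*% matrix_x %*% G
matrix_y
```

This should reproduce the matrix returned by the outer() call above.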

How to do exponential calculation with matrix?

I want to raise each column of a matrix to a power taken from a vector. The matrix is as below
ID var_0 var_01 var_02 var_03
1 1 2 3 4
2 5 6 7 8
3 9 10 11 12
...
and vector is (0.1,0.2,0.3,0.4)
I want to get the result as below
ID var_0 var_01 var_02 var_03
1 1^0.1 2^0.2 3^0.3 4^0.4
2 5^0.1 6^0.2 7^0.3 8^0.4
3 9^0.1 10^0.2 11^0.3 12^0.4
...
That is, for each ID I want (ith variable)^(ith element of the vector).
You can use R's recycling of vectors. Transpose your matrix so that the power calculations are applied in the correct order and then transpose back.
(m <- matrix(1:12, nrow=3, ncol=4, byrow=TRUE))
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 5 6 7 8
# [3,] 9 10 11 12
p <- 1:4
t(t(m)^p)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
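The reason for the double transpose is that R stores matrices column by column, so a plain m^p recycles p down the columns instead of across them. A small sketch of the difference; the names wrong and right are only for illustration:

```r
m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
p <- 1:4

# m is stored as 1, 5, 9, 2, 6, ... so p is recycled down the columns
# and the exponents do not line up with the columns
wrong <- m^p
wrong[2, 1]   # 25, i.e. 5^2, although column 1 should use exponent 1

# Each column of t(m) is one row of m and has length(p) entries,
# so after transposing, p lines up with the original columns
right <- t(t(m)^p)
right[2, 1]   # 5
```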
Or you could do (data from @user20650's post)
m^p[col(m)]
# [,1] [,2] [,3] [,4]
#[1,] 1 4 27 256
#[2,] 5 36 343 4096
#[3,] 9 100 1331 20736
Or maybe (using @user20650's data set)
m^rep(p, each = nrow(m))
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Another option
m ^ matrix(p, nrow(m), ncol(m), byrow = TRUE)
# [,1] [,2] [,3] [,4]
# [1,] 1 4 27 256
# [2,] 5 36 343 4096
# [3,] 9 100 1331 20736
Some benchmarks on a bigger data set. It seems that my two answers and @akrun's scale the best:
n <- 1e6
cols <- 100
m <- matrix(seq_len(n), nrow = n, ncol = cols)
p <- seq_len(cols)
user20650 = function() {t(t(m)^p)}
Nick = function() {sweep(m, 2, p, `^`)}
akrun = function() {m^p[col(m)]}
David1 = function() {m^rep(p, each = nrow(m))}
David2 = function() {m ^ matrix(p, nrow(m), ncol(m), byrow = TRUE)}
library(microbenchmark)
Res <- microbenchmark(
user20650() ,
Nick(),
akrun(),
David1(),
David2()
)
Res
# Unit: seconds
# expr min lq median uq max neval
# user20650() 9.692392 9.800470 9.878385 10.010198 11.002012 100
# Nick() 10.487660 10.595750 10.687573 10.896852 14.083319 100
# akrun() 8.213784 8.316646 8.395962 8.529671 9.325273 100
# David1() 9.115449 9.219430 9.304380 9.425614 10.445129 100
# David2() 8.157632 8.275277 8.335884 8.437017 9.348252 100
boxplot(Res)
You can do this using the sweep function. The signature is
sweep(x, MARGIN, STATS, FUN)
sweep() expands STATS along the dimension(s) given by MARGIN into an array with the same shape as x, and then calls FUN once with x and the expanded STATS as its two arguments. FUN should therefore be a vectorized function taking 2 arguments.
Setting MARGIN to 1 means STATS lines up with the rows of x (dimension 1), 2 means STATS lines up with the columns of x (dimension 2). Other variations are also possible.
So for your particular example, use
sweep(your.matrix, 2, your.exponents, `^`)
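A runnable version of that call, as a sketch using the matrix and exponents from the question; the names m, p and res are placeholders:

```r
m <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
p <- c(0.1, 0.2, 0.3, 0.4)   # the exponents from the question

# MARGIN = 2 lines p up with the columns of m
res <- sweep(m, 2, p, `^`)
res[1, 2]                    # same value as 2^0.2

# Agrees with the transpose-based approach
all.equal(res, t(t(m)^p))    # TRUE
```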
Edit: Based on @david-arenburg's answer, you probably shouldn't use sweep. I had no idea it was so slow!

do.call in r with matrix and lists

In R, is there a better way to generate a new matrix (N*6) from an older one (N*3) than the code below, one that does not require "unpacking/unlisting" the inner lists created in the apply function in order to "expand" the source matrix?
transformed <- matrix(byrow=T)
transformed <- as.matrix(
do.call("rbind", as.list(
apply(dataset, 1, function(x) {
x <- list(x[1], x[2], x[3], x[2]*x[3], x[2]^2, x[3]^2)
})
))
)
#Unpack all inner lists from the expanded matrix
ret_trans <- as.matrix( apply(transformed, 2, function(x) unlist(x)) )
EDIT: I add an example of that
dataset
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 2 7 12
[3,] 3 8 13
[4,] 4 9 14
[5,] 5 10 15
After applying the code above I want it expanded to N*6, here 5*6 (sorry, I originally got the column dimension and the margin of the apply function wrong). It should look like this:
transformed
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 6 11 66 36 121
[2,] 2 7 12 84 49 144
[3,] 3 8 13 104 64 169
[4,] 4 9 14 126 81 196
[5,] 5 10 15 150 100 225
The question is if there is another way of doing that without having to use the last apply function, without having to coerce the x to be a list
thanks all for your replies
As suggested in the comments, do:
cbind(dataset, dataset[,2] * dataset[,3], dataset[,c(2, 3)]^2)
It will be a lot faster than using apply, which should have looked like this (note the t(), since apply over rows returns each result as a column):
transformed <- function(x) c(x[1], x[2], x[3], x[2]*x[3], x[2]^2, x[3]^2)
t(apply(dataset, 1, transformed))
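A quick check that the two approaches agree, as a sketch using the 5 x 3 example data from the question. The names fast, slow and expand_row are only illustrative, and the apply() result is transposed because apply() over rows returns one column per row:

```r
dataset <- matrix(1:15, nrow = 5)

# Vectorised: bind the original columns, the product, and the two squares
fast <- cbind(dataset, dataset[, 2] * dataset[, 3], dataset[, c(2, 3)]^2)

# Row-wise version for comparison; apply() returns 6 x N, hence t()
expand_row <- function(x) c(x[1], x[2], x[3], x[2] * x[3], x[2]^2, x[3]^2)
slow <- t(apply(dataset, 1, expand_row))

all(fast == slow)   # TRUE
fast[1, ]           # 1 6 11 66 36 121
```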

Resources