I need to create an accumulation index across columns in a matrix - r

I need to create an accumulation index across columns in my data. I set up the problem as follows
#accumulation function
mat1 <- matrix(nrow=16, ncol =4)
mat1[1,] <- c(1,1,1,1)
mat1[2:16,] <- 1+rnorm(60,0,0.1)
[,1] [,2] [,3] [,4]
[1,] 1.0000000 1.0000000 1.0000000 1.0000000
[2,] 0.9120755 0.9345682 0.8533162 0.8737582
[3,] 0.7838427 0.9691806 0.8216284 0.9863669
[4,] 0.9095204 1.1906031 1.0253083 1.0700338
[5,] 1.0202524 0.9974672 1.1348315 1.1115018
[6,] 0.9456184 1.1250529 1.0348011 0.9323336
[7,] 1.0053195 0.9917475 1.0178855 1.0880626
[8,] 0.9550709 0.9107060 0.8876688 0.9060996
[9,] 1.0728177 1.0559643 0.9161789 0.9711522
[10,] 0.9579642 1.0082560 0.9833227 0.9306639
[11,] 1.0044883 1.1323498 1.0388025 0.8926033
[12,] 0.8777846 0.9940302 0.8314166 0.8479962
[13,] 1.1042297 0.9767410 0.9355374 0.8859680
[14,] 1.1245737 0.8291948 1.0491585 0.9887672
[15,] 0.9687700 0.9915095 0.8962534 1.0220163
[16,] 0.9432597 1.0310273 0.9288159 1.0838243
The desired output is the running product of the entries in each column, up to and including each row.
Therefore:
mat2 <- matrix(nrow=16, ncol=4)
mat2[1,] <- c(1,1,1,1)
mat2[2,] <- mat1[1,]*mat1[2,]
mat2[3,] <-mat1[1,]*mat1[2,]*mat1[3,]
mat2[4,] <-mat1[1,]*mat1[2,]*mat1[3,]*mat1[4,]
and so on up to row 16. The idea is to accumulate (take the product of) all entries in mat1 up to a particular row number: row 1 of mat2 equals row 1 of mat1, row 2 of mat2 equals row 1 of mat1 * row 2 of mat1, row 3 of mat2 equals row 1 * row 2 * row 3 of mat1, and so on up to row 16.
I need to write a function that does this calculation for every matrix in a list, where all matrices have the same size.

What you need is the cumulative product down each column, which base R's cumprod function gives you when applied column by column:
apply(mat1, 2, cumprod)
# [,1] [,2] [,3] [,4]
# [1,] 1.0000 1.0000 1.0000 1.0000
# [2,] 0.8793 0.9890 1.1102 0.9031
# [3,] 0.9037 0.9384 1.0574 0.8031
# [4,] 1.0017 0.8529 0.9824 0.7026
# [5,] 0.7667 0.7815 0.9332 0.6658
# [6,] 0.7996 0.9703 0.7811 0.6327
# [7,] 0.8401 0.9833 0.6899 0.5184
# [8,] 0.7918 0.9351 0.5395 0.4883
# [9,] 0.7485 0.8939 0.4672 0.4341
#[10,] 0.7063 0.9350 0.4534 0.3901
#[11,] 0.6434 0.8701 0.4323 0.3837
#[12,] 0.6127 0.7441 0.4950 0.4053
#[13,] 0.5515 0.7869 0.4421 0.4721
#[14,] 0.5087 0.7063 0.4043 0.4356
#[15,] 0.5120 0.7052 0.3929 0.5056
#[16,] 0.5611 0.6392 0.3538 0.4470
data
set.seed(1234)
mat1 <- matrix(nrow=16, ncol =4)
mat1[1,] <- c(1,1,1,1)
mat1[2:16,] <- 1+rnorm(60,0,0.1)
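The question also asks for a function that works on a list of equally sized matrices. A minimal sketch, assuming the column-wise cumprod approach above is what is wanted; the names cumprod_list, make_mat, mat_list and acc_list are hypothetical and only used for illustration:

```r
# Hypothetical helper: column-wise running product for every matrix in a list
cumprod_list <- function(mats) {
  lapply(mats, function(m) apply(m, 2, cumprod))
}

# Build three matrices the same way mat1 is built in the question
set.seed(1234)
make_mat <- function() {
  m <- matrix(nrow = 16, ncol = 4)
  m[1, ] <- 1
  m[2:16, ] <- 1 + rnorm(60, 0, 0.1)
  m
}
mat_list <- list(make_mat(), make_mat(), make_mat())

# One accumulated 16 x 4 matrix per input matrix
acc_list <- cumprod_list(mat_list)
```

Each element of acc_list has the same dimensions as its input, and row k holds the product of rows 1 to k of the corresponding input matrix.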

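We can make use of matrixStats, which is efficient for large matrices. Note that the column-wise accumulation asked for in the question is colCumprods(mat1); rowCumprods(mat1), whose output is printed below, instead accumulates across the columns within each row.
library(matrixStats)
colCumprods(mat1) # requested result, same values as apply(mat1, 2, cumprod)
rowCumprods(mat1) # row-wise accumulation, shown for comparison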
# [,1] [,2] [,3] [,4]
# [1,] 1.0000000 1.0000000 1.0000000 1.0000000
# [2,] 0.8792934 0.8695961 0.9654515 0.8719461
# [3,] 1.0277429 0.9752243 0.9288433 0.8259908
# [4,] 1.1084441 1.0074432 0.9359711 0.8187889
# [5,] 0.7654302 0.7013506 0.6661948 0.6312977
# [6,] 1.0429125 1.2948629 1.0839177 1.0300632
# [7,] 1.0506056 1.0646930 0.9403774 0.7705423
# [8,] 0.9425260 0.8962776 0.7008855 0.6600887
# [9,] 0.9453368 0.9036902 0.7825060 0.6957347
#[10,] 0.9435548 0.9869196 0.9578751 0.8606545
#[11,] 0.9109962 0.8477986 0.8082998 0.7951804
#[12,] 0.9522807 0.8143710 0.9324137 0.9849138
#[13,] 0.9001614 0.9518986 0.8501747 0.9902680
#[14,] 0.9223746 0.8279552 0.7571348 0.6985816
#[15,] 1.0064459 1.0049223 0.9767219 1.1335746
#[16,] 1.0959494 0.9933742 0.8945990 0.7910216
data
set.seed(1234)
mat1 <- matrix(nrow=16, ncol =4)
mat1[1,] <- c(1,1,1,1)
mat1[2:16,] <- 1+rnorm(60,0,0.1)

Related

Dot Product In purrr

How would I calculate the dot product in purrr? As a reprex, here is a simple example.
Data generation
library(tibble)   # as_tibble
library(magrittr) # %>%
#fake data
X <- as_tibble(list(a = rnorm(10,0,1),
b = rnorm(10,10,1),
c = rnorm(10,100,1)))
z <- c(1,0,1)
#make tibble matrix
X_matrix <- X %>% as.matrix()
X_matrix
a b c
[1,] 0.01182775 9.032966 100.95322
[2,] 0.85718250 10.015310 102.30181
[3,] -0.06742915 10.535482 100.21764
[4,] -0.18236798 9.052234 99.37345
[5,] -0.32151084 10.329401 98.81186
[6,] 2.94303948 9.994800 99.93874
[7,] 0.03299169 9.079023 99.73501
[8,] 0.06518171 8.841637 99.91130
[9,] -0.71944580 10.281631 100.32533
[10,] 1.49983359 10.776108 99.35903
Calculate dot product
The dot product for each row is a*z[1] + b*z[2] + c*z[3]
X_matrix %*% z
[,1]
[1,] 100.96505
[2,] 103.15900
[3,] 100.15021
[4,] 99.19108
[5,] 98.49035
[6,] 102.88178
[7,] 99.76800
[8,] 99.97648
[9,] 99.60588
[10,] 100.85886
Ideally, I would like to add the dot product as a column to X
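No answer is included for this question here. One possible sketch in base R, assuming a plain data.frame stands in for the tibble and that the new column name dot is acceptable (both are assumptions, not part of the original post):

```r
# Fake data with the same shape as in the question (data.frame instead of tibble)
set.seed(1)
X <- data.frame(a = rnorm(10, 0, 1),
                b = rnorm(10, 10, 1),
                c = rnorm(10, 100, 1))
z <- c(1, 0, 1)

# Matrix product gives a 10 x 1 matrix; as.vector flattens it into a column
X$dot <- as.vector(as.matrix(X[, c("a", "b", "c")]) %*% z)
```

Selecting the columns by name before the multiplication keeps the result stable if the line is run a second time, because the new dot column is then not included in the product.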

R - Dividing columns of matrix list by vector list

I have a list of matrices and a list of vectors, and I want to divide the columns of each matrix with the corresponding vector element.
For example, given
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)),c(rnorm(6),NA)), cbind(runif(7), runif(7)))
divisors <- list(c(0.5,2), c(3,4))
I'm looking for a vectorized function that produces output that looks the same as
for(i in 1:length(data)){
for(j in 1:ncol(data[[i]])){data[[i]][,j] <- data[[i]][,j] / divisors[[i]][j]}
}
i.e.
[[1]]
[,1] [,2]
[1,] NA 0.28265752
[2,] -0.46967014 -0.07132588
[3,] 0.20253439 -0.37432527
[4,] 0.65736410 0.06630705
[5,] 0.72349294 0.67202129
[6,] 0.88532648 -0.80892508
[7,] 0.08162027 NA
[[2]]
[,1] [,2]
[1,] 0.26597435 0.18120979
[2,] 0.31213250 0.16493883
[3,] 0.19250804 0.14104145
[4,] 0.21196882 0.10172964
[5,] 0.10389773 0.04979742
[6,] 0.02754329 0.15064043
[7,] 0.25771766 0.23042586
The closest I have been able to come is
Map(`/`, data, divisors)
But that divides rows (rather than columns) of the matrix by the vector. Any help appreciated.
Transpose your matrices before and after:
lapply(Map(`/`, lapply(data, t), divisors), t)
# [[1]]
# [,1] [,2]
# [1,] NA 0.28265752
# [2,] -0.46967014 -0.07132588
# [3,] 0.20253439 -0.37432527
# [4,] 0.65736410 0.06630705
# [5,] 0.72349294 0.67202129
# [6,] 0.88532648 -0.80892508
# [7,] 0.08162027 NA
#
# [[2]]
# [,1] [,2]
# [1,] 0.26597435 0.18120979
# [2,] 0.31213250 0.16493883
# [3,] 0.19250804 0.14104145
# [4,] 0.21196882 0.10172964
# [5,] 0.10389773 0.04979742
# [6,] 0.02754329 0.15064043
# [7,] 0.25771766 0.23042586
I prefer the transpose approach above, but another option is to expand your divisor vectors into matrices of the same dimensions as in data:
div_mat = Map(matrix, data = divisors, nrow = sapply(data, nrow), ncol = 2, byrow = TRUE)
Map("/", data, div_mat)
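A further option, not taken from the answers above, is base R's sweep, which applies a vector along a chosen margin of a matrix. A minimal sketch using the data from the question; the name result is hypothetical:

```r
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)), c(rnorm(6), NA)), cbind(runif(7), runif(7)))
divisors <- list(c(0.5, 2), c(3, 4))

# MARGIN = 2 means column j of each matrix is divided by element j of its divisor vector
result <- Map(function(m, d) sweep(m, 2, d, "/"), data, divisors)
```

This avoids the double transpose and does not require the number of columns to be hard-coded.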

Matrix into another matrix with specified dimensions

I have a matrix with 2 columns, and I'd like to turn it into a matrix with specified dimensions.
> t <- matrix(rnorm(20), ncol=2, nrow=10)
[,1] [,2]
[1,] 1.4938530 1.2493088
[2,] -0.8079445 1.8715868
[3,] 0.5775695 -0.9277420
[4,] 0.4415969 2.6357908
[5,] 0.3209226 -1.1306049
[6,] 0.5109251 -0.8661100
[7,] 1.9495571 0.2092941
[8,] 0.7816373 1.1517466
[9,] 0.0300595 -0.1351532
[10,] 0.7550894 0.7778869
What I'd like to do is something like:
> tt <- matrix(t, ncol=4, nrow=5)
[,1] [,2] [,3] [,4]
[1,] 1.4938530 1.2493088 -0.8079445 1.8715868
[2,] 0.5775695 -0.9277420 0.4415969 2.6357908
[3,] etc.
I tried to do things with modulo but my head hurts too much for me to try even one more minute.
You can transpose your first matrix, so that data is stored in the order you want, and then fill the second matrix by row:
tt <- matrix(t(t), ncol=4, nrow=5, byrow = TRUE)
t
# [,1] [,2]
# [1,] -1.4162465950 0.01532476
# [2,] -0.2366332875 -0.04024386
# [3,] 0.5146631983 -0.34720239
# [4,] 1.9243922633 -0.24016160
# [5,] 1.6161165230 0.63187438
# [6,] -0.3558181508 -0.73199138
# [7,] 0.7459405376 0.01934826
# [8,] -1.0428581093 -2.04422042
# [9,] 0.0003166344 0.98973993
#[10,] 0.6390745275 -0.65584930
tt
# [,1] [,2] [,3] [,4]
# [1,] -1.4162465950 0.01532476 -0.2366333 -0.04024386
# [2,] 0.5146631983 -0.34720239 1.9243923 -0.24016160
# [3,] 1.6161165230 0.63187438 -0.3558182 -0.73199138
# [4,] 0.7459405376 0.01934826 -1.0428581 -2.04422042
# [5,] 0.0003166344 0.98973993 0.6390745 -0.65584930
When you work with a matrix in R, you can think of it as a vector whose data is stored column by column. Extracting data by row is therefore not as straightforward as extracting it by column, which is how the data is actually stored. After transposing the first matrix, the data is stored in the order you want to read it, so filling the second matrix by row becomes straightforward.
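The storage order described above can be seen on a small integer matrix. This is only an illustrative sketch; the names m and wide are made up for the example:

```r
m <- matrix(1:8, ncol = 2)   # rows are (1,5), (2,6), (3,7), (4,8)

# Column-major storage: the first column comes first, then the second
stored <- as.vector(m)       # 1 2 3 4 5 6 7 8

# After transposing, the storage order walks through the original rows
by_row <- as.vector(t(m))    # 1 5 2 6 3 7 4 8

# Filling by row then places two original rows side by side in each new row
wide <- matrix(t(m), nrow = 2, ncol = 4, byrow = TRUE)
# row 1: 1 5 2 6
# row 2: 3 7 4 8
```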

R Correlation significance matrix

I have a large correlation matrix (something like 50*50).
I calculated the matrix using cor(mydata) function.
Now I would like to have a matching significance matrix of the same size.
Using cor.test() I can get one significance level at a time, but is there an easy way to get all 1200?
The function cor_pmat from the ggcorrplot package gives you the p-values of correlations.
library(ggcorrplot)
set.seed(123)
xmat <- matrix(rnorm(50), ncol = 5)
cor_pmat(xmat)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.00000000 0.08034470 0.24441138 0.03293644 0.3234899
[2,] 0.08034470 0.00000000 0.08716815 0.44828479 0.4824117
[3,] 0.24441138 0.08716815 0.00000000 0.20634394 0.9504582
[4,] 0.03293644 0.44828479 0.20634394 0.00000000 0.8378530
[5,] 0.32348990 0.48241166 0.95045815 0.83785303 0.0000000
I think this should do what you want: we use expand.grid in conjunction with the apply function.
Since you didn't provide your data, I created my own set.
set.seed(123)
xmat <- matrix(rnorm(50), ncol = 5)
matrix(apply(expand.grid(1:ncol(xmat), 1:ncol(xmat)),
1,
function(x) cor.test(xmat[,x[1]], xmat[,x[2]])$`p.value`),
ncol = ncol(xmat), byrow = T)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.00000000 0.08034470 0.24441138 3.293644e-02 0.3234899
[2,] 0.08034470 0.00000000 0.08716815 4.482848e-01 0.4824117
[3,] 0.24441138 0.08716815 0.00000000 2.063439e-01 0.9504582
[4,] 0.03293644 0.44828479 0.20634394 1.063504e-62 0.8378530
[5,] 0.32348990 0.48241166 0.95045815 8.378530e-01 0.0000000
Note that if you don't need a square matrix and are comfortable with a long format (one row per pair of columns), we could use combn, which evaluates each pair only once and is therefore more efficient.
cbind(t(combn(1:ncol(xmat), 2)),
combn(1:ncol(xmat), 2, function(x) cor.test(xmat[,x[1]], xmat[,x[2]])$`p.value`)
)
[,1] [,2] [,3]
[1,] 1 2 0.08034470
[2,] 1 3 0.24441138
[3,] 1 4 0.03293644
[4,] 1 5 0.32348990
[5,] 2 3 0.08716815
[6,] 2 4 0.44828479
[7,] 2 5 0.48241166
[8,] 3 4 0.20634394
[9,] 3 5 0.95045815
[10,] 4 5 0.83785303
Alternatively, we can perform the same operation, but use the pipe operator %>% to make it a bit more concise:
library(magrittr)
combn(1:ncol(xmat), 2) %>%
apply(., 2, function(x) cor.test(xmat[,x[1]], xmat[,x[2]])$`p.value`) %>%
cbind(t(combn(1:ncol(xmat), 2)), .)
Here is one solution. combn lists the pairs in the order (1,2), (1,3), ..., which matches the column-major order of the lower triangle (not of the upper one), so fill the lower triangle and then mirror it:
data <- swiss
#cor(data)
n <- ncol(data)
p.value.vec <- apply(combn(1:ncol(data), 2), 2, function(x)cor.test(data[,x[1]], data[,x[2]])$p.value)
p.value.matrix = matrix(0, n, n)
p.value.matrix[lower.tri(p.value.matrix, diag=FALSE)] = p.value.vec
p.value.matrix = p.value.matrix + t(p.value.matrix) # mirror into the upper triangle
p.value.matrix
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.000000e+00 1.491720e-02 9.450437e-07 3.658617e-07 1.028523e-03 3.585238e-03
[2,] 1.491720e-02 0.000000e+00 9.951515e-08 1.304590e-06 5.204434e-03 6.844724e-01
[3,] 9.450437e-07 9.951515e-08 0.000000e+00 4.811397e-08 2.588308e-05 4.453814e-01
[4,] 3.658617e-07 1.304590e-06 4.811397e-08 0.000000e+00 3.018078e-01 5.065456e-01
[5,] 1.028523e-03 5.204434e-03 2.588308e-05 3.018078e-01 0.000000e+00 2.380297e-01
[6,] 3.585238e-03 6.844724e-01 4.453814e-01 5.065456e-01 2.380297e-01 0.000000e+00
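When filling a matrix from a vector of pairwise results, the index order is easy to get wrong. A sketch that sidesteps it by writing each p-value to both [i, j] and [j, i] and keeping the column names; cor_pvalues and p_swiss are hypothetical names, the input is assumed to be a data frame of numeric columns, and the diagonal is left at 0 as in the outputs above:

```r
# Hypothetical helper: symmetric matrix of cor.test p-values for a data frame
cor_pvalues <- function(df) {
  n <- ncol(df)
  p <- matrix(0, n, n, dimnames = list(colnames(df), colnames(df)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Each pair is tested once and stored in both triangles
      p[i, j] <- p[j, i] <- cor.test(df[[i]], df[[j]])$p.value
    }
  }
  p
}

p_swiss <- cor_pvalues(swiss)
```

Because the result carries row and column names, individual values can be looked up as p_swiss["Fertility", "Education"].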

Subset out all elements with the same name of list

Data
I have a list of lists that looks something like this:
sublist1 <- list(power=as.matrix(rnorm(10)),x=rnorm(10),y=rnorm(10))
sublist2 <- list(power=as.matrix(rnorm(10)),x=rnorm(10),y=rnorm(10))
sublist3 <- list(power=as.matrix(rnorm(10)),x=rnorm(10),y=rnorm(10))
mylist = list(sublist1,sublist2,sublist3)
My goal would be to pull out only the matrices named power
I've tried
mylist_power =mylist[sapply(mylist, '[', 'Power')]
But that's not working.
Brownie point alert!!!
How can I find the mean of the newly created list of matrices named power?
mylist_power <- sapply(mylist, '[', 'power')
and some means:
sapply(mylist_power, mean) # one per matrix
sapply(mylist_power, colMeans) # for each column and each matrix
sapply(mylist_power, rowMeans) # for each row and each matrix
mean(unlist(mylist_power)) # for the whole list
Reduce(`+`, mylist_power) / length(mylist_power) # element-wise
A purrr solution, which can also be reproduced with base R's Map:
library(purrr)    # map, map_dbl
library(magrittr) # %>%
#part 1 (to return only $power of every list item)
map(mylist, ~.x$power)
[[1]]
[,1]
[1,] 0.33281918
[2,] -1.12404046
[3,] -0.70613078
[4,] -0.72754386
[5,] -1.83431439
[6,] -0.40768794
[7,] 0.02686119
[8,] 0.91162864
[9,] 1.63434648
[10,] 0.06068561
[[2]]
[,1]
[1,] -0.02256943
[2,] -0.90315486
[3,] 0.90777295
[4,] 1.16194290
[5,] -0.45795340
[6,] 0.92795667
[7,] -2.10293514
[8,] -1.67716711
[9,] 1.76565577
[10,] 0.79444742
[[3]]
[,1]
[1,] -0.36200564
[2,] -1.13955016
[3,] -0.81537133
[4,] 1.31024563
[5,] -0.25836094
[6,] 0.60626489
[7,] 0.31344822
[8,] 0.05360308
[9,] 1.12825379
[10,] -0.55813346
#part 2 (column means of every $power matrix)
map(mylist, ~.x$power %>% colMeans)
[[1]]
[1] -0.1833376
[[2]]
[1] 0.03939958
[[3]]
[1] 0.02783941
To get these values in a vector instead
map_dbl(mylist, ~.x$power %>% colMeans)
[1] -0.18333763 0.03939958 0.02783941
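For comparison, a base R sketch of the same two steps using lapply and vapply. The seed, the helper make_sublist and the names power_list and power_means are assumptions introduced for this example only:

```r
set.seed(42)
# Hypothetical helper that builds one sublist shaped like those in the question
make_sublist <- function() {
  list(power = as.matrix(rnorm(10)), x = rnorm(10), y = rnorm(10))
}
mylist <- list(make_sublist(), make_sublist(), make_sublist())

# part 1: `[[` extracts the element named "power" from every sublist
power_list <- lapply(mylist, `[[`, "power")

# part 2: one mean per matrix, returned as a plain numeric vector
power_means <- vapply(power_list, mean, numeric(1))
```

vapply checks that every result is a single number, which plays the same role here as map_dbl does in the purrr version.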

Resources