I wonder how to compute the pairwise Lepage statistic between the columns of data like this:
> cbind(v1=rnorm(10), v2=rnorm(10), v3=rnorm(10), v4=rnorm(10))
v1 v2 v3 v4
[1,] -2.47148729 0.61727115 1.28285770 0.72974010
[2,] 0.42657513 0.77615280 1.88207246 0.41295301
[3,] -0.32480814 -1.75461602 -0.16589154 -0.52731722
[4,] 0.02760296 -2.08827618 -0.47176830 -0.17416765
[5,] -0.52760532 -0.20514629 0.15589594 -0.54623986
[6,] -0.47143259 -0.56666084 -1.35046101 -0.92754741
[7,] 0.61071291 -1.65132215 1.61024187 0.83128254
[8,] -0.17746888 -1.09887111 -0.32012303 0.69382341
[9,] -0.38707069 -0.69628506 0.04597653 0.13479181
[10,] 0.52030680 1.11764587 -1.10243994 -0.83949756
I'm thinking of having something like:
v1.v1 v1.v2 v1.v3 v1.v4 ... v4.v4
[1,] 0 1 2 5 ... 0
Like what cor(x) does when x is a matrix. I guess dplyr might be an answer? Or is there a multi-sample version of pLepage()?
Consider using base R's sapply. I'm not familiar with the Lepage test in R, but using correlation and your example data:
rdmatrix <- cbind(v1=rnorm(10), v2=rnorm(10), v3=rnorm(10), v4=rnorm(10))
corrmatrix <- sapply(1:ncol(rdmatrix),
function(x,y) cor(rdmatrix[,x], rdmatrix[,y]), 1:ncol(rdmatrix))
# [,1] [,2] [,3] [,4]
# [1,] 1.0000000 -0.4613219 -0.5661391 -0.1703655
# [2,] -0.4613219 1.0000000 0.1965278 0.2111900
# [3,] -0.5661391 0.1965278 1.0000000 -0.3305471
# [4,] -0.1703655 0.2111900 -0.3305471 1.0000000
To flatten it into a one-row matrix, use outer() to build every pairing of column names and do.call(cbind, ...) to flatten:
# MATRIX OF ALL COLS PAIRINGS
cols <- outer(colnames(rdmatrix), colnames(rdmatrix),
function(y,x) paste0(x, '.', y)) # NOTICE INVERSION OF X AND Y
# FLATTEN COL NAMES
cols <- do.call(cbind, as.list(cols))
# FLATTEN CORR MATRIX DATA
finalmatrix <- do.call(cbind, as.list(corrmatrix))
# NAME MATRIX COLUMNS
colnames(finalmatrix) <- cols[1,]
# v1.v1 v1.v2 v1.v3 v1.v4
# [1,] 1 -0.4613219 -0.5661391 -0.1703655
# v2.v1 v2.v2 v2.v3 v2.v4
# [1,] -0.4613219 1 0.1965278 0.21119
# v3.v1 v3.v2 v3.v3 v3.v4
# [1,] -0.5661391 0.1965278 1 -0.3305471
# v4.v1 v4.v2 v4.v3 v4.v4
# [1,] -0.1703655 0.21119 -0.3305471 1
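The same pattern extends to any two-sample statistic, which is what the original question asks for. Below is a minimal sketch: pairwise_stat() and w_stat() are hypothetical helper names introduced here for illustration, and the Wilcoxon rank-sum statistic from base R stands in for the Lepage statistic, since the return value of pLepage() is not shown in this thread.

```r
# Hypothetical helper: apply a two-sample statistic to every pair of columns.
# stat_fun must take two numeric vectors and return a single number.
pairwise_stat <- function(m, stat_fun) {
  idx <- seq_len(ncol(m))
  # Vectorize() lets outer() call stat_fun once per (i, j) index pair
  res <- outer(idx, idx, Vectorize(function(i, j) stat_fun(m[, i], m[, j])))
  dimnames(res) <- list(colnames(m), colnames(m))
  res
}

set.seed(1)
rdmatrix <- cbind(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10), v4 = rnorm(10))

# Stand-in statistic (an assumption, not the Lepage statistic)
w_stat <- function(x, y) unname(wilcox.test(x, y, exact = FALSE)$statistic)

res <- pairwise_stat(rdmatrix, w_stat)
res
```

Swapping in another test should only require changing stat_fun so that it extracts the single number of interest from that test's result.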
I have a list of 100 matrices, each 3 x 51, in R, and would like to divide the first row of each matrix by the lagged (n=2) sum of all rows (the column sums) of that same matrix. I know how to do this for a list of vectors with the following code:
Example_List2 <- lapply(Example_List1,function(x){x/lag(x,n=2)})
My attempt for the list of 3-row matrices is below. Ultimately I would like the result to become the replacement first row of a new DB, and to repeat the process for each row with different lags. My attempted code is
List2 <- List1
lapply(List2, `[`,1,) <- lapply(List1,function(x){lapply(x, `[`,1,)/lag(colSums(x),n=2)})
lapply(List2, `[`,2,) <- lapply(List1,function(x){lapply(x, `[`,2,)/lag(colSums(x),n=3)})
lapply(List2, `[`,3,) <- lapply(List1,function(x){lapply(x, `[`,3,)/lag(colSums(x),n=4)})
We may use
library(data.table)  # provides shift()
List2 <- lapply(List1, \(x) x/do.call(rbind, shift(colSums(x), n = 2:4)))
For the second case
List3 <- lapply(List1, \(x) {
n1 <- 2:4
x1 <- colSums(x)
x2 <- x
for(i in seq_along(n1)) {
x2[i,] <- shift(x[i,], n = n1[i], type = "lead")}
colSums(x2)/x1
})
data
set.seed(24)
List1 <- replicate(100, matrix(rnorm(3*51), nrow = 3), simplify = FALSE)
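If data.table is not available, the lag can be written in base R. A minimal sketch, assuming the lag should pad with NA as shift() does by default. lag_vec() is a hypothetical helper, and a small 3 x 5 matrix replaces the 3 x 51 data so the result is easy to check by hand:

```r
# Hypothetical helper: move a vector n places to the right, padding with NA
lag_vec <- function(v, n) c(rep(NA_real_, n), head(v, -n))

# Small stand-in data: three identical 3 x 5 matrices
List1 <- replicate(3, matrix(as.numeric(1:15), 3, 5), simplify = FALSE)

List2 <- lapply(List1, function(x) {
  cs <- colSums(x)
  # Rows 1, 2 and 3 are divided by the column sums lagged by 2, 3 and 4
  x / rbind(lag_vec(cs, 2), lag_vec(cs, 3), lag_vec(cs, 4))
})

List2[[1]]
```

The first n entries of each row come out as NA because no lagged column sum exists for them.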
Try this.
lapply(List1, \(x) {
u <- 1:2
x[1, -u] <- x[1, -u]/colSums(x[, -u])
# x[1, u] <- NA_real_ ## uncomment if you want NA for x[1, 1:2]
x
})
# [[1]]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 4 0.2916667 0.3030303 0.3095238
# [2,] 2 5 8.0000000 11.0000000 14.0000000
# [3,] 3 6 9.0000000 12.0000000 15.0000000
#
# [[2]]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 4 0.2916667 0.3030303 0.3095238
# [2,] 2 5 8.0000000 11.0000000 14.0000000
# [3,] 3 6 9.0000000 12.0000000 15.0000000
#
# [[3]]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 4 0.2916667 0.3030303 0.3095238
# [2,] 2 5 8.0000000 11.0000000 14.0000000
# [3,] 3 6 9.0000000 12.0000000 15.0000000
Data:
List1 <- replicate(3, matrix(1:15, 3, 5), simplify=FALSE)
Suppose I have a list like this in R:
> L
[[1]]
[1] 0.6876619 0.7847888 0.6377801 0.2078056 0.8981001
[[2]]
[1] 0.9358160 0.8905056 0.7715877 0.8648426 0.4915060
[[3]]
[1] 0.88095630 0.08010288 0.15140700 0.35400865 0.60317717
[[4]]
[1] 0.07436267 0.85873209 0.49881141 0.92363954 0.87208334
And I want to find the correlation coefficient between each pair of vectors, e.g., cor(L[[i]], L[[j]]). Is there a way to do this with the apply family of functions?
Please take it as a specific case of a general question: What if we need to do a triple nested loop over a List() in R?
You can nest lapply calls:
lapply(L, function(x) lapply(L, function(y) cor(x,y)))
If you want the results presented more nicely, put them in a matrix:
L <- list(rnorm(10), rnorm(10), rnorm(10))
matrix(unlist(lapply(L,
function(x) lapply(L,
function(y) cor(x,y)))),
length(L))
#> [,1] [,2] [,3]
#> [1,] 1.0000000 -0.3880931 -0.4164212
#> [2,] -0.3880931 1.0000000 0.4158335
#> [3,] -0.4164212 0.4158335 1.0000000
Created on 2021-05-31 by the reprex package (v2.0.0)
You could use mapply. Generate all the combinations of interest (pairs, triples, ...) and then apply the function over them:
L=replicate(5,rnorm(5),simplify=F)
tmp=expand.grid(1:length(L),1:length(L))
tmp$cor=mapply(
function(y,x){cor(L[[y]],L[[x]])},
tmp$Var1,
tmp$Var2
)
Var1 Var2 cor
1 1 1 1.0000000
2 2 1 0.1226881
3 3 1 -0.2871613
4 4 1 0.4746545
5 5 1 0.9779644
6 1 2 0.1226881
7 2 2 1.0000000
...
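Because cor(x, y) equals cor(y, x), the full grid computes every pair twice. A minimal sketch of the same idea restricted to unique pairs, using combn() from base R; the column names i, j and cor in the result are choices made here for illustration:

```r
set.seed(42)
L <- replicate(5, rnorm(5), simplify = FALSE)

# Each column of `pairs` is one unordered pair of list indices (i < j)
pairs <- combn(length(L), 2)

res <- data.frame(
  i = pairs[1, ],
  j = pairs[2, ],
  # Apply cor() to the two list elements named by each column of `pairs`
  cor = apply(pairs, 2, function(p) cor(L[[p[1]]], L[[p[2]]]))
)
res
```

For the triple-nested case raised in the question, combn(length(L), 3) yields index triples in the same way, and the function passed to apply() then receives three indices instead of two.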
You can cbind the list and call cor with the resulting matrix.
cor(do.call(cbind, L))
# [,1] [,2] [,3] [,4]
#[1,] 1.0000000 -0.46988357 0.14151672 0.14151672
#[2,] -0.4698836 1.00000000 -0.09177819 -0.09177819
#[3,] 0.1415167 -0.09177819 1.00000000 1.00000000
#[4,] 0.1415167 -0.09177819 1.00000000 1.00000000
If the list has one more level of nesting, flatten that level first with unlist(..., recursive = FALSE).
L2 <- lapply(L, list) #Create list with one more level.
cor(do.call(cbind, unlist(L2, FALSE)))
If the nesting depth is unknown or mixed, a recursive function can be used:
L3 <- list(L[[1]], L[[2]], L2[[3]], L2[[4]])
f <- function(x) {
if(is.list(x)) sapply(x, f)
else x
}
cor(f(L3))
Data:
L <- list(c(0.6876619,0.7847888,0.6377801,0.2078056,0.8981001)
, c(0.9358160,0.8905056,0.7715877,0.8648426,0.4915060)
, c(0.88095630,0.08010288,0.15140700,0.35400865,0.60317717)
, c(0.88095630,0.08010288,0.15140700,0.35400865,0.60317717))
I'm trying to subset a number of rows from each element of a list using R.
I have 2 lists: one holds matrices with n rows and p columns, the other holds the number of rows to keep from each matrix.
mat <- list(a = matrix(rnorm(8*4),8), b = matrix(rnorm(15*4),15), c = matrix(rnorm(7*4),7))
rw <- list(a = 6, b = 7, c = 4)
Both lists have common names. In the above example, I would like to retain the first 6 rows for element a, the first 7 rows for b and the first 4 rows for c.
How would you do that in R?
One solution with Map:
Map(function(x, y) x[1:y, ], mat, rw)
# $a
# [,1] [,2] [,3] [,4]
# [1,] 1.3331549 -0.6985623 -1.1842788 -0.1496880
# [2,] 0.2096395 -0.2901906 0.4210395 0.9116542
# [3,] 0.1763317 1.3858205 -1.1567526 -1.1794618
# [4,] 1.3596395 0.5815012 -0.3681799 -0.6569447
# [5,] 0.2251352 0.2331387 -1.2509844 -1.1346729
# [6,] 0.6796729 1.1274772 0.3992489 0.2305927
#
# $b
# [,1] [,2] [,3] [,4]
# [1,] 0.30700748 -1.2173855 -0.3377885 -0.6748974
# [2,] 1.09506443 -0.6142685 -1.1301122 -0.7792081
# [3,] -0.61049306 -1.3414474 0.9771373 1.0191636
# [4,] 0.66687294 -0.5269721 0.9971987 -0.6514121
# [5,] 0.54623236 0.9020964 0.3252700 -0.3925129
# [6,] -0.04848903 -0.5204047 0.3344675 -0.3232105
# [7,] -0.56502719 -0.3743275 2.1760364 -0.2941956
#
# $c
# [,1] [,2] [,3] [,4]
# [1,] -0.3225609 -0.40126955 -1.787255 -1.5005721
# [2,] 0.3474430 -1.16657015 1.106033 0.3114282
# [3,] 0.4099467 -0.04353555 0.838330 0.3282246
# [4,] -1.4648740 0.51279791 0.198768 -0.3394502
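Map pairs the two lists by position, not by name, so the result is only right when both lists are in the same order. A minimal sketch that matches by name instead, assuming every name in mat also appears in rw; here rw is deliberately given in a different order, and head() is used so that a request for a single row still returns a matrix:

```r
set.seed(1)
mat <- list(a = matrix(rnorm(8 * 4), 8),
            b = matrix(rnorm(15 * 4), 15),
            c = matrix(rnorm(7 * 4), 7))
rw <- list(c = 4, a = 6, b = 7)  # same names as mat, different order

# Reorder rw to follow mat, then keep the first n rows of each matrix
sub <- Map(head, mat, rw[names(mat)])

sapply(sub, nrow)
```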
I have a list of matrices and a list of vectors, and I want to divide the columns of each matrix with the corresponding vector element.
For example, given
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)),c(rnorm(6),NA)), cbind(runif(7), runif(7)))
divisors <- list(c(0.5,2), c(3,4))
I'm looking for a vectorized function that produces output that looks the same as
for(i in 1:length(data)){
for(j in 1:ncol(data[[i]])){data[[i]][,j] <- data[[i]][,j] / divisors[[i]][j]}
}
i.e.
[[1]]
[,1] [,2]
[1,] NA 0.28265752
[2,] -0.46967014 -0.07132588
[3,] 0.20253439 -0.37432527
[4,] 0.65736410 0.06630705
[5,] 0.72349294 0.67202129
[6,] 0.88532648 -0.80892508
[7,] 0.08162027 NA
[[2]]
[,1] [,2]
[1,] 0.26597435 0.18120979
[2,] 0.31213250 0.16493883
[3,] 0.19250804 0.14104145
[4,] 0.21196882 0.10172964
[5,] 0.10389773 0.04979742
[6,] 0.02754329 0.15064043
[7,] 0.25771766 0.23042586
The closest I have been able to come is
Map(`/`, data, divisors)
But that divides rows (rather than columns) of the matrix by the vector. Any help appreciated.
Transpose your matrices before and after:
lapply(Map(`/`, lapply(data, t), divisors), t)
# [[1]]
# [,1] [,2]
# [1,] NA 0.28265752
# [2,] -0.46967014 -0.07132588
# [3,] 0.20253439 -0.37432527
# [4,] 0.65736410 0.06630705
# [5,] 0.72349294 0.67202129
# [6,] 0.88532648 -0.80892508
# [7,] 0.08162027 NA
#
# [[2]]
# [,1] [,2]
# [1,] 0.26597435 0.18120979
# [2,] 0.31213250 0.16493883
# [3,] 0.19250804 0.14104145
# [4,] 0.21196882 0.10172964
# [5,] 0.10389773 0.04979742
# [6,] 0.02754329 0.15064043
# [7,] 0.25771766 0.23042586
I prefer the transpose approach above, but another option is to expand your divisor vectors into matrices of the same dimensions as in data:
div_mat = Map(matrix, data = divisors, nrow = sapply(data, nrow), ncol = 2, byrow = T)
Map("/", data, div_mat)
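Base R's sweep() expresses "divide each column by its own divisor" directly, which avoids both the double transpose and the expanded divisor matrices. A minimal sketch on the question's data; the name res is introduced here for illustration:

```r
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)), c(rnorm(6), NA)),
             cbind(runif(7), runif(7)))
divisors <- list(c(0.5, 2), c(3, 4))

# MARGIN = 2 means the divisor vector is applied across columns:
# column j of each matrix is divided by element j of its divisor vector
res <- Map(function(m, d) sweep(m, 2, d, `/`), data, divisors)

res[[2]]
```

The result should match the transpose approach, since both divide column j by the j-th divisor.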
I have a matrix with 2 columns, and I'd like to turn it into a matrix with specified dimensions.
> t <- matrix(rnorm(20), ncol=2, nrow=10)
[,1] [,2]
[1,] 1.4938530 1.2493088
[2,] -0.8079445 1.8715868
[3,] 0.5775695 -0.9277420
[4,] 0.4415969 2.6357908
[5,] 0.3209226 -1.1306049
[6,] 0.5109251 -0.8661100
[7,] 1.9495571 0.2092941
[8,] 0.7816373 1.1517466
[9,] 0.0300595 -0.1351532
[10,] 0.7550894 0.7778869
What I'd like to do is something like:
> tt <- matrix(t, ncol=4, nrow=5)
[,1] [,2] [,3] [,4]
[1,] 1.4938530 1.2493088 -0.8079445 1.8715868
[2,] 0.5775695 -0.9277420 0.4415969 2.6357908
[3,] etc.
I tried to do things with modulo but my head hurts too much for me to try even one more minute.
You can transpose your first matrix, so that data is stored in the order you want, and then fill the second matrix by row:
tt <- matrix(t(t), ncol=4, nrow=5, byrow = T)
t
# [,1] [,2]
# [1,] -1.4162465950 0.01532476
# [2,] -0.2366332875 -0.04024386
# [3,] 0.5146631983 -0.34720239
# [4,] 1.9243922633 -0.24016160
# [5,] 1.6161165230 0.63187438
# [6,] -0.3558181508 -0.73199138
# [7,] 0.7459405376 0.01934826
# [8,] -1.0428581093 -2.04422042
# [9,] 0.0003166344 0.98973993
#[10,] 0.6390745275 -0.65584930
tt
# [,1] [,2] [,3] [,4]
# [1,] -1.4162465950 0.01532476 -0.2366333 -0.04024386
# [2,] 0.5146631983 -0.34720239 1.9243923 -0.24016160
# [3,] 1.6161165230 0.63187438 -0.3558182 -0.73199138
# [4,] 0.7459405376 0.01934826 -1.0428581 -2.04422042
# [5,] 0.0003166344 0.98973993 0.6390745 -0.65584930
When you work with a matrix in R, you can think of it as a vector whose data is stored column by column. Extracting data by row is therefore not as straightforward as extracting by column, which follows the storage order. Transposing the first matrix puts the data in the order you want to read it, and filling the second matrix by row is then straightforward.
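The storage order is easier to see with integers than with random numbers. A small sketch, using a matrix named m (not t, to avoid shadowing the transpose function) whose first column is 1 to 10 and whose second column is 11 to 20:

```r
m <- matrix(1:20, ncol = 2)  # column 1 holds 1..10, column 2 holds 11..20

as.vector(m)     # column-major storage: 1, 2, ..., 10, 11, ..., 20
as.vector(t(m))  # after transposing, rows of m are contiguous: 1, 11, 2, 12, ...

# Fill the new matrix row by row from the transposed data
tt <- matrix(t(m), nrow = 5, ncol = 4, byrow = TRUE)
tt[1, ]  # 1 11 2 12: rows 1 and 2 of m, side by side
tt[2, ]  # 3 13 4 14: rows 3 and 4 of m
```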