This question already has an answer here: correlation matrix in R (1 answer). Closed 6 years ago.
Starting from an n x m matrix, I would like to create a new m x m matrix that contains the correlation between every pair of columns of the starting matrix. So if my input is a 3x3 matrix, I would like to calculate the correlations of columns 1 and 2, 1 and 3, and 2 and 3, and assign the results to the destination matrix. Naively, I used two nested for loops (~O(n^2)):
for (i in 1:n) {
  for (j in i+1:n) {        # parsed as i + (1:n), hence the j <= n guard below
    if (j <= n) {
      tmp = cor(inMatrix[, i], inMatrix[, j])
      dstMatrix[i,j] = tmp;
    }
  }
}
This appears to work, but I was wondering whether there is a better way to achieve it in R.
Simply passing the whole matrix to cor(), i.e. cor(inMatrix), does it: the result holds the correlation of every pair of columns.
n <- 7
m <- 5
set.seed(123)
inMatrix <- replicate(m, sample(c(1, - 1), 1) * cumsum(runif(n)))
inMatrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.7883051 -0.4566147 0.04205953 -0.7085305 -0.7954674
# [2,] 1.1972821 -1.4134481 0.36998025 -1.2525965 -0.8200811
# [3,] 2.0802995 -1.8667822 1.32448390 -1.8467385 -1.2978771
# [4,] 3.0207667 -2.5443529 2.21402322 -2.1358983 -2.0563366
# [5,] 3.0663232 -3.1169863 2.90682662 -2.2830119 -2.2727445
# [6,] 3.5944287 -3.2199110 3.54733344 -3.2460361 -2.5909256
# [7,] 4.4868478 -4.1197359 4.54160321 -4.1483352 -2.8225513
dstMatrix <- matrix(nrow = m, ncol = m)
# loop over pairs of columns (i < j); only the upper triangle is filled
for (i in 1:(m - 1)) {
  for (j in (i + 1):m) {
    dstMatrix[i, j] <- cor(inMatrix[, i], inMatrix[, j])
  }
}
dstMatrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA -0.9823516 0.9902370 -0.9688212 -0.9825973
# [2,] NA NA -0.9811424 0.9570599 0.9626469
# [3,] NA NA NA -0.9742235 -0.9862355
# [4,] NA NA NA NA 0.9331879
# [5,] NA NA NA NA NA
dstMatrix_2 <- cor(inMatrix)
dstMatrix_2
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.0000000 -0.9823516 0.9902370 -0.9688212 -0.9825973
# [2,] -0.9823516 1.0000000 -0.9811424 0.9570599 0.9626469
# [3,] 0.9902370 -0.9811424 1.0000000 -0.9742235 -0.9862355
# [4,] -0.9688212 0.9570599 -0.9742235 1.0000000 0.9331879
# [5,] -0.9825973 0.9626469 -0.9862355 0.9331879 1.0000000
dstMatrix == dstMatrix_2
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA TRUE TRUE FALSE TRUE
# [2,] NA NA TRUE FALSE TRUE
# [3,] NA NA NA FALSE TRUE
# [4,] NA NA NA NA FALSE
# [5,] NA NA NA NA NA
# The differences are of the order of machine precision (about 1e-16); I'm not sure what causes them:
dstMatrix - dstMatrix_2
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 0 0 -1.110223e-16 0.000000e+00
# [2,] NA NA 0 2.220446e-16 0.000000e+00
# [3,] NA NA NA -1.110223e-16 0.000000e+00
# [4,] NA NA NA NA 1.110223e-16
# [5,] NA NA NA NA NA
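Because the two results differ only by floating-point rounding, == is too strict a test. A minimal sketch, recreating inMatrix with the seed used above, that compares the filled upper triangles with all.equal(), which accepts differences below a small relative tolerance (by default about 1.5e-8):

```r
set.seed(123)
n <- 7
m <- 5
inMatrix <- replicate(m, sample(c(1, -1), 1) * cumsum(runif(n)))

# pairwise loop, filling only the upper triangle
dstMatrix <- matrix(nrow = m, ncol = m)
for (i in 1:(m - 1)) {
  for (j in (i + 1):m) {
    dstMatrix[i, j] <- cor(inMatrix[, i], inMatrix[, j])
  }
}

dstMatrix_2 <- cor(inMatrix)

# compare the upper triangles with a tolerance instead of exact equality
up <- upper.tri(dstMatrix)
same <- isTRUE(all.equal(dstMatrix[up], dstMatrix_2[up]))
same
# [1] TRUE
```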
Compute the correlation coefficient for each combination of columns; the combn() function is used to get the pairs of column numbers.
As per @Sotos, the function can be passed directly to combn(), which avoids using apply():
cor_vals <- combn(1:col_n, 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))
# cor_vals <- apply(combn(1:col_n, 2), 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))
Assign names to the correlation values:
cor_vals <- setNames(cor_vals, combn(1:col_n, 2, paste0, collapse = ''))
cor_vals
# 12 13 23
# 0.1621491 -0.8211970 0.4299367
Data:
set.seed(1L)
row_n <- 3
col_n <- 3
mat1 <- matrix(runif(row_n * col_n, min = 0, max = 20), nrow = row_n, ncol = col_n)
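combn() enumerates the pairs in the order (1,2), (1,3), ..., (1,m), (2,3), ..., which is the same column-major order in which R lists the lower triangle of a square matrix. A short sketch, assuming the mat1 data above, that pulls the same named values out of a single cor() call instead of one call per pair:

```r
set.seed(1L)
row_n <- 3
col_n <- 3
mat1 <- matrix(runif(row_n * col_n, min = 0, max = 20), nrow = row_n, ncol = col_n)

# all pairwise correlations at once
cm <- cor(mat1)

# lower.tri() in column-major order yields pairs (1,2), (1,3), ..., matching combn()
cor_vals2 <- setNames(cm[lower.tri(cm)], combn(1:col_n, 2, paste0, collapse = ""))

# same values as the combn() approach
cor_vals <- combn(1:col_n, 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))
all.equal(unname(cor_vals2), cor_vals)
```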
Related
I have a matrix of values that also contains NAs like:
> df <- matrix(rexp(200), 10)
> df[ df < 0.5 ] <- NA
> df
[,1] [,2] [,3] [,4] [,5]
[1,] 2.124043 1.6119230 NA 0.7222127 1.400924
[2,] 4.143728 NA NA 1.0343577 NA
[3,] 2.395984 0.6794447 0.8327695 1.0258656 NA
[4,] NA NA NA NA 1.421674
[5,] NA 1.0446031 0.7762776 NA NA
I would like to scramble each column in my matrix and realised that I can do so using:
> df <- df[sample(nrow(df)), ]
> df
[,1] [,2] [,3] [,4] [,5]
[1,] 2.395984 0.6794447 0.8327695 1.0258656 NA
[2,] 2.124043 1.6119230 NA 0.7222127 1.400924
[3,] NA NA NA NA 1.421674
[4,] 4.143728 NA NA 1.0343577 NA
[5,] NA 1.0446031 0.7762776 NA NA
However, I would like to randomise this way while keeping the positions of the NAs the same as before. Does anybody know of an easy way to do so?
Thanks a lot!
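Wrap it in apply() to shuffle each column separately, permuting only the non-NA values:
apply(X = df,
      MARGIN = 2,
      FUN = function(x) {
        idx <- which(!is.na(x))  # positions of the non-NA values
        # sample.int() avoids sample()'s 1:x behaviour when a column has a single value
        x[idx] <- x[idx][sample.int(length(idx))]
        return(x)
      })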
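To check that such a shuffle leaves the NA pattern untouched, here is a small sketch on a hypothetical 5x5 matrix built the same way as in the question (rexp() values below 0.5 set to NA). The helper name shuffle_non_na() is made up for the illustration:

```r
set.seed(42)
df <- matrix(rexp(25), 5)
df[df < 0.5] <- NA

# hypothetical helper: permute the non-NA entries of one column, leave NAs in place
shuffle_non_na <- function(x) {
  idx <- which(!is.na(x))
  x[idx] <- x[idx][sample.int(length(idx))]
  x
}

shuffled <- apply(df, MARGIN = 2, FUN = shuffle_non_na)

# NA positions are unchanged
all(is.na(shuffled) == is.na(df))

# each column still holds the same values, only reordered
same_values <- sapply(seq_len(ncol(df)),
                      function(j) identical(sort(shuffled[, j]), sort(df[, j])))
all(same_values)
```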
I am currently working on a hobby project to improve my R skills. With my previous code I created various subsets of data, "returnseries.1, returnseries.2, returnseries.3, ..." (from 1 to 119), each stored in a 252x6 matrix.
Now I am building a for loop to calculate the covariance matrix for each subset.
My code is as follows:
for(k in 1:119){
covmat[k] = matrix(c(cov(returnseries[k])),nrow=6, ncol=6)
}
For some reason I get the error: "My column index must be at most 7 not 8."
I don't understand why. I tried several other versions of the code, but none of them work. I thought it had to do with the naming, but using returnseries.[k] gives me an error saying that returnseries. is not defined.
Would be delighted if somebody could provide a quick
You can use an array (a 3D array in this case).
Generate some data.
> xy <- list(one = matrix(rnorm(9), ncol = 3),
+ two = matrix(rnorm(9), ncol = 3),
+ three = matrix(rnorm(9), ncol = 3))
> xy
$one
[,1] [,2] [,3]
[1,] 0.1341714 -1.27229790 0.22431441
[2,] 1.0853899 0.02335881 -0.05600098
[3,] -1.5645181 0.83745858 -1.47670091
$two
[,1] [,2] [,3]
[1,] 1.4891642 -0.3766222 -0.86981432
[2,] 0.3424295 -1.7882177 1.79601480
[3,] -1.1583058 -0.1604330 0.02690498
$three
[,1] [,2] [,3]
[1,] -0.1511346 -0.3672432 -0.3008405
[2,] -1.9881830 -0.8545396 -0.7108430
[3,] 0.1637134 -0.7958267 1.1923535
Create an empty array:
> N <- 3
> ar <- array(rep(NA, 3*3*N), dim = c(3, 3, N))
> ar
, , 1
[,1] [,2] [,3]
[1,] NA NA NA
[2,] NA NA NA
[3,] NA NA NA
, , 2
[,1] [,2] [,3]
[1,] NA NA NA
[2,] NA NA NA
[3,] NA NA NA
, , 3
[,1] [,2] [,3]
[1,] NA NA NA
[2,] NA NA NA
[3,] NA NA NA
Fill in values.
> for (i in 1:N) {
+ ar[,, i] <- xy[[i]]
+ }
>
> ar
, , 1
[,1] [,2] [,3]
[1,] 0.1341714 -1.27229790 0.22431441
[2,] 1.0853899 0.02335881 -0.05600098
[3,] -1.5645181 0.83745858 -1.47670091
, , 2
[,1] [,2] [,3]
[1,] 1.4891642 -0.3766222 -0.86981432
[2,] 0.3424295 -1.7882177 1.79601480
[3,] -1.1583058 -0.1604330 0.02690498
, , 3
[,1] [,2] [,3]
[1,] -0.1511346 -0.3672432 -0.3008405
[2,] -1.9881830 -0.8545396 -0.7108430
[3,] 0.1637134 -0.7958267 1.1923535
You can now do all sorts of things with it, for example, compute the row sums of each matrix:
> apply(ar, MARGIN = 3, FUN = rowSums)
[,1] [,2] [,3]
[1,] -0.9138121 0.2427277 -0.8192183
[2,] 1.0527477 0.3502266 -3.5535656
[3,] -2.2037604 -1.2918338 0.5602402
Here's proof for the first matrix. Compare it to the first column:
> rowSums(xy[[1]])
[1] -0.9138121 1.0527477 -2.2037604
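Applied to the question: returnseries[k] does not look up an object named returnseries.k, but mget() can collect objects by name into a list, and sapply(..., simplify = "array") then stacks the 6x6 covariance matrices into a 6x6xN array. A minimal sketch; since the real returnseries.* objects are not available, it creates three hypothetical 252x6 stand-ins filled with random numbers:

```r
set.seed(1)
N <- 3  # 119 in the question; 3 hypothetical stand-ins here
for (k in 1:N) {
  assign(paste0("returnseries.", k), matrix(rnorm(252 * 6), nrow = 252, ncol = 6))
}

# look the objects up by name and gather them in a named list
series <- mget(paste0("returnseries.", 1:N))

# one 6 x 6 covariance matrix per subset, stacked along the third dimension
covarr <- sapply(series, cov, simplify = "array")
dim(covarr)
# [1] 6 6 3
covarr[, , 1]  # covariance matrix of returnseries.1
```

Keeping the subsets in a list (series) from the start would also make the assign()/mget() step unnecessary.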
I have a vector of data such as the following:
data <- c(1, 3, 4, 7)
I would like to apply a function to every pair of elements in the vector so that it returns an upper triangular matrix, as the following loop does:
mat <- matrix(data = NA, nrow = length(data), ncol = length(data))
for (i in 1:(length(data) - 1)) {
for (j in (i+1):length(data)) {
mat[i, j] <- "-"(data[j], data[i])
}
}
But I would like to do so with an apply type function instead of a for loop.
I am unsure how to do so. Any suggestions?
Thanks!
We can use combn():
mat[lower.tri(mat, diag=FALSE)] <- combn(data, 2,
FUN= function(x) x[2]-x[1])
t(mat)
# [,1] [,2] [,3] [,4]
#[1,] NA 2 3 6
#[2,] NA NA 1 4
#[3,] NA NA NA 3
#[4,] NA NA NA NA
Data:
mat <- matrix(data = NA, nrow = length(data), ncol = length(data))
Using outer:
# NA^TRUE is NA and NA^FALSE is 1, so this blanks the diagonal and lower triangle
t(outer(data, data, "-")) *
  NA^lower.tri(matrix(0, length(data), length(data)), diag = TRUE)
# [,1] [,2] [,3] [,4]
#[1,] NA 2 3 6
#[2,] NA NA 1 4
#[3,] NA NA NA 3
#[4,] NA NA NA NA
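For an arbitrary function of two elements rather than "-", outer() needs a vectorised function; one way is to wrap a scalar function in Vectorize() and let outer() work on indices. A sketch with a hypothetical helper pairwise_upper() that returns f(v[i], v[j]) above the diagonal and NA elsewhere:

```r
data <- c(1, 3, 4, 7)

# hypothetical helper: f(v[i], v[j]) for every pair i < j, NA on and below the diagonal
pairwise_upper <- function(v, f) {
  idx <- seq_along(v)
  # Vectorize() lets outer() call a scalar function element by element
  res <- outer(idx, idx, Vectorize(function(i, j) f(v[i], v[j])))
  res[lower.tri(res, diag = TRUE)] <- NA
  res
}

# reproduces the loop from the question: data[j] - data[i]
pairwise_upper(data, function(a, b) b - a)
```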
m <- "mData"
assign(m, matrix(data = NA, nrow = 4, ncol = 5))
Now I want to use the variable m to assign values to the mData matrix, but assign(m[1, 2], 35) does not work.
Any solution would be much appreciated.
I'm kind of ashamed to post this, but there is a way to do it. It feels wrong because the R way would be to build a list of matrices and then transform them by passing a function to lapply().
assign.by.char <- function(x, ...) {
  env <- parent.frame()  # the caller's environment, where the object named by x lives
  assign(x, do.call(`[<-`, list(get(x, envir = env), ...)), envir = env)
}
assign.by.char(m, 1, 2, 35)
mData
[,1] [,2] [,3] [,4] [,5]
[1,] NA 35 NA NA NA
[2,] NA NA NA NA NA
[3,] NA NA NA NA NA
[4,] NA NA NA NA NA
If you really need to use assign(), you could do it with replace():
m <- matrix(, 3, 3)
assign("m", replace(m, cbind(1, 2), 35))
m
# [,1] [,2] [,3]
# [1,] NA 35 NA
# [2,] NA NA NA
# [3,] NA NA NA
Or you can use assign() directly (a variant of @BondedDust's solution):
assign(m, `[<-`(get(m), cbind(1,2), 35))
mData
# [,1] [,2] [,3]
#[1,] NA 35 NA
#[2,] NA NA NA
#[3,] NA NA NA
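Or as a function:
assign.by.char <- function(x, ...) {
  env <- parent.frame()  # assign in the caller's environment, not the function's own frame
  assign(x, `[<-`(get(x, envir = env), ...), envir = env)
}
Data: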
mData <- matrix(, 3, 3)
m <- 'mData'
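The R way mentioned in the first answer is to keep the matrices in a named list, so the name stored in m indexes the list directly and no assign()/get() is needed. A minimal sketch; the list name mats is hypothetical:

```r
m <- "mData"

# keep matrices in a named list instead of separate global variables
mats <- list()
mats[[m]] <- matrix(data = NA, nrow = 4, ncol = 5)

# the character name in m selects the matrix; [1, 2] then indexes into it
mats[[m]][1, 2] <- 35
mats[[m]]
```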
I created an empty matrix with matrix(). How can I test whether a given matrix is empty? I know that is.na(matrix()) is TRUE, but for a larger matrix is.na() returns one value per element, so it does not give a single answer.
By empty I mean that every element is NA or NULL.
I'm guessing that you are just looking for all(). Here's a small example:
M1 <- matrix(NA, ncol = 3, nrow = 3)
M1
# [,1] [,2] [,3]
# [1,] NA NA NA
# [2,] NA NA NA
# [3,] NA NA NA
M2 <- matrix(c(1, rep(NA, 8)), ncol = 3, nrow = 3)
M2
# [,1] [,2] [,3]
# [1,] 1 NA NA
# [2,] NA NA NA
# [3,] NA NA NA
all(is.na(M1))
# [1] TRUE
all(is.na(M2))
# [1] FALSE
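Note that all() of a zero-length vector is TRUE, so the same test also treats a matrix with zero rows or columns as empty. A small sketch wrapping it in a hypothetical helper is_empty_matrix():

```r
# hypothetical helper: TRUE if the matrix has no elements or every element is NA
is_empty_matrix <- function(M) {
  all(is.na(M))  # all(logical(0)) is TRUE, so 0 x 0 matrices count as empty too
}

is_empty_matrix(matrix())                        # TRUE (1 x 1 NA matrix)
is_empty_matrix(matrix(NA, 3, 3))                # TRUE
is_empty_matrix(matrix(c(1, rep(NA, 8)), 3, 3))  # FALSE
is_empty_matrix(matrix(numeric(0), 0, 0))        # TRUE
```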