I am working on a hobby project to improve my R skills. My previous code created various subsets of data, "returnseries.1", "returnseries.2", "returnseries.3", ... (from 1 to 119), each stored in a 252x6 matrix.
Now I am building a for loop to calculate the covariance matrix for each subset.
My code is as follows:
for(k in 1:119){
covmat[k] = matrix(c(cov(returnseries[k])),nrow=6, ncol=6)
}
For some reason I get the error: "My column index must be at most 7 not 8."
I don't understand why. I tried several other versions of the code, but none of them works. I thought it had to do with the naming, but using returnseries.[k] gives me an error that returnseries. is not defined.
Would be delighted if somebody could provide a quick
You can use an array. A 3D array in this case.
Generate some data.
> xy <- list(one = matrix(rnorm(9), ncol = 3),
+ two = matrix(rnorm(9), ncol = 3),
+ three = matrix(rnorm(9), ncol = 3))
> xy
$one
[,1] [,2] [,3]
[1,] 0.1341714 -1.27229790 0.22431441
[2,] 1.0853899 0.02335881 -0.05600098
[3,] -1.5645181 0.83745858 -1.47670091
$two
[,1] [,2] [,3]
[1,] 1.4891642 -0.3766222 -0.86981432
[2,] 0.3424295 -1.7882177 1.79601480
[3,] -1.1583058 -0.1604330 0.02690498
$three
[,1] [,2] [,3]
[1,] -0.1511346 -0.3672432 -0.3008405
[2,] -1.9881830 -0.8545396 -0.7108430
[3,] 0.1637134 -0.7958267 1.1923535
Create an empty array.
> N <- 3
> ar <- array(rep(NA, 3*3*N), dim = c(3, 3, N))
> ar
, , 1
[,1] [,2] [,3]
[1,] NA NA NA
[2,] NA NA NA
[3,] NA NA NA
, , 2
[,1] [,2] [,3]
[1,] NA NA NA
[2,] NA NA NA
[3,] NA NA NA
, , 3
[,1] [,2] [,3]
[1,] NA NA NA
[2,] NA NA NA
[3,] NA NA NA
Fill in values.
> for (i in 1:N) {
+ ar[,, i] <- xy[[i]]
+ }
>
> ar
, , 1
[,1] [,2] [,3]
[1,] 0.1341714 -1.27229790 0.22431441
[2,] 1.0853899 0.02335881 -0.05600098
[3,] -1.5645181 0.83745858 -1.47670091
, , 2
[,1] [,2] [,3]
[1,] 1.4891642 -0.3766222 -0.86981432
[2,] 0.3424295 -1.7882177 1.79601480
[3,] -1.1583058 -0.1604330 0.02690498
, , 3
[,1] [,2] [,3]
[1,] -0.1511346 -0.3672432 -0.3008405
[2,] -1.9881830 -0.8545396 -0.7108430
[3,] 0.1637134 -0.7958267 1.1923535
You can do all sorts of wonderful things with this now. For example, do row sums.
> apply(ar, MARGIN = 3, FUN = rowSums)
[,1] [,2] [,3]
[1,] -0.9138121 0.2427277 -0.8192183
[2,] 1.0527477 0.3502266 -3.5535656
[3,] -2.2037604 -1.2918338 0.5602402
Here's proof for the first matrix. Compare it to the first column:
> rowSums(xy[[1]])
[1] -0.9138121 1.0527477 -2.2037604
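Applied to the original question, the same idea might look like the sketch below. It assumes the subsets exist as separately named objects returnseries.1, returnseries.2, ... in the workspace. Three random 252x6 stand-ins are generated here so the snippet runs on its own, and the names n_sets, series and covmat are only illustrative.

```r
# Stand-in data: three 252 x 6 matrices named returnseries.1 ... returnseries.3
set.seed(42)
n_sets <- 3   # would be 119 with the real data
for (k in seq_len(n_sets)) {
  assign(paste0("returnseries.", k), matrix(rnorm(252 * 6), nrow = 252, ncol = 6))
}

# Collect the separately named objects into one list
series <- mget(paste0("returnseries.", seq_len(n_sets)))

# One 6 x 6 covariance matrix per subset, stored as slices of a 3D array
covmat <- array(NA_real_, dim = c(6, 6, n_sets))
for (k in seq_len(n_sets)) {
  covmat[, , k] <- cov(series[[k]])
}
dim(covmat)
```

Note that returnseries[k] in the original loop indexes into a single object called returnseries rather than looking up the object named returnseries.k, which is why the names are built with paste0() here.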
I have created list of objects in R as follows:
set.seed(1234)
data <- matrix(rnorm(3*4,mean=0,sd=1), 3, 4)
results <- lapply(1:ncol(data), function(i) outer(data[, i], data[, i]))
All 4 list objects have dimension 3x3. I also have the following matrix matr <- matrix(c(2,4,6,8),ncol=4), where each value corresponds to one of the above list objects.
Then, I use the expression matr[,1]*matr[,2]*results[[1]]*results[[2]] on the first two objects to create the matrix below:
[,1] [,2] [,3]
[1,] 64.135122 2.6966755 12.4307531
[2,] 2.696676 0.1133865 0.5226732
[3,] 12.430753 0.5226732 2.4093448
How can I calculate the above expression for all possible object combinations and save the results to a new list?
We can use combn to create pairwise combinations of the list indices, extract the elements, and do the multiplication:
new_lst <- combn(seq_along(results), 2, \(i) matr[,i[1]] * matr[,i[2]] *
results[[i[1]]] * results[[i[2]]], simplify = FALSE)
names(new_lst) <- combn(seq_along(results), 2, paste, collapse="_")
-output
> new_lst
$`1_2`
[,1] [,2] [,3]
[1,] 64.135122 2.6966755 12.4307531
[2,] 2.696676 0.1133865 0.5226732
[3,] 12.430753 0.5226732 2.4093448
$`1_3`
[,1] [,2] [,3]
[1,] 5.775451 -1.2624981 -5.095849
[2,] -1.262498 0.2759787 1.113939
[3,] -5.095849 1.1139391 4.496217
$`1_4`
[,1] [,2] [,3]
[1,] 18.46710 -2.275650 -18.610758
[2,] -2.27565 0.280422 2.293352
[3,] -18.61076 2.293352 18.755530
$`2_3`
[,1] [,2] [,3]
[1,] 43.621251 -7.589849 -9.242303
[2,] -7.589849 1.320590 1.608108
[3,] -9.242303 1.608108 1.958223
$`2_4`
[,1] [,2] [,3]
[1,] 139.47970 -13.680683 -33.754187
[2,] -13.68068 1.341852 3.310735
[3,] -33.75419 3.310735 8.168537
$`3_4`
[,1] [,2] [,3]
[1,] 12.560327 6.404863 13.837154
[2,] 6.404863 3.266019 7.055953
[3,] 13.837154 7.055953 15.243778
Starting from an n x m matrix, I would like to create a new m x m matrix that contains the correlations between all pairs of columns of the starting matrix. So if my input is a 3x3 matrix, I would like to calculate the correlation of the column pairs 12, 13, 23 and assign the results to the destination matrix. Naively, I used two nested for loops (~O(n^2)):
for (i in 1:n) {
for (j in i+1:n) {
if (j <= n) {
tmp = cor(inMatrix[, i], inMatrix[, j])
dstMatrix[i,j] = tmp;
}
}
}
This appears to be working, and I was wondering whether there is a better way to achieve it in R.
The simple cor(inMatrix) does it (the whole matrix is passed directly to cor()):
n <- 7
m <- 5
set.seed(123)
inMatrix <- replicate(m, sample(c(1, - 1), 1) * cumsum(runif(n)))
inMatrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.7883051 -0.4566147 0.04205953 -0.7085305 -0.7954674
# [2,] 1.1972821 -1.4134481 0.36998025 -1.2525965 -0.8200811
# [3,] 2.0802995 -1.8667822 1.32448390 -1.8467385 -1.2978771
# [4,] 3.0207667 -2.5443529 2.21402322 -2.1358983 -2.0563366
# [5,] 3.0663232 -3.1169863 2.90682662 -2.2830119 -2.2727445
# [6,] 3.5944287 -3.2199110 3.54733344 -3.2460361 -2.5909256
# [7,] 4.4868478 -4.1197359 4.54160321 -4.1483352 -2.8225513
dstMatrix <- matrix(nrow = m, ncol = m)
# Note (i+1):m, not i+1:m: the colon binds tighter than +, so i+1:m means i + (1:m)
for (i in 1:(m - 1)) {
for (j in (i+1):m) {
tmp = cor(inMatrix[, i], inMatrix[, j])
dstMatrix[i,j] = tmp;
}
}
dstMatrix
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA -0.9823516 0.9902370 -0.9688212 -0.9825973
# [2,] NA NA -0.9811424 0.9570599 0.9626469
# [3,] NA NA NA -0.9742235 -0.9862355
# [4,] NA NA NA NA 0.9331879
# [5,] NA NA NA NA NA
dstMatrix_2 <- cor(inMatrix)
dstMatrix_2
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1.0000000 -0.9823516 0.9902370 -0.9688212 -0.9825973
# [2,] -0.9823516 1.0000000 -0.9811424 0.9570599 0.9626469
# [3,] 0.9902370 -0.9811424 1.0000000 -0.9742235 -0.9862355
# [4,] -0.9688212 0.9570599 -0.9742235 1.0000000 0.9331879
# [5,] -0.9825973 0.9626469 -0.9862355 0.9331879 1.0000000
dstMatrix == dstMatrix_2
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA TRUE TRUE FALSE TRUE
# [2,] NA NA TRUE FALSE TRUE
# [3,] NA NA NA FALSE TRUE
# [4,] NA NA NA NA FALSE
# [5,] NA NA NA NA NA
# The differences are of machine-precision magnitude; not sure what caused them:
dstMatrix - dstMatrix_2
# [,1] [,2] [,3] [,4] [,5]
# [1,] NA 0 0 -1.110223e-16 0.000000e+00
# [2,] NA NA 0 2.220446e-16 0.000000e+00
# [3,] NA NA NA -1.110223e-16 0.000000e+00
# [4,] NA NA NA NA 1.110223e-16
# [5,] NA NA NA NA NA
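Since the two results differ only at the level of floating-point rounding, a tolerance-based comparison is usually a better check than ==. The sketch below rebuilds a small stand-in matrix (the names x, loop_cor and full_cor are made up for illustration) and compares the upper triangles with all.equal(), which allows for a small numerical tolerance.

```r
set.seed(123)
x <- replicate(5, cumsum(runif(7)))   # stand-in 7 x 5 matrix

# Pairwise loop over the upper triangle, as in the question
loop_cor <- matrix(NA_real_, 5, 5)
for (i in 1:4) {
  for (j in (i + 1):5) {
    loop_cor[i, j] <- cor(x[, i], x[, j])
  }
}

# Whole matrix passed to cor() in one call
full_cor <- cor(x)

# Compare only the cells the loop filled, using a tolerance instead of ==
upper <- upper.tri(full_cor)
same <- isTRUE(all.equal(loop_cor[upper], full_cor[upper]))
same
```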
Compute the correlation coefficient for each combination of columns. The combn function is used to get the pairs of column numbers.
As per @Sotos, the function can be passed directly into combn, which avoids using apply():
cor_vals <- combn(1:col_n, 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))
# cor_vals <- apply(combn(1:col_n, 2), 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))
Assign names to the correlation values:
cor_vals <- setNames(cor_vals, combn(1:col_n, 2, paste0, collapse = ''))
cor_vals
# 12 13 23
# 0.1621491 -0.8211970 0.4299367
Data:
set.seed(1L)
row_n <- 3
col_n <- 3
mat1 <- matrix(runif(row_n * col_n, min = 0, max = 20), nrow = row_n, ncol = col_n)
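As a cross-check, the values from combn() can be compared against the full correlation matrix from cor(). This is only a sketch on a made-up 4x5 matrix; the one subtlety it illustrates is ordering: combn() walks the pairs as 12, 13, 14, ..., 23, ..., which matches reading the upper triangle row by row, so the transposed matrix is indexed with lower.tri().

```r
set.seed(1L)
mat1 <- matrix(runif(4 * 5, min = 0, max = 20), nrow = 4, ncol = 5)
col_n <- ncol(mat1)

# Pairwise correlations via combn(), as above
cor_vals <- combn(1:col_n, 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))

# Same values pulled out of the full correlation matrix.
# t(cm)[lower.tri(cm)] reads the upper triangle of cm row by row,
# which is the order in which combn() generates the pairs.
cm <- cor(mat1)
from_matrix <- t(cm)[lower.tri(cm)]

all.equal(cor_vals, from_matrix)
```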
Let's say I have a 5 by 7 matrix and a function f :
a <- matrix(rnorm(7*5),5,7)
f <- function(x,y) sum(x+y)
I would like to compute the matrix b whose element b[i,j] is equal to f(a[i,],a[j,]) without for loops. How can I do this?
You can use outer to apply a function to all possible combinations:
rowNums <- seq(nrow(a)) # vector with all row numbers
outer(rowNums, rowNums, Vectorize(function(x, y) sum(a[x, ] + a[y, ])))
[,1] [,2] [,3] [,4] [,5]
[1,] 6.319860 10.978305 6.911812 2.4609471 4.7021136
[2,] 10.978305 15.636751 11.570257 7.1193924 9.3605589
[3,] 6.911812 11.570257 7.503764 3.0528993 5.2940659
[4,] 2.460947 7.119392 3.052899 -1.3979658 0.8432008
[5,] 4.702114 9.360559 5.294066 0.8432008 3.0843673
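The Vectorize() wrapper matters here because outer() does not call the function once per cell. It calls it a single time with two long index vectors and expects a result of the same length. The sketch below (with a hypothetical helper f_idx) shows the unwrapped call failing and the wrapped one working.

```r
set.seed(1)
a <- matrix(rnorm(7 * 5), 5, 7)
rowNums <- seq_len(nrow(a))

# Hypothetical helper: takes two row indices, returns one number
f_idx <- function(x, y) sum(a[x, ] + a[y, ])

# outer() passes two length-25 index vectors in one call, so a function
# that returns a single number does not fit and outer() raises an error
plain <- tryCatch(outer(rowNums, rowNums, f_idx), error = function(e) "error")

# Vectorize() makes f_idx run once per (x, y) pair
b <- outer(rowNums, rowNums, Vectorize(f_idx))
dim(b)
```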
Edit:
The calculations are more efficient if you calculate the rowSums before using outer. This code is shorter and faster:
rs <- rowSums(a)
outer(rs, rs, "+")
[,1] [,2] [,3] [,4] [,5]
[1,] 6.319860 10.978305 6.911812 2.4609471 4.7021136
[2,] 10.978305 15.636751 11.570257 7.1193924 9.3605589
[3,] 6.911812 11.570257 7.503764 3.0528993 5.2940659
[4,] 2.460947 7.119392 3.052899 -1.3979658 0.8432008
[5,] 4.702114 9.360559 5.294066 0.8432008 3.0843673
Edit 2:
A solution to your actual problem (see comments):
ta <- t(a) # transpose
apply(a, 1, function(x) colSums(abs(ta - x)))
[,1] [,2] [,3] [,4] [,5]
[1,] 0.000000 10.687579 10.933269 9.306339 7.763612
[2,] 10.687579 0.000000 7.465742 8.517358 7.847622
[3,] 10.933269 7.465742 0.000000 5.768676 6.851272
[4,] 9.306339 8.517358 5.768676 0.000000 6.687477
[5,] 7.763612 7.847622 6.851272 6.687477 0.000000
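The code in Edit 2 sums absolute differences between every pair of rows, which is the Manhattan distance. The comments that motivated it are not reproduced here, so treating that as the goal is an assumption. Under that assumption, dist() from the stats package should give the same matrix, as this sketch on freshly generated data checks (by_apply and by_dist are illustrative names).

```r
set.seed(1)
a <- matrix(rnorm(7 * 5), 5, 7)
ta <- t(a)

# Approach from Edit 2: for each row x, sum |row - x| over the columns
by_apply <- apply(a, 1, function(x) colSums(abs(ta - x)))

# Built-in pairwise Manhattan distance between rows
by_dist <- as.matrix(dist(a, method = "manhattan"))

# dist() adds row/column names, so drop them before comparing
all.equal(by_apply, unname(by_dist))
```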
One way is to use expand.grid to create the subsetting indices and then use apply on them:
matrix(apply(expand.grid(seq(nrow(a)),seq(nrow(a))),1,
function(x) f(a[x[1],],a[x[2],])),nrow(a))
[,1] [,2] [,3] [,4] [,5]
[1,] 8.9116431 4.1067161 0.6589584 3.681561 3.207056
[2,] 4.1067161 -0.6982109 -4.1459686 -1.123366 -1.597871
[3,] 0.6589584 -4.1459686 -7.5937263 -4.571123 -5.045629
[4,] 3.6815615 -1.1233656 -4.5711232 -1.548520 -2.023026
[5,] 3.2070558 -1.5978712 -5.0456289 -2.023026 -2.497531
If I have a matrix mat1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
it is possible via a very simple command to square all individual values by
mat1 * mat1
[,1] [,2] [,3]
[1,] 1 9 25
[2,] 4 16 36
Now, what I want to do is to create a new matrix where all values are computed by e^(old_value), e.g., e^1, e^2, e^3 and so forth. How can I do this?
exp computes the exponential function elementwise:
> mat1 <- matrix(1:6, nrow=2)
> exp(mat1)
[,1] [,2] [,3]
[1,] 2.718282 20.08554 148.4132
[2,] 7.389056 54.59815 403.4288
Suppose I have a data array,
dat <- array(NA, c(115,45,248))
Q1: What do I do if I want to get a new data array,
datnew <- array(NA, c(115,45,248))
in which all the positive values remain and the negative values are changed to NA?
Q2: What do I do if I want to get a new data array,
datnew <- array(NA,c(115,45,31))
by averaging over the third dimension, but averaging only every 8 values?
Thanks a lot.
For question 2,
you can reverse the order of the dimensions, then add a dimension representing the groups to average over, then use apply:
tmp <- array( 1:32, c(2,2,8) )
tmp2 <- array( aperm(tmp), c(4,2,2,2) )
apply( tmp2, 2:4, mean )
Answer to Q1:
dat[dat < 0] <- NA
We treat dat as if it were a vector (it is one, just with a dim attribute).
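The assignment form overwrites dat in place. If the original array should be kept and the result stored in a new object, as the question asks, one option is replace(), which returns a modified copy. A small sketch on random stand-in data:

```r
set.seed(1)
dat <- array(rnorm(3 * 3 * 3), c(3, 3, 3))

# replace() returns a copy with the selected elements changed;
# dat itself is left untouched and the dimensions are preserved
datnew <- replace(dat, dat < 0, NA)

dim(datnew)
```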
Answer to Q2:
Following Greg's nice, succinct solution, the solution I had in mind when posting my comment earlier was this (using Greg's tmp)
foo <- function(x, grp) aggregate(x, by = list(grp = grp), mean)$x
apply(tmp, 2:1, foo, grp = gl(2,4))
Examples:
Q1
> dat <- array(rnorm(3*3*3), c(3,3,3))
> dat
, , 1
[,1] [,2] [,3]
[1,] 0.1427815 0.1642626 -0.6876034
[2,] 0.6791252 2.1420478 -0.7073936
[3,] -0.9695173 -1.1050933 -0.3068230
, , 2
[,1] [,2] [,3]
[1,] 0.8246182 0.5132398 2.5428203
[2,] -0.4328711 0.9080648 -0.1231653
[3,] -0.7798170 -1.1160706 -0.9237559
, , 3
[,1] [,2] [,3]
[1,] -0.79505298 0.8795420 0.4520150
[2,] 0.04154077 -1.0422061 0.4657002
[3,] -0.67168971 0.7925304 -0.5461143
> dat[dat < 0] <- NA
> dat
, , 1
[,1] [,2] [,3]
[1,] 0.1427815 0.1642626 NA
[2,] 0.6791252 2.1420478 NA
[3,] NA NA NA
, , 2
[,1] [,2] [,3]
[1,] 0.8246182 0.5132398 2.542820
[2,] NA 0.9080648 NA
[3,] NA NA NA
, , 3
[,1] [,2] [,3]
[1,] NA 0.8795420 0.4520150
[2,] 0.04154077 NA 0.4657002
[3,] NA 0.7925304 NA
Q2
> foo <- function(x, grp) aggregate(x, by = list(grp = grp), mean)$x
> apply(tmp, 2:1, foo, grp = gl(2,4))
, , 1
[,1] [,2]
[1,] 7 9
[2,] 23 25
, , 2
[,1] [,2]
[1,] 8 10
[2,] 24 26
> all.equal(apply(tmp, 2:1, foo, grp = gl(2,4)), apply( tmp2, 2:4, mean ))
[1] TRUE
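Both Q2 solutions above put the group index first and swap the two original dimensions, so for the real data the result would come out as 31 x 45 x 115 rather than 115 x 45 x 31. A variant that keeps the original order is sketched below on a small stand-in array. It assumes the third dimension is an exact multiple of the block size, and block, n_blocks and tmp4 are illustrative names.

```r
set.seed(1)
dat <- array(rnorm(4 * 3 * 16), c(4, 3, 16))  # small stand-in for 115 x 45 x 248
block <- 8                                     # number of slices per average
n_blocks <- dim(dat)[3] / block                # 2 here, 31 for the real data

# Split the third dimension into (block, n_blocks);
# slices 1-8 form block 1, slices 9-16 form block 2, and so on
tmp4 <- array(dat, c(dim(dat)[1:2], block, n_blocks))

# Average over the within-block axis, keeping rows, columns and blocks
datnew <- apply(tmp4, c(1, 2, 4), mean)
dim(datnew)
```

If the array contains NA values (for example after Q1), passing na.rm = TRUE through apply() to mean() averages only the remaining values in each block.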
For question 1:
tmp2 <- ifelse(tmp1 < 0, NA, tmp1)
For question 2 see Greg's solution.