I have a large correlation matrix (roughly 50 x 50).
I calculated it with the cor(mydata) function.
Now I would like a matching matrix of significance levels.
Using cor.test() I can get the p-value for one pair of columns, but is there an easy way to get all 1225 pairs at once?
The function cor_pmat from the ggcorrplot package gives you the p-values of correlations.
library(ggcorrplot)
set.seed(123)
xmat <- matrix(rnorm(50), ncol = 5)
cor_pmat(xmat)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.00000000 0.08034470 0.24441138 0.03293644 0.3234899
[2,] 0.08034470 0.00000000 0.08716815 0.44828479 0.4824117
[3,] 0.24441138 0.08716815 0.00000000 0.20634394 0.9504582
[4,] 0.03293644 0.44828479 0.20634394 0.00000000 0.8378530
[5,] 0.32348990 0.48241166 0.95045815 0.83785303 0.0000000
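For the question's actual use case, the same call applies directly to the full data set (a sketch, assuming mydata is the 50-column data set from the question):
# p-values for every pair of columns in the asker's data
pmat <- cor_pmat(mydata)
dim(pmat)  # should be 50 x 50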
I think this should do what you want; we use expand.grid in conjunction with apply to run cor.test() on every pair of columns:
Since you didn't provide your data, I created my own set.
set.seed(123)
xmat <- matrix(rnorm(50), ncol = 5)
matrix(apply(expand.grid(1:ncol(xmat), 1:ncol(xmat)), 1,
             function(x) cor.test(xmat[, x[1]], xmat[, x[2]])$p.value),
       ncol = ncol(xmat), byrow = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 0.00000000 0.08034470 0.24441138 3.293644e-02 0.3234899
[2,] 0.08034470 0.00000000 0.08716815 4.482848e-01 0.4824117
[3,] 0.24441138 0.08716815 0.00000000 2.063439e-01 0.9504582
[4,] 0.03293644 0.44828479 0.20634394 1.063504e-62 0.8378530
[5,] 0.32348990 0.48241166 0.95045815 8.378530e-01 0.0000000
Note that if you don't need a matrix and a three-column table of pairs will do, we can use combn, which visits each pair only once and therefore does much less iteration:
cbind(t(combn(1:ncol(xmat), 2)),
      combn(1:ncol(xmat), 2, function(x) cor.test(xmat[, x[1]], xmat[, x[2]])$p.value))
[,1] [,2] [,3]
[1,] 1 2 0.08034470
[2,] 1 3 0.24441138
[3,] 1 4 0.03293644
[4,] 1 5 0.32348990
[5,] 2 3 0.08716815
[6,] 2 4 0.44828479
[7,] 2 5 0.48241166
[8,] 3 4 0.20634394
[9,] 3 5 0.95045815
[10,] 4 5 0.83785303
Alternatively, we can perform the same operation, but use the pipe operator %>% to make it a bit more concise:
library(magrittr)
combn(1:ncol(xmat), 2) %>%
apply(., 2, function(x) cor.test(xmat[,x[1]], xmat[,x[2]])$`p.value`) %>%
cbind(t(combn(1:ncol(xmat), 2)), .)
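If you want the pairs labeled, the same pieces assemble into a small data.frame, one row per pair (a sketch; the column names here are our own):
pairs <- t(combn(seq_len(ncol(xmat)), 2))
pvals <- apply(pairs, 1, function(x) cor.test(xmat[, x[1]], xmat[, x[2]])$p.value)
data.frame(col1 = pairs[, 1], col2 = pairs[, 2], p.value = pvals)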
Here is one solution:
data <- swiss
#cor(data)
n <- ncol(data)
p.value.vec <- apply(combn(1:ncol(data), 2), 2, function(x) cor.test(data[, x[1]], data[, x[2]])$p.value)
p.value.matrix <- matrix(0, n, n)
p.value.matrix[upper.tri(p.value.matrix)] <- p.value.vec
# mirror the upper triangle into the lower one; assigning p.value.vec to
# lower.tri() directly fills it in a different (column-major) order and
# would break the symmetry
p.value.matrix[lower.tri(p.value.matrix)] <- t(p.value.matrix)[lower.tri(p.value.matrix)]
p.value.matrix
             [,1]         [,2]         [,3]         [,4]         [,5]         [,6]
[1,] 0.000000e+00 1.491720e-02 9.450437e-07 1.028523e-03 1.304590e-06 2.588308e-05
[2,] 1.491720e-02 0.000000e+00 3.658617e-07 3.585238e-03 5.204434e-03 4.453814e-01
[3,] 9.450437e-07 3.658617e-07 0.000000e+00 9.951515e-08 6.844724e-01 3.018078e-01
[4,] 1.028523e-03 3.585238e-03 9.951515e-08 0.000000e+00 4.811397e-08 5.065456e-01
[5,] 1.304590e-06 5.204434e-03 6.844724e-01 4.811397e-08 0.000000e+00 2.380297e-01
[6,] 2.588308e-05 4.453814e-01 3.018078e-01 5.065456e-01 2.380297e-01 0.000000e+00
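If you want the p-value matrix labeled like the correlation matrix itself, copy the variable names over (a small addition using the objects above):
# label rows and columns with the original variable names
dimnames(p.value.matrix) <- list(colnames(data), colnames(data))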
I have a square matrix as follows.
> ex_mat
[,1] [,2] [,3] [,4]
[1,] 0.4270634 2.1920890 0.5647472 1.7149861
[2,] 2.0556220 1.1157322 2.6723637 0.3155507
[3,] 1.2252602 0.1063053 0.6396099 0.7903348
[4,] 0.3614062 1.1118661 0.5000143 0.2491543
I've ranked the upper off-diagonal part of the matrix in row order (from largest to smallest) with this.
> rank(-(t(ex_mat)[lower.tri(ex_mat)]))
[1] 2 5 3 1 6 4
I want to replace the upper off-diagonal elements of "ex_mat" with the ranks obtained above. I'm using https://statisticsglobe.com/modify-diagonal-lower-upper-triangular-part-matrix-r as example code. While my ranks are correct, the code seems to be inserting the ranks in column order.
> ex_mat_new <- ex_mat
> ex_mat_new[upper.tri(ex_mat_new)] <- rank(-(t(ex_mat)[lower.tri(ex_mat)]))
> ex_mat_new
[,1] [,2] [,3] [,4]
[1,] 0.4270634 2.0000000 5.0000000 1.0000000
[2,] 2.0556220 1.1157322 3.0000000 6.0000000
[3,] 1.2252602 0.1063053 0.6396099 4.0000000
[4,] 0.3614062 1.1118661 0.5000143 0.2491543
How can I fix this? [1,4] and [2,3] are off. Thank you.
Value insertion into a matrix (or data.frame) happens in column order. To insert in row order, we can assign the ranks to the lower triangle, transpose the matrix (which puts them into the upper triangle in row order), and then restore the original lower triangle:
ex_mat_new[lower.tri(ex_mat_new)] <- rank(-(t(ex_mat)[lower.tri(ex_mat)]))
ex_mat_new <- t(ex_mat_new)
ex_mat_new[lower.tri(ex_mat_new)] <- ex_mat[lower.tri(ex_mat)]
-output
> ex_mat_new
[,1] [,2] [,3] [,4]
[1,] 0.4270634 2.0000000 5.0000000 3.0000000
[2,] 2.0556220 1.1157322 1.0000000 6.0000000
[3,] 1.2252602 0.1063053 0.6396099 4.0000000
[4,] 0.3614062 1.1118661 0.5000143 0.2491543
Or this can be done in a single line with replace:
t(replace(t(ex_mat), lower.tri(ex_mat), rank(-(t(ex_mat)[lower.tri(ex_mat)]))))
-output
[,1] [,2] [,3] [,4]
[1,] 0.4270634 2.0000000 5.0000000 3.0000000
[2,] 2.0556220 1.1157322 1.0000000 6.0000000
[3,] 1.2252602 0.1063053 0.6396099 4.0000000
[4,] 0.3614062 1.1118661 0.5000143 0.2491543
data
ex_mat <- structure(c(0.4270634, 2.055622, 1.2252602, 0.3614062, 2.192089,
1.1157322, 0.1063053, 1.1118661, 0.5647472, 2.6723637, 0.6396099,
0.5000143, 1.7149861, 0.3155507, 0.7903348, 0.2491543), .Dim = c(4L,
4L), .Dimnames = list(NULL, NULL))
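As a quick sanity check, reading the upper triangle of the result back in row order (the same t()/lower.tri() trick from the question) reproduces the ranks:
# upper triangle of the fixed matrix, read row by row
t(ex_mat_new)[lower.tri(ex_mat_new)]
[1] 2 5 3 1 6 4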
I have 3 vectors as such:
p
[,1]
[1,] 0.002715955
[2,] 0.004460214
[3,] 0.006855524
[4,] 0.007438570
[5,] 0.042657555
q
[,1]
[1,] 0.9972840
[2,] 0.9955398
[3,] 0.9931445
[4,] 0.9925614
[5,] 0.9573424
f
[,1]
[1,] 0.008364157
[2,] 0.013058930
[3,] 0.019131397
[4,] 0.020559785
[5,] 0.090786054
I want to perform the following operation to obtain the variance-covariance matrix (which fills only the upper triangle of the matrix):
v_ij = p_i * q_j / (n * f_i * f_j), for 1 <= i <= j <= n
where n is the number of observations (here 5). So I wrote the following code to perform the calculation:
v_ij <- matrix(nrow = length(p), ncol = length(p))
for (i in 1:nrow(p)) {
  for (j in 1:nrow(q)) {
    if (i == j) {
      diag(v_ij) <- p[i] * q[j] / (f[i] * f[j])
    }
  }
}
This code was only able to fill the diagonal, but I didn't know which function could fill the upper part of the matrix for i < j.
Maybe you can use tcrossprod + lower.tri, e.g.,
v <- tcrossprod(p,q)/tcrossprod(f)/5
v[lower.tri(v)] <- 0
such that
> v
[,1] [,2] [,3] [,4] [,5]
[1,] 0.04444444 0.050 0.05333333 0.05555556 0.05714286
[2,] 0.00000000 0.075 0.08000000 0.08333333 0.08571429
[3,] 0.00000000 0.000 0.09600000 0.10000000 0.10285714
[4,] 0.00000000 0.000 0.00000000 0.11111111 0.11428571
[5,] 0.00000000 0.000 0.00000000 0.00000000 0.12244898
DATA
p <- matrix(1:5,nrow = 5,ncol = 1)
q <- matrix(2:6,nrow = 5,ncol = 1)
f <- matrix(3:7,nrow = 5,ncol = 1)
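For reference, tcrossprod(p, q) returns the matrix with entries p_i*q_j and tcrossprod(f) the one with entries f_i*f_j, so the one-liner computes exactly v_ij = p_i*q_j/(n*f_i*f_j) with n hard-coded as 5. A variant that derives n from the data instead (assuming p, q, f as above):
n <- nrow(p)                               # number of observations
v <- tcrossprod(p, q) / tcrossprod(f) / n  # v_ij = p_i*q_j/(n*f_i*f_j)
v[lower.tri(v)] <- 0                       # zero out below the diagonal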
You can use upper.tri() as suggested to produce the upper part if that is all you need. The cov() function produces the full symmetric matrix:
X <- matrix(rnorm(15), ncol=3)
vcov <- cov(X)
If you run
vcov[upper.tri(vcov, diag=TRUE)]
you only extract those elements, returned as a plain vector, whereas
vcov*upper.tri(vcov, diag=TRUE)
gives you what you want: upper.tri() returns a logical matrix, and multiplying by it zeroes out everything below the diagonal while keeping the matrix shape.
> X <- matrix(rnorm(15), ncol=3)
> vcov <- cov(X)
> vcov[upper.tri(vcov, diag=TRUE)]
[1] 0.2698385 0.2844101 0.9375529 0.1457635 0.0223076 0.1626811
> vcov*upper.tri(vcov, diag=TRUE)
[,1] [,2] [,3]
[1,] 0.2698385 0.2844101 0.1457635
[2,] 0.0000000 0.9375529 0.0223076
[3,] 0.0000000 0.0000000 0.1626811
I need to create an accumulation index across columns in my data. I set up the problem as follows
#accumulation function
mat1 <- matrix(nrow=16, ncol =4)
mat1[1,] <- c(1,1,1,1)
mat1[2:16,] <- 1+rnorm(60,0,0.1)
[,1] [,2] [,3] [,4]
[1,] 1.0000000 1.0000000 1.0000000 1.0000000
[2,] 0.9120755 0.9345682 0.8533162 0.8737582
[3,] 0.7838427 0.9691806 0.8216284 0.9863669
[4,] 0.9095204 1.1906031 1.0253083 1.0700338
[5,] 1.0202524 0.9974672 1.1348315 1.1115018
[6,] 0.9456184 1.1250529 1.0348011 0.9323336
[7,] 1.0053195 0.9917475 1.0178855 1.0880626
[8,] 0.9550709 0.9107060 0.8876688 0.9060996
[9,] 1.0728177 1.0559643 0.9161789 0.9711522
[10,] 0.9579642 1.0082560 0.9833227 0.9306639
[11,] 1.0044883 1.1323498 1.0388025 0.8926033
[12,] 0.8777846 0.9940302 0.8314166 0.8479962
[13,] 1.1042297 0.9767410 0.9355374 0.8859680
[14,] 1.1245737 0.8291948 1.0491585 0.9887672
[15,] 0.9687700 0.9915095 0.8962534 1.0220163
[16,] 0.9432597 1.0310273 0.9288159 1.0838243
The desired output takes the product of the entries in each column, up to and including each row number.
Therefore:
mat2 <- matrix(nrow=16, ncol=4)
mat2[1,] <- c(1,1,1,1)
mat2[2,] <- mat1[1,]*mat1[2,]
mat2[3,] <-mat1[1,]*mat1[2,]*mat1[3,]
mat2[4,] <-mat1[1,]*mat1[2,]*mat1[3,]*mat1[4,]
and so on, up to row 16. The idea is to accumulate (take the product of) all entries in mat1 up to a given row: row 1 of mat2 equals row 1 of mat1; row 2 of mat2 equals row 1 * row 2 of mat1; row 3 of mat2 equals row 1 * row 2 * row 3 of mat1; and this continues through row 16.
I need to write a function that can do this calculation for every matrix in a list of same-sized matrices.
Basically, what you need is the cumulative product down each column, which can be computed with base R's cumprod applied column-wise:
apply(mat1, 2, cumprod)
# [,1] [,2] [,3] [,4]
# [1,] 1.0000 1.0000 1.0000 1.0000
# [2,] 0.8793 0.9890 1.1102 0.9031
# [3,] 0.9037 0.9384 1.0574 0.8031
# [4,] 1.0017 0.8529 0.9824 0.7026
# [5,] 0.7667 0.7815 0.9332 0.6658
# [6,] 0.7996 0.9703 0.7811 0.6327
# [7,] 0.8401 0.9833 0.6899 0.5184
# [8,] 0.7918 0.9351 0.5395 0.4883
# [9,] 0.7485 0.8939 0.4672 0.4341
#[10,] 0.7063 0.9350 0.4534 0.3901
#[11,] 0.6434 0.8701 0.4323 0.3837
#[12,] 0.6127 0.7441 0.4950 0.4053
#[13,] 0.5515 0.7869 0.4421 0.4721
#[14,] 0.5087 0.7063 0.4043 0.4356
#[15,] 0.5120 0.7052 0.3929 0.5056
#[16,] 0.5611 0.6392 0.3538 0.4470
data
set.seed(1234)
mat1 <- matrix(nrow=16, ncol =4)
mat1[1,] <- c(1,1,1,1)
mat1[2:16,] <- 1+rnorm(60,0,0.1)
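For the last part of the question (the same calculation for every matrix in a list), the call can be mapped over the list with lapply (a sketch; mat_list is a hypothetical stand-in for the real list):
# mat_list stands in for the list of same-sized matrices
mat_list <- list(mat1, mat1)
cum_list <- lapply(mat_list, function(m) apply(m, 2, cumprod))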
We can make use of colCumprods from matrixStats, which would be efficient. (Note that rowCumprods accumulates across each row; the question asks for accumulation down each column, so colCumprods is the right variant.)
library(matrixStats)
colCumprods(mat1)
With the same seed, this returns the same matrix as apply(mat1, 2, cumprod) above.
data
set.seed(1234)
mat1 <- matrix(nrow=16, ncol =4)
mat1[1,] <- c(1,1,1,1)
mat1[2:16,] <- 1+rnorm(60,0,0.1)
In the matrix example below (stock returns):
IBOV PETR4 VALE5 ITUB4 BBDC4 PETR3
[1,] -0.03981646 -0.027412907 -0.051282051 -0.05208333 -0.047300526 -0.059805285
[2,] -0.03000415 -0.030534351 -0.046332046 -0.03943116 -0.030090271 -0.010355030
[3,] -0.02241318 -0.026650515 0.000000000 -0.04912517 -0.077559462 0.005231689
[4,] -0.05584830 -0.072184194 -0.066126856 -0.04317056 -0.066704036 0.000000000
[5,] 0.01196833 -0.004694836 0.036127168 -0.00591716 -0.006006006 Inf
[6,] 0.02039587 0.039083558 0.009762901 0.01488095 0.024169184 0.011783189
I would like to replace the 0 (zero) and Inf values with the value from the first column of the same row.
Here's a sample matrix
set.seed(15)
stocks<-matrix(rnorm(3*5), nrow=3)
stocks[cbind(c(2,3,1),c(4,4,2))] <- 0
stocks[2,2] <- Inf
stocks
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.2588229 0.000000 0.0227882 -1.075001 0.1655543
# [2,] 1.8311207 Inf 1.0907732 0.000000 -1.2427850
# [3,] -0.3396186 -1.255386 -0.1321224 0.000000 1.45928777
Now we can find the bad values and replace each with the value from the first column of its row, using matrix indexing and the row() function to locate the correct row:
bad <- stocks==0 | is.infinite(stocks)
stocks[bad] <- stocks[row(bad)[bad], 1]
stocks
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.2588229 0.2588229 0.0227882 -1.0750013 0.1655543
# [2,] 1.8311207 1.8311207 1.0907732 1.8311207 -1.2427850
# [3,] -0.3396186 -1.2553858 -0.1321224 -0.3396186 1.4592877
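The same two lines work unchanged on the returns matrix from the question; since IBOV is column 1, each 0 or Inf is replaced by the IBOV return of the same row (a sketch, with ret as a hypothetical name for that matrix):
# `ret` stands in for the asker's returns matrix (IBOV in column 1)
bad <- ret == 0 | is.infinite(ret)
ret[bad] <- ret[row(bad)[bad], 1]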
Is it possible to obtain the actual observations within each cluster after performing k-means in R?
For example, if my analysis yields 2 clusters, can I retrieve the exact observations that belong to each one?
# random samples
x <- matrix(c(rnorm(30,10,2), rnorm(30,0,1)), nrow=12, byrow=T)
# clustering
clusters <- kmeans(x, 2)
# accessing cluster membership
clusters$cluster
[1] 1 1 1 1 1 1 2 2 2 2 2 2
# samples within cluster 1
c1 <- x[which(clusters$cluster == 1),]
# samples within cluster 2
c2 <- x[which(clusters$cluster == 2),]
# printing variables
x
[,1] [,2] [,3] [,4] [,5]
[1,] 10.8415151 9.3075438 9.443433171 13.5402818 7.0574904
[2,] 6.0721775 7.4570368 9.999411972 12.8186182 6.1697638
[3,] 11.3170525 10.9458832 7.576416396 12.7177707 6.7104535
[4,] 8.1377999 8.0558304 9.925363089 11.6547736 9.4911071
[5,] 11.6078294 8.7782984 8.619840508 12.2816048 9.4460169
[6,] 10.2972477 9.1498916 11.769122361 7.6224395 12.0658246
[7,] -0.9373027 -0.5051318 -0.530429758 -0.8200562 -0.0623147
[8,] -0.7257655 -1.1469400 -0.297539831 -0.0477345 -1.0278240
[9,] 0.7285393 -0.6621878 2.914976054 0.6390049 -0.5032553
[10,] 0.2672737 -0.6393167 -0.198287317 0.1430110 -2.2213365
[11,] -0.8679649 0.3354149 -0.003510304 0.6665495 0.6664689
[12,] 0.1731384 -1.8827645 0.270357961 0.3944154 1.3564678
c1
[,1] [,2] [,3] [,4] [,5]
[1,] 10.841515 9.307544 9.443433 13.540282 7.057490
[2,] 6.072177 7.457037 9.999412 12.818618 6.169764
[3,] 11.317053 10.945883 7.576416 12.717771 6.710454
[4,] 8.137800 8.055830 9.925363 11.654774 9.491107
[5,] 11.607829 8.778298 8.619841 12.281605 9.446017
[6,] 10.297248 9.149892 11.769122 7.622439 12.065825
c2
[,1] [,2] [,3] [,4] [,5]
[1,] -0.9373027 -0.5051318 -0.530429758 -0.8200562 -0.0623147
[2,] -0.7257655 -1.1469400 -0.297539831 -0.0477345 -1.0278240
[3,] 0.7285393 -0.6621878 2.914976054 0.6390049 -0.5032553
[4,] 0.2672737 -0.6393167 -0.198287317 0.1430110 -2.2213365
[5,] -0.8679649 0.3354149 -0.003510304 0.6665495 0.6664689
[6,] 0.1731384 -1.8827645 0.270357961 0.3944154 1.3564678
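If you'd rather collect all clusters in one step, split() partitions the rows by cluster label and returns a list (a sketch, reusing the x and clusters objects above):
# one list element per cluster, named by the cluster label
groups <- split(as.data.frame(x), clusters$cluster)
groups[["1"]]  # the same rows as c1 above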