Using the cor.test function in R

If x is an n*m matrix, cor(x) returns the m*m matrix of correlations between each pair of columns.
How can I use the cor.test function on the n*m matrix to also get an m*m matrix of p-values?

There may be an existing function, but here's my version. p_cor_mat runs cor.test on each pair of columns in matrix x and records the p-value. These are then put into a square matrix and returned.
# Set seed
set.seed(42)
# Matrix of data
x <- matrix(runif(120), ncol = 4)
# Function for creating p value matrix
p_cor_mat <- function(x){
  # All combinations of columns
  colcom <- t(combn(1:ncol(x), 2))
  # Calculate p values
  p_vals <- apply(colcom, MARGIN = 1, function(i) cor.test(x[,i[1]], x[,i[2]])$p.value)
  # Create matrix for result (the diagonal is set to 1)
  p_mat <- diag(ncol(x))
  # Fill upper & lower triangles
  p_mat[colcom] <- p_mat[colcom[,2:1]] <- p_vals
  # Return result
  p_mat
}
# Test function
p_cor_mat(x)
#> [,1] [,2] [,3] [,4]
#> [1,] 1.0000000 0.4495713 0.9071164 0.8462530
#> [2,] 0.4495713 1.0000000 0.5960786 0.7093539
#> [3,] 0.9071164 0.5960786 1.0000000 0.7466226
#> [4,] 0.8462530 0.7093539 0.7466226 1.0000000
Created on 2019-03-06 by the reprex package (v0.2.1)
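As a cross-check on the approach above, here is a minimal sketch that gets the same two-sided Pearson p-values in one vectorised step from the t-statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The name p_from_cor is hypothetical (not from any package), and the sketch assumes complete data with no NA values and the default Pearson method.

```r
# Hypothetical helper: p-value matrix for Pearson correlations, no loop.
# Assumes x is a numeric matrix without missing values.
p_from_cor <- function(x) {
  n <- nrow(x)
  r <- cor(x)
  diag(r) <- 0                      # avoid dividing by 1 - 1^2 on the diagonal
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  p <- 2 * pt(-abs(t_stat), df = n - 2)
  diag(p) <- 1                      # same convention as p_cor_mat
  p
}

set.seed(42)
x <- matrix(runif(120), ncol = 4)
p <- p_from_cor(x)
# Compare one entry with cor.test directly
all.equal(p[1, 2], cor.test(x[, 1], x[, 2])$p.value)
```

This only covers the default two-sided Pearson test; for Spearman or Kendall, or for one-sided alternatives, calling cor.test per pair remains the safer route.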

Please also see the cor.mtest() function in the corrplot package.
https://www.rdocumentation.org/packages/corrplot/versions/0.92/topics/cor.mtest

Related

When I concatenate in R am I creating a row or a column?

I concatenate the following:
ExampleConCat <- c(1, 1, 1, 0) and I have a 20x4 matrix (MatrixExample as below).
I can do matrix multiplication in Rstudio as below:
matrix.multipl <- MatrixExample %*% ExampleConCat
I get the below results:
# [,1]
# cycle_1 0.99019608
# cycle_2 0.96400149
# cycle_3 0.91064055
# cycle_4 0.83460040
# cycle_5 0.74478532
# cycle_6 0.64981877
# cycle_7 0.55637987
# cycle_8 0.46893791
# cycle_9 0.39005264
# cycle_10 0.32083829
# cycle_11 0.26141338
# cycle_12 0.21127026
# cycle_13 0.16955189
# cycle_14 0.13524509
# cycle_15 0.10730721
# cycle_16 0.08474320
# cycle_17 0.06664783
# cycle_18 0.05222437
# cycle_19 0.04078855
# cycle_20 0.03176356
My understanding is that:
To multiply an m×n matrix by an n×p matrix, the ns must be the same, and the result is an m×p matrix. https://www.mathsisfun.com/algebra/matrix-multiplying.html
So, the fact that it calculates at all indicates to me that concatenating as above creates a column, i.e. MatrixExample is a 20x4 matrix, thus ExampleConCat must be a 4x1 vector in order for these two to be multiplied by each other.
Or, are there different rules when one multiplies a vector by a matrix, and could you explain those to me simply?
I noticed that when I tried
matrix.multipl <- ExampleConCat %*% MatrixExample
I get the following:
Error in ExampleConCat %*% MatrixExample : non-conformable arguments
I would appreciate an explanation which reflects that I am new to R and newer still to matrix multiplication.
# MatrixExample:
# State A State B State C State D
# cycle_1 0.721453287 0.201845444 0.06689735 0.009803922
# cycle_2 0.520494846 0.262910628 0.18059602 0.035998510
# cycle_3 0.375512717 0.257831905 0.27729592 0.089359455
# cycle_4 0.270914884 0.225616773 0.33806874 0.165399604
# cycle_5 0.195452434 0.185784574 0.36354831 0.255214678
# cycle_6 0.141009801 0.147407084 0.36140189 0.350181229
# cycle_7 0.101731984 0.114117654 0.34053023 0.443620127
# cycle_8 0.073394875 0.086845747 0.30869729 0.531062087
# cycle_9 0.052950973 0.065278842 0.27182282 0.609947364
# cycle_10 0.038201654 0.048620213 0.23401643 0.679161707
# cycle_11 0.027560709 0.035963116 0.19788955 0.738586622
# cycle_12 0.019883764 0.026460490 0.16492601 0.788729740
# cycle_13 0.014345207 0.019389137 0.13581754 0.830448113
# cycle_14 0.010349397 0.014162175 0.11073351 0.864754914
# cycle_15 0.007466606 0.010318351 0.08952225 0.892692795
# cycle_16 0.005386808 0.007502899 0.07185350 0.915256795
# cycle_17 0.003886330 0.005447095 0.05731440 0.933352173
# cycle_18 0.002803806 0.003949642 0.04547092 0.947775632
# cycle_19 0.002022815 0.002860998 0.03590474 0.959211445
# cycle_20 0.001459366 0.002070768 0.02823342 0.968236444
If you check the help page, help("%*%"), it briefly describes how the rules of matrix multiplication are applied to vectors.
Multiplies two matrices, if they are conformable. If one argument is a vector, it will be promoted to either a row or column matrix to make the two arguments conformable. If both are vectors of the same length, it will return the inner product (as a matrix).
Doing MatrixExample %*% ExampleConCat conforms to those rules, as you rightly pointed out: ExampleConCat is treated as a 4 by 1 matrix. But for ExampleConCat %*% MatrixExample the dimensions cannot match, because ExampleConCat can only be promoted to 4*1 or 1*4, whereas MatrixExample is 20*4 (a vector on the left would need length 20).
The vector is converted to either a row or a column matrix, whichever makes the multiplication conformable. As an example, please see below:
exm = c(1,1,1,0)
exm_matrix = matrix(rnorm(16),
ncol=4)
exm_matrix%*%exm
#> [,1]
#> [1,] 2.1098758
#> [2,] -1.4432619
#> [3,] -0.2540392
#> [4,] -0.4211889
exm%*%exm_matrix
#> [,1] [,2] [,3] [,4]
#> [1,] 1.161164 -0.3602107 -0.3883783 -1.580562
Created on 2021-07-02 by the reprex package (v0.3.0)
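Because the example above uses a square 4 by 4 matrix, both orders happen to work. A small sketch with a non-square matrix may make the rule easier to see; the names m, v and w are made up for illustration, with m standing in for MatrixExample.

```r
m <- matrix(1:8, nrow = 2)       # 2 x 4 stand-in for the 20 x 4 MatrixExample
v <- c(1, 1, 1, 0)               # length 4, like ExampleConCat

dim(v)                           # NULL: c() creates a plain vector, neither row nor column
dim(m %*% v)                     # v is promoted to 4 x 1, so the result is 2 x 1

# On the left, v could only become 1 x 4 or 4 x 1; neither fits a 2 x 4 matrix
res <- try(v %*% m, silent = TRUE)
inherits(res, "try-error")       # TRUE: non-conformable arguments

# A vector whose length equals the number of rows works on the left
w <- c(1, 0)
dim(w %*% m)                     # w is promoted to 1 x 2, so the result is 1 x 4
```

So c() creates neither a row nor a column; %*% decides how to treat the vector each time it is used.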

Is there a way to generate a matrix in R with at least some negative eigenvalues?

I want to generate a matrix with at least some negative eigenvalues. I am attempting to use the spectral decomposition of a matrix to do so, but it does not guarantee at least one negative eigenvalue.
Here is a simple example that may help you construct this kind of matrix:
library(pracma)
N <- 3
U <- randortho(N, type = "orthonormal")
A <- diag(sample(c(-runif(1),rnorm(N-1)))) # ensure at least one negative eigenvalue
M <- U %*% A %*% t(U)
then
> M
[,1] [,2] [,3]
[1,] -0.36818879 0.02406988 0.1634275
[2,] 0.02406988 -0.72613068 -0.1872272
[3,] 0.16342748 -0.18722722 -0.3116400
To double check the eigenvalues
> eig(M)
[1] -0.1432527 -0.4484647 -0.8142421
and
> A
[,1] [,2] [,3]
[1,] -0.1432527 0.0000000 0.0000000
[2,] 0.0000000 -0.4484647 0.0000000
[3,] 0.0000000 0.0000000 -0.8142421
What if you generate a random positive definite matrix and multiply it by -1 ... ?
In this example I'll create a lower-triangular matrix with positive diagonal and multiply it by its transpose (there are lots of other ways):
set.seed(101)
m <- matrix(0,5,5)
m[lower.tri(m,diag=TRUE)] <- rnorm(15)
diag(m) <- abs(diag(m))
m2 <- m %*% t(m)
(If you only want it semidefinite you just need to make sure the diagonal is non-negative ...)
Is it pos def?
v1 <- eigen(m2)$values
## [1] 5.976142640 1.908831945 0.904991040 0.037025982 0.002181558
all(v1>0) ## TRUE
Check that evals(-m2) == -evals(m2) ...
v2 <- eigen(-m2)$values
all(v2<0) ## TRUE
all.equal(sort(v1),-v2) ## TRUE
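If you want a mix of negative and positive eigenvalues without an extra package, one possible sketch in base R is to take an orthogonal matrix from a QR decomposition of a random matrix and prescribe the eigenvalues directly. The values in lambda are arbitrary choices for illustration.

```r
set.seed(1)
N <- 4
# Q factor of a random square matrix: an orthogonal matrix
Q <- qr.Q(qr(matrix(rnorm(N * N), N, N)))
# Prescribed eigenvalues: two negative, two positive (arbitrary values)
lambda <- c(-2, -0.5, 1, 3)
M <- Q %*% diag(lambda) %*% t(Q)
M <- (M + t(M)) / 2              # remove tiny rounding asymmetry
eigen(M, symmetric = TRUE)$values
```

The eigenvalues of M are the entries of lambda up to rounding, so you control exactly how many are negative.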

Generate multivariate normal r.v.'s with rank-deficient covariance via Pivoted Cholesky Factorization

I'm just beating my head against the wall trying to get a Cholesky decomposition to work in order to simulate correlated price movements.
I use the following code:
cormat <- as.matrix(read.csv("http://pastebin.com/raw/qGbkfiyA"))
cormat <- cormat[,2:ncol(cormat)]
rownames(cormat) <- colnames(cormat)
cormat <- apply(cormat,c(1,2),FUN = function(x) as.numeric(x))
chol(cormat)
#Error in chol.default(cormat) :
# the leading minor of order 8 is not positive definite
cholmat <- chol(cormat, pivot=TRUE)
#Warning message:
# In chol.default(cormat, pivot = TRUE) :
# the matrix is either rank-deficient or indefinite
rands <- array(rnorm(ncol(cholmat)), dim = c(10000,ncol(cholmat)))
V <- t(t(cholmat) %*% t(rands))
#Check for similarity
cor(V) - cormat ## Not all zeros!
#Check the standard deviations
apply(V,2,sd) ## Not all ones!
I'm not really sure how to properly use the pivot = TRUE statement to generate my correlated movements. The results look totally bogus.
Even if I have a simple matrix and I try out "pivot" then I get bogus results...
cormat <- matrix(c(1,.95,.90,.95,1,.93,.90,.93,1), ncol=3)
cholmat <- chol(cormat)
# No Error
cholmat2 <- chol(cormat, pivot=TRUE)
# No warning... pivot changes column order
rands <- array(rnorm(ncol(cholmat)), dim = c(10000,ncol(cholmat)))
V <- t(t(cholmat2) %*% t(rands))
#Check for similarity
cor(V) - cormat ## Not all zeros!
#Check the standard deviations
apply(V,2,sd) ## Not all ones!
There are two errors with your code:
You did not use the pivoting index to revert the pivoting done to the Cholesky factor. Note that pivoted Cholesky factorization of a positive semi-definite matrix A computes:
P'AP = R'R
where P is a column pivoting matrix, and R is an upper triangular matrix. To recover A from R, we need to apply the inverse of P (i.e., P'):
A = PR'RP' = (RP')'(RP')
Multivariate normal with covariance matrix A, is generated by:
XRP'
where X is multivariate normal with zero mean and identity covariance.
Your generation of X
X <- array(rnorm(ncol(R)), dim = c(10000,ncol(R)))
is wrong. First, the number of columns should not be ncol(R) but the number of rows of the rank-truncated factor, i.e., the numerical rank, denoted by r. Second, rnorm(ncol(R)) generates only ncol(R) values, which array() then recycles to fill the whole matrix, so the resulting matrix is not random at all. Therefore, cor(X) is never close to an identity matrix. The correct code is:
X <- matrix(rnorm(10000 * r), 10000, r)
As a model implementation of the above theory, consider your toy example:
A <- matrix(c(1,.95,.90,.95,1,.93,.90,.93,1), ncol=3)
We compute the upper triangular factor (suppressing possible rank-deficient warnings) and extract inverse pivoting index and rank:
R <- suppressWarnings(chol(A, pivot = TRUE))
piv <- order(attr(R, "pivot")) ## reverse pivoting index
r <- attr(R, "rank") ## numerical rank
Then we generate X. For a better result we centre X so that its column means are 0.
X <- matrix(rnorm(10000 * r), 10000, r)
## for best effect, we centre `X`
X <- sweep(X, 2L, colMeans(X), "-")
Then we generate target multivariate normal:
## compute `V = RP'`
V <- R[1:r, piv]
## compute `Y = X %*% V`
Y <- X %*% V
We can verify that Y has target covariance A:
cor(Y)
# [,1] [,2] [,3]
#[1,] 1.0000000 0.9509181 0.9009645
#[2,] 0.9509181 1.0000000 0.9299037
#[3,] 0.9009645 0.9299037 1.0000000
A
# [,1] [,2] [,3]
#[1,] 1.00 0.95 0.90
#[2,] 0.95 1.00 0.93
#[3,] 0.90 0.93 1.00
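The toy matrix above has full rank, so here is a sketch of the same steps wrapped into functions and applied to a covariance matrix that really is rank-deficient. The names pivchol_factor, rmvn_pivchol, B and A_def are hypothetical and only serve this illustration.

```r
# Hypothetical helper: r x n factor V with t(V) %*% V equal to A (up to rounding)
pivchol_factor <- function(A) {
  R <- suppressWarnings(chol(A, pivot = TRUE))
  r <- attr(R, "rank")               # numerical rank
  piv <- order(attr(R, "pivot"))     # reverse pivoting index
  R[seq_len(r), piv, drop = FALSE]   # keep the first r rows, undo the pivoting
}

# Hypothetical helper: n draws from a zero-mean normal with covariance A
rmvn_pivchol <- function(n, A) {
  V <- pivchol_factor(A)
  X <- matrix(rnorm(n * nrow(V)), n, nrow(V))
  X %*% V
}

# A 3 x 3 covariance matrix of rank 2 (third variable = sum of the first two)
B <- matrix(c(1, 0, 1, 0, 1, 1), nrow = 3)
A_def <- B %*% t(B)

set.seed(1)
Y <- rmvn_pivchol(10000, A_def)
round(cov(Y), 2)                     # should be close to A_def
A_def
```

Only r = 2 independent normal columns are drawn here, which is the point of truncating the factor to its first r rows.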

Generation of random variables

I have a problem with generating random variables in R.
I have to generate random variables $X_{ij}$ ($i=1,\dots,25$, $j=1,\dots,5$), knowing that each $X_{ij}$ follows a binomial distribution,
$X_{ij} \sim Bin(n_{ij}, p_{ij})$,
and I already know $n_{ij}$ and $p_{ij}$ for each index. How can I generate these random variables?
I don't know if it is useful, but I have generated the $p_{ij}$ knowing that they are also random variables which follow a beta distribution (hence $X_{ij}$ actually follows a beta-binomial distribution).
Let's say you had the following matrices for n and p:
(n <- matrix(4:7, nrow=2))
# [,1] [,2]
# [1,] 4 6
# [2,] 5 7
set.seed(144)
(p <- matrix(rbeta(4, 1, 2), nrow=2))
# [,1] [,2]
# [1,] 0.1582904 0.2794913
# [2,] 0.5176909 0.2889718
Now you can draw samples X_{ij} with something like:
set.seed(144)
matrix(apply(cbind(as.vector(n), as.vector(p)), 1, function(x) rbinom(1, x[1], x[2])), nrow=2)
# [,1] [,2]
# [1,] 0 2
# [2,] 2 2
The cbind part of this expression builds a 2-column matrix containing each (n, p) pairing and the apply part draws a single binomially distributed sample for each (n, p) pair, with the matrix part converting the resulting vector to a matrix.
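Since rbinom is vectorised over its size and prob arguments, the same draw can probably be written without apply. This is a minimal sketch using the same n and p as above; X is just an illustrative name.

```r
n <- matrix(4:7, nrow = 2)
set.seed(144)
p <- matrix(rbeta(4, 1, 2), nrow = 2)

# One draw per (n, p) pair; matrices are read column by column,
# so reshaping with nrow(n) puts each draw back in its own cell
X <- matrix(rbinom(length(n), size = n, prob = p), nrow = nrow(n))
X
```

For the 25 by 5 case in the question, the same line works unchanged as long as n and p are matrices of the same dimensions.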

Pairwise Correlation Table

I'm new to R, so I apologize if this is a straightforward question, however I've done quite a bit of searching this evening and can't seem to figure it out. I've got a data frame with a whole slew of variables, and what I'd like to do is create a table of the correlations among a subset of these, basically the equivalent of "pwcorr" in Stata, or "correlations" in SPSS. The one key to this is that not only do I want the r, but I also want the significance associated with that value.
Any ideas? This seems like it should be very simple, but I can't seem to figure out a good way.
Bill Venables offers this solution in this answer from the R mailing list to which I've made some slight modifications:
cor.prob <- function(X, dfr = nrow(X) - 2) {
  R <- cor(X)
  above <- row(R) < col(R)
  r2 <- R[above]^2
  Fstat <- r2 * dfr / (1 - r2)
  R[above] <- 1 - pf(Fstat, 1, dfr)
  cor.mat <- t(R)
  cor.mat[upper.tri(cor.mat)] <- NA
  cor.mat
}
So let's test it out:
set.seed(123)
data <- matrix(rnorm(100), 20, 5)
cor.prob(data)
[,1] [,2] [,3] [,4] [,5]
[1,] 1.0000000 NA NA NA NA
[2,] 0.7005361 1.0000000 NA NA NA
[3,] 0.5990483 0.6816955 1.0000000 NA NA
[4,] 0.6098357 0.3287116 0.5325167 1.0000000 NA
[5,] 0.3364028 0.1121927 0.1329906 0.5962835 1
Does that line up with cor.test?
cor.test(data[,2], data[,3])
Pearson's product-moment correlation
data: data[, 2] and data[, 3]
t = 0.4169, df = 18, p-value = 0.6817
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3603246 0.5178982
sample estimates:
cor
0.09778865
Seems to work ok.
Here is something that I just made. I stumbled on this post because I was looking for a way to take every pair of variables and get a tidy data frame with one row per pair and four columns. Columns 1 and 2 are the two variables, and columns 3 and 4 are the absolute value of their correlation and the signed correlation. Just pass the function a data frame of numeric and integer values.
pairwiseCor <- function(dataframe){
  pairs <- combn(names(dataframe), 2, simplify=FALSE)
  df <- data.frame(Variable1=rep(0,length(pairs)), Variable2=rep(0,length(pairs)),
                   AbsCor=rep(0,length(pairs)), Cor=rep(0,length(pairs)))
  for(i in 1:length(pairs)){
    df[i,1] <- pairs[[i]][1]
    df[i,2] <- pairs[[i]][2]
    df[i,3] <- round(abs(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]])),4)
    df[i,4] <- round(cor(dataframe[,pairs[[i]][1]], dataframe[,pairs[[i]][2]]),4)
  }
  pairwiseCorDF <- df
  # Sort by absolute correlation, strongest first
  pairwiseCorDF <- pairwiseCorDF[order(pairwiseCorDF$AbsCor, decreasing=TRUE),]
  row.names(pairwiseCorDF) <- 1:length(pairs)
  # Also assigns the result to pairwiseCorDF in the global environment
  pairwiseCorDF <<- pairwiseCorDF
  pairwiseCorDF
}
This is what the output is:
> head(pairwiseCorDF)
Variable1 Variable2 AbsCor Cor
1 roll_belt accel_belt_z 0.9920 -0.9920
2 gyros_dumbbell_x gyros_dumbbell_z 0.9839 -0.9839
3 roll_belt total_accel_belt 0.9811 0.9811
4 total_accel_belt accel_belt_z 0.9752 -0.9752
5 pitch_belt accel_belt_x 0.9658 -0.9658
6 gyros_dumbbell_z gyros_forearm_z 0.9491 0.9491
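The table above has the correlations but not the significance asked for in the question. A possible sketch that adds a p-value column by calling cor.test on each pair follows; the function name pairwise_cor_test is hypothetical, and the built-in mtcars data is used only as an example.

```r
# Hypothetical helper: one row per pair of columns, with r and its p-value.
# Assumes df contains only numeric columns.
pairwise_cor_test <- function(df) {
  pairs <- combn(names(df), 2, simplify = FALSE)
  rows <- lapply(pairs, function(pr) {
    ct <- cor.test(df[[pr[1]]], df[[pr[2]]])
    data.frame(Variable1 = pr[1], Variable2 = pr[2],
               Cor = unname(ct$estimate), PValue = ct$p.value,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  # Strongest correlations first
  out <- out[order(abs(out$Cor), decreasing = TRUE), ]
  row.names(out) <- NULL
  out
}

pairwise_cor_test(mtcars[, c("mpg", "wt", "hp")])
```

Unlike pairwiseCor, this version returns its result without assigning anything in the global environment.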
I've found that the R package picante does a nice job dealing with the problem that you have. You can easily pass your dataset to the cor.table function and get a table of correlations and p-values for all of your variables. You can specify Pearson's r or Spearman in the function. See this link for help:
http://www.inside-r.org/packages/cran/picante/docs/cor.table
Also remember to remove any non-numeric columns from your dataset prior to running the function. Here's an example piece of code:
install.packages("picante")
library(picante)
#Insert the name of your dataset in the code below
cor.table(dataset, cor.method="pearson")
You can use the sjt.corr function of the sjPlot-package, which gives you a nicely formatted correlation table, ready for use in your Office application.
Simplest function call is just to pass the data frame:
sjt.corr(df)
See examples here.
