Spectral decomposition (eigen) - r

I have a problem calculating the spectral decomposition; I suspect it is with the ordering of the eigenvalues.
I would like to reproduce in R the calculation shown on this website: http://www.deltaquants.com/cleaning-correlation-matrices.html
Input <- data.frame(read.csv2(file="testmatrix.csv", header=FALSE, sep=";"))
# same matrix as the example on the website
Eigen <- eigen(Input, only.values=FALSE, symmetric = TRUE)
#Get the eigenvalues/eigenvectors
Eigen$values
Eigen$vectors
The result on the website (excel):
The result from eigen (R)
As a result, the new correlation matrix C is not correct.
Thanks for the help. I can provide further information, e.g. code or more details, if it helps.

If you want the eigenvalues of a matrix in increasing order, index both the eigenvalues and the eigenvectors with the output of the order function:
mat <- matrix(c(1, 2, 3, 2, 7, 4, 3, 4, 0), nrow=3)
e <- eigen(mat)
o <- order(e$values, decreasing=FALSE)
e$values[o]
# [1] -2.961797 1.056689 9.905108
e$vectors[,o]
# [,1] [,2] [,3]
# [1,] 0.5110650 0.7915817 -0.3349790
# [2,] 0.2299503 -0.5014262 -0.8340831
# [3,] -0.8282122 0.3492421 -0.4382859

The eigenvalues are in a different order. Both results are correct.
Note that R follows the usual convention of returning eigenvalues in decreasing order (by value for a symmetric matrix, by modulus otherwise); Excel does not. So if one answer is "wrong," it is Excel's.
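To see that the ordering does not affect the decomposition itself, here is a minimal sketch using the example matrix from the answer above. The helper name rebuild is made up for illustration; it reassembles V diag(lambda) V' from eigenvalues and eigenvectors that are kept in matching order.

```r
# Example symmetric matrix (same as in the answer above)
mat <- matrix(c(1, 2, 3, 2, 7, 4, 3, 4, 0), nrow = 3)
e <- eigen(mat, symmetric = TRUE)

# Hypothetical helper: rebuild the matrix from its spectral decomposition
rebuild <- function(values, vectors) {
  vectors %*% diag(values) %*% t(vectors)
}

# R's default order (decreasing) and the reversed order (increasing)
o <- order(e$values, decreasing = FALSE)
C_desc <- rebuild(e$values, e$vectors)
C_asc  <- rebuild(e$values[o], e$vectors[, o])

# Both orderings reproduce the original matrix up to rounding error
all.equal(C_desc, mat)
all.equal(C_asc, mat)
```

If a rebuilt matrix differs from the original, the likely cause is that the eigenvalues were reordered without reordering the eigenvector columns in the same way.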

Related

Using sapply instead of a for loop

Working on a project where we need to take the average of numbers in a matrix with those around it. For example, imagine a 3x3 matrix such as
[(1,2,3),
(4,5,6),
(7,8,9)].
Step 1 is to add padding around the matrix. Let's say we add 1 layer of padding, thus getting a 5x5 matrix
[[0,0,0,0,0],
[0,1,2,3,0],
[0,4,5,6,0],
[0,7,8,9,0],
[0,0,0,0,0]].
pad.m <- matrix(c(0,0,0,0,0,0,1,2,3,0,0,4,5,6,0,0,7,8,9,0,0,0,0,0,0), nrow=5, ncol=5, byrow=TRUE)
Then we average over each window to get the final 3x3 matrix. The entry in the first row and first column of this matrix should be (1+2+4+5)/9 = 1.33.
Right now my code works and looks like
for(row in (k+1):(nrow(pad.m) - k)){
  for(col in (k+1):(ncol(pad.m) - k)) {
    y <- pad.m[seq(row-k, row+k), seq(col-k, col+k)]
    filter.m[row-k, col-k] <- mean(y)
  }
}
where k is the number of layers of padding, pad.m is the padded matrix, and filter.m is the preallocated result matrix. Unfortunately my professor says that this is too unwieldy and prefers sapply over 2 for loops. I was wondering how I could subset and iterate through the matrix with sapply.
Use tensorflow. You can use either a convolutional layer or a pooling layer. Example (TensorFlow 1.x API):
library(tensorflow)
mymat <- matrix(c(0,0,0,0,0,0,1,2,3,0,0,4,5,6,0,0,7,8,9,0,0,0,0,0,0), nrow=5, ncol=5, byrow=T) # Your padded matrix
matrix1 <- tf$constant( array(mymat, dim=c(1,nrow(mymat),ncol(mymat),1)), dtype="float64" )
pool1 <- tf$nn$avg_pool(matrix1, c(1L,2L,2L,1L), c(1L,1L,1L,1L), "SAME")
sess <- tf$Session()
sess$run(tf$global_variables_initializer())
res <- pool1$eval(session=sess)
sess$close()
The above takes the average over 2x2 regions. Your example instead sums the four nonzero values and divides by 9, the size of the 3x3 window, so rescale by 4/9 and trim the border:
res <- res[1,,,]
(res * 4/9)[-1,][,-1][-(3:4),][,-(3:4)]
[,1] [,2]
[1,] 1.333333 1.777778
[2,] 2.666667 3.111111
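The indexing above just trims the array output back to a matrix. Note that it reproduces the 3x3 window mean only for the top-left entry, where the nonzero values fit inside a 2x2 block.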
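For the sapply form the question asks about, here is a minimal base R sketch. It assumes the 3x3 input and k = 1 from the question; the names m, rows and cols are introduced here for illustration. The outer sapply runs over columns and the inner one over rows, so the simplified result comes back as a matrix indexed [row, col].

```r
# Input matrix and number of padding layers (values taken from the question)
m <- matrix(1:9, nrow = 3, byrow = TRUE)
k <- 1

# Zero-padded copy of m
pad.m <- matrix(0, nrow(m) + 2 * k, ncol(m) + 2 * k)
pad.m[(k + 1):(k + nrow(m)), (k + 1):(k + ncol(m))] <- m

# Positions of the original cells inside the padded matrix
rows <- (k + 1):(nrow(pad.m) - k)
cols <- (k + 1):(ncol(pad.m) - k)

# Mean of the (2k+1) x (2k+1) window around every original cell
filter.m <- sapply(cols, function(col) {
  sapply(rows, function(row) {
    mean(pad.m[(row - k):(row + k), (col - k):(col + k)])
  })
})

filter.m
```

This does the same work as the two for loops; it is not faster, only more compact, and it removes the need to preallocate filter.m.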

Generate viable sampling distributions of discrete data in R

I'm trying to simulate 2 x 2 data that would yield a relatively strong negative phi coefficient.
I'm using the library GenOrd as follows:
library(GenOrd)
# Specify sample size N
N <- 40
# Marginal distribution
marginal <- list(c(.5), c(.5))
# Matrix
Sigma <- matrix(c(1.0, -.71, -.71, 1.0), 2, 2, byrow=TRUE)
# Generate a sample of the categorical variables with specified parameters
m <- ordsample(N, marginal, Sigma)
However, I get the following error whenever I input a correlation more negative than -.70.
Error in contord(list(marginal[[q]], marginal[[r]]), matrix(c(1, Sigma[q, :
Correlation matrix not valid!
I'm clearly specifying something untenable somewhere - but I don't know what it is.
Help appreciated.
I'll give a go at answering this as a coding question. The error points to where the package spots the problem: your Sigma entry. Given your marginal distribution, -.71 in your correlation matrix is out of bounds, and the package is warning you of this. You can see this by flipping the signs in your Sigma:
Sigma <- matrix(c(1.0, .71, .71, 1.0), 2, 2, byrow=TRUE)
m <- ordsample(N, marginal, Sigma)
> m
[,1] [,2]
[1,] 1 1
[2,] 1 2
....
As to WHY -.71 is not valid, you may want to direct that statistical question to Cross Validated for a succinct answer.
I'm not exactly sure why. However, I found no problems simulating 2 x 2 data that would yield a relatively strong negative correlation using the generate.binary() function from the MultiOrd package.
For example, the following code will work for the complete range of correlation inputs. The documentation for the generate.binary() function indicates that the matrix specified is interpreted as a tetrachoric correlation matrix.
library(MultiOrd)
# Specify sample size N
N <- 40
# Marginal distribution for two variables as a vector for MultiOrd rather than a list
marginal <- c(.5, .5)
# Correlation (tetrachoric) matrix as target for simulated relationship between variables
Sigma <- matrix(c(1.0, -.71, -.71, 1.0), 2, 2, byrow=TRUE)
# Generate a sample of the categorical variables with specified parameters
m <- generate.binary(N, marginal, Sigma)
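If the goal is simply binary pairs with a given phi coefficient, one package-free alternative is to sample the four cells of the 2 x 2 table directly. This is a sketch of a different technique, not what GenOrd or MultiOrd do internally. It relies on the identity phi = (p11 - p1*p2) / sqrt(p1*(1-p1)*p2*(1-p2)) for binary variables; the variable names are chosen here for illustration.

```r
# Target phi and marginal probabilities P(X = 1), P(Y = 1) (from the question)
phi_target <- -0.71
p1 <- 0.5
p2 <- 0.5
N <- 40

# Solve the phi identity for the joint probability P(X = 1, Y = 1)
p11 <- p1 * p2 + phi_target * sqrt(p1 * (1 - p1) * p2 * (1 - p2))

# Cell probabilities in the order (1,1), (1,0), (0,1), (0,0)
cell_probs <- c(p11, p1 - p11, p2 - p11, 1 - p1 - p2 + p11)

# A negative entry would mean the target phi is infeasible for these marginals
stopifnot(all(cell_probs >= 0))

# Draw N cells and decode them into the two binary variables
set.seed(42)
cells <- sample(1:4, N, replace = TRUE, prob = cell_probs)
x <- as.integer(cells %in% c(1, 2))
y <- as.integer(cells %in% c(1, 3))

# For binary data the Pearson correlation is the sample phi coefficient
cor(x, y)
table(x, y)
```

With only 40 observations the sample phi will scatter around the target rather than hit it exactly.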

create a random non-singular matrix reliably

How can I create a matrix of pseudo-random values that is guaranteed to be non-singular? I tried the code below, but it failed. I suppose I could just loop until I got one by chance but I would prefer a more elegant "R-like" solution if anyone has an idea.
library(matrixcalc)
exampledf<- matrix(ceiling(runif(16,0,50)), ncol=4)
is.singular.matrix(exampledf) #this may or may not return false
Using a loop:
library(matrixcalc)
# keep drawing until the matrix is non-singular
repeat {
  exampledf <- matrix(ceiling(runif(16,0,50)), ncol=4)
  if (!is.singular.matrix(exampledf)) break
}
One method that actually guarantees a non-singular matrix (rather than making it merely likely) is to start from a known non-singular matrix and apply the elementary row operations used in, for example, Gaussian elimination: 1. add or subtract a multiple of one row to or from another row, or 2. multiply a row by a nonzero constant.
Depending on how "random" and how dense you want your matrix to be, you can start from the identity matrix and multiply all elements by a random nonzero constant. Afterwards, apply a randomly selected set of the operations above; the result is still non-singular. You can even apply a predefined set of operations, using a randomly selected constant at each step.
An alternative is to start from an upper triangular matrix whose main diagonal entries are all nonzero. Because the determinant of a triangular matrix is the product of its main diagonal entries, this boils down to generating N random nonzero numbers, placing them on the main diagonal, and setting the entries above the main diagonal to whatever you like. If you want the matrix to be fully dense, add the first row to every other row of the matrix.
Of course, this approach (like probably any other) assumes that the matrix is reasonably well conditioned, so that limited floating-point precision does not make it numerically singular. You would do well to avoid very small or very large values, which can make the method numerically unstable.
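The triangular construction can be sketched in base R as follows. The size n, the value ranges and the names U and M are illustrative assumptions; the diagonal is drawn with absolute value between 1 and 5 so that it stays well away from zero.

```r
set.seed(1)
n <- 4

# Upper triangular matrix: arbitrary entries above the diagonal
U <- matrix(0, n, n)
U[upper.tri(U)] <- runif(n * (n - 1) / 2, -5, 5)

# Nonzero diagonal with random sign, so det(U) = prod(diag(U)) != 0
diag(U) <- sample(c(-1, 1), n, replace = TRUE) * runif(n, 1, 5)

# Add the first row to every other row to fill in the zeros.
# This elementary row operation leaves the determinant unchanged.
M <- U
M[-1, ] <- sweep(U[-1, , drop = FALSE], 2, U[1, ], "+")

M
det(M)
prod(diag(U))
```

The entries are not uniformly distributed after the row additions, so this suits cases where any invertible test matrix will do rather than cases that need a specific distribution.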
It should be fairly unlikely that this will produce a singular matrix:
Mat1 <- matrix(rnorm(100), ncol=4)
Mat2 <- matrix(rnorm(100), ncol=4)
crossprod(Mat1,Mat2)
[,1] [,2] [,3] [,4]
[1,] 0.8138 5.112 2.945 -5.003
[2,] 4.9755 -2.420 1.801 -4.188
[3,] -3.8579 8.791 -2.594 3.340
[4,] 7.2057 6.426 2.663 -1.235
solve( crossprod(Mat1,Mat2) )
[,1] [,2] [,3] [,4]
[1,] -0.11273 0.15811 0.05616 0.07241
[2,] 0.03387 0.01187 0.07626 0.02881
[3,] 0.19007 -0.60377 -0.40665 0.17771
[4,] -0.07174 -0.31751 -0.15228 0.14582
inv1000 <- replicate(1000, {
Mat1 <- matrix(rnorm(100), ncol=4)
Mat2 <- matrix(rnorm(100), ncol=4)
try(solve( crossprod(Mat1,Mat2)))} )
str(inv1000)
#num [1:4, 1:4, 1:1000] 0.1163 0.0328 0.3424 -0.227 0.0347 ...
max(inv1000)
#[1] 451.6
> inv100000 <- replicate(100000, {Mat1 <- matrix(rnorm(100), ncol=4)
+ Mat2 <- matrix(rnorm(100), ncol=4)
+ is.singular.matrix( crossprod(Mat1,Mat2))} )
> sum(inv100000)
[1] 0

Calculation of mutual information in R

I am having problems interpreting the results of the mi.plugin() (or mi.empirical()) function from the entropy package. As far as I understand, an MI=0 tells you that the two variables that you are comparing are completely independent; and as MI increases, the association between the two variables is increasingly non-random.
Why, then, do I get a value of 0 when running the following in R (using the {entropy} package):
mi.plugin( rbind( c(1, 2, 3), c(1, 2, 3) ) )
when I'm comparing two vectors that are exactly the same?
I assume my confusion is based on a theoretical misunderstanding on my part. Can someone tell me where I've gone wrong?
Thanks in advance.
Use mutinformation(x,y) from package infotheo.
> mutinformation(c(1, 2, 3), c(1, 2, 3) )
[1] 1.098612
> mutinformation(seq(1:5),seq(1:5))
[1] 1.609438
The normalized mutual information will be 1.
The mi.plugin function works on the joint frequency matrix of the two random variables. The joint frequency matrix gives the number of times X and Y take each pair of outcomes x and y.
In your example, X has 3 possible outcomes (x=1, x=2, x=3), and Y also has 3 possible outcomes (y=1, y=2, y=3).
Let's go through your example and calculate the joint frequency matrix:
> X=c(1, 2, 3)
> Y=c(1, 2, 3)
> freqs=matrix(sapply(seq(max(X)*max(Y)), function(x) length(which(((X-1)*max(Y)+Y)==x))),ncol=max(X))
> freqs
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
This matrix shows the number of occurrences of X=x and Y=y. For example there was one observation for which X=1 and Y=1. There were 0 observations for which X=2 and Y=1.
You can now use the mi.plugin function:
> mi.plugin(freqs)
[1] 1.098612
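To make the difference between the two calls concrete, here is a small base R sketch of the plug-in estimate computed from a table of counts. The function name plugin_mi is made up for illustration and is not the entropy package's implementation. Given rbind(c(1, 2, 3), c(1, 2, 3)), the numbers are read as a 2 x 3 table of counts whose rows are proportional, which is exactly what independence looks like, so the result is 0.

```r
# Plug-in mutual information (natural log) from a matrix of joint counts
plugin_mi <- function(counts) {
  p <- counts / sum(counts)          # joint probabilities
  expected <- outer(rowSums(p), colSums(p))  # product of the marginals
  nz <- p > 0                        # skip empty cells: 0 * log(0) is taken as 0
  sum(p[nz] * log(p[nz] / expected[nz]))
}

# The call from the question: a 2 x 3 count table with proportional rows
mi_question <- plugin_mi(rbind(c(1, 2, 3), c(1, 2, 3)))

# The joint frequency table of two identical vectors
X <- c(1, 2, 3)
Y <- c(1, 2, 3)
mi_identical <- plugin_mi(table(X, Y))

mi_question    # 0
mi_identical   # log(3), about 1.098612
```

Using table(X, Y) is a shorter way to build the joint frequency matrix than the sapply expression above when the data are plain vectors.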

Determining if a matrix is diagonalizable in the R Programming Language

I have a matrix and I would like to know if it is diagonalizable. How do I do this in the R programming language?
If you have a given matrix, m, then one way is to take the matrix of eigenvectors times the diagonal matrix of eigenvalues times the inverse of the eigenvector matrix. That should give back the original matrix. In R that looks like:
m <- matrix( c(1:16), nrow = 4)
p <- eigen(m)$vectors
d <- diag(eigen(m)$values)
p %*% d %*% solve(p)
m
So in that example p %*% d %*% solve(p) should be the same as m.
You can implement the full algorithm to check if the matrix reduces to a Jordan form or a diagonal one (see e.g., this document). Or you can take the quick and dirty way: for an n-dimensional square matrix, use eigen(M)$values and check that they are n distinct values. Distinct eigenvalues are sufficient (though not necessary) for diagonalizability, and for random matrices the check almost always succeeds: degeneracy has probability 0.
P.S.: based on a simple observation by JD Long below, I recalled that a necessary and sufficient condition for diagonalizability is that the eigenvectors span the original space. To check this, just see that the eigenvector matrix has full rank (no zero eigenvalue). So here is the code:
diagflag = function(m,tol=1e-10){
x = eigen(m)$vectors
y = min(abs(eigen(x)$values))
return(y>tol)
}
# nondiagonalizable matrix
m1 = matrix(c(1,1,0,1),nrow=2)
# diagonalizable matrix
m2 = matrix(c(-1,1,0,1),nrow=2)
> m1
[,1] [,2]
[1,] 1 0
[2,] 1 1
> diagflag(m1)
[1] FALSE
> m2
[,1] [,2]
[1,] -1 0
[2,] 1 1
> diagflag(m2)
[1] TRUE
You might want to check out this page for some basic discussion and code. You'll need to search for "diagonalized" which is where the relevant portion begins.
All real symmetric matrices are diagonalizable by orthogonal matrices. In fact, if you want diagonalizability by orthogonal conjugation only, i.e. D = P A P' where P' is the transpose of P, then for a real matrix symmetry, i.e. A_{ij} = A_{ji}, is exactly equivalent to diagonalizability.
If the matrix is not symmetric, then diagonalizability means not D = P A P' but merely D = P A P^{-1}, and we do not necessarily have P' = P^{-1}, which is the condition of orthogonality. In that case you need to do something more substantial. There is probably a better way, but you could just compute the eigenvectors and check that their rank equals the full dimension.
See this discussion for a more detailed explanation.
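The rank check on the eigenvectors can be sketched with singular values instead of eigenvalues, which also works when the eigenvectors are complex. The function name and the tolerance are assumptions made for illustration, and like any floating-point test it can misjudge matrices that are very close to defective.

```r
# TRUE if the numerically computed eigenvectors of m span the whole space
has_full_rank_eigenvectors <- function(m, tol = 1e-8) {
  v <- eigen(m)$vectors
  s <- svd(v)$d                 # singular values, largest first
  min(s) / max(s) > tol         # near-zero ratio means rank deficiency
}

# Defective matrix (Jordan block): only one independent eigenvector
m1 <- matrix(c(1, 1, 0, 1), nrow = 2)
# Two distinct eigenvalues: diagonalizable
m2 <- matrix(c(-1, 1, 0, 1), nrow = 2)
# Real symmetric: always diagonalizable
m3 <- matrix(c(2, 1, 1, 2), nrow = 2)

has_full_rank_eigenvectors(m1)
has_full_rank_eigenvectors(m2)
has_full_rank_eigenvectors(m3)
```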
