Create an artificial correlation matrix - math

I want to test a program, and for that I would like to have a really big matrix.
Is there any tool that can generate an artificial correlation matrix?

Pick n random n-dimensional vectors with entries from -1 to 1, and scale each one to unit length. Take the dot product of any two vectors as their correlation. This gives you a random n x n correlation matrix.
Is this really a correlation matrix? Make each dimension into an independent standard normal variable. The coefficients of each vector then describe a random variable (a weighted sum of those normals). Because each vector has unit length, each of these variables has variance 1, and the covariance of any two of them is the dot product of their vectors, so those random variables have exactly the specified correlations. So yes, this is actually going to be a correlation matrix.
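A minimal sketch of the random-unit-vector construction in plain Python (stdlib only; the function name `random_correlation_matrix` is hypothetical). It normalizes each vector before taking dot products, which is what puts 1s on the diagonal and makes the result a valid correlation matrix:

```python
import math
import random


def random_correlation_matrix(n, seed=None):
    """Return an n x n correlation matrix built as the Gram matrix of
    n random unit vectors (entries uniform on [-1, 1], then normalized)."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        vectors.append([a / norm for a in v])  # unit length -> variance 1
    # Entry (i, j) is the dot product of unit vectors i and j.
    R = [[sum(a * b for a, b in zip(vi, vj)) for vj in vectors] for vi in vectors]
    for i in range(n):
        R[i][i] = 1.0  # remove floating-point rounding on the diagonal
    return R


if __name__ == "__main__":
    for row in random_correlation_matrix(4, seed=1):
        print(" ".join(f"{x:+.3f}" for x in row))
```

The loops are O(n^3), so for a really big matrix the same idea is better expressed with NumPy: draw an n x n matrix V, divide each row by its norm, and form V @ V.T.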

There is a repository of sample matrix data for comparing algorithms at the Matrix Market (free, despite the name).

Related

Why is calculating eigenvectors and eigenvalues so effective when performing PCA?

The core of Principal Component Analysis (PCA) lies in calculating the eigenvalues and eigenvectors of the variance-covariance matrix of some dataset (for example, a matrix of multivariate data collected from a set of individuals). The textbook knowledge I have is that:
a) by multiplying the original data matrix by these eigenvectors, one can calculate "scores" (as many as there are original variables) which are uncorrelated with each other;
b) each eigenvalue gives the variance of the corresponding score.
These two properties make this process a very effective data transformation technique for simplifying the analysis of multivariate data.
My question is: why is that so? Why does calculating the eigenvalues and eigenvectors of a variance-covariance matrix give the scores such unique properties?
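A short derivation that may help, assuming X is the n x p data matrix, \bar{x} the vector of its column means, \mathbf{1} a column of n ones, and V the matrix whose columns are the eigenvectors (this notation is introduced here, not taken from the question):

```latex
\begin{aligned}
C &= \frac{1}{n-1}\,(X - \mathbf{1}\bar{x}^{\top})^{\top}(X - \mathbf{1}\bar{x}^{\top}),
  \qquad C = V \Lambda V^{\top},\quad V^{\top}V = I,\quad
  \Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_p),\\
T &= X V \quad \text{(the scores)},\qquad
  T - \mathbf{1}\bar{x}^{\top}V = (X - \mathbf{1}\bar{x}^{\top})\,V,\\
\operatorname{Cov}(T) &= \frac{1}{n-1}\,V^{\top}(X - \mathbf{1}\bar{x}^{\top})^{\top}(X - \mathbf{1}\bar{x}^{\top})\,V
  = V^{\top} C\, V = V^{\top} V \Lambda V^{\top} V = \Lambda .
\end{aligned}
```

Because C is symmetric, the spectral theorem guarantees that such an orthogonal V exists. The off-diagonal entries of \Lambda are zero, so the scores are uncorrelated (independent only if the data are jointly Gaussian), and the k-th diagonal entry \lambda_k is the variance of the k-th score. Since trace(\Lambda) = trace(C), the scores also preserve the total variance.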

How to generate zero-mean Gaussian random vectors whose correlation matrices have exponentially distributed eigenvalues

1. This question comes from the paper "Fast Generalized Eigenvector Tracking Based on the Power Method".
2. The author wrote: "We generate two zero-mean Gaussian random vectors, which have correlation matrices A and B whose eigenvalues are exponentially distributed".
3. But how to generate a zero-mean Gaussian random vector whose correlation matrix has exponentially distributed eigenvalues has confused me for almost a week.
4. It seems that we can only use randn in MATLAB to generate random vectors, so the problem is: how do we make sure, at the same time, that the correlation matrices have exponentially distributed eigenvalues?

Let S be a positive definite matrix. Then S has a Cholesky decomposition L.L' = S, where L is a lower-triangular matrix, ' denotes the matrix transpose, and . denotes matrix multiplication. Let x be drawn from a Gaussian distribution with mean zero and covariance equal to the identity matrix. Then y = L.x has a Gaussian distribution with mean zero and covariance L.L' = S.
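A minimal sketch of this Cholesky approach in plain Python, with `random.gauss` standing in for MATLAB's randn. The function names and the example matrix S are hypothetical:

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L' = S, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def gaussian_samples(S, count, seed=None):
    """Draw `count` vectors y = L x with x ~ N(0, I), so each y ~ N(0, S)."""
    L = cholesky(S)
    n = len(S)
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # like MATLAB's randn
        samples.append([sum(L[i][k] * x[k] for k in range(i + 1)) for i in range(n)])
    return samples


if __name__ == "__main__":
    S = [[4.0, 1.2], [1.2, 1.0]]  # hypothetical covariance matrix
    ys = gaussian_samples(S, 20000, seed=0)
    # The empirical second moments should be close to S (the mean is zero).
    emp = [[sum(y[i] * y[j] for y in ys) / len(ys) for j in range(2)] for i in range(2)]
    print(emp)
```

For this S the factor is L = [[2, 0], [0.6, 0.8]], which you can check by multiplying L.L' by hand.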
So if you can find suitable covariance matrices A and B, you can use their Cholesky decompositions to generate samples. Now, about constructing a matrix whose eigenvalues follow a given distribution: my advice is to start with a list of samples from an exponential distribution; these will be your eigenvalues. Let E be a matrix with the exponential samples on the diagonal and zeros elsewhere. Let U be any unitary matrix (i.e. its columns are orthogonal and each column has norm 1; for real matrices this is an orthogonal matrix). Then U.E.U' is a positive definite matrix with the specified eigenvalues, since exponential samples are positive.
U can be any unitary matrix. In particular U can be the identity matrix. That might make everything else simpler; you'll have to verify whether U = identity is workable for the problem you're working on.
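A sketch of the U.E.U' construction in plain Python. `random.expovariate` supplies the exponential eigenvalues; the random orthogonal U is built here by Gram-Schmidt on Gaussian vectors, which is one common choice rather than something the answer prescribes, and `identity_u=True` gives the U = identity case. The function names and the `rate` parameter are hypothetical:

```python
import math
import random


def random_orthogonal(n, rng):
    """Gram-Schmidt on random Gaussian vectors; returns U with orthonormal columns."""
    cols = []
    while len(cols) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for c in cols:  # remove components along the columns found so far
            d = sum(a * b for a, b in zip(v, c))
            v = [a - d * b for a, b in zip(v, c)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip a (practically impossible) degenerate draw
            cols.append([a / norm for a in v])
    # Turn the list of columns into a row-major matrix.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matrix_with_exponential_eigenvalues(n, rate=1.0, seed=None, identity_u=False):
    """Return (A, eig, U) with A = U E U' and E = diag(eig), eig ~ Exponential(rate)."""
    rng = random.Random(seed)
    eig = [rng.expovariate(rate) for _ in range(n)]
    if identity_u:
        U = [[float(i == j) for j in range(n)] for i in range(n)]
    else:
        U = random_orthogonal(n, rng)
    # A[i][j] = sum_k U[i][k] * eig[k] * U[j][k]
    A = [[sum(U[i][k] * eig[k] * U[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return A, eig, U
```

Column k of U is an eigenvector of A with eigenvalue eig[k]. Feeding A (and a second matrix built the same way for B) into a Cholesky-based sampler, as described in the answer, then yields zero-mean Gaussian vectors with those covariance matrices.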

correlation between features with R

I want to calculate the correlation between features; every feature consists of a 50*100 matrix. My question is how I can use R to calculate the correlation between features, instead of the correlation between columns inside the matrix.
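The question doesn't say how two matrix-valued features should be compared, so this sketch assumes one common reading: flatten each 50*100 feature matrix into a single vector of 5000 values and take the Pearson correlation between those vectors. It is written in plain Python with hypothetical function names; in R the same idea would be `cor(as.vector(m1), as.vector(m2))`, or `cor(sapply(features, as.vector))` for a list of equally sized matrices:

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def feature_correlation(m1, m2):
    """Correlation between two features stored as equally sized matrices,
    computed by flattening each matrix row by row (an assumption)."""
    x = [v for row in m1 for v in row]
    y = [v for row in m2 for v in row]
    if len(x) != len(y):
        raise ValueError("feature matrices must have the same shape")
    return pearson(x, y)


def feature_correlation_matrix(features):
    """Feature-by-feature correlation matrix for a list of matrices."""
    return [[feature_correlation(a, b) for b in features] for a in features]
```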

Generate random matrix given a correlation value to an input matrix

Given an input matrix and a correlation Rho, I want to generate a random matrix that is correlated to the input matrix with a correlation value of Rho.
I can create random matrices with rnorm, but I'm not sure how to force this new matrix to be correlated with the original input matrix.
I looked through some other posts, such as this one, but couldn't find what I was looking for. For example, the post "Generating random correlation matrix with given average correlation" seems to generate a random matrix that is correlated with itself, not with an input matrix.
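One standard technique (not taken from the question or the linked post) is to treat the input matrix as one long vector, standardize it, and mix it with noise that has been made exactly uncorrelated with it: y = rho * xs + sqrt(1 - rho^2) * es. The sketch below does this in plain Python with a hypothetical function name; the result has sample correlation rho with the input, up to floating-point precision:

```python
import math
import random


def correlated_random_matrix(M, rho, seed=None):
    """Random matrix shaped like M whose entries, read as one long vector,
    have sample Pearson correlation exactly rho with the entries of M."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rows, cols = len(M), len(M[0])
    x = [v for row in M for v in row]  # flatten row by row
    n = len(x)
    rng = random.Random(seed)

    def standardize(v):
        mean = sum(v) / n
        sd = math.sqrt(sum((a - mean) ** 2 for a in v) / n)
        return [(a - mean) / sd for a in v]

    xs = standardize(x)  # fails if M is constant (sd = 0)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # like rnorm(n)
    # Regress z on xs and keep the residual, so the noise is exactly
    # uncorrelated with xs (var(xs) = 1, so the slope is a plain covariance).
    z_mean = sum(z) / n
    slope = sum((a - z_mean) * b for a, b in zip(z, xs)) / n
    es = standardize([a - z_mean - slope * b for a, b in zip(z, xs)])
    w = math.sqrt(1.0 - rho * rho)
    y = [rho * a + w * b for a, b in zip(xs, es)]
    return [y[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    M = [[1.0, 4.0, 2.0], [8.0, 5.0, 7.0], [3.0, 9.0, 6.0]]
    for row in correlated_random_matrix(M, 0.6, seed=7):
        print(" ".join(f"{v:+.3f}" for v in row))
```

The output has mean 0 and standard deviation 1; any rescaling a + s * Y with s > 0 leaves the correlation unchanged. The input needs at least three entries and must not be constant. In R, the same steps map onto `scale`, `rnorm`, and `residuals(lm(z ~ x))`.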

Generating random values from non-normal and correlated distributions

I have a random variable X that is a mixture of a binomial and two normals (see the first chart for what the probability density function would look like),
and I have another random variable Y of similar shape but with different values for each normally distributed side.
X and Y are also correlated; here's an example of data that could be plausible:
#     X     Y
1     0   -20
2    -5     2
3   -30     6
4     7    -2
5     7     2
As you can see, this is just meant to show that each of my random variables is either a small positive value (often) or a large negative value (rarely), and that the two have a certain covariance.
My problem is: I would like to be able to sample correlated random values from these two distributions.
I could use a Cholesky decomposition to generate correlated normally distributed random variables, but the random variables we are talking about here are not normal; they are a mixture of a binomial and two normals.
Many thanks!
Note that you don't have a mixture of a binomial and two normals, but rather a mixture of two normals. Even though, for some reason, in your previous post you did not want to use a two-step generation process (first generate a Bernoulli variable telling which component to sample from, then sample from that component), that is typically what you would want to do with a mixture distribution. This process naturally generalizes to a mixture of two bivariate normal distributions: first pick a component, then generate a pair of correlated normal values from it. Your description does not make clear whether you are fitting this distribution to some data or just trying to simulate it; how hard it is to get the covariance matrices for the two components depends on which situation you are in.
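A minimal sketch of the two-step process for a mixture of two bivariate normals, in plain Python. The function name, the tuple layout for each component, and the example parameters are hypothetical, chosen only to mimic "usually a small positive value, rarely a large negative one":

```python
import math
import random


def sample_mixture(p_rare, common, rare, count, seed=None):
    """Two-step sampling from a mixture of two bivariate normals.

    common, rare: (mean_x, mean_y, sd_x, sd_y, rho) for each component.
    p_rare: probability of drawing from the `rare` component.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        # Step 1: a Bernoulli draw picks the component.
        mx, my, sx, sy, rho = rare if rng.random() < p_rare else common
        # Step 2: a correlated normal pair (2x2 Cholesky factor of the
        # component's correlation matrix applied to two standard normals).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        samples.append((x, y))
    return samples


if __name__ == "__main__":
    # Hypothetical parameters: usually small positive, rarely large negative.
    common = (3.0, 2.0, 2.0, 2.0, 0.3)
    rare = (-25.0, -15.0, 5.0, 5.0, 0.5)
    for x, y in sample_mixture(0.1, common, rare, 5, seed=0):
        print(f"{x:7.2f} {y:7.2f}")
```

Because X and Y share the same component draw, their overall covariance comes both from the within-component correlations and from the two variables jumping to the rare component together.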
