positive solutions to a homogeneous linear system - linear-algebra

Given an m by n matrix A (of real numbers), I would like an algorithm to determine if there exists a solution to the linear equation Ax=0 whose coordinates are all strictly positive.
Note: It is not required that the coordinates of the solution be integer valued.


Covariance matrix in RStan

I would like to define a covariance matrix in RStan.
Similarly to how you can provide constraints to scalar and vector values, e.g. real<lower=0> a, I would like to provide constraints that the leading diagonal of the covariance matrix must be positive, but the off-diagonal components could take any real value.
Is there a way to enforce that the matrix must also be positive semi-definite? Otherwise, some of the samples produced will not be valid covariance matrices.
Yes, defining
cov_matrix[K] Sigma;
ensures that Sigma is a symmetric, positive definite K x K matrix. It can reduce to semidefinite due to floating-point error, but we'll catch that and raise exceptions to ensure it stays strictly positive definite.
Under the hood, Stan uses the Cholesky factor transform: the unconstrained representation is a lower triangular matrix with positive diagonal. We just use that as the real parameters, then transform and apply the Jacobian implicitly, as described in the reference manual chapter on constrained variables, to create a covariance matrix with an implicit (improper) uniform prior.
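As a rough illustration of that transform (a sketch in plain Python, not Stan's actual implementation), the code below fills a lower triangular matrix L from K(K+1)/2 unconstrained reals, exponentiates the diagonal so it is positive, and forms Sigma = L * L^T. The function name `cov_from_unconstrained` and the row-by-row fill order are assumptions made for this example, and the Jacobian adjustment is left out.

```python
import math

def cov_from_unconstrained(y, K):
    """Map K*(K+1)/2 unconstrained reals to a K x K covariance matrix."""
    if len(y) != K * (K + 1) // 2:
        raise ValueError("expected K*(K+1)/2 unconstrained values")
    # Fill the lower triangle row by row; exponentiate the diagonal so it is > 0.
    L = [[0.0] * K for _ in range(K)]
    idx = 0
    for i in range(K):
        for j in range(i + 1):
            L[i][j] = math.exp(y[idx]) if i == j else y[idx]
            idx += 1
    # Sigma = L * L^T is symmetric and positive definite by construction.
    return [[sum(L[i][k] * L[j][k] for k in range(K)) for j in range(K)]
            for i in range(K)]

# Any real-valued input yields a valid covariance matrix.
Sigma = cov_from_unconstrained([0.0, -2.0, 0.5], K=2)
```

Because every real vector maps to a valid matrix, a sampler can move freely in the unconstrained space without ever proposing an invalid covariance matrix.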

Computational complexity of n-dimensional Discrete Fourier Transform?

The computational complexity of n-dimensional Fast Fourier Transform was discussed here and (as the former's duplicate) here.
The computational complexity of a 1-dimensional Discrete Fourier Transform is O(N^2), where N is the data set size.
Could you please tell us what the computational complexity is of the n-dimensional Discrete Fourier Transform consisting of {N1, N2 ... Nn} points along each dimension?
The FFT itself is also a DFT (with some constraints). I will assume that you mean the naive summation method.
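Re-writing the 1D DFT in integral form (the continuous version):
$$\tilde{f}(k) = \int f(x)\, e^{-ikx}\, dx$$
A particular value of f-tilde is equivalent to a single element in your DFT array. When the integral is discretized (i.e. converted to a finite sum), there are N terms in the sum. This gives O(N) for each element and hence O(N^2) overall.
In case you were wondering, writing in this form allows for more compact notation for a general n-D DFT:
$$\tilde{f}(\mathbf{k}) = \int f(\mathbf{x})\, e^{-i\,\mathbf{k}\cdot\mathbf{x}}\, d^n\mathbf{x}$$
When this is discretized, we can see that for each element there are n nested sums, each over one of the dimensions and of length N, giving N^n terms per element. There are N^n values in the input "array" and as many output elements, so the complexity is:
$$O(N^n \cdot N^n) = O(N^{2n})$$
For unequal lengths {N1, N2 ... Nn}, the same count gives $O((N_1 N_2 \cdots N_n)^2)$.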
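The count can be checked with a small sketch of the naive summation in 2D. The function name `naive_dft2` and the term counter are illustrative assumptions; for an N1 x N2 input the counter ends at (N1 * N2)^2, one term per (output element, input element) pair.

```python
import cmath

def naive_dft2(a):
    """Naive 2D DFT by direct summation; returns (spectrum, term_count)."""
    n1, n2 = len(a), len(a[0])
    terms = 0
    out = [[0j] * n2 for _ in range(n1)]
    for k1 in range(n1):              # every output element ...
        for k2 in range(n2):
            total = 0j
            for x1 in range(n1):      # ... sums over every input element
                for x2 in range(n2):
                    phase = -2j * cmath.pi * (k1 * x1 / n1 + k2 * x2 / n2)
                    total += a[x1][x2] * cmath.exp(phase)
                    terms += 1
            out[k1][k2] = total
    return out, terms

# A unit impulse at the origin transforms to a constant spectrum of ones.
impulse = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
spectrum, terms = naive_dft2(impulse)
```

Here the input is 2 x 3, so `terms` ends at (2 * 3)^2 = 36.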

Mathematical representation of a set of points in N dimensional space?

Given some x data points in an N dimensional space, I am trying to find a fixed length representation that could describe any subset s of those x points. For example, the mean of the subset s could describe that subset, but it is not unique to that subset: other points in the space could yield the same mean, so the mean is not a unique identifier. Could anyone tell me of a unique measure that could describe the points without depending on the number of points?
In short - it is impossible (as you would achieve infinite noiseless compression). You must either use a variable-length representation (or a fixed-length one whose length is proportional to the maximum number of points) or deal with "collisions" (as your mapping will not be injective). In the first scenario you can simply store the coordinates of each point. In the second one you approximate your point clouds with more and more complex descriptors to balance collisions and memory usage. Some possibilities are:
storing mean and covariance (so basically performing maximum likelihood estimation over Gaussian families)
performing some fixed-complexity density estimation, like a Gaussian Mixture Model, or training a generative neural network
using a set of simple geometric/algebraic properties such as:
number of points
mean, max, min, median distance between each pair of points
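A small sketch of the last option, using only the standard library. The function name `describe` and the exact choice of statistics are assumptions for illustration. It shows both sides of the trade-off: a richer descriptor separates some subsets that share a mean, yet distinct subsets can still collide.

```python
import math
from itertools import combinations

def describe(points):
    """Fixed-length summary: count, per-axis mean, min/mean/max pairwise distance."""
    n = len(points)
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / n for d in range(dim))
    dists = [math.dist(p, q) for p, q in combinations(points, 2)] or [0.0]
    return (n, mean, min(dists), sum(dists) / len(dists), max(dists))

a = [(0.0, 0.0), (2.0, 2.0)]
b = [(0.0, 2.0), (2.0, 0.0)]   # a rotated copy of a about the shared mean
c = [(1.0, 0.0), (1.0, 2.0)]   # same mean as a, different spread

same_mean = describe(a)[1] == describe(c)[1]      # the mean alone collides
separated = describe(a) != describe(c)            # pairwise distances separate a and c
still_collides = describe(a) == describe(b)       # but a and b remain indistinguishable
```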
Any subset can be identified by a bit mask of length x, where bit i is 1 if the corresponding element belongs to the subset (there are 2^x possible subsets, so x bits are needed). There is no fixed-length representation that is not a function of x.
I was wrong. PCA is a good way to perform dimensionality reduction for this problem, but it won't work for some sets.
However, you can almost do it. Where "almost" is formally defined by the Johnson-Lindenstrauss Lemma, which states that for a given large dimension N, there exists a much lower dimension n, and a linear transformation that maps each point from N to n, while keeping the Euclidean distance between every pair of points of the set within some error ε from the original. Such linear transformation is called the JL Transform.
In other words, your problem is only solvable for sets of points where each pair of points is separated by at least ε. For this case, the JL Transform gives you one possible solution. Moreover, there exists a relationship between N, n and ε (see the lemma), such that, for example, if N=100, the JL Transform can map each point to a point in 5D (n=5), and uniquely identify each subset, if and only if the minimum distance between any pair of points in the original set is at least ~2.8 (i.e. the points are sufficiently different).
Note that n depends only on N and the minimum distance between any pair of points in the original set. It does not depend on the number of points x, so it is a solution to your problem, albeit with some constraints.

Normalizing a matrix with respect to a constraint

I am doing a project which requires me to normalize a sparse N x N matrix. I read somewhere that we can normalize a matrix A so that its eigenvalues lie in [-1,1] by multiplying it with a diagonal matrix D such that N = D^{-1/2}*A*D^{-1/2}.
But I am not sure what D is here. Also, is there a function in Matlab that can do this normalization for sparse matrices?
It's possible that I am misunderstanding your question, but as it reads it makes no sense to me.
A matrix is just a representation of a linear transformation. Given that a matrix A corresponds to a linear transformation T, any matrix of the form B^{-1} A B (called the conjugate of A by B) for an invertible matrix B corresponds to the same transformation, represented in a different basis. In particular, the eigenvalues of a matrix correspond to the eigenvalues of the linear transformation, so conjugating by an invertible matrix cannot change the eigenvalues.
It's possible that you meant that you want to scale the eigenvectors so that each has unit length. This is a common thing to do since then the eigenvalues tell you how far a vector of unit length is magnified by the transformation.
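The invariance under conjugation can be checked by hand for a 2 x 2 case. This is a minimal sketch in plain Python; the helper names `matmul2`, `inv2` and `eigvals2` and the example matrices are made up for illustration.

```python
import cmath

def matmul2(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(B):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [[B[1][1] / det, -B[0][1] / det],
            [-B[1][0] / det, B[0][0] / det]]

def eigvals2(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return sorted([(tr - root) / 2, (tr + root) / 2], key=lambda z: z.real)

A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, 2.0], [0.0, 1.0]]          # an arbitrary invertible matrix
C = matmul2(matmul2(inv2(B), A), B)   # the conjugate B^{-1} A B

# The entries of C differ from those of A, but the eigenvalues agree.
same_eigenvalues = all(abs(p - q) < 1e-12
                       for p, q in zip(eigvals2(A), eigvals2(C)))
```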

What is SVD (singular value decomposition)?

How does it actually reduce noise? Can you suggest some good tutorials?
SVD can be understood from a geometric sense for square matrices as a transformation on a vector.
Consider a square n x n matrix M multiplying a vector v to produce an output vector w:
w = M*v
The singular value decomposition of M is the product of three matrices, M=U*S*V, so w=U*S*V*v. U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.
So the effect of left-multiplying a vector v by a matrix M is to rotate/reflect v by M's orthonormal factor V, then scale/squash the result by a diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.
One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioned-ness in the diagonal scaling matrix S.
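A short NumPy sketch of the three steps and of the conditioning remark, assuming NumPy is available. The random test matrix and seed are arbitrary. Note that `np.linalg.svd` returns the right-hand factor already in the form that multiplies v directly, which matches the V used in this answer.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
v = rng.standard_normal(3)

# s holds the diagonal of S; V is the right-hand orthonormal factor.
U, s, V = np.linalg.svd(M)

# Rotate/reflect by V, scale by S, then rotate/reflect by U.
w = U @ (s * (V @ v))
assert np.allclose(w, M @ v)

# Orthonormal factors preserve length ...
assert np.isclose(np.linalg.norm(V @ v), np.linalg.norm(v))

# ... so the conditioning of M is carried entirely by the singular values.
cond = s[0] / s[-1]
assert np.isclose(cond, np.linalg.cond(M))
```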
One way to use SVD to reduce noise is to do the decomposition, set components that are near zero to be exactly zero, then re-compose.
Here's an online tutorial on SVD.
You might want to take a look at Numerical Recipes.
Singular value decomposition is a method for taking an nxm matrix M and "decomposing" it into three matrices such that M=USV. S is a diagonal square (the only nonzero entries are on the diagonal from top-left to bottom-right) matrix containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.
With M=USV, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.
In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (has the least squared error).
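A minimal NumPy sketch of this truncation, under the assumption that the signal is low rank and the noise is small. The helper name `truncate_svd`, the synthetic rank-2 data and the noise level are all made up for illustration.

```python
import numpy as np

def truncate_svd(M, k):
    """Rank-k approximation of M that keeps the k largest singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # s is sorted, largest first
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
# Hypothetical data: a rank-2 signal plus small Gaussian noise.
signal = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 20))
noisy = signal + 0.01 * rng.standard_normal((50, 20))

denoised = truncate_svd(noisy, k=2)

# Distance to the clean signal before and after truncation (Frobenius norm).
err_before = np.linalg.norm(noisy - signal)
err_after = np.linalg.norm(denoised - signal)

# The error of the rank-k approximation is set by the discarded singular values.
s = np.linalg.svd(noisy, compute_uv=False)
assert np.isclose(np.linalg.norm(noisy - denoised), np.sqrt(np.sum(s[2:] ** 2)))
```

In practice k is not known in advance; a common heuristic is to look for a clear gap in the sorted singular values.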
To answer the title question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices.
Let $X \in \mathbb{R}^{N \times p}$; then the SVD of X yields X=UDV^T, where D is diagonal and U and V are orthogonal matrices.
Now X^TX is a square matrix, and its decomposition is X^TX=VD^2V^T, where the columns of V are the eigenvectors of X^TX and D^2 contains the eigenvalues of X^TX.
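This relationship is easy to verify numerically. A small NumPy sketch with an arbitrary random matrix (the shape and seed are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))              # N = 6 rows, p = 3 columns

U, d, Vt = np.linalg.svd(X, full_matrices=False)
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigenvalues in ascending order

# The eigenvalues of X^T X are the squared singular values of X.
assert np.allclose(eigvals[::-1], d ** 2)

# X^T X = V D^2 V^T, with V = Vt.T
assert np.allclose(X.T @ X, Vt.T @ np.diag(d ** 2) @ Vt)
```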
SVD can also be used to greatly ease global fitting (i.e. fitting to all observations simultaneously) of an arbitrary model (expressed as a formula) to data (measured with respect to two variables and expressed as a matrix).
For example, take a data matrix A = D * M^T, where D represents the possible states of a system and M represents its evolution with respect to some variable (e.g. time).
By SVD, A(x,y) = U(x) * S * V^T(y), and therefore D * M^T = U * S * V^T,
so D = U * S * V^T * (M^T)^+, where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which is a linear combination of the components of the model (this is easy, as each column is a 1D curve). This yields model parameters which generate M_fit (the "_fit" suffix indicates that it is based on fitting).
M * M_fit^+ * V = V_fit, which allows the residuals R * S^2 = V - V_fit to be minimized, thus determining D and M.
Pretty cool, eh?
The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.
Finally, and actually addressing your question, it is important to note that although each successive singular value (element of the diagonal matrix S) with its attendant vectors in U and V does have a lower signal-to-noise ratio, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. Put differently, the later singular values have vectors which are less smooth (noisier) but in which the change represented by each component is more distinct.
