SVD in LSI in the book Introduction to Information Retrieval - information-retrieval

In Example 18.4 of the book Introduction to Information Retrieval, the term-document matrix is decomposed using SVD. My question is: why is Σ a 5*5 matrix in the example? Shouldn't it be a 5*6 matrix? Is the book wrong?
Here is the link to Chapter 18 of the book Introduction to Information Retrieval. Thanks!

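The book is correct. A term-document matrix (of dimension TxD, with T=5 terms and D=6 documents in the example) is split into a product of three matrices. The middle matrix (denoted \Sigma in the book) is the key matrix, and its dimension is TxT (T=5 in the example). A full SVD would give a 5x6 \Sigma whose sixth column is entirely zero; dropping that column, together with the matching row of V^T, leaves the product unchanged, which is why a 5x5 \Sigma suffices.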
Intuitively, you can think of this matrix as describing the relationships between terms. In the best case, all of its column vectors would be linearly independent, meaning that they form a basis of the term space and there is no dependence between the terms. However, this is not true in practice. You'll find that the rank of this matrix (say T') is typically a few orders of magnitude smaller than T, meaning that T-T' of its column vectors are linearly dependent.
One can then take a lower-order approximation of this matrix by keeping only a T'xT' matrix. In effect, you keep the principal eigenvalues of the matrix and project your vectors onto the corresponding eigenvectors (treated as a new basis) using rotation and scaling. That's precisely what spectral decomposition, PCA, or LSA does.
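To see the shapes concretely, here is a minimal sketch using NumPy (an assumption; the question does not name a library). The 5x6 matrix below is only illustrative and its entries are not guaranteed to match the book's example. With full_matrices=False, np.linalg.svd returns the reduced factors, so \Sigma comes out 5x5, and keeping the k largest singular values gives the low-rank approximation.

```python
import numpy as np

# Illustrative 5x6 term-document matrix (rows: terms, columns: documents).
# The entries are an assumption made for this sketch.
C = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

# Reduced SVD: U is 5x5, s holds 5 singular values, Vt is 5x6.
U, s, Vt = np.linalg.svd(C, full_matrices=False)
Sigma = np.diag(s)
print(U.shape, Sigma.shape, Vt.shape)

# Rank-k approximation: keep only the k largest singular values.
k = 2
C_k = U[:, :k] @ Sigma[:k, :k] @ Vt[:k, :]
print(np.round(C_k, 2))
```

The product U @ Sigma @ Vt reproduces C up to rounding, which confirms that nothing is lost by using a square \Sigma.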

Related

How can I perform two matrix comparison and statistical testing applying permutations in R?

I'm a beginner using R, and I'm trying to apply a permutation procedure (a statistical test) to compare two matrices, in order to assess whether there is a real relationship between spatial association and functional traits of forest tree species.
My first matrix is formed by indices of positive (+) and negative (-) spatial interactions between species. Positive interactions (+) were assigned the value 3 and negative interactions (-) the value 1. The second matrix contains the mean Euclidean distance of functional traits between species.
Theoretically, what I want to do is randomize or permute the trait matrix (incidence matrix) along the spatial matrix without breaking (or, better said, while retaining) the spatial structure of association among species: one permutation for the positive interactions (+) and another for the negative ones (-).
Can anybody help me structure a script in R to test this relationship?
I just have the two species x species matrices as csv files.

rotational matrix in R

I want to implement an algorithm in R. I cannot start on the code because I am having trouble understanding the problem clearly. The problem is related to a rotation matrix, which is actually pretty challenging.
The problem is as follows:
The historical data of monthly flows X is transformed into Y by the transformation matrix R where,
Y = RX (3)
The procedure for obtaining the transformation matrix is described in detail in the appendix of Tarboton et al. (1998); here we summarize from their description. The transformation matrix is developed from a standard basis (basis vectors aligned with the coordinate axes) which is orthonormal but does not have a basis vector perpendicular to the conditioning plane defined by (3). One of the standard basis vectors is replaced by a vector perpendicular to the conditioning plane. Operationally this amounts to starting with an identity matrix and replacing the last column with . Clearly the basis set is no longer orthonormal. The Gram-Schmidt orthonormalization procedure is applied to the remaining 1 standard basis vectors to obtain an orthonormal basis that now includes a vector perpendicular to the conditioning plane.
The last column of the matrix Y, , and the R matrix has the property R^T = R^(-1). The first components of the vector can be denoted as as the last component is , i.e., . Hence, the simulation involves re-sampling from the conditional PDF ( )
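The construction described in the quoted passage can be sketched as follows. This is written in Python rather than R, and several details are assumptions because the quoted text has lost its symbols: the normal vector is taken to be the all-ones direction (as if the conditioning plane fixed the sum of the flows), the basis vectors are stored as rows of R, and the function name rotation_from_normal is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rotation_from_normal(normal):
    """Build an orthonormal matrix R whose last row is the unit normal.

    Assumes the last component of `normal` is nonzero, so that the
    first d-1 standard basis vectors plus the normal are independent.
    """
    d = len(normal)
    length = math.sqrt(dot(normal, normal))
    basis = [[x / length for x in normal]]  # unit normal comes first
    for i in range(d - 1):
        # Standard basis vector e_i.
        v = [1.0 if j == i else 0.0 for j in range(d)]
        # Gram-Schmidt: remove components along vectors already accepted.
        for b in basis:
            c = dot(v, b)
            v = [x - c * y for x, y in zip(v, b)]
        length = math.sqrt(dot(v, v))
        basis.append([x / length for x in v])
    # Put the normal last, matching "replacing the last column".
    return basis[1:] + [basis[0]]

# Hypothetical normal: all-ones direction for d = 3.
R = rotation_from_normal([1.0, 1.0, 1.0])
x = [1.0, 2.0, 3.0]
y = [dot(row, x) for row in R]  # Y = R X for a single column of X
print(y)
```

Because the rows of R are orthonormal, R^T R is the identity, which is the property R^T = R^(-1) stated above. With this choice of normal, the last component of y is the sum of x divided by sqrt(d).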

Perform sum of vectors in CUDA/thrust

So I'm trying to implement stochastic gradient descent in CUDA, and my idea is to parallelize it in a way similar to the one described in the paper Optimal Distributed Online Prediction Using Mini-Batches.
That implementation is aimed at MapReduce distributed environments so I'm not sure if it's optimal when using GPUs.
In short the idea is: at each iteration, calculate the error gradients for each data point in a batch (map), take their average by sum/reducing the gradients, and finally perform the gradient step updating the weights according to the average gradient. The next iteration starts with the updated weights.
The thrust library allows me to perform a reduction on a vector, for example to sum all of its elements.
My question is: How can I sum/reduce an array of vectors in CUDA/thrust?
The input would be an array of vectors and the output would be a vector that is the sum of all the vectors in the array (or, ideally, their average).
Converting my comment into this answer:
Let's say each vector has length m and the array has size n.
An "array of vectors" is then the same as a matrix of size n x m.
If you change your storage format from this "array of vectors" to a single vector of size n * m, you can use thrust::reduce_by_key to sum each row of this matrix separately.
The sum_rows example shows how to do this.
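To illustrate the idea without a GPU, here is a small Python sketch that mimics the semantics of thrust::reduce_by_key (summing runs of consecutive equal keys). It is not thrust code; the helper reduce_by_key and the layout below are assumptions made for the illustration. The flat array is stored component-major, so that each run of equal keys holds one component across all n vectors and the reduction yields the sum vector of length m.

```python
from itertools import groupby

def reduce_by_key(keys, values):
    """Sum each run of consecutive equal keys (hypothetical helper)."""
    out_keys, out_sums = [], []
    for key, group in groupby(zip(keys, values), key=lambda kv: kv[0]):
        out_keys.append(key)
        out_sums.append(sum(v for _, v in group))
    return out_keys, out_sums

vectors = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]  # n = 2 vectors of length m = 3
n, m = len(vectors), len(vectors[0])

# Flatten component-major: all first components, then all second ones, ...
flat = [vectors[i][j] for j in range(m) for i in range(n)]
# Element k belongs to component k // n.
keys = [k // n for k in range(n * m)]

_, total = reduce_by_key(keys, flat)
average = [s / n for s in total]
print(total, average)
```

If the flat array is stored vector-major instead, the same key scheme sums each vector's own elements, so the layout decides which sums you get.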

An "asymmetric" pairwise distance matrix

Suppose there are three sequences to be compared: a, b, and c. Traditionally, the resulting 3-by-3 pairwise distance matrix is symmetric, indicating that the distance from a to b is equal to the distance from b to a.
I am wondering if TraMineR provides some way to produce an asymmetric pairwise distance matrix.
No, TraMineR does not produce 'asymmetric' dissimilarities, precisely for the reasons stressed in Pat's comment.
The main interest of computing pairwise dissimilarities between sequences is that once we have such dissimilarities we can, for instance:
measure the discrepancy among sequences, determine neighborhoods, find medoids, ...
run cluster algorithms, self-organizing maps, MDS, ...
make ANOVA-like analysis of the sequences
grow regression trees for the sequences
Inputting a non-symmetric dissimilarity matrix into those processes would most probably generate irrelevant outcomes.
It is because of this symmetry requirement that the substitution costs used for computing Optimal Matching distances MUST be symmetrical. It is important not to interpret substitution costs as the cost of switching from one state to the other, but to understand them for what they are, i.e., edit costs. When comparing two sequences, for example aabcc and aadcc, we can make them equal either by arbitrarily replacing b with d in the first one or d with b in the second one. It would therefore make no sense to assign different costs to the two substitutions.
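The effect of the symmetry requirement can be checked with a small edit-distance sketch. This is written in Python, is not TraMineR's implementation, and the function name om_distance, the cost values, and the default cost of 2.0 for unlisted substitutions are all assumptions chosen for the illustration.

```python
def om_distance(s, t, sub_cost, indel=1.0, default_sub=2.0):
    """Weighted edit distance with per-pair substitution costs."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel]
        for j, b in enumerate(t, start=1):
            sub = 0.0 if a == b else sub_cost.get((a, b), default_sub)
            curr.append(min(prev[j] + indel,       # delete a
                            curr[j - 1] + indel,   # insert b
                            prev[j - 1] + sub))    # substitute a with b
        prev = curr
    return prev[-1]

symmetric = {("b", "d"): 1.5, ("d", "b"): 1.5}
asymmetric = {("b", "d"): 0.5, ("d", "b"): 1.5}

# Symmetric costs give the same distance in both directions.
print(om_distance("aabcc", "aadcc", symmetric),
      om_distance("aadcc", "aabcc", symmetric))
# Asymmetric costs make the result depend on the order of the arguments.
print(om_distance("aabcc", "aadcc", asymmetric),
      om_distance("aadcc", "aabcc", asymmetric))
```

With asymmetric costs the "distance" between the same two sequences changes with the direction of comparison, which is exactly what the downstream methods listed above cannot handle.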
Hope this helps.

Normalizing a matrix with respect to a constraint

I am doing a project which requires me to normalize a sparse N x N matrix. I read somewhere that we can normalize a matrix so that its eigenvalues lie in [-1, 1] by multiplying it with a diagonal matrix D such that N = D^{-1/2}*A*D^{-1/2}.
But I am not sure what D is here. Also, is there a function in Matlab that can do this normalization for sparse matrices?
It's possible that I am misunderstanding your question, but as it reads it makes no sense to me.
A matrix is just a representation of a linear transformation. Given that a matrix A corresponds to a linear transformation T, any matrix of the form B^{-1} A B (called the conjugate of A by B) for an invertible matrix B corresponds to the same transformation, represented in a different basis. In particular, the eigenvalues of a matrix are the eigenvalues of the linear transformation, so conjugating by an invertible matrix cannot change the eigenvalues.
It's possible that you meant that you want to scale the eigenvectors so that each has unit length. This is a common thing to do, since then the eigenvalues tell you how much a vector of unit length is magnified by the transformation.
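The claim about conjugation can be checked numerically. The sketch below uses Python rather than Matlab and plain 2x2 matrices so that no library is needed; the matrices A and B and the helper names are arbitrary choices for the illustration, and the eigenvalue formula assumes the eigenvalues are real.

```python
import math

def matmul2(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(B):
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [[B[1][1] / det, -B[0][1] / det],
            [-B[1][0] / det, B[0][0] / det]]

def eigenvalues2(M):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(trace * trace - 4.0 * det)  # assumes real eigenvalues
    return sorted([(trace - disc) / 2.0, (trace + disc) / 2.0])

A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, 2.0], [0.0, 1.0]]  # any invertible matrix would do

conjugate = matmul2(matmul2(inverse2(B), A), B)  # B^{-1} A B
print(eigenvalues2(A))
print(eigenvalues2(conjugate))
```

Both calls print the same pair of eigenvalues, even though the entries of the two matrices differ.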
