Array-processing: Eigenstructure of the Spatial Covariance Matrix - math

I've been staring at the following underlined statement from this book for hours, and I cannot for the life of me figure out how it can be right:
For some definitions:
is an r x r matrix (we may ignore its contents for this purpose).
A is an N x r matrix defined as the following matrix of column vectors, where each vector is N elements long:
First of all I'm convinced that when they write:
they really mean:
otherwise it simply would not make sense from the start. My confusion is when they say is a linear combination of the column vectors of A.
At first I thought maybe it just wasn't obvious to me, so I started doing the calculations as an exercise.
My calculation (please don't make me type all this into a text equation editor):
I THINK my calculation is correct, but there was a lot to keep track of, so...
I did not do the multiplication with because it's trivial, and it doesn't solve the problem.
How can the products of different elements (complex conjugated no less) of the vectors in A end up as a linear combination of the columns of A?
Am I forgetting something fundamental here? Maybe something to do with the fact that is an eigenvector of ...?


generating completely new vector based on other vectors

Assume I have four vectors (v1, v2, v3, v4), and I want to create a new vector (vec_new) that is not close to any of them. I was thinking about interpolation and extrapolation. Do you think they are suitable? Do they also apply to vectors, and can they generate a vector of, let's say, 300 dimensions? Another possible option would be a transformation matrix, but I am not sure it fits my problem. I think averaging and concatenation are not good options, as the result might be close to some of those four vectors.
Concretely, imagine I have divided my vectors into two categories. I need to find a vector that belongs to none of those categories.
Any other ideas?
Per my comment, I wouldn't expect the creation of synthetic "far away" examples to be useful for realistic goals.
Even things like word antonyms are not maximally cosine-dissimilar from each other, because among the realm of all word-meaning-possibilities, antonyms are quite similar to each other. For example, 'hot' and 'cold' are considered opposites, but are the same kind of word, describing the same temperature-property, and can often be drop-in replacements for each other in the same sentences. So while they may show an interesting contrast in word-vector space, the "direction of difference" isn't going to be through the origin -- as would create maximal cosine-dissimilarity.
And in classification contexts, even a simple 2-category classifier will need actual 'negative' examples. With only positive examples, the 'vector space' won't necessarily model anything about hypothesized-but-not-actually-present negative examples. (It's nearly impossible to divide the space into two categories without training examples showing the real "boundaries".)
Still, there's an easy way to make a vector that is maximally dissimilar to another single vector: negate it. That creates a vector that's in the exact opposite direction from the original, and thus will have a cosine-similarity of -1.0.
If you have a number of vectors against which you want to find a maximally-dissimilar vector, I suspect you can't do much better than negating the average of all the vectors. That is, average the vectors, then negate that average-vector, to find the vector that's pointing exactly-opposite the average.
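To make the negate-the-average idea concrete, here is a minimal sketch in plain Python. The four 3-dimensional vectors are made-up illustration data (not 300-dimensional word vectors), and the helper names `cosine` and `negated_mean` are my own.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def negated_mean(vectors):
    """Average the vectors component-wise, then flip the sign."""
    n = len(vectors)
    return [-sum(column) / n for column in zip(*vectors)]

# Hypothetical example data: four vectors pointing in broadly similar directions.
vectors = [
    [1.0, 2.0, 0.5],
    [0.8, 1.5, 1.0],
    [1.2, 2.5, 0.0],
    [1.0, 1.0, 1.0],
]

vec_new = negated_mean(vectors)
sims = [cosine(vec_new, v) for v in vectors]
print(sims)  # one similarity per original vector
```

With these particular vectors every similarity comes out negative, but note that only the similarity to the average itself is exactly -1.0; the similarity to each individual vector depends on how spread out the originals are.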
Good luck!

closed/fixed: Interpretation of basic R code

I have a basic question regarding the R programming language.
I'm at a beginner's level and I wish to understand the meaning behind two lines of code I found online. Here is the code: [1:(n-k)])[(k+1):n])
... where y and n are given. I do understand that the results are transformed into a data frame by the function, but what about the rest? I'm still a beginner, so pardon me if this question is off-topic or irrelevant in this forum. Thank you in advance, I appreciate every answer :)
Looks like you understand the function so let's look at what is happening inside of it. We're looking at y[1:(n-k)]. Here, y is a vector which is a collection of data points of the same type. For example:
> y <- c(1,2,3,4,5,6)
Try running that and then calling back y. What you get are those numbers listed out. Now, consider the case you want to just call out the number 1 in that vector. How would you do that? Well, this is where the brackets come into play. If you wanted to just call the number 1 in y:
> y[1]
[1] 1
Therefore, the brackets are a way of calling out or indexing specific items in the vector. Note that the indexing starts at the value 1 and goes up to the number of items in the vector, or length. One last thing before we go back to the example you gave. What if we want to index the numbers 1, 2, and 3 from the vector but not the rest?
> y[1:3]
[1] 1 2 3
This is where the colon comes into play. It allows us to reference a subset of the numbers. However, it will reference all the numbers between the index left of the colon and right of it. Try this out for yourself in R! Play around and see what happens.
Finally, going back to your example:
> y[1:(n-k)]
How would this work based on what we discussed? The colon means that we are indexing all values in the vector y between two index values, namely the numbers to the left and right of the colon. So we are asking R to give us the values from the first position (index 1) to position (n-k). It is therefore important to know what n and k are. If n is 4 and k is 1, then the command becomes:
> y[1:3]
[1] 1 2 3
The same logic can apply to the second command in your question. Essentially, R is picking out different numbers from a vector y and multiplying them together.
Hope this helps. The best way to learn R is to play around with a command, throw different numbers at it, guess what will happen, and then see what happens!

Finding a closest looking segment of data in another sequence

I am doing image processing and came across a situation where I have to compare two vectors and find an instance of the smaller vector in the larger one.
Say the two vectors are A, with 100 elements (or entries),
and B, with 10 elements. B is a model, and it may not be present exactly as it is in the vector A. I can compare 10 elements at a time and compute the difference. In the ideal case B is present somewhere and the difference is zero. Otherwise a minimum will result at some location, and I am missing that location.
Please help me with an algorithm so that I can find B's closest instance in A.
What you are looking for is the cross-correlation function. The peak of the cross-correlation of the two vectors will be the point where vector B is most similar to vector A.
You may want to read the explanation of how it is implemented in MATLAB HERE, as it gives an easier explanation of how this operation can be implemented in software.
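As a rough illustration, here is a small pure-Python sketch of a sliding-window search. It uses the normalized (mean-removed) cross-correlation rather than the raw one, which is my own choice so that windows with large values do not win just because of their amplitude. The function name and the test data are hypothetical: B is placed at index 40 of an otherwise flat A, with a small alternating disturbance added.

```python
import math

def normalized_xcorr_match(a, b):
    """Return (index, score) of the window of a that best matches b.

    The score is the normalized cross-correlation of the mean-removed
    window and template, so it lies between -1 and 1.
    """
    m = len(b)
    mean_b = sum(b) / m
    b0 = [x - mean_b for x in b]
    norm_b = math.sqrt(sum(x * x for x in b0))
    best_index, best_score = None, -math.inf
    for i in range(len(a) - m + 1):
        window = a[i:i + m]
        mean_w = sum(window) / m
        w0 = [x - mean_w for x in window]
        norm_w = math.sqrt(sum(x * x for x in w0))
        if norm_w == 0.0 or norm_b == 0.0:
            continue  # flat window or flat template: correlation is undefined
        score = sum(p * q for p, q in zip(w0, b0)) / (norm_w * norm_b)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score

# Hypothetical data: a 10-element model B hidden in a 100-element A.
B = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 1.0, 3.0, 0.0]
A = [0.0] * 100
for j, value in enumerate(B):
    A[40 + j] = value + 0.1 * (-1) ** j  # imperfect copy of B

index, score = normalized_xcorr_match(A, B)
print(index, round(score, 3))  # the best window starts at index 40 here
```

The approach from the question (sum of squared differences per window, then take the position of the minimum) is just as valid; the only missing piece there is remembering the index at which the minimum occurs, as `best_index` does here.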

R or MATLAB: permute a large sparse matrix into a block diagonal matrix

I have a large sparse matrix, and I want to permute its rows or columns to turn the original matrix into a block diagonal matrix. Does anyone know which functions in R or MATLAB can do this? Thanks a lot.
I'm not really set up to test this, but for a matrix m I would try:
p = symrcm(m);
block_m = m(p,p);
If that doesn't work, look through the other functions listed in help sparfun to see if any will help you out.
The seriation package in R has a number of tools for problems related to this one.
Not exactly sure if this is what you want, but in MATLAB this is what I have used in the past. Probably not the most elegant solution.
I go from sparse to full and then chop the thing into square blocks.
blockedmatrix = mat2cell(A, (n*ones(1,size(A,1)/n)), ...
(n*ones(1,size(A,1)/n))); %found somewhere on internetz
This returns a cell, where each entry is of size nxn.
It's easy to extract the blocks of interest, manipulate them, and then restore them to a matrix with cell2mat.
Maybe a bit late to the game, but since there are available commands, here is a simple one. If you have a matrix H and the block diagonal form is needed, you can obtain it through the following lines (MATLAB):
[p,q] = dmperm(H);
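which computes the Dulmage-Mendelsohn permutation. Indexing with the result, H(p,q), gives the permuted matrix. Note that this form is block upper triangular in general; it is only block diagonal if the off-diagonal blocks happen to be empty.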
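If the matrix really does split into independent blocks, one way to see what such a permutation has to do is to treat rows and columns as nodes of a graph, connect row i and column j whenever entry (i, j) is nonzero, and group rows and columns by connected component. The following is only a sketch of that idea in plain Python, not what any of the functions above do internally. It assumes the matrix is given as a list of (row, col) positions of its nonzeros; all names are made up for illustration.

```python
def block_diagonal_permutation(n_rows, n_cols, nonzeros):
    """Return (row_perm, col_perm, row_labels, col_labels).

    Rows and columns are grouped by connected component of the
    bipartite row/column graph, using a simple union-find.
    """
    # Nodes 0..n_rows-1 are rows; nodes n_rows..n_rows+n_cols-1 are columns.
    parent = list(range(n_rows + n_cols))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in nonzeros:
        root_i, root_j = find(i), find(n_rows + j)
        if root_i != root_j:
            parent[root_i] = root_j

    # Number the components in order of first appearance.
    labels = {}

    def label(x):
        return labels.setdefault(find(x), len(labels))

    row_labels = [label(i) for i in range(n_rows)]
    col_labels = [label(n_rows + j) for j in range(n_cols)]

    # Stable sort keeps the original order inside each block.
    row_perm = sorted(range(n_rows), key=lambda i: row_labels[i])
    col_perm = sorted(range(n_cols), key=lambda j: col_labels[j])
    return row_perm, col_perm, row_labels, col_labels

# Hypothetical 4x4 example: rows/columns {0, 2} and {1, 3} form two blocks.
nonzeros = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1), (1, 3), (3, 1), (3, 3)]
row_perm, col_perm, row_labels, col_labels = block_diagonal_permutation(4, 4, nonzeros)
print(row_perm, col_perm)
```

Reordering the rows by `row_perm` and the columns by `col_perm` places each component in one contiguous block. If the graph has a single connected component, no permutation can make the matrix block diagonal, and you would need a clustering or seriation method that tolerates some off-block entries.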

n-gram sentence similarity with cosine similarity measurement

I have been working on a project about sentence similarity. I know it has been asked many times on SO, but I want to know whether my problem can be solved with the method I am using, or whether I should change my approach. Roughly speaking, the system is supposed to split an article into sentences and find similar sentences among the other articles that are fed to the system.
I am using cosine similarity with tf-idf weights, and this is how I did it.
1- First, I split all the articles into sentences, then I generate trigrams for each sentence and sort them (should I?).
2- I compute the tf-idf weights of the trigrams and create vectors for all sentences.
3- I calculate the dot product and the magnitudes of the original sentence and of the sentence to be compared. Then I calculate the cosine similarity.
However, the system does not work as I expected. Here, I have some questions in my mind.
As far as I have read about tf-idf weights, I guess they are more useful for finding similar "documents". Since I am working on sentences, I modified the algorithm a little by changing the definitions of tf and idf (instead of document-based definitions, I tried to come up with sentence-based ones).
tf = number of occurrences of trigram in sentence / number of all trigrams in sentence
idf = number of all sentences in all articles / number of sentences where trigram appears
Do you think it is ok to use such a definition for this problem?
Another question: I have seen normalization mentioned many times in connection with cosine similarity. I am guessing that this is important because the trigram vectors might not be the same size (which they rarely are in my case). If one trigram vector has size x and the other x+1, then I treat the first vector as if it had size x+1, with the last value being 0. Is this what is meant by normalization? If not, how do I do the normalization?
Besides these, if I have chosen the wrong algorithm, what else can be used for such a problem (preferably with an n-gram approach)?
Thank you in advance.
I am not sure why you are sorting the trigrams for every sentence. All you need to care about when computing cosine similarity is whether the same trigram occurred in the two sentences, and with what frequencies. Conceptually speaking, you define a fixed and common order among all possible trigrams. Remember that the order has to be the same for all sentences. If the number of possible trigrams is N, then for each sentence you obtain a vector of dimensionality N. If a certain trigram does not occur, you set the corresponding value in the vector to zero. You don't really need to store the zeros, but you have to take care of them when you define the dot product.
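A small sketch of what "not storing the zeros" can look like: each sentence becomes a dictionary keyed by trigram, so the trigram itself plays the role of the fixed common index and no sorting or padding is needed. The tf and idf below follow the sentence-based ratio definitions from the question (note that many formulations take the logarithm of the idf ratio). The tokenization by whitespace, the function names, and the example sentences are my own assumptions.

```python
from collections import Counter

def trigrams(sentence):
    """Word trigrams of a sentence, using a naive whitespace split."""
    words = sentence.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def tfidf_vectors(sentences):
    """One sparse vector (dict: trigram -> weight) per sentence."""
    grams = [trigrams(s) for s in sentences]
    # Number of sentences in which each trigram appears.
    sentence_freq = Counter(g for gs in grams for g in set(gs))
    n_sentences = len(sentences)
    vectors = []
    for gs in grams:
        counts = Counter(gs)
        total = len(gs)
        vectors.append({
            g: (c / total) * (n_sentences / sentence_freq[g])  # tf * idf
            for g, c in counts.items()
        })
    return vectors

def sparse_dot(u, v):
    """Dot product that only visits trigrams present in both vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v[g] for g, w in u.items() if g in v)

# Hypothetical example sentences.
sentences = [
    "the quick brown fox jumps",
    "the quick brown fox sleeps",
    "a completely different sentence here",
]
vectors = tfidf_vectors(sentences)
print(sparse_dot(vectors[0], vectors[1]))  # shares two trigrams
print(sparse_dot(vectors[0], vectors[2]))  # shares none, so the product is 0
```

Missing trigrams are treated as zeros implicitly: they simply contribute nothing to the sum.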
Having said that, trigrams are not a good choice, as the chances of a match are much lower. For high k you will get better results from bags of k consecutive words rather than k-grams. Note that the ordering does not matter inside a bag; it's a set. You are using k-grams with k=3, but that seems to be on the high side, especially for sentences. Either drop down to bigrams or use bags of different lengths, starting from 1. Preferably use both.
I am sure you have noticed that sentences that do not share an exact trigram have 0 similarity in your method. Bags of k words will alleviate the situation somewhat but not solve it completely, because the sentences still need to share actual words. Two sentences may be similar without using the same words. There are a couple of ways to fix this: either use LSI (Latent Semantic Indexing), or cluster the words and use the cluster labels to define your cosine similarity.
In order to compute the cosine similarity between vectors x and y you compute the dot product and divide by the norms of x and y.
The 2-norm of the vector x can be computed as square root of the sum of the components squared. However you should also try your algorithm out without any normalization to compare. Usually it works fine, because you are already taking care of the relative sizes of the sentences when you compute the term frequencies (tf).
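In code, the normalization is just the division by the two norms; it has nothing to do with padding the vectors to the same length. Here is a minimal sketch for sparse vectors stored as dictionaries (the storage format, the function name, and the toy weights are my own assumptions):

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * y[k] for k, w in x.items() if k in y)
    norm_x = math.sqrt(sum(w * w for w in x.values()))  # 2-norm of x
    norm_y = math.sqrt(sum(w * w for w in y.values()))  # 2-norm of y
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_x * norm_y)

# Toy weights: y is x scaled by 2, z shares no keys with x.
x = {"a": 1.0, "b": 2.0}
y = {"a": 2.0, "b": 4.0}
z = {"c": 3.0}

print(cosine_similarity(x, y))  # same direction, so close to 1
print(cosine_similarity(x, z))  # no shared keys, so 0
```

Because of the division by the norms, scaling a vector does not change the result, which is why sentence length matters less than it would for a raw dot product.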
Hope this helps.
