Compute expectation when multiple vectors are involved [closed] - math

From my understanding, the expectation of a vector (let's say nx1) is equivalent to finding its mean. However if we have two vectors x and y, both of which are nx1, what does it mean to try to find the expectation of the product of these vectors?
e.g:
E[x * y] = ?
Here, are we taking the inner product or the outer product? If I were using Matlab, would I be doing:
E[x' * y]
or
E[x * y']
or
E[x .* y]
I'm not really understanding the intuition behind expectation as applied to the product of vectors (my background is not in mathematics), so if someone could shed light on this for me I would really appreciate it. Thanks!
== EDIT ==
You're right, I wasn't clear. I came across the definition of the covariance where the formula given was:
Cov[X, Y] = E[X * Y] - E[X] * E[Y]
And the part where E[X * Y] came up is what confused me. I should have put this up on a math site, and will next time. Thanks for the help.

As much as I believe this belongs either on a math or statistics site, I'm feeling bored at the moment, so I'll say a few words.
YOU need to define what you are doing, and understand what you want to see. Numbers and vectors, by themselves, are all just that: numbers. There is no meaning without context. I'll argue this is your problem.
For example, you can view a vector as a list of numbers, thus samples from some distribution of a scalar-valued parameter. Thus, my vector might be a list of the temperatures in my house over the course of a day, or of the rainfall for the last week. As such, we can talk about the mean of those measurements. If we had a distribution, we could talk about the expected value of that distribution.
You might also look at a vector as a SINGLE piece of information. It might represent my location on the surface of the earth, so perhaps [latitude, longitude, elevation]. As such, it makes no sense to take the mean of these three pieces of information. However, I might be interested in an average location, taken over many such location measurements over a period of time.
As far as worrying about inner versus outer products, they are confusing you. Instead, think about WHAT these numbers represent and what you need to do with them, and only THEN worry about how to compute what you need.

Following on from #woodchips's answer: when it does make sense to multiply two random variables and take the expectation of the product, in the discrete case it depends on whether your values of X and Y are paired, i.e. whether for each event you have both an x and a y. In that case, to find the expectation of the product you multiply each pair of x and y and take the mean of those products. If X and Y are independent and you just have two vectors of samples with no co-occurrence, the expectation of the product is simply the product of their individual expectations.
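A minimal R sketch of both cases, using made-up paired samples (in the question's Matlab notation, the paired computation is the elementwise E[x .* y]):

    set.seed(1)
    x <- rnorm(1000)
    y <- 2 * x + rnorm(1000)          # paired with x, so not independent

    # Paired case: multiply each (x, y) pair elementwise, then take the mean
    Exy <- mean(x * y)

    # Cov[X, Y] = E[XY] - E[X]E[Y] (population form; R's cov() divides by n - 1)
    Exy - mean(x) * mean(y)

    # Independent case: the expectation of the product is the product of the
    # individual expectations
    mean(x) * mean(rnorm(1000))       # approximately 0 * 0 = 0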

Related

How would you optimize dividing bi variate data in R?

I'm not looking for a specific line of code, just built-in functions or common packages that might help me do the following. Basically, something like "write up some code and use this function". I'm stuck on how to actually optimize: should I use SGD?
I have two variables, X and Y. I want to separate Y into 4 groups so that the L2 loss, that is $\sum_j \sum_{i \in \text{group } j} (x_i - \bar{x}_j)^2$ (the squared deviation of each x from the mean of the x values in its Y group), is minimized subject to the constraint that there are at least n observations in each group.
How would one go about solving this? I'd imagine you can't do this with the optim function? Basically the algo needs to move 3 values around (there are 3 cutoff points for Y) until L2 is minimized subject to n being a certain size.
Thanks
You could try optim and simply add a penalty if the constraints are not satisfied: since you minimise, add zero if all constraints are okay; otherwise a positive number.
If that does not work, since you only look for three cutoff points, I'd probably try a grid search, i.e. compute the objective function for different levels of the cutoff point; throw away those that violate the constraints, and then keep the best solution.
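A rough R sketch of that grid-search idea, assuming x and y are plain numeric vectors and drawing candidate cutoffs from a coarse grid of sample quantiles (all names here are invented):

    # Objective: within-group sum of squares of x, with y cut into 4 groups by
    # the 3 cutoffs in 'cu'; return Inf as a penalty if any group is too small
    split_obj <- function(cu, x, y, n_min) {
      g <- cut(y, breaks = c(-Inf, cu, Inf))
      if (min(table(g)) < n_min) return(Inf)
      sum(tapply(x, g, function(v) sum((v - mean(v))^2)))
    }

    grid_search <- function(x, y, n_min = 10) {
      qs   <- unique(quantile(y, probs = seq(0.05, 0.95, by = 0.05)))
      cand <- combn(qs, 3, simplify = FALSE)   # all candidate cutoff triples
      vals <- vapply(cand, split_obj, numeric(1), x = x, y = y, n_min = n_min)
      list(cutoffs = cand[[which.min(vals)]], value = min(vals))
    }

A finer grid (or optim with the same penalty trick) can then refine the winning triple.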

What's the meaning of "unique up to isomorphism"? [closed]

What's the meaning of "unique up to isomorphism"? To give some context, I came across the phrase while reading about initial algebras.
It seems that "up to" means "to ignore" (sometimes said as "modulo"). Isomorphism means that the objects are the same in some way (with a bidirectional mapping). However, "unique ignoring that they are the same" still perplexes me.
Rather than "unique ignoring they are the same" it is more like "unique (ignoring irrelevant differences that are no real differences in the context that we are discussing here)".
For example, if you are considering geometric figures, an equilateral triangle is "the same" as another equilateral triangle of twice the size that is upside-down, so you can count this as a single unique figure.
Suppose I have a set of numbers {0, 1, 2, ..., 11} under addition modulo 12 or a regular 12-gon under the rotations generated by a rotation of 30 degrees. Both of these sets are different, but the corresponding algebraic structure is the same (it's the cyclic group on 12 elements). There's an isomorphism between them in that addition by "1" modulo 12 corresponds to rotation (say clockwise) by 30 degrees.
It's awkward to say "look at this unique structure" when it has clearly shown up in at least two distinct settings. But somehow the distinguishing features between these two examples are non-essential, in that they disappear under isomorphism while the algebraic structure is preserved. Hence we concede "It's unique, up to isomorphism."
The background notion is that of equivalence relations. An equivalence relation on a set S is a relation ~ which shares with equality the three properties of symmetry (x ~ y => y ~ x), reflexivity (x ~ x for all x), and transitivity (x ~ y ~ z => x ~ z). They are ubiquitous, and familiar, in mathematics. For example 1/2 is equivalent to 5/10 even though 1/2 is manifestly not identical to 5/10. Whenever you have an equivalence relation you can have objects which are the same from one perspective but different from another. For example, it is a common undergraduate programming exercise to implement sets as lists. As a set you wouldn't distinguish between the set {1,2,3} and the set {2,3,1}, but if you represent them as lists, you can distinguish [1,2,3] from [2,3,1]. These latter are different qua lists but the same qua sets.
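That sets-versus-lists distinction is easy to demonstrate in code, for instance in R:

    identical(c(1, 2, 3), c(2, 3, 1))   # FALSE: different qua (ordered) lists
    setequal(c(1, 2, 3), c(2, 3, 1))    # TRUE:  the same qua sets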
Isomorphism is an equivalence relation on algebraic structures. To say that initial algebras are unique up to isomorphism means that they are all equivalent to each other with respect to the equivalence relation of isomorphism. #AlfredRossi's example is an excellent illustration of the way this plays out in abstract algebra.

Solving a system of linear equations in a non-square matrix [closed]

I have a system of linear equations that makes up an NxM matrix (i.e. non-square) which I need to solve, or at least attempt to solve in order to show that there is no solution to the system (more likely than not, there will be no solution).
As I understand it, if my matrix is not square (over- or under-determined), then no exact solution can be found - am I correct in thinking this? Is there a way to transform my matrix into a square matrix in order to calculate the determinant, apply Gaussian elimination, Cramer's rule, etc.?
It may be worth mentioning that the coefficients of my unknowns may be zero, so in certain, rare cases it would be possible to have a zero-column or zero-row.
Whether or not your matrix is square is not what determines the solution space. It is the rank of the matrix compared to the number of columns that determines that (see the rank-nullity theorem). In general you can have zero, one or an infinite number of solutions to a linear system of equations, depending on its rank and nullity relationship.
To answer your question, however, you can use Gaussian elimination to find the rank of the matrix and, if this indicates that solutions exist, find a particular solution x0 and the nullspace Null(A) of the matrix. Then, you can describe all your solutions as x = x0 + xn, where xn represents any element of Null(A). For example, if a matrix has full column rank its nullspace will be trivial (just the zero vector) and the linear system will have at most one solution. If its rank is also equal to the number of rows, then you have exactly one solution. If the nullspace is of dimension one, then your solution will be a line that passes through x0, any point on that line satisfying the linear equations.
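A small R sketch of that recipe using the rank-revealing QR decomposition (the matrix here is just an illustration; qr.coef marks free variables with NA):

    A <- matrix(c(1, 0, 0,
                  0, 0, 1), nrow = 2, byrow = TRUE)  # 2 equations, 3 unknowns
    b <- c(1, 1)

    dec <- qr(A)
    dec$rank           # rank 2 < 3 columns, so the nullspace has dimension 1
    qr.coef(dec, b)    # a particular solution x0, with NA for the free variable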
Ok, first off: a non-square system of equations can have an exact solution
[ 1 0 0 ] [x]   [1]
[ 0 0 1 ] [y] = [1]
          [z]
clearly has a solution (actually, it has a 1-dimensional family of solutions: x = z = 1, y arbitrary). Even if the system is overdetermined instead of underdetermined, it may still have a solution:
[ 1 0 ] [x]   [1]
[ 0 1 ] [y] = [1]
[ 1 1 ]       [2]
(x=y=1). You may want to start by looking at least squares solution methods, which find the exact solution if one exists, and "the best" approximate solution (in some sense) if one does not.
Take Ax = b, with A having m columns and n rows. We are not guaranteed one and only one solution; in many cases that is because we have more equations than unknowns (n bigger than m). This could be because of repeated measurements, which we actually want because we are cautious about the influence of noise.
If we observe that we cannot find a solution, that actually means there is no way to reach b by travelling through the column space spanned by A (as Ax only ever takes a combination of the columns).
We can however ask for the point in the space spanned by A that is nearest to b. How can we find such a point? Walking on a plane, the closest you can get to a point outside it is to walk until you are directly below it. Geometrically speaking, this is when our line of sight is perpendicular to the plane.
Now that is something we can formulate mathematically. A perpendicular vector reminds us of orthogonal projections, and that is what we are going to use. The simplest case would be to project onto a single vector, a.T b, but we can apply the whole matrix at once: A.T b.
For our equation let us apply the transformation to both sides: A.T Ax = A.T b.
The last step is to solve for x by taking the inverse of A.T A:
x = (A.T A)^-1 * A.T b
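In R the normal-equations recipe is a one-liner, shown here on the overdetermined example from the previous answer (for numerical stability you would normally prefer qr.solve(A, b) or lm over an explicit inverse):

    A <- matrix(c(1, 0,
                  0, 1,
                  1, 1), nrow = 3, byrow = TRUE)
    b <- c(1, 1, 2)

    x <- solve(crossprod(A), crossprod(A, b))   # x = (A.T A)^-1 * A.T b
    x                                           # (1, 1), the exact solution here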
The least squares recommendation is a very good one.
I'll add that you can try a singular value decomposition (SVD) that will give you the best answer possible and provide information about the null space for free.
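Sketching that in R, reusing A and b from the snippet above: the singular values reveal the numerical rank, and the right singular vectors belonging to (near-)zero singular values span the nullspace.

    s <- svd(A)
    sum(s$d > 1e-10 * s$d[1])            # numerical rank: non-zero singular values
    # Least-squares solution via the SVD (this form assumes full column rank;
    # in the rank-deficient case, drop the zero singular values first)
    s$v %*% (crossprod(s$u, b) / s$d)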

Mathematical library to compare similarities in graphs of data for a high-level language (e.g. Javascript)?

I'm looking for something that I guess is rather sophisticated and might not exist publicly, but hopefully it does.
I basically have a database with lots of items which all have values (y) that correspond to other values (x). Eg. one of these items might look like:
x | 1 | 2 | 3 | 4 | 5
y | 12 | 14 | 16 | 8 | 6
This is just a random example. Now, there are thousands of these items, all with their own sets of x and y values. The spacing between one x and the next is not fixed and may differ for every item.
What I'm looking for is a library where I can plug in all these sets of Xs and Ys and tell it to return things like the most common item (sets of x and y that follow a comparable curve / progression), and the ability to check whether a certain set is at least x% comparable with another set.
By comparable I mean the slope of the curve if you were to draw a graph of the data. So, not the actual static values but rather the detection of events, such as a sharp increase followed by a slow decrease, etc.
Due to my limited background in mathematics I'm not quite sure what the thing I'm looking for is called, and thus have trouble explaining what I need. Hopefully I gave enough pointers for someone to point me in the right direction.
I'm mostly interested in a library for javascript, but if there is no such thing any library would help, maybe I can try to port what I need.
About Markov Cluster(ing) again, of which I happen to be the author, and your application. You mention you are interested in trend similarity between objects. This is typically computed using Pearson correlation. If you use the mcl implementation from http://micans.org/mcl/, you'll also obtain the program 'mcxarray'. This can be used to compute pearson correlations between e.g. rows in a table. It might be useful to you. It is able to handle missing data - in a simplistic approach, it just computes correlations on those indices for which values are available for both. If you have further questions I am happy to answer them -- with the caveat that I usually like to cc replies to the mcl mailing list so that they are archived and available for future reference.
What you're looking for is an implementation of Markov clustering. It is often used for finding groups of similar sequences. Porting it to Javascript, well... if you're really serious about this analysis, drop Javascript as soon as possible and move to R. Javascript is not meant for this kind of calculation, and it is far too slow for it. R is a statistical package with a great deal already implemented. It is also designed specifically for very speedy matrix calculations, and most of the language is vectorized (meaning you don't need for-loops to apply a function over a vector of values; it happens automatically).
For the Markov clustering, check http://www.micans.org/mcl/
An example of an implementation : http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi
Now you also need to define a "distance" between your sets. As you are interested in the events and not the values, you could give every item an extra attribute: a vector of the differences y[i] - y[i-1] (in R: diff(y)). The distance between two items can then be calculated as the sum of squared differences between these two difference vectors.
This allows you to construct a distance matrix of your items, and on that one you can call the mcl algorithm. Unless you work on linux, you'll have to port that one.
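A minimal R sketch of building that distance matrix, under the simplifying assumption that all items share the same x grid so their diff(y) vectors line up ('ys' is an invented name for a list of the y vectors):

    profiles <- t(sapply(ys, diff))    # one row of differences y[i] - y[i-1] per item
    D <- as.matrix(dist(profiles))^2   # sum of squared differences between items
    # D is the distance matrix to hand to the clustering algorithm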
What you want to do is ANOVA, or ANalysis Of VAriance. If you run the numbers through an ANOVA test, it'll give you information about the dataset that will help you compare one set to another. I was unable to locate a Javascript library that performs ANOVA, but there are plenty of programs capable of it. Excel can perform ANOVA via a plugin. R is a free stats package that can also perform ANOVA.
Hope this helps.
Something simple is (assuming all the graphs have 5 points, and x = 1,2,3,4,5 always)
Take u1 = the first point of y, i.e. y1
Take u2 = y2 - y1
...
Take u5 = y5 - y4
Now consider the vector u as a point in 5-dimensional space. You can use simple clustering algorithms, like k-means.
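In R that could look like the following, with Y an assumed matrix holding one graph per row (columns y1..y5):

    U  <- cbind(Y[, 1], t(apply(Y, 1, diff)))  # u1 = y1, u2..u5 = differences
    cl <- kmeans(U, centers = 3)               # 3 clusters is an arbitrary choice
    cl$cluster                                 # cluster assignment per graph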
EDIT: You should not aim for something too complicated as long as you stay with Javascript. If you are willing to use Java, I can suggest something based on PCA (requiring singular value decomposition, which is too complicated to implement efficiently in JS).
Basically, it goes like this: as before, take a (possibly large) linear representation of the data, perhaps differences of components of x and of y, or absolute values. For instance you could take
u = (x1, x2 - x1, ..., x5 - x4, y1, y2 - y1, ..., y5 - y4)
You compute the vector u for each sample. Call ui the vector u for the ith sample. Now, form the matrix
M_{ij} = dot product of ui and uj
and compute its SVD. Now, the N most significant singular values (i.e. those above some "similarity threshold") give you N clusters.
The corresponding columns of the matrix U in the SVD give you an orthonormal family B_k, k = 1..N. The squared ith component of B_k gives you the probability that the ith sample belongs to cluster K.
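A rough R rendering of that procedure; the 0.1 threshold is an arbitrary stand-in for the "similarity threshold", and U is the assumed matrix with one feature vector u per row:

    M <- U %*% t(U)                      # M[i, j] = dot product of u_i and u_j
    s <- svd(M)
    N <- sum(s$d > 0.1 * s$d[1])         # number of "significant" singular values
    memb <- s$u[, 1:N, drop = FALSE]^2   # squared components: cluster affinities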
If it is OK to use Java, you really should have a look at Weka. It is possible to access all of its features via Java code. Maybe you'll find a Markov clustering there, but if not, they have a lot of other clustering algorithms, and it is really easy to use.

How can I determine dice sum probabilities? [closed]

In trying to solve a particular Project Euler question, I ran into difficulties with a particular mathematical formula. According to this web page (http://www.mathpages.com/home/kmath093.htm), the formula for determining the probability for rolling a sum, T, on a number of dice, n, each with number of sides, s, each numbered 1 to s, can be given as follows:
[equation image, link now dead: the general formula as transcribed from mathpages]
After I started getting nonsensical answers in my program, I started stepping through, and tried this for some specific values. In particular, I decided to try the formula for a sum T=20, for n=9 dice, each with s=4 sides. As the sum of 9 4-sided dice should give a bell-like curve of results, ranging from 4 to 36, a sum of 20 seems like it should be fairly (relatively speaking) likely. Dropping the values into the formula, I got:
[equation image, link now dead: the formula with T=20, n=9, s=4 substituted]
Since j runs from 0 to 7, we must sum over all j... but for most of these values the result is 0, because at least one of the choose terms evaluates to 0. The only values of j that seem to return non-zero results are 3 and 4. Dropping 3 and 4 into this formula, I got
[equation image, link now dead: the j=3 and j=4 terms]
Which, when simplified, seemed to go to:
[equation image, link now dead: the simplified expression]
which eventually simplifies down to ~30.75. Now, as a probability, of course, 30.75 is way off...the probability must be between 0 and 1, so something has gone terribly wrong. But I'm not clear what it is.
Could I be misunderstanding the formula? Very possible, though I'm not clear at all where the breakdown would be occurring. Could it be transcribed wrong on the web page? Also possible, but I've found it difficult to find another version of it online to check it against. Could I just be making a silly math error? Also possible... though my program comes up with a similar value, so I think it's more likely that I'm misunderstanding something.
Any hints?
(I would post this on MathOverflow.com, but I don't think it even comes close to being the kind of "postgraduate-level" mathematics that is required to survive there.)
Also: I definitely do not want the answer to the Project Euler question, and I suspect that other people who may stumble across this would feel the same way. I'm just trying to figure out where my math skills are breaking down.
According to mathworld (formula 9 is the relevant one), the formula from your source is wrong.
The correct formula is supposed to be n choose j, not n choose T. That'll really reduce the size of the values within the summation.
The mathworld formula uses k instead of j and p instead of T:

P(p, n, s) = (1/s^n) * sum_{k=0}^{floor((p-n)/s)} (-1)^k * C(n, k) * C(p - s*k - 1, n - 1)
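A small R implementation of the corrected formula, which you can sanity-check against the example in the question:

    dice_sum_prob <- function(T, n, s) {
      k <- 0:floor((T - n) / s)
      sum((-1)^k * choose(n, k) * choose(T - s * k - 1, n - 1)) / s^n
    }

    dice_sum_prob(20, 9, 4)                          # ~0.090, a sensible value
    sum(sapply(9:36, dice_sum_prob, n = 9, s = 4))   # 1, as probabilities must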
Take a look at the Dice article on Wikipedia. The formula there looks almost the same but has one difference; I think it will solve your problem.
I'm going to have to show my ignorance here... Isn't 9 choose 20 = 0? More generally, isn't n choose T going to be 0 whenever T > n, which is almost always the case here? Perhaps I'm reading this formula incorrectly (I'm not a math expert), but looking at de Moivre's work, I'm not sure how this formula was derived; it seems slightly off. You might try working up from de Moivre's original math, page 39, in the lemma.
