Solving a system of linear equations in a non-square matrix

I have a system of linear equations that makes up an NxM matrix (i.e. non-square) which I need to solve - or at least attempt to solve in order to show that there is no solution to the system. (More likely than not, there will be no solution.)
As I understand it, if my matrix is not square (over- or under-determined), then no exact solution can be found - am I correct in thinking this? Is there a way to transform my matrix into a square matrix in order to calculate the determinant, apply Gaussian elimination, Cramer's rule, etc.?
It may be worth mentioning that the coefficients of my unknowns may be zero, so in certain, rare cases it would be possible to have a zero-column or zero-row.

Whether or not your matrix is square is not what determines the solution space. It is the rank of the matrix compared to the number of columns that determines that (see the rank-nullity theorem). In general you can have zero, one or an infinite number of solutions to a linear system of equations, depending on its rank and nullity relationship.
To answer your question, however, you can use Gaussian elimination to find the rank of the matrix and, if this indicates that solutions exist, find a particular solution x0 and the nullspace Null(A) of the matrix. Then, you can describe all your solutions as x = x0 + xn, where xn represents any element of Null(A). For example, if the matrix has full column rank, its nullspace is trivial (it contains only the zero vector) and the linear system has at most one solution; if its rank is also equal to the number of rows, you have exactly one solution. If the nullspace is of dimension one, then your solution will be a line that passes through x0, any point on that line satisfying the linear equations.
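As a concrete illustration of this recipe, here is a minimal NumPy sketch with a made-up 2x3 system (it uses the library's SVD-based rank and least-squares routines rather than hand-rolled Gaussian elimination):
import numpy as np

# Hypothetical under-determined system A x = b (2 equations, 3 unknowns)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([6.0, 15.0])

rank = np.linalg.matrix_rank(A)             # rank via SVD, robust to round-off
x0, *_ = np.linalg.lstsq(A, b, rcond=None)  # a particular solution (exact here)

# Null space: right singular vectors belonging to (numerically) zero singular values
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:]                      # each row xn satisfies A @ xn ~= 0

print("rank:", rank)                        # 2 -> solutions exist, nullity = 1
print("particular solution x0:", x0)
print("null space basis:", null_basis)
# Every solution has the form x = x0 + (any linear combination of null_basis rows)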

OK, first off: a non-square system of equations can have an exact solution:
[ 1 0 0 ] [x]   [1]
[ 0 0 1 ] [y] = [1]
          [z]
clearly has a solution (actually, it has a 1-dimensional family of solutions: x = z = 1 with y free). Even if the system is overdetermined instead of underdetermined it may still have a solution:
[ 1 0 ] [x]   [1]
[ 0 1 ] [y] = [1]
[ 1 1 ]       [2]
(x=y=1). You may want to start by looking into least-squares solution methods, which find the exact solution if one exists, and "the best" approximate solution (in some sense) if one does not.
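To see this numerically, here is a small NumPy sketch (mine, not the answerer's) that runs both example systems through a least-squares solver; because exact solutions exist, the residuals come out as zero:
import numpy as np

# Under-determined example from above
A1 = np.array([[1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])
b1 = np.array([1.0, 1.0])
x1, *_ = np.linalg.lstsq(A1, b1, rcond=None)
print(x1, np.allclose(A1 @ x1, b1))   # [1. 0. 1.] True  (x=z=1, y chosen as 0)

# Over-determined example from above
A2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
b2 = np.array([1.0, 1.0, 2.0])
x2, *_ = np.linalg.lstsq(A2, b2, rcond=None)
print(x2, np.allclose(A2 @ x2, b2))   # [1. 1.] True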

Take Ax = b, with A having m columns and n rows. We are not guaranteed to have one and only one solution; in many cases that is because we have more equations than unknowns (n greater than m). This can come from repeated measurements, which we may actually want because we are cautious about the influence of noise.
If we observe that we cannot find a solution, it means there is no way to reach b while travelling within the column space spanned by A (since Ax is always a combination of the columns of A).
We can, however, ask for the point in the space spanned by A that is nearest to b. How can we find such a point? Walking on a plane, the closest you can get to a point outside it is to walk until you are right below it. Geometrically speaking, this is when your line of sight is perpendicular to the plane.
Now that is something we can formulate mathematically. A perpendicular vector reminds us of orthogonal projections, and that is what we are going to use. For a single column a, the projection involves a.T b; for the whole matrix we take A.T b.
Applying this transformation to both sides of our equation gives A.T A x = A.T b.
The last step is to solve for x by inverting A.T A (which is possible when A has full column rank):
x = (A.T A)^-1 * A.T b
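Here is a minimal sketch of these normal equations in NumPy (the data are made up; this assumes A has full column rank so that A.T A is invertible, and in practice np.linalg.lstsq or a QR/SVD-based routine is numerically safer than forming the inverse explicitly):
import numpy as np

# Made-up over-determined system: 3 equations, 2 unknowns, slightly noisy b
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 2.1])

# Normal equations A.T A x = A.T b (solve, rather than invert, for stability)
x = np.linalg.solve(A.T @ A, A.T @ b)
print("normal-equations solution:", x)

# The library least-squares routine gives the same answer
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print("lstsq solution:           ", x_ref)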

The least squares recommendation is a very good one.
I'll add that you can try a singular value decomposition (SVD) that will give you the best answer possible and provide information about the null space for free.
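As a rough sketch of that idea (my own example matrix, not from the question): the pseudo-inverse built from the SVD gives the best-possible (minimum-norm least-squares) answer, and the trailing right singular vectors give the null space at no extra cost.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # deliberately rank-deficient
b = np.array([6.0, 12.0])

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s.max()
rank = int(np.sum(s > tol))

x_best = np.linalg.pinv(A) @ b      # minimum-norm least-squares solution
null_space = Vt[rank:]              # basis of Null(A), "for free" from the SVD

print("rank:", rank)
print("best answer x:", x_best, "residual:", np.linalg.norm(A @ x_best - b))
print("null space basis:\n", null_space)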

Related

Why do we calculate the margin in SVM?

I'm learning SVM (Support Vector Machine): there are several points that remain ambiguous (linearly separable, primal case).
I know how to find the weight w and the hyperplane equation, but if we can deduce the support vectors from it, why do we calculate the margin? What do I need to calculate first? In which case? (Sorry for those mixed questions, but I'm really lost with it.)
I saw in some examples that the margin is calculated in this manner:
1 / ||w||
while in others, this way :
2 / ||w||
so what is the difference between those two cases ?
Thanks
The optimization objective of an SVM is to choose w and b in such a way that we have the maximum margin around the hyperplane.
Mathematically speaking,
it is a nonlinear optimization task which is solved via the KKT (Karush-Kuhn-Tucker) conditions, using Lagrange multipliers.
The following video explains this in simple terms for the linearly separable case:
https://www.youtube.com/watch?v=1NxnPkZM9bc
How this is calculated is also explained in more detail here, for both the linear and the primal case:
https://www.csie.ntu.edu.tw/~cjlin/talks/rome.pdf
The margin between the separating hyperplane and the class boundaries of an SVM is an essential feature of this algorithm.
See, you have two hyperplanes: (1) w^t x + b >= 1 if y = 1, and (2) w^t x + b <= -1 if y = -1. This says that any vector with label y = 1 must lie either on or behind hyperplane (1). The same applies to the vectors with label y = -1 and hyperplane (2).
Note: if those requirements can be fulfilled, it implicitly means the dataset is linearly separable. This makes sense because otherwise no such margin can be constructed.
So, what an SVM tries to find is a decision boundary which is half-way between (1) and (2). Let's define this boundary as (3) w^t x + b = 0. What you see here is that (1), (2) and (3) are parallel hyperplanes because they share the same parameters w and b. The parameter w holds the direction of those planes. Recall that a vector always has a direction and a magnitude/length.
The question is now: how can one calculate the hyperplane (3)? The equations (1) and (2) tell us that any vector with label y = 1 which is closest to (3) lies exactly on hyperplane (1), hence (1) becomes w^t x + b = 1 for such x. The same applies to the closest vectors with a negative label and (2). The vectors on the planes are called 'support vectors', and the decision boundary (3) depends only on those, because one can simply subtract (2) from (1) for the support vectors and get:
(w^t x1 + b) - (w^t x2 + b) = 1 - (-1)  =>  w^t (x1 - x2) = 2
Note: x1 and x2 are different support vectors, one on each plane.
Now we want to keep the direction of w but ignore its length, to get the shortest distance between (3) and the other planes. This distance is a perpendicular line segment from (3) to the others. To do so, we divide by the length of w to get the unit normal vector, which is perpendicular to (3); hence w^t (x1 - x2) / ||w|| = 2 / ||w||. The left-hand side is exactly that perpendicular distance, so the distance between the two planes is 2 / ||w||. This distance must be maximized.
Edit:
As others state here, use Lagrange multipliers or the SMO algorithm to minimize the term
1/2 ||w||^2
s.t. y(w^tx+b)>=1
This is the convex form of the optimization problem for the primal SVM.
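For illustration only, here is a small sketch (not part of the answer above) using scikit-learn's linear SVM on a made-up, linearly separable toy set; with a very large C it approximates the hard-margin problem, and the distance between the two margin planes comes out as 2/||w||:
import numpy as np
from sklearn.svm import SVC

# Made-up linearly separable data: two well-separated blobs
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # large C ~ hard margin
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
print("w:", w, "b:", b)
print("margin width 2/||w||:", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
# For the support vectors, y * (w . x + b) is approximately 1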

Support Vector Machine Geometrical Intuition

Hi,
I am having a hard time understanding why, in the hyperplane equation of a support vector machine, there is a 1 after >=: w.x + b >= 1 (why this 1?). I know it could be something about the intercept on the y axis, but I cannot relate that to the support vectors and to their meaning for classification.
Can anyone please explain to me why the equation has that 1 (or -1)?
Thank you.
The 1 is just an algebraic simplification, which comes in handy in the later optimization.
First, notice that all three hyperplanes can be denoted as
w'x+b= 0
w'x+b=+A
w'x+b=-A
If we fixed the norm of the normal w, ||w|| = 1, then the above would have one solution with some arbitrary A depending on the data; let's call our solution v and c (the values of the optimal w and b respectively). But if we let w have any norm, then we can easily see that if we put
w'x+b= 0
w'x+b=+1
w'x+b=-1
then there is one unique w which satisfies these equations, and it is given by w=v/A, b=c/A, because
(v/A)'x+(c/A)= 0 (when v'x+c=0) // for the middle hyperplane
(v/A)'x+(c/A)=+1 (when v'x+c=+A) // for the positive hyperplane
(v/A)'x+(c/A)=-1 (when v'x+c=-A) // for the negative hyperplane
In other words - we assume that these "supporting vectors" satisfy the equation w'x+b=+/-1 for future simplification, and we can do so because for any solution satisfying v'x+c=+/-A there is a solution of our equation (with a different norm of w).
So once we have these simplifications, our optimization problem simplifies to the minimization of ||w|| (maximization of the size of the margin, which can now be expressed as 2/||w||). If we stayed with the "normal" equation with the (not fixed!) A value, then the maximization of the margin would have one more "dimension" - we would have to search through w, b, A to find the triple which maximizes it (as the "restrictions" would be of the form y(w'x+b)>A). Now, we just search through w and b (and in the dual formulation - just through alpha, but this is a whole new story).
This step is not required. You could build an SVM without it, but this makes things simpler - the Occam's razor rule.
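A quick numeric sanity check of this rescaling argument (all numbers below are made up for illustration): take a normal v with ||v|| = 1 whose margin planes sit at +A and -A, divide by A, and the same planes become +1 and -1 while the decision boundary is unchanged.
import numpy as np

v = np.array([0.6, 0.8])          # hypothetical optimal normal, ||v|| = 1
c = -1.0                          # hypothetical optimal offset
A = 0.5                           # margin planes are v'x + c = +A and -A

x_plus = np.array([2.5, 0.0])     # lies on v'x + c = +A  (0.6*2.5 - 1.0 = 0.5)
x_minus = np.array([0.5, 0.25])   # lies on v'x + c = -A  (0.3 + 0.2 - 1.0 = -0.5)

w, b = v / A, c / A               # the rescaled solution
print(w @ x_plus + b)             # +1.0
print(w @ x_minus + b)            # -1.0
# w'x + b = 0 describes exactly the same hyperplane as v'x + c = 0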
This boundary is called the "margin" and must be maximized, which means you have to minimize ||w||.
The aim of an SVM is to find a hyperplane able to maximize the distance between the two groups.
However, there are infinitely many solutions (see figure: move the optimal hyperplane along the perpendicular vector), and we need to fix at least the boundaries: the +1 or -1 is a common convention to avoid these infinitely many solutions.
Formally, you have to optimize r ||w||, and we set the boundary condition r ||w|| = 1.

How to calculate the likelihood that an element in a route traversing a probability graph is correct?

I have an asymmetric directed graph with a set of probabilities (so the likelihood that a person will move from point A to B, or point A to C, etc). Given a route through all the points, I would like to calculate the likelihood that each choice made in the route is a good choice.
As an example, suppose a graph of just 2 points.
// In a matrix, the probabilities might look like
//        A     B
[         0   0.9     // A
        0.1     0 ]   // B
So the probability of moving from A to B is 0.9 and from B to A is 0.1. Given the route A->B, how correct is the first point (A), and how correct is the second point (B).
Suppose I have a bigger matrix with a route that goes A->B->C->D. So, some examples of what I would like to know:
How likely is it that A comes before B,C, & D
How likely is it that B comes after A
How likely is it that C & D come after B
Basically, at each point, I want to know the likelihood that the previous points come before the current and also the likelihood that the following points come after. I don't need something that is statistically sound. Just an indicator that I can use for relative comparisons. Any ideas?
Update: I see that this question is not useful to everyone, but the answer is really useful to me, so I've tried to make the description of the problem clearer and will include my answer shortly in case it helps someone.
I don't think that's possible efficiently. If there were an algorithm to calculate the probability that a point was in the wrong position, you could simply work out which position was least wrong for each point, and thus calculate the correct order. The problem is essentially the same as finding the optimal route.
The subsidiary question is what the probability is 'of', here. Can the probability be 100%? How would you know?
Part of the reason the travelling salesman problem is hard is that there is no way to know that you have the optimal solution except looking at all the solutions and finding that it is the shortest.
Replace the probability matrix p with -log(p); finding the shortest path in that matrix would then solve your problem.
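A small sketch of that trick with a made-up probability matrix (using networkx; since -log turns products into sums and is decreasing, the path with the smallest total weight is the one with the highest product of probabilities, and the weights are non-negative so Dijkstra applies):
import math
import networkx as nx

# Hypothetical transition probabilities; missing pairs mean "no edge"
p = {('A', 'B'): 0.9, ('B', 'A'): 0.1, ('A', 'C'): 0.2,
     ('B', 'C'): 0.6, ('C', 'D'): 0.8}

G = nx.DiGraph()
for (u, v), prob in p.items():
    G.add_edge(u, v, weight=-math.log(prob))

path = nx.shortest_path(G, 'A', 'D', weight='weight')
cost = nx.shortest_path_length(G, 'A', 'D', weight='weight')
print(path, math.exp(-cost))   # most probable route and its probability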
After much thought, I came up with something that suits my needs. It still has the same problem in that getting an accurate answer would require checking every possible route. However, in my case, checking only the direct and the first indirect routes is enough to give an idea of how "correct" my answer is.
First I need the confidence for each probability. This is a separate calculation and is contained in a separate matrix (that maps 1 to 1 to the probability matrix). I just take 1.0 - confidenceInterval for each probability.
If I have a route A->B->C->D, I calculate a "correctness indicator" for a point. It looks like I am getting some sort of average of a direct route and the first level of indirect routes.
Some examples:
Denote P(A,B) as probability that A comes before B
Denote C(A,B) as confidence in the probability that A comes before B
Denote P'(A,C) as the probability that A comes before C based on the indirect route A->B->C (and C'(A,C) as the corresponding confidence)
At point B, likelihood that A comes before it:
indicator = P(A,B)*C(A,B)/C(A,B)
At point C, likelihood that A & B come before:
P'(A,C) = P(A,B)*P(B,C)
C'(A,C) = C(A,B)*C(B,C)
indicator = [P(A,C)*C(A,C) + P(B,C)*C(B,C) + P'(A,C)*C'(A,C)]/[C(A,C)+C(B,C)+C'(A,C)]
So this gives me some sort of indicator that is always between 0 and 1 and takes the first level of indirect routes into account (from->indirectPoint->to). It seems to provide the rough estimation I was looking for. It is not a great answer, but it does provide some estimate, and since nothing else provides anything better, it is suitable.
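A rough sketch of this indicator for point C on the route A->B->C->D, following my reading of the formulas above (the P and confidence matrices are made up, and the helper name is arbitrary):
import numpy as np

idx = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
# P[i, j]  = probability that i comes before j (made up)
# Cf[i, j] = confidence in that probability    (made up)
P = np.array([[0.0, 0.9, 0.7, 0.8],
              [0.1, 0.0, 0.6, 0.7],
              [0.3, 0.4, 0.0, 0.9],
              [0.2, 0.3, 0.1, 0.0]])
Cf = np.array([[0.0, 0.8, 0.5, 0.6],
               [0.8, 0.0, 0.7, 0.6],
               [0.5, 0.7, 0.0, 0.9],
               [0.6, 0.6, 0.9, 0.0]])

def indicator(a, b, c):
    """Confidence-weighted likelihood that a and b come before c (route a->b->c)."""
    A, B, C = idx[a], idx[b], idx[c]
    terms = [(P[A, C], Cf[A, C]),                           # direct: A before C
             (P[B, C], Cf[B, C]),                           # direct: B before C
             (P[A, B] * P[B, C], Cf[A, B] * Cf[B, C])]      # indirect, via B
    num = sum(p * cf for p, cf in terms)
    den = sum(cf for _, cf in terms)
    return num / den if den else 0.0

print(indicator('A', 'B', 'C'))   # always between 0 and 1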

If there are M different boxes and N identical balls

and we need to put these balls into boxes.
How many possible states could there be?
This is part of a computer simulation puzzle. I've almost forgotten all my math knowledge.
I believe you are looking for the Multinomial Coefficient.
I will check myself and expand my answer.
Edit:
If you take a look at the wikipedia article I gave a link to, you can see that the M and N you defined in your question correspond to the m and n defined in the Theorem section.
This means that your question corresponds to: "How many distinct terms are there when expanding a polynomial in M variables raised to an arbitrary power?", where N is the power and M is the number of variables in the polynomial.
In other words:
What you are looking for is the number of distinct multinomial coefficients (i.e. the number of distinct terms) of a polynomial of M variables expanded when raised to the power of N.
The exact equations are a bit long, but they are explained very clearly in Wikipedia.
Why is this true:
Each multinomial coefficient corresponds to one specific grouping of the identical balls into the boxes (for example, 5 balls grouped into 3, 1, and 1 - in this case N=5 and M=3). Counting all possible groupings gives you all possible states.
I hope this helped you out.
These notes explain how to solve the "balls in boxes" problem in general: whether the balls are labeled or not, whether the boxes are labeled or not, whether you have to have at least one ball in each box, etc.
This is a basic combinatorial question (distribution of identical objects into non-identical slots).
the number of states is [(N+M-1) choose (M-1)]
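A tiny sketch that evaluates this closed form and brute-forces small cases to confirm it (plain Python; the function names are mine):
from itertools import product
from math import comb

def states_closed_form(n_balls, m_boxes):
    """Stars and bars: C(N + M - 1, M - 1) ways to put N identical balls into M boxes."""
    return comb(n_balls + m_boxes - 1, m_boxes - 1)

def states_brute_force(n_balls, m_boxes):
    """Count all tuples of non-negative box counts that sum to N."""
    return sum(1 for counts in product(range(n_balls + 1), repeat=m_boxes)
               if sum(counts) == n_balls)

for n, m in [(3, 2), (4, 3), (5, 3)]:
    assert states_closed_form(n, m) == states_brute_force(n, m)
    print(n, m, states_closed_form(n, m))   # 4, 15, 21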

How to check if m n-sized vectors are linearly independent?

Disclaimer
This is not strictly a programming question, but most programmers sooner or later have to deal with math (especially algebra), so I think the answer could turn out to be useful to someone else in the future.
Now the problem
I'm trying to check if m vectors of dimension n are linearly independent. If m == n you can just build a matrix using the vectors and check if the determinant is != 0. But what if m < n?
Any hints?
See also this video lecture.
Construct a matrix of the vectors (one row per vector), and perform Gaussian elimination on this matrix. If any of the matrix rows cancels out (reduces to all zeros), the vectors are not linearly independent.
The trivial case is when m > n; in that case they cannot be linearly independent.
Construct a matrix M whose rows are the vectors and determine the rank of M. If the rank of M is less than m (the number of vectors), then there is a linear dependence. In the algorithm to determine the rank of M you can stop the procedure as soon as you obtain one row of zeros, but running the algorithm to completion has the added bonus of providing the dimension of the spanning set of the vectors. Oh, and the algorithm to determine the rank of M is merely Gaussian elimination.
Take care for numerical instability. See the warning at the beginning of chapter two in Numerical Recipes.
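In practice this whole recipe is a single library call; here is a minimal NumPy sketch with made-up vectors, using the SVD-based matrix_rank routine (which also addresses the numerical-tolerance warning above):
import numpy as np

def are_independent(vectors, tol=None):
    """True if the given m n-dimensional vectors (as rows) are linearly independent."""
    M = np.asarray(vectors, dtype=float)
    return np.linalg.matrix_rank(M, tol=tol) == M.shape[0]

print(are_independent([[1, 0, 0, 2],
                       [0, 1, 0, 3]]))        # True
print(are_independent([[1, 2, 3, 4],
                       [2, 4, 6, 8]]))        # False: second row = 2 * first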
If m < n, you will have to do some operation on them (there are multiple possibilities: Gaussian elimination, orthogonalization, etc.; almost any transformation which can be used for solving equations will do) and check the result (e.g. Gaussian elimination => zero row or column, orthogonalization => zero vector, SVD => zero singular value).
However, note that this is a bad question for a programmer to ask, and a bad problem for a program to solve. That's because every linearly dependent set of m <= n vectors has a set of linearly independent vectors arbitrarily nearby (i.e. the problem is numerically unstable).
I have been working on this problem these days.
Previously, I found some algorithms regarding Gaussian or Gauss-Jordan elimination, but most of those algorithms only apply to square matrices, not general matrices.
To apply for general matrix, one of the best answers might be this:
http://rosettacode.org/wiki/Reduced_row_echelon_form#MATLAB
You can find both pseudo-code and source code in various languages.
As for me, I translated the Python source code to C++, because the C++ code provided in the above link is somewhat complex and not well suited to my simulation.
Hope this will help you, and good luck ^^
If computing power is not a problem, probably the best way is to find the singular values of the matrix. Basically you need to find the eigenvalues of M'*M (whose square roots are the singular values) and look at the ratio of the largest to the smallest. If the ratio is not very big (i.e. the smallest singular value is not numerically zero), the vectors are independent.
Another way to check that m row vectors are linearly independent, when put in a matrix M of size mxn, is to compute
det(M * M^T)
i.e. the determinant of an mxm square matrix. It will be zero if and only if M has some dependent rows. However, Gaussian elimination should in general be faster.
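A short sketch comparing the two checks just discussed, the Gram determinant det(M * M^T) and the singular-value ratio (made-up rows; with floating point you still need a tolerance rather than an exact zero test):
import numpy as np

M = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 3.0, 5.0]])    # third row = first + second -> dependent

gram_det = np.linalg.det(M @ M.T)       # ~0 exactly when the rows are dependent
s = np.linalg.svd(M, compute_uv=False)  # singular values, largest first
print("det(M M^T):", gram_det)
print("smallest / largest singular value:", s[-1] / s[0])   # ~0 signals dependence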
Sorry man, my mistake...
The source code provided in the above link turns out to be incorrect; at least the Python code I have tested and the C++ code I transformed do not generate the right answer all the time (although for the example in the above link, the result is correct).
To test the python code, simply replace the mtx with
[30,10,20,0],[60,20,40,0]
and the returned result would be like:
[1,0,0,0],[0,1,2,0]
Nevertheless, I have found a way out of this. This time I translated the MATLAB source code of the rref function to C++. You can run MATLAB and use the "type rref" command to get the source code of rref.
Just notice that if you are working with really large or really small values, make sure to use the long double datatype in C++. Otherwise, the result will be truncated and inconsistent with the MATLAB result.
I have been conducting large simulations in ns2, and all the observed results are sound.
Hope this will help you and anyone else who has encountered the problem...
A very simple way, though not the most computationally efficient, is to simply remove random rows until m = n and then apply the determinant trick.
m < n: remove rows (make the vectors shorter) until the matrix is square, and then
m = n: check if the determinant is 0 (as you said)
m > n (the number of vectors is greater than their length): they are linearly dependent (always).
The reason, in short, is that you're trying to solve Av = 0, a homogeneous system with n equations and m unknowns; when m > n there are more unknowns than equations, so a non-zero solution always exists. For a better explanation, see Wikipedia, which explains it better than I can.
