How to calculate the likelihood that an element in a route that traverses a probability graph is correct?

I have an asymmetric directed graph with a set of probabilities (so the likelihood that a person will move from point A to B, or point A to C, etc). Given a route through all the points, I would like to calculate the likelihood that each choice made in the route is a good choice.
As an example, suppose a graph of just 2 points.
//In a matrix, the probabilities might look like
//A B
[ 0 0.9 //A
0.1 0 ] //B
So the probability of moving from A to B is 0.9 and from B to A is 0.1. Given the route A->B, how correct is the first point (A), and how correct is the second point (B).
Suppose I have a bigger matrix with a route that goes A->B->C->D. So, some examples of what I would like to know:
How likely is it that A comes before B,C, & D
How likely is it that B comes after A
How likely is it that C & D come after B
Basically, at each point, I want to know the likelihood that the previous points come before the current and also the likelihood that the following points come after. I don't need something that is statistically sound. Just an indicator that I can use for relative comparisons. Any ideas?
update: I see that this question is not useful to everyone but the answer is really useful to me so I've tried to make the description of the problem more clear and will include my answer shortly in case it helps someone.

I don't think that's possible efficiently. If there was an algorithm to calculate the probability that a point was in the wrong position, you could simply work out which position was least wrong for each point, and thus calculate the correct order. The problem is essentially the same as finding the optimal route.
The subsidiary question is what the probability is 'of', here. Can the probability be 100%? How would you know?
Part of the reason the travelling salesman problem is hard is that there is no way to know that you have the optimal solution except looking at all the solutions and finding that it is the shortest.

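Replace each probability p in the matrix with -log(p). Because the sum of -log(p) along a path is the -log of the product of its probabilities, finding the shortest path in that matrix gives the most probable path, which would solve your problem.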
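A minimal sketch of that idea, assuming the graph is stored as a dictionary of dictionaries and that a single most probable path between two points is wanted (not a route that visits every point). The function name most_likely_path and the example probabilities are made up for illustration. Since every p is at most 1, every weight -log(p) is non-negative, so Dijkstra's algorithm applies.

```python
import heapq
import math


def most_likely_path(prob, start, goal):
    """Dijkstra on edge weights -log(p); returns (path, path probability)."""
    dist = {start: 0.0}          # best known sum of -log(p) per node
    prev = {}                    # predecessor on the best known path
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            break
        for v, p in prob.get(u, {}).items():
            if p <= 0.0:
                continue         # a zero probability means "no edge"
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None, 0.0
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[goal])


# Hypothetical transition probabilities, not taken from the question.
PROB = {
    "A": {"B": 0.9, "C": 0.2},
    "B": {"A": 0.1, "C": 0.8},
    "C": {},
}
path, p = most_likely_path(PROB, "A", "C")
print(path, p)
```

Here the indirect path A->B->C (0.9 * 0.8) beats the direct edge A->C (0.2).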

After much thought, I came up with something that suits my needs. It still has the same problem: getting an accurate answer would require checking every possible route. However, in my case, checking only the direct and the first indirect routes is enough to give an idea of how "correct" my answer is.
First I need the confidence for each probability. This is a separate calculation and is contained in a separate matrix (that maps 1 to 1 to the probability matrix). I just take the 1.0-confidenceInterval for each probability.
If I have a route A->B->C->D, I calculate a "correctness indicator" for a point. It looks like I am getting some sort of average of a direct route and the first level of indirect routes.
Some examples:
Denote P(A,B) as probability that A comes before B
Denote C(A,B) as confidence in the probability that A comes before B
Denote P'(A,C) and C'(A,C) as the probability and confidence that A comes before C based on the indirect route A->B->C
At point B, likelihood that A comes before it:
indicator = P(A,B)*C(A,B)/C(A,B)
At point C, likelihood that A & B come before:
P'(A,C) = P(A,B)*P(B,C)
C'(A,C) = C(A,B)*C(B,C)
indicator = [P(A,C)*C(A,C) + P(B,C)*C(B,C) + P'(A,C)*C'(A,C)]/[C(A,C)+C(B,C)+C'(A,C)]
So this gives me some sort of indicator that is always between 0 and 1, and takes the first-level indirect route into account (from->indirectPoint->to). It seems to provide the rough estimate I was looking for. It is not a great answer, but it does provide some estimate, and since nothing else provides anything better, it is suitable.
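The indicator at point C can be sketched as a confidence-weighted average, assuming the two product formulas define the indirect quantities P'(A,C) and C'(A,C). The function name indicator_at_third_point and all numbers below are hypothetical; the confidence matrix is taken as given.

```python
def indicator_at_third_point(P, C, a, b, c):
    """Confidence-weighted average for route a->b->c, evaluated at c.

    P[(x, y)] is the probability that x comes before y,
    C[(x, y)] is the confidence in that probability.
    """
    # Indirect estimate for a before c, via the route a->b->c.
    p_indirect = P[(a, b)] * P[(b, c)]
    c_indirect = C[(a, b)] * C[(b, c)]

    numerator = (P[(a, c)] * C[(a, c)]
                 + P[(b, c)] * C[(b, c)]
                 + p_indirect * c_indirect)
    denominator = C[(a, c)] + C[(b, c)] + c_indirect
    return numerator / denominator


# Hypothetical probabilities and confidences, for illustration only.
P = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.7}
C = {("A", "B"): 0.5, ("B", "C"): 0.5, ("A", "C"): 0.25}
value = indicator_at_third_point(P, C, "A", "B", "C")
print(value)
```

Because the result is a weighted average of probabilities with non-negative weights, it stays between 0 and 1 as long as at least one confidence is positive.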


How is RSME calculated between point clouds?

RSME calculates how close the predicted value is compared to the actual value, but in a point cloud, there are 2 things that I am confused about:
How do we know which point corresponds to which point, to be subtracted from?
Point clouds are 3-dimensional since they have xyz values, but how do people turn those 3 values into one RSME value?
First of all, it's RMSE, not RSME. It stands for Root Mean Square Error:
https://en.wikipedia.org/wiki/Root-mean-square_deviation
With 3D coordinates you can compare component-wise, or use whatever other distance measure you choose to define. Then you plug this into the RMSE formula. Essentially this means comparing an expected value to your observed value.
As for the point correspondence - this depends on the algorithm of choice. Probably one of the most famous examples is ICP:
https://de.wikipedia.org/wiki/Iterative_Closest_Point_Algorithm
In a nutshell for every point of one cloud, the closest point of the other cloud is determined. Then an error measure is calculated and lastly points are transformed. This is done an arbitrary number of times, depending on the desired precision.
Since I strongly suspect that you are indeed looking for ICP, here is the description as to how they are put together:
https://en.wikipedia.org/wiki/Iterative_closest_point
Other than that you will have to do some reading yourself.
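As a rough illustration of the matching and error steps only (not the transform step of ICP), here is a sketch that pairs every source point with its closest target point and takes the Euclidean distance as the per-point error. The function name nearest_neighbour_rmse is made up, and the brute-force search is only reasonable for small clouds.

```python
import math


def nearest_neighbour_rmse(source, target):
    """RMSE of the distances from each source point to its closest target point."""
    total_sq = 0.0
    for p in source:
        # Squared Euclidean distance to the closest point of the other cloud.
        total_sq += min(
            sum((a - b) ** 2 for a, b in zip(p, q)) for q in target
        )
    return math.sqrt(total_sq / len(source))


# Two tiny hypothetical clouds: the target is the source shifted by 1 along z.
source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
error = nearest_neighbour_rmse(source, target)
print(error)
```

The three coordinates collapse into one number because each point contributes a single distance, and RMSE then averages the squares of those distances.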

Find the first root and local maximum/minimum of a function

Problem
I want to find
The first root
The first local minimum/maximum
of a black-box function in a given range.
The function has the following properties:
It's continuous and differentiable.
It's a combination of constant and periodic functions. All periods are known.
(It's better if it can be done with weaker assumptions)
What is the fastest way to get the root and the extremum?
Do I need more assumptions or bounds of the function?
What I've tried
I know I can use a root-finding algorithm. What I don't know is how to find the first root efficiently.
It needs to be fast enough to run within a few milliseconds with a precision of 1.0 over a range of 1.0e+8, which is the problem.
Since the range could be quite large and the result should be precise enough, I can't brute-force it by checking all the possible subranges.
I considered the bisection method, but it's too slow at finding the first root if the function has only one big root in the range, as every subrange would have to be checked.
It's preferable if the solution is in java, but any similar language is fine.
Background
I want to calculate when an arbitrary celestial object reaches a certain height.
It's a configuration-defined virtual object, so I can't assume anything about the object.
It's not easy to get either an analytical solution or a simple approximation because various coordinates are involved.
I decided to find a numerical solution for this.
For a general black box function, this can't really be done. Any root finding algorithm on a black box function can't guarantee that it has found all the roots or any particular root, even if the function is continuous and differentiable.
The property of being periodic gives a bit more hope, but you can still have periodic functions with infinitely many roots in a bounded domain. Given that your function relates to celestial objects, this isn't likely to happen. Assuming your periodic functions are sinusoidal, I believe you can get away with checking subranges on the order of one-quarter of the shortest period (out of all the periodic components).
Maybe try Brent's Method on the shortest quarter period subranges?
Another approach would be to apply your root finding algorithm iteratively. If your range is (a, b), then apply your algorithm to that range to find a root at say c < b. Then apply your algorithm to the range (a, c) to find a root in that range. Continue until no more roots are found. The last root you found is a good candidate for your minimum root.
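The subrange idea can be sketched as follows: scan from the left in fixed steps, stop at the first step whose endpoints differ in sign, and refine inside it. This sketch uses plain bisection rather than Brent's method to stay dependency-free, and the function name first_root, the step choice and the test function are assumptions. A step that hides two roots (no sign change between its endpoints) would be missed, so the step has to be small relative to the shortest period.

```python
import math


def first_root(f, a, b, step, tol=1e-9):
    """Return a root in the first step-sized bracket of [a, b] with a sign change."""
    x0, f0 = a, f(a)
    if f0 == 0.0:
        return a
    while x0 < b:
        x1 = min(x0 + step, b)
        f1 = f(x1)
        if f1 == 0.0:
            return x1
        if f0 * f1 < 0.0:
            # Sign change found: refine by bisection inside [x0, x1].
            lo, hi, f_lo = x0, x1, f0
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                f_mid = f(mid)
                if f_mid == 0.0:
                    return mid
                if f_lo * f_mid < 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            return 0.5 * (lo + hi)
        x0, f0 = x1, f1
    return None  # no sign change seen at this step size


# Hypothetical example: sin(t) - 0.5 has period 2*pi and its first root at pi/6.
period = 2.0 * math.pi
root = first_root(lambda t: math.sin(t) - 0.5, 0.0, 100.0, step=period / 4.0)
print(root)
```

The cost is dominated by the scan, roughly (b - a) / step evaluations, so the achievable speed depends on how short the shortest period is compared to the range.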
A black-box function over any range? You cannot even be sure that its domain is continuous over that range. What kind of solutions are you looking for? Natural numbers, integers, real numbers, complex? These questions all greatly affect the answer.
So the first thing should be determining what kind of number you accept as the result.
The second is having some kind of protection against limits of the function that will try to explode your calculations as it goes to plus or minus infinity.
Since we are touching on limits, your function could edge towards zero and look like a solution but never touch 0 and become one. This depends on your margin of error: how close something has to be to be considered good enough.
I think your SIMPLEST TO IMPLEMENT bet for real-number solutions (I assume those) is to take an interval and apply this divide-and-conquer algorithm:
Take the lower and upper borders and the middle value (or an approximate middle value for borders with infinite decimals)
Try to calculate the function at all 3 and have some kind of protection against infinities
Remember all 3 values in an array with their results (3 value-result pairs)
Remember the current best value (the one closest to a solution) in a separate variable (a pair of the value and the result for that value)
STEP FORWARD - repeat the above with the 1st-2nd value range and the 2nd-3rd value range
Get a new value-result pair that is closest to a solution.
Clear the old value-result pairs and replace them with the new ones from this iteration, while remembering the best value-result pair overall
Repeat the above for as much precision as you wish, and watch the memory explode with each iteration; keep in mind you are going to have exponential growth of values there. It can be improved further if you, say, take one interval and go as deep as you want, remember the best value-result pair, then delete all other memory and move on to the next interval and dig deep.

Find solution minimum spanning tree (with conditions) when extending graph

I have a logic question, so choose from two explanations:
Mathematical:
I have an undirected weighted complete graph over 2-14 nodes. The nodes always come in pairs (startpoint to endpoint). For this I already have the minimum spanning tree, which respects the condition that each pair's startpoint always comes before its endpoint. Now I want to add another pair of nodes.
Real life explanation:
I already have an optimal taxi route for 1-7 people. Each joins (startpoint) and leaves (endpoint) at different places. Now I want to find the optimal route when I add another person to the taxi. I already have the calculated subpaths from each point to each point in my database (therefore this is a weighted graph). All calculated paths are real values, not heuristics.
Now I try to find the most performant solution to solve this. My current idea:
Find the point nearest to the new startpoint. Add it a) before and b) after this point. Choose the faster one.
Find the point nearest to the new endpoint. Add it a) before and b) after this point. Choose the faster one.
Ignoring the case where the new endpoint comes before the new startpoint, this seems feasible.
I expect that the general direction of the taxi is one direction, this eliminates the following edge case.
Is there any case I'm missing in which this algorithm wouldn't calculate the optimal solution?
There are definitely many cases where this algorithm (which is a First Fit construction heuristic) won't find the optimal solution. Given a reasonably sized dataset, in my experience, I would expect improvements of 10-20% by simply taking that result and applying metaheuristics (or other optimization algorithms).
Explanation:
If you have multiple taxis with a limited person capacity, it has an inherent bin packing problem, which is NP-complete (which is proven to be suboptimally solved by all known construction heuristics in P).
But even if you have just 1 taxi, it is similar to TSP: if you have the optimal solution for 10 locations and add 1 location, it can create a snowball effect in the optimal solution to make the optimal solution look completely different. (sorry, no visual image of this yet)
And if you need to add any additional constraints on top of that later on, you need to be aware of these false assumptions.
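For comparison with the nearest-point idea from the question, here is a sketch of a slightly stronger construction heuristic: try every pair of insertion positions for the new startpoint and endpoint and keep the cheapest. This is a swapped-in technique (exhaustive cheapest insertion), not what either post describes, and it is still a heuristic because the order of the existing stops stays fixed. All names and distances are hypothetical.

```python
def route_cost(route, dist):
    """Sum of the precomputed leg costs along the route."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


def best_insertion(route, pickup, dropoff, dist):
    """Try every position pair with pickup before dropoff; return (cost, route)."""
    best = None
    n = len(route)
    for i in range(n + 1):                 # position of the pickup
        with_pickup = route[:i] + [pickup] + route[i:]
        for j in range(i + 1, n + 2):      # dropoff strictly after the pickup
            candidate = with_pickup[:j] + [dropoff] + with_pickup[j:]
            cost = route_cost(candidate, dist)
            if best is None or cost < best[0]:
                best = (cost, candidate)
    return best


# Hypothetical stops on a straight road; the leg cost is the distance between them.
position = {"s1": 0, "e1": 10, "p": 2, "d": 5}
dist = {a: {b: abs(position[a] - position[b]) for b in position} for a in position}
best = best_insertion(["s1", "e1"], "p", "d", dist)
print(best)
```

With at most 14 existing stops this is only on the order of a hundred candidate routes, so it is cheap, but it cannot reproduce the snowball effect described above, where the existing order itself should change.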

Solving a system of linear equations in a non-square matrix [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 3 years ago.
I have a system of linear equations that make up an NxM matrix (i.e. Non-square) which I need to solve - or at least attempt to solve in order to show that there is no solution to the system. (more likely than not, there will be no solution)
As I understand it, if my matrix is not square (over- or under-determined), then no exact solution can be found - am I correct in thinking this? Is there a way to transform my matrix into a square matrix in order to calculate the determinant, apply Gaussian Elimination, Cramer's rule, etc?
It may be worth mentioning that the coefficients of my unknowns may be zero, so in certain, rare cases it would be possible to have a zero-column or zero-row.
Whether or not your matrix is square is not what determines the solution space. It is the rank of the matrix compared to the number of columns that determines that (see the rank-nullity theorem). In general you can have zero, one or an infinite number of solutions to a linear system of equations, depending on its rank and nullity relationship.
To answer your question, however, you can use Gaussian elimination to find the rank of the matrix and, if this indicates that solutions exist, find a particular solution x0 and the nullspace Null(A) of the matrix. Then, you can describe all your solutions as x = x0 + xn, where xn represents any element of Null(A). For example, if a matrix has full column rank its nullspace will be trivial (it contains only the zero vector) and the linear system will have at most one solution. If its rank is also equal to the number of rows, then you have one unique solution. If the nullspace is of dimension one, then your solution will be a line that passes through x0, any point on that line satisfying the linear equations.
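One way to sketch the rank test: compute the rank of A and of the augmented matrix [A|b] by Gaussian elimination. If the two ranks differ there is no solution; if they match, the solution is unique exactly when the rank equals the number of columns. The helper names rank and classify are made up, and exact fractions are used so that rounding cannot change the rank.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via Gaussian elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0                                   # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def classify(A, b):
    """Return 'none', 'unique' or 'infinite' for the system A x = b."""
    rank_a = rank(A)
    rank_ab = rank([row + [bi] for row, bi in zip(A, b)])
    if rank_a < rank_ab:
        return "none"
    return "unique" if rank_a == len(A[0]) else "infinite"


tall = [[1, 0], [0, 1], [1, 1]]             # 3 equations, 2 unknowns
wide = [[1, 0, 0], [0, 0, 1]]               # 2 equations, 3 unknowns
print(classify(tall, [1, 1, 2]), classify(tall, [1, 1, 3]), classify(wide, [1, 1]))
```

The three calls cover an overdetermined system that is consistent, one that is not, and an underdetermined system with a free variable.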
Ok, first off: a non-square system of equations can have an exact solution.
[ 1 0 0 ] [x]   [1]
[ 0 0 1 ] [y] = [1]
          [z]
clearly has a solution (actually, it has a 1-dimensional family of solutions: x=z=1, with y free). Even if the system is overdetermined instead of underdetermined it may still have a solution:
[ 1 0 ]       [1]
[ 0 1 ] [x] = [1]
[ 1 1 ] [y]   [2]
(x=y=1). You may want to start by looking at least squares solution methods, which find the exact solution if one exists, and "the best" approximate solution (in some sense) if one does not.
Take Ax = b, with A having m columns and n rows. We are not guaranteed to have one and only one solution, which in many cases is because we have more equations than unknowns (n bigger than m). This could be because of repeated measurements, which we actually want because we are cautious about the influence of noise.
If we cannot find a solution, that means there is no way to reach b by travelling within the column space of A (x only takes a combination of the columns).
We can however ask for the point in the space spanned by A that is nearest to b. How can we find such a point? Walking on a plane the closest one can get to a point outside it, is to walk until you are right below. Geometrically speaking this is when our axis of sight is perpendicular to the plane.
Now that is something we can have a mathematical formulation of. A perpendicular vector reminds us of orthogonal projections. And that is what we are going to do. The simplest case tells us to do a.T b. But we can take the whole matrix A.T b.
For our equation let us apply the transformation to both sides: A.T Ax = A.T b.
Last step is to solve for x by taking the inverse of A.T A:
x = (A.T A)^-1 * A.T b
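The normal equations can be sketched in plain Python as below, assuming A has full column rank so that A.T A is invertible. Instead of forming the inverse explicitly, the sketch solves the square system (A.T A) x = A.T b by Gauss-Jordan elimination with exact fractions. The helper names are made up for illustration.

```python
from fractions import Fraction


def solve_square(m, rhs):
    """Solve an invertible square system by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(m, rhs)]
    for c in range(n):
        # Raises StopIteration if the matrix is singular (assumption violated).
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [x / pivot_value for x in aug[c]]
        for i in range(n):
            if i != c:
                factor = aug[i][c]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[c])]
    return [row[-1] for row in aug]


def least_squares(A, b):
    """Solve (A.T A) x = A.T b for x."""
    At = [list(col) for col in zip(*A)]                       # transpose of A
    AtA = [[sum(x * y for x, y in zip(r, c)) for c in At] for r in At]
    Atb = [sum(x * y for x, y in zip(r, b)) for r in At]
    return solve_square(AtA, Atb)


A = [[1, 0], [0, 1], [1, 1]]
exact = least_squares(A, [1, 1, 2])      # consistent system
approx = least_squares(A, [1, 1, 3])     # inconsistent system
print(exact, approx)
```

For the consistent right-hand side the result is the exact solution; for the inconsistent one it is the point whose image under A is closest to b.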
The least squares recommendation is a very good one.
I'll add that you can try a singular value decomposition (SVD) that will give you the best answer possible and provide information about the null space for free.

Function for returning a list of points on a Bezier curve at equal arclength

Someone somewhere has had to solve this problem. I can find many a great website explaining this problem and how to solve it. While I'm sure they are well written and make sense to math whizzes, that isn't me. And while I might understand in a vague sort of way, I do not understand how to turn that math into a function that I can use.
So I beg of you, if you have a function that can do this, in any language, (sure even fortran or heck 6502 assembler) - please help me out.
I'd prefer an analytical solution to an iterative one.
EDIT: Meant to specify that it's a cubic Bezier I'm trying to work with.
What you're asking for is the inverse of the arc length function. So, given a curve B, you want a function Linv(len) that returns a t between 0 and 1 such that the arc length of the curve between 0 and t is len.
If you had this function your problem is really easy to solve. Let B(0) be the first point. To find the next point, you'd simply compute B(Linv(w)) where w is the "equal arclength" that you refer to. To get the next point, just evaluate B(Linv(2*w)) and so on, until Linv(n*w) becomes greater than 1.
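A numerical stand-in for Linv can be sketched with a lookup table: sample the curve densely, accumulate chord lengths, and interpolate to find the t for each multiple of w. This is an approximate, iterative approach rather than an analytical one, and the function names, the sample count and the control points are assumptions.

```python
import math


def bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t (Bernstein form)."""
    u = 1.0 - t
    return tuple(
        u * u * u * a + 3 * u * u * t * b + 3 * u * t * t * c + t * t * t * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def equal_arclength_points(p0, p1, p2, p3, spacing, samples=1000):
    """Points at arc lengths 0, spacing, 2*spacing, ... along the curve."""
    ts = [i / samples for i in range(samples + 1)]
    pts = [bezier_point(p0, p1, p2, p3, t) for t in ts]
    cum = [0.0]                              # cumulative chord length per sample
    for a, b in zip(pts, pts[1:]):
        cum.append(cum[-1] + math.dist(a, b))

    result = []
    target, k = 0.0, 0
    while target <= cum[-1]:
        while cum[k + 1] < target:           # find the sample interval holding target
            k += 1
        seg = cum[k + 1] - cum[k]
        frac = (target - cum[k]) / seg if seg > 0 else 0.0
        t = ts[k] + frac * (ts[k + 1] - ts[k])   # approximate Linv(target)
        result.append(bezier_point(p0, p1, p2, p3, t))
        target += spacing
    return result


# Hypothetical control points on a straight line, so the curve has length 3.
points = equal_arclength_points((0, 0), (1, 0), (2, 0), (3, 0), spacing=0.8)
print(points)
```

The accuracy depends on the sample count: the summed chord lengths slightly underestimate the true arc length on curved segments.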
I've had to deal with this problem recently. I've come up with, or come across a few solutions, none of which are satisfactory to me (but maybe they will be for you).
Now, this is a bit complicated, so let me just give you the link to the source code first:
http://icedtea.classpath.org/~dlila/webrevs/perfWebrev/webrev/raw_files/new/src/share/classes/sun/java2d/pisces/Dasher.java. What you want is in the LengthIterator class. You shouldn't have to look at any other parts of the file. There are a bunch of methods that are defined in another file. To get to them just cut out everything from /raw_files/ to the end of the URL. This is how you use it. Initialize the object on a curve. Then to get the parameter of a point with arc length L from the beginning of the curve just call next(L) (to get the actual point just evaluate your curve at this parameter, using deCasteljau's algorithm, or zneak's suggestion). Every subsequent call of next(x) moves you a distance of x along the curve compared to your last position. next returns a negative number when you run out of curve.
Explanation of code: so, I needed a t value such that B(0) to B(t) would have length LEN (where LEN is known). I simply flattened the curve. So, just subdivide the curve recursively until each curve is close enough to a line (you can test for this by comparing the length of the control polygon to the length of the line joining the end points). You can compute the length of this sub-curve as (controlPolyLength + endPointsSegmentLen)/2. Add all these lengths to an accumulator, and stop the recursion when the accumulator value is >= LEN.
Now, call the last subcurve C and let [t0, t1] be its domain. You know that the t you want is t0 <= t < t1, and you know the length from B(0) to B(t0) - call this value L0t0. So, now you need to find a t such that C(0) to C(t) has length LEN-L0t0. This is exactly the problem we started with, but on a smaller scale. We could use recursion, but that would be horribly slow, so instead we just use the fact that C is a very flat curve. We pretend C is a line, and compute the point at t using P=C(0)+((LEN-L0t0)/length(C))*(C(1)-C(0)).
This point doesn't actually lie on the curve because it is on the line C(0)->C(1), but it's very close to the point we want. So, we just solve Bx(t)=Px and By(t)=Py. This is just finding cubic roots, which has a closed-form solution, but I just used Newton's method. Now we have the t we want, and we can just compute C(t), which is the actual point.
I should mention that a few months ago I skimmed through a paper that had another solution to this that found an approximation to the natural parameterization of the curve. The author has posted a link to it here: Equidistant points across Bezier curves
