Easiest way to determine time complexity from run times - math

Lets suppose I am trying to analyze an algorithm and all I can do is run it with different inputs. I can construct a set of points (x,y) as (sample size, run time).
I would like to dynamically categorize the algorithm into a complexity class (linear, quadratic, exponential, logarithmic, etc..)
Ideally I could give an equation that more or less approximates the behavior.
I am just not sure what the best way to do this is.
For any degree polynomial I can create regression curves and come up with some measure of fitness, but I don't really have a clue how I would do that for any nonpolynomial function. It is harder since I don't have any previous knowledge of what shape I should try to fit.
This may be more of a math question than a programming question, but it is very interesting to me. I'm not a mathematician, so there may be a simpler established method to get a reasonable function from a set of points that I just don't know about. Does anyone have any ideas for solving a problem like this? Is there a numerical library for C# that could help me crunch the numbers?

Well there are not that many complexity classes you really care about, so let's say: linear, quadratic, polynomial (degree > 2), exponential, and logarithmic.
For each of these you could use the largest (x,y) pair to solve for the unknown variable. Let y = f(x) denote the runtime of your algorithm as a function of the sample size. Let's assume that f(1) = 0, and if it doesn't we can always subtract of that value y(1) from each of the y's, this just eliminates the constants in f(x). Let y(end) denote the last (and largest) value of y in your (x,y) data set.
At this point we can solve for the unknown in each canonical form:
f(x) = c*x
f(x) = c*x^2
f(x) = x^c
f(x) = c^x
f(x) = log(x)/log(c)
Since there is only a single unknown in each equation we can you any point to solve for it. Consider the following data generated from a polynomial of random degree > 2:
x = [ 1 2 3 4 5 6 7 8 9 10 ];
y = [ 0 6 19 44 81 135 206 297 411 550 ];
If we use the last point to solve for c for each possibility (assuming this would be the least noise estimate)
550 = c*10 -> c = 55
550 = c*10^2 -> c = 5.5
550 = 10^c -> c = log(550)/log(10) ~= 2.74
550 = c^10 -> c = 550^(1/10) ~= 1.88
550 = log(x)/log(c) -> c = 10^(1/550) ~= 1.0042
We can now compare how well each of these functions fit the remaining data, here is a plot:
I'm new and I can't post images so look at the plot here: http://i.stack.imgur.com/UH6T8.png
The true data is shown in the red asterisk, linear with green line, quadratic in blue, polynomial in black, exponential in pink, and the log plot in green with O's. It should be pretty clear from the residuals what function fits your data the best.

Curve fitting used to be an art, but is now somehow decadent :) (That's a joke for the physicists around)
A lot of progress has been made, that allows simple mortals to guess (some) non trivial functional dependencies.
I'll not enter into a description of the methods and limitations, but instead I'll refer you to eureqa, which is a very nice piece of software developed at Cornell.
Eureqa (pronounced "eureka") is a software tool for detecting equations and hidden mathematical relationships in your data. Its goal is to identify the simplest mathematical formulas which could describe the underlying mechanisms that produced the data. Eureqa is free to download and use. Look for the program download, video tutorial, user forum, and other and reference materials.
I tried eureqa several times with very good results if the models are not too complicated. I think it is good enough for distinguishing between polynomials, logs and exponentials.
HTH!
Post Scriptum:
Regrettably the software isn't free anymore :(

Related

lpsolve - unfeasible solution, but I have example of 1

I'm trying to solve this in LPSolve IDE:
/* Objective function */
min: x + y;
/* Variable bounds */
r_1: 2x = 2y;
r_2: x + y = 1.11 x y;
r_3: x >= 1;
r_4: y >= 1;
but the response I get is:
Model name: 'LPSolver' - run #1
Objective: Minimize(R0)
SUBMITTED
Model size: 4 constraints, 2 variables, 5 non-zeros.
Sets: 0 GUB, 0 SOS.
Using DUAL simplex for phase 1 and PRIMAL simplex for phase 2.
The primal and dual simplex pricing strategy set to 'Devex'.
The model is INFEASIBLE
lp_solve unsuccessful after 2 iter and a last best value of 1e+030
How come this can happen when x=1.801801802 and y=1.801801802 are possible solutions here?
How To Find The Solution
Let's do some math.
Your problem is:
min x+y
s.t. 2x = 2y
x + y = 1.11 x y
x >= 1
y >= 1
The first constraint 2x = 2y can be simplified to x=y. We now substitute throughout the problem:
min 2*x
s.t. 2*x = 1.11 x^2
x >= 1
And rearrange:
min 2*x
s.t. 1.11 x^2-2*x=0
x >= 1
From geometry we know that 1.11 x^2-2*x makes an upward-opening parabola with a minimum less than zero. Therefore, there are exactly two points. These are given by the quadratic equation: 200/111 and 0.
Only one of these satisfies the second constraint: 200/111.
Why Can't I Find This Constraint With My Solver
The easy way out is to say it's because the x^2 term (x*y before the substitution is nonlinear). But it goes a little deeper than that. Nonlinear problems can be easy to solve as long as they are convex. A convex problem is one whose constraints form a single, contiguous space such that any line drawn between two points in the space stays within the boundaries of the space.
Your problem is not convex. The constraint 1.11 x^2-2*x=0 defines an infinite number of points. No two of these points can be connected by a straight line which stays in the space defined by the constraint because that space is curved. If the constraint were instead 1.11 x^2-2*x<=0 then the space would be convex because all points could be connected with straight lines that stay in its interior.
Nonconvex problems are part of a broader class of problems called NP-Hard. This means that there is not (and perhaps cannot) be any easy way of solving the problem. We have to be smart.
Solvers that can handle mixed-integer programming (MIP/MILP) can solve many non-convex problems efficiently, as can other techniques such as genetic algorithms. But, beneath the hood, these techniques all rely on glorified guess-and-check.
So your solver fails because the problem is nonconvex and your solver is neither smart enough to use MIP to guess-and-check its way to a solution nor smart enough to use the quadratic equation.
How Then Can I Solve The Problem?
In this particular instance, we are able to use mathematics to quickly find a solution because, although the problem is nonconvex, it is part of a class of special cases. Deep thinking by mathematicians has given us a simple way of handling this class.
But consider a few generalizations of the problem:
(a) a x^3+b x^2+c x+d=0
(b) a x^4+b x^3+c x^2+d x+e =0
(c) a x^5+b x^4+c x^3+d x^2+e x+f=0
(a) has three potential solutions which must be checked (exact solutions are tricky), (b) has four (trickier), and (c) has five. The formulas for (a) and (b) are much more complex than the quadratic formula and mathematicians have shown that there is no formula for (c) that can be expressed using "elementary operations". Instead, we have to resort to glorified guess-and-check.
So the techniques we used to solve your problem don't generalize very well. This is what it means to live in the realm of the nonconvex and NP-hard, and it's a good reason to fund research in mathematics, computer science, and related fields.

Simple Orthographic Structure from Motion using R -- Determining Metric Constraints

I would like to build a simple structure from motion program according to Tomasi and Kanade [1992]. The article can be found below:
https://people.eecs.berkeley.edu/~yang/courses/cs294-6/papers/TomasiC_Shape%20and%20motion%20from%20image%20streams%20under%20orthography.pdf
This method seems elegant and simple, however, I am having trouble calculating the metric constraints outlined in equation 16 of the above reference.
I am using R and have outlined my work thus far below:
Given a set of images
I want to track the corners of the three cabinet doors and the one picture (black points on images). First we read in the points as a matrix w where
Ultimately, we want to factorize w into a rotation matrix R and shape matrix S that describe the 3 dimensional points. I will spare as many details as I can but a complete description of the maths can be gleaned from the Tomasi and Kanade [1992] paper.
I supply w below:
w.vector=c(0.2076,0.1369,0.1918,0.1862,0.1741,0.1434,0.176,0.1723,0.2047,0.233,0.3593,0.3668,0.3744,0.3593,0.3876,0.3574,0.3639,0.3062,0.3295,0.3267,0.3128,0.2811,0.2979,0.2876,0.2782,0.2876,0.3838,0.3819,0.3819,0.3649,0.3913,0.3555,0.3593,0.2997,0.3202,0.3137,0.31,0.2718,0.2895,0.2867,0.825,0.7703,0.742,0.7251,0.7232,0.7138,0.7345,0.6911,0.1937,0.1248,0.1723,0.1741,0.1657,0.1313,0.162,0.1657,0.8834,0.8118,0.7552,0.727,0.7364,0.7232,0.7288,0.6892,0.4309,0.3798,0.4021,0.3965,0.3844,0.3546,0.3695,0.3583,0.314,0.3065,0.3989,0.3876,0.3857,0.3781,0.3989,0.3593,0.5184,0.4849,0.5147,0.5193,0.5109,0.4812,0.4979,0.4849,0.3536,0.3517,0.4121,0.3951,0.3951,0.3781,0.397,0.348,0.5175,0.484,0.5091,0.5147,0.5128,0.4784,0.4905,0.4821,0.7722,0.7326,0.7326,0.7232,0.7232,0.7119,0.7402,0.7006,0.4281,0.3779,0.3918,0.3863,0.3825,0.3472,0.3611,0.3537,0.8043,0.7628,0.7458,0.7288,0.727,0.7213,0.7364,0.6949,0.5789,0.5491,0.5761,0.5817,0.5733,0.5444,0.5537,0.5379,0.3649,0.3536,0.4177,0.3951,0.3857,0.3819,0.397,0.3461,0.697,0.671,0.6821,0.6821,0.6719,0.6412,0.6468,0.6235,0.3744,0.3649,0.4159,0.3819,0.3781,0.3612,0.3763,0.314,0.7008,0.6691,0.6794,0.6812,0.6747,0.6393,0.6412,0.6235,0.7571,0.7345,0.7439,0.7496,0.7402,0.742,0.7647,0.7213,0.5817,0.5463,0.5696,0.5779,0.5761,0.5398,0.551,0.5398,0.7665,0.7326,0.7439,0.7345,0.7288,0.727,0.7515,0.7062,0.8301,0.818,0.8571,0.8878,0.8766,0.8561,0.858,0.8394,0.4121,0.3876,0.4347,0.397,0.38,0.3631,0.3668,0.2971,0.912,0.8962,0.9185,0.939,0.9259,0.898,0.8887,0.8571,0.3989,0.3781,0.4215,0.3725,0.3612,0.3461,0.3423,0.2782,0.9092,0.8952,0.9176,0.9399,0.925,0.8971,0.8887,0.8571,0.4743,0.4536,0.4894,0.4517,0.446,0.4328,0.4385,0.3706,0.8273,0.8171,0.8571,0.8878,0.8766,0.8543,0.8561,0.8394,0.4743,0.4554,0.4969,0.4668,0.4536,0.4404,0.4536,0.3857)
w=matrix(w.vector,ncol=16,nrow=16,byrow=FALSE)
Then create registered measurement matrix wm according to equation 2 as
by
wm = w - rowMeans(w)
We can decompose wm into a '2FxP' matrix o1 a diagonal 'PxP' matrix e and 'PxP' matrix o2 by using a singular value decomposition.
svdwm <- svd(wm)
o1 <- svdwm$u
e <- diag(svdwm$d)
o2 <- t(svdwm$v) ## dont forget the transpose!
However, because of noise, we only pay attention to the first 3 columns of o1, first 3 values of e and the first 3 rows of o2 by:
o1p <- svdwm$u[,1:3]
ep <- diag(svdwm$d[1:3])
o2p <- t(svdwm$v)[1:3,] ## dont forget the transpose!
Now we can solve for our rhat and shat in equation (14)
by
rhat <- o1p%*%ep^(1/2)
shat <- ep^(1/2) %*% o2p
However, these results are not unique and we still need to solve for R and S by equation (15)
by using the metric constraints of equation (16)
Now I need to find Q. I believe there are two potential methods but am unclear how to employ either.
Method 1 involves solving for B where B=Q%*%solve(Q) then using Cholesky decomposition to find Q. Method 1 appears to be the common choice in literature, however, little detail is given as to how to actually solve the linear system. It is apparent that B is a '3x3' symmetric matrix of 6 unknowns. However, given the metric constraints (equations 16), I don't know how to solve for 6 unknowns given 3 equations. Am I forgetting a property of symmetric matrices?
Method II involves using non-linear methods to estimate Q and is less commonly used in structure from motion literature.
Can anyone offer some advice as to how to go about solving this problem? Thanks in advance and let me know if I need to be more clear in my question.
can be written as .
can be written as .
can be written as .
so our equations are:
So the first equation can be written as:
which is equivalent to
To keep it short we define now:
(I know the spacings are terrably small, but yes, this is a Vector...)
So for all equations in all different Frames f, we can write one big equation:
(sorry for the ugly formulas...)
Now you just need to solve the -Matrix using Cholesky decomposition or whatever...

differentiation in matlab

i need to find acceleration of an object the formula for that given in text is a = d^2(L)/d(T)^2 , where L= length and T= time
i calculated this in matlab by using this equation
a = (1/(T3-T1))*(((L3-L2)/(T3-T2))-((L2-L1)/(T2-T1)))
or
a = (v2-v1)/(T2-T1)
but im not getting the right answers ,can any body tell me how to find (a) by any other method in matlab.
This has nothing to do with matlab, you are just trying to numerically differentiate a function twice. Depending on the behaviour of the higher (3rd, 4th) derivatives of the function this will or will not yield reasonable results. You will also have to expect an error of order |T3 - T1|^2 with a formula like the one you are using, assuming L is four times differentiable. Instead of using intervals of different size you may try to use symmetric approximations like
v (x) = (L(x-h) - L(x+h))/ 2h
a (x) = (L(x-h) - 2 L(x) + L(x+h))/ h^2
From what I recall from my numerical math lectures this is better suited for numerical calculation of higher order derivatives. You will still get an error of order
C |h|^2, with C = O( ||d^4 L / dt^4 || )
with ||.|| denoting the supremum norm of a function (that is, the fourth derivative of L needs to be bounded). In case that's true you can use that formula to calculate how small h has to be chosen in order to produce a result you are willing to accept. Note, though, that this is just the theoretical error which is a consequence of an analysis of the Taylor approximation of L, see [1] or [2] -- this is where I got it from a moment ago -- or any other introductory book on numerical mathematics. You may get additional errors depending on the quality of the evaluation of L; also, if |L(x-h) - L(x)| is very small numerical substraction may be ill conditioned.
[1] Knabner, Angermann; Numerik partieller Differentialgleichungen; Springer
[2] http://math.fullerton.edu/mathews/n2003/numericaldiffmod.html

Help understanding unipolar transfer function

There is a question I am stuck on using the following formula for the unipolar transfer function:
f(net)= 1
__________
-net
1 + e
The example has the following:
out = 1
____________ = 0.977
-3.75
1 + e
How do we arrive at 0.977?
What is e?
e = 2.71828... is the base of natural logarithms. It's a mathematical constant that comes up in many different equations, similar to π. You will see it all the time when doing exponents and logarithms.
Plug it into your equation and you get 0.977.
While factually correct the other responses merely provide the value of e and confirm the underlying computation. This type of sigmoid functions is so ubiquitous to neural networks that some additional insight may be welcome.
Essentially the exponential function (e to the x power), has a very characteristic curve:
Mostly flat at zero (very slightly above zero, actually), from - infinity to about -2
incrementally sharp turn towards the vertical, between about -2 and +4
quasi "vertical", with values in excess of 150 and increasingly huge, from +5 to infinity
As a result exponential curves are very useful for producing "S-shaped" functions; BTW, "S" is Sigma in Greek which supplied the etymology for "sigmoid". Such functions are often patterned on the formula shown in the question:
1/(1 + e^-x)
where x is the variable. Typically such functions also include constants aimed at stretching the range (the input zone where changes in x are significant) and/or at modifying the curve in this middle zone.
The result of such functions is that up to a particular value of the input, the function is quasi constant, then, for a particular range of inputs, the function provides a increasing output, and finally past the upper value of the range, the function is quasi constant. Also looking in more details, such Sigmoids have a point of inflection which correspond to a reversing of the rate of change of the ouptut and which also marks an area of the curve, on either side, where the changes are the slowest, relatively.
In turn, such S-shaped curves (1) are very useful to normalize the output of neural network neurons, or more generally, to normalize various numeric values during processes of various nature. Intuitively these correspond to a "sweet spot" or a "sweet range" of the underlying neuron or device.
(1) Or also, possibly, "step-down" shaped curves, i.e. curves with a mostly constant high value, a decreasing value within the mid-range, and a low mostly constant value thereafter.
e is Euler's number == 2.718281828....
If you raise e to the -3.75 power, add one to it, and take the inverse, you'll get precisely 0.977022630....
'e' is the base for the natural logarithm function, the value of which is equivalent to the sum of the infinite series 1/n! for n from 0 to infinity. It is available in the C standard library or the java Math package as the exp() function.
If you evaluate 1/(1+exp(-3.75)) you will get 0.977

higher order linear regression

I have the matrix system:
A x B = C
A is a by n and B is n by b. Both A and B are unknown but I have partial information about C (I have some values in it but not all) and n is picked to be small enough that the system is expected to be over constrained. It is not required that all rows in A or columns in B are over constrained.
I'm looking for something like least squares linear regression to find a best fit for this system (Note: I known there will not be a single unique solution but all I want is one of the best solutions)
To make a concrete example; all the a's and b's are unknown, all the c's are known, and the ?'s are ignored. I want to find a least squares solution only taking into account the know c's.
[ a11, a12 ] [ c11, c12, c13, c14, ? ]
[ a21, a22 ] [ b11, b12, b13, b14, b15] [ c21, c22, c23, c24, c25 ]
[ a31, a32 ] x [ b21, b22, b23, b24, b25] = C ~= [ c31, c32, c33, ?, c35 ]
[ a41, a42 ] [ ?, ?, c43, c44, c45 ]
[ a51, a52 ] [ c51, c52, c53, c54, c55 ]
Note that if B is trimmed to b11 and b21 only and the unknown row 4 chomped out, then this is almost a standard least squares linear regression problem.
This problem is illposed as described.
Let A, B, and C=5, be scalars. You are asking to solve
a*b=5
which has an infinite number of solutions.
One approach, on the information provided above, is to minimize
the function g defined as
g(A,B) = ||AB-C||^2 = trace((AB-C)*(AB-C))^2
using Newtons method or a quasi-secant approach (BFGS).
(You can easily compute the gradient here).
M* is the transpose of M and multiplication is implicit.
(The norm is the frobenius norm... I removed the
underscore F as it was not displaying properly)
As this is an inherently nonlinear problem, standard linear
algebra approaches do not apply.
If you provide more information, I may be able to help more.
Some more questions: I think the issue is here is that without
more information, there is no "best solution". We need to
determine a more concrete idea of what we are looking for.
One idea, could be a "sparsest" solution. This area is
a hot area of research, with some of the best minds in the
world working here (See Terry Tao et al. work on Nuclear Norm)
This problem although tractable is still hard.
Unfortunately, I am not yet able to comment, so I will add my comments here.
As said below, LM is a great approach to solving this and is just one approach.
along the lines of the Newton type approaches to either
the optimization problem or the nonlinear solving problem.
Here is an idea, using the example you gave above: Lets define
two new vectors, V and U each with 21 elements (exactly the same number of defined
elements in C).
V is precisely the known elements of C, column ordered, so (in matlab notation)
V = [C11; C21; C31; C51; C12; .... ; C55]
U is a vector which is a column ordering of the product AB, LEAVING OUT THE
ELEMENTS CORRESPONDING TO '?' in matrix C. Collecting all the variables into x
we have
x = [a11, a21, .. a52, b11, b21 ..., b25].
f(x) = U (as defined above).
We can now try to solve f(x)=V with your favorite nonlinear least squares method.
As an aside, although a poster below recommended simulated annealing, I recommend
against it. THere are some problems it works, but it is a heuristic. When you have
powerful analytic methods such as Gauss-Newton or LM, I say use them. (in my own
experience that is)
A wild guess: A singular value decomposition might do the trick?
I have no idea on how to deal with your missing values, so I'm going to ignore that problem.
There are no unique solutions. To find a best solution you need some sort of a metric to judge them by. I'm going to suppose you want to use a least squares metric, i.e. the best guess values of A and B are those that minimize sum of the numbers [C_ij-(A B)_ij]^2.
One thing you didn't mention is how to determine the value you are going to use for n. In short, we can come up with 'good' solutions if 1 <= n <= b. This is because 1 <= rank(span(C)) <= b. Where rank(span(C)) = the dimension of the column space of C. Note that this is assuming a >= b. To be more correct we would write 1 <= rank(span(C)) <= min(a,b).
Now, supposing that you have chosen n such that 1 <= n <= b. You are going to minimize the residual sum of squares if you chose the columns of A such that span(A) = span(First n eigen vectors of C). If you don't have any other good reasons, just choose the columns of A to be to first n eigen vectors of C. Once you have chosen A, you can get the values of B in the usual linear regression way. I.e. B = (A'A)^(-1)A' C
You have a couple of options. The Levenberg-Marquadt algorithm is generally recognized as the best LS method. A free implementation is available at here. However, if the calculation is fast and you have a decent number of parameters, I would strongly suggest a Monte Carlo method such as simulated annealing.
You start with some set of parameters in the answer, and then you increase one of them by a random percentage up to a maximum. You then calculate the fitness function for your system. Now, here's the trick. You don't throw away the bad answers. You accept them with a Boltzmann probability distribution.
P = exp(-(x-x0)/T)
where T is a temperature parameter and x-x0 is the current fitness value minus the previous. After x number of iterations, you decrease T by a fixed amount (this is called the cooling schedule). You then repeat this process for another random parameter. As T decreases, fewer poor solutions are chosen, and eventually the procedure becomes a "greedy search" only accepting the solutions that improve the fit. If your system has many free parameters (> 10 or so), this is really the only way to go where you will have any chance of getting to a global minimum. This fitting method takes about 20 minutes to write in code, and a couple of hours to tweak. Hope this helps.
FYI, Wolfram has a nice discussion of this in the context of the traveling salesman problem, and I've been using it very successfully to solve some very difficult global minimization problems. It is slower than LM methods, but much better in most difficult/relatively large cases.
Based on the realization that cutting B to a single column and them removing row with unknowns converts this to very near a known problem, One approach would be to:
seed A with random values.
solve for each column of B independently.
rework the problem to allow solving for each row of A given the B values from step 2.
repeat at step 2 until things settle out.
I have no clue if that is even stable.

Resources