What order of precedence does ∑ have? - math

Trying to implement a neural network algorithm here, but I'm a bit lost on the math side of things:
Note that p and i should be subscript (not sure how to do it in stackoverflow).
(ΣpΣi(tpi - opi)^2) / (n * k)
Basically my question is concerning the inner ∑ : Do I sum (for all i (tpi - opi)^2)? Or do I square (the sum for all i (tpi - opi))?

Sum of squares. So Σi(tpi - opi)^2 means (tp1 - op1)^2 + ... + (tpn - opn)^2. If you wanted the square of the sum, it would most likely be written as (Σi(tpi - opi))^2. Also, since it's neural nets, you almost certainly want the sum of squares.
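A minimal sketch of the two readings in Python, to make the difference concrete. The data values are made up, and the roles of n and k are an assumption (n patterns indexed by p, k outputs indexed by i); the question does not say which is which.

```python
# Hypothetical targets t[p][i] and outputs o[p][i]: 3 patterns, 2 outputs each.
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs = [[0.8, 0.1], [0.3, 0.6], [0.9, 0.7]]


def mean_sum_of_squares(t, o):
    """(sum over p, sum over i of (t_pi - o_pi)^2) / (n * k)."""
    total = 0.0
    for t_row, o_row in zip(t, o):
        # Inner sum: square each difference first, then add the squares.
        total += sum((t_pi - o_pi) ** 2 for t_pi, o_pi in zip(t_row, o_row))
    return total / (len(t) * len(t[0]))


def mean_square_of_sums(t, o):
    """The other reading, shown only for contrast: square the inner sum."""
    total = 0.0
    for t_row, o_row in zip(t, o):
        total += sum(t_pi - o_pi for t_pi, o_pi in zip(t_row, o_row)) ** 2
    return total / (len(t) * len(t[0]))


print(mean_sum_of_squares(targets, outputs))
print(mean_square_of_sums(targets, outputs))
```

The two functions give different numbers on the same data, because in the second one positive and negative errors cancel before squaring.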

Related

Runtime and space complexity of the recursive determinant algorithm for a n x n matrix

I am trying to figure out the runtime and space complexity of the algorithm below.
Some say that the runtime complexity of this is O(n!), and I am guessing it is because there are n! recursive calls when the recursive algorithm is run on an n*n matrix. But I am not sure if I am right.
Also, is the space complexity also n!?
It might help to write out an explicit recurrence relation that governs the runtime of a straightforward implementation of the recursive algorithm. Notice that, in working on an n × n matrix, evaluating the sum requires making n recursive calls on matrices of size (n - 1) × (n - 1). Each recursive call requires about (n - 1)^2 additional time to set up, since we need to extract a submatrix of that size from the original matrix, so the total per-call overhead of the algorithm would be Θ(n^3) because we’re doing quadratic work linearly many times. That means that our work done is roughly
T(n) = nT(n - 1) + n^3.
Completely ignoring the cubic term here, notice that expanding out the recursion will have the following effect:
T(n) = nT(n - 1) + ...
= n(n-1)T(n-2) + ...
= n(n-1)(n-2)T(n-3) + ...
and eventually we’ll get an n! term showing up, plus a bunch of extra terms from the cubic. So the work done here is at least Ω(n!), and probably a lot more once we factor in the cubic term.
As for the space complexity - when working with the space complexity, remember that once one branch of the recursion terminates we can reuse the space that branch was using. This means that we only really need to look at any one branch to see how much space is needed.
With a naive implementation of this summation where we explicitly compute the submatrices for the recursive calls, we’ll need space to store one matrix of size n × n, one of size (n-1) × (n-1), one of size (n-2) × (n-2), etc. That space usage sums up to Θ(n^3).
There are a bunch of other algorithms you can use to compute determinants in much less time and space. Some are based on Gaussian elimination and run in time O(n^3), for example.
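As an illustration of the two approaches discussed above, here is a sketch in Python. The question's own algorithm is not shown in the thread, so this assumes the usual cofactor expansion along the first row; the function names and the test matrix are made up. The recursive version also returns how many calls it made, so the factorial-like growth can be observed directly.

```python
from fractions import Fraction


def det_cofactor(m):
    """Determinant by cofactor expansion along row 0.

    Returns (determinant, number_of_calls) so the call count can be inspected.
    """
    n = len(m)
    if n == 1:
        return m[0][0], 1
    total, calls = 0, 1
    for j in range(n):
        # Explicitly build the (n-1) x (n-1) submatrix: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        sub_det, sub_calls = det_cofactor(minor)
        total += (-1) ** j * m[0][j] * sub_det
        calls += sub_calls
    return total, calls


def det_gauss(m):
    """Determinant by Gaussian elimination, using exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


matrix = [[2, 0, 1, 3], [1, 4, 0, 2], [0, 1, 5, 1], [3, 2, 1, 0]]
value, calls = det_cofactor(matrix)
print(value, calls, det_gauss(matrix))
```

The call count of the recursive version satisfies C(n) = 1 + n * C(n - 1), which is the recurrence above without the cubic term, while the elimination version uses three nested loops over the matrix.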

calculate time complexity in recursive and dynamic solution of “number of ways to move from top left to bottom right in a matrix”

There is a m*n matrix and we need to find all possible paths from top left to bottom right.
It can be traversed only in right and down directions.
I have the following doubts:
In recursive approach I understand that the time complexity will be O(2^(m+n)). How can I get it using induction?
How do I find the complexity in case of dynamic programming solution?
In dynamic programming you fill the array dp[i][j], where dp[i][j] is the number of ways to reach cell (i,j) from the top-left cell. Here dp[i][j] = dp[i][j-1] + dp[i-1][j] (except in the edge cases where i=1 or j=1). So in total you have to fill a dp table with n*m entries, and each entry depends on a constant number of entries (at most 2): dp[i-1][j] and dp[i][j-1]. Thus the complexity is O(2*n*m), which is O(n*m).
Secondly, if we don't use dp or memoization (you can google it) and do it recursively, then you are basically tracing all the possible paths while finding the count. So the complexity is the number of paths from the top-left cell to the bottom-right one. All paths have m-1 horizontal and n-1 vertical moves, so the number of paths is (m+n-2)! / ((m-1)! * (n-1)!). That count is the complexity, which is a tighter bound than the 2^(m+n) you suggested.
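The three counts described above can be compared with a short Python sketch. The function names are made up for illustration, and the grid is 0-indexed here, whereas the answer uses 1-based indices.

```python
from math import comb


def paths_dp(m, n):
    """Number of right/down paths in an m x n grid via a DP table: O(m * n)."""
    # Every cell in the first row or first column is reachable in exactly one way.
    dp = [[1] * n for _ in range(m)]
    for i in range(1, m):
        for j in range(1, n):
            dp[i][j] = dp[i - 1][j] + dp[i][j - 1]
    return dp[m - 1][n - 1]


def paths_recursive(m, n):
    """Plain recursion without memoization; returns (paths, number_of_calls)."""
    if m == 1 or n == 1:
        return 1, 1
    up, up_calls = paths_recursive(m - 1, n)
    left, left_calls = paths_recursive(m, n - 1)
    return up + left, 1 + up_calls + left_calls


def paths_closed_form(m, n):
    """(m+n-2)! / ((m-1)! * (n-1)!) written as a binomial coefficient."""
    return comb(m + n - 2, m - 1)


for m, n in [(3, 3), (4, 6), (8, 8)]:
    paths, calls = paths_recursive(m, n)
    print(m, n, paths_dp(m, n), paths_closed_form(m, n), paths, calls)
```

In this sketch every leaf of the recursion contributes exactly one path, so the number of calls comes out as 2 * paths - 1, which is why the path count governs the running time of the plain recursion.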
For the first question without memoisation:
1) In recursive approach I understand that the time complexity will be
O(2^(m+n)). How can I get it using induction?
When we represent the successive calls of the recursive function as a binary tree, at each floor k representing the kth move (floor 0 being the root of the binary tree, the start position), every call makes at most two new recursive calls at floor k + 1. Besides, as stated in sachas's answer, all paths have m-1 horizontal and n-1 vertical moves. There are therefore (m-1) + (n-1) floors below the root, one for each possible kth move.
Then, because:
there are at most 2^k calls at floor k,
calls at every floor add up,
there is a total of (m-1) + (n-1) floors below the root,
there are at most 2^0 + 2^1 + ... + 2^(m + n - 2) = 2^(m + n - 1) - 1 calls of the function (by the formula for the sum of a geometric sequence). Since each call does O(1) work of its own, the complexity is O(2^(m + n - 1)) = O(2^(m + n)). Hence the result.

Support Vector Machine Geometrical Intuition

Hi,
I'm having great difficulty understanding why the hyperplane equation of a support vector machine has a 1 after the >=: w.x + b >= 1 (why this 1?). I know it could have something to do with the intercept on the y axis, but I cannot relate that to the support vectors or to their meaning for classification.
Can anyone please explain why the equation has that 1 (or -1)?
Thank you.
The 1 is just an algebraic simplification, which comes in handy in the later optimization.
First, notice that all three hyperplanes can be written as
w'x+b= 0
w'x+b=+A
w'x+b=-A
If we fixed the norm of the normal w, ||w||=1, then the above would have one solution with some arbitrary A depending on the data; let's call our solution v and c (the optimal values of w and b respectively). But if we let w have any norm, then we can easily see that if we put
w'x+b= 0
w'x+b=+1
w'x+b=-1
then there is one unique w which satisfies these equations, and it is given by w=v/A, b=c/A, because
(v/A)'x+(c/A)= 0 (when v'x+c=0) // for the middle hyperplane
(v/A)'x+(c/A)=+1 (when v'x+c=+A) // for the positive hyperplane
(v/A)'x+(c/A)=-1 (when v'x+c=-A) // for the negative hyperplane
In other words, we assume that these "supporting vectors" satisfy the equation w'x+b=+/-1 for future simplification, and we can do so because for any solution satisfying v'x+c=+/-A there is a solution of our equation (with a different norm of w).
So once we have these simplifications, our optimization problem reduces to minimizing the norm ||w|| (that is, maximizing the size of the margin, which can now be expressed as 2/||w||). If we stayed with the "normal" equation with a (not fixed!) A value, then the maximization of the margin would have one more "dimension": we would have to search through w, b, A to find the triple which maximizes it (as the "restrictions" would take the form y(w'x+b)>A). Now we just search through w and b (and in the dual formulation just through alpha, but that is a whole other story).
This step is not required. You can build an SVM without it, but it makes things simpler, in the spirit of Occam's razor.
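The rescaling argument can be checked numerically. The sketch below uses made-up values for v, c and A (a unit-norm normal v, offset c, and supporting hyperplanes at +/-A); none of them come from the answer.

```python
import math


def dot(u, x):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * xi for ui, xi in zip(u, x))


# Hypothetical unit-norm solution (v, c) with supporting hyperplanes v'x + c = +/-A.
v = [0.6, 0.8]  # ||v|| = 1
c = -1.0
A = 2.0

# Rescaled solution: w = v/A, b = c/A.
w = [vi / A for vi in v]
b = c / A

# One point on each supporting hyperplane of (v, c).
x_pos = [1.8, 2.4]    # v'x + c = 3 - 1 = +A
x_neg = [-0.6, -0.8]  # v'x + c = -1 - 1 = -A

print(dot(v, x_pos) + c, dot(w, x_pos) + b)  # +A before rescaling, +1 after
print(dot(v, x_neg) + c, dot(w, x_neg) + b)  # -A before rescaling, -1 after

# The geometric margin is unchanged by the rescaling.
margin_from_v = 2 * A / math.sqrt(dot(v, v))
margin_from_w = 2 / math.sqrt(dot(w, w))
print(margin_from_v, margin_from_w)
```

The hyperplanes themselves do not move; only the scale of (w, b) changes, which is what lets the margin be written as 2/||w|| with no separate A to optimize.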
This boundary is called the "margin" and must be maximized, which means you have to minimize ||w||.
The aim of SVM is to find a hyperplane that maximizes the distance between the two groups.
However, there are infinitely many solutions (see figure: move the optimal hyperplane along the perpendicular vector) and we need to fix at least the boundaries: the +1 or -1 is a common convention to avoid these infinite solutions.
Formally you have to optimize r ||w||, and we set a boundary condition r ||w|| = 1.

Big-O running time for functions

Find the big-O running time for each of these functions:
T(n) = T(n - 2) + n²
Our Answers: n², n³
T(n) = 3T(n/2) + n
Our Answers: O(n log n), O(n^(log₂3))
T(n) = 2T(n/3) + n
Our Answers: O(n log base 3 of n), O(n)
T(n) = 2T(n/2) + n^3
Our Answers: O(n³ log₂n), O(n³)
So we're having trouble deciding on the right answers for each of the questions.
We all got different results and would like an outside opinion on what the running time would be.
Thanks in advance.
A bit of clarification:
The functions in the questions appear to be running time functions as hinted by their T() name and their n parameter. A more subtle hint is the fact that they are all recursive and recursive functions are, alas, a common occurrence when one produces a function to describe the running time of an algorithm (even when the algorithm itself isn't formally using recursion). Indeed, recursive formulas are a rather inconvenient form and that is why we use the Big O notation to better summarize the behavior of an algorithm.
A running time function is a parametrized mathematical expression which allows computing a [sometimes approximate] relative value for the running time of an algorithm, given specific value(s) for the parameter(s). As is the case here, running time functions typically have a single parameter, often named n, corresponding to the total number of items the algorithm is expected to work on/with (e.g. with a search algorithm it could be the total number of records in a database, with a sort algorithm the number of entries in the unsorted list, and with a path finding algorithm the number of nodes in the graph). In some cases a running time function may have multiple arguments; for example, the performance of an algorithm performing some transformation on a graph may be bound to both the total number of nodes and the total number of edges, or to the average number of connections between two nodes, etc.
The task at hand (for what appears to be homework, hence my partial answer) is therefore to find a Big O expression that qualifies the upper bound of each of the running time functions, whatever the underlying algorithm they may correspond to. The task is not that of finding and qualifying an algorithm to produce the results of the functions (this second possibility is also a very common type of exercise in the algorithms classes of a CS curriculum, but is apparently not what is required here).
The problem is therefore more one of mathematics than of Computer Science per se. Basically one needs to find the limit (or an approximation thereof) of each of these functions as n approaches infinity.
This note from Prof. Jeff Erickson at the University of Illinois Urbana-Champaign provides a good intro to solving recurrences.
Although there are a few shortcuts to solving recurrences, particularly if one has a good command of calculus, a generic approach is to guess the answer and then to prove it by induction. Tools like Excel, or a few snippets in a programming language such as Python, MATLAB or Sage, can be useful to produce tables of the first few hundred values (or beyond) along with values such as n^2, n^3, n! as well as ratios of the terms of the function; these tables often provide enough insight into the function to find its closed form.
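As an example of such a snippet, here is a small Python sketch that tabulates the first recurrence from the question, T(n) = T(n - 2) + n^2, next to the ratios T(n)/n^2 and T(n)/n^3. The question gives no base cases, so T(0) = 0 and T(1) = 1 are assumptions made for illustration.

```python
def running_time(n):
    """T(n) = T(n - 2) + n^2 with assumed base cases T(0) = 0 and T(1) = 1."""
    # Unroll the recurrence iteratively to avoid deep recursion.
    if n % 2 == 0:
        t, start = 0, 2
    else:
        t, start = 1, 3
    for k in range(start, n + 1, 2):
        t += k * k
    return t


for n in (10, 100, 1000, 10000):
    t = running_time(n)
    # If T(n) is Theta(n^3), the last column should settle near a constant.
    print(n, t, t / n**2, t / n**3)
```

With these base cases the T(n)/n^2 column keeps growing while the T(n)/n^3 column levels off near 1/6, which is the kind of pattern that suggests a guess to prove.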
A few hints regarding the answers listed in the question:
Function a)
O(n^2) is for sure wrong:
a quick inspection of the first few values in the sequence shows that n^2 is increasingly much smaller than T(n)
O(n^3), on the other hand, appears to be systematically bigger than T(n) as n grows towards big numbers. A closer look shows that O(n^3) is effectively the order of the Big O notation for this function, and that n^3 / 6 is a more precise estimate: it tracks the value of T(n) closely [for bigger values of n, and/or as n tends towards infinity], differing from it only by a small fraction compared with the coarser n^3 estimate.
One can confirm the n^3 / 6 estimate by substituting it into the recurrence, in the style of an induction step:
T(n) = T(n-2) + n^2 // (1) by definition
T(n) = n^3 / 6 // (2) our "guess"
T(n) = ((n - 2)^3 / 6) + n^2 // by substitution of T(n-2) by the (2) expression
= (n^3 - 6n^2 + 12n - 8) / 6 + 6n^2 / 6
= (n^3 + 12n - 8) / 6
= n^3/6 + 2n - 4/3
~= n^3/6 // as n grows towards infinity, the 2n and 4/3 terms
// become relatively insignificant, leaving us with the
// (n^3 / 6) limit expression, QED

Fastest numerical solution of a real cubic polynomial?

R question: Looking for the fastest way to NUMERICALLY solve a bunch of arbitrary cubics known to have real coeffs and three real roots. The polyroot function in R is reported to use Jenkins-Traub's algorithm 419 for complex polynomials, but for real polynomials the authors refer to their earlier work. What are the faster options for a real cubic, or more generally for a real polynomial?
The numerical solution for doing this many times in a reliable, stable manner involves: (1) forming the companion matrix, (2) finding the eigenvalues of the companion matrix.
You may think this is a harder problem to solve than the original one, but this is how the solution is implemented in most production code (say, Matlab).
For the polynomial:
p(t) = c0 + c1 * t + c2 * t^2 + t^3
the companion matrix is:
[[0 0 -c0],[1 0 -c1],[0 1 -c2]]
Find the eigenvalues of such matrix; they correspond to the roots of the original polynomial.
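A quick way to see why this works, sketched in Python without any linear algebra library: if r is a root of p, then the row vector (1, r, r^2) multiplied by the companion matrix gives r times itself, so r is an eigenvalue. The example polynomial (t - 1)(t - 2)(t - 3) and the helper names are made up for illustration.

```python
def companion(c0, c1, c2):
    """Companion matrix of p(t) = c0 + c1*t + c2*t^2 + t^3, as laid out above."""
    return [[0, 0, -c0], [1, 0, -c1], [0, 1, -c2]]


def row_times_matrix(row, m):
    """Row vector times matrix, using plain lists."""
    return [sum(row[i] * m[i][j] for i in range(len(row))) for j in range(len(m[0]))]


# (t - 1)(t - 2)(t - 3) = -6 + 11t - 6t^2 + t^3, so c0 = -6, c1 = 11, c2 = -6.
C = companion(-6, 11, -6)
for r in (1, 2, 3):
    u = [1, r, r * r]
    # For a root r, u * C equals r * u, i.e. u is a left eigenvector for r.
    print(r, row_times_matrix(u, C), [r * x for x in u])
```

The last entry of u * C is -c0 - c1*r - c2*r^2, which equals r^3 exactly when p(r) = 0; that is the whole argument.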
To do this very fast, download the eigenvalue subroutines from LAPACK, compile them, and link them to your code. Do this in parallel if you have many (say, about a million) sets of coefficients.
Notice that the coefficient of t^3 is one; if this is not the case in your polynomials, you will have to divide the whole thing by that coefficient and then proceed.
Good luck.
Edit: Numpy and octave also depend on this methodology for computing the roots of polynomials. See, for instance, this link.
The fastest known way (that I'm aware of) to find the real solutions of a system of arbitrary polynomials in n variables is polyhedral homotopy. A detailed explanation is probably beyond a StackOverflow answer, but essentially it's a path algorithm that exploits the structure of each equation using toric geometries. Google will give you a number of papers.
Perhaps this question is better suited for mathoverflow?
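Fleshing out Arietta's answer above:
> # Coefficients c0, c1, c2 of the monic cubic p(t) = c0 + c1*t + c2*t^2 + t^3
> a <- c(1, 3, -4)
> # Companion matrix, filled row by row
> m <- matrix(c(0, 0, -a[1], 1, 0, -a[2], 0, 1, -a[3]), byrow = TRUE, nrow = 3)
> # Its eigenvalues are the roots of p
> roots <- eigen(m, symmetric = FALSE, only.values = TRUE)$values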
Whether this is faster or slower than using the cubic solver in the GSL package (as suggested by knguyen above) is a matter of benchmarking it on your system.
Do you need all 3 roots or just one? If just one, I would think Newton's Method would work ok. If all 3 then it might be problematic in circumstances where two are close together.
1) Solve P'(x) = 0 for the derivative polynomial P' to bracket your three roots (P' is a quadratic, so the quadratic formula applies). Call the roots of P' a and b (with a < b).
2) For the middle root, use a few steps of bisection between a and b, and when you're close enough, finish with Newton's method.
3) For the min and max root, "hunt" the solution. For the max root:
Start with x0 = b, x1 = b + (b - a) * lambda, where lambda is a moderate number (say 1.6)
do x_n = b + (x_{n - 1} - a) * lambda until P(x_n) and P(b) have different signs
Perform bisection + newton between x_{n - 1} and x_n
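The three steps above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a tuned solver: the cubic is taken to be monic with three distinct real roots, the iteration counts are arbitrary, and all names (`solve_cubic_three_real`, `refine`, `hunt`, `lam`) are made up.

```python
import math


def solve_cubic_three_real(c0, c1, c2, lam=1.6):
    """Roots of t^3 + c2*t^2 + c1*t + c0, assumed to be three distinct reals."""

    def p(x):
        return ((x + c2) * x + c1) * x + c0

    def dp(x):
        return (3 * x + 2 * c2) * x + c1

    def refine(lo, hi, bisections=20, newton_steps=5):
        # Bisection on a bracket where p changes sign, then a Newton polish.
        f_lo = p(lo)
        for _ in range(bisections):
            mid = 0.5 * (lo + hi)
            f_mid = p(mid)
            if (f_mid > 0) == (f_lo > 0):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        x = 0.5 * (lo + hi)
        for _ in range(newton_steps):
            slope = dp(x)
            if slope == 0:
                break
            x -= p(x) / slope
        return x

    def hunt(start, other):
        # Move away from `start`, on the side opposite `other`, with growing
        # steps until p changes sign relative to p(start).
        prev, x = start, start + (start - other) * lam
        while (p(x) > 0) == (p(start) > 0):
            prev, x = x, start + (x - other) * lam
        return min(prev, x), max(prev, x)

    # Step 1: critical points a < b, the roots of P'(x) = 3x^2 + 2*c2*x + c1.
    disc = c2 * c2 - 3 * c1
    if disc <= 0:
        raise ValueError("expected three distinct real roots")
    a = (-c2 - math.sqrt(disc)) / 3
    b = (-c2 + math.sqrt(disc)) / 3

    middle = refine(a, b)           # step 2: the middle root lies between a and b
    largest = refine(*hunt(b, a))   # step 3: hunt to the right of b
    smallest = refine(*hunt(a, b))  # step 3, mirrored: hunt to the left of a
    return smallest, middle, largest


print(solve_cubic_three_real(-6, 11, -6))  # (t - 1)(t - 2)(t - 3)
```

For the example cubic this should print values close to 1, 2 and 3. Whether it beats an eigenvalue-based solver is something to benchmark on your own data.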
The common methods are available: Newton's Method, Bisection Method, Secant, Fixed point iteration, etc. Google any one of them.
If, on the other hand, you have a non-linear system (e.g. a system of N polynomial equations in N unknowns), a method such as high-order Newton may be used.
Have you tried looking into the GSL package http://cran.r-project.org/web/packages/gsl/index.html?

Resources