How to transpose a high-degree tensor in an FP way? - math

I'm currently working on a math library.
It's now supporting several matrix operations:
- Plus
- Product
- Dot
- Get & Set
- Transpose
- Multiply
- Determinant
I always want to generalize everything I can generalize.
I was thinking about a recursive way to implement the transpose of a matrix, but I just couldn't figure it out.
Can anybody help?

I would advise you against trying to write a recursive method to transpose a matrix.
The idea is easy:
transpose(A)(i,j) = A(j,i)
Recursion isn't hard in this case. You can see the stopping condition: a 1x1 matrix with a single value is its own transpose. Build it up for 2x2, etc.
The problem is that this will be terribly inefficient, both in terms of stack depth and memory, for any matrix beyond a trivial size. People who apply linear algebra to real problems can require tens of thousands or billions of degrees of freedom.
You don't talk about meaningful, practical cases like sparse or banded matrices, etc.
You're better off doing it using a straightforward declarative approach.
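To make the contrast concrete, here is a minimal sketch in Python (not the asker's JavaScript library). It assumes a matrix is a rectangular list of rows; the names `transpose` and `transpose_recursive` are hypothetical. The first version is the direct index swap; the second is one possible recursive formulation that peels off the first column at each step.

```python
def transpose(a):
    """Direct index swap: result[j][i] = a[i][j]."""
    if not a:
        return []
    rows, cols = len(a), len(a[0])
    return [[a[i][j] for i in range(rows)] for j in range(cols)]


def transpose_recursive(a):
    """Recursive version: the first column becomes the first row,
    then recurse on the remaining columns."""
    if not a or not a[0]:
        return []
    first_column = [row[0] for row in a]
    rest = [row[1:] for row in a]  # copies what is left of the matrix at every level
    return [first_column] + transpose_recursive(rest)
```

The recursive version needs one stack frame per column and copies the remaining matrix at each level, which illustrates the stack and memory cost mentioned above.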
Haskell libraries such as hmatrix use BLAS as their backing implementation. Haskell is a more functional language than JavaScript. Perhaps you could crib some ideas by looking at the source code.
I'd recommend that you do the simple thing first, get it all working, and then branch out from there.
Here's a question to ask yourself: Why would anyone want to do serious numerical work using JavaScript? What will your library offer that's an improvement on what's available?
If you want to learn how to reinvent wheels, by all means proceed. Just understand that you aren't the first.

Related

Dynamic programming: Tabular vs memoization

Is the time complexity of dynamic programming tabular approach and recursion with memoization approach the same? For example, in the Knapsack problem the tabular approach takes O(N*W) where N is the number of items and W is the weight. But what is the time complexity for the memoization approach?
Memoization is a method used to solve dynamic programming (DP) problems recursively in an efficient manner. DP abstracts away from the specific implementation, which may be either recursive or iterative (with loops and a table). Therefore, if used appropriately, the time complexity is the same, i.e. O(NW) in the knapsack problem over the integers.
This is the terminology we used in the introduction to CS and algorithm design courses at BGU (I was a T.A. in both, if it matters), but there might be other terminologies which I'm unaware of.
I hope it was helpful, good luck!
Is the time complexity of dynamic programming tabular approach and recursion with memoization approach the same?
Yes, they do have the same time complexity of O(N*W), where N is the number of items and W is the weight. However, if the original problem requires all subproblems to be solved, as in the case of the Knapsack problem, tabulation usually outperforms memoization by a constant factor.
This is because tabulation has no overhead for recursion and can use a preallocated array rather than, say, a hash map.
What is the difference between tabulation and memoization?
When you solve a dynamic programming problem using tabulation (generally iterative) you solve the problem "bottom up", i.e., by solving all related sub-problems first, typically by filling up an n-dimensional table. Based on the results in the table, the solution to the "top" / original problem is then computed.
If you apply memoization (generally recursive) to solve the problem you do it by maintaining a map of already solved sub problems. You do it "top down" in the sense that you solve the "top" problem first (which typically recurses down to solve the sub-problems).
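As a rough illustration of the two styles, here is a sketch of the 0/1 knapsack problem in Python. The function names are made up for this example, and weights and capacity are assumed to be non-negative integers. Both versions evaluate the same recurrence over at most (N+1)*(W+1) states, which is where the shared O(N*W) bound comes from.

```python
from functools import lru_cache


def knapsack_tab(weights, values, capacity):
    """Bottom-up: fill an (N+1) x (W+1) table, smallest subproblems first."""
    n = len(weights)
    # table[i][w] = best value using the first i items with capacity w
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            table[i][w] = table[i - 1][w]  # skip item i-1
            if weights[i - 1] <= w:        # or take it, if it fits
                take = table[i - 1][w - weights[i - 1]] + values[i - 1]
                table[i][w] = max(table[i][w], take)
    return table[n][capacity]


def knapsack_memo(weights, values, capacity):
    """Top-down: recurse from the original problem and cache each state."""
    @lru_cache(maxsize=None)
    def best(i, w):
        if i == 0:
            return 0
        result = best(i - 1, w)            # skip item i-1
        if weights[i - 1] <= w:            # or take it, if it fits
            result = max(result, best(i - 1, w - weights[i - 1]) + values[i - 1])
        return result

    return best(len(weights), capacity)
```

The memoized version only visits states that are reachable from the top problem, while the table version visits all of them but avoids recursion and cache lookups.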
Which is better? Memoization or tabulation?
If we don't need to solve all the subproblems and are just looking for the optimal solution, memoization is better.
If we do need to solve all the subproblems, we are going to make numerous recursive calls, which may exhaust the stack space, and there tabulation is better.
The caveat is that memoization is generally more intuitive to implement, especially when we don't know the solutions to the subproblems, whereas tabulation requires us to know the solutions at the bottom in advance in order to build our way up.
Useful Resources:
What is Dynamic Programming? Memoization and Tabulation
Tabulation vs Memoization

Can any existing Machine Learning structures perfectly emulate recursive functions like the Fibonacci sequence?

To be clear I don't mean, provided the last two numbers in the sequence provide the next one:
(2, 3, -> 5)
But rather given any index provide the Fibonacci number:
(0 -> 1) or (7 -> 21) or (11 -> 144)
Adding two numbers is a very simple task for any machine learning structure, and by extension counting by ones, twos or any fixed number is a simple addition rule. Recursive calculations however...
To my understanding, most learning networks rely on forwards only evaluation, whereas most programming languages have loops, jumps, or circular flow patterns (all of which are usually ASM jumps of some kind), thus allowing recursion.
Sure, some networks aren't forwards only, but can processing weights using the hyperbolic tangent or sigmoid function enter any computationally complete state?
i.e. conditional statements, conditional jumps, forced jumps, simple loops, complex loops with multiple conditions, providing sort order, actual reordering of elements, assignments, allocating extra registers, etc?
It would seem that even a non-forwards only network would only find a polynomial of best fit, reducing errors across the expanse of the training set and no further.
Am I missing something obvious, or did most of Machine Learning just look at recursion and pretend like those problems don't exist?
Update
Technically any programming language can be considered the DNA of a genetic algorithm, where the compiler (and possibly console out measurement) would be the fitness function.
The issue is that programming (so far) cannot be expressed in a hill climbing way - literally, the fitness is 0, until the fitness is 1. Things don't half work in programming, and if they do, there is no way of measuring how 'working' a program is for unknown situations. Even an off by one error could appear to be a totally different and chaotic system with no output. This is exactly the reason learning to code in the first place is so difficult, the learning curve is almost vertical.
Some might argue that you just need to provide stronger foundation rules for the system to exploit - but that just leads to attempting to generalize all programming problems, which circles right back to designing a programming language and loses all notion of some learning machine at all. Following this road brings you to a close variant of LISP with mutate-able code and virtually meaningless fitness functions that brute force the 'nice' and 'simple' looking code-space in attempt to follow human coding best practices.
Others might argue that we simply aren't using enough population or momentum to gain footing on the error surface, or make a meaningful step towards a solution. But as your population approaches the number of DNA permutations, you are really just brute forcing (and very inefficiently at that). Brute forcing code permutations is nothing new, and definitely not machine learning - it's actually quite common in regex golf, I think there's even an xkcd about it...
The real problem isn't finding a solution that works for some specific recursive function, but finding a solution space that can encompass the recursive domain in some useful way.
So other than Neural Networks trained using Backpropagation hypothetically finding the closed form of a recursive function (if a closed form even exists, and they don't in most real cases where recursion is useful), or a non-forwards only network acting like a pseudo-programming language with awful fitness prospects in the best case scenario, plus the virtually impossible task of tuning exit constraints to prevent infinite recursion... That's really it so far for machine learning and recursion?
According to Kolmogorov et al.'s "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition", a three-layer neural network can model an arbitrary continuous function with the linear and logistic functions, including f(n) = ((1+sqrt(5))^n - (1-sqrt(5))^n) / (2^n * sqrt(5)), which is the closed-form solution of the Fibonacci sequence.
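For reference, the closed form can be checked numerically with a few lines of Python. This is only a sketch: the function names are made up, and it uses the standard indexing F(0) = 0, F(1) = 1, so the question's examples (7 -> 21, 11 -> 144) correspond to n = 8 and n = 12 here. Because it relies on double-precision floats, the rounded result is only trustworthy for moderate n.

```python
import math


def fib_closed_form(n):
    """Closed form: ((1+sqrt(5))^n - (1-sqrt(5))^n) / (2^n * sqrt(5))."""
    s = math.sqrt(5)
    return round(((1 + s) ** n - (1 - s) ** n) / (2 ** n * s))


def fib_iterative(n):
    """Reference implementation with F(0) = 0, F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```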
If you would like to treat the problem as a recursive sequence without a closed-form solution, I would view it as a special sliding window approach (I called it special because your window size seems fixed as 2). There are more general studies on the proper window size for your interest. See these two posts:
Time Series Prediction via Neural Networks
Proper way of using recurrent neural network for time series analysis
Ok, where to start...
Firstly, you talk about 'machine learning' and 'perfectly emulate'. This is not generally the purpose of machine learning algorithms. They make informed guesses given some evidence and some general notions about structures that exist in the world. That typically means an approximate answer is better than an 'exact' one that is wrong. So, no, most existing machine learning approaches aren't the right tools to answer your question.
Second, you talk of 'recursive structures' as some sort of magic bullet. Yet they are merely convenient ways to represent functions, somewhat analogous to higher order differential equations. Because of the feedbacks they tend to introduce, the functions tend to be non-linear. Some machine learning approaches will have trouble with this, but many (neural networks for example) should be able to approximate your function quite well, given sufficient evidence.
As an aside, having or not having closed form solutions is somewhat irrelevant here. What matters is how well the function at hand fits with the assumptions embodied in the machine learning algorithm. That relationship may be complex (e.g. try approximating Fibonacci with a support vector machine), but that's the essence.
Now, if you want a machine learning algorithm tailored to the search for exact representations of recursive structures, you could set up some assumptions and have your algorithm produce the most likely 'exact' recursive structure that fits your data. There are probably real world problems in which such a thing would be useful. Indeed the field of optimisation approaches similar problems.
The genetic algorithms mentioned in other answers could be an example of this, especially if you provided a 'genome' that matches the sort of recursive function you think you may be dealing with. Closed form primitives could form part of that space too, if you believe they are more likely to be 'exact' than more complex genetically generated algorithms.
Regarding your assertion that programming cannot be expressed in a hill climbing way, that doesn't prevent a learning algorithm from scoring possible solutions by how much of your evidence they are able to reproduce and how complex they are. In many cases (most? though counting cases here isn't really possible) such an approach will find a correct answer. Sure, you can come up with pathological cases, but with those, there's little hope anyway.
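A toy version of such a scoring rule might look like the sketch below. Everything in it is an assumption made for illustration: the `score` function, the `penalty` weight, and the hand-assigned complexity numbers are not taken from any particular library or paper. It uses the question's indexing, where index 0 maps to 1 and index 7 maps to 21.

```python
def fib(n):
    """Fibonacci with the question's indexing: fib(0) = 1, fib(7) = 21."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def score(candidate, examples, complexity, penalty=0.01):
    """Fraction of examples reproduced, minus a small complexity penalty."""
    hits = sum(1 for x, y in examples if candidate(x) == y)
    return hits / len(examples) - penalty * complexity


examples = [(n, fib(n)) for n in range(12)]

# Hypothetical candidates with hand-assigned complexity values.
exact_score = score(fib, examples, complexity=5)
partial_score = score(lambda n: n + 1, examples, complexity=2)  # right for n = 0 and n = 4
zero_score = score(lambda n: 0, examples, complexity=1)         # never right
```

Here the partially correct candidate scores between the useless one and the exact one, so the score is graded rather than strictly 0 or 1. Whether that gradient is informative enough to guide a search is a separate question.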
Summing up, machine learning algorithms are not usually designed to tackle finding 'exact' solutions, so aren't the right tools as they stand. But, by embedding some prior assumptions that exact solutions are best, and perhaps the sort of exact solution you're after, you'll probably do pretty well with genetic algorithms, and likely also with algorithms like support vector machines.
I think you also sum things up nicely with this:
The real problem isn't finding a solution that works for some specific recursive function, but finding a solution space that can encompass the recursive domain in some useful way.
The other answers go a long way to telling you where the state of the art is. If you want more, a bright new research path lies ahead!
See this article:
Turing Machines are Recurrent Neural Networks
http://lipas.uwasa.fi/stes/step96/step96/hyotyniemi1/
The paper describes how a recurrent neural network can simulate a register machine, which is known to be a universal computational model equivalent to a Turing machine. The result is "academic" in the sense that the neurons have to be capable of computing with unbounded numbers. This works mathematically, but would have problems pragmatically.
Because the Fibonacci function is just one of many computable functions (in fact, it is primitive recursive), it could be computed by such a network.
Genetic algorithms should be able to do the trick. The important thing is (as always with GAs) the representation.
If you define the search space to be syntax trees representing arithmetic formulas and provide enough training data (as you would with any machine learning algorithm), it probably will converge to the closed-form solution for the Fibonacci numbers, which is:
Fib(n) = ( (1+sqrt(5))^n - (1-sqrt(5))^n ) / ( 2^n * sqrt(5) )
[Source]
If you were asking for a machine learning algorithm to come up with the recursive formula to the Fibonacci numbers, then this should also be possible using the same method, but with individuals being syntax trees of a small program representing a function.
Of course, you also have to define good cross-over and mutation operators as well as a good evaluation function. And I have no idea how well it would converge, but it should at some point.
Edit: I'd also like to point out that in certain cases there is always a closed-form solution to a recursive function:
Like every sequence defined by a linear recurrence with constant coefficients, the Fibonacci numbers have a closed-form solution.
The Fibonacci sequence, where a specific index of the sequence must be returned, is often used as a benchmark problem in Genetic Programming research. In most cases recursive structures are generated, although my own research focused on imperative programs so used an iterative approach.
There's a brief review of other GP research that uses the Fibonacci problem in Section 3.4.2 of my PhD thesis, available here: http://kar.kent.ac.uk/34799/. The rest of the thesis also describes my own approach, which is covered a bit more succinctly in this paper: http://www.cs.kent.ac.uk/pubs/2012/3202/
Other notable research which used the Fibonacci problem is Simon Harding's work with Self-Modifying Cartesian GP (http://www.cartesiangp.co.uk/papers/eurogp2009-harding.pdf).

CVX-esque convex optimization in R?

I need to solve (many times, for lots of data, alongside a bunch of other things) what I think boils down to a second order cone program. It can be succinctly expressed in CVX something like this:
cvx_begin
variable X(2000);
expression MX(2000);
MX = M * X;
minimize( norm(A * X - b) + gamma * norm(MX, 1) )
subject to
X >= 0
MX((1:500) * 4 - 3) == MX((1:500) * 4 - 2)
MX((1:500) * 4 - 1) == MX((1:500) * 4)
cvx_end
The data lengths and equality constraint patterns shown are just arbitrary values from some test data, but the general form will be much the same, with two objective terms -- one minimizing error, the other encouraging sparsity -- and a large number of equality constraints on the elements of a transformed version of the optimization variable (itself constrained to be non-negative).
This seems to work pretty nicely, much better than my previous approach, which fudges the constraints something rotten. The trouble is that everything else around this is happening in R, and it would be quite a nuisance to have to port it over to Matlab. So is doing this in R viable, and if so how?
This really boils down to two separate questions:
1) Are there any good R resources for this? As far as I can tell from the CRAN task page, the SOCP package options are CLSOCP and DWD, which includes an SOCP solver as an adjunct to its classifier. Both have similar but fairly opaque interfaces and are a bit thin on documentation and examples, which brings us to:
2) What's the best way of representing the above problem in the constraint block format used by these packages? The CVX syntax above hides a lot of tedious mucking about with extra variables and such, and I can just see myself spending weeks trying to get this right, so any tips or pointers to nudge me in the right direction would be very welcome...
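As a starting point for question 2, the CVX model can be written out as math, followed by the usual epigraph reformulation that a modelling layer like CVX performs behind the scenes. This is a sketch: the auxiliary variables t and u and the row notation m_i (row i of M) are introduced here for illustration and are not part of the original model.

```latex
% Original problem, as stated in the CVX block
\begin{aligned}
\min_{x} \quad & \lVert A x - b \rVert_2 + \gamma \lVert M x \rVert_1 \\
\text{s.t.} \quad & x \ge 0, \\
& (Mx)_{4k-3} = (Mx)_{4k-2}, \quad (Mx)_{4k-1} = (Mx)_{4k}, \quad k = 1, \dots, 500.
\end{aligned}

% Epigraph form: t bounds the 2-norm, u bounds |Mx| elementwise
\begin{aligned}
\min_{x,\, t,\, u} \quad & t + \gamma \, \mathbf{1}^{\top} u \\
\text{s.t.} \quad & \lVert A x - b \rVert_2 \le t, \\
& -u \le M x \le u, \\
& x \ge 0, \\
& (m_{4k-3} - m_{4k-2})^{\top} x = 0, \quad (m_{4k-1} - m_{4k})^{\top} x = 0, \quad k = 1, \dots, 500.
\end{aligned}
```

In this form there is a single second-order cone constraint, and everything else is linear in (x, t, u), which is the shape that block-format SOCP solvers expect.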
You might find the R package CVXfromR useful. This lets you pass an optimization problem to CVX from R and returns the solution to R.
OK, so the short answer to this question is: there's really no very satisfactory way to handle this in R. I have ended up doing the relevant parts in Matlab with some awkward fudging between the two systems, and will probably migrate everything to Matlab eventually. (My current approach predates the answer posted by user2439686. In practice my problem would be equally awkward using CVXfromR, but it does look like a useful package in general, so I'm going to accept that answer.)
R resources for this are pretty thin on the ground, but the blog post by Vincent Zoonekynd that he mentioned in the comments is definitely worth reading.
The SOCP solver contained within the R package DWD is ported from the Matlab solver SDPT3 (minus the SDP parts), so the programmatic interface is basically the same. However, at least in my tests, it runs a lot slower and pretty much falls over on problems with a few thousand vars+constraints, whereas SDPT3 solves them in a few seconds. (I haven't done a completely fair comparison on this, because CVX does some nifty transformations on the problem to make it more efficient, while in R I'm using a pretty naive definition, but still.)
Another possible alternative, especially if you're eligible for an academic license, is to use the commercial Mosek solver, which has an R interface package Rmosek. I have yet to try this, but may give it a go at some point.
(As an aside, the other solver bundled with CVX, SeDuMi, fails completely on the same problem; the CVX authors aren't kidding when they suggest trying multiple solvers. Also, in a significant subset of cases, SDPT3 has to switch from Cholesky to LU decomposition, which makes the processing orders of magnitude slower, with only very marginal improvement in the objective compared to the pre-LU steps. I've found it worth reducing the requested precision to avoid this, but YMMV.)
There is a new alternative: CVXR, which comes from the same people.
There is a website, a paper and a github project.
Disciplined Convex Programming seems to be growing in popularity, judging by cvxpy (Python) and Convex.jl (Julia), which are again backed by the same people.

Most efficient way to solve SEVERAL linear systems Ax=b with SMALL A (minimum 3x3 maximum 8x8)

I need to solve a SMALL linear system of the type Ax=b thousands of times. Here A is a matrix that is no smaller than 3x3 and no larger than 8x8. I am aware of this http://www.johndcook.com/blog/2010/01/19/dont-invert-that-matrix/ so I don't think it is smart to invert the matrix even if the matrices are small, right? So what is the most efficient way to do that? I am programming in Fortran, so I should probably use the LAPACK library, right? My matrices are full and in general non-symmetric.
Thanks
A.
Caveat: I didn't look into this extensively, but I have some experience I am happy to share.
In my experience, the fastest way to solve a 3x3 system is to basically use Cramer's rule. If you need to solve multiple systems with the same matrix A, it pays to pre-compute the inverse of A. This is only true for 2x2 and 3x3.
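For illustration, Cramer's rule for a 3x3 system can be sketched as follows. The example is in Python rather than Fortran, the function names are made up, and it makes no claim about speed; it only shows the arithmetic involved.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3_cramer(a, b):
    """Solve a x = b for a 3x3 matrix a using Cramer's rule."""
    d = det3(a)
    if d == 0:
        raise ValueError("matrix is singular")
    x = []
    for col in range(3):
        # Replace column `col` of a with the right-hand side b.
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)]
                    for r in range(3)]
        x.append(det3(replaced) / d)
    return x
```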
If you have to solve multiple 4x4 systems with the same matrix, then again using the inverse is noticeably faster than the forward and back-substitution of LU. I seem to remember that it uses fewer operations, and in practice the difference is even greater (again, in my experience). As the matrix size grows, the difference shrinks, and asymptotically the difference disappears. If you are solving systems with different matrices, then I don't think there is an advantage in computing the inverse.
In all cases, solving the system with the inverse can be much less accurate than using the LU-decomposition if A is fairly ill-conditioned. So if accuracy is an issue, then LU-factorization is definitely the way to go.
The LU factorization sounds like just the ticket for you, and the lapack routine dgetrf will compute this for you, after which you can use dgetrs to solve that linear system. Lapack has been optimized to the gills over the years, so in all likelihood you are better using that than writing any of this code yourself.
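The factor-once, solve-many pattern behind dgetrf and dgetrs can be sketched in plain Python. This is not a LAPACK binding: `lu_factor` and `lu_solve` are hypothetical names for a textbook LU factorization with partial pivoting, written only to show what the two steps do.

```python
def lu_factor(a):
    """LU factorization with partial pivoting (the role dgetrf plays).
    Returns the packed L and U factors and the row permutation."""
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    piv = list(range(n))  # piv[i] = original index of the row now at position i
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot row.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve using an existing factorization (the role dgetrs plays)."""
    n = len(lu)
    x = [float(b[piv[i]]) for i in range(n)]  # apply the row permutation to b
    for i in range(n):                         # forward substitution, unit lower L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):               # back substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x
```

When several right-hand sides share one matrix, `lu_factor` runs once and `lu_solve` is called per right-hand side. When every system has its own matrix, both steps run each time.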
The computational cost of computing the matrix inverse and then multiplying that by the right-hand side vector is the same if not more than computing the LU-factorization of the matrix and then forward- and back-solving to find your answer. Moreover, computing the inverse exhibits even more bizarre pathological behavior than computing the LU-factorization, the stability of which is still a fairly subtle issue. It can be useful to know the inverse for small matrices, but it sounds like you don't need that for your purpose, so why do it?
Moreover, provided there are no loop-carried dependencies, you can parallelize this using OpenMP without too much trouble.

R function to solve large dense linear systems of equations?

Sorry, maybe I am blind, but I couldn't find anything specific for a rather common problem:
I want to implement solve(A,b) with A being a large square matrix, in the sense that the command above uses all my memory and issues an error (b is a vector of corresponding length). The matrix I have is not sparse in the sense that there would be large blocks of zeros etc.
There must be some function out there which implements a stepwise iterative scheme such that a solution can be found even with limited memory available.
I found several posts on sparse matrices and, of course, the Matrix package, but could not identify a function which does what I need. I have also seen this post, but biglm produces a complete linear model fit. All I need is a simple solve. I will have to repeat that step several times, so it would be great to keep it as slim as possible.
I already worry about the "duplication of an old issue" and "look here" comments, but I would be really grateful for some help.
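To illustrate what a "stepwise iterative scheme" could look like, here is a sketch of the Gauss-Seidel iteration in Python (not R). It is only one possible scheme, and it is guaranteed to converge only for certain matrices, for example strictly diagonally dominant or symmetric positive definite ones, so it is not a general replacement for solve. The `get_row` callback is a hypothetical stand-in for reading one row of A at a time from disk.

```python
def gauss_seidel(get_row, b, iterations=100, tol=1e-10):
    """Iteratively solve A x = b while holding only one row of A at a time.
    get_row(i) must return row i of A as a sequence of numbers."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        max_change = 0.0
        for i in range(n):
            row = get_row(i)  # only this row of A needs to be in memory
            s = sum(row[j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / row[i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:  # stop once an entire sweep barely changes x
            break
    return x
```

A usage sketch with a small diagonally dominant matrix held in memory would be `gauss_seidel(lambda i: A[i], b)`, where `A` is a list of rows.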
