Where are "Special Numbers" mentioned in Concrete Maths used? - math

I was glancing through the contents of Concrete Maths online. I had at least heard most of the functions and tricks mentioned but there is a whole section on Special Numbers. These numbers include Stirling Numbers, Eulerian Numbers, Harmonic Numbers so on. Now I have never encountered any of these weird numbers. How do they aid in computational problems? Where are they generally used?

Harmonic Numbers appear almost everywhere! Musical Harmonies, analysis of Quicksort...
Stirling Numbers (first and second kind) arise in a variety of combinatorics and partitioning problems.
Eulerian Numbers also occur several places, most notably in permutations and coefficients of polylogarithm functions.

A lot of the numbers you mentioned are used in the analysis of algorithms. You may not have these numbers in your code, but you'll need them if you want to estimate how long it will take for your code to run. You might see them in your code too. Some of these numbers are related to combinatorics, counting how many ways something can happen.
Sometimes it's not enough to know how many possibilities there are because you need to enumerate over the possibilities. Volume 4 of Knuth's TAOCP, in progress, gives the algorithms you need.
Here's an example of using Fibonacci numbers as part of a numerical integration problem.
Harmonic numbers are a discrete analog of logarithms and so they come up in difference equations just like logs come up in differential equations. Here's an example of physical applications of harmonic means, related to harmonic numbers. See the book Gamma for many examples of harmonic numbers in action, especially the chapter "It's a harmonic world."

These special numbers can help out in computational problems in many ways. For example:
You want to find out when your program to compute the GCD of 2 numbers is going to take the longest amount of time: Try 2 consecutive Fibonacci Numbers.
You want to have a rough estimate of the factorial of a large number, but your factorial program is taking too long: Use Stirling's Approximation.
You're testing for prime numbers, but for some numbers you always get the wrong answer: It could be you're using Fermat's Prime test, in which case the Carmicheal numbers are your culprits.
The most common general case I can think of is in looping. Most of the time you specify a loop using a (start;stop;step) type of syntax, in which case it may be possible to reduce the execution time by using properties of the numbers involved.
For example, summing up all the numbers from 1 to n when n is large in a loop is definitely slower than using the identity sum = n*(n + 1)/2.
There are a large number of examples like these. Many of them are in cryptography, where the security of information systems sometimes depends on tricks like these. They can also help you with performance issues, memory issues, because when you know the formula, you may find a faster/more efficient way to compute other things -- things that you actually care about.
For more information, check out wikipedia, or simply try out Project Euler. You'll start finding patterns pretty fast.

Most of these numbers count certain kinds of discrete structures (for instance, Stirling Numbers count Subsets and Cycles). Such structures, and hence these sequences, implicitly arise in the analysis of algorithms.
There is an extensive list at OEIS that lists almost all sequences that appear in Concrete Math. A short summary from that list:
Golomb's Sequence
Binomial Coefficients
Rencontres Numbers
Stirling Numbers
Eulerian Numbers
Hyperfactorials
Genocchi Numbers
You can browse the OEIS pages for the respective sequences to get detailed information about the "properties" of these sequences (though not exactly applications, if that's what you're most interested in).
Also, if you want to see real-life uses of these sequences in analysis of algorithms, flip through the index of Knuth's Art of Computer Programming, and you'll find many references to "applications" of these sequences. John D. Cook already mentioned applications of Fibonacci & Harmonic numbers; here are some more examples:
Stirling Cycle Numbers arise in the analysis of the standard algorithm that finds the maximum element of an array (TAOCP Sec. 1.2.10): How many times must the current maximum value be updated when finding the maximum value? It turns out that the probability that the maximum will need to be updated k times when finding a maximum in an array of n elements is p[n][k] = StirlingCycle[n, k+1]/n!. From this, we can derive that on the average, approximately Log(n) updates will be necessary.
Genocchi Numbers arise in connection with counting the number of BDDs that are "thin" (TAOCP 7.1.4 Exercise 174).

Not necessarily a magic number from the reference you mentioned, but nonetheless --
0x5f3759df
-- the notorious magic number used to calculate inverse square root of a number by giving a good first estimate to Newton's Approximation of Roots, often attributed to the work of John Carmack - more info here.
Not programming related, huh? :)

Is this directly programming related? Surely related, but I don't know how closely.
Special numbers, such as e, pi, etc., come up all over the place. I don't think that anyone would argue about these two. The Golden_ratio also appears with amazing frequency, in everything from art to other special numbers themselves (look at the ratio between successive Fibonacci numbers.)
Various sequences and families of numbers also appear in many places in mathematics and therefore, in programming too. A beautiful place to look is the Encyclopedia of integer sequences.
I'll suggest this is an experience thing. For example, when I took linear algebra, many, many years ago, I learned about the eigenvalues and eigenvectors of a matrix. I'll admit that I did not at all appreciate the significance of eigenvalues/eigenvectors until I saw them in use in a variety of places. In statistics, in terms of what they tell you about uncertainty of an estimate from a covariance matrix, the size and shape of a confidence ellipse, in terms of principal component analysis, or the long term state of a Markov process. In numerical methods, where they tell you about convergence of a method, be it in optimization or an ODE solver. In mechanical engineering, where you see them as principal stresses and strains.

Discussion in Reddit

Related

Why do we use log probability in deep learning?

I got curious while reading the paper 'Sequence to Sequence Learning with Neural Networks'.
In fact, not only this paper but also many other papers use log probabilities, is there a reason for that?
Please check the attached photo.
Two reasons -
Theoretical - Probabilities of two independent events A and B co-occurring together is given by P(A).P(B). This easily gets mapped to a sum if we use log, i.e. log(P(A)) + log(P(B)). It is thus easier to address the neuron firing 'events' as a linear function.
Practical - The probability values are in [0, 1]. Hence multiplying two or more such small numbers could easily lead to an underflow in a floating point precision arithmetic (e.g. consider multiplying 0.0001*0.00001). A practical solution is to use the logs to get rid of the underflow.
For any given problem we need to optimise the likelihood of parameters. But optimising the product require all data at once and requires huge computation.
We know that a sum is a lot easier to optimise as the derivative of a sum is the sum of derivatives. So, taking log convert it to sum and makes computation faster.
Refer this

Fast integer power of nine

I need to calculate 9^n where n is a natural number. I used binary exponentiation, but addition chain is not optimal. Also there exists an optimal solution, but it's proven to be NP-complete and is very hard to calculate. I cannot use lookup table in my task. Also this algorithm still doesn't use the fact that i know the basis. Maybe there are some papers in number theory, or you can suggest a better solution?

Can any existing Machine Learning structures perfectly emulate recursive functions like the Fibonacci sequence?

To be clear I don't mean, provided the last two numbers in the sequence provide the next one:
(2, 3, -> 5)
But rather given any index provide the Fibonacci number:
(0 -> 1) or (7 -> 21) or (11 -> 144)
Adding two numbers is a very simple task for any machine learning structure, and by extension counting by ones, twos or any fixed number is a simple addition rule. Recursive calculations however...
To my understanding, most learning networks rely on forwards only evaluation, whereas most programming languages have loops, jumps, or circular flow patterns (all of which are usually ASM jumps of some kind), thus allowing recursion.
Sure some networks aren't forwards only; But can processing weights using the hyperbolic tangent or sigmoid function enter any computationally complete state?
i.e. conditional statements, conditional jumps, forced jumps, simple loops, complex loops with multiple conditions, providing sort order, actual reordering of elements, assignments, allocating extra registers, etc?
It would seem that even a non-forwards only network would only find a polynomial of best fit, reducing errors across the expanse of the training set and no further.
Am I missing something obvious, or did most of Machine Learning just look at recursion and pretend like those problems don't exist?
Update
Technically any programming language can be considered the DNA of a genetic algorithm, where the compiler (and possibly console out measurement) would be the fitness function.
The issue is that programming (so far) cannot be expressed in a hill climbing way - literally, the fitness is 0, until the fitness is 1. Things don't half work in programming, and if they do, there is no way of measuring how 'working' a program is for unknown situations. Even an off by one error could appear to be a totally different and chaotic system with no output. This is exactly the reason learning to code in the first place is so difficult, the learning curve is almost vertical.
Some might argue that you just need to provide stronger foundation rules for the system to exploit - but that just leads to attempting to generalize all programming problems, which circles right back to designing a programming language and loses all notion of some learning machine at all. Following this road brings you to a close variant of LISP with mutate-able code and virtually meaningless fitness functions that brute force the 'nice' and 'simple' looking code-space in attempt to follow human coding best practices.
Others might argue that we simply aren't using enough population or momentum to gain footing on the error surface, or make a meaningful step towards a solution. But as your population approaches the number of DNA permutations, you are really just brute forcing (and very inefficiently at that). Brute forcing code permutations is nothing new, and definitely not machine learning - it's actually quite common in regex golf, I think there's even an xkcd about it...
The real problem isn't finding a solution that works for some specific recursive function, but finding a solution space that can encompass the recursive domain in some useful way.
So other than Neural Networks trained using Backpropagation hypothetically finding the closed form of a recursive function (if a closed form even exists, and they don't in most real cases where recursion is useful), or a non-forwards only network acting like a pseudo-programming language with awful fitness prospects in the best case scenario, plus the virtually impossible task of tuning exit constraints to prevent infinite recursion... That's really it so far for machine learning and recursion?
According to Kolmogorov et al's On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition, a three layer neural network can model arbitrary function with the linear and logistic functions, including f(n) = ((1+sqrt(5))^n - (1-sqrt(5))^n) / (2^n * sqrt(5)), which is the close form solution of Fibonacci sequence.
If you would like to treat the problem as a recursive sequence without a closed-form solution, I would view it as a special sliding window approach (I called it special because your window size seems fixed as 2). There are more general studies on the proper window size for your interest. See these two posts:
Time Series Prediction via Neural Networks
Proper way of using recurrent neural network for time series analysis
Ok, where to start...
Firstly, you talk about 'machine learning' and 'perfectly emulate'. This is not generally the purpose of machine learning algorithms. They make informed guesses given some evidence and some general notions about structures that exist in the world. That typically means an approximate answer is better than an 'exact' one that is wrong. So, no, most existing machine learning approaches aren't the right tools to answer your question.
Second, you talk of 'recursive structures' as some sort of magic bullet. Yet they are merely convenient ways to represent functions, somewhat analogous to higher order differential equations. Because of the feedbacks they tend to introduce, the functions tend to be non-linear. Some machine learning approaches will have trouble with this, but many (neural networks for example) should be able to approximate you function quite well, given sufficient evidence.
As an aside, having or not having closed form solutions is somewhat irrelevant here. What matters is how well the function at hand fits with the assumptions embodied in the machine learning algorithm. That relationship may be complex (eg: try approximating fibbonacci with a support vector machine), but that's the essence.
Now, if you want a machine learning algorithm tailored to the search for exact representations of recursive structures, you could set up some assumptions and have your algorithm produce the most likely 'exact' recursive structure that fits your data. There are probably real world problems in which such a thing would be useful. Indeed the field of optimisation approaches similar problems.
The genetic algorithms mentioned in other answers could be an example of this, especially if you provided a 'genome' that matches the sort of recursive function you think you may be dealing with. Closed form primitives could form part of that space too, if you believe they are more likely to be 'exact' than more complex genetically generated algorithms.
Regarding your assertion that programming cannot be expressed in a hill climbing way, that doesn't prevent a learning algorithm from scoring possible solutions by how many much of your evidence it's able to reproduce and how complex they are. In many cases (most? though counting cases here isn't really possible) such an approach will find a correct answer. Sure, you can come up with pathological cases, but with those, there's little hope anyway.
Summing up, machine learning algorithms are not usually designed to tackle finding 'exact' solutions, so aren't the right tools as they stand. But, by embedding some prior assumptions that exact solutions are best, and perhaps the sort of exact solution you're after, you'll probably do pretty well with genetic algorithms, and likely also with algorithms like support vector machines.
I think you also sum things up nicely with this:
The real problem isn't finding a solution that works for some specific recursive function, but finding a solution space that can encompass the recursive domain in some useful way.
The other answers go a long way to telling you where the state of the art is. If you want more, a bright new research path lies ahead!
See this article:
Turing Machines are Recurrent Neural Networks
http://lipas.uwasa.fi/stes/step96/step96/hyotyniemi1/
The paper describes how a recurrent neural network can simulate a register machine, which is known to be a universal computational model equivalent to a Turing machine. The result is "academic" in the sense that the neurons have to be capable of computing with unbounded numbers. This works mathematically, but would have problems pragmatically.
Because the Fibonacci function is just one of many computable functions (in fact, it is primitive recursive), it could be computed by such a network.
Genetic algorithms should do be able to do the trick. The important this is (as always with GAs) the representation.
If you define the search space to be syntax trees representing arithmetic formulas and provide enough training data (as you would with any machine learning algorithm), it probably will converge to the closed-form solution for the Fibonacci numbers, which is:
Fib(n) = ( (1+srqt(5))^n - (1-sqrt(5))^n ) / ( 2^n * sqrt(5) )
[Source]
If you were asking for a machine learning algorithm to come up with the recursive formula to the Fibonacci numbers, then this should also be possible using the same method, but with individuals being syntax trees of a small program representing a function.
Of course, you also have to define good cross-over and mutation operators as well as a good evaluation function. And I have no idea how well it would converge, but it should at some point.
Edit: I'd also like to point out that in certain cases there is always a closed-form solution to a recursive function:
Like every sequence defined by a linear recurrence with constant coefficients, the Fibonacci numbers have a closed-form solution.
The Fibonacci sequence, where a specific index of the sequence must be returned, is often used as a benchmark problem in Genetic Programming research. In most cases recursive structures are generated, although my own research focused on imperative programs so used an iterative approach.
There's a brief review of other GP research that uses the Fibonacci problem in Section 3.4.2 of my PhD thesis, available here: http://kar.kent.ac.uk/34799/. The rest of the thesis also describes my own approach, which is covered a bit more succinctly in this paper: http://www.cs.kent.ac.uk/pubs/2012/3202/
Other notable research which used the Fibonacci problem is Simon Harding's work with Self-Modifying Cartesian GP (http://www.cartesiangp.co.uk/papers/eurogp2009-harding.pdf).

Genetic Algorithms Introduction

Starting off let me clarify that i have seen This Genetic Algorithm Resource question and it does not answer my question.
I am doing a project in Bioinformatics. I have to take data about the NMR spectrum of a cell(E. Coli) and find out what are the different molecules(metabolites) present in the cell.
To do this i am going to be using Genetic Algorithms in R language. I DO NOT have the time to go through huge books on Genetic algorithms. Heck! I dont even have time to go through little books.(That is what the linked question does not answer)
So i need to know of resources which will help me understand quickly what it is Genetic Algorithms do and how they do it. I have read the Wikipedia entry ,this webpage and also a couple of IEEE papers on the subject.
Any working code in R(even in C) or pointers to which R modules(if any) to be used would be helpful.
A brief (and opinionated) introduction to genetic algorithms is at http://www.burns-stat.com/pages/Tutor/genetic.html
A simple GA written in R is available at http://www.burns-stat.com/pages/Freecode/genopt.R The "documentation" is in 'S Poetry' http://www.burns-stat.com/pages/Spoetry/Spoetry.pdf and the code.
I assume from your question you have some function F(metabolites) which yields a spectrum but you do not have the inverse function F'(spectrum) to get back metabolites. The search space of metabolites is large so rather than brute force it you wish to try an approximate method (such as a genetic algorithm) which will make a more efficient random search.
In order to apply any such approximate method you will have to define a score function which compares the similarity between the target spectrum and the trial spectrum. The smoother this function is the better the search will work. If it can only yield true/false it will be a purely random search and you'd be better off with brute force.
Given the F and your score (aka fitness) function all you need to do is construct a population of possible metabolite combinations, run them all through F, score all the resulting spectrums, and then use crossover and mutation to produce a new population that combines the best candidates. Choosing how to do the crossover and mutation is generally domain specific because you can speed the process greatly by avoiding the creation of nonsense genomes. The best mutation rate is going to be very small but will also require tuning for your domain.
Without knowing about your domain I can't say what a single member of your population should look like, but it could simply be a list of metabolites (which allows for ordering and duplicates, if that's interesting) or a string of boolean values over all possible metabolites (which has the advantage of being order invariant and yielding obvious possibilities for crossover and mutation). The string has the disadvantage that it may be more costly to filter out nonsense genes (for example it may not make sense to have only 1 metabolite or over 1000). It's faster to avoid creating nonsense rather than merely assigning it low fitness.
There are other approximate methods if you have F and your scoring function. The simplest is probably Simulated Annealing. Another I haven't tried is the Bees Algorithm, which appears to be multi-start simulated annealing with effort weighted by fitness (sort of a cross between SA and GA).
I've found the article "The science of computing: genetic algorithms", by Peter J. Denning (American Scientist, vol 80, 1, pp 12-14). That article is simple and useful if you want to understand what genetic algorithms do, and is only 3 pages to read!!

Articles on analysis of mixed precision numerical algorithms?

Many numerical algorithms tend to run on 32/64bit floating points.
However, what if you had access to lower precision (and less power hungry) co-processors? How can then be utilized in numerical algorithms?
Does anyone know of good books/articles that address these issues?
Thanks!
Numerical analysis theory uses methods to predict the precision error of operations, independent of the machine they are running on. There are always cases where even on the most advanced processor operations may lose accuracy.
Some books to read about it:
Accuracy and Stability of Numerical Algorithms by N.J. Higham
An Introduction to Numerical Analysis by E. Süli and D. Mayers
If you cant find them or are too lazy to read them tell me and i will try to explain some things to you. (Well im no expert in this because im a Computer Scientist, but i think i can explain you the basics)
I hope you understand what i wrote (my english is not the best).
Most of what you are likely to find will be about doing floating-point arithmetic on computers irrespective of the size of the representation of the numbers themselves. The basic issues surround f-p arithmetic apply whatever the number of bits. Off the top of my head these basic issues will be:
range and accuracy of numbers that are represented;
careful selection of algorithms which are robust and reliable on f-p numbers rather than on real numbers;
the perils and pitfalls of iterative and lengthy calculations in which you run the risk of losing precision and accuracy.
In general, the fewer bits you have the sooner you run into problems, but just as there are algorithms which are useful in 32 bits, there are algorithms which are useful in 8 bits. Sometimes the same algorithm is useful however many bits you use.
As #George suggested, you should probably start with a basic text on numerical analysis, though I think the Higham book is not a basic text.
Regards
Mark

Resources