Why is the Fibonacci series used in agile planning poker? [closed] - math

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
Improve this question
When estimating the relative size of user stories in agile software development the members of the team are supposed to estimate the size of a user story as being 1, 2, 3, 5, 8, 13, ... . So the estimated values should resemble the Fibonacci series. But I wonder, why?
The description of http://en.wikipedia.org/wiki/Planning_poker on Wikipedia holds the mysterious sentence:
The reason for using the Fibonacci sequence is to reflect the inherent
uncertainty in estimating larger items.
But why should there be inherent uncertainty in larger items? Isn't the uncertainty higher, if we make fewer measurement, meaning if fewer people estimate the same story?
And even if the uncertainty is higher in larger stories, why does that imply the use of the Fibonacci sequence? Is there a mathematical or statistical reason for it?
Otherwise using the Fibonacci series for estimation feels like CargoCult science to me.

The Fibonacci series is just one example of an exponential estimation scale. The reason an exponential scale is used comes from Information Theory.
The information that we obtain out of estimation grows much slower than the precision of estimation. In fact it grows as a logarithmic function. This is the reason for the higher uncertainty for larger items.
Determining the most optimal base of the exponential scale (normalization) is difficult in practise. The base corresponding to the Fibonacci scale may or may not be optimal.
Here is a more detailed explanation of the mathematical justification: http://www.yakyma.com/2012/05/why-progressive-estimation-scale-is-so.html

Out of the first six numbers of the Fibonacci sequence, four are prime. This limits the possibilities to break down a task equally into smaller tasks to have multiple people work on it in parallel. Doing so could lead to the misconception that the speed of a task could scale proportionally with the number of people working on it. The 2^n series is most vulnerable to such a problem. The Fibonacci sequence in fact forces one to re-estimate the smaller tasks one by one.

According to this agile blog
"because they grow at about the same rate at which we humans can perceive meaningful changes in magnitude."
Yeah right. I think it's because they add an air of legitimacy (Fibonacci! math!) to what is in essence a very high-level, early-stage sizing (not scoping) exercise (which does have value).
But you can get the same results using t-shirt sizing...

You definitely want something exponential, so that you can express any quantity of time with a constant relative error. The precision of your estimation as well is very likely to be proportional to your estimation.
So you want something :
a) with integers
b) exponential
c) easy
Now why Fibonacci instead of, 1 2 4 8?
My guess is that it's because fibonacci grows slower. It's in goldratio^n, and goldratio=1.61...

The Fibonacci sequence is just one of several that are used in project planning poker.
It is difficult to accurately estimate large units of work and it is easy to get bogged down in hours vs days discussions if your numbers are too "realistic".
I like the explanation at http://www.agilelearninglabs.com/2009/06/story-sizing-a-better-start-than-planning-poker/, namely the Fibonacci series represents a set of numbers that we can intuitively distinguish between them as different magnitudes.

I use Fibonacci for a couple of reasons:
As task gets larger the details become more difficult to grasp
Task estimate is the number of hours for anyone in the team to complete the task
Not everyone in the team will have the same amount of experience for
a particular task so that adds to the uncertainty too
Human gets fatigue over larger and potentially more complex task.
While a task twice as complex is solved in double time for a computer
it may take quite a bit more for a developer.
As we adds up all the uncertainties we are less sure of what the hours actually should be. It ends up easier if we can just gauge if this task is larger/smaller than another one where we gave a estimate of already. As we up the size/complexity of the task the effect of uncertainty is also amplified. I would be happily taking an estimate of 13 hours for a task that seems twice as large as one I've previously estimated at 5 hours.

Related

Can any existing Machine Learning structures perfectly emulate recursive functions like the Fibonacci sequence?

To be clear I don't mean, provided the last two numbers in the sequence provide the next one:
(2, 3, -> 5)
But rather given any index provide the Fibonacci number:
(0 -> 1) or (7 -> 21) or (11 -> 144)
Adding two numbers is a very simple task for any machine learning structure, and by extension counting by ones, twos or any fixed number is a simple addition rule. Recursive calculations however...
To my understanding, most learning networks rely on forwards only evaluation, whereas most programming languages have loops, jumps, or circular flow patterns (all of which are usually ASM jumps of some kind), thus allowing recursion.
Sure some networks aren't forwards only; But can processing weights using the hyperbolic tangent or sigmoid function enter any computationally complete state?
i.e. conditional statements, conditional jumps, forced jumps, simple loops, complex loops with multiple conditions, providing sort order, actual reordering of elements, assignments, allocating extra registers, etc?
It would seem that even a non-forwards only network would only find a polynomial of best fit, reducing errors across the expanse of the training set and no further.
Am I missing something obvious, or did most of Machine Learning just look at recursion and pretend like those problems don't exist?
Update
Technically any programming language can be considered the DNA of a genetic algorithm, where the compiler (and possibly console out measurement) would be the fitness function.
The issue is that programming (so far) cannot be expressed in a hill climbing way - literally, the fitness is 0, until the fitness is 1. Things don't half work in programming, and if they do, there is no way of measuring how 'working' a program is for unknown situations. Even an off by one error could appear to be a totally different and chaotic system with no output. This is exactly the reason learning to code in the first place is so difficult, the learning curve is almost vertical.
Some might argue that you just need to provide stronger foundation rules for the system to exploit - but that just leads to attempting to generalize all programming problems, which circles right back to designing a programming language and loses all notion of some learning machine at all. Following this road brings you to a close variant of LISP with mutate-able code and virtually meaningless fitness functions that brute force the 'nice' and 'simple' looking code-space in attempt to follow human coding best practices.
Others might argue that we simply aren't using enough population or momentum to gain footing on the error surface, or make a meaningful step towards a solution. But as your population approaches the number of DNA permutations, you are really just brute forcing (and very inefficiently at that). Brute forcing code permutations is nothing new, and definitely not machine learning - it's actually quite common in regex golf, I think there's even an xkcd about it...
The real problem isn't finding a solution that works for some specific recursive function, but finding a solution space that can encompass the recursive domain in some useful way.
So other than Neural Networks trained using Backpropagation hypothetically finding the closed form of a recursive function (if a closed form even exists, and they don't in most real cases where recursion is useful), or a non-forwards only network acting like a pseudo-programming language with awful fitness prospects in the best case scenario, plus the virtually impossible task of tuning exit constraints to prevent infinite recursion... That's really it so far for machine learning and recursion?
According to Kolmogorov et al's On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition, a three layer neural network can model arbitrary function with the linear and logistic functions, including f(n) = ((1+sqrt(5))^n - (1-sqrt(5))^n) / (2^n * sqrt(5)), which is the close form solution of Fibonacci sequence.
If you would like to treat the problem as a recursive sequence without a closed-form solution, I would view it as a special sliding window approach (I called it special because your window size seems fixed as 2). There are more general studies on the proper window size for your interest. See these two posts:
Time Series Prediction via Neural Networks
Proper way of using recurrent neural network for time series analysis
Ok, where to start...
Firstly, you talk about 'machine learning' and 'perfectly emulate'. This is not generally the purpose of machine learning algorithms. They make informed guesses given some evidence and some general notions about structures that exist in the world. That typically means an approximate answer is better than an 'exact' one that is wrong. So, no, most existing machine learning approaches aren't the right tools to answer your question.
Second, you talk of 'recursive structures' as some sort of magic bullet. Yet they are merely convenient ways to represent functions, somewhat analogous to higher order differential equations. Because of the feedbacks they tend to introduce, the functions tend to be non-linear. Some machine learning approaches will have trouble with this, but many (neural networks for example) should be able to approximate you function quite well, given sufficient evidence.
As an aside, having or not having closed form solutions is somewhat irrelevant here. What matters is how well the function at hand fits with the assumptions embodied in the machine learning algorithm. That relationship may be complex (eg: try approximating fibbonacci with a support vector machine), but that's the essence.
Now, if you want a machine learning algorithm tailored to the search for exact representations of recursive structures, you could set up some assumptions and have your algorithm produce the most likely 'exact' recursive structure that fits your data. There are probably real world problems in which such a thing would be useful. Indeed the field of optimisation approaches similar problems.
The genetic algorithms mentioned in other answers could be an example of this, especially if you provided a 'genome' that matches the sort of recursive function you think you may be dealing with. Closed form primitives could form part of that space too, if you believe they are more likely to be 'exact' than more complex genetically generated algorithms.
Regarding your assertion that programming cannot be expressed in a hill climbing way, that doesn't prevent a learning algorithm from scoring possible solutions by how many much of your evidence it's able to reproduce and how complex they are. In many cases (most? though counting cases here isn't really possible) such an approach will find a correct answer. Sure, you can come up with pathological cases, but with those, there's little hope anyway.
Summing up, machine learning algorithms are not usually designed to tackle finding 'exact' solutions, so aren't the right tools as they stand. But, by embedding some prior assumptions that exact solutions are best, and perhaps the sort of exact solution you're after, you'll probably do pretty well with genetic algorithms, and likely also with algorithms like support vector machines.
I think you also sum things up nicely with this:
The real problem isn't finding a solution that works for some specific recursive function, but finding a solution space that can encompass the recursive domain in some useful way.
The other answers go a long way to telling you where the state of the art is. If you want more, a bright new research path lies ahead!
See this article:
Turing Machines are Recurrent Neural Networks
http://lipas.uwasa.fi/stes/step96/step96/hyotyniemi1/
The paper describes how a recurrent neural network can simulate a register machine, which is known to be a universal computational model equivalent to a Turing machine. The result is "academic" in the sense that the neurons have to be capable of computing with unbounded numbers. This works mathematically, but would have problems pragmatically.
Because the Fibonacci function is just one of many computable functions (in fact, it is primitive recursive), it could be computed by such a network.
Genetic algorithms should do be able to do the trick. The important this is (as always with GAs) the representation.
If you define the search space to be syntax trees representing arithmetic formulas and provide enough training data (as you would with any machine learning algorithm), it probably will converge to the closed-form solution for the Fibonacci numbers, which is:
Fib(n) = ( (1+srqt(5))^n - (1-sqrt(5))^n ) / ( 2^n * sqrt(5) )
[Source]
If you were asking for a machine learning algorithm to come up with the recursive formula to the Fibonacci numbers, then this should also be possible using the same method, but with individuals being syntax trees of a small program representing a function.
Of course, you also have to define good cross-over and mutation operators as well as a good evaluation function. And I have no idea how well it would converge, but it should at some point.
Edit: I'd also like to point out that in certain cases there is always a closed-form solution to a recursive function:
Like every sequence defined by a linear recurrence with constant coefficients, the Fibonacci numbers have a closed-form solution.
The Fibonacci sequence, where a specific index of the sequence must be returned, is often used as a benchmark problem in Genetic Programming research. In most cases recursive structures are generated, although my own research focused on imperative programs so used an iterative approach.
There's a brief review of other GP research that uses the Fibonacci problem in Section 3.4.2 of my PhD thesis, available here: http://kar.kent.ac.uk/34799/. The rest of the thesis also describes my own approach, which is covered a bit more succinctly in this paper: http://www.cs.kent.ac.uk/pubs/2012/3202/
Other notable research which used the Fibonacci problem is Simon Harding's work with Self-Modifying Cartesian GP (http://www.cartesiangp.co.uk/papers/eurogp2009-harding.pdf).

Examples of mathematics algorithms that apply to game development

I am designing a RPG game like final fantasy.
I have the programming part done but what I lack is the maths. I am ok at maths but I am having trouble incorporating the players stas into mu sums.
How can I make an action timer that is based on the players speed?
How can I use attack and defence so that it is not always exactly the same damage?
How can I add randomness into the equations?
Can anyone point me to some resources that I can read to learn this sort of stuff.
EDIT: Clarification Of what I am looking for
for the damage I have (player attack x move strength) / enemy defence.
This works and scales well but i got a look at the algorithms from final fantasy 4 a while a got and this sum alone was over 15 steps. mine has only 2.
I am looking for real game examples if possible but would settle for papers or books that have sections that explain how they get these complex sums and why they don't use simple ones.
I eventually intent to implement but am looking for more academic knowledge at the moment.
Not knowing Final fantasy at all, here are some thoughts.
Attack/Defence could either be a 'chance to hit/block' or 'damage done/mitigated' (or, possibly, a blend of both). If you decide to go for 'damage done/mitigated', you'll probably want to do one of:
Generate a random number in a suitable range, added/subtracted from the base attack/defence value.
Generate a number in the range 0-1, multiplied by the attack/defence
Generate a number (with a Gaussian or Poisson distribution and a suitable standard deviation) in the range 0-2 (or so, to account for the occasional crit), multiplied by the attack/defence
For attack timers, decide what "double speed" and "triple speed" should do for the number of attacks in a given time. That should give you a decent lead for how to implement it. I can, off-hand, think of three methods.
Use N/speed as a base for the timer (that means double/triple speed gives 2/3 times the number of attacks in a given interval).
Use Basetime - Speed as the timer (requires a cap on speed, may not be an issue, most probably has an unintuitive relation between speed stat and timer, not much difference at low levels, a lot of difference at high levels).
Use Basetime - Sqrt(Speed) as the timer.
I doubt you'll find academic work on this. Determining formulae for damage, say, is heuristic. People just make stuff up based on their experience with various functions and then tweak the result based on gameplay.
It's important to have a good feel for what the function looks like when plotted on a graph. The best advice I can give for this is to study a course on sketching graphs of functions. A Google search on "sketching functions" will get you started.
Take a look at printed role playing games like Dungeons & Dragons and how they handle these issues. They are the inspiration for computer RPGs. I don't know of academic work
Some thoughts: you don't have to have an actual "formula". It can be rules like "roll a 20 sided die, weapon does 2 points of damage if the roll is <12 and 3 points of damage if the roll is >=12".
You might want to simplify continuous variables down to small ranges of integers for testing. That way you can calculate out tables with all the possible permutations and see if the results look reasonable. Once you have something good, you can interpolate the formulas for continuous inputs.
Another key issue is play balance. There aren't necessarily formulas for telling you whether your game mechanics are balanced, you have to test.

What do Planning Poker numbers represent? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 6 years ago.
Improve this question
The numbers used to vote when planning are 0, 0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100.
Is there a meaning when those numbers are chosen? Why don't we just choose 1,2,3,4.. for the sake of simpliness?
The point is that as the estimates get bigger, they become less likely to be accurate anyway. There's no point in debating the merits of 34 vs 35 - at that point you're likely to be miles out anyway. This way just makes it easier: does this feel more like a 20-point task or a 40-point task? Not having the numbers between 21 and 39 forces you to make look at it in this "bigger" way. It should also be a hint that you should break the task down further before you come close to doing it.
All the details are explained here: http://en.wikipedia.org/wiki/Planning_poker
The sequence you give has been introduced by Mike Cohn in his book "Agile Estimating & Planning" (therefore the sequence is copyrighted, you need to obtain the permission to use it or you can also buy decks from his online shop).
The original planning poker sequence is a bit different and described he by his original inventor (James Grenning) : http://renaissancesoftware.net/papers/14-papers/44-planing-poker.html
This sequence allows you to compare backlog items to eachother. So it is imposible to say that some item is exactly two times bigger than other. Using this sequence you will always decide if it is more than two times bigger or less than two times.
For example:
First Item is estimated as 3SP
Now you are estimationg Second Item and someone said that it is two times "bigger" than First Item. Development tasks can't be exactly that same or exactle few times bigger or smaller. So you need to decide if it is bigger less than two times or more (it could be 5SP or 8SP).
If you have many estimated items in your backlog you can use this numbers for some stats. This stats works because Law of large numbers. http://en.wikipedia.org/wiki/Law_of_large_numbers
Using this sequence you are putting some uncertainty into that numbers so probability that this stats will work for you become higher.
Other simple answer for your question is: Mike Cohn chose this nubers after many experiments because they seams to work best in long period of time for various teams
All what I've wrote before is theory which has been created after experiments.
I've never seen that sequence used, the Fibonacci series (1 2 3 5 8 13 21 34) is more common. The idea is to avoid tricking yourself into thinking there is precision when there isn't.
Numbers on planning poker represent complexity of a task. You should not consider that a story with 8 as value is the double size in effort or time of a size 4 story for example. You could use as many different representations for these numbers as you want (like t-shirt sizes). You just need to have an idea that one value is more complex than another and there is another value that is even more bigger than it. The Planning Poker application attempt to illustrate this complexity with drawings related with number in order to help on this idea.

Explain the proof by Vinay Deolalikar that P != NP [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 years ago.
Improve this question
Recently there has been a paper floating around by Vinay Deolalikar at HP Labs which claims to have proved that P != NP.
Could someone explain how this proof works for us less mathematically inclined people?
I've only scanned through the paper, but here's a rough summary of how it all hangs together.
From page 86 of the paper.
... polynomial time
algorithms succeed by successively
“breaking up” the problem into
smaller subproblems that are joined to
each other through conditional
independence. Consequently, polynomial
time algorithms cannot solve
problems in regimes where blocks whose
order is the same as the underlying
problem instance require simultaneous
resolution.
Other parts of the paper show that certain NP problems can not be broken up in this manner. Thus NP/= P
Much of the paper is spent defining conditional independence and proving these two points.
Dick Lipton has a nice blog entry about the paper and his first impressions of it. Unfortunately, it also is technical. From what I can understand, Deolalikar's main innovation seems to be to use some concepts from statistical physics and finite model theory and tie them to the problem.
I'm with Rex M with this one, some results, mostly mathematical ones cannot be expressed to people who lack the technical mastery.
I liked this ( http://www.newscientist.com/article/dn19287-p--np-its-bad-news-for-the-power-of-computing.html ):
His argument revolves around a particular task, the Boolean satisfiability problem, which asks whether a collection of logical statements can all be simultaneously true or whether they contradict each other. This is known to be an NP problem.
Deolalikar claims to have shown that
there is no program which can complete
it quickly from scratch, and that it
is therefore not a P problem. His
argument involves the ingenious use of
statistical physics, as he uses a
mathematical structure that follows
many of the same rules as a random
physical system.
The effects of the above can be quite significant:
If the result stands, it would prove
that the two classes P and NP are not
identical, and impose severe limits on
what computers can accomplish –
implying that many tasks may be
fundamentally, irreducibly complex.
For some problems – including
factorisation – the result does not
clearly say whether they can be solved
quickly. But a huge sub-class of
problems called "NP-complete" would be
doomed. A famous example is the
travelling salesman problem – finding
the shortest route between a set of
cities. Such problems can be checked
quickly, but if P ≠ NP then there is
no computer program that can complete
them quickly from scratch.
This is my understanding of the proof technique: he uses first order logic to characterize all polynomial time algorithms, and then shows that for large SAT problems with certain properties that no polynomial time algorithm can determine their satisfiability.
One other way of thinking about it, which may be entirely wrong, but is my first impression as I'm reading it on the first pass, is that we think of assigning/clearing terms in circuit satisfaction as forming and breaking clusters of 'ordered structure', and that he's then using statistical physics to show that there isn't enough speed in the polynomial operations to perform those operations in a particular "phase space" of operations, because these "clusters" end up being too far apart.
Such proof would have to cover all classes of algorithms, like continuous global optimization.
For example, in the 3-SAT problem we have to evaluate variables to fulfill all alternatives of triples of these variables or their negations. Look that x OR y can be changed into optimizing
((x-1)^2+y^2)((x-1)^2+(y-1)^2)(x^2+(y-1)^2)
and analogously seven terms for alternative of three variables.
Finding the global minimum of a sum of such polynomials for all terms would solve our problem. (source)
It's going out of standard combinatorial techniques to the continuous world using_gradient methods, local minims removing methods, evolutionary algorithms. It's completely different kingdom - numerical analysis - I don't believe such proof could really cover (?)
It's worth noting that with proofs, "the devil is in the detail". The high level overview is obviously something like:
Some some sort of relationship
between items, show that this
relationship implies X and that
implies Y and thus my argument is
shown.
I mean, it may be via Induction or any other form of proving things, but what I'm saying is the high level overview is useless. There is no point explaining it. Although the question itself relates to computer science, it is best left to mathematicians (thought it is certainly incredibly interesting).

Where are "Special Numbers" mentioned in Concrete Maths used?

I was glancing through the contents of Concrete Maths online. I had at least heard most of the functions and tricks mentioned but there is a whole section on Special Numbers. These numbers include Stirling Numbers, Eulerian Numbers, Harmonic Numbers so on. Now I have never encountered any of these weird numbers. How do they aid in computational problems? Where are they generally used?
Harmonic Numbers appear almost everywhere! Musical Harmonies, analysis of Quicksort...
Stirling Numbers (first and second kind) arise in a variety of combinatorics and partitioning problems.
Eulerian Numbers also occur several places, most notably in permutations and coefficients of polylogarithm functions.
A lot of the numbers you mentioned are used in the analysis of algorithms. You may not have these numbers in your code, but you'll need them if you want to estimate how long it will take for your code to run. You might see them in your code too. Some of these numbers are related to combinatorics, counting how many ways something can happen.
Sometimes it's not enough to know how many possibilities there are because you need to enumerate over the possibilities. Volume 4 of Knuth's TAOCP, in progress, gives the algorithms you need.
Here's an example of using Fibonacci numbers as part of a numerical integration problem.
Harmonic numbers are a discrete analog of logarithms and so they come up in difference equations just like logs come up in differential equations. Here's an example of physical applications of harmonic means, related to harmonic numbers. See the book Gamma for many examples of harmonic numbers in action, especially the chapter "It's a harmonic world."
These special numbers can help out in computational problems in many ways. For example:
You want to find out when your program to compute the GCD of 2 numbers is going to take the longest amount of time: Try 2 consecutive Fibonacci Numbers.
You want to have a rough estimate of the factorial of a large number, but your factorial program is taking too long: Use Stirling's Approximation.
You're testing for prime numbers, but for some numbers you always get the wrong answer: It could be you're using Fermat's Prime test, in which case the Carmicheal numbers are your culprits.
The most common general case I can think of is in looping. Most of the time you specify a loop using a (start;stop;step) type of syntax, in which case it may be possible to reduce the execution time by using properties of the numbers involved.
For example, summing up all the numbers from 1 to n when n is large in a loop is definitely slower than using the identity sum = n*(n + 1)/2.
There are a large number of examples like these. Many of them are in cryptography, where the security of information systems sometimes depends on tricks like these. They can also help you with performance issues, memory issues, because when you know the formula, you may find a faster/more efficient way to compute other things -- things that you actually care about.
For more information, check out wikipedia, or simply try out Project Euler. You'll start finding patterns pretty fast.
Most of these numbers count certain kinds of discrete structures (for instance, Stirling Numbers count Subsets and Cycles). Such structures, and hence these sequences, implicitly arise in the analysis of algorithms.
There is an extensive list at OEIS that lists almost all sequences that appear in Concrete Math. A short summary from that list:
Golomb's Sequence
Binomial Coefficients
Rencontres Numbers
Stirling Numbers
Eulerian Numbers
Hyperfactorials
Genocchi Numbers
You can browse the OEIS pages for the respective sequences to get detailed information about the "properties" of these sequences (though not exactly applications, if that's what you're most interested in).
Also, if you want to see real-life uses of these sequences in analysis of algorithms, flip through the index of Knuth's Art of Computer Programming, and you'll find many references to "applications" of these sequences. John D. Cook already mentioned applications of Fibonacci & Harmonic numbers; here are some more examples:
Stirling Cycle Numbers arise in the analysis of the standard algorithm that finds the maximum element of an array (TAOCP Sec. 1.2.10): How many times must the current maximum value be updated when finding the maximum value? It turns out that the probability that the maximum will need to be updated k times when finding a maximum in an array of n elements is p[n][k] = StirlingCycle[n, k+1]/n!. From this, we can derive that on the average, approximately Log(n) updates will be necessary.
Genocchi Numbers arise in connection with counting the number of BDDs that are "thin" (TAOCP 7.1.4 Exercise 174).
Not necessarily a magic number from the reference you mentioned, but nonetheless --
0x5f3759df
-- the notorious magic number used to calculate inverse square root of a number by giving a good first estimate to Newton's Approximation of Roots, often attributed to the work of John Carmack - more info here.
Not programming related, huh? :)
Is this directly programming related? Surely related, but I don't know how closely.
Special numbers, such as e, pi, etc., come up all over the place. I don't think that anyone would argue about these two. The Golden_ratio also appears with amazing frequency, in everything from art to other special numbers themselves (look at the ratio between successive Fibonacci numbers.)
Various sequences and families of numbers also appear in many places in mathematics and therefore, in programming too. A beautiful place to look is the Encyclopedia of integer sequences.
I'll suggest this is an experience thing. For example, when I took linear algebra, many, many years ago, I learned about the eigenvalues and eigenvectors of a matrix. I'll admit that I did not at all appreciate the significance of eigenvalues/eigenvectors until I saw them in use in a variety of places. In statistics, in terms of what they tell you about uncertainty of an estimate from a covariance matrix, the size and shape of a confidence ellipse, in terms of principal component analysis, or the long term state of a Markov process. In numerical methods, where they tell you about convergence of a method, be it in optimization or an ODE solver. In mechanical engineering, where you see them as principal stresses and strains.
Discussion in Reddit

Resources