I'm learning Python and came across a question that went something like "How long would it take to count to 1,000,000 out loud?" The only parameter it gave was "you count, on average, 1 digit per second." I did that problem, which wasn't very difficult. Then I started thinking about counting aloud, enunciating each numeral. That parameter seems off to me, and indeed the answer Google gives to the bare question "how long to count to a million" suggests it's off. Given that each number in the sequence takes progressively longer to say (an exponential increase??), there must be a better way.
Any ideas or general guidance would be appreciated. Would sampling various people's counting rates at various intervals work? Would programming the number of syllables per number work? I'm really curious, and have looked all over SO and Google for solutions that don't revolve around that seemingly inaccurate "average time".
Thanks, and sorry if this isn't on topic or in the appropriate place. I'm a long time lurker, but new to posting, so let me know if you need more info or anything. Thanks!
Let us suppose for the sake of simplicity that you don't say 1502 as "fifteen hundred and two", but as "one thousand five hundred and two". Then we can break it down hierarchically.
And let's ignore for now whether you say "and" or not (though apparently saying it is more common than not). I will use this reference (and British English, because I like it more and it's more consistent) for how to pronounce numbers: http://forum.wordreference.com/showthread.php?t=15&langid=6
In fact, to formally describe this, let t be a function of a set of numbers, that tells you how much time it takes to pronounce every number in that set. Then your question is how to compute t([1..1000000]), and we will use M=t([1..999])
Triplet time as a function of the previous one
To read a large number we start at the left and read the three-digit groups. The group at the left, of course, may have only one or two digits.
Thus for every number x of thousands you will say x thousand y where y will describe all the numbers from 1 to 999.
Thus the time you spend on the block "x thousand y" (with y running over 0..999) is 1000 t({1000x}) + M: the prefix "x thousand" is said 1000 times, and the trailing parts y together take M.
Note that this formula is generalizable to numbers below 1000, by simply defining t({0}) = 0.
Now the time to say "x thousand" is, per our hypothesis, equal to the time to say "x" plus the time to say "thousand" (when x > 0). Thus your answer is:

t([1..1000000]) = 2000 M + 999000 tau(thousand) + tau(one) + tau(million)

where tau(word) is the time it takes to say that word. (Summing the per-block formula over x = 0..999 gives 1000 M from the trailing groups, plus another 1000 M and 999000 tau(thousand) from the prefixes; the last two terms are "one million" itself.) This supposes you say 1000 as "one thousand". You may want to remove 1000 tau(one) if you would only say "thousand".
However, I stick with the reference:
The numbers 100-199 begin with one hundred... or a hundred...
You can in exactly the same way express the time it takes to count to a billion from M and the formula above, and so on for all the greater powers of 10^3 (million, billion, and beyond).
Taking into account the "and"
There is a small correction to be done. Let us now take M to be the time it takes to pronounce the numbers from 1 to 999 when they are preceded by at least one non-zero group, including the initial "and"s.
Our reference (well, the wordreference post I linked) says the following :
What do we say to join the groups?
Normally, we don’t use any joining word.
The exception is the last group.
If the last group after the thousands is 1-99 it is joined with and.
Thus our correction applies only to the numbers between 0 and 999, where there is no non-zero group preceding:

t([1..999]) = M - 99 tau(and)

since only the 99 numbers in [1..99] gain an initial "and" when spoken as a trailing group.
Getting M
Or rather, let's get t([1..999]) since it's more natural and we know how it is related to M.
Let C = t([1..99]), X = t([1..9]).
Between 1 and 999 we have all the numbers from [1..99] and the 9 exact hundreds, where you don't say "and" - that is 108 numbers in total. The other 900 numbers, from 100 to 999, are prefixed with a hundreds part, and 891 of them continue with "and y" for some y in [1..99].
Thus:

t([1..999]) = 10 C + 100 X + 900 tau(hundred) + 891 tau(and)

(the standalone [1..99] plus its nine copies as trailing "and y" parts give 10 C; each digit x in [1..9] is said 100 times as a hundreds prefix, giving 100 X).
C is probably hard to break down, so I'm not going to try.
Final result
The corrected formula is:

t([1..1000000]) = 2000 M + 999000 tau(thousand) - 99099 tau(and) + tau(one) + tau(million)

(the 99 tau(and) correction is applied 1001 times: once for the standalone [1..999], and 1000 times for each of the thousands prefixes, which never carry a leading "and").
And as a function of C and X:

t([1..1000000]) = 20000 C + 200000 X + 1800000 tau(hundred) + 999000 tau(thousand) + 1880901 tau(and) + tau(one) + tau(million)
Note that your measures of tau(word), C, and X need to be very precise if you plan on doing this multiplication and having any kind of correct order of magnitude.
Conclusion: Brits end up saying "and" a whole lot - nearly 1.9 million times on the way to a million. The nice thing about the last formulation is that you can simply drop the tau(and) term if you decide you actually don't want to pronounce the "and"s.
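To turn the breakdown above into an actual estimate, you only need a handful of measured constants. Below is a sketch in Python; every timing constant (the tau values, C, and X) is a placeholder guess, not a measurement - sample your own speech and substitute real values.

```python
# Placeholder timings in seconds -- guesses, not measurements; time yourself
# (e.g. saying 1 through 99 aloud) and substitute real values.
tau = {"and": 0.15, "hundred": 0.5, "thousand": 0.55, "one": 0.25, "million": 0.6}
C = 45.0   # t([1..99]): total time to say 1 through 99 aloud
X = 3.0    # t([1..9]):  total time to say 1 through 9 aloud

# t([1..999]) with no leading "and": the standalone 1..99, the 9 exact
# hundreds, and 891 numbers of the form "x hundred and y"
t_1_999 = 10 * C + 100 * X + 900 * tau["hundred"] + 891 * tau["and"]

# M: the same range spoken as a trailing group, so 1..99 gain a leading "and"
M = t_1_999 + 99 * tau["and"]

# 1..999999: each thousands prefix x in 1..999 is said 1000 times, plus a
# trailing group worth M per prefix; the x = 0 block contributes t_1_999
total = 1000 * (t_1_999 + 999 * tau["thousand"]) + 999 * M + t_1_999

# finally, "one million" itself
total += tau["one"] + tau["million"]
print(f"roughly {total / 3600:.1f} hours of nonstop counting")
```

With these made-up constants the structural recursion and the closed form agree exactly, which is a useful sanity check before plugging in real measurements.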
Let's assume we have a sequence a_i of length n and we want to sort it using shell sort. To do so, we would choose subsequences out of the a_i's of length k_i.
I'm now wondering how to choose those k_i's. You usually see that if n=16 we would choose k_1=8, k_2=4, k_3=2, k_4=1. So we would pair-wise compare the numbers for each k_i and at the end use insertion sort to finish our sorting.
The idea of first sorting sub sequences of length k_i is to "pre-sort" the sequence for the insertionSort. Right?
Questions:
Now, depending on how we choose our k_i, we get a better performance. Is there a rule I can use here to choose the k_i's?
Could I also choose e.g. n=15, k_1=5, k_2=3, k_3=2?
If we have n=10 and k_1=5, would we now go with {k_2=2, k_3=1} or {k_2=3, k_2=2, k_3=1} or {k_2=3, k_3=1}?
The fascinating thing about shellsort is that for a sequence of n (unique) entries a unique set of gaps will be required to sort it efficiently, essentially f(n) => {gap/gaps}
For example, to most efficiently - on average - sort a sequence containing
2-5 entries - use insertion sort
6 entries - use shellsort with gaps {4,1}
7 or 8 entries - use a {5,1} shellsort
9 entries - use a {6,1} shellsort
10 entries - use a {9,6,1} shellsort
11 entries - use a {10,6,1} shellsort
12 entries - use a {5,1} shellsort
As you can see, 6-9 require two gaps, 10 and 11 three, and 12 two. This is typical of shellsort's gaps: from one n to the next (i.e. n+1) you can be fairly sure that the number and makeup of the gaps will differ.
A nasty side-effect of shellsort is that when using a set of random combinations of n entries (to save processing/evaluation time) to test gaps, you may end up with either the best gaps for n entries or the best gaps for your particular set of combinations - most likely the latter.
I speculate that it is probably possible to create algorithms where you can plug in an arbitrary n and get the best gap sequence computed for you. Many high-profile computer scientists have explored the relationship between n and gaps without a lot to show for it. In the end they produce gaps (more or less by trial and error) that they claim perform better than those of others who have explored shellsort.
Concerning your foreword given n=16 ...
a {8,4,2,1} shellsort may or may not be an efficient way to sort 16 entries.
or should it be three gaps and, if so, what might they be?
or even two?
Then, to (try to) answer your questions ...
Q1: a rule can probably be formulated
Q2: you could ... but you should test it (for a given n there are n! possible sequences to test)
Q3: you can compare it with the correct answer (above). Or you can test it against all 10! possible sequences when n=10 (that comes out to 3,628,800 of them) - doable.
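For small n the exhaustive test mentioned in Q2/Q3 really is doable. Here's a sketch (my own code, not from any published gap study) of a shellsort parameterized by its gap sequence, plus an average comparison count taken over all n! permutations:

```python
import itertools

def shellsort_cost(seq, gaps):
    """Sort a copy of seq with the given gap sequence (must end in 1);
    return (sorted list, number of key comparisons made)."""
    a, cost = list(seq), 0
    for gap in gaps:
        for i in range(gap, len(a)):       # gapped insertion sort
            v, j = a[i], i
            while j >= gap:
                cost += 1                  # one key comparison
                if a[j - gap] <= v:
                    break
                a[j] = a[j - gap]          # shift the larger entry right
                j -= gap
            a[j] = v
    return a, cost

def average_cost(n, gaps):
    """Average comparison count over all n! permutations of 0..n-1."""
    total = count = 0
    for p in itertools.permutations(range(n)):
        s, c = shellsort_cost(p, gaps)
        assert s == sorted(p)              # sanity: the gaps must still sort
        total += c
        count += 1
    return total / count

# Exhaustively compare two candidate gap sequences for n = 6 (720 permutations)
print(average_cost(6, (4, 1)), average_cost(6, (3, 1)))
```

Counting comparisons is only one possible cost metric; you could count moves instead, and for n much beyond 10 the n! enumeration becomes impractical and random sampling (with the caveat noted above) takes over.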
Had a tough time thinking of an appropriate title, but I'm just trying to code something that can auto compute the following simple math problem:
The average value of a,b,c is 25. The average value of b,c is 23. What is the value of 'a'?
For us humans, it is easy to compute that the value of 'a' is 29, without needing to know b and c. But I'm not sure if this is possible in programming - that is, a function that takes in the average values of 'a,b,c' and 'b,c' and outputs 'a' automatically.
Yes, it is possible to do this. The reason for this is that you can model the sort of problem being described here as a system of linear equations. For example, when you say that the average of a, b, and c is 25, then you're saying that
a / 3 + b / 3 + c / 3 = 25.
Adding in the constraint that the average of b and c is 23 gives the equation
b / 2 + c / 2 = 23.
More generally, any constraint of the form "the average of the variables x1, x2, ..., xn is M" can be written as
x1 / n + x2 / n + ... + xn / n = M.
Once you have all of these constraints written out, solving for the value of a particular variable - or determining that many solutions exist - reduces to solving a system of linear equations. There are a number of techniques for this, with Gaussian elimination followed by back-substitution being a particularly common one (though often you'd just hand this to MATLAB or a linear algebra package and have it do the work for you).
There's no guarantee in general that a computer can determine whether a given collection of equations has a solution, or deduce the value of a variable, but this happens to be one of the nice cases where the shape of the constraints makes the problem amenable to exact solutions.
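To make that concrete, here's a minimal sketch of the elimination for the two constraints in the question, using exact rational arithmetic; a real system with more variables would hand the matrix to a proper linear algebra package instead of hand-rolling the steps:

```python
from fractions import Fraction as F

# Each constraint "(x1 + ... + xn)/n = M" is one row of coefficients over
# (a, b, c) plus a right-hand side; Fraction keeps the arithmetic exact.
rows = [
    [F(1, 3), F(1, 3), F(1, 3), F(25)],   # average of a, b, c is 25
    [F(0),    F(1, 2), F(1, 2), F(23)],   # average of b, c is 23
]

# Normalize each row so its leading nonzero coefficient is 1:
# row 0 becomes a + b + c = 75, row 1 becomes b + c = 46.
scaled = []
for row in rows:
    lead = next(x for x in row if x != 0)
    scaled.append([x / lead for x in row])

# Eliminate: subtracting row 1 from row 0 leaves a = 75 - 46.
a = scaled[0][3] - scaled[1][3]
print(a)  # 29
```

The elimination step here only works because the second row's variables are a subset of the first row's; the general case needs full Gaussian elimination with back-substitution.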
Alright, I have figured some things out. To answer the question in the title directly: it is possible to represent average values in programming. One possible way is to create a list of map data structures which store the variable collection as the key (e.g. "a,b,c"), with the average value of the set as the value (e.g. 25).
Extract the key and split its string by comma, store the parts in a list, then multiply the average value by the size of the list to get the total (e.g. 25x3 and 23x2). With this, no semantic information is lost.
As for the context to which I asked this question, the more proper description to the problem is "Given a set of average values of different combinations of variables, is it possible to find the value of each variable?" The answer to this is open. I can't figure it out, but below is an attempt in describing the logic flow if one were to code it out:
Match the lists (from paragraph 2) against one another in all possible combinations to check whether a list contains all the elements of another list. If so, subtract the lists (e.g. abc - bc) as well as the values (e.g. 75 - 46). If upon subtracting we are left with only one variable in the collection, then we have found the value of that variable.
If there is still more than one variable left, such as abcd - bc = ad, then store the result as a map data structure and repeat the process, until the subtraction count in a full iteration is 0 for all possible combinations (e.g. ac can't subtract bc). This is unfortunately not where it ends.
Further solutions may be found by combining the lists (e.g. ac + bd = abcd) to get more possible ways to subtract and arrive at the answer. When this is the case, you just don't know when to stop trying, and the list of combinations grows exponentially. Maybe someone with stronger mathematical theory can prove that beyond a certain number of iterations further additions are useless and the search should stop. Heck, it may even be possible that negative values are also helpful, contradicting what I said earlier about 'ac' not being able to subtract 'bd' (to get a, c, -b, -d). That would give even more combinations to compute.
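The core subtraction loop described above can be sketched in a few lines; the two seed constraints and the variable names are just the example from the question, and this sketch deliberately omits the "combine lists" extension:

```python
# Keys are frozensets of variable names; values are the implied totals
# (average * count), e.g. 25 * 3 for "a,b,c" and 23 * 2 for "b,c".
known = {frozenset("abc"): 25 * 3, frozenset("bc"): 23 * 2}

changed = True
while changed:
    changed = False
    for big in list(known):
        for small in list(known):
            if small < big:                    # strict subset, e.g. bc < abc
                diff = big - small             # abc - bc = {a}
                if diff not in known:
                    known[diff] = known[big] - known[small]   # 75 - 46 = 29
                    changed = True

# Any singleton set left in `known` is a solved variable.
singles = {next(iter(s)): t for s, t in known.items() if len(s) == 1}
print(singles)  # {'a': 29}
```

The loop terminates because there are only finitely many subsets of the variables and each iteration either adds a new one or stops.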
People with stronger computing science foundations may try what templatetypedef has suggested.
I just read this interesting question about a random number generator that never generates the same value three consecutive times. This clearly makes the random number generator different from a standard uniform random number generator, but I'm not sure how to quantitatively describe how this generator differs from a generator that didn't have this property.
Suppose that you handed me two random number generators, R and S, where R is a true random number generator and S is a true random number generator that has been modified to never produce the same value three consecutive times. If you didn't tell me which one was R or S, the only way I can think of to detect this would be to run the generators until one of them produced the same value three consecutive times.
My question is - is there a better algorithm for telling the two generators apart? Does the restriction of not producing the same number three times somehow affect the observable behavior of the generator in a way other than preventing three of the same value from coming up in a row?
As a consequence of Rice's Theorem, there is no way to tell which is which.
Proof: Let L be the output language of the normal RNG. Let L' be L with every sequence containing a run of three or more identical values removed. Some TMs recognize L', but some do not. Therefore, by Rice's theorem, determining whether a TM accepts L' is not decidable.
As others have noted, you may be able to make an assertion like "it has run for N steps without repeating a value three times", but you can never make the leap to "it will never repeat a value three times." More precisely, there exists at least one machine for which you can't determine whether or not it meets this criterion.
Caveat: if you had a truly random generator (e.g. nuclear decay), it is possible that Rice's theorem would not apply. My intuition is that the theorem still holds for these machines, but I've never heard it discussed.
EDIT: a secondary proof. Suppose P(X) determines with high probability whether or not X accepts L'. We can construct an (infinite number of) programs F like:
F(x): if x(F), then don't accept L'
else, accept L'
P cannot determine the behavior of F(P). Moreover, say P correctly predicts the behavior of G. We can construct:
F'(x): if x(F'), then don't accept L'
else, run G(x)
So for every good case, there must exist at least one bad case.
If S is defined by rejecting from R, then a sequence produced by S will be a subsequence of the sequence produced by R. For example, taking a simple random variable X with equal probability of being 1 or 0, you would have:
R = 0 1 1 0 0 0 1 0 1
S = 0 1 1 0 0 1 0 1
The only real way to differentiate these two is to look for streaks. If you are generating binary numbers, then streaks are incredibly common (so much so that one can almost always differentiate between a random 100 digit sequence and one that a student writes down trying to be random). If the numbers are taken from [0,1] uniformly, then streaks are far less common.
It's an easy exercise in probability to calculate the chance of three consecutive numbers being equal once you know the distribution, or even better, the expected number of numbers needed until the probability of three consecutive equal numbers is greater than p for your favourite choice of p.
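The rejection construction is easy to simulate. Here's a sketch (assuming S is defined by rejecting from R, as this answer describes) that builds a run-free stream from a plain binary generator and confirms the defining property:

```python
import random

def rejection_stream(rng, n):
    """Draw n values from rng(), rejecting any draw that would create a run
    of three identical values."""
    out = []
    while len(out) < n:
        v = rng()
        if len(out) >= 2 and out[-1] == out[-2] == v:
            continue                      # skip the would-be third repeat
        out.append(v)
    return out

random.seed(42)
bits = rejection_stream(lambda: random.randint(0, 1), 10000)

# By construction, S never shows three in a row:
has_run = any(bits[i] == bits[i + 1] == bits[i + 2]
              for i in range(len(bits) - 2))
print(has_run)  # False
```

From such a simulation you can also estimate how many draws a distinguisher needs on average before a plain R would reveal itself with a length-3 run.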
Since you defined that they only differ with respect to that specific property there is no better algorithm to distinguish those two.
If you look at triples of random values, the generator S will of course produce every other triple slightly more often than R, in order to compensate for the missing triples (X,X,X). But to get a significant result that way you'd need much more data than it would cost you to simply wait for the first run of three consecutive equal values.
You could probably use ENT (http://fourmilab.ch/random/).
Hello good people of Stack Overflow, this is a conceptual question and could arguably belong on math.stackexchange.com; however, since it relates to the processing speed of a CPU, I'm putting it here.
Anyways, my question is pretty simple. I have to calculate the sum of the cubes of 3 numbers in a range of numbers. That sounds confusing to me, so let me give an example.
I have a range of numbers, (0, 100), and a list of each number's cube. I have to compute every combination of 3 numbers in this set: for example, 0^3 + 0^3 + 0^3, 1^3 + 0^3 + 0^3, ... 98^3 + 99^3 + 100^3. I hope that makes sense.
So anyways, after all the sums are computed and checked against a list of numbers to see if any of them match, the program moves on to the next set, (100, 200). This set needs to compute everything from 100-200 + 0-200 + 0-200. Then (200, 300) will need to do 200-300 + 0-300 + 0-300, and so on.
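For what it's worth, here is one reading of that loop structure as a sketch; the target list is illustrative (taxicab-style numbers, not from the question), and the b <= a restriction skips reorderings of the same triple, which you may or may not want:

```python
# Illustrative targets, not taken from the question
targets = {1729, 4104, 13832}

def search(lo, hi, targets):
    """Try every triple (a, b, c) with lo <= a <= hi and c <= b <= a,
    and report those whose sum of cubes is in the target set."""
    hits = []
    for a in range(lo, hi + 1):
        for b in range(0, a + 1):        # b <= a avoids duplicate orderings
            for c in range(0, b + 1):
                s = a**3 + b**3 + c**3
                if s in targets:
                    hits.append((a, b, c, s))
    return hits

print(search(0, 100, targets))
```

Keeping targets in a set makes each membership check O(1), so the triple loop dominates the cost.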
So, my question is: as the numbers given to the CPU to add grow larger, will the time taken increase? And will the time each set takes grow at a predictable rate, or will it grow (perhaps exponentially) without a constant pattern?
The time to add two numbers is logarithmic in the magnitude of the numbers, i.e. linear in the size (digit length) of the numbers.
For a 32-bit computer, numbers up to 2^32 will take 1 unit of time to add, numbers up to 2^64 will take 2 units, etc.
As I understand the question you have roughly 100*100*100 combinations for the first set (let's ignore that addition is commutative). For the next set you have 100*200*200, and for the third you have 100*300*300. So it looks like you have an O(n^2) process going on there. So if you want to calculate twice as many sets, it will take you four times as long. If you want to calculate thrice as many, it's going to take nine times as long. This is not exponential (such as 2^n), but usually referred to as quadratic.
It depends on how long "and so on" lasts. As long as your maximum number, cubed, fits in your longest integer type, no. It always takes just one instruction to add, so it's constant time.
Now, if you assume an arbitrary-precision machine - say, writing these numbers on the tape of a Turing machine in decimal symbols - then adding will take longer. How much longer? Think about how the length of the string of decimal symbols representing a number n grows. Addition will take time at least proportional to that length.
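Both points are easy to see from a Python prompt (Python integers are arbitrary precision, so the length of the representation is directly inspectable):

```python
# As long as the cubes fit in a machine word, each addition is one
# instruction: 300 cubed is far below 2**31.
print(300**3, 2**31)              # 27000000 2147483648

# At arbitrary precision the cost tracks the number of symbols, and the
# decimal length of n grows only logarithmically with n:
for n in (10**9, 10**90, 10**900):
    print(len(str(n)))            # 10, then 91, then 901
```

So a thousandfold increase in magnitude adds only a handful of digits, which is why the additions themselves are the cheap part of the search.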
As a programmer, I frequently need to be able to calculate the number of permutations of a set, usually for estimation purposes.
There are a lot of different ways to specify the allowable combinations, depending on the problem at hand. For example, given the set of letters A, B, C, D:
Assuming a 4 digit result, how many ways can those letters
be arranged?
What if you can have 1,2,3 or 4 digits, then how many ways?
What if you are only allowed to use each letter at most once?
twice?
What if you must avoid the same letter appearing twice in
a row, but if they are not in a row, then twice is ok?
Etc. I'm sure there are many more.
Does anyone know of a web reference or book that talks about
this subject in terms that a non-mathematician can understand?
Thanks!
Assuming a 4 digit result, how many
ways can those letters be arranged?
When picking the 1st digit, you have 4 choices: one of A, B, C, and D; it is the same when picking the 2nd, 3rd, and 4th, since repetition is allowed.
So you have in total 4*4*4*4 = 256 choices.
What if you can have 1,2,3 or 4
digits, then how many ways?
It is easy to deduce from question 1: 4 + 4^2 + 4^3 + 4^4 = 340 ways.
What if you are only allowed to use
each letter at most once?
When picking the 1st digit, you have 4 choices: one of A, B, C, and D; when picking the 2nd, you have 3 choices (anything except the one you picked 1st); then 2 choices for the 3rd, and 1 choice for the 4th.
So you have in total 4 * 3 * 2 * 1 = 24 choices.
The knowledge involved here covers combinations, permutations, and probability. Here is a good tutorial on understanding the difference.
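For what it's worth, these counts (and the "no letter twice in a row" variant from the question) are easy to sanity-check by brute force with itertools:

```python
from itertools import permutations, product

letters = "ABCD"

# Q1: length-4 strings, repetition allowed: 4**4
q1 = sum(1 for _ in product(letters, repeat=4))

# Q2: lengths 1 through 4, repetition allowed: 4 + 16 + 64 + 256
q2 = sum(len(letters) ** k for k in range(1, 5))

# Q3: each letter used at most once, all four positions: 4!
q3 = sum(1 for _ in permutations(letters))

# The "no letter twice in a row" variant, by brute force: 4 * 3 * 3 * 3
q4 = sum(1 for s in product(letters, repeat=4)
         if all(x != y for x, y in zip(s, s[1:])))

print(q1, q2, q3, q4)  # 256 340 24 108
```

Brute-force enumeration like this is also a handy safety net when you're unsure which formula applies to a new variant.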
First of all the topics you are speaking of are
Permutations (where the order matters)
Combinations (order doesn't matter)
I would recommend Math Tutor DVD for teaching yourself math topics. The "probability and statistics" disk set will give you the formulas and skill you need to solve the problems. It's great because it's the closest thing you can get to going back to school, because a teacher solves problems on a white board for you.
I've found a clip on the Combinations chapter of the video for you to check out.
If you need to do more than just count the numbers of combinations and permutations - if you actually need to generate the sequences - then Donald Knuth's books Generating All Combinations and Partitions and Generating All Tuples and Permutations are the place to look. He goes into great detail regarding algorithms subject to various restrictions, looking at the advantages and disadvantages of different solutions for each problem.
It all depends on how simple you need the explanation to be.
The topic you are looking for is called "Permutations and Combinations".
Here's a fairly simple introduction. There are dozens like it in the first few pages of Google results.