DVB-S2X satellite data communications has a Network Clock Reference. MPEG streams have a Program Clock Reference. In both cases they're a number of ticks on a 27MHz clock that wraps around at 300*2^33 because the number is defined as a 33-bit "base" plus a 9-bit "extension" with the latter running from 0..299 and the former incrementing by one tick each time the latter wraps around.
My question is this: given 2 NCR counter values A and B, is there a standards-defined method for determining whether A is "behind" or "ahead of" B?
For example, if A is 20 and B is 30 then intuitively you can see that A is behind B by 10 ticks.
Conversely, if A is 10 and B is 300*2^33-10 then intuitively A is ahead of B by 20 ticks with the counter having wrapped around as it incremented from 300*2^33-1 to 0.
Previously when I've worked with things like data packet sequence numbers, we've taken by convention that if max(A,B)-min(A,B) < WRAPVALUE/2 then the larger number is ahead of the smaller, otherwise it’s behind and the difference is given by min(A,B)+WRAPVALUE-max(A,B) (draw some number lines if you don't follow this).
Is this half ahead/behind heuristic simply a convention I've picked up over the years, or is there something more concrete in an ETSI spec somewhere mandating that this is how NCR/PCR values are to be compared and the difference between them calculated? Or any other wraparound number types, for that matter.
Related
I am making a roguelike where the setting is open world on a procedurally generated planet. I want the distribution of each biome to be organic. There are 5 different biomes. Is there a way to organically distribute them without a huge complicated algorithm? I want the amount of space each biome takes up to be nearly equal.
I have worked with cellular automata before when I was making the terrain generators for each biome. There were 2 different states for each tile there. Is there an efficient way to do 5?
I'm using python 2.5, although specific code isn't necessary. Programming theory on it is fine.
If the question is too open ended, are there any resources out there that I could look at for this kind of problem?
You can define a cellular automaton on any cell state space. Just formulate the cell update function as F:Q^n->Q where Q is your state space (here Q={0,1,2,3,4,5}) and n is the size of your neighborhood.
As a start, just write F as a majority rule, that is, 0 being the neutral state, F(c) should return the value in 1-5 with the highest count in the neighborhood, and 0 if none is present. In case of equality, you may pick one of the max at random.
As an initial state, start with a configuration with 5 relatively equidistant cells with the states 1-5 (you may build them deterministically through a fixed position that can be shifted/mirrored, or generate these points randomly).
When all cells have a value different than 0, you have your map.
Feel free to improve on the update function, for example by applying the rule with a given probability.
Hey guys I have the following question:
Suppose we are working with strands of DNA, each strand consisting of
a sequence of 10 nucleotides. Each nucleotide can be any one of four
different types: A, G, T or C. How many bits does it take to encode a
DNA strand?
Here is my approach to it and I want to know if that is correct.
We have 10 spots. Each spot can have 4 different symbols. This means we require 4^10 combinations using our binary digits.
4^10 = 1048576.
We will then find the log base 2 of that. What do you guys think of my approach?
Each nucleotide (aka base-pair) takes two bits (one of four states -> 2 bits of information). 10 base-pairs thus take 20 bits. Reasoning that way is easier than doing the log2(4^10), but gives the same answer.
It would be fewer bits of information if there were any combinations that couldn't appear. e.g. some codons (sequence of three base-pairs) that never appear. But ten independent 2-bit pieces of information sum to 20 bits.
If some sequences appear more frequently than others, and a variable-length representation is viable, then Huffman coding or other compression schemes could save bits most of the time. This might be good in a file-format, but unlikely to be good in-memory when you're working with them.
Densely packing your data into an array of 2bit fields makes it slower to access a single base-pair, but comparing the whole chunk for equality with another chunk is still efficient. (memcmp).
20 bits is unfortunately just slightly too large for a 16bit integer (which computers are good at). Storing in an array of 32bit zero-extended values wastes a lot of space. On hardware with good unaligned support, storing 24bit zero-extended values is ok (do a 32bit load and mask the high 8 bits. Storing is even less convenient though: probably a 16b store and an 8b store, or else load the old value and merge the high 8, then do a 32b store. But that's not atomic.).
This is a similar problem for storing codons (groups of three base-pairs that code for an amino acid): 6 bits of information doesn't fill a byte. Only wasting 2 of every 8 bits isn't that bad, though.
Amino-acid sequences (where you don't care about mutations between different codons that still code for the same AA) have about 20 symbols per position, which means a symbol doesn't quite fit into a 4bit nibble.
I used to work for the phylogenetics research group at Dalhousie, so I've sometimes thought about having a look at DNA-sequence software to see if I could improve on how they internally store sequence data. I never got around to it, though. The real CPU intensive work happens in finding a maximum-likelihood evolutionary tree after you've already calculated a matrix of the evolutionary distance between every pair of input sequences. So actual sequence comparison isn't the bottleneck.
do the maths:
4^10 = 2^2^10 = 2^20
Answer: 20 bits
Specifically around log log counting approach.
I'll try and clarify the use of probabilistic counters although note that I'm no expert on this matter.
The aim is to count to very very large numbers using only a little space to store the counter (e.g. using a 32 bits integer).
Morris came up with the idea to maintain a "log count", so instead of counting n, the counter holds log₂(n). In other words, given a value c of the counter, the real count represented by the counter is 2ᶜ.
As logs are not generally of integer value, the problem becomes when the c counter should be incremented, as we can only do so in steps of 1.
The idea here is to use a "probabilistic counter", so for each call to a method Increment on our counter, we update the actual counter value with a probability p. This is useful as it can be shown that the expected value represented by the counter value c with probabilistic updates is in fact n. In other words, on average the value represented by our counter after n calls to Increment is in fact n (but at any one point in time our counter is probably has an error)! We are trading accuracy for the ability to count up to very large numbers with little storage space (e.g. a single register).
One scheme to achieve this, as described by Morris, is to have a counter value c represent the actual count 2ᶜ (i.e. the counter holds the log₂ of the actual count). We update this counter with probability 1/2ᶜ where c is the current value of the counter.
Note that choosing this "base" of 2 means that our actual counts are always multiples of 2 (hence the term "order of magnitude estimate"). It is also possible to choose other b > 1 (typically such that b < 2) so that the error is smaller at the cost of being able to count smaller maximum numbers.
The log log comes into play because in base-2 a number x needs log₂ bits to be represented.
There are in fact many other schemes to approximate counting, and if you are in need of such a scheme you should probably research which one makes sense for your application.
References:
See Philippe Flajolet for a proof on the average value represented by the counter, or a much simpler treatment in the solutions to a problem 5-1 in the book "Introduction to Algorithms". The paper by Morris is usually behind paywalls, I could not find a free version to post here.
its not exactly for the log counting approach but i think it can help you,
using Morris' algorithm, the counter represents an "order of magnitude estimate" of the actual count.The approximation is mathematically unbiased.
To increment the counter, a pseudo-random event is used, such that the incrementing is a probabilistic event. To save space, only the exponent is kept. For example, in base 2, the counter can estimate the count to be 1, 2, 4, 8, 16, 32, and all of the powers of two. The memory requirement is simply to hold the exponent.
As an example, to increment from 4 to 8, a pseudo-random number would be generated such that a probability of .25 generates a positive change in the counter. Otherwise, the counter remains at 4. from wiki
I just read this interesting question about a random number generator that never generates the same value three consecutive times. This clearly makes the random number generator different from a standard uniform random number generator, but I'm not sure how to quantitatively describe how this generator differs from a generator that didn't have this property.
Suppose that you handed me two random number generators, R and S, where R is a true random number generator and S is a true random number generator that has been modified to never produce the same value three consecutive times. If you didn't tell me which one was R or S, the only way I can think of to detect this would be to run the generators until one of them produced the same value three consecutive times.
My question is - is there a better algorithm for telling the two generators apart? Does the restriction of not producing the same number three times somehow affect the observable behavior of the generator in a way other than preventing three of the same value from coming up in a row?
As a consequence of Rice's Theorem, there is no way to tell which is which.
Proof: Let L be the output of the normal RNG. Let L' be L, but with all sequences of length >= 3 removed. Some TMs recognize L', but some do not. Therefore, by Rice's theorem, determining if a TM accepts L' is not decidable.
As others have noted, you may be able to make an assertion like "It has run for N steps without repeating three times", but you can never make the leap to "it will never repeat a digit three times." More appropriately, there exists at least one machine for which you can't determine whether or not it meets this criterion.
Caveat: if you had a truly random generator (e.g. nuclear decay), it is possible that Rice's theorem would not apply. My intuition is that the theorem still holds for these machines, but I've never heard it discussed.
EDIT: a secondary proof. Suppose P(X) determines with high probability whether or not X accepts L'. We can construct an (infinite number of) programs F like:
F(x): if x(F), then don't accept L'
else, accept L'
P cannot determine the behavior of F(P). Moreover, say P correctly predicts the behavior of G. We can construct:
F'(x): if x(F'), then don't accept L'
else, run G(x)
So for every good case, there must exist at least one bad case.
If S is defined by rejecting from R, then a sequence produced by S will be a subsequence of the sequence produced by R. For example, taking a simple random variable X with equal probability of being 1 or 0, you would have:
R = 0 1 1 0 0 0 1 0 1
S = 0 1 1 0 0 1 0 1
The only real way to differentiate these two is to look for streaks. If you are generating binary numbers, then streaks are incredibly common (so much so that one can almost always differentiate between a random 100 digit sequence and one that a student writes down trying to be random). If the numbers are taken from [0,1] uniformly, then streaks are far less common.
It's an easy exercise in probability to calculate the chance of three consecutive numbers being equal once you know the distribution, or even better, the expected number of numbers needed until the probability of three consecutive equal numbers is greater than p for your favourite choice of p.
Since you defined that they only differ with respect to that specific property there is no better algorithm to distinguish those two.
If you do triples of randum values of course the generator S will produce all other triples slightly more often than R in order to compensate the missing triples (X,X,X). But to get a significant result you'd need much more data than it will cost you to find any value three consecutive times the first time.
Probably use ENT ( http://fourmilab.ch/random/ )
Hello good people of stackoverflow, this is a conceptual question and could possibly belong in math.stackexchange.com, however since this relates to the processing speed of a CPU, I put it in here.
Anyways, my question is pretty simple. I have to calculate the sum of the cubes of 3 numbers in a range of numbers. That sounds confusing to me, so let me give an example.
I have a range of numbers, (0, 100), and a list of each numbers cube. I have to calculate each and every combination of 3 numbers in this set. For example, 0 + 0 + 0, 1 + 0 + 0, ... 98^3 + 99^3 + 100^3. That may make sense, I'm not sure if I explained it well enough.
So anyways, after all the sets are computed and checked against a list of numbers to see if the sum matches with any of those, the program moves on to the next set, (100, 200). This set needs to compute everything from 100-200 + 0-200 + 0-200. Than (200, 300) will need to do 200 - 300 + 0 - 300 + 0 - 300 and so on.
So, my question is, depending on the numbers given to a CPU to add, will the time taken increase due to size? And, will the time it takes for each set exponentially increase at a predictable rate or will it be exponential, however not constant.
The time to add two numbers is logarithmic with the magnitude of the numbers, or linear with the size (length) of the numbers.
For a 32-bit computer, numbers up to 2^32 will take 1 unit of time to add, numbers up to 2^64 will take 2 units, etc.
As I understand the question you have roughly 100*100*100 combinations for the first set (let's ignore that addition is commutative). For the next set you have 100*200*200, and for the third you have 100*300*300. So it looks like you have an O(n^2) process going on there. So if you want to calculate twice as many sets, it will take you four times as long. If you want to calculate thrice as many, it's going to take nine times as long. This is not exponential (such as 2^n), but usually referred to as quadratic.
It depends on how long "and so on" lasts. As long as you maximum number, cubed, fits in your longest integer type, no. It always takes just one instruction to add, so it's constant time.
Now, if you assume an arbitrary precision machine, like say writing these numbers on the tape of a turing machine in decimal symbols, then adding will take a longer time. In that case, consider how long it would take? In other words, think about how the length of a string of decimal symbols grows to represent a number n. It will take time at least proportional to that length.