Hardware Cache Formulas (Parameter) - math

The image below was scanned (poorly) from Computer Systems: A Programmer's Perspective. (I apologize to the publisher). This appears on page 489.
Figure 6.26: Summary of cache parameters http://theopensourceu.com/wp-content/uploads/2009/07/Figure-6.26.jpg
I'm having a terribly difficult time understanding some of these calculations. At the current moment, what is troubling me is the calculation for M, which is supposed to be the number of unique addresses. "Maximum number of unique memory addresses." What is 2^m supposed to mean? I think m is calculated as log2(M). This seems circular....
For the sake of this post, assume the following in the event you want to draw up an example: 512 sets, 8 blocks per set, 32 words per block, 8 bits per word
Update: All of the answers posted thus far have been helpful but I still think I'm missing something. cwrea's answer provides the biggest bridge for my understanding. I feel like the answer is on the tip of my mental tongue. I know it is there but I can't identify it.
Why does M = 2^m but then m = log2(M)?
Perhaps the detail I'm missing is that for a 32-bit machine, we'd assume M = 2^32. Does this single fact allow me to solve for m? m = log2(2^32)? But then this gets me back to 32... I have to be missing something...

m & M are related to each other, not defined in terms of each other. They call M a derived quantity however since usually the processor/controller is the limiting factor in terms of the word length it uses.
On a real system they are predefined. If you have an 8-bit processor, it generally can handle 8-bit memory addresses (m = 8). Since you can represent 256 values with 8 bits, you can have a total of 256 memory addresses (M = 2^8 = 256). As you can see, we start with the little m due to the processor constraints, but you could always decide you want a memory space of size M, and use that to select a processor that can handle it based on word-size = log2(M).
Now if we take your assumptions for your example,
512 sets, 8 blocks per set, 32 words
per block, 8 bits per word
I have to assume this is an 8-bit processor given the 8-bit words. At that point your described cache is larger than your address space (256 words) & therefore pretty meaningless.
You might want to check out Computer Architecture Animations & Java applets. I don't recall if any of the cache ones go into the cache structure (usually they focus on behavior) but it is a resource I saved in the past to tutor students in architecture.
Feel free to further refine your question if it still doesn't make sense.

The two equations for M are just a relationship. They are two ways of saying the same thing. They do not indicate causality, though. I think the assumption made by the author is that the number of unique address bits is defined by the CPU designer at the start via requirements. Then the M can vary per implementation.

m is the width in bits of a memory address in your system, e.g. 32 for x86, 64 for x86-64. Block size on x86, for example, is 4K, so b=12. Block size more or less refers to the smallest chunk of data you can read from durable storage -- you read it into memory, work on that copy, then write it back at some later time. I believe tag bits are the upper t bits that are used to look up data cached locally very close to the CPU (not even in RAM). I'm not sure about the set lines part, although I can make plausible guesses that wouldn't be especially reliable.

Circular ... yes, but I think it's just stating that the two variables m and M must obey the equation. M would likely be a given or assumed quantity.
Example 1: If you wanted to use the formulas for a main memory size of M = 4GB (4,294,967,296 bytes), then m would be 32, since M = 2^32, i.e. m = log2(M). That is, it would take 32 bits to address the entire main memory.
Example 2: If your main memory size assumed were smaller, e.g. M = 16MB (16,777,216 bytes), then m would be 24, which is log2(16,777,216).
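The two examples above can be checked with a few lines of Python. This is only a sketch; `address_bits` is a made-up helper name, and it assumes M is an exact power of two.

```python
def address_bits(num_addresses: int) -> int:
    """Return m such that 2**m == num_addresses (M must be a power of two)."""
    if num_addresses <= 0 or num_addresses & (num_addresses - 1):
        raise ValueError("M must be a positive power of two")
    # For an exact power of two, bit_length() - 1 is log2(M) without float rounding.
    return num_addresses.bit_length() - 1


for M in (256, 16 * 1024**2, 4 * 1024**3):
    m = address_bits(M)
    print(f"M = {M:,} addresses -> m = {m} bits, and 2**m == M is {2**m == M}")
```

Going from m to M is just `2**m`, so either quantity can be treated as the given one.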

It seems you're confused by the math rather than the architectural stuff.
2^m ("2 to the m'th power") is 2 * 2... with m 2's. 2^1 = 2, 2^2 = 2 * 2 = 4, 2^3 = 2 * 2 * 2 = 8, and so on. Notably, if you have an m bit binary number, you can only represent 2^m different numbers. (is this obvious? If not, it might help to replace the 2's with 10's and think about decimal digits)
log2(x) ("logarithm base 2 of x") is the inverse function of 2^x. That is, log2(2^x) = x for all x. (This is a definition!)
You need log2(M) bits to represent M different numbers.
Note that if you start with M=2^m and take log2 of both sides, you get log2(M)=m. The table is just being very explicit.
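To tie this back to the example in the question, here is a rough sketch that plugs 512 sets, 8 lines per set and 32-byte blocks into the usual relations for this cache model (s = log2(S), b = log2(B), t = m - (s + b), C = B * E * S). Those relations are written from memory rather than copied from the scanned figure, and m = 32 is an assumption taken from the asker's "32-bit machine" remark, with one word treated as one byte.

```python
def exact_log2(value: int) -> int:
    """log2 for exact powers of two, avoiding floating point."""
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{value} is not a power of two")
    return value.bit_length() - 1


def cache_parameters(S: int, E: int, B: int, m: int) -> dict:
    """Derive the dependent quantities from the four fundamental ones."""
    s = exact_log2(S)          # set index bits
    b = exact_log2(B)          # block offset bits
    return {
        "M": 2**m,             # number of unique addresses
        "s": s,
        "b": b,
        "t": m - (s + b),      # tag bits are whatever is left of the address
        "C": B * E * S,        # capacity in bytes, excluding tag/valid overhead
    }


params = cache_parameters(S=512, E=8, B=32, m=32)
for name, value in params.items():
    print(f"{name} = {value}")
```

Under those assumptions the address splits into 18 tag bits, 9 set index bits and 5 block offset bits.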

Related

Encoding DNA strand in Binary

Hey guys I have the following question:
Suppose we are working with strands of DNA, each strand consisting of
a sequence of 10 nucleotides. Each nucleotide can be any one of four
different types: A, G, T or C. How many bits does it take to encode a
DNA strand?
Here is my approach to it and I want to know if that is correct.
We have 10 spots. Each spot can have 4 different symbols. This means we require 4^10 combinations using our binary digits.
4^10 = 1048576.
We will then find the log base 2 of that. What do you guys think of my approach?
Each nucleotide (aka base-pair) takes two bits (one of four states -> 2 bits of information). 10 base-pairs thus take 20 bits. Reasoning that way is easier than doing the log2(4^10), but gives the same answer.
It would be fewer bits of information if there were any combinations that couldn't appear. e.g. some codons (sequence of three base-pairs) that never appear. But ten independent 2-bit pieces of information sum to 20 bits.
If some sequences appear more frequently than others, and a variable-length representation is viable, then Huffman coding or other compression schemes could save bits most of the time. This might be good in a file-format, but unlikely to be good in-memory when you're working with them.
Densely packing your data into an array of 2bit fields makes it slower to access a single base-pair, but comparing the whole chunk for equality with another chunk is still efficient. (memcmp).
20 bits is unfortunately just slightly too large for a 16bit integer (which computers are good at). Storing in an array of 32bit zero-extended values wastes a lot of space. On hardware with good unaligned support, storing 24bit zero-extended values is ok (do a 32bit load and mask the high 8 bits. Storing is even less convenient though: probably a 16b store and an 8b store, or else load the old value and merge the high 8, then do a 32b store. But that's not atomic.).
This is a similar problem for storing codons (groups of three base-pairs that code for an amino acid): 6 bits of information doesn't fill a byte. Only wasting 2 of every 8 bits isn't that bad, though.
Amino-acid sequences (where you don't care about mutations between different codons that still code for the same AA) have about 20 symbols per position, which means a symbol doesn't quite fit into a 4bit nibble.
I used to work for the phylogenetics research group at Dalhousie, so I've sometimes thought about having a look at DNA-sequence software to see if I could improve on how they internally store sequence data. I never got around to it, though. The real CPU intensive work happens in finding a maximum-likelihood evolutionary tree after you've already calculated a matrix of the evolutionary distance between every pair of input sequences. So actual sequence comparison isn't the bottleneck.
do the maths:
4^10 = (2^2)^10 = 2^20
Answer: 20 bits
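A small sketch of the 2-bits-per-nucleotide idea, in case it helps to see it concretely. The particular code assignment (A=00, C=01, G=10, T=11) is an arbitrary choice for illustration; any one-to-one mapping works.

```python
# Arbitrary 2-bit code per nucleotide (an assumption, not a standard).
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"  # index in this string == 2-bit code


def pack(strand: str) -> int:
    """Pack a strand into an integer, 2 bits per nucleotide, first base highest."""
    value = 0
    for base in strand:
        value = (value << 2) | CODES[base]
    return value


def unpack(value: int, length: int) -> str:
    """Inverse of pack(); the length is needed because leading A's are zero bits."""
    bases = []
    for _ in range(length):
        bases.append(LETTERS[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


strand = "GATTACAGAT"
packed = pack(strand)
print(f"{strand} -> {packed:020b} ({packed.bit_length()} significant bits)")
print(unpack(packed, len(strand)))
```

The largest possible value is the strand of ten T's, which packs to 2^20 - 1, so 20 bits always suffice.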

How to perform mathematical operations on large numbers

I have a question about working on very big numbers. I'm trying to run the RSA algorithm, and let's pretend I have a 512-bit number d and a 1024-bit number n. decrypted_word = crypted_word^d mod n, isn't it? But those d and n are very large numbers! None of the standard variable types can handle my 512-bit numbers. Everywhere it is written that RSA needs a 512-bit prime number at least, but how can I actually perform any mathematical operations on such a number?
And one more thing. I can't use extra libraries. I generate my prime numbers with Java, using BigInteger, but on my system I have only basic variable types, and STRING256 is the biggest.
Suppose your maximal integer size is 64 bit. Strings are not that useful for doing math in most languages, so disregard string types. Now choose an integer of half that size, i.e. 32 bit. An array of these can be interpreted as digits of a number in base 2^32. With these, you can do long addition and multiplication, just like you are used to with base 10 and pen and paper. In each elementary step, you combine two 32-bit quantities, to produce both a 32-bit result and possibly some carry. If you do the elementary operation in 64-bit arithmetic, you'll have both of these as part of a single 64-bit variable, which you'll then have to split into the 32-bit result digit (via bit mask or simple truncating cast) and the remaining carry (via bit shift).
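Here is a rough sketch of that schoolbook scheme with 32-bit digits ("limbs"), stored least significant first. Python integers are already arbitrary precision, so this is purely illustrative: the masks and shifts stand in for what a 64-bit intermediate variable would do on a system with only basic types. All function names are made up for the example.

```python
BASE_BITS = 32
MASK = (1 << BASE_BITS) - 1


def to_limbs(n: int) -> list:
    """Split a non-negative integer into base-2**32 digits, lowest first."""
    limbs = []
    while n:
        limbs.append(n & MASK)
        n >>= BASE_BITS
    return limbs or [0]


def from_limbs(limbs: list) -> int:
    return sum(limb << (BASE_BITS * i) for i, limb in enumerate(limbs))


def add(a: list, b: list) -> list:
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(total & MASK)      # low 32 bits are the result digit
        carry = total >> BASE_BITS       # the rest is the carry (0 or 1 here)
    if carry:
        result.append(carry)
    return result


def multiply(a: list, b: list) -> list:
    result = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            # At most (2**32-1)**2 + 2*(2**32-1) == 2**64-1, so it fits in 64 bits.
            total = result[i + j] + x * y + carry
            result[i + j] = total & MASK
            carry = total >> BASE_BITS
        result[i + len(b)] = carry
    return result


x, y = 3**300, 7**200
print(from_limbs(multiply(to_limbs(x), to_limbs(y))) == x * y)
print(from_limbs(add(to_limbs(x), to_limbs(y))) == x + y)
```

The result of `multiply` may carry leading zero limbs; they are harmless but can be trimmed if the length matters.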
Division is harder. But if the divisor is known, then you may get away with doing a division by constant using multiplication instead. Consider an example: division by 7. The inverse of 7 is 1/7=0.142857…. So you can multiply by that to obtain the same result. Obviously we don't want to do any floating point math here. But you can also simply multiply by 14286 and then drop the last five digits of the result. This will be exactly the right result if your dividend is small enough. How small? Well, you compute x/7 as x*14286/100000, so the error will be x*(14286/100000 - 1/7)=x/350000, which stays below one as long as x<350000. For the truncated quotient to match exactly, that error also must not push the fractional part of x/7 (at most 6/7) past the next integer, so you need x/350000<1/7, i.e. x<50000. As long as the modulus in your RSA setup is known, i.e. as long as the key pair remains the same, you can use this approach to do integer division, and can also use that to compute the remainder. Remember to use base 2^32 instead of base 10, though, and check how many digits you need for the inverse constant.
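A quick brute-force check of the divide-by-7 example is worth doing before trusting any particular bound. The sketch below looks for the first dividend where multiplying by 14286 and dropping five decimal digits disagrees with true integer division; the x/350000 error term only says the error is below one, and the truncated results can still differ once a dividend sits just under a multiple of 7.

```python
def approx_div7(x: int) -> int:
    """Divide by 7 via multiplication by the scaled inverse 14286/100000."""
    return x * 14286 // 100000


def approx_mod7(x: int) -> int:
    """Remainder derived from the approximate quotient."""
    return x - 7 * approx_div7(x)


# Search the whole range where the error term x/350000 is below one.
first_mismatch = next(x for x in range(350000) if approx_div7(x) != x // 7)
print("first dividend where the trick fails:", first_mismatch)
print("quotients there:", approx_div7(first_mismatch), "vs", first_mismatch // 7)
```

The same kind of exhaustive or boundary test is a good idea when choosing the number of digits for an inverse constant in base 2^32.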
There is an alternative you might want to consider, to do modulo reduction more easily, perhaps even if n is variable. Instead of expressing your remainders as numbers 0 through n-1, you could also use 2^1024-n through 2^1024-1. So if your initial number is smaller than 2^1024-n, you add n to convert to this new encoding. The benefit of this is that you can do the reduction step without performing any division at all. 2^1024 is equivalent to 2^1024-n in this setup, so an elementary modulo reduction would start by splitting some number into its lower 1024 bits and its higher rest. The higher rest will be right-shifted by 1024 bits (which is just a change in your array indexing), then multiplied by 2^1024-n and finally added to the lower part. You'll have to do this until you can be sure that the result has no more than 1024 bits. How often that is depends on n, so for fixed n you can precompute that (and for large n I'd expect it to be two reduction steps after addition but three steps after multiplication, but please double-check that) whereas for variable n you'll have to check at runtime. At the very end, you can go back to the usual representation: if the result is not smaller than n, subtract n. All of this should work as described if n>2^512. If not, i.e. if the top bit of your modulus is zero, then you might have to make further adjustments. Haven't thought this through, since I only used this approach for fixed moduli close to a power of two so far.
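The folding step can be sketched as follows. This is a simplified variant, not the exact encoding described above: it folds the high part down until the value fits in k bits and then normalizes back to 0..n-1 with subtraction. It uses Python's big integers instead of limb arrays, and it assumes the top bit of n is set (2^(k-1) <= n < 2^k); `fold_reduce` is a made-up name.

```python
def fold_reduce(x: int, n: int, k: int) -> int:
    """Reduce x modulo n without division, using 2**k == 2**k - n (mod n)."""
    if not (1 << (k - 1)) <= n < (1 << k):
        raise ValueError("sketch assumes n has exactly k bits")
    mask = (1 << k) - 1
    c = (1 << k) - n              # what 2**k is congruent to, modulo n
    while x >> k:                 # more than k bits left: fold once more
        x = (x >> k) * c + (x & mask)
    while x >= n:                 # final normalization (at most one pass here)
        x -= n
    return x


n = (1 << 1024) - 12345           # illustrative modulus with its top bit set
x = 3**1500                       # well over 1024 bits
print(fold_reduce(x, n, 1024) == x % n)
```

Each fold strictly shrinks the value while the high part is non-zero, so the loop terminates; how many folds are needed depends on how close n is to 2^k.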
Now for that exponentiation. I very much suggest you do the binary approach for that. When computing x^d, you start with x, x^2=x*x, x^4=x^2*x^2, x^8=…, i.e. you compute all power-of-two exponents. You also maintain some intermediate result, which you initialize to one. In every step, if the corresponding bit is set in the exponent d, then you multiply the corresponding power into that intermediate result. So let's say you have d=11. Then you'd compute 1*x^1*x^2*x^8 because d=11=1+2+8=1011 in binary. That way, you'll need only about 1024 multiplications max if your exponent has 512 bits. Half of them for the powers-of-two exponentiation, the other to combine the right powers of two. Every single multiplication in all of this should be immediately followed by a modulo reduction, to keep memory requirements low.
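A minimal sketch of that binary (square-and-multiply) approach, scanning the exponent from its lowest bit. It leans on Python's built-in big integers and `%` for the reduction, so on a system with only basic types the multiplications and reductions would have to be replaced by your own limb routines.

```python
def mod_pow(x: int, d: int, n: int) -> int:
    """Compute x**d mod n with one squaring per exponent bit."""
    result = 1
    power = x % n                 # holds x^(2^i) mod n for the current bit i
    while d:
        if d & 1:                 # this bit of d is set: multiply the power in
            result = (result * power) % n
        power = (power * power) % n
        d >>= 1
    return result


print(mod_pow(4, 13, 497))        # small textbook-sized example
print(mod_pow(5, 11, 1000) == 5**11 % 1000)
```

Reducing after every multiplication keeps every intermediate value below n^2, which is what bounds the memory needed.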
Note that the speed of the above exponentiation process will, in this simple form, depend on how many bits in d are actually set. So this might open up a side channel attack which might give an attacker access to information about d. But if you are worried about side channel attacks, then you really should have an expert develop your implementation, because I guess there might be more of those that I didn't think about.
You may write some macros you may execute under Microsoft for functions like +, -, x, /, modulo, x power y which work generally for any integer of less than ten or hundred thousand digits (the practical --not theoretical-- limit being the internal memory of your CPU). Please note the logic is exactly the same as the one you got at elementary school.
E.g.: p= 1819181918953471 divider of (2^8091) - 1, q = ((2^8091) - 1)/p, mod(2^8043 ; q ) = 233225049958594489297642487352160527465088733631637179020483553367609406976159908715897287655088134346657328040319280454485827759404751268378805196413090186685926225334347451870049183927154428744934254443850937186054612404823712615148867040751866198781942354903962026677334226414362517398771254734371914537723525272500632139167682048449368982786333508866621411419635621571844016474674514040364550433338016668909256596081980092846379236917235898011306231439819482384406356911821215433421870926772596749117444009734540322095023599354574371679373102508760023261017381079306370251839506508217700876602000752668620753831306695191309990299205276562349113924219914717570681877473628541487207289232055343412361464994499108965303597290773003668048464392254830869014842093332365958032633132197254697156995460411629235227841703501045897165445297514394380219147277726203912625341055996886039509233210088831794334748980343182858891291155565414796707610403880753529341373268832872458218889994744210011557215665478139704968095559963138546311374907742975648819018776876281761067719182069454343508735096796381098878319322794706310976040189398557889905426270726260492817841528070976594852388385609583168882381372375485905284508903287800802868440387963251014889779885496395239880028250552864697402278423885387518709716916175431416581423130599343269248678461517497775752793103942965621915306028170145494646142538868438326459468664663629504846295542588557144017854729877278410408058162244136570364999591177012490284351913277572766442729447434792962687498289275655599514419451432696568663552103104822355202205802135334250162989939036157537143434560145774792254359150312258635519116051170293930856329473738726353301817188206698368301473129489660286829605182252139602188672078254178300162810361219593847073917183338928496652485128029266016762511997116989787253990489543258874103170604006204127972401297871588391649693824985377425792
33544463501470239575760940937130926062252501116458281610468726777710383038372260777522143500312913040987942762244940009811450966646527814576364565964518092955053720983465333258335601691477534154940549197873199633313223848155047098569827560014018412679602636286195283270106917742919383395056306107175539370483171915774381614222806960872813575048014729965930007408532959309197608469115633821869206793759322044599554551057140046156235152048507130125695763956991351137040435703946195318000567664233417843805257728.
The last step took about 0.1 sec.
wpjo (willibrord oomen on academia.edu)

MAD method compression function

I ran across the question below in an old exam. My answers just feels a bit short and inadequate. Any extra ideas I can look into or reasons I have overlooked would be great. Thanx
Consider the MAD method compression function, mapping an object with hash code i to element [(3i + 7)mod9027]mod6000 of the 6000-element bucket array. Explain why this is a poor choice of compression function, and how it could be improved.
I basically just say that the function could be improved by changing the value for p (or 9027) to an prime number and choosing an other constant for a (or 3) could also help.
Rup's comment is essentially the correct answer. 3 and 9027 are both divisible by 3, so 3i + 7 maps onto only 1/3 of the range 0-9026. Then the mapping mod 6000 maps 2/3 of the values to the lower half. So bucket 1 will contain roughly 1/1500 of the values [if I've done the math right] rather than the 1/6000 you would want. Bucket 0 will be empty.
If i is uniformly distributed over a large enough range, then (3i + 7)mod9027 will be evenly distributed over 0-9026, but then taking mod 6000 means two thirds of the hashes will be in the first half of the range (0 to 3026 and 6000 to 9026 inclusive), and one third in the second half (3027 to 5999 inclusive).
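It is easy to check the bucket distribution empirically. The sketch below feeds one full period of hash codes (i = 0..9026, an assumption about the input) through the function from the question and counts how many buckets are ever hit. For comparison it also tries p = 9029, a prime picked here only for illustration, which fixes the unused buckets but not the wrap-around from taking mod 6000.

```python
from collections import Counter


def compress(i: int, a: int = 3, b: int = 7, p: int = 9027, N: int = 6000) -> int:
    """MAD compression: ((a*i + b) mod p) mod N."""
    return ((a * i + b) % p) % N


counts = Counter(compress(i) for i in range(9027))
print("buckets used with p=9027:", len(counts), "of 6000")
print("bucket 0:", counts[0], " bucket 1:", counts[1], " bucket 3028:", counts[3028])

prime_counts = Counter(compress(i, p=9029) for i in range(9029))
print("buckets used with p=9029:", len(prime_counts), "of 6000")
print("heaviest vs lightest bucket:", max(prime_counts.values()), min(prime_counts.values()))
```

With the original constants only every third bucket is ever used, and the low-numbered ones among those receive twice as many codes as the rest.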

Converting bytes to megabytes

I've seen three ways of doing conversion from bytes to megabytes:
megabytes=bytes/1000000
megabytes=bytes/1024/1024
megabytes=bytes/1024/1000
Ok, I think #3 is totally wrong but I have seen it. I think #2 is right, but I am looking for some respected authority (like W3C, ISO, NIST, etc) to clarify which megabyte is a true megabyte. Can anyone cite a source that explicitly explains how this calculation is done?
Bonus question: if #2 is a megabyte what are #1 and #3 called?
BTW: Hard drive manufacturers don't count as authorities on this one!
Traditionally by megabyte we mean your second option -- 1 megabyte = 2^20 bytes. But it is not correct actually because mega means 1 000 000. There is a new standard name for 2^20 bytes, it is mebibyte (http://en.wikipedia.org/wiki/Mebibyte) and it is gaining popularity.
There's an IEC standard that distinguishes the terms, e.g. Mebibyte = 1024^2 bytes but Megabyte = 1000^2 (in order to be compatible to SI units like kilograms where k/M/... means 1000/1000000). Actually most people in the IT area will prefer Megabyte = 1024^2 and hard disk manufacturers will prefer Megabyte = 1000^2 (because hard disk sizes will sound bigger than they are).
As a matter of fact, most people are confused by the IEC standard (multiplier 1000) and the traditional meaning (multiplier 1024). In general you shouldn't make assumptions on what people mean. For example, 128 kBit/s for MP3s usually means 128000 bits because the multiplier 1000 is mostly used with the unit bits. But often people then call 2048 kBit/s equal to 2 MBit/s - confusing eh?
So as a general rule, don't trust bit/byte units at all ;)
Divide by 2 to the power of 20, (1024*1024) bytes = 1 megabyte
1024*1024 = 1,048,576
2^20 = 1,048,576
1,048,576/1,048,576 = 1
It is the same thing.
BTW: Hard drive manufacturers don't count as authorities on this one!
Oh, yes they do (and the definition they assume from the S.I. is the correct one). On a related issue, see this post on CodingHorror.
To convert bytes to megabytes (MB),
use totalbyte/1000/1000
To convert bytes to mebibytes (MiB),
use totalbyte/1024/1024
https://en.wikipedia.org/wiki/Byte#Multiple-byte_units
The answer is that #1 is technically correct based on the real meaning of the Mega prefix, however (and in life there is always a however) the math for that doesn't come out so nice in base 2, which is how computers count, so #2 is what people really use.
Megabyte means 2^20 bytes. I know that technically that doesn't mesh with the SI units, and that some folks have come up with a new terminology to mean 2^20. None of that matters. Efforts to change the language to "clarify" things are doomed to failure.
Hard-drive manufacturers use it to mean 1,000,000 bytes, because that's what it means in SI so they figure technically they aren't lying (while actually they are). That falls under lies, damn lies, and marketing.
Use the computation your users will most likely expect. Do your users care to know how many actual bytes are on a disk or in memory or whatever, or do they only care about usable space? The answer to that question will tell you which calculation makes the most sense.
This isn't a precision question as much as it is a usability question. Provide the calculation that is most useful to your users.
In general, it's wrong to use decimal SI prefixes (e.g. kilo, mega) when referring to binary data sizes (except in casual usage). It's ambiguous and causes confusion. To be precise you can use binary prefixes (e.g. 1 mebibyte = 1 MiB = 1024 kibibytes = 2^20 bytes). When someone else uses decimal SI prefixes for binary data you need to get more information before you can know what is meant.
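To make the difference concrete, here is a small sketch that converts the same byte count both ways. The function names are made up; the divisors are the decimal SI meaning (1000^2) and the binary IEC meaning (1024^2).

```python
def to_megabytes(num_bytes: int) -> float:
    """Decimal megabytes (MB): 1 MB = 1000**2 bytes."""
    return num_bytes / 1000**2


def to_mebibytes(num_bytes: int) -> float:
    """Binary mebibytes (MiB): 1 MiB = 1024**2 bytes."""
    return num_bytes / 1024**2


size = 500_000_000
print(f"{size} bytes = {to_megabytes(size):.2f} MB = {to_mebibytes(size):.2f} MiB")
```

Labelling the output with the matching unit (MB or MiB) removes the ambiguity regardless of which divisor you pick.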
Microsoft Windows Explorer shows file size in the "Properties" window. This is a conversion from the byte count using 2^20 bytes per megabyte.

Mysterious combination

I decided to learn concurrency and wanted to find out in how many ways instructions from two different processes could overlap. The code for both processes is just a 10 iteration loop with 3 instructions performed in each iteration. I figured out the problem consisted of leaving X instructions fixed at a point and then fit the other X instructions from the other process between the spaces taking into account that they must be ordered (instruction 4 of process B must always come before instruction 20).
I wrote a program to count this number, looking at the results I found out that the solution is n Combination k, where k is the number of instructions executed throughout the whole loop of one process, so for 10 iterations it would be 30, and n is k*2 (2 processes). In other words, n number of objects with n/2 fixed and having to fit n/2 among the spaces without the latter n/2 losing their order.
Ok problem solved. No, not really. I have no idea why this is, I understand that the definition of a combination is, in how many ways can you take k elements from a group of n such that all the groups are different but the order in which you take the elements doesn't matter. In this case we have n elements and we are actually taking them all, because all the instructions are executed ( n C n).
If one explains it by saying that there are 2k blue (A) and red (B) objects in a bag and you take k objects from the bag, you are still only taking k instructions when 2k instructions are actually executed. Can you please shed some light into this?
Thanks in advance.
FWIW it can be viewed like this: you have a bag with k blue and k red balls. Balls of same color are indistinguishable (in analogy with the restriction that the order of instructions within the same process/thread is fixed - which is not true in modern processors btw, but let's keep it simple for now). How many different ways can you pull all the balls from the bag?
My combinatorial skills are quite rusty, but my first guess is
(2k)!
-----
k!*k!
which, according to Wikipedia, indeed equals
(2k)
(k )
(sorry, I have no better idea how to show this).
For n processes, it can be generalized by having balls of n different color in the bag.
Update: Note that in the strict sense, this models only the situation when different processes are executed on a single processor, so all instructions from all processes must be ordered linearly on the processor level. In a multiprocessor environment, several instructions can be executed literally at the same time.
Generally, I agree with Péter's answer, but since it does not seem to have fully clicked for the OP, here's my shot at it (purely from a mathematical/combinatorial standpoint).
You have 2 sets of 30 (k) instructions that you're putting together, for a total of 60 (n) instructions. Since each set of 30 must be kept in order, we don't need to track which instruction within each set, just which set an instruction is from. So, we have 60 "slots" in which to place 30 instructions from one set (say, red) and 30 instructions from the other set (say, blue).
Let's start by placing the 30 red instructions into the 60 slots. There are (60 choose 30) = 60!/(30!30!) ways to do this (we're choosing which 30 slots of the 60 are filled by red instructions). Now, we still have the 30 blue instructions, but we only have 30 open slots left. There is (30 choose 30) = 30!/(30!0!) = 1 way to place the blue instructions in the remaining slots. So, in total, there are (60 choose 30) * (30 choose 30) = (60 choose 30) * 1 = (60 choose 30) ways to do it.
Now, let's suppose that instead of 2 sets of 30, you have 3 sets (red, green, blue) of k instructions. You have a total of 3k slots to fill. First, place the red ones: (3k choose k) = (3k)!/(k!(3k-k)!) = (3k)!/(k!(2k)!). Now, place the green ones into the remaining 2k slots: (2k choose k) = (2k)!/(k!k!). Finally, place the blue ones into the last k slots: (k choose k) = k!/(k!0!) = 1. In total: (3k choose k) * (2k choose k) * (k choose k) = ( (3k)! * (2k)! * k! ) / ( k!(2k)! * k!k! * k!0! ) = (3k)!/(k!k!k!).
As further extensions (though I'm not going to provide a full explanation):
if you have 3 sets of instructions with length a, b, and c, the number of possibilities is (a+b+c)!/(a!b!c!).
if you have n sets of instructions where the i-th set has k_i instructions, the number of possibilities is (k_1+k_2+...+k_n)!/(k_1!k_2!...k_n!).
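These counts are easy to sanity-check in code. The sketch below counts interleavings of two ordered sequences directly (at each step the next instruction comes from one process or the other) and compares that with the closed forms; `math.comb` needs Python 3.8 or later, and the function names are made up.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def count_interleavings(a: int, b: int) -> int:
    """Ways to merge sequences of length a and b while keeping each one's order."""
    if a == 0 or b == 0:
        return 1                  # only one way once a sequence is exhausted
    # The next instruction is taken either from the first or the second process.
    return count_interleavings(a - 1, b) + count_interleavings(a, b - 1)


def multinomial(*sizes: int) -> int:
    """(k_1 + ... + k_n)! / (k_1! * ... * k_n!) for any number of processes."""
    total = factorial(sum(sizes))
    for size in sizes:
        total //= factorial(size)
    return total


print(count_interleavings(30, 30) == comb(60, 30) == multinomial(30, 30))
print(multinomial(2, 2, 2))       # three processes with two instructions each
```

The recursion is the same "choose which process goes next" argument as the slot-filling explanation, just counted from the other direction.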
Péter's answer is fine enough, but that doesn't explain just why concurrency is difficult. That's because more and more often nowadays you've got multiple execution units available (be they cores, CPUs, nodes, computers, whatever). That in turn means that the possibilities for overlapping between instructions is increased still further; there's no guarantee that what happens can be modeled correctly with any conventional interleaving.
This is why it is important to think in terms of using semaphores/mutexes correctly, and why memory barriers matter. That's because all of these things end up turning the true nasty picture into something that is far easier to understand. But because mutexes reduce the number of possible executions, they are reducing the overall performance and potential efficiency. It's definitely tricky, and that in turn is why it is far better if you can work in terms of message passing between threads of activity that do not otherwise interact; it's easier to understand and having fewer synchronizations is better.
