What is the use of hexadecimal values in programming?

This is something I have been wondering about while reading programming books and in my computer science class at school, where we learned how to convert decimal values into hexadecimal.
Can someone please tell me what the advantages of using hexadecimal values are, and why we use them in programming?
Thank you.

In many cases (e.g. bit masks) you need to think in binary, but binary is hard to read because of its length. Since hexadecimal values can be translated to and from binary much more easily than decimal values can, you can look at hex as a kind of shorthand notation for binary.
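As a quick illustration, here is a minimal Python sketch (the flag names are invented for the example) comparing the same masks written in binary and in hex:
READ    = 0b00000001   # 0x01 -- each hypothetical flag sets one bit
WRITE   = 0b00000010   # 0x02
EXECUTE = 0b00000100   # 0x04
LOW_NIBBLE  = 0x0F     # 0b00001111 -- hex stays readable as masks get wider
HIGH_NIBBLE = 0xF0     # 0b11110000
flags = READ | EXECUTE           # 0b00000101 == 0x05
print(bool(flags & WRITE))       # False: the WRITE bit is not set
print(bin(flags), hex(flags))    # 0b101 0x5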

It certainly depends on what you're doing.
Hexadecimal comes as an extension of base 2, which you are probably familiar with as essential to computing.
Check this out for a good discussion of several applications: https://softwareengineering.stackexchange.com/questions/170440/why-use-other-number-bases-when-programming/

Each hexadecimal digit corresponds 1:1 to a pattern of 4 bits. With experience, you can map them from memory: e.g. 0x8 = 1000, 0xF = 1111, and correspondingly 0x8F = 10001111.
This is a convenient shorthand wherever the bit patterns matter, e.g. in bitmaps or when working with I/O ports. Visualizing the bit pattern of decimal 169 is, by comparison, much harder.

A byte consists of 8 binary digits and is the smallest piece of data that computers normally work with. All other variables a computer works with are constructed from bytes; for example, a single character can be stored in a single byte, and a 32-bit integer consists of 4 bytes.
As bytes are so fundamental, we want a way to write down their values as neatly and efficiently as possible. One option would be binary, but then we would need a lot of digits. This takes up a lot of space and can be confusing when many numbers are written in sequence:
200 201 202 == 11001000 11001001 11001010
Using hexadecimal notation, we can write every byte using just two digits:
200 == C8
Also, as 16 is a power of 2, it is easy to convert between hexadecimal and binary representations in your head. This is useful because sometimes we are only interested in a single bit within the byte. As a simple example, if the first digit of a hexadecimal representation is 0, we know that the first four binary digits are 0.
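In Python, for instance, you can print that correspondence directly (a small sketch):
for n in (200, 201, 202):
    # 08b pads the binary form to 8 digits, 02X pads the hex form to 2
    print(f"{n:3d} == {n:08b} == {n:02X}")
# 200 == 11001000 == C8
# 201 == 11001001 == C9
# 202 == 11001010 == CA
print(f"{0x0F:08b}")   # 00001111 -- a leading hex 0 means the top four bits are 0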

Related

Encoding DNA strand in Binary

Hey guys I have the following question:
Suppose we are working with strands of DNA, each strand consisting of
a sequence of 10 nucleotides. Each nucleotide can be any one of four
different types: A, G, T or C. How many bits does it take to encode a
DNA strand?
Here is my approach to it and I want to know if that is correct.
We have 10 spots. Each spot can hold any of 4 different symbols, which means our binary digits must be able to represent 4^10 combinations.
4^10 = 1,048,576.
We will then find the log base 2 of that. What do you guys think of my approach?
Each nucleotide (aka base-pair) takes two bits (one of four states -> 2 bits of information). 10 base-pairs thus take 20 bits. Reasoning that way is easier than doing the log2(4^10), but gives the same answer.
It would be fewer bits of information if there were any combinations that couldn't appear. e.g. some codons (sequence of three base-pairs) that never appear. But ten independent 2-bit pieces of information sum to 20 bits.
If some sequences appear more frequently than others, and a variable-length representation is viable, then Huffman coding or other compression schemes could save bits most of the time. This might be good in a file-format, but unlikely to be good in-memory when you're working with them.
Densely packing your data into an array of 2bit fields makes it slower to access a single base-pair, but comparing the whole chunk for equality with another chunk is still efficient. (memcmp).
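A minimal Python sketch of that dense packing (the helper names are mine, not from any real bioinformatics library): each nucleotide becomes a 2-bit code, so a 10-base strand fits in one 20-bit integer, and comparing whole strands is a single integer comparison.
CODE = {'A': 0b00, 'G': 0b01, 'T': 0b10, 'C': 0b11}   # arbitrary 2-bit assignment
BASE = {v: k for k, v in CODE.items()}

def pack(strand):
    # Pack a string of nucleotides into one integer, 2 bits per base.
    value = 0
    for nucleotide in strand:
        value = (value << 2) | CODE[nucleotide]
    return value

def unpack(value, length):
    # Extracting a single base needs a shift and a mask -- the slow part.
    return ''.join(BASE[(value >> 2 * i) & 0b11] for i in reversed(range(length)))

packed = pack("AGTCAGTCAG")
print(packed.bit_length() <= 20)     # True: 10 bases fit in 20 bits
print(unpack(packed, 10))            # AGTCAGTCAG
print(pack("AGTCAGTCAG") == packed)  # whole-strand equality is one comparison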
20 bits is unfortunately just slightly too large for a 16bit integer (which computers are good at). Storing in an array of 32bit zero-extended values wastes a lot of space. On hardware with good unaligned support, storing 24bit zero-extended values is ok (do a 32bit load and mask the high 8 bits. Storing is even less convenient though: probably a 16b store and an 8b store, or else load the old value and merge the high 8, then do a 32b store. But that's not atomic.).
This is a similar problem for storing codons (groups of three base-pairs that code for an amino acid): 6 bits of information doesn't fill a byte. Only wasting 2 of every 8 bits isn't that bad, though.
Amino-acid sequences (where you don't care about mutations between different codons that still code for the same AA) have about 20 symbols per position, which means a symbol doesn't quite fit into a 4bit nibble.
I used to work for the phylogenetics research group at Dalhousie, so I've sometimes thought about having a look at DNA-sequence software to see if I could improve on how they internally store sequence data. I never got around to it, though. The real CPU intensive work happens in finding a maximum-likelihood evolutionary tree after you've already calculated a matrix of the evolutionary distance between every pair of input sequences. So actual sequence comparison isn't the bottleneck.
do the maths:
4^10 = (2^2)^10 = 2^20
Answer: 20 bits

What options are there for representing numbers with more than 2^81 digits?

I came across an interesting math problem that would require me to do some arithmetic with numbers that have more than 2^81 digits. I know that it's impossible to represent a number this large with a system where there is one memory unit for each digit, but I wondered if there were any ways around this.
My initial thought was to use an extremely large base instead of base 10 (decimal). After some thought I believe (but can't verify) that the optimal base would be the square root of the number of digits (so for a number with 2^81 digits you'd use base 2^40 or so), which is an improvement, but that doesn't scale well and still isn't really practical.
So what options do I have? I know of many arbitrary precision libraries, but are there any that scale to support this sort of arithmetic?
Thanks o7
EDIT: After thinking some more, I realize I may be completely wrong about the "optimal base would be the square root of the number of digits" claim, but a) that's why I'm asking and b) I'm too tired to remember my initial reasoning for that assumption.
EDIT 2: 1,000,000 in base ten = F4240 in base 16 = 3641100 in base 8. In base 16 you need 20 bits to store the number; in base 8 you need 21. So it would seem that by increasing the base you decrease the total number of bits needed. (Again, this could be wrong.)
This is really a compression problem pretending to be an arithmetic problem. What you can do with such a large number depends entirely on its Kolmogorov complexity. If you're required to do computations on such a number, it's obviously not going to arrive as 2^81 decimal digits; the Kolmogorov complexity would be too high in that case, and you couldn't even finish reading the input before the sun goes out. The best way to deal with such a number is via the delayed evaluation and symbolic rational types that a language like Scheme provides. That way a program may be able to answer some questions about the result of computations on the number without actually having to write out all those digits to memory.
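A concrete Python instance of that idea: if the huge number arrives as a compact description, say 7**(10**20), modular arithmetic can answer questions about it, such as its last few digits, without ever expanding it:
# 7**(10**20) has about 8.45e19 digits -- far too many to ever write out.
# But its last 5 digits depend only on the value modulo 10**5, which
# three-argument pow() computes by modular exponentiation almost instantly.
print(pow(7, 10**20, 10**5))   # the final five digits; no giant intermediate value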
I think you should just use scientific notation. You will lose precision, but you cannot store numbers that large without losing precision: storing 2^81 digits would require more than 10^24 bits (about a thousand billion terabytes), which is much more than you can have nowadays.
that have more than 2^81 digits
A non-fractional number with 2^81 bits will take 3*10^11 terabytes of data. Per number.
That's assuming you want every single digit and data isn't compressible.
You could attempt to compress the data by storing it in some kind of sparse array that allocates memory only for non-zero elements, but that doesn't guarantee the data will fit anywhere.
Such precision is useless and impossible to handle on modern hardware. A number of 2^81 bits would take an insane amount of time simply to walk through (roughly 9.6 trillion years, assuming 1 byte takes 1 millisecond), never mind multiplication or division. I also can't think of any problem that would require precision like that.
Your only option is to reduce precision to the first N significant digits and use floating-point numbers. Since the data won't fit into a double, you'll have to use a bignum library with floating-point support that provides extremely large floating-point numbers. Since the exponent (a value around 2^81) can itself be represented in just a few bytes, you can store the beginning of the number as a very big floating-point value.
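A dependency-free Python sketch of that representation (the layout is my own illustration, not any particular bignum library): keep the first N significant digits exactly, plus a plain integer decimal exponent, which has no trouble holding values near 2^81.
N = 30
mantissa = 123456789012345678901234567890   # first 30 significant digits, exact
exponent = 2**81                            # decimal exponent; Python ints are unbounded

def approx_mul(m1, e1, m2, e2, digits=N):
    # Multiply two (mantissa, exponent) numbers, truncating back to N digits.
    m, e = m1 * m2, e1 + e2
    overflow = len(str(m)) - digits         # digits beyond our precision
    if overflow > 0:
        m //= 10**overflow
        e += overflow
    return m, e

m, e = approx_mul(mantissa, exponent, mantissa, exponent)
print(f"{m} x 10^{e}")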
1000,000 in base ten
Regardless of your base, a positive number will take at least floor(log2(number))+1 bits to store. Other bases can only match that bound, never beat it; usually they waste part of the last digit, as your base-8 example shows. Changing the numeric base won't reduce the number of required bits.
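You can check this in Python directly (a quick sketch): the information content of a number is fixed, whatever base you print it in.
n = 1_000_000
print(n.bit_length())                 # 20 == floor(log2(n)) + 1
print(f"{n:X}", len(f"{n:X}") * 4)    # F4240: 5 hex digits spanning 20 bits, no waste
print(f"{n:o}", len(f"{n:o}") * 3)    # 3641100: 7 octal digits spanning 21 bits
# Base 16 packs bits exactly because 16 is a power of two and 4 divides 20;
# base 8 wastes one bit here. No base gets below bit_length().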

Assembler memory address representation

I'm trying to get into assembler and I often come across numbers in the following form:
org 7c00h
; initialize the stack:
mov ax, 07c0h
mov ss, ax
mov sp, 03feh ; top of the stack.
7c00h, 07c0h, 03feh - What is the name of this number notation? What do they mean? Why are they used over "normal" decimal numbers?
It's hexadecimal, the numeral system with 16 digits: 0-9 and A-F. Memory addresses are given in hex because it's shorter and easier to read, and the numbers that represent memory locations don't mean anything special to humans, so there's no sense in having long numbers. I would also guess that somewhere in the past someone had to type in addresses by hand; it might as well have started there.
Worth noting also, 0:7C00 is the boot sector load address.
Further worth noting: 07C0:03FE is the same address as 0:7FFE due to the way segmented addressing works.
This guy's left himself a 510 byte stack (he made the very typical off-by-two error in setting up the boot sector's stack).
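The segment:offset arithmetic is easy to check with a quick Python sketch: the linear address is segment * 16 + offset.
def linear(segment, offset):
    # Real-mode 8086 addressing: the segment is shifted left 4 bits, then added.
    return (segment << 4) + offset

print(hex(linear(0x07C0, 0x03FE)))      # 0x7ffe -- same byte as 0000:7FFE
print(hex(linear(0x0000, 0x7C00)))      # 0x7c00 -- the boot sector load address
print(linear(0x07C0, 0x03FE) - 0x7E00)  # 510 bytes between stack top and sector end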
These are numbers in hexadecimal notation, i.e. in base 16, where A to F have the digit values 10 to 15.
One advantage is that there is a more direct conversion to binary numbers. With a little bit of practice it is easy to see which bits in the number are 1 and which are 0.
Another is that many numbers used internally, such as memory addresses, are round numbers in hexadecimal, i.e. contain a lot of zeros.

How do programming languages handle huge number arithmetic

For a computer working with a 64-bit processor, the largest number that it can handle would be 2^64 = 18,446,744,073,709,551,616. How do programming languages, say Java, or C and C++, handle arithmetic on numbers higher than this value? No register can hold such a number as a single piece. How is this issue tackled?
There are lots of specialized techniques for doing calculations on numbers larger than the register size. Some of them are outlined in this wikipedia article on arbitrary precision arithmetic
Low level languages, like C and C++, leave large number calculations to the library of your choice. One notable one is the GNU Multi-Precision library. High level languages like Python, and others, integrate this into the core of the language, so normal numbers and very large numbers are identical to the programmer.
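Python, for instance, does this transparently; its built-in int is arbitrary precision:
x = 2**64                # already past the 64-bit register limit
print(x + 1)             # 18446744073709551617
print(x * x)             # 340282366920938463463374607431768211456, i.e. 2**128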
You assume the wrong thing. The biggest number it can handle in a single register is a 64-bit number. However, with some smart programming techniques, you could simply combine a hundred of those 64-bit numbers in a row to build a huge 6400-bit number and use that for further calculations. It's just not as fast as having the number fit in one register.
Even the old 8- and 16-bit processors used this trick, letting the number overflow into other registers. It makes the math more complex but doesn't put an end to the possibilities.
However, such high-precision math is extremely unusual. Even if you wanted to calculate the whole national debt of the USA and store the outcome in Zimbabwean dollars, a 64-bit integer would still be big enough, I think. It's definitely big enough to contain the balance of my savings account, though.
Programming languages that handle truly massive numbers use custom number primitives that go beyond the normal operations optimized for 32-, 64-, or 128-bit CPUs. These numbers are especially useful in computer security and mathematical research.
The GNU Multiple Precision Library is probably the most complete example of these approaches.
You can handle larger numbers by using arrays. Try it out by typing the following code into your web browser's JavaScript console:
The point at which JavaScript fails
console.log(9999999999999998 + 1)
// expected 9999999999999999
// actual 10000000000000000 oops!
JavaScript does not handle plain integers above 9999999999999998 correctly. But writing your own number primitive to make this calculation work is simple enough. Here is an example using a custom number adder class in JavaScript.
Passing the test using a custom number class
// Require a custom number primitive class
const {Num} = require('./bases')
// Create a massive number that JavaScript will not add to (correctly)
const num = new Num(9999999999999998, 10)
// Add to the massive number
num.add(1)
// The result is correct (where plain JavaScript Math would fail)
console.log(num.val) // 9999999999999999
How it Works
You can look in the code at class Num { ... } to see the details of what is happening, but here is a basic outline of the logic in use:
Classes:
The Num class contains an array of single Digit classes.
The Digit class contains the value of a single digit, and the logic to handle the Carry flag
Steps:
The chosen number is turned into a string
Each digit is turned into a Digit class and stored in the Num class as an array of digits
When the Num is incremented, the increment is applied to the first Digit in the array (the right-most digit)
If the Digit value plus the Carry flag are equal to the Base, then the next Digit to the left is called to be incremented, and the current number is reset to 0
... Repeat all the way to the left-most digit of the array
Logistically it is very similar to what is happening at the machine level, but here it is unbounded. You can read more about how digits are carried here; this can be applied to numbers of any base.
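Here is the same outline condensed into a Python sketch (my own minimal rendition of the Num idea, not the actual './bases' module):
class Num:
    # A number stored as a list of digits, most significant first.
    def __init__(self, value, base=10):
        self.base = base
        self.digits = [int(d) for d in str(value)]   # one entry per decimal digit

    def add(self, amount):
        for _ in range(amount):
            i = len(self.digits) - 1          # start at the right-most digit
            while True:
                self.digits[i] += 1           # increment plus carry-in
                if self.digits[i] < self.base:
                    break                     # no carry: done
                self.digits[i] = 0            # reset this digit, carry to the left
                i -= 1
                if i < 0:                     # carried past the left edge:
                    self.digits.insert(0, 0)  # grow the number by one digit
                    i = 0

    @property
    def val(self):
        return ''.join(str(d) for d in self.digits)

num = Num(9999999999999998, 10)
num.add(1)
print(num.val)   # 9999999999999999 -- unbounded, unlike the float math above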
Ada actually supports this natively, but only for its typeless constants ("named numbers"). For actual variables, you need to go find an arbitrary-length package. See Arbitrary length integer in Ada
More-or-less the same way that you do. In school, you memorized single-digit addition, multiplication, subtraction, and division. Then, you learned how to do multiple-digit problems as a sequence of single-digit problems.
If you wanted to, you could multiply two twenty-digit numbers together using nothing more than knowledge of a simple algorithm, and the single-digit times tables.
In general, the language itself doesn't handle high-precision, high-accuracy large number arithmetic. It's far more likely that a library is written that uses alternate numerical methods to perform the desired operations.
For example (I'm just making this up right now), such a library might emulate the actual techniques that you might use to perform that large number arithmetic by hand. Such libraries are generally much slower than using the built-in arithmetic, but occasionally the additional precision and accuracy is called for.
As a thought experiment, imagine the numbers stored as strings, with functions to add, multiply, etc. operating on these arbitrarily long numbers.
In reality these numbers are probably stored in a more space-efficient manner.
Think of one machine-size number as a digit and apply the algorithm for multi-digit multiplication from primary school. Then you don't need to keep the whole numbers in registers, just the digits as they are worked on.
Most languages store them as arrays of integers. If you add or subtract two of these big numbers, the library adds or subtracts all the integer elements in the arrays separately and handles the carries/borrows.
It's like manual addition and subtraction in school, because that is how it works internally.
Some languages use actual text strings instead of integer arrays, which is less efficient but simpler to transform into a text representation.
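A Python sketch of that scheme using 32-bit integer "limbs" (the machine-word version of the schoolbook method described above; the function name is mine):
LIMB_BITS = 32
LIMB_MASK = (1 << LIMB_BITS) - 1

def add_limbs(a, b):
    # a and b are big numbers stored as lists of 32-bit limbs, least significant first.
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(total & LIMB_MASK)   # the low 32 bits become this limb
        carry = total >> LIMB_BITS         # the overflow carries into the next limb
    if carry:
        result.append(carry)
    return result

# 2**64 - 1 is two full limbs; adding 1 ripples a carry through both of them.
print(add_limbs([0xFFFFFFFF, 0xFFFFFFFF], [1]))   # [0, 0, 1] == 2**64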

How to encode a large number (in a URL)?

Quite often one has to encode a big number (e.g. 128 or 160 bits) in a URL. For example, many web applications use md5(random()) for UUIDs.
If you need to put that value in a URL, the common approach is to just encode it as a hexadecimal string.
But obviously hex is not a very tight encoding. What other approaches are there that fit nicely in a URL?
I would use The "URL and Filename safe" Base 64 Alphabet.
Base 64 uses two character sets.
Data: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
URLs: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
To use base 64 you need to pad your value to a multiple of 3 bytes (24 bits), then split those 24 bits into four 6-bit groups. Each 6-bit value is looked up by position in the strings given above.
If all goes well, your final base64 value will be a multiple of 4 characters long and will decode back to a multiple of 3 (8-bit) bytes.
Depending on the language you are using, many have built-in encode and decode functions.
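In Python, for example, a short sketch using only the standard library:
import base64, os

raw = os.urandom(16)   # a 128-bit value, standing in for md5(random())
token = base64.urlsafe_b64encode(raw).rstrip(b'=').decode()
print(token, len(token))   # 22 characters instead of 32 hex digits

padded = token + '=' * (-len(token) % 4)   # restore padding before decoding
print(base64.urlsafe_b64decode(padded) == raw)   # True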
You can do even better with base64url encoding (a-z, A-Z, 0-9, - and _; see RFC 4648 Section 5). RFC 4648 covers a number of different encoding methods (base16, base32, and base64) and a couple of variants. Also, depending on the sparsity of the bits set in the number, you could conceivably run it through gzip and then use one of the described encodings. Of course, whether gzip helps really depends on how large the number you are encoding is.
If you want it tight you can use a base-36 encoding (from 0 to Z).
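There is no built-in base-36 encoder in Python for that direction, but one takes only a few lines (a sketch):
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base36(n):
    # Encode a non-negative integer using digits 0-9 then A-Z.
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(ALPHABET[r])
    return ''.join(reversed(out))

n = 0x9F2C1E8D7B6A5F4E3D2C1B0A99887766   # a 128-bit value
print(to_base36(n))                      # 25 characters vs 32 hex digits
print(int(to_base36(n), 36) == n)        # True: int() decodes base 36 natively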
Using the hint about compact bases, I currently use something like this (base32, in Python):
>>> import base64, uuid
>>> base64.b32encode(uuid.uuid1().bytes).decode().rstrip('=')
'MTB2ONDSL3YWJN3CA6XIG7O4HM'
Just use hex. Even if you got 8 bits per character, you would still be using a 16-20 character random sequence, which nobody will want to type or say. If you can't come up with a short identifier, work on your search capabilities instead.
