Pierre Terdiman in his article "Radix Sort Revisited" tells us:
For example you’ll need 4 passes to sort standard 32 bits integers,
since in hexadecimal the radix is a byte.
But 0xAB has two radices, namely A and B, each 4 bits wide.
So, what is the radix in hexadecimal? I can't quite make sense of that part of the article.
From what I understand, the 0xAB was just an example of what a radix is. When implementing radix sort, it's easier to use bytes (no need for shifting - only casting, in C/C++ anyway).
The last part of the sentence is the important thing here:
since in hexadecimal the radix is a byte
After saying that, it doesn't matter what was said earlier...
Just to strengthen the argument, examine his example - the SortedBuffer initialization uses a byte as the radix (hence 256 entries), not a nibble:
memset(SortedBuffer, -1, 256*sizeof(int)); // Fill with -1
(again, that is what I understand from the article...)
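To illustrate why a byte radix means exactly 4 passes for 32-bit integers, here is a minimal counting-pass sketch in C. It is not Terdiman's code, just my own illustration of the "one byte per pass" idea; the function name and signature are made up:

#include <stdint.h>
#include <stddef.h>

/* One counting pass of a byte-radix sort: pass 0..3 selects one of the
   four bytes of each 32-bit key and uses it as the bucket index. */
void count_pass(const uint32_t *keys, size_t n, unsigned pass, size_t counts[256])
{
    unsigned shift = pass * 8;                        /* 0, 8, 16 or 24 */
    for (size_t i = 0; i < n; i++) {
        uint8_t digit = (keys[i] >> shift) & 0xFF;    /* the "digit" is a whole byte */
        counts[digit]++;
    }
}

With a nibble (4-bit) radix you would need 8 passes with 16 buckets each; with a byte radix you need 4 passes with 256 buckets each, which is the trade-off the quoted article goes with.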
I am developing a programming language, September, which uses a tagged variant type as its main value type. 3 bits are used for the type (integer, string, object, exception, etc.), and 61 bits are used for the actual value (the actual integer, pointer to the object, etc.).
Soon, it will be time to add a float type to the language. I almost have the space for a 64-bit double, so I wanted to make use of doubles for calculations internally. Since I'm actually 3 bits short for storage, I would have to round the doubles off after each calculation - essentially resulting in a 61-bit double with a mantissa or exponent shorter by 3 bits.
But! I know floating point is fraught with peril and doing things which sound sensible on paper can produce disastrous results with FP math, so I have an open-ended question to the experts out there:
Is this approach viable at all? Will I run into serious error-accumulation problems in long-running calculations by rounding at each step? Is there some specific way in which I could do the rounding in order to avoid that? Are there any special values that I won't be able to treat that way (subnormals come to mind)?
Ideally, I would like my floats to be as well-behaved as a native 61-bit double would be.
I would recommend borrowing bits from the exponent field of the double-precision format. This is the method described in this article (that you would modify to borrow 3 bits from the exponent instead of 1). With this approach, all computations that do not use very large or very small intermediate results behave exactly as the original double-precision computation would. Even computations that run into the subnormal region of the new format behave exactly as they would if a 1+8+52 61-bit format had been standardized by IEEE.
By contrast, naively borrowing any number of bits at all from the significand introduces many double-rounding problems, which become all the more frequent the fewer bits you remove from the 52-bit significand. Borrowing one bit from the significand, as you suggest in an edit to your question, would be the worst, with half the operations statistically producing double-rounded results that are different from what the ideal “native 61-bit double” would have produced. This means that instead of being accurate to 0.5 ULP, the basic operations would be accurate to 3/4 ULP, a dramatic loss of accuracy that would derail many of the existing, finely designed numerical algorithms that expect 0.5 ULP.
Three is a significant number of bits to borrow from an exponent that only has 11, though, and you could also consider using the single-precision 32-bit format in your language (calling the single-precision operations from the host).
Lastly, I give visibility here to another solution, found by Jakub: borrow the three bits from the significand, and simulate round-to-odd for the intermediate double-precision computation before converting to the nearest number in the 49-explicit-significand-bit, 11-exponent-bit format. If this way is chosen, it may be useful to remark that the rounding itself to 49 bits of significand can be achieved with the following operations:
if ((repr & 7) == 4)
    repr += (repr & 8) >> 1;   /* midpoint case */
else
    repr += 4;
repr &= ~(uint64_t)7;          /* round to the nearest */
Although it operates on the integer that has the same representation as the double under consideration, the above snippet works even if the number goes from normal to subnormal, from subnormal to normal, or from normal to infinite. You will of course want to set a tag in the three bits that have been freed as above. To recover a standard double-precision number from its unboxed representation, simply clear the tag with repr &= ~(uint64_t)7;.
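Put together, a boxing/unboxing pair along these lines might look like the sketch below. The names box_double and unbox_double are hypothetical, not part of the answer, and error handling is omitted:

#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: store a double in a 64-bit word whose 3 low bits hold a type tag. */
static uint64_t box_double(double d, uint64_t tag)   /* tag in 0..7 */
{
    uint64_t repr;
    memcpy(&repr, &d, sizeof repr);
    if ((repr & 7) == 4)
        repr += (repr & 8) >> 1;   /* midpoint case */
    else
        repr += 4;
    repr &= ~(uint64_t)7;          /* round to 49 significand bits */
    return repr | tag;             /* the freed bits carry the tag */
}

static double unbox_double(uint64_t boxed)
{
    double d;
    boxed &= ~(uint64_t)7;         /* clear the tag */
    memcpy(&d, &boxed, sizeof d);
    return d;
}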
This is a summary of my own research and information from the excellent answer by @Pascal Cuoq.
There are two places where we can truncate the 3-bits we need: the exponent, and the mantissa (significand). Both approaches run into problems which have to be explicitly handled in order for the calculations to behave as if we used a hypothetical native 61-bit IEEE format.
Truncating the mantissa
We shorten the mantissa by 3 bits, resulting in a 1s+11e+49m format. When we do that, performing calculations in double-precision and then rounding after each computation exposes us to double rounding problems. Fortunately, double rounding can be avoided by using a special rounding mode (round-to-odd) for the intermediate computations. There is an academic paper describing the approach and proving its correctness for all doubles - as long as we truncate at least 2 bits.
A portable implementation in C99 is straightforward. Since round-to-odd is not one of the available rounding modes, we emulate it by using fesetround(FE_TOWARDZERO) and then setting the last bit of the significand if the FE_INEXACT exception occurs. After computing the final double this way, we simply round to nearest for storage.
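As a minimal sketch of that emulation for a single addition (my own code, assuming the platform honours the C99 floating-point environment; some compilers additionally require #pragma STDC FENV_ACCESS ON):

#include <fenv.h>
#include <stdint.h>
#include <string.h>

/* Add two doubles with round-to-odd: truncate toward zero, then force the
   last significand bit to 1 whenever the result was inexact. */
static double add_round_to_odd(double a, double b)
{
    int saved = fegetround();
    fesetround(FE_TOWARDZERO);
    feclearexcept(FE_INEXACT);
    double r = a + b;
    int inexact = fetestexcept(FE_INEXACT);
    fesetround(saved);

    if (inexact) {
        uint64_t repr;
        memcpy(&repr, &r, sizeof repr);
        repr |= 1;                     /* make the significand odd */
        memcpy(&r, &repr, sizeof r);
    }
    return r;
}

The result of a chain of such intermediate operations is then rounded once to the 49-bit significand (for example with the masking snippet quoted above) when it is stored.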
The format of the resulting float loses about 1 significant (decimal) digit compared to a full 64-bit double (from 15-17 digits to 14-16).
Truncating the exponent
We take 3 bits from the exponent, resulting in a 1s+8e+52m format. This approach (applied to a hypothetical introduction of 63-bit floats in OCaml) is described in an article. Since we reduce the range, we have to handle out-of-range exponents on both the positive side (by simply 'rounding' them to infinity) and the negative side. Doing this correctly on the negative side requires biasing the inputs to any operation in order to ensure that we get subnormals in the 64-bit computation whenever the 61-bit result needs to be subnormal. This has to be done a bit differently for each operation, since what matters is not whether the operands are subnormal, but whether we expect the result to be (in 61-bit).
The resulting format has a significantly reduced range, since we borrow a whopping 3 of the 11 exponent bits. The range goes down from roughly 10^-308...10^308 to about 10^-38...10^38. That seems OK for computation, but we still lose a lot.
Comparison
Both approaches yield a well-behaved 61-bit float. I'm personally leaning towards truncating the mantissa, for three reasons:
the "fix-up" operations for round-to-odd are simpler, do not differ from operation to operation, and can be done after the computation
there is a proof of mathematical correctness of this approach
giving up one significant digit seems less impactful than giving up a big chunk of the double's range
Still, for some uses, truncating the exponent might be more attractive (especially if we care more about precision than range).
This is something I have been wondering about while reading programming books and in my computer science class at school, where we learned how to convert decimal values into hexadecimal.
Can someone please tell me what the advantages of using hexadecimal values are, and why we use them in programming?
Thank you.
In many cases (e.g. bit masks) you need to use binary, but binary is hard to read because of its length. Since hexadecimal values can be translated to/from binary much more easily than decimal values, you can look at hex as a kind of shorthand notation for binary.
It certainly depends on what you're doing.
It comes as an extension of base 2, which you are probably familiar with as essential to computing.
Check this out for a good discussion of several applications:
https://softwareengineering.stackexchange.com/questions/170440/why-use-other-number-bases-when-programming/
Each hexadecimal digit corresponds 1:1 to a pattern of 4 bits. With experience, you can map them from memory: e.g. 0x8 = 1000, 0xF = 1111, and correspondingly 0x8F = 10001111.
This is a convenient shorthand wherever the bit patterns do matter, e.g. in bit maps or when working with I/O ports. Visualizing the bit pattern of the decimal value 169 is, in comparison, much more difficult.
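A small illustration in C (hypothetical device-register masks, my own example) of why the hex form is easier to read when the bit pattern is what matters:

#include <stdio.h>

#define READY_MASK 0x80   /* binary 1000 0000: "ready" bit             */
#define ERROR_MASK 0x0F   /* binary 0000 1111: low nibble = error code */

int main(void)
{
    unsigned status = 0x8F;                /* binary 1000 1111 */
    if (status & READY_MASK)
        printf("ready, error code %u\n", status & ERROR_MASK);
    return 0;
}

Written in decimal, the same masks would be 128 and 15, which says nothing about which bits they cover.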
A byte consists of 8 binary digits and is the smallest piece of data that computers normally work with. All other variables a computer works with are constructed from bytes. For example, a single character can be stored in a single byte, and a 32-bit integer consists of 4 bytes.
As bytes are so fundamental we want a way to write down their value as neatly and efficiently as possible. One option would be to use binary, but then we would need a lot of digits. This takes up a lot of space and can be confusing when many numbers are written in sequence:
200 201 202 == 11001000 11001001 11001010
Using hexadecimal notation, we can write every byte using just two digits:
200 == C8
Also, as 16 is a power of 2, it is easy to convert between hexadecimal and binary representations in your head. This is useful as sometimes we are only interested in a single bit within the byte. As a simple example, if the first digit of a hexadecimal representation is 0 we know that the first four binary digits are 0.
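For instance, a couple of lines of C (my own illustration) reproduce the shorthand above:

#include <stdio.h>

int main(void)
{
    unsigned char bytes[] = { 200, 201, 202 };
    for (int i = 0; i < 3; i++)
        printf("%02X ", bytes[i]);   /* prints: C8 C9 CA */
    return 0;
}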
It is my understanding that numbers are negated using two's complement, which as I understand it is: ~num + 1.
So my question is: does this mean that, for an 8-bit variable 'foo' = 1, a negated 'foo' will be exactly the same as a variable 'bar' = 255?
If we were to check whether -'foo' == 'bar' or whether -'foo' == 255, would we get that they are equal?
I know that some languages, such as Java, keep a sign bit -- so the comparisons would yield false. What of languages that do not? And I'm assuming that assembler/native machine does not have a sign bit.
In addition to all of this, I read about a zero flag or a carry flag that is set when a 'negative' number is added to another number (of any sign). This flag is set in such an addition because of the way two's complement works: 0x01 + 0xff = 0x00 (with the leading 1 truncated). What exactly is this flag used for?
And my last question: for other math operations (such as multiplication), would I have to re-negate the number (so it is now positive), perform the operation, and negate the result? E.g., ~((~neg + 1) * pos) + 1.
Edit
Finished the question, so feel free to fire away.
Yes, in two's complement, the number x is represented as ~x+1, where ~x is the bitwise complement of the binary numeral for x in some fixed number of bits. E.g., for eight bits and x = 1, the binary numeral is 00000001, so the bitwise complement is 11111110, and adding one produces 11111111.
There is no way to distinguish -1 in eight-bit two’s complement from 255 in eight-bit binary (with no sign). They both have the same representation in bits: 11111111. If you are using both of these numbers, you must either separately remember which one is eight-bit two’s complement and which one is plain eight-bit binary or you must use more than eight bits. In other words, at the raw bit level, 11111111 is just eight bits; it has no value until we decide how to interpret it.
Java and typical other languages do not maintain a sign bit separate from the value of a number; the sign is part of the encoding of the number. Also, typical languages do not allow you to compare different types. If you have a two’s complement x and an unsigned y, then either one must be converted to the type of the other before comparison or they must both be converted to a third type. Thus, if you compare x and y, and one is converted to the other, then the conversion will overflow or wrap, and you cannot expect to get the correct mathematical result. To compare these two numbers, we might convert each of them to a wider integer, such as 32-bits, then compare. Converting the eight-bit two’s complement 11111111 to a 32-bit integer produces -1, and converting the eight-bit plain binary 11111111 to a 32-bit integer produces 255, and then the comparison reports they are unequal.
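That last point can be seen directly in C, using int8_t and uint8_t as stand-ins for "eight-bit two's complement" and "plain eight-bit binary" (my own illustration, not part of the answer):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int8_t  x = -1;    /* stored as the bits 11111111 */
    uint8_t y = 255;   /* also stored as the bits 11111111 */

    /* Same raw byte... */
    printf("same bits: %d\n", memcmp(&x, &y, 1) == 0);   /* prints 1 */

    /* ...but widened to int before comparison, they are different values. */
    printf("equal as values: %d\n", (int)x == (int)y);   /* prints 0 (-1 != 255) */
    return 0;
}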
The zero flag and the carry flag you read about are flags that are set when a comparison (or other arithmetic) instruction is executed in a computer processor. Most high-level languages do not give you direct access to these flags. Many processors have an instruction with a form like this:
cmp a, b
That instruction subtracts b from a and discards the difference but remembers several flags that describe the subtraction: Was the result zero (zero flag)? Did a borrow occur (borrow flag)? Was the result negative (sign flag)? Did an overflow occur (overflow flag)?
The compare instruction requires that the two things being compared be the same type (two’s complement or unsigned), but it does not care which type. The results can be tested later by checking particular combinations of the flags depending on the type. That is, the information recorded in the flags can distinguish whether one two’s complement number was greater than another or whether one unsigned number was greater than another, depending on what tests are made. There are conditional branch instructions that test the desired flag properties.
There is generally no need to “un-negate” a number to perform arithmetic operations. Processors include arithmetic instructions that work on two’s complement numbers. Usually the add and subtract instructions are type-agnostic, the same way the compare instruction is, but the multiply and divide instructions are not (except for some forms of multiply that return partial results). The add and subtract instructions can be type-agnostic because the wrapping that occurs in the arithmetic works for both two’s complement and unsigned. However, that wrapping does not work for multiplication and division.
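A short C illustration (my own, not from the answer) of why add/subtract can be type-agnostic while full-width multiply and divide cannot:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t  s = -2;    /* bit pattern 11111110 */
    uint8_t u = 254;   /* same bit pattern     */

    /* Addition wraps identically: the low 8 bits of the result are the same. */
    printf("add: 0x%02X 0x%02X\n",
           (unsigned)(uint8_t)(s + 3), (unsigned)(uint8_t)(u + 3));  /* 0x01 0x01 */

    /* Widening multiplication and division depend on the interpretation. */
    printf("mul: %d %d\n", s * 3, u * 3);   /* -6  762 */
    printf("div: %d %d\n", s / 2, u / 2);   /* -1  127 */
    return 0;
}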
I'm trying to get into assembler and I often come across numbers in the following form:
org 7c00h
; initialize the stack:
mov ax, 07c0h
mov ss, ax
mov sp, 03feh ; top of the stack.
7c00h, 07c0h, 03feh - What is the name of this number notation? What do they mean? Why are they used over "normal" decimal numbers?
It's hexadecimal, the numeral system with the 16 digits 0-9 and A-F (the trailing h marks a hexadecimal literal in many assemblers). Memory addresses are given in hex because it's shorter and easier to read, and the numbers that represent memory locations don't mean anything special to humans, so there is no sense in using long numbers. I would guess that somewhere in the past someone had to type in some addresses by hand as well; might as well have started there.
Worth noting also, 0:7C00 is the boot sector load address.
Further worth noting: 07C0:03FE is the same address as 0:7FFE due to the way segmented addressing works.
This guy's left himself a 510-byte stack (he made the very typical off-by-two error in setting up the boot sector's stack).
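For reference, real-mode segment:offset pairs map to linear addresses as segment * 16 + offset, which a couple of lines of C (my own illustration) make concrete:

#include <stdio.h>

int main(void)
{
    /* 07C0:03FE  ->  0x07C0 * 0x10 + 0x03FE  =  0x7C00 + 0x3FE  =  0x7FFE */
    unsigned linear = 0x07C0 * 0x10 + 0x03FE;
    printf("0x%X\n", linear);   /* prints 0x7FFE, i.e. the same address as 0:7FFE */
    return 0;
}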
These are numbers in hexadecimal notation, i.e. in base 16, where A to F have the digit values 10 to 15.
One advantage is that there is a more direct conversion to binary numbers. With a little bit of practice it is easy to see which bits in the number are 1 and which are 0.
Another is that many numbers used internally, such as memory addresses, are round numbers in hexadecimal, i.e. they contain a lot of zeros.
Greetings, everybody. I have seen examples of such operations so many times that I am beginning to think I am getting something wrong about binary arithmetic. Is there any sense in performing the following:
byte value = someAnotherByteValue & 0xFF;
I don't really understand this, because it does not change anything anyway. Thanks for the help.
P.S.
I tried to search for information both here and elsewhere, but without success.
EDIT:
Well, of course I assume that someAnotherByteValue is 8 bits long; the problem is that I don't get why so many people (I mean professionals) use such things in their code. For example, in SharpZlib there is:
buffer_ |= (uint)((window_[windowStart_++] & 0xff |
(window_[windowStart_++] & 0xff) << 8) << bitsInBuffer_);
where window_ is a byte buffer.
The most likely reason is to make the code more self-documenting. In your particular example, it is not the size of someAnotherByteValue that matters, but rather the fact that value is a byte. This makes the & redundant in every language I am aware of. But, to give an example of where it would be needed, if this were Java and someAnotherByteValue was a byte, then the line int value = someAnotherByteValue; could give a completely different result than int value = someAnotherByteValue & 0xff. This is because Java's long, int, short, and byte types are signed, and the rules for conversion and sign extension have to be accounted for.
If you always use the idiom value = someAnotherByteValue & 0xFF then, no matter what the types of the variable are, you know that value is receiving the low 8 bits of someAnotherByteValue.
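The same sign-extension effect can be reproduced in C with a signed char, which behaves like Java's byte in this respect (my own illustration, not part of the answer):

#include <stdio.h>

int main(void)
{
    signed char b = -100;          /* bit pattern 1001 1100, i.e. 0x9C */

    int without_mask = b;          /* sign-extended: still -100 */
    int with_mask    = b & 0xFF;   /* only the low 8 bits: 156  */

    printf("%d %d\n", without_mask, with_mask);   /* prints: -100 156 */
    return 0;
}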
uint s1 = (uint)(initial & 0xffff);
There is a point to this, because uint is 32 bits wide while the mask 0xffff covers only 16 bits. The line selects the 16 least significant bits of initial.
Nope, there is no use in doing this. If you were dealing with a value wider than 8 bits, then the statement above would have some meaning. Otherwise, it's the same as the input.
If someAnotherByteValue is wider than 8 bits and you want to extract the least significant 8 bits from it, then it makes sense. Otherwise, there is no use.
No, there is no point so long as you are dealing with a byte. If value was a long then the lower 8 bits would be the lower 8 bits of someAnotherByteValue and the rest would be zero.
In a language like C++ where operators can be overloaded, it's possible but unlikely that the & operator has been overloaded. That would be pretty unusual and bad practice though.
EDIT: Well, of course I assume that someAnotherByteValue is 8 bits long; the problem is that I don't get why so many people (I mean professionals) use such things in their code. For example, in Jon Skeet's MiscUtil there is:
uint s1 = (uint)(initial & 0xffff);
where initial is int.
In this particular case, the author might be trying to convert an int to a uint. The & with 0xffff would ensure that it still converts the lowest 2 bytes, even on a system whose int type is not 2 bytes wide.
To be picky, there is no guarantee regarding a machine's byte size. There is no reason to assume, in an extremely portable program, that a byte is 8 bits wide. To the best of my memory, according to the C standard (for example), a char is one byte, short is at least as wide as char, int is at least as wide as short, long is at least as wide as int, and so on. Hence, theoretically there can be a compiler where a byte is, say, 16 bits wide and an int is just one such byte. To ensure your program behaves the same on that machine, you need to use this (seemingly redundant) coding style.
"Byte" # Wikipedia gives examples for such peculiar architectures.