Julia bitstring function and byte order

I am running Julia 1.0.2 under Windows 8.1.
The following leads me to believe Julia treats my machine in "little-endian" fashion:
julia> VERSION
v"1.0.2"
julia> ENDIAN_BOM
0x04030201
help?> ENDIAN_BOM
search: ENDIAN_BOM
ENDIAN_BOM
The 32-bit byte-order-mark indicates the native byte order of the host machine. Little-endian
machines will contain the value 0x04030201. Big-endian machines will contain the value
0x01020304.
Based on the above, the bitstring examples below make sense to me. Both have the least significant byte first, on the left, as I would expect for little-endian byte order:
julia> bitstring(1.0)
"0011111111110000000000000000000000000000000000000000000000000000"
julia> bitstring(Char(1))
"00000001000000000000000000000000"
However, the following example seems to be in big-endian order; the least significant byte is on the right:
julia> bitstring(1)
"0000000000000000000000000000000000000000000000000000000000000001"
Am I confused? Any suggestions or explanations?

bitstring does not expose host byte order: it always prints the bits of the value, most significant bit first. In other words,
julia> bitstring(1)
"0000000000000000000000000000000000000000000000000000000000000001"
is machine-independent, but
julia> bitstring(hton(1))
"0000000100000000000000000000000000000000000000000000000000000000"
reflects your architecture. Insert hton and ntoh calls if you parse packets.
This is because bitstring is mostly used for sanity-checking code that works with flags, and << et al. operate on the value as the host holds it.
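For example, here is a minimal sketch of pulling a 32-bit field out of network-order bytes; the byte values are made up for illustration:
# A 32-bit field arrives in network (big-endian) order; ntoh turns it into an
# ordinary host-order integer. On a little-endian machine the two bitstrings
# below differ; on a big-endian machine they are identical.
bytes = UInt8[0x00, 0x00, 0x01, 0xf4]     # the value 500, big-endian on the wire
raw   = reinterpret(UInt32, bytes)[1]     # the bytes as laid out in host memory
value = ntoh(raw)                         # UInt32(500) on any machine
bitstring(raw)
bitstring(value)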

In your question there are two separate issues.
Char representation is a custom design decision in Julia: the UTF-8 representation of a character is padded with zeros on the right to fill 4 bytes (a UInt32); you can see how the conversion happens, e.g., in the Char(u::UInt32) method definition.
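A quick way to see this layout in the REPL (the UTF-8 encoding of 'α' is the two bytes 0xCE 0xB1; 'α' is just an arbitrary example):
julia> bitstring('α')
"11001110101100010000000000000000"
julia> reinterpret(UInt32, 'α')
0xceb10000
julia> codepoint('α')
0x000003b1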
For 1.0 and 1 you can see their little-endian and big-endian representations using the htol and hton functions, and you get:
julia> bitstring(htol(1))
"0000000000000000000000000000000000000000000000000000000000000001"
julia> bitstring(hton(1))
"0000000100000000000000000000000000000000000000000000000000000000"
julia> bitstring(htol(1.0))
"0011111111110000000000000000000000000000000000000000000000000000"
julia> bitstring(hton(1.0))
"0000000000000000000000000000000000000000000000001111000000111111"
and all is consistent.
EDIT: see the explanation in the other answer of what bitstring exactly does, as it is relevant.

Related

Is there a way to do a right bit-shift on a BigInt in Rust?

I get this error when attempting to do >> or >>= on a BigInt:
no implementation for `BigInt >> BigInt`
using the num_bigint::BigInt library
Edit: More Context:
I am rewriting this program https://www.geeksforgeeks.org/how-to-generate-large-prime-numbers-for-rsa-algorithm/ from Python/C++ into Rust; I will focus on the Python implementation, as it is written to handle 1024-bit prime numbers, which are extremely big.
In the code we run the Miller-Rabin primality test, which includes shifting EC (prime-candidate - 1) to the right by 1 if we find that EC % 2 == 0. As I mentioned, in the Python implementation EC can be an incredibly large integer.
It would be convenient to be able to use the same operator in Rust; if that is not possible, can someone suggest an alternative?
According to the documentation for the num-bigint crate, the BigInt struct does implement the Shr trait for the right-shift operator, just not when the shift amount is itself a BigInt. If you convert the shift amount to a standard integer type (e.g. i64) then it should work.
It is unlikely you would ever want to shift by an amount greater than i64::MAX, but if you do need this, then the correct result of shifting a non-negative value is going to be zero (because no computer has 2^60 bytes of memory), so you can write a simple implementation which checks for that case.
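A minimal sketch of that suggestion, assuming num-bigint 0.3 or later (where Shr is implemented for primitive integer shift amounts) plus num-traits for the conversion; shift_right and its arguments are made up for illustration and assume a non-negative shift amount:
use num_bigint::BigInt;
use num_traits::ToPrimitive;

// Right-shift a non-negative BigInt by a BigInt amount by converting the
// amount to i64 first, as suggested above.
fn shift_right(ec: BigInt, amount: &BigInt) -> BigInt {
    match amount.to_i64() {
        Some(n) => ec >> n,          // Shr<i64> is implemented for BigInt
        None => BigInt::from(0u32),  // amount exceeds i64::MAX: the result is 0 anyway
    }
}
In the Miller-Rabin loop itself the shift amount is just the literal 1, so something like ec >>= 1u32; should compile without any conversion at all.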

Convert binary to decimal in Julia

I'd like to convert binary to decimal in Julia. It looks like parseint() became deprecated.
Is the below method the best way to do this?
julia> parse(Int,"111",2)
7
Are you starting with a string? Then yes, that's the way. If you're just wanting to write a constant in binary, then it's much easier to just use the 0b111 syntax. By default, it constructs an unsigned integer (which is displayed in hexadecimal), but you can easily convert it to a signed integer with Int(0b111).
julia> 0b110111
0x37
julia> Int(0b110111)
55
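For reference, since Julia 0.7 (including the 1.0.2 used in the main question) the base is passed as a keyword argument, and string converts back the other way:
julia> parse(Int, "111", base=2)
7
julia> string(55, base=2)
"110111"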

How does a processor calculate bigger than its register value?

So far I have learned that a processor has registers; for a 32-bit processor they are 32 bits, and for a 64-bit processor they are 64 bits. Can someone explain what happens if I give the processor a value larger than its register size? How is the calculation performed?
It depends.
Assuming x86 for the sake of discussion, 64-bit integers can still be handled "natively" on a 32-bit architecture. In this case, the program often uses a pair of 32-bit registers to hold the 64-bit value. For example, the value 0xDEADBEEF2B84F00D might be stored in the EDX:EAX register pair:
eax = 0x2B84F00D
edx = 0xDEADBEEF
The CPU actually expects 64-bit numbers in this format in some cases (IDIV, for example).
Math operations are done in multiple instructions. For example, a 64-bit add on a 32-bit x86 CPU is done with an add of the lower DWORDs, and then an adc of the upper DWORDs, which takes into account the carry flag from the first addition.
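That add/adc sequence can be sketched as a toy model in Julia (the language of the main question); add64 is a made-up name and this is purely an illustration, not how any real compiler emits the operation:
# A 64-bit add built from 32-bit halves, mirroring add followed by adc.
function add64(alo::UInt32, ahi::UInt32, blo::UInt32, bhi::UInt32)
    lo    = alo + blo                          # like `add`: wraps modulo 2^32
    carry = lo < alo ? UInt32(1) : UInt32(0)   # the carry flag: did the low half wrap?
    hi    = ahi + bhi + carry                  # like `adc`: add with carry
    return lo, hi                              # (low dword, high dword) of the sum
end

# 0x00000000FFFFFFFF + 1 == 0x0000000100000000
add64(0xffffffff, 0x00000000, 0x00000001, 0x00000000)   # gives (0x00000000, 0x00000001)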
For even bigger integers, an arbitrary-precision arithmetic (or "big int") library is used. Here, a dynamically-sized array of native words is used to represent the integer, along with additional information (like the number of bits used). GMP is a popular choice.
Mathematical operations on big integers are done iteratively, typically one native word-size chunk at a time. For the gory details, I suggest you have a look through the source code of one of these open-source libraries.
The key to all of this is that numeric operations are carried out in manageable pieces and combined to produce the final result.
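Julia's built-in BigInt (which wraps GMP) is a convenient way to see such a library in action:
julia> n = big(2)^1000 + 1;    # the `;` suppresses printing the 302-digit number
julia> ndigits(n, base=2)
1001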

Bad floating-point magic

I have a strange floating-point problem.
Background:
I am implementing a double-precision (64-bit) IEEE 754 floating-point library for an 8-bit processor with a large integer arithmetic co-processor. To test this library, I am comparing the values returned by my code against the values returned by Intel's floating-point instructions. These don't always agree, because Intel's Floating-Point Unit stores values internally in an 80-bit format, with a 64-bit mantissa.
Example (all in hex):
X = 4C816EFD0D3EC47E:
biased exponent = 4C8 (true exponent = 1C9), mantissa = 116EFD0D3EC47E
Y = 449F20CDC8A5D665:
biased exponent = 449 (true exponent = 14A), mantissa = 1F20CDC8A5D665
Calculate X * Y
The product of the mantissas is 10F5643E3730A17FF62E39D6CDB0, which when rounded to 53 (decimal) bits is 10F5643E3730A1 (because the top bit of 7FF62E39D6CDB0 is zero). So the correct mantissa in the result is 10F5643E3730A1.
But if the computation is carried out with a 64-bit mantissa, 10F5643E3730A17FF62E39D6CDB0 is rounded up to 10F5643E3730A1800, which when rounded again to 53 bits becomes 10F5643E3730A2. The least significant digit has changed from 1 to 2.
To sum up: my library returns the correct mantissa 10F5643E3730A1, but the Intel hardware returns (correctly) 10F5643E3730A2, because of its internal 64-bit mantissa.
The problem:
Now, here's what I don't understand: sometimes the Intel hardware returns 10F5643E3730A1 in the mantissa! I have two programs, a Windows console program and a Windows GUI program, both built by Qt using g++ 4.5.2. The console program returns 10F5643E3730A2, as expected, but the GUI program returns 10F5643E3730A1. They are using the same library function, which has the three instructions:
fldl -0x18(%ebp)
fmull -0x10(%ebp)
fstpl 0x4(%esp)
And these three instructions compute a different result in the two programs. (I have stepped through them both in the debugger.) It seems to me that this might be something that Qt does to configure the FPU in its GUI startup code, but I can't find any documentation about this. Does anybody have any idea what's happening here?
The instruction stream of, and inputs to, a function do not uniquely determine its execution. You must also consider the environment that is already established in the processor at the time of its execution.
If you inspect the x87 control word, you will find that it is set in two different states, corresponding to your two observed behaviors. In one, the precision control [bits 9:8] has been set to 10b (53 bits). In the other, it is set to 11b (64 bits).
As to exactly what is establishing the non-default state, it could be anything that happens in that thread prior to the execution of your code. Any libraries that are pulled in are likely suspects. If you want to do some archaeology, the smoking gun is typically the fldcw instruction (though the control word can also be written by fldenv, frstor, and finit).
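For what it's worth, the two results can be reproduced in Julia (the language of the main question) by rounding once to 53 bits versus rounding first to a 64-bit mantissa and then to 53; the hex constants are the X and Y from the question, and the comments state the outcome reported there:
x = reinterpret(Float64, 0x4C816EFD0D3EC47E)
y = reinterpret(Float64, 0x449F20CDC8A5D665)

# One rounding straight to 53 bits: what the 8-bit library (and SSE2 hardware)
# produce for the product.
single = x * y

# Two roundings: first to a 64-bit mantissa, as in an x87 register with
# precision control set to 11b, then to 53 bits when stored as a double.
double = setprecision(BigFloat, 64) do
    Float64(BigFloat(x) * BigFloat(y))
end

string(reinterpret(UInt64, single), base=16)   # per the question, mantissa ends in ...a1
string(reinterpret(UInt64, double), base=16)   # per the question, mantissa ends in ...a2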
Normally it's a compiler setting. Check for example the following page for Visual C++:
http://msdn.microsoft.com/en-us/library/aa289157%28v=vs.71%29.aspx
or this document for intel:
http://cache-www.intel.com/cd/00/00/34/76/347605_347605.pdf
The Intel document in particular mentions some flags inside the processor that determine the behavior of the FPU instructions. This explains why the same code behaves differently in the two programs (one sets the flags differently from the other).

Assembler memory address representation

I'm trying to get into assembler and I often come across numbers in the following form:
org 7c00h
; initialize the stack:
mov ax, 07c0h
mov ss, ax
mov sp, 03feh ; top of the stack.
7c00h, 07c0h, 03feh - What is the name of this number notation? What do they mean? Why are they used over "normal" decimal numbers?
It's hexadecimal, the numeral system with the 16 digits 0-9 and A-F. Memory addresses are given in hex because it's shorter and easier to read, and the numbers that represent memory locations don't mean anything special to humans, so there's no sense in using long strings of digits. I would guess that somewhere in the past someone had to type in some addresses by hand as well; might as well have started there.
Worth noting also, 0:7C00 is the boot sector load address.
Further worth noting: 07C0:03FE is the same address as 0:7FFE due to the way segmented addressing works.
This guy's left himself a 510-byte stack (he made the very typical off-by-two error in setting up the boot sector's stack).
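The segment:offset arithmetic behind that is easy to check (the segment is shifted left by four bits and added to the offset), for example in the Julia REPL from the main question:
julia> 0x07c0 * 0x10 + 0x03fe
0x7ffe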
These are numbers in hexadecimal notation, i.e. in base 16, where A to F have the digit values 10 to 15.
One advantage is that there is a more direct conversion to binary numbers. With a little bit of practice it is easy to see which bits in the number are 1 and which are 0.
Another is that many numbers used internally, such as memory addresses, are round numbers in hexadecimal, i.e. they contain a lot of zeros.
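Each hex digit corresponds to exactly four bits, which is easy to verify, again using the Julia REPL from the main question as a convenient calculator:
julia> string(0x7c00, base=2)    # five 1-bits followed by ten 0-bits
"111110000000000"
julia> string(0x7c00, base=10)
"31744"
julia> 0x03fe == 0b0000001111111110    # 3 -> 0011, f -> 1111, e -> 1110
true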
