Interpret double written as hex

I am working with doubles (64-bit) stored in a file by printing the raw bytes from the C# representation. For example the decimal number 0.95238095238095233 is stored as the bytes "9e e7 79 9e e7 79 ee 3f" in hex. Everything works as expected, I can write and read, but I would like to be able to understand this representation of the double myself.
According to the C# documentation https://learn.microsoft.com/en-us/dotnet/api/system.double?view=netframework-4.7.2#explicit-interface-implementations and Wikipedia https://en.wikipedia.org/wiki/Double-precision_floating-point_format the first bit is supposedly the sign, with 0 for positive and 1 for negative numbers. However, no matter which direction I read my bytes in, the first bit is 1: either 9 = 1001 or f = 1111. I am puzzled, since 0.95... is positive.
As a double check, the following Python code returns the correct decimal number as well.
import binascii, struct
struct.unpack('d', binascii.unhexlify("9ee7799ee779ee3f"))   # (0.9523809523809523,)
Can anyone explain how a human can read these bytes and get to 0.95238095238095233?

Figured it out: the bytes are stored little-endian, so the most significant byte comes last in the file, while the bits within each byte are written in the usual order (most significant bit on the left). So I should start reading at the last byte, "3F", and read it left to right, giving the bits 0011 1111, and so on. This gives the IEEE 754 encoding as expected: the first bit is the sign, the next 11 bits the exponent, and the remaining 52 bits the fraction.
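For anyone wanting to reproduce this by hand, here is a small Python sketch (my own illustration, not from the question) that interprets the eight file bytes as a little-endian 64-bit integer and splits off the sign, exponent and fraction fields:
import struct

raw = bytes.fromhex("9ee7799ee779ee3f")      # bytes exactly as stored in the file

# Little-endian: the last file byte (0x3f) is the most significant byte.
bits = int.from_bytes(raw, byteorder="little")

sign = bits >> 63                  # 1 bit
exponent = (bits >> 52) & 0x7FF    # 11 bits, biased by 1023
fraction = bits & ((1 << 52) - 1)  # 52 bits

value = (-1) ** sign * (1 + fraction / 2 ** 52) * 2 ** (exponent - 1023)
print(sign, exponent - 1023, value)          # 0 -1 0.9523809523809523

print(struct.unpack("<d", raw)[0])           # cross-check: 0.9523809523809523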

Related

number base decoded message

I have this piece of encoded message; it's homework but I can't solve it. The message is
IZWGCZZ2EBAUWRSVOJAU45DSOVCEOZKS
N5CHKQLSM5GGSQ2VNVIECUSEIU======
There is a hint saying: "The string is encoded using an unusual number base. The numbers 2 - 7 are represented and the letters A - Z are represented."
I have looked on the internet but couldn't find anything. If anyone could help me understand this problem and solve it, I would appreciate it.
Let's see: A-Z + 2-7 = 32 possible values.
32 values can be represented in 5 bits, so each character of the message encodes 5 bits.
To decode, those 5-bit groups have to be concatenated into one long bit string, which is then read as 8-bit ASCII.
Or, in other words: Base32 encoding.
So:
IZWGCZZ2EBAUWRSVOJAU45DSOVCEOZKSN5CHKQLSM5GGSQ2VNVIECUSEIU======
converts to:
Flag: AKFUrANtruDGeRoDuArgLiCUmPARDE
An online Base32 decoder can be used to test the decoding.
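For reference, the decoding can also be reproduced with Python's standard library; a minimal sketch using base64.b32decode (RFC 4648 Base32: A-Z map to 0-25, the digits 2-7 to 26-31, '=' is padding):
import base64

encoded = ("IZWGCZZ2EBAUWRSVOJAU45DSOVCEOZKS"
           "N5CHKQLSM5GGSQ2VNVIECUSEIU======")

decoded = base64.b32decode(encoded)   # 8 characters (40 bits) decode to 5 bytes
print(decoded.decode("ascii"))        # prints the flag string shown above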

The structure of Deflate compressed block

I have trouble understanding the Deflate algorithm (RFC 1951).
TL;DR: How do I parse the Deflate compressed block 4be4 0200?
I created a file containing a letter and a newline, a\n, and ran gzip a.txt. The resulting file a.txt.gz:
1f8b 0808 fe8b eb55 0003 612e 7478 7400
4be4 0200
07a1 eadd 0200 0000
I understand that the first line is the header with additional information, and the last line is the CRC32 plus the size of the input (RFC 1952). These two give me no trouble.
But how do I interpret the compressed block itself (the middle line)?
Here's hexadecimal and binary representation of it:
4be4 0200
0100 1011
1110 0100
0000 0010
0000 0000
As far as I understood, these bits:
Each block of compressed data begins with 3 header bits containing the following data:
first bit BFINAL
next 2 bits BTYPE
...somehow actually ended up at the end of the first byte: 0100 1011. (I'll skip the question of why anyone would call something a "header" when it actually sits at the tail of something else.)
The RFC contains something that, as far as I understand, is supposed to explain this:
Data elements are packed into bytes in order of
increasing bit number within the byte, i.e., starting
with the least-significant bit of the byte.
Data elements other than Huffman codes are packed
starting with the least-significant bit of the data
element.
Huffman codes are packed starting with the most-
significant bit of the code.
In other words, if one were to print out the compressed data as
a sequence of bytes, starting with the first byte at the
right margin and proceeding to the left, with the most-
significant bit of each byte on the left as usual, one would be
able to parse the result from right to left, with fixed-width
elements in the correct MSB-to-LSB order and Huffman codes in
bit-reversed order (i.e., with the first bit of the code in the
relative LSB position).
But sadly I don't understand that explanation.
Returning to my data. OK, so BFINAL is set, and BTYPE is what? 10 or 01?
How do I interpret the rest of the data in that compressed block?
First, let's look at the hexadecimal representation of the compressed data as a series of bytes (instead of a series of 16-bit big-endian values as in your question):
4b e4 02 00
Now let's convert those hexadecimal bytes to binary:
01001011 11100100 00000010 00000000
According to the RFC, the bits are packed "starting with the least-significant bit of the byte". The least-significant bit of a byte is the right-most bit of the byte. So the first bit of the first byte is this one:
01001011 11100100 00000010 00000000
       ^
       first bit
The second bit is this one:
01001011 11100100 00000010 00000000
      ^
      second bit
The third bit:
01001011 11100100 00000010 00000000
     ^
     third bit
And so on. Once you've gone through all the bits in the first byte, you then start on the least-significant bit of the second byte. So the ninth bit is this one:
01001011 11100100 00000010 00000000
                ^
                ninth bit
And finally the last bit, the thirty-second bit, is this one:
01001011 11100100 00000010 00000000
                           ^
                           last bit
The BFINAL value is the first bit in the compressed data, and so is contained in the single bit marked "first bit" above. Its value is 1, which indicates that this is the last block in the compressed data.
The BTYPE value is stored in the next two bits of the data. These are the bits marked "second bit" and "third bit" above. The only question is which of the two is the least-significant bit and which is the most-significant bit. According to the RFC, "Data elements other than Huffman codes are packed starting with the least-significant bit of the data element." That means the first of these two bits, the one marked "second bit", is the least-significant bit. This means the value of BTYPE is 01 in binary, and so indicates that the block is compressed using fixed Huffman codes.
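As a quick check of that bit order, here is a minimal Python sketch (my own illustration, not part of the RFC) that pulls BFINAL and BTYPE out of the first byte 0x4b by shifting from the least-significant end:
first_byte = 0x4B                  # 0100 1011
bfinal = first_byte & 0x1          # bit 0 (least significant): 1, so this is the last block
btype = (first_byte >> 1) & 0x3    # bits 1-2, LSB first: 0b01, fixed Huffman codes
print(bfinal, bin(btype))          # prints: 1 0b1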
And that's the easy part done. Decoding the rest of the compressed block is more difficult (and with a more realistic example, much more difficult). Properly explaining how to do that would make this answer far too long (and your question too broad) for this site. I'll give you a hint though: the next three elements in the data are the Huffman codes 10010001 ('a'), 00111010 ('\n') and 0000000 (end of stream). The remaining 6 bits are unused, and aren't part of the compressed data.
Note that to understand how to decode Deflate compressed data you're going to have to understand what Huffman codes are. The RFC you're following assumes that you do. You should also know how LZ77 compression works, though the document more or less explains what you need to know.
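If you just want to verify the block rather than decode it by hand, Python's zlib module can inflate a raw Deflate stream; a small sketch (the negative wbits value tells zlib there is no zlib/gzip wrapper around the data):
import zlib

block = bytes.fromhex("4be40200")     # the middle line of the gzip file
data = zlib.decompress(block, wbits=-15)
print(data)                           # b'a\n'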

How to extract first 20 bits of Hexadecimal address?

I have the following hexadecimal 32-bit virtual address: 0x274201
How can I extract the first 20 bits, then convert them to decimal?
I wanted to know how to do this by hand.
Update:
@Pete855217 pointed out that the address 0x274201 is not 32 bit.
Also, 0x is not part of the address, as it is only used to signify
a hexadecimal value.
This means I should pad with zeros after 0x, so a true 32-bit address would be 0x00274201. I have updated my answer!
I believe I have answered my own question and I hope I am correct?
First, convert the HEX number 0x00274201 to BIN (this is the long way, but I learned something from it):
0000 0000 0010 0111 0100 0010 0000 0001
I noticed the first 20 bits are 00274 in HEX, which makes sense because every HEX digit is four BIN digits.
So, since I wanted the first 20 bits, I am really asking for the first five HEX digits, because 5 * 4 = 20 bits in BIN.
Thus this will yield 00274 in HEX = 628 in DEC (decimal).
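For what it's worth, the same result can be checked with a quick Python sketch: the first (most significant) 20 bits of a 32-bit value are what remains after shifting out the low 12 bits.
address = 0x00274201
top20 = address >> 12        # drop the low 12 bits, keeping the top 20
print(hex(top20), top20)     # prints: 0x274 628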

Arduino Arithmetic error negative result

I am trying to multiply 52 by 1000 and I am getting a negative result
int getNewSum = 52 * 1000;
but this code is outputting a negative result: -13536
An explanation of how two's complement representation works is probably better given on Wikipedia and other places than here. What I'll do here is to take you through the workings of your exact example.
The int type on your Arduino is represented using sixteen bits, in a two's complement representation (bear in mind that other Arduinos use 32 bits for it, but yours is using 16.) This means that both positive and negative numbers can be stored in those 16 bits, and if the leftmost bit is set, the number is considered negative.
What's happening is that you're overflowing the number of bits you can use to store a positive number, and (accidentally, as far as you're concerned) setting the sign bit, thus indicating that the number is negative.
In 16 bits on your Arduino, decimal 52 would be represented in binary as:
0000 0000 0011 0100
(2^5 + 2^4 + 2^2 = 52)
However, the result of multiplying 52 by 1,000 -- 52,000 -- will end up overflowing the magnitude bits of an int, putting a '1' in the sign bit on the end:
*----This is the sign bit. It's now 1, so the number is considered negative.
1100 1011 0010 0000
(typically, computer integer arithmetic and associated programming languages don't protect you against doing things like this, for a variety of reasons, mostly related to efficiency, and mostly now historical.)
Because that sign bit on the left-hand end is set, to convert that number back into decimal from its assumed two's complement representation, we assume it's a negative number, and then first take the one's complement (flipping all the bits):
0011 0100 1101 1111
-- which represents 13,535 -- and add one to it, yielding 13,536, and call it negative: -13,536, the value you're seeing.
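To see that arithmetic numerically, here is a small Python sketch that mimics what the 16-bit int does (Python integers don't overflow, so the 16-bit truncation and the two's complement reinterpretation are done by hand):
result = 52 * 1000                 # 52000, too large for a signed 16-bit int
low16 = result & 0xFFFF            # keep only the low 16 bits, as the Arduino int does: 0xCB20
# If the sign bit (bit 15) is set, reinterpret as a negative two's complement value.
signed = low16 - 0x10000 if low16 & 0x8000 else low16
print(signed)                      # prints: -13536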
If you read up on two's complement/integer representations in general, you'll get the hang of it.
In the meantime, this probably means you should be looking for a bigger type to store your number. Arduino has unsigned integers, and a long type, which will use four bytes to store your numbers, giving you a range from -2,147,483,648 to 2,147,483,647. If that's enough for you, you should probably just switch to use long instead of int.
Matt's answer is already a very good in-depth explanation of what's happening, but for those looking for a more TL;DR practical answer:
Problem:
This happens quite often to Arduino programmers when they assign the result of an arithmetic expression (usually a multiplication) to a normal integer (int). As mentioned, when the result is bigger than the range the compiler has assigned to the variable's type, overflow happens.
Solution 1:
The easiest solution is to replace the int type with a bigger data type, depending on your needs. As the tutorialspoint.com Arduino tutorial explains, there are different integer types we can use:
int:
16 bit: from -32,768 to 32,767
32 bit: from -2,147,483,648 to 2,147,483,647
unsigned int: from 0 to 65,535
long: from -2,147,483,648 to 2,147,483,647
unsigned long: from 0 to 4,294,967,295
Solution 2:
This works only if your arithmetic contains divisions with big enough denominators. The compiler evaluates a * b / c left to right, so the multiplication is calculated before the division; if you have divisions in your expression, try encapsulating them in parentheses. For example, if you have a * b / c, replace it with a * (b / c).

TripleDES Decryption truncating last character

I have a .NET class that implements TripleDES encryption and decryption. The code is too much to post here. However, the problem is that while encryption is OK, decryption is inconsistent depending on the length of the original plaintext. I know that encryption is OK since other Triple DES tools also give the same value.
Specifically, the last character is being cut off from the resulting plain text if the length of the original plaintext was 8, 16, 24, 32, 40, etc., i.e. 8n.
The encryption mode is CBC
The key size is 24 chars (192 bits)
The IV is 8 chars
The problem is that the (un)padding algorithm is not correct.
(3)DES encrypts/decrypts blocks of 8 bytes. As not all texts are precisely a multiple of 8 bytes, the last block must contain bytes that are not part of the original plain text. The trick is to find out which byte is the last one of the plain text. Sometimes the length of the plain text is known beforehand; then the padding bytes can be anything, really.
If the length of the plain text is not known then a deterministic padding algorithm must be used, e.g. PKCS5Padding. PKCS5Padding always performs padding, even if the plaintext is N * blocksize in bytes. The reason for this is simple: otherwise it doesn't know if the last byte is plain text or padding: 41 41 41 41 41 41 41 41 08 08 08 08 08 08 08 08 would be 8 'A' characters, with 8 padding bytes.
It seems that either the unpadding algorithm is not well implemented, or that a non-deterministic padding algorithm is deployed.
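As an illustration of that scheme, here is a minimal Python sketch of PKCS5-style padding for an 8-byte block cipher (not the asker's .NET code), showing why a plaintext whose length is already a multiple of 8 still gets a full extra block:
BLOCK = 8  # (3)DES block size in bytes

def pad(data):
    n = BLOCK - len(data) % BLOCK        # always 1..8 bytes of padding
    return data + bytes([n] * n)

def unpad(data):
    return data[:-data[-1]]              # the last byte says how many bytes to strip

padded = pad(b"AAAAAAAA")                # 8 'A's -> a full block of 0x08 bytes is appended
print(padded.hex(" "))                   # 41 41 41 41 41 41 41 41 08 08 08 08 08 08 08 08 (Python 3.8+ for the separator)
print(unpad(padded))                     # b'AAAAAAAA'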
