The structure of Deflate compressed block - unix

I'm having trouble understanding the Deflate algorithm (RFC 1951).
TL;DR: How do I parse the Deflate compressed block 4be4 0200?
I created a file containing a letter and a newline, a\n, and ran gzip a.txt. The resulting file a.txt.gz:
1f8b 0808 fe8b eb55 0003 612e 7478 7400
4be4 0200
07a1 eadd 0200 0000
I understand that the first line is the header with additional information, and the last line is the CRC32 plus the size of the input (RFC 1952, the gzip wrapper). These two give me no trouble.
But how do I interpret the compressed block itself (the middle line)?
Here's the hexadecimal and binary representation of it:
4be4 0200
0100 1011
1110 0100
0000 0010
0000 0000
As far as I understood, somehow these bits:
Each block of compressed data begins with 3 header bits containing the following data:
first bit BFINAL
next 2 bits BTYPE
...actually ended up at the end of the first byte: 0100 1011. (I'll skip the question of why anyone would call something a "header" when it actually sits at the tail of something else.)
The RFC contains something that, as far as I understand, is supposed to be an explanation of this:
Data elements are packed into bytes in order of
increasing bit number within the byte, i.e., starting
with the least-significant bit of the byte.
Data elements other than Huffman codes are packed
starting with the least-significant bit of the data
element.
Huffman codes are packed starting with the most-
significant bit of the code.
In other words, if one were to print out the compressed data as
a sequence of bytes, starting with the first byte at the
right margin and proceeding to the left, with the most-
significant bit of each byte on the left as usual, one would be
able to parse the result from right to left, with fixed-width
elements in the correct MSB-to-LSB order and Huffman codes in
bit-reversed order (i.e., with the first bit of the code in the
relative LSB position).
But sadly I don't understand that explanation.
Returning to my data: OK, so BFINAL is set, but what is BTYPE? 10 or 01?
How do I interpret the rest of the data in that compressed block?

First, let's look at the hexadecimal representation of the compressed data as a series of bytes (instead of the series of 16-bit big-endian values shown in your question):
4b e4 02 00
Now let's convert those hexadecimal bytes to binary:
01001011 11100100 00000010 00000000
According to the RFC, the bits are packed "starting with the least-significant bit of the byte". The least-significant bit of a byte is the right-most bit of the byte. So the first bit of the first byte is this one:
01001011 11100100 00000010 00000000
       ^
       first bit
The second bit is this one:
01001011 11100100 00000010 00000000
      ^
      second bit
The third bit:
01001011 11100100 00000010 00000000
     ^
     third bit
And so on. Once you've gone through all the bits in the first byte, you then start on the least-significant bit of the second byte. So the ninth bit is this one:
01001011 11100100 00000010 00000000
                ^
                ninth bit
And finally the last bit, the thirty-second bit, is this one:
01001011 11100100 00000010 00000000
                           ^
                           last bit
The BFINAL value is the first bit in the compressed data, and so is contained in the single bit marked "first bit" above. Its value is 1, which indicates that this is the last block in the compressed data.
The BTYPE value is stored in the next two bits of the data. These are the bits marked "second bit" and "third bit" above. The only question is which of the two is the least-significant bit and which is the most-significant bit. According to the RFC, "Data elements other than Huffman codes are packed starting with the least-significant bit of the data element." That means the first of these two bits, the one marked "second bit", is the least-significant bit. So the value of BTYPE is 01 in binary, which indicates that the block is compressed using fixed Huffman codes.
And that's the easy part done. Decoding the rest of the compressed block is more difficult (and with a more realistic example, much more difficult). Properly explaining how to do that would make this answer far too long (and your question too broad) for this site. I'll give you a hint though: the next three elements in the data are the Huffman codes 10010001 ('a'), 00111010 ('\n') and 0000000 (end of stream). The remaining 6 bits are unused and aren't part of the compressed data.
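To make that bit order concrete, here is a minimal Python sketch (the read_bits helper is my own illustration, not part of any library) that consumes the four bytes least-significant-bit first and pulls out the header fields and the three fixed Huffman codes mentioned above:
data = bytes.fromhex("4be40200")   # the compressed block from the question
pos = 0                            # current bit position in the stream

def read_bits(n, huffman=False):
    # Read n bits, consuming each byte from its least-significant bit.
    # Plain fields accumulate LSB-first; Huffman codes accumulate MSB-first.
    global pos
    value = 0
    for i in range(n):
        bit = (data[pos // 8] >> (pos % 8)) & 1
        if huffman:
            value = (value << 1) | bit   # first bit read is the most significant
        else:
            value |= bit << i            # first bit read is the least significant
        pos += 1
    return value

print(read_bits(1))                               # 1        -> BFINAL: last block
print(format(read_bits(2), "02b"))                # 01       -> BTYPE: fixed Huffman codes
print(format(read_bits(8, huffman=True), "08b"))  # 10010001 -> literal 'a' (97)
print(format(read_bits(8, huffman=True), "08b"))  # 00111010 -> literal '\n' (10)
print(format(read_bits(7, huffman=True), "07b"))  # 0000000  -> end-of-block (256)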
Note that to understand how to decode Deflate-compressed data, you're going to have to understand what Huffman codes are. The RFC you're following assumes that you do. You should also know how LZ77 compression works, though the document more or less explains what you need to know.
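As a sanity check, Python's zlib can decompress the raw Deflate stream directly (a negative wbits value tells it there is no zlib/gzip wrapper around the data):
import zlib
print(zlib.decompress(bytes.fromhex("4be40200"), wbits=-15))   # b'a\n'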

Related

Assembly Language hex address

I'm just starting to learn assembly language, and we are working with hex addresses. Below is a question of ours. I'm not sure how it adds up though. I know the answer is 0x202C, but how did we get there? Can you help explain the process step by step, in the most basic way possible, to help me understand? Thank you!!
The following data segment starts at memory address 0x2000 (hexadecimal)
.data
printString BYTE "Assembly is fun",0
moreBytes BYTE 24 DUP(0)
dateIssued DWORD ?
dueDate DWORD ?
What is the hexadecimal address of dueDate?
You have three data definitions to add together:
printString is ASCII text followed by a zero byte. The string part ("Assembly is fun") is 15 bytes long, and with the terminating zero byte that makes 16. printString starts at 0x2000, and the next item starts right after its last byte, so you add its length to its offset: the next data item is at 0x2010 (16 decimal is 0x10 hex).
moreBytes is 24 bytes long, because that's how DUP works: BYTE n DUP (v) means "n bytes of value v". So the offset of the next data item is 0x2028, as 24 decimal is 0x18 hex.
dateIssued is 4 bytes long, because that's the definition of a DWORD. So the next item, dueDate, is at 0x202C, since 0x28 + 4 = 0x2C (8 + 4 = 12, which is 0xC in hex).
Alternatively, you could add the three lengths together, getting 44. 44 in hex is 0x2C, so dueDate is at 0x2000 + 0x2C = 0x202C.
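If it helps, here is the same arithmetic as a quick Python sketch (the sizes are the ones worked out above):
base = 0x2000
sizes = [
    ("printString", 15 + 1),  # "Assembly is fun" is 15 bytes, plus the terminating zero byte
    ("moreBytes",   24),      # BYTE 24 DUP(0)
    ("dateIssued",  4),       # DWORD
    ("dueDate",     4),       # DWORD
]
offset = base
for name, size in sizes:
    print(f"{name:12s} starts at {offset:#06x}")
    offset += size
# prints, among others: dueDate starts at 0x202c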

Interpret double written as hex

I am working with doubles (64-bit) stored in a file by printing the raw bytes from the C# representation. For example the decimal number 0.95238095238095233 is stored as the bytes "9e e7 79 9e e7 79 ee 3f" in hex. Everything works as expected, I can write and read, but I would like to be able to understand this representation of the double myself.
According to the C# documentation https://learn.microsoft.com/en-us/dotnet/api/system.double?view=netframework-4.7.2#explicit-interface-implementations and wikipedia https://en.wikipedia.org/wiki/Double-precision_floating-point_format the first bit is supposedly the sign with 0 for positive and 1 for negative numbers. However, no matter the direction I read my bytes, the first bit is 1. Either 9 = 1001 or f = 1111. I am puzzled since 0.95... is positive.
As a double check, the following python code returns the correct decimal number as well.
from struct import unpack
import binascii

unpack('d', binascii.unhexlify("9ee7799ee779ee3f"))
Can anyone explain how a human can read these bytes and get to 0.95238095238095233?
Figured it out: the collection of bytes is read like a 64-bit number (first bit on the left), but each byte is read like a string (first bit on the right). In other words, the eight bytes are stored little-endian, so the last byte in the file, 0x3F, is the most significant one. My bytes should therefore be read "3F" first; 3F reads left to right, so I'm starting with the bits 0011 1111, and so on. This gives the IEEE 754 encoding as expected: the first bit is the sign, the next 11 bits are the exponent, and then comes the fraction.
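A small Python sketch of that reading, for anyone who wants to check the pieces (just an illustration of the IEEE 754 layout described above):
import struct

raw = bytes.fromhex("9ee7799ee779ee3f")      # the bytes as stored in the file

# struct's "<d" reads them as a little-endian double in one go.
print(struct.unpack("<d", raw)[0])           # 0.9523809523809523

# By hand: treat the 8 bytes as a little-endian 64-bit integer,
# then split off sign (1 bit), exponent (11 bits) and fraction (52 bits).
bits = int.from_bytes(raw, byteorder="little")
sign     = bits >> 63
exponent = (bits >> 52) & 0x7FF
fraction = bits & ((1 << 52) - 1)
print(sign, exponent - 1023, hex(fraction))  # 0 -1 0xe79e79e79e79e
print((1 + fraction / 2**52) * 2.0**(exponent - 1023))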

Simple SDLC CRC calculation not giving the correct value

I am trying to figure out how the calculate the CRC for very simple SDLC frames.
Using an MLT I am capturing the stream, and I see some simple frames being sent out, like 0x3073F9E3 and 0x3011EDE3.
From my understanding, F9E3 and EDE3 are the 2-byte checksums of 3073 and 3011, since that is all there was in each frame.
Using numerous CRC calculators and calculations, I have been able to get the first byte of each checksum (the F9 and the ED), but not the last byte (the E3).
Using this calculator (http://www.zorc.breitbandkatze.de/crc.html):
Select CRC-CCITT
Change Final XOR Value to: FFFF
Check Reverse Data Bytes and reverse CRC result before Final XOR
Then type the input: %30%11
Which will give the output B8ED so the last byte is the ED.
Any ideas?
You are getting the correct CRC-16s (F9 F8 and ED B8). I don't know why your last byte is E3 in both cases; that is perhaps a clue that the packets are not being disassembled correctly.
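If it helps, here is a small Python sketch of that CRC (CRC-16/X-25, i.e. the CCITT polynomial with reflected input/output, initial value 0xFFFF and final XOR 0xFFFF, which matches the calculator settings above):
def crc16_x25(data: bytes) -> int:
    # CRC-16/X-25: polynomial 0x1021 (reflected form 0x8408),
    # init 0xFFFF, reflected input/output, final XOR 0xFFFF.
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF

print(hex(crc16_x25(bytes([0x30, 0x73]))))  # 0xf8f9 -> matches the F9 F8 mentioned above
print(hex(crc16_x25(bytes([0x30, 0x11]))))  # 0xb8ed -> matches the ED B8 mentioned above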

What kinds of checksums result in 2 digit hexadecimal?

Is there any checksum that results in 2 digit hexadecimal?
I can only find NMEA Checksum...
references:
http://nmeachecksum.eqth.net/
http://en.wikipedia.org/wiki/NMEA_0183
I have a data file on which I want to perform reverse engineering to find out which kind of checksum is used.
Thank you in advance.
Two hex digits is one byte. You're looking for a checksum which produces one byte.
The obvious candidates are a simple additive checksum (summing the bytes of the input) and an XOR of the input bytes.
It could also be a longer checksum of which only 8 bits have been taken.
It could also be some kind of CRC-8; Wikipedia knows about five kinds of standardised CRC-8.
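For reference, here is a minimal Python sketch of the first two candidates (the sample input is arbitrary, purely for illustration):
def additive_checksum(data: bytes) -> int:
    # Sum of all bytes, truncated to one byte (two hex digits).
    return sum(data) & 0xFF

def xor_checksum(data: bytes) -> int:
    # XOR of all bytes -- this is what the NMEA 0183 checksum does.
    result = 0
    for byte in data:
        result ^= byte
    return result

sample = b"GPGGA,123519,4807.038,N"  # arbitrary example data
print(f"{additive_checksum(sample):02X}  {xor_checksum(sample):02X}")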

Can someone explain hex offsets to me?

I downloaded Hex Workshop, and I was told to read a .dbc file.
It should contain 28,315 if you read
offset 0x04 and 0x05
I am unsure how to do this. What does 0x04 mean?
0x04 is hex for 4 (the 0x is just a common prefix convention for base-16 representation of numbers, since many people think in decimal). Since they say "offset", they probably count the first byte as byte 0, so offset 0x04 would be the 5th byte in the file.
I guess they are saying that the 4th and 5th byte together would be 28315, but did they say if this is little-endian or big-endian?
28315 (decimal) is 0x6E9B in hexadecimal notation, probably in the file in order 0x9B 0x6E if it's little-endian.
Note: little-endian and big-endian refer to the order in which bytes are written. Humans typically write decimal and hexadecimal numbers in a big-endian way, so:
256 would be written as 0x0100 (digits on the left are the biggest scale)
But that takes two bytes and little-endian systems will write the low byte first: 0x00 0x01. Big-endian systems will write the high-byte first: 0x01 0x00.
Typically Intel systems are little-endian and other systems vary.
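If you want to see the two orders concretely, here is a quick Python check (just an illustration):
import struct
print(struct.pack("<H", 256).hex())                 # 0001 -- little-endian, low byte first
print(struct.pack(">H", 256).hex())                 # 0100 -- big-endian, high byte first
print(struct.unpack("<H", bytes([0x9B, 0x6E]))[0])  # 28315, if the file stores it little-endian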
Think of a binary file as a linear array of bytes.
0x04 would be the 5th (in a 0 based array) element in the array, and 0x05 would be the 6th.
The two values in 0x04 and 0x05 can be OR'ed together to create the number 28,315.
Since the value you are reading is 16 bits, you need to shift one byte left by 8 bits and then OR them together. For example, if you were manipulating the file in C#, you would use something like this (swap the indices if the file is little-endian):
int value = (ByteArray[4] << 8) | ByteArray[5];
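For comparison, here is roughly the same read in Python; the file name is a placeholder, and ">H" versus "<H" picks the big- or little-endian interpretation:
import struct

# Hypothetical file name -- substitute your actual .dbc file.
with open("example.dbc", "rb") as f:
    f.seek(4)          # move to offset 0x04
    pair = f.read(2)   # the bytes at offsets 0x04 and 0x05

big_endian    = struct.unpack(">H", pair)[0]  # byte at 0x04 is the high byte
little_endian = struct.unpack("<H", pair)[0]  # byte at 0x04 is the low byte
print(big_endian, little_endian)              # one of the two should be 28315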
Hopefully this helps explain how hex addresses work.
It's the 4th and the 5th value you're viewing...
1 2 3 4 5 6
01 AB 11 7B FF 5A
So, 0x04 and 0x05 are "7B" and "FF".
Assuming what you're saying, in your case 7BFF should be equal to your desired value.
HTH
0x04 in hex is 4 in decimal. 0x10 in hex is 16 in decimal. calc.exe can convert between hex and decimal for you.
Offset 4 means 4 bytes from the start of the file. Offset 0 is the first byte in the file.
Look at bytes 4 and 5; they should have the values 0x6E 0x9B (or 0x9B 0x6E), depending on your endianness.
Start here. Once you learn how to read hexadecimal values, you'll be in much better shape to actually solve your problem.

Resources