Imagine I have some text, like 'Feef'.
I can gzip it and the result is 24 bytes.
Is there a way to gzip it so the result would be 1024 bytes? It should still be a valid gzip stream, i.e. it should not generate the message "trailing garbage ignored" when decompressed.
How I would use it: gzip a data header to a fixed size, append the gzipped data body, then update the header, gzip it to the same fixed size and overwrite the original header.
You can concatenate gzip streams and it will still be valid gzip, but they have to be proper streams. Maybe there's a way to pad gzip output?
The gzip header permits an extra field of up to 65535 bytes that can contain arbitrary data and that is ignored when decompressing. So you can change the gzip header to insert an extra field to pad out the file to the desired length. See RFC 1952 for the format description. If you don't care about the file name in the gzip header, you can make that any length, to pad to an arbitrarily large size. Or if you want more than 64K and you don't want to muck with the file name, you can append empty gzip streams to make it as long as you like.
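A sketch of the extra-field approach in Python, assuming one gzip member whose FEXTRA field holds a single subfield of zero bytes. The subfield ID "PD" and the function name gzip_padded are made up for illustration. The header (ID1, ID2, CM, FLG, MTIME, XFL, OS, then XLEN) is written by hand following RFC 1952, so the padding length can be computed exactly:

```python
import gzip
import struct
import zlib


def gzip_padded(data: bytes, total_size: int) -> bytes:
    """Return one valid gzip member of exactly total_size bytes that decompresses to data."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)       # raw deflate, no zlib/gzip wrapper
    body = comp.compress(data) + comp.flush()
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)  # CRC32, ISIZE
    fixed = 10 + 2 + len(body) + len(trailer)           # base header + XLEN + deflate + trailer
    xlen = total_size - fixed
    if not 4 <= xlen <= 0xFFFF:
        raise ValueError("total_size cannot be reached with a single extra field")
    # One subfield: SI1, SI2 (hypothetical ID "PD"), LEN, then zero padding bytes.
    extra = b"PD" + struct.pack("<H", xlen - 4) + bytes(xlen - 4)
    # ID1, ID2, CM=8 (deflate), FLG=FEXTRA, MTIME=0, XFL=0, OS=255 (unknown).
    header = bytes([0x1F, 0x8B, 8, 0x04]) + struct.pack("<I", 0) + bytes([0, 255])
    return header + struct.pack("<H", xlen) + extra + body + trailer


if __name__ == "__main__":
    padded = gzip_padded(b"Feef", 1024)
    print(len(padded), gzip.decompress(padded))
```

For the header-rewriting use case, calling it again with the same total_size produces a replacement of identical length; it raises if the new compressed header no longer fits.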
Related
I have been struggling for an entire week now to meet the encryption requirements of the HERE Traffic API, specifically the TPEG API.
Steps to do:
<?xml version="1.0" encoding="UTF-8" ?>
<get-messages>
<locations>
<loc lat="52.55121" lon="13.16565"/>
</locations>
</get-messages>
This XML body must be encrypted; the HERE API documents the process as follows:
Encrypt and compress all traffic information requests:
1. Compress the XML body using gzip.
2. Calculate the length in bytes of the gzip file.
3. Prepend the length of the gzipped data to the compressed body as a little-endian 32-bit integer.
4. Pad the combination of gzipped content and length with zeros to make it evenly divisible by 16 bytes.
5. Using AES 128, encrypt the resultant padded combination of content and length as follows:
   a) Create a random integer 16 bytes long.
   b) AES encrypt the result of step 4, in mode CBC using the integer generated in step 5.a as the initialization vector and the key from the InitSession response. Do not apply additional padding.
6. Send the resulting block of AES encrypted data as an HTTP POST request, prepended by the integer generated in step 5.a, as content type application/octet-stream to the URL in the InitSession response.
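Steps 1 to 4 and the framing in step 6 only manipulate raw bytes. A minimal standard-library sketch, assuming the XML is UTF-8 encoded before compression; build_padded_plaintext and build_post_body are hypothetical names, and the AES step itself is not shown here:

```python
import gzip
import struct


def build_padded_plaintext(xml: str) -> bytes:
    """Steps 1-4: gzip the XML, prepend its length, zero-pad to a multiple of 16 bytes."""
    compressed = gzip.compress(xml.encode("utf-8"))  # step 1: raw gzip bytes, not Base64
    prefix = struct.pack("<I", len(compressed))       # steps 2-3: 4-byte little-endian length
    combined = prefix + compressed
    return combined + bytes(-len(combined) % 16)      # step 4: zeros up to the next multiple of 16


def build_post_body(iv: bytes, ciphertext: bytes) -> bytes:
    """Step 6: the POST body is the 16-byte IV immediately followed by the ciphertext."""
    if len(iv) != 16 or len(ciphertext) % 16:
        raise ValueError("IV must be 16 bytes and the ciphertext block-aligned")
    return iv + ciphertext


XML = ('<?xml version="1.0" encoding="UTF-8" ?>\n'
       '<get-messages><locations>'
       '<loc lat="52.55121" lon="13.16565"/>'
       '</locations></get-messages>')
```

The output of build_padded_plaintext is what step 5.b encrypts; because it is already block-aligned, the cipher must be used without its own padding.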
So many things are unclear here. What is the desired result of the gzipped XML: Base64 or binary?
What is the type of the 32-bit little-endian integer: binary?
The key has a length of 32 characters. Since AES-128 only accepts 16-byte keys, I assume the key must be interpreted as hex values. Do all values need to be defined as hex values?
What is the type of the IV? Hex? Text? Binary?
What is the type of the encrypted result? Hex? Binary? Text? Base64?
The HTTP header must contain Content-Type: application/octet-stream.
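To make the type questions concrete: under this reading every value in the procedure is raw bytes, and only the key arrives as text. Assuming, as the asker does, that the 32 characters are hex, they decode to the 16 bytes AES-128 needs. The sketch encrypts an already padded buffer with AES-128 in CBC mode without extra padding and prepends the IV. The cipher is written in plain Python only so the example runs without dependencies; real code should call a vetted library such as the `cryptography` package. encrypt_request and the hex-key interpretation are assumptions, not HERE's documented API.

```python
import os


def _xtime(a: int) -> int:
    """Multiply by 2 in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    return ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1


def _gmul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = _xtime(a)
        b >>= 1
    return r


def _rotl8(x: int, s: int) -> int:
    return ((x << s) | (x >> (8 - s))) & 0xFF


# S-box: multiplicative inverse in GF(2^8) (0 maps to 0), then the AES affine map.
_SBOX = []
for _x in range(256):
    _inv = next((y for y in range(1, 256) if _gmul(_x, y) == 1), 0)
    _SBOX.append(_inv ^ _rotl8(_inv, 1) ^ _rotl8(_inv, 2) ^ _rotl8(_inv, 3) ^ _rotl8(_inv, 4) ^ 0x63)


def _expand_key(key: bytes) -> list:
    """AES-128 key schedule: 11 round keys of 16 bytes, in column-major order."""
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [_SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= rcon
            rcon = _xtime(rcon)
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(11)]


def _mix_columns(s: list) -> list:
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [
            _gmul(a0, 2) ^ _gmul(a1, 3) ^ a2 ^ a3,
            a0 ^ _gmul(a1, 2) ^ _gmul(a2, 3) ^ a3,
            a0 ^ a1 ^ _gmul(a2, 2) ^ _gmul(a3, 3),
            _gmul(a0, 3) ^ a1 ^ a2 ^ _gmul(a3, 2),
        ]
    return out


def _encrypt_block(block: bytes, rk: list) -> bytes:
    s = [b ^ k for b, k in zip(block, rk[0])]               # initial AddRoundKey
    for rnd in range(1, 11):
        s = [_SBOX[b] for b in s]                           # SubBytes
        s = [s[(i + 4 * (i % 4)) % 16] for i in range(16)]  # ShiftRows: row r rotated left by r
        if rnd != 10:
            s = _mix_columns(s)                             # MixColumns, skipped in the last round
        s = [b ^ k for b, k in zip(s, rk[rnd])]             # AddRoundKey
    return bytes(s)


def aes128_cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """AES-128-CBC without padding: data must already be a multiple of 16 bytes."""
    if len(key) != 16 or len(iv) != 16 or len(data) % 16:
        raise ValueError("need a 16-byte key, a 16-byte IV and block-aligned data")
    rk = _expand_key(key)
    prev, out = iv, bytearray()
    for i in range(0, len(data), 16):
        prev = _encrypt_block(bytes(a ^ b for a, b in zip(data[i:i + 16], prev)), rk)
        out += prev
    return bytes(out)


def encrypt_request(key_hex: str, padded: bytes) -> bytes:
    """Hypothetical helper for steps 5-6: returns IV + ciphertext as raw bytes."""
    key = bytes.fromhex(key_hex)  # assumption: 32 hex characters -> 16 key bytes
    iv = os.urandom(16)           # step 5.a: 16 random bytes, used and sent as-is
    return iv + aes128_cbc_encrypt(key, iv, padded)
```

Under this reading, the bytes returned by encrypt_request are the POST body as-is, sent with Content-Type: application/octet-stream; nothing is Base64- or hex-encoded.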
I've read about chunked Transfer-Encoding and basically got the point. However, there's something I don't quite understand that hasn't been referred to in any of the sources I've read.
Chunk-encoded data is structured as a series of chunks, each laid out as follows:
<chunk size> (In ASCII bytes expressing the hexadecimal value)
\r\n
<data>
\r\n
What I don't understand is: what if the payload itself contains a \r\n? Doesn't it interfere with the way we track where a chunk starts and ends?
You could argue that even if it does, we still have the chunk size before the chunk, so the CRLF shouldn't bother us. But then I would ask: if so, why have these CRLFs in the first place?
Please clarify.
Yes, it can include \r\n.
As to why this format was chosen: I don't know. Maybe to make it more readable when used with textual data.
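A minimal encoder shows why nothing needs escaping: the payload is copied byte for byte, because the receiver relies on the hex size rather than searching for CRLF. The function name and the fixed chunk size are illustrative, and the sketch emits no chunk extensions or trailer fields:

```python
def encode_chunked(payload: bytes, chunk_size: int = 16) -> bytes:
    """Frame payload with HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for i in range(0, len(payload), chunk_size):
        chunk = payload[i:i + chunk_size]
        out += b"%X\r\n" % len(chunk)  # chunk-size in hex, then CRLF
        out += chunk + b"\r\n"         # data copied verbatim, embedded CRLFs included
    out += b"0\r\n\r\n"                # last-chunk, then the empty trailer section
    return bytes(out)
```

Encoding b"line one\r\nline two" with chunk_size=10 yields b"A\r\nline one\r\n\r\n8\r\nline two\r\n0\r\n\r\n": the first chunk's data ends in CRLF and is immediately followed by the framing CRLF.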
May a URL contain raw binary data in a GET-request?
Is it possible to create a URL such as www.example.com/**binary-data**, where www.example.com/ consists of ordinary ASCII characters and **binary-data** consists of arbitrary raw byte values, e.g., 0x10?
I don't want to encode the binary data; I just want to create a string, e.g., a char* in C, that contains both the ASCII characters and the binary data.
Or is a POST request the only way to send raw binary data, as part of the body?
No, but you could percent-escape the non-URI characters.
No. A URL transmitted in an HTTP GET request is percent-encoded, UTF-8 encoded Unicode text (resulting in an "ASCII" string).
Again, it's Unicode text.
Unicode encodings do not produce arbitrary binary data. There is no equivalent text for some arbitrary binary data.
Moving away from "raw", the client and server can, of course, agree on the use of a scheme such as Base64 to turn arbitrary binary data into Unicode text. By that point, though, you might as well use an HTTP request with a body: as far as HTTP is concerned, bodies are raw binary data, and headers such as Content-Type can indicate a standard format. Such requests include POST and PUT.
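Both options mentioned in these answers, percent-escaping every byte or agreeing on a Base64 variant, look like this with Python's standard library. The example bytes are made up, and www.example.com comes from the question:

```python
import base64
from urllib.parse import quote, unquote_to_bytes

raw = bytes([0x10, 0x00, 0xFF]) + b"a"  # arbitrary example bytes, including 0x10

# Option 1: percent-escape every byte that is not an unreserved URI character.
pct = quote(raw, safe="")
url1 = "http://www.example.com/" + pct

# Option 2: URL-safe Base64 (RFC 4648, section 5), with trailing '=' padding stripped.
b64 = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
url2 = "http://www.example.com/" + b64

# The server has to reverse whichever scheme the two sides agreed on.
decoded_pct = unquote_to_bytes(pct)
decoded_b64 = base64.urlsafe_b64decode(b64 + "=" * (-len(b64) % 4))
print(url1, url2, decoded_pct == raw, decoded_b64 == raw)
```

Percent-escaping triples the size of most non-text bytes, while Base64 costs about a third extra, which is one practical reason to prefer it for larger binary values.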
There are also practical limits to the length of the URL.
Say the body I'm trying to send via chunked encoding includes "\r\n". How do I avoid that being interpreted as the chunk delimiter?
e.g. "All your base are\r\n belong to us"
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
"\r\n" isn't really a chunk delimiter. The chunk size specifies the number of bytes made up by that chunk's data. The client should then read the "\r\n" embedded within your message just fine.
By design, that is not a problem at all. Each chunk specifies the byte size of its data block. The contents of each data block are arbitrary, and must be received as such, so it can include line breaks in it. If the client is reading each chunk correctly (read a line and parse the byte size from it, then read the specified number of bytes, then read a line break), it won't matter if there are line breaks in the data, since the client is reading the data based on byte size, not on line breaks.
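The reading procedure from that answer, as a Python sketch over any binary file-like object (for example a socket's makefile("rb")). It assumes the size lines carry no chunk extensions and discards any trailer fields. It also checks the CRLF after each chunk's data, which is how a reader detects a wrong size or corrupted data:

```python
import io


def decode_chunked(stream) -> bytes:
    """Decode a chunked HTTP/1.1 body read from a binary file-like object."""
    body = bytearray()
    while True:
        size = int(stream.readline().rstrip(b"\r\n"), 16)  # chunk-size line, in hex
        if size == 0:
            break                                           # last-chunk reached
        data = stream.read(size)  # exactly `size` bytes; CRLFs inside are just data
        if len(data) != size:
            raise ValueError("truncated chunk")
        if stream.read(2) != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        body += data
    while stream.readline() not in (b"\r\n", b""):  # skip trailer fields up to the blank line
        pass
    return bytes(body)


if __name__ == "__main__":
    data = b"All your base are\r\n belong to us"  # the example body from the question
    wire = b"%x\r\n" % len(data) + data + b"\r\n0\r\n\r\n"
    print(decode_chunked(io.BytesIO(wire)))
```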
When chunked HTTP transfer encoding is used, why does the server need to write out both the chunk size in bytes and have the subsequent chunk data end with CRLF?
Doesn't this make sending binary data "CRLF-unclean" and the method a bit redundant?
What if the data has a 0x0D followed by 0x0A in it somewhere (i.e. a CRLF that is actually part of the data)? Is the client then expected to adhere to the chunk size explicitly provided at the head of the chunk, or to choke on the first CRLF it encounters in the data?
My understanding so far of expected client behaviour is to simply take the chunk size provided by the server, proceed to the next line, then read exactly this number of bytes from the following data (CRLF or no CRLF therein), then skip the CRLF following the data and repeat the procedure until there are no more chunks. Is this compliant behaviour? If so, what is the point of the CRLF after each data chunk? Readability?
I have done some Web searching on this and also did some reading of the HTTP 1.1 specification, but a definitive answer seems to be eluding me.
A chunked consumer does not scan the message body for a CRLF pair. It first reads the specified number of bytes, and then reads two more bytes to confirm that they are CR and LF. If they're not, the message body is ill-formed, and either the size was specified improperly or the data was otherwise corrupted.
The trailing CRLF is a belt-and-suspenders assurance (per RFC 2616 section 3.6.1, Chunked Transfer Coding), but it also serves to maintain the consistent rule that fields start at the beginning of the line.
The CRLF after each chunk is probably just for better readability, as it's not necessary given the chunk size at the beginning of each chunk. But the CRLF after the "chunk header" is necessary, as there may be additional information after the chunk size (see Chunk Transfer Encoding):
chunk = chunk-size [ chunk-extension ] CRLF
chunk-data CRLF
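Because the size line may carry extensions, a reader must find the end of that line before it can use the size. A sketch of splitting such a line in Python, assuming extension values are plain tokens (quoted-string values containing ';' are not handled); parse_chunk_header is a hypothetical name:

```python
def parse_chunk_header(line: bytes):
    """Split 'chunk-size [ chunk-extension ] CRLF' into (size, {name: value or None})."""
    fields = line.rstrip(b"\r\n").split(b";")
    size = int(fields[0].strip(), 16)  # chunk-size is hexadecimal
    extensions = {}
    for field in fields[1:]:
        name, sep, value = field.strip().partition(b"=")
        extensions[name.decode("ascii")] = value.decode("ascii") if sep else None
    return size, extensions
```

For example, parse_chunk_header(b"1a;name=value;flag\r\n") returns (26, {"name": "value", "flag": None}); the reader then reads 26 bytes of chunk-data followed by CRLF.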