Let's say I'm sending a multipart request (or response). I need to choose a multipart boundary which does not appear in any of my payloads. However, my payloads are large binary files and I am streaming them to the destination. I want to avoid streaming them twice - once to scan for the boundary and one to stream out.
So my question is: is it possible to escape the boundary if it appears in the payload? If so, how?
Don't Panic. Your boundary can be up to 70 characters long. If you go with that maximum and randomly generate it out of characters and numbers you'll have 62⁷⁰ possible combinations for each position in a file. Chance of having the same sequence of bytes in your binary files is so infinitesimal that it shouldn't bother your sleep at all 😀. The probability of collision in a 1GB file is roughly 1-((1-(1/(62^70)))^(10^9)) ~= 3.4*10⁻¹¹⁸. Human brain can't really fathom how small that number is. For comparison the number of atoms in our universe is estimated to be ~ 10⁸⁰.
No, it's not possible; you need to either scan, or live with potential failures.
Related
I am currently working on a project that requires data to be sent from A to B. Once B receives the data, it needs to be able to determine if an error occurred during transmission.
I have read about CRC and have decided that CRC16 is right for my needs; I can chop the data into chunks and send a chunk at a time.
However, I am confused about how B will be able to tell if an error occurred. My initial thought was to have A generate a CRC and then send the data to B. Once B receives the data, generate the CRC and send it back to A. If the CRCs match, the transmission was successful. BUT - what if the transmission of the CRC from B to A errors? It seems redundant to have the CRC sent back because it can become corrupted in the same way that the data can be.
Am I missing something or over-complicating the scenario?
Any thoughts would be appreciated.
Thanks,
P
You usually send the checksum with the data. Then you calculate the checksum out of the data on the receiving end, and compare it with the checksum that came along with it. If they don't match, either the data or the checksum was corrupted (unless you're unlucky enough to get a collision) - in which case you should ask for a retransmission.
CRC is error-detection and, notice, your code can only detect a finite number of errors. However, you can calculate the probability of a CRC16 collision (this is relatively small for most practical purposes).
Now how CRC works is using polynomial division. Your CRC value is some polynomial (probably on the order of (x^15) for CRC16). That said, the polynomial is represented in binary as the coefficients. For example, x^3 + [(0)*x^2] + x + 1 = 1011 is some polynomial on order x^3. Now, you divide your data chunk by your CRC polynomial. The remainder is the CRC value. Thus, when you do this division operation again to the received chunk (with the remainder) on B, the polynomial division should come out even to 0. If this does not occur then you have a transmission error.
Now, this assumes (including corruption of your CRC value) that if n bits are corrupted, the CRC check will detect the failure (assuming no collision). If the CRC check does not pass, simply send a retransmission request to A. Otherwise, continue processing as normal. If a collision occurred, there is no way to verify the data is corrupted until you look at your received data manually (or send several, hopefully error-free copies - note that this method incurs a lot of overhead and redundancy only works to finite precision again).
I'm trying to create (or, if I've somehow missed it in my research, find) an algorithm to encode/decode a bmp image into/from a QR code format. I've been using a guide (Thonky) to try to understand the basics of QR codes and I'm still not sure how to go about this problem, specifically:
Should I encode the data as binary or would numeric be more reasonable (assuming each pixel will have a max. value of 255)?
I've searched for information on the structured append capabilities of QR codes but haven't found much detail beyond the fact that it's supported by QR codes -- how could I implement/utilize this functionality?
And, of course, if there are any tips/suggestions to better store an image as binary data, I'm very open to suggestions!
Thanks for your time,
Sean
I'm not sure you'll be able to achieve that, as the amount of information a QR Code can hold is quite limited.
First of all, you'll probably want to store your image as raw bytes, as the other formats (numeric and alphanumeric) are designed to hold text/numbers and would provide less space to store your image. Let's assume you choose the biggest possible QR Code (version 40), with the smallest level of error correction, which can hold up to 2953 bytes of binary information (see here).
First option, as you suggest, you store the image as a bitmap. This format allows no compression at all and requires (in the case of an RGB image without alpha channel) 3 bytes per pixel. If we take into account the file header size (14 to 54 bytes), and ignore the padding (each row of image data must be padded to a length being a multiple of 4), that allows you to store roughly 2900/3 = 966 pixels. If we consider a square image, this represents a 31x31 bitmap, which is small even for a thumbnail image (for example, my avatar at the end of this post is 32x32 pixels).
Second option, you use JPEG to encode your image. This format has the advantage of using a compression algorithm that can reduce the file size. This time there is no exact formula to get the size of an image fitting in 2.9kB, but I tried using a few square images and downsizing them until they fit in this size, keeping a good (93) quality factor: this gives an average of about 60x60 pixel images. (On such small images, it's normal not to see an incredible compression factor between jpeg and bmp, as the file header in a jpeg file is far larger than in a bmp file: about 500 bytes). This is better than bitmap, but remains quite small.
Finally, even if you succeed in encoding your image in this QR Code, you will encounter an other problem: a QR Code this big is very, very hard to scan successfully. As a matter of fact, this QR Code will have a size of 177x177 modules (a "module" being a small white or black square). Assuming you scan it using a smartphone providing so-called "HD" frames (1280x720 pixels), each module will have a maximum size on the frame of about 4 pixels. If you take into account the camera noise, the aliasing and the blur due to the fact that the user is never perfectly idle when scanning, the quality of the input frames will make it very hard for any QR Code decoding algorithm to successfully get the QR Code (don't forget we set its error correction level on low at the beginning of this!).
Even though it's not very good news, I hope this helps you!
There is indeed a way to encode information on several (up to 16) QR Codes, using a special header in your QR Codes called "Structured append". The best source of information you can use is the norm about QR Codes (ISO 18004:2006); it's possible (but not necessarily easy) to find it for free on the web.
The relevant part (section 9) of this norm says:
"Up to 16 QR Code symbols may be appended in a structured format. If a symbol is part of a Structured Append message, it is indicated by a header block in the first three symbol character positions.
The Structured Append Mode Indicator 0011 is placed in the four most significant bit positions in the first symbol character.
This is immediately followed by two Structured Append codewords, spread over the four least significant bits of the first symbol character, the second symbol character and the four most significant bits of the third symbol character. The first codeword is the symbol sequence indicator. The second codeword is the parity data and is identical in all symbols in the message, enabling it to be verified that all symbols read form part of the same Structured Append message. This header is immediately followed by the data codewords for the symbol commencing with the first Mode Indicator."
Nevertheless, i'm not sure most QR Code scanners can handle this, as it's a quite advanced feature.
You can define a fixed image size, reduce jpg header parts and using just vital information about it, so you can save up to 480bytes of a ~500bytes normal header.
I was using this method to store people photos for a small-club ID cards, images about 64x64 pixels is enough.
I want code to render n bits with n + x bits, non-sequentially. I'd Google it but my Google-fu isn't working because I don't know the term for it.
For example, the input value in the first column (2 bits) might be encoded as any of the output values in the comma-delimited second column (4 bits) below:
0 1,2,7,9
1 3,8,12,13
2 0,4,6,11
3 5,10,14,15
My goal is to take a list of integer IDs, and transform them in a way they can still be used for persistent URLs, but that can't be iterated/enumerated sequentially, and where a client cannot determine programmatically if a URL in a search result set has been visited previously without visiting it again.
I would term this process "encoding". You'll see something similar done to permit the use of communications channels that have special symbols that are not permitted in data. Examples: uuencoding and base64 encoding.
That said, you still need to (and appear at first blush to have) ensure that there is only one correct de-code; and accept the increase in size of the output (in the case above, the output will be double the size, bit-for-bit as the input).
I think you'd be better off encrypting the number with a cheap cypher + a constant secret key stored on your server(s), adding a random character or four at the end, and a cheap checksum, and simply reject any responses that don't have a valid checksum.
<encrypt(secret)>
<integer>+<random nonsense>
</encrypt>
+
<checksum()>
<integer>+<random nonsense>
</checksum>
Then decrypt the first part (remember, cheap == fast), validate the ciphertext using the checksum, throw off the random nonsense, and use the integer you stored.
There are probably some cryptographic no-no's here, but let's face it, the cost of this algorithm being broken is a touch on the low side.
i'm making network application which doesn't send good data every time (most of time they are broken) so i tought to make control sum. At the end of data i will add control sum to check if they are valid. So i'm not sure is that a good idea to multiply every data (they are from 1 to 100) by 100, 100^2, 100^3..., and sum them.
Do you have any suggestion what to do, without making really big number(there are many data in the every packet).
Example:
Data: 1,4,2,77,12,32,5,52,23
My solution:1,4,2,77,12,32,5,52,23, 100+40000+2000000+ 77*10^4 ...
When client receive the packet he will check if last data is equal to sum of other datas.
Is there any better solution?
Multiplying the data results in a very large number to transmit, and not a lot of confidence that the numbers are correct. And addition runs into potential overflow issues. That is why it is customary to use an xor.
Or you can read up on http://en.wikipedia.org/wiki/Error-correcting_code to get even fancier solutions that can detect, and sometimes correct, small numbers of errors.
Best explanation here:
http://www.textfiles.com/programming/crc.txt
CRC functions will be available in you language's networking library.
Because 128 is 10000000 in binary, there is only 1 bit for subnetting, and there are 7 bits for hosts. We're going to subneting the Class C network address 192.168.10.0.
192.168.10.0 = Network address
255.255.255.128= Subnet mask
I have some base-64 encoded encrypted data and noticed a fair amount of repetition. In a (approx) 200-character-long string, a certain base-64 character is repeated up to 7 times in several separate repeated runs.
Is this a red flag that there is a problem in the encryption? According to my understanding, encrypted data should never show significant repetition, even if the plaintext is entirely uniform (i.e. even if I encrypt 2 GB of nothing but the letter A, there should be no significant repetition in the encrypted version).
According to the binomial distribution, there is about a 2.5% chance that you'd see one character from a set of 64 appear seven times in a series of 200 random characters. That's a small chance, but not negligible. With more information, you might raise your confidence from 97.5% to something very close to 100% … or find that the cipher text really is uniformly distributed.
You say that the "character is repeated up to 7 times" in several separate repeated runs. That's not enough information to say whether the cipher text has a bias. Instead, tell us the total number of times the character appeared, and the total number of cipher text characters. For example, "it appeared a total of 3125 times in 1000 runs of 200 characters each."
Also, you need to be sure that you are talking about the raw output of a cipher. Cipher text is often encapsulated in an "envelope" like that defined by the Cryptographic Message Syntax. Of course, this enclosing structure will have predictable patterns.
Well I guess it depends. Repetition in general is bad thing if it represents the same data.
Considering you are encoding it have you looked at data to see if you have something that repeats in those counts?
In order to understand better you gotta know what kind of encryption does it use.
It could be just coincidence that they are repeating.
But if repetition comes from same data, then it can be a red flag because then frequency counts can be used to decode it.
What kind of encryption are you using? Home made or some industry standard?
It depends on how are you encrypting your data.
Base64 encoding a string may count as light obfuscation, but it is NOT encryption. The purpose of Base64 encoding is to allow any sort of binary data to be encoded as a safe ASCII string.