Hi, I wanted to convert the following (01101010) into ciphertext using the following random binary data: 1100
Could anybody help me do this and also show me how to work it out?
Thanks
The original text is eight bits long. Therefore, the key has to be repeated to achieve matching lengths.
Text original 01101010
Key repeated 11001100
Text encrypted 10100110
To get a certain bit of the encoded text, calculate the XOR of the original text bit and the respective key bit. For example, the first bit evaluates to 0 xor 1 = 1.
Note that you can retrieve the original text from the encoded text by re-applying the same procedure as during the encryption.
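The procedure above can be sketched in a few lines of Python. This is only an illustration; the function name xor_bits is made up for the example and the bit strings are the ones from the question.

```python
from itertools import cycle

def xor_bits(text: str, key: str) -> str:
    """XOR a bit string with a key that is repeated to match the text's length."""
    return "".join(str(int(t) ^ int(k)) for t, k in zip(text, cycle(key)))

cipher = xor_bits("01101010", "1100")
print(cipher)                      # 10100110
print(xor_bits(cipher, "1100"))    # 01101010 (same procedure decrypts)
```

Applying the function twice with the same key returns the original text, because x xor k xor k = x.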
When I run a proc print where segment ID equals 1234 the output shows segment ID 1235. SAS actually changes the last 4 digits of a 19 digit number. Contents shows the field in a num 8 formatted as a char 20. I just pull the data and print with no additional formatting or processing.
If I run a SQL statement in a different software package where segment ID equals 1234 (the exact same record) the results show 1234 (no change to the last 4). The other vars pulled with the query exactly match those of SAS except for the segment ID.
My best guess is it's a formatting issue even though the field should be large enough, 20 > 19.
Second guess is some sort of encryption on the field. Typically if I don't have proper access a field would be blank. But I am unfamiliar with this new data source.
I'll try adding a specific format to my SAS datapull for that field but would love to hear any other suggestions.
Thank you!
PROC PRINT is not the issue. You cannot store 19 decimal digits exactly as a number in SAS. SAS stores numbers as 64-bit floating point numbers. The maximum number of decimal digits that can be represented as consecutive integers is 15 (every integer up to 2^53 = 9,007,199,254,740,992 is exact, but not every 16-digit integer). After that the binary representation does not have enough bits to exactly represent every decimal string.
Check this description about precision from the documentation: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lrcon/p0ji1unv6thm0dn1gp4t01a1u0g6.htm
You should store such things as character strings. I doubt that you need to do any arithmetic with those values.
If you are getting the data from a remote database use the DBSASTYPE= dataset option to control what type of SAS variable is created.
Contents shows the field in a num 8 formatted as a char 20. I just pull the data and print with no additional formatting or processing.
That doesn't make sense. A numeric variable shouldn't have a character format.
I think you'll need to re-read the source field as a character from the source. Not sure where you "just pull the data" from, but you'll need to modify that step to ensure it gets brought over as a character in the first place otherwise you'll have that issue no matter what.
You cannot fix the issue with the data as is, as far as I know.
*issue with numeric;
data have;
input segmentID;
format segmentID 32.;
segmentIDChar=put(SegmentID, 32.);
cards;
1234567890123456789
;;;;
run;
proc print data=have;
run;
*no issue with character fields;
data have;
length segmentID $20.;
input segmentID $;
format segmentID $32.;
cards;
1234567890123456789
;;;;
run;
proc print data=have;
run;
Results:
Numeric Issue
Obs segmentID segmentIDChar
1 1234567890123456768 1234567890123456768
Character (No Issue)
Obs segmentID
1 1234567890123456789
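The same rounding can be reproduced outside SAS. Here is a minimal Python sketch, assuming the SAS numeric is an IEEE 754 double (which is what Python's float is); the variable names are only for illustration.

```python
segment_id = 1234567890123456789           # 19 decimal digits

as_double = float(segment_id)              # what a 64-bit float can hold
print(int(as_double))                      # 1234567890123456768

# Every integer up to 2**53 is exact; beyond that, neighbours collapse.
print(float(2**53) == float(2**53 + 1))    # True

as_text = str(segment_id)                  # kept as characters instead
print(as_text)                             # 1234567890123456789
```

The value printed for the double matches the SAS output above, which suggests the digits are lost when the value is stored as a number, not when it is printed.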
A bit-stuffing based framing protocol uses an 8-bit delimiter pattern of 01111110. If the output bit-string after stuffing is 01111100101, then the input bit-string is
(A) 0111110100
(B) 0111110101
(C) 0111111101
(D) 0111111111
Correct answer given is B.
My question is why a bit is stuffed after five 1's from the left even though the delimiter has six consecutive 1's.
I thought we would stuff a bit only when we get six consecutive 1's.
Correct me if I am wrong.
The delimiter given is 01111110. The delimiter is used to mark the start and end of a frame. So if the same pattern (01111110) occurs inside the data, we need to make sure the receiver does not mistake it for the start or end of a frame but treats it as valid data. That is why, after '011111' in the data bits, one '0' bit is stuffed: six consecutive 1's can then never appear inside the frame body, so the delimiter pattern cannot occur there.
When the receiver sees five consecutive 1's, it checks the next bit: if it is 0, the receiver drops it. (If the next bit is 1 instead, it checks the bit after that; if that is 0, the pattern is the delimiter, otherwise an error has occurred.) This is known as '0' bit stuffing.
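The rule can be sketched in Python as follows. The function names stuff and destuff are made up for this example, and the sketch handles only the frame body (no delimiter detection or error handling).

```python
def stuff(bits: str) -> str:
    """Sender side: insert a 0 after every run of five consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")   # the stuffed bit
            run = 0
    return "".join(out)

def destuff(bits: str) -> str:
    """Receiver side: drop the bit that follows every run of five 1s."""
    out, run, i = [], 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            i += 1            # skip the stuffed bit (assumed to be 0)
            run = 0
        i += 1
    return "".join(out)

print(destuff("01111100101"))   # 0111110101, i.e. option (B)
print(stuff("0111110101"))      # 01111100101
```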
I want to work out the structure of an unknown binary file generated by a Fortran routine. For this I downloaded a hex editor. I am fairly new to the whole concept. I can see some character strings in the conversion tool. However, the rest is just dots and junk characters.
I tried an online converter, but it only converts to decimal. Is there any way to figure out which hex values represent integers and which represent reals?
I also referred to following thread, but I couldn't get much out of it.
Hex editor for viewing combined string and float data
Any help would be much appreciated. Thanks
The short answer is no. If you really know nothing of the format, then you are stuck. You might see some "obvious" text, in some language, but beyond that, it's pretty much impossible. Your Hex editor reads the file as a group of bytes, and displays, usually, ASCII equivalents beside the hex values. If a byte is not a printable ASCII character, it usually displays a .
So, text aside, if you see a value $31 in the file, you have no way of knowing if this represents a single character ('1'), or is part of a 2 byte word, or a 4 byte long, or indeed an 8 byte floating point number.
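To illustrate the ambiguity, here is a small Python sketch that reads the same four bytes in three different ways. The byte values and the little-endian byte order are assumptions chosen for the example, not something taken from your file.

```python
import struct

raw = bytes([0x00, 0x00, 0x80, 0x3F])      # four example bytes from a dump

as_int32 = struct.unpack("<i", raw)[0]     # one little-endian 4-byte integer
as_float32 = struct.unpack("<f", raw)[0]   # one little-endian 4-byte real
as_int16s = struct.unpack("<2h", raw)      # two little-endian 2-byte integers

print(as_int32)     # 1065353216
print(as_float32)   # 1.0
print(as_int16s)    # (0, 16256)
```

All three readings are valid; only knowledge of the format tells you which one the writing program meant.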
In general, that is going to be pretty hard! Some of the following ideas may help.
Can you give the FORTRAN program some inputs that make it change the length/amount of output data? E.g. can you make it produce 1 unit of output, then 8 units of output, then 64 units of output - by unit I mean the number of values it outputs if that is what it does. If so, you can plot output file length against number of units of output and the intercept will tell you how many bytes the header is, if any. So, for example, if you can make it produce one number and you get an output file that is 24 bytes long, and when you make it produce 2 numbers the output file is 28 bytes long, you might deduce that there is a 20 byte header at the start and then 4 bytes per number.
Can you generate some known output? E.g. can you make it produce zero, or 255, as an output? If so, you can search for zeroes and FF in your output file and see if you can locate them.
Make it generate a zero (or some other number) and save the output file, then make it generate a one (or some other, different number) and save the output file, then difference the files to deduce where that number is located. Or just make it produce any two different numbers and see how much of the output file changes.
Do you have another program that can understand the file? Or some way of knowing any numbers in the file? If so, get those numbers and convert them into hex and look for their positions in the hex dump.
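Two of the ideas above (estimating the header size from file lengths, and differencing two outputs) can be sketched in Python. The function names are hypothetical, and the sketch assumes the file is a fixed header followed by equally sized values, which may not hold for your file.

```python
def header_and_item_size(n1: int, size1: int, n2: int, size2: int):
    """Fit size = header + n * item_size through two (count, file size) points."""
    item_size = (size2 - size1) // (n2 - n1)
    header = size1 - n1 * item_size
    return header, item_size

def changed_offsets(a: bytes, b: bytes):
    """Byte offsets at which two dumps differ (compared over their common length)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

# The example from the text: 1 number -> 24 bytes, 2 numbers -> 28 bytes.
print(header_and_item_size(1, 24, 2, 28))   # (20, 4)
```

For real files, the two dumps would be read with open(path, "rb").read() and passed to changed_offsets; the offsets that change when only one input value changes are where that value is probably stored.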
I have some products which have 2d GS1 bar codes on them. Most have the format 01.17.10 which is GTIN.Expiry Date.Lot Number.
This makes sense as 01 and 17 are fixed length, so can be parsed easily, just by splitting the string in the appropriate place.
However, I also have some in the format 01.10.17.21 (GTIN.Lot.Expiry.Serial Number) which doesn't make sense because Lot and Serial number are variable length, meaning I cannot use position to decode the various elements. Also, I cannot search for the AIs as they could legitimately appear in the data.
It seems that I've no way of reliably decoding this format. Am I missing something?
Thanks!
According to the GS1 website, "More than one AI can be carried in one bar code. When this happens, AIs with a fixed length data content (e.g., SSCC has a fixed length of 18 digits) are placed at the beginning and AI with variable lengths are placed at the end. If more than one variable length AI is placed in one bar code, then a special "function" character is used to tell the scanner system when one ends and the other one starts."
So it looks like they intend for you to order your AIs with the fixed-width identifiers first, then separate the variable-width fields with a function character, which appears to be FNC1. How that is implemented will depend on the barcode symbology you are using; it may be different between DataMatrix, Code 128 and QR Code, for example.
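As a rough sketch of how decoding could work once the separator is available, here is a small Python parser. It assumes the scanner delivers FNC1 as the ASCII group separator (character 29), which is common but worth checking for your scanner, and it only knows the four AIs from the question; the name parse_gs1 and the sample data are made up.

```python
GS = "\x1d"  # ASCII 29; assumed to be how the scanner transmits FNC1

# Assumed subset of the AI table: fixed data length, or None if variable.
AI_LENGTHS = {"01": 14, "17": 6, "10": None, "21": None}

def parse_gs1(data: str) -> dict:
    fields, i = {}, 0
    while i < len(data):
        if data[i] == GS:            # separator closing a variable field
            i += 1
            continue
        ai = data[i:i + 2]
        if ai not in AI_LENGTHS:
            raise ValueError(f"unknown AI {ai!r} at position {i}")
        i += 2
        length = AI_LENGTHS[ai]
        if length is None:           # variable: read up to next GS or the end
            end = data.find(GS, i)
            end = len(data) if end == -1 else end
        else:                        # fixed: read exactly `length` characters
            end = i + length
        fields[ai] = data[i:end]
        i = end
    return fields

sample = "0109506000134352" + "10LOT42" + GS + "17250101" + "21SER17"
print(parse_gs1(sample))
```

Because a field ends either at its fixed length or at the separator, an AI-like digit pair inside the data (the "17" in the made-up serial number above) is not mistaken for a new field.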
I have been trying to solve Project Euler's problem #59 for a while, and I am having trouble because some of it seems somewhat more ambiguous than previous problems.
As background, the problem says that the given text file is encrypted text with the ASCII codes saved as numbers. The encryption method is to XOR 3 lowercase letters cyclically with the plaintext (so it is reversible). The problem asks for the key that decrypts the file to English text. How should I restrict the character set of my output to get the answer, without trying to sift through all possible plaintexts (26^3)?
I have tried restricting to letters, spaces, and punctuation, and that did not work.
To clarify: I want to determine, out of all printable ASCII characters, which ones I can probably discard and which ones I can expect to be in the plaintext string.
Have you tried two of the most "basic" and common tools in analyzing the algorithm used?
Analyze the frequency of the characters and try to match it against English letter frequency
Bruteforce using keys from a wordlist, most often common words are used as keys by "dumb" users
To analyze the frequency for this particular problem you would have to split the string every third element since the key is of length 3, you should now be able to produce three columns:
79 59 12
2 79 35
8 28 20
2 3 68
...
You have to analyse the frequency for each column separately, since every value in a column was XORed with the same key letter.
OK, I actually took the time to construct the three complete columns, counted the frequencies in each, and got the two most frequent items for each column:
Col1 Col2 Col3
71 79 68
2 1 1
Now if you check for instance: http://en.wikipedia.org/wiki/Letter_frequency
That page lists the most frequent letters. Don't forget that the text also contains spaces and other characters which are not listed on that page, but I think you can assume that space is the most frequent character.
So now it is just a matter of XORing the most frequent values in the table I provided with the most frequent characters in English, and seeing whether you get lowercase letters. I found a three-letter word which I think is the answer using only this data.
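A minimal Python sketch of this column-wise frequency approach, assuming the ciphertext is already a list of integers and that space is the most common plaintext character in every column. The function names and the demo text and key are made up, so the problem is not spoiled.

```python
from collections import Counter

def xor_with_key(data: list[int], key: str) -> list[int]:
    """XOR each value with the key letters, applied cyclically."""
    return [c ^ ord(key[i % len(key)]) for i, c in enumerate(data)]

def guess_key(cipher: list[int], key_len: int = 3, common: str = " ") -> str:
    """Guess each key letter by assuming `common` is the most frequent plaintext character."""
    key = []
    for offset in range(key_len):
        column = cipher[offset::key_len]               # every key_len-th value
        most_frequent = Counter(column).most_common(1)[0][0]
        key.append(chr(most_frequent ^ ord(common)))
    return "".join(key)

# Demo on synthetic data with a made-up key.
demo_plain = "a b c d e f g h i j k l m n o p q r s t u v w x y z "
demo_cipher = xor_with_key([ord(ch) for ch in demo_plain], "key")
print(guess_key(demo_cipher))   # key
```

If a guessed letter is not lowercase, the assumption about the most frequent character is probably wrong for that column, and the second most frequent value is worth trying.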
Good luck and by the way, it was a nice problem!
A possible solution is to simply assume the presence of a given three-character sequence in the encrypted text. You can use a three-letter word, or a three letter sequence which is likely to appear in English text (e.g. " a ": the letter 'a' enclosed between two spaces). Then simply try all possible positions of that sequence in the encrypted text. Each position allows you to simply recompute the key, then decrypt the whole text into a file.
Since the original text has length 1201, you get 1199 files to skim through. At that point it is only a matter of patience, but you can make it much faster by using a simple text search utility on another frequent sequence in English (e.g. "are"), for instance with the Unix tool grep.
I did just that, and got the decrypted text in less than five minutes.
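Here is a rough Python sketch of the idea. Instead of writing one file per position, it keeps only the keys that consist of lowercase letters, which is a shortcut that relies on the problem statement and is not part of the procedure described above. The function name and demo data are made up.

```python
import string

def keys_from_crib(cipher: list[int], crib: str = " a ") -> set[str]:
    """Place the crib at every position and collect the all-lowercase keys it implies."""
    n = len(crib)                      # assumed equal to the key length
    keys = set()
    for pos in range(len(cipher) - n + 1):
        key = [0] * n
        for j in range(n):
            # The key letter used at text index pos + j is key[(pos + j) % n].
            key[(pos + j) % n] = cipher[pos + j] ^ ord(crib[j])
        candidate = "".join(chr(k) for k in key)
        if all(ch in string.ascii_lowercase for ch in candidate):
            keys.add(candidate)
    return keys

# Demo on synthetic data with a made-up key.
demo_plain = "there is a cat on the mat"
demo_cipher = [ord(ch) ^ ord("xyz"[i % 3]) for i, ch in enumerate(demo_plain)]
print("xyz" in keys_from_crib(demo_cipher))   # True
```

Each surviving key still has to be checked by decrypting the whole text, for instance by searching the result for another frequent sequence as suggested above.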
I'll admit upfront I'm not familiar with an XOR cipher.
However, it seems very similar to the concept of the Vigenère cipher, especially in the line where they mention that for unbreakable encryption the key length equals the message length. That screams Vernam cipher.
As mentioned in the other answer, breaking a Vigenère cipher involves a probabilistic approach. I will not go into detail because most of the theory I learned was relatively complicated, but it can be found here, keeping in mind that Vigenère is a series of Caesar ciphers.
The problem makes it easy for you though because you already know the keylength. Because of that, as you mentioned, you can simply bruteforce by trying every single 3 letter combination.
Here's what I would do: take a reasonably sized chunk of the ciphertext, say maybe 10-20 characters, and try the brute force approach on that. Keep track of all the keys that seem to create understandable sequences of letters and then use those on the whole ciphertext. That way we can employ the obvious brute forcing method, but without bruteforcing the entire problem, so I don't think you'll have to worry about limiting your output.
That said, I agree that as you're creating the output, if you ever get a non printable character, you could probably break your loop and move on to the next key. I wouldn't try anything more specific than that because who knows what the original message could have, never make assumptions about the data you're dealing with. Short circuiting logic like that is always a good idea, especially when implementing a brute force solution.
Split the ciphertext into 3.
Ciphertext1 comprises the 1st, 4th, 7th, 10th...numbers
Ciphertext2 comprises the 2nd, 5th, 8th, 11th...numbers
Ciphertext3 comprises the 3rd, 6th, 9th, 12th...numbers
Now you know that each ciphertext is encrypted with a single key letter. Now do a standard frequency analysis on each one. That should give you enough clues as to what the letter is.
I just solved this problem a few days ago. Without spoiling it for you, I want to describe my approach to this problem. Some of what I say may be redundant to what you already knew, but was part of my approach.
First I assumed that the key is exactly as described: three lowercase ASCII letters. So I began brute forcing at 'aaa' and went to 'zzz'. While decrypting, if any resulting byte was lower than 32 (the ASCII value of space, the lowest "printable" ASCII value) or higher than 126 (the ASCII value of the tilde '~', the highest printable character in ASCII), then I assumed the key was invalid, because any value outside the range 32 to 126 would be an invalid character for a plain-text stretch of English. As soon as a single byte was outside this range, I stopped decrypting and went to the next possible key.
Once I decrypted the entire message using a particular key (after passing the first test of all bytes being printable characters), I needed a way to verify it as a valid decryption. I was expecting the result to be a simple list of words with no particular order or meaning. Through other cryptography experience, I thought back to letter frequency, and most simply that your average English word in text is 5 characters long. The file contains 1201 input bytes. So that would mean that there would be (on average) 240 words. After decrypting, I counted how many spaces were in the resulting output string. Since Project Euler is anything but average, I compared the number of spaces to 200 accounting for longer, more obscure words. When an output had more than 200 spaces in it, I printed out the key it was decrypted with and the output text. The one and only output that has more than 200 spaces is the answer. Let me tell you that it's more than obvious that you have the answer when you see it.
Something to point out is that the answer to the question is NOT the key. It is the sum of all the ASCII values of the output string. This approach also solves the problem well under the one-minute mark; in fact, it runs in around 3 or 4 seconds.
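For reference, the approach can be sketched in Python roughly like this. It is not my original code; the function name is made up, `cipher` is assumed to be the file's numbers already loaded into a list of integers, and the threshold of 200 spaces is the one described above.

```python
from itertools import product
from string import ascii_lowercase

def brute_force(cipher: list[int], min_spaces: int = 200):
    """Yield (key, plaintext) for each lowercase 3-letter key passing both filters."""
    for letters in product(ascii_lowercase, repeat=3):
        key = [ord(ch) for ch in letters]
        plain = []
        for i, c in enumerate(cipher):
            p = c ^ key[i % 3]
            if p < 32 or p > 126:     # non-printable byte: abandon this key
                break
            plain.append(chr(p))
        else:                          # runs only if no byte was rejected
            text = "".join(plain)
            if text.count(" ") > min_spaces:
                yield "".join(letters), text
```

With the real ciphertext loaded, something like `for key, text in brute_force(cipher): print(key, sum(map(ord, text)))` would print each surviving key together with the sum of the ASCII values of its plaintext.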