Finding spaces in many-time pad encryption

I'm currently completing an online course in cryptography and have been given an exercise to complete. The course has been running for a while, and I know the answer is on the web, but I would like to work it out myself through action and research.
I have a list of 13 ciphertexts produced by a one-time pad that was used many times (the same key has been used to encrypt every plaintext). My task is to decrypt the last ciphertext.
The steps I have taken so far are based on the cribbing techniques at the following locations:
http://adamsblog.aperturelabs.com/2013/05/back-to-skule-one-pad-two-pad-me-pad.html
https://crypto.stackexchange.com/questions/6020/many-time-pad-attack
and I'm using the following tool to XOR the ciphertexts.
In the tutorial I'm following, the author suggests that the first step is to identify spaces. I have tried to follow the steps, but I still cannot identify the spaces once I XOR the ciphertexts.
When I XOR the first ciphertext (cipher 1) with ciphers 2 and 3, I get the following:
15040900180952180C4549114F190E0159490A49120E00521201064C0A4F144F281B13490341004F480203161600000046071B1E4119061D1A411848090F4E0D0000161A0A41140C16160600151C00170B0B090653014E410D4C530F01150116000307520F104D200345020615041C541A49151744490F4C0D0015061C0A1F454F1F4509074D2F01544C080A090028061C1D002E413D004E0B141118
000D064819001E0303490A124C5615001647160C1515451A041D544D0B1D124C3F4F0252021707440D0B4C1100001E075400491E4F1F0A5211070A490B080B0A0700190D044E034F110A00001300490F054F0E08100357001E0853D4315FCEACFA7112C3E55D74AAF3394BB08F7504A8E5019C4E3E838E0F364946F31721A49AD2D24FF6775EFCB4F79FE4217A01B43CB5068BF3B52CA76543187274
000000003E010609164E0C07001F16520D4801490B09160645071950011D0341281B5253040F094C0D4F08010545050150050C1D544D061C5415044548090717074F0611454F164F1F101F411A4F430E0F0219071A0B411505034E461C1B0310454F12480D55040F18451E1B1F0A1C541646410D054C0D4C1B410F1B1B03149AD2D24FF6775EFCB4F79FE4217A01B43CB5068BF3B52CA76543187274
I'm getting confused as to where the spaces are, given that the ASCII table gives a space the value 20 (hex).
Sorry if this is not enough information; I can provide more if required. Thanks, Mark.
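To see why spaces are recognisable: the key cancels out, so c1 ⊕ c2 = m1 ⊕ m2, and XORing a letter with 0x20 only flips its case. A position where one plaintext has a space and the other a letter therefore shows up as an ASCII letter in the XOR result. A minimal Python sketch, assuming the inputs are hex strings as in the question; the helper names are made up for illustration:

```python
def xor_hex(h1: str, h2: str) -> bytes:
    """XOR two hex-encoded strings byte by byte (zip truncates to the shorter one)."""
    return bytes(a ^ b for a, b in zip(bytes.fromhex(h1), bytes.fromhex(h2)))


def likely_space_positions(x: bytes) -> list:
    """Positions where m1 ^ m2 is an ASCII letter, i.e. probably a space in one plaintext.

    Space (0x20) XOR a letter flips its case (ord('e') ^ 0x20 == ord('E')), while two
    lowercase letters XOR to a value below 0x20, so letter/letter pairs are not flagged.
    """
    return [i for i, b in enumerate(x) if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A]


if __name__ == "__main__":
    key = bytes([0x5A, 0x13, 0x99, 0x01, 0x7F, 0x42, 0xC3])  # made-up key for the demo
    c1 = bytes(m ^ k for m, k in zip(b"the cat", key))
    c2 = bytes(m ^ k for m, k in zip(b"go home", key))
    print(likely_space_positions(xor_hex(c1.hex(), c2.hex())))  # [2, 3]
```

A flagged position does not say which of the two plaintexts holds the space; comparing against a third ciphertext (c1 ⊕ c3) settles that. Punctuation can occasionally produce false positives, so treat the result as a hint.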

The question you're linking to has the correct answer.
You don't appear to use that method. You should have c1⊕c2 and c1⊕c3, but your question contains three blocks of strings, one with 6 lines, one with 5 lines and one with 4 lines. That's insufficient for us to even guess what your problem is. Go back to the linked answer, read it, and follow the steps listed there.
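Roughly the idea behind the linked method, as a minimal Python sketch that assumes the plaintexts are mostly ASCII letters and spaces: for each ciphertext, count at every position how many of the other ciphertexts give an ASCII letter when XORed with it. Where most of them do, that plaintext probably has a space, so the key byte is the ciphertext byte XOR 0x20. The function names and the 0.7 voting threshold are illustrative choices, not taken from the linked posts.

```python
from collections import Counter
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # zip stops at the shorter input, so ciphertexts are compared on their overlap only
    return bytes(x ^ y for x, y in zip(a, b))


def is_ascii_letter(b: int) -> bool:
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def recover_key(ciphertexts: List[bytes], threshold: float = 0.7) -> List[Optional[int]]:
    """Guess key bytes from likely spaces; None marks positions that could not be guessed."""
    key: List[Optional[int]] = [None] * max(len(c) for c in ciphertexts)
    partners = len(ciphertexts) - 1
    for i, ci in enumerate(ciphertexts):
        votes: Counter = Counter()  # position -> partners whose XOR with ci is a letter there
        for j, cj in enumerate(ciphertexts):
            if i == j:
                continue
            for pos, b in enumerate(xor_bytes(ci, cj)):
                if is_ascii_letter(b):
                    votes[pos] += 1
        for pos, count in votes.items():
            if count >= threshold * partners:
                # m_i[pos] is probably a space, so key[pos] = c_i[pos] ^ 0x20
                key[pos] = ci[pos] ^ 0x20
    return key


def partial_decrypt(cipher: bytes, key: List[Optional[int]]) -> str:
    # '_' marks positions whose key byte is still unknown
    return "".join(chr(c ^ k) if k is not None else "_" for c, k in zip(cipher, key))
```

Usage would be along the lines of `key = recover_key([bytes.fromhex(h) for h in hex_ciphertexts])` followed by `partial_decrypt(target, key)` on the last ciphertext, where `hex_ciphertexts` and `target` are hypothetical names for your data. The remaining gaps are where guessed words (cribs) are filled in by hand.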

Related

What are the 8 rules for 8 Mask patterns for QR code?

I was making a QR code manually and I'm stuck on the data masking part.
The section where I became confused is underlined in the picture below; it says that there are 8 rules, one for each masking pattern.
The website where I read about data masking patterns is here:
https://www.thonky.com/qr-code-tutorial/data-masking
The rules for the first and second masking patterns have been stated, namely:
For mask pattern #1, every even-numbered row in the QR matrix is masked.
For mask pattern #2, every third column in the QR matrix is masked.
The rules for the third to eighth masking patterns weren't stated, and I wasn't able to find them by googling. Each rule applies to one particular masking pattern and there are eight rules, so there are eight masking patterns to create.
My question is:
What are the 8 rules for 8 Mask patterns for QR code?
Thank you for helping me out!
I might be a little late, but I was doing the same thing and found this image on Wikipedia:
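For reference, the eight mask conditions as given in the Thonky tutorial (and the QR code standard, ISO/IEC 18004) can be written as predicates on the 0-indexed row and column. A module is switched between dark and light wherever its condition is true, and only data and error-correction modules are masked, never function patterns or the reserved format/version areas. The tutorial numbers the masks 0 to 7, so the question's #1 and #2 are entries 1 and 2 below. `apply_mask` and its `is_function` grid are illustrative names, not part of any library.

```python
# QR mask conditions, numbered 0-7 as in the Thonky tutorial. r is the 0-indexed row,
# c the 0-indexed column; a module is switched when its condition is True.
MASK_CONDITIONS = [
    lambda r, c: (r + c) % 2 == 0,
    lambda r, c: r % 2 == 0,
    lambda r, c: c % 3 == 0,
    lambda r, c: (r + c) % 3 == 0,
    lambda r, c: (r // 2 + c // 3) % 2 == 0,
    lambda r, c: (r * c) % 2 + (r * c) % 3 == 0,
    lambda r, c: ((r * c) % 2 + (r * c) % 3) % 2 == 0,
    lambda r, c: ((r + c) % 2 + (r * c) % 3) % 2 == 0,
]


def apply_mask(matrix, is_function, pattern):
    """Return a copy of `matrix` (rows of 0/1) with maskable modules flipped wherever
    mask `pattern` applies; `is_function` marks modules that must not be masked."""
    cond = MASK_CONDITIONS[pattern]
    return [
        [bit ^ 1 if not is_function[r][c] and cond(r, c) else bit for c, bit in enumerate(row)]
        for r, row in enumerate(matrix)
    ]
```

In practice an encoder applies all eight masks, scores each result with the penalty rules from the same tutorial, and keeps the one with the lowest penalty.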

Assigning result of chain matrix multiplication to a variable in Maxima

I'm trying to assign the result of a chain matrix multiplication in Maxima to a new variable. As a new user, I'm not sure why line %o6 isn't the same as the previous one and doesn't fully evaluate the chain. Also, when I enter the new variable name "B", I simply get "B" back rather than ([32, 32], [32, 32]). I know these are basic questions, but I've searched the documentation and tutorials for a number of hours, and the syntax I'm supposed to use to get the output I expected is still unclear to me.
I can't tell for sure, but it appears that the problem is that B : A.A.A was entered with the Shift key held for at least one of the spaces, and Shift+Space is interpreted as a non-breaking space instead of an ordinary space. This appears to be a known bug, or at least a serious misfeature, in wxMaxima; see: https://github.com/wxMaxima-developers/wxmaxima/issues/1031
(I say misfeature because Shift+Space --> non-breaking space is documented in the wxMaxima documentation, but it seems like a classic example of "bad affordance"; it is all too easy to do the wrong thing without knowing it. Anyway this is just my opinion.)
I built wxMaxima from current source code and it appears that Shift+Space is now not interpreted as non-breaking space in code, so B : A.A.A should have the expected effect even if shift key is held while typing space. The current version is 19.07.0-DevelopmentSnapshot. I poked through the commit log a bit, but I can't figure out which commit changed the behavior of Shift+Space, so it's possible that the problem is not fixed and it is just fortuitous that I am not encountering it.
There are two workarounds if one doesn't want to risk an upgrade: (1) omit the spaces, or (2) be careful to type spaces without holding Shift.
Hope this is helpful in some way.

How to find out the longest definition entry in an English dictionary text file?

I asked over at English Stack Exchange, "What is the English word with the longest single definition?" The best answer they could give was that I would need a program that finds the longest entries in a (text) file of dictionary definitions by counting the number of characters or words in each entry, and then lists the longest ones. I also asked at Super User, but they couldn't come up with an answer either, so I decided to give it a shot here.
I managed to find a dictionary file which, converted to text, has the following format:
a /a/ indefinite article (an before a vowel) 1 any, some, one (have a cookie). 2 one single thing (there’s not a store for miles). 3 per, for each (take this twice a day).
aardvark /ard-vark/ n an African mammal with a long snout that feeds on ants.
abacus /a-ba-kus, a-ba-kus/ n a counting frame with beads.
As you can see, each definition comes after the pronunciation (enclosed in slashes) and then either:
1) ends with a period, or
2) ends before an example (enclosed in parentheses), or
3) follows a number and ends with a period or before an example, when a word has multiple definitions.
What I need, then, is a function or program that can distinguish each definition (treating multiple definitions of a single word as separate ones), count the number of characters and/or words in each (ignoring the examples in parentheses, since they are not part of the definition proper), and finally list the longest definitions (a top 20 or so should be enough to compare). If the file format is an issue, I can convert the file to PDF, EPUB, etc. with no problem. Ideally I would also like to be able to choose between counting length in characters and in words, if possible.
How should I go about doing this? I have a little experience from programming classes I took a long time ago, but it's better to assume I know close to nothing about programming.
Thanks in advance.
I'm not going to write code for you, but I'll help think the problem through. Pick the programming language you're most familiar with from long ago, and give it a whack. When you run in to problems, come back and ask for help.
I'd chop this task up into a bunch of subproblems:
Read the dictionary file from the filesystem.
Chunk the file up into discrete entries. If it's a text file like you show, most programming languages have a facility to easily iterate linewise through a file (i.e. take a line ending character or character sequence as the separator).
Filter bad entries: in your example, your lines appear separated by an empty line. As you iterate, you'll just drop those.
Use your human observation and judgement to look for strong patterns in the data that you can communicate as firm rules; this is one of the central activities of programming. You've already started identifying some patterns in your question, e.g.:
All entries have a preamble with the pronunciation and part of speech.
A multiple definition entry will be interspersed with lone numerals.
Otherwise, a single definition just follows the preamble.
Write the rules you've invented into code. It'll go something like this: first find a way to lop off the word itself and the preamble. With the remainder, identify multiple-definition entries by the presence of lone numerals or similar; if there are none, treat the entry as a single definition.
For each entry, iterate over each of the one-or-more definitions you've identified.
Write a function that will count a definition either word-wise or character-wise. If word-wise, you'll probably tokenize based on whitespace. Counting the length of a string character-wise is trivial in most programming languages. Why not implement both!
Keep a data structure in memory as you iterate the file to track "longest". For each definition in each entry, after you apply the length calculation, you'll compare against the previous longest entry. If the new one is longer, you'll record this new leading word and its word count in your data structure. Comparing 'greater than' and storing a variable are fundamental in most programming languages, so while this is the real meat of your program, this shouldn't be hard.
Implement some way to display your results once iteration is done. This may be as simple as a print statement.
Finally, write the glue code that lets you execute the program easily. A program like this could easily be a command-line tool that takes one or two arguments (the path to the file to be analyzed, perhaps you pass your desired counting method 'character|word' as an argument too, since you implemented both). Different languages vary in how easy it is to create an executable to run from the command line, but most support it, so it's a good option for tasks like this.
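The answer intentionally leaves the coding to you; purely as an illustration of its steps, here is a minimal Python sketch. It assumes each entry is one line in the "headword /pronunciation/ part-of-speech definition" layout shown in the question, that multiple senses are introduced by lone numerals, and that a single-sense entry starts with a one-word part of speech such as "n". The function names and the script name longest_defs.py are hypothetical.

```python
import heapq
import re
import sys

# Assumed layout, taken from the sample: "headword /pronunciation/ part-of-speech definition..."
ENTRY_RE = re.compile(r"^(?P<word>\S.*?)\s+/[^/]*/\s+(?P<body>.+)$")
# Senses are introduced by lone numerals ("... 1 any, some ... 2 one single ...").
# Limitation: a definition that itself contains a lone number is split there too.
SENSE_RE = re.compile(r"\s\d+\s")
# Examples are in parentheses and are not part of the definition proper
EXAMPLE_RE = re.compile(r"\([^)]*\)")


def definitions(line):
    """Return (headword, [definition, ...]); (None, []) for blank or unmatched lines."""
    m = ENTRY_RE.match(line.strip())
    if not m:
        return None, []
    chunks = SENSE_RE.split(m.group("body"))
    if len(chunks) > 1:
        chunks = chunks[1:]  # text before "1" is part of speech and usage notes
    else:
        chunks = chunks[0].split(maxsplit=1)[1:]  # assume a one-word part of speech ("n")
    cleaned = []
    for chunk in chunks:
        text = " ".join(EXAMPLE_RE.sub("", chunk).split()).rstrip(". ")
        if text:
            cleaned.append(text)
    return m.group("word"), cleaned


def measure(definition, mode):
    # "word": whitespace-separated tokens; anything else: characters, spaces included
    return len(definition.split()) if mode == "word" else len(definition)


def longest(lines, mode="word", top=20):
    """Return the `top` longest definitions as (length, headword, definition) tuples."""
    scored = []
    for line in lines:
        word, defs = definitions(line)
        scored.extend((measure(d, mode), word, d) for d in defs)
    return heapq.nlargest(top, scored, key=lambda t: t[0])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python longest_defs.py dictionary.txt [word|char]
    mode = sys.argv[2] if len(sys.argv) > 2 else "word"
    with open(sys.argv[1], encoding="utf-8") as f:
        for length, word, d in longest(f, mode):
            print(f"{length:5d}  {word}: {d}")
```

`heapq.nlargest` stands in for the "keep track of the longest" step; it keeps the top N rather than just the single longest, which matches the question's wish for a top-20 list.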

bit stuffing example

I'm going over an example that reads
Bit Stuffing. Suppose the following bit string is received by the data link layer from the network layer: 01110111101111101111110.
What is the resulting string after bit stuffing? Bold each bit that has been added.
Answer:
011101111011111(0)011111(0)10 (the added bits are shown in parentheses)
How is this answer reached? My understanding is that bit stuffing works by inserting a certain sequence of bits (known as a flag value) at the beginning and end of a frame. What I don't get is:
We aren't told the flag value!
We aren't told how big a frame is, so how do we know where to put the flag?
Additional information: I think this network is Ethernet.
Additional information 2: the flag is 01111110.
Honestly, I think I understand, but isn't the answer incomplete, since they didn't add the flag 01111110 at the beginning or end? They only handled the places where that bit pattern appeared in the message.
Another example: here they do it too.
For framing at the data link layer there are several approaches, and bit-oriented framing is one of them.
The receiver needs a way to know where each frame transmitted on the link starts and ends, so there are framing formats such as HDLC. You can see this.
In many frame formats there is a beginning sequence (marking the start of the frame), an ending sequence (marking the end of the frame), and the body of the frame, which is the data.
The problem that can occur is that the ending sequence appears inside the body, which makes the receiver detect the end of the frame incorrectly.
To prevent this, the sender stuffs extra bits into the body to break up any occurrence of the ending sequence; this technique is known as bit stuffing.
Look at this example:
bit sequence: 110101111101011111101011111110 (without bit stuffing)
bit sequence: 1101011111(0)01011111(0)101011111(0)110 (with bit stuffing)
after 5 consecutive 1 bits, a 0 bit is stuffed.
stuffed bits are shown in parentheses.
Consider: 011101111011111(0)011111(0)10 (stuffed bits in parentheses)
After every run of five consecutive 1 bits, it stuffs a 0. This assumes zero-bit stuffing, which is common.
Bit Stuffing:
Input Stream: 0110111111100111110111111111100000
Stuffed Stream: 01101111101100111110011111011111000000
Unstuffed Stream: 0110111111100111110111111111100000
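The rule used in these answers (insert a 0 after every five consecutive 1 bits, HDLC-style zero-bit insertion) and its reverse can be sketched in Python as follows. Bits are kept as strings of "0"/"1" for readability, and the function names are illustrative. The `frame` helper shows where the flag from the question would go if an exercise asked for it: around the stuffed payload, never inside it.

```python
FLAG = "01111110"  # the flag given in the question


def stuff(bits: str, run: int = 5) -> str:
    """Zero-bit insertion: after every `run` consecutive 1 bits, insert a 0."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == "1":
            ones += 1
            if ones == run:
                out.append("0")  # stuffed bit; it also ends the current run of 1s
                ones = 0
        else:
            ones = 0
    return "".join(out)


def unstuff(bits: str, run: int = 5) -> str:
    """Reverse of stuff(): drop the 0 that follows every `run` consecutive 1 bits.

    Assumes the input came from stuff(); on a real link a 1 in that position would
    signal a flag or an error, which this sketch does not handle.
    """
    out = []
    ones = 0
    drop_next = False
    for b in bits:
        if drop_next:
            drop_next = False
            ones = 0
            continue
        out.append(b)
        if b == "1":
            ones += 1
            if ones == run:
                drop_next = True
                ones = 0
        else:
            ones = 0
    return "".join(out)


def frame(payload: str) -> str:
    # The flags surround the stuffed payload; they are never stuffed themselves
    return FLAG + stuff(payload) + FLAG


if __name__ == "__main__":
    print(stuff("01110111101111101111110"))  # 0111011110111110011111010
```

Because stuffing caps every run of 1s inside the payload at five, the six 1s of the flag can never appear there, which is exactly the property the receiver relies on to find the frame boundaries.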
There isn't enough information in the question to answer it fully for "any" protocol, but HDLC, for example, bit-stuffs frame content with a 0 bit after five consecutive 1 bits (Ethernet itself does not use bit stuffing), which seems to be the case here.
As for the rest of your question about the framing, a hint is to look at what is supposed to be passed from the data link layer to the network layer. Is it a pre-framed bit of data or just the frame's content you're looking at?
Whenever you have to perform bit stuffing, you will always be given the FLAG bit value that marks the start and end of a frame.
The easiest trick is to remove the last two bits of the flag and note down the resulting bit series; whenever that series appears in your data, you stuff one bit there.
For example:
Given data is 011111011110
Here my FLAG is 0111110, so my new pattern will be 01111.
I will stuff a bit whenever I get 01111 in my data series,
so the data after stuffing will be:
01111(0)101111(0)0
The brackets mark the stuffed bits.

Which printable ASCII characters will usually appear in an english text?

I have been trying to solve Project Euler's problem #59 for a while, and I am having trouble because some of it seems somewhat more ambiguous than previous problems.
As background, the problem says that the given text file is encrypted text with the ASCII codes saved as numbers. The encryption method is to XOR 3 lowercase letters cyclically with the plaintext (so it is reversible). The problem asks for the key that decrypts the file to English text. How should I restrict the character set of my output to get the answer, without trying to sift through all possible plaintexts (26^3)?
I have tried restricting to letters, spaces, and punctuation, and that did not work.
To clarify: I want to determine, out of all printable ASCII characters, which ones I can probably discard and which ones I can expect to be in the plaintext string.
Have you tried two of the most "basic" and common tools in analyzing the algorithm used?
Analyze the frequency of the characters and try to match it against English letter frequency
Brute-force using keys from a wordlist, since "dumb" users most often use common words as keys
To analyze the frequency for this particular problem, you have to take every third element of the sequence, since the key has length 3; you should then be able to produce three columns:
79 59 12
2 79 35
8 28 20
2 3 68
...
You have to analyze the frequency of each column separately, since every value in a column is XORed with the same key letter.
OK, I actually took my time, constructed the 3 complete columns, counted the frequencies in each column and got the two most frequent items for each column:
Rank  Col1  Col2  Col3
1st   71    79    68
2nd   2     1     1
Now if you check for instance: http://en.wikipedia.org/wiki/Letter_frequency
There you have the most frequent letters. Don't forget that the text also contains spaces and other characters that are not on that page, but I think you can assume that space is the most frequent character.
So now it is just a matter of XORing the most frequent values in the table I provided with the most frequent characters in English and seeing whether you get lowercase letters. I found a three-letter word, which I think is the answer, with only this data.
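A minimal sketch of this column-wise frequency analysis, assuming the ciphertext has already been read into a `bytes` object and, as this answer suggests, that space is the most common plaintext character in every column. It deliberately does not include the problem's data, so it won't spoil the answer; the function names are illustrative.

```python
from collections import Counter
from typing import List


def columns(cipher: bytes, key_len: int = 3) -> List[bytes]:
    # Column k holds bytes k, k + key_len, k + 2*key_len, ...; all are XORed with key[k]
    return [cipher[k::key_len] for k in range(key_len)]


def top_values(cipher: bytes, key_len: int = 3, n: int = 2) -> List[List[int]]:
    # The n most frequent byte values in each column, most frequent first
    return [[v for v, _ in Counter(col).most_common(n)] for col in columns(cipher, key_len)]


def guess_key(cipher: bytes, key_len: int = 3, common: int = ord(" ")) -> bytes:
    # Assumes the most frequent plaintext character in every column is `common` (space)
    return bytes(top[0] ^ common for top in top_values(cipher, key_len, 1))
```

`top_values` reproduces a table like the one above; if `guess_key` does not yield three lowercase letters, trying the second most frequent value, or a letter such as 'e' for `common`, is the natural next step.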
Good luck and by the way, it was a nice problem!
A possible solution is to simply assume the presence of a given three-character sequence in the encrypted text. You can use a three-letter word, or a three-letter sequence which is likely to appear in English text (e.g. " a ": the letter 'a' enclosed between two spaces). Then simply try all possible positions of that sequence in the encrypted text. Each position allows you to recompute the key and then decrypt the whole text into a file.
Since the original text has length 1201, you get 1199 files to skim through. At that point it is only a matter of patience, but you can make it much faster by using a simple text search utility on another frequent sequence in English (e.g. "are"), for instance with the Unix tool grep.
I did just that, and got the decrypted text in less than five minutes.
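The same crib-placement idea as a minimal Python sketch. Instead of writing 1199 files and running grep, it filters each candidate decryption in memory for a second frequent sequence. It assumes the crib is exactly as long as the key (3 here), so that a single placement fixes the whole key; `crib_candidates` and `xor_cycle` are illustrative names.

```python
from typing import List, Tuple


def xor_cycle(data: bytes, key: bytes) -> bytes:
    # XOR with the key repeated cyclically, as described in the problem
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def crib_candidates(cipher: bytes, crib: bytes = b" a ",
                    needle: bytes = b"are") -> List[Tuple[int, bytes, bytes]]:
    """Try the crib at every offset; keep decryptions that contain `needle`.

    Assumes len(crib) equals the key length, so one placement determines every key byte.
    """
    key_len = len(crib)
    hits = []
    for pos in range(len(cipher) - key_len + 1):
        key = bytearray(key_len)
        for i, ch in enumerate(crib):
            # cipher[pos + i] = plain[pos + i] ^ key[(pos + i) % key_len]
            key[(pos + i) % key_len] = cipher[pos + i] ^ ch
        plain = xor_cycle(cipher, bytes(key))
        if needle in plain:
            hits.append((pos, bytes(key), plain))
    return hits
```

For a 1201-byte ciphertext this loops over 1199 positions, matching the count above; the hits that remain are few enough to read by eye.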
I'll admit upfront I'm not familiar with an XOR cipher.
However, it seems very similar to the concept of the Vigenère cipher, especially in the line where they mention that for unbreakable encryption the key length equals the message length. That screams Vernam cipher.
As mentioned in the other answer, the strategic approach to breaking a Vigenère cipher is probabilistic. I will not go into detail because most of the theory I learned was relatively complicated, but it can be found here, keeping in mind that Vigenère is a series of Caesar ciphers.
The problem makes it easy for you though because you already know the keylength. Because of that, as you mentioned, you can simply bruteforce by trying every single 3 letter combination.
Here's what I would do: take a reasonably sized chunk of the ciphertext, say maybe 10-20 characters, and try the brute force approach on that. Keep track of all the keys that seem to create understandable sequences of letters and then use those on the whole ciphertext. That way we can employ the obvious brute forcing method, but without bruteforcing the entire problem, so I don't think you'll have to worry about limiting your output.
That said, I agree that as you're creating the output, if you ever get a non-printable character you can break out of your loop and move on to the next key. I wouldn't try anything more specific than that, because who knows what the original message could contain; never make assumptions about the data you're dealing with. Short-circuiting logic like that is always a good idea, especially when implementing a brute-force solution.
Split the ciphertext into 3.
Ciphertext1 comprises the 1st, 4th, 7th, 10th...numbers
Ciphertext2 comprises the 2nd, 5th, 8th, 11th...numbers
Ciphertext3 comprises the 3rd, 6th, 9th, 12th...numbers
Now you know that each ciphertext is encrypted with the same key letter, so you can do a standard frequency analysis on it. That should give you enough clues as to what the letter is.
I just solved this problem a few days ago. Without spoiling it for you, I want to describe my approach to this problem. Some of what I say may be redundant to what you already knew, but was part of my approach.
First I assumed that the key is exactly as described: three lowercase ASCII letters. So I began brute-forcing at 'aaa' and went to 'zzz'. While decrypting, if any resulting byte was lower than 32 (the ASCII value of space, the lowest "printable" ASCII value) or higher than 126 (the ASCII value of the tilde '~', the highest printable character in ASCII), then I assumed the key was invalid, because any value outside the range 32 to 126 would be an invalid character in a plain-text stretch of English. As soon as a single byte fell outside this range, I stopped decrypting and went on to the next possible key.
Once I decrypted the entire message using a particular key (after passing the first test of all bytes being printable characters), I needed a way to verify it as a valid decryption. I was expecting the result to be a simple list of words with no particular order or meaning. Through other cryptography experience, I thought back to letter frequency, and most simply that your average English word in text is 5 characters long. The file contains 1201 input bytes. So that would mean that there would be (on average) 240 words. After decrypting, I counted how many spaces were in the resulting output string. Since Project Euler is anything but average, I compared the number of spaces to 200 accounting for longer, more obscure words. When an output had more than 200 spaces in it, I printed out the key it was decrypted with and the output text. The one and only output that has more than 200 spaces is the answer. Let me tell you that it's more than obvious that you have the answer when you see it.
Something to point out is that the answer to the question is NOT the key; it is the sum of all the ASCII values of the decrypted text. This approach also solves the problem well under the one-minute mark; in fact, it runs in around 3 or 4 seconds.
