Bit stuffing help. GATE-CS Set 3 2014 - networking

A bit-stuffing based framing protocol uses an 8-bit delimiter pattern of 01111110. If the output bit-string after stuffing is 01111100101, then the input bit-string is
(A) 0111110100
(B) 0111110101
(C) 0111111101
(D) 0111111111
Correct answer given is B.
My question is why a bit is stuffed after five 1's from the left even though the delimiter has six consecutive 1's.
I think a bit should be stuffed only when we get six consecutive 1's.
Correct me if I am wrong.

The given delimiter is 01111110. A delimiter marks the start and end of a frame, so we need to make sure that if the same pattern (01111110) occurs in the data, the receiver does not mistake it for the start or end of a frame but treats it as valid data. That is why, after '011111' in the data bits, one '0' bit is stuffed: stuffing after every run of five 1's guarantees that six consecutive 1's never appear in the stuffed data, so the delimiter cannot occur inside it.
When the receiver sees five consecutive 1's, it checks the next bit: if it is 0, the receiver drops it. If it is 1 instead, the receiver checks the bit after that; if that bit is 0, the pattern is the delimiter, otherwise an error has occurred. This is known as '0' bit stuffing.
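A minimal sketch of the receiver side in Python may make this concrete. The function name unstuff is illustrative, and the sketch assumes its input is only the stuffed payload between two delimiters, so a 1 directly after five consecutive 1's is simply reported as an error:

```python
def unstuff(bits: str) -> str:
    """Remove the 0 that the sender stuffed after every run of five 1's."""
    out = []
    ones = 0              # length of the current run of 1's
    expect_stuffed = False
    for b in bits:
        if expect_stuffed:
            # The bit right after five 1's must be the stuffed 0.
            if b != "0":
                raise ValueError("six consecutive 1's: delimiter or error")
            expect_stuffed = False
            continue      # drop the stuffed bit
        out.append(b)
        ones = ones + 1 if b == "1" else 0
        if ones == 5:
            expect_stuffed = True
            ones = 0
    return "".join(out)
```

Calling unstuff("01111100101") returns "0111110101", which is option (B).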


How can you sort this by prep time without sorting the first line, in Unix with vi or vim?

name;ingredients;diet;prep time;cook time;flavor profile;course;state (region)
Lassi;yogurt,milk,nuts,sugar;vegetarian;5;5;sweet;dessert;Punjab (North)
Papad;urad dal,sev,lemon juice,chopped tomatoes;vegetarian;5;5;spicy;snack;Bihar (South)
...
You might get away with:
:2,$sort n
The range 2,$ makes sure you skip the first line.
The n option ensures a numeric sort.
And also from :help sort:
With [n] sorting is done on the first decimal number in the line (after or inside a {pattern} match). One leading '-' is included in the number.
And since prep time is the first field containing a decimal number, you get the right result.
(This will fail if you have any name, ingredient or diet containing a number.)
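If a number can appear in an earlier field, one way around it outside vim is to sort explicitly on the fourth semicolon-separated field. The following is only a sketch in Python, not a vim feature; the function name sort_by_prep_time is made up for illustration, and it assumes every data row has an integer in the prep time column:

```python
def sort_by_prep_time(text: str) -> str:
    """Sort all rows except the header by the 4th ';'-separated field."""
    header, *rows = text.splitlines()
    # Field index 3 is "prep time"; int() gives a numeric, not lexical, sort.
    rows.sort(key=lambda row: int(row.split(";")[3]))
    return "\n".join([header, *rows])
```

Because the key is taken from a fixed column, a digit inside a name or ingredient no longer affects the order.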

How to split data block in Fortran?

I need to count the number of lines in each block and the number of blocks in order to read the file properly afterwards. Can anybody suggest a sample piece of code in Fortran?
My input file goes like this:
# Section 1 at 50% (Name of the block and its number and value)
1 2 3 (Three numbers in line with random number of lines)
...
1 2 3
# Section 2 at 100% (And then again Name of the block)
1 2 3...
and so on.
My code is below. It works fine with one set of data, but when it meets "#" again it stops, providing data about only one section. It cannot jump to the next section:
integer IS, NOSEC, count
double precision SPAN
character(LEN=100):: NAME, NAME2, AT
real whatever
101 read (10,*,iostat=ios) NAME, NAME2, IS, AT, SPAN
if (ios/=0) go to 200
write(6,*) IS, SPAN
count = 0
102 read(10,*,iostat=ios) whatever
if (ios/=0) go to 101
count = count + 1
write(6,*) whatever
go to 102
200 write(6,*) 'Section span =', SPAN
So the first loop (101) is supposed to read the parameters of the block, and the second (102) counts the number of lines in the block, with 'count' as the only parameter that is needed. However, when it is supposed to jump from 102 back to 101 to start a new block, it goes to 200 instead (printing the results of the operation), which means it couldn't read the data for the second block.
Let's say your file contains two valid types of lines:
Block headers which begin with '# ' (a hash followed by a space), and
Data lines which begin with a digit 0 through 9
Let's add further conditions:
Leading whitespace is ignored,
Lines which don't match the first two patterns are considered comments and are ignored
Comment lines do not terminate a block; blocks are only terminated when a new block is found or the end of the file is reached,
Data lines must follow a block header (the first non-comment line in a file must be a block header),
Blocks may be empty, and
Files may contain no blocks
You want to know the number of blocks and how many data lines are in each block but you don't know how many blocks there might be. A simple dynamic data structure will help with record-keeping. The number of blocks may be counted with just an integer, but a singly-linked list with nodes containing a block ID, a data line count, and a pointer to the next node will gracefully handle an arbitrarily large blob of data. Create a head node with ID = 0, a data line count of 0, and the pointer nullify()'d.
The Fortran Wiki has a pile of references on singly-linked lists: http://fortranwiki.org/fortran/show/Linked+list
Since the parsing is simple (e.g. no backtracking), you can process each line as it is read. Iterate over the lines in the file, use adjustl() to dispose of leading whitespace, then check the first two characters: if they are '# ', increment your block counter by one, add a new node to the list, set its ID to the value of the block counter (1), and process the next line.
Aside: I have a simple character function called munch() which is just trim(adjustl()). Great for stripping whitespace off both ends of a string. It doesn't quite act like Perl's chop() or chomp() and Fortran's trim() is more of an rtrim() so munch() was the next best name.
If the line doesn't match a block header, check if the first character is a digit; index('0123456789', line(1:1)) is greater than zero if the first character of line is a digit, otherwise it returns 0. Increment the data line count in the head node of the linked list and go on to process the next line.
Note that if the block count is zero, this is an error condition; write out a friendly "Data line seen before block header" error message with the last line read and (ideally) the line number in the file. It takes a little more effort but it's worth it from the user's standpoint, especially if you're the main user.
Otherwise if the line isn't a block header or a data line, process the next line.
Eventually you'll hit the end of the file and you'll be left with the block counter and a linked list that has at least one node. Depending on how you want to use this data later, you can dynamically allocate an array of integers the length of the block counter, then transfer the data line count from the linked list to the array. Then you can deallocate the linked list and get direct access to the data line count for any block because the block index matches the array index.
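The line-by-line logic above can be sketched in a few lines. This sketch is in Python rather than Fortran, purely to show the control flow, and it swaps the singly-linked list for a growable list of counts; the function name count_block_lines is made up for illustration:

```python
def count_block_lines(lines):
    """Return a list with the number of data lines in each block."""
    counts = []  # counts[i] is the data line count of block i + 1
    for lineno, raw in enumerate(lines, start=1):
        line = raw.lstrip()  # plays the role of adjustl()
        if line.startswith("# "):
            counts.append(0)  # block header: start a new, empty block
        elif line and line[0] in "0123456789":
            if not counts:
                raise ValueError(
                    f"Data line seen before block header at line {lineno}: {raw!r}"
                )
            counts[-1] += 1  # data line: belongs to the current block
        # anything else is a comment and is ignored
    return counts
```

The number of blocks is len(counts), and the block index matches the list index plus one, so a second pass can read each block knowing its size in advance.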
I use a similar technique for reading arbitrarily long lists of data. The singly-linked list is extremely simple to code and it avoids the irritation of having to reallocate and expand a dynamic array. But once the amount of data is known, I carve out a dynamic array the exact size I need and copy the data from the linked list so I can have fast access to the data instead of needing to walk the list all the time.
Since Fortran doesn't have a standard library worth mentioning, I also use a variant of this technique with an insertion sort to simultaneously read and sort data.
So sorry, no code but enough to get you started. Defining your file format is key; once you do that, the parser almost writes itself. It also makes you think about exceptional conditions: data before block header, how you want to treat whitespace and unrecognized input, etc. Having this clearly written down is incredibly helpful if you're planning on sharing data; the Fortran world is littered with poorly-documented custom data file formats. Please don't add to the wreckage...
Finally, if you're really ambitious/insane, you could write this as a recursive routine and make your functional programming friends' heads explode. :)

How to GS1 parse 01.10.17.21?

I have some products which have 2d GS1 bar codes on them. Most have the format 01.17.10 which is GTIN.Expiry Date.Lot Number.
This makes sense as 01 and 17 are fixed length, so can be parsed easily, just by splitting the string in the appropriate place.
However, I also have some in the format 01.10.17.21 (GTIN.Lot.Expiry.Serial Number) which doesn't make sense because Lot and Serial number are variable length, meaning I cannot use position to decode the various elements. Also, I cannot search for the AIs as they could legitimately appear in the data.
It seems that I've no way of reliably decoding this format. Am I missing something?
Thanks!
According to the GS1 website, "More than one AI can be carried in one bar code. When this happens, AIs with a fixed length data content (e.g., SSCC has a fixed length of 18 digits) are placed at the beginning and AI with variable lengths are placed at the end. If more than one variable length AI is placed in one bar code, then a special "function" character is used to tell the scanner system when one ends and the other one starts."
So it looks like they intend for you to order your AIs with the fixed-width identifiers first, then separate the variable-width fields with a function character. It appears this is FNC1, but how it is implemented will depend on the barcode symbology you are using; it may differ between DataMatrix, Code 128 and QR Code, for example.
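As a rough sketch of how a decoder can use that separator, the Python below parses only the four AIs from the question. It rests on assumptions that should be checked against your scanner: that the scanner transmits the FNC1 separator as the ASCII group separator character (code 29), that AI 01 carries 14 digits and AI 17 carries 6, and that no symbology prefix precedes the data. The function name parse_gs1 and the sample values in the usage note are made up for illustration:

```python
GS = "\x1d"                      # assumed separator character (ASCII 29)
FIXED = {"01": 14, "17": 6}      # AI -> fixed data length
VARIABLE = {"10", "21"}          # AIs whose data runs to the next separator

def parse_gs1(data: str) -> dict:
    """Split a scanned element string into {AI: value}."""
    fields = {}
    i = 0
    while i < len(data):
        if data[i] == GS:        # skip a separator between elements
            i += 1
            continue
        ai = data[i:i + 2]
        i += 2
        if ai in FIXED:
            end = i + FIXED[ai]
        elif ai in VARIABLE:
            end = data.find(GS, i)
            if end == -1:        # last element: runs to the end of the data
                end = len(data)
        else:
            raise ValueError(f"unsupported AI {ai!r} at offset {i - 2}")
        fields[ai] = data[i:end]
        i = end
    return fields
```

With this approach a lot number such as "ABC17" is not misread as containing AI 17, because the parser only looks for an AI at the start of an element, never inside a value.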

bit stuffing example

I'm going over an example that reads
Bit Stuffing. Suppose the following bit string is received by the data link layer from the network layer: 01110111101111101111110.
What is the resulting string after bit stuffing? Bold each bit that has been added.
Answer:
0111011110111110011111010
               ^      ^
How is this answer reached? My understanding is that bit stuffing works by inserting a certain sequence of bits (known as a flag value) at the beginning and end of a frame. What I don't get is:
We aren't told the flag value!
We aren't told how big a frame is, so how do we know where to put the flag?
Additional Information: I think this network is Ethernet.
Additional Information 2: The bit flag is 01111110
Honestly I think I understand, but isn't the answer incomplete because they didn't add the flag 01111110 to the end or beginning? They only handled the places where that pattern of bits appeared in the message.
Other example: here they do it too.
For framing in the data link layer there are several approaches, and the bit-oriented approach is one of them.
The receiver needs a way to recognize the start and the end of a frame transmitted on the link, so there are framing formats such as HDLC. You can see this.
Many frame formats have a beginning sequence (marking the start of the frame), an ending sequence (marking the end of the frame) and the body of the frame, which is the data.
The problem that may occur is that the ending sequence appears in the body, which would make the receiver detect the end of the frame incorrectly.
To prevent this, the sender stuffs bits into the body to break up the pattern of the ending sequence. This technique is known as bit stuffing.
Look at this example:
bit sequence: 110101111101011111101011111110 (without bit stuffing)
bit sequence: 110101111100101111101010111110110 (with bit stuffing)
after 5 consecutive 1-bits, a 0-bit is stuffed.
stuffed bits are marked bold.
Consider: 0111011110111110*0111110*10
After it finds a 0 and then five consecutive 1 bits it stuffs with a 0. This assumes 0 bit stuffing which is common.
Bit Stuffing:
Input Stream: 0110111111100111110111111111100000
Stuffed Stream: 01101111101100111110011111011111000000
Unstuffed Stream: 0110111111100111110111111111100000
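The sender side can be sketched in Python as follows. The names stuff, frame and FLAG are illustrative, and the flag 01111110 is the one given in the question's additional information:

```python
FLAG = "01111110"

def stuff(bits: str) -> str:
    """Insert a 0 after every run of five consecutive 1's."""
    out = []
    ones = 0  # length of the current run of 1's
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == "1" else 0
        if ones == 5:
            out.append("0")  # the stuffed bit
            ones = 0
    return "".join(out)

def frame(payload: str) -> str:
    """Wrap the stuffed payload in opening and closing flags."""
    return FLAG + stuff(payload) + FLAG
```

Here stuff("0110111111100111110111111111100000") reproduces the stuffed stream shown above. The flags are added as a separate step in frame(), which is why an exercise that asks only for the result of bit stuffing shows no flag.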
There isn't enough information in the question to answer it fully for "any" protocol, but HDLC, for example, bit-stuffs frame content with a 0-bit after 5 consecutive 1-bits, which seems to be the case here. (Ethernet itself does not use bit stuffing.)
As for the rest of your question about the framing, a hint is to look at what is supposed to be passed from the data link layer to the network layer. Is it a pre-framed bit of data or just the frame's content you're looking at?
Whenever you have to perform bit stuffing, you will always be given the starting and ending marker (FLAG) bit value.
The easiest trick is to remove the last two bits of the flag and note down the new bit series; whenever you get the same series in your data, you have to stuff one bit there.
For example:
Given data is 011111011110
Here my FLAG is 0111110, so my new FLAG will be 01111
I will stuff a bit whenever I get (01111) in my data series.
So the data after stuffing will be:
01111(0)101111(0)0
Brackets indicate stuffed bits here.

Which printable ASCII characters will usually appear in an english text?

I have been trying to solve Project Euler's problem #59 for a while, and I am having trouble because some of it seems somewhat more ambiguous than previous problems.
As background, the problem says that the given text file is encrypted text with the ASCII codes saved as numbers. The encryption method is to XOR 3 lowercase letters cyclically with the plaintext (so it is reversible). The problem asks for the key that decrypts the file to English text. How should I restrict the character set of my output to get the answer, without trying to sift through all possible plaintexts (26^3)?
I have tried restricting to letters, spaces, and punctuation, and that did not work.
To clarify: I want to determine, out of all printable ASCII characters, which ones I can probably discard and which ones I can expect to be in the plaintext string.
Have you tried two of the most "basic" and common tools in analyzing the algorithm used?
Analyze the frequency of the characters and try to match it against English letter frequency
Bruteforce using keys from a wordlist, most often common words are used as keys by "dumb" users
To analyze the frequency for this particular problem you would have to split the string every third element since the key is of length 3, you should now be able to produce three columns:
79 59 12
2 79 35
8 28 20
2 3 68
...
You have to analyse the frequency for each column separately, since each column is now encrypted with a single key character.
OK, I actually took my time, constructed the 3 complete columns, counted the frequency for each column and got the two most frequent items for each column:
Col1 Col2 Col3
71 79 68
2 1 1
Now if you check for instance: http://en.wikipedia.org/wiki/Letter_frequency
There you have the most frequent letters. Don't forget that you also have spaces and other characters which are not present on that page, but I think you can assume that space is the most frequent character.
So now it is just a matter of XORing the most frequent values in the table I provided with the most frequent characters in the English language and seeing whether you get lowercase characters. I found a three-letter word which I think is the answer with only this data.
Good luck and by the way, it was a nice problem!
A possible solution is to simply assume the presence of a given three-character sequence in the encrypted text. You can use a three-letter word, or a three letter sequence which is likely to appear in English text (e.g. " a ": the letter 'a' enclosed between two spaces). Then simply try all possible positions of that sequence in the encrypted text. Each position allows you to simply recompute the key, then decrypt the whole text into a file.
Since the original text has length 1201, you get 1199 files to skim through. At that point it is only a matter of patience, but you can make it much faster by using a simple text search utility on another frequent sequence in English (e.g. "are"), for instance with the Unix tool grep.
I did just that, and got the decrypted text in less than five minutes.
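A sketch of that idea in Python, assuming the ciphertext is already loaded as a list of integers and the assumed sequence is exactly as long as the key (three characters). The function name keys_from_crib is made up, and the filter that keeps only all-lowercase keys is an extra shortcut taken from the problem statement rather than part of the answer above:

```python
def keys_from_crib(cipher, crib=" a "):
    """Return every lowercase key that would place `crib` somewhere in the text."""
    candidates = set()
    n = len(crib)
    for pos in range(len(cipher) - n + 1):
        key = [0] * n
        for j, ch in enumerate(crib):
            # Ciphertext index pos + j was encrypted with key index (pos + j) % n.
            key[(pos + j) % n] = cipher[pos + j] ^ ord(ch)
        if all(ord("a") <= k <= ord("z") for k in key):
            candidates.add("".join(chr(k) for k in key))
    return candidates
```

For a ciphertext of length 1201 the loop visits 1199 positions, matching the count above, but most positions yield a key that is not three lowercase letters and can be discarded before decrypting anything.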
I'll admit upfront I'm not familiar with an XOR cipher.
However, it seems very similar in concept to the Vigenère cipher, especially in the line where they mention that for unbreakable encryption the key length equals the message length. That screams Vernam cipher.
As mentioned in the other answer, the strategy for breaking a Vigenère cipher is probabilistic. I will not go into detail because most of the theory I learned was relatively complicated, but it can be found here, keeping in mind that Vigenère is a series of Caesar ciphers.
The problem makes it easy for you though because you already know the keylength. Because of that, as you mentioned, you can simply bruteforce by trying every single 3 letter combination.
Here's what I would do: take a reasonably sized chunk of the ciphertext, say maybe 10-20 characters, and try the brute force approach on that. Keep track of all the keys that seem to create understandable sequences of letters and then use those on the whole ciphertext. That way we can employ the obvious brute forcing method, but without bruteforcing the entire problem, so I don't think you'll have to worry about limiting your output.
That said, I agree that as you're creating the output, if you ever get a non printable character, you could probably break your loop and move on to the next key. I wouldn't try anything more specific than that because who knows what the original message could have, never make assumptions about the data you're dealing with. Short circuiting logic like that is always a good idea, especially when implementing a brute force solution.
Split the ciphertext into 3.
Ciphertext1 comprises the 1st, 4th, 7th, 10th...numbers
Ciphertext2 comprises the 2nd, 5th, 8th, 11th...numbers
Ciphertext3 comprises the 3rd, 6th, 9th, 12th...numbers
Now you know that each ciphertext is encrypted with the same key letter. Now do a standard frequency analysis on it. That should give you enough clues as to what the letter is.
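A minimal Python sketch of this split-and-count approach, assuming the ciphertext is a list of integers and that the most frequent plaintext character in every column is a space (an assumption that holds for typical English text but is not guaranteed). The names xor_cycle and guess_key are illustrative:

```python
from collections import Counter

def xor_cycle(data, key):
    """XOR each value in data with the key values, repeating the key cyclically."""
    return [c ^ key[i % len(key)] for i, c in enumerate(data)]

def guess_key(cipher, key_len=3):
    """Guess each key character from the most frequent value in its column."""
    key = []
    for col in range(key_len):
        column = cipher[col::key_len]          # 1st, 4th, 7th, ... values
        most_common, _ = Counter(column).most_common(1)[0]
        key.append(most_common ^ ord(" "))     # assume it decrypts to a space
    return "".join(chr(k) for k in key)
```

If the guessed key is not three lowercase letters, the space assumption failed for at least one column, and the second most frequent value in that column is the next thing to try.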
I just solved this problem a few days ago. Without spoiling it for you, I want to describe my approach to this problem. Some of what I say may be redundant to what you already knew, but was part of my approach.
First I assumed that the key is exactly as described, three lowercase ASCII letters. So I began brute forcing at 'aaa' and went to 'zzz'. While decrypting, if any resulting byte was a value lower than 32 (the ASCII value of space, the lowest "printable" ASCII value) or higher than 126 (the ASCII value of the tilde '~', which is the highest printable character in ASCII), then I assumed the key was invalid, because any value outside the range 32 to 126 would be an invalid character for a plain-text stretch of English. As soon as a single byte was outside of this range, I stopped decrypting and went to the next possible key.
Once I decrypted the entire message using a particular key (after passing the first test of all bytes being printable characters), I needed a way to verify it as a valid decryption. I was expecting the result to be a simple list of words with no particular order or meaning. Through other cryptography experience, I thought back to letter frequency, and most simply that your average English word in text is 5 characters long. The file contains 1201 input bytes. So that would mean that there would be (on average) 240 words. After decrypting, I counted how many spaces were in the resulting output string. Since Project Euler is anything but average, I compared the number of spaces to 200 accounting for longer, more obscure words. When an output had more than 200 spaces in it, I printed out the key it was decrypted with and the output text. The one and only output that has more than 200 spaces is the answer. Let me tell you that it's more than obvious that you have the answer when you see it.
Something to point out is that the answer to the question is NOT the key. It is the sum of all the ASCII values of the output string. This approach will also solve the problem under the one-minute mark; in fact, it runs in around 3 or 4 seconds.
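The approach described in this answer could be sketched in Python roughly as below. The function names are made up, the ciphertext is assumed to be a list of integers, and the threshold of 200 spaces is the one quoted above for the 1201-byte file:

```python
from itertools import product
from string import ascii_lowercase

def decrypt(cipher, key):
    """Return the plaintext, or None as soon as a non-printable byte appears."""
    out = []
    for i, c in enumerate(cipher):
        p = c ^ key[i % len(key)]
        if p < 32 or p > 126:      # outside printable ASCII: reject this key
            return None
        out.append(chr(p))
    return "".join(out)

def brute_force(cipher, min_spaces=200):
    """Try 'aaa' through 'zzz'; keep keys whose output has more than min_spaces spaces."""
    hits = []
    for letters in product(ascii_lowercase, repeat=3):
        text = decrypt(cipher, [ord(ch) for ch in letters])
        if text is not None and text.count(" ") > min_spaces:
            hits.append(("".join(letters), text))
    return hits
```

For a surviving candidate, sum(ord(ch) for ch in text) gives the sum of the ASCII values that the problem asks for.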
