bit stuffing example - networking

I'm going over an example that reads
Bit Stuffing. Suppose the following bit string is received by the data link layer from the network layer: 01110111101111101111110.
What is the resulting string after bit stuffing? Bold each bit that has been added.
Answer:
0111011110111110011111010
(the stuffed bits, originally bold, are shown in brackets: 011101111011111[0]011111[0]10)
How is this answer reached? My understanding is that bit stuffing works by inserting a certain sequence of bits (known as a flag value) at the beginning and end of a frame. What I don't get is:
We aren't told the flag value!
We aren't told how big a frame is, so how do we know where to put the flag?
Additional information: I think this network is Ethernet.
Additional information 2: The bit flag is 01111110.
Honestly, I think I understand, but isn't the answer incomplete? They didn't add the flag 01111110 at the beginning or the end; they only dealt with the places where that pattern of bits appeared in the message.
Another example: here they do it too.

There are several approaches to framing at the data link layer, and bit-oriented framing is one of them.
The receiver needs a way to recognize the start and the end of each frame transmitted on the link, so framing formats such as HDLC define this.
Many frame formats have a beginning sequence (marking the start of the frame), an ending sequence (marking the end of the frame), and the body of the frame, which is the data.
The problem is that the ending sequence might also appear in the body, which would make the receiver detect the end of the frame incorrectly.
To prevent this, the sender stuffs extra bits into the body to break up any occurrence of the ending-sequence pattern; this technique is known as bit stuffing.
Look at this example:
bit sequence: 110101111101011111101011111110 (without bit stuffing)
bit sequence: 110101111100101111101010111110110 (with bit stuffing)
After 5 consecutive 1-bits, a 0-bit is stuffed.
The stuffed bits are shown in brackets: 1101011111[0]01011111[0]101011111[0]110
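A minimal Python sketch of zero-bit stuffing as described above, assuming an HDLC-style 01111110 flag (the value given in the first question). The names `stuff`, `frame` and `FLAG` are illustrative, not from any library; `frame` only shows where the opening and closing flags would go around the stuffed body, which is the part the textbook answer leaves out.

```python
FLAG = "01111110"  # HDLC-style flag from the first question


def stuff(bits):
    """Zero-bit stuffing: insert a 0 after every run of five consecutive 1s."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == "1":
            ones += 1
            if ones == 5:
                out.append("0")  # stuffed bit
                ones = 0         # the stuffed 0 ends the run of 1s
        else:
            ones = 0
    return "".join(out)


def frame(payload):
    """Put the opening and closing flags around the stuffed payload."""
    return FLAG + stuff(payload) + FLAG


print(stuff("01110111101111101111110"))  # 0111011110111110011111010
```

Running `stuff` on the input strings of the examples in this thread reproduces the stuffed outputs given for them. Because a 0 is forced after every fifth 1, the stuffed body can never contain six 1s in a row, so it can never contain the flag.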

Consider: 011101111011111[0]011111[0]10 (stuffed bits in brackets)
A 0 is stuffed after every run of five consecutive 1 bits. This assumes zero-bit stuffing, which is common.

Bit Stuffing:
Input Stream: 0110111111100111110111111111100000
Stuffed Stream: 01101111101100111110011111011111000000
Unstuffed Stream: 0110111111100111110111111111100000

There isn't enough information in the question to answer it fully for "any" protocol, but HDLC, for example, bit-stuffs frame content with a 0-bit after 5 consecutive 1-bits, which seems to be the case here. (Ethernet itself does not use bit stuffing.)
As for the rest of your question about the framing, a hint is to look at what is supposed to be passed from the network layer to the data link layer. Is what you're looking at pre-framed data, or just the frame's content?

Whenever you have to perform bit stuffing, you will always be given the FLAG bit pattern that marks the start and end of a frame.
The easiest trick is to remove the last two bits of the flag and note down the resulting bit series. Whenever the same series appears in your data, you stuff a 0 bit right after it.
For example:
Given data is 011111011110
Here the FLAG is 0111110, so the new pattern is 01111.
I will stuff a bit whenever I get 01111 in my data series.
So, the data after stuffing will be:
01111(0)101111(0)0
Brackets mark the stuffed bits.
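The trick above can be sketched in Python as follows. This assumes a flag of the form 0, a run of 1s, 0 (like 0111110 or 01111110) and always stuffs a 0; `stuff_with_flag` is a hypothetical helper name. It matches against the output built so far, so a stuffed 0 can start the next match, which keeps long runs of 1s from slipping through.

```python
def stuff_with_flag(data, flag):
    """Stuff a 0 wherever the output so far ends with flag[:-2].

    Assumes a flag of the form 0, a run of 1s, 0 (e.g. 0111110 or 01111110).
    """
    pattern = flag[:-2]  # drop the flag's last two bits
    out = ""
    for b in data:
        out += b
        if out.endswith(pattern):
            out += "0"  # stuffed bit
    return out


print(stuff_with_flag("011111011110", "0111110"))  # 01111010111100
```

With the flag 01111110, the first question's bit string 01110111101111101111110 gives the same answer as stuffing after five 1s: 0111011110111110011111010.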

Related

How to find out the longest definition entry in an English dictionary text file?

I asked over at the English Stack Exchange, "What is the English word with the longest single definition?" The best answer they could give is that I would need a program that could find the longest entries in a (text) file listing dictionary definitions, by counting the number of characters or words in each entry, and then provide a list of the longest entries. I also asked at Super User but they couldn't come up with an answer either, so I decided to give it a shot here.
I managed to find a dictionary file which converted to text has the following format:
a /a/ indefinite article (an before a vowel) 1 any, some, one (have a cookie). 2 one single thing (there’s not a store for miles). 3 per, for each (take this twice a day).
aardvark /ard-vark/ n an African mammal with a long snout that feeds on ants.
abacus /a-ba-kus, a-ba-kus/ n a counting frame with beads.
As you can see, each definition comes after the pronunciation (enclosed by slashes), and then either:
1) ends with a period, or
2) ends before an example (enclosed in parentheses), or
3) follows a number and ends with a period or before an example, when a word has multiple definitions.
What I would need, then, is a function or program that can distinguish each definition (treating multiple definitions of a single word as separate ones), then count the number of characters and/or words in each (ignoring the examples in parentheses, since they are not part of the proper definition), and finally provide a list of the longest definitions (I don't think I would need more than, say, the top 20 or so to compare). If the file format were an issue, I can convert the file to PDF, EPUB, etc. with no problem. And ideally I would want to be able to choose between counting length by characters and by words, if possible.
How should I go about doing this? I have little experience from programming classes I took a long time ago, but I think it's better to assume I know close to nothing about programming at all.
Thanks in advance.
I'm not going to write code for you, but I'll help think the problem through. Pick the programming language you're most familiar with from long ago, and give it a whack. When you run in to problems, come back and ask for help.
I'd chop this task up into a bunch of subproblems:
Read the dictionary file from the filesystem.
Chunk the file up into discrete entries. If it's a text file like you show, most programming languages have a facility to easily iterate linewise through a file (i.e. take a line ending character or character sequence as the separator).
Filter bad entries: in your example, your lines appear separated by an empty line. As you iterate, you'll just drop those.
Use your human observation and judgement to look for strong patterns in the data that you can communicate as firm rules; this is one of the central activities of programming. You've already started identifying some patterns in your question, i.e.
All entries have a preamble with the pronunciation and part of speech.
A multiple definition entry will be interspersed with lone numerals.
Otherwise, a single definition just follows the preamble.
Write the rules you've invented into code. It'll go something like this: first find a way to lop off the word itself and the preamble. With the remainder, identify multiple-definition entries by the presence of lone numerals or whatever; if there are none, treat the entry as single-definition.
For each entry, iterate over each of the one-or-more definitions you've identified.
Write a function that will count a definition either word-wise or character-wise. If word-wise, you'll probably tokenize based on whitespace. Counting the length of a string character-wise is trivial in most programming languages. Why not implement both!
Keep a data structure in memory as you iterate the file to track "longest". For each definition in each entry, after you apply the length calculation, you'll compare against the previous longest entry. If the new one is longer, you'll record this new leading word and its word count in your data structure. Comparing 'greater than' and storing a variable are fundamental in most programming languages, so while this is the real meat of your program, this shouldn't be hard.
Implement some way to display your results once iteration is done. This may be as simple as a print statement.
Finally, write the glue code that lets you execute the program easily. A program like this could easily be a command-line tool that takes one or two arguments (the path to the file to be analyzed, perhaps you pass your desired counting method 'character|word' as an argument too, since you implemented both). Different languages vary in how easy it is to create an executable to run from the command line, but most support it, so it's a good option for tasks like this.
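The answer deliberately leaves the code to you, but for reference, here is one possible Python sketch of the steps above. Every parsing rule in it is an assumption drawn only from the three sample lines: a headword, a pronunciation between slashes, a part of speech (assumed to be a single token like `n` when there is only one definition), lone numerals separating multiple definitions, and examples in parentheses. Real dictionary files will have exceptions these rules miss. The file name `longest_defs.py` is hypothetical, and instead of tracking a single "longest" it keeps the top n (20 by default) with `heapq.nlargest`, as the question asked.

```python
import heapq
import re
import sys

# Assumed line layout, based on the sample entries in the question:
#   headword /pronunciation/ part-of-speech definition(s)
PREAMBLE = re.compile(r"^(?P<word>\S.*?)\s+/[^/]*/\s+(?P<body>.+)$")
EXAMPLE = re.compile(r"\([^)]*\)")       # "(have a cookie)" and similar
NUMERAL = re.compile(r"(?:^|\s)\d+\s+")  # lone "1 ", " 2 ", ... separators


def definitions(line):
    """Return (headword, definition) pairs for one line of the file."""
    match = PREAMBLE.match(line.strip())
    if match is None:
        return []  # blank or unrecognised line: drop it
    word = match.group("word")
    body = EXAMPLE.sub(" ", match.group("body"))  # examples aren't definitions
    parts = NUMERAL.split(body)
    if len(parts) > 1:
        senses = parts[1:]  # text before "1" is the part of speech
    else:
        senses = body.split(None, 1)[1:]  # assume a one-token part of speech
    result = []
    for sense in senses:
        text = " ".join(sense.split()).rstrip(" .")
        if text:
            result.append((word, text))
    return result


def length(text, by="words"):
    """Count a definition word-wise (split on whitespace) or character-wise."""
    return len(text.split()) if by == "words" else len(text)


def longest(lines, n=20, by="words"):
    """Return the n longest (headword, definition) pairs."""
    candidates = (d for line in lines for d in definitions(line))
    return heapq.nlargest(n, candidates, key=lambda d: length(d[1], by))


def main(argv):
    """Usage: python longest_defs.py DICTIONARY.txt [words|chars]"""
    by = argv[2] if len(argv) > 2 else "words"
    with open(argv[1], encoding="utf-8") as f:
        for word, text in longest(f, by=by):
            print(length(text, by), word, text, sep="\t")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

On the three sample lines, the "a" entry splits into "any, some, one", "one single thing" and "per, for each", and the aardvark definition comes out longest by both words and characters.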

Bit stuffing help. GATE-CS Set 3 2014

A bit-stuffing based framing protocol uses an 8-bit delimiter pattern of 01111110. If the output bit-string after stuffing is 01111100101, then the input bit-string is
(A) 0111110100
(B) 0111110101
(C) 0111111101
(D) 0111111111
Correct answer given is B.
My question is why a bit is stuffed after the first five 1s from the left, even though the delimiter has six consecutive 1s.
I thought a bit would be stuffed only after six consecutive 1s.
Correct me if I am wrong.
The given delimiter is 01111110. A delimiter is used to mark the start and end of a frame, so we need to make sure that if the same pattern (01111110) appears in the data, the receiver does not take it for the start or end of a frame but treats it as valid data. That's why, after 011111 in the data, one 0 bit is stuffed, so that the data never looks like the start or end of a frame.
When the receiver gets the frame, it checks for five consecutive 1s, and if the next bit is 0 it drops it. (If the next bit is 1 instead of 0, it checks the bit after that: if that is 0, this is a delimiter; otherwise an error has occurred.) This is known as zero-bit stuffing.
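The receiver-side rule described above can be sketched in Python like this, assuming the opening and closing flags have already been stripped so that only the stuffed body is passed in (`destuff` is an illustrative name). Applied to the GATE output string 01111100101, it returns 0111110101, which is option (B).

```python
def destuff(bits):
    """Undo zero-bit stuffing on a frame body (flags already removed).

    After five consecutive 1s: a following 0 is a stuffed bit and is dropped;
    a following 1 means six 1s, which can only be a flag or an error.
    """
    out = []
    ones = 0
    i = 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        ones = ones + 1 if b == "1" else 0
        if ones == 5:
            nxt = bits[i + 1] if i + 1 < len(bits) else None
            if nxt == "0":
                i += 1  # skip the stuffed 0
            elif nxt == "1":
                raise ValueError(f"six 1s at bit {i + 1}: flag or error, not data")
            ones = 0
        i += 1
    return "".join(out)


print(destuff("01111100101"))  # 0111110101
```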

Either unformatted I/O is giving absurd values, or I'm reading them incorrectly in R

I have a problem with unformatted data and I don't know where, so I will post my entire workflow.
I'm integrating my own code into an existing climate model, written in Fortran, to generate a custom variable from the model output. I have been successful in getting sensible and readable formatted output (values up to the thousands), but when I try to write unformatted output, the values I get are absurd (on the scale of 1E10).
Would anyone be able to take a look at my process and see where I might be going wrong?
I'm unable to make a functional replication of the entire code used to output the data; however, the relevant snippet is:
c write customvar to file [UNFORMATTED]
open (unit=10,file="~/output_test_u",form="unformatted")
write (10)customvar
close(10)
c write customvar to file [FORMATTED]
c open (unit=10,file="~/output_test_f")
c write (10,*)customvar
c close(10)
The model was run twice, once with the FORMATTED code commented out and once with the UNFORMATTED code commented out, although I now realise I could have run it once if I'd used different unit numbers. Either way, different runs should not produce different values.
The files produced are available here:
unformatted(9kb)
formatted (31kb)
In order to interpret these files, I am using R. The following code is what I used to read each file, and shape them into comparable matrices.
##Read in FORMATTED data
formatted <- scan(file="output_test_f",what="numeric")
formatted <- (matrix(formatted,ncol=64,byrow=T))
formatted <- apply(formatted,1:2,as.numeric)
##Read in UNFORMATTED data
to.read <- file("output_test_u","rb")
unformatted <- readBin(to.read,integer(),n=10000)
close(to.read)
unformatted <- unformatted[c(-1,-2050)] #to remove padding
unformatted <- matrix(unformatted,ncol=64,byrow=T)
unformatted <- apply(unformatted,1:2,as.numeric)
In order to check that the general structure of the data is the same in the two files, I checked that zero and non-zero values were in the same positions in each matrix (each value represents a grid square; zeros represent sea) using:
as.logical(unformatted)-as.logical(formatted)
An array of zeros was returned, indicating that it is just the values that differ between the two, not the way I've shaped them.
To see how the values relate to each other, I tried plotting formatted vs unformatted values (note all zero values are removed)
As you can see they have some sort of relationship, so the inflation of the values is not random.
I am completely stumped as to why the unformatted data values are so inflated. Is there an error in the way I'm reading and interpreting the file? Is there some underlying way that fortran writes unformatted data that alters the values?
The usual method that Fortran uses to write unformatted file is:
A leading record marker, usually four bytes, with the length of the following record
The actual data
A trailing record marker, the same number of bytes as the leading record marker, with the same information (used for BACKSPACE)
The usual number of bytes in the record marker is four bytes, but eight bytes have also been sighted (e.g. very old versions of gfortran for 64-bit systems).
If you don't want to deal with these complications, just use stream access. On the Fortran side, open the file with
OPEN(unit=10,file="foo.dat",form="unformatted",access="stream")
This will give you a stream-oriented I/O model like C's binary streams.
Otherwise, you would have to look at your compiler's documentation to see how exactly unformatted I/O is implemented, and take care of the record markers from the R side. A word of caution here: Different compilers have different methods of dealing with very long records of more than 2^31 bytes, even if they have four-byte record markers.
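To make the record layout concrete, here is a Python sketch that splits such a file into records. It assumes four-byte (optionally eight-byte) markers and little-endian byte order, which is typical on x86 but compiler- and platform-dependent; `read_records` and `as_reals` are hypothetical helper names. The same idea applies on the R side: skip one marker before and one after each record's data.

```python
import struct


def read_records(data, marker_size=4, byte_order="<"):
    """Split a sequential unformatted Fortran file into record payloads.

    Assumes each record is stored as [length][payload][length], where the
    length markers are signed integers of marker_size (4 or 8) bytes.
    """
    fmt = byte_order + {4: "i", 8: "q"}[marker_size]
    records = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(fmt, data, pos)
        start = pos + marker_size
        end = start + length
        (trailer,) = struct.unpack_from(fmt, data, end)
        if trailer != length:
            raise ValueError(f"record markers disagree at byte {pos}")
        records.append(data[start:end])
        pos = end + marker_size
    return records


def as_reals(record, byte_order="<"):
    """Decode a record payload as 4-byte REALs (single precision)."""
    return struct.unpack(f"{byte_order}{len(record) // 4}f", record)


# Usage, assuming a file written by the question's WRITE statement:
# with open("output_test_u", "rb") as f:
#     values = as_reals(read_records(f.read())[0])
```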
Following on from the comments of @Stibu and @IanH, I experimented with the R code and found that the source of the error was incorrect handling of the data type and byte size in R. Reading the values as 4-byte reals, i.e.
unformatted <- readBin(to.read,numeric(),size=4,n=10000)
allows the data to be read in perfectly.
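One way to see why reading the file with `integer()` produced huge but related values: assuming customvar is a default 4-byte REAL (its declaration isn't shown in the question), reading its bytes as 4-byte integers reinterprets the IEEE-754 bit patterns. For positive floats those integers are on the order of 10^9 and increase monotonically with the float value, which would fit the non-random relationship in the plot. A small Python illustration (`float_bits_as_int` is an illustrative name):

```python
import struct


def float_bits_as_int(x):
    """Reinterpret the 4 bytes of a single-precision float as a signed int."""
    (n,) = struct.unpack("<i", struct.pack("<f", x))
    return n


for value in (1.0, 10.0, 1000.0):
    print(value, float_bits_as_int(value))
# 1.0 1065353216
# 10.0 1092616192
# 1000.0 1148846080
```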

Missing Time Series data in Hadoop

I have a big text file (in TBs), every line has a timestamp and some other data, like this:
timestamp1,data
timestamp2,data
timestamp5,data
timestamp7,data
...
timestampN,data
This file is ordered by timestamp but there might be gaps between consecutive timestamps. I need to fill those gaps and write the new file.
Can this be done in Hadoop MapReduce? The reason I ask is that to interpolate the missing lines I need the previous and next lines too. For example, to interpolate timestamp6 I need the values at timestamp5 and timestamp7. But if timestamp7 sits in another data block, I will not be able to calculate timestamp6 at all.
Is there any other algorithm or solution? Maybe this cannot be done with MapReduce? Can we do this in RHadoop?
(Pig/Hive solutions are also valid)
My suggestion is a bit tedious and may also cost a little performance: you can implement your own RecordReader and, after the last line of the current split, read the first line of the next split using its block location. I am suggesting this because Hadoop itself does this when the last line of a split is incomplete. Hope this helps!
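Once a split's rows are extended with the first row of the next split, as suggested above, the gap filling itself is a single sorted pass. A minimal Python sketch of that per-split logic (not Hadoop API code), assuming integer timestamps on a fixed step, one numeric value per line, and linear interpolation, since the question doesn't say which interpolation it needs; `fill_gaps` and `parse` are hypothetical names.

```python
def fill_gaps(rows, step=1):
    """Yield (timestamp, value) pairs with missing timestamps filled in.

    rows must be sorted by timestamp. Each gap is filled by linear
    interpolation between the rows on either side of it.
    """
    prev = None
    for ts, value in rows:
        if prev is not None:
            prev_ts, prev_value = prev
            for missing in range(prev_ts + step, ts, step):
                frac = (missing - prev_ts) / (ts - prev_ts)
                yield missing, prev_value + frac * (value - prev_value)
        yield ts, value
        prev = (ts, value)


def parse(line):
    """Parse an assumed 'timestamp,value' line into (int, float)."""
    ts, value = line.split(",", 1)
    return int(ts), float(value)


# A split's rows plus the first row of the next split:
split_rows = ["1,10", "2,20", "5,50", "7,30"]
for ts, value in fill_gaps(map(parse, split_rows)):
    print(f"{ts},{value}")
```

The row borrowed from the next split is needed only for interpolation; a mapper would skip emitting it, since the next split emits it itself.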

Finding spaces in many time pad

I'm currently completing an online course in Cryptography and have been given an exercise to complete. The course has been running for a while and I know the answer is on the web, but I would like to complete it myself through my own work and research.
I have a list of 13 ciphertexts based on a one/many-time pad: the same key has been used to encrypt each plaintext. My task is to decrypt the last ciphertext.
The steps I have taken so far are based on the cribbing techniques at the following locations:
http://adamsblog.aperturelabs.com/2013/05/back-to-skule-one-pad-two-pad-me-pad.html
https://crypto.stackexchange.com/questions/6020/many-time-pad-attack
and I'm using the following tool to XOR the ciphertexts.
In the tutorial I'm following, the author suggests that the first step is to identify spaces. I have tried to follow the steps, but I still cannot identify the spaces once I XOR the ciphertexts.
When I XOR the first ciphertext (cipher 1) with ciphers 2 and 3, I get the following:
15040900180952180C4549114F190E0159490A49120E00521201064C0A4F144F281B13490341004F480203161600000046071B1E4119061D1A411848090F4E0D0000161A0A41140C16160600151C00170B0B090653014E410D4C530F01150116000307520F104D200345020615041C541A49151744490F4C0D0015061C0A1F454F1F4509074D2F01544C080A090028061C1D002E413D004E0B141118
000D064819001E0303490A124C5615001647160C1515451A041D544D0B1D124C3F4F0252021707440D0B4C1100001E075400491E4F1F0A5211070A490B080B0A0700190D044E034F110A00001300490F054F0E08100357001E0853D4315FCEACFA7112C3E55D74AAF3394BB08F7504A8E5019C4E3E838E0F364946F31721A49AD2D24FF6775EFCB4F79FE4217A01B43CB5068BF3B52CA76543187274
000000003E010609164E0C07001F16520D4801490B09160645071950011D0341281B5253040F094C0D4F08010545050150050C1D544D061C5415044548090717074F0611454F164F1F101F411A4F430E0F0219071A0B411505034E461C1B0310454F12480D55040F18451E1B1F0A1C541646410D054C0D4C1B410F1B1B03149AD2D24FF6775EFCB4F79FE4217A01B43CB5068BF3B52CA76543187274
I'm getting confused as to where the spaces are, given that the ASCII table gives a space the value 20 (hex).
Sorry if this is not enough information; I can provide more if required. Thanks, Mark.
The question you're linking to has the correct answer.
You don't appear to use that method. You should have c1⊕c2 and c1⊕c3 but your question contains three blocks of strings, one with 6 lines, one with 5 lines and one with 4 lines. That's insufficient for us to even guess what your problem is. Go back to the linked answer, read it, and follow the steps listed there.
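A rough Python sketch of the space-detection step from the linked method: since c_i ⊕ c_j = m_i ⊕ m_j, and a space (0x20) XORed with a letter gives a letter of the opposite case, a position where c_i ⊕ c_j is a letter for most other j probably holds a space in m_i, and the key byte there is c_i[k] ⊕ 0x20. The 0.7 threshold, the function names and the test messages are assumptions for illustration; punctuation in real plaintexts can cause false positives.

```python
def is_letter(byte):
    """True for ASCII A-Z or a-z."""
    return 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A


def likely_spaces(ciphertexts, threshold=0.7):
    """For each ciphertext, guess the positions where its plaintext has a space."""
    result = []
    for i, ci in enumerate(ciphertexts):
        spaces = set()
        for k in range(len(ci)):
            others = [cj[k] for j, cj in enumerate(ciphertexts) if j != i and k < len(cj)]
            if not others:
                continue
            # c_i[k] ^ c_j[k] == m_i[k] ^ m_j[k]: a letter here hints at a space
            hits = sum(is_letter(ci[k] ^ b) for b in others)
            if hits / len(others) >= threshold:
                spaces.add(k)
        result.append(spaces)
    return result


def partial_key(ciphertexts, spaces):
    """Key byte k = c_i[k] ^ 0x20 wherever m_i[k] is believed to be a space."""
    key = {}
    for ci, positions in zip(ciphertexts, spaces):
        for k in positions:
            key.setdefault(k, ci[k] ^ 0x20)
    return key
```

Hex ciphertexts like the ones in the question can be decoded with `bytes.fromhex(...)` before calling these functions; each recovered key byte then decrypts that position in every ciphertext.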
