What happens when a long TCP segment is sent?

I uploaded a txt file to a server and captured the upload with Wireshark.
The issue is that one segment is extremely long, and right after it I get ACKs from the server for lower sequence numbers than I would expect.
On line 865 my PC sends a segment with a length of 12240 bytes.
I should get an ACK greater than 12240, and yet this is not the case.
[Wireshark capture image]

Look at frame 862. The host 128.119.245.12 is advertising an MSS of 1360 bytes, so any TCP segment sent by 10.0.0.12 will carry at most 1360 bytes on the wire, despite what Wireshark shows. The reason for the seemingly larger TCP segments (12240 and 2720 bytes) is that the capture engine receives the packets before they are segmented by the NIC, a feature known as TCP segmentation offload (TSO). If you were capturing the traffic on an external device, such as from a SPAN port or via a TAP, you wouldn't see that 12240-byte segment; you'd see nine 1360-byte segments sent instead.
This is why the receiving host's ACK number doesn't match the 12240: it ACKs each 1360-byte segment it actually receives. It isn't until frame 930 that all nine 1360-byte segments comprising the apparent 12240-byte segment are acknowledged, and you can easily verify all of this with some SEQ/ACK analysis.
Here are the SEQ #'s for host 10.0.0.12 along with the ACK #'s from host 128.119.245.12. I've included the breakdown of the nine 1360-byte segments in brackets, [], as they would actually have appeared on the wire had Wireshark been run on an external machine instead of on the 10.0.0.12 host:
          10.0.0.12      128.119.245.12
Frame #   SEQ     Len    ACK      Comments
-------   -----   -----  -----    -----------------------------
822       0       0
862                      1        Next expected SEQ # is now 1
863       1       0
864       1       716
865       717     12240           SEQ: 1 + 716 = 717
[865-1    2077    1360            SEQ: 717 + 1360 = 2077]
[865-2    3437    1360            SEQ: 2077 + 1360 = 3437]
[865-3    4797    1360            SEQ: 3437 + 1360 = 4797]
[865-4    6157    1360            SEQ: 4797 + 1360 = 6157]
[865-5    7517    1360            SEQ: 6157 + 1360 = 7517]
[865-6    8877    1360            SEQ: 7517 + 1360 = 8877]
[865-7    10237   1360            SEQ: 8877 + 1360 = 10237]
[865-8    11597   1360            SEQ: 10237 + 1360 = 11597]
[865-9    12957   1360            SEQ: 11597 + 1360 = 12957]
905                      717      ACK: The ACK to frame 864
906       12957   1360            SEQ: 717 + 12240 = 12957
907                      2077     ACK: The ACK to "frame" 865-1
908       14317   2720            SEQ: 12957 + 1360 = 14317
912                      3437     ACK: The ACK to "frame" 865-2
913       17037   2720            SEQ: 14317 + 2720 = 17037
915                      4797     ACK: The ACK to "frame" 865-3
916       19757   2720            SEQ: 17037 + 2720 = 19757
917                      6157     ACK: The ACK to "frame" 865-4
918       22477   2720            SEQ: 19757 + 2720 = 22477
919                      7517     ACK: The ACK to "frame" 865-5
920       25197   2720            SEQ: 22477 + 2720 = 25197
923                      8877     ACK: The ACK to "frame" 865-6
924       27917   2720            SEQ: 25197 + 2720 = 27917
925                      10237    ACK: The ACK to "frame" 865-7
926       30637   2720            SEQ: 27917 + 2720 = 30637
927                      11597    ACK: The ACK to "frame" 865-8
928       33357   2720            SEQ: 30637 + 2720 = 33357
930                      12957    ACK: The ACK to "frame" 865-9
-------   -----   -----  -----    -----------------------------
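The sub-segment arithmetic in the table can be reproduced with a short sketch (Python for illustration; the MSS, starting SEQ, and payload length come straight from the capture above):

```python
# Reproduce the sub-segment SEQ/ACK arithmetic from the table above.
mss = 1360        # MSS advertised by 128.119.245.12 in frame 862
start_seq = 717   # SEQ of the large frame 865
payload = 12240   # payload length Wireshark reports for frame 865

# On the wire, the 12240-byte write goes out as payload // mss = 9
# segments of mss bytes each; the receiver ACKs the sequence number
# that follows the last byte of each segment.
acks = [start_seq + mss * (i + 1) for i in range(payload // mss)]
print(acks)  # [2077, 3437, 4797, 6157, 7517, 8877, 10237, 11597, 12957]
```

The last value, 12957, is exactly the ACK finally seen in frame 930.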
For further reading on this topic, I'll refer you to an excellent article by Jasper Bongertz titled "The drawbacks of local packet captures".


Select best indices from the result of an ensemble using mRMR

I am using the R package mRMRe for feature selection and trying to get the index of the most common feature from the results of the ensemble:
ensemble <- mRMR.ensemble(data = dd, target_indices = target_idx, solution_count = 5, feature_count = 30)
features_indices = as.data.frame(solutions(ensemble))
This gives me the following data:
MR_1 MR_2 MR_3 MR_4 MR_5
2793 2794 2796 2795 2918
1406 1406 1406 1406 1406
2798 2800 2798 2798 2907
2907 2907 2907 2907 2800
2709 2709 2709 2709 2709
1350 2781 1582 1350 1582
2781 1350 2781 2781 636
2712 2712 2712 2712 2781
636 636 636 636 2779
2067 2067 2067 2067 2712
2328 2328 2357 2357 2067
2357 783 2328 2328 2328
772 2357 772 772 772
I want to use some sort of voting logic to select the most frequent index for each row across all columns.
For example, in the above data:
1. For the first row there is no match, so select the first one.
2. There are rows where some index occurs twice; select that one.
3. In case of a tie, check whether any index occurs three times; if so, select it, otherwise select the first occurring of the tied indices.
Maybe I am making this too complex, but basically I want to select the best index for each row of the dataframe.
Can someone please help me on this?
Here's a simple solution using apply:
apply(df, 1, function(x) { names(which.max(table(x))) })
which gives:
[1] "2793" "1406" "2798" "2907" "2709" "1350" "2781" "2712" "636" "2067" "2328" "2328" "772"
For each row, the function table counts occurrences of each unique element, then we return the name of the element with the maximum number of occurrences (if there is a tie, the first one is selected).
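For comparison, the same per-row majority vote can be sketched in Python (illustrative only; `rows` holds the first, second, and sixth rows of the data above). Note that `Counter.most_common` breaks ties by insertion order, while R's `table`/`which.max` breaks them by sorted value; the two happen to agree for these rows:

```python
from collections import Counter

rows = [
    [2793, 2794, 2796, 2795, 2918],  # all distinct: first element wins
    [1406, 1406, 1406, 1406, 1406],  # unanimous
    [1350, 2781, 1582, 1350, 1582],  # 1350 and 1582 tie at two votes
]

# most_common(1) returns the highest-count element; among equal counts
# it keeps first-encountered order, mirroring the tie-break we want.
winners = [Counter(row).most_common(1)[0][0] for row in rows]
print(winners)  # [2793, 1406, 1350]
```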

Read a tsv file and change the first column [duplicate]

This question already has answers here: Only read selected columns (5 answers). Closed 5 years ago.
I have the following file, and I am only interested in the 1st and last (14th) columns:
sp0000001-mRNA-1 f0651baa110098a342ff92218202e4d0 1016 Pfam PF00226 DnaJ domain 76 137 7.5E-18 T 02-05-2017 IPR001623 DnaJ domain
sp0000001-mRNA-1 f0651baa110098a342ff92218202e4d0 1016 Pfam PF05266 Protein of unknown function (DUF724) 832 1015 3.8E-41 T 02-05-2017 IPR007930 Protein of unknown function DUF724
sp0000001-mRNA-1 f0651baa110098a342ff92218202e4d0 1016 Pfam PF11926 Domain of unknown function (DUF3444) 419 607 2.6E-56 T 02-05-2017 IPR024593 Domain of unknown function DUF3444
sp0000005-mRNA-1 8db7c080b2bc76bf090fec8662fcae20 243 Pfam PF01472 PUA domain 155 232 1.3E-19 T 02-05-2017 IPR002478 PUA domain GO:0003723
sp0000006-mRNA-1 edf5c2bb6341fe44b3da447099a5b2df 282 Pfam PF03083 Sugar efflux transporter for intercellular exchange 198 261 1.4E-15 T 02-05-2017 IPR004316 SWEET sugar transporter GO:0016021
sp0000006-mRNA-1 edf5c2bb6341fe44b3da447099a5b2df 282 Pfam PF03083 Sugar efflux transporter for intercellular exchange 7 91 1.1E-25 T 02-05-2017 IPR004316 SWEET sugar transporter GO:0016021
sp0000006-mRNA-2 edf5c2bb6341fe44b3da447099a5b2df 282 Pfam PF03083 Sugar efflux transporter for intercellular exchange 198 261 1.4E-15 T 02-05-2017 IPR004316 SWEET sugar transporter GO:0016021
sp0000006-mRNA-2 edf5c2bb6341fe44b3da447099a5b2df 282 Pfam PF03083 Sugar efflux transporter for intercellular exchange 7 91 1.1E-25 T 02-05-2017 IPR004316 SWEET sugar transporter GO:0016021
sp0000006-mRNA-3 51ff56e496d48682f7af1b2478190834 235 Pfam PF03083 Sugar efflux transporter for intercellular exchange 130 214 9.6E-24 T 02-05-2017 IPR004316 SWEET sugar transporter GO:0016021
sp0000006-mRNA-3 51ff56e496d48682f7af1b2478190834 235 Pfam PF03083 Sugar efflux transporter for intercellular exchange 7 91 7.5E-26 T 02-05-2017 IPR004316 SWEET sugar transporter GO:0016021
sp0000007-mRNA-1 ed1eda6e176feb124dbef8934b633df0 553 Pfam PF03106 WRKY DNA -binding domain 281 338 2.6E-26 T 02-05-2017 IPR003657 WRKY domain GO:0003700|GO:0006355|GO:0043565
As a result, I am trying to get the following file:
sp0000001,n/a
sp0000005,GO:0003723
sp0000006,GO:0016021
sp0000007,GO:0003700
sp0000007,GO:0006355
sp0000007,GO:0043565
I tried to read the input file in the following way
> interproscan <- read.csv(file="ed.tsv", sep = "\t")[1,14]
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
What would be the best way to solve the problem?
It seems you have duplicate row names. I tried to save your tsv file, but it doesn't save as a tab-separated file for me.
Anyway, try this: NULL the row names:
> interproscan <- read.csv(file="ed.tsv", sep = "\t", row.names=NULL)[c(1,14)]
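For reference, the full transformation the asker wants (strip the -mRNA-N suffix, split the GO terms on |, emit unique gene,term pairs) could be sketched like this; Python for illustration, with two abbreviated sample rows (the md5 column is truncated to "..." here):

```python
import csv
import io

# Two abbreviated sample rows from the question (tab-separated).
tsv = (
    "sp0000005-mRNA-1\t8db...\t243\tPfam\tPF01472\tPUA domain\t155\t232\t"
    "1.3E-19\tT\t02-05-2017\tIPR002478\tPUA domain\tGO:0003723\n"
    "sp0000007-mRNA-1\ted1...\t553\tPfam\tPF03106\tWRKY DNA -binding domain\t"
    "281\t338\t2.6E-26\tT\t02-05-2017\tIPR003657\tWRKY domain\t"
    "GO:0003700|GO:0006355|GO:0043565\n"
)

pairs = set()
for row in csv.reader(io.StringIO(tsv), delimiter="\t"):
    gene = row[0].split("-")[0]  # sp0000005-mRNA-1 -> sp0000005
    # Column 14 holds the GO terms, pipe-separated; "n/a" when absent.
    terms = row[13].split("|") if len(row) > 13 else ["n/a"]
    pairs.update((gene, term) for term in terms)

for gene, term in sorted(pairs):
    print(f"{gene},{term}")
```

This prints one gene,term line per unique pair, matching the desired output format above.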

Plot histogram in R

Given the following data frame:
time frequency
0000 - 0059 8
0100 - 0159 4
0200 - 0259 17
0300 - 0359 5
0400 - 0459 71
0500 - 0559 477
0600 - 0629 325
0630 - 0659 661
0700 - 0714 558
0715 - 0729 403
0730 - 0744 671
0745 - 0759 444
0800 - 0814 641
0815 - 0829 356
0830 - 0844 427
I need to plot a histogram with 15 bins, where x is labelled with the "time" for each bin and y is titled "frequency". Is there a good way to do this?
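Since the counts are already aggregated, this is really a bar chart of pre-binned data rather than a histogram computed from raw observations. A minimal sketch in Python with matplotlib (for illustration; the data are the rows above, and the output filename is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

times = ["0000 - 0059", "0100 - 0159", "0200 - 0259", "0300 - 0359",
         "0400 - 0459", "0500 - 0559", "0600 - 0629", "0630 - 0659",
         "0700 - 0714", "0715 - 0729", "0730 - 0744", "0745 - 0759",
         "0800 - 0814", "0815 - 0829", "0830 - 0844"]
freq = [8, 4, 17, 5, 71, 477, 325, 661, 558, 403, 671, 444, 641, 356, 427]

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(range(len(times)), freq)          # one bar per pre-counted bin
ax.set_xticks(range(len(times)))
ax.set_xticklabels(times, rotation=90)   # label each bin with its time range
ax.set_ylabel("frequency")
fig.tight_layout()
fig.savefig("histogram.png")
```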

Decoding Unknown Data Type

I have received some encoded data from an Arduino via PySerial. I have access to an application that decodes the data, but I need to know what it is doing, and I do not have access to its source code.
Data file contents:
%N|nkNsnrNlnzNqnEOknJOlM
%VA#_##hpZzbdIvzegvxefvkeavdeXvXeXvPeMvReDvlM
%PaA#gH#lnMO#QaLN#mbzM#cbmM#^beM#Pb_M#Fb]M#xaUM#balM
%Ma##HI#FzJP#auPO#~uPO#{uPO#}uMO#vuN#wuyN#uuqN#xulM
%knOOinSOXnMOAnFOcmxNYmlNBm_NslSNqlHNclnM^N
%PezuReouLeluDeju~diuFe`uBeXuAeUu~dJuxdAu^N
%MM#NaJM#`MM#t`VM#h`aM#f`fM#Y`jM#O`mM#G`uM#{_BN#u_^N
%rN#tuhN#nu[N#kuRN#huEN#au{M#[uqM#Nu^M#CuFM#ttuL#at^N
%XlPMMlvLMlWLPlBLVllKMlWKDlCKKlrJNl[J`lHJPO
%pd|trdrttdjtudbtmd_tkd[tkdWtmdOtldGtvdHtPO
Output from application:
86 31 -48 97 -51 33 -1109 -3121
-984 -358 551 -1108 584 -378 -1111 -3117
-1758 -631 973 -1967 1034 -671 -1128 -3123
-1670 -601 908 -1875 976 -642 -1151 -3130
-1672 -602 890 -1885 976 -645 -1181 -3144
-1685 -607 877 -1890 976 -643 -1191 -3156
-1692 -616 869 -1904 973 -650 -1214 -3169
-1704 -616 863 -1914 959 -649 -1229 -3181
-1712 -627 861 -1928 953 -651 -1231 -3192
-1710 -636 853 -1950 945 -648 -1245 -3218
-1712 -646 845 -1970 946 -652 -1256 -3248
-1710 -657 842 -1985 936 -658 -1267 -3274
-1716 -660 845 -1996 923 -661 -1267 -3305
-1724 -662 854 -2008 914 -664 -1264 -3326
-1730 -663 865 -2010 901 -671 -1258 -3348
-1722 -672 870 -2023 891 -677 -1267 -3369
-1726 -680 874 -2033 881 -690 -1276 -3389
-1727 -683 877 -2041 862 -701 -1269 -3406
-1730 -694 885 -2053 838 -716 -1266 -3429
-1736 -703 898 -2059 821 -735 -1248 -3448
I have tried several encodings, such as ASCII, UTF-8, and uuencoding, but none has given me any tangible results.
Does anyone have an idea as to what this could be?

Converting unknown binary data into series of numbers? (with a known example)

I'm trying to find a way to convert files in a little-used archaic file format into something human readable...
As an example, od -x myfile gives:
0000000 2800 4620 1000 461e c800 461d a000 461e
0000020 8000 461e 2800 461e 5000 461f b800 461e
0000040 b800 461d 4000 461c a000 461e 3800 4620
0000060 f800 4621 7800 462a e000 4622 2800 463c
0000100 2000 464a 1000 4654 8c00 4693 5000 4661
0000120 7000 46ac 6c00 46d1 a400 4695 3c00 470a
0000140 b000 46ca 7400 46e9 c200 471b 9400 469e
0000160 9c00 4709 cc00 4719 4000 46b0 6400 46cc
...
which I know corresponds to these integers:
10250 10116 10098 10152 10144 10122 10196 10158
10094 10000 10152 10254 10366 10910 10424 12042
12936 13572 18886 14420 22072 ...
but I have no idea how to convert one to the other!
Many thanks to anyone who can help.
If possible, general tips on what to try and where to begin in this situation would also be appreciated.
Update: I put the full binary file online here http://pastebin.com/YL2ApExG and the numbers it corresponds to here http://pastebin.com/gXNntsaJ
In the hex dump, it seems to alternate between four-digit groups, presumably corresponding to the numbers I want, separated by values around 4600 or 4700. Unfortunately, I don't know where to go from here!
Someone else asked below: the binary file is a .dat file generated by an old spectroscopy program... it's 1336 bytes and corresponds to 334 integers, so it's four bytes per integer.
Well, this is what you can do.
Step I: Run od -x on the file and redirect the output to a temp file (e.g. hexdump.txt):
od -x myfile > hexdump.txt
Step II: You will now have a text file containing hexadecimal values, which you can view using the cat command. Something like this:
[jaypal~/Temp]$ cat hexdump.txt
0000000 2800 4620 1000 461e c800 461d a000 461e
0000020 8000 461e 2800 461e 5000 461f b800 461e
0000040 b800 461d 4000 461c a000 461e 3800 4620
0000060 f800 4621 7800 462a e000 4622 2800 463c
0000100 2000 464a 1000 4654 8c00 4693 5000 4661
0000120 7000 46ac 6c00 46d1 a400 4695 3c00 470a
0000140 b000 46ca 7400 46e9 c200 471b 9400 469e
0000160 9c00 4709 cc00 4719 4000 46b0 6400 46cc
Step III: The first column (the byte offset) isn't really important to you; columns 2 through 9 are what matter. We will now strip the file using AWK so that you can convert the values to decimal. We add a space so that each value is treated as an individual field, and prepend "0x" so that each can be passed as a hexadecimal value.
[jaypal~/Temp]$ awk '{for (i=2;i<=NF;i++) printf "0x"$i" "}' hexdump.txt > hexdump1.txt
[jaypal~/Temp]$ cat hexdump1.txt
0x2800 0x4620 0x1000 0x461e 0xc800 0x461d 0xa000 0x461e 0x8000 0x461e 0x2800 0x461e 0x5000 0x461f 0xb800 0x461e 0xb800 0x461d 0x4000 0x461c 0xa000 0x461e 0x3800 0x4620 0xf800 0x4621 0x7800 0x462a 0xe000 0x4622 0x2800 0x463c 0x2000 0x464a 0x1000 0x4654 0x8c00 0x4693 0x5000 0x4661 0x7000 0x46ac 0x6c00 0x46d1 0xa400 0x4695 0x3c00 0x470a 0xb000 0x46ca 0x7400 0x46e9 0xc200 0x471b 0x9400 0x469e 0x9c00 0x4709 0xcc00 0x4719 0x4000 0x46b0 0x6400 0x46cc
Step IV: Now we will convert each hexadecimal value into decimal using the printf function in AWK:
[jaypal~/Temp]$ gawk --non-decimal-data '{ for (i=1;i<=NF;i++) printf ("%05d ", $i)}' hexdump1.txt > hexdump2.txt
[jaypal~/Temp]$ cat hexdump2.txt
10240 17952 04096 17950 51200 17949 40960 17950 32768 17950 10240 17950 20480 17951 47104 17950 47104 17949 16384 17948 40960 17950 14336 17952 63488 17953 30720 17962 57344 17954 10240 17980 08192 17994 04096 18004 35840 18067 20480 18017 28672 18092 27648 18129 41984 18069 15360 18186 45056 18122 29696 18153 49664 18203 37888 18078 39936 18185 52224 18201 16384 18096 25600 18124
Step V: Format the output to make it easily readable:
[jaypal~/Temp]$ sed 's/.\{48\}/&\n/g' < hexdump2.txt > hexdump3.txt
[jaypal~/Temp]$ cat hexdump3.txt
10240 17952 04096 17950 51200 17949 40960 17950
32768 17950 10240 17950 20480 17951 47104 17950
47104 17949 16384 17948 40960 17950 14336 17952
63488 17953 30720 17962 57344 17954 10240 17980
08192 17994 04096 18004 35840 18067 20480 18017
28672 18092 27648 18129 41984 18069 15360 18186
45056 18122 29696 18153 49664 18203 37888 18078
39936 18185 52224 18201 16384 18096 25600 18124
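As an aside worth checking: the 46xx/47xx pattern the asker noticed is consistent with the exponent bytes of little-endian IEEE-754 single-precision floats in this value range. Since od -x on a little-endian machine displays byte-swapped 16-bit words, the displayed pair "2800 4620" is the on-disk byte sequence 00 28 20 46, i.e. the 32-bit word 0x46202800. A quick Python sketch decoding the first four values (this is my interpretation, not part of the original answer):

```python
import struct

# First 16 bytes of the file, reconstructed from the od -x output:
# displayed words "2800 4620 1000 461e c800 461d a000 461e"
# correspond to these on-disk bytes (each 16-bit word byte-swapped).
raw = bytes.fromhex("00282046" "00101e46" "00c81d46" "00a01e46")

# Interpret as little-endian IEEE-754 single-precision floats.
values = struct.unpack("<4f", raw)
print(values)  # (10250.0, 10116.0, 10098.0, 10152.0)
```

These match the first four target integers from the question (10250 10116 10098 10152), which suggests the .dat file simply stores one little-endian 32-bit float per value.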
