Need an option to Wireshark Statistics - networking

I need to obtain statistics about the network traffic of an mpls link between two sites. The main purpose of this is detect the 'top flooders' at the end of the day and at precise moments when the network is 'overloaded'.
At this time i have a sniffer with Ubuntu and i'm using wireshark to capture packets. The built-in statistics are awesome, but i can only use them with not bigger files than 150mb (it hungs for memory leaks with bigger files). So i use them for precise moments to detect in 'live mode' any instant flooder. But its impossible for me to leave wireshark capturing traffic all day long because of the hungs.
What tools are better suited to use them for these purposes? (detect any 'instant' flooder and take statistics of top talkers and top conversations between computers for the entire day)
Thank you.

Preliminary important note:
wireshark does not "hang for memory leaks with bigger files". The (very annoying) problem is that when opening a file, wireshark dissect it entirely from first to last packet before doing anything else and 1/ that can take a very very very long time and 2/ this imply that wireshark will have the entire file in memory e.g. the wireshark process will weight 1GB of memory for a 1GB trace (plus its own internal memory data of course), which may becomes a problem not only for wireshark but for the whole OS. Hence yes, it can become so unresponsive for so long that it looks like it's "hanged". Not a bug - rather a missing very complicated feature to dissect in "lazy" mode. The same goes with live capture, it dissect and put to relation everything (so that it knows and follow TCP dialog for instance) on the fly, and will hold the entire capture in memory. Which can quickly becomes quite heavy, both on memory and CPU.
And this will not the fixed implemented tomorrow, so now to your problem:
An option would be not to save to a file and latter process it, but doing it "live". You can do so using tshark (a terminal base version of wireshark) that will do the capture just like wireshark, and pipe its textual output to a dissecting/statistic analysis of your own.
https://www.wireshark.org/docs/man-pages/tshark.html
It has a -Y <displaY filter> option, so you should be able to use the MPLS filters from wireshark:
https://www.wireshark.org/docs/dfref/m/mpls.html
The -z <statistics> option will not be usable since it display the result after finishing reading the capture file, and you'll be piping live.
And tshark by default work in "one-pass analysis" mode, which of course limit a lot the analysis it can do, but alleviate the wireshark issue of "I want to dissect everything".[*]
So that would look like:
$ sudo tshark -i <your interface> -Y <your display filters> etc etc | your_parsing_and_statistical_tool
Of course, you'll have to write your own code for "your_parsing_and_statistical_tool". I'm not familiar with MPLS, nor know the statistics your interested in, but that may just be a couple of hours (or days) or Python coding? So if that's worth it for your job...
[*]:
tshark also have an option -2 to perform a two-pass analysis, but that would not work here since the first-pass must be completed first, which will never occur since your not reading a file but capture and analyse live.

Related

Copying 100 GB with continues change of file between datacenters with R-sync is good idea?

I have a datacenter A which has 100GB of the file changing every millisecond. I need to copy and place the file in Datacenter B. In case of failure on Datacenter A, I need to utilize the file in B. As the file is changing every millisecond does r-sync can handle it at 250 miles far datacenter? Is there any possibility of getting the corropted file? As it is continuously updating when we call this as a finished file in datacenter B ?
rsync is a relatively straightforward file copying tool with some very advanced features. This would work great for files and directory structures where change is less frequent.
If a single file with 100GB of data is changing every millisecond, that would be a potential data change rate of 100TB per second. In reality I would expect the change rate to be much smaller.
Although it is possible to resume data transfer and potentially partially reuse existing data, rsync is not made for continuous replication at that interval. rsync works on a file level and is not as commonly used as a block-level replication tool. However there is an --inplace option. This may be able to provide you the kind of file synchronization you are looking for. https://superuser.com/questions/576035/does-rsync-inplace-write-to-the-entire-file-or-just-to-the-parts-that-need-to
When it comes to distance, the 250 miles may result in at least 2ms of additional latency, if accounting for the speed of light, which is not all that much. In reality this would be more due to cabling, routers and switches.
rsync by itself is probably not the right solution. This question seems to be more about physics, link speed and business requirements than anything else. It would be good to know the exact change rate, and to know if you're allowed to have gaps in your restore points. This level of reliability may require a more sophisticated solution like log shipping, storage snapshots, storage replication or some form of distributed storage on the back end.
No, rsync is probably not the right way to keep the data in sync based on your description.
100Gb of data is of no use to anybody without without the means to maintain it and extract information. That implies structured elements such as records and indexes. Rsync knows nothing about this structure therefore cannot ensure that writes to the file will transition from one valid state to another. It certainly cannot guarantee any sort of consistency if the file will be concurrently updated at either end and copied via rsync
Rsync might be the right solution, but it is impossible to tell from what you have said here.
If you are talking about provisioning real time replication of a database for failover purposes, then the best method is to use transaction replication at the DBMS tier. Failing that, consider something like drbd for block replication but bear in mind you will have to apply database crash recovery on the replicated copy before it will be usable at the remote end.

what exactly is 'flow' in nfdump? can i get tcp sessions with nfdump?

i need to create some statistics from packets in my network interface, but i'm concerned only for my tcp sessions. i thought i could do that with nfdump and nfsen. because i'm new to this stuff, i dont really get what nfdump defines as 'flow'.
furthermore, can i get statistics with these tools only for the tcp protocol sessions? i mean, for example, that i need to have some average duration of all the connections(srcip-srcport, dstip-dstport pairs) in a server of mine. And for this reason i need the time between the 3WH and the closing of each connection (either with [fin/ack,ack] or with [rst]). Is that possible with nfdump-nfsen?
Short answer here is no: you don't have anything in your list of software that generates netflow information. It's not going to work. Netflow collectors do not work as hard as you might like to maintain your idea of a connection - a flow is just a collection of related packets that happen during part of the collection cycle. For a long-lived session, you can expect to see a few flows.
For your application, you will do better to capture syn-ack and fin packets with tcpdump and analyse the timing of these with your favourite text processing tool.
Also, on the left side of your keyboard, you may find a key with an arrow that allows you to type capital letters.

Techniques for infinitely long pipes

There are two really simple ways to let one program send a stream of data to another:
Unix pipe, or TCP socket, or something like that. This requires constant attention by consumer program, or producer program will block. Even increasing buffers their typically tiny defaults, it's still a huge problem.
Plain files - producer program appends with O_APPEND, consumer just reads whatever new data became available at its convenience. This doesn't require any synchronization (as long as diskspace is available), but Unix files only support truncating at the end, not at beginning, so it will fill up disk until both programs quit.
Is there a simple way to have it both ways, with data stored on disk until it gets read, and then freed? Obviously programs could communicate via database server or something like that, and not have this problem, but I'm looking for something that integrates well with normal Unix piping.
A relatively simple hand-rolled solution.
You could have the producer create files and keep writing until it gets to a certain size/number of record, whatever suits your application. The producer then closes the file and starts a new one with an agreed naming algorithm.
The consumer reads new records from a file then when it gets to the agreed maximum size closes and unlinks it and then opens the next one.
If your data can be split into blocks or transactions of some sort, you can use the file method for this with a serial number. The data producer would store the first megabyte of data in outfile.1, the next in outfile.2 etc. The consumer can read the files in order and delete them when read. Thus you get something like your second method, with cleanup along the way.
You should probably wrap all this in a library, so that from the applications point of view this is a pipe of some sort.
You should read some documentation on socat. You can use it to bridge the gap between tcp sockets, fifo files, pipes, stdio and others.
If you're feeling lazy, there's some nice examples of useful commands.
I'm not aware of anything, but it shouldn't be too hard to write a small utility that takes a directory as an argument (or uses $TMPDIR); and, uses select/poll to multiplex between reading from stdin, paging to a series of temporary files, and writing to stdout.

Why I cannot get equal upload and download speed on symmetrical channel?

I'm assigned to a project where my code is supposed to perform uploads and downloads of some files on the same FTP or HTTP server simultaneously. The speed is measured and some conclusions are being made out of this.
Now, the problem is that on high-speed connections we're getting pretty much expected results in terms of throughput, but on slow connections (think ideal CDMA 1xRTT link) either download or upload wins at the expense of the opposite direction. I have a "higher body" who's convinced that CDMA 1xRTT connection is symmetric and thus we should be able to perform data transfer with equivalent speeds (~100 kbps in each direction) on this link.
My measurements show that without heavy tweaking the code in terms of buffer sizes and data link throttling it's not possible to have same speeds in forementioned conditions. I tried both my multithreaded code and also created a simple batch file that automates Windows' ftp.exe to perform data transfer -- same result.
So, the question is: is it really possible to perform data transfer on a slow symmetrical link with equivalent speeds? Is a "higher body" right in their expectations? If yes, do you have any suggestions on what should I do with my code in order to achieve such throughput?
PS.
I completely re-wrote the question, so it would be obvious it belongs to this site.
CDMA 1x consists of up to 15 channels of 9.6kbps traffic. This results in a total throughput of 144kbps.
Two channels are used for command and control signals (talking to base stations, associating/disassociating, SMS traffic, ring signals, etc).
That leaves you with up to 124.8kbps.
--> Each channel is one way. <--
They are dynamically switched and allocated depending on the need.
Generally you'll get more download than upload because that's the typical cell phone modem usage. But you'll never get more than 120kbps total aggregate bandwidth.
In practise, due to overhead of 1xRTT encoding, error correction, resends, etc, you'll typically experience between 60kbps and 90kbps even if you have all the channels possible.
This means that you can probably only get 30kbps-60kbps of upload and download simultaneously.
Further, due to switching the channels dynamically (and the fact that the base station controls this more than your modem - they need to manage base station channels carefully to keep channels free for voice calls) you'll lose time when it switches channels - it's not an instantaneous process.
So - 1xRTT can, in theory, give you 124kbps one way, but due to overhead, switching times, base station capacity, or the phone company simply limiting such connections for other reasons, you can't depend on a symmetrical link.
NOTE:
This will vary to some degree based on the provider and the modem. For instance, some modems have 16 channels, and some providers support 16 channels. In some cases those modems and providers work well together and can provide a full 144kbps aggregate raw bandwidth to the application, with only one dedicated channel (which has to work pretty hard) to deal with control, switching, and other issues. Even then, though, with the overhead of the modem communications, then the overhead of PPP, then the overhead of IP, then the overhead of TCP, you're still looking at maybe 100-120kbps total bandwidth, both up and down.
Lastly, no provider yet supports transparent transfer of IP traffic. In other words if you're modem is moving, the modem will switch to a new base station, but you'll completely drop the PPP session and have to restart it, as well as all the TCP sessions and such. You typically won't get the same IP address, and so your TCP sessions will not recover gracefully.
The "fun" aspect to this twist is that this can happen even if you aren't moving. If one base station gets loaded down, you may be transferred to another base station if you are close enough - there are other things that may make your modem transfer even without you moving. So make sure you take this into account, since you seem to be keen on maintaining a full duplex, symmetric channel open. It's tough to write stuff that will recover gracefully, nevermind anticipate it and do it quickly. You would do well to work very closely with a modem manufacturer (such as Kyocera) on this - otherwise you won't get the documentation on how to control the modem chipset at the low level that you need.
-Adam
I think the whole drama with high equal speeds on both directions is because my higher body thinks that they have 144 kbps on uplink AND 144 kbps on DOWNLINK (== TWO pipes). Whereas in reality we have 144 kbps of ONE pipe which is switching directions when I transfer files.
Comment me if I right or wrong, please.

implementing a download manager that supports resuming

I intend on writing a small download manager in C++ that supports resuming (and multiple connections per download).
From the info I gathered so far, when sending the http request I need to add a header field with a key of "Range" and the value "bytes=startoff-endoff". Then the server returns a http response with the data between those offsets.
So roughly what I have in mind is to split the file to the number of allowed connections per file and send a http request per splitted part with the appropriate "Range". So if I have a 4mb file and 4 allowed connections, I'd split the file to 4 and have 4 http requests going, each with the appropriate "Range" field. Implementing the resume feature would involve remembering which offsets are already downloaded and simply not request those.
Is this the right way to do this?
What if the web server doesn't support resuming? (my guess is it will ignore the "Range" and just send the entire file)
When sending the http requests, should I specify in the range the entire splitted size? Or maybe ask smaller pieces, say 1024k per request?
When reading the data, should I write it immediately to the file or do some kind of buffering? I guess it could be wasteful to write small chunks.
Should I use a memory mapped file? If I remember correctly, it's recommended for frequent reads rather than writes (I could be wrong). Is it memory wise? What if I have several downloads simultaneously?
If I'm not using a memory mapped file, should I open the file per allowed connection? Or when needing to write to the file simply seek? (if I did use a memory mapped file this would be really easy, since I could simply have several pointers).
Note: I'll probably be using Qt, but this is a general question so I left code out of it.
Regarding the request/response:
for a Range-d request, you could get three different responses:
206 Partial Content - resuming supported and possible; check Content-Range header for size/range of response
200 OK - byte ranges ("resuming") not supported, whole resource ("file") follows
416 Requested Range Not Satisfiable - incorrect range (past EOF etc.)
Content-Range usu. looks like this: Content-Range: bytes 21010-47000/47022, that is bytes start-end/total.
Check the HTTP spec for details, esp. sections 14.5, 14.16 and 14.35
I am not an expert on C++, however, I had once done a .net application which needed similar functionality (download scheduling, resume support, prioritizing downloads)
i used microsoft bits (Background Intelligent Transfer Service) component - which has been developed in c. windows update uses BITS too. I went for this solution because I don't think I am a good enough a programmer to write something of this level myself ;-)
Although I am not sure if you can get the code of BITS - I do think you should just have a look at its documentation which might help you understand how they implemented it, the architecture, interfaces, etc.
Here it is - http://msdn.microsoft.com/en-us/library/aa362708(VS.85).aspx
I can't answer all your questions, but here is my take on two of them.
Chunk size
There are two things you should consider about chunk size:
The smaller they are the more overhead you get form sending the HTTP request.
With larger chunks you run the risk of re-downloading the same data twice, if one download fails.
I'd recommend you go with smaller chunks of data. You'll have to do some test to see what size is best for your purpose though.
In memory vs. files
You should write the data chunks to in memory buffer, and then when it is full write it to the disk. If you are going to download large files, it can be troublesome for your users, if they run out of RAM. If I remember correctly the IIS stores requests smaller than 256kb in memory, anything larger will be written to the disk, you may want to consider a simmilar approach.
Besides keeping track of what were the offsets marking the beginning of your segments and each segment length (unless you want to compute that upon resume, which would involve sort the offset list and calculate the distance between two of them) you will want to check the Accept-Ranges header of the HTTP response sent by the server to make sure it supports the usage of the Range header. The best way to specify the range is "Range: bytes=START_BYTE-END_BYTE" and the range you request includes both START_BYTE and byte END_BYTE, thus consisting of (END_BYTE-START_BYTE)+1 bytes.
Requesting micro chunks is something I'd advise against as you might be blacklisted by a firewall rule to block HTTP flood. In general, I'd suggest you don't make chunks smaller than 1MB and don't make more than 10 chunks.
Depending on what control you plan to have on your download, if you've got socket-level control you can consider writing only once every 32K at least, or writing data asynchronously.
I couldn't comment on the MMF idea, but if the downloaded file is large that's not going to be a good idea as you'll eat up a lot of RAM and eventually even cause the system to swap, which is not efficient.
About handling the chunks, you could just create several files - one per segment, optionally preallocate the disk space filling up the file with as many \x00 as the size of the chunk (preallocating might save you sometime while you write during the download, but will make starting the download slower), and then finally just write all of the chunks sequentially into the final file.
One thing you should beware of is that several servers have a max. concurrent connections limit, and you don't get to know it in advance, so you should be prepared to handle http errors/timeouts and to change the size of the chunks or to create a queue of the chunks in case you created more chunks than max. connections.
Not really an answer to the original questions, but another thing worth mentioning is that a resumable downloader should also check the last modified date on a resource before trying to grab the next chunk of something that may have changed.
It seems to me you would want to limit the size per download chunk. Large chunks could force you to repeat download of data if the connection aborted close to the end of the data part. Specially an issue with slower connections.
for the pause resume support look at this simple example
Simple download manager in Qt with puase/ resume support

Resources