How many safe parallel connections can be made to a server - networking

I am trying to make a simple general purpose multi-threaded async downloader in python.How many parallel connections can be generally be made to a server with minimum risk of being banned or rate limited.
I am aware that network will be a limiting in some cases but lets assume in this case that network isn't an issue in this case for the sake of discussion.I/O is also done asynchronously.
According to Browserscope , browsers make a maximum of 17 connections at a time.
However according to my research , most download managers download files in multi-part and make 8+ connections per file.
1.How many files can be downloaded at a time ?
2.How many chunks for a single can be downloaded at one time ?
3.What should be the minimum size of those chunks to make it worth creating the overhead of creating parallel connections ?

It depends.
While some servers tolerate a high number of connections, others don't. General web servers might be more on the high side (low two digit), file hosters might be more sensitive.
There's little to say unless you can check the server's configuration or just try and remember for the next time when your ban has timed out.
You should however watch your bandwidth. Once you max out your access line there's no gain in further increasing the connections.

Related

How to handle 20k concurrent listeners on an Icecast server

I want to know how to handle more than 20k listeners concurrently on an Icecast server. I am using liquidsoap as the audio stream generator (Only one audio stream is distributed through the Icecast server ). The server is configured on AWS. Further, I want to know whether I need to use LB and CDN to handle this much traffic.
Your main concern is bandwidth. Nothing else, bandwidth. You always run out of bandwidth first. Really.
You'll likely want to spread the load across multiple servers and e.g. do simple DNS round-robin. Also because multiple servers means more bandwidth available.
Feeding in a Primary+Relays (master/slave) topology is typical and documented. For more details I'd recommend the Icecast documentation and searching the Icecast mailing list archives.
There are some minor things like making sure that your ulimit for file descriptors is high enough for the Icecast process.
PS: In theory you can squeeze around 20k concurrent connections out of Icecast, but most of the time you won't have enough actual bandwidth to feed those anyway.

Load Testing Thousands of SLOW Connections

I would like to test an upload service with hundreds, if not thousands,
of slow HTTPS connections simultaneously.
I would like to have lots of, say, 3G-quality connections,
each throttled with low bandwidth and high latency,
each sending a few megabytes of data up to the server,
resulting in lots of concurrent, long-lived requests being handled by the server.
There are many load generation tools that can generate thousands of simultaneous requests.
(I'm currently using Locust, mostly so that I can take
advantage of my existing client library written in Python.)
Such tools typically run each concurrent request as fast as possible
over the shared network link.
There are various ways to adjust the apparent bandwidth and latency of TCP connections,
such as Linux's TC
and handy wrappers like Comcast.
As far as I can tell, TC and the like control the shared link
but they cannot throttle the individual requests.
If you want to throttle a single request, TC works well.
In theory, with many clients sharing the same throttled network link,
each request could be run serially,
subject to the constrained bandwidth,
rather than having lots of requests executing concurrently,
a few packets at a time.
The former would result in much fewer active requests executing
concurrently on the server.
I suspect that the tool I want has to actively manage each individual client's sending
and receiving to throttle them fairly.
Is there such a tool?
You can take a look at Apache JMeter, it can "throttle" connections to the throughput configurable via the following properties:
httpclient.socket.http.cps=0
httpclient.socket.https.cps=0
The properties can be defined either in user.properties file or passed to JMeter via -J command-line argument
cps stands for character per second so you can "slow down" JMeter threads (virtual users) to the given throughput rate, the formula for cps calculation is:
cps = (target bandwidth in kbps * 1024) / 8
Check out How to Simulate Different Network Speeds in Your JMeter Load Test for more information.
Yes, these are network simulators. A very primitive one is in the form of WanEM. It is not going to cover your testing needs. You will need something akin to Shunra Storm, a hardware device which can manage individual connections and impairment with models derived from Ookla (think speedtest.com) related to 3,4,5g connections from the wild. Well, perhaps I should say, "could manage," as this product has been absent since the HP acquisition of Shunra.
There are some other market competitors on the network front from companies such as Ixia, Agilent, PacketStorm, Spirent and the like. None of them are inexpensive, but I see your need. Slow, and particularly dirty connections likes cell phones, have a disproportionate impact on the stack and can result in the server running out of resources with fewer mobile connections than desktop ones.
On a side note, be sure you are including a representative model for think time in your test code. If you collapse the client-server model with no or extremely limited think time & impair the network only bad things can happen. This will play particular havoc with both predictability and repeatability on your tests. You may also wind up chasing dozens of engineering ghosts related to load in your code that will not occur in production because of the natural delays and the release of resources which should occur during those windows of activity between client requests.

Does more NICs on a server mean potential for more sustained concurrent I/O?

If you're trying to build an application that needs to have the highest possible sustained network bandwidth, for multiple and repetitive file transfers (not for streaming media), will having 2 or more NICs be beneficial?
I think your answer will depend on your server and network architecture, and unfortunately may change as they change.
What you are essentially doing is trying to remove the 'current' bottleneck in your overall application or design which you have presumably identified as your current NIC (if you haven't actually confirmed this then I would stop and check this in case something else restricts throughput before you reach your NIC limit).
Some general points on this type of performance optimization:
It is worth checking if you have the option to upgrade the current NIC to a higher bandwidth interface - this may be a simpler solution for you if it avoids having to add load balancing hardware/software/configuration to your application.
As pointed out above you need to make sure all the other elements in your network can handle this increased traffic - i.e. that you are not simply going to have congestion in your internet connection or in one of your routers
Similarly, it is worth checking what the next bottle neck will be once you have made this change, if the traffic continues to increase. If adding a new NIC only gives you 5% more throughput before you need a new server anyway, then it may be cheaper to look for a new server right away with better IO from new.
the profile of your traffic and how it is predicted to evolve may influence your decision. If you have a regular daily peak which only exceeds your load slightly then a simple fix may serve you for a long time. If you have steadily growing traffic then a more fundamental look at your system architecture will probably be necessary.
In line with the last point above, it may be worth looking at the various Cloud offerings to see if any meet your requirements at a reasonable cost, possibly even as temporary resource every day just to get you through your peak traffic times.
And finally you should be aware that as soon as you settle on a solution and get it up and running someone else in your organization will change or upgrade the application to introduce a new and unexpected bottle-neck...
It can be beneficial, but it won't necessarily be that way "out of the box".
You need to make sure that both NICs actually get used - by separating your clients on different network segments, by using round robin DNS, by using channel bonding, by using a load balancer, etc. And on top of that you need to make sure your network infrastructure actually has sufficient bandwidth to allow more throughput.
But the general principle is sound - you have less network bandwidth available on your server than disk I/O, so the more network bandwidth you add the better, up until it reaches or exceeds your disk I/O, then it doesn't help you anymore.
Potentially yes. In practice, it also depends on the network fabric, and whether or not network I/O is a bottleneck for your application(s).

How are web servers best loaded

If a web server is going to serve out say 100GB per day would it be better for it to do so in 10,000 10MB sessions or 200,000 500kB sessions.
The reason for this question is I'm wondering if there would be any advantage, disadvantage or neither to sites that mirror content to allow clients to exploit HTTP's start-in-the-middle feature to download files in segments from many servers. (IIRC this is a little like bit torrent works)
I suppose that not only the size of the session matters, but the duration of it is very important. I would guess that 200,000 small sessions will live shorter than 10,000 big ones. As soon as a session is completed, resources are freed to be used again. I would optimize it so that a session lasts as short as possible.
Have also in mind that a single server can't have more than a couple of hundred of sessions at the same time (100 - 200 is a safe number).
There's usually some overhead associated with each session - the cost of creating a TCP connection for example, or HTTP headers (if you haven't already included them in the 100GB) - so ostensibly it'd be better to use larger sessions. But also think about it from the clients' point of view: by using multiple smaller sessions they can run downloads in parallel and possibly get their content faster. So the actual "best" setup will probably be some intermediate session size, which of course depends on what kind of content you're serving (large files or small ones, streaming or static) and how important you think speed is. There's no one-size-fits-all answer.
100 GB/day is not a number you need to worry about. It's an average of slightly over 1 MBYte/sec. You could get that over old Ethernet, to put that in perspective. Now, your actual peak throughput will be a lot higher, perhaps 10 MByte/s. Still, this is no problem at all for a modern server. So, I don't think you need those multiple servers, so that's no reason for splitting the 10MB downloads. And if you're downloading from a single server, why would you insert 19 disconnect-and-reconnect sequences in the middle of a 10MB download?

How Many Network Connections Can a Computer Support?

When writing a custom server, what are the best practices or techniques to determine maximum number of users that can connect to the server at any given time?
I would assume that the capabilities of the computer hardware, network capacity, and server protocol would all be important factors.
Also, do you think it is a good practice to limit the number of network connections to a certain maximum number of users? Or should the server not limit the number of network connections and let performance degrade until the response time is extremely high?
Dan Kegel put together a summary of techniques for handling large amounts of network connections from a single server, here: http://www.kegel.com/c10k.html
In general modern servers can handle very large numbers of concurrent connections. I've worked on systems having over 8,000 concurrently open TCP/IP sockets.
You will need a high quality servicing interface to handle that kind of load, check out libevent or libev.
That is a good question and it definitely is situational. What is your computer? Do you have a 4 socket machine filled with Quad Core Xeons, 128 GB of RAM, and Fiber Channel Connectivity (like the pair of Dell R900s we just bought)? Or are you running on a p3 550 with 256 MB of RAM, and 56K modem? How much load does each connection place on your server? What kind of response is acceptible?
These are the questions you need to answer. I guess the best way to find the answer is through load testing. Create a unit test of the expected (and maybe some unexpected) paths that your code will perform against your server. Find a load testing framework that will allow you to simulate 10, 100, 1000, 10000 users performing those tasks at the same time.
That will tell you how many connections your computer can support.
The great thing about the load/unit test scenario is that you can put in response time expectations in your unit tests and increase the load until you fall outside of your response time. If you have a requirement of supporting X number of Users with Y second response, you will be able to demonstrate it with your load tests.
One of the biggest setbacks in high concurrency connections is actually the routers involved. Home user oriented routers usually have a small NAT table, preventing the router from actually servicing the server the connections.
Be sure to research your router/ network infrastructure setup just as well.
I think you shouldn't limit the number of connections your server will allow - just catch and handle properly any exceptions that might occur when accepting and closing connections and you should be fine. You should leave that kind of lower level programming to the underlying OS layers - that way you can port your server easier etc.
This really depends on your operating system.
Different Unix flavors will support "unlimited" number of file handles / sockets others have high values like 32768.
A typical user limit is 8192 but it can usually be set higher.
I think windows is more limiting but the server version may have higher limits.

Resources