I am writing a crawler to crawl some forum content, and all of my HTTP connections go through Apache HttpClient.
As suggested by the official documentation, I'm using a single HttpClient instance per forum server. This client, equipped with a PoolingHttpClientConnectionManager instance, can execute multiple requests from multiple execution threads at the same time.
One important attribute of this pooling connection manager is the maximum number of connections per route (2 by default). I am unsure what a good general limit would be: one that keeps the crawl fast without overloading the server.
(By general, I mean an average number that works for a typical forum server in different cases, because I will set it statically when I initialize the connection manager.)
Besides that, I would really appreciate it if someone knows how to dynamically manage the per-route limit based on server feedback in HttpClient 4.5 or a similar library.
Thanks very much for helping!
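One way the dynamic adjustment asked about here could work is additive-increase/multiplicative-decrease (AIMD): add one connection after a healthy response and halve the limit after an overload signal. The sketch below is library-independent Python, not HttpClient code. The class name AimdLimit, the ceiling of 10, and the choice of 429/503 as overload signals are all assumptions made for illustration. In HttpClient 4.5 the resulting number would presumably be applied with PoolingHttpClientConnectionManager.setMaxPerRoute(route, n).

```python
from dataclasses import dataclass


@dataclass
class AimdLimit:
    """Hypothetical per-route limit controller (AIMD)."""
    limit: int = 2        # start from HttpClient's default per-route limit
    floor: int = 1        # never drop below one connection
    ceiling: int = 10     # assumed upper bound; tune per server
    backoff: float = 0.5  # shrink factor applied on an overload signal

    def on_success(self) -> int:
        # Probe gently: allow one more connection after a healthy response.
        self.limit = min(self.ceiling, self.limit + 1)
        return self.limit

    def on_overload(self) -> int:
        # Back off quickly when the server signals that it is struggling.
        self.limit = max(self.floor, int(self.limit * self.backoff))
        return self.limit


def record(ctrl: AimdLimit, status: int) -> int:
    """Feed one HTTP status into the controller and return the new limit."""
    # Assumption: 429 and 503 are treated as "server is overloaded".
    if status in (429, 503):
        return ctrl.on_overload()
    return ctrl.on_success()
```

Timeouts could be fed into on_overload() in the same way. Whether a ceiling of 10 is polite enough depends on the forum, so treat it as a placeholder rather than a recommendation.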
Related
I'm looking for a mechanism to limit the number of concurrent connections to a service exposed using ASP.NET WebAPI.
Why? Because this service is performing operations that are expensive on the hardware resources and I would like to prevent degradation under stress.
More info:
I don't know how many requests will be issued per period of time.
This service runs in its own IIS application pool and limiting the maximum connections on the parent site in IIS is not an option.
I found this suite, but the supported algorithms do not include the one that I'm interested in.
I'm looking for something out of the box (something as straightforward as an IIS config setting) but I could not find exactly what I need.
Any clues?
Thanks!
Scaling your service would probably be a better idea than limiting the number of requests. You could send the heavy processing to some background jobs and keep your API servicing requests.
But assuming the above cannot be done, you will need to use one of the available throttling packages, or write your own if none meets your requirements.
I suggest starting with the ThrottlingHandler from WebApiContrib
You might be able to meet your needs by properly implementing the GetUserIdentifier method.
If not, you will need to implement your own MessageHandler and the handler mentioned would be a good starting point.
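To illustrate what a hand-written concurrency limit boils down to, here is a minimal sketch in Python rather than ASP.NET. It is not the WebApiContrib ThrottlingHandler; the class name ConcurrencyLimiter and the 503 "busy" reply are assumptions for the example. The idea is a counting semaphore that is acquired without blocking, so excess requests are refused immediately instead of piling up.

```python
import threading


class ConcurrencyLimiter:
    """Hypothetical limiter: refuses work once max_concurrent calls are in flight."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def handle(self, work):
        # Non-blocking acquire: fail fast instead of queueing the caller.
        if not self._slots.acquire(blocking=False):
            return 503, "busy"
        try:
            return 200, work()
        finally:
            # Always free the slot, even if work() raises.
            self._slots.release()
```

In a message handler, the same pattern would wrap the call to the inner handler. Returning 503 tells well-behaved clients to retry later, which protects the expensive operation under stress.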
Is there a way to simulate 10000 concurrent HTTP requests?
I tried the siege tool,
but it is limited to 2000 requests on my laptop.
How can I make 10000 requests?
The simplest way to generate a huge number of concurrent requests is probably Apache's ab tool.
For example, ab -n 100 -c 10 http://www.example.com/ would request the given website 100 times, with a concurrency of 10 requests.
It is true that the number of simultaneous requests is inherently limited. Keep in mind that TCP only has 65536 port numbers, some of which are already occupied, and the first 1024 are usually reserved. This leaves you with a theoretical maximum of around 64500 ports per machine for outgoing requests.
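The arithmetic behind that figure, as a quick check. The ephemeral range used in the second half is an assumption: 32768 to 60999 is a commonly seen Linux default for net.ipv4.ip_local_port_range, but your system may differ.

```python
TOTAL_PORTS = 2 ** 16            # 16-bit port field: 65536 values
RESERVED_PORTS = 1024            # ports 0-1023 are usually reserved
usable = TOTAL_PORTS - RESERVED_PORTS

# Assumption: a common Linux default ephemeral range (check your own system).
EPHEMERAL_LOW, EPHEMERAL_HIGH = 32768, 60999
ephemeral = EPHEMERAL_HIGH - EPHEMERAL_LOW + 1

print(usable)     # 64512
print(ephemeral)  # 28232
```

So with default settings the practical ceiling for outgoing connections to a single target address and port is likely well below the theoretical one, unless the range is widened.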
Then there are the operating system limits. For example, in Linux there are the kernel parameters in the net.ipv4.* group.
Finally, you should of course configure your HTTP server to handle that number of simultaneous requests. In Apache, those are StartServers and its friends; in nginx, it's worker_processes and worker_connections. Also, if you have a stand-alone dynamic processor attached to your web server (such as php-fpm), you must raise the number of idle processes in its pool, too.
After all, the purpose of massive parallel requests should be to find your bottlenecks, and the above steps will give you a fair idea.
Btw., if you use ab, read its final report thoroughly. It may seem brief, but it carries a lot of useful information (e.g. "non-2xx responses" may indicate server-side errors due to overload).
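For anyone curious what ab -n 100 -c 10 does conceptually, here is a rough Python sketch: a fixed number of requests pushed through a pool of worker threads, with the status codes tallied at the end. This is not how ab is implemented. The names OkHandler, fetch_status, run_load and demo are made up for the example, and the tiny local server only stands in for the real target so that the sketch is self-contained.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener

# Opener that ignores proxy environment variables, so localhost stays local.
_opener = build_opener(ProxyHandler({}))


class OkHandler(BaseHTTPRequestHandler):
    """Stand-in target server: answers every GET with 200 and a tiny body."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet during the run


def fetch_status(url: str) -> int:
    """Issue one GET and return the HTTP status code."""
    try:
        with _opener.open(url, timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        # Non-2xx responses are results too, not crashes.
        return err.code


def run_load(url: str, total: int, concurrency: int) -> Counter:
    """Send `total` requests with at most `concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch_status, [url] * total))


def demo() -> Counter:
    server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        return run_load(url, total=100, concurrency=10)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Threads are fine at a concurrency of 10, but for thousands of simultaneous connections a dedicated tool (or several machines) is the more realistic route, for the port and OS limits described above.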
JMeter allows distributed testing, which means that you can set up a group of computers (one acting as a master and the rest as slaves) to run as many threads as you need. JMeter has a very good doc explaining this here . . .
http://jmeter.apache.org/usermanual/jmeter_distributed_testing_step_by_step.pdf
and some more info here . . .
http://digitalab.org/2013/06/distributed-testing-in-jmeter/
You can set this all up in the cloud as well if you do not have access to enough slave machines; there are a couple of services out there for this.
Have you tried using Apache JMeter? You can create a web test plan and there are several options which you can play with. You can wrap the requests in a ThreadGroup as outlined here. You can generate extensive reports and graphs as well. If the simple thread group is not enough you could potentially try using the UltimateThreadGroup plugin for JMeter.
When creating that many threads with JMeter on a single machine, you can run out of memory for allocating new thread stacks. In that case, consider reducing the stack size per thread. How to do that is explained in the SO answer here. The post describes some alternative approaches as well.
If there is an OS limit on the number of simultaneous TCP connections allowed, there is a registry setting that removes or increases that limit. Once you have made sure the limit is not in your way, you could write some JavaScript that issues AJAX requests in a loop.
You would probably need node.js to execute the JavaScript.
How to test the performance of an http server that serves and accepts only JSON requests (post and get)? I'm new to web testing, so tell me if I'm trying to do it in incorrect way.
I want to test if:
server is capable of handling hundreds of simultaneous connections.
server is capable of serving thousands of requests per second.
server does not crash or get stuck when the number of requests exceeds server capabilities, and continues to run normally when the number of requests drops below average.
One way is to write some logic that repeats certain actions per run, and run multiple of them.
PS: Ideally, the tool/method should support compression like gzip as an option.
You can try JMeter and its HTTPSampler.
About gzip: I've never used it in JMeter, but it seems it can be done:
How to get JMeter to request gzipped content?
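Whatever tool you end up with, asking for gzipped content comes down to sending an Accept-Encoding: gzip request header and decompressing the body when the response says Content-Encoding: gzip. A minimal Python sketch of both halves follows. The helper names gzip_request and decode_body are made up for illustration; this is not JMeter configuration.

```python
import gzip
from typing import Optional
from urllib.request import Request


def gzip_request(url: str) -> Request:
    """Build a GET request that advertises gzip support to the server."""
    return Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress the body only if the server actually gzipped it."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(raw)
    return raw
```

The server is free to ignore the header, which is why decode_body checks the response header instead of assuming compression.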
Apache Bench (ab) is a command line tool that's great for these kinds of things. http://en.wikipedia.org/wiki/ApacheBench
ab -n 100 -c 10 http://www.yahoo.com/
If you are new to web testing then there are a lot of factors that you need to take into account. At the most basic level you want to do the things you have outlined.
Beyond this, you need to think about how poorly behaving clients might impact your service, e.g. by keeping connections alive or sending malformed requests. These may translate into exceptions on the server, which might in turn have additional impact (due to logging or slower execution). This means that you have to think of ways to break the service and monitor the events that have an impact at higher scales.
Microsoft have a fairly good introduction to performance testing for web applications.
I'm currently reading a lot about node.js. There is a frequent comparison between servers using a traditional thread per request model (Apache), and servers that use an event loop (Nginx, node, Tornado).
I would like to learn in detail about how a request is processed in ASP.NET - from the point it is received in http.sys all the way up to it being processed in ASP.NET itself. I've found the MSDN documentation on http.sys and IIS a little lacking, but perhaps my google-fu is weak today. So far, the best resource I have found is a post on Thomas Marquardt's Blog.
Could anyone shed more light on the topic, or point me to any other resources?
(For the purposes of this question I'm only interested in IIS7 with a typical integrated pipeline)
From my research so far, it's my understanding that when a request comes in, it gets put into a kernel-mode request queue. According to this, that avoids many of the problems with context switching when there are massive numbers of requests (or processes or threads...), providing similar benefits to evented IO.
Quoted from the article:
"Each request queue corresponds to one application pool. An application pool corresponds to one request queue within HTTP.sys and one or more worker processes."
So according to that, every request queue may have more than one "Worker Process." (Google cache) More on worker processes
From my understanding:
IIS creates a request queue (see the http.sys API below)
A "Web Site" configured in IIS corresponds to one Worker Process
A Web Site/Worker Process shares the Thread Pool.
A thread is handed a request from the request queue.
Here is a lot of great information about IIS7's architecture
Here is some more information about http.sys.
HTTP Server I/O Completion Stuff
Typical Server Tasks
Open questions I still have:
How the heck does IIS change the Server header if it uses HTTP.SYS? (See this question)
Note: I am not sure if/how a "kernel-mode request queue" corresponds to an IO completion port. I would assume that each request would have its own, but I don't know, so I truly hope someone will answer this more thoroughly. I just stumbled on this question, and it seems that http.sys does in fact use IO completion ports, which should provide nearly all of the same benefits that evented IO (node.js, nginx, lighttpd, C10K, etc...) has.
This might be a bit of a silly question, but:
If I have two people logging on to my site at exactly the same time, will the server-side code be executed one after the other, or will it be executed simultaneously in separate threads?
I'm curious with regard to a denial of service attack on a website login. Does the server slow down because it has a massive queue of logins, or is it slow because it has a billion simultaneous logins?
This is not related to ASP.NET per se (I have very little knowledge in that area), but generally web servers. Most web servers use threads (or processes) to handle requests, so basically, whatever snippet of code you have will be executed for both connections in parallel. Of course, if you access a database or some other backend system where a lock is placed, allowing just one session to perform queries, you might have implicitly serialized all requests.
Web servers typically have a minimum and maximum number of workers, which are tuned to the current hardware (CPUs, memory, etc). If these are exhausted, new requests will be queued waiting for a worker to become available, or until a maximum queue length of pending requests has been reached at which point it disregards new connections, effectively denying service (if this is on purpose, it's called a denial of service or DoS attack).
So, in your terms, it's a combination: a huge number of simultaneous requests filling up the queue.
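The worker/queue behaviour described above can be modelled in a few lines. This is a toy Python sketch, not how any particular web server is implemented, and BoundedWorkerPool is a made-up name. A fixed set of workers pulls jobs from a bounded queue, and once the queue is full, new submissions are turned away.

```python
import queue
import threading


class BoundedWorkerPool:
    """Toy model: fixed workers, bounded backlog, reject when full."""

    def __init__(self, workers: int, max_queued: int) -> None:
        self._pending = queue.Queue(maxsize=max_queued)
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        while True:
            job = self._pending.get()  # blocks until a request is queued
            try:
                job()
            finally:
                self._pending.task_done()

    def submit(self, job) -> bool:
        """Queue a job; return False if the backlog is already full."""
        try:
            self._pending.put_nowait(job)
            return True
        except queue.Full:
            return False  # this is the "denied service" case

    def wait_idle(self) -> None:
        """Block until every queued job has finished."""
        self._pending.join()
```

With one worker busy and a backlog of two, a third waiting request is refused. Slow requests therefore hurt twice: they occupy workers and they let the queue fill up.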
It should use a thread pool. Note that they are still in the same application, so application level items like static variables are still shared between them.
From this article:
"Remember ISAPI is multi-threaded so requests will come in on multiple threads through the reference that was returned by ApplicationDomainFactory.Create(). Listing 1 shows the disassembled code from the IsapiRuntime.ProcessRequest method that receives an ISAPI ecb object and server type as parameters. The method is thread safe, so multiple ISAPI threads can safely call this single returned object instance simultaneously."
So yes, in the case of a DoS attack, it would be slow because of the large number of connections.
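The earlier point in this answer about static variables being shared between request threads is worth a small illustration. The sketch below uses Python threads as a stand-in for ASP.NET request threads. LoginStats and simulate are hypothetical names, and the class attribute plays the role of a static field.

```python
import threading


class LoginStats:
    attempts = 0               # class-level state, analogous to a static field
    _lock = threading.Lock()   # guards the read-modify-write below

    @classmethod
    def record_attempt(cls) -> None:
        with cls._lock:
            cls.attempts += 1


def simulate(workers: int, per_worker: int) -> int:
    """Run `workers` threads that each record `per_worker` login attempts."""
    LoginStats.attempts = 0

    def work() -> None:
        for _ in range(per_worker):
            LoginStats.record_attempt()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return LoginStats.attempts
```

Every thread sees and modifies the same counter, so the total reflects all of them. Without the lock, concurrent increments could be lost, which is the usual hazard with shared application-level state.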
As others said, most webservers use multiple processes or threads (better) to serve multiple requests at a time. In particular, you can set each ASP.NET application pool with a max number of queued requests and max worker processes. Each process has multiple threads up to a maximum (not configurable AFAIK, I may be wrong), and incoming requests are processed on a first-in-first-out basis.
Moreover, ASP.NET processes one single request for each session - but a malicious user can open as many sessions as she wants.
Multiple logins will hit the database and probably bring it to its knees before the web server itself.
As far as I know, there is no built-in way to throttle ASP.NET requests other than setting the maximum number of queued requests (waiting to be processed). This number should ideally be very small. You can monitor the number of queued ASP.NET requests using performance counters. Say you find that, at peak traffic, this number is 100. You can then update the application so that it refuses login attempts when this number is above 100, so that the database is not hit (I never did that, it's just a thought).