Node.js vs. ASP.NET async pages

Still trying to understand Node.js...
1) If I apply the ASP.NET async pattern to every I/O operation and configure maxWorkerThreads=1, is it (conceptually) similar to Node.js?
2) Does an I/O operation (in either framework) take place in its own thread, or is there some OS functionality for notifications / lightweight threads?
3) This SO thread says that Node.js still uses threads internally, so it is not such a big difference from ASP.NET. Some answers say yes, but that it is a better programming model, etc. Which threads does that question refer to: lightweight I/O threads like the ones I asked about in #2?

See this similar question
As for the I/O operations, that's implementation-specific: the Linux backend uses libev and the Windows backend uses IOCP. See this video on async I/O details for Windows/Linux.
Node.js only uses threads internally because Linux doesn't have a completion-based async I/O system like Windows has with IOCP (file I/O in particular has no good non-blocking equivalent). So to make async I/O possible there, you need an internal thread pool. See the video.


How does Datastax implement its async API driver for Cassandra?

I'm trying to convince a coworker of the benefits of using Session#executeAsync.
However, since we are using the driver from Scala, it would be rather easy to wrap the sync call Session#execute in a Future, and that would be all it takes to turn it into an async call. This would already be an improvement, because it would let us avoid blocking the current thread (in our case, that would mean blocking the threads that handle HTTP requests in Play, with a huge impact on the number of requests that can be handled concurrently).
I argue that if all it took to implement an async driver were wrapping the sync call in a Future, implementations like ReactiveMongo and the Datastax async API for Cassandra wouldn't exist.
So,
What are the benefits of using the async API?
How is the async API implemented in the Datastax driver, and what libraries and OS features does it rely on?
What kinds of problems had to be solved beyond the asynchronous network calls? (I mean, implementing an async driver must involve more than just using Java NIO.)
How is the async API implemented in the Datastax driver, and what libraries and OS features does it rely on?
The Datastax Java driver is based on the Netty networking framework, which is itself built on an event-driven model. For some operating systems Netty also provides native transports to improve performance, e.g. epoll on Linux.
What are the benefits of using the async API?
I'm not a Scala expert, but as far as I know a Scala Future is based on a thread model (execution contexts). That means you need to submit a request to another thread to execute it asynchronously. For I/O tasks you just send a request to another system and wait for its response. If you have a large number of requests, all the threads in your pool will be busy, yet none of them will be doing anything useful. A thread is a fairly expensive resource, and having thousands of threads on the same physical machine can be a problem. Threads are good for parallel computation, but not for I/O tasks.
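A minimal sketch of that argument in C# (the language of the code samples elsewhere on this page; a Scala Future on a bounded execution context is assumed to behave analogously). QueryBlocking and QueryNonBlocking are hypothetical stand-ins for a driver's sync and async calls, and a scheduler capped at 4 concurrent tasks plays the role of a small thread pool.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a blocking call such as Session#execute:
// the thread sits in Sleep, doing nothing useful, for the whole round trip.
static int QueryBlocking(int id) { Thread.Sleep(200); return id; }

// Hypothetical stand-in for a non-blocking call such as Session#executeAsync:
// no thread is held while the (simulated) response is pending.
static async Task<int> QueryNonBlocking(int id) { await Task.Delay(200); return id; }

// At most 4 tasks run at once on this scheduler: a small "execution context".
TaskScheduler pool = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, 4).ConcurrentScheduler;

// "Wrap the sync call in a Future": 8 requests, 4 slots, each slot blocked while waiting.
var sw = Stopwatch.StartNew();
int[] wrappedResults = await Task.WhenAll(Enumerable.Range(0, 8).Select(i =>
    Task.Factory.StartNew(() => QueryBlocking(i), CancellationToken.None,
                          TaskCreationOptions.None, pool)));
long wrappedMs = sw.ElapsedMilliseconds;

// Same 4-slot scheduler, but each call gives its slot back at the await,
// so all 8 waits can overlap.
sw.Restart();
int[] asyncResults = await Task.WhenAll(Enumerable.Range(0, 8).Select(i =>
    Task.Factory.StartNew(() => QueryNonBlocking(i), CancellationToken.None,
                          TaskCreationOptions.None, pool).Unwrap()));
long asyncMs = sw.ElapsedMilliseconds;

Console.WriteLine($"wrapped: {wrappedMs} ms, async: {asyncMs} ms");
```

With 8 requests and 4 slots, the wrapped version needs at least two full 200 ms waves because every slot is held for the entire wait, while the async version only occupies a slot for the brief moments it actually runs code.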
On the other hand, the Datastax Java driver is based on an event-driven model (Netty). This means each request is submitted to an event-loop queue. On each iteration of the event loop, Netty checks the state of the request and executes the handlers associated with it.
This approach avoids the memory overhead of threads and allows you to perform thousands of I/O requests at the same time. But in this case you should run slow or blocking request callbacks on another thread to avoid blocking the event loop.
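The event-loop model, and the advice to move blocking callbacks off the loop, can be sketched in C#. This is a toy under stated assumptions, not Netty's implementation: a single thread drains a BlockingCollection of callbacks, the blocking step is simulated with Thread.Sleep, and its result is posted back to the loop as a new callback.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Toy event loop: a single thread drains a queue of callbacks, one at a time.
var queue = new BlockingCollection<Action>();
var loopThread = new Thread(() =>
{
    foreach (Action callback in queue.GetConsumingEnumerable())
        callback();
});
loopThread.Start();

// RunContinuationsAsynchronously keeps the code after "await done.Task" off the loop thread.
var done = new TaskCompletionSource<(string Report, int ThreadId)>(
    TaskCreationOptions.RunContinuationsAsynchronously);

// A request handler on the loop. Its slow, blocking step (simulated by Sleep) runs on a
// pool thread; the result is then posted back to the loop as a new callback.
queue.Add(() =>
    Task.Run(() => { Thread.Sleep(100); return "report"; })
        .ContinueWith(t => queue.Add(() =>
            done.SetResult((t.Result, Environment.CurrentManagedThreadId)))));

// Meanwhile the loop stays free to serve other, quick callbacks.
int served = 0;
for (int i = 0; i < 5; i++) queue.Add(() => served++);

var result = await done.Task;
queue.CompleteAdding();   // no more work: let the loop thread exit
loopThread.Join();
Console.WriteLine($"{result.Report} handled on loop thread {result.ThreadId}; {served} quick callbacks served");
```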

Multithreaded comet server library

I'm looking for a multithreaded comet server library: what I need is async I/O (using epoll) running on a thread pool (4-8 threads). Tornado would be ideal if it were multithreaded.
Why multithreaded? I need to process and serve data that could come from any connected user. It could be synchronised between Tornado instances using a database, but even NoSQL would be too big a slowdown: almost every request would end up with a database write/update, which isn't a good idea even with async drivers. I can store everything in local volatile memory so it can be very fast, but then it must run in a single process to avoid inter-process communication. I don't need to scale (a single box is enough), but it MUST be fast. Some data will be stored in MongoDB, but Mongo queries will make up only about 5% of normal requests.
And, importantly, semaphores (and other higher-level approaches) are not rocket science for me, so I'm not afraid of synchronisation.
Requirements:
async io
non-blocking
thousands of concurrent connections
FAST
basic HTTP features (GET, POST, cookies)
ability to process requests asynchronously (do something, make an async call with a callback (e.g. a database query), process the callback, return data)
thread pool
C++/Java/Python
simple and lightweight
It would be nice to have an async Mongo driver too
I've looked into Boost.Asio and it seems capable of doing what I need, but I want to focus on the application, not on writing HTTP request processing.
I've read about Tornado (seems ideal but is single-threaded), Simple (not sure whether it can process a request asynchronously and return data after an async call), and Boost.Asio (very nice, but too low-level).
Well, after more digging I decided to change technology: I decided to create my own protocol on top of TCP and Netty.

What's so different about Node.js's event-driven model? Can't we do that with ASP.NET's IHttpAsyncHandler?

I'm not very experienced in web programming, and I haven't actually coded anything in Node.js yet; I'm just curious about the event-driven approach. It does seem good.
The article explains some bad things that can happen when we use a thread-based approach to handling requests, and argues that we should opt for an event-driven approach instead.
In the thread-based model, the cashier/thread is stuck with us until our food/resource is ready. In the event-driven model, the cashier sends us somewhere out of the request queue, so we don't block other requests while waiting for our food.
To scale the blocking thread-based, you need to increase the number of threads.
To me this seems like a bad excuse for not using threads/threadpools properly.
Couldn't that be properly handled using IHttpAsyncHandler?
ASP.NET receives a request, uses the ThreadPool and runs the handler (BeginProcessRequest), and then inside it we load the file/database with a callback. That thread should then be free to handle other requests. Once the file read is done, the ThreadPool is called into action again and executes the rest of the response.
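A rough C# sketch of that flow using the classic Begin/End (APM) shape. BeginProcess, EndProcess and LoadAsync are hypothetical names, not the real IHttpAsyncHandler members (which take an HttpContext and need System.Web); Task.Delay stands in for the file or database read.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the file/database read started by the handler.
static async Task<string> LoadAsync() { await Task.Delay(200); return "payload"; }

// Begin: start the I/O and return immediately, so the calling thread is free again.
// A Task implements IAsyncResult, so it can serve as the handle returned to the caller.
static IAsyncResult BeginProcess(AsyncCallback callback, object state)
{
    var tcs = new TaskCompletionSource<string>(state);
    LoadAsync().ContinueWith(t =>
    {
        if (t.IsFaulted) tcs.SetException(t.Exception.InnerExceptions);
        else tcs.SetResult(t.Result);
        callback?.Invoke(tcs.Task);   // a pool thread picks the request up again here
    });
    return tcs.Task;
}

// End: collect the result (rethrowing any I/O error) so the response can be written.
static string EndProcess(IAsyncResult ar) => ((Task<string>)ar).GetAwaiter().GetResult();

var finished = new ManualResetEventSlim();
string response = "";
IAsyncResult handle = BeginProcess(ar => { response = EndProcess(ar); finished.Set(); }, null);
bool returnedBeforeIoCompleted = !handle.IsCompleted;   // Begin came back while I/O was pending
finished.Wait();
Console.WriteLine($"Begin returned early: {returnedBeforeIoCompleted}; response: {response}");
```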
Not so different to me, so why is it not as scalable?
One disadvantage of the thread-based approach that I do know of is that threads need more memory. But only with threads can you enjoy the benefits of multiple cores. I doubt Node.js doesn't use any threads/cores at all.
So, based on just event-driven vs thread-based (don't bring up the "because it's JavaScript and every browser..." argument), can someone point out the actual benefit of using Node.js instead of the existing technology?
That was a long question. Thanks :)
First of all, Node.js is not multi-threaded. This is important. You have to be a very talented programmer to design programs that work perfectly in a threaded environment. Threads are just hard.
You have to be a god to maintain a threaded project that wasn't designed properly. There are just so many problems that are hard to avoid in very large projects.
Secondly, the whole platform was designed to run asynchronously. Have you seen any ASP.NET project where every single I/O interaction was asynchronous? Simply put, ASP.NET was not designed to be event-driven.
Then there's the memory footprint, because we have one thread per open connection, and the whole scaling issue. Correct me if I'm wrong, but I don't know how you would avoid creating a new thread for each connection in ASP.NET.
Another issue is that a Node.js request is idle when it's not being used or when it's waiting for I/O, whereas a C# thread sleeps. There is a limit to the number of threads that can sleep. In Node.js, you can easily handle 10k clients at the same time, in parallel, on one development machine. Try handling 10k threads in parallel on one development machine.
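For comparison on the .NET side: with C#'s async/await (which arrived after C# 2.0, so not what this answer had in mind), a waiting request is also just idle state on the heap rather than a sleeping thread. A minimal sketch, assuming .NET Core 3.0 or later (for ThreadPool.ThreadCount) and using Task.Delay as a stand-in for waiting on a socket; HandleClientAsync is a hypothetical name.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical client handler: each client waits 1 s on "I/O" (Task.Delay stands in for
// a socket read). While it waits, it holds no thread, only a small state object.
static async Task<int> HandleClientAsync(int id)
{
    await Task.Delay(1000);
    return id;
}

var sw = Stopwatch.StartNew();
int[] served = await Task.WhenAll(Enumerable.Range(0, 10_000).Select(id => HandleClientAsync(id)));
sw.Stop();

// The pool thread count reported here is typically small, far below 10,000.
Console.WriteLine($"{served.Length} clients in {sw.ElapsedMilliseconds} ms, {ThreadPool.ThreadCount} pool threads");
```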
JavaScript itself as a language makes asynchronous coding easier. If you're still on C# 2.0, the asynchronous syntax is a real pain. A lot of developers will simply get confused if you're defining Action<> and Func<> all over the place and using callbacks. An ASP.NET project written in an evented way is just not maintainable by an average ASP.NET developer.
As for threads and cores: Node.js is single-threaded and scales by running multiple Node processes. If you have 16 cores, you run 16 instances of your Node.js server with a single Node.js load balancer in front of them (or an nginx load balancer, if you want).
This was all written into the platform at a very low-level right from the beginning. This was not some functionality bolted on later down the line.
Other advantages
Node.js has a lot more to it than the above. The above only covers why Node.js's way of handling the event loop is better than doing it with the asynchronous capabilities in ASP.NET.
Performance. It's fast. Real fast.
One big advantage of Node.js is its low-level API. You have a lot of control.
You have the entire HTTP server integrated directly into your code, rather than outsourced to IIS.
You have the entire nginx vs Apache comparison.
The entire C10K challenge is handled well by Node but not by IIS.
AJAX and JSON communication feels natural and easy.
Real-time communication is one of the great things about Node.js. It was made for it.
Plays nicely with document-based nosql databases.
Can run a TCP server as well. Has file-writing access and can run any Unix console command on the server.
You query your database in javascript using, for example, CouchDB and map/reduce. You write your client in JavaScript. There are no context switches whilst developing on your web stack.
Rich set of community-driven open-source modules. Everything in node.js is open source.
Small footprint and almost no dependencies. You can build the node.js source yourself.
Disadvantages of Node.js
It's hard. It's young. As a skilled JavaScript developer, I have difficulty writing a website with Node.js just because of its low-level nature and the level of control I have. It feels just like C: a lot of flexibility and power, either to be used by me or to hang me.
The API is not frozen; it's changing rapidly. I can imagine having to rewrite a large website completely in 5 years because of how much Node.js will have changed by then. It is doable; you just have to be aware that maintaining Node.js websites is not cheap.
Further reading
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
http://blip.tv/file/2899135
http://nodeguide.com/
There are a lot of misconceptions regarding Node.js vs. ASP.NET and asynchronous programming. You can do non-blocking I/O in ASP.NET. Most people don't know that the .NET Framework uses Windows I/O completion ports underneath when you make web service calls or other I/O-bound operations using the Begin/End pattern in .NET 2.0 and above. I/O completion ports are the way the Windows operating system supports non-blocking I/O, so that the app thread is freed while the I/O operation completes. Interestingly, Node.js uses a less optimal non-blocking I/O implementation on Windows through Cygwin. A new Windows version is on the roadmap which, with Microsoft's guidance, will use I/O completion ports. At that point there will be no underlying difference.
It is also possible to make non-blocking database calls in ADO.NET, but be aware of ORM tools such as NHibernate and Entity Framework: they are still very much synchronous.
Synchronous IO (blocking) makes the control flow much clearer and it has for this reason become popular. The reason why computer environments are multithreaded has only superficially to do with this. It is more generally related to time sharing and utilization of multiple CPUs.
Having only a single thread can cause starvation during lengthy operations, which can be related to both I/O and complex computations. So, even though the rule of thumb is one thread per core when using non-blocking I/O, one should still consider a sufficient thread pool size so that simple requests don't get starved by more complex operations, if such exist. Multiple threads also allow complex operations to be split easily among multiple CPUs. A single-threaded environment like Node.js can only utilize multicore processors through multiple processes and message passing to coordinate action.
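The starvation problem is easy to reproduce with a toy single-threaded queue in C#. This is an illustrative model, not how Node.js or ASP.NET actually schedule work: a busy-spinning job stands in for a lengthy computation, and a trivial job queued behind it cannot start until it finishes.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

// A single worker thread draining a queue: a toy model of a one-thread server.
var queue = new BlockingCollection<Action>();
var worker = new Thread(() =>
{
    foreach (Action job in queue.GetConsumingEnumerable()) job();
});
worker.Start();

var clock = Stopwatch.StartNew();
long quickRequestDoneAtMs = -1;

// A lengthy CPU-bound job arrives first (busy-spins for ~300 ms)...
queue.Add(() => { while (clock.ElapsedMilliseconds < 300) { } });
// ...and a trivial request queued right behind it has to wait for it to finish.
queue.Add(() => quickRequestDoneAtMs = clock.ElapsedMilliseconds);

queue.CompleteAdding();
worker.Join();
Console.WriteLine($"trivial request completed after {quickRequestDoneAtMs} ms");
```

With a pool of several threads, the trivial job would typically be picked up by another thread instead of waiting out the full 300 ms, which is the trade-off the paragraph describes.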
I have personally not yet seen any compelling argument to introduce an additional technology such as Node.js. There may be good reasons, but in my opinion they have little to do with servicing a large number of connections through non-blocking I/O, since this can also be done with ASP.NET.
BTW, tamejs can help make your Node.js code more readable, similar to the upcoming .NET Async CTP.
It is easy to understate the cultural difference between the Node.js and ASP.NET communities. Sure, IHttpAsyncHandler exists and it's been around since .NET 1.0 so it might even be good, but all of the code and discussion around Node.js is about async I/O which is decidedly not the case when it comes to .NET. Want to use LINQ To SQL? You kind of can, kind of. Want to log stuff? Maybe "CSharp DotNet Logger" will work, maybe.
So yes, IHttpAsyncHandler is there and if you're really careful you might be able to write an event driven web-service without tripping over some blocking I/O somewhere, but I don't really get the impression a lot of people are using it (and it certainly isn't the prominent way for writing ASP.NET apps). In contrast, Node.js is all about evented I/O, all the code examples, all the libraries and it's the only way people are using it. So if you were going to bet on which one's evented I/O model actually worked all the way through, Node.js would probably be the one to pick.
Given current technology improvements and the links below, I'd say it is a matter of expertise and of choosing the right mix for the particular scenario. Node.js is maturing, and on the ASP.NET side we have ASP.NET MVC, Web API, SignalR, etc. to make things better.
Node.js vs .Net performance
http://www.salmanq.com/blog/net-and-node-js-performance-comparison/2013/03/
and
http://www.hanselman.com/blog/InstallingAndRunningNodejsApplicationsWithinIISOnWindowsAreYouMad.aspx
Thanks.

Asynchronous pages in the ASP.NET framework - where are the other threads and how is it reattached?

Sorry for this dumb question on asynchronous operations. This is how I understand it.
IIS has a limited set of worker threads waiting for requests. If one request is a long-running operation, it will block that thread. This leads to fewer threads to serve requests.
The way to fix this is to use asynchronous pages. When a request comes in, the main worker thread is freed and another thread is created somewhere else. The main thread is thus able to serve other requests. When the request completes on this other thread, another thread is picked from the main thread pool and the response is sent back to the client.
1) Where are these other threads located? Is there another thread pool?
2) If ASP.NET likes creating new threads in this other thread pool(?), why not increase the number of threads in the main worker pool? They are all running on the same machine anyway. I don't see the advantage of moving that request to this other thread pool. Memory/CPU should be the same, right?
3) If the main thread hands off a request to this other thread, why does the request not get disconnected? It magically hands off the request to another worker thread somewhere else and when the long running process completes, it picks a thread from the main worker pool and sends response to the client. I am amazed...but how does that work?
You didn't say which version of IIS or ASP.NET you're using. A lot of folks talk about IIS and ASP.NET as if they are one and the same, but they really are two components working together. Note that IIS 6 and 7 listen to an I/O completion port where they pick up completions from HTTP.sys. The IIS thread pool is used for this, and it has a maximum thread count of 256. This thread pool is designed in such a way that it does not handle long-running tasks well. The recommendation from the IIS team is to switch to another thread if you're going to do substantial work, as the ASP.NET ISAPI and/or the ASP.NET "integrated mode" handler on IIS 7 do. Otherwise you will tie up IIS threads and prevent IIS from picking up completions from HTTP.sys. Chances are you don't care about any of this, because you're not writing native code, that is, you're not writing an ISAPI or native handler for the IIS 7 pipeline. You're probably just using ASP.NET, in which case you're more interested in its thread pool and how it works.
There is a blog post at http://blogs.msdn.com/tmarq/archive/2007/07/21/asp-net-thread-usage-on-iis-7-0-and-6-0.aspx that explains how ASP.NET uses threads. Note that for ASP.NET v2.0 and v3.5 on IIS 7 you should increase MaxConcurrentRequestsPerCPU to 5000--it is a bug that it was set to 12 by default on those platforms. The new default for MaxConcurrentRequestsPerCPU in ASP.NET v4.0 on IIS 7 is 5000.
To answer your three questions:
1) First, a little primer. Only 1 thread per CPU can execute at a time. When you have more than this, you pay a penalty--a context switch is necessary every time the CPU switches to another thread, and these are expensive. However, if a thread is blocked waiting on work...then it makes sense to switch to another thread, one that can execute now.
So if I have a thread that is doing a lot of computational work and using the CPU heavily, and this takes a long time, should I switch to another thread? No! The current thread is efficiently using the CPU, so switching will only incur the cost of a context switch.
So if I have a thread that makes an HTTP or SOAP request to another server and takes a long time, should I switch threads? Yes! You can perform the HTTP or SOAP request asynchronously, so that once the "send" has occurred, you can unwind the current thread and not use any threads until there is an I/O completion for the "receive". Between the "send" and the "receive", the remote server is busy, so locally you don't need to be blocking on a thread, but instead make use of the async APIs provided in .NET Framework so that you can unwind and be notified upon completion.
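A self-contained C# sketch of the send/receive pattern, using a loopback TcpListener as a stand-in for the slow remote HTTP/SOAP server (an assumption made so the example runs without a network). It uses the Task-based WriteAsync/ReadAsync methods rather than the Begin/End methods discussed here, but the thread behaviour between "send" and "receive" is the point being illustrated.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// A loopback "remote server" that takes 300 ms to answer. Port 0 lets the OS pick a free port.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
int port = ((IPEndPoint)listener.LocalEndpoint).Port;

async Task ServeOnceAsync()
{
    using TcpClient peer = await listener.AcceptTcpClientAsync();
    NetworkStream s = peer.GetStream();
    var buf = new byte[64];
    int n = await s.ReadAsync(buf, 0, buf.Length);   // assumes the tiny request arrives in one read
    await Task.Delay(300);                            // the remote side is "busy"
    byte[] reply = Encoding.ASCII.GetBytes("pong:" + Encoding.ASCII.GetString(buf, 0, n));
    await s.WriteAsync(reply, 0, reply.Length);
}   // disposing peer closes the connection, which ends the client's read loop
Task server = ServeOnceAsync();

using var client = new TcpClient();
await client.ConnectAsync(IPAddress.Loopback, port);
NetworkStream stream = client.GetStream();

byte[] request = Encoding.ASCII.GetBytes("ping");
await stream.WriteAsync(request, 0, request.Length);   // the "send"

// Between the send and the receive, this method has unwound at the await:
// no thread is blocked until the OS reports that data has arrived.
var answer = new StringBuilder();
var buffer = new byte[64];
int read;
while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)   // the "receive"
    answer.Append(Encoding.ASCII.GetString(buffer, 0, read));

await server;
listener.Stop();
Console.WriteLine(answer);
```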
OK, so your #1 question was "Where are these other threads located? Is there another thread pool?" This depends. Most code that runs in .NET Framework uses the CLR ThreadPool, which consists of two types of threads: worker threads and I/O completion threads. What about code that doesn't use the CLR ThreadPool? Well, it can create its own threads, use its own thread pool, or do whatever it wants, because it has access to the Win32 APIs provided by the operating system. Based on what we discussed a bit ago, it really doesn't matter where the thread comes from; a thread is a thread as far as the operating system and hardware are concerned.
2) In your second question, you state, "I don't see the advantage of moving that request to this other thread pool." You're correct in thinking that there is NO advantage to switching unless you're going to make up for that costly context switch you just performed in order to switch. That's why I gave a slow HTTP or SOAP request to a remote server as an example of a good reason to switch. And by the way, ASP.NET does not create any threads. It uses the CLR ThreadPool, and the threads in that pool are entirely managed by the CLR. They do a pretty good job of determining when you need more threads. For example, that's why ASP.NET can easily scale from executing 1 request concurrently to executing 300 requests concurrently, without doing anything. The incoming requests are posted to the CLR ThreadPool via a call to QueueUserWorkItem, and the CLR decides when to call the WaitCallback (see MSDN).
3) The third question is, "If the main thread hands off a request to this other thread, why does the request not get disconnected?" Well, IIS picks up the I/O completion from HTTP.sys when the request initially arrives at the server. IIS then invokes ASP.NET's handler (or ISAPI). ASP.NET immediately queues the request to the CLR ThreadPool and returns a pending status to IIS. This pending status tells IIS that we're not done yet, but as soon as we are done we'll let you know. Now ASP.NET manages the life of that request. When a CLR ThreadPool thread invokes the ASP.NET WaitCallback (see MSDN), it can execute the entire request on that thread, which is the normal case. Or it can switch to one or more other threads if the request is what we call asynchronous, i.e. it has an asynchronous module or handler. Either way, there are well-defined ways in which the request completes, and when it finally does, ASP.NET will tell IIS we're done, and IIS will send the final bytes to the client and close the connection if Keep-Alive is not being used.
Regards,
Thomas
Async pages in ASP.NET use asynchronous callbacks, and asynchronous callbacks use the Thread Pool, and it is the same thread pool used to serve ASP.NET requests.
However, it's not quite that simple. The .NET ThreadPool has two types of threads - worker threads and I/O threads. I/O threads use what's called an I/O Completion Port, which is (greatly oversimplifying here) a thread-free or thread-agnostic means of waiting for a read/write operation on a file handle to complete, subsequently running a callback method.
(Note that a file handle does not necessarily refer to a file on disk; as far as Windows is concerned, it could just as well be a socket, pipe, etc.)
A typical .NET web developer doesn't really need to know about any of this. Of course, if you were writing an actual web server, or any kind of network server, then you would definitely need to learn about these, because they are the only way to handle hundreds of incoming connections without actually spawning hundreds of threads to serve them. There's a Managed I/O Completion Port tutorial (CodeProject) if you're interested.
Anyway, getting back on topic; when you interact with the thread pool at a high level, i.e. by writing:
ThreadPool.QueueUserWorkItem(s => DoSomeWork(s));
This does not use an I/O completion port. Ever. It posts the work to one of the normal worker threads managed by thread pool. It's the same if you use async callbacks:
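    // .NET Framework only: Delegate.BeginInvoke/EndInvoke throw PlatformNotSupportedException on .NET Core / .NET 5+.
    Func<int> asyncFunc;

    IAsyncResult BeginOperation(object sender, EventArgs e, AsyncCallback cb,
        object state)
    {
        // The delegate runs on a ThreadPool worker thread, not an I/O thread.
        asyncFunc = () => { Thread.Sleep(500); return 42; };
        return asyncFunc.BeginInvoke(cb, state);
    }

    void EndOperation(IAsyncResult ar)
    {
        int result = asyncFunc.EndInvoke(ar);
        Console.WriteLine(result);
    }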
Again - same deal. Inside the EndOperation you're running on a ThreadPool worker thread. You can verify this by inserting the following debugging code:
    void EndOperation(IAsyncResult ar)
    {
        int maxWorkers, maxIO, availableWorkers, availableIO;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIO);
        ThreadPool.GetAvailableThreads(out availableWorkers, out availableIO);
        int result = asyncFunc.EndInvoke(ar);   // break here and compare the four counts
    }
Slap a breakpoint in there and you'll see that availableWorkers is one less than maxWorkers, while maxIO and availableIO are the same.
But some async operations are "special" in .NET. This actually has nothing to do with ASP.NET directly - they'll use I/O completion ports in a Winforms or WPF app too. Examples are:
System.Net.Sockets.Socket (BeginReceive and a whole bunch of other BeginXYZ methods)
System.IO.FileStream (BeginRead and BeginWrite)
System.ServiceModel.ClientBase<T> (BeginInvoke)
System.Net.WebRequest (BeginGetResponse)
And so on, this is nowhere near a full list. Basically almost every class in the .NET Framework that exposes its own BeginXYZ and EndXYZ methods and could conceivably perform any I/O, probably uses I/O completion ports. That's to make it easier for you, the application developer, because I/O threads are kind of hard to implement yourself in .NET.
My guess is that the .NET Framework designers deliberately chose to make it difficult to post I/O operations (compared to worker threads, where you can just write ThreadPool.QueueUserWorkItem) because it's comparatively "dangerous" if you don't know how to use them properly; by contrast, it's actually pretty straightforward to spawn these in the Windows API.
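Classic BeginXYZ/EndXYZ pairs such as FileStream.BeginRead/EndRead can also be consumed from Task-based code through Task.Factory.FromAsync. A minimal C# sketch; the temporary file and its contents are made up for the example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

string path = Path.GetTempFileName();
File.WriteAllText(path, "hello from disk");

string text;
// FileOptions.Asynchronous requests overlapped I/O; on Windows, completions are then
// delivered through an I/O completion port rather than a blocked worker thread.
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                               4096, FileOptions.Asynchronous))
{
    var buffer = new byte[64];
    // FromAsync adapts the classic BeginRead/EndRead pair into a Task<int>.
    int n = await Task<int>.Factory.FromAsync(fs.BeginRead, fs.EndRead, buffer, 0, buffer.Length, null);
    text = Encoding.UTF8.GetString(buffer, 0, n);
}
File.Delete(path);
Console.WriteLine(text);
```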
As before, you can verify what's happening with some debugging code:
    WebRequest request;

    IAsyncResult BeginDownload(object sender, EventArgs e,
        AsyncCallback cb, object state)
    {
        request = WebRequest.Create("http://www.example.com");
        return request.BeginGetResponse(cb, state);
    }

    void EndDownload(IAsyncResult ar)
    {
        int maxWorkers, maxIO, availableWorkers, availableIO;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIO);
        ThreadPool.GetAvailableThreads(out availableWorkers, out availableIO);
        string html;
        using (WebResponse response = request.EndGetResponse(ar))
        {
            using (StreamReader reader = new
                StreamReader(response.GetResponseStream()))
            {
                html = reader.ReadToEnd();
            }
        }
    }
If you step through this one, you'll see that the thread stats are different. The availableWorkers will match maxWorkers, but availableIO is one less than maxIO. That's because you're running on an I/O thread. That's also why you're not supposed to do any expensive computations in async callbacks - posting CPU-intensive work on an I/O completion port is inefficient and, well, bad.
All of this explains why it's strongly recommended that you use Async pages in ASP.NET when you need to perform any I/O operations. The pattern is only useful for I/O operations; non-I/O async operations will end up being posted to worker threads in the ThreadPool and you'll still end up blocking subsequent ASP.NET requests. But you can spawn a virtually unlimited number of async I/O operations and not give it a second thought; these won't use any threads at all until the I/O is complete and the callback is ready to begin.
So, to summarize - there is only one ThreadPool, but there are different kinds of threads in it, and if you're performing slow I/O operations then it's much more efficient to use the I/O threads. It's got nothing to do with CPU or memory, it's all about I/O and file handles.
As for #3, it's not really a question of "why doesn't the request get disconnected", more like a question of "why would it?" A socket doesn't get closed simply because there's no thread currently sending to or receiving data from it, same way your front door doesn't automatically close if there's nobody there to greet guests. Client operations may time out if the server doesn't answer them, and may subsequently choose to disconnect from their end, but that's another issue altogether.
1) The threads are in w3svc or whatever process is running the ASP.NET engine in your particular version of IIS.
2) Not sure what you mean here. You actually have control over how many threads are in the worker thread pool. This article is pretty good: http://msdn.microsoft.com/en-us/library/ms998549.aspx
3) I think you are confusing Requests and connections... To be honest, I haven't a clue how the internals of IIS works, but generally in applications that handle multiple requests simultaneously there is ONE master listening thread that will then hand off the actual work to a child thread (and do nothing else). The original request is not "disconnected" because these things are happening at completely different levels of the network protocol stack. Windows Server has no problem accepting multiple connections on TCP port 80. Think about how TCP/IP works and the fact that it is sending multiple discrete packets of information. You are thinking of "connection" like a single hose going from spigot A to spigot B, but of course that's not how it really works. It is more akin to a bucket that is just collecting whatever gets spilled into it.
Hope this helps.
The answer also depends on which version of IIS you're talking about. In earlier versions, ASP.NET did not use "IIS threads". They were .NET ThreadPool threads. In IIS 7, the IIS and ASP.NET pipelines have been merged. I don't know which threads ASP.NET uses now.
The bottom line is, don't spawn your own threads.

Using ThreadPool.QueueUserWorkItem in ASP.NET in a high traffic scenario

I've always been under the impression that using the ThreadPool for (let's say non-critical) short-lived background tasks was considered best practice, even in ASP.NET, but then I came across this article that seems to suggest otherwise - the argument being that you should leave the ThreadPool to deal with ASP.NET related requests.
So here's how I've been doing small asynchronous tasks so far:
ThreadPool.QueueUserWorkItem(s => PostLog(logEvent));
And the article suggests instead creating a thread explicitly, similar to:
new Thread(() => PostLog(logEvent)) { IsBackground = true }.Start();
The first method has the advantage of being managed and bounded, but there's the potential (if the article is correct) that the background tasks are then vying for threads with ASP.NET request-handlers. The second method frees up the ThreadPool, but at the cost of being unbounded and thus potentially using up too many resources.
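The two options in the question can be compared side by side in C#. A minimal sketch: each body stands in for the hypothetical PostLog(logEvent) call and just records which kind of thread ran it.

```csharp
using System;
using System.Threading;

var bothDone = new CountdownEvent(2);
bool pooledOnPoolThread = false, dedicatedOnPoolThread = true, dedicatedIsBackground = false;

// Option 1: borrow a worker thread from the shared CLR ThreadPool (bounded, shared with requests).
ThreadPool.QueueUserWorkItem(_ =>
{
    pooledOnPoolThread = Thread.CurrentThread.IsThreadPoolThread;   // stand-in for PostLog(logEvent)
    bothDone.Signal();
});

// Option 2: a dedicated background thread per task (outside the pool, but unbounded).
new Thread(() =>
{
    dedicatedOnPoolThread = Thread.CurrentThread.IsThreadPoolThread; // stand-in for PostLog(logEvent)
    dedicatedIsBackground = Thread.CurrentThread.IsBackground;
    bothDone.Signal();
}) { IsBackground = true }.Start();

bothDone.Wait();
Console.WriteLine($"QueueUserWorkItem ran on a pool thread: {pooledOnPoolThread}");
Console.WriteLine($"new Thread ran on a pool thread: {dedicatedOnPoolThread}, background: {dedicatedIsBackground}");
```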
So my question is, is the advice in the article correct?
If your site was getting so much traffic that your ThreadPool was getting full, then is it better to go out-of-band, or would a full ThreadPool imply that you're getting to the limit of your resources anyway, in which case you shouldn't be trying to start your own threads?
Clarification: I'm just asking in the scope of small non-critical asynchronous tasks (eg, remote logging), not expensive work items that would require a separate process (in these cases I agree you'll need a more robust solution).
Other answers here seem to be leaving out the most important point:
Unless you are trying to parallelize a CPU-intensive operation in order to get it done faster on a low-load site, there is no point in using a worker thread at all.
That goes for both free threads, created by new Thread(...), and worker threads in the ThreadPool that respond to QueueUserWorkItem requests.
Yes, it's true, you can starve the ThreadPool in an ASP.NET process by queuing too many work items. It will prevent ASP.NET from processing further requests. The information in the article is accurate in that respect; the same thread pool used for QueueUserWorkItem is also used to serve requests.
But if you are actually queuing enough work items to cause this starvation, then you should be starving the thread pool! If you are running literally hundreds of CPU-intensive operations at the same time, what good would it do to have another worker thread to serve an ASP.NET request, when the machine is already overloaded? If you're running into this situation, you need to redesign completely!
Most of the time I see or hear about multi-threaded code being inappropriately used in ASP.NET, it's not for queuing CPU-intensive work. It's for queuing I/O-bound work. And if you want to do I/O work, then you should be using an I/O thread (I/O Completion Port).
Specifically, you should be using the async callbacks supported by whatever library class you're using. These methods are always very clearly labeled; they start with the words Begin and End. As in Stream.BeginRead, Socket.BeginConnect, WebRequest.BeginGetResponse, and so on.
These methods do use the ThreadPool, but they use IOCPs, which do not interfere with ASP.NET requests. They are a special kind of lightweight thread that can be "woken up" by an interrupt signal from the I/O system. And in an ASP.NET application, you normally have one I/O thread for each worker thread, so every single request can have one async operation queued up. That's literally hundreds of async operations without any significant performance degradation (assuming the I/O subsystem can keep up). It's way more than you'll ever need.
Just keep in mind that async delegates do not work this way - they'll end up using a worker thread, just like ThreadPool.QueueUserWorkItem. It's only the built-in async methods of the .NET Framework library classes that are capable of doing this. You can do it yourself, but it's complicated and a little bit dangerous and probably beyond the scope of this discussion.
The best answer to this question, in my opinion, is don't use the ThreadPool or a background Thread instance in ASP.NET. It's not at all like spinning up a thread in a Windows Forms application, where you do it to keep the UI responsive and don't care about how efficient it is. In ASP.NET, your concern is throughput, and all that context switching on all those worker threads is absolutely going to kill your throughput whether you use the ThreadPool or not.
Please, if you find yourself writing threading code in ASP.NET - consider whether or not it could be rewritten to use pre-existing asynchronous methods, and if it can't, then please consider whether or not you really, truly need the code to run in a background thread at all. In the majority of cases, you will probably be adding complexity for no net benefit.
Per Thomas Marquardt of the ASP.NET team at Microsoft, it is safe to use the ASP.NET ThreadPool (QueueUserWorkItem).
From the article:
Q) If my ASP.NET Application uses CLR ThreadPool threads, won’t I starve ASP.NET, which also uses the CLR ThreadPool to execute requests?
..
A) To summarize, don’t worry about starving ASP.NET of threads, and if you think there’s a problem here let me know and we’ll take care of it.
Q) Should I create my own threads (new Thread)? Won’t this be better for ASP.NET, since it uses the CLR ThreadPool?
A) Please don’t. Or to put it a different way, no!!! If you’re really smart, much smarter than me, then you can create your own threads; otherwise, don’t even think about it. Here are some reasons why you should not frequently create new threads:
It is very expensive, compared to QueueUserWorkItem... By the way, if you can write a better ThreadPool than the CLR’s, I encourage you to apply for a job at Microsoft, because we’re definitely looking for people like you!
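For reference, queuing a short job with QueueUserWorkItem looks roughly like this. It is a minimal console sketch, not code from the article: the "request 42" state and the log message are hypothetical, and the ManualResetEventSlim wait exists only so the demo can confirm the callback ran on a pool thread (Thread.IsThreadPoolThread). In a real page you would not block on the work item.

```csharp
using System;
using System.Threading;

bool ranOnPoolThread = false;
string? message = null;
using var done = new ManualResetEventSlim(false);

// Queue a short, low-priority job (a hypothetical log write) to the CLR ThreadPool
// instead of creating a dedicated Thread for it.
ThreadPool.QueueUserWorkItem(state =>
{
    ranOnPoolThread = Thread.CurrentThread.IsThreadPoolThread;
    message = $"logged: {state}";
    done.Set();
}, "request 42");

// Demo only: wait for the work item so its result can be inspected.
done.Wait();
Console.WriteLine($"{message} (pool thread: {ranOnPoolThread})");
```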
Websites shouldn't go around spawning threads.
You typically move this functionality out into a Windows Service that you then communicate with (I use MSMQ to talk to them).
-- Edit
I described an implementation here: Queue-Based Background Processing in ASP.NET MVC Web Application
-- Edit
To expand on why this is even better than just threads:
Using MSMQ, you can communicate with another server. You can write to a queue across machines, so if you determine, for some reason, that your background task is using up too much of the main server's resources, you can shift it quite trivially.
It also allows you to batch-process whatever task you were trying to do (send emails/whatever).
I definitely think that general practice for quick, low-priority asynchronous work in ASP.NET would be to use the .NET thread pool, particularly for high-traffic scenarios as you want your resources to be bounded.
Also, the implementation of threading is hidden - if you start spawning your own threads, you have to manage them properly as well. Not saying you couldn't do it, but why reinvent that wheel?
If performance becomes an issue, and you can establish that the thread pool is the limiting factor (and not database connections, outgoing network connections, memory, page timeouts etc.), then you tweak the thread pool configuration to allow more worker threads, a higher limit on queued requests, etc.
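As a sketch of what tweaking the thread pool from code can look like, the snippet below reads the pool's minimum thread counts and raises the worker minimum with ThreadPool.SetMinThreads. The doubling factor is an arbitrary example, not a tuned value; in classic ASP.NET these limits are more commonly set through the `<processModel>` element in machine.config, and either way you should measure before and after.

```csharp
using System;
using System.Threading;

// Read the current minimums of this process's CLR ThreadPool.
ThreadPool.GetMinThreads(out int minWorker, out int minIo);
Console.WriteLine($"min worker threads={minWorker}, min IOCP threads={minIo}");

// Raising the minimum lets the pool create worker threads on demand without its
// usual injection delay. Doubling is an example value only.
int newMinWorker = minWorker * 2;
bool ok = ThreadPool.SetMinThreads(newMinWorker, minIo);
Console.WriteLine($"SetMinThreads succeeded: {ok}");

ThreadPool.GetMinThreads(out int updatedMinWorker, out _);
Console.WriteLine($"min worker threads now={updatedMinWorker}");
```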
If you don't have a performance problem then choosing to spawn new threads to reduce contention with the ASP.NET request queue is classic premature optimization.
Ideally you wouldn't need to use a separate thread to do a logging operation though; just let the original thread complete the operation as quickly as possible, which is where MSMQ and a separate consumer thread / process come into the picture. I agree that this is heavier and more work to implement, but you really need the durability here: the volatility of a shared, in-memory queue will quickly wear out its welcome.
You should use QueueUserWorkItem and avoid creating new threads like the plague. For a visual that explains why you won't starve ASP.NET, even though it uses the same ThreadPool, imagine a very skilled juggler using two hands to keep a half dozen bowling pins, swords, or whatever in flight. For a visual of why creating your own threads is bad, imagine what happens in Seattle at rush hour when heavily used entrance ramps to the highway allow vehicles to enter traffic immediately instead of using a light and limiting the number of entrances to one every few seconds. Finally, for a detailed explanation, please see this link:
http://blogs.msdn.com/tmarq/archive/2010/04/14/performing-asynchronous-work-or-tasks-in-asp-net-applications.aspx
Thanks,
Thomas
That article is not correct. ASP.NET has its own pool of threads, managed worker threads, for serving ASP.NET requests. This pool is usually a few hundred threads and is separate from the ThreadPool pool, which is some smaller multiple of the number of processors.
Using ThreadPool in ASP.NET will not interfere with ASP.NET worker threads. Using ThreadPool is fine.
It would also be acceptable to set up a single thread that is used only for logging messages, and to use the producer/consumer pattern to pass log messages to that thread. In that case, since the thread is long-lived, you should create a single dedicated thread to run the logging rather than taking one from the ThreadPool.
Using a new thread for every message is definitely overkill.
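A minimal sketch of a single long-lived logging thread fed by a producer/consumer queue, using BlockingCollection<T>. The in-memory List<string> stands in for a real log sink, and the bounded capacity of 1000 and the message text are arbitrary example values; none of them come from the answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Bounded queue: if the logger falls behind, producers block instead of
// growing memory without limit. 1000 is an example capacity.
var queue = new BlockingCollection<string>(boundedCapacity: 1000);
var written = new List<string>(); // stands in for a real log sink

// One long-lived consumer thread, created once for the lifetime of the app.
var loggerThread = new Thread(() =>
{
    // Blocks while the queue is empty; ends after CompleteAdding once drained.
    foreach (string msg in queue.GetConsumingEnumerable())
    {
        written.Add(msg); // hypothetical: write to a file or the network here
    }
})
{
    IsBackground = true,
    Name = "LoggerThread"
};
loggerThread.Start();

// Producers (e.g. request threads) only enqueue, which is cheap.
for (int i = 0; i < 5; i++)
{
    queue.Add($"request {i} handled");
}

// On shutdown: stop accepting messages and let the consumer drain the queue.
queue.CompleteAdding();
loggerThread.Join();
Console.WriteLine($"{written.Count} messages written");
```

Because only the consumer thread touches the sink, the log writer itself needs no locking.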
Another alternative, if you're only talking about logging, is to use a library like log4net. It handles logging in a separate thread and takes care of all the context issues that could come up in that scenario.
I'd say the article is wrong. If you're running a large .NET shop, you can safely use the pool across multiple apps and multiple websites (using separate app pools), simply based on one statement in the ThreadPool documentation:
There is one thread pool per process. The thread pool has a default size of 250 worker threads per available processor, and 1000 I/O completion threads. The number of threads in the thread pool can be changed by using the SetMaxThreads method. Each thread uses the default stack size and runs at the default priority.
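The figures quoted above come from older .NET Framework documentation, and the defaults differ between runtime versions, so it is worth checking what your own process actually gets. A minimal sketch using ThreadPool.GetMaxThreads and ThreadPool.GetAvailableThreads:

```csharp
using System;
using System.Threading;

// Query this process's pool; the values depend on the runtime and machine.
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
ThreadPool.GetAvailableThreads(out int availWorker, out int availIo);

Console.WriteLine($"Processors:         {Environment.ProcessorCount}");
Console.WriteLine($"Max worker threads: {maxWorker}");
Console.WriteLine($"Max IOCP threads:   {maxIo}");
// "Available" is the maximum minus the threads currently busy.
Console.WriteLine($"Available now:      {availWorker} worker, {availIo} IOCP");
```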
I was asked a similar question at work last week, and I'll give you the same answer. Why are you multithreading web applications per request? A web server is a fantastic system, heavily optimized to serve many requests in a timely fashion (i.e. multithreading). Think of what happens when you request almost any page on the web:
A request is made for some page
HTML is served back
The HTML tells the client to make further requests (JS, CSS, images, etc.)
Further information is served back
You give the example of remote logging, but that should be a concern of your logger. An asynchronous process should be in place to receive messages in a timely fashion. Sam even points out that your logger (log4net) should already support this.
Sam is also correct that using the CLR ThreadPool will not cause issues with the thread pool in IIS. The thing to be concerned about here, though, is that you are not spawning threads from a process; you are spawning new threads off of IIS thread pool threads. There is a difference, and the distinction is important.
Threads vs Process
Both threads and processes are methods of parallelizing an application. However, processes are independent execution units that contain their own state information, use their own address spaces, and only interact with each other via interprocess communication mechanisms (generally managed by the operating system). Applications are typically divided into processes during the design phase, and a master process explicitly spawns sub-processes when it makes sense to logically separate significant application functionality. Processes, in other words, are an architectural construct.
By contrast, a thread is a coding construct that doesn't affect the architecture of an application. A single process might contain multiple threads; all threads within a process share the same state and same memory space, and can communicate with each other directly, because they share the same variables.
Source
You can use Parallel.For or Parallel.ForEach and limit the number of threads they may use, so the work runs smoothly and does not starve the pool.
However, because this runs in the background of an ASP.NET web application, you will need to wrap it in a TPL task, as below.
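using System.Threading;
using System.Threading.Tasks;

// collectionList is assumed to be defined elsewhere (the items to process).
var ts = new CancellationTokenSource();
var po = new ParallelOptions
{
    CancellationToken = ts.Token,
    MaxDegreeOfParallelism = 6 // limit here
};

// Run the loop in the background so the request thread is not blocked.
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(collectionList, po, collectionItem =>
    {
        // Code here, e.g. PostLog(logEvent);
    });
});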
I do not agree with the referenced article (C#feeds.com). It is easy to create a new thread, but dangerous. The optimal number of active threads to run on a single core is actually surprisingly low: fewer than 10. It is way too easy to cause the machine to waste time switching threads if threads are created for minor tasks. Threads are a resource that REQUIRE management. The WorkItem abstraction is there to handle this.
There is a trade-off here between reducing the number of threads available for requests and creating too many threads to allow any of them to process efficiently. This is a very dynamic situation, but I think it is one that should be actively managed (in this case by the thread pool) rather than leaving it to the processor to stay ahead of the creation of threads.
Finally the article makes some pretty sweeping statements about the dangers of using the ThreadPool but it really needs something concrete to back them up.
Whether or not IIS uses the same ThreadPool to handle incoming requests seems hard to get a definitive answer to, and also seems to have changed over versions. So it would seem like a good idea not to use ThreadPool threads excessively, so that IIS has a lot of them available. On the other hand, spawning your own thread for every little task seems like a bad idea. Presumably, you have some sort of locking in your logging, so only one thread could progress at a time, and the rest would just take turns getting scheduled and unscheduled (not to mention the overhead of spawning a new thread). Essentially, you run into the exact problems the ThreadPool was designed to avoid.
It seems that a reasonable compromise would be for your app to allocate a single logging thread that you could pass messages to. You would want to be careful that sending messages is as fast as possible so that you don't slow down your app.
