I'm not very experienced in web programming,
and I haven't actually coded anything in Node.js yet, just curious about the event-driven approach. It does seems good.
The article explains some bad things that could happen when we use a thread-based approach to handle requests, and should opt for a event-driven approach instead.
In thread-based, the cashier/thread is stuck with us until our food/resource is ready. While in event-driven, the cashier send us somewhere out of the request queue so we don't block other requests while waiting for our food.
To scale the blocking thread-based, you need to increase the number of threads.
To me this seems like a bad excuse for not using threads/threadpools properly.
Couldn't that be properly handled using IHttpAsyncHandler?
ASP.Net receives a request, uses the ThreadPool and runs the handler (BeginProcessRequest), and then inside it we load the file/database with a callback. That Thread should then be free to handle other requests. Once the file-reading is done, the ThreadPool is called into action again and executes the remaining response.
Not so different for me, so why is that not as scalable?
One of the disadvantages of the thread-based that I do know is, using threads needs more memory. But only with these, you can enjoy the benefits of multiple cores. I doubt Node.js is not using any threads/cores at all.
So, based on just the event-driven vs thread-based (don't bring the "because it's Javascript and every browser..." argument), can someone point me out what is the actual benefit of using Node.js instead of the existing technology?
That was a long question. Thanks :)
First of all, Node.js is not multi-threaded. This is important. You have to be a very talented programmer to design programs that work perfectly in a threaded environment. Threads are just hard.
You have to be a god to maintain a threaded project where it wasn't designed properly. There are just so many problems that can be hard to avoid in very large projects.
Secondly, the whole platform was designed to be run asynchronously. Have you see any ASP.NET project where every single IO interaction was asynchronous? simply put, ASP.NET was not designed to be event-driven.
Then, there's the memory footprint due to the fact that we have one thread per open-connection and the whole scaling issue. Correct me if I'm wrong but I don't know how you would avoid creating a new thread for each connection in ASP.NET.
Another issue is that a Node.js request is idle when it's not being used or when it's waiting for IO. On the other hand, a C# thread sleeps. Now, there is a limit to the number of these threads that can sleep. In Node.js, you can easily handle 10k clients at the same time in parallel on one development machine. You try handling 10k threads in parallel on one development machine.
JavaScript itself as a language makes asynchronous coding easier. If you're still in C# 2.0, then the asynchronous syntax is a real pain. A lot of developers will simply get confused if you're defining Action<> and Function<> all over the place and using callbacks. An ASP.NET project written in an evented way is just not maintainable by an average ASP.NET developer.
As for threads and cores. Node.js is single-threaded and scales by creating multiple-node processes. If you have a 16 core then you run 16 instances of your node.js server and have a single Node.js load balancer in front of it. (Maybe a nginx load balancer if you want).
This was all written into the platform at a very low-level right from the beginning. This was not some functionality bolted on later down the line.
Other advantages
Node.js has a lot more to it then above. Above is only why Node.js' way of handling the event loop is better than doing it with asynchronous capabilities in ASP.NET.
Performance. It's fast. Real fast.
One big advantage of Node.js is its low-level API. You have a lot of control.
You have the entire HTTP server integrated directly into your code then outsourced to IIS.
You have the entire nginx vs Apache comparison.
The entire C10K challenge is handled well by node but not by IIS
AJAX and JSON communication feels natural and easy.
Real-time communication is one of the great things about Node.js. It was made for it.
Plays nicely with document-based nosql databases.
Can run a TCP server as well. Can do file-writing access, can run any unix console command on the server.
You query your database in javascript using, for example, CouchDB and map/reduce. You write your client in JavaScript. There are no context switches whilst developing on your web stack.
Rich set of community-driven open-source modules. Everything in node.js is open source.
Small footprint and almost no dependencies. You can build the node.js source yourself.
Disadvantages of Node.js
It's hard. It's young. As a skilled JavaScript developer, I face difficulty writing a website with Node.js just because of its low-level nature and the level of control I have. It feels just like C. A lot of flexibility and power either to be used for me or to hang me.
The API is not frozen. It's changing rapidly. I can imagine having to rewrite a large website completely in 5 years because of the amount Node.js will be changed by then. It is do-able, you just have to be aware that maintenance on node.js websites is not cheap.
further reading
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
http://blip.tv/file/2899135
http://nodeguide.com/
There are a lot of misconceptions regarding node.js vs. ASP.Net and asynchronous programming. You can do non blocking IO in ASP.NET. Most people don't know that the .Net framework uses Windows iocompletion ports underneath when you do web service calls or other I/O bound operations using the begin/end pattern in .Net 2.0 and above. IO completion ports is the way the Windows operating system supports non-blocking IO so that the app thread is freed why the IO operation completes. Interestingly, node.js uses a less optimal non blocking IO implementation in Windows through Cygwin. A new Windows version is on the road map, which with Microsoft's guidance will be using IO completions ports. At that point there is underneath no difference.
It is also possible to do non-blocking database calls in ADO.NET but be aware of ORM tools such as NHibernate and Entity Framework. They are still very much synchronous.
Synchronous IO (blocking) makes the control flow much clearer and it has for this reason become popular. The reason why computer environments are multithreaded has only superficially to do with this. It is more generally related to time sharing and utilization of multiple CPUs.
Having only a single thread can cause starvation during lengthy operations, which can be related to both IO and complex computations. So, even though the rule of thumb is one thread pr. core when utilizing non-blocking IO, one should still consider a sufficient thread pool size so that simple requests don't get starved by more complex operations if such exist. Multiple threads also allows complex operations to be split easily among multiple CPUs. A single threaded environment like node.js can only utilize multicore processors through more processes and message passing to coordinate action.
I have personally not yet seen any compelling argument to introduce an additional technology such a node.js. However, there may be good reasons but they have in my opinion little to do with servicing a large number of connections through non-blocking IO since this can also be done with ASP.NET.
BTW tamejs can help make your nodejs code more readable similar to the new upcoming .Net Async CTP.
It is easy to understate the cultural difference between the Node.js and ASP.NET communities. Sure, IHttpAsyncHandler exists and it's been around since .NET 1.0 so it might even be good, but all of the code and discussion around Node.js is about async I/O which is decidedly not the case when it comes to .NET. Want to use LINQ To SQL? You kind of can, kind of. Want to log stuff? Maybe "CSharp DotNet Logger" will work, maybe.
So yes, IHttpAsyncHandler is there and if you're really careful you might be able to write an event driven web-service without tripping over some blocking I/O somewhere, but I don't really get the impression a lot of people are using it (and it certainly isn't the prominent way for writing ASP.NET apps). In contrast, Node.js is all about evented I/O, all the code examples, all the libraries and it's the only way people are using it. So if you were going to bet on which one's evented I/O model actually worked all the way through, Node.js would probably be the one to pick.
As per current age technology improvements and reading below links, I can say, it is matter of expertise and choosing perfect mix as per the particular scenario that matters. NodeJS is getting mature and ASP.NET side we have ASP.NET MVC, WebAPI, and SignalR etc. to make things better.
Node.js vs .Net performance
http://www.salmanq.com/blog/net-and-node-js-performance-comparison/2013/03/
and
http://www.hanselman.com/blog/InstallingAndRunningNodejsApplicationsWithinIISOnWindowsAreYouMad.aspx
Thanks.
In my ASP.NET MVC application I have a number of threads that wait for a certain length of time and wake up to do some clean tasks over and over. I have not deployed this application to a production server yet but on my dev machine they seem to work as expected. For these threads to work the same on IIS7 do I need to look out for anything? Will IIS7 keep my threads alive indefinitely? are there implications to worry about?
Also I want to queue, lets say 50 objects that were created through various requests and process them all in one go. I'd like to maintain them inside a list and then process the list which means that the list object has to be kept alive indefinitely. I'd like to avoid serializing my objects into the DB in order to maintain this queue. What is the correct way of achieving this?
Will IIS7 keep my threads alive
indefinitely?
No, if the application pool recycles (if there's a long inactivity or some memory threshold is hit) those threads will be stopped as the application will be unloaded from memory. If those objects are so much precise I wouldn't recommend you keeping them in memory but rather serialize them to some persistent storage so that they could be processed later in case of failure.
The design you describe is fine when you don't mind losing cached commands in the queue. Otherwise it would be better to go with a different design. ASP.NET isn't suited for this type of processing, because IIS can recycle the process. When that happens you lose your in-memory queue. IIS could also decide to unload the AppDomain because no new requests are coming in. In that case your threads will also stop running which means that pending operations will still not been cached, even when you use a persisted queue.
You'd probably be better of with some sort of transactional queue, such as MSMQ or a custom table in the database (or look at the open source NServiceBus). Adding operations to the queue can be done by your web application and processing items can be done within a Windows service application that will not be recycled and can process the queue in a transactional way.
Since you're talking about multiple threads: when using a Windows service you can build it in such way that it can run multiple threads or make it single threaded and run several instances of the same thread. This is a very flexible design that I used successfully in the past to distribute CPU and disk intensive operations over multiple machines.
How can MVC website benefit from the new parallelism features in .net 4? don't websites support parallelism by default since multiple users access them at the same time? Can someone clarify this?
Executing tasks in parallel is especially useful for long running tasks. What constitutes a long running task might differ, but it should be longer than the overhead of spinning up and synchronizing the threads.
So, there is no particular benefit for MVC, but there is a general benefit for each request which require more things to be run in parallel.
There's an article from 2007 in MSDN Magazine which outlines some performance aspects of the parallel library.
Example 1: a user hits a page which displays two different graphs. Each graph is calculated from a dataset. Executing the calculations in parallel will benefit the overall time to render the page. (Executing parallel individual tasks)
Example 2: You need to execute some function on a list of data, And use Parallel.For to enumerate over the data and execute some code on it in parallel.
You should analyze your application and figure out which parts can be run in parallel, and then test with the new language features if it helps your application or not.
Parallelism is by computer processor. Parallelism doesn't mean multiple users access the application. Parallelism is that for one request, the app can split tasks across multiple processors to do the work. So multiple users accessing the app is a factor in how the server dishes out the work, but not the core idea of parallelism.
Every request is handled in its own thread, so you get some degree of parallelism by default with ASP.Net.
When considering implementing any kind of parallelism it is important to test your code with and without parallelism. You have to make sure that the overhead of wrapping, distributing and executing each task is not greater than running the tasks in serial. It often is for most of your day-to-day loops. Parallelism is best used in computationally intensive tasks.
I've always been under the impression that using the ThreadPool for (let's say non-critical) short-lived background tasks was considered best practice, even in ASP.NET, but then I came across this article that seems to suggest otherwise - the argument being that you should leave the ThreadPool to deal with ASP.NET related requests.
So here's how I've been doing small asynchronous tasks so far:
ThreadPool.QueueUserWorkItem(s => PostLog(logEvent))
And the article is suggesting instead to create a thread explicitly, similar to:
new Thread(() => PostLog(logEvent)){ IsBackground = true }.Start()
The first method has the advantage of being managed and bounded, but there's the potential (if the article is correct) that the background tasks are then vying for threads with ASP.NET request-handlers. The second method frees up the ThreadPool, but at the cost of being unbounded and thus potentially using up too many resources.
So my question is, is the advice in the article correct?
If your site was getting so much traffic that your ThreadPool was getting full, then is it better to go out-of-band, or would a full ThreadPool imply that you're getting to the limit of your resources anyway, in which case you shouldn't be trying to start your own threads?
Clarification: I'm just asking in the scope of small non-critical asynchronous tasks (eg, remote logging), not expensive work items that would require a separate process (in these cases I agree you'll need a more robust solution).
Other answers here seem to be leaving out the most important point:
Unless you are trying to parallelize a CPU-intensive operation in order to get it done faster on a low-load site, there is no point in using a worker thread at all.
That goes for both free threads, created by new Thread(...), and worker threads in the ThreadPool that respond to QueueUserWorkItem requests.
Yes, it's true, you can starve the ThreadPool in an ASP.NET process by queuing too many work items. It will prevent ASP.NET from processing further requests. The information in the article is accurate in that respect; the same thread pool used for QueueUserWorkItem is also used to serve requests.
But if you are actually queuing enough work items to cause this starvation, then you should be starving the thread pool! If you are running literally hundreds of CPU-intensive operations at the same time, what good would it do to have another worker thread to serve an ASP.NET request, when the machine is already overloaded? If you're running into this situation, you need to redesign completely!
Most of the time I see or hear about multi-threaded code being inappropriately used in ASP.NET, it's not for queuing CPU-intensive work. It's for queuing I/O-bound work. And if you want to do I/O work, then you should be using an I/O thread (I/O Completion Port).
Specifically, you should be using the async callbacks supported by whatever library class you're using. These methods are always very clearly labeled; they start with the words Begin and End. As in Stream.BeginRead, Socket.BeginConnect, WebRequest.BeginGetResponse, and so on.
These methods do use the ThreadPool, but they use IOCPs, which do not interfere with ASP.NET requests. They are a special kind of lightweight thread that can be "woken up" by an interrupt signal from the I/O system. And in an ASP.NET application, you normally have one I/O thread for each worker thread, so every single request can have one async operation queued up. That's literally hundreds of async operations without any significant performance degradation (assuming the I/O subsystem can keep up). It's way more than you'll ever need.
Just keep in mind that async delegates do not work this way - they'll end up using a worker thread, just like ThreadPool.QueueUserWorkItem. It's only the built-in async methods of the .NET Framework library classes that are capable of doing this. You can do it yourself, but it's complicated and a little bit dangerous and probably beyond the scope of this discussion.
The best answer to this question, in my opinion, is don't use the ThreadPool or a background Thread instance in ASP.NET. It's not at all like spinning up a thread in a Windows Forms application, where you do it to keep the UI responsive and don't care about how efficient it is. In ASP.NET, your concern is throughput, and all that context switching on all those worker threads is absolutely going to kill your throughput whether you use the ThreadPool or not.
Please, if you find yourself writing threading code in ASP.NET - consider whether or not it could be rewritten to use pre-existing asynchronous methods, and if it can't, then please consider whether or not you really, truly need the code to run in a background thread at all. In the majority of cases, you will probably be adding complexity for no net benefit.
Per Thomas Marquadt of the ASP.NET team at Microsoft, it is safe to use the ASP.NET ThreadPool (QueueUserWorkItem).
From the article:
Q) If my ASP.NET Application uses CLR ThreadPool threads, won’t I starve ASP.NET, which also uses the CLR ThreadPool to execute requests?
..
A) To summarize, don’t worry about
starving ASP.NET of threads, and if
you think there’s a problem here let
me know and we’ll take care of it.
Q) Should I create my own threads
(new Thread)? Won’t this be better
for ASP.NET, since it uses the CLR
ThreadPool.
A) Please don’t. Or to put it a
different way, no!!! If you’re really
smart—much smarter than me—then you
can create your own threads;
otherwise, don’t even think about it.
Here are some reasons why you should
not frequently create new threads:
It is very expensive, compared to
QueueUserWorkItem...By the way, if you can write a better ThreadPool than the CLR’s, I encourage you to apply for a job at Microsoft, because we’re definitely looking for people like you!.
Websites shouldn't go around spawning threads.
You typically move this functionality out into a Windows Service that you then communicate with (I use MSMQ to talk to them).
-- Edit
I described an implementation here: Queue-Based Background Processing in ASP.NET MVC Web Application
-- Edit
To expand why this is even better than just threads:
Using MSMQ, you can communicate to another server. You can write to a queue across machines, so if you determine, for some reason, that your background task is using up the resources of the main server too much, you can just shift it quite trivially.
It also allows you to batch-process whatever task you were trying to do (send emails/whatever).
I definitely think that general practice for quick, low-priority asynchronous work in ASP.NET would be to use the .NET thread pool, particularly for high-traffic scenarios as you want your resources to be bounded.
Also, the implementation of threading is hidden - if you start spawning your own threads, you have to manage them properly as well. Not saying you couldn't do it, but why reinvent that wheel?
If performance becomes an issue, and you can establish that the thread pool is the limiting factor (and not database connections, outgoing network connections, memory, page timeouts etc) then you tweak the thread pool configuration to allow more worker threads, higher queued requests, etc.
If you don't have a performance problem then choosing to spawn new threads to reduce contention with the ASP.NET request queue is classic premature optimization.
Ideally you wouldn't need to use a separate thread to do a logging operation though - just enable the original thread to complete the operation as quickly as possible, which is where MSMQ and a separate consumer thread / process come in to the picture. I agree that this is heavier and more work to implement, but you really need the durability here - the volatility of a shared, in-memory queue will quickly wear out its welcome.
You should use QueueUserWorkItem, and avoid creating new threads like you would avoid the plague. For a visual that explains why you won't starve ASP.NET, since it uses the same ThreadPool, imagine a very skilled juggler using two hands to keep a half dozen bowling pins, swords, or whatever in flight. For a visual of why creating your own threads is bad, imagine what happens in Seattle at rush hour when heavily used entrance ramps to the highway allow vehicles to enter traffic immediately instead of using a light and limiting the number of entrances to one every few seconds. Finally, for a detailed explanation, please see this link:
http://blogs.msdn.com/tmarq/archive/2010/04/14/performing-asynchronous-work-or-tasks-in-asp-net-applications.aspx
Thanks,
Thomas
That article is not correct. ASP.NET has it's own pool of threads, managed worker threads, for serving ASP.NET requests. This pool is usually a few hundred threads and is separate from the ThreadPool pool, which is some smaller multiple of processors.
Using ThreadPool in ASP.NET will not interfere with ASP.NET worker threads. Using ThreadPool is fine.
It would also be acceptable to setup a single thread which is just for logging messages and using producer/consumer pattern to pass logs messages to that thread. In that case, since the thread is long-lived, you should create a single new thread to run the logging.
Using a new thread for every message is definitely overkill.
Another alternative, if you're only talking about logging, is to use a library like log4net. It handles logging in a separate thread and takes care of all the context issues that could come up in that scenario.
I'd say the article is wrong. If you're running a large .NET shop you can safely use the pool across multiple apps and multiple websites (using seperate app pools), simply based on one statement in the ThreadPool documentation:
There is one thread pool per process.
The thread pool has a default size of
250 worker threads per available
processor, and 1000 I/O completion
threads. The number of threads in the
thread pool can be changed by using
the SetMaxThreads method. Each thread
uses the default stack size and runs
at the default priority.
I was asked a similar question at work last week and I'll give you the same answer. Why are you multi threading web applications per request? A web server is a fantastic system optimized heavily to provide many requests in a timely fashion (i.e. multi threading). Think of what happens when you request almost any page on the web.
A request is made for some page
Html is served back
The Html tells the client to make further requets (js, css, images, etc..)
Further information is served back
You give the example of remote logging, but that should be a concern of your logger. An asynchronous process should be in place to receive messages in a timely fashion. Sam even points out that your logger (log4net) should already support this.
Sam is also correct in that using the Thread Pool on the CLR will not cause issues with the thread pool in IIS. The thing to be concerned with here though, is that you are not spawning threads from a process, you are spawning new threads off of IIS threadpool threads. There is a difference and the distinction is important.
Threads vs Process
Both threads and processes are methods
of parallelizing an application.
However, processes are independent
execution units that contain their own
state information, use their own
address spaces, and only interact with
each other via interprocess
communication mechanisms (generally
managed by the operating system).
Applications are typically divided
into processes during the design
phase, and a master process explicitly
spawns sub-processes when it makes
sense to logically separate
significant application functionality.
Processes, in other words, are an
architectural construct.
By contrast, a thread is a coding
construct that doesn't affect the
architecture of an application. A
single process might contains multiple
threads; all threads within a process
share the same state and same memory
space, and can communicate with each
other directly, because they share the
same variables.
Source
You can use Parallel.For or Parallel.ForEach and define the limit of possible threads you want to allocate to run smoothly and prevent pool starvation.
However, being run in background you will need to use pure TPL style below in ASP.Net web application.
var ts = new CancellationTokenSource();
CancellationToken ct = ts.Token;
ParallelOptions po = new ParallelOptions();
po.CancellationToken = ts.Token;
po.MaxDegreeOfParallelism = 6; //limit here
Task.Factory.StartNew(()=>
{
Parallel.ForEach(collectionList, po, (collectionItem) =>
{
//Code Here PostLog(logEvent);
}
});
I do not agree with the referenced article(C#feeds.com). It is easy to create a new thread but dangerous. The optimal number of active threads to run on a single core is actually surprisingly low - less than 10. It is way too easy to cause the machine to waste time switching threads if threads are created for minor tasks. Threads are a resource that REQUIRE management. The WorkItem abstraction is there to handle this.
There is a trade off here between reducing the number of threads available for requests and creating too many threads to allow any of them to process efficiently. This is a very dynamic situation but I think one that should be actively managed (in this case by the thread pool) rather than leaving it to the processer to stay ahead of the creation of threads.
Finally the article makes some pretty sweeping statements about the dangers of using the ThreadPool but it really needs something concrete to back them up.
Whether or not IIS uses the same ThreadPool to handle incoming requests seems hard to get a definitive answer to, and also seems to have changed over versions. So it would seem like a good idea not to use ThreadPool threads excessively, so that IIS has a lot of them available. On the other hand, spawning your own thread for every little task seems like a bad idea. Presumably, you have some sort of locking in your logging, so only one thread could progress at a time, and the rest would just take turns getting scheduled and unscheduled (not to mention the overhead of spawning a new thread). Essentially, you run into the exact problems the ThreadPool was designed to avoid.
It seems that a reasonable compromise would be for your app to allocate a single logging thread that you could pass messages to. You would want to be careful that sending messages is as fast as possible so that you don't slow down your app.