I have an application that is working well in production, but I wonder if I could have implemented the concurrency better....
ASP.NET .NET 4, C#
Basically, it generates n number of sql statements on the fly (approx 50 at the moment) and then runs them concurrently and writes the data to .csv files.
EDIT: First I create a thread to do all the work on so the page request can return. Then on that thread...
For each of the SQL statements I create a new Task using the TPL and execute it using a datareader and write the data to disk. When the last file is created I write some summary data to a summary file and zip it all up and give it to the user.
Should I have used Threads or Asynchronous Delegates instead?
I haven't posted code as I am really just wondering if my overall approach (i.e. TPL) is the best option in this situation.
Please don't lecture me about creating dynamic sql, it is totally necessary due to the technicalities of the database I am reading from and not relevant to the question. (Its the back end of a proprietary system. Got 7 thousand+ tables).
Should I have used Threads or Asynchronous Delegates instead?
Apparently, your background thread operation spans across the boundaries of a single HTTP request. In this case, it doesn't really matter what API you use to run such operation: Task.Run, Delegate.BeginInvoke, ThreadPool.QueueUserWorkItem, new Thread or anything else.
You shouldn't be running a lengthy background thread operation, which lifetime spans multiple HTTP requests, inside ASP.NET address space. While it's relatively easy to implement, this approach may have issues with IIS maintainability, scalability and security. Create a WCF service for that and call it from your ASP.NET page:
How to: Host a WCF Service in a Managed Windows Service.
If we start a new thread in ASP.Net from the thread which is serving the http request, and new thread has an unhandled exception, the worker process will crash immediately. Even if we use WCF service and call that from ASP.Net the ASP.Net thread is going to wait for the result. So better use any queuing mechanism so that the requests is in queue and queue can process in a different time based on the processing capacity. Of course when we say queuing we need to think about queue failure, requeue etc...But its worth if the application is big and needs to scale.
Related
This comment by Stephen Cleary says this:
AspNetSynchronizationContext is the strangest implementation. It treats Post as synchronous rather than asynchronous and uses a lock to execute its delegates one at a time.
Similarly, the article that he wrote on synchronization contexts and linked to in that comment suggests:
Conceptually, the context of AspNetSynchronizationContext is complex. During the lifetime of an asynchronous page, the context starts with just one thread from the ASP.NET thread pool. After the asynchronous requests have started, the context doesn’t include any threads. As the asynchronous requests complete, the thread pool threads executing their completion routines enter the context. These may be the same threads that initiated the requests but more likely would be whatever threads happen to be free at the time the operations complete.
If multiple operations complete at once for the same application, AspNetSynchronizationContext will ensure that they execute one at a time. They may execute on any thread, but that thread will have the identity and culture of the original page.
Digging in reflector seems to validate this as it takes a lock on the HttpApplication while invoking any callback.
Locking the app object seems like scary stuff. So my first question: Does that mean that today, all asynchronous completions for the entire app execute one at a time, even ones that originated from separate requests on separate threads with separate HttpContexts? Wouldn't this be a huge bottleneck for any apps that make 100% use of async pages (or async controllers in MVC)? If not, why not? What am I missing?
Also, in .NET 4.5, it looks like there's a new AspNetSynchronizationContext, and the old one is renamed LegacyAspNetSynchronizationContext and only used if the new app setting UseTaskFriendlySynchronizationContext is not set. So question #2: Does the new implementation change this behavior? Otherwise, I imagine with the new async/await support marshaling completions through the synchronization context, this kind of bottleneck would be noticed much more frequently going forward.
The answer to this forum post (linked from SO answer here) suggests that something fundamentally changed here, but I want to be clear on what that is and what behaviors have improved, since we have a .NET 4 MVC 3 app which is pretty much 100% async action methods making web service calls.
Let me answer your first question. In your assumption you didn't consider the fact that separate ASP.NET requests are processed by different HttpApplication objects. HttpApplication objects are stored in pool. Once you request a page, an application object is retrieved from pool and belongs to the request till its completion. So, my answer to your question:
all asynchronous completions for the entire app execute one at a time, even ones that originated from separate requests on separate threads with separate HttpContexts
is: No, they don't
Separate requests are processed by separate HttpApplication objects, locked HttpApplication will affect only single request.
Synchronization context is a powerful thing that helps developers to synchronize access to shared (in scope of request) resources. That is why all callbacks are executed under lock. Synchronization context is a heart of event-based synchronization pattern.
I'm just wondering, if I have an ASP.net Web Application, either WebForms or MVC, is there any situation where doing stuff asynchronously would make sense?
The Web Server already handles threading for me in that it spins up multiple threads to handle requests, and most request processing is rather simple and straight forward.
I see some use for when stuff truly is a) expensive and b) can be parallelized. but these are the minority cases (at least from what I've encountered).
Is there any gain from async in the simple "Read some input, do some CRUD, display some output" scenario?
If a page is making a request to an external web service on every request, then using the asynchronous BCL APIs to fetch data from the web service will help free up resources on the server. This is because Windows can make a distinction between a managed ASP thread that is waiting for stuff (asynchronous web service call) and a managed ASP thread that is doing stuff (synchronous web service call). These asynchronous calls may end up as I/O Completion Ports behind the scenes, freeing ASP from essentially all of the burdens. This can result in the web site being able to handle more simultaneous requests. Specifically, when an ASP thread is waiting for a callback from an asynchronous operation, ASP may decide to reuse that thread to serve other requests in the meantime.
The synchronous BCL way to invoke external resources is the Get(), Read(), EndRead() etc family of methods. Their asynchronous counterpart is the BeginGet(), EndGet(), BeginRead(), EndRead() etc family of methods.
If you have a page that does two things simultaneously, and both are essentially wait for external data type operations, then the asynchronous BCL APIs will enable parallellism automatically. If they are calculate pi type operations, then you may want to use the parallell BCL APIs instead.
Here's one example when you will clearly gain from an async call: imagine that in order to render a page you need to aggregate information coming from different web services (or database calls) where each being independent and resulting in a network call. In this case the async pattern is very good because you have IO bound operations for which IO Completion Ports will be used and while waiting for the response you won't monopolize worker threads. While the operations are running no thread will be consumed on the server side.
If you have many CPU bound operations than async pattern will not bring much benefit because while you free the request worker thread from performing the operation, another thread will be consumed to do calculations.
I find this article a very useful reference.
Problem: Some 300 candidates make a test using Flex. A test consist of some 100 exercises. After each exercise a .NET service is called to store the result. If a candidate finishes a test, all the data of his/her test is denormalized by Asp.NET. This denormalization can take some cpu and can take 5 to 10 seconds. Now, most of the times, some of the candidates have finished their test earlier than the rest, but still some 200 of them wait until their time is up. At that moment, 200 candidates finish their test and 200 sessions are denormalized at the same time. At this point, server load (cpu) is too high and cause calls to the webserver to go wrong. Now, instead of all these sessions being normalized concurrently, I would like to add them to a queue using MSMQ.
Question:
How do you process the Queue?
Do you start a separate thread in the Application_Start of global.asax that listens to the queue? If there are messages, they are dealt one at the time.
Is it necessary to do this in a separate thread? What if in the global.asax you just call a singleton for instance that starts listening to the queue? In what thread will this singleton run? (what's the thread that calls global.asax)
What are best practices to implement this? Links? Resources? Tutorials? Examples?
I don't like the idea, but could you put an exe on the root of your website, an exe that starts a process listening to the queue...
If you get a message out of the queue, do you remove it when you pull it out or do you remove it if denormalization for this session was successful? If you remove it when you pull it out and something goes wrong...
I could also create my own queue in memory, but restarting the webserver would empty the queue and a lot of sessions would end up not being normalized, so I guess this is really a bad idea.
Is MSMQ a good choice or are there better alternatives?
You could consider using a WCF-Service with MSMQ transport. I used this approach in an application that calculates commissions:
User completes asp.net wizard configuring calculation parameters
Calculation Job is sent to WCF-Service using MSMQ transport
Service transaction is completed as soon as Job entered MSMQ
New transaction scope is created for processing Job instances
One drawback is that the transaction will require MSDTC which will add some overhead when targeting MS SQL Server and even more when dealing with Oracle.
IDesign provides a lot of useful samples and best practices on WCF queueing.
Personally, I use a servicebus for scenario's like that. I know this sounds like an overkill, but I think the .net servicebusses are so good that they require the least amount of code written by you, because it's not easy to create a good scheduler for background processes without disturbing the threads of the application pool the webapp is running in. NServicebus and MassTransit are both good an well enough documented servicebuses for your scenario. With a servicebus, you have a framework that writes to msmq and listens to msmq in several apps connected by the messagequeue. The bus makes it easy for you to create a separate app that runs as a background service and is connected with your web-app by the message queue. When you use topself (included in nservicebus and masstransit), an installer/uninstaller for the seperate apps is automatically generated by the service bus.
Question: Why don't you like the idea of having a separate exe?
How do you process the Queue?
Do you start a separate thread in the Application_Start of global.asax
that listens to the queue? If there are messages, they are dealt one at
the time.
Is it necessary to do this in a separate thread? What if in the
global.asax you just call a singleton for instance that starts listening to
the queue? In what thread will this singleton run? (what's the thread that
calls global.asax)
[skip]
I don't like the idea, but could you put an exe on the root of your website, an exe that > starts a process listening to the queue...
Normally another program processes the queue - not ASP.NET. Either a windows service or an executable that you run under a scheduler (and there's no reason to put it in the root of your website).
If you get a message out of the queue, do you remove it when you pull
it out or do you remove it if denormalization for this session was
successful? If you remove it when you pull it out and something goes
wrong...
For critical work, you perform a transactional read. Items aren't removed from the queue until you commit your read operation, but while the transaction is open, no other process can get the item.
What are best practices to implement this? Links? Resources? Tutorials? Examples?
This tutorial is a good introduction and John Breakwell's blog is excellent and offers a lot of good links (including the ones in his easy-to-find sidebar "MSMQ Documentation").
Recently, the book on threading for Winforms application (Concurrent programming on Windows by Joe Duffy) was released. This book, focused on winforms, is 1000 pages.
What gotchas are there in ASP.NET threading? I'm sure there are plenty of gotchas to be aware of when implementing threading in ASP.NET. What should I be aware of?
Thanks
Since each http request received by IIS is processed separately, on it's own thread anyway, the only issues you should have is if you kick off some long running process from within the scope of a single http request. In that case, I would put such code into a separate referenced dependant assembly, coded like a middle-tier component, with no dependance or coupling to the ASP.Net model at all, and handle whatever concurrency issues arose within that assembly separately, without worrying about the ASP.Net model at all...
Jeff Richter over at Wintellect has a library called PowerThreading. It is very useful if you are developing applications on .NET. => Power Threading Library
Check for his presentations online at various events.
Usually you are encouraged to use the thread pool in .Net because it of the many benefits of having things managed on your behalf.....but NOT in ASP.net.
Since ASP.net is already multi-threaded, it uses the thread pool to serve requests that are mapped to the ASP.net ISAPI filter, and since the thread pool is fixed in size, by using it you are basically taking threads away that are set aside to do the job of handling request.
In small, low-traffic websites, this is not an issue, but in larger, high-traffic websites you end up competing for and consuming threads that the ASP.net process relies on.
If you want to use threading, it is fine to do something like....
Thread thread = new Thread(threadStarter);
thread.IsBackground = true;
thread.Start();
but with a warning: be sure that the IsBackground is set to true because if it isn't the thread exists in the foreground and will likely prevent the IIS worker process from recycling or restarting.
First, are you talking about asynchronous ASP.NET? Or using the ThreadPool/spinning up your own threads?
If you aren't talking about asynchronous ASP.NET, the main question to answer is: what work would you be doing in the other threads and would the work be specific to a request/response cycle, or is it more about processing global tasks in the background?
EDIT
If you need to handle concurrent operations (a better term than multi-threaded IMO) for a given request/response cycle, then use the asynchronous features of ASP.NET. These provide an abstraction over IIS's support for concurrency, allowing the server to process other requests while the current request is waiting for work to complete.
For background processing of global tasks, I would not use ASP.NET at all. You should assume that IIS will recycle your AppPool at a random point in time. You also should not assume that IIS will run your AppPool on any sort of schedule. Any important background processing should be done outside of IIS, either as a scheduled task or a Windows Service. The approach I usually take is to have a Windows Service and a shared work-queue where the web-site can post work items. The queue can be a database table, a reliable message-based queue (MSMQ, etc), files on the file system, etc.
The immediate thing that comes to mind is, why would you "implement threading" in ASP.NET.
You do need to be conscious all the time that ASP.NET is multi-threaded since many requests can be processed simulatenously each in its own thread. So for example use of static fields needs to take threading into account.
However its rare that you would want to spin up a new thread in code yourself.
As far as the usual winforms issues with threading in the UI is concerned these issues are not present in ASP.NET. There is no window based message pump to worry about.
It is possible to create asynchronous pages in ASP.NET. These will perform all steps up to a certain point. These steps will include asynchronously fetching data, for instance. When all the asynchronous tasks have completed, the remainder of the page lifecycle will execute. In the meantime, a worker thread was not tied up waiting for database I/O to complete.
In this model, all extra threads are executing while the request, and the page instance, and all the controls, still exist. You have to be careful when starting your own threads, that, by the time the thread executes, it's possible that the request, page instance, and controls will have been Disposed.
Also, as usual, be certain that multiple threads will actually improve performance. Often, additional threads will make things worse.
The gotchas are pretty much the same as in any multithreaded application.
The classes involved in processing a request (Page, Controls, HttpContext.Current, ...) are specific to that request so don't need any special handling.
Similarly for any classes you instantiate as local variables or fields within these classes, and for access to Session.
But, as usual, you need to synchronize access to shared resources such as:
Static (C#) / Shared(VB.NET) references.
Singletons
External resources such as the file system
... etc...
I've seen threading bugs too often in ASP.NET apps, e.g. a singleton being used by multiple concurrent requests without synchronization, resulting in user A seeing user B's data.
Is it possible to use BackGroundWorker thread in ASP.NET 2.0 for the following scenario, so that the user at the browser's end does not have to wait for long time?
Scenario
The browser requests a page, say SendEmails.aspx
SendEmails.aspx page creates a BackgroundWorker thread, and supplies the thread with enough context to create and send emails.
The browser receives the response from the ComposeAndSendEmails.aspx, saying that emails are being sent.
Meanwhile, the background thread is engaged in a process of creating and sending emails which could take some considerable time to complete.
My main concern is about keeping the BackgroundWorker thread running, trying to send, say 50 emails while the ASP.NET workerprocess threadpool thread is long gone.
If you don't want to use the AJAX libraries, or the e-mail processing is REALLY long and would timeout a standard AJAX request, you can use an AsynchronousPostBack method that was the "old hack" in the .net 1.1 days.
Essentially what you do is have your submit button begin the e-mail processing in an asynchronous state, while the user is taken to an intermediate page. The benefit to this is that you can have your intermediate page refresh as much as needed, without worrying about hitting the standard timeouts.
When your background process is complete, it will put a little "done" flag in the database/application variable/whatever. When your intermediate page does a refresh of itself, it detects this flag and automatically redirects the user to the "done" page.
Again, AJAX makes all of this moot, but if for some reason you have a very intensive or timely process that has to be done over the web, this solution will work for you. I found a nice tutorial on it here and there are plenty more out there.
I had to use a process like this when we were working on a "web check-in" type application that was interfacing with a third party application and their import API was hideously slow.
EDIT: GAH! Curse you Guzlar and your god-like typing abilities 8^D.
You shouldn't do any threading from ASP.NET pages. Any thread that is long running is in danger of being killed when the worker process recycles. You can't predict when this will happen. Any long-running processes need to be handled by a windows service. You can kick off these processes by dropping a message in MSMQ, for example.
ThreadPool.QueueUserWorkItem(delegateThatSendsEmails)
or on System.Net.Mail.SmtpServer use the SendAsync method.
You want to put the email sending code on another thread, because then it will return the the user immediately, and will just process, no matter how long it takes.
It is possible. Once you start a new thread asynchronously from page, page request will proceed and send the page back to the user. The async thread will continue to run on the server but will no longer have access to the session.
If you have to show task progress, consider some Ajax techniques.
What you need to use for this scenario is Asynchronous Pages, a feature that was added in ASP.NET 2.0
Asynchronous pages offer a neat
solution to the problems caused by
I/O-bound requests. Page processing
begins on a thread-pool thread, but
that thread is returned to the thread
pool once an asynchronous I/O
operation begins in response to a
signal from ASP.NET. When the
operation completes, ASP.NET grabs
another thread from the thread pool
and finishes processing the request.
Scalability increases because
thread-pool threads are used more
efficiently. Threads that would
otherwise be stuck waiting for I/O to
complete can now be used to service
other requests. The direct
beneficiaries are requests that don't
perform lengthy I/O operations and can
therefore get in and out of the
pipeline quickly. Long waits to get
into the pipeline have a
disproportionately negative impact on
the performance of such requests.
http://msdn.microsoft.com/en-us/magazine/cc163725.aspx
If you want using multitheading in your ASP page, you might using simple threading model like this:
{
System.Threading.Thread _thread = new Thread(new ThreadStart(Activity_DoWork));
_thred.Start();
}
Activity_DoWork()
{
/*Do some things...
}
This method is correct working with ASP pages. The ASP page with BackgroundWorker will not start while BackgroundWorker will finish.
5 years later, but problems the same… If you want to perform fire-and-forget operations from your application and forget about all difficulties related to background job processing in ASP.NET applications, you can use http://hangfire.io.
It does not loose your jobs on recycling process, because it uses persistent storage to keep information about background jobs.
It automatically retries your background jobs that were aborted or failed due to transient exception (SMTP Server connectivity errors).
It allows you to easily debug background jobs through the integrated web interface.
It is very easy to install/configure/use HangFire.
There is also tutorial Sending Mail in Background with ASP.NET MVC for using HangFire with Postal.