SqlClient connection pool maxed out when using async

I have a busy ASP.NET 5 Core app (thousands of requests per second) that uses SQL Server. Recently we decided to try to switch some hot code paths to async database access and... the app didn't even start. I get this error:
The timeout period elapsed prior to obtaining a connection from the
pool. This may have occurred because all pooled connections were in
use and max pool size was reached.
And I see the number of threads in the thread pool growing to 40... 50... 100...
The code pattern we use is fairly simple:
using (var cn = new SqlConnection(connectionString))
{
    cn.Open();
    var data = await cn.QueryAsync("SELECT x FROM Stuff WHERE id = @id", new { id }); // QueryAsync is from Dapper
}
I made a process dump and all threads are stuck on the cn.Open() line, just sitting there and waiting.
This mostly happens during application "recycles" on IIS, when the app process is restarted and HTTP requests are queued from one process to another, resulting in tens of thousands of requests in the queue that need to be processed.
Well, yeah, I get it. I think I know what's happening: async makes the app scale more, and while the database is busy responding to my query, control is returned to other threads, which try to open more, and more, and more connections in parallel. The connection pool maxes out. But why aren't the closed connections returned to the pool immediately after the work is finished?
Switching from async to "traditional" code fixes the problem immediately.
What are my options?
Increasing max pool size from the default 100? Tried 200, didn't help. Should I try, like, 10000?
Using OpenAsync instead of Open? Didn't help.
I thought I was running into this problem https://github.com/dotnet/SqlClient/issues/18 but nope, I'm on a newer version of SqlClient where it's said to be fixed. Supposedly.
Not use async with database access at all? Huh...
Do we really have to come up with our own throttling mechanisms when using async, like this answer suggests? I'm surprised there's no built-in workaround... (a minimal sketch of such a throttle follows the P.S. below)
P.S. Taking a closer look at the process dump - I checked the Tasks report and discovered literally tens of thousands of blocked tasks in the waiting state. And there are exactly 200 db-querying tasks (which is the size of the connection pool) waiting for queries to finish.
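For reference, a minimal sketch of the kind of throttle that last option refers to: a shared SemaphoreSlim capped a bit below the connection pool size, so requests queue in memory instead of all competing for pooled connections at once. The limit of 90, the DbThrottle class, and the Microsoft.Data.SqlClient namespace (use System.Data.SqlClient on older stacks) are illustrative assumptions, not from the original post.

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Dapper;
using Microsoft.Data.SqlClient;

public static class DbThrottle
{
    // Illustrative limit: keep it slightly below "Max Pool Size" (default 100)
    // so callers wait here instead of timing out while waiting for a pooled connection.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(90, 90);

    public static async Task<IEnumerable<dynamic>> QueryStuffAsync(string connectionString, int id)
    {
        await Gate.WaitAsync();
        try
        {
            using (var cn = new SqlConnection(connectionString))
            {
                await cn.OpenAsync();
                return await cn.QueryAsync("SELECT x FROM Stuff WHERE id = @id", new { id });
            }
        }
        finally
        {
            Gate.Release();
        }
    }
}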

Well, after a bit of digging, investigating source code and tons of reading, it appears that async is not always a good idea for DB calls.
As Stephen Cleary (the god of async, who has written many books about it) nailed it - and it really clicked with me:
If your backend is a single SQL server database, and every single request
hits that database, then there isn't a benefit from making
your web service asynchronous.
So, yes, async helps you free up some threads, but the first thing these threads do is rush back to the database.
Also this:
The old-style common scenario was client <-> API <-> DB, and in that
architecture there's no need for asynchronous DB access
However, if your database is a cluster or a cloud or some other "autoscaling" thing - then yes, async database access makes a lot of sense.
Here's also an old archive.org article by Rick Anderson that I found useful: https://web.archive.org/web/20140212064150/http://blogs.msdn.com/b/rickandy/archive/2009/11/14/should-my-database-calls-be-asynchronous.aspx

Related

ADO: Async all the way down the tubes?

Okay, so "async all the way down" is the mandate. But when is it problematic?
For example, if you have limited access to a resource, as in a DbConnection or a file, when do you stop using async methods in favor of synchronous?
Let's review the complexity of an asynchronous database call:
(Not putting .ConfigureAwait(false) for readability.)
// Step 1: Ok, no big deal, our connection is closed, let's open it and wait.
await connection.OpenAsync();
// Connection is open! Let's do some work.
// Step 2: Acquire a reader.
using (var reader = await command.ExecuteReaderAsync())
{
    // Step 3: Start reading results.
    while (await reader.ReadAsync())
    {
        // get the data.
    }
}
Steps:
1. Should be reasonably innocuous and nothing to worry about.
2. But now we've acquired an open connection in a potentially limited connection pool. What if, while waiting in step 2, other long-running tasks are at the head of the line in the task scheduler?
3. Even worse now, we await with an open connection (and most likely added latency).
Aren't we holding open a connection longer than necessary? Isn't this an undesirable result? Wouldn't it be better to use synchronous methods to lessen the overall connection time, ultimately resulting in our data driven application performing better?
Of course I understand that async doesn't mean faster but async methods provide the opportunity for more total throughput. But as I've observed, there can definitely be weirdness when there are tasks scheduled in-between awaits that ultimately delay the operation, and essentially behave like blocking because of the limitations of the underlying resource.
[Note: this question is focused on ADO, but this also applies to file reads and writes.]
Hoping for some deeper insight. Thank you.
There are a few things to consider here:
Database connection pool limits, specifically the "Max Pool Size", which defaults to 100. The database connection pool has an upper limit on the number of connections. Be sure to set "Max Pool Size=X" where X is the maximum number of database connections you want to have. This applies to either sync or async.
The thread pool settings. The thread pool will not add threads quickly if your load spikes; it will only add a new thread every 500 ms or so. See MSDN Threading Guidelines from 2004 and The CLR Thread Pool 'Thread Injection' Algorithm. Here is a capture of the number of busy threads on one of my projects: the load spiked and requests were delayed due to the lack of available threads to service them. The line increases as new threads were being added. Remember that every thread requires 1 MB of memory for its stack, so 1000 threads ~= 1 GB of RAM just for threads. (A sketch of adjusting both of these knobs is shown after this list.)
The load characteristics of your project; this relates to the thread pool.
The type of system you are providing; I will assume you are talking about an ASP.NET-style app/API.
The throughput (requests/sec) vs latency (sec/request) requirements. Async will add to latency but increase throughput.
The database/query performance, relates to the 50ms recommendation below
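A hedged sketch of the first two knobs above - the connection-string pool ceiling and the thread pool minimums. The values (200/200) and the connection string are illustrative, not recommendations from this answer.

using System;
using System.Threading;

public static class StartupTuning
{
    // Raise the connection pool ceiling (the default "Max Pool Size" is 100).
    public const string ConnectionString =
        "Server=.;Database=MyDb;Integrated Security=true;Max Pool Size=200";

    public static void ConfigureThreadPool()
    {
        // Pre-raise the minimums so a load spike does not have to wait on the
        // roughly one-new-thread-per-500ms injection heuristic.
        ThreadPool.GetMinThreads(out int workerThreads, out int iocpThreads);
        ThreadPool.SetMinThreads(
            Math.Max(workerThreads, 200),
            Math.Max(iocpThreads, 200));
    }
}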
The article The overhead of async/await in .NET 4.5. Edit 2018-04-16: the recommendation below applied to WinRT UI-based applications.
Avoid using async/await for very short methods or having await
statements in tight loops (run the whole loop asynchronously instead).
Microsoft recommends that any method that might take longer than 50ms
to return should run asynchronously, so you may wish to use this
figure to determine whether it’s worth using the async/await pattern.
Also watch Diagnosing issues in ASP.NET Core Applications - David Fowler & Damian Edwards, which talks about issues with the thread pool and using async, sync, etc.
Hopefully this helps
if you have limited access to a resource, as in a DbConnection or a file, when do you stop using async methods in favor of synchronous?
You shouldn't need to switch to synchronous at all. Generally speaking, async only works if it's used all the way. Async-over-sync is an antipattern.
Consider the asynchronous code:
using (connection)
{
    await connection.OpenAsync();
    using (var reader = await command.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
        }
    }
}
In this code, the connection is held open while the command is executed and the data is read. Anytime that the code is waiting on the database to respond, the calling thread is freed up to do other work.
Now consider the synchronous equivalent:
using (connection)
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
        }
    }
}
In this code, the connection is held open while the command is executed and the data is read. Anytime that the code is waiting on the database to respond, the calling thread is blocked.
With both of these code blocks, the connection is held open while the command is executed and the data is read. The only difference is that with the async code, the calling thread is freed up to do other work.
What if when waiting for step 2, other long running tasks are at the head of the line in the task scheduler?
The time to deal with thread pool exhaustion is when you run into it. In the vast majority of scenarios, it isn't a problem and the default heuristics work fine.
This is particularly true if you use async everywhere and don't mix in blocking code.
For example, this code would be more problematic:
using (connection)
{
    await connection.OpenAsync();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
        }
    }
}
Now you have asynchronous code that, when it resumes, blocks a thread pool thread on I/O. Do that a lot, and you can end up in a thread pool exhaustion scenario.
Even worse now, we await with an open connection (and most likely added latency).
The added latency is minuscule. Like, sub-millisecond (assuming no thread pool exhaustion). It's immeasurably small compared to random network fluctuations.
Aren't we holding open a connection longer than necessary? Isn't this an undesirable result? Wouldn't it be better to use synchronous methods to lessen the overall connection time, ultimately resulting in our data driven application performing better?
As noted above, synchronous code would hold the connection open just as long. (Well, OK, a sub-millisecond amount less, but that Doesn't Matter).
But as I've observed, there can definitely be weirdness when there are tasks scheduled in-between awaits that ultimately delay the operation, and essentially behave like blocking because of the limitations of the underlying resource.
It would be worrying if you observed this on the thread pool. That would mean you're already at thread pool exhaustion, and you should carefully review your code and remove blocking calls.
It's less worrying if you observed this on a single-thread scheduler (e.g., UI thread or ASP.NET Classic request context). In that case, you're not at thread pool exhaustion (though you still need to carefully review your code and remove blocking calls).
As a concluding note, it sounds as though you're trying to add async the hard way. It's harder to start at a higher level and work your way to a lower level. It's much easier to start at the lower level and work your way up. E.g., start with any I/O-bound APIs like DbConnection.Open / ExecuteReader / Read, and make those asynchronous first, and then let async grow up through your codebase.
Due to the way database connection pooling works at the lower levels of the protocol, the high-level open/close commands don't have a lot of effect on performance. Generally, though, the internal IO thread scheduling is not a bottleneck unless you have some really long-running tasks - we're talking something CPU-intensive, or worse, blocking - inside. That will quickly exhaust your thread pool and things will start queuing up.
I would also suggest you investigate http://steeltoe.io, particularly its circuit breaker (Hystrix) implementation. The way it works is that it allows you to group your code into commands, and have command execution managed by command groups, which are essentially dedicated and segregated thread pools. The advantage is that if you have a noisy, long-running command, it can only exhaust its own command group's thread pool without affecting the rest of the app. There are many other advantages to this portion of the library, the primary one being the circuit breaker implementation, and one of my personal favorites, collapsers. Imagine multiple incoming calls for a query GetObjectById being grouped into a single SELECT * ... WHERE id IN (1,2,3) query, with the results then mapped back onto the separate inbound requests. The DB call is just an example; it can be anything, really. A rough sketch of the collapser idea follows.
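To make that concrete, here is a hedged sketch of the collapsing pattern itself. This is not the Steeltoe/Hystrix API; the RequestCollapser class, the buffering window, and the batch-loader delegate are all illustrative.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Callers request a single id; the collapser buffers requests for a short window
// and issues one batched lookup (e.g. "SELECT ... WHERE Id IN (...)").
public class RequestCollapser<TResult>
{
    private readonly object _gate = new object();
    private readonly Func<IReadOnlyCollection<int>, Task<IDictionary<int, TResult>>> _batchLoader;
    private readonly TimeSpan _window;
    private Dictionary<int, TaskCompletionSource<TResult>> _pending =
        new Dictionary<int, TaskCompletionSource<TResult>>();

    public RequestCollapser(
        Func<IReadOnlyCollection<int>, Task<IDictionary<int, TResult>>> batchLoader,
        TimeSpan window)
    {
        _batchLoader = batchLoader; // e.g. runs the batched database query
        _window = window;
    }

    public Task<TResult> GetAsync(int id)
    {
        bool scheduleFlush = false;
        TaskCompletionSource<TResult> tcs;
        lock (_gate)
        {
            if (!_pending.TryGetValue(id, out tcs))
            {
                tcs = new TaskCompletionSource<TResult>(TaskCreationOptions.RunContinuationsAsynchronously);
                scheduleFlush = _pending.Count == 0; // first caller in this window schedules the flush
                _pending[id] = tcs;
            }
        }
        if (scheduleFlush)
        {
            _ = FlushAfterWindowAsync();
        }
        return tcs.Task;
    }

    private async Task FlushAfterWindowAsync()
    {
        await Task.Delay(_window);

        Dictionary<int, TaskCompletionSource<TResult>> batch;
        lock (_gate)
        {
            batch = _pending;
            _pending = new Dictionary<int, TaskCompletionSource<TResult>>();
        }

        try
        {
            var results = await _batchLoader(batch.Keys.ToList());
            foreach (var entry in batch)
            {
                results.TryGetValue(entry.Key, out TResult value);
                entry.Value.TrySetResult(value); // default(TResult) if the id was not found
            }
        }
        catch (Exception ex)
        {
            foreach (var entry in batch)
            {
                entry.Value.TrySetException(ex);
            }
        }
    }
}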
Significant amounts of iteration introduce significant added latency and extra CPU usage
See http://telegra.ph/SqlDataReader-ReadAsync-vs-Read-04-18 for details.
As suspected:
Using async does not come without cost and requires consideration.
Certain types of operations lend themselves well to async, and others are problematic (for what should be obvious reasons).
High-volume synchronous/blocking code has its downsides, but for the most part it is well managed by modern threading:
Testing / Profiling
4 x 100 parallel queries, 1000 records each query.
Performance Profile for Synchronous Query
Average Query: 00:00:00.6731697, Total Time: 00:00:25.1435656
Performance Profile for Async Setup with Synchronous Read
Average Query: 00:00:01.4122918, Total Time: 00:00:30.2188467
Performance Profile for Fully Async Query
Average Query: 00:00:02.6879162, Total Time: 00:00:32.6702872
Assessment
The above results were run on SQL Server 2008 R2 using a .NET Core 2 console application. I invite anyone who has access to a modern instance of SQL Server to replicate these tests to see if there is a reversal in trend. If you find my testing method flawed, please comment so I can correct and retest.
As you can easily see in the results, the more asynchronous operations we introduce, the longer the queries take, and the longer the total time to complete. Even worse, fully asynchronous uses more CPU overhead, which is counterproductive to the idea that using async tasks would provide more available thread time. This overhead could be due to how I'm running these tests, but it's important to treat each test in a similar way to compare. Again, if anyone has a way to prove that async is better, please do.
I'm proposing here that "async all the way" has its limitations and should be seriously scrutinized at certain iterative levels (like file or data access).

ASP.NET and multithreading best practices

I am working on ASP.NET project and yesterday I saw a piece of code that uses System.Threading.Thread to offload some tasks to a new thread. The thread runs a few SQL statements and logs the result.
Isn't it better to use another approach - for example, to have a Windows Service that performs the SQL batch? Then the web page would just enqueue the batch (via WCF).
In general, what are the best practices for multithreading in ASP.NET? Are there justified usages of threads/TPL tasks/etc. in a web page?
My thoughts on using multi-threading in ASP.NET:
ASP.NET recycles the AppDomain for various reasons - for example when you change web.config, or periodically to avoid memory leaks. The thing is, you don't know exactly when a recycle will happen. A long-running thread is not suitable, because when ASP.NET recycles, it will take your thread down with it. The right approach in this case is for the long-running task to run in a background process via a queue, as you mention.
For short-running, fire-and-forget tasks, TPL or async/await are the most appropriate, because they do not block a thread-pool thread that could otherwise service HTTP requests.
In my opinion this should be solved by raising some kind of flag in the database and a Windows service that periodically checks the flag and starts the job. If the job is too frequent a dedicated queue solution should be used (MSMQ, RabbitMQ, etc.) to avoid overloading the database or the table growing too fast. I don't think communicating directly with the Windows service via WCF or anything else is a good idea because this may result in dropped messages.
That being said sometimes a project needs to run in a shared hosting and cannot setup a dedicated Windows service. In this case a thread is acceptable as a work around that should be removed as soon as the project grows enough to have its own server.
I believe all other threading in ASP.NET is a sign of a problem, except for using Tasks to represent async operations, or in the extremely rare case when you want to perform a computation in parallel in a web project but your project has very few concurrent users (fewer concurrent users than the number of cores).
Why are Tasks useful in ASP.NET?
The first reason to use Tasks for async operations is that as of .NET 4.5, async APIs return Tasks :)
Async operations (not to be confused with parallel computations) may be web service calls, database calls, etc. They may be useful for two things:
Fire several of them at once and your job will take a time equal to the longest operation. If you fire them in a sequential (non-async) fashion, they will take time equal to the sum of the times of each operation, which is obviously more. (See the sketch after this list.)
They can improve scalability by releasing the thread executing the page - Node.js style. ASP.NET has supported this forever, but in version 4.5 it is really easy to use. I'll go as far as claiming that it is easier than Node.js because of async/await. Releasing the thread is important because you may deplete the threads in the pool by having them wait. The result is that your website becomes slow once there is a certain number of users, despite the fact that CPU usage is around 30%, simply because new requests are waiting in the queue. If you increase the number of threads in the thread pool, you pay the price of constant context switching by the OS. At a certain point you will get 100% CPU usage, but 40% of it will be spent context switching. You will increase the throughput, but with diminishing returns. A lot of threads also increase the memory footprint.
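A minimal sketch of the first point - firing two independent async operations and awaiting them together. GetCustomerAsync and GetOrdersAsync are hypothetical I/O-bound calls, not from the original answer.

using System.Threading.Tasks;

public class ConcurrencyExample
{
    // Hypothetical I/O-bound calls (e.g. web service or database queries).
    private Task<string> GetCustomerAsync(int id) => Task.FromResult("customer " + id);
    private Task<string> GetOrdersAsync(int id) => Task.FromResult("orders for " + id);

    public async Task RunAsync(int id)
    {
        // Sequential: total time = time(customer) + time(orders).
        var customer = await GetCustomerAsync(id);
        var orders = await GetOrdersAsync(id);

        // Concurrent: total time ~= max(time(customer), time(orders)).
        var customerTask = GetCustomerAsync(id);
        var ordersTask = GetOrdersAsync(id);
        await Task.WhenAll(customerTask, ordersTask);
        customer = customerTask.Result; // already completed, so .Result does not block
        orders = ordersTask.Result;
    }
}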

Considerations for ASP.NET application with long running synchronous requests

Under Windows Server 2008 64-bit, IIS 7.0 and .NET 4.0, an ASP.NET application (using the ASP.NET thread pool, synchronous request processing) is long running (> 30 minutes). The web application has no pages; its main purpose is reading huge files (> 1 GB) in chunks (~5 MB) and transferring them to the clients. Code:
while (reading)
{
    Response.OutputStream.Write(buffer, 0, buffer.Length);
    Response.Flush();
}
A single producer - single consumer pattern is implemented, so for each request there are two threads. I don't use the Task library here, but please let me know if it has an advantage over traditional thread creation in this scenario. An HTTP handler (.ashx) is used instead of an (.aspx) page. Under stress testing, CPU utilization is not a problem, but with a single worker process, after 210 concurrent clients, new connections encounter time-outs. This is solved by web gardening, since I don't use session state. I'm not sure if there's any big issue I've missed, but please let me know what other considerations should be taken into account in your opinion.
For example, maybe IIS closes long-running TCP connections due to a "connection timeout", since normal ASP.NET pages are processed in less than 5 minutes, so I should increase the value.
I appreciate your ideas.
Personally, I would be looking at a different mechanism for this type of processing. HTTP Requests/Web Applications are NOT designed for this type of thing, and stability is going to be VERY hard, you have a number of risks that could cause you major issues as you are working with this type of model.
I would move that processing off to a backend process, so that you are OUTSIDE of the asp.net runtime, that way you have more control over start/shutdown, etc.
First, Never. NEVER. NEVER! do any processing that takes more than a few seconds in a thread pool thread. There are a limited number of them, and they're used by the system for many things. This is asking for trouble.
Second, while the handler is a good idea, you're a little vague on what you mean by "generate on the fly". Do you mean you are encrypting a file on the fly and this encryption can take 30 minutes? Or do you mean you're pulling data from a database and assembling a file? Or that the download takes 30 minutes to download?
Edit:
As I said, don't use a thread pool thread for anything long running. Create your own thread, or if you're using .NET 4, use a Task and specify it as long running.
Long running processes should not be implemented this way. Pass this off to a service that you set up.
IF you do want to have a page hang for a client, consider interfacing from AJAX to something that does not block on IO threads - like node.js.
Push notifications to many clients is not something ASP.NET can handle due to thread usage, hence my node.js. If your load is low, you have other options.
Use web gardening for more stability of your application.
Turn off caching since you don't have .aspx pages.
It's hard to advise more without performance analysis. Use the built-in VS profiler and find the bottlenecks.
The Web 1.0 way of dealing with long running processes is to spawn them off on the server and return immediately. Have the spawned off service update a database with progress and pages on the site can query for progress.
The most common usage of this technique is getting a package delivery. You can't hold the HTTP connection open until my package shows up, so it just gives you a way to query for progress. The background process deals with orchestrating all of the steps it takes for getting the item, wrapping it up, getting it onto a UPS truck, etc. All along the way, each step is recorded in the database. Conceptually, it's the same.
Edit based on Question Edit: Just return a result page immediately, and generate the binary on the server in a spawned thread or process. Use Ajax to check to see if the file is ready and when it is, provide a link to it.
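As a hedged illustration of that last step, here is a minimal status endpoint the page could poll via Ajax. The handler name, the jobId parameter, and the output folder are illustrative assumptions, not from the original answer.

using System.IO;
using System.Web;

public class FileStatusHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // In real code, validate jobId (e.g. require a GUID) to avoid path traversal.
        string jobId = context.Request.QueryString["jobId"];
        string path = context.Server.MapPath("~/App_Data/output/" + jobId + ".bin");

        context.Response.ContentType = "application/json";
        if (File.Exists(path))
        {
            context.Response.Write("{\"ready\":true,\"url\":\"/Download.ashx?jobId=" + jobId + "\"}");
        }
        else
        {
            context.Response.Write("{\"ready\":false}");
        }
    }
}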

Long-running ASP.NET tasks

I know there's a bunch of APIs out there that do this, but I also know that the hosting environment (being ASP.NET) puts restrictions on what you can reliably do in a separate thread.
I could be completely wrong, so please correct me if I am, this is however what I think I know.
A request typically times out after 120 seconds (this is configurable), and eventually the ASP.NET runtime will kill a request that's taking too long to complete.
The hosting environment, typically IIS, employs process recycling and can at any point decide to recycle your app. When this happens, all threads are aborted and the app restarts. I'm however not sure how aggressive it is; it would be kind of stupid to assume that it would abort a normal ongoing HTTP request, but I would expect it to abort a thread, because it doesn't know anything about the unit of work of a thread.
If you had to create a programming model that could easily and reliably run a long-running task - one that might have to run for days - how would you accomplish this from within an ASP.NET application?
The following are my thoughts on the issue:
I've been thinking along the lines of hosting a WCF service in a Win32 service and talking to the service through WCF. This is however not very practical, because the only reason I would choose to do so is to send tasks (units of work) from several different web apps. I'd then eventually ask the service for status updates and act accordingly. My biggest concern with this is that it would NOT be a particularly great experience if I had to deploy every task to the service for it to be able to execute some instructions. There's also the issue of input: how would I feed this service with data if I had a large data set and needed to chew through it?
What I typically do right now is this
SELECT TOP 10 *
FROM WorkItem WITH (ROWLOCK, UPDLOCK, READPAST)
WHERE WorkCompleted IS NULL
It allows me to use a SQL Server database as a work queue and periodically poll the database with this query for work. If the work item completed with success, I mark it as done and proceed until there's nothing more to do. What I don't like is that I could theoretically be interrupted at any point and if I'm in-between success and marking it as done, I could end up processing the same work item twice. I might be a bit paranoid and this might be all fine but as I understand it there's no guarantee that that won't happen...
I know there have been similar questions on SO before, but none really ends with a definitive answer. This is a really common thing, yet the ASP.NET hosting environment is ill equipped to handle long-running work.
Please share your thoughts.
Have a look at NServiceBus
NServiceBus is an open source
communications framework for .NET with
built-in support for publish/subscribe
and long-running processes.
It is a technology built upon MSMQ, which means that your messages don't get lost, since they are persisted to disk. Nevertheless, the framework has impressive performance and an intuitive API.
John,
I agree that ASP.NET is not suitable for Async tasks as you have described them, nor should it be. It is designed as a web hosting platform, not a back of house processor.
We have had similar situations in the past and we have used a solution similar to what you have described. In summary, keep your WCF service under ASP.NET, use a "Queue" table with a Windows Service as the "QueueProcessor". The client should poll to see if work is done (or use messaging to notify the client).
We used a table that contained the process and its information (e.g. InvoicingRun). On that table was a status (Pending, Running, Completed, Failed). The client would submit a new InvoicingRun with a status of Pending. A Windows service (the processor) would poll the database for any runs in the Pending stage (you could also use SQL notifications so you don't need to poll). If a pending run was found, it would move it to Running, do the processing and then move it to Completed/Failed.
In the case where the process failed fatally (e.g. DB down, process killed), the run would be left in a Running state, and human intervention was required. If the process failed in a non-fatal way (exception, error), the run would be moved to Failed, and you could choose to retry or have human intervention.
If there were multiple processors, the first one to move it to a Running state got that job. You can use this method to prevent the job being run twice. An alternative is to do the select and then the update to Running under a transaction. Make sure to do either of these outside of any larger transaction. Sample (rough) SQL:
UPDATE InvoicingRun
SET Status = 2 -- Running
WHERE ID = 1
  AND Status = 1 -- Pending

IF @@ROWCOUNT = 0
    SELECT CAST(0 AS bit)
ELSE
    SELECT CAST(1 AS bit)
Rob
Use a simple background tasks / jobs framework like Hangfire and apply these best practice principals to the design of the rest of your solution:
Keep all actions as small as possible; to achieve this, you should:
Divide long-running jobs into batches and queue them (in a Hangfire queue or on a bus of another sort) - see the sketch after this list.
Make sure your small jobs (batched parts of long jobs) are idempotent (have all the context they need to run in any order). This way you don't have to use a queue which maintains a sequence, because then you can:
Parallelise the execution of jobs in your queue depending on how many nodes you have in your web server farm. You can even control how much load this places on your farm (as a trade-off against servicing web requests). This ensures that you complete the whole job (all batches) as fast and as efficiently as possible, without compromising your cluster's ability to service web clients.
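A hedged sketch of the first two points with Hangfire - the class and method names (ImportJobs, ProcessBatch) are illustrative, not from the original answer. The long job is split into small, idempotent batches, and each batch is enqueued as its own persisted job so Hangfire can retry and parallelise them across servers.

using System.Collections.Generic;
using System.Linq;
using Hangfire;

public class ImportJobs
{
    // Split one long-running import into many small, idempotent jobs.
    public void EnqueueImport(IReadOnlyList<int> allIds, int batchSize = 100)
    {
        for (int offset = 0; offset < allIds.Count; offset += batchSize)
        {
            int[] batch = allIds.Skip(offset).Take(batchSize).ToArray();

            // Each batch becomes a persisted Hangfire job; failed jobs are retried automatically.
            BackgroundJob.Enqueue<ImportJobs>(job => job.ProcessBatch(batch));
        }
    }

    public void ProcessBatch(int[] ids)
    {
        foreach (int id in ids)
        {
            // Re-check state before writing so the job stays safe to run more than once (idempotent).
            // ... load, transform, upsert ...
        }
    }
}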
Have you thought about using Workflow Foundation instead of your custom implementation? It also allows you to persist state. Tasks could be defined as workflows in this case.
Just some thoughts...
Michael

Can I use threads to carry out long-running jobs on IIS?

In an ASP.Net application, the user clicks a button on the webpage and this then instantiates an object on the server through the event handler and calls a method on the object.
The method goes off to an external system to do stuff and this could take a while. So, what I would like to do is run that method call in another thread so I can return control to the user with "Your request has been submitted".
I am reasonably happy to do this as fire-and-forget, though it would be even nicer if the user could keep polling the object for status.
What I don't know is if IIS allows my thread to keep running, even if the user session expires.
Imagine, the user fires the event and we instantiate the object on the server and fire the method in a new thread. The user is happy with the "Your request has been submitted" message and closes his browser. Eventually, this users session will time out on IIS, but the thread may still be running, doing work. Will IIS allow the thread to keep running or will it kill it and dispose of the object once the user session expires?
EDIT: From the answers and comments, I understand that the best way to do this is to move the long-running processing outside of IIS. Apart from everything else, this deals with the AppDomain recycling problem. In practice, I need to get version 1 off the ground in limited time and it has to work inside an existing framework, so I would like to avoid the service layer, hence the desire to just fire off the thread inside IIS. In practice, "long running" here will only be a few minutes and the concurrency on the website will be low, so it should be okay. But the next version definitely will need splitting into a separate service layer.
You can accomplish what you want, but it is typically a bad idea. Several ASP.NET blog and CMS engines take this approach, because they want to be installable on a shared hosting system and not take a dependency on a windows service that needs to be installed. Typically they kick off a long running thread in Global.asax when the app starts, and have that thread process queued up tasks.
In addition to reducing resources available to IIS/ASP.NET to process requests, you also have issues with the thread being killed when the AppDomain is recycled, and then you have to deal with persistence of the task while it is in-flight, as well as starting the work back up when the AppDomain comes back up.
Keep in mind that in many cases the AppDomain is recycled automatically at a default interval, as well as if you update the web.config, etc.
If you can handle the persistence and transactional aspects of your thread being killed at any time, then you can get around the AppDomain recycling by having some external process that makes a request on your site at some interval - so that if the site is recycled you are guaranteed to have it start back up again automatically within X minutes.
Again, this is typically a bad idea.
EDIT: Here are some examples of this technique in action:
Community Server: Using Windows Services vs. Background Thread to Run Code at Scheduled Intervals
Creating a Background Thread When Website First Starts
EDIT (from the far distant future) - These days I would use Hangfire.
I disagree with the accepted answer.
Using a background thread (or a task, started with Task.Factory.StartNew) is fine in ASP.NET. As with all hosting environments, you may want to understand and cooperate with the facilities governing shutdown.
In ASP.NET, you can register work needing to stop gracefully on shutdown using the HostingEnvironment.RegisterObject method. See this article and the comments for a discussion.
(As Gerard points out in his comment, there's now also HostingEnvironment.QueueBackgroundWorkItem that calls down to RegisterObject to register a scheduler for the background item to work on. Overall the new method is nicer since it's task-based.)
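A minimal sketch of the QueueBackgroundWorkItem route (available from .NET 4.5.2). The work itself, SendEmailAsync, is an illustrative placeholder, not from the original answer.

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Web.Hosting;

public static class BackgroundWork
{
    // Call this from request-handling code; ASP.NET tracks the work and delays
    // AppDomain shutdown (up to a grace period) until it finishes or is cancelled.
    public static void QueueEmail(string address)
    {
        HostingEnvironment.QueueBackgroundWorkItem(ct => SendEmailAsync(address, ct));
    }

    private static async Task SendEmailAsync(string address, CancellationToken shutdownToken)
    {
        // Honor the token: it is signalled when ASP.NET is shutting down.
        await Task.Delay(TimeSpan.FromSeconds(1), shutdownToken);
        // ... actually send the email here ...
    }
}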
As for the general theme that you often hear of it being a bad idea, consider the alternative of deploying a windows service (or another kind of extra-process application):
No more trivial deployment with web deploy
Not deployable purely on Azure Websites
Depending on the nature of the background task, the processes will likely have to communicate. That means either some form of IPC or the service will have to access a common database.
Note also that some advanced scenarios might even need the background thread to be running in the same address space as the requests. I see the fact that ASP.NET can do this as a great advantage that has become possible through .NET.
You wouldn't want to use a thread from the IIS thread pool for this task because it would leave that thread unable to process future requests. You could look into Asynchronous Pages in ASP.NET 2.0, but that really wouldn't be the right answer, either. Instead, what it sounds like you would benefit from is looking into Microsoft Message Queuing. Essentially, you would add the task details to the queue and another background process (possibly a Windows Service) would be in charge of carrying out that task. But the bottom line is that the background process is completely isolated from IIS.
I would suggest using HangFire for such requirements. It's a nice fire-and-forget engine that runs in the background, supports different architectures, and is reliable because it is backed by persistent storage.
There is a good thread and sample code here: http://forums.asp.net/t/1534903.aspx?PageIndex=2
I've even toyed with the idea of calling a keep alive page on my website from the thread to help keep the app pool alive. Keep in mind if you are using this method that you need really good recovery handling, because the application could recycle at any time. As many have mentioned this is not the right approach if you have access to other service options, but for shared hosting this may be one of your only options.
To help keep the app pool alive, you could make a request to your own site while the thread is processing. This may help keep the app pool alive if your process runs a long time.
string tempStr = GetUrlPageSource("http://www.mysite.com/keepalive.aspx");
public static string GetUrlPageSource(string url)
{
    string returnString = "";
    try
    {
        Uri uri = new Uri(url);
        if (uri.Scheme == Uri.UriSchemeHttp)
        {
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
            CookieContainer cookieJar = new CookieContainer();
            req.CookieContainer = cookieJar;

            //set the request timeout to 60 seconds
            req.Timeout = 60000;
            req.UserAgent = "MyAgent";

            //we do not want to request a persistent connection
            req.KeepAlive = false;

            HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
            Stream stream = resp.GetResponseStream();
            StreamReader sr = new StreamReader(stream);
            returnString = sr.ReadToEnd();

            sr.Close();
            stream.Close();
            resp.Close();
        }
    }
    catch
    {
        returnString = "";
    }
    return returnString;
}
We started down this path, and it actually worked OK when our app was on one server. When we wanted to scale out to multiple machines (or use multiple w3wp processes in a web garden), we had to re-evaluate and look at how to manage a work queue, error handling, retries and the tricky problem of correctly locking to ensure only one server picks up the next item.
... we realized we are not in the business of writing background processing engines, so we looked for existing solutions and ended up using the awesome OSS project Hangfire.
Sergey Odinokov has created a real gem which is really easy to get started with, and allows you to swap out the backend of how work is persisted and queued. Hangfire uses background threads, but persists the jobs, handles retries and gives you visibility into the work queue. So Hangfire jobs are robust and survive all the vagaries of AppDomains being recycled, etc.
Its basic setup uses SQL Server as the storage, but you can swap that out for Redis or MSMQ when it's time to scale up. It also has an excellent UI for visualizing all the jobs and their status, plus it allows you to re-queue jobs.
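For reference, a minimal sketch of that basic setup in an OWIN Startup class - the connection string name "HangfireDb" and the dashboard defaults are illustrative.

using Hangfire;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        // Persist jobs in SQL Server; "HangfireDb" is an illustrative connection string name.
        GlobalConfiguration.Configuration.UseSqlServerStorage("HangfireDb");

        app.UseHangfireDashboard(); // job/queue visualization UI, served at /hangfire
        app.UseHangfireServer();    // process jobs on background threads inside this app
    }
}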
My point is that while it's entirely possible to do what you want in a background thread, there is a lot of work involved in making it scalable and robust. It's fine for simple workloads, but when things get more complex, I much prefer to use a purpose-built library rather than go through this effort.
For some more perspective on the options available, check out Scott Hanselman's blog, which covers a few options for handling background jobs in ASP.NET. (He gave Hangfire a glowing review.)
Also, as referenced by John, it's worth reading Phil Haack's blog on why the approach is problematic, and how to gracefully stop work on the thread when the AppDomain is unloaded.
Can you create a Windows service to do that task, and then use .NET Remoting from the web server to call the Windows service to do the action? If so, that is what I would do.
This would eliminate the need to rely on IIS, and limit some of its processing power.
If not then I would force the user to sit there while the process is done. That way you ensure it is completed and not killed by IIS.
There does seem to be one supported way of hosting long-running work in IIS. Workflow Services seem designed for this, especially in conjunction with Windows Server AppFabric. The design allows for application pool recycling by supporting automatic persistence and resumption of the long-running work.
You may run tasks in the background and they will complete even after the request ends. Don't let an uncaught exception be thrown, though. Normally you want to let your exceptions be thrown, but if an exception is thrown on a new thread, it will crash the IIS worker process (w3wp.exe), because you are no longer in the request's context. That will also then kill any other background tasks you have running, in addition to in-process, memory-backed sessions if you are using them. This would be hard to diagnose, which is why the practice is discouraged.
Just create a surrogate process to run the async tasks; it doesn't have to be a Windows service (although that is the more optimal approach in most cases). MSMQ is way overkill.

Resources