We have a long-running data transfer process that is just an ASP.NET page that is called and run. It can take up to a couple of hours to complete. It seems to work all right, but I was wondering what the more popular ways to handle a long process like this are. Do you create an application and run it through Windows Task Scheduler, or use a web service or a custom handler?
In a project with long-running tasks in a web application, I made a Windows service.
Whenever the user had to start a time-consuming task, IIS would hand the task to the service, which would return a token (a temporary name for the task) and do the work in the background. At any time, the user could see the status of his/her task, which would be either pending in the queue, processing, or completed. The service would run a fixed number of jobs in parallel and keep a queue for incoming tasks.
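The token-and-status scheme described above could be sketched roughly as follows. This is only an in-process illustration under my own assumptions: the names JobQueue, JobStatus, Submit and GetStatus are hypothetical, and a real Windows service would sit behind some inter-process channel and would probably persist its queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public enum JobStatus { Pending, Processing, Completed, Failed }

// Hypothetical in-process stand-in for the service-side job queue.
public sealed class JobQueue
{
    private readonly ConcurrentDictionary<Guid, JobStatus> _status =
        new ConcurrentDictionary<Guid, JobStatus>();
    private readonly SemaphoreSlim _slots;

    // maxParallel is the fixed number of jobs allowed to run at once.
    public JobQueue(int maxParallel)
    {
        _slots = new SemaphoreSlim(maxParallel);
    }

    // Accepts a job and returns its token immediately; the work runs in the background.
    public Guid Submit(Func<Task> work)
    {
        Guid token = Guid.NewGuid();
        _status[token] = JobStatus.Pending;
        Task.Run(() => RunAsync(token, work));
        return token;
    }

    // Returns null for an unknown token.
    public JobStatus? GetStatus(Guid token)
    {
        JobStatus status;
        return _status.TryGetValue(token, out status) ? status : (JobStatus?)null;
    }

    private async Task RunAsync(Guid token, Func<Task> work)
    {
        await _slots.WaitAsync(); // the job stays Pending while all slots are busy
        try
        {
            _status[token] = JobStatus.Processing;
            await work();
            _status[token] = JobStatus.Completed;
        }
        catch (Exception)
        {
            _status[token] = JobStatus.Failed;
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

The web side would keep the Guid returned by Submit and call GetStatus with it whenever the user asks how the task is doing.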
A Windows service is the typical solution. You do not want to use a web service or a custom handler, as both of those will fall prey to app pool recycling, which will kill your process.
Windows Workflow Foundation
What I find most appealing about WF is that workflows can be designed, without much complexity, to be persisted in SQL Server, so that if the server reboots in the middle of a process, the workflow can resume.
I use two types of processes depending on the needs of my BAs. For transfer processes that are run on demand and can be scheduled regularly, I typically write a WinForms application (this is a personal preference) that accepts command-line parameters, so I can schedule the job with parameters or run it on demand through an interactive window. I've written enough of them over the last few years that I have my own basic generic shell that I use to create new applications of this nature. For processes that must detect events (files appearing in folders, receiving CyberMation calls, or detecting SNMP traps), I prefer to use Windows services so that they are always available. That's a little trickier, simply because you have to be much more cautious about memory usage, leaks, recycling, security, etc. For me, a Windows application tends to run faster on long-duration jobs than the same work does inside an IIS process. I don't know if this is because the work is tied to an IIS thread or because its memory and security context are more limited. I've never investigated it.
I do know that .Net applications provide a lot of flexibility and management over resources, and with some standards and practice, they can be banged out fairly quickly and produce very positive results.
Related
I am working on ASP.NET project and yesterday I saw a piece of code that uses System.Threading.Thread to offload some tasks to a new thread. The thread runs a few SQL statements and logs the result.
Isn't it better to use another approach? For example to have a Windows Service that performs the SQL batch. Then the web page will just enqueue the batch (via WCF).
In general, what are the best practices for multithreading in ASP.NET? Are there justified usages of threads/TPL tasks/etc. in a web page?
My thought when using multi-threading in ASP.NET:
ASP.NET recycles the AppDomain for several reasons, for example when you change web.config, or periodically to guard against memory leaks. The problem is that you don't know exactly when a recycle will happen. A long-running thread is not suitable because when ASP.NET recycles, it takes your thread down with it. The right approach in this case is to run the long-running task in a background process fed by a queue, as you mention.
For short-running, fire-and-forget tasks, TPL or async/await is the most appropriate choice because it does not block a thread-pool thread that could otherwise be serving HTTP requests.
In my opinion this should be solved by raising some kind of flag in the database and a Windows service that periodically checks the flag and starts the job. If the job is too frequent a dedicated queue solution should be used (MSMQ, RabbitMQ, etc.) to avoid overloading the database or the table growing too fast. I don't think communicating directly with the Windows service via WCF or anything else is a good idea because this may result in dropped messages.
That being said, sometimes a project needs to run on shared hosting and cannot set up a dedicated Windows service. In this case a thread is acceptable as a workaround that should be removed as soon as the project grows enough to have its own server.
I believe all other threading in ASP.NET is a sign of a problem, except for using Tasks to represent async operations, or the extremely rare case where you want to perform a computation in parallel in a web project but the project has very few concurrent users (fewer concurrent users than the number of cores).
Why are Tasks useful in ASP.NET?
First reason to use Tasks for async operations is that as of .NET 4.5 async APIs return Tasks :)
Async operations (not to be confused with parallel computations) may be web service calls, database calls, etc. They may be useful for two things:
Fire several of them at once and your job will take a time equal to the longest operation. If you fire them in sequential (non-async) fashion they will take time equal to the sum of the times of each operation which is obviously more.
They can improve scalability by releasing the thread executing the page, Node.js style. ASP.NET has supported this since forever, but in version 4.5 it is really easy to use. I'll go as far as claiming that it is easier than Node.js because of async/await. Releasing the thread is important because you may deplete the threads in the pool by having them wait. The result is that your website becomes slow once there are a certain number of users, even though CPU usage is only around 30%, simply because new requests are waiting in the queue. If you increase the number of threads in the thread pool, you pay the price of constant context switching by the OS. At a certain point you will get 100% CPU usage, but 40% of it will be spent context switching. You will increase throughput, but with diminishing returns. A lot of threads also increases the memory footprint.
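To illustrate the first point (firing several async operations at once), here is a minimal sketch using Task.WhenAll. FakeCallAsync is a hypothetical stand-in for a real web service or database call, and the delays are made-up numbers.

```csharp
using System;
using System.Threading.Tasks;

public static class ConcurrentCalls
{
    // Hypothetical stand-in for an async I/O call (web service, database, etc.).
    public static async Task<string> FakeCallAsync(string name, int milliseconds)
    {
        await Task.Delay(milliseconds);
        return name;
    }

    public static async Task<string[]> LoadAllAsync()
    {
        // All three operations are started before any of them is awaited.
        Task<string> orders = FakeCallAsync("orders", 300);
        Task<string> prices = FakeCallAsync("prices", 200);
        Task<string> stock = FakeCallAsync("stock", 100);

        // Completes when the slowest operation completes; results keep the argument order.
        return await Task.WhenAll(orders, prices, stock);
    }
}
```

With these delays the whole call should take roughly as long as the slowest operation (about 300 ms) rather than the sum (about 600 ms), and no thread is blocked while the operations are in flight.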
A client wants an ASP.NET page with a button that fires off a database update from an external source with hundreds of records. This process takes a long time. He also wants status updates as the process runs, like "processing 10 out of 1000 records". After reading various articles, I'm thinking of putting the database update code in a Windows service. I've never worked with Windows services before, and I can't find many tutorials on how to fire off a Windows service and poll it from an ASP.NET page. My questions are: is this the best way to handle this process? And does anyone have any examples of how they've accomplished this?
There are a few ways to approach this.
You're right in that executing a long-running task within the Web's worker process doesn't usually end well: it ties up resources, the app pool can get recycled, etc. In most of my projects of any complexity, I usually end up with 4 pieces: the database, a DLL with my model, a "Worker" that is a Windows service, and an ASP.NET Web site.
The "Worker" is a Windows service that is always running and uses Quartz.net to execute scheduled tasks using the same model that the Web site uses. These can be all sorts of periodic tasks that seem to crop up when maintaining a Web site of any complexity: VacuumExpiredPickTicketsJob, BackupAndFtpDatabaseJob, SendBackorderReminderEmailsJob, etc.
Writing a Windows service is not difficult in C# (there is a built-in template in Visual Studio, but you pretty much inherit from ServiceBase and you're off to the races), and libraries like TopShelf make it even easier to deploy them.
What is left is triggering the update from the Web site and communicating the results back to the user. This can be as simple or as complicated as you want it to be. If this is something that has to scale up to lots of users, you might use something like MSMQ to queue up update commands to the Windows service, and the Windows service would respond to that queue. I get the impression that that is probably overkill here.
For a handful of users, you could override your service's OnCustomCommand(int command) method to be the trigger. Your Web site would then use ExecuteCommand() of the ServiceController class to get the process started. Your Web site and service would agree on the parameter value that means "do that update thing," let's say 142 (since it has to be a number between 128 and 255 for reasons of history).
As for communicating progress back to the client, it's probably easiest to just have the Web page use a timer and an AJAX call to poll for updated progress data. You can get fancy with new stuff like WebSockets (bleeding edge stuff as I write this) and long polling, but regular polling will simply work for something that doesn't need to scale.
Hope this helps!
In addition to Nicholas' thorough answer, another option is to deploy your back-end processes as command-line scripts and schedule them to run through Windows' built-in Task Scheduler, which has improved quite a bit in Windows Server 2008+. Or you can use any of a host of other task scheduler applications.
I find the command line approach to be easier for MIS staff to understand and configure, and to migrate to new servers, versus standard Windows services.
I have a long-running process in my MVC application (C#).
It's building data for many reports.
Some of the clients could take several minutes or longer to calculate. Is running the process in a separate thread the best option?
Is there another way to allow the process to run, while allowing the user to still use the rest of the site?
If threading is the best solution, any good sites or stackoverflow threads to look at on how to do this?
When I've had cases like those, I usually would build a service to asynchronously process requests, and return a handle that I could use to check on its status in a database. IMHO, splitting it off as a thread in the web application seems like you'd be trying to shove a square peg into a round hole.
I've used two methods to solve this. If the work is guaranteed to not be TOO long running, I've kicked off a thread to do the work and return immediately to the user. When we couldn't make that guarantee, we used a queue (we happened to use MSMQ) for executing long running tasks. This processing was done on a different server apart from IIS. A benefit of this is that we built in a wait and retry on failure mechanism. So besides it handling long running tasks, we also used it for anything that might fail in a way that was inconvenient to handle in our MVC app. The main example of this is sending an email. Rather than do that in the MVC app we would just toss an email task on the queue. We used the Command Pattern for the task objects placed on the queue. Once we had that mechanism in place, we stopped using the technique of spawning a thread from our MVC code.
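The "command objects on a queue, with wait and retry on failure" idea can be sketched like this. The names IQueuedCommand, DelegateCommand and RetryingWorker are hypothetical, and the queue itself (MSMQ in the answer above) is left out; this only shows what the consumer might do with one dequeued command.

```csharp
using System;
using System.Threading;

// Command pattern: each queued task knows how to execute itself.
public interface IQueuedCommand
{
    void Execute();
}

// Hypothetical helper that wraps a delegate so the sketch stays short.
public sealed class DelegateCommand : IQueuedCommand
{
    private readonly Action _action;

    public DelegateCommand(Action action)
    {
        _action = action;
    }

    public void Execute()
    {
        _action();
    }
}

public static class RetryingWorker
{
    // Runs the command, waiting and retrying when it throws.
    // Returns true on success, false once maxAttempts is exhausted.
    public static bool Run(IQueuedCommand command, int maxAttempts, TimeSpan wait)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                command.Execute();
                return true;
            }
            catch (Exception)
            {
                // Wait before the next attempt, but not after the last one.
                if (attempt < maxAttempts) Thread.Sleep(wait);
            }
        }
        return false;
    }
}
```

A "send email" task would be one more IQueuedCommand implementation; when Run returns false the worker could move the message to an error queue or log it for a human to look at.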
I know there's a bunch of APIs out there that do this, but I also know that the hosting environment (being ASP.NET) puts restrictions on what you can reliably do in a separate thread.
I could be completely wrong, so please correct me if I am, this is however what I think I know.
A request typically times out after about two minutes (110 seconds by default; this is configurable via executionTimeout), and eventually the ASP.NET runtime will kill a request that's taking too long to complete.
The hosting environment, typically IIS, employs process recycling and can at any point decide to recycle your app. When this happens all threads are aborted and the app restarts. I'm however not sure how aggressive it is, it would be kind of stupid to assume that it would abort a normal ongoing HTTP request but I would expect it to abort a thread because it doesn't know anything about the unit of work of a thread.
If you had to create a programming model that could easily and reliably host a long-running task, one that might in theory have to run for days, how would you accomplish this from within an ASP.NET application?
The following are my thoughts on the issue:
I've been thinking along the lines of hosting a WCF service in a Windows service and talking to the service through WCF. This is however not very practical, because the only reason I would choose to do so is to send tasks (units of work) from several different web apps. I'd then eventually ask the service for status updates and act accordingly. My biggest concern with this is that it would NOT be a particularly great experience if I had to deploy every task to the service for it to be able to execute some instructions. There's also the issue of input: how would I feed this service with data if I had a large data set and needed to chew through it?
What I typically do right now is this
SELECT TOP 10 *
FROM WorkItem WITH (ROWLOCK, UPDLOCK, READPAST)
WHERE WorkCompleted IS NULL
It allows me to use a SQL Server database as a work queue and periodically poll the database with this query for work. If the work item completed with success, I mark it as done and proceed until there's nothing more to do. What I don't like is that I could theoretically be interrupted at any point and if I'm in-between success and marking it as done, I could end up processing the same work item twice. I might be a bit paranoid and this might be all fine but as I understand it there's no guarantee that that won't happen...
I know there have been similar questions on SO before, but none really has a definitive answer. This is a really common thing, yet the ASP.NET hosting environment is ill-equipped to handle long-running work.
Please share your thoughts.
Have a look at NServiceBus
NServiceBus is an open source communications framework for .NET with built-in support for publish/subscribe and long-running processes.
It is a technology built upon MSMQ, which means that your messages don't get lost, since they are persisted to disk. Nevertheless, the framework has impressive performance and an intuitive API.
John,
I agree that ASP.NET is not suitable for Async tasks as you have described them, nor should it be. It is designed as a web hosting platform, not a back of house processor.
We have had similar situations in the past and we have used a solution similar to what you have described. In summary, keep your WCF service under ASP.NET, use a "Queue" table with a Windows Service as the "QueueProcessor". The client should poll to see if work is done (or use messaging to notify the client).
We used a table that contained the process and its information (e.g. InvoicingRun). On that table was a status (Pending, Running, Completed, Failed). The client would submit a new InvoicingRun with a status of Pending. A Windows service (the processor) would poll the database to get any runs that were in the Pending state (you could also use SQL Notification so you don't need to poll). If a pending run was found, the service would move it to Running, do the processing, and then move it to Completed or Failed.
In the case where the process failed fatally (e.g. DB down, process killed), the run would be left in the Running state, and human intervention was required. If the process failed in a non-fatal way (exception, error), the run would be moved to Failed, and you could choose to retry or have human intervention.
If there were multiple processors, the first one to move a run to the Running state got that job. You can use this method to prevent the job from being run twice. An alternative is to do the select and then the update to Running under a transaction. Make sure either of these runs outside any larger transaction. Sample (rough) SQL:
UPDATE InvoicingRun
SET Status = 2 -- Running
WHERE ID = 1
AND Status = 1 -- Pending
IF @@ROWCOUNT = 0
SELECT Cast(0 as bit)
ELSE
SELECT Cast(1 as bit)
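The "first one to move it to Running gets the job" rule is a compare-and-set on the status column. As an in-memory analogy only (not database code), the same idea can be sketched with Interlocked.CompareExchange. The class below is hypothetical; Pending = 1 and Running = 2 follow the SQL above, while the Completed and Failed values are my own assumption.

```csharp
using System.Threading;

public sealed class InvoicingRun
{
    // 1 and 2 match the SQL sample; 3 and 4 are assumed values.
    public const int Pending = 1, Running = 2, Completed = 3, Failed = 4;

    private int _status = Pending;

    public int Status
    {
        get { return Volatile.Read(ref _status); }
    }

    // Atomically moves Pending -> Running. Only one caller can ever get true,
    // mirroring "UPDATE ... WHERE Status = 1" followed by a row-count check.
    public bool TryClaim()
    {
        return Interlocked.CompareExchange(ref _status, Running, Pending) == Pending;
    }

    // Moves Running -> Completed or Failed; returns false if the run was not Running.
    public bool Finish(bool succeeded)
    {
        int next = succeeded ? Completed : Failed;
        return Interlocked.CompareExchange(ref _status, next, Running) == Running;
    }
}
```

A processor that gets false back from TryClaim simply skips the run, which is what the 0/1 bit returned by the SQL tells the caller.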
Rob
Use a simple background tasks / jobs framework like Hangfire and apply these best-practice principles to the design of the rest of your solution:
Keep all actions as small as possible; to achieve this, you should-
Divide long running jobs into batches and queue them (in a Hangfire queue or on a bus of another sort)
Make sure your small jobs (batched parts of long jobs) are idempotent and self-contained (safe to re-run, with all the context they need to run in any order). This way you don't have to use a queue which maintains a sequence; because then you can
Parallelise the execution of jobs in your queue depending on how many nodes you have in your web server farm. You can even control how much load this subjects your farm to (as a trade off to servicing web requests). This ensures that you complete the whole job (all batches) as fast and as efficiently as possible, while not compromising your cluster from servicing web clients.
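A minimal sketch of the batching and bounded-parallelism idea in the list above, using only the base class library. It does not use Hangfire's API; Batching, Split and RunAll are hypothetical names, and in a real setup each batch would be handed to whatever queue or job framework you use rather than to Parallel.ForEach.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Batching
{
    // Splits items into consecutive batches of at most batchSize elements.
    public static List<List<T>> Split<T>(IList<T> items, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        var batches = new List<List<T>>();
        for (int start = 0; start < items.Count; start += batchSize)
        {
            int count = Math.Min(batchSize, items.Count - start);
            var batch = new List<T>(count);
            for (int i = 0; i < count; i++) batch.Add(items[start + i]);
            batches.Add(batch);
        }
        return batches;
    }

    // Runs the handler once per batch, at most maxParallel batches at a time.
    // Batches may run in any order, so the handler must not depend on sequence.
    public static void RunAll<T>(List<List<T>> batches, int maxParallel, Action<List<T>> handler)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(batches, options, handler);
    }
}
```

Lowering maxParallel is the knob that trades job throughput against capacity left over for serving web requests.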
Have you thought about using Workflow Foundation instead of your custom implementation? It also allows you to persist state. Tasks could be defined as workflows in this case.
Just some thoughts...
Michael
These two may look like they have no correlation but bear with me!
In a previous version of the software I develop/maintain, there was a web app sitting on top of a web service. There was a scheduled task that ran every hour and called one of the web methods to carry out some tasks.
In the new architecture we now have a web application project with two class libraries for the Business Layer and Resource Access Layer.
However I still need the same functionality in this version, and I am currently trying to design a suitable solution.
I was thinking it may be an idea to have the hourly task running on a separate thread of the web application that sleeps for an hour, wakes up and carries out the task. Or would it be easier to expose some web methods in a similar way to the old application?
If anyone also has any good examples of ASP.NET threading, I would appreciate having a look at them.
The problem you could have is that the thread will be running in the application pool process, which may be closed down for various reasons. For example, overnight with no activity the process could legitimately shut down. This is fine for requests: any new request will simply spin up a new worker process. However, if you require something to run every hour, it won't happen while the pool is shut down.
Additionally, the application pool may be a web garden where there are multiple processes. You then need to consider how you ensure you don't have multiple instances of this task running.
Hence it would be better to continue with the scheduled task approach, which posts a request to the web server to initiate the activity.