Requirement: I need to create a background worker/task that will get data from an external source (a message queue) at certain intervals (e.g. every 10 seconds) and update a database. It needs to run non-stop, 24 hours a day. An ASP.NET application is placing the data on the message queue.
Possible solutions:
Windows service with timer
Pros: Takes load away from web server
Cons: Separate deployment overhead, Not load balanced
Use one of the methods described here: background task
Pros: No separation deployment required, Can be load balanced - if one server goes down another can pick it up
Cons: Overhead on web server (however, in my case with max 100 concurrent users and seeing the web server resources are under-utilized, I do not think it will be an issue)
Question: What would be a recommended solution and why?
I am looking for a .net based solution.
You shouldn't go with the second option unless there's a really good reason for it. Decoupling your background jobs from your web application brings a number of advantages:
Scalability - It's up to you where to deploy the service. It can share the same server with the web application or you can easily move it to a different server if you see the load going up.
Robustness - If there's a critical bug in either the web application or the service this won't bring the other component down.
Maintenance - Yes, there's a slight overhead as you will have to adjust your deployment process, but it's as simple as copying all binaries from the output folder, and you only have to do it once. On the other hand, you won't have to redeploy the whole application (and thus bring it down for some time) just to fix a small bug in the service.
etc.
Though I recommend going with the first option, I don't like the idea of a timer. There's a much simpler and more robust solution. I would implement a WCF service with MSMQ binding, as it provides you with a lot of nice features out of the box:
You won't have to implement polling logic. On startup the service will connect to the queue and sit waiting for new messages.
You can easily use transactions to process queue messages. For example, if there's something wrong with the database and you can't write to it, the message being processed at that moment won't get lost; it will go back to the queue to be processed later.
You can deploy as many services listening to the same queue as you wish to ensure scalability and availability. WCF will make sure that the same queue message is not processed by more than one service; that is, if a message is being processed by service A, service B will skip it and get the next available message.
Many other features you can learn about here.
I suggest reading this article for a WCF + MSMQ service sample to see how simple it is to implement one and use the features I mentioned above. As soon as you are done with the WCF service you can easily host it in a Windows service.
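To make the idea concrete, here is a minimal sketch of such a service, assuming a private queue named "dataQueue" and a hypothetical DataItem message type (the names and the Database.Save call are illustrative, not from the question):

using System;
using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class DataItem
{
    [DataMember] public string Payload { get; set; }
}

[ServiceContract]
public interface IDataUpdateService
{
    [OperationContract(IsOneWay = true)]          // MSMQ binding requires one-way operations
    void SubmitData(DataItem item);
}

public class DataUpdateService : IDataUpdateService
{
    // The queue read and the database write share one transaction,
    // so a failed write puts the message back on the queue.
    [OperationBehavior(TransactionScopeRequired = true, TransactionAutoComplete = true)]
    public void SubmitData(DataItem item)
    {
        Database.Save(item);                      // hypothetical data-access call
    }
}

// Hosting, e.g. inside the Windows service's OnStart:
var host = new ServiceHost(typeof(DataUpdateService));
host.AddServiceEndpoint(typeof(IDataUpdateService),
    new NetMsmqBinding(NetMsmqSecurityMode.None),
    "net.msmq://localhost/private/dataQueue");
host.Open();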
Hope it helps!
Related
Our website has a long-running calculation process which keeps the client waiting for a few minutes until it's finished. We've decided we need a design change and want to farm the processing out to a Windows service or a WCF service, presenting the client with another page while we do all the calculations.
What's the best way of implementing the service though?
We've looked at background worker processes, but it looks like these are problematic because IIS can periodically shut down threads.
It seems the best thing to use is either a Windows service or a WCF service. Does anyone have a view on which is better for this purpose?
If we host the service on another machine, would it have to be a WCF service?
It looks like it's difficult to have the service (whatever type it is) to communicate back to the website - maybe instead the service can update its results to a database, and the website polls that for the required results later on.
It's an open ended question I know, but does anyone have any ideas?
thanks
I don't think that the true gain in terms of performance will come from the design change.
If I were to choose between a Windows service and WCF I would go with the Windows service, because I would be able to set processor affinity and priorities as I want. However, I would have to implement the logic for serving multiple clients at the same time (which in a WCF service approach is handled by IIS).
So in terms of performance, if you use the .NET Framework for both the WCF service and the Windows service, the difference will not be major. The Windows service would be more "controllable"; WCF would be more straightforward and with no big performance penalties.
For these types of tasks I would focus on highly optimizing the single-thread calculation. If you have a complex calculation, can it be written in native code (C or C++)? You could build a DLL that is highly optimized and is used by either the Windows service or the WCF service. This approach lets you select the best compiler options and make the best use of your machine resources. Also, nothing stops you from creating multiple threads inside the DLL function.
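For reference, calling into such a native DLL from the .NET service is just a P/Invoke declaration; the DLL name and exported function below are made up for illustration:

using System.Runtime.InteropServices;

public static class CalcEngine
{
    // "CalcEngine.dll" would export this function with C linkage
    // (extern "C" __declspec(dllexport)) and the cdecl calling convention.
    [DllImport("CalcEngine.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern double RunCalculation(double[] inputs, int length);
}

// Either service can then call:
// double result = CalcEngine.RunCalculation(data, data.Length);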
The link between the website and the service can be ensured in both cases: through sockets for the Windows service (extra code for creating the protocol) or directly through SOAP for WCF. If you push the results into a database, the difficulty is letting the website know that the data is there (and knowing which particular user session it belongs to).
So that's what I would do.
Hope it helps.
Cheers!
One way to do this is:
The client submits the calculation request using a call to a WCF service (which can be hosted in IIS)
The calculation request is stored in a database with a unique ID
The ID is returned to the client
A Windows service (or several, on several different machines) polls the database for new requests (a rough sketch of this poller follows these steps)
The Windows service performs the calculation and stores the result in a result table with the ID
The client polls the result table (using a WCF service) with the ID
When the calculation is finished the result is returned to the client
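A rough sketch of the poller inside the Windows service; the table and column names (CalcRequest, CalcResult) and the Calculate call are made up for illustration:

using System;
using System.Data.SqlClient;

public void ProcessPendingRequests(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // pick up one new request
        int id; string payload;
        using (var pick = new SqlCommand(
            "SELECT TOP 1 Id, Payload FROM CalcRequest WHERE Status = 'New'", conn))
        using (var reader = pick.ExecuteReader())
        {
            if (!reader.Read()) return;            // nothing to do this cycle
            id = reader.GetInt32(0);
            payload = reader.GetString(1);
        }

        string result = Calculate(payload);        // the long-running work

        // store the result under the same ID so the client can poll for it
        using (var save = new SqlCommand(
            "UPDATE CalcRequest SET Status = 'Done' WHERE Id = @id; " +
            "INSERT INTO CalcResult (RequestId, Result) VALUES (@id, @r)", conn))
        {
            save.Parameters.AddWithValue("@id", id);
            save.Parameters.AddWithValue("@r", result);
            save.ExecuteNonQuery();
        }
    }
}

The service's timer (or loop) would call this every few seconds.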
I'm working on a business problem which requires importing files containing thousands of records. Each record has to be registered in a workflow as an individual record which goes through its own workflow.
The WF4 Corporate Purchase Process example has a good solution: in the first step it creates bookmarks for all the required record IDs, so the workflow can be resumed with the rest of the actions for each individual record/ID.
I would like to know how to implement same thing using Workflow services as I could get the benefits of AppFabric for my workflows.
Is there any other solution to handle a batch of records/IDs? Otherwise the workflow service has to be called thousands of times just to register every record in a workflow instance, which is not a good solution.
I would like to know how to implement same thing using Workflow services as I could get the benefits of AppFabric for my workflows.
This is pretty straightforward. You're going to have one workflow that reads the file and loops through the results using the existing looping activities. Then, inside the loop, you'll start the workflow that each record needs (the "Service") by calling its endpoint with a Send activity.
Now, as for the workflow that is the Service, you're going to have a Receive activity at the top of the workflow with CanCreateInstance set to true. Then everything after the Receive is no different than any other workflow. You may consider having a Send activity right after the Receive just to let the caller know that the Service has been started. But that's not a requirement -- the Receive is required because it forces WF to build the workflow to use the WorkflowServiceHost.
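If it helps to see it in code rather than in the designer, a bare-bones version of that service workflow might look like the sketch below (the contract, operation, and address names are made up):

using System;
using System.Activities.Statements;
using System.ServiceModel.Activities;

var receive = new Receive
{
    ServiceContractName = "IRecordService",
    OperationName = "RegisterRecord",
    CanCreateInstance = true               // a new workflow instance per incoming call
};

var body = new Sequence
{
    Activities =
    {
        receive,
        new SendReply { Request = receive },   // optional: acknowledge the caller right away
        // ... the per-record activities continue here ...
    }
};

var host = new WorkflowServiceHost(body, new Uri("http://localhost:8733/RecordService"));
host.Open();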
Is there any other solution to handle a batch of records/IDs? Otherwise the workflow service has to be called thousands of times just to register every record in a workflow instance, which is not a good solution.
Are you indicating that for a web server to receive thousands of requests is not a good solution? Consider the fact that an IIS server can handle roughly 25-50 requests, per instant in time, per core. Now consider the fact that the loop that's loading the workflows isn't going to average more than maybe 5 in that instant of time, and probably more like 1 or 2.
I don't think the web server is going to be your issue. I've started up literally 10,000's of workflows on a server via a loop just like the one you're going to build and it didn't break a sweat.
One way would be to use WCF's MSMQ binding to launch your workflows. Requests can come in normally through HTTP, and WCF would route them to MSMQ and process the load. You can throttle how many workflow instances are used through the MSMQ binding + IIS settings.
Download this word document that describes setting up a workflow application with WCF and MSMQ: http://www.microsoft.com/en-us/download/details.aspx?id=21245
In the spirit of doing the simplest thing that could work, you can bring the sub-workflow in as an activity in the main workflow and use a parallel ForEach to execute the branch for each input from your file. No extra invoking is required, and the tooling supports this out of the box because all workflows are activities. Hosting the main process in a service so you can avoid contention with the rest of your IIS users (real people though they may be) might be a good idea.
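Built in code (it is the same shape in the designer), that parallel ForEach might look roughly like this; ProcessRecord stands in for the sub-workflow brought in as an activity and is not a real type:

using System;
using System.Activities;
using System.Activities.Statements;
using System.Collections.Generic;

var records = new Variable<IEnumerable<string>>("records");   // rows read from the file
var record = new DelegateInArgument<string>();

var loop = new ParallelForEach<string>
{
    Values = new InArgument<IEnumerable<string>>(records),
    Body = new ActivityAction<string>
    {
        Argument = record,
        Handler = new ProcessRecord                // hypothetical sub-workflow activity
        {
            Record = new InArgument<string>(record)
        }
    }
};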
I do agree that calling IIS or a WCF service 1000's of times is not a problem though, unless you want to do it in a few seconds!
It is important to remember that one of the good things about workflow is that it has fairly low overhead (compared to other workflow products) so you should be more concerned about what your workflow does than just the idea of launching lots of instances. The idea of batches like your example is very common.
What are your most successful ways of running a long process, say 2 hours, in ASP.NET, and returning progress information to the client?
I've heard that creating a Windows service, an HTTP handler, and remoting can all be successful.
Just a suggestion...
If you have logic that you are trying to utilize already in ASP.NET... you could make an external app (Windows service, console app, etc.) that calls a web service on your ASP.NET site.
For example, I had a similar problem where the code I needed was in ASP.NET and I needed to update about 3,000 clients using this code. It started timing out, so I exposed the code through a web service. Then, instead of trying to run all 3,000 clients through ASP.NET at once, I used a console app, run by a nightly SQL Server job, that called the web service once for each client. This way all the time-consuming processing was handled by the console app, which doesn't have the time-out issue, but the code we had already written in ASP.NET did not have to be recreated. In the end, slightly modifying the design of my existing architecture allowed me to easily get around this problem.
It really depends on the environment and constraints you have to deal with...Hope this helps.
There are two ways that I have handled this. First, you can simply run the process and let the client time out. This has two drawbacks: the UI isn't in sync, and you are tying up an IIS thread for non-HTML purposes (I did this for a process that used to return quickly enough but grew beyond time-out limits).
The better way to handle this is to write a "Service" application that handles the request as passed through a database table (put the details of the request there). Then you can create a window that gives the user a "window" into ongoing progress on the task (e.g. how many records have been processed or emails sent). This status window can either have a link to permit the user to refresh or you can automate the refresh using Ajax callbacks on a timer.
This isn't directly applicable but I wrote code that will let you run processes similar to "scheduled tasks" inside of ASP.NET without needing to use windows services or any type of cron jobs.
Scheduled Tasks in ASP.NET!
I very much prefer a WCF service to scheduled tasks. You might (off the top of my head) pass an address to the WCF service as a sort of 'callback' that the service can call with progress reports as it works.
I'd shy away from scheduled tasks... too coarse-grained.
I know there's a bunch of APIs out there that do this, but I also know that the hosting environment (being ASP.NET) puts restrictions on what you can reliably do in a separate thread.
I could be completely wrong, so please correct me if I am, this is however what I think I know.
A request typically times out after 120 seconds (this is configurable), but eventually the ASP.NET runtime will kill a request that's taking too long to complete.
The hosting environment, typically IIS, employs process recycling and can at any point decide to recycle your app. When this happens all threads are aborted and the app restarts. I'm however not sure how aggressive it is; it would be kind of stupid to assume that it would abort a normal ongoing HTTP request, but I would expect it to abort a background thread because it doesn't know anything about the unit of work of a thread.
If you had to create a programming model that could easily and reliably run a long-running task, one that might have to run for days, how would you accomplish this from within an ASP.NET application?
The following are my thoughts on the issue:
I've been thinking along the lines of hosting a WCF service in a Win32 service and talking to the service through WCF. This is however not very practical, because the only reason I would choose to do so is to send tasks (units of work) from several different web apps. I'd then eventually ask the service for status updates and act accordingly. My biggest concern with this is that it would NOT be a particularly great experience if I had to deploy every task to the service for it to be able to execute some instructions. There's also the issue of input: how would I feed this service with data if I had a large data set and needed to chew through it?
What I typically do right now is this:
-- grab up to 10 unclaimed work items, skipping rows other workers have locked
SELECT TOP 10 *
FROM WorkItem WITH (ROWLOCK, UPDLOCK, READPAST)
WHERE WorkCompleted IS NULL
It allows me to use a SQL Server database as a work queue and periodically poll the database with this query for work. If the work item completed with success, I mark it as done and proceed until there's nothing more to do. What I don't like is that I could theoretically be interrupted at any point and if I'm in-between success and marking it as done, I could end up processing the same work item twice. I might be a bit paranoid and this might be all fine but as I understand it there's no guarantee that that won't happen...
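One way to address that "processed twice" worry is to keep the claim and the completion inside a single transaction: because of UPDLOCK + READPAST other workers skip the locked row, and if the process dies mid-way the transaction rolls back and the item simply becomes available again (so the work itself should be idempotent or enlisted in the same transaction). A sketch, with the Id/Payload columns and the DoWork call invented for illustration:

using System;
using System.Data.SqlClient;
using System.Transactions;

using (var scope = new TransactionScope())
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();   // enlists in the ambient transaction

    int? id = null; string payload = null;
    using (var claim = new SqlCommand(
        @"SELECT TOP 1 Id, Payload
          FROM WorkItem WITH (ROWLOCK, UPDLOCK, READPAST)
          WHERE WorkCompleted IS NULL", conn))
    using (var reader = claim.ExecuteReader())
    {
        if (reader.Read()) { id = reader.GetInt32(0); payload = reader.GetString(1); }
    }

    if (id.HasValue)
    {
        DoWork(payload);                             // hypothetical work-item handler

        var done = new SqlCommand(
            "UPDATE WorkItem SET WorkCompleted = GETUTCDATE() WHERE Id = @id", conn);
        done.Parameters.AddWithValue("@id", id.Value);
        done.ExecuteNonQuery();
    }

    scope.Complete();
}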
I know there have been similar questions on SO before, but none really ends with a definitive answer. This is a really common thing, yet the ASP.NET hosting environment is ill equipped to handle long-running work.
Please share your thoughts.
Have a look at NServiceBus
NServiceBus is an open source communications framework for .NET with built-in support for publish/subscribe and long-running processes.
It is a technology built upon MSMQ, which means that your messages don't get lost, since they are persisted to disk. Nevertheless, the framework has impressive performance and an intuitive API.
John,
I agree that ASP.NET is not suitable for async tasks as you have described them, nor should it be. It is designed as a web hosting platform, not a back-of-house processor.
We have had similar situations in the past and we have used a solution similar to what you have described. In summary, keep your WCF service under ASP.NET, use a "Queue" table with a Windows Service as the "QueueProcessor". The client should poll to see if work is done (or use messaging to notify the client).
We used a table that contained the process and its information (e.g. InvoicingRun). On that table was a status (Pending, Running, Completed, Failed). The client would submit a new InvoicingRun with a status of Pending. A Windows service (the processor) would poll the database for any runs in the pending stage (you could also use SQL notifications so you don't need to poll). If a pending run was found, it would move it to Running, do the processing, and then move it to Completed/Failed.
In the case where the process failed fatally (e.g. DB down, process killed), the run would be left in a Running state, and human intervention was required. If the process failed in a non-fatal way (exception, error), the run would be moved to Failed, and you could choose to retry or have human intervention.
If there were multiple processors, the first one to move a run to the Running state got that job. You can use this method to prevent the job being run twice. An alternative is to do the select and then the update to Running under a transaction. Make sure you do either of these outside of any larger transaction. Sample (rough) SQL:
UPDATE InvoicingRun
SET Status = 2 -- Running
WHERE ID = 1
AND Status = 1 -- Pending
IF @@ROWCOUNT = 0
SELECT Cast(0 as bit)
ELSE
SELECT Cast(1 as bit)
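From the processor's side, a sketch of running that claim and checking the result (the connection string and run ID are left as assumptions):

using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    var claim = new SqlCommand(
        @"UPDATE InvoicingRun SET Status = 2      -- Running
          WHERE ID = @id AND Status = 1;          -- Pending
          SELECT CAST(CASE WHEN @@ROWCOUNT = 0 THEN 0 ELSE 1 END AS bit)", conn);
    claim.Parameters.AddWithValue("@id", runId);

    bool claimed = (bool)claim.ExecuteScalar();
    if (claimed)
    {
        // this processor owns the run; no other processor will pick it up
    }
}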
Rob
Use a simple background tasks/jobs framework like Hangfire and apply these best-practice principles to the design of the rest of your solution:
Keep all actions as small as possible; to achieve this, you should:
Divide long-running jobs into batches and queue them (in a Hangfire queue or on a bus of another sort; see the sketch after this list)
Make sure your small jobs (batched parts of long jobs) are idempotent (have all the context they need to run in any order). This way you don't have to use a queue which maintains a sequence; because then you can
Parallelise the execution of jobs in your queue depending on how many nodes you have in your web server farm. You can even control how much load this subjects your farm to (as a trade off to servicing web requests). This ensures that you complete the whole job (all batches) as fast and as efficiently as possible, while not compromising your cluster from servicing web clients.
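A small sketch of what that batching looks like with Hangfire; the ImportJobs class and its arguments are made up, and Hangfire persists the queued jobs (e.g. in SQL Server) so any node in the farm can pick them up:

using System;
using Hangfire;

public class ImportJobs
{
    // one small, idempotent unit of work: it carries everything it needs in its arguments
    public void ProcessBatch(int importId, int firstRecord, int lastRecord)
    {
        // ... load just these records and process them ...
    }
}

// split the long job into batches and queue each one
for (int start = 0; start < totalRecords; start += batchSize)
{
    int end = Math.Min(start + batchSize - 1, totalRecords - 1);
    BackgroundJob.Enqueue<ImportJobs>(j => j.ProcessBatch(importId, start, end));
}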
Have you thought about using Workflow Foundation instead of your custom implementation? It also allows you to persist state. Tasks could be defined as workflows in this case.
Just some thoughts...
Michael
I am building an ASP.NET web application that will be deployed to a 4-node web farm.
My web application's farm is located in California.
Instead of a database for back-end data, I plan to use a set of web services served from a data center in New York.
I have a page /show-web-service-result.aspx that works like this:
1) User requests page /show-web-service-result.aspx?s=foo
2) Page's codebehind queries a web service that is hosted by the third party in New York.
3) When web service returns, the returned data is formatted and displayed to user in page response.
Does this architecture have potential scalability problems? Suppose I am getting hundreds of unique hits per second, e.g.
/show-web-service-result.aspx?s=foo1
/show-web-service-result.aspx?s=foo2
/show-web-service-result.aspx?s=foo3
etc...
Is it typical for web servers in a farm to be using web services for data instead of database? Any personal experience?
What change should I make to the architecture to improve scalability?
You most definitely have a scalability problem: the third-party web service. Unless you have a service-level agreement with that service (agreeing on the number of requests you can submit per second), chances are that you will overload it with your anticipated load. Having four nodes yourself doesn't help you there.
So you should a) come up with an agreement with the third party, and b) test what the actual load is that they can take.
In addition, you need to make sure that your framework can use parallel connections for accessing the remote service. Suppose you have a round-trip time of 20 ms from California to New York (which would be fairly good); then you cannot make more than 50 requests per second over a single TCP connection. Likewise, starting a new TCP connection for every request will also kill performance, so you want pooling on these parallel connections.
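On the .NET side the number of parallel connections per host is capped by ServicePointManager; the default is low (2 per host for client code, per the old HTTP/1.1 guidance; ASP.NET raises it, but it is still worth setting explicitly). The value below is just an example:

using System.Net;

// allow more concurrent connections to the remote service's host
ServicePointManager.DefaultConnectionLimit = 48;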
I don't see a problem with this approach, we use it quite a bit where I work. However, here are some things to consider:
Is your page rendering going to be blocked while waiting for the web service to respond?
What if the response never comes, i.e. the service is down?
For the first problem I would look into using AJAX to update the page after you get a response back from the web service. You'll also want to consider how to handle the no response or timeout condition.
Finally, you should really think about how you could cache the web service data locally. For example if you are calling a stock quoting service then unless you have a real-time feed, there is no reason to call the web service with every request you get. Store the data locally for a period of time and return that until it becomes stale.
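A per-server sketch of that kind of caching with MemoryCache; the Quote type, the QuoteService call, and the 5-minute lifetime are illustrative only:

using System;
using System.Runtime.Caching;

public static class QuoteCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static Quote GetQuote(string symbol)
    {
        var cached = Cache.Get(symbol) as Quote;
        if (cached != null)
            return cached;                           // serve from cache, skip the remote call

        var fresh = QuoteService.Lookup(symbol);     // the remote web service call
        Cache.Set(symbol, fresh, DateTimeOffset.UtcNow.AddMinutes(5));
        return fresh;
    }
}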
You may have scalability problems but most of these can be carefully engineered around.
I recommend you use ASP.NET's asynchronous tasks so that the web service is queued up, the thread is released while the request waits for the web service to respond, and then another thread picks up when the web service is done to finish off the request.
MSDN Magazine - Wicked Code - Asynchronous Pages in ASP.NET 2.0
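In the ASP.NET 2.0 model that article describes, the page registers the work as an async task so the request thread is released while the remote call is in flight. A rough sketch of the code-behind (the serviceClient proxy, its Begin/End methods, and resultLabel are placeholders), with Async="true" set in the @ Page directive:

protected void Page_Load(object sender, EventArgs e)
{
    RegisterAsyncTask(new PageAsyncTask(BeginCall, EndCall, TimeoutCall, null));
}

private IAsyncResult BeginCall(object sender, EventArgs e, AsyncCallback cb, object state)
{
    return serviceClient.BeginGetResult(Request.QueryString["s"], cb, state);
}

private void EndCall(IAsyncResult ar)
{
    resultLabel.Text = serviceClient.EndGetResult(ar);   // a thread is used again only here
}

private void TimeoutCall(IAsyncResult ar)
{
    resultLabel.Text = "The data service did not respond in time.";
}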
Local caching is an absolute must. The fewer times you have to go from California to New York, the better. You might want to look into Microsoft's Velocity (although that's still in CTP) or NCache, or another distributed cache, so that your 4 web servers don't each have to fetch and cache the same data from the web service - once one server gets it, it should be available to all.
Microsoft Project Code Named "Velocity"
NCache
Other things that can go wrong that you should engineer around:
The web service is down (obviously) and data falls out of cache, and you can't get it back. Try to make it so that the data is not actually dropped from cache until you're sure you have an update available. Then the only risk is if the service is down and your application pool is reset, so don't reset it as a first-line troubleshooting maneuver!
There are two different timeouts on web requests, a connect timeout and an overall timeout. Make sure both are set extremely low and that you handle both of them timing out (see the sketch after this list). If the service's DNS goes down, this can look like quite a different failure.
Watch perfmon for ASP.NET Queued Requests. This number will rise rapidly if the service goes down and you're not covering it properly.
Research and adjust ASP.NET performance registry settings so you have a highly optimized ASP.NET thread pool. I don't remember the specifics, but I seem to remember that there's a limit on IO Completion Ports and something else of that nature that are absurdly low for the powerful hardware I'm assuming you have on hand.
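If the calls are made through a WCF client (as another answer here recommends), here is one way to keep both timeouts short and handle them; the contract name, address, and values are illustrative:

using System;
using System.ServiceModel;

var binding = new BasicHttpBinding
{
    OpenTimeout = TimeSpan.FromSeconds(3),    // roughly the "connect" timeout
    SendTimeout = TimeSpan.FromSeconds(5),    // overall time to send and receive a reply
    ReceiveTimeout = TimeSpan.FromSeconds(5)
};

var factory = new ChannelFactory<IRemoteDataService>(binding,
    new EndpointAddress("http://example.com/DataService"));  // hypothetical endpoint
var client = factory.CreateChannel();

try
{
    var result = client.GetData("foo1");
}
catch (TimeoutException)
{
    // treat a slow service like a down service: fall back to cached data
}
catch (CommunicationException)
{
    // covers DNS failures, connection refusals, etc.
}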
The trendy answer is REST. Any GET request can be HTTP-response-cached (with lots of options on how that is configured), and it will be cached by the internet itself (your ISP, essentially).
Your project has an architecture that reflects the direction that Microsoft and many others in the SOA world want to take us. That said, many people try to avoid this type of real-time risk introduced by the web service.
Your system will have a huge dependency on the web service working in an efficient manner. If it doesn't work, or is slow, people will just see that your page isn't working properly.
At the very least, I would get a web stress tool and performance test your web service to at least the traffic levels you expect to get at peaks, and likely beyond this. When does it break (if ever?), when does it start to slow down? These are good metrics to know.
Other options to look at: perhaps you can get daily batches of data from the web service to a local database and hit the database for your web site. Then, if for some reason the web service is down or slow, you could use the most recently obtained data (if this is feasible for your data).
Overall, it should be doable, but you want to understand and measure the risks, and explore any potential options to minimize those risks.
It's fine. There are some scalability issues, primarily with the number of calls you are allowed to make to the external web service per second. Some web services (Yahoo Shopping, for example) limit how often you can call their service and will lock out your account if you call too often. If you have a large farm and lots of traffic, you might have to throttle your requests.
Also, it's typical in these situations to use an interstitial page that forks off a worker thread to go and do the web service call and redirects to the results page when the call returns. (Think a travel site when you do search, you get an interstitial page while they call out to an external source for the flight data and then you get redirected to a results page when the call completes). This may be unnecessary if your web service call returns quickly.
I recommend you be certain to use WCF, and not the legacy ASMX web services technology as the client. Use "Add Service Reference" instead of "Add Web Reference".
One other issue you need to consider, depending on the type of application and/or data you're pulling down: security.
Specifically, I'm referring to authentication and authorization, both of your end users, and the web application itself. Where are these things handled? All in the web app? by the WS? Or maybe the front-end app is authenticating the users, and flowing the user's identity to the back end WS, allowing that to verify that the user is allowed? How do you verify this? Since many other responders here mention a local data cache on the front end app (an EXCELLENT idea, BTW), this gets even MORE complicated: do you cache data that is allowed to userA, but not for userB? if so, how do you verify that userB cannot access data from the cache? What if the authorization is checked by the WS, how do you cache the permissions then?
On the other hand, how are you verifying that only your web app is allowed to access the WS (and an attacker doesn't directly access your WS data over the Internet, for instance)? For that matter, how do you ensure that your web app contacts the CORRECT WS server, and not a bogus one? And of course I assume that all the connection to the WS is only over TLS/SSL... (but of course also programmatically verify the cert applies to the accessed server...)
In short, it's complicated, and there are many elements to consider here.... but it is NOT insurmountable.
(as far as input validation goes, that's actually NOT an issue, since this should be done by BOTH the front end app AND the back end WS...)
Another aspect here, as mentioned by @Martin, is the need for an SLA with whatever provider/hosting service you have for the NY WS, not just for performance, but also to cover availability; i.e. what happens if the server is inaccessible, how quickly they commit to getting it back up, what happens if it's down for extended periods of time, etc. That's the only way to legitimately transfer the risk of your availability being controlled by an externality.