I'm working on an API (Pragmatic Rest API or very similar). I would like to know if it is possible to do an API request that will return a quick response (in JSON) and continue to process heavy code in background.
I suppose this is possible by using queue system but I have no idea where to start with this.
You can have your API delegate long running things to another process.
You mentioned queues, that's one way of doing things, all you need really is an application which can execute whatever long running tasks you have.
Let's imagine a simple system that can do this.
Your API receives a request to do something.
Instead of doing this something, the API writes one record into a database with the details of what needs to be done. Another app watches that table, sees a new record, runs the thing, updates the record with the status / result / whatever it needs.
On any requests from now on, the API can check the record and return whatever is there.
This is the simplest thing I can think of. You can easily do other things as well, talk to a queue system, send it data, let something else execute it.
Looking at your comments, what you are suggesting is not really a good way of building APIs. Why do I say this?
Well, let's say that you receive a request, the API starts a work thread and sends back a 200 to the client. Great the client knows work has started and how does it know when that process had ended and how does it receive whatever data it expects back?
Let's go a bit deeper next.
What happens when 1000 clients call that one endpoint and your API is attempting to start 1000 work threads? You've killed your API, no work gets done and no client gets anything.
This is why I suggest to delegate the work to something else, not the API. Let the API do what it does best, run quick things and return results and delegate other things to something else.
Related
So suppose I have an API to create a cloud instance asynchronously. So after I made an API call it will just return the success response, but the cloud instance will not been initialized yet. It will take 1-2 minutes to create cloud instance and after that it will save the cloud instance information (ex. ip, hostname, os) to db which mean I have to wait 1-2 minutes so I can fetch the data again to show cloud information. At first I try making a loading component, but the problem is that I don't know when the cloud instance is initialized (each instance has different time duration for creating). I'm considering using websocket or using cron or should I redesign my API? Has anyone design asynchronous system before how do you handle such a case.
If the API that you call gives you no information on when it's done with its asynchronous processing, it seems to me that you'll have to check at intervals until you find that the resource is ready; i.e. to poll it.
This seems to me to roughly fit the description and intent of the Polling Consumer pattern. In general, for asynchronous systems design, I can't recommend Enterprise Integration Patterns enough.
As other noted you can either have a notification channel using WebSockets or poll the backend. Personally I'd probably go with the latter for this case and would actually create several APIs, one for initiating the work and get back a URL with "job id" in it where the status of the job can be polled.
RESTfully that would look something like POST /instances to initiate a job GET /instances see all the instances that are running/created/stopped and GET /instances/<id> to see the status of a current instance (initiating , failed , running or whatever)
WebSockets would work, but might be an overkill for this use case. I would probably display a status of 'creating' or something similar after receiving the success response from the API call, and then start polling the API to see if the creation process has finished.
I have to create a Java EE application which converts large documents into different formats. Each conversion takes between 10 seconds and 2 minutes.
The SOAP requests will be made from a client application which I also have to create.
What's the best way to handle these long running requests? Clearly the process takes to much time to run without any feedback to the user.
I can think of the following ways to provide some kind of feedback, but I'm not sure if there isn't a better way, perhaps something standardized.
The client performs the request from a thread and the server sends the document in the response, which can take a few minutes. Until then the client shows a "Please wait" message, progress spinner, etc. (This seems to be simple to implement.)
The client sends a "Start conversion" command. The server returns some kind of job ID which the client can use to frequently poll for a status update or the final document. (This seems to be user friendly, because I can display a progress, but also requires the server to be stateful.)
The client sends a "Start conversion" command. The server somehow notifies the client when it is done. (Here I don't even know how to do this)
Are there other approaches? Which one is the best in terms of performance, stability, fault tolerance, user-friendliness, etc.?
Thank you for your answers.
Since this almost all done server-side, there isn't much a client can do besides poll the server somehow for updates on the status.
#1 is OK, but users get impatient really fast. "A few minutes" is a bit too long for most people. You'd need HTTP Streaming to implement #3, but I think that's overkill.
I would just go with #2.
For 3 the server should return a unique ID back to the client and using that ID the client has to ask the server the result at a later time
option 4 for those desiring to use web sockets
you request will be response with a jobId,
you get progress state over the web soket
I know that similar questions have been asked all over the place, but I'm having trouble finding one that relates directly to what I'm after.
I have a website where a user uploads a data file, then that file is transformed and imported into SQL. The file could be up to 50mb in size, and some times this process can take 30 minutes or sometimes even longer.
I realise I need to palm off the actual work to another process, and poll that process on the web page. I'm wondering what the best approach would be though? Being a web developer by trade, I'm finding all this new Windows Service stuff a bit confusing, and I just wanted somewhere to start.
So:
Can I do / should I being doing this with a windows service? if so, how?
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
clarifications
The data is imported into sql, there's no file distribution taking place.
If there is a failure, it absolutely MUST be reported to the user. The web page will poll every, lets say, 5 seconds, from the time the async task begins, to get the 'status' of the import. Once it's finished another response will tell the page to stop polling for status updates.
queries on final decision
ok, so as I thought, it seems that a windows service is the best idea. So as to HOW to get it to work, it seems the 'put the file there and wait for the service to pick it up' idea is the generally accepted way, is there a way I can start a process run by the service, without it having to constantly be checking a database table / folder? As I said earlier, I don't have any experience with Windows Services - I wondered if I put a public method in the service, can I call it somehow?
well ...
var thread = new Thread(() => {
// your action
});
thread.Start();
but you will have problems with that:
what if the import to sql fails? should there be any response to the client
if it fails, how do you ensure the file on a later request
what if the applications shuts down ... this newly created and started thread will be killed either
...
it's not always a good idea to store everything in sql (especially files...). if you want to make the file available to several servers why not distribute them via ftp ...?
i believe that your whole concept is a bit messed up (sry assuming this), and it might be helpful if you elaborate and give us more information about your intentions!
edit:
Can I do / should I being doing this
with a windows service? if so, how?
you can :) i advise you to create a simple console-program and convert this with srvany and sc. you can get a rough overview howto here (note: insert blanks after =... that's a silly pitfall)
the term should is relative, because you did not answer the most important question
what if a record is persisted to the database, telling a consumer that file test.img should be persisted, but your service hasn't captured it or did not transform it yet?
so ... next on
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
you probably could create a WCF-service which recieves some binary-data and then stores this to a database. this request could be async. yes. but what for?
once again:
please give us more insight to your workflow: what are you exactly trying to achieve? which "environmental-conditions" to you have (eg. app A polls db and expects file-records which are referenced in table x to be persisted) ...
edit:
so you want to import a .csv-file. well that changes everything :)
but i won't advise you to use a wcf-service (there could be a usage: eg. a wcf-service which has a method to insert a single row, then your iteration through the file would be implemented in another app... not that good, though).
i would suggest following:
at first do everything in your webapp (as you've already done), but rather use some sort of bulk-insert and do your transformation/logic on the database.
if you have some sort of bottle-neck then, i would suggest you something like a minor job-service, eg:
webapp will upload the file and insert a row to a job-table. the job-service is continiously polling the table/or gets informed via wcf by the webapp (hey, hey, finally some sort of usage for WCF in your scenario... :) ) and then does the import-job, writing a finish-note to a table/or set the state of the job to finished ...
but this is a bit overkill :)
Please see if my below comments helps you to resolve your issue:
•Can I do / should I being doing this with a windows service? if so, how?
Yes you can do this with a windows service. And I think that is the way you should be doing it. You can implement your own service to process your request or you can use the open source code Job Proccessor
Basically the idea is..
You submit a request for processing
the csv file in database table with
some status as not started.
Then your windows service picks up
the request from database table which
are not started and update them as in
progress status.
Once the processing is complete
succesfully /unsuccesfuly your
service updated the database table
with status as Completed / Failed.
And your asp.net page can poll to
database table for the current status
every 5 sec or so.
•Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
you should not be using WCF for this purpose.
I looking into the concept of queueing for web apps (i.e. putting some types of job in a queue for completion by a seperate worker, rather than being completed in the web request cycle).
I would like to know if there are any good solutions existing for this which can be utilised in an ASP.NET MVC environemnt.
Has anyone had any (good or bad) experiences?
Thank you!
UPDATE:
Just to clarify, I'm not talking about queueing incoming requests. I'll try to illustrate what I mean...
1) Standard situation:
Request from browser
Server processing starts
Long job starts
Long job finished
Server processing finished
Response returned to browser
2) What I'm looking into:
Requsest from browser
Server processing starts
Long job placed in queue
Server processing finished
Response returned to browser
And in another process (possibly after the response was sent):
Long job taken from queue
Long job starts
Long job finished
In the first instance the user has waited a long time for server resoponse, in the second it was quick.
Of course there are certain types of jobs that would be appropriate for this, some that would not be.
UPDATE2:
The client doesn't have to be updated immediately with the results of the long job. The changes would just show themselves in the application whenever the user happened to refresh a page (after the job had completed of course).
Think of some of the things that happen in stack overflow - they are not immediately updated in each part of the application, but this happends quite quickly - I suspect some of these jobs are being queued.
Post the job data in an MSMQ queue and have a Windows Service process the items in the queue. Or, let the web request spawn a process that process the items in the queue.
The Rhino Service Bus is another solution that may work for you:
http://ayende.com/Blog/archive/2008/12/17/rhino-service-bus.aspx
You might check into using an ESB. I've played around with MassTransit: http://code.google.com/p/masstransit/ - the documentation is (or at least was) a little sparse, but it's easy to implement.
In addition, I develop apps for running on Amazon EC2 and absolutely love their AmazonSQS Service.
Thanks,
Hal
Since you mentioned in another comment that you were looking for an equivalent to amazon's sqs service ... you might want to look into Windows Azure. They have an equivalent queue api:
http://msdn.microsoft.com/en-us/library/dd179363.aspx
I have implemented this pattern by having the web server call a WCF service asynchronously. The VS wizards will generate async proxies for you when you consume a WCF service. If you must have guaranteed delivery on the request to the service, you could use MSMQ as the transport layer for the WCF service.
I think Chrisitan's comment might be your answer, but considering I don't know much about IIS and queueing with it, my solution would be:
Make an asynchronous request and load the job details in the database. Then have a job to loop through the database and process the job details. I do this for one of my sites. Might not be the best solution out there, but it gets the job done.
EDIT
My answer might still work, but you will need to have some polling mechanism on the client to continuously check the database to see if that user's job is done, then grab the data you need.
I am thinking on the following approach but not sure if its the best way out:
step1 (server side): A TaskMangaer class creates a new thread and start a task.
step2 (server side): Store taskManager object reference into the cache for future reference.
step3 (client side): Use periodic Ajax call to check the status of the task.
Basically the intention is to have a framework to run a background task (5mins approx) and provide regular feedback on the web UI for the percentage of task completed.
Is there a neat way around this or any existing asp.net API that will be helpful ?
Edit 1#: I want to run the task in-proc with the app.
Edit 2#: Looks like badge implementation on stack overflow is also using the cache to track background task. https://blog.stackoverflow.com/2008/07/easy-background-tasks-in-aspnet/
I think the problem with storing the result in the cache is that ASP.NET might scavenge that cache entry for other purposes (ie if its short on memory, if its grumpy, etc). Something that is served from the cache should be something you can recreate on demand if its not found in the cache, the ASP.NET runtime is free to dump cache entries whenever it feels like it.
The usage of the cache in the badge discussion seems fundamentally different, in that case the task was shortlived. The cache was just being used as a hacky timer to fire off the task periodically.
Can you confirm this is a task that is going to take 5 minutes, and require its own thread that whole time? This is a performance concern in itself, you will only be able to support a limited number of such requests if each requires its own thread for so long. Only if thats acceptable would I let the task camp a thread for so long.
If its ok for these tasks to camp a thread, then I'd just go ahead and store the result in a dictionary global to the process. The key of the dictionary would correlate to the client request / AJAX callback series. The key should incorporate the user ID as well if security is at all important.
If you need to scale up to many users, then I think you need to break the task down into asynchronous steps, and in that case I'd probably use a DB table to store the results (again keyed per request / user).
Microsoft Message Queuing was built for scenarios like the one you try to solve:
http://www.microsoft.com/windowsserver2003/technologies/msmq/default.mspx
Windows Communicatio Foundation also has message queuing support.
Hope this helps.
Thomas
One approach for doing this is to use application state. When you spawn a worker thread, pass it a request ID that you generate, and return this to the client. The client will then pass that request ID back to the server in its AJAX calls. The server will then fetch the status using the request ID from application state. (The worker thread would be updating the application state based on its status).
I saw an approach to a similar problem somewhere. The solution was something like:
Start the background task on server.Return immediately with a url to the result.
Until the result is posted, this url will return 404.
The client checks periodically for this url.
The client reads the results when
they are finally posted.
The url will be something like http://mysite/myresults/cffc6c30-d1c2-11dd-ad8b-0800200c9a66.
The best document format is probably JSON.
If feedback on progress is important, modify the document to also contain status (inprogress/finish) and progress (42 %).