I have an application in which I can generate a raw export in xls.
The problem is that the xls generation can be very long, more than the timeout duration.
I've checked and my query isn't the culprit (takes <2s for a regular query), but the xls generation is very long (for several thousand lines, I put different colors in cells, conditionally display data...).
I was thinking about using a console command, which runs in the CLI without the timeout problem.
I can't use it directly, because the data to be generated has to be called by users (without cli access).
So I thought about calling the command from my controller.
The user would choose the parameters in a form, send the form, and then in the controller, the parameters would be passed to the command that would do the heavy lifting.
My question is: in this case, is the command called in the CLI context (with CLI timeout = 0), or is it called in the application (web) context (with timeout <50s)? In the latter case, this would be useless, and I would be grateful for any advice on an alternative method to resolve my problem.
This is a textbook case for a message queue.
RabbitMQ is recommended, and easy to use with Symfony.
You will have a producer, which will generate a message and put it in a queue. This will be done in your controller.
The db query and the sheet generation should be placed in the consumer (the command running in the background, picking messages from the queue and processing them).
When the sheet is ready, save it as a file, and perhaps log it in the database with a unique ID.
This might sound difficult, but it is very simple, and you should learn it anyway :)
A problem is showing the result to the user. The simplest way is to refresh the browser every X seconds. Other choices include polling with ajax, and websocket based notifications from the server.
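The producer/consumer split described above can be sketched in a few lines. This is only an in-process illustration in Python, not Symfony or RabbitMQ code: a `queue.Queue` stands in for the broker, a dictionary stands in for the database log, and `generate_sheet` is a hypothetical placeholder for the slow sheet generation.

```python
import queue
import threading
import uuid

# In-process stand-in for the message broker (assumption: a real setup uses RabbitMQ).
jobs = queue.Queue()
# Job ID -> status record (assumption: a real setup logs this in the database).
status = {}
status_lock = threading.Lock()


def generate_sheet(params):
    """Hypothetical placeholder for the slow spreadsheet generation."""
    return f"export-{params['report']}.xls"


def produce(params):
    """Controller side: enqueue a message and return immediately with a job ID."""
    job_id = str(uuid.uuid4())
    with status_lock:
        status[job_id] = {"state": "queued", "file": None}
    jobs.put((job_id, params))
    return job_id


def consume_forever():
    """Consumer side: pick messages from the queue and process them one by one."""
    while True:
        job_id, params = jobs.get()
        try:
            result = {"state": "done", "file": generate_sheet(params)}
        except Exception:
            result = {"state": "failed", "file": None}
        with status_lock:
            status[job_id] = result
        jobs.task_done()


def start_consumer():
    """Start the consumer in the background, like a long-running command."""
    worker = threading.Thread(target=consume_forever, daemon=True)
    worker.start()
    return worker
```

The page the user refreshes (or the ajax poll) would then only look up `status[job_id]`, which is fast regardless of how long the generation takes.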
Related
According to the Symfony documentation, when running a command, I can access each console output in real time, no problem.
While collecting this information in a controller, I want to render it in a view, one line after another.
What concept should I look for to achieve this?
I don't think you fully grasp how the controller Request / Response paradigm works. A single request comes in, then a single response is given. There are ways to accomplish what you want, but it's involved.
A high-level overview would be something like:
Have the output of the command logged to a file.
Set up a route and controller action that takes a starting line number as an argument.
The action reads the file and returns all new lines since that line number.
On the front end of the site, set up some sort of polling AJAX request that calls the new route and passes the last line number it has received.
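The server-side part of these steps (read the log, return everything after a given line) could look roughly like the following. It is a language-neutral sketch in Python rather than Symfony code; the function name `read_new_lines` and the 0-based line counting are assumptions made for the example.

```python
from pathlib import Path


def read_new_lines(log_path, start_line):
    """Return (new_lines, next_line) for a log file.

    start_line is the 0-based index of the first line the client has not
    seen yet; next_line is the value the client should send on its next poll.
    """
    path = Path(log_path)
    if not path.exists():
        # The command has not written anything yet.
        return [], start_line
    with path.open(encoding="utf-8") as handle:
        all_lines = handle.read().splitlines()
    new_lines = all_lines[start_line:]
    return new_lines, start_line + len(new_lines)
```

The controller action would return `new_lines` and `next_line` (for example as JSON), and the polling request would pass `next_line` back as its starting line number. Re-reading the whole file on each poll keeps the sketch short; for large logs, remembering a byte offset instead would avoid that.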
I'm using Spring MVC 3 + Tiles for a webapp. I have a slow operation, and I'd like a please wait page.
There are two main approaches to please wait pages, that I know of:
1. Long-lived requests: render and flush the "please wait" bit of the page, but don't complete the request until the action has finished, at which point you can stream out the rest of the response with some javascript to redirect away or update the page.
2. Return immediately, and start processing on a background thread. The client polls the server (in javascript, or via page refreshes), and redirects away when the background thread finishes.
(1) is nice as it keeps the action all single-threaded, but doesn't seem possible with Tiles, as each JSP must complete rendering in full before the page is assembled and returned to the client.
So I've started implementing (2). In my implementation, the first request starts the operation on a background thread, using Spring's @Async annotation, which returns a Future<Result>. It then returns a "please wait" page to the user, which refreshes every few seconds.
When the please wait page is refreshed, the controller needs to check on the progress of the background thread. What is the best way of doing this?
If I put the Future object in the Session directly, then the poll request threads can pull it out and check on the thread's progress. However, doesn't this mean my Sessions are not serializable, so my app can't be deployed with more than one web server (without requiring sticky sessions)?
I could put some kind of status flag in the Session, and have the background thread update the Session when it is finished. I'm very concerned that passing an HttpSession object to a non-request thread will result in hard to debug errors. Is this allowed? Can anyone cite any documentation either way? It works fine when the sessions are in-memory, of course, but what if the sessions are stored in a database? What if I have more than one web server?
I could put some kind of status flag in my database, keyed on the session id, or some other aspect of the slow operation. It seems weird to have session data in my domain database, and not in the session, but at least I know the database is thread-safe.
Is there another option I have missed?
The Spring MVC part of your question is rather easy, since the problem has nothing to do with Spring MVC. See a possible solution in this answer: https://stackoverflow.com/a/4427922/734687
As you can see in the code, the author is using a tokenService to store the future. The implementation is not included and here the problems begin, as you are already aware of, when you want failover.
It is not possible to serialize the future and let it jump to a second server instance. The thread is executed within a certain instance and therefore has to stay there. So session storage is no option.
As in the example link you could use a token service. This is normally just a HashMap where you can store your object and access it later again via the token (the String identifier). But again, this works only within the same web application, when the tokenService is a singleton.
The solution is not to save the future, but instead the state of the work (in work, finished, failed with result). Even when the querying session and the executing threads are on different machines, the state should be accessible and serializable. But how would you do that? This could be implemented by storing it in a database or on the file system (in the example above you could check if the zip file is available) or in a key/value store or in a cache or in a common object store (Terracotta), ...
In fact, every batch framework (Spring Batch for example) works this way. It stores the current state of the jobs in the database. You are concerned that you mix domain data with operation data. But most applications do. On large applications there is the possibility to use two database instances, operational data and domain data.
So I recommend that you save the state and the result of the work in a database.
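To make the idea of storing the state rather than the future concrete, here is a minimal sketch. It uses Python and SQLite purely for illustration (not Spring); the table name `work_state`, the column names, and the three state strings are assumptions for the example.

```python
import sqlite3

VALID_STATES = ("in_work", "finished", "failed")


def open_store(path=":memory:"):
    """Open the (hypothetical) work_state table, creating it if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS work_state ("
        " token TEXT PRIMARY KEY,"
        " state TEXT NOT NULL,"
        " result TEXT)"
    )
    return conn


def set_state(conn, token, state, result=None):
    """Called by the worker thread: record where the work currently stands."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state}")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO work_state (token, state, result)"
            " VALUES (?, ?, ?)",
            (token, state, result),
        )


def get_state(conn, token):
    """Called by the polling request: returns (state, result) or None."""
    return conn.execute(
        "SELECT state, result FROM work_state WHERE token = ?", (token,)
    ).fetchone()
```

Only the token needs to live in the session, and it is a plain string, so the session stays serializable and any server instance that can reach the database can answer the poll.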
Hope that helps.
I would like to know the best way to deal with long running processes started on demand from an ASP.NET webpage.
The process may consist of various steps (like uploading files to the server, running SSIS packages on them, executing some stored procedures etc.) and sometimes the process could take up to a couple of hours to finish.
If I go for asynchronous execution using a WCF service, what happens if the user closes the browser while the process is running, and how should the success or failure of the process be displayed to the user? To solve this, I chose one-way WCF service calls. The problem is that I then need to create a process table and store the result in it (plus any error messages if a step fails, and which steps completed successfully). That is additional overhead, because there are many such processes with various steps that the user can invoke from the web page. The user needs to be made aware of the progress (in the simplest case, the status can be "process xyz running"), and once the process is done, the output needs to be displayed to the user (for example by running a stored procedure).
What is the best way to design the solution for this?
As I see it, you have three options
1. Have a long running page where the user waits for the response. If this is several hours, you're going to have many usability problems, so I wouldn't even consider it.
2. Create a process table to store the results of operations. Run service functions asynchronously and delegate logging the results to the service. There can be a page that the user refreshes which gets the latest results from this table.
3. If you really don't want to create a table, then store all the current process details in the user's session state, and have a current processes page as above. You have the possible issue that the session might time out, or the web app might restart, and you'll lose all this.
I can't see that number 2 is such a great hardship. You could make the table fairly generic to encompass all types of processes: process details could just be encoded as binary or xml and interpreted by the web application. You then have the most robust solution.
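As an illustration of keeping the process table generic, the per-step details can be packed into a single XML string that goes into one column and is interpreted by the web application. The sketch below is in Python only to show the shape of the idea; the element and attribute names (`process`, `step`, `name`, `status`) are made up for the example.

```python
import xml.etree.ElementTree as ET


def encode_details(process_type, steps):
    """Encode step results as XML for a generic 'details' column.

    steps is a list of (step_name, status, message) tuples.
    """
    root = ET.Element("process", {"type": process_type})
    for name, status, message in steps:
        step = ET.SubElement(root, "step", {"name": name, "status": status})
        step.text = message
    return ET.tostring(root, encoding="unicode")


def decode_details(xml_text):
    """Decode the XML back into (process_type, steps) for display."""
    root = ET.fromstring(xml_text)
    steps = [
        (step.get("name"), step.get("status"), step.text or "")
        for step in root.findall("step")
    ]
    return root.get("type"), steps
```

With this, one table with columns such as an ID, a user, an overall status and the encoded details can serve every process type, and only the page that renders the details needs to know what the steps mean.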
I can't say what the best way would be, but using Windows Workflow Foundation for such long running processes is definitely one way to go about it.
You can do tracking of the process to see what stage it is at, even persist it if you have steps where it is awaiting user input etc.
WF provides a lot of features out of the box (especially if your storage medium is SQL Server) and may be a good option to consider.
http://www.codeproject.com/KB/WF/WF4Extensions.aspx might help give you some more insight into the same.
I think you are on the right track. You should run the process asynchronously, store the execution somewhere (a table), and keep the status of the running process in there.
Your user should see a pending display label while the process is executing, and a finished label with the result when the process finished. If the user closed the browser, she will see the result of her running process next time she logs in.
I know that similar questions have been asked all over the place, but I'm having trouble finding one that relates directly to what I'm after.
I have a website where a user uploads a data file, then that file is transformed and imported into SQL. The file could be up to 50 MB in size, and sometimes this process can take 30 minutes or even longer.
I realise I need to palm off the actual work to another process, and poll that process on the web page. I'm wondering what the best approach would be though? Being a web developer by trade, I'm finding all this new Windows Service stuff a bit confusing, and I just wanted somewhere to start.
So:
Can I do / should I be doing this with a windows service? if so, how?
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
clarifications
The data is imported into sql, there's no file distribution taking place.
If there is a failure, it absolutely MUST be reported to the user. The web page will poll every, let's say, 5 seconds, from the time the async task begins, to get the 'status' of the import. Once it's finished, another response will tell the page to stop polling for status updates.
queries on final decision
ok, so as I thought, it seems that a windows service is the best idea. As to HOW to get it to work, it seems the 'put the file there and wait for the service to pick it up' idea is the generally accepted way. Is there a way I can start a process run by the service, without it having to constantly check a database table / folder? As I said earlier, I don't have any experience with Windows Services - I wondered, if I put a public method in the service, can I call it somehow?
well ...
using System.Threading;

var thread = new Thread(() =>
{
    // your long-running action (e.g. the import) goes here
});
thread.Start();
but you will have problems with that:
what if the import to sql fails? should there be any response to the client?
if it fails, how do you ensure the file is still available on a later request?
what if the application shuts down ... this newly created and started thread will be killed as well
...
it's not always a good idea to store everything in sql (especially files...). if you want to make the file available to several servers why not distribute them via ftp ...?
i believe that your whole concept is a bit messed up (sry assuming this), and it might be helpful if you elaborate and give us more information about your intentions!
edit:
Can I do / should I be doing this with a windows service? if so, how?
you can :) i advise you to create a simple console program and turn it into a service with srvany and sc. you can get a rough overview of how to do it here (note: insert blanks after =... that's a silly pitfall)
the term should is relative, because you did not answer the most important question
what if a record is persisted to the database, telling a consumer that file test.img should be persisted, but your service hasn't captured it or did not transform it yet?
so ... next on
Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
you probably could create a WCF-service which receives some binary-data and then stores this to a database. this request could be async. yes. but what for?
once again:
please give us more insight into your workflow: what exactly are you trying to achieve? which "environmental-conditions" do you have (eg. app A polls db and expects file-records which are referenced in table x to be persisted) ...
edit:
so you want to import a .csv-file. well that changes everything :)
but i won't advise you to use a wcf-service (there could be a usage: eg. a wcf-service which has a method to insert a single row, then your iteration through the file would be implemented in another app... not that good, though).
i would suggest following:
at first do everything in your webapp (as you've already done), but rather use some sort of bulk-insert and do your transformation/logic on the database.
if you then have some sort of bottleneck, i would suggest something like a minor job-service, eg:
webapp will upload the file and insert a row into a job-table. the job-service is continuously polling the table/or gets informed via wcf by the webapp (hey, hey, finally some sort of usage for WCF in your scenario... :) ) and then does the import-job, writing a finish-note to a table/or setting the state of the job to finished ...
but this is a bit overkill :)
Please see if my comments below help you resolve your issue:
•Can I do / should I be doing this with a windows service? if so, how?
Yes, you can do this with a windows service, and I think that is the way you should be doing it. You can implement your own service to process your request, or you can use the open source code Job Processor.
Basically the idea is:
You submit a request for processing the csv file by adding it to a database table with a status of not started.
Then your windows service picks up the requests from the database table that are not started and updates them to in progress status.
Once the processing is complete, successfully or unsuccessfully, your service updates the database table with a status of Completed / Failed.
And your asp.net page can poll the database table for the current status every 5 sec or so.
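The status flow in these steps can be sketched as follows. This is a Python/SQLite illustration rather than a Windows service; the `jobs` table, its columns, and the `import_csv` callback are assumptions for the example, and it assumes a single worker (several workers would need an atomic claim step).

```python
import sqlite3


def create_jobs_table(conn):
    """Create the hypothetical jobs table the web page and service share."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " csv_path TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'NotStarted',"
        " error TEXT)"
    )


def submit_job(conn, csv_path):
    """Web page side: register the uploaded file as a not-started request."""
    with conn:
        cursor = conn.execute("INSERT INTO jobs (csv_path) VALUES (?)", (csv_path,))
    return cursor.lastrowid


def process_next_job(conn, import_csv):
    """Service side: run one pass; returns the job id, or None if idle."""
    row = conn.execute(
        "SELECT id, csv_path FROM jobs"
        " WHERE status = 'NotStarted' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, csv_path = row
    with conn:
        conn.execute("UPDATE jobs SET status = 'InProgress' WHERE id = ?", (job_id,))
    try:
        import_csv(csv_path)
        outcome, error = "Completed", None
    except Exception as exc:
        outcome, error = "Failed", str(exc)
    with conn:
        conn.execute(
            "UPDATE jobs SET status = ?, error = ? WHERE id = ?",
            (outcome, error, job_id),
        )
    return job_id


def job_status(conn, job_id):
    """Polling side: returns (status, error) for the page to display."""
    return conn.execute(
        "SELECT status, error FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
```

The service would call `process_next_job` in a loop with a short sleep when it returns `None`, and the page's 5-second poll would call `job_status`; storing the error text is what lets a failure be reported to the user.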
•Should I use WCF? If this runs under IIS, will I have problems with aspnet_wp.exe recycling and timing out my process?
you should not be using WCF for this purpose.
I am thinking on the following approach but not sure if its the best way out:
step1 (server side): A TaskManager class creates a new thread and starts a task.
step2 (server side): Store taskManager object reference into the cache for future reference.
step3 (client side): Use periodic Ajax call to check the status of the task.
Basically the intention is to have a framework to run a background task (5mins approx) and provide regular feedback on the web UI for the percentage of task completed.
Is there a neat way around this or any existing asp.net API that will be helpful ?
Edit 1#: I want to run the task in-proc with the app.
Edit 2#: Looks like badge implementation on stack overflow is also using the cache to track background task. https://blog.stackoverflow.com/2008/07/easy-background-tasks-in-aspnet/
I think the problem with storing the result in the cache is that ASP.NET might scavenge that cache entry for other purposes (i.e. if it's short on memory, if it's grumpy, etc). Something that is served from the cache should be something you can recreate on demand if it's not found in the cache; the ASP.NET runtime is free to dump cache entries whenever it feels like it.
The usage of the cache in the badge discussion seems fundamentally different; in that case the task was short-lived. The cache was just being used as a hacky timer to fire off the task periodically.
Can you confirm this is a task that is going to take 5 minutes, and require its own thread that whole time? This is a performance concern in itself: you will only be able to support a limited number of such requests if each requires its own thread for so long. Only if that's acceptable would I let the task camp a thread for so long.
If it's ok for these tasks to camp a thread, then I'd just go ahead and store the result in a dictionary global to the process. The key of the dictionary would correlate to the client request / AJAX callback series. The key should incorporate the user ID as well if security is at all important.
If you need to scale up to many users, then I think you need to break the task down into asynchronous steps, and in that case I'd probably use a DB table to store the results (again keyed per request / user).
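A process-global dictionary keyed by user and request might look like the sketch below. It is written in Python for illustration, not ASP.NET; `start_task`, `get_progress`, and the idea of reporting progress per completed step are assumptions for the example.

```python
import threading
import uuid

# Process-global progress store, keyed by (user_id, request_id).
_progress = {}
_lock = threading.Lock()


def start_task(user_id, work, steps):
    """Run work(step) for each step on its own thread; return (request_id, thread)."""
    key = (user_id, str(uuid.uuid4()))
    with _lock:
        _progress[key] = {"percent": 0, "done": False, "error": None}

    def run():
        try:
            for step in range(steps):
                work(step)
                with _lock:
                    _progress[key]["percent"] = (step + 1) * 100 // steps
        except Exception as exc:
            with _lock:
                _progress[key]["error"] = str(exc)
        finally:
            with _lock:
                _progress[key]["done"] = True

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return key[1], thread


def get_progress(user_id, request_id):
    """AJAX callback side: a copy of the entry, or None for an unknown key."""
    with _lock:
        entry = _progress.get((user_id, request_id))
        return dict(entry) if entry else None
```

Because the user ID is part of the key, a request ID guessed or copied by another user does not resolve to anything. Unlike a cache entry, the dictionary is never scavenged, so finished entries would need to be removed explicitly.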
Microsoft Message Queuing was built for scenarios like the one you try to solve:
http://www.microsoft.com/windowsserver2003/technologies/msmq/default.mspx
Windows Communication Foundation also has message queuing support.
Hope this helps.
Thomas
One approach for doing this is to use application state. When you spawn a worker thread, pass it a request ID that you generate, and return this to the client. The client will then pass that request ID back to the server in its AJAX calls. The server will then fetch the status using the request ID from application state. (The worker thread would be updating the application state based on its status).
I saw an approach to a similar problem somewhere. The solution was something like:
Start the background task on the server. Return immediately with a url to the result.
Until the result is posted, this url will return 404.
The client checks this url periodically.
The client reads the results when they are finally posted.
The url will be something like http://mysite/myresults/cffc6c30-d1c2-11dd-ad8b-0800200c9a66.
The best document format is probably JSON.
If feedback on progress is important, modify the document to also contain status (inprogress/finish) and progress (42 %).
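The server-side lookup behind such a result url can be sketched like this. It is a framework-free Python illustration; `post_result`, `handle_get`, and the in-memory `results` dictionary are assumptions for the example, and the JSON field names follow the status/progress idea above.

```python
import json

# Result ID -> document; filled in by the background task (in-memory for the sketch).
results = {}


def post_result(result_id, status, progress, data=None):
    """Background task side: publish or update the result document."""
    results[result_id] = {"status": status, "progress": progress, "data": data}


def handle_get(result_id):
    """Return (http_status, body) for a GET on /myresults/<result_id>."""
    document = results.get(result_id)
    if document is None:
        # Nothing posted yet: the client should try again later.
        return 404, ""
    return 200, json.dumps(document)
```

A client would keep requesting the url while it gets 404, or while the returned document still says the status is "inprogress", and stop once the status is "finish".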