Ideas on patterns for a data file based WCF REST web service - asp.net

We have an existing, proprietary data processing application that runs on one of our servers and we wish to expose it as a web service for our clients to submit jobs remotely. In essence, the system takes a set of configuration parameters and one or more data files (the number of files depends on the particular configuration template, but the normal config is 2 files). The application then takes the input files, processes them, and outputs a single result data file (all files are delimited text / CSV or tab).
We now want to expose this process as a service. Based on our current setup and existing platforms, we are fairly confident that we want to go with WCF 4.0 as the framework and most likely REST as the service format, though a SOAP implementation may be required at some point.
Although I am doing a lot of reading on SOA, WCF and REST, I am interested in other thoughts on how to model this service. In particular, the one-to-many relationship of job to required files for input. It seems pretty trivial to model a "job" in REST with the standard CRUD commands. However, the predefined "job type" parameter defines the number of files that must be included. A job type of "A" might call for two input files, while "B" requires 3 before the job can run.
Given that, what is the best way to model the job? Do I include the multiple files in the initial creation of the job, or do I create a job and then have an "addFile" method whereby I can upload the necessary number of files?
The jobs will then have to run asynchronously because they can take time. Once complete, is it best to then just have a status field in the job object and require the client to regularly query the system for job status, or perhaps have the client provide a URL to "ping" when the job is complete?
We are only in the planning stages for the service, so any insights would be appreciated.

To model it for REST, think in terms of resources. Are the files part of the job resource, or are they separate resources?
If they are separate resources, then I would have a method to upload them separately. How they are linked is up to you: you could associate a file with a job when you upload the file, or you could create links (now treating links as individual resources too) between existing files and jobs.
If your files are not seen as separate resources, then I would have them inline with the job, as a single create.
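To make the separate-resource approach concrete, here is a minimal sketch using WCF 4.0 WebHttp services. The contract name, URI templates, and the Job fields (including the status values) are only assumptions for illustration, not a prescribed design:

    using System.IO;
    using System.Runtime.Serialization;
    using System.ServiceModel;
    using System.ServiceModel.Web;

    // Illustrative resource shape; field names and status values are assumptions.
    [DataContract]
    public class Job
    {
        [DataMember] public string Id { get; set; }
        [DataMember] public string JobType { get; set; }       // e.g. "A" needs 2 files, "B" needs 3
        [DataMember] public int FilesRequired { get; set; }
        [DataMember] public int FilesReceived { get; set; }
        [DataMember] public string Status { get; set; }        // e.g. AwaitingFiles, Queued, Running, Complete
    }

    [ServiceContract]
    public interface IJobService
    {
        // POST /jobs -> create the job resource from its configuration parameters
        [OperationContract]
        [WebInvoke(Method = "POST", UriTemplate = "jobs",
            RequestFormat = WebMessageFormat.Json, ResponseFormat = WebMessageFormat.Json)]
        Job CreateJob(Job job);

        // PUT /jobs/{jobId}/files/{fileName} -> upload one input file as a sub-resource of the job
        [OperationContract]
        [WebInvoke(Method = "PUT", UriTemplate = "jobs/{jobId}/files/{fileName}")]
        void UploadFile(string jobId, string fileName, Stream fileContents);

        // GET /jobs/{jobId} -> poll for status; once complete, a result link/resource can be exposed
        [OperationContract]
        [WebGet(UriTemplate = "jobs/{jobId}", ResponseFormat = WebMessageFormat.Json)]
        Job GetJob(string jobId);
    }

With a shape like this, the job could be created in an "AwaitingFiles" state, move to "Queued" once FilesReceived reaches the FilesRequired count implied by its job type, and the client either polls GET jobs/{jobId} or, if you prefer the callback style, supplies a notification URL as one of the configuration parameters at creation time.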

Related

Automated testing of an Orchestration

I have an orchestration which polls data from a database (which is actually used by an ERP, so I am not able to manipulate data in that database). Once the polling port finds matching data, it executes the orchestration and sends the data to a third-party web service.
The logic used in this orchestration is complicated and often prone to change, so it's important to cover it with a proper set of tests. I have been thinking about this for a while and have even thought of splitting it into three components, so that:
The first part (could be just two ports) reads the data from the database and puts it into a folder
The second one (the current orchestration) uses a file port to read the data dumped by the first component and dumps the resultant file to another folder
The third component reads the file dumped by the second component and sends it to the web service
However, I have a few concerns:
Is this a frowned-upon practice when it comes to BizTalk, or is it a normal way to do things?
Performance - would it be significantly slower compared to the current solution?
We currently use one of the servers to run the tests and do the build using BTDF and Jenkins. Is there a way to disable components 1 and 3, run the tests, and re-enable them once the build is completed so that everything can function normally?
You can avoid the overhead of writing to and reading from files by using the built-in functionality of the MessageBox. The first place to start is here: https://msdn.microsoft.com/en-us/library/aa949234.aspx
There is an excellent BizTalk sample which shows how you can use this approach to modularise your functionality into a set of orchestrations which independently read from and write to the MessageBox. It's referenced at the bottom of the page linked above and is called "Direct Binding to the MessageBox Database in Orchestrations".
I'd recommend against the file-based approach. You'd be better off making the three orchestrations direct-bound to the MessageBox and subscribing to the messages published by the previous orchestration. You could also create send ports that subscribe to these messages, or just use the management console to debug the messages.
You can also write unit tests for your various tasks. If you're doing some work in a .NET helper library, you can have a plain old unit tests project. You might also want to look into the BizUnit framework (https://bizunit.codeplex.com/) - it takes a little doing to get used to but it's a great resource for writing BizTalk unit tests.
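To illustrate that last point with a sketch: any rule you can pull out of the orchestration into an ordinary .NET helper class (called, say, from an Expression shape) becomes testable without deploying anything to BizTalk. The DiscountCalculator here is a purely hypothetical stand-in for your own helper logic:

    using Microsoft.VisualStudio.TestTools.UnitTesting;

    // Hypothetical helper extracted from the orchestration so it can be tested outside BizTalk.
    public class DiscountCalculator
    {
        public decimal Apply(decimal orderTotal)
        {
            // Illustrative rule only: 10% off orders of 1000 or more.
            return orderTotal >= 1000m ? orderTotal * 0.9m : orderTotal;
        }
    }

    [TestClass]
    public class DiscountCalculatorTests
    {
        [TestMethod]
        public void OrdersOfAThousandOrMore_GetTenPercentDiscount()
        {
            var calculator = new DiscountCalculator();
            Assert.AreEqual(900m, calculator.Apply(1000m));
        }

        [TestMethod]
        public void SmallerOrders_AreUnchanged()
        {
            var calculator = new DiscountCalculator();
            Assert.AreEqual(500m, calculator.Apply(500m));
        }
    }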

Writing a large volume of web POST requests to flat files (file-based queuing)

I am developing a Spring-based web application which will handle a large volume of requests per minute, and it needs to respond very quickly.
For this purpose, we decided to implement a flat-file-based queuing mechanism, which just writes the requests (a set of database column values) to flat files; another process periodically picks this data up from the flat files and writes it to the database. I pick up only those files that I am done writing to.
As I am using a flat file, for each request I receive I need to open and close the flat file inside my controller method.
My question is: is there a better way to implement this solution? JMS is out of scope as we don't have the infrastructure right now.
If this file-based approach seems good, then is there a better way to reduce the file I/O? With the current design, I open/write/close the flat file for each web request received, which I know is bad. :(
Environment: SpringSource Tool Suite, Apache/Tomcat with Oracle as the back end.
File access has to be synchronized, otherwise you'll corrupt the file. Synchronized access clashes with the large volume of requests you plan to handle.
Take a look at things like Kestrel, or just go with a database like SQLite (at least then you can delegate the synchronization burden).
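If you do stay with files, one pattern that addresses both points (serialized access and less I/O per request) is a single-writer queue: request threads only enqueue, and one background thread owns the open file. This sketch is in C# purely to keep one language across this page; the Spring equivalent would be a LinkedBlockingQueue drained by a dedicated or scheduled thread.

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Threading;

    // Single-writer queue: controllers enqueue rows (cheap, no file I/O on the request thread),
    // and one background thread owns the file, so access is serialized without explicit locking.
    public sealed class RequestLogQueue : IDisposable
    {
        private readonly BlockingCollection<string> _rows = new BlockingCollection<string>();
        private readonly Thread _writer;
        private readonly string _path;

        public RequestLogQueue(string path)
        {
            _path = path;
            _writer = new Thread(Drain) { IsBackground = true };
            _writer.Start();
        }

        // Called from the request handler for each incoming request.
        public void Enqueue(string delimitedRow)
        {
            _rows.Add(delimitedRow);
        }

        private void Drain()
        {
            // The file is opened once, not once per request.
            using (var file = new StreamWriter(_path, true))
            {
                foreach (var row in _rows.GetConsumingEnumerable())
                {
                    file.WriteLine(row);
                    if (_rows.Count == 0)
                    {
                        file.Flush();   // flush only when the current burst has been drained
                    }
                }
            }
        }

        public void Dispose()
        {
            _rows.CompleteAdding();   // lets Drain finish the remaining rows and exit
            _writer.Join();
        }
    }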

Handle ActionResults as cachable, "static content" in ASP.NET MVC (4)

I have a couple of action methods that return content from the database that does not change very often (e.g. a polygon list of available ZIP areas, returned as JSON; it changes twice per year).
I know there is the [OutputCache(...)] attribute, but it has some disadvantages (long client-side caching is not good; if the server/IIS/process gets restarted, the server-side cache is also lost).
What I want is for MVC to store the result in the file system, calculate a hash, and, if the hash hasn't changed, return an HTTP status code 304 - like it is done for images by default.
Does anybody know a solution for that?
I think it's a bad idea to try to cache data on the file system because:
It is not going to be much faster to read your data from the file system than to get it from the database, even if you already have it in JSON format.
You are going to add a lot of logic to calculate and compare the hash, and more to read data from a file. That means new bugs and more complexity.
If I were you I would keep it as simple as possible. Store your data in the Application container. Yes, you will have to reload it every time the application starts, but that should not be a problem at all, as the application is not supposed to restart often. Also consider using a distributed cache like AppFabric if you have a web farm, so you don't end up with different data in the Application containers on different servers.
And one more important note: caching means really fast access, and you can't achieve that with file system or database storage; it is in-memory storage you should be considering.
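For what it's worth, the in-memory advice and the 304 behaviour from the question are easy to combine: keep the JSON in memory and use a hash of it as the ETag. A minimal sketch follows; the cache key and the LoadZipPolygonsAsJson helper are made up for illustration.

    using System;
    using System.Security.Cryptography;
    using System.Text;
    using System.Web;
    using System.Web.Mvc;

    public class ZipAreasController : Controller
    {
        // GET /ZipAreas/Polygons
        public ActionResult Polygons()
        {
            // Kept in memory (per the answer above); rebuilt only after an app restart or cache eviction.
            var json = (string)HttpContext.Cache["zip-polygons"];
            if (json == null)
            {
                json = LoadZipPolygonsAsJson();           // hypothetical: builds the JSON from the database
                HttpContext.Cache["zip-polygons"] = json;
            }

            // A hash of the payload acts as the ETag, so clients revalidate cheaply.
            var etag = "\"" + ComputeHash(json) + "\"";
            if (Request.Headers["If-None-Match"] == etag)
            {
                return new HttpStatusCodeResult(304);     // Not Modified: no body sent
            }

            Response.Cache.SetCacheability(HttpCacheability.Private);
            Response.Cache.SetETag(etag);
            return Content(json, "application/json");
        }

        private static string ComputeHash(string value)
        {
            using (var md5 = MD5.Create())
            {
                return Convert.ToBase64String(md5.ComputeHash(Encoding.UTF8.GetBytes(value)));
            }
        }

        private static string LoadZipPolygonsAsJson()
        {
            return "[]";   // stand-in for the real database query that produces the polygon list
        }
    }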

Create workflow service instances for a large number of records at once

I'm working on a business problem which has to import files containing thousands of records. Each record has to be registered in a workflow as an individual record that goes through its own workflow.
The WF4 Corporate Purchase Process example has a good solution: in the first step it creates bookmarks for all the required record ids, so the workflow can be resumed with the rest of the actions for each individual record/id.
I would like to know how to implement the same thing using workflow services, so I can get the benefits of AppFabric for my workflows.
Are there any other solutions for handling a batch of records/ids? Otherwise the workflow service has to be called thousands of times just to register every record in a workflow instance, which is not a good solution.
I would like to know how to implement the same thing using workflow services, so I can get the benefits of AppFabric for my workflows.
This is pretty straightforward. You're going to have one workflow that reads the file and loops through the results using the existing looping activities. Then, inside the loop, you'll start up the workflow that each record needs (the "Service") by calling its endpoint with a Send activity.
Now, as for the workflow that is the Service, you're going to have a Receive activity at the top of the workflow with CanCreateInstance set to true. Everything after the Receive is no different from any other workflow. You may consider having a Send activity right after the Receive just to let the caller know that the Service has been started, but that's not a requirement - the Receive is required because it forces WF to build the workflow for use with the WorkflowServiceHost.
Are there any other solutions for handling a batch of records/ids? Otherwise the workflow service has to be called thousands of times just to register every record in a workflow instance, which is not a good solution.
Are you saying that for a web server to receive thousands of requests is not a good solution? Consider that an IIS server can handle roughly 25-50 requests per core at any instant in time. Now consider that the loop loading the workflows isn't going to average more than maybe 5 in flight at any instant, and probably more like 1 or 2.
I don't think the web server is going to be your issue. I've started up literally tens of thousands of workflows on a server via a loop just like the one you're going to build, and it didn't break a sweat.
One way would be to use WCF's MSMQ binding to launch your workflows. Requests can come in normally through HTTP, and WCF would route them to MSMQ and process the load. You can throttle how many workflow instances are used through the MSMQ binding + IIS settings.
Download this Word document, which describes setting up a workflow application with WCF and MSMQ: http://www.microsoft.com/en-us/download/details.aspx?id=21245
In the spirit of doing the simplest thing that could work, you can bring the sub-workflow into the main workflow as an activity and use a ParallelForEach to execute the branch for each input from your file (see the sketch below). No extra invoking is required, and the tooling supports this out of the box because all workflows are activities. Hosting the main process in a service, so you can avoid contention with the rest of your IIS users (real people though they may be), might be a good idea.
I do agree that calling IIS or a WCF service thousands of times is not a problem though, unless you want to do it all in a few seconds!
It is important to remember that one of the good things about workflow is that it has fairly low overhead (compared to other workflow products) so you should be more concerned about what your workflow does than just the idea of launching lots of instances. The idea of batches like your example is very common.
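To make the "sub-workflow as an activity inside a ParallelForEach" suggestion concrete, here is a minimal sketch. ProcessRecord stands in for the real per-record workflow (any Activity, including a XAML workflow, would slot in there), and LoadRecordIds is a hypothetical helper that parses the input file.

    using System;
    using System.Activities;
    using System.Activities.Statements;
    using System.Collections.Generic;
    using System.Linq;

    // Stand-in for the real per-record workflow; replace with your own activity.
    public sealed class ProcessRecord : CodeActivity
    {
        public InArgument<string> RecordId { get; set; }

        protected override void Execute(CodeActivityContext context)
        {
            Console.WriteLine("Processing record {0}", RecordId.Get(context));
        }
    }

    public static class BatchWorkflow
    {
        // Hypothetical helper that parses the input file into record ids.
        public static IEnumerable<string> LoadRecordIds(string path)
        {
            return System.IO.File.ReadLines(path).Select(line => line.Split(',')[0]);
        }

        public static Activity Build(string inputFile)
        {
            var recordId = new DelegateInArgument<string>("recordId");

            return new ParallelForEach<string>
            {
                // One branch per record id read from the file.
                Values = new InArgument<IEnumerable<string>>(ctx => LoadRecordIds(inputFile)),
                Body = new ActivityAction<string>
                {
                    Argument = recordId,
                    Handler = new ProcessRecord { RecordId = new InArgument<string>(recordId) }
                }
            };
        }
    }

    // Usage: WorkflowInvoker.Invoke(BatchWorkflow.Build("records.csv"));

Note that ParallelForEach interleaves its branches rather than running them on separate threads; the parallelism pays off when each branch spends time waiting, for example on a Send/Receive pair as described above.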

.Net Scenario Based Opinion

I am facing a situation where I have to handle a very heavy traffic load while keeping performance high at the same time. Here is my scenario; please read it and advise me with your opinion.
I am going to have three-way communication between my server, my client, and a visitor. When a visitor visits my client's website, he will be detected and sent to an intermediate rule engine which performs some tasks and outputs a filtered list of visitors on my server. On the other side, I have a client who will access those lists. My initial idea was to have a web service on my server that acts as the rule engine and outputs the resultant lists on an ASPX page. But this seems inefficient, because there will be huge traffic coming in and the clients will continuously be requesting data from those lists, so it will be a performance overhead. Kindly suggest what approach I should take to achieve this scenario so that no deadlock will happen and things work smoothly. I also considered the option of writing to and fetching from an XML file, but that is also not a very good approach in my case.
NOTE: Please remember that no DB will be involved initially; all work will remain outside the DB.
Wow, storing data efficiently without a database will be tricky. What you can possibly consider is the following:
Store the visitor data in an object list of some sort and keep it in the application cache on the server.
Periodically flush this list (say, after 100 items) to a file on the server - possibly storing it as XML for ease of access (you can associate a schema with it as well to make sure you always get the structure you need). You can perform this file writing asynchronously so as to avoid keeping the request thread blocked while writing the file.
The web service sounds like a good idea - make it feed off the XML file. Possibly consider breaking the XML file up into several files as well. You can even cache the contents of this file separately so the service feeds off the cached data for added performance benefits...
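As a rough sketch of the first two points (the Visitor shape, the 100-item threshold, and the file naming are placeholders, not part of the original design): visitors are buffered in memory and flushed to a fresh XML file asynchronously once enough have accumulated.

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using System.Xml.Linq;

    // Illustrative visitor record; the real shape comes from the rule engine's output.
    public class Visitor
    {
        public string Id { get; set; }
        public DateTime SeenAt { get; set; }
    }

    public static class VisitorBuffer
    {
        private const int FlushThreshold = 100;
        private static readonly ConcurrentQueue<Visitor> Pending = new ConcurrentQueue<Visitor>();

        // Called for each detected visitor; cheap, nothing touches the disk on this thread.
        public static void Record(Visitor visitor)
        {
            Pending.Enqueue(visitor);
            if (Pending.Count >= FlushThreshold)
            {
                Task.Factory.StartNew(Flush);   // write asynchronously so the request is not blocked
            }
        }

        private static void Flush()
        {
            var batch = new List<Visitor>();
            Visitor visitor;
            while (batch.Count < FlushThreshold && Pending.TryDequeue(out visitor))
            {
                batch.Add(visitor);
            }
            if (batch.Count == 0) return;

            var doc = new XElement("Visitors",
                batch.ConvertAll(v => new XElement("Visitor",
                    new XAttribute("id", v.Id),
                    new XAttribute("seenAt", v.SeenAt.ToString("o")))));

            // One uniquely named file per batch, so concurrent flushes never fight over the same file.
            doc.Save(string.Format("visitors-{0:N}.xml", Guid.NewGuid()));
        }
    }

The web service (or a cached layer in front of it) can then read whichever batch files exist, which keeps readers and the asynchronous writer from contending for a single file.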
