I have an orchestration which polls data from a database (which is actually used by an ERP, so I am not able to manipulate data in this database). Once the polling port finds matching data, it executes the orchestration and sends data to a third-party web service.
The logic used in this orchestration is complicated and prone to change, so it's important to cover it with a proper set of tests. I have been thinking about this for a while and even thought of splitting it into 3 different components, so that:
The first part (can be only 2 ports) reads the data from the database and puts it into a folder
The second one (the current orchestration) uses a file port to read the data dumped by the first component and dumps the resultant file to another folder
The third component reads the file dumped by the second component and sends it to the web service
However, I have a few concerns:
Is this a frowned-upon practice when it comes to BizTalk? Or is it a normal way to do things?
The performance - would it be significantly slower compared to the current solution?
We are currently using one of the servers to run the tests / do the build using BTDF and Jenkins. Is there a way to disable components 1 and 3, run the tests, and re-enable them once the build is completed so that it can function normally?
You can avoid the overhead of writing to and reading from files by using the built-in functionality of the MessageBox. The first place to start is here: https://msdn.microsoft.com/en-us/library/aa949234.aspx
There is an excellent BizTalk sample which shows how you can use this approach to modularise your functionality into a set of orchestrations which independently read from and write to the MessageBox. It's referenced at the bottom of the previous page and is called "Direct Binding to the MessageBox Database in Orchestrations".
I'd recommend against this approach. You'd be better off making the three orchestrations direct bound to the MessageBox and subscribe to the messages published by the previous orchestration. You could also create send ports that subscribe to these messages, or just use the management console to debug the messages.
You can also write unit tests for your various tasks. If you're doing some work in a .NET helper library, you can have a plain old unit test project. You might also want to look into the BizUnit framework (https://bizunit.codeplex.com/) - it takes a little getting used to, but it's a great resource for writing BizTalk unit tests.
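To illustrate the idea of stages that only talk to each other through published messages, here is a minimal, language-neutral sketch in Python. It is not BizTalk code: `MessageBox`, `transform_order`, and the message type names are hypothetical stand-ins, and the "business logic" is invented purely so that there is something to test in isolation.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MessageBox:
    """Tiny in-memory stand-in for a publish/subscribe message store."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, message_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[message_type].append(handler)

    def publish(self, message_type: str, body: dict) -> None:
        # Deliver the message to every subscriber of this message type.
        for handler in list(self._subscribers[message_type]):
            handler(body)


def transform_order(row: dict) -> dict:
    """Hypothetical business logic: the part worth covering with unit tests."""
    return {"id": row["id"], "total": round(row["qty"] * row["price"], 2)}


def wire_up(box: MessageBox, sent: list) -> None:
    # Stage 2 subscribes to what stage 1 publishes; stage 3 subscribes to stage 2.
    box.subscribe("RawRow", lambda row: box.publish("Order", transform_order(row)))
    box.subscribe("Order", sent.append)


sent_to_service: list = []
box = MessageBox()
wire_up(box, sent_to_service)
# Stage 1 (the database poller) is simulated by publishing a row directly.
box.publish("RawRow", {"id": 7, "qty": 3, "price": 2.5})
```

Because the middle stage is an ordinary function, it can be tested without the poller or the web service being present, which is the same benefit the direct-bound orchestrations are meant to give.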
Requirement: I need to create a background worker/task that will get data from an external source (a message queue) at certain intervals (e.g. every 10s) and update a database. It needs to run non-stop, 24 hours a day. An ASP.NET application is placing the data on the message queue.
Possible solutions:
Windows service with timer
Pros: Takes load away from web server
Cons: Separate deployment overhead, not load balanced
Use one of the methods described here: background task
Pros: No separate deployment required, can be load balanced - if one server goes down another can pick it up
Cons: Overhead on web server (however, in my case with max 100 concurrent users and seeing the web server resources are under-utilized, I do not think it will be an issue)
Question: What would be a recommended solution and why?
I am looking for a .net based solution.
You shouldn't go with the second option unless there's a really good reason for it. Decoupling your background jobs from your web application brings a number of advantages:
Scalability - It's up to you where to deploy the service. It can share the same server with the web application or you can easily move it to a different server if you see the load going up.
Robustness - If there's a critical bug in either the web application or the service this won't bring the other component down.
Maintenance - Yes, there's a slight overhead as you will have to adjust your deployment process, but it's as simple as copying all binaries from the output folder, and you will only have to do it once. On the other hand, you won't have to redeploy the web application, bringing it down for some time, if you just need to fix a small bug in the service.
etc.
Though I recommend the first option, I don't like the idea of a timer. There's a much simpler and more robust solution. I would implement a WCF service with an MSMQ binding, as it provides you with a lot of nice features out of the box:
You won't have to implement polling logic. On start up the service will connect to the queue and will sit waiting for new messages.
You can easily use transactions to process queue messages. For example, if there's something wrong with the database and you can't write to it, the message being processed at that moment won't get lost. It will go back to the queue to be processed later.
You can deploy as many services listening to the same queue as you wish, to ensure scalability and availability. WCF will make sure that the same queue message is not processed by more than one service; that is, if a message is being processed by service A, service B will skip it and get the next available message.
Many other features you can learn about here.
I suggest reading this article for a WCF + MSMQ service sample to see how simple it is to implement one and use the features I mentioned above. Once you are done with the WCF service, you can easily host it in a Windows service.
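The transactional behaviour described above (a message that fails processing goes back to the queue instead of being lost) can be sketched without any queueing product. This is only a plain Python illustration under stated assumptions: `process_next` and `write_to_database` are hypothetical names, an in-memory `deque` stands in for the queue, and the failed message is simply appended to the tail, which is a simplification of how a real transactional queue handles rollback and retries.

```python
from collections import deque
from typing import Callable, Deque


def process_next(queue: Deque[str], handler: Callable[[str], None]) -> bool:
    """Process one message; put it back on the queue if the handler fails.

    Returns True when a message was handled successfully.
    """
    if not queue:
        return False
    message = queue.popleft()
    try:
        handler(message)
    except Exception:
        # "Roll back": the message returns to the queue for a later retry.
        queue.append(message)
        return False
    return True


saved: list = []
database_up = False


def write_to_database(message: str) -> None:
    # Hypothetical handler that fails while the database is unavailable.
    if not database_up:
        raise ConnectionError("database unavailable")
    saved.append(message)


pending: Deque[str] = deque(["m1", "m2"])
first_attempt = process_next(pending, write_to_database)  # fails, m1 is requeued
database_up = True
while pending:
    process_next(pending, write_to_database)
```

After the database comes back, both messages end up in `saved`; nothing was dropped during the outage, which is the property the MSMQ binding is recommended for.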
Hope it helps!
I’m working on a business problem which requires importing files which have 1000s of records. Each record has to be registered in a workflow as an individual record which has to go through its own workflow.
The WF4 Corporate Purchase Process example has a good solution: in the first step it creates bookmarks for all the required record IDs, so the workflow can be resumed with the rest of the actions for each individual record/ID.
I would like to know how to implement the same thing using Workflow services, as I could get the benefits of AppFabric for my workflows.
Are there any other solutions to handle a batch of records/IDs? Otherwise the workflow service has to be called 1000s of times just to register every record in a workflow instance, which is not a good solution.
I would like to know how to implement the same thing using Workflow services, as I could get the benefits of AppFabric for my workflows.
This is pretty straightforward. You're going to have one workflow that reads the file and loops through the results using the looping activities that exist. Then, inside the loop, you'll start up the workflow that each record needs (the "Service") by calling the endpoint with a Send activity.
Now, as for the workflow that is the Service, you're going to have a Receive activity at the top of the workflow that also has CanCreateInstance set to true. Everything after the Receive is no different than any other workflow. You may consider having a Send activity right after the Receive just to let the caller know that the Service has been started. But that's not a requirement -- the Receive will be required because it forces WF to build the workflow to use the WorkflowServiceHost.
Are there any other solutions to handle a batch of records/IDs? Otherwise the workflow service has to be called 1000s of times just to register every record in a workflow instance, which is not a good solution.
Are you indicating that for a web server to receive 1000's of requests is not a good solution? Consider the fact that an IIS server can handle roughly 25-50 requests, per instant in time, per core. Now consider the fact that your loop that's loading the workflows isn't going to average more than maybe 5 in that instant of time, but probably more like 1 or 2.
I don't think the web server is going to be your issue. I've started up literally 10,000's of workflows on a server via a loop just like the one you're going to build and it didn't break a sweat.
One way would be to use WCF's MSMQ binding to launch your workflows. Requests can come in normally through HTTP, and WCF would route them to MSMQ and process the load. You can throttle how many workflow instances are used through the MSMQ binding + IIS settings.
Download this Word document that describes setting up a workflow application with WCF and MSMQ: http://www.microsoft.com/en-us/download/details.aspx?id=21245
In the spirit of doing the simplest thing that could work, you can bring the subworkflow in as an activity to the main workflow and use a parallel for each to execute the branch for each input from your file. No extra invoking is required, and the tooling supports this out of the box because all workflows are activities. Hosting the main process in a service, so you can avoid contention with the rest of your IIS users (real people that they may be), might be a good idea.
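As a rough, non-WF illustration of the "parallel for each over the records" idea, the following Python sketch runs one branch per record ID on a small thread pool. `run_record_workflow` is a hypothetical placeholder for the per-record subworkflow, and the record IDs and worker count are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def run_record_workflow(record_id: str) -> str:
    """Hypothetical per-record workflow; here it only reports completion."""
    return f"{record_id}:registered"


def process_batch(record_ids: Iterable[str], max_workers: int = 4) -> List[str]:
    # map() returns results in input order even though branches run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_record_workflow, record_ids))


results = process_batch([f"rec-{n}" for n in range(1000)])
```

The point of the sketch is only that the batch is driven from one place and no separate service call is needed per record; throttling is controlled by the size of the pool.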
I do agree that calling IIS or a WCF service 1000's of times is not a problem though, unless you want to do it in a few seconds!
It is important to remember that one of the good things about workflow is that it has fairly low overhead (compared to other workflow products) so you should be more concerned about what your workflow does than just the idea of launching lots of instances. The idea of batches like your example is very common.
We have an existing, proprietary data processing application that runs on one of our servers and we wish to expose it as a web service for our clients to submit jobs remotely. In essence, the system takes a set of configuration parameters and one or more data files (the number of files depends on the particular configuration template, but the normal config is 2 files). The application then takes the input files, processes them, and outputs a single result data file (all files are delimited text / CSV or tab).
We want to now expose this process as a service. Based on our current setup and existing platforms, we are fairly confident that we want to go with WCF 4.0 as the framework and likely REST for the service format, though a SOAP implementation may be required at some point.
Although I am doing a lot of reading on SOA, WCF and REST, I am interested in other thoughts on how to model this service. In particular, the one-to-many relationship of job to required files for input. It seems pretty trivial to model a "job" in REST with the standard CRUD commands. However, the predefined "job type" parameter defines the number of files that must be included. A job type of "A" might call for two input files, while "B" requires 3 before the job can run.
Given that, what is the best way to model the job? Do I include the multiple files in the initial creation of the job? Do I create a job and then have an "addFile" method whereby I can then upload the necessary number of files?
The jobs will then have to run asynchronously because they can take time. Once complete, is it best to then just have a status field in the job object and require the client to regularly query the system for job status, or perhaps have the client provide a URL to "ping" when the job is complete?
We are only in the planning stages for the service, so any insights would be appreciated.
To model it for REST, think of resources. Are the files part of the job resource, or are they separate resources?
If they are separate resources, then I would have a method to upload them separately. How they link is up to you: you could associate a file with a job when you upload the file, or you could have a way to create links (now treating links as individual resources too) between existing files and jobs.
If your files are not seen as separate resources, then I would have them inline with the job, as a single create.
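For the "files as separate resources" variant, the state the service has to keep is small. Below is a minimal Python sketch of that state, not a WCF or REST implementation: the `Job` class, the status strings, and the `REQUIRED_FILES` mapping (job type "A" needs two files, "B" needs three, as in the question) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical mapping of job type to the number of input files it requires.
REQUIRED_FILES: Dict[str, int] = {"A": 2, "B": 3}


@dataclass
class Job:
    job_type: str
    files: List[str] = field(default_factory=list)
    status: str = "awaiting-files"

    def add_file(self, name: str) -> None:
        """Attach one input file; the job becomes ready once all have arrived."""
        if self.status != "awaiting-files":
            raise ValueError("job already has all required files")
        self.files.append(name)
        if len(self.files) == REQUIRED_FILES[self.job_type]:
            self.status = "ready"


job = Job(job_type="A")
job.add_file("left.csv")
status_after_one = job.status
job.add_file("right.csv")
```

In a REST mapping, creating the `Job` would correspond to a POST on a jobs collection and `add_file` to a POST on that job's files, with the `status` field being what a polling client reads back.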
We have an ASP.Net application that allows administrators to work with and perform operations on large sets of records. For example, we have a "Polish Data" task that an administrator can perform to clean up data for a record (e.g. reformat phone numbers, social security numbers, etc.). When performed on a small number of records, the task completes relatively quickly. However, when a user performs the task on a larger set of records, the task may take several minutes or longer to complete. So, we want to implement these kinds of tasks using some kind of asynchronous pattern. For example, we want to be able to launch the task, and then use AJAX polling to provide a progress bar and status information.
I have been looking into using the BackgroundWorker class, but I have read some things online that make me pause. I would love to get some additional advice on this.
For example, I understand that the BackgroundWorker will actually use the thread pool from the current application. In my case, the application is an ASP.Net web site. I have read that this can be a problem because when the application recycles, the background workers will be terminated. Some of the jobs I mentioned above may take 3 minutes, but others may take a few hours.
Also, we may have several hundred administrators all performing similar operations during the day. Will the ASP.Net application thread pool be able to handle all of these background jobs efficiently while still performing its normal request processing?
So, I am trying to determine if using the BackgroundWorker class and approach is right for our needs. Should I be looking at an alternative approach?
Thanks and sorry for such a long post!
Kevin
In your case it actually sounds like the solution you will be looking for is multifaceted (and not a simple in and done project).
Since you said that some processes can last for hours, that is absolutely not something for ASP.NET to own. This should be run inside a Windows service and managed with native Windows threading.
You will need to implement some type of work queue in your service and a way to communicate with the queue. One way is to expose a WCF service for all actions your service will govern. Another would be to have the service poll a database table and pick up work from the table.
To be able to express the status of the process, you will want the ASP.NET application to hold some reference to the process ID; for example, the WCF service returns a GUID identifier. Then you have a method that, given the process ID, returns the status of the process. You can then implement the polling of that service call using AJAX and display any type of modal you wish.
Another thing to remember is that you need to design your processes to know where they are and where they will be when finished, so they can track the state they are in. For example, BatchJobA is run and will have 1000 records to process. The service needs to know what record it's on, or what the current % of completion is, to be able to return information to the UI. For SQL queries that take a very long time to execute, it can be very problematic to accurately gauge progress unless you do a lot of pre- and post-processing with temp tables whose state you can read mid-run to understand where the query is.
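The process-ID-plus-status idea can be sketched in a few lines. This is a plain Python illustration, not a WCF service: `JobTracker` and its method names are hypothetical, and the registry lives in memory, whereas a real service would need to persist it.

```python
import uuid
from typing import Dict


class JobTracker:
    """In-memory progress registry keyed by a generated process ID."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, int]] = {}

    def start(self, total_records: int) -> str:
        # The returned ID is what the web application keeps and polls with.
        process_id = str(uuid.uuid4())
        self._jobs[process_id] = {"done": 0, "total": total_records}
        return process_id

    def record_processed(self, process_id: str, count: int = 1) -> None:
        job = self._jobs[process_id]
        job["done"] = min(job["total"], job["done"] + count)

    def percent_complete(self, process_id: str) -> int:
        job = self._jobs[process_id]
        if job["total"] == 0:
            return 100
        return job["done"] * 100 // job["total"]


tracker = JobTracker()
batch_id = tracker.start(total_records=1000)
tracker.record_processed(batch_id, count=250)
```

An AJAX poll would then call something equivalent to `tracker.percent_complete(batch_id)` and render the result as a progress bar.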
Based on what you are saying I think that BackgroundWorker is not a good choice.
Furthermore, keeping this functionality as a part of your main app can be problematic, specifically because you do not want the submitted processing to be interrupted if the main app recycles. You can play with async processing, but it will still be a part of the main app's AppDomain - all of it will die if the app recycles.
I would suggest building a separate app implementing this functionality. In a similar situation I separated background processing into a Windows service and hosted a web service in it as a means of communication.
You might consider a slightly different approach.
For example, have a command and control table in which you send commands like "REFORMAT PHONE NUMBERS" or whatever.
Then have a windows service monitoring that table. Whenever a record shows up, run the command.
This eliminates any sort of worry about a background thread. Further you have a bit more flexibility with regards to what's in the queue, order of operations including priority, etc. Finally, you would have a definitive list of what is running or needs to run.
As an option, instead of a windows service you might just use a SQL job to execute every so often to watch your control table and perform the requested action.
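A command-and-control table of this kind can be tried out with an in-memory SQLite database. The sketch below is only an illustration under stated assumptions: the `commands` table layout, the `priority` and `status` columns, and the `"REFORMAT SSN"` command are invented for the example, and `run_pending` represents a single polling pass that a service or scheduled job would repeat.

```python
import sqlite3
from typing import Callable, Dict, List


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE commands ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "priority INTEGER NOT NULL DEFAULT 0, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )


def run_pending(conn: sqlite3.Connection, handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """One polling pass: run pending commands, highest priority first."""
    rows = conn.execute(
        "SELECT id, name FROM commands WHERE status = 'pending' "
        "ORDER BY priority DESC, id"
    ).fetchall()
    executed = []
    for command_id, name in rows:
        handlers[name]()
        conn.execute("UPDATE commands SET status = 'done' WHERE id = ?", (command_id,))
        executed.append(name)
    conn.commit()
    return executed


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO commands (name, priority) VALUES (?, ?)",
    [("REFORMAT PHONE NUMBERS", 1), ("REFORMAT SSN", 5)],
)
log: List[str] = []
handlers = {
    "REFORMAT PHONE NUMBERS": lambda: log.append("phones"),
    "REFORMAT SSN": lambda: log.append("ssn"),
}
first_pass = run_pending(conn, handlers)
second_pass = run_pending(conn, handlers)
```

The table doubles as the definitive list of what has run and what is still waiting, and the `ORDER BY` clause is where priority and ordering rules would live.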
I am not sure if I ask the right question, but this is the scenario I am trying to run:
Multiple files (an XML file and a few related files, "attachments") have to get into BizTalk as a single message. I have looked into the existing adapters and don't see that done with the existing ones. To be more accurate, the files are taken from the file system. They are not found at the same time, but arrive one at a time, and order is not ensured. The XML (content) file is the one that knows what attachments it has to have (what other files).
We are looking into BizTalk 2009 and I was wondering whether that would be the responsibility of a custom adapter or something else, and where I could look for samples.
Thanks.
It is probably possible to do what you want using a custom adapter, though I'd recommend against it. You can achieve what you require using orchestration.
What you are looking for is likely a convoy, or at the least some use of correlation.
In BizTalk a convoy is a messaging pattern (as opposed to a BizTalk feature) that allows groups of messages to be processed by a single orchestration.
You essentially use correlation on a receive port to group messages together in either a parallel (what you probably want) or sequential fashion.
There is an article [here](http://msdn.microsoft.com/en-us/library/ms942189(BTS.10\).aspx) by Stephen W. Thomas about convoys (it is for BT 2004 but the concepts still hold), and there is a lot of additional information on the web and in books (Professional BizTalk Server 2006 has a subsection on them).
Without more details on your scenario it is hard to know exactly how the convoy would be built, but below are two approaches to look at (also, I've not had a chance to properly use BT2009, so there may be extended support for correlation scenarios that helps you out).
Flexible Correlation
If you don't know anything about the files listed in the context XML you will probably need a pattern like the one described by Charles Young in this post.
Non-uniform sequential convoy
If you do have a little bit of info beforehand, one way might be as follows (basically a non-uniform sequential convoy):
This makes the assumption that there is some way of linking all the files together so you can correlate them.
Create a single orchestration that subscribes to your inbound receive port (which contains the file receive location).
This orchestration will have a single activation receive shape that is set up for your content file.
Once the orchestration is started by a content file, a second correlated receive shape starts picking up the messages that match that content file. (This second receive could possibly be in a loop to allow for variable numbers of files.)
You then pack them all together into a single outbound file of your design and send them out once the full number of files has been received.
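The grouping logic behind the steps above can be sketched outside BizTalk. The Python below is only an illustration, not orchestration code: `ConvoyAggregator`, the `batch_id` correlation key, and the file names are hypothetical. It also buffers attachments that arrive before the content file, which is an assumption added here because the question says arrival order is not guaranteed; the sequential convoy described above starts from the content file.

```python
from typing import Dict, List, Optional, Set


class ConvoyAggregator:
    """Groups a content file with its attachments, whatever the arrival order."""

    def __init__(self) -> None:
        self._expected: Dict[str, Set[str]] = {}   # batch id -> attachments still missing
        self._received: Dict[str, List[str]] = {}  # batch id -> files collected so far

    def _complete(self, batch_id: str) -> Optional[List[str]]:
        # A batch is complete once its content file is known and nothing is missing.
        if batch_id in self._expected and not self._expected[batch_id]:
            del self._expected[batch_id]
            return self._received.pop(batch_id)
        return None

    def on_content(self, batch_id: str, content_name: str,
                   attachments: List[str]) -> Optional[List[str]]:
        early = self._received.get(batch_id, [])
        self._expected[batch_id] = set(attachments) - set(early)
        self._received[batch_id] = [content_name] + early
        return self._complete(batch_id)

    def on_attachment(self, batch_id: str, name: str) -> Optional[List[str]]:
        self._received.setdefault(batch_id, []).append(name)
        if batch_id in self._expected:
            self._expected[batch_id].discard(name)
        return self._complete(batch_id)


agg = ConvoyAggregator()
r1 = agg.on_attachment("order-1", "a.pdf")  # arrives before the content file
r2 = agg.on_content("order-1", "order-1.xml", ["a.pdf", "b.pdf"])
r3 = agg.on_attachment("order-1", "b.pdf")  # last missing file completes the batch
```

Only the final call returns the full group; the earlier calls return `None`, which corresponds to the orchestration still waiting on its correlated receive.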
Seems to me a better approach would be to implement the above requirements with a combination of a custom pipeline component and/or a custom adapter. I assume you do not really need to manipulate the incoming files - except for the content XML file - or that you couldn't since they are in binary format. This calls for a custom pipeline component.
What you can do is develop a custom BizTalk adapter to interact with the file system and to implement the listening and looping logic. Next you can develop a custom pipeline component to create a single BizTalk message perhaps with base64 data type in it for binary data. Additionally you could also promote messages right in this component to enable orchestration subscriptions.
Orchestrations are more suited for implementing business workflow scenarios where the messages are already in XML format. This does not appear to be the case. In any case, I think at the very least a custom pipeline component would be needed.
David's answer is the correct answer.
Even in cases where you know absolutely nothing about the contents of the expected attachments, surely you know their names and locations. Therefore you can use the Flexible Correlation linked to in David's answer like this:
The key to the solution is to correlate on the builtin BTS.ReceivedFileName property.
First, create a custom receive pipeline with a custom pipeline component that promotes the BTS.ReceivedFileName context property of the received messages. This simple custom component is fairly easy to write, but you can make it even more straightforward by using third-party frameworks such as (shameless plug here) my PipelineComponentBase class or the excellent BizTalk Server Pipeline Component Wizard.
Now for the easy part:
Attachments are received in a specific location, designated by its path on the filesystem.
Create a receive location that listens to an alternate location, used only to control when files are actually swallowed by BizTalk.
In your orchestration, create a correlation type with the BTS.ReceivedFileName property and a correlation set based on this correlation type.
When you want to receive binary attachments, send a dummy message with the BTS.ReceivedFileName context property set to the filename of the binary attachment, but with the path matching the alternate location (the one used by the receive location). Initialize the correlation on the send shape.
Use an expression shape to copy the binary file from its original location to the one used by the receive location.
Finally, use a receive shape bound to the receive port that contains the receive location whose custom receive pipeline will promote the BTS.ReceivedFileName property.
Notice that you actually need to send a message in order to initialize the correlation. It does not actually matter what message you send. What I'd do is send the message through a send pipeline that contains an empty pipeline component, that is, a pipeline component that reads the message but returns null (so that the message vanishes into thin air before it reaches the adapter). A more elaborate solution would be to use a null adapter, that is, an adapter that reads the message but does not do anything with it.
These two solutions avoid having many files accumulate in a temporary location somewhere, just for the sake of initializing a correlation!