EDI Receive Pipeline performance issue

EDI Receive Pipeline performance issue - biztalk

I have a file receive location with edireceive pipeline configed to receive incoming HIPPA 5010 837 files.
The normal incoming file size is 4 to 6 megabytes, contains 3K to 5K records. The 837 schema deployed is the "multiple" version which have the subdocument_break="yes". So the file been processed will generate 3K to 5K messages per file.
The pipeline works fine and can split the file into multiple messages as expected. for 1 single file, BizTalk takes less than 5 mins to process.
The problem is when more than 10 files was put to the incoming folder at same time, Biztalk will start process these files parallel. But it will take hours to process these files and the BizTalk Host consumes more than 10G memory.
Some other info:
The BizTalk host is a dedicated 64bit receive host
No file lock by other applications found
Batching setting in file adapter is Num of Msgs in a batch = 1; Max batch size = 10240000
Rename file while reading is checked.
My question is: Is this performance normal? how can I improve it?

You are correct, 5K message is not really the issue, it 5 batchs of 5K message at the same time that's causing the problem.
To serialize the debatching you can use an Ordered Delivery Two Way Send Port with an Loopback Adapter which debatches the EDI on the Receive Side. In this case, the initial Receive Location would be a PassThrough.
You can find several Loopback Adapters here: http://social.technet.microsoft.com/wiki/contents/articles/12824.biztalk-server-list-of-custom-adapters.aspx#jjj

BizTalk isn't really made to process multiple large files at once, and the file adapter doesn't have any built in way to limit how many files it will pull at once.
There's a commercial solution available to help handle this (disclosure: I work for Tallan and work on this solution) called the T-Connect EDI Splitter (https://www.tallan.com/products/t-connect/edi-file-splitter/). The use case is splitting the files on a pipeline into more manageable chunks to be consumed elsewhere. This is not a trivial task, unfortunately.
If your files are small enough to process without splitting them before they hit the EDI recieve pipeline (you don't need to split them further, you just need to process them one at a time), you'll have to come up with a more complicated messaging flow to deal with that - receive them using PassThrough transmit, send them somewhere that can just consume them, then poll them using a second receive location that offers more precise control of polling.
You could also just write your own adapter that offers polling and interval settings, but that's much more complicated and messy.

Related

BizTalk Server: maximum number of receive locations per host

I have more than 900 receive locations associated with the same host.
All receive locations are enabled but sometimes some of them are not working (and are still enabled).
When I disabled and re-enabled it, the receive location works but another one is going into trouble.
Are there any known limitations of the number of receive locations that can be associated with the same host in BizTalk 2016?

I don't know if there is a limitation number, but if you associate all the receive locations to the same Host, problably your problems are due to the Throttling mechanism.

While there are no hard limits to Receive Locations or Send Ports, there are still practical limits based on available resources.
900 is a lot for a single Host. Even if everything was running perfectly, I would still break that up across ~3 Hosts.
If these are File Receive Locations, there are other techniques to reduce the amount even more. Some options:
Use a Windows Scheduler task to move files from various locations to fewer, or maybe one location. If 'source' information is necessary, you can add a tag to the file name which can be extracted in a custom Pipeline Component.
Modify the sample File Adapter in the SDK to scan sub-folders as well. You can combine this with option 1 if you cannot modify the filename for some reason.
Similar to option 1, the script can write a meta-data file before moving the file with any data you need to preserve. The meta-data can then be read in a Pipeline Component.

How to make a UNIX socket faster?

I'm running a Google Cloud Compute VM as my application server for an app that's available on iOS and Android. The server runs Django within uWSGI, fronted with nginx. The communication between uWSGI and nginx happens through a unix file socket.
Recently I started noticing timeouts at client end. I did a bit of experimentation, and found that uWSGI sometimes errors out while writing data to the file socket. When I increase the 'max-time' parameter at the client end, it goes through smoothly. For example, a sample request that returns about 200KB of json data, takes about 1 sec for Django to compute. But the UNIX socket seems to take another 1-2 secs, which seems too high for a 200KB response. If the client is expecting a response within 2 secs, this often leads to a write error (as shown in the screenshot below) at uWSGI. When I increase the timeout at the client end, it goes through smoothly.
I want to know if there are some configuration changes that can make reading and writing on a UNIX socket faster. 200KB is a very minor size for a JSON response from my server - so I won't be able to bring it down. And I can't have a timeout of more than 2 secs at my client (iOS or Android), for business reasons.

Several unix entities are represented by files but are no file at all. Pipes and sockets are examples of entities represented by files that are not files.
So, writing, and reading from a unix socket is not bound to file system I/O and does not share file system time responses. In fact, unix socket is one of fastest ways of IPC, being more efficient than a TCP socket, since it does not use network I/O at all.
That stated, here is some hints on how to solve your particular problem:
Evaluate your app for performance issues. Profile it and check where it might be spending too much time. Usually, I/O is the main villain on performance issues. Also, bad algorithms, linear searches on long lists are also common guilties.
Check your configuration on both web server and your application gateway.
Check processes scheduling. If everybody is running on the same box, process concurrency may be an issue for heavy loads. Be sure to have all processes running under proper priorities.
Good luck!

BizTalk 2010 Determine Host Throttle settings for receive locations

Because a required pipeline components seems to have trouble hitting a database to details of messages, I am planning to use Host Throttling to limit the amount of files BizTalk is processing at the receive location. I want to be able to indicate that X number of messages should be processed within Y seconds (or any other feasible timespan). Does anyone know which throttling settings can be used to force this behavior?
I know how to set values, however I cannot find the best configuration.
(note: one of the solutions might also adjust the pipeline, but it contains third party components which cannot be adjusted.)

From How BizTalk Server Implements Host Throttling BizTalk looks at
Amount of memory in use (both systemwide and host process memory).
Number of in-process messages being delivered or processed (threshold for outbound throttling).
Number of threads in use. Database size, measured by the number of items in the queue tables for all hosts and the number of items in the spool and tracking tables.
Number of concurrent database connections.
Rate of message publishing (inbound)
and delivery or processing (outbound).
The only one that throttles inbound is the Rate of message publishing, but that is possible after the pipeline/port has processed the message so may not be of any use in this scenario, but you would have to test that.
You will probably want to set up that process under it's own host so if it hits throttling thresholds it does not throttle everything else as well.
If possible you should move the component to a send port pipeline as throttling send ports is much more controllable. One way is to set the send port to ordered delivery, although that can cause a backlog especially if you get a suspended message.

I think your most straightforward approach here would be to write a custom adapter. Unfortunately, the out of the box File Adapter does not directly support throttling/polling intervals, and I don't think the suggestions given already would not directly impact the custom pipeline processing if it's directly hitting the DB through ADO.NET (but it couldn't hurt to try). You can set the BatchSize property on the file adapter settings, but even then there's nothing stopping it from submitting that batchsize as fast as it possibly can over and over again.
A custom adapter could be created to wait for a period before submitting additional files for processing. You could base it on the SDK File Adapter sample.

Reduce the BizTalk receive location file input speed

We have a BizTalk 2010 receive location, which will get a 70MB file and then using inbound map (in receive location) and outbound map (in send port) to produce a 1GB file.
While performing the above process, a lot of disk I/O resource is consumed in SQL Server. Another receive location processes performance are highly affected.
We have tried reduce the maximum disk I/O threads in host instance of that receive location, but it still consumes a lot of disk I/O resource in SQL Server.
In fact this process priority is very very low. Is there any method to reduce the disk I/O resource usage of this process such that other processes performance can be normal?

This issue isn't related to the speed of the file input, but, as you mentioned in a comment, to the load this places on the messagebox when trying to persist the 1gb map output to the MessageBox. You have a few options here to try to minimize the impact this will have on other processes:
Adjust the throttling settings on the newly created host to something very low. This may or may not work the way you want it to though.
Set a service window on the recieve location for these files so that they only run during off hours. This would be ideal if you don't have 24/7 demand on the MessageBox and can afford to have slow response time in the middle of the night (say 2-3am)
If your requirements can handle this, don't map the file in the recieve port, but instead route it to an Orchestration and/or custom pipeline component that will split it into smaller pieces and then map the smaller pieces. This should at least give you more fine grained control over the speed at which these are processed (have a delay shape in the loop that processes the pieces). There'd still possibly be issues when you joined them back together, but it shouldn't be as bad as your current process.
It also may be worth looking at your map. If there are lots of slow/processor heavy calls you might be able to refactor it.

Ideally you should debatch the file. Apply business logic including map on each individual segments and then load them into sql one at a time. Later you can use pipeline or some other .NET component to pull data from SQL and rebatch the data. Handling big xml (10 times size as compared to flat file) in BizTalk messagebox is not a very good practice.
If however it was a pure messaging scenario, you can convert file into stream and route it to destination.

Unexpected data found error during BizTalk Simultaneous Receive

I have a receive port with two FILE receive locations polling the same network share. The only difference between the receive locations is that they use a different file mask. They both use a custom pipeline with single Flat file disassembler component. I have a send port subscribing to the receive port. (this is just the minimal setup where I can reproduce the problem).
When processing a group of files (up to 1mb in size) occasionally the pipeline throws an error. This only occurs when more than one file is copied to the receive location file share at once and occurs irregularly. The error generally reads:
An error occurred when parsing the incoming document: "Unexpected data
found while looking for: '\r\n' The current definition being parsed is
GIRMFile. The stream offset where the error occured is 491540. The
line number where the error occured is 2446. The column where the
error occured is 199.".
Examining the suspended message at the line number shown, consistently 512 bytes of data is different from the incoming message. This 512 bytes of data always matches data from one of the other input files consumed at the same time! Or in a few rare cases the incorrect 512 bytes of data is data from a file consumed at the same time but after it had been processed by the pipeline (i.e. the suspended flat file has a 512 byte chunk of xml!). The 512 bytes is never in a consistent location within the suspended messages.
Thinking the BizTalk databases were corrupted in some way, I deleted them and re-configured. The problem returned after a few hundred files were processed successfully.
This only occurs on our test box (a VMWare vm) so I suspect the machine is the problem in some way. But it seems odd that the machine isn't reporting other errors in other processes.

Interesting - I recall seeing similar things in BizTalk 2004 but haven't seen anything like that with BT2006.
It sounds like the pipeline may be running into threading issues - perhaps due to receiving the files from the same file location.
Have you tried any of the advanced file receive location properties?
I'm thinking in particular the 'Rename files while reading' checkbox. Perhaps if the issue is with non-threadsafe stream reads, this process of creating a renamed file (which I think just uses standard IO libraries) will allow BizTalk to get a clean stream.
Only guessing though - please report back if you find a solution!

This only occurs on our test box (a VMWare vm)
If you do not succeed in reproducing this on another machine with the same configuration, I'd mark this off as a non-issue, or external. Agree with the aforementioned that concurrency problems are highly unlikely

I have to say I find this very strange, I would find it very hard to believe that 5 years into BizTalk (counting from 2004 :-)), the FILE adapter and the standard disassemblers have threading issues.,
Are the files coming into the pick up location over the network? what file masks are you using? is there a chance that one of the receive location is picking up the files before their transfer is complete?

You said the receive location network is a network share - perhaps it's a network problem? Can you reproduce this on local drive?

A few more thoughts... is the share a DFS share? Can you put the receive locations on different hosts and see what happens then?

We have similar issues with programs running on VMWare VMs accessing shares. For some reason files will appear to be corrupt.
This was not BizTalk related, it was happening with an in house developed application.
Rebooting the VM fixes our issue for a while. In our case we were able to reconfigured our process to not use shares. We never did persue finding a solution to the real problem.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex