BizTalk File Adapter Duplication

BizTalk 2010 CU4, Windows Server 2008, no antivirus.
I'm having an issue where the BizTalk file adapter intermittently picks up the exact same file twice. It happens with both remote UNC and local paths, across two different receive locations in two different applications.
The receive location has all default settings. I've tried the 'Rename files while reading' option both ticked and unticked with no resolution to the issue. The file masks are of the form \H3OR*.txt.
The gap between the pickup times of the duplicates' 'unparsed interchanges' is never greater than 1 second; 2 ms is common. Looking at the unparsed interchanges of the duplicates, the 'ReceivedFileName' context property is exactly the same. Roughly 1 in 8 files received is duplicated.
The receive location does have credentials for the UNC path, and it does delete files after it's done with them.
Restarting both the receive location and the BizTalk host has no effect.
Let me know if you need any more info.
thanks.

Sometimes the problem lies elsewhere. Are you sure the upstream process that creates these files isn't duplicating them in the first place, i.e. sending the same file twice in quick succession?
You might test this by creating another send port that subscribes to these files and writes them out to a folder, appending %MessageID% to %SourceFileName% in the destination file name.
If you see two files with the same %SourceFileName% but a different %MessageID%, written a second or more apart, it proves the problem is upstream.
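As a concrete sketch, the FILE transport of that diagnostic send port could be configured like this (the destination folder is illustrative):

```
Destination folder: \\diag\dup-check
File name:          %SourceFileName%.%MessageID%
```

Since %MessageID% expands to the unique BizTalk message GUID, two outputs sharing the %SourceFileName% part but carrying different GUIDs, with timestamps a second or more apart, point at the upstream producer.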

Related

BizTalk Server: maximum number of receive locations per host

I have more than 900 receive locations associated with the same host.
All receive locations are enabled but sometimes some of them are not working (and are still enabled).
When I disable and re-enable one, that receive location works, but another one then runs into trouble.
Are there any known limitations of the number of receive locations that can be associated with the same host in BizTalk 2016?
I don't know if there is a hard limit, but if you associate all the receive locations with the same host, your problems are probably due to the throttling mechanism.
While there are no hard limits to Receive Locations or Send Ports, there are still practical limits based on available resources.
900 is a lot for a single Host. Even if everything was running perfectly, I would still break that up across ~3 Hosts.
If these are File Receive Locations, there are other techniques to reduce the count even further. Some options (a sketch of option 1 follows the list):
1. Use a Windows Scheduler task to move files from the various locations to fewer folders, or maybe just one. If 'source' information is necessary, you can add a tag to the file name which can be extracted in a custom Pipeline Component.
2. Modify the sample File Adapter in the SDK to scan sub-folders as well. You can combine this with option 1 if you cannot modify the file name for some reason.
3. Similar to option 1, the script can write a meta-data file before moving the file, with any data you need to preserve. The meta-data can then be read in a Pipeline Component.
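A minimal sketch of option 1, assuming a small .NET mover run by Task Scheduler; the shares, tags, and file mask are all illustrative:

```csharp
// Consolidate files from several source shares into one receive folder,
// tagging each file name with its origin so a custom pipeline component
// can recover the source later. All paths and tags are illustrative.
using System.Collections.Generic;
using System.IO;

class FileConsolidator
{
    static void Main()
    {
        var sources = new Dictionary<string, string>
        {
            [@"\\srvA\out"] = "srvA",   // tag embedded in the moved file's name
            [@"\\srvB\out"] = "srvB",
        };
        string target = @"\\biztalk\consolidated-in";   // the single receive folder

        foreach (var source in sources)
        {
            foreach (string file in Directory.GetFiles(source.Key, "*.txt"))
            {
                // Copy under a temporary name first so the receive location
                // (mask *.txt) never sees a half-written file, then rename.
                string name = source.Value + "_" + Path.GetFileName(file);
                string tmp  = Path.Combine(target, name + ".tmp");
                File.Copy(file, tmp);
                File.Move(tmp, Path.Combine(target, name)); // atomic on one volume
                File.Delete(file);
            }
        }
    }
}
```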

BizTalk SFTP receive port not picking files larger than 1GB

The BizTalk SFTP receive port is not picking up files larger than 1 GB (in my case I can receive files of up to 5 GB). Even when it does pick up the file it is very slow, and before the whole file has been dropped into the folder the orchestration starts unzipping the zip file and throws an error: cannot unzip as the file is being used by another process. Any help?
What you are seeing is not a problem with BizTalk Server or the SFTP Adapter.
This is happening because the SFTP Server is allowing the download to start before the file is completely written. This can be because the SFTP Server is not honoring a write lock or the actual source app is doing multiple open-write-close cycles while emitting the data.
So, this is actually not your problem and not a problem you can solve as the client.
The SFTP Server either needs to block the download or a temporary location/filename must be used until the file is complete.
This is not an uncommon problem, but must be fixed on the server side.
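The usual server-side fix is the write-then-rename pattern; a minimal producer sketch under that assumption (folder and extension names are illustrative):

```csharp
// Write the payload under an extension the downstream consumer ignores,
// then rename once the file is complete. Because rename is atomic on the
// same volume, the consumer only ever sees finished files.
using System.IO;

class SafeDrop
{
    static void Publish(string folder, string fileName, Stream payload)
    {
        string tmp   = Path.Combine(folder, fileName + ".partial"); // not matched by *.zip
        string final = Path.Combine(folder, fileName);

        using (var outFile = File.Create(tmp))
        {
            payload.CopyTo(outFile);   // may take minutes for multi-GB files
        }
        File.Move(tmp, final);         // appears to the consumer fully written
    }
}
```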

EDI Receive Pipeline performance issue

I have a file receive location with the EdiReceive pipeline configured to receive incoming HIPAA 5010 837 files.
A typical incoming file is 4 to 6 megabytes and contains 3K to 5K records. The 837 schema deployed is the "multiple" version, which has subdocument_break="yes", so each file processed generates 3K to 5K messages.
The pipeline works fine and splits the file into multiple messages as expected. For a single file, BizTalk takes less than 5 minutes to process it.
The problem is that when more than 10 files are put in the incoming folder at the same time, BizTalk starts processing them in parallel, but it takes hours to process them and the BizTalk host consumes more than 10 GB of memory.
Some other info:
The BizTalk host is a dedicated 64bit receive host
No file lock by other applications found
Batching setting in file adapter is Num of Msgs in a batch = 1; Max batch size = 10240000
'Rename files while reading' is checked.
My question is: Is this performance normal? how can I improve it?
You are correct: 5K messages is not really the issue; it's 5 batches of 5K messages arriving at the same time that's causing the problem.
To serialize the debatching you can use an Ordered Delivery two-way Send Port with a Loopback Adapter, which debatches the EDI on the receive side. In this case, the initial Receive Location would be PassThrough.
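Sketched as a flow (EdiReceive and PassThrough are the stock pipelines; the folder layout is illustrative):

```
Folder --> Receive Location (PassThrough pipeline, no debatching)
       --> two-way Send Port (Ordered Delivery) + Loopback Adapter
             send pipeline:    PassThrough
             receive pipeline: EdiReceive  (debatches one interchange at a time)
       --> downstream subscribers get the split 837 messages serially
```

Ordered delivery forces the port to handle one file at a time, so only one 5K-message debatch is in memory at once.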
You can find several Loopback Adapters here: http://social.technet.microsoft.com/wiki/contents/articles/12824.biztalk-server-list-of-custom-adapters.aspx#jjj
BizTalk isn't really made to process multiple large files at once, and the file adapter doesn't have any built-in way to limit how many files it will pull at once.
There's a commercial solution available to help handle this (disclosure: I work for Tallan and work on this solution) called the T-Connect EDI Splitter (https://www.tallan.com/products/t-connect/edi-file-splitter/). The use case is splitting the files on a pipeline into more manageable chunks to be consumed elsewhere. This is not a trivial task, unfortunately.
If your files are small enough to process without splitting them before they hit the EDI receive pipeline (you don't need to split them further, you just need to process them one at a time), you'll have to come up with a more complicated messaging flow to deal with that - receive them using PassThrough, send them somewhere that can hold them, then poll them using a second receive location that offers more precise control of polling.
You could also just write your own adapter that offers polling and interval settings, but that's much more complicated and messy.

Classic file system problem - concurrent remote processing on a directory

I have an application that processes files in a directory and moves them to another directory along with the processed output. Nothing special about that. An interesting requirement was introduced:
Implement fault tolerance and processing throughput by allowing multiple remote instances to work on the same file store.
An additional consideration is that we cannot assume a particular file system, as we support both Windows and NFS.
Of course the problem is: how do I make sure that the different instances do not try to process the same work, potentially corrupting work or reducing throughput? File locking can be problematic, especially across network shares. We could use a more sophisticated method, such as a simple database or messaging framework (a la JMS or similar), but the entire cluster needs to be fault tolerant. We can't have one database or messaging provider, because of the single point of failure it introduces.
We've implemented a solution that uses multicast messages to self-discover processing instances and elect a supervisor who assigns work. There's a timeout in case the supervisor goes down, and another election takes place. Our networking library, however, isn't very mature, and our implementation of the messaging is clunky.
My instincts, however, tell me that there is a simpler way.
Thoughts?
I think you can safely assume that rename operations are atomic on all network file systems that you care about. So if you arrange for a unit of work to be a single file (or keyed to a single file), have each server first list the directory containing new work, pick a piece of work, and then rename the file to its own server name (say, machine name or IP address). For exactly one of the instances that concurrently attempt the same rename, the operation will succeed; that instance should then process the work. For the others it will fail, so they should pick a different file from the listing they got.
For creation of new work, assume that directory creation (mkdir) is atomic, but file creation is not (a second writer might overwrite an existing file). So if there are multiple producers of work as well, create a new directory for each piece of work.
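A minimal sketch of the rename-to-claim idea, assuming the share is mounted at a fixed path; the folder, file mask, and naming scheme are illustrative:

```csharp
// Claim work items by renaming them to carry this server's name.
// Rename is atomic on the file systems assumed above, so exactly one
// contender wins each file; the losers get an IOException and move on.
using System;
using System.IO;

class WorkClaimer
{
    static readonly string WorkDir = @"\\filer\work";   // illustrative share
    static readonly string Me = Environment.MachineName;

    static string ClaimNext()
    {
        foreach (string file in Directory.GetFiles(WorkDir, "*.job"))
        {
            string claimed = file + "." + Me;           // e.g. order1.job.SERVER7
            try
            {
                File.Move(file, claimed);               // atomic: one winner
                return claimed;                         // we own this work item
            }
            catch (IOException)
            {
                // another instance renamed it first; try the next file
            }
        }
        return null;                                    // nothing left to claim
    }
}
```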

Unexpected data found error during BizTalk Simultaneous Receive

I have a receive port with two FILE receive locations polling the same network share. The only difference between the receive locations is that they use a different file mask. They both use a custom pipeline with a single Flat File Disassembler component. I have a send port subscribing to the receive port. (This is just the minimal setup in which I can reproduce the problem.)
When processing a group of files (up to 1mb in size) occasionally the pipeline throws an error. This only occurs when more than one file is copied to the receive location file share at once and occurs irregularly. The error generally reads:
An error occurred when parsing the incoming document: "Unexpected data found while looking for: '\r\n' The current definition being parsed is GIRMFile. The stream offset where the error occured is 491540. The line number where the error occured is 2446. The column where the error occured is 199.".
Examining the suspended message at the line number shown, there is consistently a 512-byte block that differs from the incoming message. This 512 bytes always matches data from one of the other input files consumed at the same time! Or, in a few rare cases, the incorrect 512 bytes is data from a file consumed at the same time but after it had been processed by the pipeline (i.e. the suspended flat file contains a 512-byte chunk of XML!). The 512 bytes is never in a consistent location within the suspended messages.
Thinking the BizTalk databases were corrupted in some way, I deleted them and re-configured. The problem returned after a few hundred files were processed successfully.
This only occurs on our test box (a VMWare vm) so I suspect the machine is the problem in some way. But it seems odd that the machine isn't reporting other errors in other processes.
Interesting - I recall seeing similar things in BizTalk 2004 but haven't seen anything like that with BT2006.
It sounds like the pipeline may be running into threading issues - perhaps due to receiving the files from the same file location.
Have you tried any of the advanced file receive location properties?
I'm thinking in particular the 'Rename files while reading' checkbox. Perhaps if the issue is with non-threadsafe stream reads, this process of creating a renamed file (which I think just uses standard IO libraries) will allow BizTalk to get a clean stream.
Only guessing though - please report back if you find a solution!
This only occurs on our test box (a VMWare vm)
If you do not succeed in reproducing this on another machine with the same configuration, I'd mark this off as a non-issue, or external. I agree with the aforementioned that concurrency problems are highly unlikely.
I have to say I find this very strange; I would find it very hard to believe that 5 years into BizTalk (counting from 2004 :-)), the FILE adapter and the standard disassemblers have threading issues.
Are the files coming into the pickup location over the network? What file masks are you using? Is there a chance that one of the receive locations is picking up the files before their transfer is complete?
You said the receive location is on a network share - perhaps it's a network problem? Can you reproduce this on a local drive?
A few more thoughts... is the share a DFS share? Can you put the receive locations on different hosts and see what happens then?
We have similar issues with programs running on VMWare VMs accessing shares. For some reason files will appear to be corrupt.
This was not BizTalk related, it was happening with an in house developed application.
Rebooting the VM fixes our issue for a while. In our case we were able to reconfigure our process to not use shares. We never did pursue finding a solution to the real problem.
