Problem Description:
1. There is a BizTalk application receiving a formatted/zipped data file containing more than 2 million data records.
2. A custom pipeline component processes the file and de-batches these 2 million records into smaller slice-messages of ~2,000 records each.
3. The slice-messages are sent to a SQL port and processed by a stored procedure. Each slice-message contains the filename and a batch id.
Questions:
A. What would be the best way to know that all slice-messages have been received and processing of the whole file has completed on the SQL side?
B. Is there any way in a BizTalk port to say "do not send a message of type B until all messages of type A have been sent" (message priority)?
Here are the possible solutions I've tried:
S1. Add a specific 'end of file' tag to the last slice-message indicating that the whole file has been sent; when the stored procedure receives this part it marks the file as completed.
But because messages are delivered asynchronously, the last message can arrive at SQL earlier than the other messages, and I will get a false 'completed' event.
So this solution only works with ordered-delivery ports, but these ports have poor performance because they send only one message at a time.
S2. Include the total record count in every slice-message and run a COUNT() SQL statement after every slice-message is received.
But because the table where the data is stored is very large, even running a count filtered by filename takes time.
I'm wondering if there is a better solution to know that all messages have been received?
Have your pipeline component emit a "batch" message that contains the count of the records in the batch and some unique identifier that can link it back to the slice-message records.
Have both the stored procedure that processes the slice-messages and the one that processes the batch message check whether the batch total (if it exists yet when the slice-message is processed) matches the processed total; if they match, you've finished processing them all.
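As a rough illustration (not part of the original answer), here is a T-SQL sketch of that check, assuming a hypothetical BatchControl table keyed on the batch identifier; all table, column, and parameter names are placeholders:

-- One row per batch; the batch-message procedure fills in ExpectedCount, the
-- slice-message procedure accumulates ProcessedCount, in whichever order the
-- messages happen to arrive.
CREATE TABLE dbo.BatchControl
(
    BatchId        UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    ExpectedCount  INT              NULL,              -- set by the batch-message procedure
    ProcessedCount INT              NOT NULL DEFAULT (0),
    CompletedUtc   DATETIME2        NULL
);

-- Run by either procedure after it has updated its own column: the file is
-- finished the first time the two totals match. While ExpectedCount is still
-- NULL (batch message not processed yet), the predicate never matches.
UPDATE dbo.BatchControl
SET    CompletedUtc = SYSUTCDATETIME()
WHERE  BatchId = @BatchId
  AND  CompletedUtc IS NULL
  AND  ProcessedCount >= ExpectedCount;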
Here's how I would approach this.
Load the 2 million records into a SQL Server table or tables using SSIS.
Drain the table at whatever rate gives you an acceptable performance profile.
Delete records as they are processed (completed).
When no more records for "FILE001.txt" exist, SQL Server will return a flag saying "FILE001.txt complete".
Do further processing.
When the staging table is empty, the Polling SP can either return nothing (the Adapter will silently ignore the response) or return a flag that says "nothing to do" and you handle that yourself.
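For illustration only, here is a minimal sketch of such a polling procedure, assuming a hypothetical staging table dbo.StagedRecords(FileName, RecordId, Payload); the procedure name, slice size, and columns are assumptions, not part of the answer above:

CREATE PROCEDURE dbo.PollStagedRecords
AS
BEGIN
    SET NOCOUNT ON;

    -- Work on the oldest staged file first.
    DECLARE @FileName NVARCHAR(260) =
        (SELECT TOP (1) FileName FROM dbo.StagedRecords ORDER BY RecordId);

    IF @FileName IS NULL
        -- Staging table is empty: return nothing and the adapter silently
        -- ignores the response (or return a "nothing to do" flag instead).
        RETURN;

    -- Hand BizTalk the next slice to drain; downstream processing deletes
    -- rows as they complete, and once no rows remain for a file a separate
    -- query can return the "FILE001.txt complete" flag.
    SELECT TOP (2000) FileName, RecordId, Payload
    FROM   dbo.StagedRecords
    WHERE  FileName = @FileName
    ORDER  BY RecordId
    FOR XML PATH('Record'), ROOT('Records');
END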
I have a strange issue with my WCF-SQL receive location polling. The BizTalk server is BizTalk 2010. The polling executes every 1 minute and involves executing a stored procedure that selects records from a table and updates the selected records' status to something like 'Processing':
Select top 10 * from ProcessingTable where Status = 'New'
Update ProcessingTable Set Status = 'Processing' where Status = 'New'
The receive pipeline is XMLReceive, which will debatch the records and route them to another orchestration for processing. At the end of the orchestration, there is a send port that updates the Status to 'Processed'.
Here comes the issue: during the period when we have maintenance and the BizTalk DB/application servers are brought down, the host instances are down and these records are stuck in the 'New' state. After the maintenance, once the host instances are initialized, these records get picked up immediately and have their status updated to 'Processing'. The strange thing is that they are stuck at this status and never proceed to 'Processed'. This only happens for the top 10 records (the first pull/pick-up); subsequently, all other remaining 'New' records get picked up and processed successfully. Currently the workaround is to monitor for records stuck in 'Processing' and update them to 'New' again to retrigger the processing. Does anyone have an answer to this problem?
Have you used a singleton-pattern orchestration for this? If not, try it and see if you get the same problem, as I suspect you are facing a race condition.
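Not something the answer above proposes, but to illustrate the select-then-update gap in the polling procedure: a hedged T-SQL sketch (reusing the question's table and column names) that claims and returns the rows in a single atomic statement:

-- Pick up to 10 'New' rows, flip them to 'Processing' and return exactly the
-- rows that were flipped, all in one statement, so a competing poll (or a
-- restart part-way through) cannot flag rows it never returned.
UPDATE TOP (10) p
SET    p.Status = 'Processing'
OUTPUT inserted.*
FROM   ProcessingTable AS p
WHERE  p.Status = 'New';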
We receive many large data files daily in a variety of formats (i.e. CSV, Excel, XML, etc.). In order to process these large files we transform the incoming data into one of our standard 'collection' message classes (using XSLT and a pipeline component - either built-in or custom), disassemble the large transformed message into individual 'object' messages and then call a series of SOAP web service methods to handle business logic and database operations.
Unlike the other files received, this latest file contains all data rows each day, and therefore we have to handle the differences to prevent identical records from being re-processed each day.
I have a suitable mechanism for handling inserts and updates but am currently struggling with the deletes (where the record exists in the database but not in the latest file).
My current thought process is to flag the deleted records in the database using a 'cleanup' task at the end of the entire process but this would require a method to be called once all 'object' messages from the disassembled file have completed.
Is it possible to monitor individual messages from a multi-record file and call a method on completion of the whole file? Currently, all research is pointing to an orchestration with some sort of 'wait' but is this the only option?
Example: File contains 100 vehicle records. This is disassembled into 100 individual XML messages which are processed using 100 calls to a web service method. Wish to call cleanup operation when all 100 messages are complete.
The best way I've found to handle the 'all rows every day' scenario is to pre-stage the data in SQL Server where it's easier to compare the 'current' set to the 'previous' set. The INTERSECT and EXCEPT operators make it pretty easy in most cases.
Then drain the records with a Polling statement.
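As a sketch of how that comparison might look (the Staging_Current/Staging_Previous tables and the vehicle columns are assumptions for illustration, not names given in the answer):

-- Records that were in the previous full load but are missing from today's:
-- these are the candidates to flag as deleted during cleanup.
SELECT VehicleId
FROM   dbo.Staging_Previous
EXCEPT
SELECT VehicleId
FROM   dbo.Staging_Current;

-- Records that are identical in both loads and can be skipped entirely.
SELECT VehicleId, VehicleData
FROM   dbo.Staging_Current
INTERSECT
SELECT VehicleId, VehicleData
FROM   dbo.Staging_Previous;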
The component that does the de-batching would need to publish a start-of-batch message with the number of individual records and a correlation key.
The components that do the insert & update would need to publish a completion message with the same correlation key when they have finished processing.
The start-of-batch message would spin up an Orchestration that listens for the completion messages with that correlation key and counts them; either after it has received the correct number, or after a timeout period, it would call the cleanup or raise an exception.
I'm looking to loop over data received from SQL Server via the WCF-SQL adapter.
I use a for loop and the following:
itostring = i.ToString();
MessageOne = xpath(MessagePolling, "/*[local-name()='MainData' and namespace-uri()='http..'][" + itostring + "]");
where the [i] index in the XPath is meant to select the i-th record from the received polling message.
Is this the correct way?
There are two ways^ to loop on multiple records contained within an XML message received by BizTalk:
Envelope Schemas
When you define the schema that represents the message, mark it as an Envelope Schema. This tells the Receive Pipeline Disassembler to create (and publish) one message to the BizTalk Message Box for each record in the incoming message (in your case from the WCF-SQL Adapter). This will cause a single Orchestration instance to be started for each record in your incoming message.
Richard Seroter has a great blog post on doing this from the WCF-SQL Adapter - http://seroter.wordpress.com/2010/04/08/debatching-inbound-messages-from-biztalk-wcf-sql-adapter/
Be aware that with this approach, you don't want to be de-batching tens of thousands of records from the incoming message as BizTalk will grind to a halt :-)
XPath Inside an Orchestration
If you do not use an Envelope Schema, a single Orchestration instance is started for the incoming message (containing multiple records). Within an Expression Shape in your Orchestration, you can use XPath (and some other magic) to loop around each record and extract each one into an Orchestration variable (which you can then map on, etc.).
Take a look at the following links that will help you with extracting via XPath:
http://social.technet.microsoft.com/wiki/contents/articles/6944.biztalk-orchestrations-xpath-survival-guide.aspx
http://www.biztalkgurus.com/biztalk_server/biztalk_blogs/b/biztalk/archive/2004/10/25/using-xpath-inside-biztalk-orchestrations.aspx
http://blog.eliasen.dk/2006/11/05/LoopingAroundElementsOfAMessage.aspx
http://www.codeproject.com/Articles/534627/BizTalk-Looping-through-repeating-message-nodes-in
^There is also a third way to achieve this as of BizTalk Server 2009 (I think - it seems like so long ago) whereby you can execute a Receive Pipeline within an Orchestration, so you could perform your Envelope de-batching in an Orch, instead of a Receive Location's Receive Pipeline.
Before I tackle this solution, I wanted to run it by the community to get feedback.
Questions:
Is my approach feasible? i.e. can it even be done this way?
Is it the right/most efficient solution?
If it isn’t the right solution, what would be a better approach?
Problems:
Need to send mass emails through the application.
The shared hosting server only permits a maximum of 500 emails to be sent per hour before we get labeled a spammer
Server timeout while sending batch emails
Proposed Solution:
Upon task submittal (i.e. the user provides all necessary email information using a form and frontend template, selects the target audience, etc.), the action will then:
Determine how many records (from a stored DB of contacts) the email will be sent to
If the number of records in #1 above is more than 400:
Assign a batch number to all these records in the DB.
Run a CRON job that:
Every hour, it selects 400 records in batch "X" and sends the saved email template, until there are no more records with batch "X". Each time a batch of 400 is sent, its batch number is erased (so it won't be selected again the following hour); see the SQL sketch after this list.
If there is an unfinished CRON JOB scheduled ahead of it (i.e. currently running), it will be placed in a queue.
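A sketch of what the hourly job's data access might look like (the contacts table, its columns, and the MySQL-style LIMIT syntax are assumptions for illustration):

-- Grab this hour's slice of batch "X".
SELECT id, email
FROM   contacts
WHERE  batch_number = 'X'
ORDER  BY id
LIMIT  400;

-- Once those 400 emails have been sent, erase their batch number so the next
-- hourly run moves on to the remaining contacts instead of re-sending.
UPDATE contacts
SET    batch_number = NULL
WHERE  batch_number = 'X'
ORDER  BY id
LIMIT  400;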
Other clarification:
To send these emails I simply iterate over the list with Swift Mailer using the following code:
foreach ($list as $record)
{
    mailers::sendMemberSpam($record, $emailParamsArray);
    // where the above simply contains: sfContext::getInstance()->getMailer()->send($message);
}
*where $list is the list of records with a batch_number of “X”.
I’m not sure this is the most efficient of solutions, because it seems to be bogging down the server, and will eventually time out if the list or email is long.
So, I’m just looking for opinions at this point... thanks in advance.
I have a BizTalk receive port monitoring an FTP location. I expect a file to arrive at least once per day in that location and for BizTalk to pick it up and kick off an orchestration. This part is working fine.
However, sometimes the sender fails to send a message during a day, in which case I want an email to be sent to notify the users that something is amiss.
I could solve this outside of BizTalk, by creating a daily job that looks in our database for processed files and makes sure there is at least one in any given day. However, I'd prefer to solve this "in line" with the BizTalk solution that is already in place, and not deploy a separate, unrelated job which will increase maintenance headaches.
Is there any functionality in BizTalk that would allow me to send a notification if a receive port doesn't receive something in a given timeframe?
Short answer: Not really.
The logic you want to implement would require a customised version of the FTP Adapter. It depends on how comfortable you are rolling up your sleeves and getting into the Adapter SDK.
If you wanted to keep your solution "Purely BizTalk", you could set up a secondary Orchestration using a SQL Receive Location tied to a stored procedure. This stored procedure executes regularly and looks for records in your "Processed File" table received in the past (business) day. If none are found, it fabricates a record and returns it via the SQL Receive Location. This would be your trigger to send the email notification.
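A rough sketch of that stored procedure, assuming a hypothetical dbo.ProcessedFile table with a ReceivedOn column (the names here are illustrative only):

CREATE PROCEDURE dbo.CheckDailyFileReceived
AS
BEGIN
    SET NOCOUNT ON;

    -- If no file was processed in the past day, fabricate a single record for
    -- the SQL receive location; receiving it is the trigger to send the
    -- notification email. Otherwise return nothing.
    IF NOT EXISTS (SELECT 1
                   FROM   dbo.ProcessedFile
                   WHERE  ReceivedOn >= DATEADD(DAY, -1, GETDATE()))
        SELECT 'FileMissing' AS Alert,
               GETDATE()     AS CheckedOn
        FOR XML PATH('Notification'), ROOT('MissingFileAlert');
END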
One solution, though not elegant, is to have a secondary FILE receive location with a schedule window outside your cutoff time.
Failure scenario:
In this FILE receive location, you have an intelligent/dummy message conforming to the same schema as the FTP receive. The intelligent part is that one of the fields in the message tells us the last time we received the file from FTP; the rest of the content is dummy.
Within your orchestration, you check where you received the file from. If it is the secondary receive location (using the context property BTS.ReceiveLocationName), you check the date field of this dummy/intelligent message; if it is not within the past 24 hours (or similar logic), you send an email notifying that you did not receive the file from the upstream FTP process, and you also save a copy of the dummy message back to the secondary FILE receive location unchanged.
Success Scenario:
Apart from normal processing, you save a copy of the dummy/intelligent message to the secondary FILE receive location, with the datetime field reflecting when you processed the file you received from FTP receive location.
Initialising:
You start with a dummy/intelligent message in the secondary FILE receive location with the datetime field value well in the past (assuming we never received the file from FTP) or with yesterday's date (assuming we received a file successfully from FTP the day before).
Overview:
Your orchestration has two trigger points:
When you receive a file via the FTP receive location.
When the scheduled FILE receive location fires after the cut-off time.