How to Combine multiple files in BizTalk?

I have multiple flat files (CSV, each with multiple records) that are received in random order. I have to combine their records using unique ID fields.
How can I combine them if there is no unique field common to all files, and I don't know which one will be received first?
Here are some example files:
In reality there are 16 files.
There are many more fields and records than in this example.

I would avoid trying to do this purely in XSLT, BizTalk orchestrations, or C# code. These are fairly simple flat files: load them into SQL and create a view to join your data.
You can still use BizTalk to pick up and load the files. You can also still use BizTalk to query the view (or call a stored procedure) that joins the data, and to send your final message.
There are a few questions that might help guide how this would work here:
When do you want to join the data together? What triggers that (a time of day, a certain number of messages received, a certain type of message, a particular record, etc)? How will BizTalk know when it's received enough/the right data to join?
What does a canonical version of this data look like? Does all of the data from all of these files truly get correlated into one entity (e.g. a "Trade" or a "Transfer" etc.)?
I'd probably start by defining the canonical entity, and then, for this kind of case, use SQL to build a "complete" picture of that entity.
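A minimal sketch of the "load into SQL, join with a view" idea, using Python's built-in sqlite3 as a stand-in for SQL Server (BizTalk itself is not modeled). The file contents, table names and ID columns (OrderID, CustomerID, RegionID) are hypothetical; the point is that no ID has to be common to all files, as long as each file shares some ID with another so the view can chain the joins, and arrival order stops mattering once every file lands in its own staging table.

```python
import csv
import io
import sqlite3

# Hypothetical sample files: no single ID is shared by all three, but
# each file shares at least one ID with another, so they can be chained.
ORDERS_CSV = "OrderID,CustomerID,Amount\n1,C1,100\n2,C2,250\n"
CUSTOMERS_CSV = "CustomerID,RegionID,Name\nC1,R1,Alice\nC2,R2,Bob\n"
REGIONS_CSV = "RegionID,RegionName\nR1,North\nR2,South\n"

def load_csv(conn, table, text):
    """Create a staging table from the CSV header and insert every record."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[1:]
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', records)

conn = sqlite3.connect(":memory:")
# Arrival order does not matter: each file only fills its own table.
for table, text in [("regions", REGIONS_CSV), ("orders", ORDERS_CSV),
                    ("customers", CUSTOMERS_CSV)]:
    load_csv(conn, table, text)

# The view chains the joins through whichever IDs each pair of files shares.
conn.execute("""
CREATE VIEW combined AS
SELECT o.OrderID, o.Amount, c.Name, r.RegionName
FROM orders o
JOIN customers c ON c.CustomerID = o.CustomerID
JOIN regions r ON r.RegionID = c.RegionID
""")

combined = conn.execute("SELECT * FROM combined ORDER BY OrderID").fetchall()
print(combined)
```

Only records that can be matched across all files appear in the view; deciding when that picture is "complete" is the trigger question raised above.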

Related

Combine file records with repeated IDs in BizTalk

I have another case where I don't know how to find a solution with BizTalk.
I have these two flat files (in reality there are 9 files to combine), and the output must look like the one shown in the picture:
How can I combine files whose IDs repeat several times in the main file?
In the picture below, the main file is "People". Is there a way to do this in BizTalk without writing any code, or must I store the data in a SQL database and then join it with a stored procedure?
Can you help me lay out the steps I need to take? I know how to combine files together, but only when the IDs are not repeated.
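If the data does end up in a SQL database, an ID that repeats in the main file is not a special case for a join: every "People" row simply picks up the matching record from the other file. A minimal sketch with Python's sqlite3; the table and column names (people, details, PersonID, Phone, Name) are hypothetical, since the original files are only shown in a picture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: PersonID P1 appears twice in the main file.
conn.executescript("""
CREATE TABLE people (PersonID TEXT, Phone TEXT);
CREATE TABLE details (PersonID TEXT, Name TEXT);
INSERT INTO people VALUES ('P1', '111'), ('P1', '222'), ('P2', '333');
INSERT INTO details VALUES ('P1', 'Alice'), ('P2', 'Bob');
""")

# LEFT JOIN keeps every row of the main file, repeated IDs included.
rows = conn.execute("""
SELECT p.PersonID, d.Name, p.Phone
FROM people p
LEFT JOIN details d ON d.PersonID = p.PersonID
ORDER BY p.PersonID, p.Phone
""").fetchall()
print(rows)
```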

2sxc: Merge multiple streams into the Default one

I have an app that can have one or more streams.
Example:
Book of author A
Book of author B
Book of author C
So my queries can have one or more relationship filters.
I would like to use a single template for several views. Since a view can have several streams, I can't hard-code the name of each one in my template. How can I do that?
Basically, I would like my template to get a single list even when there are several streams:
AsDynamic(Data["Default"]) //This should get all the streams in my data
Is that possible? Maybe aggregating them in Visual query?
I tried to create one out-stream from several streams, but when I gave them the same name I got an error.
At the moment this is not possible (2sxc 8.5.6). There are a few problems with this idea:
the same item could occur multiple times, which is not supposed to happen in a stream
you will probably lose the "which author was this for" information
For now, if this is what you need, I recommend merging them yourself, either in JavaScript or in server-side code.
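Merging server-side means handling the two problems listed above yourself. A minimal sketch in Python (this is not the 2sxc API): each stream is modeled as a plain list of dicts with a hypothetical "Id" field, duplicates are collapsed to one entry, and every item remembers which stream(s) it came from, so the author information is kept.

```python
def merge_streams(streams):
    """Merge named streams into one list, keeping each item only once.

    streams: dict mapping a stream name to a list of item dicts that
    carry a unique "Id" (hypothetical shape). Each merged item gets a
    "Sources" list naming every stream it appeared in.
    """
    merged = {}
    for name, items in streams.items():
        for item in items:
            # First occurrence creates the entry; later ones only add a source.
            entry = merged.setdefault(item["Id"], {**item, "Sources": []})
            entry["Sources"].append(name)
    return list(merged.values())

streams = {
    "AuthorA": [{"Id": 1, "Title": "Book 1"}, {"Id": 2, "Title": "Book 2"}],
    "AuthorB": [{"Id": 2, "Title": "Book 2"}, {"Id": 3, "Title": "Book 3"}],
}
books = merge_streams(streams)
for book in books:
    print(book["Id"], book["Title"], book["Sources"])
```

Items keep the order in which they were first seen, which is usually what a single template expects.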

Best format for regularly storing incremental data using R

I have a database that is used to store transactional records: these records are created, and another process picks them up and then removes them. Occasionally this process breaks down and the number of records builds up. I want to set up a (semi-)automated way to monitor this, and as my tool set is limited and I have an R-shaped hammer, this looks like an R-shaped nail problem.
My plan is to write a short R script that will query the database via ODBC, and then write a single record with the datetime, the number of records in the query, and the datetime of the oldest record. I'll then have a separate script that will process the data file and produce some reports.
What's the best way to create my data file? At the moment my options are:
Load a dataframe, add the record and then resave it
Append a row to a text file (i.e. a csv file)
Any alternatives, or a recommendation?
I would be tempted by the second option: semantically, you don't need the old entries to write the new ones, so there is no reason to reload all the data each time. Doing so would also cost more time and resources.
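The append-only approach fits in a few lines. The question is about R, so this is only a language-neutral sketch in Python: sqlite3 stands in for the ODBC connection, and the table pending_records, its created_at column and the file name queue_monitor.csv are hypothetical. In R the same idea would use write.table() with append = TRUE.

```python
import csv
import os
import sqlite3
import tempfile
from datetime import datetime

def snapshot(conn):
    """Return (check time, number of pending records, oldest record time)."""
    count, oldest = conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM pending_records"
    ).fetchone()
    return datetime.now().isoformat(timespec="seconds"), count, oldest

def append_snapshot(path, row):
    """Append one row; write the header only when the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["checked_at", "record_count", "oldest_record"])
        writer.writerow(row)

# Demo against an in-memory stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pending_records (id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO pending_records VALUES (?, ?)",
                 [(1, "2024-01-01T09:00:00"), (2, "2024-01-01T10:30:00")])

log_path = os.path.join(tempfile.mkdtemp(), "queue_monitor.csv")
append_snapshot(log_path, snapshot(conn))  # e.g. run from a scheduler
append_snapshot(log_path, snapshot(conn))

with open(log_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows[0], len(rows))
```

The reporting script can then read the whole file in one go, while the monitoring script never has to.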

Should I use Wordpress Transient API in this case?

I'm writing a simple Wordpress plugin for work and am wondering if using the Transients API is practical in this case, or if I should seek out another way.
The plugin's purpose is simple. I'm making a call to the USZip Web Service (http://www.webservicex.net/uszip.asmx?op=GetInfoByZIP) to retrieve data. Our sales team uses a Lead Intake sheet that the plugin will run on.
I wanted to reduce the number of API calls, so I thought of setting one transient per zip code, using the zip code as the key and storing the returned data (city and zip). If the data for a given zip code already exists, there is no need to make an API call.
Here are my concerns:
1. After a quick search, I realized that transient data is stored in the wp_options table, and storing this data would balloon that table in no time. Would this cause a significant performance issue if the table becomes huge?
2. Is it horrible practice to create this many transient keys? There could easily be thousands of them within a few months.
If using Transient is not the best way, could you please help point me in the right direction? Thanks!
P.S. I opted for the Transients API over the Options API. I know zip codes don't change often, but they sometimes do, so I set an expiration time of 3 months.
A less-inflated solution would be:
Store a single option called uszip with a serialized array inside the option
Grab the entire array each time and simply check if the zip code exists
If it doesn't exist, grab the data and save the whole array again
Make sure you don't hit the upper bound of a serialized array in this table (9,000 elements), considering that 43,000 zip codes exist in the US. However, you will most likely only need a fairly localized subset of zip codes.
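The single-option approach can be sketched as follows. It is written in Python rather than PHP so that it is self-contained: a plain dict plays the role of the wp_options table, JSON stands in for PHP serialization, and fake_fetch replaces the USZip call. None of these names are WordPress functions; in a real plugin the storage calls would be get_option and update_option.

```python
import json

class ZipCache:
    """Keep every looked-up zip code in one serialized "option" value.

    `store` is a hypothetical stand-in for wp_options (one key, one
    serialized blob); `fetch` is a hypothetical stand-in for the web
    service call.
    """

    OPTION_NAME = "uszip"

    def __init__(self, store, fetch):
        self.store = store
        self.fetch = fetch

    def lookup(self, zip_code):
        # Grab the entire array each time and check for the zip code.
        cached = json.loads(self.store.get(self.OPTION_NAME, "{}"))
        if zip_code in cached:
            return cached[zip_code]
        # Cache miss: call the service, then save the whole array again.
        cached[zip_code] = self.fetch(zip_code)
        self.store[self.OPTION_NAME] = json.dumps(cached)
        return cached[zip_code]

calls = []

def fake_fetch(zip_code):
    calls.append(zip_code)
    return {"city": "Example City", "zip": zip_code}

options = {}
cache = ZipCache(options, fake_fetch)
cache.lookup("90210")
cache.lookup("90210")  # served from the cached array, no second call
cache.lookup("10001")
print(calls, len(json.loads(options["uszip"])))
```

The trade-off is one growing row instead of many small ones: reads load the whole array, which stays cheap only while the set of zip codes remains small.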

How to set up a data model for a customizable application

I have an ASP.NET data entry application that is used by multiple clients. The application consists of multiple data entry modules that are common to all clients.
I now have multiple clients that want their own custom module added which will typically consist of a dozen or so data points. Some values will be text, others numeric, some will be dropdown selections, etc.
I need suggestions for handling the data model for this. I have two ideas on how to handle it. The first would be to create a new table for each new module for each client. This is pretty clean, but I don't particularly like it. My other idea is to have one table with columns for each custom data point for each client. This table would end up with a lot of columns and a lot of NULL values. I don't really like either solution and suspect there's a better way to do this, so any feedback you have will be appreciated.
I'm using SQL Server 2008.
As always with these questions, "it depends".
The dreaded key-value table.
This approach relies on a table which lists the fields and their values as individual records.
CustomFields(clientId int, fieldName sysname, fieldValue varbinary(max))
Benefits:
Infinitely flexible
Easy to implement
Easy to index
Non-existent values take no space
Disadvantage:
Listing all records with their complete field list requires a very dirty query
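To show where the "dirty query" comes from, here is a minimal key-value sketch using Python's sqlite3. It adds a hypothetical recordId column next to clientId so that each data-entry record can be reassembled, and stores values as TEXT for readability; the field names Color and Size are also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (record, field); a missing value is simply a missing row.
conn.executescript("""
CREATE TABLE CustomFields (
    clientId INTEGER, recordId INTEGER, fieldName TEXT, fieldValue TEXT
);
CREATE INDEX ix_client_field ON CustomFields (clientId, fieldName);
INSERT INTO CustomFields VALUES
  (1, 1, 'Color', 'Red'), (1, 1, 'Size', 'L'),
  (1, 2, 'Color', 'Blue');
""")

# Flattening back to one row per record needs a pivot: one conditional
# aggregate per field, and the field list must be known up front.
rows = conn.execute("""
SELECT clientId, recordId,
       MAX(CASE WHEN fieldName = 'Color' THEN fieldValue END) AS Color,
       MAX(CASE WHEN fieldName = 'Size'  THEN fieldValue END) AS Size
FROM CustomFields
GROUP BY clientId, recordId
ORDER BY clientId, recordId
""").fetchall()
print(rows)
```

Every new custom field means another CASE line (or dynamically generated SQL), which is why the complete-list query gets ugly.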
The Microsoft way
Microsoft's answer to this kind of problem is "sparse columns" (introduced in SQL Server 2008)
Benefits:
Blessed by the people who design SQL Server
records can be queried without having to apply fancy pivots
Fields without data don't take space on disk
Disadvantage:
Many technical restrictions
a new field requires DDL (an ALTER TABLE)
The xml tax
You can add an xml field to the table which will be used to store all the "extra" fields.
Benefits:
unlimited flexibility
can be indexed
storage efficient (when it fits in a page)
With some XPath gymnastics, the fields can be included in a flat recordset.
schema can be enforced with schema collections
Disadvantages:
not clearly visible what's in the field
XQuery support in SQL Server has gaps, which sometimes makes getting your data out a real nightmare
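In SQL Server itself the flattening would use the xml data type's value() and nodes() methods. As a portable illustration only, this Python sketch keeps the extra fields as XML text in a sqlite3 column and flattens them with ElementTree path lookups. The table Entries, its columns and the field names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Entries (id INTEGER, client TEXT, extra TEXT);
INSERT INTO Entries VALUES
  (1, 'ClientA', '<fields><Color>Red</Color><Size>L</Size></fields>'),
  (2, 'ClientA', '<fields><Color>Blue</Color></fields>');
""")

def flatten(row, field_names):
    """Turn the fixed columns plus the XML extras into one flat tuple."""
    entry_id, client, extra = row
    root = ET.fromstring(extra)
    # findtext returns None when the element is absent.
    return (entry_id, client) + tuple(root.findtext(n) for n in field_names)

flat = [flatten(r, ["Color", "Size"])
        for r in conn.execute("SELECT id, client, extra FROM Entries ORDER BY id")]
print(flat)
```

Note that the caller must already know which fields to look for, which reflects the "not clearly visible what's in the field" drawback.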
There are maybe more solutions, but to me these are the main contenders. Which one to choose:
Key-value seems appropriate when the number of extra fields is limited (say, no more than 10-20 or so).
Sparse columns are more suitable for data with many properties that are filled in infrequently, so they sound more appropriate when you can have many extra fields.
An xml column is very flexible, but a pain to query. It is appropriate for solutions that write rarely and query rarely, i.e. don't run aggregates etc. on the data stored in this field.
I'd suggest you go with the first option you described. I wouldn't overthink it. The second option you outlined would be a bad idea in my opinion.
If there are fields common to all the modules you're adding to the system you should consider keeping those in a single table then have other tables with the fields specific to a particular module related back to the primary key in the common table. This is basically table inheritance (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) and will centralize the common module data and make it easier to query across modules.
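A minimal sketch of the table-inheritance layout, using Python's sqlite3 so it runs anywhere (the question targets SQL Server 2008, where the DDL would look similar). The table names ModuleEntry and ClientAInspection and all their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Fields shared by every module live in one table...
CREATE TABLE ModuleEntry (
    EntryId    INTEGER PRIMARY KEY,
    ClientId   INTEGER NOT NULL,
    ModuleName TEXT NOT NULL,
    EnteredOn  TEXT NOT NULL
);
-- ...and each custom module gets its own table keyed by the same id.
CREATE TABLE ClientAInspection (
    EntryId   INTEGER PRIMARY KEY REFERENCES ModuleEntry(EntryId),
    Inspector TEXT,
    Score     INTEGER
);
INSERT INTO ModuleEntry VALUES (1, 10, 'Inspection', '2024-01-01'),
                               (2, 20, 'Survey', '2024-01-02');
INSERT INTO ClientAInspection VALUES (1, 'Dana', 87);
""")

# Queries across modules only need the common table.
per_client = conn.execute(
    "SELECT ClientId, COUNT(*) FROM ModuleEntry GROUP BY ClientId ORDER BY ClientId"
).fetchall()

# Module-specific detail is one join away.
detail = conn.execute("""
SELECT e.EntryId, e.ClientId, i.Inspector, i.Score
FROM ModuleEntry e JOIN ClientAInspection i ON i.EntryId = e.EntryId
""").fetchall()
print(per_client, detail)
```

Adding a client module then means adding one narrow table, without touching the shared one.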
