I have a problem I’m trying to solve with BizTalk and was hoping you would have some thoughts on the best way to solve it. I am loading in a flat file using a flat file adapter. This file contains records of a number of different types. Type 1 records are parent records; the rest of the types all link to the parent records using various foreign keys. I am looking to develop a transform that will take the message created by loading the flat file (with records of all types) and transform it to a message which is the result of joining all the records based on the foreign keys. Each record in the resulting message would therefore have columns from all of the record types.
For example, the flat file might contain the following records:
* Type1
* Type1
* Type2
* Type2
* Type3
* Type4
* Type4
The message after the transform might contain records like this:
* Type1 columns, type2 columns, type3 columns, type4 columns
* Type1 columns, type2 columns, type4 columns
I have been looking into the options for achieving this but was hoping some people might have advice on the way to go.
Things I have tried are:
Splitting the original message into messages with records of each type, then using a multi-source map to join them up. I had difficulty achieving this as I wasn’t sure which functoids to use.
Inserting the child records into a SQL database so that the additional columns in the parent records can be populated using SQL lookup functoids. This was slower than I had hoped and adds a dependency on a database.
Using DTS to load the file, transform it, and write out a file in the joined format for BizTalk to consume. This is simple and fast but doesn’t use BizTalk.
Any advice on how to go forward to make the best use of BizTalk for this would be appreciated.
This sounds more like an ETL problem. I'd suggest SSIS. I've used BizTalk for ETL in the past and it never ended well.
Maybe you could try writing custom XSLT for the mapping, using the so-called 'Muenchian method'. It's a technique for grouping XML input with keys in XSLT 1.0. The input in this case could be the XML representation of your flat file as provided by BizTalk.
See the Q2 2009 issue of HotRod magazine at http://biztalkhotrod.com/Documents/BizTalk_HotRod_Issue6_Q2_2009.pdf
The same issue also covers using custom XSLT in BizTalk.
Related
I have multiple flat files (CSV), each with multiple records, where the files will be received in random order. I have to combine their records using unique ID fields.
How can I combine them if there is no unique field common to all the files, and I don't know which file will be received first?
Here are some example files:
In reality there are 16 files, with far more fields and records than in this example.
I would avoid trying to do this purely in XSLT/BizTalk orchestrations/C# code. These are fairly simple flat files. Load them into SQL, and create a view to join your data up.
You can still use BizTalk to pickup/load the files. You can also still use BizTalk to execute the view or procedure that joins the data up and sends your final message.
There are a few questions that might help guide how this would work here:
When do you want to join the data together? What triggers that (a time of day, a certain number of messages received, a certain type of message, a particular record, etc)? How will BizTalk know when it's received enough/the right data to join?
What does a canonical version of this data look like? Does all of the data from all of these files truly get correlated into one entity (e.g. a "Trade" or a "Transfer" etc.)?
I'd probably start by defining my canonical entity, and then work out how to assemble a "complete" picture of that entity, using SQL for this kind of case.
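As a rough sketch of how the SQL side could look once each record type is staged in its own table (all table, column, and key names here are invented for illustration):
-- hypothetical staging tables, one per record type, loaded by BizTalk;
-- the view reassembles the canonical entity via the foreign keys
CREATE VIEW dbo.vCanonicalEntity
AS
SELECT  p.ParentId,
        p.ParentField,
        t2.Type2Field,
        t3.Type3Field,
        t4.Type4Field
FROM      dbo.Type1Staging AS p
LEFT JOIN dbo.Type2Staging AS t2 ON t2.ParentId = p.ParentId
LEFT JOIN dbo.Type3Staging AS t3 ON t3.ParentId = p.ParentId
LEFT JOIN dbo.Type4Staging AS t4 ON t4.ParentId = p.ParentId;
BizTalk could then poll this view, or a procedure wrapping it, once it has decided that the set of received files is complete.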
We have several pairs of databases that relate to each other, i.e.
db1a and db1b
db2a and db2b
db3a and db3b
etc.
where there are cross-db joins between db1a and db1b, db2a and db2b, etc.
Having an open sqlite database connection, they can be attached, i.e.
ATTACH 'db1a' as a; ATTACH 'db1b' as b;
and later detached and replaced with other pairs when needed.
How should the database "connection" be created, though, as there is originally no real database to attach to? Using the first one as the main database gives it much more significance than it deserves, and since the main database cannot be detached, it's a hindrance later on.
The only way I can see is opening a :memory: or a temporary ('') database connection. Is there some better option available?
In the absence of any better alternative, :memory: is the best choice. (:temp: is a normal file name, and invalid in many OSes; a temp DB would have an empty file name.)
If you have some meta database that lists all other databases, you could use that.
Please note that when you have multiple attached databases, any change to one of them will involve a multi-database transaction.
So if the various database pairs do not have any relationship with other databases with different numbers, consider using a new connection for each pair.
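A minimal sketch of the :memory: approach (the database file names follow the question; the table names are invented):
-- main database is ':memory:': empty and carries no special significance
ATTACH DATABASE 'db1a' AS a;
ATTACH DATABASE 'db1b' AS b;

-- cross-db join between the attached pair
SELECT i.id, d.note
FROM a.items AS i
JOIN b.details AS d ON d.item_id = i.id;

-- swap the pair out on the same connection when needed
DETACH DATABASE a;
DETACH DATABASE b;
ATTACH DATABASE 'db2a' AS a;
ATTACH DATABASE 'db2b' AS b;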
I am using sqlite3 (maybe sqlite4 in the future) and I need something like dynamic tables.
I have many tables with the same format: values_2012_12_27, values_2012_12_28, ... (the number of tables is dynamic) and I want to select at runtime the table that receives the data.
I am using sqlite3_prepare with INSERT INTO ? VALUES(?,?,?). Of course this fails to compile (syntax error near "?"). Is there a nice and simple way to do this in SQLite?
Thanks
Using SQL parameters is not possible for identifiers such as table or column names.
If you don't want to keep so many prepared statements around, just prepare them on the fly whenever you need one.
If your database were properly normalized, you would have a single big values table with an extra date column.
This organization is usually to be preferred, unless you have measured both and found that the better performance (if it actually exists) outweighs the overhead of managing multiple tables.
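A minimal sketch of the normalized design (column names are made up for illustration):
-- one table replaces values_2012_12_27, values_2012_12_28, ...
CREATE TABLE measurements (
    day TEXT NOT NULL,   -- e.g. '2012-12-27'
    a   REAL,
    b   REAL,
    c   REAL
);
CREATE INDEX measurements_day ON measurements(day);

-- the statement text passed to sqlite3_prepare; prepared once and
-- reused, only the bound parameter values ever change
INSERT INTO measurements(day, a, b, c) VALUES (?, ?, ?, ?);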
I have an ASP.NET data entry application that is used by multiple clients. The application consists of multiple data entry modules that are common to all clients.
I now have multiple clients that want their own custom module added which will typically consist of a dozen or so data points. Some values will be text, others numeric, some will be dropdown selections, etc.
I'm in need of suggestions for handling the data model for this. I have two thoughts on how to handle it. The first would be to create a new table for each new module for each client. This is pretty clean but I don't particularly like it. My other thought is to have one table with columns for each custom data point for each client. That table would end up with a lot of columns and a lot of NULL values. I don't really like either solution and suspect there's a better way to do this, so any feedback you have will be appreciated.
I'm using SQL Server 2008.
As always with these questions, "it depends".
The dreaded key-value table.
This approach relies on a table which lists the fields and their values as individual records.
CustomFields(clientId int, fieldName sysname, fieldValue varbinary)
Benefits:
Infinitely flexible
Easy to implement
Easy to index
Non-existent values take no space
Disadvantage:
Showing a list of all records with complete field list is a very dirty query
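To see why, here is what reassembling just two fields into a flat record could look like (the field names are invented); every additional field costs another self-join:
SELECT c.clientId,
       color.fieldValue AS Color,
       size.fieldValue  AS Size
FROM (SELECT DISTINCT clientId FROM CustomFields) AS c
LEFT JOIN CustomFields AS color
       ON color.clientId = c.clientId AND color.fieldName = 'Color'
LEFT JOIN CustomFields AS size
       ON size.clientId = c.clientId AND size.fieldName = 'Size';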
The Microsoft way
The Microsoft way of this kind of problem is "sparse columns" (introduced in SQL 2008)
Benefits:
Blessed by the people who design SQL Server
records can be queried without having to apply fancy pivots
Fields without data don't take space on disk
Disadvantages:
Many technical restrictions
Adding a new field requires DDL (an ALTER TABLE)
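A minimal sketch of the sparse-column approach (table and field names invented; SPARSE requires SQL Server 2008 or later):
CREATE TABLE dbo.ModuleEntries (
    entryId  int IDENTITY PRIMARY KEY,
    clientId int NOT NULL,
    -- client-specific data points; NULLs in sparse columns take no disk space
    AcmeRating    int          SPARSE NULL,
    ContosoRegion nvarchar(50) SPARSE NULL
);

-- the catch: every new client field is a schema change
ALTER TABLE dbo.ModuleEntries ADD ContosoScore decimal(9, 2) SPARSE NULL;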
The xml tax
You can add an xml field to the table which will be used to store all the "extra" fields.
Benefits:
unlimited flexibility
can be indexed
storage efficient (when it fits in a page)
With some xpath gymnastics the fields can be included in a flat recordset.
schema can be enforced with schema collections
Disadvantages:
not clearly visible what's in the field
XQuery support in SQL Server has gaps, which can make getting your data out a real nightmare at times
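A minimal sketch of the xml approach and the xpath gymnastics involved (all names invented):
CREATE TABLE dbo.ModuleData (
    recordId int IDENTITY PRIMARY KEY,
    clientId int NOT NULL,
    Extras   xml NULL   -- all the "extra" fields live here
);

INSERT INTO dbo.ModuleData (clientId, Extras)
VALUES (1, N'<fields><rating>4</rating><region>West</region></fields>');

-- flattening the xml back into a recordset with value()
SELECT recordId,
       Extras.value('(/fields/rating/text())[1]', 'int')          AS Rating,
       Extras.value('(/fields/region/text())[1]', 'nvarchar(50)') AS Region
FROM dbo.ModuleData;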
There are maybe more solutions, but to me these are the main contenders. Which one to choose:
key-value seems appropriate when the number of extra fields is limited. (say no more than 10-20 or so)
Sparse columns are more suitable for data with many properties that are filled out infrequently. They sound more appropriate when you can have many extra fields
xml column is very flexible, but a pain to query. Appropriate for solutions that write rarely and query rarely, i.e. don't run aggregates etc. on the data stored in this field.
I'd suggest you go with the first option you described. I wouldn't over think it. The second option you outlined would be a bad idea in my opinion.
If there are fields common to all the modules you're adding to the system you should consider keeping those in a single table then have other tables with the fields specific to a particular module related back to the primary key in the common table. This is basically table inheritance (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) and will centralize the common module data and make it easier to query across modules.
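A rough sketch of that layout (module and column names are invented):
-- common fields shared by every module, one row per entry
CREATE TABLE dbo.ModuleEntry (
    moduleEntryId int IDENTITY PRIMARY KEY,
    clientId      int NOT NULL,
    enteredAt     datetime NOT NULL DEFAULT GETDATE()
);

-- one table per custom module; its PK doubles as the FK to the common table
CREATE TABLE dbo.WidgetModuleEntry (
    moduleEntryId int PRIMARY KEY
        REFERENCES dbo.ModuleEntry (moduleEntryId),
    widgetText    nvarchar(100) NULL,
    widgetAmount  decimal(10, 2) NULL
);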
Ok, the basic situation: Due to a few mixed up starts, a project ends up with not one, but three separate databases, each containing a portion of the overall project data. All three databases are the same, it's just that, say 10% of the project was run into the first, then a new DB was made due to a code update and 15% of the project was run into the new one, then another code change required another new database for the rest of the project. Again, the pertinent tables are all exactly the same across all three databases.
Now, assume I wanted to take all three of those databases - bearing in mind that they can't just be combined into a single database due to primary key issues and so on - and run a single query that would look through all three of them, select a given set of data from each, then compile those three sets into one single result and return it to the reporting page I'm working on.
For reference, at its endpoint the data is output to an ASP.Net/VB.Net backed page, specifically a Gridview object. It doesn't need to be edited, fortunately, just displayed.
What would be the best way to approach this mess? I'm thinking that creating a temporary table would be my best bet, but honestly I'm stepping into a portion of SQL that I'm not familiar with here, and would appreciate any guidance somebody more experienced might have.
I'd say your best bet is to suck it up and combine the databases, even if it is a major pain to combine the primary keys. It may be a major pain now, but it is going to be 10x as painful over the life of the project.
You can do a union across multiple databases as Scott has pointed out, but you are in for a world of trouble as the application gets more complex. For example, even if you circumvent the technical limitations by having multiple tables/databases for the same entity, having duplicates in the PK for a logical entity is a world of trouble.
Implement the workaround solution if you must, but I guarantee you will hate yourself for it later.
Why not just use 3 part naming on the tables and union them all together?
select db1.dbo.Table1.Field1,
db1.dbo.Table1.Field2
from db1.dbo.Table1
UNION
select db2.dbo.Table1.Field1,
db2.dbo.Table1.Field2
from db2.dbo.Table1
UNION
select db3.dbo.Table1.Field1,
db3.dbo.Table1.Field2
from db3.dbo.Table1
-- where ...
-- order by ...
You should create what is called a Partitioned View for each of your tables of interest. These views union the underlying base tables together and, where needed, add a synthetic column to make the rows unique:
CREATE VIEW vTableXDB
AS
SELECT 'DB1' as db_key, *
FROM DB1.dbo.table
UNION ALL
SELECT 'DB2' as db_key, *
FROM DB2.dbo.table
UNION ALL
SELECT 'DB3' as db_key, *
FROM DB3.dbo.table;
You create one such view for each table and then design your reports on these views, not on the base tables. You must add the db_key to your join conditions. The query optimizer has some understanding of partitioned views and might be able to create plans that do the right thing and avoid joins that span multiple DBs, but that is not guaranteed. If things go haywire and the optimizer does not recognize the partitioning, resulting in very bad execution times, you may have to move the db_key into the tables themselves and add some artificial check constraints on the base tables so that the optimizer can understand the partitioning (see the article I linked for details).
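Should that become necessary, the change might look like this (the constraint names are invented, and [table] stands in for your real table name):
USE DB1;
ALTER TABLE dbo.[table]
    ADD db_key char(3) NOT NULL
        CONSTRAINT df_table_db_key DEFAULT 'DB1'
        CONSTRAINT ck_table_db_key CHECK (db_key = 'DB1');
-- repeat in DB2 and DB3 with their own literals, then select db_key
-- from the base table in each view branch instead of the constant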
You can actually join tables in different databases. For databases on another server, the syntax changes from "tablename.columnName" to "Server.Database.Owner.tablename.columnName". You will need to run some stored procedures as an admin to allow this connectivity. It's also pretty slow, but the effort to get it working is low.
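For reference, the setup and four-part naming would be along these lines (the server name and join column are invented; on a single instance, plain three-part names need no setup at all):
-- one-time setup for a remote SQL Server instance (needs admin rights)
EXEC sp_addlinkedserver @server = N'REMOTESRV';

-- four-part names: server.database.schema.object
SELECT a.Field1, b.Field2
FROM   db1.dbo.Table1 AS a
JOIN   REMOTESRV.db2.dbo.Table1 AS b
       ON b.Id = a.Id;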
If you have time to do it right look at data warehouse concepts. That's basically a temp table that collects the data you need to report on.
Building on Scott Ivey's excellent example above,
Use table name aliasing to simplify your code
Use UNION ALL instead of UNION assuming that your data is unique between the three databases
Code:
select
d1t1.Field1,
d1t1.Field2
from db1.dbo.Table1 AS d1t1
UNION ALL
select
d2t1.Field1,
d2t1.Field2
from db2.dbo.Table1 AS d2t1
UNION ALL
select
d3t1.Field1,
d3t1.Field2
from db3.dbo.Table1 AS d3t1
-- where ...
-- order by ...