Add file name as column in data factory pipeline destination

I am new to Data Factory. I am loading a bunch of CSV files into a table, and I would like to capture the name of the CSV file as a new column in the destination table.
Can someone please help with how I can achieve this? Thanks in advance.

If you use a Mapping Data Flow, there is an option under the source settings to store the name of the file being read. That column can later be mapped to a column in the Sink.

If your destination is Azure Table Storage, you could put the file name into the partition key column. Otherwise, I don't think there is a native way to do this with ADF; you may need a custom activity or a stored procedure (a rough sketch of the custom-activity logic follows the links below).

One post suggests using Databricks to handle this:
Data Factory - append fields to JSON sink
Another post uses U-SQL to handle this:
use adf pipeline parameters as source to sink columns while mapping
For the stored procedure approach, please see this post: Azure Data Factory mapping 2 columns in one column
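If the custom-activity route mentioned above is what you end up with, the core logic is small. Here is a minimal Python sketch, assuming the CSVs sit in a local folder named input/ and the combined output goes to output.csv (both paths are placeholders; a real custom activity would read from and write to your actual storage):

```python
import csv
import glob
import os

# Minimal sketch: read every CSV in ./input, append the source file name
# as an extra column, and write one combined CSV.
# Assumes all input files share the same columns; paths are placeholders.
with open("output.csv", "w", newline="") as out_file:
    writer = None
    for path in glob.glob(os.path.join("input", "*.csv")):
        file_name = os.path.basename(path)
        with open(path, newline="") as in_file:
            for row in csv.DictReader(in_file):
                row["FileName"] = file_name  # new column holding the source file name
                if writer is None:
                    writer = csv.DictWriter(out_file, fieldnames=list(row.keys()))
                    writer.writeheader()
                writer.writerow(row)
```

The same idea carries over to the stored-procedure option: pass the file name in as a parameter and write it alongside each row.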

Related

Debatch big input flat file into multiple smaller output files with a specific record count

I have a positional input flat file schema of the following kind.
<Employees>
<Employee>
<Data>
In the mapping, I need to extract the strings on a positional basis to pass on to the target schema.
I have the following conditions:
If Data has 500 records, there should be 5 files of 100 records at the output location.
If Data has 522 records, there should be 6 files (5*100, 1*22 records) at the output location.
I have tried a few suggestions from the internet, such as:
Setting “Allow Message Breakup At Infix Root” to “Yes” and setting maxOccurs to "100". This doesn't seem to be working. How to Debatch (Split) a Flat File using Flat File Schema ?
I'm also working on the custom receive pipeline component suggested at Split Flat Files into smaller files (on row count) using Custom Pipeline, but I'm quite new to this so it's taking some time.
Please let me know if there is a simpler way of doing this, without implementing the custom pipeline component.
The approach I'm following is to divide the input flat file into multiple small files as per the condition, write them to the receive location, and then process the files with the native flat file disassembler. Please correct me if there is a better approach.
You have two options:
Import the flat file to a SQL table using SSIS.
Parse the input file as one message, then map it to a Composite Operation to insert the records into a SQL table. You could also use an Insert Updategram.
After either 1 or 2, call a Stored Procedure to retrieve the Count and Order of messages you need.
A simple way for a flat file structure without writing custom C# code is to just use a Database table. Just insert the whole file as records into the table, and then have a Receive Location that polls for records in the batch size you want.
Another approach is the Scatter Gather pattern. In this case you set Occurs to 1, which will debatch the file into individual records, and you then have an Orchestration that re-assembles them into the batch size you want. You will have to read up on Correlation Sets to do this.
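Outside of BizTalk, the splitting rule from the question is easy to state in code, which may help when validating whichever approach you pick. A minimal Python sketch, assuming a plain-text input file input.txt with one record per line (the file names are placeholders):

```python
# Minimal sketch: split one flat file into chunks of at most 100 records each.
CHUNK_SIZE = 100

with open("input.txt") as source:
    records = [line for line in source if line.strip()]  # skip blank lines

for index in range(0, len(records), CHUNK_SIZE):
    chunk = records[index:index + CHUNK_SIZE]
    with open(f"output_{index // CHUNK_SIZE + 1}.txt", "w") as target:
        target.writelines(chunk)
```

With 522 non-blank records this produces output_1.txt through output_6.txt, the last file holding the remaining 22 records.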

Consuming multiple csv files describing a nested structure in BizTalk

I have a requirement to consume a csv "dataset" consisting of 3 flat files - a control file, a headers file, and a line file - which together define a nested data structure.
The control file items have a field called ControlID, which can be used in the headers file to identify those header records which "belong" to that control item.
The header records have a field called HeaderID, which can be used in the lines file to identify those line records which "belong" to a given header record.
I'd like to consume all three files and then map them into some kind of nested schema structure. My question is how would I do this? Can I do it in a pipeline component?
I would look at two options. Both involve correlating all three files to an Orchestration using a Parallel Convoy.
Use a Multi-input Map to join the files. You should be able to use the HeaderID as a filter, with the Equal function, to match the lines to their header.
Use a SQL Stored Procedure to group the data as described here: BizTalk: Sorting and Grouping Flat File Data In SQL Instead of XSL
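For reference, the grouping both options aim at can be expressed in a few lines outside BizTalk. A minimal Python sketch, assuming three files named control.csv, headers.csv and lines.csv with ControlID and HeaderID columns as described (the file names are assumptions):

```python
import csv
from collections import defaultdict

def read_csv(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

controls = read_csv("control.csv")
headers = read_csv("headers.csv")
lines = read_csv("lines.csv")

# Group line records under their header, and header records under their control item.
lines_by_header = defaultdict(list)
for line in lines:
    lines_by_header[line["HeaderID"]].append(line)

headers_by_control = defaultdict(list)
for header in headers:
    header["Lines"] = lines_by_header[header["HeaderID"]]
    headers_by_control[header["ControlID"]].append(header)

# Nested structure: each control item carries its headers, each header its lines.
dataset = [
    {**control, "Headers": headers_by_control[control["ControlID"]]}
    for control in controls
]
```

Each element of dataset is a control record carrying its headers, and each header carries its lines, which mirrors the nested schema structure you would map to.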

Extracting policy data into csv file

I want to take data from the GX model and save it into a CSV file. These are the challenges I am facing:
How to store the gxmodel file into an object.
After storing the object, what are the ways to store it in a CSV file?
Question 1:
If my understanding is correct, you have an XML input and you need to store the data into a POJO class (object).
If that is the requirement, you need to parse the XML, pick each value, and map it to the POJO.
If it is a GX model, you might also have the schema (XSD); in that case GW auto-generates it (in version 9), and you can directly access the XML content through the instance created by GW for the particular GX model. The rest is mapping, which won't be a big deal.
Question 2:
With a ListIterator we can directly export the content to CSV (or any other) format. You can refer to the Batch History Export functionality for the same.
Many JARs are also available for this.
CSV just means comma-separated values; you can even create the file with the same logic and save it with a .csv extension.
Hope this makes sense. :)
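As a generic illustration of the second question (not Guidewire-specific), writing a set of records out as comma-separated values needs nothing more than a CSV library. A minimal Python sketch with made-up policy fields:

```python
import csv

# Hypothetical records, standing in for the values pulled off the GX model instance.
policies = [
    {"PolicyNumber": "P-1001", "Insured": "Alice", "Premium": "1200.00"},
    {"PolicyNumber": "P-1002", "Insured": "Bob", "Premium": "950.50"},
]

# Write them out as comma-separated values with a header row.
with open("policies.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["PolicyNumber", "Insured", "Premium"])
    writer.writeheader()
    writer.writerows(policies)
```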

BizTalk - Delete without a schema

I am importing a file with 200+ records into a master table.
The BizTalk package only services one source, other packages service other sources
I am using strongly typed stored procedures for all SQL CRUD
All records inside the file come from the same source
The file does not contain source name or source Id
I want to determine the source from a hard-coded value in the package
The Master table contains records from several sources
Before the import: delete the existing records from that source inside the Master table
Unlike the file import, the delete statement happens once
DELETE FROM Master WHERE SourceID = #SourceID
The file import works, but how can I hard code the delete source ID?
In your delete transform (just above the send shape) you can set up a SourceID property for the outgoing message. You can then populate the message context with this SourceID. This SourceID can then be used in your delete statement.
If I understand correctly, you want to delete all existing records for the SourceID before inserting new ones?
If so, you need to have access to the SourceID value on the inbound message into the orchestration.
To do this, use property promotion.
You can either do this:
inside a pipeline component configured on the receive port, so that the property is available when the message arrives at the orchestration, or,
inside the orchestration, which will require moving the construct shape for the InsertCSV message above the delete construct shape, and promoting the property within that construct shape.
Of these options, the first is probably the best, as assigning properties should ideally be done during message disassembly.
Alternatively, you can use an xpath() call within an Expression shape to interrogate the message using xpath, and retrieve the value like that. This way you can avoid thinking about property promotion.
However, while quicker to implement, this approach is not best practice because it makes your orchestration very sensitive to changes in the message schema.
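Whichever way the SourceID reaches the orchestration, the database side is the usual delete-then-insert pattern with a single parameter. A minimal Python sketch of that pattern using an in-memory SQLite table (the Master table and SourceID column follow the question; everything else is illustrative):

```python
import sqlite3

SOURCE_ID = 42  # hard-coded value for this package/source (illustrative)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Master (SourceID INTEGER, Payload TEXT)")

# Step 1: delete the existing records for this source (runs once per import).
conn.execute("DELETE FROM Master WHERE SourceID = ?", (SOURCE_ID,))

# Step 2: insert the freshly imported records, stamped with the same SourceID.
imported_rows = [("record 1",), ("record 2",)]
conn.executemany(
    "INSERT INTO Master (SourceID, Payload) VALUES (?, ?)",
    [(SOURCE_ID, payload) for (payload,) in imported_rows],
)
conn.commit()
```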

Pentaho Kettle: Mailing the result of a transformation

I have a Kettle job and a transformation.
The transformation writes the result set of a SELECT SQL statement into a CSV file.
The job takes the result file and mails it to the user.
I need to send the mail only if the file contains any data; otherwise the result should not be mailed to the user.
Alternatively, how can I find out whether the result of a transformation is empty or not (is there any file size validator job entry available)?
I am not able to find any job entries for this kind of condition.
Thanks in advance.
You can use the Evaluate files metrics job entry in the Conditions branch. Set your condition on the Advanced tab.
Alternatively, you can set your transformation to generate the file only if there is data, and then use the File exists? step in your main job.
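If it helps to see the condition spelled out, here is a minimal Python sketch of the "mail only when the file has data" check, assuming the transformation wrote result.csv with a header row (the file name and the mail call are placeholders):

```python
import csv
import os

RESULT_FILE = "result.csv"  # placeholder for the file produced by the transformation

def has_data(path):
    """True if the CSV exists and contains at least one row besides the header."""
    if not os.path.exists(path):
        return False
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return len(rows) > 1

if has_data(RESULT_FILE):
    print("File has data: send the mail here")  # placeholder for the mail step
else:
    print("Empty result: skip the mail")
```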
