Is there a way to compare a file record vs. a table record without creating a new mapping in Informatica?

I'm working on a scenario where I have to compare a data record coming from a file with the data from a table, as part of a validation check before loading the data file into the staging table. I have come up with a couple of possible approaches, but they all involve changing something within the load mapping, and my team suggested I make the change somewhere that is easy to notice, since it is a non-standard approach.
Is there any approach we can handle within the Workflow Manager, using any of the workflow tasks or session properties?

Create a mapping that reads the file, joins the data with the table, performs the required validation, writes nothing out (use a Filter with a FALSE condition), and sets a workflow variable to 0/1 to indicate whether the load should start.
Next, run the loading session if the validation passed.
This can be improved a bit if you want to store the validation errors in some audit table. Then you don't need a variable - the condition can refer to the $PMTargetName#numAffectedRows built-in variable. If it's more than zero - meaning there were some errors - don't start the load.

Create a workflow with a Command task that runs a script which pulls the data from the table over a JDBC connection, compares it with the data present in the file, and then flags whether to load or not.
Based on this command's output, decide whether to go ahead with the staging workflow or not.
Use awk commands for the comparison of the data; they give you the flexibility to compare date parts within a column.
FYR: http://www.cs.unibo.it/~renzo/doc/awk/nawkA4.pdf
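If you go the scripting route but would rather not use awk, here is a minimal Python sketch of the same idea. It is only a sketch: the connection (pyodbc/ODBC standing in for the JDBC connection mentioned above), the table, the file name, the delimiter and the key/date columns are all assumptions to be replaced with your own.

import csv
import sys

import pyodbc  # stands in for the JDBC connection; assumes an ODBC DSN is available

# Hypothetical connection and object names - replace with your own.
conn = pyodbc.connect("DSN=STAGING_DB")
cur = conn.cursor()
cur.execute("SELECT cust_id, load_date FROM ref_table")
table_rows = {str(row.cust_id): str(row.load_date)[:10] for row in cur.fetchall()}

mismatches = 0
with open("input_file.dat", newline="") as f:
    for rec in csv.reader(f, delimiter="|"):
        if len(rec) < 2:
            continue
        cust_id, load_date = rec[0], rec[1][:10]  # compare only the date part, as you would with awk
        if table_rows.get(cust_id) != load_date:
            mismatches += 1

# The exit code is what the Command task / link condition reacts to:
# 0 = validation passed, start the staging workflow; 1 = do not load.
sys.exit(0 if mismatches == 0 else 1)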

Related

Tracking structure changes in a Progress database

I am asked to automate the tracking of changes in the structure of the database: Any modification, addition or removal of tables, fields, indexes, etc.
I have searched the audit but only found that it can track changes in the "Database schema", which is something else.
Do you know if it is possible to do that?
We use 11.6.3.
One wonders how those magical changes in the schema occur (I think you clarified that it was actually schema changes you wanted to automate). Optionally, it could be up to those making the changes to also keep track of them. Usually (hopefully) the database is updated using "delta df-files". Those df-files, if kept, are a changelog of the database.
Another option is to dump the data definitions daily/hourly/weekly:
CREATE ALIAS DICTDB FOR DATABASE sports.
DISPLAY LDBNAME("DICTDB").
RUN prodict/dump_df.p ("ALL",
"c:/temp/sports.df",
"").
DELETE ALIAS DICTDB. /* Optional */
Taken from this entry in the knowledge base: https://community.progress.com/s/article/15884
Then you can diff that df-file against the previous dump using your favorite tool, or keep it as it is.
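If you also want the diff automated, a minimal sketch that compares today's dump against the previous one (the file names are assumptions; any non-empty diff means the data definitions changed):

import difflib

# Hypothetical file names for two consecutive dumps of the data definitions.
with open("sports_yesterday.df") as old, open("sports_today.df") as new:
    diff = list(difflib.unified_diff(
        old.readlines(), new.readlines(),
        fromfile="sports_yesterday.df", tofile="sports_today.df",
    ))

if diff:
    # Every line printed here is a schema change since the last dump.
    print("".join(diff))
else:
    print("No schema changes since the last dump.")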
If you actually mean the structure (that is more about how the data is stored in different files on disk), you can use the prostrct command to save a new st-file to disk:
prostrct list sports
This will save a file called sports.st. Handle it as above and you will have a changelog of the database structure.

Re-executing a CREATE FUNCTION script using ADF is not reflected in the output of .show journal

I have a create-function script (.create-or-alter function) that I am submitting to an ADX cluster every 5 minutes using Azure Data Factory (ADF).
I keep firing the .show journal command to detect whether this was executed. The first time ADF submitted this script, when the function was not already there, the function got created and I could even see its entry in the output of .show journal. But after that I could not see an 'ADD-FUNCTION' event in the latest output of .show journal, even though I kept checking for a long time, and during this time the pipeline has been succeeding.
I don't understand why, if the pipeline is successfully submitting the existing create-function script without any change, ADX is not letting it go through.
If I open the existing function script in Kusto Explorer and just re-execute it without any change, it is reflected in .show journal; ADF is logically doing the same thing, yet that is not reflected in .show journal.
Just to experiment, I dropped this function using Kusto Explorer.
So the next time the ADF pipeline ran, it created the function again, and that entry was reflected in the output of .show journal.
Does it mean that whenever we re-submit the create-function script from ADF to ADX, ADF probably checks whether the function definition has changed and, if not, ignores the command?
But then this check is not performed when we do the same thing from Kusto Explorer, which is strange.
ADX behavior should not change depending on how we are submitting commands.
Another interesting fact is that this behavior is unique to functions:
I also tested re-creating the same update policy for a table through ADF again and again without any change, and every time it shows up in the output of .show journal.
Is this behavior a feature or a bug in the case of functions?
From the ADX service's perspective, when you execute an .alter function or a .create-or-alter command that results in an existing function having the exact same body, parameters, folder and docstring - the command does nothing, and therefore nothing is written to the journal.
If you're seeing differently, I would recommend that you open a support ticket.
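If you want to check from a script what the journal actually recorded around each ADF run, one hedged option is the azure-kusto-data Python package; the cluster URI, database name and time window below are assumptions:

from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Hypothetical cluster and database - replace with your own.
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(
    "https://mycluster.westeurope.kusto.windows.net")
client = KustoClient(kcsb)

# .show journal output can be piped into query operators, so filter for
# function-related events in, say, the last day.
command = (".show journal "
           "| where Event in ('ADD-FUNCTION', 'DROP-FUNCTION') "
           "| where EventTimestamp > ago(1d)")
response = client.execute_mgmt("MyDatabase", command)

for row in response.primary_results[0]:
    print(row["EventTimestamp"], row["Event"], row["EntityName"])

If nothing shows up for a run in which the function body was unchanged, that matches the behavior described above.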

Debatch a big input flat file into multiple smaller output files with a specific record count

I have a positional input flat file schema of the following kind.
<Employees>
<Employee>
<Data>
In the mapping, I need to extract the strings on a positional basis to pass on to the target schema.
I have the following conditions -
If Data has 500 records, there should be 5 files of 100 records at the output location.
If Data has 522 records, there should be 6 files (5*100, 1*22 records) at the output location.
I have tried a few suggestions from the internet, like setting "Allow Message Breakup At Infix Root" to "Yes" and setting Max Occurs to "100". This doesn't seem to be working: How to Debatch (Split) a Flat File using Flat File Schema?
I'm also working on the custom receive pipeline component suggested at Split Flat Files into smaller files (on row count) using Custom Pipeline, but I'm quite new to this so it's taking some time.
Please let me know if there is any simpler way of doing this, without implementing the custom pipeline component.
I'm currently following the approach of dividing the input flat file into multiple small files as per the condition, writing them at the receive location, and then processing the files with the native flat file disassembler. Please correct me if there is a better approach.
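For reference, a minimal sketch of that pre-splitting step done outside BizTalk; the file names, the output folder (assumed to be the folder polled by the receive location), the chunk size and the no-header assumption are all placeholders:

import os

CHUNK_SIZE = 100
SRC = "employees_full.txt"      # hypothetical input flat file
OUT_DIR = "receive_location"    # hypothetical folder polled by the receive location

os.makedirs(OUT_DIR, exist_ok=True)

with open(SRC) as src:
    records = src.readlines()   # one positional record per line, no header assumed

# 500 records -> 5 files of 100; 522 records -> 5 files of 100 plus 1 file of 22.
for i in range(0, len(records), CHUNK_SIZE):
    part = i // CHUNK_SIZE + 1
    with open(os.path.join(OUT_DIR, f"employees_part{part:03}.txt"), "w") as out:
        out.writelines(records[i:i + CHUNK_SIZE])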
You have two options:
Import the flat file to a SQL table using SSIS.
Parse the input file as one message, then map to a Composite Operation to insert the records into a SQL table. You could also use an Insert Updategram.
After either 1 or 2, call a Stored Procedure to retrieve the Count and Order of messages you need.
A simple way to handle a flat file structure without writing custom C# code is to just use a database table: insert the whole file as records into the table, and then have a Receive Location that polls for records in the batch size you want.
Another approach is called the Scatter-Gather pattern. In this case you set Max Occurs to 1, which will debatch the file into individual records, and you then have an Orchestration that re-assembles them into the batch size you want. You will have to read up on Correlation Sets to do this.

BizTalk - Delete without a schema

I am importing a file with 200+ records into a master table.
The BizTalk package only services one source; other packages service other sources
I am using strongly typed stored procedures for all SQL CRUD
All records inside the file come from the same source
The file does not contain source name or source Id
I want to determine the source from a value hard-coded in the package
The Master table contains records from several sources
Before the import: delete the existing records for that source from the Master table
Unlike the file import, the delete statement happens once
DELETE FROM Master WHERE SourceID = @SourceID
The file import works, but how can I hard code the delete source ID?
In your delete transform (just above the Send shape) you can set up a SourceID property for the outgoing message. You can then populate the message context with this SourceID. This SourceID can then be used in your delete statement.
If I understand correctly, you want to delete all existing records for the SourceID before inserting new ones?
If so, you need to have access to the SourceID value on the inbound message into the orchestration.
To do this, use property promotion.
You can either do this:
inside a pipeline component configured on the receive port, so that the property is available when the message arrives at the orchestration, or,
inside the orchestration, which will require moving the construct shape for the InsertCSV message above the delete construct shape, and promoting the property within the construct shape.
Of these options, the first is probably the best, as assigning properties should ideally be done during message disassembly.
Alternatively, you can use an xpath() call within an Expression shape to interrogate the message and retrieve the value that way. This lets you avoid thinking about property promotion.
However, while quicker to implement, this approach is not best practice because it makes your orchestration very sensitive to changes in the message schema.
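Purely for illustration, here is the same kind of XPath lookup done with lxml in Python against a made-up payload; the element names and namespace are hypothetical, and the local-name() form mirrors what an xpath() expression in an Expression shape typically looks like:

from lxml import etree

# Hypothetical message instance - the real schema and namespace will differ.
xml = b"""<ns0:MasterImport xmlns:ns0="http://example.org/master">
  <ns0:SourceID>42</ns0:SourceID>
  <ns0:Record><ns0:Name>abc</ns0:Name></ns0:Record>
</ns0:MasterImport>"""

doc = etree.fromstring(xml)

# string(...) returns "" when the element is missing, which is exactly why this
# approach breaks silently if the message schema changes.
source_id = doc.xpath(
    "string(/*[local-name()='MasterImport']/*[local-name()='SourceID'])")
print(source_id)  # prints 42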

Pentaho Kettle: Mailing the result of a transformation

I have a Kettle job and a transformation.
The transformation writes the result set of a SELECT SQL statement into a CSV file.
The job picks up the result file and mails it to the user.
I need to send the mail only if the file contains any data; otherwise the result should not be mailed to the user.
Alternatively, how can I find out whether the result of a transformation is empty (is there any file-size validator job entry available)?
I am not able to find any job entries for this kind of conditioning.
Thanks in advance.
You can use the Evaluate files metrics job step in the Conditions branch. Set your condition on the Advanced tab.
You can set your transformation to generate the file only if there is data, and then use the File exists step in your main job.
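If neither job entry fits, the same check can be done from a Shell job entry with a small script whose exit code decides whether the hop to the Mail entry is followed; a sketch assuming the CSV has a single header row and a hypothetical path:

import csv
import sys

PATH = "/tmp/result.csv"  # hypothetical path of the transformation's output file

with open(PATH, newline="") as f:
    rows = list(csv.reader(f))

# Treat the file as empty if it contains nothing beyond the header row.
has_data = len(rows) > 1
sys.exit(0 if has_data else 1)  # non-zero exit -> entry fails -> the mail is skipped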
