Mapping Data Flows - Working with Dynamic Business Keys - azure-mapping-data-flow

I am building a parameterised Mapping Data Flow pipeline and have run into a problem that I need help with.
My ADF load is driven by a config file, a sample of which is given below.
I would like to be able to join on the Stagekeys column in my config file using the Exists transformation shown below.
Any suggestions on how I can achieve this?
Kind Regards

If my understanding is right, we can parameterize the key columns and prepare the Exists expression from them.
FYI, the attached condition is for a single key; we can extend it to multiple keys as "source1#keyColumn1 == source2#keyColumn1 && source1#keyColumn2 == source2#keyColumn2".
(Screenshots: Dataflow Parameter, Exists Expressions)
For multiple keys from the same target table, you can use the following expression and send the key columns in as an array:
array(byNames($pKeyColumns,'sourceADLSCSV')) == array(byNames($pKeyColumns,'targetASQL'))
(Screenshots: Pipeline Parameter, Dataflow Parameter, Exists Expressions)
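For illustration, a possible concrete setup (the parameter name, default values, and column names below are assumptions, not taken from the screenshots):
--Dataflow Parameter
pKeyColumns : string[], default e.g. ['keyColumn1','keyColumn2']
--Exists Expression (custom expression)
array(byNames($pKeyColumns,'sourceADLSCSV')) == array(byNames($pKeyColumns,'targetASQL'))
--Pipeline Parameter (expression passing the key columns for one config row)
@createArray('keyColumn1','keyColumn2')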

Related

How to Pass Parameter Values at Runtime in Informatica Mapping Parameter

I have a scenario where we need to load data from a source file into a target table starting from a particular date [LOAD_DATE], so I'll create a mapping parameter for LOAD_DATE and use it in the Source Qualifier query. My query looks like this:
SELECT * FROM my_TABLE where DATE >= '$$LOAD_DATE'
So here I need to pass the parameter value for '$$LOAD_DATE' from another, external database. I know that I need to pass the values from the parameter file.
But my requirement is not to hardcode the values in the parameter file but to feed them at runtime from another database. I will appreciate your help and thoughts on this.
You don't have to hardcode.
You can do it like this -
Option 1: Create a mapping that generates the param file in the required format.
Read from the other DB.
In an Expression transformation, create the port below, which generates the actual param string. Please note that we need to add a newline so it is recognized as an actual param file.
out_str = '[<<name of folder . name of workflow or session>>]' || CHR(10) ||
'$$LOAD_DATE=' || CHR(39) || <<date value from another DB>> || CHR(39)
Then link the above port to a flat file target. Name the output file session_param.txt or whatever is suitable. Please make sure the parameter file is generated correctly.
Use the above file as the parameter file in your actual workflow.
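For illustration only, with hypothetical folder, workflow, and session names, the generated session_param.txt would then look something like this:
[MyFolder.WF:wf_my_load.ST:s_m_my_load]
$$LOAD_DATE='2023-01-15'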
Option 2 - You can join the other table into the original table flow. This can be difficult and needs changes to the existing mapping.
Join the other table from the other DB with the main table on a dummy condition. Make sure you get distinct values of LOAD_DATE from the other table, and make sure you always get exactly one value from that DB.
Once you have the LOAD_DATE field from the other table, you can use it in a Filter transformation to filter the data.
After this point you can add your original mapping logic.
The whole mapping should look like this -
SQ_MAIN_TABLE ----------------------->|
sq_ANOTHER_TABLE --DISTINCT_LOAD_DT-->JNR--FIL on LOAD_DT --><<your mapping logic>>
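As a rough sketch of the conditions involved (the port names are illustrative):
Expression transformation (both pipelines): o_dummy = 1
Joiner condition: o_dummy = o_dummy1
Filter condition: MAIN_DATE >= LOAD_DATE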

Can SQLite return default values for non-existent columns instead of error?

I know how to use IFNULL to get default values for non-existent rows or null values, but for creating queries that are compatible with older schema versions, it would be nice to be able to do this:
Schema v1: CREATE TABLE Employee (Name TEXT, Phone TEXT)
Schema v2: CREATE TABLE Employee (Name TEXT, Phone TEXT, Address TEXT)
Theoretical backward compatible query:
SELECT Name, Phone, IFNULL(Address, '') FROM Employee
Obviously this doesn't work for a file created with schema v1. Is there some way to do this though?
There are two alternative workarounds, but both are rather annoying: either 1) update the old db by adding the missing columns (which would start out with null values), or 2) build the query code dynamically based on the schema version.
Create a temporary view that references a particular schema, substituting default values (or even transforming other data) for individual columns which differ between the base schemas.
Sqlite views can even be made modifiable by defining appropriate triggers.
This still requires programming some conditional logic upon connection, but it would allow more uniform queries and interaction with different versions of the schema.
The suggested syntax would perhaps be convenient in some limited cases, but this approach is much more useful: it can be expanded beyond a simple "if column exists" Boolean check and used to dynamically transform one schema into another, perhaps joining tables and providing more advanced logic for updates across differing schemas, and so on.
Pseudo code mixed with view definitions to demonstrate:
db <- Open database connection
db_schema <- determine schema version
If db_schema == 1 Then
    db.execute( "CREATE VIEW temp.EmployeeX AS
                 SELECT Name, Phone, '' AS Address
                 FROM main.Employee;" )
Else If db_schema == 2 Then
    db.execute( "CREATE VIEW temp.EmployeeX AS
                 SELECT Name, Phone, Address
                 FROM main.Employee;" )
End If
# Later in code
data <- db.getdata("SELECT Name, Address
                    FROM EmployeeX")
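To illustrate the earlier point about making the view modifiable, a minimal sketch of an INSTEAD OF trigger for the schema v1 case (the trigger name is illustrative):
CREATE TEMP TRIGGER EmployeeX_insert
INSTEAD OF INSERT ON EmployeeX
BEGIN
    -- schema v1 has no Address column, so only Name and Phone are stored
    INSERT INTO Employee (Name, Phone) VALUES (NEW.Name, NEW.Phone);
END;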
If you're really averse to conditional statements for the schema this may still be annoying, but it would at least reduce or eliminate conditional statements throughout the code; ideally the branching happens as part of the connection logic, in one location in the code.
You might further notice that this pattern is really what object-oriented programming is supposed to solve. There's no mention of the language in the question, but a well-designed object model could be created in a similar fashion so that all database access is done through a unified interface. The implementation details for different schemas are internal to different objects that derive (i.e. implement interfaces and/or inherit from base class) from a basic set of interfaces. Consider the language you're using to see if the problem could be solved this way.

Not able to have commands in User-Defined functions in Kusto

I am trying to create a function that will accept the name of a tag and a datetime value, drop the extent within a specific table which has that tag, and then ingest a new record into that table with the same tag and the input datetime value (a sort of 'update' simulation). I am not bothered about performance; the table is just going to hold metadata, maybe 20-30 rows at most.
This is how the table creation looks:
.create table MyTable(sometext:string,somevalue:datetime)
And shown below is my function creation step, which is failing:
.create-or-alter function MyFunction(arg_sometext:string,arg_somedate:datetime)
{
.drop extents <| .show table MyTable extents where tags has arg_sometext;
.ingest inline into table MyTable with (tags="[arg_sometext]") <| arg_somedate
}
So you can see I am trying to do something simple. I suspect that Kusto won't allow commands in a function. Is there any workaround for achieving this?
Generally:
Kusto mandates that control commands start with a dot (.), and that this must be the first character in the text of the command. As queries, functions, etc. don't start with a dot, this precludes them from invoking control commands.
This is an intentional limitation that prevents a wide range of code injection attacks. By imposing this rule, Kusto makes it easy to guarantee that any query that does not begin with a dot will only have read access to the data and metadata, and never be able to alter them.
Specifically, with regard to your scenario:
I'm assuming this is meant to be triggered automatically rather than run by hand (even if you did have the option to create such a function), which suggests you should be able to achieve your goal with a simple script/app that invokes the two commands via Kusto's API / client libraries.
An alternative, and perhaps even better, approach would be to reconsider whether you actually need to delete or update specific records: you could instead use summarize arg_max() to query only the latest "versions" of the records (you could also create a function that encapsulates that logic and overrides the table, by naming the function with the table's name).
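For example (a sketch based on the table definition above; the function name is illustrative), every 'update' would simply ingest a new row with the same tag, and reads would pick the latest value per tag:
MyTable
| summarize arg_max(somevalue, *) by sometext
or, encapsulated as a stored function:
.create-or-alter function MyTableLatest() {
    MyTable
    | summarize arg_max(somevalue, *) by sometext
}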

How to use dynamic values while executing SQL scripts in R

My R workflow now involves dealing with a lot of queries (RPostgreSQL library). I really want to make the code easy to maintain and manage in the future.
I started loading large queries from separate .SQL files (this helped) and it worked great.
Then I started using interpolated values (that helped too), which means that I can write
SELECT * FROM table WHERE value = ?my_value;
and (after loading it into R) interpolate it using sqlInterpolate(ANSI(), query, my_value = "stackoverflow").
What happens now is I want to use something like this
SELECT count(*) FROM ?my_table;
but how can I make it work? sqlInterpolate() only interpolates safely by default. Is there a workaround?
Thanks
In ?DBI::SQL, you can read:
By default, any user supplied input to a query should be escaped using
either dbQuoteIdentifier() or dbQuoteString() depending on whether it
refers to a table or variable name, or is a literal string.
Also, on this page:
You may also need dbQuoteIdentifier() if you are creating tables or
relying on user input to choose which column to filter on.
So you can use:
sqlInterpolate(ANSI(),
               "SELECT count(*) FROM ?my_table",
               my_table = dbQuoteIdentifier(ANSI(), "table_name"))
# <SQL> SELECT count(*) FROM "table_name"
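The same approach combines with ordinary value interpolation, for example (table name and value are illustrative):
sqlInterpolate(ANSI(),
               "SELECT * FROM ?my_table WHERE value = ?my_value",
               my_table = dbQuoteIdentifier(ANSI(), "table_name"),
               my_value = "stackoverflow")
# <SQL> SELECT * FROM "table_name" WHERE value = 'stackoverflow'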
Note that on its own, sqlInterpolate() is meant for substituting literal values, not other query components such as table names; for more general templating you could also use a framework such as brew or whisker.

Determine flyway variables from earlier SQL step

I'd like to use Flyway for a DB update in a situation where the DB already exists with productive data in it. The problem I'm looking at now (and I have not found a nice solution yet) is the following:
There is an existing DB table with numeric IDs, e.g.
create table objects ( obj_id number, ...)
There is a sequence "obj_seq" to allocate new obj_ids
During my DB migration I need to introduce a few new objects, hence I need new object IDs. However, I do not know at development time what ID numbers these will be.
There is a DB trigger which later references these IDs. To improve performance I'd like to avoid determining the actual IDs every time the trigger runs, and rather put the IDs directly into the trigger.
Example (very simplified) of what I have in mind:
insert into objects (obj_id, ...) values (obj_seq.nextval, ...)
select obj_seq.currval from dual
-> store this in variable "newID"
create trigger on some_other_table
when new.id = newID
...
Now, is it possible to dynamically determine/use such variables? I have seen the flyway placeholders but my understanding is that I cannot set them dynamically as in the example above.
I could use a Java-based migration script and do whatever string magic I like - so, that would be a way of doing it, but maybe there is a more elegant way using SQL?
Many thx!!
tge
If the table you are updating contains only reference data, get rid of the sequence and assign the IDs manually.
If it contains a mix of reference and user data, you need to select the id based on values in other columns.
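A minimal sketch of the first option, assuming Oracle syntax (the migration file name, trigger name, and fixed ID value are illustrative):
-- V2__add_reference_objects.sql
insert into objects (obj_id) values (1001);
create or replace trigger some_other_table_trg
    before insert on some_other_table
    for each row
    when (new.id = 1001)
begin
    null;  -- trigger logic that references the new object goes here
end;
/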
