Airflow: Push XCom from SQL File

Is it possible to push to XCom from a SQL file?
I have a SnowflakeOperator which executes the SQL file, and I would like to push the row count to XCom:
select count(*) from table

To access the results of the query, you can use the Snowflake hook with a PythonOperator. Something along these lines:
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook
from airflow.operators.python import PythonOperator

def row_count(**kwargs):
    dwh_hook = SnowflakeHook(snowflake_conn_id="snowflake_conn")
    result = dwh_hook.get_first("select count(*) from table")
    return result

get_count = PythonOperator(task_id="get_count", python_callable=row_count)
The return value of the python_callable is automatically pushed to XCom.
You can check out the code of the hook for more interesting functions.
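Since the count is pushed automatically, a downstream task can pull it by task id. A minimal sketch (use_count is just an illustrative name; the task and connection ids follow the snippet above):
def use_count(**context):
    # get_first returns a tuple like (count,), auto-pushed under the default return_value key
    row_count = context["ti"].xcom_pull(task_ids="get_count")
    print(f"row count: {row_count[0]}")

use_count_task = PythonOperator(task_id="use_count", python_callable=use_count)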

Related

How to execute a MariaDB stored procedure from Azure Data Factory?

I want to 'Call' a MariaDB procedure from Azure Data Factory.
How can this be achieved? Are there any other services that can be integrated with ADF to call MariaDB procedures?
I tried calling the procedure by writing the query in a Lookup activity.
It fails with this error:
ErrorCode=InvalidParameter,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The value of the property 'columns' is invalid: 'Value cannot be null.
Parameter name: columns'.,Source=,''Type=System.ArgumentNullException,Message=Value cannot be null.
Parameter name: columns,Source=Microsoft.DataTransfer.Common,'
The Lookup activity reads and returns the result of a query. I tried to reproduce this by creating three stored procedures in Azure Database for MariaDB.
The first stored procedure updates data in the table:
DELIMITER $$
CREATE PROCEDURE update_inventory()
BEGIN
UPDATE inventory SET quantity = 150
WHERE id = 1;
END$$
DELIMITER ;
When this procedure is called from the ADF Lookup activity, the error occurs.
The second stored procedure is written with a select query:
DELIMITER $$
CREATE PROCEDURE select_inventory()
BEGIN
select * from inventory;
END$$
DELIMITER ;
When this SP is called, the ADF pipeline executes successfully.
In order to execute a stored procedure with update statements (or any other statements), a select statement is added to the stored procedure:
DELIMITER $$
CREATE PROCEDURE update_select_inventory()
BEGIN
UPDATE inventory SET quantity = 150
WHERE id = 1;
select * from inventory;
END$$
DELIMITER ;
When this stored procedure is called through the Lookup activity, it executes successfully.
Try adding a select statement inside the stored procedure and executing it in the Lookup activity, or add a SELECT statement after the CALL stored procedure statement.
By selecting the 'Query' option, you can call the stored procedure using the Lookup activity. From your error message, it looks like the columns parameter is missing when the stored procedure is called.
Did you try executing the same code using a client tool like MySQL Workbench? If you can execute the stored procedure from other client tools, then you should be able to execute it using the Lookup activity.
I tested from my end and was able to execute the stored procedure using the Lookup activity.
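As suggested above, it can help to confirm the procedure behaves as expected from a regular client before debugging it inside ADF. A rough sketch using the mysql-connector-python package (host, credentials and database are placeholders; update_select_inventory is the procedure from above):
import mysql.connector

# placeholder connection details for the MariaDB server
conn = mysql.connector.connect(
    host="<server>.mariadb.database.azure.com",
    user="<user>",
    password="<password>",
    database="<database>",
)
cur = conn.cursor()
cur.callproc("update_select_inventory")
# each result set produced by the procedure (here, the trailing SELECT) comes back separately
for result in cur.stored_results():
    print(result.fetchall())
conn.commit()  # persist the UPDATE performed by the procedure
cur.close()
conn.close()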

Passing Trigger DAG Id value via XCom

I am building a parent DAG that conditionally triggers another DAG depending on an XCom value.
Say I have the following DAGs in my system:
Load Contracts
Load Budget
Load Sales
Trigger Data Loads (Master)
In my master DAG, I have two tasks: (1) check the file name, and (2) trigger the appropriate DAG based on the file name. If the file name is Sales, trigger 'Load Sales', and likewise for Budget / Contracts.
Here is how my TriggerDagRunOperator is configured:
def _check_file_version(**context):
    context["task_instance"].xcom_push(key="file_name", value="Load Sales")

with dag:
    completion_file_check = PythonOperator(
        task_id="completion_file_check",
        python_callable=_check_file_version,
        provide_context=True
    )
    trigger_dependent_dag = TriggerDagRunOperator(
        task_id="trigger_dependent_dag",
        provide_context=True,
        trigger_dag_id={{ task_instance.xcom_pull(task_ids='completion_file_check', key='file_name') }},
        wait_for_completion=True
    )
I want to modify the trigger_dag_id value based on the filename. Is there a way to pull an XCom value into it? Looking at this link - DAG Dependencies - I see that this value is Jinja templated, i.e. it can be modified using template variables. However, my use case is to have it configured via xcom_pull. Can it be done?
When I put in the xcom_pull as suggested by the replies, I get a syntax error.
You can use ti.xcom_pull to get the XCom value that you pushed in a previous task:
{{ ti.xcom_pull(task_ids='sales_taskid', key='filename_key') }}
XCom is identified by key, task_id and dag_id. For your requirement, you can use xcom_pull, where you provide the task id and key. XCom is meant for small amounts of data; larger values are not allowed. If the task auto-pushes its result into the XCom key called return_value, then xcom_pull will use that as the default key. For more information, you can check this link.
If the result is auto-pushed under the default key, you can use the code below:
value = task_instance.xcom_pull(task_ids='pushing_task')
For a trigger_dag_id value based on the filename, you can use it in a template as below:
{{ task_instance.xcom_pull(task_ids='your_task_id', key='file_name') }}
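Putting the answers together, a minimal sketch of the trigger task might look like this (the import path assumes Airflow 2, and the pushed value, e.g. 'Load Sales', must match the dag_id of the DAG to trigger). Note that the whole Jinja expression is passed as a quoted string, since trigger_dag_id is a templated field:
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

trigger_dependent_dag = TriggerDagRunOperator(
    task_id="trigger_dependent_dag",
    # Airflow renders this template at runtime, so the string becomes the pulled file_name value
    trigger_dag_id="{{ task_instance.xcom_pull(task_ids='completion_file_check', key='file_name') }}",
    wait_for_completion=True,
)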

How to insert/ingest the current timestamp into a Kusto table

I am trying to insert the current datetime into a table which has datetime as the datatype, using the following query:
.ingest inline into table NoARR_Rollout_Status_Dummie <| #'datetime(2021-06-11)',Sam,Chay,Yes
Table was created using the following query:
.create table NoARR_Rollout_Status_Dummie ( Timestamp:datetime, Datacenter:string, Name:string, SurName:string, IsEmployee:string)
But when I try to see the data in the table, the Timestamp column is not filled. Is there anything I am missing?
The .ingest inline command parses the input (after the <|) as a CSV payload; therefore, you cannot include variables in it.
An alternative to what you're trying to do would be using the .set-or-append command, e.g.:
.set-or-append NoARR_Rollout_Status_Dummie <|
print Timestamp = datetime(2021-06-11),
Name = 'Sam',
SurName = 'Chay',
IsEmployee = 'Yes'
NOTE, however, that ingesting a single record or a few records per command is not recommended for production scenarios, as it creates very small data shards and could negatively impact performance.
For queued ingestion, larger bulks are recommended: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/api/netfx/kusto-ingest-best-practices#optimizing-for-throughput
Otherwise, see if your use case meets the recommendations of streaming ingestion: https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-streaming
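If you need to issue the same control command from code rather than from the query editor, a rough sketch with the azure-kusto-data Python package might look like the one below (the cluster URL and database name are placeholders, and the fixed datetime() is swapped for now() to capture the current timestamp):
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<cluster>.<region>.kusto.windows.net"  # placeholder
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(cluster)
client = KustoClient(kcsb)

# same .set-or-append command as above, with now() producing the current timestamp
command = """.set-or-append NoARR_Rollout_Status_Dummie <|
print Timestamp = now(),
Name = 'Sam',
SurName = 'Chay',
IsEmployee = 'Yes'"""

client.execute_mgmt("<database>", command)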

R odbc does not return SCOPE_IDENTITY after insert

I want to insert a new row into a database using R and the odbc package.
The setup is as follows:
I use SQL Server to host the database
I have set up an ODBC driver on my machine in order to connect to the database
I created a new table
CREATE TABLE [dbName].[dbSchema].[TestTableName]
(
MyID INT IDENTITY(1,1),
MyValue VARCHAR(255),
PRIMARY KEY (MyID)
)
As far as I understand the problem, it doesn't matter how the table was created.
Now I want to insert a new entry into this table and keep the new auto-incremented value of MyID. To this end I can run the following SQL statement:
INSERT INTO [dbName].[dbSchema].[TestTableName] (MyValue)
VALUES ('test');
SELECT SCOPE_IDENTITY() as newID;
When I run this statement in SQL Server Management Studio, it happily returns a 1x1 table containing the new ID.
And now my problem starts: when I try to do the same from within R by using:
con <- odbc::dbConnect(
odbc::odbc(),
dsn = ...,
database = "dbName",
UID = ...,
PWD = ...
) # the connection is tested and works
createQuery <- "INSERT INTO [dbName].[dbSchema].[TestTableName] (MyValue ) VALUES ('test'); SELECT SCOPE_IDENTITY() as newID;"
dbSendRes <- odbc::dbSendQuery(conn = con, statement = createQuery)
The result in dbSendRes is:
<OdbcResult>
SQL INSERT INTO [dbName].[dbSchema].[TestTableName] (MyValue ) VALUES ('test'); SELECT SCOPE_IDENTITY() as newID;;
ROWS Fetched: 0 [complete]
Changed: 1
Hence, the insert statement is performed successfully on the server; however, I do not get the newly assigned ID as a result. The content of dbFetch is an empty data.frame. I also tried using dbSendStatement and dbExecute without any progress, dug into additional parameters like immediate=TRUE without any success, and even switched back to the old RODBC package and used the methods therein. Nothing works.
Am I missing something?
The problem remains that I somehow have to keep track of the new ID which is assigned to the DB entry! Without that ID I can't proceed. I could do a workaround and query the max ID in a separate statement after the insert completes, but this may of course lead to other, more fundamental problems if another statement inserts another entry in the meantime...
Happy for any suggestions!
Best
Based on zero knowledge of R... to have the INSERT return the newly created id, you could add an OUTPUT clause to the INSERT statement. It appears this would work.
Something like this:
INSERT INTO [dbName].[dbSchema].[TestTableName] (MyValue )
output inserted.MyID
VALUES
('test');

Why is my Airflow MySqlOperator Insert Command Denied?

I'm running a DAG with an insert query. Here is some of the code:
QUERY = '''
INSERT INTO bi.target_skus (skus)
SELECT
distinct od.sku,
FROM
bi.orders as od'''
t1 = MySqlOperator(
    sql=QUERY,
    mysql_conn_id=MYSQL_CONN_ID,
    task_id='target_skus',
    dag=dag)
It's giving me the following error:
ERROR - (1142, "INSERT command denied to user 'xyz' for table 'target_skus'")
A few notes:
Devops said my user has permission to make inserts into that table
Select commands work fine
The error message does not include the database name (bi) even though my insert query does.
This looks like a standard MySQL "not enough privileges" error.
Are you sure you can perform INSERTs with your user, regardless of what your DBA is saying? You should test the same operation using another tool (like MySQL Workbench) setting up the connection in the same way you set it up in Airflow, i.e. same user, same password, same default schema.
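If setting up a separate client is not convenient, another way to see what the exact connection Airflow uses is allowed to do is to query its grants through the MySQL hook. A rough sketch (MYSQL_CONN_ID is the same connection id used by the operator in the question):
from airflow.providers.mysql.hooks.mysql import MySqlHook

hook = MySqlHook(mysql_conn_id=MYSQL_CONN_ID)
# SHOW GRANTS returns one single-column row per grant for the connected user
for (grant,) in hook.get_records("SHOW GRANTS FOR CURRENT_USER()"):
    print(grant)  # look for INSERT (or ALL PRIVILEGES) on bi.* or bi.target_skus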
It looks like a privilege error for the user trying to insert, but there is a stray comma in the query that can cause problems too:
QUERY = '''
INSERT INTO bi.target_skus (skus)
SELECT
distinct od.sku
FROM
bi.orders as od'''
