How to print the value of a variable in Airflow

I'm trying to print the content of a variable computed in an Airflow DAG, so I used an echo in a BashOperator, but it doesn't work.
I tried with a predefined variable but got the same output.
Here is an example:
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="tmp",
) as dag:
    test = "test"
    test_dag = BashOperator(
        task_id="test_dag",
        bash_command='echo $test',
    )
The output is always empty:
Output:
INFO -

You can pass the variable in an f-string. You can even set the task_id based on your variable. See:
with DAG(
    dag_id="tmp",
) as dag:
    test = "test"
    test_dag = BashOperator(
        task_id=f"t1_{test}",
        bash_command=f"echo {test}",
    )

You can also use the env parameter like so:
with DAG(dag_id="tmp", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    test_dag = BashOperator(
        task_id="t1",
        bash_command="echo $test",
        env={"test": "something"},
    )
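If the value lives in an Airflow Variable rather than a plain Python variable, you can also render it with Jinja in the templated bash_command. A minimal sketch, assuming the same imports as above and an Airflow Variable named my_var that already exists (the names here are just examples):
with DAG(dag_id="tmp_var_demo", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    print_var = BashOperator(
        task_id="print_airflow_variable",
        # bash_command is a templated field, so Jinja is rendered before the shell runs
        bash_command="echo {{ var.value.my_var }}",
    )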

Related

Can get an XCom value from the DAG, but not within a PythonOperator

I am trying to pull an xcom from within a PythonOperator, but I think I might be missing an element of understanding.
This is my DAG:
# dag arguments
default_args = {
    "start_date": datetime(2022, 11, 23),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    # "max_active_runs": 3,
    "schedule_interval": "@daily",
    "catchup": False,  # enable if you don't want historical dag runs to run
    # "dagrun_timeout": timedelta(minutes=20),
}

with DAG(
    dag_id="migrate_dtm",
    default_args=default_args,
) as dag:
    load_data_migration_mapping = PythonOperator(
        task_id="load_data_migration_mapping",
        python_callable=load_data_migration_map,
    )
    list_objects = GCSListObjectsOperator(
        task_id="list_objects",
        bucket=LEGACY_DTM_HISTORY_BUCKET,
        prefix="raw_data/datamart/{{ds_nodash}}",
        impersonation_chain=IMPERSONATED_SERVICE_ACCOUNT,  # Only use in dev env
    )
    match_mapping_with_data = PythonOperator(
        task_id="match_mapping_with_data",
        python_callable=match_data_with_migration_map,
        provide_context=True,
        op_kwargs={
            "initial_migration_map": "initial_migration_map",
            # "files_and_prefixes": "{{task_instance.xcom_pull(task_ids='list_objects')}}",
            "files_and_prefixes": "list_objects",
        },
    )
    (load_data_migration_mapping >> list_objects >> match_mapping_with_data)
In my third PythonOperator, match_mapping_with_data, when I pass the following macro
"files_and_prefixes": "{{task_instance.xcom_pull(task_ids='list_objects')}}"
I can see that it is correctly rendered in the logs. But when I give it only the list_objects task name, in order to perform the xcom_pull inside the downstream function match_data_with_migration_map itself, I get None.
Partial content of my function, up to the point where it breaks:
def match_data_with_migration_map(
    ti,
    **kwargs,
):
    # Set arguments' (default) values
    xcom_initial_migration_map = kwargs.setdefault(
        "initial_migration_map",
        "initial_migration_map",
    )
    xcom_files_and_prefixes = kwargs.setdefault(
        "files_and_prefixes",
        "list_objects",
    )
    # Parse Xcoms:
    initial_migration_map = ti.xcom_pull(
        key=xcom_initial_migration_map,
    )
    if not initial_migration_map:
        err_msg = (
            "The xcom initial_migration_map has not "
            f"been correctly set : {initial_migration_map}"
        )
        raise XcomNULL(err_msg)
    print(
        f"xcom : initial_migration_map - Key: {xcom_initial_migration_map}, "
        f"Value = {initial_migration_map}"
    )
    files_and_prefixes = ti.xcom_pull(
        key=xcom_files_and_prefixes,
    )
    if not files_and_prefixes:
        err_msg = (
            "The xcom files_and_prefixes has not "
            f"been correctly set : {files_and_prefixes}"
        )
        raise XcomNULL(err_msg)
    print(
        f"xcom : files_and_prefixes - Key: {xcom_files_and_prefixes}, "
        f"Value = {files_and_prefixes}"
    )
The custom exception class:
class XcomNULL(Exception):
    pass
N.B.: The initial_migration_map XCom is correctly pulled because I set it in the first PythonOperator, load_data_migration_mapping, via ti.xcom_push(key="initial_migration_map").
EDIT (kind of a workaround):
It works when I set "files_and_prefixes": "return_value". How do I rename this XCom so that I can have several within my DAG run?
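For context on the workaround: a value returned by an operator or python_callable is stored under the default XCom key return_value, attached to that task's task_id. Pulling with key='list_objects' therefore finds nothing, while pulling by task_ids does. A minimal sketch of how the pulls could look inside the callable, assuming the DAG above (the matched_files key is just an example name):
def match_data_with_migration_map(ti, **kwargs):
    # The GCSListObjectsOperator's file list lives under task_id "list_objects",
    # default key "return_value", so pull it by task_ids:
    files_and_prefixes = ti.xcom_pull(task_ids="list_objects")

    # Pulling by key alone only finds XComs pushed with that explicit key,
    # e.g. ti.xcom_push(key="initial_migration_map", value=...):
    initial_migration_map = ti.xcom_pull(key="initial_migration_map")

    # To keep several named XComs in one DAG run, push each under its own key:
    ti.xcom_push(key="matched_files", value=files_and_prefixes)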

I want to pass arguments from one DAG when triggering another DAG

The code below is a situation in which var1 and var2 are passed using the conf parameter when triggering another dag from the first dag.
trigger = TriggerDagRunOperator(
    trigger_dag_id='dag2',
    task_id="trigger",
    wait_for_completion=True,
    reset_dag_run=False,
    poke_interval=30,
    do_xcom_push=True,
    execution_date="{{ execution_date }}",
    conf={
        "var1": "{{ task_instance.xcom_pull(task_ids='task1', key='var1') }}",
        "var2": "{{ task_instance.xcom_pull(task_ids='task1', key='var2') }}",
    },
    dag=dag
)
In the second dag, I tried to print the var1 and var2 that are expected to be passed to conf.
def print_conf(**kwargs):
    conf = kwargs['dag_run'].conf
    print(conf)
    print(conf['var1'])
    print(conf['var2'])

print_op = PythonOperator(
    task_id='print',
    provide_context=True,
    python_callable=print_conf,
    dag=dag
)
But in the output, the values of var1 and var2 were None.
{"var1": "None", "var2": "None"}
And even when I check the conf passed to the run in the Airflow UI, the values are None.
How do I pass arguments between DAGs through conf?
What could I have done wrong?
There are cases where data saved with xcom_push fails to be picked up by xcom_pull.
I solved it by using the return_value XCom of the operator instead.
alter_table = EmrAddStepsOperator(
    task_id="alter_table",
    job_flow_name="rpda-emr",
    cluster_states=["WAITING", "RUNNING"],
    aws_conn_id="aws_default",
    steps="{{ task_instance.xcom_pull(task_ids='init_variables', key='return_value', dag_id=task_instance.dag_id)['alter_table_step'] }}",
    do_xcom_push=True,
    dag=dag
)
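Applied to the question's setup, the same return_value idea might look like the sketch below: task1 returns a dict, which lands in its return_value XCom, and the trigger's conf pulls fields out of it. This is only an illustration; produce_vars and its values are placeholders, and dag/dag2 are assumed from the question (Airflow 2 import paths):
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

def produce_vars():
    # the returned dict is stored as task1's return_value XCom
    return {"var1": "foo", "var2": "bar"}

task1 = PythonOperator(task_id="task1", python_callable=produce_vars, dag=dag)

trigger = TriggerDagRunOperator(
    task_id="trigger",
    trigger_dag_id="dag2",
    conf={
        "var1": "{{ task_instance.xcom_pull(task_ids='task1')['var1'] }}",
        "var2": "{{ task_instance.xcom_pull(task_ids='task1')['var2'] }}",
    },
    dag=dag,
)

task1 >> trigger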

Airflow Subdag tasks are stuck in None state while subdag is showing as Running

I have a problem with my DAG getting stuck at a subdag. The subdag is in RUNNING state, but on zooming in, all of the subdag's tasks are in None status.
Using Airflow 2.1.1 with LocalExecutor.
Below is the main dag:
default_args = {
    'owner': 'airflow',
    'retries': 1,
    'depends_on_past': False
}

dag = DAG('loop_example',
          start_date=datetime(2022, 1, 1),
          schedule_interval=None,
          catchup=False,
          tags=['loop']
          )

## function to filter src_name based on a DB table/log file entry
def check_valid_src(src_name):
    hook = MySqlHook(mysql_conn_id='mysql_conn')
    sql = 'SELECT src_name FROM ingsted_src_log_table'
    myresult = hook.get_records(sql)
    valid_src_names = []
    for src in myresult:
        valid_src_names.append(src[0])
    if src_name in valid_src_names:
        return True
    else:
        return False

first = DummyOperator(task_id='first', dag=dag)
last = DummyOperator(task_id='last', dag=dag)

options = ['branch_a', 'branch_b', 'branch_c', 'branch_d']
for option in options:
    if check_valid_src(option):
        t = SubDagOperator(task_id=f'section_{option}',
                           subdag=subdag('loop_example', f'section_{option}', default_args, option),
                           dag=dag
                           )
        first >> t >> last
subdag code:
def subdag(parent_dag_name, child_dag_name, args, option):
    dag_subdag = DAG(
        dag_id=f'{parent_dag_name}.{child_dag_name}',
        default_args=args,
        start_date=datetime(2022, 1, 1),
        schedule_interval=None,
    )
    t1 = BashOperator(
        task_id='Echo_source_name',
        bash_command=f'echo {option}',
        default_args=args,
        dag=dag_subdag
    )
    t2 = BashOperator(
        task_id='Echo_source_number',
        bash_command=f'echo "{option}" | cut -d "_" -f2',
        default_args=args,
        dag=dag_subdag,
    )
    t1 >> t2
    return dag_subdag
Earlier the start_date of the main DAG and the subdag were not the same, so I tried running again with matching start_date values, but it still gets stuck.
Is there anything that I am missing here?
You have to pass is_paused_upon_creation=False to the subdag's DAG:
dag_subdag = DAG(
    dag_id=f'{parent_dag_name}.{child_dag_name}',
    default_args=args,
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    is_paused_upon_creation=False,
)

In Airflow, can we get the output of SQL executed in a JdbcOperator?

Can we see or get the output of the SQL executed in a JdbcOperator?
with DAG(dag_id='Exasol_DB_Checks', schedule_interval='@hourly', default_args=default_args, catchup=False, template_searchpath=tmpl_search_path) as dag:

    start_task = DummyOperator(task_id='start_task', dag=dag)

    sql_task_1 = JdbcOperator(task_id='sql_cmd',
                              jdbc_conn_id='Exasol_db',
                              sql=['select current_timestamp;', 'select current_user from DUAL;', "test.sql"],
                              autocommit=True,
                              params={
                                  "my_param": "{{ var.value.source_path }}"}
                              )

    start_task >> sql_task_1
Maybe you can use a JdbcHook inside a PythonOperator for your needs:
def do_work():
    jdbc_hook = JdbcHook(jdbc_conn_id="some_db")  # no trailing comma, or jdbc_hook becomes a tuple
    jdbc_conn = jdbc_hook.get_conn()
    jdbc_cursor = jdbc_conn.cursor()
    jdbc_cursor.execute('SELECT ......')
    row = jdbc_cursor.fetchone()[0]
    # returning the value stores it in XCom, so downstream tasks can use it
    return row

task = PythonOperator(
    task_id='task1',
    python_callable=do_work,
    dag=dag
)
https://airflow.apache.org/docs/stable/concepts.html#hooks
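With the row returned as above, it is stored in XCom under the default return_value key, so a downstream task can read it. A small sketch, assuming Airflow 2 style where ti is injected automatically into the callable (show_result is just an illustrative name):
def show_result(ti):
    # pull do_work's return value from XCom (default key: return_value)
    print(ti.xcom_pull(task_ids='task1'))

show = PythonOperator(
    task_id='show_result',
    python_callable=show_result,
    dag=dag,
)

task >> show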

Create tasks in Airflow by looping over a list and passing arguments

Edit: this now works; I had defined ex_func_airflow(var_1 = i), which was causing the issue.
I would like to create tasks in Airflow by looping over a list.
tabs = [1, 2, 3, 4, 5]

for i in tabs:
    task = PythonOperator(
        task_id=name,
        provide_context=False,
        op_args=[i],
        python_callable=ex_func_airflow,
        dag=dag)

    task_0 >> task >> task_1
When this is run in Airflow, the argument that is passed is always the last element of that list.
So I'm essentially running:
ex_func_airflow(6)
five times instead of running
ex_func_airflow(1)
ex_func_airflow(2)
ex_func_airflow(3)
..etc.
How can I pass the correct argument to each task?
The following code works for me.
def print_context(ds, **kwargs):
    print("hello")

def ex_func_airflow(i):
    print(i)

dag = DAG(
    dag_id="loop_dag",
    schedule_interval=None,
    start_date=datetime(2018, 12, 31),
)

task_0 = PythonOperator(
    task_id='task_0',
    provide_context=True,
    python_callable=print_context,
    dag=dag)

task_1 = PythonOperator(
    task_id='task_1',
    provide_context=True,
    python_callable=print_context,
    dag=dag)

tabs = [1, 2, 3, 4, 5]

for i in tabs:
    task_id = f'task_tab_{i}'
    task = PythonOperator(
        task_id=task_id,
        provide_context=False,
        op_args=[i],
        python_callable=ex_func_airflow,
        dag=dag)

    task_0 >> task >> task_1
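As a variant (not part of the answer above), op_kwargs can be used instead of op_args if you prefer named arguments; each iteration builds its own dict, so every task still receives its own value:
for i in tabs:
    task = PythonOperator(
        task_id=f'task_tab_{i}',
        python_callable=ex_func_airflow,
        # a fresh dict per iteration, so each task is bound to its own i
        op_kwargs={'i': i},
        dag=dag)
    task_0 >> task >> task_1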
