Problem passing values via XCom in my Airflow DAG

I'm trying to pass a value from one Task to another in my DAG.
I have this piece of code in my DockerOperator task:
ti = kwargs['ti']
command=ti.xcom_pull(key='return_value', task_ids=['run_this']),
Yet I get an 'invalid syntax' error from the line with ti.xcom_pull above.
This DAG has two tasks; the first one has task ID run_this and calls a Python callable named run_this_now, whose return statement is:
return payload
Looking at the UI under Admin --> XComs, I can see the key 'return_value' with the value returned by task ID 'run_this'.
So I know I have an XCom with a return value, and yet this line still gives me the error.
I am running Airflow 2.0.2

The fix was to use double curly braces ({{ }}) in the DockerOperator's templated command.
I didn't need to reference a key, as the run_this function only returns one value via its return statement, which lands under the default key 'return_value':
command="{{ ti.xcom_pull(task_ids='run_this') }}",

Related

How to pull a value from a dictionary pushed to Airflow XCom

Let's say I have some Airflow operator, and one of its arguments needs to take its value from XCom. I've managed to do it in the following way:
f"model_id={{{{ ti.xcom_pull(task_ids='train', key='{task_id}')}}}}"
Where model_id is the argument name to the DockerOperator that Airflow runs, and task_id is the name of the key for that value in the XCom.
Now I want to do something more complex: save a dictionary under task_id instead of a single value, and be able to pull items out of it somehow.
Is there a similar way to do it to the one I mentioned above? Something like:
f"model_id={{{{ ti.xcom_pull(task_ids='train', key='{task_id}')}}}}[value]"
By default, all the template_fields are rendered as strings.
However, Airflow offers the option to render fields as native Python objects.
You will need to set your DAG as:
dag = DAG(
    ...
    render_template_as_native_obj=True,
)
You can see an example of how to render as a dictionary in the docs.
My answer for a similar issue was this:
f"model_id={{{{ ti.xcom_pull(task_ids='train', key='{task_id}')[value]}}}}"

XComs Not visible in Xcom Section in composer-1.17.1-airflow-2.1.2

I have created a task in my Airflow DAG that returns some value, and that value should be visible in the XCom section of the Airflow UI, but it's not.
I have set do_xcom_push=True, but it still won't show.
I was using Airflow 2.0.0 before this, and the same task used to push its return value to XCom, but in Airflow 2.1.2 it isn't happening.
I don't understand what I am missing.
Here is a snippet of airflow task:
task = python_operator.PythonOperator(
    task_id="invoke_cf",
    python_callable=invoke_cloud_function,
    do_xcom_push=True,
)
PythonOperator pushes the return value of the Python callable to XCom by default. There is no need to specify do_xcom_push=True; it is already the default value on BaseOperator (see the source code).
If nothing is pushed to XCom, it means that the function invoke_cloud_function doesn't return anything.
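As a quick illustration (both callable bodies are placeholders):

def returns_value():
    return {"status": "ok"}      # shows up under key 'return_value' in the XCom list

def returns_nothing():
    print("side effects only")   # implicit 'return None' -> no XCom row is created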

Airflow - How to pass data the output of one operator as input to another task

I have a list of HTTP endpoints, each performing a task on its own. We are trying to write an application that orchestrates them by invoking these endpoints in a certain order. In this solution we also have to process the output of one HTTP endpoint and generate the input for the next HTTP endpoint. Also, the same workflow can be invoked simultaneously depending on the trigger.
What I have done until now:
1. Defined a new operator deriving from the HttpOperator that adds the capability to write the output of the HTTP endpoint to a file.
2. Written a Python operator which can transform that output according to the necessary logic.
Since I can have multiple instances of the same workflow executing at once, I cannot hardcode the output file names. Is there a way to make the HTTP operator I wrote use unique file names, and to make the same file name available to the next task so that it can read and process the output?
Airflow does have a feature for operator cross-communication called XCom:
XComs can be “pushed” (sent) or “pulled” (received). When a task pushes an XCom, it makes it generally available to other tasks. Tasks can push XComs at any time by calling the xcom_push() method.
Tasks call xcom_pull() to retrieve XComs, optionally applying filters based on criteria like key, source task_ids, and source dag_id.
To push to XCom use:
ti.xcom_push(key=<variable name>, value=<variable value>)
To pull an XCom value use:
myxcom_val = ti.xcom_pull(key=<variable name>, task_ids='<task to pull from>')
With the BashOperator, you just set do_xcom_push=True (xcom_push in Airflow 1.x) and the last line written to stdout is pushed as the XCom value.
You can view the XCom object while your task is running by opening the task instance from the Airflow UI and clicking on the XCom tab.
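Putting both calls together, a minimal sketch (task names and the path scheme are illustrative); building the file name from run_id is also one way to get the per-run uniqueness the question asks about:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def producer(**kwargs):
    ti = kwargs["ti"]
    # push under an explicit key instead of relying on 'return_value'
    ti.xcom_push(key="output_path", value="/tmp/out_{}.json".format(kwargs["run_id"]))

def consumer(**kwargs):
    ti = kwargs["ti"]
    path = ti.xcom_pull(key="output_path", task_ids="producer")
    print("reading", path)

with DAG("xcom_handoff", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    p = PythonOperator(task_id="producer", python_callable=producer)
    c = PythonOperator(task_id="consumer", python_callable=consumer)
    p >> c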

Airflow - how to send email based on operator result?

I have a python script that is called from BashOperator.
The script can return status 0 or 1.
I want to trigger an email only when the status is 1.
Note these statuses are not to be confused with Failure/Success. This is simply an indication that something was changed with the data and requires attention from the developer.
This is my operator:
t = BashOperator(
    task_id='import',
    bash_command="python /home/ubuntu/airflow/scripts/import.py",
    dag=dag,
)
I looked over the docs, but everything email-related addressed the on-failure case, which is irrelevant in my situation.
If you don't want to override an operator or do anything fancy, you might be able to use XComs and the BranchPythonOperator.
If your condition is based on a 0 or a 1, you can just push that value to XCom (set do_xcom_push to True).
Then you can use the BranchPythonOperator to check that value and execute the appropriate task. You can find an example of the BranchPythonOperator and pulling from XCom in the Airflow example_dags.
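A sketch of that branching pattern; the task ids, the email address, and the assumption that the script prints its status as the last line of stdout (rather than using a process exit code, which would fail the task) are all illustrative:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.dummy import DummyOperator
from airflow.operators.email import EmailOperator
from airflow.operators.python import BranchPythonOperator

def choose(**kwargs):
    status = kwargs["ti"].xcom_pull(task_ids="import")  # last line of stdout
    return "notify" if status == "1" else "done"

with DAG("status_email", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    imp = BashOperator(
        task_id="import",
        bash_command="python /home/ubuntu/airflow/scripts/import.py",
        do_xcom_push=True,  # last line printed to stdout becomes the XCom value
    )
    branch = BranchPythonOperator(task_id="branch", python_callable=choose)
    notify = EmailOperator(
        task_id="notify",
        to="dev@example.com",
        subject="Data changed during import",
        html_content="The import script reported status 1.",
    )
    done = DummyOperator(task_id="done")
    imp >> branch >> [notify, done]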

Apache Airflow - How to retrieve dag_run data outside an operator in a flow triggered with TriggerDagRunOperator

I set up two DAGs; let's call the first one orchestrator and the second one worker. The orchestrator's job is to retrieve a list from an API and, for each element in this list, trigger the worker DAG with some parameters.
The reason why I separated the two workflows is I want to be able to replay only the "worker" workflows that fail (if one fails, I don't want to replay all the worker instances).
I was able to make things work, but now I see how hard it is to monitor, as my task_ids are the same for all of them, so I decided to use dynamic task_ids based on a value retrieved from the API by the orchestrator workflow.
However, I am not able to retrieve the value from the dag_run object outside an operator. Basically, I would like this to work :
with models.DAG('specific_workflow', schedule_interval=None, default_args=default_dag_args) as dag:
    name = context['dag_run'].name
    hello_world = BashOperator(task_id='hello_{}'.format(name), bash_command="echo Hello {{ dag_run.conf.name }}", dag=dag)
    bye = BashOperator(task_id='bye_{}'.format(name), bash_command="echo Goodbye {{ dag_run.conf.name }}", dag=dag)
    hello_world >> bye
But I am not able to define this "context" object, even though I am able to access it from an operator (PythonOperator and BashOperator, for instance).
Is it possible to retrieve the dag_run object outside an operator?
Yup, it is possible.
What I tried, and what worked for me, is shown below. In the following code block, I am trying to show all the possible ways to use the configuration values passed directly to different operators:
pyspark_task = DataprocSubmitJobOperator(
    task_id="task_0001",
    job=PYSPARK_JOB,
    location=f"{{{{dag_run.conf.get('dataproc_region','{config_data['cluster_location']}')}}}}",
    project_id="{{dag_run.conf['dataproc_project_id']}}",
    gcp_conn_id="{{dag_run.conf.gcp_conn_id}}",
)
So you can use it either as
"{{dag_run.conf.field_name}}" or "{{dag_run.conf['field_name']}}"
Or, if you want to use a default value in case the configuration field is optional:
f"{{{{dag_run.conf.get('field_name', '{local_variable['field_name_0002']}')}}}}"
I don't think it's easily possible currently. For example, as part of the worker run process, the DAG is retrieved without any TaskInstance context provided besides where to find the DAG: https://github.com/apache/incubator-airflow/blob/f18e2550543e455c9701af0995bc393ee6a97b47/airflow/bin/cli.py#L353
The context is injected later: https://github.com/apache/incubator-airflow/blob/c5f1c6a31b20bbb80b4020b753e88cc283aaf197/airflow/models.py#L1479
The run_id of the DAG would be a good place to store this information.
