This is my situation:
I am trying to access the instance of my custom operator running in another DAG. I am able to get the correct DagRun and TaskInstance objects by doing the following.
from airflow import settings
from airflow.models import DagBag

dag_bag = DagBag(settings.DAGS_FOLDER)
target_dag = dag_bag.get_dag('target_dag_id')
dr = target_dag.get_dagrun(target_dag.latest_execution_date)
ti = dr.get_task_instance('target_task_id')
I have printed the TaskInstance object acquired by the above lines and it is correct (it is running, has the correct task_id, etc.), however I am unable to access the task object, which would allow me to interface with the running operator. I should be able to do the following:
running_custom_operator = ti.task  # AttributeError: 'TaskInstance' object has no attribute 'task'
Any help would be much appreciated, either following my approach or if someone knows how to access the task object of a running task instance.
Thank you
You can simply grab the task object from the DAG object: target_dag.task_dict["target_task_id"]
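For example, a minimal sketch reusing target_dag and ti from the snippet in the question:

running_custom_operator = target_dag.task_dict['target_task_id']

# The TaskInstance loaded from the metadata database is not bound to its
# operator; if you need them wired together you can attach it yourself:
ti.task = running_custom_operator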
I followed the docs and created a Slack alert function.
It does work and I get notifications in the channel, but the task name and log link in the alert refer to another task, not to the one that failed.
It gets the context of the upstream_failed task, but not of the failed task itself.
I tried different operators and hooks, but get the same result.
If anyone could help, I would really appreciate it.
Thank you!
The goal of the on_failure_callback argument at the DAG level is to run the callback once when the DagRun fails, so the context provided is the DagRun context, which is identical across the task instances; for that purpose it doesn't matter which task instance's context is passed (I think it is the context of the last defined task in the DAG, regardless of its state).
If you want to run the callback for each failed task instance, remove the on_failure_callback argument from the DAG and add it to the default args instead: default_args=dict(on_failure_callback=task_fail_slack_alert).
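For example, a minimal sketch (the DAG id, schedule and start date are placeholders; task_fail_slack_alert is the callback from the question):

from datetime import datetime

from airflow import DAG

default_args = dict(on_failure_callback=task_fail_slack_alert)

dag = DAG(
    dag_id='my_dag',
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval='@daily',
)

# Every task created under this DAG inherits the callback, so it fires once
# per failed task instance with that task's own context.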
I have an Airflow DAG that is triggered externally via the CLI.
I have a requirement to change the order of execution of tasks based on a Boolean parameter that I would be getting from the CLI.
How do I achieve this?
I understand dag_run.conf can only be used in a template field of an operator.
Thanks in advance.
You cannot change task dependencies with a runtime parameter.
However, you can pass a runtime parameter (via dag_run.conf) and, depending on its value, have tasks executed or skipped. For that you need to place operators in your workflow that can handle this logic, for example ShortCircuitOperator or BranchPythonOperator.
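For example, a minimal sketch using BranchPythonOperator (the flag name use_path_a and the task ids are placeholders; path_a, path_b and the dag object are assumed to be defined elsewhere):

from airflow.operators.python_operator import BranchPythonOperator

def choose_branch(**context):
    conf = context['dag_run'].conf or {}
    # The DAG is assumed to be triggered with --conf '{"use_path_a": true}'
    return 'path_a' if conf.get('use_path_a') else 'path_b'

branch = BranchPythonOperator(
    task_id='branch',
    python_callable=choose_branch,
    provide_context=True,  # only needed on Airflow 1.x
    dag=dag,
)

branch >> [path_a, path_b]  # the branch that is not returned gets skipped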
On Airflow, we are currently using {{ prev_execution_date_success }} at the DAG level to execute queries.
I was wondering if it is possible to have it per task (i.e. retrieving the last successful execution date of a particular task rather than of the whole DAG).
Thanks for your help
From the current DAG run you can access the task instance and look up the previous task instance in the success state.
from airflow.models.taskinstance import TaskInstance
from airflow.utils.state import State

ti = TaskInstance(task_id=your_task_id,
                  dag_id=your_dag_id,
                  execution_date=execution_date)
prev_task_success_state = ti.get_previous_ti(state=State.SUCCESS)
Note that get_previous_ti returns a TaskInstance object, so you can access anything related to the task.
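For instance, a sketch of the same lookup from inside a running task's callable, using the task instance already present in the context (assuming an Airflow version recent enough to expose get_previous_ti):

from airflow.utils.state import State

def my_callable(**context):
    ti = context['ti']
    prev_success_ti = ti.get_previous_ti(state=State.SUCCESS)
    if prev_success_ti is not None:
        # Last successful run of *this* task, usable instead of the
        # DAG-level prev_execution_date_success.
        print(prev_success_ti.execution_date)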
Is it possible to somehow extract the task instance objects of upstream tasks from the context passed to python_callable in a PythonOperator? The use case is that I would like to check the status of two tasks immediately after branching, to see which one ran and which one was skipped, so that I can query the correct task for its return value via XCom.
Thanks
Before executing the DAG, I want to check whether a particular connection id is present in the connection list or not. I don't have any mechanism for retaining a connection. Even if I create a connection through the GUI, when the server reboots all the connections get removed.
Following is the task I thought I should add, but then I got an ASCII error when I ran it, maybe because the command returns a table that might not be adequately parsed by the logger.
import logging

from airflow import settings
from airflow.models import Connection
from airflow.operators.bash_operator import BashOperator

def create_connection(**kwargs):
    print(kwargs.get('ds'))
    list_conn = BashOperator(
        task_id='list_connections',
        bash_command='airflow connections --l',
        xcom_push=True)
    conns = list_conn.execute(context=kwargs)
    logging.info(conns)
    if not conns:
        new_conn = Connection(conn_id='xyz', conn_type='s3',
                              host='https://api.example.com')
        session = settings.Session()
        session.add(new_conn)
        session.commit()
        logging.info('Connection is created')
Question: Is there any way to know, within the Airflow DAG itself, whether the connection has already been added? If it's already there, I would not create a new connection.
session.query(Connection) should do the trick.
from airflow import settings
from airflow.models import Connection
from airflow.operators.python_operator import PythonOperator

def list_connections(**context):
    session = settings.Session()
    return session.query(Connection)

list_conn = PythonOperator(
    task_id='list_connections',
    python_callable=list_connections,
    provide_context=True,
)
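If you only need to know whether one specific connection exists, you can filter the query and create the connection only when nothing comes back. A sketch reusing the conn_id 'xyz' from the question:

import logging

from airflow import settings
from airflow.models import Connection

def create_connection_if_missing(**context):
    session = settings.Session()
    existing = (session.query(Connection)
                .filter(Connection.conn_id == 'xyz')
                .first())
    if existing is None:
        new_conn = Connection(conn_id='xyz', conn_type='s3',
                              host='https://api.example.com')
        session.add(new_conn)
        session.commit()
        logging.info('Connection created')
    else:
        logging.info('Connection already exists')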
Please make sure all the code is contained within tasks or, to phrase it correctly, that it executes at run time instead of load time. Adding the code directly to the DAG file causes it to run at load time, which is not recommended.
The accepted answers work perfectly. I had a scenario where I needed to get a connection by connection id in order to create the DAG, so I had to fetch it outside a task, in the DAG definition itself.
The following code worked for me:
from airflow.hooks.base_hook import BaseHook
conn = BaseHook.get_connection(connection)
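Here connection is the conn_id string. The returned Connection object exposes the stored fields, so (a small sketch with a placeholder conn_id 'my_api') you can use them while building the DAG:

conn = BaseHook.get_connection('my_api')  # 'my_api' is a placeholder conn_id
api_host = conn.host    # e.g. to parametrize operators at DAG-definition time
api_login = conn.login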
Hope this might help someone! :)