I know there is a way to delete a DAG. But is it possible to delete a DAG run with a specific run_id? Something like:
airflow delete_dag_run <dag_id> <run_id>
To delete a DAG Run from the Airflow UI:
Browse > "DAG Runs".
Select the DAG Runs you want to delete with the checkboxes on the left.
"With selected" > Delete.
You can also delete DAG Runs from the Airflow database:
DELETE FROM dag_run
WHERE dag_id='my_dag_id' AND
state='STATE_I_WANT_TO_DELETE'
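Since the question asks about a specific run_id, a variant of the same query should also work (my_run_id is a placeholder for the run you want to remove):
DELETE FROM dag_run
WHERE dag_id='my_dag_id' AND
run_id='my_run_id';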
I have been trying for the past 2 days to resolve this. There is a DAG python script which I created and saved in the dags folder in airflow, which is referred to in the "airflow.cfg" file. The other dags are getting updated, except for one dag. I tried restarting the scheduler and also tried resetting the airflow db using airflow db reset and then running airflow db init again, but the same issue still exists.
Some ideas on what you could check:
Do all of your DAGs have a unique dag_id? (I lost a few hours to this once: if two dags have the same name, the scheduler will randomly pick one to display on every dag_dir_list_interval.)
If you are using the @dag decorator: are you calling the DAG function below its definition? Like so:
from airflow.decorators import dag, task
from pendulum import datetime

@dag(
    dag_id="unique_name",
    start_date=datetime(2022, 12, 10),
    schedule=None,
    catchup=False,
)
def my_dag():
    @task
    def say_hi():
        return "hi"

    say_hi()

# without this line the DAG will not show up in the UI
my_dag()
What is the output of airflow dags list and airflow dags list-import-errors?
If you have a lot of DAGs in your environment you might want to increase dagbag_import_timeout (see the config sketch after this list).
Does your DAG work if thrown into a new Airflow instance? (The easiest way to check is by spinning up a project with the Astro CLI and putting the dag into the dags folder created by astro dev init.)
Disclaimer: I work at Astronomer, which develops the Astro CLI as an open-source project.
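Here is a minimal sketch of the dagbag_import_timeout change mentioned in the list above, assuming you edit airflow.cfg directly (the value is illustrative; the same setting can also be supplied via the AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT environment variable):
# airflow.cfg
[core]
# seconds allowed for importing a single DAG file before it times out
dagbag_import_timeout = 120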
I have around 30 tasks in one DAG. At times, I may want to run each task separately. Could anyone please let me know whether I can run the 30 tasks separately on a need basis?
Also, it looks like I can either create one DAG with all 30 tasks or create a separate DAG for each task. Which one is better? When should I use one DAG with many tasks, and when one DAG with one task (ending up with many DAGs)?
Thanks in advance!
You can run a single airflow task using the CLI.
From the docs:
airflow tasks run [-h] [--cfg-path CFG_PATH] [--error-file ERROR_FILE] [-f]
[-A] [-i] [-I] [-N] [-l] [-m] [-p PICKLE] [--pool POOL]
[--ship-dag] [-S SUBDIR]
dag_id task_id execution_date_or_run_id
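For example, to run a single task by itself (the dag id, task id, and date here are placeholders):
airflow tasks run my_dag_id my_task_id 2023-01-01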
Choosing how to structure your dags and tasks will depend on the problem you are trying to solve.
I have the following DAG defined in code:
from datetime import timedelta, datetime
import airflow
from airflow import DAG
from airflow.operators.docker_operator import DockerOperator
from airflow.contrib.operators.ecs_operator import ECSOperator
default_args = {
    'owner': 'airflow',
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'start_date': datetime(2018, 9, 24, 10, 00, 00)
}

dag = DAG(
    'data-push',
    default_args=default_args,
    schedule_interval='0 0 * * 1,4',
)

colors = ['blue', 'red', 'yellow']

for color in colors:
    ECSOperator(
        dag=dag,
        task_id='data-push-for-%s' % (color),
        task_definition='generic-push-colors',
        cluster='MY_ECS_CLUSTER_ARN',
        launch_type='FARGATE',
        overrides={
            'containerOverrides': [
                {
                    'name': 'push-colors-container',
                    'command': [color]
                }
            ]
        },
        region_name='us-east-1',
        network_configuration={
            'awsvpcConfiguration': {
                'securityGroups': ['MY_SG'],
                'subnets': ['MY_SUBNET'],
                'assignPublicIp': "ENABLED"
            }
        },
    )
This should create a DAG with 3 tasks, one for each color in my colors list.
This seems good. When I run:
airflow list_dags
I see my dag listed:
data-push
And when I run:
airflow list_tasks data-push
I see my three tasks appear as they should:
data-push-for-blue
data-push-for-red
data-push-for-yellow
I then test run one of my tasks by entering the following into the terminal:
airflow run data-push data-push-for-blue 2017-1-23
And this runs the task, which I can see appear in my ECS cluster on the AWS dashboard, so I know for a fact the task runs on my ECS cluster and the data is pushed successfully and everything is great.
It is when I try to run the DAG data-push from the Airflow UI that I run into a problem.
I run:
airflow initdb
followed by:
airflow webserver
and now go into the airflow UI at localhost:8080.
I see the dag data-push in the list of dags, click it, and then to test run the entire dag I click the "Trigger DAG" button. I don't add any configuration JSON and then click 'Trigger'. The tree view for the DAG then shows a green circle on the right of the tree structure, seemingly indicating the DAG is 'running'. But the green circle just stays there for ages, and when I manually check my ECS dashboard I see no tasks actually running, so nothing is happening after triggering the DAG from the Airflow UI, despite the tasks working when I manually run them from the CLI.
I am using the SequentialExecutor if that matters.
My two main theories as to why triggering the DAG does nothing, while running the individual tasks from the CLI works, are that maybe I am missing something in my python code where I define the dag (maybe because I don't specify any dependencies for the tasks?), or that I am not running the airflow scheduler; but if I am manually triggering the DAGs from the Airflow UI, I don't see why the scheduler would need to be running, and why it wouldn't show me an error saying this is a problem.
Any ideas?
Sounds like you did not unpause your dag:
Toggle the On/Off switch in the upper left of the Web UI, or use the CLI: airflow unpause <dag_id>.
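Note that in Airflow 2.x the CLI commands are namespaced, so the equivalent is:
airflow dags unpause <dag_id>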
Airflow shows information about a broken DAG:
Broken DAG: [/data/airflow/dags/copy_from_Oracle_to_MySQL.py] No module named Oracle_to_MySQL_plugin
I tried moving the file with the DAG, copy_from_Oracle_to_MySQL.py, out of /data/airflow/dags.
But airflow is still showing:
Broken DAG: [/data/airflow/dags/copy_from_Oracle_to_MySQL.py] No module named Oracle_to_MySQL_plugin
What must I do to clear the Broken DAG information from the GUI?
Removing the DAG file won't clean them all; you will also need to delete the dag_id from the tables in your metadata DB, and then the UI won't show them. You can find a list of tables here: https://issues.apache.org/jira/browse/AIRFLOW-1002.
Where did you place Oracle_to_MySQL_plugin? Did you check whether it is on your PYTHONPATH? One way to verify is sketched below.
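A quick check, assuming the module is meant to be importable as Oracle_to_MySQL_plugin (the plugins path below is a placeholder):
# does Python see the module at all?
python -c "import Oracle_to_MySQL_plugin"
# if not, make its directory importable, for example:
export PYTHONPATH="$PYTHONPATH:/path/to/your/plugins"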
I have a dag that we'll deploy to multiple different airflow instances, and in our airflow.cfg we have dags_are_paused_at_creation = True, but for this specific dag we want it to be turned on without having to do so manually by clicking in the UI. Is there a way to do it programmatically?
I created the following function to do so if anyone else runs into this issue:
import airflow.settings
from airflow.models import DagModel

def unpause_dag(dag):
    """
    A way to programmatically unpause a DAG.
    :param dag: DAG object
    :return: dag.is_paused is now False
    """
    session = airflow.settings.Session()
    try:
        qry = session.query(DagModel).filter(DagModel.dag_id == dag.dag_id)
        d = qry.first()
        d.is_paused = False
        session.commit()
    except Exception:
        session.rollback()
    finally:
        session.close()
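A minimal usage sketch, assuming you call it from the file that defines the DAG (the dag_id and date are placeholders):
from datetime import datetime
from airflow import DAG

dag = DAG(dag_id="my_dag", start_date=datetime(2023, 1, 1), schedule_interval=None)
unpause_dag(dag)  # flips is_paused to False in the metadata DB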
The airflow-rest-api-plugin can also be used to programmatically pause a DAG.
Pauses a DAG
Available in Airflow Version: 1.7.0 or greater
GET - http://{HOST}:{PORT}/admin/rest_api/api?api=pause
Query Arguments:
dag_id - string - The id of the dag
subdir (optional) - string - File location or directory from which to
look for the dag
Examples:
http://{HOST}:{PORT}/admin/rest_api/api?api=pause&dag_id=test_id
For more details, see:
https://github.com/teamclairvoyant/airflow-rest-api-plugin
Supply your dag_id and run this command on your command line:
airflow pause dag_id
For more information on the airflow command line interface: https://airflow.incubator.apache.org/cli.html
I think you are looking for unpause (not pause):
airflow unpause DAG_ID
The following CLI command should work per the recent docs:
airflow dags unpause dag_id
https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#unpause
Airflow's REST API provides a way via the DAG patch endpoint: we need to patch the DAG with the query parameter ?update_mask=is_paused and send the boolean as the request body.
Ref: https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/patch_dag
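For example, with curl, assuming the API is reachable at localhost:8080 and basic auth is enabled (the dag id and credentials are placeholders):
curl -X PATCH "http://localhost:8080/api/v1/dags/my_dag_id?update_mask=is_paused" \
  -H "Content-Type: application/json" \
  --user "admin:admin" \
  -d '{"is_paused": false}'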
airflow pause dag_id has been discontinued.
You will have to use:
airflow dags pause dag_id
You can do this in a PythonOperator of any DAG to pause and unpause DAGs programmatically. This is the best approach I found instead of using the CLI: just pass the list of dags and the rest is taken care of (see the sketch after the snippet below).
from airflow.models import DagModel
dag_id = "dag_name"
dag = DagModel.get_dagmodel(dag_id)
dag.set_is_paused(is_paused=False)
And if you just want to check whether it is paused or not, this returns a boolean:
dag.is_paused
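Here is a minimal sketch of wrapping that logic in a PythonOperator, as described above; the controller dag_id, the target dag ids, and the schedule are placeholders:
from datetime import datetime

from airflow import DAG
from airflow.models import DagModel
from airflow.operators.python import PythonOperator


def _set_paused(dag_ids, paused):
    # flip the paused flag in the metadata DB for every dag_id in the list
    for dag_id in dag_ids:
        dm = DagModel.get_dagmodel(dag_id)
        if dm is not None:
            dm.set_is_paused(is_paused=paused)


with DAG(
    dag_id="pause_controller",        # hypothetical controller DAG
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="unpause_dags",
        python_callable=_set_paused,
        op_kwargs={"dag_ids": ["dag_name_1", "dag_name_2"], "paused": False},
    )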