From Airflow manual at https://airflow.apache.org/tutorial.html#testing, I found that I can run something like following to test a specific task:
airflow test dag_id task_id
When I did, I only got this message:
[2018-07-10 18:29:54,346] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2018-07-10 18:29:54,367] {driver.py:120} INFO - Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2018-07-10 18:29:54,477] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-07-10 18:29:54,513] {models.py:189} INFO - Filling up the DagBag from /var/lib/airflow/dags
It doesn't look like it actually ran the task. Am I misunderstanding something? Or is there another way to run a DAG task locally?
I copied this example call from the page you linked to:
# command layout: command subcommand dag_id task_id date
# testing print_date
airflow test tutorial print_date 2015-06-01
# testing sleep
airflow test tutorial sleep 2015-06-01
So just include the date as shown above and the DAG task should run as expected.
For newer Airflow versions (e.g. 2.4.0) the subcommand is:
airflow tasks test tutorial sleep 2015-06-01
Related
I have a simple DAG which connects to an impala db and runs an sql script. The dag runs fine when running independently:
airflow dags test original_dag_name 2022-9-27
However, when I trigger it via TriggerDagRunOperator from another DAG, it fails:
airflow tasks test other_dag_name trigger_task 2022-9-27
Looking at the logs for original_dag_name I see the following:
[Cloudera][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function
...which appears to be driver related, which doesn't make sense as it works fine when I trigger the DAG on its own. Is there some sort of config not getting set correctly when triggering via TriggerDagRunOperator?
Here is the TriggerDagRunOperator task:
task_run_original_dag = TriggerDagRunOperator(
    task_id='run_original_dag',
    trigger_dag_id='original_dag_name',
    execution_date='{{ ds }}',
    reset_dag_run=True,
    wait_for_completion=True,
    poke_interval=60,
)
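One detail worth checking: the `{{ ds }}` macro renders the execution date as a zero-padded YYYY-MM-DD string, which differs from the unpadded `2022-9-27` typed in the CLI test commands above. A minimal sketch (plain Python, with a made-up execution date matching the question) of what the template resolves to:

```python
from datetime import datetime

# Hypothetical execution date matching the CLI tests in the question
execution_date = datetime(2022, 9, 27)

# {{ ds }} is the execution date formatted as zero-padded YYYY-MM-DD
ds = execution_date.strftime("%Y-%m-%d")
print(ds)  # 2022-09-27
```

If the downstream DAG keys anything off that string, the padded and unpadded forms will not match.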
I'm using Airflow's experimental API
and would like to mark a running DAG as failed.
I didn't see this in the documentation.
I tried the call below, but it creates a new DAG run instead.
I'd be glad to get the right experimental API method/payload for it.
Thanks in advance.
session.post(
    url=f'{airflow_env}/api/experimental/dags/{dag_name}/dag_runs',
    json={'state': 'failed',
          'dag_run_id': 'scheduled__2022-03-13T20:30:32.887761+00:00'})
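The experimental API was deprecated in Airflow 2; to my knowledge the supported way to change a run's state is the stable REST API's "update DAG run" endpoint (PATCH on the dagRuns resource, available in newer 2.x releases). A minimal sketch that only builds the request; the `airflow_env`, `dag_name`, and run-ID values are placeholders taken from the question, and actually sending it would be e.g. `requests.patch(url, json=payload, auth=...)`:

```python
# Placeholder values from the question (assumed, adjust to your deployment)
airflow_env = "http://localhost:8080"
dag_name = "dag_name"
dag_run_id = "scheduled__2022-03-13T20:30:32.887761+00:00"

# Stable REST API: PATCH /api/v1/dags/{dag_id}/dagRuns/{dag_run_id}
url = f"{airflow_env}/api/v1/dags/{dag_name}/dagRuns/{dag_run_id}"
payload = {"state": "failed"}
print(url)
```

Unlike a POST to the experimental dag_runs endpoint (which creates a run), PATCH updates the existing run identified in the URL.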
On Airflow 2 my dag is not showing on the UI, and I'm getting DAG Import Errors (...) for it.
The error message is insufficient for me to debug (it's a custom operator, with a lot of custom logic - so I don't want to get into details of the error itself).
On Airflow 1.X I could use cli:
airflow list_dags
to get a more elaborate debug message. Is there anything analogous in Airflow 2?
I'm looking for a CLI command/UI option that will give me a more elaborate error message than the one shown on the main screen of the webserver.
As described in Airflow's documentation, to test DAG loading you can simply run:
python your-dag-file.py
If there is any problem during the DAG loading phase you will get a stack trace here.
The later sections also describe how to test custom operators.
As explained in the upgrading manual, airflow list_dags has been changed to airflow dags list.
The full syntax is:
airflow dags list [-h] [-o table, json, yaml] [-S SUBDIR]
for more information see docs
I have a dag which checks for new workflows to be generated (Dynamic DAG) at a regular interval and if found, creates them. (Ref: Dynamic dags not getting added by scheduler )
The above DAG is working and the dynamic DAGs are getting created and listed in the web-server. Two issues here:
When clicking on the DAG in the web UI, it says "DAG seems to be missing"
The listed DAGs are not shown by the "airflow list_dags" command
Error:
DAG "app01_user" seems to be missing.
The same happens for all other dynamically generated DAGs. I have compiled the Python script and found no errors.
Edit1:
I tried clearing all data and running "airflow run". It ran successfully, but no dynamically generated DAGs were added to "airflow list_dags". However, when I ran the "airflow list_dags" command, it loaded and executed the base DAG (which generates the dynamic DAGs), and the dynamic DAGs were then listed as below:
[root@cmnode dags]# airflow list_dags
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8\nLANG=en_US.UTF-8)
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8\nLANG=en_US.UTF-8)
[2019-08-13 00:34:31,692] {settings.py:182} INFO - settings.configure_orm(): Using pool settings. pool_size=15, pool_recycle=1800, pid=25386
[2019-08-13 00:34:31,877] {__init__.py:51} INFO - Using executor LocalExecutor
[2019-08-13 00:34:32,113] {__init__.py:305} INFO - Filling up the DagBag from /root/airflow/dags
/usr/lib/python2.7/site-packages/airflow/operators/bash_operator.py:70: PendingDeprecationWarning: Invalid arguments were passed to BashOperator (task_id: tst_dyn_dag). Support for passing such arguments will be dropped in Airflow 2.0. Invalid arguments were:
*args: ()
**kwargs: {'provide_context': True}
super(BashOperator, self).__init__(*args, **kwargs)
-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
app01_user
app02_user
app03_user
app04_user
testDynDags
Upon running again, all four of the generated DAGs above disappeared and only the base DAG, "testDynDags", was displayed.
When I was getting this error, there was an exception showing up in the webserver logs. Once I resolved that error and restarted the webserver, it went through normally.
From what I can see, this is the error thrown when the webserver tries to parse the DAG file and hits an error. In my case it was an error importing a new operator I had added to a plugin.
Usually I check the Airflow UI first, since the reason for a broken DAG sometimes appears there. If it is not there, I run the .py file of my DAG directly, and the error (the reason the DAG can't be parsed) will appear.
I never got to work on dynamic DAG generation, but I did face this issue when the DAG was not present on all nodes (scheduler, worker and webserver). If you have an Airflow cluster, please make sure the DAG is present on all Airflow nodes.
Same error here; the reason was that I had renamed my dag_id to uppercase, something like "import_myclientname" into "import_MYCLIENTNAME".
I am a little late to the party, but I faced the error today:
In short: try executing airflow dags report and/or airflow dags reserialize
Check out my comment here:
https://stackoverflow.com/a/73880927/4437153
I found that airflow fails to recognize a dag defined in a file that does not have from airflow import DAG in it, even if DAG is not explicitly used in that file.
For example, suppose you have two files, a.py and b.py:
# a.py
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

def makedag(dag_id="a"):
    with DAG(dag_id=dag_id) as dag:
        DummyOperator(task_id="nada")
    return dag

dag = makedag()
and
# b.py
from a import makedag
dag = makedag(dag_id="b")
Then airflow will only look at a.py. It won't even look at b.py at all, even to notice if there's a syntax error in it! But if you add from airflow import DAG to it and don't change anything else, it will show up.
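The behavior above matches Airflow's file-scanning heuristic: in safe mode (the default), a file is only parsed for DAGs at all if its text contains both the strings "airflow" and "dag" (case-insensitively in newer versions; older versions matched "DAG" and "airflow" case-sensitively). A rough sketch of that check, not Airflow's actual code:

```python
def might_contain_dag(source: str) -> bool:
    """Rough sketch of the safe-mode heuristic: only files mentioning
    both 'airflow' and 'dag' are parsed for DAGs at all."""
    lowered = source.lower()
    return "airflow" in lowered and "dag" in lowered

# a.py mentions both strings; b.py never mentions "airflow"
a_py = "from airflow import DAG\nfrom airflow.operators.dummy_operator import DummyOperator\n"
b_py = "from a import makedag\ndag = makedag(dag_id='b')\n"
print(might_contain_dag(a_py), might_contain_dag(b_py))
```

This explains why adding an unused `from airflow import DAG` to b.py is enough: it makes the string "airflow" appear in the file, so the scheduler goes on to actually import it.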
I'm running Airflow and attempting to iterate on some task we're building from the command line.
When running the airflow webserver, everything works as expected. But when I run airflow backfill dag task '2017-08-12', airflow returns:
[2017-08-15 02:52:55,639] {__init__.py:57} INFO - Using executor LocalExecutor
[2017-08-15 02:52:56,144] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow/dags
2017-08-15 02:52:59,055 - airflow.jobs.BackfillJob - INFO - Backfill done. Exiting
...and doesn't actually run the DAG.
When using airflow test or airflow run (i.e. commands that run a task rather than a whole DAG), it works as expected.
Am I making a basic mistake? What can I do to debug from here?
Thanks
Have you already run those DAGs on that date range? You will need to clear the DAG first, then backfill. Based on what Maxime mentioned here: https://groups.google.com/forum/#!topic/airbnb_airflow/gMY-sc0QVh0
If a task has an @monthly schedule and you try to run it with a start_date mid-month, it will merely state "Backfill done. Exiting.". A schedule of '30 5 * * *' similarly prevents backfill from the command line when the dates don't line up with it.
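To make the @monthly case concrete, here is a small sketch (plain Python, not Airflow code) of which scheduled dates fall in a backfill window; a mid-month start_date leaves the window empty, which is why backfill exits immediately:

```python
from datetime import date

def monthly_run_dates(start: date, end: date):
    """First-of-month dates falling in [start, end)."""
    d = date(start.year, start.month, 1)
    if d < start:  # start is mid-month: skip ahead to the next month boundary
        d = date(d.year + (1 if d.month == 12 else 0), d.month % 12 + 1, 1)
    runs = []
    while d < end:
        runs.append(d)
        d = date(d.year + (1 if d.month == 12 else 0), d.month % 12 + 1, 1)
    return runs

print(monthly_run_dates(date(2017, 8, 15), date(2017, 8, 31)))  # [] -> "Backfill done. Exiting."
print(monthly_run_dates(date(2017, 8, 1), date(2017, 10, 1)))   # two runs
```

With a mid-month start there is no month boundary inside the window, so there is nothing to schedule and backfill has nothing to do.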
(Updated to reflect better information, and this discussion)
Two possible reasons:
The execution date specified via the -e option is outside of the DAG's [start_date, end_date) range.
Even if the execution date is between the dates, keep in mind that if your DAG has schedule_interval=None it won't backfill iteratively: it will only run for a single date (specified as --start_date, or --end_date if the former is omitted).