I'm encountering a problem where valid DAGs that show up in airflow dags list don't show up in the UI.
The dags_folder points to the right location, and I have restarted the webserver and scheduler many times, as well as running the occasional airflow db reset.
I think that my Airflow is just ignoring the cfg file, because I changed load_examples to False and all the example DAGs still show up.
I'm on airflow 2.4.0 working in a venv on ubuntu. What could be causing this?
I deleted a DAG from the Airflow dag_bag, and the corresponding .pyc file as well. When I try to delete the same DAG from the Airflow UI, it shows this error:
Dag id MY_DAG_ID is still in DagBag. Remove the DAG file first.
The airflow version I am using is 1.10.4
Even after restarting airflow I'm not able to delete from UI. I was using 1.10.3 previously, but I never faced this issue. I was able to delete from UI after deleting from dags folder.
When I click on that DAG in the UI, it shows:
DAG "MY_DAG_ID" seems to be missing. (This is expected, as I deleted the DAG from the folder.)
Try stopping the scheduler and the webserver and then deleting the DAG from the command line:
airflow delete_dag 'MY_DAG_ID'
I had the same issues after I upgraded to 1.10.6. Here's what I did:
Before removing the DAG, make sure no instance is in running or retry status. Then pause it.
Delete it in the UI or with the command airflow delete_dag dag_id
Restart the scheduler and webserver
Try to execute airflow list_dags to see if it really got deleted.
If it doesn't work, try to upgrade to the latest version.
When there is a task running, Airflow shows a notice saying the scheduler does not appear to be running, and it keeps showing until the task finishes:
The scheduler does not appear to be running. Last heartbeat was received 5 minutes ago.
The DAGs list may not update, and new tasks will not be scheduled.
The scheduler process is actually running; I have checked the process. After the task finishes, the notice disappears and everything goes back to normal.
My task is fairly heavy and may run for a couple of hours.
I think this is expected for the Sequential Executor. It runs one thing at a time, so it cannot run the heartbeat and a task at the same time.
Why do you need to use the Sequential Executor / SQLite? The advice to switch to another DB/executor makes perfect sense.
You have started the Airflow webserver but haven't started the Airflow scheduler.
Run the Airflow scheduler in the background:
airflow scheduler > /console/scheduler_log.log &
I had the same issue.
I switched to PostgreSQL by updating the airflow.cfg file: sql_alchemy_conn = postgresql+psycopg2://airflow@localhost:5432/airflow
and executor = LocalExecutor
This link may help with setting this up locally:
https://medium.com/@taufiq_ibrahim/apache-airflow-installation-on-ubuntu-ddc087482c14
A quick fix could be to run the airflow scheduler separately. Perhaps not the best solution but it did work for me. To do so, run this command in the terminal:
airflow scheduler
I had a similar issue and have been trying to troubleshoot this for a while now.
I managed to fix it by setting this value in airflow.cfg:
scheduler_health_check_threshold = 240
PS: Based on a recent conversation in Airflow Slack Community, it could happen due to contention at the Database side. So, another workaround suggested was to scale up the database. In my case, this was not a viable solution.
EDIT:
This was last tested with Airflow Version 2.3.3
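The effect of that threshold can be pictured roughly like this. It is only a simplified model of the check, not Airflow's actual code: scheduler_looks_healthy is a hypothetical helper, and the assumption is that the warning appears once the last recorded heartbeat is older than the threshold in seconds.

```python
from datetime import datetime, timedelta, timezone


def scheduler_looks_healthy(last_heartbeat, now, threshold_seconds):
    """Return True if the last heartbeat is no older than the threshold.

    threshold_seconds stands in for scheduler_health_check_threshold.
    This is a hypothetical helper for illustration, not an Airflow API.
    """
    age_seconds = (now - last_heartbeat).total_seconds()
    return age_seconds <= threshold_seconds


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last = now - timedelta(seconds=200)
    # Compare the same heartbeat age against a small and a raised threshold.
    print(scheduler_looks_healthy(last, now, 30))
    print(scheduler_looks_healthy(last, now, 240))
```

Raising the threshold only hides gaps shorter than the new value. A task that blocks the scheduler for hours would still trigger the notice.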
I solved this issue by deleting the airflow-scheduler.pid file
and then running:
airflow scheduler -D
Check the airflow-scheduler.err and airflow-scheduler.log files.
I got an error like this:
Traceback (most recent call last):
  File "/home/myVM/venv/py_env/lib/python3.8/site-packages/lockfile/pidlockfile.py", line 77, in acquire
    write_pid_to_pidfile(self.path)
  File "/home/myVM/venv/py_env/lib/python3.8/site-packages/lockfile/pidlockfile.py", line 161, in write_pid_to_pidfile
    pidfile_fd = os.open(pidfile_path, open_flags, open_mode)
FileExistsError: [Errno 17] File exists: '/home/myVM/venv/py_env/airflow-scheduler.pid'
I removed the existing airflow-scheduler.pid file and started the scheduler again by airflow scheduler -D. It was working fine then.
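Before deleting the pid file by hand, it may be worth checking that the PID inside it really belongs to a dead process. A minimal sketch, assuming a POSIX system and a pid file that contains just the PID as text. Both helper names are hypothetical and not part of Airflow.

```python
import os
from pathlib import Path


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_pidfile(pidfile):
    """Delete pidfile if the PID recorded in it is not running.

    Returns True if the file was removed, False otherwise.
    """
    path = Path(pidfile)
    if not path.exists():
        return False
    text = path.read_text().strip()
    if text.isdigit() and pid_is_running(int(text)):
        return False  # a live process may still own this file
    path.unlink()
    return True
```

With the path from the traceback above, remove_stale_pidfile('/home/myVM/venv/py_env/airflow-scheduler.pid') would only delete the file when no process with that PID is alive.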
Our problem was that the file "logs/scheduler.log" was too large (1 TB). After cleaning up this file, everything was fine.
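To spot oversized log files like this, a small script can walk the logs directory and report the biggest files. This is just a sketch. find_large_files is a hypothetical helper, and the directory and size limit are whatever you pass in.

```python
import os


def find_large_files(root, min_bytes):
    """Return (size, path) pairs for files under root of at least min_bytes.

    The result is sorted with the largest file first.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # the file vanished or cannot be read
            if size >= min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)
```

For example, find_large_files(os.path.expandvars("$AIRFLOW_HOME/logs"), 1024 ** 3) would list everything of 1 GiB or more, assuming AIRFLOW_HOME is set.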
I had the same issue while using sqlite. There was a special message in Airflow logs: ERROR - Cannot use more than 1 thread when using sqlite. Setting max_threads to 1. If you use only 1 thread, the scheduler will be unavailable while executing a dag.
So if you use SQLite, try switching to another database. If you don't, check the max_threads value in your airflow.cfg.
On the Composer page, click on your environment name to open the Environment details, then go to the PyPI Packages tab.
Click the Edit button and increase the version of any package.
For example:
I increased the version of the pymsql package, which restarted the Airflow environment. It took a while to update. Once it was done, I no longer had this error.
You can also add a Python package; that will restart the Airflow environment too.
I've had the same issue after changing the airflow timezone. I then restarted the airflow-scheduler and it works. You can also check if the airflow-scheduler and airflow-worker are on different servers.
In simple words, using LocalExecutor and postgresql could fix this error.
I was running Airflow locally, following the instructions at https://airflow.apache.org/docs/apache-airflow/stable/start/local.html.
It has this default config:
executor = SequentialExecutor
sql_alchemy_conn = sqlite:////Users/yourusername/airflow/airflow.db
It uses SequentialExecutor and SQLite by default, and with that setup it shows the "The scheduler does not appear to be running." error.
To fix it, I followed Jarek Potiuk's advice. I changed the following config:
executor = LocalExecutor
sql_alchemy_conn = postgresql://postgres:masterpasswordforyourlocalpostgresql@localhost:5432
Then I reran airflow db init:
airflow db init
airflow users create \
--username admin \
--firstname Peter \
--lastname Parker \
--role Admin \
--email spiderman@superhero.org
After the database is initialized, run:
airflow webserver --port 8080
airflow scheduler
This fixed the airflow scheduler error.
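To check whether a given airflow.cfg still has the SequentialExecutor / SQLite combination described above, a quick script along these lines could help. It is a sketch under some assumptions: check_executor_and_db is a hypothetical helper, a missing executor is treated as SequentialExecutor (per the defaults quoted above), and sql_alchemy_conn is looked up under [database] first and then [core], since I believe newer 2.x releases moved it.

```python
import configparser


def check_executor_and_db(cfg_text):
    """Return a list of warnings for the SequentialExecutor / SQLite setup."""
    # Interpolation is disabled so '%' characters in the file are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    executor = parser.get("core", "executor", fallback="SequentialExecutor")
    conn = parser.get("database", "sql_alchemy_conn", fallback=None)
    if conn is None:
        conn = parser.get("core", "sql_alchemy_conn", fallback="")

    warnings = []
    if executor == "SequentialExecutor":
        warnings.append("executor is SequentialExecutor")
    if conn.startswith("sqlite"):
        warnings.append("sql_alchemy_conn points at SQLite")
    return warnings


if __name__ == "__main__":
    sample = (
        "[core]\n"
        "executor = SequentialExecutor\n"
        "sql_alchemy_conn = sqlite:////Users/yourusername/airflow/airflow.db\n"
    )
    for warning in check_executor_and_db(sample):
        print(warning)
```

An empty list means neither of the two settings from the default config was found.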
This happens to me when AIRFLOW_HOME is not set.
By setting AIRFLOW_HOME to the correct path, the indicated executor will be selected.
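To see which airflow.cfg you would expect Airflow to pick up in the current shell, something like this can help. It is a minimal sketch, assuming the config lives at $AIRFLOW_HOME/airflow.cfg and falls back to ~/airflow when the variable is unset. expected_airflow_cfg is a hypothetical helper, not an Airflow function.

```python
import os


def expected_airflow_cfg(environ=None):
    """Return the airflow.cfg path implied by AIRFLOW_HOME (or ~/airflow)."""
    env = os.environ if environ is None else environ
    home = env.get("AIRFLOW_HOME") or os.path.join("~", "airflow")
    return os.path.join(os.path.expanduser(home), "airflow.cfg")


if __name__ == "__main__":
    path = expected_airflow_cfg()
    print(path, "(exists)" if os.path.exists(path) else "(missing)")
```

If the printed path is not the file you have been editing, your changes to the executor setting are going into a config that Airflow never reads.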
If it matters: somehow, the -D flag causes a lot of problems for me. The airflow webserver -D immediately crashes after starting, and airflow scheduler -D somehow does next to nothing for me.
Weirdly enough, it works without the detach flag. This means I can just run the program normally, and make it run in the background, with e.g. nohup airflow scheduler &.
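If you would rather start the process from a script than type nohup by hand, roughly the same thing can be done from Python. This is a sketch, assuming a POSIX system. start_in_background is a hypothetical helper, and the command and log path are up to you.

```python
import subprocess


def start_in_background(command, log_path):
    """Start command detached from the terminal, appending output to log_path."""
    log_file = open(log_path, "ab")
    try:
        process = subprocess.Popen(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # POSIX: not tied to the terminal session
        )
    finally:
        log_file.close()  # the child keeps its own copy of the descriptor
    return process
```

For example, start_in_background(["airflow", "scheduler"], "scheduler.log") would launch the scheduler without the -D flag and collect its output in scheduler.log.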
After changing the executor from SequentialExecutor to LocalExecutor, it works!
In airflow.cfg:
executor = LocalExecutor
I wrote something wrong in my sql_test.py. When I run python sql_test.py, the error is 'no module named xxx', and the web UI shows a red error: Broken DAG.
Then, when I run airflow list_dags, the same error occurs again. This is strange and I don't know what's happening.
I tried to run airflow delete_dags sql_test, but there is no such id.
How can I:
remove the warning in the web UI
get sql_test out of list_dags
There's some syntactical mistake in your dag-definition file, resulting in failure in parsing the DAG. When Airflow fails to parse a DAG, several functionalities get broken (like list_dags in your case)
Of course deleting the problematic dag-definition file would fix it, but that's not a solution. So here's how you can understand what's wrong and fix it
From linux shell, go to Airflow's logs folder
cd $AIRFLOW_HOME/logs/scheduler/latest/
Run tree command to see directory structure
tree -I "__init__.py|__pycache__|*.pyc"
View the last few lines of the log file of your corresponding broken dag
tail -n 25 /path/to/my/broken-dag.py.log
This will give you the stack-trace that Airflow threw while trying to parse your broken dag file. That would hopefully help you diagnose the problem and patch it.
Once your dag-definition file is fixed
the broken dag message would disappear from UI
DAG would appear in the UI (refresh it a few times)
list_dags command would also start working
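If digging through the scheduler logs is inconvenient, the same stack trace can usually be reproduced by importing the DAG file directly, which is what running python on it does. A small sketch with a hypothetical helper try_import_dag_file. It assumes the file has a .py extension and that you run it in the same environment Airflow uses.

```python
import importlib.util
import traceback


def try_import_dag_file(path):
    """Import the .py file at path.

    Returns None on success, or the formatted traceback text on failure.
    """
    spec = importlib.util.spec_from_file_location("dag_under_test", path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception:
        return traceback.format_exc()
    return None
```

Calling try_import_dag_file("/path/to/my/broken-dag.py") and printing the result shows the import error (for example a missing module) without going through the scheduler.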
If you don't want to repair your DAG and would rather ignore it, you can exclude the unwanted DAG by listing its underlying file in an .airflowignore file.
We installed Airflow 1.7 and used it for several months. I used pip to uninstall Airflow 1.7 and install 1.9 (gory details are here: "airflow initdb failed: ImportError: No module named log.logging_mixin").
Since then, I haven't had a single DAG run. I renamed and moved log files to match the 1.9 expectations, but still nothing happens.
I have a "run every 40 minutes" DAG, and it hasn't run since 3/28. When I manually trigger it, no log file is created and nothing happens, except that I get a running DAG listed under "DAG Runs" (I do NOT get anything listed under "Recent Tasks", and "Last Run" does not get updated).
I have a "Run once" DAG that I created. I triggered it, same behavior.
I have also tried running the example_bash_operator DAG. Same behavior.
The Airflow documentation is a bit thin on all the requirements needed to run DAGs correctly.
Aside from the webserver, make sure the scheduler is running as well. Also check if the DAG is configured with a correct schedule and there is scheduling information in the "Task Instance Information" page.
See this answer for more "checkpoints": Airflow 1.9.0 is queuing but not launching tasks
Airflow seems to be skipping the dags I added to /usr/local/airflow/dags.
When I run
airflow list_dags
The output shows
[2017-08-06 17:03:47,220] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow/dags
-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
example_bash_operator
example_branch_dop_operator_v3
example_branch_operator
example_http_operator
example_passing_params_via_test_command
example_python_operator
example_short_circuit_operator
example_skip_dag
example_subdag_operator
example_subdag_operator.section-1
example_subdag_operator.section-2
example_trigger_controller_dag
example_trigger_target_dag
example_xcom
latest_only
latest_only_with_trigger
test_utils
tutorial
But this doesn't include the dags in /usr/local/airflow/dags
ls -la /usr/local/airflow/dags/
total 20
drwxr-xr-x 3 airflow airflow 4096 Aug 6 17:08 .
drwxr-xr-x 4 airflow airflow 4096 Aug 6 16:57 ..
-rw-r--r-- 1 airflow airflow 1645 Aug 6 17:03 custom_example_bash_operator.py
drwxr-xr-x 2 airflow airflow 4096 Aug 6 17:08 __pycache__
Is there some other condition that needs to be satisfied for Airflow to identify a DAG and load it?
My DAG was being loaded, but I had the name of the DAG wrong. I was expecting the DAG to be named after the file, but the name is determined by the first argument to the DAG constructor:
dag = DAG(
'tutorial', default_args=default_args, schedule_interval=timedelta(1))
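To find out which name a file will register without loading it into Airflow, the source can be inspected directly. This is only a sketch: find_dag_ids is a hypothetical helper, and it only catches dag_ids written as string literals in a DAG(...) call, either as the first positional argument or as dag_id=.

```python
import ast


def find_dag_ids(source):
    """Return the string-literal dag_ids passed to DAG(...) calls in source."""
    dag_ids = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both DAG(...) and something.DAG(...)
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "DAG":
            continue
        candidate = node.args[0] if node.args else None
        for keyword in node.keywords:
            if keyword.arg == "dag_id":
                candidate = keyword.value
        if isinstance(candidate, ast.Constant) and isinstance(candidate.value, str):
            dag_ids.append(candidate.value)
    return dag_ids


if __name__ == "__main__":
    example = "dag = DAG(\n    'tutorial', default_args=default_args)\n"
    print(find_dag_ids(example))
```

Running it over custom_example_bash_operator.py would show the name to look for in the list, regardless of what the file is called.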
Try airflow db init before listing the DAGs. This is because airflow list_dags lists all the DAGs present in the database (and not in the folder you mentioned). airflow initdb will create entries for these DAGs in the database.
Make sure you have the environment variable AIRFLOW_HOME set to /usr/local/airflow. If this variable is not set, Airflow looks for DAGs in the airflow folder under your home directory, which might not exist in your case.
The example files are not in /usr/local/airflow/dags. You can simply hide them by editing airflow.cfg (usually in ~/airflow): set load_examples = False in the 'core' section.
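One thing worth ruling out when an airflow.cfg edit such as load_examples = False seems to have no effect: environment variables named AIRFLOW__{SECTION}__{KEY} take precedence over the file. A minimal sketch for checking this, where both helper names are hypothetical:

```python
import os


def env_override_name(section, key):
    """Build the environment variable name that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def find_env_override(section, key, environ=None):
    """Return (name, value) if an override is set in the environment, else None."""
    env = os.environ if environ is None else environ
    name = env_override_name(section, key)
    if name in env:
        return name, env[name]
    return None


if __name__ == "__main__":
    print(find_env_override("core", "load_examples"))
    print(find_env_override("core", "dags_folder"))
```

If either call prints something other than None, that value wins over whatever is written in airflow.cfg.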
There are a couple of errors that may keep your DAG from being listed in list_dags.
Your DAG file has a syntax issue. To check this, just run python custom_example_bash_operator.py and see if there is any issue.
Check whether the folder is the default DAG loading path. If you are new to Airflow, I suggest creating a new .py file, copying the sample from https://airflow.incubator.apache.org/tutorial.html, and then seeing whether the test DAG shows up.
Make sure there is dag = DAG('dag_name', default_args=default_args) in the DAG file.
dag = DAG(
dag_id='example_bash_operator',
default_args=args,
schedule_interval='0 0 * * *',
dagrun_timeout=timedelta(minutes=60))
When a DAG is instantiated it pops up by the name you specify in the dag_id attribute. dag_id serves as a unique identifier for your DAG
This can happen if the airflow.cfg config points to an incorrect path.
STEP 1: Go to {basepath}/src/config/
STEP 2: Open the airflow.cfg file
STEP 3: Check that the path points to the dags folder you have created
dags_folder = /usr/local/airflow/dags
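The check in step 3 can also be scripted. A rough sketch, assuming dags_folder sits in the [core] section as in the line above. list_dag_candidates is a hypothetical helper that only lists .py files. It does not tell you whether Airflow can parse them.

```python
import configparser
from pathlib import Path


def list_dag_candidates(cfg_path):
    """Read dags_folder from the given airflow.cfg and list the .py files in it."""
    # Interpolation is disabled so '%' characters in the file are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    dags_folder = parser.get("core", "dags_folder", fallback=None)
    if dags_folder is None:
        raise ValueError(f"no [core] dags_folder found in {cfg_path}")
    folder = Path(dags_folder).expanduser()
    if not folder.is_dir():
        raise FileNotFoundError(f"dags_folder does not exist: {folder}")
    return sorted(str(path) for path in folder.rglob("*.py"))
```

If your DAG file is missing from the returned list, the config is pointing somewhere other than the folder you saved it in.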
I find that I have to restart the scheduler for the UI to pick up new DAGs. When I make changes to a DAG in my dags folder, the updated DAGs appear in the list when I run airflow list_dags, but not in the UI until I restart the scheduler.
First try running:
airflow scheduler
There can be two issues:
1. Check the Dag name given at the time of DAG object creation in the DAG python program
dag = DAG(
dag_id='Name_Of_Your_DAG',
....)
Note that the name given is often the same as a name already present in the list of DAGs (for example, if you copied the DAG code). If this is not the case, then:
2. Check the path set to the DAG folder in Airflow's config file.
You can create a DAG file anywhere on your system, but you need to set the path to that DAG folder/directory in Airflow's config file.
For example, if I have created my DAG folder in the home directory, I have to edit the airflow.cfg file using the following commands in the terminal:
Create a DAG folder in the home or root directory:
$mkdir ~/DAG
Edit airflow.cfg in the airflow directory where Airflow is installed:
~/$cd airflow
~/airflow$nano airflow.cfg
In this file, change the dags_folder path to the DAG folder that we have created.
If you are still facing the problem, reinstall Airflow and refer to this link for the installation of Apache Airflow.
Does your
custom_example_bash_operator.py
have a DAG name different from the others?
If yes, try restarting the scheduler or even resetdb. I often mistook the filename for the DAG name as well, so it is better to name them the same.
Can you share what is in custom_example_bash_operator.py? Airflow scans for certain magic strings inside a file to determine whether it is a DAG or not. It scans for airflow and for DAG.
In addition, if you use a duplicate dag_id for a DAG, it will be overwritten. Since you seem to be deriving from the example bash operator, did you perhaps keep the DAG name example_bash_operator? Try renaming it.
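The "magic" scan mentioned above can be approximated like this. As far as I know the exact check differs between Airflow versions (older ones may be case-sensitive), so treat this as an approximation rather than Airflow's own code. might_contain_dag is a hypothetical helper.

```python
def might_contain_dag(path):
    """Rough pre-filter: does the file mention both 'dag' and 'airflow'?

    The comparison is case-insensitive. This is an assumption about how
    the scan behaves, not a copy of Airflow's implementation.
    """
    with open(path, "rb") as handle:
        content = handle.read().lower()
    return b"dag" in content and b"airflow" in content
```

A file that fails this check would be skipped before it is even imported, so it never shows up in the list and never produces an import error either.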
You need to set AIRFLOW_HOME first and initialise the db
export AIRFLOW_HOME=/myfolder
mkdir /myfolder/dags
airflow db init
You need to create a user too
airflow users create \
--username admin \
--firstname FIRST_NAME \
--lastname LAST_NAME \
--role Admin \
--email admin@example.org
If you have done it correctly, you should see airflow.cfg in your folder. There you will find dags_folder, which shows the dags folder.
If you have saved your DAG inside this folder, you should see it in the DAG list with
airflow dags list
or in the UI, after starting the webserver with
airflow webserver --port 8080
Otherwise, run airflow db init again.
In my case, a print(something) in the DAG file prevented the DAG list from being printed on the command line.
If the solutions above are not working, check whether there is a print line in your DAG file.
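A quick way to look for such print lines without reading the whole file is to scan the module-level statements. A minimal sketch with a hypothetical helper top_level_prints. It only reports bare print(...) calls at the top level, not ones inside functions or operators.

```python
import ast


def top_level_prints(source):
    """Return the line numbers of print(...) calls executed at module level."""
    lines = []
    for statement in ast.parse(source).body:
        if isinstance(statement, ast.Expr) and isinstance(statement.value, ast.Call):
            func = statement.value.func
            if isinstance(func, ast.Name) and func.id == "print":
                lines.append(statement.lineno)
    return lines
```

Passing it the text of your DAG file, for example top_level_prints(open("my_dag.py").read()) with my_dag.py standing in for your file, gives the lines to remove or turn into logging calls.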
Try restarting the scheduler. The scheduler needs to be restarted when new DAGs are to be added to the DagBag.