Airflow DAG task not found after adding a task - airflow

I'm having trouble updating a DAG file. Airflow still has the old version of my DAG file. I added a task, but it does not appear when I check the log and the UI (DAG->Code).
I have very simple tasks.
I of course checked the dag directory path in airflow.cfg and restarted airflow webserver/scheduler.
I have no issue running it (but with the old dag file).

Looks like a bug in Airflow. A temporary workaround is to delete the task instances from the Airflow DB with:
delete from task_instance where dag_id='<dag_name>' and task_id='<deleted_task_name>';
This should be simpler and less impactful than the resetdb route which would delete everything including variables and connections set before.
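
As a rough illustration of the targeted delete described above, here is a minimal sketch run against an in-memory SQLite table. The two-column task_instance table, the helper name delete_task_instances, and the sample ids are stand-ins made up for illustration. A real Airflow metadata table has more columns and may live in a different database engine.

```python
import sqlite3


def delete_task_instances(conn, dag_id, task_id):
    """Delete rows for one task of one DAG and return how many were removed."""
    cur = conn.execute(
        "DELETE FROM task_instance WHERE dag_id = ? AND task_id = ?",
        (dag_id, task_id),
    )
    conn.commit()
    return cur.rowcount


# Toy stand-in for the metadata database (assumption: only two columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (dag_id TEXT, task_id TEXT)")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?)",
    [("my_dag", "old_task"), ("my_dag", "kept_task"), ("other_dag", "old_task")],
)

# Remove only the rows that belong to the deleted task of one DAG.
removed = delete_task_instances(conn, "my_dag", "old_task")
remaining = conn.execute(
    "SELECT dag_id, task_id FROM task_instance ORDER BY dag_id, task_id"
).fetchall()
```

Passing the ids as query parameters rather than pasting them into the SQL string avoids quoting mistakes.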

Run the command below in a terminal soon after changing the DAG file.
airflow initdb
This worked for me.

You can try to remove the old .pyc file for that dag in the dags folder and generate it again.
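
A minimal sketch of the .pyc cleanup suggested above, assuming the compiled files sit somewhere under the dags folder (either next to the DAG files or in __pycache__ subfolders). The helper name remove_pyc_files is made up for illustration. Python regenerates the files the next time the modules are imported.

```python
from pathlib import Path


def remove_pyc_files(dags_folder):
    """Delete every .pyc file under dags_folder and return the removed paths."""
    removed = []
    # sorted() collects the matches first, so files are not deleted mid-scan.
    for pyc in sorted(Path(dags_folder).rglob("*.pyc")):
        pyc.unlink()
        removed.append(pyc)
    return removed
```

Call it with the dags_folder value from your airflow.cfg, for example remove_pyc_files("/path/to/dags").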

The UI is sometimes not up to date for me, even though the code is actually in the DagBag. You can try the following:
Use the refresh button to see if the code is refreshed.
Use a newer version (1.8+). This happened to me on 1.7.X, but since 1.8+ it behaves much better once you refresh the DAG in the UI.
You can also use "airflow test" to see if the code is in place, and try the advice from @Him as well.

Same thing happened to me.
In the end, the best approach was to run "resetdb", add the connections and import the variables again, then run airflow initdb and start the scheduler again.
I don't know why this happens. Does anybody know? It seems that adding tasks or changing names is not so easy once the DAG is compiled. Removing *.pyc files or the logs folder did not work for me.

On the DAGs page of the Airflow webserver, delete the DAG. This deletes its records in the database. After a while the DAG appears again on the page, but the old task_id is gone.

Related

DAG seems to be missing from google Airflow UI

We have a Google Airflow environment set up for our needs.
I have read a lot on Stack Overflow, but everyone says to restart the webserver, which I don't think I can do because it is managed by Google.
Every time we deploy a new DAG into the environment, the DAG appears to be missing.
It takes a few hours after deployment before everything works, and until then it is hard for me to understand what is wrong and how to fix it.
Could you please help me get rid of this issue permanently?
Please let me know if any more information required here. Thanks in advance.
Every dag_dir_list_interval, the DagFileProcessorManager process lists the scripts in the dags folder. If a script is new, or was last processed more than min_file_process_interval ago, the manager creates a DagFileProcessorProcess for that file to process it and generate the DAGs.
Check the values you have for these settings, and reduce them so that the DAG is added to the metadata database as soon as possible.
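
The rule described above can be sketched as a small function. This is a simplified model of the decision, not Airflow's actual code. The function name should_process and its arguments are made up for illustration, and all times are plain seconds.

```python
def should_process(last_processed_at, now, min_file_process_interval):
    """Return True if a DAG file is new or was last processed too long ago.

    last_processed_at is None for a file that has never been processed.
    All values are in seconds.
    """
    if last_processed_at is None:
        # A new file is always handed to a processor.
        return True
    # Otherwise wait until the minimum interval has elapsed.
    return now - last_processed_at > min_file_process_interval
```

Under this model, lowering min_file_process_interval makes an edited file eligible for reprocessing sooner, while dag_dir_list_interval controls how soon a brand-new file is noticed at all.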

Unable to create dag with apache airflow

I am running Airflow 2.0, setting up an Airflow DAG for the first time, and following the quick start tutorials.
After creating and running the .py file, I don't see the DAG. It is not listed for me.
setting:
airflow.cfg:dags_folder = /Users/vik/src/airflow/dags
my python file is in this folder. There are no errors here.
I am able to see the example dags in example-dags.
I did airflow db init
airflow webserver
airflow scheduler
then try to list the dags
I think I am missing something
I don't know exactly how you installed everything, but I highly recommend the Astronomer CLI for simplicity and quick setup. With it you'll be able to set up a first DAG pretty quickly. There is also a video tutorial that helps you understand how to install and set up everything.
A few things to try:
Make sure the scheduler is running (run airflow scheduler), or try restarting it.
Using the Airflow CLI, run airflow config list and make sure the loaded config is in fact what you expect. Check the value of dags_folder.
Try running airflow dags list from the CLI, and if your DAG is shown in the results, check its filepath.
If there was an error parsing your DAG, so that the scheduler could not load it, you can find the logs in ${AIRFLOW_HOME}/logs/scheduler/${date}/your_dag_id.log

Airflow dag file is in place but it's not showing up when I do airflow dags list

I placed a DAG file in the dags folder based on a tutorial with slight modifications, but it doesn't show up in the GUI or when I run airflow dags list.
Answering my own question: Check the python file for Exceptions by running it directly. It turns out one exception in the dag's python script due to a missing import made the dag not show up in the list. I note this just in case another new user comes across this. To me the moral of the story is that dag files should often be checked by running with python directly when they are modified because there won't be an obvious error showing up otherwise; they may just disappear from the list.
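
A minimal sketch of that check, assuming you want to run it from a script rather than by typing python my_dag.py by hand. The helper name check_dag_file is made up for illustration. It executes the file once and returns the traceback text if anything is raised, such as a missing import.

```python
import runpy
import traceback


def check_dag_file(path):
    """Execute a DAG file and return the traceback text if it raises, else None."""
    try:
        # Runs the file top to bottom, as `python path` would.
        runpy.run_path(path)
    except Exception:
        return traceback.format_exc()
    return None
```

A None result only means the file ran without raising. It does not prove that the scheduler has picked the DAG up.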

Pick DAG related constants from configuration file WITHOUT restarting airflow

I am using some constants in my DAG which are being imported from another (configuration) Python file in the project directory.
Scenario
Airflow is running, I add a new DAG. I am importing the schedule_interval from that configuration file and some other constant as well which I am passing to a function being called in the PythonOperator in my DAG.
I update the code base, so new dag gets added in the airflow_dag folder and the configuration file gets updated with the new constants.
Problem
The schedule_interval does not work and the dag does not get scheduled. It also throws an exception (import error) for any other constant which is being imported in the dag.
In the web UI I can see the new DAG, but I can also see a red error label saying that constant XYZ could not be found in configuration_file.py, even though it is actually there.
The constant is not picked up no matter how long I wait.
Bad Solution
I go and restart the airflow scheduler (and webserver as well just in case), and everything starts working again.
Question
Is there a solution to this where I will not have to restart airflow and update those things?
Note: The solution proposed in the question "Can airflow load dags file without restart scheduler" (refreshing the DAG) did not work for me.

Refreshing dags without web server restart apache airflow

Is there any way to reload the jobs without having to restart the server?
In your airflow.cfg, you have these two settings to control this behavior:
# after how much time a new DAGs should be picked up from the filesystem
min_file_process_interval = 0
dag_dir_list_interval = 60
You might have to reload the web-server, scheduler and workers for your new configuration to take effect.
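
To confirm which values are actually set, the two settings above can be read back with the standard library. This is only a sketch: it assumes both keys sit in the [scheduler] section of airflow.cfg, and the helper name read_dag_refresh_settings and the sample text are made up for illustration.

```python
import configparser


def read_dag_refresh_settings(cfg_text):
    """Parse airflow.cfg-style text and return the two intervals in seconds."""
    # interpolation=None so that '%' characters elsewhere in the file are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    section = parser["scheduler"]  # assumption: both keys live in [scheduler]
    return {
        "min_file_process_interval": section.getint("min_file_process_interval"),
        "dag_dir_list_interval": section.getint("dag_dir_list_interval"),
    }


sample_cfg = """
[scheduler]
# after how much time a new DAGs should be picked up from the filesystem
min_file_process_interval = 0
dag_dir_list_interval = 60
"""

settings = read_dag_refresh_settings(sample_cfg)
```

For a real installation, pass the contents of your own airflow.cfg instead of sample_cfg.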
I had the same question, and didn't see this answer yet. I was able to do it from the command line with the following:
python -c "from airflow.models import DagBag; d = DagBag();"
When the webserver is running, it refreshes dags every 30 seconds or so by default, but this will refresh them in between if necessary.
Dags should be reloaded when you update the associated python file.
If they are not, first try to manually refresh them in the UI by clicking the button that looks like a recycle symbol.
If that doesn't work, delete all the .pyc files in the dags folder.
Usually though, when I save the python file the dag gets updated within a few moments.
I'm pretty new to Airflow. I had initially used sample code, which got picked up right away, and then edited it to call my own code.
The edited version was ultimately giving an error, but I only found this out once I had deleted the DAG with the example code in the Airflow webserver's UI (the trash button).
Once deleted, it showed me the error which was preventing it from loading the new dag.

Resources