Unable to create DAG with Apache Airflow

I am running Airflow 2.0, setting up an Airflow DAG for the first time, and following the quick start tutorial.
After creating the .py file and running it, I don't see the DAG created; it is not listed for me.
Setting in airflow.cfg:
dags_folder = /Users/vik/src/airflow/dags
My Python file is in this folder, and there are no errors in it.
I am able to see the example DAGs from example_dags.
I ran:
airflow db init
airflow webserver
airflow scheduler
and then tried to list the DAGs, but mine does not show up.
I think I am missing something.

I don't know exactly how you installed everything, but I highly recommend the Astronomer CLI for simplicity and quick setup. With that you'll be able to set up a first DAG pretty quickly. There is also a video tutorial that helps you understand how to install and set up everything.

A few things to try (a combined command sketch follows this list):
Make sure the scheduler is running (run airflow scheduler), or try restarting it.
Using the Airflow CLI, run airflow config list and make sure that the loaded config is what you expect; in particular, check the value of dags_folder.
Try running airflow dags list from the CLI and, if your DAG is shown in the results, check its filepath.
If there was an error parsing your DAG, so that it could not be loaded by the scheduler, you can find the logs in ${AIRFLOW_HOME}/logs/scheduler/${date}/your_dag_id.log
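Putting those checks together, a quick diagnostic pass from the terminal might look like this; the dags_folder path comes from the question above, and the exact scheduler log layout may differ by version:
airflow config list | grep dags_folder          # should show dags_folder = /Users/vik/src/airflow/dags
airflow dags list                               # your DAG id should appear in this output if the file was parsed
airflow scheduler                               # keep this running in a separate terminal
ls "$AIRFLOW_HOME/logs/scheduler/$(date +%Y-%m-%d)/"    # look for a log named after your DAG file and check it for import errors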

Related

How to watch task logs via the CLI in Airflow?

I am having a problem: no logs are displayed in the Airflow UI I am currently working with. I don't know the reason, but I've already informed my colleagues about it and they're looking for a solution.
Meanwhile I need to watch the logs of certain tasks in my DAG. Is there any way to do it via the Airflow CLI?
I am using the airflow tasks run command, but it only seems to run the tasks and doesn't show anything in the command line.
By default, Airflow should store your logs at $AIRFLOW_HOME/logs/; maybe you'll find them there, if they are still being generated.
In the meantime you can use airflow tasks test <dag_id> <task_id>; this runs a specific task and displays its logs in your terminal.
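For example, with a placeholder dag_id, task_id, and execution date (older 2.x versions require the date argument):
airflow tasks test my_dag my_task 2021-06-01    # runs the single task in isolation, without recording state, and prints its log to the terminal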

Airflow initdb stuck on "add max tries column to task instance"

I'm using Airflow with MSSQL 2016 as the backend. I started Airflow for the first time, running airflow initdb first.
It seems to be fine until it gets stuck (for more than an hour) on Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance.
I'm not sure why it takes so long, as this is the first time I'm running Airflow, thus the DB is empty and no actual migration should happen...
If you look at the migration file you'll see a loop over all DAGs and tasks. You probably have a lot of them. Just make Airflow think there are no DAGs in the dags folder.
Solution: set the environment variable AIRFLOW__CORE__DAGS_FOLDER to, for example, /tmp/123 (any empty directory will work), and run airflow initdb again.
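As a rough sketch of that workaround (/tmp/123 is just an example of an empty directory; point the variable back at your real dags folder afterwards):
mkdir -p /tmp/123
export AIRFLOW__CORE__DAGS_FOLDER=/tmp/123
airflow initdb
unset AIRFLOW__CORE__DAGS_FOLDER    # so later commands pick up the dags_folder from airflow.cfg again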

Scheduler not updating package files

I'm developing a DAG on Cloud Composer; my code is separated into a main Python file and one package with subfolders. It looks like this:
my_dag1.py
package1/__init__.py
package1/functions.py
package1/package2/__init__.py
package1/package2/more_functions.py
I updated one of the functions in package1/functions.py to take an additional argument (and updated the reference in my_dag1.py). The code runs correctly in my local environment and I was not getting any errors when running
gcloud beta composer environments run my-airflow-environment list_dags --location europe-west1
But the Web UI raised a Python error:
TypeError: my_function() got an unexpected keyword argument 'new_argument'
I tried renaming the function and the error changed to
NameError: name 'my_function' is not defined
I tried changing the name of the DAG and uploading the files to the dag folder both zipped and unzipped, but nothing worked.
The error disappeared only after I renamed the package folder.
I suspect the issue is related to the scheduler picking up my_dag1.py but not package1/functions.py. The error appeared out of nowhere, as I had made similar updates in the previous weeks.
Any idea on how to fix this issue without refactoring the whole code structure?
EDIT-1
Here's the link to a related discussion on Google Groups
I've run into a similar issue: the "Broken DAG" error won't go away in the Web UI. I guess this is a cache bug in Airflow's web server.
Background:
I created a customized operator using the Airflow plugin features.
After I import the customized operator, the Airflow Web UI keeps showing a Broken DAG error saying that it can't find the customized operator.
Why do I think it's a bug in the Airflow web server?
I can manually run the DAG with the command airflow test, so the import should be correct.
Even if I remove the related DAG file from Airflow's /dags/ folder, the error is still there.
Here is what I did to resolve this issue (a rough command sketch follows below):
Restart the Airflow webserver (sometimes this alone resolves the issue).
Make sure no DAG is running, then restart the Airflow scheduler.
Make sure no DAG is running, then restart the Airflow worker.
Hopefully this helps someone who has the same issue.
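If you run Airflow as plain local processes rather than under systemd or a managed service, the restart might look roughly like this (the flags are from the 1.x-era CLI this answer seems to describe):
# stop the existing webserver/scheduler/worker processes first (Ctrl+C or kill their PIDs), then:
airflow webserver -D
airflow scheduler -D
airflow worker -D       # only needed if you use the CeleryExecutor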
Try restarting the webserver with:
gcloud beta composer environments restart-web-server ENVIRONMENT_NAME --location=LOCATION

Airflow DAG "seems to be existing only locally. The master scheduler doesn't seem to be aware of its existence"

I use Airflow for a workflow of Spark jobs. After installation, I copied the DAG files into the DAGs folder set in airflow.cfg. I can backfill the DAG to run the BashOperators successfully, but there is always a warning like the one mentioned in the title. I didn't verify whether the scheduling is fine, but I doubt scheduling can work, since the warning says the master scheduler doesn't know of my DAG's existence. How can I eliminate this warning and get scheduling to work? Has anybody run into the same issue who can help me out?
This is usually connected to the scheduler not running or the refresh interval being too wide. There are no log entries present, so we cannot analyze from there. Also, unfortunately, the very cause might have been overlooked, because this is usually the root of the problem:
"I didn't verify if the scheduling is fine."
So first you should check that both of the following services are running:
airflow webserver
and
airflow scheduler
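A quick way to verify both are actually alive, assuming they run as local processes on the same machine:
ps -ef | grep -E "airflow (webserver|scheduler)" | grep -v grep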
If that doesn't help, see this post for more reference: Airflow 1.9.0 is queuing but not launching tasks

Airflow DAG task not found after updating a DAG by adding a task

I'm having trouble updating a DAG file. Airflow still has an old version of my DAG file: I added a task, but it does not seem to be updated when I check the log and the UI (DAG -> Code).
I have very simple tasks.
I of course checked the DAG directory path in airflow.cfg and restarted the Airflow webserver/scheduler.
I have no issue running it (but with the old DAG file).
Looks like a bug in Airflow. A temporary solution is to delete the task instances from the Airflow DB with
delete from task_instance where dag_id='<dag_name>' and task_id='<deleted_task_name>';
This should be simpler and less impactful than the resetdb route, which would delete everything, including variables and connections set before.
Use the terminal and run the command below soon after changing the DAG file:
airflow initdb
This worked for me.
You can try removing the old .pyc file for that DAG in the dags folder and letting it be generated again (see the sketch below).
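For example, assuming the default layout where the dags folder lives under $AIRFLOW_HOME:
find "$AIRFLOW_HOME/dags" -name "*.pyc" -delete    # drop stale compiled files; they are regenerated on the next parse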
The UI is sometimes not up to date for me, but the code is actually there in the DagBag. You can try to:
Use the refresh button to see if the code has refreshed.
Use a higher version, 1.8+; this happened to me before when I used 1.7.x, but with 1.8+ it seems much better after you refresh the DAG in the UI.
You can also use "airflow test" to see if the code is in place, and try the advice from #Him as well.
The same thing happened to me.
In the end the best thing was to "resetdb", add connections and import variables again, then run airflow initdb and start the scheduler back up again (a rough outline follows below).
I don't know why this happens. Does anybody know? It seems not so easy to add tasks or change names once compiled. Removing *.pyc files or the logs folder did not work for me.
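A rough outline of that route, using the pre-2.0 CLI names mentioned in this thread (back up your connections and variables first, since resetdb wipes them):
airflow resetdb     # burns down and rebuilds the metadata database
airflow initdb
# re-create your connections and variables (via the UI or the connections / variables CLI)
airflow scheduler   # start the scheduler back up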
On the DAGs page of the Airflow webserver, delete the DAG. This deletes its record in the database. After a while the DAG will appear again on the page, but the old task_id is removed.
