What might cause our Airflow DAGs (Airflow 1.10.9, MariaDB 10.4.15) to become disabled suddenly? We noticed that many of our DAGs were disabled (paused) automatically, without any human intervention.
I am having a problem: no logs are displayed in the Airflow UI I am currently working with. I don't know the reason, but I've already informed my colleagues and they are looking for a solution.
In the meantime I need to watch the logs of certain tasks in my DAG. Is there any way to do that via the Airflow CLI?
I am using the airflow tasks run command, but it only seems to run the tasks and doesn't show anything in the command line.
By default Airflow stores your logs under $AIRFLOW_HOME/logs/, so you may find them there if they are still being generated.
In the meantime you can use airflow tasks test <dag_id> <task_id>, which runs a specific task in isolation and displays its logs in your terminal.
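For example, a minimal sketch of both suggestions (the DAG ID, task ID, and date below are placeholders; depending on your Airflow version the date argument may be optional):

    # Logs are written per DAG / task / run under the Airflow home directory
    ls "$AIRFLOW_HOME/logs/"

    # Run a single task in isolation and print its logs to the terminal
    airflow tasks test my_dag my_task 2021-01-01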
I installed Airflow without Docker, created a DAG folder, and put the DAG files inside it, but they do not appear on the site.
I restarted the scheduler, reset the DB, and restarted the Airflow webserver, but it still doesn't work.
I am using Cloud Composer to schedule multiple DAGs. These DAGs are built dynamically using this method and they use custom plugins.
I would like to know how to proceed when adding or modifying a plugin that concerns all DAGs (let's say it adds a new task to each DAG).
Do we need to pause all the running DAGs when doing so?
What I have done so far when adding/modifying a plugin is:
Upload the plugins into the plugins bucket of the Composer cluster (using gcloud composer command)
Do a dummy update in the Airflow config -> add a dummy value to the airflow.cfg (using gcloud composer commands)
I did that to force the DAGs to pause (roughly the commands sketched below); once the update is finished the DAGs are resumed with the new plugins, and hence the new tasks (if not in the current DAG run, then in the next one). Is this step useless?
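For reference, the two steps roughly correspond to the following gcloud commands (the environment name, location, plugin path, and override key are placeholders, not my actual values):

    # 1. Upload the plugin into the environment's plugins bucket
    gcloud composer environments storage plugins import \
        --environment my-composer-env --location us-central1 \
        --source ./plugins/my_plugin.py

    # 2. Dummy Airflow config override, just to force an environment update
    gcloud composer environments update my-composer-env \
        --location us-central1 \
        --update-airflow-configs=<section>-<key>=<value>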
Thanks if you can help.
As explained in the architecture diagram, the Airflow webserver where you view your DAG and plugin code runs in a Google-managed tenant project, whereas the Airflow workers which actually run your DAG and plugin code are directly in your project.
When a DAG/plugin is placed in the Composer bucket, the Airflow webserver (which falls under the tenant project) validates the code and records any new scheduling changes in the Airflow database.
At the same time, the Airflow scheduler (in your project) asks the Airflow database for the next DAG to run and notifies the Airflow workers to perform the scheduled work. The Airflow workers (in your project) then grab the DAG/Plugin code from the Composer bucket and compile them in order to run that specific task.
Thus, any updates made to your DAG/plugin code are read separately by the Airflow webserver and the Airflow workers, at different times.
If you do not see your new code in the Airflow webserver, it should still be picked up by the workers when they fetch the code fresh for the next task run.
Therefore you shouldn't have to restart Composer for the workers to pick up the changes.
You cannot force a worker to grab and re-compile the new code mid task execution.
There are two ways to refresh the Airflow Webserver to see the Plugin code changes if it is not updating:
Set the reload_on_plugin_change property to True in the [webserver] section via the ‘AIRFLOW CONFIGURATIONS OVERRIDE’ tab in the Console.
Or, you can specifically add/remove/update a PyPI package via the ‘PYPI PACKAGES’ Console tab; non-PyPI package changes will not trigger a web server restart. Note this will also initiate a full Composer environment restart, which may take ~20 minutes.
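If you prefer the command line, both options can also be expressed as gcloud updates along these lines (the environment name, location, and the package spec are placeholders):

    # Option 1: let the webserver reload when plugin files change
    gcloud composer environments update my-composer-env \
        --location us-central1 \
        --update-airflow-configs=webserver-reload_on_plugin_change=True

    # Option 2: add or pin a PyPI package, which triggers a full environment restart
    gcloud composer environments update my-composer-env \
        --location us-central1 \
        --update-pypi-package=requests==2.25.0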
Is there a setting in Cloud Composer / Airflow that can disable new DAGs in the DAGs folder by default, without the need to specify this in the DAG files themselves?
I want to be able to load these DAGs into a development environment where users should just run them manually rather than have them run on a schedule.
I had a look at https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg, but I couldn't find anything obvious.
Yes, there is one: it's called dags_are_paused_at_creation, in the [core] section of airflow.cfg.
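A minimal sketch of how to set it, assuming a Composer config override via gcloud (the environment name and location are placeholders):

    # Plain airflow.cfg equivalent:
    #   [core]
    #   dags_are_paused_at_creation = True

    # Cloud Composer override via gcloud:
    gcloud composer environments update my-composer-env \
        --location us-central1 \
        --update-airflow-configs=core-dags_are_paused_at_creation=True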
I used Airflow for a workflow of Spark jobs. After installation, I copied the DAG files into the DAGs folder set in airflow.cfg. I can backfill the DAG to run the BashOperators successfully, but there is always a warning like the one mentioned. I didn't verify if the scheduling is fine, but I doubt scheduling can work, since the warning says the master scheduler doesn't know of my DAG's existence. How can I eliminate this warning and get scheduling to work? Has anybody run into the same issue who can help me out?
This is usually connected to the scheduler not running or to the refresh interval being too wide. There are no log entries in the question, so we cannot analyze from there. Also, the actual cause may unfortunately have been overlooked, because this is usually the root of the problem:
I didn't verify if the scheduling is fine.
So first you should check if both of the following services are running:
airflow webserver
and
airflow scheduler
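For example, a quick way to check and (re)start both on a plain local install (the -D flag runs them as daemons):

    # Check whether anything Airflow-related is running
    ps aux | grep -i 'airflow'

    # Start both services (in separate terminals, or daemonized with -D)
    airflow webserver -D
    airflow scheduler -D

If the scheduler is running but picks up new DAG files slowly, the scan interval can also be tightened in airflow.cfg (for example [scheduler] dag_dir_list_interval), which relates to the "refresh interval" point above.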
If that doesn't help, see this post for further reference: Airflow 1.9.0 is queuing but not launching tasks