I installed Airflow without Docker, created a DAG folder, and put the DAG files inside, but they do not appear on the site.
I restarted the scheduler, reset the DB, and restarted the Airflow webserver, but it still doesn't work.
I have Airflow 1.10.12 installed on a server, and I've launched a DAG with a trigger DAG configuration in JSON. I know how to access the config in my DAG's code, but I want to see it in the Airflow UI. Where should I look?
It's not very convenient, but you can view the DAG run conf via Browse -> DAG Runs -> view "Conf" column (and filter for your specific DAG run).
There is code on the main branch which adds a separate DAG run page, but that's not released yet: https://github.com/apache/airflow/pull/19705.
I ran the airflow webserver under one of my virtual environments (myenv). When I tried to add some new dummy DAGs, the result was not what I expected.
Here is the story:
First, I created a new DAG that is literally a copy of "example_bash_operator" with another dag_id. Then I put this DAG in the same directory as the other example DAGs, which is "~/myenv/lib/python3.8/site-packages/airflow/example_dags". But when I opened the webserver UI, this newly created DAG wasn't shown.
I'm really confused. Should I change AIRFLOW_HOME? I did export AIRFLOW_HOME=~/airflow as the Airflow documentation indicates. What's more, why are the example DAGs collected under the virtual environment's site-packages directory instead of the Airflow home that I declared?
from datetime import timedelta
from airflow import DAG
from airflow.utils.dates import days_ago

args = {'owner': 'airflow'}  # same default_args as in example_bash_operator

with DAG(
    dag_id='my_test',
    default_args=args,
    schedule_interval='0 0 * * *',
    start_date=days_ago(2),
    dagrun_timeout=timedelta(minutes=60),
    tags=['some_tag'],
    params={"example_key": "example_value"},
) as dag:
    # ... tasks identical to example_bash_operator ...
🔼🔼 This is the only change I made relative to example_bash_operator.
Example DAGs are just that: examples. They are "hard-coded" into the Airflow installation and shown only when you enable them in the config. They are mostly there so you can quickly see some working examples.
Your own DAGs should be placed in ${AIRFLOW_HOME}/dags, not in the example_dags folder. Airflow only scans the DAGs folder regularly for changes, because it does not expect the example DAGs to change. Ever. It's a strange idea to change files inside an installed Python package.
Just place your DAGs in ${AIRFLOW_HOME}/dags and, if they have no problems, they should show up quickly. You can also disable the examples in airflow.cfg, and then you will have a cleaner list of only your own DAGs from the "dags" folder.
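For reference, the two relevant settings live in airflow.cfg under [core]; a minimal sketch, assuming the default AIRFLOW_HOME of ~/airflow:

```ini
[core]
# Airflow scans this folder (not the installed package) for your DAGs
dags_folder = ~/airflow/dags
# Hide the bundled example DAGs for a cleaner DAG list
load_examples = False
```

After editing, restart the scheduler and webserver so the new values are picked up.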
I am using Cloud Composer to schedule multiple DAGs. These DAGs are built dynamically using this method, and they use custom plugins.
I would like to know how to proceed when adding/modifying a plugin that concerns all the DAGs (let's say it adds a new task to each DAG).
Do we need to pause all the running DAGs when doing so?
What I have done so far when adding/modifying a plugin is:
Upload the plugins into the plugins bucket of the Composer cluster (using the gcloud composer command)
Do a dummy update of the Airflow config: add a dummy value to airflow.cfg (using gcloud composer commands)
I did that to force the DAGs to pause; once the update is finished, the DAGs are resumed, but with the new plugins and hence the new tasks (or, if not in this DAG run, then in the next one). Is it useless?
Thanks if you can help.
As explained in the architecture diagram, the Airflow webserver where you view your DAG and plugin code runs in a Google-managed tenant project, whereas the Airflow workers which actually run your DAG and plugin code are directly in your project.
When a DAG/plugin is placed in the Composer bucket, the Airflow webserver (which falls under the tenant project) validates the code and updates any new scheduling changes in the Airflow database.
At the same time, the Airflow scheduler (in your project) asks the Airflow database for the next DAG to run and notifies the Airflow workers to perform the scheduled work. The Airflow workers (in your project) then grab the DAG/Plugin code from the Composer bucket and compile them in order to run that specific task.
Thus, any updates made to your DAG/plugin code are read separately by the Airflow webserver and the Airflow workers, at different times.
If you do not see your new code in the Airflow webserver, it should still be picked up by the Workers when they grab the code fresh on the new task run.
Therefore you shouldn't have to restart Composer for the workers to pick up the changes.
You cannot force a worker to grab and re-compile the new code mid task execution.
There are two ways to refresh the Airflow Webserver to see the Plugin code changes if it is not updating:
Set the reload_on_plugin_change property to True in the [webserver] section via the ‘AIRFLOW CONFIGURATIONS OVERRIDE’ tab in the Console.
OR, you can add/remove/update a PyPI package via the ‘PYPI PACKAGES’ Console tab. Non-package changes will not trigger a webserver restart. Note that this will also initiate a full Composer environment restart, which may take ~20 minutes.
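For reference, the first option is equivalent to setting this key in airflow.cfg, which is what the Console's configuration override writes (available in recent Airflow versions):

```ini
[webserver]
# Restart the webserver workers when plugin files change
reload_on_plugin_change = True
```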
After deleting multiple DAG files from the dags folder, I need to run airflow delete_dag dag_id for each DAG, or delete the DAG entries one by one from the web UI.
Instead of doing this, is there a feature or command to purge all the deleted DAGs in one go?
You can always execute queries against your Airflow database directly; we have an Airflow process which cleans up our database periodically by removing entries from log, task_fail, etc.
As with anything regarding making changes to your database, make sure you have a backup.
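The kind of cleanup involved can be sketched against a toy in-memory database: the table names log and task_fail match Airflow's metadata schema, but the columns here are simplified stand-ins, not the real schema.

```python
import sqlite3
from datetime import datetime, timedelta

# Toy stand-in for the Airflow metadata DB; the real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, dttm TEXT, event TEXT)")
conn.execute("CREATE TABLE task_fail (id INTEGER PRIMARY KEY, dag_id TEXT)")

# One old log row, one recent log row, and a failure row for a DAG we removed.
conn.execute("INSERT INTO log (dttm, event) VALUES (?, ?)",
             ((datetime.utcnow() - timedelta(days=90)).isoformat(), "cli_task_run"))
conn.execute("INSERT INTO log (dttm, event) VALUES (?, ?)",
             (datetime.utcnow().isoformat(), "cli_task_run"))
conn.execute("INSERT INTO task_fail (dag_id) VALUES ('deleted_dag')")

# Periodic cleanup: purge log entries older than 30 days and all rows
# belonging to a DAG whose file no longer exists.
cutoff = (datetime.utcnow() - timedelta(days=30)).isoformat()
conn.execute("DELETE FROM log WHERE dttm < ?", (cutoff,))
conn.execute("DELETE FROM task_fail WHERE dag_id = ?", ("deleted_dag",))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM log").fetchone()[0])        # 1
print(conn.execute("SELECT COUNT(*) FROM task_fail").fetchone()[0])  # 0
```

The same DELETE pattern applies to the real metadata database, just with your actual backend and a backup taken first.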
I recently switched my Airflow server from the SequentialExecutor to the LocalExecutor. I changed sql_alchemy_conn from SQLite to a SQL database, initialized the database, and switched the executor over. However, none of my DAGs are running. I need them to run in parallel, and they do not run at all. After changing the config files I restarted the scheduler, and it reports that it is running without issue when I check its status on Ubuntu.
However, the UI says that it does not appear to be running and that new tasks will not be scheduled.
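For comparison, the switch described above touches these airflow.cfg entries (the connection string below is a placeholder, not a recommendation):

```ini
[core]
executor = LocalExecutor
# LocalExecutor needs a server-backed database; SQLite only supports SequentialExecutor
sql_alchemy_conn = postgresql+psycopg2://user:pass@localhost:5432/airflow
```

If the UI still reports that the scheduler does not appear to be running, check that both the scheduler and the webserver were restarted after the change, so that they point at the same database.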