Custom logs folder path for each Airflow DAG

I know Airflow supports logging to S3/GCS/Azure etc.,
but is there a way to write logs to specific folders inside this storage based on some configuration inside the DAGs?

Airflow does not support this out of the box yet. There is a single centralised log folder configured in airflow.cfg (the base or remote base log folder), and all task logs are written under it irrespective of the DAG.
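One possible workaround (a sketch only, not an officially supported feature, and the class name is illustrative) is a custom task handler placed on the PYTHONPATH and wired in through a custom logging config: it can derive the remote folder from the task instance, e.g. from the dag_id or a DAG-level parameter. Note that the default log_filename_template already namespaces logs by dag_id underneath the single base folder, so adapt the path logic to whatever layout you actually need.

from airflow.providers.amazon.aws.log.s3_task_handler import S3TaskHandler  # Airflow 2 import path; 1.10 uses airflow.utils.log.s3_task_handler

class PerDagS3TaskHandler(S3TaskHandler):
    """Illustrative handler that groups each DAG's logs under its own top-level folder."""

    def _render_filename(self, ti, try_number):
        # _render_filename produces the log path relative to the configured base folder;
        # prefixing it here changes where the log file lands in S3.
        relative_path = super()._render_filename(ti, try_number)
        return f"{ti.dag_id}/{relative_path}"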

Related

AWS MWAA (Managed Apache Airflow) where to put the python code used in the dags?

Where do you put your actual code? The DAG files must stay thin; the assumption is that when a task starts to run, it does its imports and then runs some Python code.
When we were on standalone Airflow I could add my project root to PYTHONPATH and do the imports from there, but in the AWS managed Airflow I can't find any clues.
Put your DAGs into S3. When you initialise your MWAA environment, you specify the S3 bucket that contains your code.
E.g., create a bucket <my-dag-bucket> and place your DAGs in a subfolder dags
s3://<my-dag-bucket>/dags/
Also make sure to define all Python dependencies in a requirements file and put it in the same bucket as well:
s3://<my-dag-bucket>/requirements.txt
Finally, if you need to provide your own modules, zip them up and put the zip file in the bucket, too:
s3://<my-dag-bucket>/plugins.zip
See https://docs.aws.amazon.com/mwaa/latest/userguide/get-started.html
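For illustration, a thin DAG file as it could sit under s3://<my-dag-bucket>/dags/ might look like the sketch below (Airflow 2-style imports; my_project and run_daily_load are hypothetical names for code shipped inside plugins.zip):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical module packaged inside plugins.zip so MWAA workers can import it.
from my_project.etl import run_daily_load

with DAG(
    dag_id="daily_load",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load = PythonOperator(
        task_id="run_daily_load",
        python_callable=run_daily_load,
    )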

Where to put airflow_local_settings.py in Composer?

In Composer (Airflow 1.10.10), is it possible to create an airflow_local_settings.py file? And if so, where should it be stored? I need this because I need an initContainer for my pod.
Add an airflow_local_settings.py file to your $PYTHONPATH or to the $AIRFLOW_HOME/config folder.
For me, the above statement is unclear for Cloud Composer, as this config folder in the environment bucket would probably not be synced to the workers.
Based on discussions in the Apache Airflow community Slack, this is not supported yet.
airflow_local_settings.py can be placed in the dags folder of the Composer GCS bucket: in Airflow's initialization code, the DAGs folder is added to sys.path before the configuration folder ($AIRFLOW_HOME/config), so the file is picked up from there. Keep in mind that by doing so you are overriding Composer's default policies.
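Since the original ask was about an initContainer: a minimal sketch of what such an airflow_local_settings.py could contain, assuming an Airflow 2-style pod_mutation_hook that receives a Kubernetes V1Pod (1.10.x passes Airflow's own Pod object, so the attribute access differs); the container name and image are placeholders.

from kubernetes.client import models as k8s

def pod_mutation_hook(pod):
    # Append an initContainer to every pod Airflow launches; the pod is mutated in place.
    init_container = k8s.V1Container(
        name="my-init",                      # placeholder name
        image="busybox:1.35",                # placeholder image
        command=["sh", "-c", "echo init done"],
    )
    pod.spec.init_containers = (pod.spec.init_containers or []) + [init_container]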

Cloud composer import custom plugin to all existing dags

I am using Cloud Composer to schedule multiple DAGs. These DAGs are built dynamically using this method and they use custom plugins.
I would like to know how to proceed when adding/modifying a plugin that concerns all DAGs (let's say it adds a new task to each DAG).
Do we need to pause all the running DAGs when doing so?
What I have done so far when adding/modifying a plugin is:
Upload the plugins into the plugins bucket of the Composer cluster (using the gcloud composer command)
Do a dummy update of the Airflow config, i.e. add a dummy value to airflow.cfg (using gcloud composer commands)
I did that to force the DAGs to pause; once the update is finished, the DAGs resume with the new plugins and hence the new tasks (or, if it is not in this DAG run, then in the next one). Is this step useless?
Thanks if you can help.
As explained in the architecture diagram, the Airflow webserver where you view your DAG and plugin code runs in a Google-managed tenant project, whereas the Airflow workers which actually run your DAG and plugin code are directly in your project.
When DAG/plugin code is placed in the Composer bucket, the Airflow webserver (which runs in the tenant project) validates the code and writes any new scheduling changes to the Airflow database.
At the same time, the Airflow scheduler (in your project) asks the Airflow database for the next DAG to run and notifies the Airflow workers to perform the scheduled work. The Airflow workers (in your project) then grab the DAG/Plugin code from the Composer bucket and compile them in order to run that specific task.
Thus, any updates made to your DAG/plugin code are read separately by the Airflow webserver and the Airflow workers, at different times.
If you do not see your new code in the Airflow webserver, it should still be picked up by the Workers when they grab the code fresh on the new task run.
Therefore you shouldn't have to restart Composer for the workers to pick up the changes.
You cannot force a worker to grab and recompile the new code in the middle of a task execution.
There are two ways to refresh the Airflow Webserver to see the Plugin code changes if it is not updating:
Set the reload_on_plugin_change property to True in the [webserver] section via the ‘AIRFLOW CONFIGURATIONS OVERRIDE’ tab in the Console.
Or, you can add/remove/update a PyPI package via the ‘PYPI PACKAGES’ Console tab; non-PyPI changes will not trigger a web server restart. Note that this also initiates a full Composer environment restart, which may take ~20 minutes.
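For reference, the kind of file this propagation applies to: a minimal 1.10-style plugin as it would sit in the plugins/ folder of the Composer bucket (class and plugin names are illustrative; in Airflow 2 you would normally import operators directly instead of registering them on a plugin).

from airflow.models import BaseOperator
from airflow.plugins_manager import AirflowPlugin

class AuditOperator(BaseOperator):
    # Illustrative operator that each generated DAG could add as an extra task.
    def execute(self, context):
        self.log.info("Running audit step for DAG %s", context["dag"].dag_id)

class MyCompanyPlugin(AirflowPlugin):
    name = "my_company_plugin"
    operators = [AuditOperator]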

Disable new DAGs by default

Is there a setting in Cloud Composer / Airflow that can disable new DAGs in the DAGs folder by default, without the need to specify this in the DAG files themselves?
I want to be able to load these DAGs in to a development environment where users should just run these DAGs manually rather than them being scheduled.
I had a look here, https://github.com/apache/airflow/blob/master/airflow/config_templates/default_airflow.cfg
but I couldn't find anything obvious.
Yes, there is one: it's called dags_are_paused_at_creation.
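dags_are_paused_at_creation sits in the [core] section, so in Cloud Composer you can set it through an Airflow configuration override (core / dags_are_paused_at_creation = True). For completeness, a per-DAG counterpart exists as well, although the question explicitly asks to avoid touching the DAG files; a sketch:

from datetime import datetime

from airflow import DAG

dag = DAG(
    dag_id="example_manual_only",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    is_paused_upon_creation=True,  # the DAG shows up paused until enabled manually
)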

Very confused about Airflow Dags folder

I am very new to Airflow; I have set up everything according to what is stated on their website. However, I find it very confusing to figure out my DAG folder location. NOTE: I configured airflow.cfg to point at /airflow/dags, and within this folder there are two files:
/airflow/dags
---dag1.py
---dag2.py
But when I run airflow list_dags, it still shows the DAGs inside the example_dags folder at
usr/local/lib/python2.7/dist_packages/airflow/example_dags
How can I see which path is used when I run airflow list_dags, and how can I change it? Any help is appreciated.
There is an airflow.cfg value under the [core] section called load_examples that you can set to false to exclude the example DAGs. I think that should clean up the output you’re seeing from list_dags.
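If it helps, a quick sketch to confirm which dags_folder and load_examples values the installed Airflow actually resolves (useful when AIRFLOW_HOME or the airflow.cfg being read is not what you expect):

from airflow.configuration import conf

# Prints the effective values from whichever airflow.cfg Airflow is reading.
print(conf.get("core", "dags_folder"))
print(conf.get("core", "load_examples"))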
