Can I turn off the Airflow experimental API? - airflow

Is it possible to disable the airflow experimental API? I've searched through examples of airflow.cfg but didn't see anything there that would achieve it. Perhaps something can be done using auth_backend but I'm not sure what.

Since Airflow 1.10.11, the experimental API denies all requests by default (See PR).
If you are running an older version of Airflow, you can deny all requests explicitly by setting the following in the [api] section of airflow.cfg:
auth_backend = airflow.api.auth.backend.deny_all
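For reference, a minimal sketch of the relevant airflow.cfg section, together with the equivalent environment-variable override (following the standard AIRFLOW__SECTION__KEY convention):

[api]
auth_backend = airflow.api.auth.backend.deny_all

# Equivalent environment variable:
# AIRFLOW__API__AUTH_BACKEND=airflow.api.auth.backend.deny_all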

Related

Datadog cannot find check in "catalog" when implementing integration

I've been trying to implement a Datadog integration, more specifically Airflow's. I've been following this piece of documentation using the containerized approach (I've tried using pod annotations and adding the confd parameters to the agent's Helm values). I've only made progress when adding the airflow.yaml config to the confd section of the cluster agent. However, I get stuck when I try to validate the integration as specified in the documentation by running datadog-cluster-agent status. Under the "Running Checks" section, I can see the following:
airflow
-------
Core Check Loader:
Check airflow not found in Catalog
On top of being extremely generic, this error message mentions a "Catalog" that is not referenced anywhere else in the DD documentation. It doesn't tell me or give me any hints about what could possibly be wrong with the integration. Has anyone had the same problem and knows how to solve it, or at least how I can get more info/details/verbosity to debug this issue?
You may need to add cluster_check: true to your airflow.yaml confd configuration.
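For illustration, a rough sketch of what that cluster-agent confd entry might look like (the webserver URL here is a placeholder, not a value from the question):

cluster_check: true
init_config:
instances:
    # Base URL of the Airflow webserver the check should hit
  - url: "http://airflow-webserver.airflow.svc.cluster.local:8080"

The idea behind cluster_check: true is that the cluster agent then dispatches the check to a node agent (which bundles the Airflow integration) instead of trying to load it itself.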

How do I enable supplied backend authentication in the Airflow 2.1.0 version?

I have installed Airflow version 2.1.0 recently, but I'm unable to find an option to enable Airflow login authentication with a supplied backend like in older versions. Can anyone help me with the steps to achieve this?
Since Airflow 2.0, basic password authentication for the UI is the default behavior. You can consult the following documentation for details:
https://airflow.apache.org/docs/apache-airflow/stable/security/webserver.html
If you want to use different security backends, you will need to provide an additional configuration file in your $AIRFLOW_HOME dir, the webserver_config.py. Its configuration is loaded by Flask-AppBuilder, and it gives you the ability to define various security backends.
Some links:
Airflow documentation on Other Security Methods
Flask-AppBuilder documentation
example webserver_config.py file from Airflow's repository
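A minimal sketch of such a webserver_config.py, using the database-backed auth that Airflow ships with by default (LDAP, OAuth, etc. are configured the same way with a different AUTH_* constant):

from flask_appbuilder.security.manager import AUTH_DB
# Other options include AUTH_LDAP, AUTH_OAUTH, AUTH_OID, AUTH_REMOTE_USER

# Flask-AppBuilder reads these settings when the Airflow webserver starts.
WTF_CSRF_ENABLED = True

# Authenticate users against the Airflow metadata database.
AUTH_TYPE = AUTH_DB

# Optional: allow self-registration and choose the role new users get.
# AUTH_USER_REGISTRATION = True
# AUTH_USER_REGISTRATION_ROLE = "Viewer"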

Airflow DAG Versioning

Is DAG versioning a thing? I can't find much on the subject with a few Google searches. I would like to look at the DAGs screen in Airflow and be sure of what DAG code is in the wild.
The simplest solution would be to include a version number as part of the dag_id, but I would appreciate knowing if anyone has a better, alternative solution. Tags would work too and might look good in the UI - they are designed for filtering though, so I'm not sure if there would be undesirable side-effects.
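For concreteness, a minimal sketch of that dag_id-plus-tags convention (the DAG name and version string below are made up for illustration):

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator

DAG_VERSION = "v2"  # bump this whenever the DAG's shape changes

with DAG(
    dag_id=f"my_pipeline_{DAG_VERSION}",             # version baked into the dag_id
    tags=["my_pipeline", f"version:{DAG_VERSION}"],  # tags are filterable in the UI
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    DummyOperator(task_id="start")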
As the author of the DAG Versioning AIP, I can say that this work has been deferred post 2.0 mainly to support end-to-end DAG Versioning.
Originally, we (Airflow Core Committers) were planning to have Webserver-only DAG Versioning, i.e. to improve the visibility behaviour but not execution:
The scope of this AIP is to make sure that the visibility behavior of
Airflow is correct, without changing the execution behaviour, which
will continue to be based on the most recent version of the DAG.
This means it overcomes the issue so that you can go back to an old version of the DAG, view the shape of the DAG from a few months back, and see the correct representation instead of "always-latest".
Currently, Airflow suffers from the issue where if you add/remove a task, it gets added/removed in all the previous DagRuns in the Webserver.
However, what we have decided is that we will deliver the Remote DAG Fetcher + DAG Versioning together and enable versioning of DAGs on the worker side, so a user will be able to run a previous version of a DAG too.
Currently we don't have a date, but we are mostly planning to do it around the end of 2021.
The Airflow project has a draft feature open to support DAG versions. The answer currently is that Airflow does not support versions.
The first use case in the link describes a key limitation: log files from previous runs can only surface nodes from the current DAG.
As mentioned above, Airflow doesn't yet have its own functionality for versioning workflows. However, you can manage that on your own by keeping DAGs in their own git repository and fetching their state into the Airflow repository as submodules (a rough sketch follows below). More on that:
https://www.youtube.com/watch?v=a-4yRne3ba4&lc=UgwiIO-ECVFSZPz1hOt4AaABAg
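As a rough sketch of that setup (repository URLs, paths and the version tag here are placeholders):

# Inside the repository that Airflow deploys from, add the DAG repo as a submodule
git submodule add https://example.com/your-org/airflow-dags.git dags
git commit -m "Add dags submodule"

# Later, pin the deployed DAGs to a specific version
cd dags && git checkout v1.2.0 && cd ..
git add dags && git commit -m "Bump dags to v1.2.0"

Each commit of the parent repository then records exactly which DAG version was deployed.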

Do I need to be concerned with Airflow scheduler details for Google Cloud Composer?

In Airflow scheduler, there are things like heartbeat and max_threads.
See How to reduce airflow dag scheduling latency in production?.
If I am using Google Cloud Composer, do I have to worry/set these values?
If not, what are the values that Google Cloud Composer uses?
You can see the Airflow config in the Composer instance bucket, gs://composer_instance_bucket/airflow.cfg. You can tune this configuration as you wish, keeping in mind that Cloud Composer has some configuration options blocked.
Also, if you go to the Airflow UI -> Admin -> Configuration, you can see the full configuration.
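If you do need to change one of the tunable options, the override is applied at the environment level rather than by editing the file in the bucket directly; a rough sketch (environment name, location and values are placeholders):

gcloud composer environments update my-environment \
    --location us-central1 \
    --update-airflow-configs=scheduler-scheduler_heartbeat_sec=10,scheduler-max_threads=2

Keep in mind, as noted above, that Composer blocks overrides for some options.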
If you'd like more control/visibility over these variables, consider hosted Airflow from Astronomer (https://www.astronomer.io/cloud/), as it runs vanilla Airflow.

Airflow DataprocClusterCreateOperator

In Airflow DataprocClusterCreateOperator settings:
Is there a way to set the primary disk type for the master and workers to pd-ssd?
The default setting is standard.
I was looking into the documentation but couldn't find any parameter for it.
Unfortunately, there is no option to change the Disk Type in DataprocClusterCreateOperator.
In the Google API it is available if you pass the diskConfig parameter: https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.clusters#diskconfig
I will try to add this feature; it should be available in Airflow 1.10.1 or Airflow 2.0.
For now, you can create an Airflow plugin that modifies the current DataprocClusterCreateOperator.
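A rough sketch of that approach, subclassing the operator from a plugin (the _build_cluster_data internal and the exact cluster-spec keys are assumptions based on the contrib operator's source, so verify against your Airflow version):

from airflow.contrib.operators.dataproc_operator import DataprocClusterCreateOperator

class SsdDataprocClusterCreateOperator(DataprocClusterCreateOperator):
    """DataprocClusterCreateOperator that forces pd-ssd boot disks."""

    def _build_cluster_data(self):
        # Build the cluster spec as usual, then patch the disk config
        # for both master and worker nodes before it is sent to Dataproc.
        cluster_data = super(SsdDataprocClusterCreateOperator, self)._build_cluster_data()
        for node in ("masterConfig", "workerConfig"):
            cluster_data["config"][node].setdefault("diskConfig", {})["bootDiskType"] = "pd-ssd"
        return cluster_data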
There seem to be two fields in regard to this:
master_machine_type: Compute engine machine type to use for the master node
worker_machine_type: Compute engine machine type to use for the worker nodes
I found this by simply looking into the source code here (this is for the latest version; since no version was specified, I assumed the latest):
https://airflow.readthedocs.io/en/latest/_modules/airflow/contrib/operators/dataproc_operator.html
