Using fail_on_empty in Airflow V2

I am using Airflow V2 for my ETL, with the SnowflakeSqlSensor as part of it. In Airflow V1, I passed the parameter "fail_on_empty=True" to the SnowflakeSqlSensor. When I pass the same parameter in Airflow V2, I get an error saying "Invalid argument", even though the Airflow V2 documentation lists it as a valid argument. Are there any additional parameters that need to be used together with fail_on_empty in Airflow V2? Also, what is the default value of this parameter in Airflow V2?
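For reference, a minimal sketch of how fail_on_empty is usually passed to the base SqlSensor in Airflow 2, where it defaults to False; whether your SnowflakeSqlSensor accepts and forwards the argument depends on that class's own signature (the connection id and query below are assumptions):

from airflow.sensors.sql import SqlSensor

# Minimal sketch: poke Snowflake through a Snowflake connection until the query
# returns rows; fail_on_empty=True makes the sensor fail immediately when the
# query returns no rows instead of continuing to poke (the default is False).
wait_for_rows = SqlSensor(
    task_id="wait_for_rows",
    conn_id="snowflake_default",  # assumed connection id
    sql="SELECT 1 FROM my_table WHERE load_date = '{{ ds }}'",  # illustrative query
    fail_on_empty=True,
    poke_interval=60,
    timeout=60 * 60,
)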

Related

airflow API get all unpaused/running dags

How do I get a list of all unpaused (running) DAGs using the Airflow API?
I tried the GET /dags endpoint, but I did not find a query string to filter out paused DAGs. Isn't there something like an is_paused query parameter or body parameter?
P.S. I'm currently using Airflow version 2.2.3+.
Currently the Airflow API doesn't support this filter; you have to fetch all the DAGs and filter them locally.
If you really need this filter, you can create an Airflow plugin that exposes a simple API to fetch the unpaused DAGs and return them.
Update: this filter will be available in the Airflow API from 2.6.0 (PR)
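As an illustration of the fetch-and-filter-locally approach, here is a rough sketch against the stable REST API (GET /api/v1/dags returns an is_paused field per DAG); the base URL and credentials are placeholders for your deployment:

import requests

# Sketch: list unpaused DAGs by paging through /dags and filtering client-side.
BASE_URL = "http://localhost:8080/api/v1"  # placeholder
session = requests.Session()
session.auth = ("airflow", "airflow")  # placeholder credentials

dags, offset = [], 0
while True:
    resp = session.get(f"{BASE_URL}/dags", params={"limit": 100, "offset": offset})
    resp.raise_for_status()
    page = resp.json()["dags"]
    if not page:
        break
    dags.extend(page)
    offset += len(page)

unpaused = [d["dag_id"] for d in dags if not d["is_paused"]]
print(unpaused)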
Actually, there is a plugin made for this. You can fetch the DAGs along with their status. Please explore this plugin; maybe it is what you are looking for.
Airflow API Plugin
Dag Run Endpoints
Otherwise, you can write a custom Python script/API that fills the DagBag and then filters the list to get the DAGs with the status you want.
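For that custom-script route, a rough sketch that loads the DagBag and checks each DAG's paused flag through DagModel; it assumes it runs inside the Airflow environment with access to the metadata database:

from airflow.models import DagBag, DagModel

# Sketch: enumerate DAGs from the DagBag and keep only the unpaused ones.
dag_bag = DagBag()

unpaused_dag_ids = []
for dag_id in dag_bag.dags:
    dag_model = DagModel.get_dagmodel(dag_id)
    if dag_model is not None and not dag_model.is_paused:
        unpaused_dag_ids.append(dag_id)

print(unpaused_dag_ids)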

Airflow AWS ECS operator not fetching Cloudwatch logs when used with firelens

I am using the Airflow EcsOperator to run AWS ECS tasks. As part of this, I am using a custom fluentbit container that is set up to log the container logs to Cloudwatch and AWS Open Search. The logging to both destinations works fine. However, I noticed that the Cloudwatch log streams are generated in the format {awslogs_stream_prefix}-{ecs_task_id}. Braces are added just to show the two parts separately; the actual stream name is of the form "ecsprime-generator-container-firelens-977be157d3be4614a84537cd081152d7", where the string starting with 977 is the task id. Unfortunately, the Airflow code that reads Cloudwatch logs expects the stream name to be in the format {awslogs_stream_prefix}/{ecs_task_id}. Due to this, I am not able to have the Airflow EcsOperator display the corresponding Cloudwatch logs.
Are there any workarounds to address this?
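To make the mismatch concrete, a small illustration using the names from the question (purely descriptive, not a workaround):

# The stream firelens/fluentbit actually writes vs. the one Airflow's log reader looks up.
prefix = "ecsprime-generator-container-firelens"
ecs_task_id = "977be157d3be4614a84537cd081152d7"

firelens_stream = f"{prefix}-{ecs_task_id}"   # {awslogs_stream_prefix}-{ecs_task_id}
airflow_expected = f"{prefix}/{ecs_task_id}"  # {awslogs_stream_prefix}/{ecs_task_id}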

Is there a way to pass a parameter to an airflow dag when triggering it manually

I have an Airflow DAG that is triggered externally via the CLI.
I have a requirement to change the order of execution of tasks based on a Boolean parameter that I would get from the CLI.
How do I achieve this?
I understand dag_run.conf can only be used in a template field of an operator.
Thanks in advance.
You cannot change task dependencies with a runtime parameter.
However, you can pass a runtime parameter (via dag_run.conf) and, depending on its value, have tasks executed or skipped. For that you need to place operators in your workflow that can handle this logic, for example ShortCircuitOperator or BranchPythonOperator.
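A rough sketch of the branching approach, assuming a Boolean reverse_order key passed via --conf (the key and task names are hypothetical); both orderings are declared statically and the branch picks one at runtime:

from airflow.operators.python import BranchPythonOperator

# Sketch: pick one of two statically defined paths based on a boolean passed at
# trigger time, e.g. airflow dags trigger my_dag --conf '{"reverse_order": true}'.
def choose_branch(**context):
    reverse_order = context["dag_run"].conf.get("reverse_order", False)
    return "path_b_first" if reverse_order else "path_a_first"

branch = BranchPythonOperator(
    task_id="branch_on_conf",
    python_callable=choose_branch,
)
# branch >> [path_a_first, path_b_first]; each path then wires its own task order,
# and the unselected path's tasks are skipped.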

How to pass a conf to a scheduled DAG

When the DAG is triggered manually, there are multiple ways to pass the config: from the UI, via the Airflow CLI using the --conf argument, or via the REST API.
But when the DAG runs on a cron schedule, it always fails because the tasks in the DAG expect values from conf.
Is there a DAG-level configuration that can be used to set "default" values for conf keys (WITHOUT doing a null check in the Python code itself and hardcoding a default value)?
The reason I do not want to do this null check in the code is that I want the conf keys and default values to be exposed via an Airflow API if possible.
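One approach that may fit is declaring DAG-level params with defaults (a sketch, assuming Airflow 2.2+): scheduled runs use the defaults, a manual or API trigger with conf overrides matching keys when core.dag_run_conf_overrides_params is enabled, and the declared params are visible via the DAG details endpoint and the trigger UI. The key name below is hypothetical:

from airflow import DAG
from airflow.models.param import Param
from airflow.operators.bash import BashOperator
import pendulum

# Sketch: params supply defaults for scheduled runs; dag_run.conf overrides them
# on manual or API triggers (with core.dag_run_conf_overrides_params enabled).
with DAG(
    dag_id="example_params_defaults",
    schedule_interval="0 6 * * *",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    catchup=False,
    params={"target_table": Param("staging.events", type="string")},  # hypothetical key
) as dag:
    BashOperator(
        task_id="use_param",
        bash_command="echo {{ params.target_table }}",
    )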

Distributed logging in Apache Airflow

We are using Cloud Composer (managed Airflow in GCP) to orchestrate our tasks. We are moving all our logs to Sumo Logic (a standard process in our org). Our requirement is to track the entire log of a single execution of a DAG; as of now there seems to be no way to do this.
Currently, the first task in the DAG generates a unique ID and passes it to the other tasks via XCom. The problem is that we are not able to inject the unique ID into the logs of Airflow operators (like BigQueryOperator).
Is there any other way to inject a custom unique ID into Airflow operator logs?
Composer integrates with Stackdriver Logging, so you can filter per-DAG logs by "workflow:{your-dag-name}" and "execution-date:{your-dag-run-date}". For example, you could read log entries with the following filter:
resource.type="cloud_composer_environment"
resource.labels.location="your-location"
resource.labels.environment_name="your-environment-name"
logName="projects/cloud-airflow-dev/logs/airflow-worker"
labels."execution-date"="your-dag-run-date"
labels.workflow="your-dag-id"
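If you need to pull those entries programmatically (for example, to forward them to Sumo Logic), a sketch using the google-cloud-logging client; the filter mirrors the one above, and the project, location, and label values are placeholders:

from google.cloud import logging as gcp_logging

# Sketch: list airflow-worker entries for one DAG run using the filter above.
client = gcp_logging.Client(project="your-project-id")  # placeholder project

log_filter = '''
resource.type="cloud_composer_environment"
resource.labels.location="your-location"
resource.labels.environment_name="your-environment-name"
logName="projects/your-project-id/logs/airflow-worker"
labels."execution-date"="your-dag-run-date"
labels.workflow="your-dag-id"
'''

for entry in client.list_entries(filter_=log_filter):
    print(entry.timestamp, entry.payload)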