Unable to access Airflow REST API

I have set up Airflow on my local machine. I am trying to access the following Airflow link:
http://localhost:8080/api/experimental/test/
I am getting the Airflow 404 page (lots of circles).
I have tried setting auth_backend to default, but no luck.
What changes do I need to make in airflow.cfg to be able to make REST API calls to Airflow for triggering DAGs?

The experimental API is disabled by default in Airflow 2. It was used in 1.10, but it has been deprecated and is disabled by default in Airflow 2. Instead you should use the fully-fledged stable REST API, which uses a completely different URL scheme:
https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html
In the Airflow UI you can even browse and try out the API (just look at the Airflow menus).
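As a minimal sketch of what a call to the stable API can look like, the snippet below triggers a DAG run with the requests library. It assumes you enable basic auth in airflow.cfg ([api] auth_backends = airflow.api.auth.backend.basic_auth in 2.3+, the singular auth_backend in earlier 2.x); the dag_id and credentials shown are placeholders:

import requests

AIRFLOW_URL = "http://localhost:8080/api/v1"

# POST /dags/{dag_id}/dagRuns creates (triggers) a new dag run
response = requests.post(
    f"{AIRFLOW_URL}/dags/example_dag/dagRuns",   # "example_dag" is a placeholder dag_id
    auth=("admin", "admin"),                     # placeholder credentials for basic auth
    json={"conf": {}},                           # optional run configuration
)
response.raise_for_status()
print(response.json())                           # details of the newly created dag run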

Related

airflow API get all unpaused/running dags

How do I get a list of all unpaused (running) DAGs using the Airflow API?
I tried the GET /dags endpoint, but I did not find a query string to filter out paused DAGs. Isn't there something like an is_paused query parameter, or perhaps a body parameter?
P.S. I'm currently using Airflow version 2.2.3+.
Currently the Airflow API doesn't support this filter; you should get all the DAGs and filter them locally.
If you really need this filter, you can create an Airflow plugin which exposes a simple API to fetch the unpaused DAGs and return them.
Update: this filter will be available in the Airflow API from 2.6.0 (PR)
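For versions without the server-side filter, a rough sketch of the "filter locally" approach over the stable REST API could look like this (URL, credentials and page size are placeholders, and the basic auth API backend is assumed to be enabled):

import requests

AIRFLOW_URL = "http://localhost:8080/api/v1"

resp = requests.get(
    f"{AIRFLOW_URL}/dags",
    auth=("admin", "admin"),      # placeholder credentials
    params={"limit": 100},        # page through with an offset if you have more DAGs
)
resp.raise_for_status()

# keep only DAGs that are not paused
unpaused = [d["dag_id"] for d in resp.json()["dags"] if not d["is_paused"]]
print(unpaused)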
Actually, there is a plugin made for this. You can fetch the DAGs along with their status. Please explore this plugin; maybe it is what you are looking for.
Airflow API Plugin
Dag Run Endpoints
Otherwise, you can write a custom Python script/API to fill the DagBag and then filter the list to get the DAGs with the status you want.
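A rough sketch in that spirit, for a script running where Airflow is installed and configured; it queries the DagModel table of the metadata database (where the paused flag actually lives) rather than the DagBag itself:

from airflow.models import DagModel
from airflow.utils.session import create_session

def list_unpaused_dags():
    # query the metadata DB for DAGs whose is_paused flag is False
    with create_session() as session:
        rows = (
            session.query(DagModel.dag_id)
            .filter(DagModel.is_paused.is_(False))
            .all()
        )
    return [dag_id for (dag_id,) in rows]

if __name__ == "__main__":
    print(list_unpaused_dags())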

Airflow ECS-Operator not fetching CloudWatch Logs

I'm using Airflow's EcsOperator, with ECS tasks writing to CloudWatch.
Sometimes the Airflow log fetcher collects the logs from CloudWatch and sometimes it does not.
On the CloudWatch console, I always see the logs.
On tasks that take a long time, I usually see the log, or at least part of it.
Has anyone had the same issue with EcsOperator?
First, EcsOperator is deprecated and removed in provider version 5.0.0.
You should switch to EcsRunTaskOperator.
EcsRunTaskOperator has an awslogs_fetch_interval parameter which controls the interval at which logs are fetched from ECS. The default is 30 seconds. If you wish for more frequent polls, set the parameter value accordingly.
You didn't mention which provider version you are on, but this part of the code was refactored in version 5.0.0 (PR), so upgrading the Amazon provider might also resolve your issue.
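A minimal sketch of what that could look like; the cluster, task definition and log group names are placeholders, and the awslogs_* values must match the log configuration in the task definition:

from datetime import timedelta
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

run_task = EcsRunTaskOperator(
    task_id="run_my_ecs_task",
    cluster="my-cluster",                         # placeholder cluster name
    task_definition="my-task-def",                # placeholder task definition
    launch_type="FARGATE",
    overrides={"containerOverrides": []},
    awslogs_group="/ecs/my-task",                 # must match the awslogs group in the task definition
    awslogs_stream_prefix="ecs/my-container",     # prefix/container name as configured
    awslogs_region="us-east-1",
    awslogs_fetch_interval=timedelta(seconds=5),  # poll CloudWatch more often than the 30 s default
)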

Airflow dag cannot find connection-id

I am managing a Google Cloud Composer environment which runs Airflow for a data engineering team. I have recently been asked to troubleshoot one of the DAGs they run, which is failing with this error: [12:41:18,119] {credentials_utils.py:23} WARNING - [redacted-name] connection ID not available, falling back to Google default credentials
The job is basically a data pipeline which reads from various sources and stores data into GBQ. The odd part is that they have an almost identical DAG running for a different project and it works perfectly.
I have recreated the .json credentials for the service account behind the connection, as well as the connection itself in Airflow. I have sanitized the code to check for hidden spaces and the like.
My knowledge of Airflow is limited and I have not been able to find any similar issue in my research. Has anyone encountered this before?
So the DE team came back to me saying it was actually a deployment issue: an internal module involved in service account authentication was being used inside another DAG running in the stage environment, which made it impossible to fetch the credentials from the connection ID.

Understanding `None` scheduling preset in Airflow

I am getting started with Apache Airflow and trying to set up an event-driven DAG in Airflow. My event is a file being landed in a Linux directory. This file can be landed multiple times throughout the day. I am using the FileSensor operator for file monitoring.
My requirement is that every time the file lands (with the same name) in the directory, the DAG should kick off.
I was reading the official scheduling documentation, and based on my understanding, with the option None I can make my DAG be triggered externally based on an event, and it can be triggered multiple times throughout the day based on that external event.
Is my understanding correct? The official documentation doesn't have detailed information on this.
https://airflow.apache.org/scheduler.html?highlight=scheduling
That is correct. Having the schedule_interval as None means Airflow will never automatically schedule a run of the DAG.
You can schedule dag_runs externally in a few different ways (see the sketch after this list):
through the Airflow CLI
using a Local client from within a Python script
through the Airflow REST API
manually via the trigger button in the Web UI
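As a minimal sketch under those assumptions, the DAG below has no schedule and simply waits for the file before processing it; the dag_id, connection and file path are placeholders:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.filesystem import FileSensor

with DAG(
    dag_id="process_landed_file",      # placeholder name
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,            # never scheduled automatically; runs only when triggered
    catchup=False,
) as dag:
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        fs_conn_id="fs_default",       # filesystem connection pointing at the landing directory
        filepath="incoming/data.csv",  # placeholder file path
        poke_interval=60,              # check for the file every 60 seconds
    )
    process = BashOperator(
        task_id="process_file",
        bash_command="echo processing the landed file",
    )
    wait_for_file >> process

Each external trigger, for example airflow dags trigger process_landed_file from the CLI or a POST to the dagRuns endpoint of the REST API, then creates a fresh dag_run that waits for the file again.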

Google Cloud Functions with Trace Agent connection

I need to connect monitoring and tracing tools for our application. Our main code is on Express 4 running on Google Cloud Functions. All requests come in through a front nginx proxy server that handles the domain and pretty route names. Unfortunately, the trace agent traces these requests coming through the nginx front proxy without any additional information, and this is not enough to collect useful information about the app. I found the Stackdriver custom API, which, as I understand it, might help collect appropriate data at runtime, but I don't understand how I can connect it to a Google Cloud Functions app. All other examples say that we must extend our startup script, but Google Cloud Functions is a fully automated environment, so there is no such possibility here.
Found a solution. I had included require("@google-cloud/trace-agent"), but not at the top of index.js. It should be required before all other modules. After that it started to work.
Placing require("#google-cloud/trace-agent") as the very first import didn't work for me. I still kept getting:
ERROR:#google-cloud/trace-agent: express tracing might not work as /var/tmp/worker/node_modules/express/index.js was loaded before the trace agent was initialized.
However I managed to work around it by manually patching express:
var traceApi = require('#google-cloud/trace-agent').get();
require("#google-cloud/trace-agent/src/plugins/plugin-express")[0].patch(
require(Object.keys(require('module')._cache).find( _ => _.indexOf("express") !== -1)),
traceApi
);
