Creating a google cloud trace for airflow composer cluster in kubernetes - airflow

I want to create a cloud trace for a composer cluster in gke(kubernetes)
Example: Want to see a trace for the airflow monitoring deployment running
Any help appreciated

Related

How can I add EFS to an Airflow deployment on Amazon-EKS?

Kubernetes and EKS newbie here.
I've set up an Elastic Kubernetes Service (EKS) cluster and added an Airflow deployment on top of it using the official HELM chart for Apache Airflow. I configured gitsync and can successfully run my DAGS. For some of the DAGs, I need to save the data to an Amazon EFS. I installed the Amazon EFS CSI driver on eks following the instruction on the amazon documentation.
Now, I can create a new pod with access to the NFS but the airflow deployment broke and stay in a state of Back-off restarting failed container. I also got the events with kubectl -n airflow get events --sort-by='{.lastTimestamp} and I get the following messages:
TYPE REASON OBJECT MESSAGE
Warning BackOff pod/airflow-scheduler-599fc856dc-c4pgz Back-off restarting failed container
Normal FailedBinding persistentvolumeclaim/redis-db-airflow-redis-0 no persistent volumes available for this claim and no storage class is set
Warning ProvisioningFailed persistentvolumeclaim/ebs-claim storageclass.storage.k8s.io "ebs-sc" not found
Normal FailedBinding persistentvolumeclaim/data-airflow-postgresql-0 no persistent volumes available for this claim and no storage class is set
I have tried this on EKS version 1.22.
I understand from this that airflow is expecting to get an EBS volume for its pods but the NFS driver changed the configuration of the pvs.
The pvs before I install the driver are this:
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-###### 100Gi RWO Delete Bound airflow/logs-airflow-worker-0 gp2 1d
pvc-###### 8Gi RWO Delete Bound airflow/data-airflow-postgresql-0 gp2 1d
pvc-###### 1Gi RWO Delete Bound airflow/redis-db-airflow-redis-0 gp2 1d
After I install the EFS CSI driver, I see the pvs have changed.
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
efs-pvc 5Gi RWX Retain Bound efs-storage-claim efs-sc 2d
I have tried deploying airflow before or after installing the EFS driver and in both cases I get the same error.
How can I get access to the NFS from within Airflow without breaking the Airflow deployment on EKS. Any help would be appreciated.
As stated in the error above no persistent volumes available for this claim and no storage class is set and storageclass.storage.k8s.io "ebs-sc" not found, you have to deploy a storage class called efs-sc using the EFS CSI driver as a provisioner.
Further documentation could be found here
An example of creating your missing storage class and persistent volume could be found here
These steps are also described in the AWS EKS user guide

Airflow stored in the cloud?

I would like to know if I can make the airflow UI accessible to all people who have a user, web page type. For this, I would have to connect it to a server, no? Which server do you recommend for this? I was looking around and some were using Amazon EC2.
If your goal is just making the airflow UI visible to public, there is a lot of solutions, where you can do it even in your local computer (of course it is not a good idea).
Before choosing the cloud provider and the service, you need to think about the requirements:
in your team, do you have the skills and the time to manage the server? if no you need a managed service like GCP cloud composer or AWS MWAA.
which executor yow want to use? KubernetesExecutor? CeleryExecutor on K8S? if yes you need a K8S service and not just a VM.
do you have a huge loading? do you need a HA mode? what about the scalability?
After defining the requirements, you can choose between the options:
Small server with LocalExecutor or CeleryExecutor on a VM -> AWS EC2 with a static IP and Route 53 for DNS name
A scalable server in HA mode on a K8S cluser -> AWS EKS or google GKE
A managed service and focusing only on the development part -> google cloud composer

Airflow doesn't "see" that the underlying Kubernetes pod completed

We are using a hosted Airflow 1.10.2 in Google Composer 1.7.5 to launch jobs via the KubernetesPodOperator (tasks that will be run in a Kubernetes pod inside a worker cluster)
There has been several occasions in which the Kubernetes pod itself successfully completes, but Airflow doesn't "see" that the pod has completed (it doesn't get the memo), so Airflow thinks the pod is still running and doesn't move onto the next task.
We are planning to move to Composer 2 with Airflow 2.1.4, which I'm fairly confident it manages pods and communication with Kubernetes better, but...
... is there a "quick" tweak we can do? Even a link on how to start investigating would be helpful.
Thank you in advance.

How to send Airflow Metrics to datadog

We have a requirement where we need to send airflow metrics to datadog. I tried to follow the steps mentioned here
https://docs.datadoghq.com/integrations/airflow/?tab=host
Likewise, I included statsD in airflow installation and updated the airflow configuration file (Steps 1 and 2)
After this point, I am not able to figure out how to send my metrics to datadog. Do I follow the Host configurations or containarized configurations? For the Host configurations, we have to update the datadog.yaml file which is not in our repo and for containerized version, they have specified how to do it for Kubernetics but we don't use Kubernetics.
We are using airflow by creating a docker build and running it over on Amazon ECS. We also have a datadog agent running parallely in the same task (not part of our repo). However I am not able to figure out what configurations I need to make in order to send the StatsD metrics to datadog. Please let me know if anyone has any answer.

Airflow Remote logging doesn't work for both scheduler & dag_process_manager logs

Hi based on airflow docs I am able to set up cloud/remote logging.
Remote logging is working for dag and task logs but it's not able to back up or remotely store following logs of.
scheduler
dag_processing_manager
I am using docker_hub airflow docker image.

Resources