composer doesn't use imported variable - airflow

I imported a JSON file that defines variables to be used by Composer.
I used the gcloud beta composer environments storage data import command, and I can see that the file is imported correctly to <composer_bkt>/data/variables. However, when I access the Airflow web UI, I find that there are no variables declared!

Moving the file to <COMPOSER_BCKT>/data/variables is not enough by itself to import the variables into Airflow. On top of that you need to run the Airflow CLI command:
airflow variables --i <JSON_FILE>
To do that in Composer you have to run the following command as described here:
gcloud composer environments run <ENVIRONMENT_NAME> --location=<LOCATION> variables -- --i /home/airflow/gcs/data/variables/variables.json

Thank you for the answer @itroulli. It appears that my Composer version (v2.2.5) failed on that command, but a command of this form worked instead:
gcloud composer environments run <ENVIRONMENT_NAME> --location=<LOCATION> variables -- import /home/airflow/gcs/data/variables/variables.json
I'll leave this here for anyone else who comes across this problem.
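For reference, Airflow expects the variables file to be a single JSON object mapping variable names to values; the names below are purely illustrative:
{
  "gcs_bucket": "my-staging-bucket",
  "env": "dev"
}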

Related

Airflow DAG using a custom operator not working when python code for operator is placed in separate file

I created a custom Airflow operator, basically modifying some code related to run_id for TriggerDagRunOperator, and named it CustomTriggerDagRunOperator.
This new operator works fine: when I place the operator class inside my DAG's code, the DAG runs and the modifications behave as expected.
But then I created a separate Python file for this operator, my_custom_operator.py, placed it in the same folder as the DAG, and added an import statement to the DAG: from my_custom_operator import CustomTriggerDagRunOperator. The Airflow UI doesn't give any DAG error, but when I try to run the DAG it doesn't work and doesn't display any logs, and even the tasks not related to this operator fail to execute. It is confusing, as I only shifted the operator code to a different file so that the custom operator could be used across all my DAGs. Need some suggestions.
Airflow Version: 2.1.3
Using Astronomer, hosted on Kubernetes
In order to import classes/methods from your module, you need to add the module's package to the Python path; this way the DagFileProcessor will be able to import the classes/methods when it processes the DAG script.
DAGS_FOLDER/
    dag.py
    my_operators/
        operator1.py
In your scheduler and all the workers, you need to add the DAGs folder to the Python path, PYTHONPATH=$PYTHONPATH:/path/to/DAGS_FOLDER, and in your DAG script you need to import from the my_operators package and not from .:
from my_operators.operator1 import CustomTriggerDagRunOperator
For your development, you can mark the DAGS_FOLDER as a source folder for your project, which is similar to adding it to the Python path.
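To make that concrete, here is a minimal sketch of the two files under that layout. The DAG name, schedule and operator body are illustrative assumptions, not taken from the question, and my_operators/ also needs an empty __init__.py if you rely on regular packages:
# DAGS_FOLDER/my_operators/operator1.py
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

class CustomTriggerDagRunOperator(TriggerDagRunOperator):
    # the custom run_id handling would live here; omitted in this sketch
    pass

# DAGS_FOLDER/dag.py
from datetime import datetime
from airflow import DAG
from my_operators.operator1 import CustomTriggerDagRunOperator  # resolves once DAGS_FOLDER is on PYTHONPATH

with DAG("example_trigger_dag", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    trigger = CustomTriggerDagRunOperator(task_id="trigger_target", trigger_dag_id="target_dag")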

The airflow scheduler stops working after updating pypi packages on google cloud composer 2.0.1

I am trying to migrate from Google Cloud Composer composer-1.16.4-airflow-1.10.15 to composer-2.0.1-airflow-2.1.4. However, we are running into difficulties with the libraries: each time I upload the libs, the scheduler stops working.
Here is my requirements.txt:
flashtext
ftfy
fsspec==2021.11.1
fuzzywuzzy
gcsfs==2021.11.1
gitpython
google-api-core
google-api-python-client
google-cloud
google-cloud-bigquery-storage==1.1.0
google-cloud-storage
grpcio
sklearn
slackclient
tqdm
salesforce-api
pyjwt
google-cloud-secret-manager==1.0.0
pymysql
gspread
fasttext
spacy
click==7.1.2
papermill==2.1.1
tornado>=6.1
jupyter
Here is the command I use to update the libs:
gcloud composer environments update $AIRFLOW_ENV \
--update-pypi-packages-from-file requirements.txt \
--location $AIRFLOW_LOCATION
The update succeeds, but then the DAG tasks are no longer scheduled and the scheduler heartbeat turns red.
I have tried removing all the libs, and scheduling resumes some time afterwards. I have also tried adding only simple libraries (pandas or flashtext) via the interface, but right after the update the scheduler turns red again and the tasks stay unscheduled.
I can't find any error log in the logging interface. Do you have an idea of how I could see some logs about these errors, or do you know why these libs are making my environment fail?
Thanks
We have found out what was happening. The root cause was the performance of the workers: to work properly, Composer expects the scanning of the DAGs to take less than 15% of the CPU resources. If it exceeds this limit, it fails to schedule or update the DAGs. We simply moved to bigger workers and it has been working well since.
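For reference, in Composer 2 the scheduler and worker sizes can also be increased from the CLI; a rough sketch, assuming a Composer 2 environment (the exact flag names and accepted values should be checked against gcloud composer environments update --help for your gcloud version):
gcloud composer environments update $AIRFLOW_ENV \
  --location $AIRFLOW_LOCATION \
  --scheduler-cpu 2 --scheduler-memory 7.5 \
  --worker-cpu 2 --worker-memory 7.5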

Lambda function failing with /lib64/libc.so.6: version `GLIBC_2.18' not found

I am trying to create a layer for simple-salesforce (a Python library) in AWS Lambda and to use it (import it) from my Python code. I am on a Windows machine.
Since I read that there might be issues due to compilation on Windows, I installed Ubuntu 18.04 from the Windows Store and then went ahead with creating the zip for the Lambda layer (the zip is created for a python folder with the structure "python/lib/python3.6/site-packages/......").
I am using Python 3.6. I went through a few articles on this issue but couldn't find any resolution. This video helped me create a layer for pandas & requests in AWS successfully, with minor tweaks to the pip command I used:
sudo python3 -m pip install simple-salesforce -t build/python/lib/python3.6/site-packages
I used exactly the same process for simple-salesforce, and the error I am getting is below:
Unable to import module 'lambda_function': /lib64/libc.so.6: version `GLIBC_2.18' not found (required by /opt/python/lib/python3.6/site-packages/cryptography/hazmat/bindings/_rust.abi3.so)
Edit:
Another approach I tried was using .whl files. That did not give the above error, but instead "requests module not found"; and when I add a requests module layer it then complains that authlib is not found. (The requests layer works fine if I comment out the Salesforce-related code. Even when uploading it as a simple layer I got the same authlib issue.)
Edit:
The Lambda code I am using is below. It is basic code that doesn't contain any logic, just the imports:
import json
import pandas as pd
import requests as req
from simple_salesforce.format import format_soql

def lambda_handler(event, context):
    # TODO: actual logic; the imports alone are enough to reproduce the GLIBC error
    return {"statusCode": 200}
I also received the same error while installing pysftp on Lambda, which uses the cryptography library (Python); the error was similar:
(required by /opt/python/lib/python3.6/site-packages/cryptography/hazmat/bindings/_rust.abi3.so)
The solution that worked for me is:
1. pip uninstall cryptography
2. pip install cryptography==3.4.8
The following GitHub issue explains it in detail:
https://github.com/pyca/cryptography/issues/6390
AWS Lambda functions run on Amazon Linux, so they do not come with the same system-level shared libraries (.so files) as your build machine. When installing the Python packages you have to make sure the compiled binaries are built for that platform, which can be achieved by passing the --platform argument to pip install.
From AWS post How do I add Python packages with compiled binaries to my deployment package and make the package compatible with Lambda?:
To create a Lambda deployment package or layer that's compatible with Lambda Python runtimes when using pip outside of Linux operating system, run the pip install command with manylinux2014 as the value for the --platform parameter.
pip install \
    --platform manylinux2014_x86_64 \
    --target=my-lambda-function \
    --implementation cp \
    --python-version 3.9 \
    --only-binary=:all: --upgrade \
    simple-salesforce
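For a layer (rather than a function package), the same idea can target the directory layout used in the question and then be zipped; a sketch, assuming Python 3.6 and an x86_64 Lambda, with package versions left to whatever pip resolves:
pip install \
    --platform manylinux2014_x86_64 \
    --target=python/lib/python3.6/site-packages \
    --implementation cp \
    --python-version 3.6 \
    --only-binary=:all: --upgrade \
    simple-salesforce
zip -r simple_salesforce_layer.zip python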
I changed my code to not use the simple_salesforce library and implemented all the logic with requests (using the Salesforce REST APIs).
This is not really ideal, but I could get it working, and I had some deliveries to meet.

Cloud Composer: How to Run A Single Task in a DAG Run via gcloud CLI

I am trying to run a single task within a DAG on a GCP Cloud Composer Airflow instance and mark all other tasks in the DAG, both upstream and downstream, as successful. However, the following Airflow command does not seem to be working for me on Cloud Composer.
Does anyone know what is wrong with the following gcloud CLI command?
dag_id: "airflow_monitoring" <br>
task_id: "echo1" <br>
execution_date: "2020-07-03" <br>
gcloud composer environments run my-composer --location us-centra1 \
-- "airflow_monitoring" "echo1" "2020-07-03"
Thanks for your help.
If your aim is just to compose the above gcloud command correctly so that it triggers the specific DAG, then after fixing some typos and passing the Airflow CLI sub-command parameters through, I got this to work:
gcloud composer environments run my-composer --location=us-central1 \
--project=<project-id> trigger_dag -- airflow_monitoring --run_id=echo1 --exec_date="2020-07-03"
I would also encourage you to check out the full Airflow CLI sub-command list.
In case you expected a different functional result, feel free to expand the initial question with more essential details.
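If the goal is literally to run that single task rather than trigger the whole DAG, the Airflow run sub-command can be passed through the same way; a hedged sketch, assuming an Airflow 1.10.x environment (Airflow 2 renames this to tasks run):
gcloud composer environments run my-composer --location=us-central1 \
  run -- airflow_monitoring echo1 2020-07-03
Note that this does not mark the remaining tasks as successful; that still has to be done separately, for example from the Airflow UI.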

You have two airflow.cfg files

I created a venv project and installed Airflow within this venv. I also set AIRFLOW_HOME (exported the environment variable) to a directory (airflow_home) within this venv project. The first time, after I ran
$ airflow version
it created airflow.cfg and a logs directory under this airflow_home folder. However, when I repeated the same thing the next day, I got the error message that I have two airflow.cfg files:
one airflow.cfg under my venv project
another one under /home/username/airflow/airflow.cfg
Why is that? I haven't installed Airflow anywhere outside this venv project.
Found the issue. If the environment variable AIRFLOW_HOME is not set, by default Airflow creates a new airflow.cfg under /home/username/airflow. To avoid this, AIRFLOW_HOME should be set before calling airflow, either each time a terminal starts or by adding it to the bash profile.
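For example, adding a line like the following to ~/.bashrc (or to the venv's activate script) makes the setting persistent; the path is only an illustration:
export AIRFLOW_HOME=/path/to/venv_project/airflow_home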
