Airflow plugins not getting picked up correctly

We are using Apache Airflow 1.9.0. I have written a Snowflake hook plugin. I have placed the hook in the $AIRFLOW_HOME/plugins directory.
$AIRFLOW_HOME
+--plugins
   +--snowflake_hook2.py
snowflake_hook2.py
# This is the base class for a plugin
from airflow.plugins_manager import AirflowPlugin
# This is necessary to expose the plugin in the Web interface
from flask import Blueprint
from flask_admin import BaseView, expose
from flask_admin.base import MenuLink
# This is the base hook for connecting to a database
from airflow.hooks.dbapi_hook import DbApiHook
# This is the Snowflake provided connector
import snowflake.connector
# This is the default python logging package
import logging

class SnowflakeHook2(DbApiHook):
    """
    Airflow Hook to communicate with Snowflake
    This is implemented as a Plugin
    """
    def __init__(self, connname_in='snowflake_default', db_in='default', wh_in='default', schema_in='default'):
        logging.info('# Connecting to {0}'.format(connname_in))
        self.conn_name_attr = 'snowflake_conn_id'
        self.connname = connname_in
        self.superconn = super().get_connection(self.connname)  # gets the values from Airflow
        {SNIP - Connection stuff that works}
        self.cur = self.conn.cursor()

    def query(self, q, params=None):
        """From jmoney's db_wrapper allows return of a full list of rows(tuples)"""
        if params is None:  # no params, so no parameter substitution
            self.cur.execute(q)
        else:  # make the parameter substitution
            self.cur.execute(q, params)
        self.results = self.cur.fetchall()
        self.rowcount = self.cur.rowcount
        self.columnnames = [colspec[0] for colspec in self.cur.description]
        return self.results

    {SNIP - Other class functions}

class SnowflakePluginClass(AirflowPlugin):
    name = "SnowflakePluginModule"
    hooks = [SnowflakeHook2]
    operators = []
So I went ahead and put some print statements in Airflow's plugins_manager to try and get a better handle on what is happening. After restarting the webserver and running airflow list_dags, these lines were showing the "new module name" (and no errors):
SnowflakePluginModule [<class '__home__ubuntu__airflow__plugins_snowflake_hook2.SnowflakeHook2'>]
hook_module - airflow.hooks.snowflakepluginmodule
INTEGRATING airflow.hooks.snowflakepluginmodule
snowflakepluginmodule <module 'airflow.hooks.snowflakepluginmodule'>
As this is consistent with what the documentation says, I should be fine using this in my DAG:
from airflow import DAG
from airflow.hooks.snowflakepluginmodule import SnowflakeHook2
from airflow.operators.python_operator import PythonOperator
But the web UI throws this error:
Broken DAG: [/home/ubuntu/airflow/dags/test_sf2.py] No module named 'airflow.hooks.snowflakepluginmodule'
So the question is: what am I doing wrong? Or have I uncovered a bug?

You need to import as below:
from airflow import DAG
from airflow.hooks import SnowflakeHook2
from airflow.operators.python_operator import PythonOperator
OR
from airflow import DAG
from airflow.hooks.SnowflakePluginModule import SnowflakeHook2
from airflow.operators.python_operator import PythonOperator

I don't think that Airflow automatically goes through the folders in your plugins directory and runs everything underneath it. The way that I've set it up successfully is to have an __init__.py under the plugins directory which contains each plugin class. Have a look at the Astronomer plugins on GitHub; they provide some really good examples of how to set up your plugins.
In particular, have a look at how they've set up the mysql plugin:
https://github.com/airflow-plugins/mysql_plugin
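For illustration, here is a minimal sketch of that layout, with the AirflowPlugin subclass moved into plugins/__init__.py and the hook class kept in plugins/snowflake_hook2.py. It assumes the plugins folder is importable, which may need adjusting for your setup:
# plugins/__init__.py (sketch only)
from airflow.plugins_manager import AirflowPlugin
# assumes the plugins folder is on sys.path; otherwise use a package-relative import
from snowflake_hook2 import SnowflakeHook2

class SnowflakePluginClass(AirflowPlugin):
    name = "SnowflakePluginModule"  # exposed as airflow.hooks.SnowflakePluginModule
    hooks = [SnowflakeHook2]
    operators = []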
Also, someone has incorporated a Snowflake hook in one of the later versions of Airflow, which you might want to leverage:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/snowflake_hook.py

Related

Airflow - Custom XCom backend on Ubuntu

I'm trying to implement a custom XCom backend.
These are the steps I did:
Created an "include" directory in the main Airflow dir (AIRFLOW_HOME).
Created this "custom_xcom_backend.py" file inside:
from typing import Any
from airflow.models.xcom import BaseXCom
import pandas as pd

class CustomXComBackend(BaseXCom):
    @staticmethod
    def serialize_value(value: Any):
        if isinstance(value, pd.DataFrame):
            value = value.to_json(orient='records')
        return BaseXCom.serialize_value(value)

    @staticmethod
    def deserialize_value(result) -> Any:
        result = BaseXCom.deserialize_value(result)
        result = pd.read_json(result)
        return result
Set in the config file:
xcom_backend = include.custom_xcom_backend.CustomXComBackend
When I restarted the webserver I got:
airflow.exceptions.AirflowConfigException: The object could not be loaded. Please check "xcom_backend" key in "core" section. Current value: "include.cust...
My guess is that it is not recognizing the "include" folder.
But how can I fix it?
*Note: There is no Docker. It is installed on an Ubuntu machine.
Thanks!
So I solved it:
Put custom_xcom_backend.py into the plugins directory
Set in the config file:
xcom_backend = custom_xcom_backend.CustomXComBackend
Restart all Airflow-related services
*Note: Do not store DataFrames that way (bad practice).
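For clarity, the error in the question points at the "core" section, so the setting sits under [core] in airflow.cfg, roughly like this:
[core]
xcom_backend = custom_xcom_backend.CustomXComBackend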
Sources I used:
https://www.youtube.com/watch?v=iI0ymwOij88

DagBag does not populate dags as expected

I want to test my DAGs to make sure they have certain default arguments and also to make sure that none of my DAGs have import errors.
I am using DagBag to populate the DAGs and then iterate through each DAG and check its values to make sure they are what I want them to be.
Because DagBag can also fetch the example DAGs that are shipped with Airflow, I am passing the argument include_examples=False; however, when I do this I realize that none of my DAGs is pulled into the DagBag.
Am I using DagBag wrongly? Or is there another, better way to pull and inspect DAGs when testing?
My code
def test_no_import_errors():
    dag_bag = DagBag(include_examples=False)
    assert len(dag_bag.import_errors) == 0, "No Import Failures"
I was able to reproduce the problem: when creating the DagBag object, if you don't provide a value for the dag_folder parameter, no DAG is added to the collection.
So as Jarek stated, this works:
def test_no_import_errors():
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert len(dag_bag.import_errors) == 0, "No Import Failures"
This is the example I made to test it:
# python -m unittest test_dag_validation.py
import unittest
import logging
from airflow.models import DagBag

class TestDAGValidation(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        log = logging.getLogger()
        handler = logging.FileHandler("dag_validation.log", mode="w")
        handler.setLevel(logging.INFO)
        log.addHandler(handler)
        cls.log = log

    def test_no_import_errors(self):
        dag_bag = DagBag(dag_folder="dags/", include_examples=False)
        self.log.info(f"How Many DAGs?: {dag_bag.size()}")
        self.log.info(f"Import errors: {len(dag_bag.import_errors)}")
        assert len(dag_bag.import_errors) == 0, "No Import Failures"
When you construct DagBag objects you can pass the folder where DagBag should look for the DAG files. I guess this is the problem.
By default, Airflow's DagBag looks for DAGs inside the AIRFLOW_HOME/dags folder.
This is usually stored inside the airflow.cfg file.
By default it points to the ~/airflow folder, but you can point it to your own folder by running:
export AIRFLOW_HOME=abs_path_of_your_folder
If you are installing Airflow in a Python virtual environment, make sure to export the AIRFLOW_HOME variable first, then activate the virtual environment and finally install Airflow. This will make sure your path is properly attached to the airflow.cfg file.
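As a rough sketch of that ordering (the folder path and virtualenv name are placeholders):
export AIRFLOW_HOME=/abs/path/to/your/airflow/folder  # set this first
python -m venv venv && source venv/bin/activate       # hypothetical virtualenv
pip install apache-airflow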
Also, you can check whether your folder loaded properly while running the unittest. In the terminal, the file path is printed like:
[2022-02-03 20:45:57,657] {dagbag.py:500} INFO - Filling up the DagBag from /Users/kehsihba19/Desktop/airflow-test/dags
An example file for checking import errors in DAGs, which includes checking for typos and cyclic tasks:
from airflow.models import DagBag
import unittest

class TestDags(unittest.TestCase):
    def test_DagBag(self):
        self.dag_bag = DagBag(include_examples=False)
        self.assertFalse(bool(self.dag_bag.import_errors))

if __name__ == "__main__":
    unittest.main()

Airflow 2 - ModuleNotFoundError: No module named 'airflow.operators.sensors'

After upgrading to Airflow 2, I got that error in some DAGs:
ModuleNotFoundError: No module named 'airflow.operators.sensors'
The new one that works:
from airflow.sensors.base import BaseSensorOperator
The chosen answer doesn't work for newer versions of Airflow. I resolved it by changing the import.
Old one:
from airflow.operators.sensors import BaseSensorOperator
The new one that works:
from airflow.sensors import BaseSensorOperator
I was trying to import ExternalTaskSensor and my research led me to this post; it turned out to be this class.
The correct import for me was:
from airflow.sensors.external_task import ExternalTaskSensor
Just FYI in case anyone runs into this in the future.
For Airflow 2.1.1 I first installed Amazon provider:
pip install apache-airflow-providers-amazon
and then imported S3KeySensor:
from airflow.providers.amazon.aws.sensors.s3_key import S3KeySensor
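A minimal usage sketch, assuming Airflow 2.x with the Amazon provider installed (the bucket, key and DAG ids below are made up):
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3_key import S3KeySensor

with DAG(dag_id='s3_sensor_example', start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    # waits until the given key appears in S3 (illustrative names)
    wait_for_file = S3KeySensor(
        task_id='wait_for_file',
        bucket_name='my-bucket',          # hypothetical bucket
        bucket_key='incoming/data.csv',   # hypothetical key
        aws_conn_id='aws_default',
    )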

not able to see the DAG in Web UI

I have created a new DAG using the following code. It calls a Python script.
Code:
from __future__ import print_function
from builtins import range
import airflow
from airflow.operators.python_operator import PythonOperator
from airflow.models import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

args = {
    'owner': 'admin'
}

dag = DAG(
    dag_id='workflow_file_upload', default_args=args,
    schedule_interval=None)

t1 = BashOperator(
    task_id='testairflow',
    bash_command='python /root/DataLake_Scripts/File_Upload_GCP.py',
    dag=dag)
I have placed it in the $AIRFLOW_HOME/dags folder.
After that I ran:
airflow scheduler
I am trying to see the DAG in the Web UI, however it is not visible there. There is no error.
I've run into the same issue.
I figured out that the problem is in the initial SQLite DB. I suppose it's some feature of Airflow 1.10.3.
Anyway, I solved the problem by using a PostgreSQL backend.
These links will help you:
link
link
link
All instructions are suitable for Python 3.
You'll see your DAG after executing the 'airflow webserver' and 'airflow scheduler' commands.
Also notice that you should call 'sudo service postgresql restart' right before the 'airflow initdb' command.
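For orientation, a rough sketch of what that switch can look like; the connection string is a made-up placeholder and the exact steps depend on your PostgreSQL setup:
# airflow.cfg, [core] section: point Airflow at PostgreSQL instead of SQLite
# (user, password and database name are placeholders)
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow
# then, as noted above:
sudo service postgresql restart
airflow initdb
airflow webserver
airflow scheduler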

Django Settings Module

I installed Django and was able to double-check that the module was in fact in Python, but when attempting to run basic commands such as runserver or to use manage.py, I get a DJANGO_SETTEINGS_MODULE error. I already used "set DJANGO_SETTINGS_MODULE = mysite.settings" as advised and inserted mysite.settings into the PATH for Python as some documentation online directed me to.
Now instead of undefined it says no such module exists. I can't find anything else in the documentation, and I used my test site name instead of "mysite" without any change. Does anyone know what I am missing? All I can find in the module library for Django in my Python is this code:
from __future__ import unicode_literals

from django.utils.version import get_version

VERSION = (1, 11, 5, 'final', 0)

__version__ = get_version(VERSION)

def setup(set_prefix=True):
    """
    Configure the settings (this happens as a side effect of accessing the first setting), configure logging and populate the app registry.
    Set the thread-local urlresolvers script prefix if `set_prefix` is True.
    """
    from django.apps import apps
    from django.conf import settings
    from django.urls import set_script_prefix
    from django.utils.encoding import force_text
    from django.utils.log import configure_logging

    configure_logging(settings.LOGGING_CONFIG, settings.LOGGING)
    if set_prefix:
        set_script_prefix(
            '/' if settings.FORCE_SCRIPT_NAME is None else force_text(settings.FORCE_SCRIPT_NAME)
        )
    apps.populate(settings.INSTALLED_APPS)
Are you sure you wrote the environment variable properly? I'm asking because I see you get the error DJANGO_SETTEINGS_MODULE ("settings" has a misspelling)...
