We're upgrading to Airflow 2, so I've changed the hooks import from:
from airflow.hooks.base_hook import BaseHook
to
from airflow.hooks.base import BaseHook
and now I'm getting this error:
{plugins_manager.py:225} ERROR - No module named 'airflow.hooks.base'
Here are the docs for this change, but I don't see any other required changes to get airflow.hooks.base to work: https://github.com/apache/airflow/blob/a17db7883044889b2b2001cefc41a8960359a23f/UPDATING.md#changes-to-import-paths
Make sure you are running on Airflow 2.0.
You can check which version you are running with the version command.
airflow version
Using Airflow 2.5.1, reinstalled the AWS provider in a Python virtualenv, but this problem persists:
from airflow.providers.amazon.aws.hooks.redshift import RedshiftHook
ModuleNotFoundError: No module named 'airflow.providers.amazon.aws.hooks.redshift'
while using
from airflow.providers.amazon.aws.hooks.redshift import RedshiftHook
First of all, list the providers to see if the AWS provider is installed:
airflow providers list
After that, list the hooks to see which hooks your provider supports:
airflow providers hooks
It turns out that the actual hook is
airflow.providers.amazon.aws.hooks.redshift_sql.RedshiftSQLHook
and then you call it with:
hook = RedshiftSQLHook(redshift_conn_id='redshift_ddq')
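Putting it together, a minimal sketch (the connection ID comes from the answer above; the SQL is just a placeholder, and get_records is the generic DbApiHook method):

from airflow.providers.amazon.aws.hooks.redshift_sql import RedshiftSQLHook

hook = RedshiftSQLHook(redshift_conn_id='redshift_ddq')
# Run a placeholder query; get_records is inherited from DbApiHook
rows = hook.get_records('SELECT 1')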
I can't seem to import the MySQL operator.
For example, trying to import it like this:
from airflow.operators.mysql_operator import MySqlOperator
I get this error:
"Cannot find reference 'MySqlOperator' in 'mysql_operator.py' "
Assuming you are on version 2.0.0 or greater, the import would be:
from airflow.providers.mysql.operators.mysql import MySqlOperator
Remember to install the MySQL providers package first:
pip install 'apache-airflow-providers-mysql'
There's a full example in the MySQL provider docs.
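For illustration, a minimal sketch of using it in a DAG; the DAG ID, task ID, connection ID, and SQL below are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.providers.mysql.operators.mysql import MySqlOperator

with DAG(dag_id='mysql_example', start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    create_table = MySqlOperator(
        task_id='create_table',
        mysql_conn_id='mysql_default',  # connection configured in the Airflow UI
        sql='CREATE TABLE IF NOT EXISTS example (id INT)',
    )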
After upgrading to Airflow 2, I got this error in some DAGs:
ModuleNotFoundError: No module named 'airflow.operators.sensors'
The new import that works:
from airflow.sensors.base import BaseSensorOperator
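If you are importing it to build a custom sensor, a minimal sketch looks like this (the class name and poke logic are made up for illustration):

import os

from airflow.sensors.base import BaseSensorOperator

class FileExistsSensor(BaseSensorOperator):
    """Example custom sensor: succeeds once the given path exists."""

    def __init__(self, filepath, **kwargs):
        super().__init__(**kwargs)
        self.filepath = filepath

    def poke(self, context):
        # Called repeatedly on the poke interval; return True to stop waiting
        return os.path.exists(self.filepath)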
The chosen answer doesn't work for newer versions of Airflow.
I resolved it by changing the import.
The old one:
from airflow.operators.sensors import BaseSensorOperator
The new one that works:
from airflow.sensors import BaseSensorOperator
I was trying to import ExternalTaskSensor, and my research led me to this post; it turned out to be this class.
The correct import for me was:
from airflow.sensors.external_task import ExternalTaskSensor
Just FYI in case anyone runs into this in the future.
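A minimal usage sketch for reference; the DAG ID, task IDs, and poke interval are placeholders:

from airflow.sensors.external_task import ExternalTaskSensor

wait_for_upstream = ExternalTaskSensor(
    task_id='wait_for_upstream',
    external_dag_id='upstream_dag',   # DAG to wait on
    external_task_id='final_task',    # task in that DAG; None waits for the whole DAG
    poke_interval=60,                 # seconds between checks
)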
For Airflow 2.1.1, I first installed the Amazon provider:
pip install apache-airflow-providers-amazon
and then imported S3KeySensor:
from airflow.providers.amazon.aws.sensors.s3_key import S3KeySensor
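A minimal usage sketch; the bucket, key, and connection ID are placeholders:

from airflow.providers.amazon.aws.sensors.s3_key import S3KeySensor

wait_for_file = S3KeySensor(
    task_id='wait_for_file',
    bucket_name='my-bucket',
    bucket_key='incoming/data.csv',
    aws_conn_id='aws_default',
    poke_interval=60,
)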
I'm importing the operator in the following way:
from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator
Then trying to use it in a DAG:
download_file = GoogleCloudStorageDownloadOperator(
    bucket='us-central1-scale-training-d7d12089-bucket',
    google_cloud_storage_conn_id='google_cloud_default',
    object='params.json',
    filename='params.json')
Receiving this error:
'GoogleCloudStorageDownloadOperator' is not defined
Edit: I am using Google Cloud Composer, so I assume the relevant dependencies are installed.
If you haven't already, you also need to add the GCP dependency to Airflow:
pip install apache-airflow[gcp_api]
There's more information about installation in the docs: https://airflow.apache.org/installation.html
We are using Apache Airflow 1.9.0. I have written a Snowflake hook plugin. I have placed the hook in the $AIRFLOW_HOME/plugins directory.
$AIRFLOW_HOME
+--plugins
+--snowflake_hook2.py
snowflake_hook2.py
# This is the base class for a plugin
from airflow.plugins_manager import AirflowPlugin
# This is necessary to expose the plugin in the Web interface
from flask import Blueprint
from flask_admin import BaseView, expose
from flask_admin.base import MenuLink
# This is the base hook for connecting to a database
from airflow.hooks.dbapi_hook import DbApiHook
# This is the snowflake provided Connector
import snowflake.connector
# This is the default python logging package
import logging

class SnowflakeHook2(DbApiHook):
    """
    Airflow Hook to communicate with Snowflake
    This is implemented as a Plugin
    """
    def __init__(self, connname_in='snowflake_default', db_in='default', wh_in='default', schema_in='default'):
        logging.info('# Connecting to {0}'.format(connname_in))
        self.conn_name_attr = 'snowflake_conn_id'
        self.connname = connname_in
        self.superconn = super().get_connection(self.connname)  # gets the values from Airflow
        {SNIP - Connection stuff that works}
        self.cur = self.conn.cursor()

    def query(self, q, params=None):
        """From jmoney's db_wrapper allows return of a full list of rows(tuples)"""
        if params == None:  # no params, so no insertion
            self.cur.execute(q)
        else:  # make the parameter substitution
            self.cur.execute(q, params)
        self.results = self.cur.fetchall()
        self.rowcount = self.cur.rowcount
        self.columnnames = [colspec[0] for colspec in self.cur.description]
        return self.results

    {SNIP - Other class functions}

class SnowflakePluginClass(AirflowPlugin):
    name = "SnowflakePluginModule"
    hooks = [SnowflakeHook2]
    operators = []
So I went ahead and put some print statements in Airflow's plugins_manager to try to get a better handle on what is happening. After restarting the webserver and running airflow list_dags, these lines were showing the "new module name" (and no errors):
SnowflakePluginModule [<class '__home__ubuntu__airflow__plugins_snowflake_hook2.SnowflakeHook2'>]
hook_module - airflow.hooks.snowflakepluginmodule
INTEGRATING airflow.hooks.snowflakepluginmodule
snowflakepluginmodule <module 'airflow.hooks.snowflakepluginmodule'>
As this is consistent with what the documentation says, I should be fine using this in my DAG:
from airflow import DAG
from airflow.hooks.snowflakepluginmodule import SnowflakeHook2
from airflow.operators.python_operator import PythonOperator
But the web UI throws this error:
Broken DAG: [/home/ubuntu/airflow/dags/test_sf2.py] No module named 'airflow.hooks.snowflakepluginmodule'
So the question is: what am I doing wrong? Or have I uncovered a bug?
You need to import as below:
from airflow import DAG
from airflow.hooks import SnowflakeHook2
from airflow.operators.python_operator import PythonOperator
OR
from airflow import DAG
from airflow.hooks.SnowflakePluginModule import SnowflakeHook2
from airflow.operators.python_operator import PythonOperator
I don't think that Airflow automatically goes through the folders in your plugins directory and runs everything underneath it. The way that I've set it up successfully is to have an __init__.py under the plugins directory which contains each plugin class. Have a look at the Astronomer plugins on GitHub; they provide some really good examples of how to set up your plugins.
In particular, have a look at how they've set up the MySQL plugin:
https://github.com/airflow-plugins/mysql_plugin
Also, someone has incorporated a Snowflake hook into one of the later versions of Airflow, which you might want to leverage:
https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/snowflake_hook.py
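For illustration, here is a minimal sketch of the layout that answer describes, with the plugin class moved into plugins/__init__.py. The file and class names follow the question, and whether the plugins folder is directly importable this way is an assumption, so adjust the import to match your setup:

# $AIRFLOW_HOME/plugins/__init__.py
# Register the plugin at the package level so importing the plugins package exposes it
from airflow.plugins_manager import AirflowPlugin
# Assumes the plugins folder is on sys.path; otherwise switch to a package-relative import
from snowflake_hook2 import SnowflakeHook2

class SnowflakePluginClass(AirflowPlugin):
    name = "SnowflakePluginModule"
    hooks = [SnowflakeHook2]
    operators = []

With this layout, snowflake_hook2.py keeps only the hook class, and the AirflowPlugin subclass lives in __init__.py so it is registered when the package is imported.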