Using Airflow 2.5.1; I reinstalled the Amazon AWS provider in my Python virtualenv, but this problem persists:
ModuleNotFoundError: No module named 'airflow.providers.amazon.aws.hooks.redshift'
while using
from airflow.providers.amazon.aws.hooks.redshift import RedshiftHook
First of all, list the providers to see whether the AWS provider is installed:
airflow providers list
After that, list the hooks to see which hooks your providers support:
airflow providers hooks
It turns out that the actual hook is
airflow.providers.amazon.aws.hooks.redshift_sql.RedshiftSQLHook
and then you call it with:
hook = RedshiftSQLHook(redshift_conn_id='redshift_ddq')
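Putting it together, a minimal sketch of the corrected import and a simple query; the connection id is the one from the example above and the query is just a placeholder:
from airflow.providers.amazon.aws.hooks.redshift_sql import RedshiftSQLHook

# Uses the connection id shown above; the query is only a placeholder.
hook = RedshiftSQLHook(redshift_conn_id='redshift_ddq')
records = hook.get_records("SELECT current_date")
print(records)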
Related
In our project we are using DataprocClusterCreateOperator, which lived under contrib (from airflow.contrib.operators import dataproc_operator). It works fine with Airflow version 1.10.14.
We are in the process of upgrading to Airflow 2.1.2. While testing our DAGs that spin up a Dataproc cluster, we get this error: airflow.exceptions.AirflowException: Invalid arguments were passed to DataprocClusterCreateOperator (task_id: <task_id>). Invalid arguments were: **kwargs: {'config_bucket': None, 'autoscale_policy': None}
I cannot find any documentation for this operator in Airflow 2 that would let me identify the new parameters or what has changed.
Please share the relevant link.
We are using google-cloud-composer version 1.17.2 having Airflow version 2.1.2.
Since Airflow 2.0, third-party provider operators/hooks (like Google's in this case) have been moved out of the Airflow core into separate provider packages. You can read more here.
Since you are using Cloud Composer, the Google providers package is already installed.
Regarding the DataprocClusterCreateOperator, it has been renamed to DataprocCreateClusterOperator and moved to airflow.providers.google.cloud.operators.dataproc so you can import it with:
from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator
The accepted parameters differ from those in Airflow 1.x. You can find an example of usage here.
The supported parameters for the DataprocCreateClusterOperator in Airflow 2 can be found here, in the source code. The cluster configuration parameters that can be passed to the operator can be found here.
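For illustration, a minimal sketch of the new call signature; the project, region, and cluster names here are hypothetical, and the flat keyword arguments from Airflow 1.x are replaced by a cluster_config dict that follows the Dataproc ClusterConfig structure:
from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator

# Hypothetical project/region/cluster names; cluster_config replaces the
# flat keyword arguments that the Airflow 1.x operator accepted.
CLUSTER_CONFIG = {
    "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
    "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
}

create_cluster = DataprocCreateClusterOperator(
    task_id="create_cluster",
    project_id="my-gcp-project",
    region="us-central1",
    cluster_name="my-dataproc-cluster",
    cluster_config=CLUSTER_CONFIG,
)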
The DataprocClusterCreateOperator was renamed to DataprocCreateClusterOperator on January 13, 2020, as per this GitHub commit, and was ported from the airflow.contrib.operators import path to airflow.providers.google.cloud.operators.dataproc.
As given in @itroulli's answer, an example implementation of the operator can be found here.
Snowflake is not showing in the connections dropdown.
I am using MWAA 2.0 and the providers are already in requirements.txt.
MWAA uses Python 3.7; I don't know if that could be a factor.
Requirements.txt:
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.7.txt"
asn1crypto
azure-common
azure-core
azure-storage-blob
boto3
botocore
certifi
cffi
chardet
cryptography
greenlet
idna
isodate
jmespath
msrest
numpy
oauthlib
oscrypto
pandas
pyarrow
pycparser
pycryptodomex
PyJWT
pyOpenSSL
python-dateutil
pytz
requests
requests-oauthlib
s3transfer
six
urllib3
apache-airflow-providers-http
apache-airflow-providers-snowflake
#apache-airflow-providers-snowflake[slack]
#apache-airflow-providers-slack
snowflake-connector-python >=2.4.1
snowflake-sqlalchemy >=1.1.0
If anyone runs into this problem: instead of choosing Snowflake in the dropdown, you can choose Amazon Web Services as the connection type and it will work fine.
It took me a while to finally figure this one out after trying many different parameter combinations.
My full Snowflake URL is:
https://xx12345.us-east-2.aws.snowflakecomputing.com
The correct format for the Host field is:
xx12345.us-east-2.snowflakecomputing.com
For the Extra field, this is what worked for me:
{
"account": "xx12345.us-east-2.aws",
"warehouse": "my_warehouse_name",
"database": "my_database_name"
}
Make sure you put Amazon Web Services for the Conn Type, like @AXI said.
Also, I have these modules defined in my requirements.txt file:
apache-airflow-providers-snowflake==1.3.0
snowflake-connector-python==2.4.5
snowflake-sqlalchemy==1.2.4
My Airflow version is 2.0.2.
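Once the connection works, a minimal sketch of using it from a DAG task; the connection id and the query are hypothetical, and the conn id should point to the connection configured as described above:
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

# Hypothetical connection id and query.
check_snowflake = SnowflakeOperator(
    task_id='check_snowflake',
    snowflake_conn_id='snowflake_conn',
    sql='SELECT CURRENT_VERSION()',
)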
According to the MWAA docs, it should be enough to add apache-airflow-providers-snowflake==1.3.0 to the requirements file. When I added it to an existing MWAA env, where I had already tried many different combinations of packages, it helped only partially: it was possible to create a connection using the CLI, but not with the UI.
But when I created a new, clean MWAA env with the requirements file as stated in the mentioned AWS doc, it worked well. The connection was available in the UI.
We're upgrading to Airflow 2 so I've changed the hooks import from:
from airflow.hooks.base_hook import BaseHook
to
from airflow.hooks.base import BaseHook
and now I'm getting this error:
{plugins_manager.py:225} ERROR - No module named 'airflow.hooks.base'
Here are the docs for this change, but I don't see any other required changes to get airflow.hooks.base to work: https://github.com/apache/airflow/blob/a17db7883044889b2b2001cefc41a8960359a23f/UPDATING.md#changes-to-import-paths
Make sure you are running Airflow 2.0.
You can check which version you are running with the version command.
airflow version
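If the same code has to load on both versions during the migration, one common pattern is a guarded import; a minimal sketch:
# Fall back to the old import path so the code loads on both
# Airflow 1.10.x and Airflow 2.x during the migration.
try:
    from airflow.hooks.base import BaseHook       # Airflow 2.x
except ImportError:
    from airflow.hooks.base_hook import BaseHook  # Airflow 1.10.x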
I am trying to send notifications to Slack when a DAG run fails in Airflow on Google Cloud Composer. The Airflow version used is 1.9, so I cannot use Slack webhooks. But when I add my code, I get this strange error: No module named 'slackclient'
I am not sure how to make this work in Google Cloud Composer. I tried installing the Slack package by adding PyPI packages in Composer, but so far nothing has worked.
Can anybody help?
My code:
from slackclient import SlackClient
from airflow.operators.slack_operator import SlackAPIPostOperator

slack_channel = 'gsdgsdg'
slack_token = 'ssdfhfdrtxcuweiwvbnw54135f543589zdklchvfö'


def task_fail_slack_alert(context):
    slack_msg = """
        :red_circle: Task Failed.
        *Task*: {task}
        *Dag*: {dag}
        *Execution Time*: {exec_date}
        *Log Url*: {log_url}
        """.format(
        task=context.get('task_instance').task_id,
        dag=context.get('task_instance').dag_id,
        ti=context.get('task_instance'),
        exec_date=context.get('execution_date'),
        log_url=context.get('task_instance').log_url,
    )
    failed_alert = SlackAPIPostOperator(
        task_id='airflow_etl_failed',
        channel=slack_channel,
        token=slack_token,
        text=slack_msg,
    )
    return failed_alert.execute(context=context)
The SlackAPIPostOperator requires the slackclient library to be installed, which is usually done with:
$ pip install apache-airflow[slack]
BUT since you're running Google Cloud Composer, this won't work. Instead, you can install slackclient through the environment's PyPI package installer.
Navigate to the
Google Cloud Console -> Google Cloud Composer -> Your Airflow environment
and then choose the "PYPI PACKAGES" tab. There you can specify the slackclient module to be installed, and optionally pin it to a version (e.g. >=1.3.2).
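Once slackclient is installed, the callback from the question still has to be attached to the DAG; a minimal sketch, with a hypothetical DAG id, schedule, and task:
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Hypothetical DAG wiring; task_fail_slack_alert is the callback
# defined in the question above.
default_args = {
    'on_failure_callback': task_fail_slack_alert,
}

dag = DAG(
    dag_id='example_etl',
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval='@daily',
)

extract = DummyOperator(task_id='extract', dag=dag)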
Importing the operator in the following way:
from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator
Then trying to use it in a DAG:
download_file = GoogleCloudStorageDownloadOperator(
    bucket='us-central1-scale-training-d7d12089-bucket',
    google_cloud_storage_conn_id='google_cloud_default',
    object='params.json',
    filename='params.json',
)
Receiving this error:
'GoogleCloudStorageDownloadOperator' is not defined
Edit: I am using Google Cloud Composer, so I assume the relevant dependencies are installed.
If you haven't already, you also need to add the GCP dependency to Airflow:
pip install apache-airflow[gcp_api]
There's more information about installation in the docs: https://airflow.apache.org/installation.html
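With the dependency installed, a minimal end-to-end sketch; the DAG id and task_id are hypothetical, while the bucket, object, and connection id come from the question:
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator

# Hypothetical DAG id and task_id; bucket/object/filename are from the question.
dag = DAG(
    dag_id='gcs_download_example',
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
)

download_file = GoogleCloudStorageDownloadOperator(
    task_id='download_params',
    bucket='us-central1-scale-training-d7d12089-bucket',
    object='params.json',
    filename='params.json',
    google_cloud_storage_conn_id='google_cloud_default',
    dag=dag,
)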