How to execute SQL in redshift with Airflow (version 1) - airflow

Is there a way to trigger a stored procedure in Redshift from Airflow?
Ideally I would not use a Python operator, but I haven't found a Redshift operator in Airflow versions < 2.

In Airflow < 2.1 there is no RedshiftSQLOperator. Because Redshift is compatible with PostgreSQL, you can just use PostgresOperator (on Airflow 1.10 without the backport providers, import it from airflow.operators.postgres_operator instead):
from airflow.providers.postgres.operators.postgres import PostgresOperator
PostgresOperator(
    sql='SELECT * FROM my_table',
    postgres_conn_id='redshift_default',
    task_id='task',
)
That said, since some users were struggling with this question as well (see the GitHub issue), more recent versions (Airflow >= 2.1) added RedshiftSQLOperator, which is available in Amazon provider version 2.4.0:
pip install 'apache-airflow-providers-amazon>=2.4.0'
Then you can use the operator as:
from airflow.providers.amazon.aws.operators.redshift import RedshiftSQLOperator
setup__task_create_table = RedshiftSQLOperator(
    task_id='task',
    redshift_conn_id="redshift_default",
    sql="SELECT * FROM my_table",
)
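Since the question is about stored procedures rather than a plain SELECT, here is a minimal sketch of how the operator arguments for a CALL statement could be assembled. The helper name build_call_task_kwargs and the procedure name my_schema.my_procedure are made up for illustration; the keys match the sql, parameters and postgres_conn_id arguments of PostgresOperator, and the snippet itself does not import Airflow.

```python
def build_call_task_kwargs(procedure, args=(), conn_id="redshift_default"):
    """Build keyword arguments for an operator that runs CALL procedure(...).

    Hypothetical helper: values are passed as bind parameters (%s
    placeholders) instead of being formatted into the SQL string.
    """
    placeholders = ", ".join(["%s"] * len(args))
    return {
        "task_id": "call_" + procedure.replace(".", "_"),
        "sql": f"CALL {procedure}({placeholders})",
        "parameters": tuple(args),
        "postgres_conn_id": conn_id,
    }


# my_schema.my_procedure is a placeholder name, not a real procedure.
kwargs = build_call_task_kwargs("my_schema.my_procedure", args=(2021, "EU"))
print(kwargs["sql"])
print(kwargs["parameters"])
```

With Airflow installed, the dictionary would be passed along as PostgresOperator(dag=dag, **kwargs).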

Related

Issue installing apache-airflow-backport-providers-google module on airflow instance of Google Composer

I need to execute Data Fusion pipelines from Composer, using these operators:
from airflow.providers.google.cloud.operators.datafusion import (
    CloudDataFusionCreateInstanceOperator,
    CloudDataFusionCreatePipelineOperator,
    CloudDataFusionDeleteInstanceOperator,
    CloudDataFusionDeletePipelineOperator,
    CloudDataFusionGetInstanceOperator,
    CloudDataFusionListPipelinesOperator,
    CloudDataFusionRestartInstanceOperator,
    CloudDataFusionStartPipelineOperator,
    CloudDataFusionStopPipelineOperator,
    CloudDataFusionUpdateInstanceOperator,
)
The issue I have is with the "apache-airflow-backport-providers-google" module. From the following link I learned that I need this module:
reference for installing the module in an Airflow instance (answered by @Gonzalo Pérez Fernández): https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/datafusion.html
When I tried to install the Python dependency on Composer as a PyPI package, I got this error:
UPDATE operation on this environment failed 7 minutes ago with the following error message:
Failed to install PyPI packages.
apache-airflow-providers-google 5.0.0 has requirement google-ads>=12.0.0, but you have google-ads 7.0.0. Check the Cloud Build log at https://console.cloud.google.com/cloud-build/builds/a2ecf37a-4c47-4770-9489-6fb65e87d82f?project=341768372632 for details. For detailed instructions see https://cloud.google.com/composer/docs/troubleshooting-package-installation
The log detail is:
apache-airflow-providers-google 5.0.0 has requirement google-ads>=12.0.0, but you have google-ads 7.0.0.
apache-airflow-backport-providers-google 2021.3.3 has requirement apache-airflow~=1.10, but you have apache-airflow 2.1.2+composer.
The command '/bin/sh -c bash installer.sh $COMPOSER_PYTHON_VERSION fail' returned a non-zero code: 1
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/docker" failed: step exited with non-zero status: 1
Is there any way to use the "apache-airflow-backport-providers-google" module without dependency issues on a Composer instance? Or what would be the best way to use the Data Fusion operators without having to change or otherwise manage package versions in Python?
Composer Image version used:
composer-1.17.0-airflow-2.1.2
Thanks.
There is no need to install apache-airflow-backport-providers-google in Airflow 2.0+. This package actually backports Airflow 2 operators into Airflow 1.10.*. In addition, in Composer version composer-1.17.0-airflow-2.1.2 the apache-airflow-providers-google==5.0.0 package is already installed according to the documentation. You should be able to import the Data Fusion operators with the code snippet you posted as is.
However, if this is not the case, you should probably handle the conflict shown in the logs when trying to reinstall apache-airflow-providers-google==5.0.0:
apache-airflow-providers-google 5.0.0 has requirement google-ads>=12.0.0, but you have google-ads 7.0.0.
You can add the requirement google-ads>=12.0.0 to your PyPI dependencies and see if it works.
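To see what the environment actually has before changing anything, something like the following sketch can be run in the same Python environment. It uses only the standard library (importlib.metadata, Python 3.8+). The helper names are made up, and the version comparison is deliberately naive: it only looks at the leading numeric components and ignores pre-release tags.

```python
import re
from importlib import metadata


def numeric_version(version):
    """Return up to three leading numeric components, e.g. '2.1.2+composer' -> (2, 1, 2)."""
    public_part = version.split("+")[0]  # drop a local suffix such as '+composer'
    return tuple(int(part) for part in re.findall(r"\d+", public_part)[:3])


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(dist_name, minimum):
    """True if the distribution is installed and at least the given version."""
    found = installed_version(dist_name)
    if found is None:
        return False
    return numeric_version(found) >= numeric_version(minimum)


# Package names and minimums are taken from the error log above.
for name, minimum in [
    ("apache-airflow-providers-google", "5.0.0"),
    ("google-ads", "12.0.0"),
]:
    print(name, installed_version(name), meets_minimum(name, minimum))
```

For real dependency resolution the packaging library handles version specifiers properly; the tuple comparison here is only meant as a quick check.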

How to Brew Install SQLite3 FTS5 extension?

Is there an easy way to enable the FTS5 extension for SQLite3 installed with Brew? Some older posts say there should be an install option --with-fts5, however:
$ brew reinstall sqlite3 --with-fts5
...
Error: invalid option: --with-fts5
The fts3_tokenizer is not enabled. I assume there must be an easy way to install/enable the extension with Brew without compiling from source outside of Brew.
$ sqlite3
SQLite version 3.35.5 2021-04-19 18:32:05
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .dbconfig
defensive off
dqs_ddl on
dqs_dml on
enable_fkey off
enable_qpsg off
enable_trigger on
enable_view on
fts3_tokenizer off
legacy_alter_table off
legacy_file_format off
load_extension on
no_ckpt_on_close off
reset_database off
trigger_eqp off
trusted_schema on
writable_schema off
$ brew info sqlite3
sqlite: stable 3.35.5 (bottled) [keg-only]
Command-line interface for SQLite
https://sqlite.org/
/usr/local/Cellar/sqlite/3.35.5 (11 files, 4.2MB)
Built from source on 2021-05-18 at 08:54:33
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/sqlite.rb
...
Try setting the CFLAGS environment variable:
CFLAGS="-DSQLITE_ENABLE_FTS5" brew reinstall sqlite
Edit:
Rebuilding to enable FTS5 is unnecessary. The sqlite 3.35.5 bottle already has the fts5 module enabled.
$ brew fetch sqlite
...
$ tar xzf ~/Library/Caches/Homebrew/downloads/61d40ad2021e894bcf4c7475eea2dbbfee14c4426b1bbb1816c4055ad1c70b50--sqlite--3.35.5.catalina.bottle.tar.gz -O sqlite/3.35.5/lib/libsqlite3.0.dylib \
| strings - | grep '^fts5: 20\|trigram'
trigram
fts5: 2021-04-19 18:32:05 1b256d97b553a9611efca188a3d995a2fff712759044ba480f9a0c9e98fae886
I wrote to you on the trac mailing list too, but will post here as well.
If you check https://sqlite.org/fts5.html there seems to be an option --enable-fts5, but it also seems to be disabled by default. That said, they point to an "amalgamation" (there is a link to it on the page) where you can use this option if you compile the "amalgamation".
Markus
FTS5 is enabled. I was mistaken about the source of the problem I'm experiencing.
>>> import sqlite3
>>> import pprint
>>> db = sqlite3.connect(':memory:')
>>> cursor = db.execute('PRAGMA COMPILE_OPTIONS')
>>> pprint.pprint(cursor.fetchall())
[(u'BUG_COMPATIBLE_20160819',),
(u'COMPILER=clang-12.0.5',),
(u'DEFAULT_CACHE_SIZE=2000',),
(u'DEFAULT_CKPTFULLFSYNC',),
(u'DEFAULT_JOURNAL_SIZE_LIMIT=32768',),
(u'DEFAULT_PAGE_SIZE=4096',),
(u'DEFAULT_SYNCHRONOUS=2',),
(u'DEFAULT_WAL_SYNCHRONOUS=1',),
(u'ENABLE_API_ARMOR',),
(u'ENABLE_COLUMN_METADATA',),
(u'ENABLE_DBSTAT_VTAB',),
(u'ENABLE_FTS3',),
(u'ENABLE_FTS3_PARENTHESIS',),
(u'ENABLE_FTS3_TOKENIZER',),
(u'ENABLE_FTS4',),
(u'ENABLE_FTS5',),
(u'ENABLE_JSON1',),
(u'ENABLE_LOCKING_STYLE=1',),
(u'ENABLE_PREUPDATE_HOOK',),
(u'ENABLE_RTREE',),
(u'ENABLE_SESSION',),
(u'ENABLE_SNAPSHOT',),
(u'ENABLE_SQLLOG',),
(u'ENABLE_STMT_SCANSTATUS',),
(u'ENABLE_UNKNOWN_SQL_FUNCTION',),
(u'ENABLE_UPDATE_DELETE_LIMIT',),
(u'HAS_CODEC_RESTRICTED',),
(u'HAVE_ISNAN',),
(u'MAX_LENGTH=2147483645',),
(u'MAX_MMAP_SIZE=1073741824',),
(u'MAX_VARIABLE_NUMBER=500000',),
(u'OMIT_AUTORESET',),
(u'OMIT_LOAD_EXTENSION',),
(u'STMTJRNL_SPILL=131072',),
(u'THREADSAFE=2',),
(u'USE_URI',)]
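Besides reading the compile options, a functional check is to simply try creating an FTS5 table. The sketch below does that with the standard sqlite3 module. Note that it tests whichever SQLite library the Python interpreter is linked against, which is not necessarily the Brew one. The function and table names are arbitrary.

```python
import sqlite3


def fts5_available():
    """Return True if the linked SQLite library can create an FTS5 table."""
    probe = sqlite3.connect(":memory:")
    try:
        probe.execute("CREATE VIRTUAL TABLE probe USING fts5(body)")
        return True
    except sqlite3.OperationalError:
        # Raised when the fts5 module is not compiled in.
        return False
    finally:
        probe.close()


print("SQLite library version:", sqlite3.sqlite_version)
available = fts5_available()
rows = []
if available:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
    conn.executemany(
        "INSERT INTO docs(body) VALUES (?)",
        [("full text search",), ("plain lookup",)],
    )
    # MATCH against the table name searches all indexed columns.
    rows = conn.execute("SELECT body FROM docs WHERE docs MATCH 'search'").fetchall()
    conn.close()
    print("FTS5 works, matching rows:", rows)
else:
    print("FTS5 is not compiled into this SQLite build")
```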

Issues installing airflow locally

I installed Airflow locally because I am testing the SFTP operator in Airflow (2.0.0). When I try running this code:
from airflow.providers.sftp.operators import sftp_operator
from airflow import DAG
import datetime
dag = DAG(
    'test_dag',
    start_date=datetime.datetime(2020, 1, 8, 0, 0, 0),
    schedule_interval='@daily',
)
get_operation = SFTPOperator(
    task_id="operation",
    ssh_conn_id="ssh_default",
    local_filepath="route_to_local_file",
    remote_filepath="remote_route_to_copy",
    operation="get",
    dag=dag,
)
get_operation
When I run this Python code I get this error:
Traceback (most recent call last):
File "test_dags.py", line 1, in <module>
from airflow.providers.sftp.operators import sftp_operator
ModuleNotFoundError: No module named 'airflow.providers.sftp'
Can anyone please tell me if I am missing anything in my installation?
Since you don't specify how you installed Airflow, I'm assuming you did something like pip install 'apache-airflow>=2.0.0'. If you look at the Python dependencies in that environment with pip freeze, you won't see apache-airflow-providers-sftp. As of version 2, Airflow extracts its functionality into provider packages, the vast majority of which need to be installed manually, e.g. pip install apache-airflow-providers-sftp. After that it should work. Supporting documentation: https://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html#apache-airflow-providers-sftp.
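A quick way to confirm whether the provider is importable, without running the DAG, is to ask the import system directly. This is a minimal sketch using only the standard library. The helper name module_available and the REQUIRED mapping are made up for illustration.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if the module can be located by the import system."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # find_spec imports parent packages; a missing parent raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


# Maps an importable module to the pip package that provides it.
REQUIRED = {"airflow.providers.sftp": "apache-airflow-providers-sftp"}

for module_name, pip_name in REQUIRED.items():
    if module_available(module_name):
        print(f"{module_name}: available")
    else:
        print(f"{module_name}: missing, try: pip install {pip_name}")
```

Once the provider is installed, the operator class itself is imported with from airflow.providers.sftp.operators.sftp import SFTPOperator.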

Airflow authentication setups fails with "AttributeError: can't set attribute"

The Airflow version 1.8 password authentication setup as described in the docs fails at the step
user.password = 'set_the_password'
with error
AttributeError: can't set attribute
It's better to simply use the new method of PasswordUser _set_password:
# Instead of user.password = 'password'
user._set_password = 'password'
This is due to an update of SqlAlchemy to a version >= 1.2 that introduced a backwards incompatible change.
You can fix this by explicitly installing a SqlAlchemy version <1.2.
pip install 'sqlalchemy<1.2'
Or in a requirements.txt:
sqlalchemy<1.2
Fixed with
pip install 'sqlalchemy<1.2'
I'm using apache-airflow 1.8.2
In case anyone's curious about what the incompatible change in SQLAlchemy 1.2 (mentioned in @DanT's answer) actually is, it is a change in how SQLAlchemy deals with hybrid properties. Beginning in 1.2, the setter method has to have the same name as the original hybrid property, which was not required before. The fix for Airflow is very simple. The code in contrib/auth/backends/password_auth.py should change from this:
@password.setter
def _set_password(self, plaintext):
    self._password = generate_password_hash(plaintext, 12)
    if PY3:
        self._password = str(self._password, 'utf-8')
to this:
@password.setter
def password(self, plaintext):
    self._password = generate_password_hash(plaintext, 12)
    if PY3:
        self._password = str(self._password, 'utf-8')
See https://bitbucket.org/zzzeek/sqlalchemy/issues/4332/hybrid_property-gives-attributeerror for more details.
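The naming rule is easier to see with Python's built-in property, which behaves the same way as far as I can tell. The sketch below is an analogy only: it does not use SQLAlchemy or Airflow, and the class names are made up. It shows why assigning to password fails when the setter is registered under another name, and why the _set_password workaround mentioned above happens to work.

```python
class BrokenUser:
    """Setter registered under a different name than the property."""

    def __init__(self):
        self._password = None

    @property
    def password(self):
        return self._password

    # This creates a second property object bound to the name
    # _set_password; the original password property stays read-only.
    @password.setter
    def _set_password(self, plaintext):
        self._password = plaintext


class FixedUser:
    """Setter reuses the property name, so assignment works."""

    def __init__(self):
        self._password = None

    @property
    def password(self):
        return self._password

    @password.setter
    def password(self, plaintext):
        self._password = plaintext


broken = BrokenUser()
try:
    broken.password = "secret"
    assignment_failed = False
except AttributeError as exc:
    # The exact message wording depends on the Python version.
    assignment_failed = True
    print("BrokenUser:", exc)

# The differently named property does carry the setter.
broken._set_password = "secret"
print("via _set_password:", broken.password)

fixed = FixedUser()
fixed.password = "secret"
print("FixedUser:", fixed.password)
```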

Sqlite Library for Python

Is there any link to download SQLite Library for Python 2.7?
I'm particularly interested in working with a SQLite library for Python. I tried googling it, but the links provided are not actually links to a SQLite library; most of them are for downloading SQLite itself. Any help would be appreciated.
The documentation says:
Python has a “batteries included” philosophy.
The sqlite3 module is built in:
$ python
>>> import sqlite3
>>> conn = sqlite3.connect('/tmp/test.db')
>>> c = conn.cursor()
>>> c.execute("CREATE TABLE IF NOT EXISTS t(x)")
>>> c.execute("INSERT INTO t VALUES(42)")
>>> c.execute("SELECT * FROM t")
>>> c.fetchall()
[(42,)]
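The same session written as a small script might look like the sketch below. It uses an in-memory database instead of /tmp/test.db so nothing is left on disk, passes values as bind parameters, and uses the connection as a context manager so the inserts are committed. The table and column names follow the session above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Used as a context manager, the connection commits on success
# and rolls back if an exception is raised inside the block.
with conn:
    conn.execute("CREATE TABLE IF NOT EXISTS t(x)")
    conn.executemany("INSERT INTO t VALUES (?)", [(42,), (7,)])

rows = conn.execute("SELECT x FROM t ORDER BY x").fetchall()
print(rows)

conn.close()
```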

Resources