I want to decrypt the passwords for Airflow connections (getting the value from the connection table). Is there any way I can decrypt the password value?
You can do:
from airflow.hooks.base_hook import BaseHook
connection = BaseHook.get_connection("conn_name")
conn_password = connection.password
conn_login = connection.login
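For example, here is a minimal sketch (DAG id, task id, and connection id are placeholders) of reading the decrypted value inside a PythonOperator callable; Airflow's Fernet layer decrypts the password transparently when the attribute is accessed:
from datetime import datetime

from airflow import DAG
from airflow.hooks.base_hook import BaseHook
from airflow.operators.python_operator import PythonOperator

def print_conn_login():
    # "my_conn_id" is a placeholder connection id
    conn = BaseHook.get_connection("my_conn_id")
    # conn.password is already decrypted here; avoid printing it in real logs
    print(conn.login)

with DAG("show_connection_login", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    PythonOperator(task_id="print_login", python_callable=print_conn_login)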
Export your connections
airflow connections export connections.json
Install ejson to encrypt your file
brew tap shopify/shopify && brew install ejson, or download the .deb package from GitHub Releases.
Generate a keypair and add the resulting public key at the top of your file (ejson expects it in a _public_key entry):
ejson keygen -w
Encrypt your connections
ejson encrypt connections.json
Version the file in Git, then decrypt the connections and import them into the DB within your CI/CD pipeline (a rough sketch of that step follows below).
Credits to Marc Lamberti from Astronomer.
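Here is that sketch (not part of the original answer; it assumes the ejson private key is present on the CI runner and that the Airflow 2.x CLI, which provides airflow connections import, is available):
import subprocess

# Decrypt the ejson file; `ejson decrypt` prints the plaintext JSON to stdout.
decrypted = subprocess.run(
    ["ejson", "decrypt", "connections.json"],
    check=True, capture_output=True, text=True,
).stdout

with open("connections_decrypted.json", "w") as fh:
    fh.write(decrypted)

# Load the decrypted connections into the metadata DB.
subprocess.run(
    ["airflow", "connections", "import", "connections_decrypted.json"],
    check=True,
)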
I recently encountered a similar issue. In Airflow 2.3.2 you can now export connections in JSON or YAML format. This will provide all the key values that an Airflow connection is represented by.
Command to run:
airflow connections export connections.yml --file-format yaml
See the Airflow documentation for more details:
https://airflow.apache.org/docs/apache-airflow/2.0.2/howto/connection.html#exporting-connections-from-the-cli
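If you prefer to stay in Python, here is a minimal sketch (assuming Airflow 2.x and access to the metadata DB) that reads every connection straight from the Connection model; the password is decrypted transparently when the attribute is accessed:
from airflow.models import Connection
from airflow.utils.session import create_session

with create_session() as session:
    for conn in session.query(Connection).all():
        # get_uri() renders the full connection, including the decrypted password
        print(conn.conn_id, conn.get_uri())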
I have a keystore in Windows as below:
secretkeys.skr
publickeys.pkr
I want to add new PGP public keys to the above keystore. Can someone help me with the command?
I tried using some tools and gpg commands, but no luck, as the keys are not updating in the above files.
We use the below commands to list and encrypt:
pkzipc -listcertificates=AddressBook
pkzipc -add -archivetype=pgp -cryptalg=AES,128 -recipient="!encryptionKey!" "!encrptedFileDestination!\%%~nxA" "%%~fA"
Can someone help with any command or tool that lets me point the keyring at the above files and import the keys into that store?
Thanks,
Arpit
Snowflake is not showing in the connections dropdown.
I am using MWAA 2.0 and the providers are already in requirements.txt.
MWAA uses Python 3.7; I don't know if that could be a factor.
requirements.txt:
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.7.txt"
asn1crypto
azure-common
azure-core
azure-storage-blob
boto3
botocore
certifi
cffi
chardet
cryptography
greenlet
idna
isodate
jmespath
msrest
numpy
oauthlib
oscrypto
pandas
pyarrow
pycparser
pycryptodomex
PyJWT
pyOpenSSL
python-dateutil
pytz
requests
requests-oauthlib
s3transfer
six
urllib3
apache-airflow-providers-http
apache-airflow-providers-snowflake
#apache-airflow-providers-snowflake[slack]
#apache-airflow-providers-slack
snowflake-connector-python >=2.4.1
snowflake-sqlalchemy >=1.1.0
If anyone else runs into this problem: instead of choosing Snowflake in the dropdown, you can choose Amazon Web Services as the connection type and it will work fine.
It took me a while to finally figure this one out after trying many different parameter combinations.
My full Snowflake URL is:
https://xx12345.us-east-2.aws.snowflakecomputing.com
The correct format for the Host field is:
xx12345.us-east-2.snowflakecomputing.com
For the Extra field, this is what worked for me:
{
"account": "xx12345.us-east-2.aws",
"warehouse": "my_warehouse_name",
"database": "my_database_name"
}
Make sure you put Amazon Web Services for the Conn Type, like @AXI said.
Also, I have these modules defined in my requirements.txt file:
apache-airflow-providers-snowflake==1.3.0
snowflake-connector-python==2.4.5
snowflake-sqlalchemy==1.2.4
My Airflow version is 2.0.2.
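With the connection saved that way, here is a minimal sketch (connection id and query are placeholders; it assumes apache-airflow-providers-snowflake is installed) of exercising it from a task or script:
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook

# "snowflake_conn" is a placeholder; use the Conn Id you created in the UI
hook = SnowflakeHook(snowflake_conn_id="snowflake_conn")
rows = hook.get_records("SELECT CURRENT_VERSION()")
print(rows)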
According to the MWAA docs, it should be enough to add apache-airflow-providers-snowflake==1.3.0 to the requirements file. When I added it to an existing MWAA env, where I had already tried many different combinations of packages, it helped only partially: it was possible to create a connection using the CLI, but not with the UI.
But when I created a new, clean MWAA env with the requirements file as stated in the mentioned AWS doc, it worked well. The connection type was available in the UI.
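If the type is still missing from the UI dropdown on an existing environment, here is a rough sketch (all field values are placeholders) of creating the connection programmatically, which mirrors what the CLI does:
import json

from airflow.models import Connection
from airflow.utils.session import create_session

conn = Connection(
    conn_id="snowflake_conn",  # placeholder
    conn_type="snowflake",
    host="xx12345.us-east-2.snowflakecomputing.com",
    login="my_user",
    password="my_password",
    extra=json.dumps({
        "account": "xx12345.us-east-2.aws",
        "warehouse": "my_warehouse_name",
        "database": "my_database_name",
    }),
)

with create_session() as session:
    session.add(conn)  # committed when the session context exits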
I am trying to establish a SFTP connection in Airflow 1.10.14 with the SFTPOperator from airflow.providers.sftp.operators.sftp or airflow.contrib.operators.sftp_operator.
The contrib operator and the providers package are equivalent ("providers" packages are backported from Airflow 2.0, which no longer uses the contrib operators), and they depend on the same Python modules: paramiko, pysftp, and sshtunnel.
My pip freeze:
paramiko==2.7.2 (latest release)
pysftp==0.2.9 (latest release)
sshtunnel==0.1.5 (latest release is 0.4.0)
It works fine with a simple user/password and with a private key that has no passphrase, but it fails when I use an encrypted key protected by a passphrase:
"ERROR - private key file is encrypted" when I set "private_key_passphrase" param alone in the connection
"ERROR - Authentication failed" when I set the "password" alone or both "password" and "private_key_passphrase".
Note that it works well in all cases with the SSHOperator (in this case, the key passphrase is set in the "password" param), thus I believe the problem is in the pysftp module.
Thanks for your help.
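For reference, the SSHOperator workaround mentioned above looks roughly like this (DAG id, task id, connection id, and command are placeholders; the key passphrase is stored in the connection's password field):
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.ssh_operator import SSHOperator

with DAG("ssh_transfer", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    SSHOperator(
        task_id="copy_file",
        ssh_conn_id="my_ssh_conn",  # connection holds the key and its passphrase
        command="ls /remote/path",
    )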
I've built an OSS project, https://github.com/datlinq/scalafiniti
The Travis CI pipeline works perfectly, except for one final step.
I followed these guides:
http://www.scala-sbt.org/0.13/docs/Using-Sonatype.html
http://www.scala-sbt.org/sbt-pgp/usage.html
https://github.com/xerial/sbt-sonatype
Locally I got all steps working fine and actually published to Nexus.
In the .travis.yml I import the key before install (encrypted in the Travis CI env):
before_install:
- echo "$PGP_SECRET" | base64 --decode | gpg --import
- echo "$PGP_TRUST" | base64 --decode | gpg --import-ownertrust
$PGP_PASS is also encrypted in the Travis env and available to build.sbt.
I checked that it actually picks up the passphrase in this setting:
pgpPassphrase := sys.env.get("PGP_PASS").map(_.toArray)
Now if Travis runs the command
sbt publishSigned
it still prompts for a passphrase for my key:
You need a passphrase to unlock the secret key for user:
"com.datlinq.datalabs (Key for Datalabs OSS) "
2048-bit RSA key, ID 305DA15D, created 2017-09-01
Enter passphrase:
I don't know what I should do to make this work.
This moment in time is captured here:
code:
https://github.com/datlinq/scalafiniti/tree/0d8a6a92bf111bae2a1081b17005a649f8fd00c9
build log:
https://travis-ci.org/datlinq/scalafiniti/builds/271328874
So, the reason it prompted for a password and ignored all sbt-based configuration was that the build script used the local GnuPG installation instead of the one packaged with sbt-pgp (Bouncy Castle).
The local gpg wants you to enter the password manually the first time, which is a bit hard on Travis CI.
So the solution was to ignore the local gpg and use the bundled one, which uses the pgpPassphrase setting.
Looking back to the documentation:
http://www.scala-sbt.org/sbt-pgp/usage.html
In one of the first lines it actually says:
If you’re using the built-in Bouncy Castle PGP implementation, skip this step.
The first step towards using the GPG command line tool is to make sbt-pgp gpg-aware.
useGpg := true
So the solution was to set useGpg := false
For more details look at the current repo:
https://github.com/datlinq/scalafiniti
Or check this blog (which I found later) https://alexn.org/blog/2017/08/16/automatic-releases-sbt-travis.html
I am trying to configure Airbnb Airflow to use the CeleryExecutor like this:
I changed the executor in airflow.cfg from SequentialExecutor to CeleryExecutor:
# The executor class that airflow should use. Choices include
# SequentialExecutor, LocalExecutor, CeleryExecutor
executor = CeleryExecutor
But I get the following error:
airflow.configuration.AirflowConfigException: error: cannot use sqlite with the CeleryExecutor
Note that the sql_alchemy_conn is configured like this:
sql_alchemy_conn = sqlite:////root/airflow/airflow.db
I looked at Airflow's Git repository (https://github.com/airbnb/airflow/blob/master/airflow/configuration.py)
and found that the following code throws this exception:
def _validate(self):
    if (
            self.get("core", "executor") != 'SequentialExecutor' and
            "sqlite" in self.get('core', 'sql_alchemy_conn')):
        raise AirflowConfigException(
            "error: cannot use sqlite with the {}".format(
                self.get('core', 'executor')))
It seems from this validate method that the sql_alchemy_conn cannot contain sqlite.
Do you have any idea how to configure the CeleryExecutor without SQLite? Please note that I downloaded RabbitMQ for working with the CeleryExecutor as required.
Airflow states that the CeleryExecutor requires a backend other than the default SQLite database. You have to use MySQL or PostgreSQL, for example.
The sql_alchemy_conn in airflow.cfg must be changed to follow the SQLAlchemy connection string structure (see the SQLAlchemy documentation).
For example,
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@127.0.0.1:5432/airflow
To configure Airflow for MySQL:
First, install MySQL (this might help, or just Google it).
Go to the Airflow installation directory, usually /home/<user>/airflow.
Edit airflow.cfg.
Locate
sql_alchemy_conn = sqlite:////home/vipul/airflow/airflow.db
and add a # in front of it so it looks like
#sql_alchemy_conn = sqlite:////home/vipul/airflow/airflow.db
If you have the default SQLite, add this line below it (fill in your own user, password, and database):
sql_alchemy_conn = mysql://<user>:<password>@localhost:3306/<database>
Save the file, then run the command
airflow initdb
and done!
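If airflow initdb cannot connect, here is a quick sanity check (connection string values are placeholders; it assumes a MySQL driver such as mysqlclient is installed) that SQLAlchemy can reach the database:
from sqlalchemy import create_engine, text

engine = create_engine("mysql://airflow_user:airflow_pass@localhost:3306/airflow_db")
with engine.connect() as connection:
    # A trivial query just to confirm host, credentials, and database name
    print(connection.execute(text("SELECT 1")).scalar())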
As other answers have stated, you need to use a database other than SQLite. Additionally, you need to install RabbitMQ, configure it appropriately, and change each of your airflow.cfg files to have the correct RabbitMQ information. For an excellent tutorial on this, see A Guide On How To Build An Airflow Server/Cluster.
If you run it on a Kubernetes cluster, use the following config:
airflow:
config:
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://postgres:airflow@airflow-postgresql:5432/airflow