How do I properly encode `extra` parameters when using `airflow connections add`?

Problem Statement
When editing the connection in the UI, I can modify the extra field to contain {"no_host_key_check": true}.
But when I attempt to add this connection from the CLI with the following command, which follows the connections documentation format:
airflow connections add local_sftp --conn-uri "sftp://test_user:test_pass@local_spark_sftp_server_1:22/schema?extra='{\"no_host_key_check\": true}"
The connection is added as {"extra": "'{\"no_host_key_check\": true}"}
How do I need to modify my airflow connections add command to properly format this connection configuration?

The confusion here is that, for the airflow connections add command, all query parameters in the connection URI are treated as extra parameters by default, so there is no need to wrap them in an explicit extra key:
airflow connections add local_sftp --conn-uri "sftp://test_user:test_pass@local_spark_sftp_server_1:22/schema?no_host_key_check=true"
This correctly sets the desired parameter.
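To double-check what actually gets stored, you can parse the same URI locally. A minimal sketch, assuming Airflow 2.x URI parsing (query parameter values arrive as strings):

from airflow.models.connection import Connection

# Sketch: query parameters of the connection URI land in the connection's extras.
# Assumes Airflow 2.x; the SFTP hook later interprets the string "true".
conn = Connection(uri="sftp://test_user:test_pass@local_spark_sftp_server_1:22/schema?no_host_key_check=true")
print(conn.extra_dejson)  # expected: something like {'no_host_key_check': 'true'}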

Related

Specify a database name in Databricks SQL connection parameters

I am using Airflow 2.0.2 to connect to Databricks using the airflow-databricks-operator. The SQL operator doesn't let me specify the database where the query should be executed, so I have to prefix the table_name with the database_name. I tried reading through the docs of databricks-sql-connector here -- https://docs.databricks.com/dev-tools/python-sql-connector.html -- and still couldn't figure out whether I could give the database name as a parameter in the connection string itself.
I tried setting database/schema/namespace in the **kwargs, but no luck. The query executor keeps saying that the table is not found, because the query keeps getting executed against the default database.
Right now it's not supported. The primary reason is that if you have multiple statements, the connector could reconnect between their execution, and the effect of USE would be lost. databricks-sql-connector also doesn't allow setting a default database.
Right now you can work around that by adding an explicit USE <database> statement to the list of SQLs to execute (the sql parameter can be a list of strings, not only a single string), as sketched below.
P.S. I'll look into it; maybe I'll add setting of the default catalog/database in a future version.
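A rough sketch of that workaround using the Databricks provider's DatabricksSqlOperator (availability depends on your provider version; the database, table, and connection id below are placeholders, and the SQL endpoint details are assumed to be configured on the connection):

from airflow.providers.databricks.operators.databricks_sql import DatabricksSqlOperator

# Workaround sketch: put an explicit USE statement first in the list of SQLs.
# "my_database", "my_table", and the connection id are placeholders.
select_with_use = DatabricksSqlOperator(
    task_id="select_with_use",
    databricks_conn_id="databricks_default",
    sql=[
        "USE my_database",                  # set the default database first
        "SELECT * FROM my_table LIMIT 10",  # runs against my_database
    ],
)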

Multiple SQL statements in flyway.initSql

Is something like this possible, and if so, is initSql executed each time a connection is established? For example, I want to make sure that on each connection the time zone is set to UTC. Or is there a better way to execute connection-init SQL statements?
flyway:
  datasources:
    default:
      initSql: SET ROLE test_role; SET time zone 'UTC'; # <--- is this second statement executed?
Thanks!
Yes, initsql supports multiple statements. It is run "to initialize a new database connection immediately after opening it," so yes, it's run each time a connection is established.
For example, against a test SQL Server database, I ran:
flyway -initSql="select 1; select 2" info
This output the normal content of flyway info, but also included the results of both statements after the usual connection info header, confirming that both were executed.

How to define https connection in Airflow using environment variables

In Airflow, http (and other) connections can be defined as environment variables. However, it is hard to use an https scheme for these connections.
Such a connection could be:
export AIRFLOW_CONN_MY_HTTP_CONN=http://example.com
However, defining a secure connection is not possible:
export AIRFLOW_CONN_MY_HTTP_CONN=https://example.com
This is because Airflow strips the scheme (https), and in the final connection object the URL gets http as its scheme.
It turns out that there is a possibility to use https by defining the connection like this:
export AIRFLOW_CONN_MY_HTTP_CONN=https://example.com/https
The second https is called schema in the Airflow code (as in DSNs, e.g. postgresql://user:passw@host/schema). This schema is then used as the scheme when constructing the final URL in the connection object.
I am wondering if this is by design, or just an unfortunate mix-up of scheme and schema.
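As a rough illustration of the parsing described above, here is a small sketch, assuming Airflow 2.x URI handling (exact behaviour may differ between versions):

from airflow.models.connection import Connection

# The trailing path segment becomes the connection's "schema", which the
# http hook uses as the scheme when it builds the base URL.
plain = Connection(uri="https://example.com")
trick = Connection(uri="https://example.com/https")
print(plain.host, repr(plain.schema))  # example.com '' -> hook falls back to http://example.com
print(trick.host, repr(trick.schema))  # example.com 'https' -> hook builds https://example.com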
For those who land on this question in the future, I confirm that @jjmurre's answer works well for 2.1.3.
In this case we need a URI-encoded string.
export AIRFLOW_CONN_SLACK='http://https%3a%2f%2fhooks.slack.com%2fservices%2f...'
See this post for more details.
Hope this can save other fellows an hour which I've spent on investigating.
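If you need to produce such a percent-encoded value yourself, Python's standard library can generate it. A small sketch (the Slack webhook path is a placeholder):

from urllib.parse import quote

# Percent-encode the real https endpoint so it survives inside the AIRFLOW_CONN_* URI.
endpoint = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder path
print("http://" + quote(endpoint, safe=""))
# -> http://https%3A%2F%2Fhooks.slack.com%2Fservices%2FT000%2FB000%2FXXXX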
You should use Connections, and then you can specify the schema.
This is what worked for me using Bitnami Airflow:
.env
MY_SERVER=my-conn-type://xxx.com:443/https
docker-compose.yml
environment:
  - AIRFLOW_CONN_MY_SERVER=${MY_SERVER}

How to mask the credentials in the Airflow logs?

I want to make sure some of the encrypted variables do not appear in the Airflow log.
I am passing AWS keys to an Exasol EXPORT SQL statement, and they are getting printed in the Airflow log.
Currently, this is not possible out of the box. You can, however, configure your own Python logger and use that class by changing the logging_config_class property in the airflow.cfg file.
Example here: Mask out sensitive information in python log
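A minimal sketch of that approach, assuming a standard logging.Filter wired into a custom LOGGING_CONFIG referenced by logging_config_class (the class name and regex below are illustrative, not Airflow API):

import logging
import re

# Redact anything that looks like an AWS access key or secret before it is logged.
class RedactAwsKeysFilter(logging.Filter):
    AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}|aws_secret_access_key\s*=\s*\S+", re.IGNORECASE)

    def filter(self, record):
        record.msg = self.AWS_KEY_RE.sub("***REDACTED***", record.getMessage())
        record.args = ()
        return True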
Are the AWS keys sent as part of the data for the SQL export, or are they sent for the connection?
If they are sent for the connection, then hiding these credentials is possible: you would simply have to create a connection and export the data using that connection.

How to create Hive connection in airflow to specific DB?

I am trying to create a Hive connection in Airflow that points to a specific database. I tried to find the params in HiveHook and tried the below in the extra options.
{"db":"test_base"} {"schema":"test_base"} and {"database":"test_base"}
But it looks like nothing works, and it always points to the default db.
Could someone point out the possible parameters we can pass in extra_options?

Resources