How to mask credentials in the Airflow logs?

I want to make sure that some encrypted variables do not appear in the Airflow log.
I am passing AWS keys to an Exasol EXPORT SQL statement, and they are getting printed in the Airflow log.

Currently, this is not possible out of the box. You can, however, configure your own Python logger and use that class by changing the logging_config_class property in the airflow.cfg file.
Example here: Mask out sensitive information in python log
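For illustration, a minimal sketch of such a filter (the class name and the masked patterns are made up for the example; the filter would be attached to the handlers in the custom logging config referenced by logging_config_class):

import logging
import re

class SensitiveValueFilter(logging.Filter):
    # Hypothetical patterns; adjust them to the credentials you actually pass.
    PATTERNS = [
        re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key id
        re.compile(r"(?i)secret[_ ]?key\s*[=:]\s*\S+"),     # secret_key=... pairs
    ]

    def filter(self, record):
        message = record.getMessage()
        for pattern in self.PATTERNS:
            message = pattern.sub("***", message)
        record.msg = message
        record.args = None
        return True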

Are the AWS keys sent as part of the data for the SQL export, or are they sent for the connection?
If they are sent for the connection, then hiding these credentials is possible. You simply have to create a connection and export the data using that connection.
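A minimal sketch of that idea, assuming a connection id of aws_exasol_export (hypothetical) whose login/password hold the access key and secret, so the keys live in the connection rather than being hard-coded in the DAG:

from airflow.hooks.base import BaseHook  # airflow.hooks.base_hook.BaseHook on 1.10.x

conn = BaseHook.get_connection("aws_exasol_export")  # example connection id
access_key = conn.login
secret_key = conn.password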

Related

Provide aws credentials to Airflow GreatExpectationsOperator

I would like to use GreatExpectationsOperator to perform data quality validations.
The validation results data should be stored in S3.
I don't see an option to pass an Airflow connection name to the GE operator, and the AWS credentials in my organization are stored in an Airflow connection.
How can Great Expectations retrieve the S3 credentials from an Airflow connection, rather than from the default AWS credentials in the .aws directory?
Thanks!
We ended up creating a new operator that inherits from the GE operator; the operator gets the connection as part of its execute method.
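Roughly what that subclass can look like, as a sketch (it assumes the airflow-provider-great-expectations and Amazon provider packages; the class and parameter names here are illustrative, not the exact code we used):

import os

from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
from great_expectations_provider.operators.great_expectations import GreatExpectationsOperator

class GreatExpectationsS3Operator(GreatExpectationsOperator):
    def __init__(self, aws_conn_id="aws_default", **kwargs):
        super().__init__(**kwargs)
        self.aws_conn_id = aws_conn_id

    def execute(self, context):
        # Resolve credentials from the Airflow connection and expose them to boto3,
        # so the S3 results store does not fall back to the defaults in ~/.aws.
        credentials = AwsBaseHook(aws_conn_id=self.aws_conn_id, client_type="s3").get_credentials()
        os.environ["AWS_ACCESS_KEY_ID"] = credentials.access_key
        os.environ["AWS_SECRET_ACCESS_KEY"] = credentials.secret_key
        if credentials.token:
            os.environ["AWS_SESSION_TOKEN"] = credentials.token
        return super().execute(context)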

How Do I properly Encode `extra` parameters while using `airflow connections add`

Problem Statement
When editing the connection in the UI, I can modify the extra field to contain {"no_host_key_check": true}
But when I attempt to add this connection from the CLI with this command, which follows the connections documentation format:
airflow connections add local_sftp --conn-uri "sftp://test_user:test_pass@local_spark_sftp_server_1:22/schema?extra='{\"no_host_key_check\": true}"
The connection is added as {"extra": "'{\"no_host_key_check\": true}"}
How do I need to modify my airflow connections add command to properly format this connection configuration?
This comes down to a fundamental misunderstanding: all query parameters in the connection URI passed to airflow connections add are treated as extra parameters by default, so no explicit extra key is needed.
airflow connections add local_sftp --conn-uri "sftp://test_user:test_pass@local_spark_sftp_server_1:22/schema?no_host_key_check=true"
Correctly sets the desired parameter.
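A quick way to verify how the URI is interpreted is to let Airflow parse it itself (a sketch; the exact value formatting may differ between versions):

from airflow.models.connection import Connection

# Query parameters in the conn-uri land in the connection's extra field.
conn = Connection(uri="sftp://test_user:test_pass@local_spark_sftp_server_1:22/schema?no_host_key_check=true")
print(conn.extra_dejson)  # expected: {'no_host_key_check': 'true'}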

How to connect to postgres read replica in Hasura

I have the main postgres database running with Hasura. I want to add a new Hasura service that connects only to the read replica.
But I am getting this error:
..."hint":"You can use REPEATABLE READ instead.","message":"cannot use serializable mode in a hot standby","status_code":"0A000"...
I also tried adding --tx-iso=repeatable-read but no luck.

How to define https connection in Airflow using environment variables

In Airflow, http (and other) connections can be defined as environment variables. However, it is hard to use an https scheme for these connections.
Such a connection could be:
export AIRFLOW_CONN_MY_HTTP_CONN=http://example.com
However, defining a secure connection is not possible:
export AIRFLOW_CONN_MY_HTTP_CONN=https://example.com
This is because Airflow strips the scheme (https), and in the final connection object the URL gets http as its scheme.
It turns out that there is a possibility to use https by defining the connection like this:
export AIRFLOW_CONN_MY_HTTP_CONN=https://example.com/https
The second https is called schema in the Airflow code (as in DSNs, e.g. postgresql://user:passw@host/schema). This schema is then used as the scheme in the construction of the final URL in the connection object.
I am wondering if this is by design, or just an unfortunate mix-up of scheme and schema.
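A quick way to see the scheme/schema split is to parse the URI with Airflow's Connection class (a sketch; exact behaviour may vary by version):

from airflow.models.connection import Connection

conn = Connection(uri="https://example.com/https")
print(conn.conn_type)  # 'https' -- taken from the URI scheme
print(conn.host)       # 'example.com'
print(conn.schema)     # 'https' -- taken from the URI path; the HTTP hook uses this when building the URL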
For those who land on this question in the future, I confirm that @jjmurre's answer works well for 2.1.3.
In this case we need a URI-encoded string.
export AIRFLOW_CONN_SLACK='http://https%3a%2f%2fhooks.slack.com%2fservices%2f...'
See this post for more details.
Hope this saves other fellows the hour I spent investigating.
You should use Connections, and then you can specify the schema.
This is what worked for me using bitnami airflow:
.env
MY_SERVER=my-conn-type://xxx.com:443/https
docker-compose.yml
environment:
  - AIRFLOW_CONN_MY_SERVER=${MY_SERVER}

Airflow logs in s3 bucket

I would like to write the Airflow logs to S3. Following are the parameters that we need to set, according to the docs:
remote_logging = True
remote_base_log_folder =
remote_log_conn_id =
If Airflow is running in AWS, why do I have to pass the AWS keys? Shouldn't the boto3 API be able to write/read to S3 if the correct permissions are set on the IAM role attached to the instance?
Fair point, but I think it allows for more flexibility if Airflow is not running on AWS, or if you want to use a specific set of credentials rather than give the entire instance access. It might also have been an easier implementation, because the underlying code for writing logs to S3 uses the S3Hook (https://github.com/apache/airflow/blob/1.10.9/airflow/utils/log/s3_task_handler.py#L47), which requires a connection id.
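For reference, a minimal sketch of that configuration in airflow.cfg, using placeholder values (my-log-bucket and aws_default are examples, not defaults you must use):

remote_logging = True
remote_base_log_folder = s3://my-log-bucket/airflow/logs
remote_log_conn_id = aws_default

If the connection referenced by remote_log_conn_id carries no credentials, e.g.

export AIRFLOW_CONN_AWS_DEFAULT='aws://'

the S3Hook falls back to boto3's default credential chain, so the IAM role attached to the instance can still be used.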
