How to point to the airflow unittest.cfg? - airflow

Airflow creates a unittest.cfg file in the AIRFLOW_HOME environment variable path.
My question is: how can I point to unittest.cfg in the same way that I point to airflow.cfg via the environment variable AIRFLOW_CONFIG?
The reason I want to do this is that I don't want to have any config files in the AIRFLOW_HOME directory.
Also, if anyone knows, could you please explain what unittest.cfg is for, as there is no documentation I could find on it.

The unittest.cfg test configuration file is the default configuration file used when Airflow runs in test mode.
Test mode can be activated by setting the unit_test_mode option in airflow.cfg, or the AIRFLOW__CORE__UNIT_TEST_MODE environment variable, to True.
When test mode is activated, the values in the test configuration file override those in airflow.cfg at runtime.
# Source: https://github.com/apache/airflow/blob/1.10.5/airflow/configuration.py#L558,L561
def get_airflow_test_config(airflow_home):
    if 'AIRFLOW_TEST_CONFIG' not in os.environ:
        return os.path.join(airflow_home, 'unittests.cfg')
    return expand_env_var(os.environ['AIRFLOW_TEST_CONFIG'])
The AIRFLOW_TEST_CONFIG environment variable can be set to the path of your test configuration file.
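For example, a minimal sketch (the paths are placeholders) of pointing Airflow at a test configuration kept outside AIRFLOW_HOME; the variables must be set before airflow.configuration is first imported:
import os

# Placeholder path: keep the test configuration outside AIRFLOW_HOME.
os.environ["AIRFLOW_TEST_CONFIG"] = "/opt/airflow-conf/unittests.cfg"
# Activate test mode so the test configuration is actually applied.
os.environ["AIRFLOW__CORE__UNIT_TEST_MODE"] = "True"

# Import after setting the variables so Airflow resolves the paths above.
from airflow import configuration  # noqa: E402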

Related

Airflow DeprecationWarning

I'm running a distributed Airflow 2.4.0 setup using the official Docker image. All the containers use the same .env file and same version of Airflow image. When I log into one of the Airflow containers I get this warning:
/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py:545: DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
option = self._get_environment_variables(deprecated_key, deprecated_section, key, section)
/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py:545: DeprecationWarning: The auth_backend option in [api] has been renamed to auth_backends - the old setting has been used, but please update your config.
option = self._get_environment_variables(deprecated_key, deprecated_section, key, section)
/home/airflow/.local/lib/python3.7/site-packages/airflow/configuration.py:367: FutureWarning: The auth_backends setting in [api] has had airflow.api.auth.backend.session added in the running config, which is needed by the UI. Please update your config before Apache Airflow 3.0.
FutureWarning,
I checked the airflow.cfg inside the container and it has the up to date variables. Why do I still get the warning messages?
You are seeing these warnings because of the section the options live in. airflow.cfg is a configuration file organized into sections, and each setting is expected to be in its proper section.
In your case your airflow.cfg has sql_alchemy_conn, where you override the default value. Prior to 2.3.0 this setting lived in the core section, and in 2.3.0 it was moved to the database section (see PR).
What you need to do is simply open airflow.cfg and move the setting to the proper section. For example:
[core]
sql_alchemy_conn = sqlite:///{AIRFLOW_HOME}/airflow.db
to:
[database]
sql_alchemy_conn = sqlite:///{AIRFLOW_HOME}/airflow.db
The reason it is like that is also explained in the docs. Airflow references settings through environment variables of the format AIRFLOW__{SECTION}__{KEY}, so in this case it will be
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN; the section is therefore important for finding the variable.
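As a small illustration (a sketch of the naming rule described in the docs, not Airflow's own code):
def airflow_env_var(section: str, key: str) -> str:
    """Build the AIRFLOW__{SECTION}__{KEY} name Airflow checks for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

# The option that moved from [core] to [database] in 2.3.0:
print(airflow_env_var("database", "sql_alchemy_conn"))
# -> AIRFLOW__DATABASE__SQL_ALCHEMY_CONN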

How to output Airflow's scheduler log to stdout or S3 / GCS

We're running an Airflow cluster using the puckel/airflow Docker image with docker-compose. Airflow's scheduler container outputs its logs to /usr/local/airflow/logs/scheduler.
The problem is that the log files are not rotated and disk usage increases until the disk gets full. A DAG for cleaning up the log directory is available, but that DAG runs on a worker node, so the log directory on the scheduler container is not cleaned up.
I'm looking for a way to output the scheduler log to stdout or an S3/GCS bucket but have been unable to find one. Is there any way to output the scheduler log to stdout or an S3/GCS bucket?
Finally I managed to output the scheduler's log to stdout.
Here you can find how to use a custom logger in Airflow. The default logging config is available on GitHub.
What you have to do is:
(1) Create a custom logging config at ${AIRFLOW_HOME}/config/log_config.py.
# Setting processor (scheduler, etc..) logs output to stdout
# Referring https://www.astronomer.io/guides/logging
# This file is created following https://airflow.apache.org/docs/apache-airflow/2.0.0/logging-monitoring/logging-tasks.html#advanced-configuration
import sys
from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
LOGGING_CONFIG["handlers"]["processor"] = {
    "class": "logging.StreamHandler",
    "formatter": "airflow",
    "stream": sys.stdout,
}
(2) Set the logging_config_class option to config.log_config.LOGGING_CONFIG in airflow.cfg:
logging_config_class = config.log_config.LOGGING_CONFIG
(3) [Optional] Add $AIRFLOW_HOME to the PYTHONPATH environment variable.
export PYTHONPATH="${PYTHONPATH}:${AIRFLOW_HOME}"
Actually, you can set the path of logging_config_class to anything, as long as Python is able to import the module.
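A quick sanity check before restarting the scheduler (a sketch that assumes ${AIRFLOW_HOME}/config is importable as a config package, e.g. it contains an __init__.py and $AIRFLOW_HOME is on PYTHONPATH):
# Sketch: confirm the override is in place in the custom logging config.
from config.log_config import LOGGING_CONFIG

processor = LOGGING_CONFIG["handlers"]["processor"]
assert processor["class"] == "logging.StreamHandler"
print("processor handler:", processor)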
Setting handler.processor to airflow.utils.log.logging_mixin.RedirectStdHandler didn't work for me. It used too much memory.
remote_logging=True in airflow.cfg is the key.
Please check the thread here for detailed steps.
You can extend the image with the following, or set the equivalent options in airflow.cfg:
ENV AIRFLOW__LOGGING__REMOTE_LOGGING=True
ENV AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=gcp_conn_id
ENV AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=gs://bucket_name/AIRFLOW_LOGS
The gcp_conn_id connection should have the correct permissions to create/delete objects in GCS.

Encrypt sql_alchemy_conn in airflow config file (ansible)

Is there a way to encrypt the sql_alchemy_conn string in the Airflow config file? The password shown in the example is plaintext. What options are there to secure it? Also, if the password has special characters, how must it be escaped in the config file?
I am trying to install Airflow using an Ansible airflow role.
# See: https://www.sqlalchemy.org/
sql_alchemy_conn:
  value: "postgresql+psycopg2://pgclusteradm@servername:PLAINTEXTPASSWORD@server.postgres.database.azure.com/airflow2"
I'm looking for a way to encrypt the password, but couldn't find how.
You can provide the database URI through environment variables instead of the config file. This doesn't encrypt it or necessarily make it more secure, but it at least isn't plainly sitting in a permanent file.
In your airflow.cfg you can put a placeholder:
[core]
...
sql_alchemy_conn = override_me
...
Then set AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://... in an environment variable when you bring up Airflow components. This way of setting and overriding configuration options through environment variables is detailed in the docs, but the basic format is AIRFLOW__{SECTION}__{KEY}=<value>.
There are two ways of securing this, as mentioned in the docs:
1) Environment Variable:
You can override the setting in airflow.cfg by setting the following environment variable:
AIRFLOW__CORE__SQL_ALCHEMY_CONN=my_conn_string
This way you can leave the setting empty in airflow.cfg so no one can view the password.
2) Get string by running command:
You can also derive the connection string at run time by appending _cmd to the key like this:
[core]
sql_alchemy_conn_cmd = bash_command_to_run
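Conceptually, the _cmd mechanism runs the configured command and uses its stdout as the option value, so the secret can live in a secrets store or a restricted file instead of airflow.cfg. A rough sketch of the idea (not Airflow's actual implementation; the secrets path is hypothetical):
import subprocess

def resolve_cmd_option(command: str) -> str:
    """Run a shell command and use its trimmed stdout as the config value."""
    return subprocess.check_output(command, shell=True, text=True).strip()

# Hypothetical: the connection URI is stored in a file readable only by the airflow user.
conn_uri = resolve_cmd_option("cat /run/secrets/sql_alchemy_conn")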

Hard override entry in tnsnames.ora

I have a set of shell scripts and sqlplus commands.
These connect to Oracle DB_ONE and DB_TWO.
I am upgrading DB_ONE.
For my testing I override the DB_ONE entry in a local tnsnames.ora.
There exists a global tnsnames.ora with all connections in it.
export TNS_ADMIN=/path/to/local/tnsnames:/path/to/global/tnsnames
This way, I am able to connect to DB_ONE on my_new.server and DB_TWO on some.other.server as expected.
However, if I break my_new.server, sqlplus automatically connects to DB_ONE on original.server. So it fails silently and fails over to the connection in the global tnsnames file. I would like this connection to fail completely.
Is there a way to have a 'hard' override such that sqlplus will only try a DB_ONE connection from the local tnsnames.ora, whilst being free to try DB_TWO connections from all tnsnames.ora files?
My local tnsnames.ora
DB_ONE=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=
        (PROTOCOL=TCP)
        (PORT=1524)
        (HOST=my_new.server)
      )
    )
    (CONNECT_DATA=
      (SERVICE_NAME=DB_ONE)
    )
  )
Global tnsnames.ora which I cannot change
DB_ONE=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=
        (PROTOCOL=TCP)
        (PORT=1524)
        (HOST=original.server)
      )
    )
    (CONNECT_DATA=
      (SERVICE_NAME=DB_ONE)
    )
  )
DB_TWO=
  (DESCRIPTION=
    (ADDRESS_LIST=
      (ADDRESS=
        (PROTOCOL=TCP)
        (PORT=1524)
        (HOST=some.other.server)
      )
    )
    (CONNECT_DATA=
      (SERVICE_NAME=DB_TWO)
    )
  )
This is not valid:
export TNS_ADMIN=/path/to/local/tnsnames:/path/to/global/tnsnames
TNS_ADMIN is a single directory path, not a searchable list like $PATH or $LD_LIBRARY_PATH etc. The documentation mentions that:
If the TNS_ADMIN environment variable is not set, then Oracle Net will check the ORACLE_HOME/network/admin directory.
It doesn't say so, but it also defaults to checking the network/admin directory if the TNS_ADMIN variable does not point to a valid directory, and as your colon-separated list isn't a valid directory path, it will use the tnsnames.ora under $ORACLE_HOME/network/admin.
That means your local 'override' file is never being used, and you were accessing whichever instance DB_ONE points to in the global file. It's not that the TNS entry from the second file is used if the first fails - that mechanism just doesn't exist. (You can have failover within a file, but that's different.)
Assuming you have connection strings using a TNS alias like user/pwd@DB_ONE and you can't change those for your testing, your only real option is to make a complete copy of the global file and just edit the entry for DB_ONE:
cp /path/to/global/tnsnames/tnsnames.ora /path/to/local/tnsnames/
edit /path/to/local/tnsnames/tnsnames.ora
export TNS_ADMIN=/path/to/local/tnsnames
Or as @ibre5041 mentioned in a comment, as you're on Linux you could skip the TNS_ADMIN environment variable and use ~/.tnsnames.ora for your local copy.
As you mention, that won't reflect any changes made to the global file, but presumably once you've finished your testing you can trash your local file or revert to the global TNS_ADMIN anyway.

Symfony2 Composer and environment variables

I would like to set the configuration of my Symfony2 project using environment variables.
On the server I have defined:
SYMFONY__DATABASE__USER
SYMFONY__DATABASE__PASSWORD
SYMFONY__DATABASE__NAME
SYMFONY__DATABASE__HOST
SYMFONY__DATABASE__DRIVER
My parameters.yml.dist looks like this:
# app/config/parameters.yml.dist
parameters:
    database_host: "%database.host%"
    database_port: ~
    database_name: "%database.name%"
    database_user: "%database.user%"
    database_password: "%database.password%"
    database_driver: "%database.driver%"
When I run Composer I get an exception:
composer install --dev --no-interaction --prefer-source
[Symfony\Component\DependencyInjection\Exception\ParameterNotFoundException]
You have requested a non-existent parameter "database.driver". Did you mean one of these: "database_user", "database_driver"?
These variables are defined on the server, so I could modify parameters.yml.dist to define these values, but that does not seem like the right way, because what I really want to use are the environment variables.
Note: I want to read these environment variables in Travis, Heroku and my Vagrant machine. I only want to have the Vagrant machine variables in the repository.
What is the proper way to do this?
How should my parameters.yml.dist look?
It looks like you are doing everything okay.
Here is the complete documentation for Setting Environment Variables which I believe you already read.
What is important to note is this:
Also, in order for your console to work (which does not use Apache),
you must export these as shell variables. On a Unix system, you can
run the following:
$ export SYMFONY__DATABASE__USER=user
$ export SYMFONY__DATABASE__PASSWORD=secret
I remember once I had a similar issue: I was setting everything in Apache, but when running console commands it wasn't working because I forgot to export the variables on the system.
Be aware that using export is a temporary solution; if you restart your server those values will be lost, and you will need to set them up permanently according to your OS.
I think you solved this a long time ago, but the problem is actually that you have two underscores between DATABASE and USER, and the parser for this has a string-replace step that replaces every __ with a dot (.).
For your example to work, you should have written it like this:
SYMFONY__DATABASE_USER -> database_user
SYMFONY__DATABASE__USER -> database.user
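To illustrate the mapping (a sketch in Python of the replace rule described above, not Symfony's actual code): the SYMFONY__ prefix is stripped, the name is lowercased, and every remaining __ becomes a dot.
def env_to_parameter(name: str) -> str:
    """Map a SYMFONY__* environment variable name to a container parameter name."""
    return name[len("SYMFONY__"):].lower().replace("__", ".")

print(env_to_parameter("SYMFONY__DATABASE_USER"))   # -> database_user
print(env_to_parameter("SYMFONY__DATABASE__USER"))  # -> database.user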
You can try this bundle if your Symfony version is >= 2.6.2:
This bundle provides a way to read parameters from environment
variables at runtime. The value defined in the container parameter is
used as fallback when the environment variable is not available.
